HACKER Q&A
📣 govideo

How did the internet discover my subdomain?


I have a domain that is not live. As expected, loading the domain returns: Error 1016.

However... I have a subdomain with a non-obvious name, like: userfileupload.sampledomain.com

This subdomain IS LIVE but has NOT been publicized/posted anywhere. It's a custom URL for authenticated users to upload media with a presigned URL to my Cloudflare R2 bucket.

I am using CloudFlare for my DNS.

How did the internet find my subdomain? Some sample user agents are:

"Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: scaninfo@paloaltonetworks.com"

"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7; en-us) AppleWebKit/534.20.8 (KHTML, like Gecko) Version/5.1 Safari/534.20.8"

"Mozilla/5.0 (Linux; Android 9; Redmi Note 5 Pro) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.89 Mobile Safari/537.36"

The bots are sending GET requests, which are failing as designed, but I'm wondering how the bots even knew the subdomain existed?!


  👤 yatralalala Accepted Answer ✓
Hi, our company does this basically "as-a-service".

The ways to find it are basically limitless. The best source is probably the Certificate Transparency project, as others suggested. But it does not end there. Other things we do include internet crawling, domain bruteforcing on wildcard DNS, dangling vhost identification, default certs on servers (connect to the IP on 443 and get the default cert), and many others.

Security by obscurity does not work. You cannot rely on "people won't find it". Once it's online, everyone can find it, no matter how you hide it.


👤 BLKNSLVR
There are a number of companies, not just Palo Alto Networks, that scan the entire IPv4 space at various scales. Some of them perform these scans multiple times per day.

I set up a set of scripts to log all "uninvited activity" to a couple of my systems, from which I discovered a whole bunch of these scanner "security" companies. Personally, I treat them all as malicious.

There are also services that track Newly Registered Domains (NRDs).

Tangentially:

NRD lists are useful for DNS block lists since a large number of NRDs are used for short term scam sites.

My little, very amateur, project to block them can be found here: https://github.com/UninvitedActivity/UninvitedActivity

Edited to add: Direct link to the list of scanner IP addresses (although hasn't been updated in 8 months - crikey, I've been busy longer than I thought): https://github.com/UninvitedActivity/UninvitedActivity/blob/...


👤 parliament32
Certificate Transparency logs, or they don't actually know the domain name: just port-scanning[1] then making requests to open web ports.

[1] Turns out you can port-scan the entire internet in under 5 minutes: https://github.com/robertdavidgraham/masscan


👤 paxys
Not sure why everyone is going on about certificate transparency logs when the answer is right there in the user agent. The company is scanning the ipv4 space and came upon your IP and port.

👤 Kikawala
Is it available under HTTPS? Then it's probably in a Certificate Transparency log.

👤 andix
I'm surprised nobody mentioned subfinder yet: https://github.com/projectdiscovery/subfinder

Subfinder uses different public and private sources to discover subdomains. Certificate Transparency logs are a great source, but it also has some other options.


👤 codingdave
If it is on DNS, it is discoverable. Even if it were not, the message you pasted says outright that they scan the entire IP space, so they could be hitting your server's IP without having a clue there is a subdomain serving your stuff from it.

👤 pabs3
ArchiveTeam has some docs about this:

https://wiki.archiveteam.org/index.php/Finding_subdomains


👤 LinuxBender
As others have said, likely cert transparency logs. Use a wildcard cert to avoid this. They are free from Let's Encrypt and possibly a couple of other ACME providers. I have loads of wildcard certs. Bots will try guessing names, but like you I do not use easily guessable names, and the bots never find them. I log all DNS answers. I assume Cloudflare supports strict SNI, but I have no idea if they have their own automation around wildcard certs. Sometimes I renew wildcard certs I am not even using just to give the bots something to do.

👤 govideo
Thanks for everyone's perspectives. Very educational and admittedly lots outside the boundaries of my current knowledge. I have thus far relied on Cloudflare's automatic HTTPS and simple instant subdomain setup for their worker microservice I'm using.

There are evidently technical/footprint implications of that convenience. Fortunately, I'm not really concerned with the subdomain being publicly known; I was more curious how it became publicly known.


👤 ciaovietnam
There is a chance that your subdomain is the first/default virtual host in your web server setup (or the subdomain's access log is the default log file) so any requests to the server's IP address get logged to this virtual host. That means they didn't access your subdomain, they accessed via your server IP address but got logged in your subdomain's access log.
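A quick way to test this theory is to look at the Host header each logged request carried. The sketch below is only an illustration: `classify_host` and its label strings are hypothetical names, and it assumes you can pull the Host value out of your own logs.

```python
import ipaddress


def classify_host(host_header: str, expected: str) -> str:
    """Classify a request by the Host header it carried.

    Returns "by-name" if the client asked for the expected hostname,
    "by-ip" if the Host header was a bare IP address (typical of
    IPv4-wide scanners), and "other" for anything else.
    """
    host = host_header.strip().lower()
    if host.startswith("["):
        # Bracketed IPv6 literal, optionally followed by a port: "[::1]:443"
        host = host[1:].split("]", 1)[0]
    elif host.count(":") == 1:
        # Hostname or IPv4 address followed by ":port"
        host = host.rsplit(":", 1)[0]
    if host == expected.lower():
        return "by-name"
    try:
        ipaddress.ip_address(host)
        return "by-ip"
    except ValueError:
        return "other"
```

If nearly everything comes back as "by-ip", the scanners probably never knew the subdomain name at all.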

👤 oliwarner
Certificate Transparency would also be my guess. These are logs published by big TLS certificate issuers to cross-check and make sure they're not issuing certificates for domains they have no standing on.

The way around this is to issue a wildcard for your root domain and use that. Your main domain is discoverable but your subs aren't.
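To make the wildcard point concrete, here is a small sketch of which hostnames a logged certificate spells out and which it merely covers. The function name and the example hostnames are made up for illustration; the only rule it relies on is that a wildcard matches exactly one left-most label.

```python
def ct_visible_names(cert_sans, hostnames_in_use):
    """Split hostnames into those a logged certificate names outright and
    those only covered by a wildcard SAN (so never spelled out in the log)."""
    sans = {s.lower() for s in cert_sans}
    named, hidden = [], []
    for host in hostnames_in_use:
        host = host.lower()
        if host in sans:
            named.append(host)
            continue
        # A wildcard matches exactly one left-most label.
        _, _, parent = host.partition(".")
        if parent and "*." + parent in sans:
            hidden.append(host)
    return named, hidden
```

With SANs of `sampledomain.com` and `*.sampledomain.com`, the root domain ends up in `named` while `userfileupload.sampledomain.com` ends up in `hidden`. This only says what the certificate reveals; the other discovery routes in this thread still apply.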

There are other routes: leaky extensions, leaky DNS servers, bad internet security system utilities that phone home about traffic. Who knows?

Unless your IP address redirects to your subdomain —not unheard of— it's not somebody IP/port scanning. Webservers don't typically leak anything about the domains they serve for.


👤 vince14
I'm having the same issue.

https://securitytrails.com/ also had my "secret" staging subdomain.

I made a catch-all certificate, so the subdomain didn't show up in CT logs.

It's still a secret to me how my subdomain ended up in their database.


👤 codazoda
This discussion makes me wonder, how hard is it to find a Google Document that was shared with "Anyone with the link"?

👤 xg15
TIL (from this thread) : You can abuse TLS handshakes to effectively reverse-DNS an IP address without ever talking to a DNS server! Is this built into dig yet? :)

(Alright, some IP addresses, not all of them)

I also wonder if this is a potential footgun for eSNI deployments: If you add eSNI support to a server, you must remember to also make regular SNI mandatory - otherwise, an eavesdropper can just ask your server nicely for the domain that the eSNI encryption was trying to hide from it.


👤 lockhead
Most likely passive DNS data. If you use your subdomain, you make DNS queries for it. If the DNS server you use to resolve your domains shares this data, it can be picked up by others.

👤 perching_aix
Using the Certificate Transparency logs I'd imagine.

Also note that your domains are live as they're allocated (they exist). Whether a web server or anything else actually backs them is a different question entirely.

For "secret" subdomains, you'll want a wildcard certificate. That way only that will show on the CT logs. Note that if you serve over IPv4, the underlying host will be eventually discovered anyways by brute-force host enumeration, and the domain can still be discovered using dictionary attacks / enumeration.

Never touched Cloudflare so this is as far as I can help you.


👤 MacGyver101
Let me list some of the ways that precious subdomain could have been leaked

1) CZDS/DNS record sharing program

2) CT Logs

3) Browser SCT audit

4) Browser telemetry

5) DNS logs

6) DPI

7) Antivirus/OS telemetry

8) Virus/Malware/Tracker

9) Brute forcing DNS records

10) DNSSEC

11) Server softwares with AutoTLS

12) Servers screaming their hostnames over any protocol/banner thing

13) Typing anything on the browser search bar

14) Posting it anywhere

And many other novel ways I can't think of right now. I have successfully hidden some of my subdomains in the past but it definitely requires dedication. Simple silly mistakes can make all your efforts go waste. Ask any red/blue teamer.

Want to hide something? Roll everything on your own.


👤 andix
If an HTTPS service should be hard to discover, an easy way is to hide it behind a subdirectory. Something like https://subdomain.domain.example/hard_to_find_secret_string.

Another option is a wildcard certificate.

This obviously can't be the only protection. But if an attacker doesn't know about a service, or misses it during discovery, they can't attack it.
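A minimal sketch of the secret-path idea, using Python's standard library. The function names are hypothetical, and 32 bytes of randomness is just an assumed default.

```python
import hmac
import secrets


def new_secret_path(nbytes: int = 32) -> str:
    """Return a URL path built from nbytes of cryptographic randomness."""
    return "/" + secrets.token_urlsafe(nbytes)


def path_matches(request_path: str, secret_path: str) -> bool:
    """Compare in constant time so response timing does not leak a partial match."""
    return hmac.compare_digest(request_path.encode(), secret_path.encode())
```

Unlike a hostname, the path of an HTTPS request never appears in DNS or in a certificate, although it can still leak through logs, referrers, or shared links.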


👤 8bitchemistry
Did you ever email the URL to somebody? We had the same issue years ago where google seemed to be crawling/indexing new subdomains it finds in emails.

👤 thedougd
Some CAs (Amazon) allow not publishing to the Certificate Transparency Log. But if you do this, browsers will block the connection by default. Chromium browsers have a policy option to skip this check for selected URLs. See: CertificateTransparencyEnforcementDisabledForURLs.

Some may find this more desirable than wildcard certificates and their drawbacks.


👤 arkfil
Palo Alto devices (network appliances like firewalls) can see the sites that users behind them want to visit. These devices are very popular in many companies. Users can also have agents installed on their computers, which likewise have access to the sites they visit.

👤 1vuio0pswjnm7
Why not experiment with multiple variations? For example, as part of the experiment, run your own DNS, use non-standard DNS encryption like CurveDNS, or even no DNS at all, use a non-standard port for HTTPS, a self-signed CA, TLS with no SNI extension, or even TCPCurve instead of CAs and TLS. If non-discoverability is the goal, there are infinite ways to deviate from web developer norms.

If "the internet fails to find the subdomain" when using non-standard practices and conventions then perhaps "following the internet's recommendations", e.g., use Cloudflare, etc., might be partially at cause for discoverability.

Would be surprised if Expanse scans more than a relatively small selection of common ports.


👤 daggersandscars
DNS query type AXFR (zone transfer) allows for subdomain querying. There are security restrictions around who can do it on which DNS servers. Given the number of places online where one can run a subdomain query, I suspect it's mostly a matter of paying the right fees to the right DNS provider.

👤 zeagle
Can I ask an adjacent question? I have a bunch of DNS A records for locallyaccessedservice.mydomain.tld pointing to my 10.0.0.x NAS's nginx reverse proxy so I can use HTTPS and DNS to access them locally and via Tailscale. My cert is for *.domain.tld. It's nothing critical and only accessible within my LAN, but is there any reason I shouldn't be doing this from a security point of view? I guess someone could phish that to another globally accessible server if DNS changed and I wouldn't notice, but I don't see how that would be an issue. There are a couple of nginx services exposed to the public, but not on those specific domains, so I guess that is an attack vector.

👤 supermatt
1) Are you sure that they are using the subdomain? They could be connecting via IP or an alternate host address.

2) Are you using TLS? Unless you are using a wildcard cert, then the FQDN will have been published as part of the certificate transparency logs.


👤 webpagealert
DNS Leaks or Public Records

DNS Propagation: When you create a subdomain (e.g., blog.yoursite.com), your DNS provider (e.g., Cloudflare, GoDaddy) updates global DNS servers. These records are public and visible to anyone who queries the DNS (e.g., via dig blog.yoursite.com).

WHOIS Data: If your domain registration details are public (not privacy-protected), your subdomain’s ownership info may be exposed.

👤 mightybyte
If you've made any kind of DNS entries involving this subdomain, then congratulations, you've notified the world of its existence. There are tools out there that leverage this information and let you get all the subdomains for a domain. Here's the first one I found in a quick search:

https://pentest-tools.com/information-gathering/find-subdoma...


👤 alberth
This site will find any subdomain, for any domain, so long as it previously had a certificate (ssl/tls)

https://crt.sh/


👤 jcalx
Some bots scan using giant lists of subdomains, e.g. https://github.com/danielmiessler/SecLists/tree/master/Disco.... Your subdomain may be on that giant combined_subdomains list, or perhaps some other lists that other tools use.
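Wordlist scanning is simple enough to sketch in a few lines. This is an illustration only, not how any particular tool works: `enumerate_subdomains` is a hypothetical name, and the `resolve` parameter is there so the logic can be tried offline with a fake resolver.

```python
import socket


def enumerate_subdomains(domain, words, resolve=None):
    """Return {hostname: addresses} for every wordlist candidate that resolves.

    By default candidates are resolved with the system resolver via
    socket.getaddrinfo; pass `resolve` to substitute your own lookup.
    """
    if resolve is None:
        def resolve(name):
            try:
                infos = socket.getaddrinfo(name, None)
            except socket.gaierror:
                return []
            return sorted({info[4][0] for info in infos})

    found = {}
    for word in words:
        candidate = f"{word.strip().lower()}.{domain}"
        addresses = resolve(candidate)
        if addresses:
            found[candidate] = addresses
    return found
```

Note that a wildcard DNS record makes every candidate resolve, so real tools first probe a random label to detect that case.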

👤 AtNightWeCode
Assuming this is not direct traffic to your IP, people will say it is because of certificate logs. Maybe it is in your case. But if you spin up a CF worker on a subdomain, you will also get hit by traffic immediately, and those certificates are wildcards. I think CF leaks subdomains in some cases. I have never seen this behavior when using CF just as a DNS server, though.

👤 melson
Someone might have used an open-source tool like sublist3r.

👤 fsflover
Could it be that Chrome shared the web page with advertisers?

https://www.ghacks.net/2021/03/16/wonder-about-the-data-goog...


👤 itscrush
> I am using CloudFlare for my DNS.

Based on this it sounds like you exposed your resource and advertised it for others. Reverse dns, get IP, scan IP.

Probably simpler: you exposed a resource publicly on IPv4. If it exists, it'll be scanned. There are probably hundreds of companies scanning the entire 0.0.0.0/0 space at all times.


👤 fmxsh
One way is to query your IP, retrieve the certificates, and read the domain(s) in them.

I made my server return a self-signed certificate without a domain name in it when the webserver's port 443 is accessed by IP instead of by domain name.
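The lookup described here can be sketched with Python's standard `ssl` module. Treat it as a rough illustration under one stated assumption: the server's default certificate chains to a trusted CA. `getpeercert()` only returns decoded fields for a validated certificate, so a self-signed default certificate (like the one above) makes this raise instead of returning names. The function names are hypothetical.

```python
import socket
import ssl


def dns_names(cert: dict) -> list:
    """Extract DNS names from a dict shaped like SSLSocket.getpeercert() output."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def default_cert_names(ip: str, port: int = 443, timeout: float = 5.0) -> list:
    """Connect by IP without SNI and return the names in the default certificate.

    Assumption: the certificate validates against the system trust store;
    otherwise ssl.SSLCertVerificationError is raised.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # we connect by IP, so there is no name to check
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        # No server_hostname means no SNI is sent; the server picks its default cert.
        with ctx.wrap_socket(sock) as tls:
            return dns_names(tls.getpeercert())
```

Calling `default_cert_names` on your own server's address shows what a scanner would learn. Behind a proxy such as Cloudflare, the IP belongs to the proxy's edge, so the result describes the proxy's certificate rather than your origin's.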


👤 CGamesPlay
Be careful with these. I had a subdomain like this (completely unlisted) with a Google OAuth flow on it, using a development mode Google app. Somehow, the domain was discovered, and Google decided that using their OAuth flow was a phishing scam, and delisted my entire toplevel domain as a result!

👤 eat
DNS enumeration (brute force) with a good wordlist, zone transfer, or leaking the name through a certificate served when accessing your host via IP address are all possibilities.

The name "userfileupload" is far from not-obvious, so that would be my guess.



👤 TZubiri
Maybe it's a cloudflare controlled scanner?

Maybe you published the subdomain in a cert?

Snooped traffic is unlikely.

This is a good question. If you don't publish a subdomain, scanners should not reach it. If they do, there's a leak in your infra.


👤 fsckboy
LPT, this is an object lesson in the weakness of security through obscurity

👤 rempargo
I assume you host this with an HTTPS certificate, so you can look up your subdomains at:

https://crt.sh/?q=sampledomain.com
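The same lookup can be scripted. The sketch below assumes two things about crt.sh that are not stated in this thread: that adding `output=json` returns a JSON array, and that each entry has a `name_value` field holding one or more newline-separated names. Both helper names are hypothetical.

```python
import json
from urllib.parse import urlencode


def crtsh_url(domain: str) -> str:
    """Build a crt.sh query for all names under `domain` (% is the wildcard)."""
    return "https://crt.sh/?" + urlencode({"q": f"%.{domain}", "output": "json"})


def names_from_crtsh(payload: str) -> list:
    """Collect unique hostnames from a crt.sh-style JSON response body."""
    names = set()
    for entry in json.loads(payload):
        # name_value may hold several names separated by newlines (assumed format)
        for name in entry.get("name_value", "").splitlines():
            names.add(name.strip().lower())
    names.discard("")
    return sorted(names)
```

Fetch the URL with any HTTP client and pass the response body to `names_from_crtsh`. If your subdomain shows up by name, CT logs are the likely source; if only a wildcard entry appears, look at the other routes mentioned in the thread.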


👤 ThePowerOfFuet
Others are saying CT logs but my own subdomains are on wildcard certificates, in which case I suspect they are discovered by DPI analysis of DNS traffic and resold, such as by Team Cymru.

👤 bashwizard
Like people have said already: Certificate Transparency logs.

There are countless tools for subdomain enumeration. I personally use subfinder or amass when doing recon on bug bounty targets.


👤 ralferoo
If you're using HTTPS, then you're probably using Let's Encrypt, so your subdomain will appear in the publicly accessible CT logs.

One thing you could do is use a wildcard certificate, and then use a non-obvious subdomain under it. I actually have something similar. In my setup, all my web traffic goes to haproxy frontends, which forward traffic to the appropriate backend. I was sick of setting up a new certificate for each new subdomain, so I replaced them all with a single wildcard cert. This means that I'm not advertising each new subdomain in the CT logs. The subdomains all look nominally the same when visited (same holding page on index and same /api handling), but one of them decodes an additional URL path that provides access to status monitoring.

Separately, that Palo Alto Networks company is a real pain. They connect to absolutely everything in their attempts to spam the internet. Frankly, I'm sick of even my mail servers being bombarded with HTTP requests on port 25 and the resultant log spam.


👤 spl757
Does the IP address for that subdomain have a DNS PTR record set? If it does, someone can discover the subdomain by querying the PTR record for the IP.
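Checking for a PTR record takes only the standard library. A minimal sketch, with hypothetical function names:

```python
import ipaddress
import socket


def ptr_name(ip: str) -> str:
    """Return the reverse-DNS owner name that a PTR query asks for."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip: str):
    """Return the hostname the system resolver reports for `ip`, or None."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return None
```

For example, `ptr_name("192.0.2.10")` gives `10.2.0.192.in-addr.arpa`, which is the name a scanner would query. `reverse_lookup` goes through the system resolver, so a local hosts-file entry can also answer it.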

👤 f4c39012
CSP headers can leak urls, but I assume that isn't the cause here if the subdomain is an entirely separate project

👤 nusl
It's pretty common for people, especially attackers, to bruteforce the subdomains of a domain they are interested in.

👤 immibis
Additionally to what other people said, you can assume Cloudflare is selling lists of DNS names to someone.

👤 Saris
>I am using CloudFlare for my DNS.

Could have been discovered from the SSL cert request for the subdomain.


👤 Gabrys1
> Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple

So my guess is reverse DNS


👤 clvx
Put it behind IPv6 and it likely won't happen again. The address space is massive.
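Some back-of-the-envelope arithmetic shows the gap. The probe rate of 10 million packets per second is an assumption (it is the figure the masscan README advertises), and the calculation covers one probe per address on a single port.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def sweep_seconds(prefix_bits: int, total_bits: int, packets_per_second: float) -> float:
    """Seconds needed to send one probe to every address in a prefix."""
    return 2 ** (total_bits - prefix_bits) / packets_per_second


RATE = 10_000_000  # assumed probe rate, packets per second

# All of IPv4 (a /0 in a 32-bit space), one port.
ipv4_minutes = sweep_seconds(0, 32, RATE) / 60

# A single IPv6 /64 subnet (2**64 addresses).
ipv6_subnet_years = sweep_seconds(64, 128, RATE) / SECONDS_PER_YEAR
```

At that rate, IPv4 takes minutes while one /64 takes tens of thousands of years. Blind sweeps of IPv6 are impractical, but the name-based routes (CT logs, DNS data, leaked links) work the same on IPv6.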

👤 aspbee555
Cloudflare uses certificates that include numerous other site names as alt names, so your site name could have been discovered via any other site that happens to use the same cert.

👤 3oil3
What happens if you google your subdomain? Maybe the bots have some sort of dictionary files and they just run them, and when there is a match, then they append it with some .html extension, or maybe they prepend it to the match as a subdomain of it?

👤 bbarnett
If you ever email a link and it hits gmail, Google will index it.

👤 _trampeltier
Did you send a link over email, WhatsApp, or something like that?

👤 pagealert
By tightening DNS, server, and firewall configurations, you can minimize exposure of your internal subdomains to bots.

👤 DeborahMatthews
Your subdomain may have been discovered through certificate transparency logs, search engine crawling, passive DNS, https://arzhost.com/blogs/openssl-unable-to-write-random-sta... leaked links, or third-party analytics tools.

👤 pagealert
The discovery of your unpublished subdomain by bots likely stems from a combination of technical factors related to DNS, server configuration, and bot behavior. Here's a breakdown of the possible reasons and solutions:

1. DNS Leaks or Wildcard Records

Wildcard DNS Entries: If your main domain (sampledomain.com) has a wildcard DNS record (e.g., *.sampledomain.com), any subdomain (including userfileupload.sampledomain.com) could be automatically resolved to your server’s IP. Even if the main domain is inactive, the wildcard might expose the subdomain.

Exposed Subdomain DNS Records: If the subdomain’s DNS records (e.g., A/CNAME records) are explicitly configured but not removed, bots could reverse-engineer them via DNS queries or IP scans.

Fix: Remove or restrict wildcard DNS entries and delete unused subdomain records from your DNS provider (e.g., Cloudflare).

2. Server IP Scanning

IP-Based Discovery: Bots like Expanse systematically scan IP addresses to identify active services. If your subdomain’s server is listening on ports 80/443 (HTTP/HTTPS), bots may:

- Perform a port scan to detect open ports.
- Attempt common subdomains (e.g., userfileupload, upload, media) on the detected IP to guess valid domains.

Fix:

- Block unnecessary ports (e.g., close port 80/443 if unused).
- Use a firewall (e.g., ufw or Cloudflare Firewall Rules) to reject requests from suspicious IPs.

3. Cloudflare’s Default Behavior

Page Rules or Workers: If the subdomain is configured with Cloudflare Workers, default error pages, or caching rules, it might generate responses that bots can crawl. For example:

- A 404 Not Found page with a custom message could be indexed by search engines.
- Worker scripts might inadvertently expose endpoints (e.g., /_worker.js).

Fix:

- Delete unused subdomains from Cloudflare’s DNS settings.
- Ensure Workers/routes are only enabled for intended domains.

4. Reverse DNS Lookup

IP-to-Domain Mapping: If your server’s IP address is shared or part of a broader range, bots might reverse-resolve the IP to discover associated domains (e.g., via dig -x).

Fix:

- Use a dedicated IP address for sensitive subdomains.
- Contact your ISP to request removal from public IP databases.

5. Authentication Flaws

Presigned URLs in Error Messages: If the subdomain’s server returns detailed error messages (e.g., 403 Forbidden) when accessed without authentication, bots might parse these messages to infer valid endpoints or credentials.

Fix:

- Customize error pages to show generic messages (e.g., "Access Denied").
- Log and block IPs attempting brute-force access.

How to Prevent Future Discoveries

- Remove Unused DNS Records: Delete the subdomain from Cloudflare’s DNS settings entirely.
- Disable Wildcards: Avoid *.sampledomain.com wildcards to limit exposure.
- Firewall Rules: Block IPs from scanners (e.g., Palo Alto Networks, Expanse) using Cloudflare’s DDoS Protection or a firewall.
- Monitor Logs: Use tools like grep or Cloudflare logs to track access patterns and block suspicious IPs.
- Use Authentication: Require API keys, tokens, or OAuth for all subdomain requests.

Example Workflow for Debugging

    # Check Cloudflare DNS records for the subdomain:
    dig userfileupload.sampledomain.com +trace

    # Inspect server logs for recent requests:
    grep -E "^ERROR|DENY" /var/log/nginx/access.log

    # Block Expanse IPs via Cloudflare Firewall:
    # 1. Go to Cloudflare > Firewall > Tools.
    # 2. Add a custom rule to block IPs (e.g., from scaninfo@paloaltonetworks.com).

By tightening DNS, server, and firewall configurations, you can minimize exposure of your internal subdomains to bots.
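For the log-monitoring step, a tally of user agents is often more telling than grepping for keywords. The sketch below assumes log lines in the common "combined" access log layout, where the user agent is the last quoted field; that layout and the function name are assumptions, and logs from a Cloudflare Worker would need a different parser.

```python
import re
from collections import Counter

# Matches the last two quoted fields of a combined-format line: referer, user agent.
_TAIL = re.compile(r'"[^"]*" "(?P<agent>[^"]*)"\s*$')


def tally_user_agents(lines):
    """Count requests per user agent in combined-format access log lines."""
    counts = Counter()
    for line in lines:
        match = _TAIL.search(line)
        if match:
            counts[match.group("agent")] += 1
    return counts
```

Feeding it an open log file and calling `most_common(10)` on the result gives a quick view of which scanners account for most of the traffic.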


👤 artursapek
presumably it has a DNS record