However... I have a subdomain with a non-obvious name, like: userfileupload.sampledomain.com
This subdomain IS LIVE but has NOT been publicized/posted anywhere. It's a custom URL where authenticated users upload media to my Cloudflare R2 bucket using presigned URLs.
I am using Cloudflare for my DNS.
How did the internet find my subdomain? Some sample user agents are: "Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: scaninfo@paloaltonetworks.com", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7; en-us) AppleWebKit/534.20.8 (KHTML, like Gecko) Version/5.1 Safari/534.20.8", "Mozilla/5.0 (Linux; Android 9; Redmi Note 5 Pro) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.89 Mobile Safari/537.36".
The bots are sending GET requests, which are failing as designed, but I'm wondering how the bots even knew the subdomain existed?!
The ways to find it are basically limitless. The best source is probably the Certificate Transparency project, as others suggested. But it does not end there: some other things that we do are internet crawls, domain brute-forcing on wildcard DNS, dangling vhost identification, default certs on servers (connect to an IP on 443 and get the default cert), and many others.
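To make the Certificate Transparency point concrete, here is a minimal Python sketch. It assumes you already have search results from a CT front end such as crt.sh, which (as far as I know) can return JSON with a newline-separated `name_value` field per certificate. The sample data and the function name are made up for illustration, and nothing here touches the network.

```python
import json

def names_from_ct_json(raw_json, root_domain):
    """Collect the distinct hostnames under root_domain from CT search results."""
    root = root_domain.lower()
    suffix = "." + root
    found = set()
    for entry in json.loads(raw_json):
        # Assumed shape: one "name_value" string per certificate,
        # with multiple SAN entries separated by newlines.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name == root or name.endswith(suffix):
                found.add(name)
    return sorted(found)

# Hypothetical response, for illustration only.
SAMPLE = json.dumps([
    {"name_value": "sampledomain.com\nwww.sampledomain.com"},
    {"name_value": "userfileupload.sampledomain.com"},
    {"name_value": "*.sampledomain.com"},
])

if __name__ == "__main__":
    for hostname in names_from_ct_json(SAMPLE, "sampledomain.com"):
        print(hostname)
```

Any certificate issued for the exact name puts that name in the list. A wildcard entry shows up only as `*.sampledomain.com`.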
Security by obscurity does not work. You cannot rely on "people won't find it". Once it's online, everyone can find it. No matter how you hide it.
I set up a set of scripts to log all "uninvited activity" to a couple of my systems, from which I discovered a whole bunch of these scanner "security" companies. Personally, I treat them all as malicious.
There are also services that track Newly Registered Domains (NRDs).
Tangentially:
NRD lists are useful for DNS block lists since a large number of NRDs are used for short term scam sites.
My little, very amateur, project to block them can be found here: https://github.com/UninvitedActivity/UninvitedActivity
Edited to add: Direct link to the list of scanner IP addresses (although hasn't been updated in 8 months - crikey, I've been busy longer than I thought): https://github.com/UninvitedActivity/UninvitedActivity/blob/...
[1] Turns out you can port-scan the entire internet in under 5 minutes: https://github.com/robertdavidgraham/masscan
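For a sense of scale, here is the back-of-the-envelope arithmetic behind that claim as a small Python sketch. It assumes one probe per address on a single port and counts every IPv4 address, reserved ranges included, so it is an upper bound rather than what any particular scanner actually sends.

```python
IPV4_ADDRESSES = 2 ** 32   # every possible IPv4 address, reserved ranges included
SCAN_SECONDS = 5 * 60      # the "under 5 minutes" figure quoted above

def required_probe_rate(addresses, seconds, ports=1):
    """Probes per second needed to touch every address/port pair in the given time."""
    return addresses * ports / seconds

rate = required_probe_rate(IPV4_ADDRESSES, SCAN_SECONDS)
print(f"{rate / 1e6:.1f} million probes per second")
```

That works out to roughly 14 million probes per second for one port.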
Subfinder uses different public and private sources to discover subdomains. Certificate Transparency logs are a great source, but it also has some other options.
There are evidently technical/footprint implications of that convenience. Fortunately, I'm not really concerned with the subdomain being publicly known; I was more curious how it became publicly known.
The way around this is to issue a wildcard for your root domain and use that. Your main domain is discoverable but your subs aren't.
There are other routes: leaky extensions, leaky DNS servers, bad internet security system utilities that phone home about traffic. Who knows?
Unless your IP address redirects to your subdomain (not unheard of), it's not somebody IP/port scanning. Web servers don't typically leak anything about the domains they serve.
https://securitytrails.com/ also had my "secret" staging subdomain.
I made a catch-all certificate, so the subdomain didn't show up in CT logs.
It's still a secret to me how my subdomain ended up in their database.
(Alright, some IP addresses, not all of them)
I also wonder if this is a potential footgun for eSNI deployments: If you add eSNI support to a server, you must remember to also make regular SNI mandatory - otherwise, an eavesdropper can just ask your server nicely for the domain that the eSNI encryption was trying to hide from it.
Also note that your domains are live as they're allocated (they exist). Whether a web server or anything else actually backs them is a different question entirely.
For "secret" subdomains, you'll want a wildcard certificate. That way only that will show on the CT logs. Note that if you serve over IPv4, the underlying host will be eventually discovered anyways by brute-force host enumeration, and the domain can still be discovered using dictionary attacks / enumeration.
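A small Python sketch of why the wildcard helps. This is a simplified version of the usual left-most-label matching rule, not a full certificate validator, and the function name is hypothetical. The point is that the only name the CT log ever sees is `*.sampledomain.com`, which covers any single label without revealing it.

```python
def covered_by(cert_name, hostname):
    """Return True if hostname matches cert_name under the simple
    left-most-label wildcard rule."""
    cert_labels = cert_name.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(cert_labels) != len(host_labels):
        return False  # a wildcard stands in for exactly one label
    if cert_labels[0] == "*":
        return cert_labels[1:] == host_labels[1:]
    return cert_labels == host_labels

if __name__ == "__main__":
    for host in ("userfileupload.sampledomain.com",
                 "sampledomain.com",
                 "a.b.sampledomain.com"):
        print(host, covered_by("*.sampledomain.com", host))
```

Note that the bare root domain and deeper names like `a.b.sampledomain.com` are not covered by the wildcard, so those would still need their own (logged) names.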
Never touched Cloudflare so this is as far as I can help you.
1) CZDS/DNS record sharing program
2) CT Logs
3) Browser SCT audit
4) Browser telemetry
5) DNS logs
6) DPI
7) Antivirus/OS telemetry
8) Virus/Malware/Tracker
9) Brute forcing DNS records
10) DNSSEC
11) Server softwares with AutoTLS
12) Servers screaming their hostnames over any protocol/banner thing
13) Typing anything on the browser search bar
14) Posting it anywhere
And many other novel ways I can't think of right now. I have successfully hidden some of my subdomains in the past but it definitely requires dedication. Simple silly mistakes can make all your efforts go to waste. Ask any red/blue teamer.
Want to hide something? Roll everything on your own.
Another option is a wildcard certificate.
This obviously can't be the only protection. But if an attacker doesn't know about a service, or misses it during discovery, they can't attack it.
Some may find this more desirable than wildcard certificates and their drawbacks.
If "the internet fails to find the subdomain" when using non-standard practices and conventions, then perhaps "following the internet's recommendations", e.g., using Cloudflare, might be partially the cause of the discoverability.
Would be surprised if Expanse scans more than a relatively small selection of common ports.
2) Are you using TLS? Unless you are using a wildcard cert, then the FQDN will have been published as part of the certificate transparency logs.
https://pentest-tools.com/information-gathering/find-subdoma...
https://www.ghacks.net/2021/03/16/wonder-about-the-data-goog...
Based on this it sounds like you exposed your resource and advertised it for others. Reverse DNS, get IP, scan IP.
Probably simpler: you exposed a resource publicly on IPv4, and if it exists, it'll be scanned. There are probably hundreds of companies scanning the entire 0.0.0.0/0 space at all times.
I made my server return a self-signed certificate without a domain name in it when the webserver's port 443 is accessed by IP instead of by domain name.
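Roughly how that could look with Python's standard `ssl` module, as a sketch under assumptions rather than this commenter's actual setup. The listening context would hold the nameless self-signed cert, and the `sni_callback` hook swaps in the real certificate only when the client asks for a known hostname. `make_sni_callback` and the mapping passed to it are hypothetical names.

```python
import ssl

def make_sni_callback(contexts_by_host):
    """Build an sni_callback that serves a real certificate only for known hostnames.

    contexts_by_host maps lowercase hostnames to ssl.SSLContext objects that
    have the matching certificate loaded. Any other client keeps the
    listener's own (nameless, self-signed) certificate.
    """
    def callback(ssl_socket, server_name, initial_context):
        # server_name is None when the client sent no SNI,
        # which is what a scanner connecting to the bare IP typically does.
        key = server_name.lower() if server_name else None
        if key in contexts_by_host:
            ssl_socket.context = contexts_by_host[key]
        return None  # returning None lets the handshake continue
    return callback
```

Wiring it up would be something like `listener.sni_callback = make_sni_callback({...})` on the `ssl.SSLContext` that has the self-signed cert loaded. A scanner that already guesses the right hostname still gets the real cert, so this only closes the "connect to IP, read default cert" route.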
The name "userfileupload" is far from not-obvious, so that would be my guess.
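To illustrate how guessable a name built from common words is, here is a minimal Python sketch of dictionary-style enumeration. The word list, the fake zone, and the function names are all made up, and the `resolves` argument stands in for a real DNS lookup so the example runs offline.

```python
from itertools import permutations

def candidate_labels(words, max_words=3):
    """Yield every ordered combination of 1..max_words distinct words,
    joined with no separator."""
    for n in range(1, max_words + 1):
        for combo in permutations(words, n):
            yield "".join(combo)

def enumerate_subdomains(words, root_domain, resolves, max_words=3):
    """Return the candidate hostnames for which resolves(hostname) is truthy."""
    hits = []
    for label in candidate_labels(words, max_words):
        hostname = f"{label}.{root_domain}"
        if resolves(hostname):
            hits.append(hostname)
    return hits

# Stand-in for real DNS so the sketch runs offline (hypothetical zone contents).
FAKE_ZONE = {"userfileupload.sampledomain.com", "www.sampledomain.com"}
WORDS = ["user", "file", "upload", "media", "www"]

if __name__ == "__main__":
    print(enumerate_subdomains(WORDS, "sampledomain.com", FAKE_ZONE.__contains__))
```

With just these five words there are only 85 candidates to try, and "userfileupload" is one of them. A real tool would use a much larger word list and an actual resolver.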
Maybe you published the subdomain in a cert?
Snooped traffic is unlikely.
This is a good question. If you don't publish a subdomain, scanners should not reach it; if they do, there's a leak in your infra.
There are countless tools to use for subdomain enumeration. I personally use subfinder or amass when doing recon on bug bounty targets.
One thing you could do is use a wildcard certificate, and then use a non-obvious subdomain under it. I actually have something similar. In my setup, all my web traffic goes to haproxy frontends which forward traffic to the appropriate backend, and I was sick of setting up new certificates for each new subdomain, so I replaced them all with a single wildcard cert. This means that I'm not advertising each new subdomain on the CT list. The subdomains all look nominally the same when visiting (same holding page on index and same /api handling), but one of them decodes an additional URL path that provides access to status monitoring.
Separately, that Palo Alto Networks company is a real pain. They connect to absolutely everything in their attempts to spam the internet. Frankly, I'm sick of even my mail servers being bombarded with HTTP requests on port 25 and the resultant log spam.
Could have been discovered from the SSL cert request for the subdomain.
So my guess is reverse DNS
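For anyone unfamiliar with how a reverse lookup works: the IP is turned into a name under `in-addr.arpa` and a PTR record is requested for it. A minimal Python sketch of the name construction, using an address from the documentation range as a placeholder:

```python
import ipaddress

def ptr_query_name(ip_text):
    """Return the DNS name that a reverse (PTR) lookup for this IP would query."""
    return ipaddress.ip_address(ip_text).reverse_pointer

if __name__ == "__main__":
    print(ptr_query_name("192.0.2.10"))  # 10.2.0.192.in-addr.arpa
```

The actual lookup is what `socket.gethostbyaddr` or `dig -x` does. It only reveals a hostname if whoever controls the reverse zone for that IP range has published a PTR record pointing at it.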
1. DNS Leaks or Wildcard Records
Wildcard DNS Entries: If your main domain (sampledomain.com) has a wildcard DNS record (e.g., *.sampledomain.com), any subdomain (including userfileupload.sampledomain.com) could be automatically resolved to your server’s IP. Even if the main domain is inactive, the wildcard might expose the subdomain.
Exposed Subdomain DNS Records: If the subdomain’s DNS records (e.g., A/CNAME records) are explicitly configured but not removed, bots could reverse-engineer them via DNS queries or IP scans.
Fix: Remove or restrict wildcard DNS entries and delete unused subdomain records from your DNS provider (e.g., Cloudflare).
2. Server IP Scanning
IP-Based Discovery: Bots like Expanse systematically scan IP addresses to identify active services. If your subdomain’s server is listening on ports 80/443 (HTTP/HTTPS), bots may:
Perform a port scan to detect open ports.
Attempt common subdomains (e.g., userfileupload, upload, media) on the detected IP to guess valid domains.
Fix:
Block unnecessary ports (e.g., close port 80/443 if unused).
Use a firewall (e.g., ufw or Cloudflare Firewall Rules) to reject requests from suspicious IPs.
3. Cloudflare’s Default Behavior
Page Rules or Workers: If the subdomain is configured with Cloudflare Workers, default error pages, or caching rules, it might generate responses that bots can crawl. For example:
A 404 Not Found page with a custom message could be indexed by search engines.
Worker scripts might inadvertently expose endpoints (e.g., /_worker.js).
Fix:
Delete unused subdomains from Cloudflare’s DNS settings.
Ensure Workers/routes are only enabled for intended domains.
4. Reverse DNS Lookup
IP-to-Domain Mapping: If your server’s IP address is shared or part of a broader range, bots might reverse-resolve the IP to discover associated domains (e.g., via dig -x on the IP address).
Fix: Use a dedicated IP address for sensitive subdomains.
Contact your ISP to request removal from public IP databases.
5. Authentication Flaws
Presigned URLs in Error Messages: If the subdomain’s server returns detailed error messages (e.g., 403 Forbidden) when accessed without authentication, bots might parse these messages to infer valid endpoints or credentials.
Fix: Customize error pages to show generic messages (e.g., "Access Denied").
Log and block IPs attempting brute-force access.
How to Prevent Future Discoveries
Remove Unused DNS Records: Delete the subdomain from Cloudflare’s DNS settings entirely.
Disable Wildcards: Avoid *.sampledomain.com wildcards to limit exposure.
Firewall Rules: Block IPs from scanners (e.g., Palo Alto Networks, Expanse) using Cloudflare’s DDoS Protection or a firewall.
Monitor Logs: Use tools like grep or Cloudflare logs to track access patterns and block suspicious IPs.
Use Authentication: Require API keys, tokens, or OAuth for all subdomain requests.
Example Workflow for Debugging
bash
# Check Cloudflare DNS records for the subdomain:
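dig userfileupload.sampledomain.com +trace
# Inspect server logs for recent requests that were rejected
# (assumes the default combined log format, where the status code follows the quoted request):
grep -E '" (401|403|404) ' /var/log/nginx/access.log
# Block Expanse IPs via Cloudflare Firewall:
# 1. Go to Cloudflare > Firewall > Tools.
# 2. Add a custom rule to block the scanner IPs seen in your logs
#    (Expanse's user agent also lists scaninfo@paloaltonetworks.com for exclusion requests).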
By tightening DNS, server, and firewall configurations, you can minimize exposure of your internal subdomains to bots.
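The "Monitor Logs" step above could be sketched in Python along these lines. It assumes access logs in the common "combined" layout, the sample lines are invented (using documentation-range IPs), and the function name is hypothetical. It counts rejected (4xx) requests per client IP and per user agent so repeat scanners stand out.

```python
import re
from collections import Counter

# Matches the common "combined" access log layout; adjust if your server logs differently.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def summarize_rejected(lines):
    """Count 4xx responses per client IP and per user agent."""
    by_ip, by_agent = Counter(), Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected layout
        if match.group("status").startswith("4"):
            by_ip[match.group("ip")] += 1
            by_agent[match.group("agent")] += 1
    return by_ip, by_agent

# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    '198.51.100.7 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 403 153 "-" "Expanse, a Palo Alto Networks company"',
    '198.51.100.7 - - [01/Jan/2025:00:00:05 +0000] "GET /favicon.ico HTTP/1.1" 404 153 "-" "Expanse, a Palo Alto Networks company"',
    '203.0.113.9 - - [01/Jan/2025:00:01:00 +0000] "PUT /upload HTTP/1.1" 200 0 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    ips, agents = summarize_rejected(SAMPLE_LOG)
    print(ips.most_common(5))
    print(agents.most_common(5))
```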