HACKER Q&A
📣 tempElena

How to find subdomains and paths for a website


I want to scrape a random website, all of its paths, and subdomains. One way I thought of doing it was to scrape the initial site, find links on that site, and recursively keep scraping. Is there a better way to find paths and subdomains? What do you think, do you have any suggestions? I have seen many YC companies scraping data somehow.
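A rough sketch of the recursive approach I had in mind (Python; the depth limit, the example.com domain, and the regex-based link extraction are just placeholders of mine, not a finished crawler):

    import re
    import urllib.parse
    import urllib.request

    seen = set()

    def crawl(url, depth=0):
        # Arbitrary depth cap so the recursion terminates
        if url in seen or depth > 3:
            return
        seen.add(url)
        try:
            html = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
        except Exception:
            return
        # Naive href extraction; a real crawler would use an HTML parser
        for href in re.findall(r'href="([^"]+)"', html):
            link = urllib.parse.urljoin(url, href)
            # Very rough same-domain check to stay on the target site
            if urllib.parse.urlparse(link).netloc.endswith("example.com"):
                crawl(link, depth + 1)

    crawl("https://example.com")
    print(sorted(seen))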


  👤 WantonQuantum Accepted Answer ✓
There's no perfect way to scrape an entire site if some parts aren't reachable from links, but there are a lot of tools to help with what you want to do. A Google search turns up lots of information and alternative approaches:

https://www.google.com/search?q=how+to+scrape+a+complete+web...

Be sure to honor robots.txt:

https://en.wikipedia.org/wiki/Robots.txt
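For example, Python's standard library can check robots.txt before you request a page (a minimal sketch; the user agent name and URLs are placeholders):

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # Only fetch a path if the rules allow our (placeholder) user agent
    if rp.can_fetch("MyCrawler", "https://example.com/some/path"):
        pass  # safe to request the page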


👤 stop50
1. Look at the site's certificate and note the CA that signed it.
2. Research which certificate transparency logs that CA publishes to.
3. Search for the domain in those CT logs to find its subdomains.
4. Now you can scrape the site (while respecting robots.txt, as WantonQuantum mentioned). A sketch of step 3 is below.
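In practice crt.sh aggregates most public CT logs, so one quick way to pull candidate subdomains is its JSON output (a rough sketch; the query format and the name_value field are how I remember the endpoint behaving, so verify before relying on it):

    import json
    import urllib.request

    domain = "example.com"  # placeholder target
    url = f"https://crt.sh/?q=%25.{domain}&output=json"

    with urllib.request.urlopen(url) as resp:
        entries = json.load(resp)

    # name_value can hold several newline-separated names per certificate
    subdomains = set()
    for entry in entries:
        for name in entry.get("name_value", "").splitlines():
            subdomains.add(name.strip())

    print(sorted(subdomains))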

👤 pabs3
ArchiveTeam have a page on this topic:

https://wiki.archiveteam.org/index.php/Finding_subdomains


👤 whatamidoingyo
Are you looking for something like Gobuster?

https://github.com/OJ/gobuster
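If I remember the flags right, its dns mode brute-forces subdomains from a wordlist (something like gobuster dns -d example.com -w subdomains.txt) and its dir mode does the same for paths, but check the README for the current syntax.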