HACKER Q&A
📣 tempElena

How to find subdomains and paths for a website


I want to scrape a random website, all of its paths, and subdomains. One way I thought of doing it was to scrape the initial site, find links on that site, and recursively keep scraping. Is there a better way to find paths and subdomains? What do you think, do you have any suggestions? I have seen many YC companies scraping data somehow.
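A rough sketch of the recursive approach I had in mind (Python; the depth limit, the example.com domain, and the regex-based link extraction are just placeholders of mine, not a finished crawler):

    import re
    import urllib.parse
    import urllib.request

    seen = set()

    def crawl(url, depth=0):
        # Arbitrary depth cap so the recursion terminates
        if url in seen or depth > 3:
            return
        seen.add(url)
        try:
            html = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
        except Exception:
            return
        # Naive href extraction; a real crawler would use an HTML parser
        for href in re.findall(r'href="([^"]+)"', html):
            link = urllib.parse.urljoin(url, href)
            # Very rough same-domain check to stay on the target site
            if urllib.parse.urlparse(link).netloc.endswith("example.com"):
                crawl(link, depth + 1)

    crawl("https://example.com")
    print(sorted(seen))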


  👤 WantonQuantum Accepted Answer ✓
There's no perfect way to scrape an entire site if some parts aren't reachable from links, but there are a lot of tools to help with what you want to do. A Google search turns up lots of information and alternative approaches:

https://www.google.com/search?q=how+to+scrape+a+complete+web...

Be sure to honor robots.txt:

https://en.wikipedia.org/wiki/Robots.txt
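For example, Python's standard library can check robots.txt before you request a page (a minimal sketch; the user agent name and URLs are placeholders):

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # Only fetch a path if the rules allow our (placeholder) user agent
    if rp.can_fetch("MyCrawler", "https://example.com/some/path"):
        pass  # safe to request the page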


👤 stop50
1. Look at the site's certificate and note the CA that signed it.
2. Research which certificate transparency logs that CA publishes to.
3. Search for the domain in those CT logs to find its subdomains.
4. Now you can scrape the site (while respecting robots.txt, as WantonQuantum mentioned). A sketch of step 3 is below.
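In practice crt.sh aggregates most public CT logs, so one quick way to pull candidate subdomains is its JSON output (a rough sketch; the query format and the name_value field are how I remember the endpoint behaving, so verify before relying on it):

    import json
    import urllib.request

    domain = "example.com"  # placeholder target
    url = f"https://crt.sh/?q=%25.{domain}&output=json"

    with urllib.request.urlopen(url) as resp:
        entries = json.load(resp)

    # name_value can hold several newline-separated names per certificate
    subdomains = set()
    for entry in entries:
        for name in entry.get("name_value", "").splitlines():
            subdomains.add(name.strip())

    print(sorted(subdomains))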

👤 pabs3
ArchiveTeam have a page on this topic:

https://wiki.archiveteam.org/index.php/Finding_subdomains


👤 whatamidoingyo
Are you looking for something like Gobuster?

https://github.com/OJ/gobuster
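If I remember the flags right, its dns mode brute-forces subdomains from a wordlist (something like gobuster dns -d example.com -w subdomains.txt) and its dir mode does the same for paths, but check the README for the current syntax.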