How to find subdomains and paths for a website
I want to scrape an arbitrary website, including all of its paths and subdomains. One approach I thought of is to scrape the initial page, collect the links on it, and keep scraping recursively. Is there a better way to find paths and subdomains? What do you think, do you have any suggestions?
I have seen many YC companies scraping data somehow.
1. Look at the site's certificate and check which CA signed it.
2. Research which certificate transparency (CT) logs that CA uses.
3. Search for the domain in those CT logs; the certificates logged there list the hostnames they were issued for, which exposes subdomains.
4. Now you can scrape the site (while respecting robots.txt, as WantonQuantum mentioned).
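The steps above, combined with the recursive crawl from the question, could be sketched roughly like this in Python using only the standard library. This is a sketch under assumptions, not a definitive implementation. The helper names (`names_from_ct_entries`, `extract_links`, `in_scope`, `crawl`) are hypothetical. The CT helper assumes search results shaped like the JSON that the public crt.sh search returns (a list of objects with a newline-separated `name_value` field), and crt.sh is not mentioned anywhere in this thread. Fetching is passed in as a function so you can plug in whatever HTTP client you like.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


def names_from_ct_entries(entries, domain):
    """Collect hostnames under `domain` from CT search results.

    Assumption: each entry is a dict with a newline-separated
    "name_value" field (the shape of crt.sh JSON output).
    """
    found = set()
    for entry in entries:
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):  # wildcard cert: keep the parent name
                name = name[2:]
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return found


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute http(s) links found in `html`, fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    links = set()
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme in ("http", "https"):
            links.add(absolute)
    return links


def in_scope(url, domain):
    """True if the URL's host is `domain` or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)


def crawl(start_url, domain, fetch, robots_txt="", agent="*", max_pages=100):
    """Breadth-first crawl that stays in scope and obeys robots rules.

    `fetch` is a caller-supplied function mapping a URL to HTML text.
    Simplification: one robots.txt body is applied to every host, whereas
    in practice each subdomain serves its own robots.txt.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    seen, queue, visited = {start_url}, deque([start_url]), []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # disallowed by robots.txt, do not fetch
        visited.append(url)
        for link in extract_links(fetch(url), url):
            if in_scope(link, domain) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

You would seed the crawl once per hostname found in the CT results, since some subdomains are never linked from the main site and a link-following crawl alone would miss them. The `max_pages` cap is there to keep the crawl bounded; adding a delay between requests is also a good idea on a real site.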