HACKER Q&A
📣 georgehill

Is there a service that offers Common Crawl as an API?


I am trying to do some data analysis work. I don't want the full dataset. I want only two things: I give it a hostname, and it gives me all the pages or URLs for that host with their HTML.


  👤 pluto_modadic Accepted Answer ✓
There's index.commoncrawl.org, the Common Crawl index server, where you can query for a domain using wildcards.
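
A rough sketch of how that could look in Python, using only the standard library. Assumptions on my part: the crawl ID `CC-MAIN-2024-10` is just an example (pick a current one from the list on index.commoncrawl.org), the function names are made up for illustration, and the WARC parsing is deliberately naive (it splits on blank lines and skips pagination, retries, and rate limiting).

```python
import gzip
import json
import urllib.parse
import urllib.request

INDEX_HOST = "https://index.commoncrawl.org"
DATA_HOST = "https://data.commoncrawl.org"


def build_query_url(crawl_id, host):
    """Build an index query for every captured URL under `host`."""
    # "host/*" is the wildcard form; "*.host" would match subdomains instead.
    params = urllib.parse.urlencode({"url": f"{host}/*", "output": "json"})
    return f"{INDEX_HOST}/{crawl_id}-index?{params}"


def parse_index_lines(text):
    """The index answers with one JSON object per line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def byte_range(record):
    """HTTP Range header value covering one record inside a WARC file."""
    start = int(record["offset"])
    end = start + int(record["length"]) - 1
    return f"bytes={start}-{end}"


def fetch_html(record):
    """Download one gzipped WARC record and return the HTTP body as text."""
    req = urllib.request.Request(
        f"{DATA_HOST}/{record['filename']}",
        headers={"Range": byte_range(record)},
    )
    with urllib.request.urlopen(req) as resp:
        raw = gzip.decompress(resp.read())
    # Naive split: WARC headers, HTTP headers, then the body.
    parts = raw.split(b"\r\n\r\n", 2)
    return parts[2].decode("utf-8", errors="replace") if len(parts) == 3 else ""


def list_captures(crawl_id, host):
    """Query the index and return the parsed capture records."""
    with urllib.request.urlopen(build_query_url(crawl_id, host)) as resp:
        return parse_index_lines(resp.read().decode("utf-8"))


# Usage (hits the network, so it is left commented out):
# for rec in list_captures("CC-MAIN-2024-10", "example.com"):
#     if rec.get("status") == "200":
#         print(rec["url"], len(fetch_html(rec)))
```

Each index record carries `filename`, `offset`, and `length`, so you only download the slice of the WARC file that holds that one page, not the whole archive.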

👤 phillipseamore
Not that I know of, but there are various tools, like https://github.com/alwalxed/wayurls