HACKER Q&A
📣 gregw134

Improving LLM Performance


I'm trying to figure out whether it's possible to use LLMs to categorize the internet. By back-of-the-napkin math, if it takes a few seconds per web page, it would cost $XXX,XXX+ to process the Common Crawl. Does anyone have tips on speeding up LLMs? Is it possible to use LLMs to train a cheaper student model? Thanks!
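
For anyone who wants to redo the napkin math with their own numbers, here is a rough sketch. Every input is an assumed placeholder (page count, seconds per page, GPU price, and how many pages a GPU handles concurrently), and `estimate_cost` is a hypothetical helper, not part of any library.

```python
def estimate_cost(num_pages, seconds_per_page, dollars_per_gpu_hour,
                  concurrent_pages_per_gpu=1):
    """Return (gpu_hours, dollars) for one pass over num_pages.

    All arguments are assumptions supplied by the caller.
    concurrent_pages_per_gpu models batching: pages processed at once per GPU.
    """
    gpu_seconds = num_pages * seconds_per_page / concurrent_pages_per_gpu
    gpu_hours = gpu_seconds / 3600
    return gpu_hours, gpu_hours * dollars_per_gpu_hour


# Placeholder inputs only; swap in measured values for a real estimate.
hours, dollars = estimate_cost(
    num_pages=1_000_000_000,      # assumed page count
    seconds_per_page=2.0,         # assumed LLM latency per page
    dollars_per_gpu_hour=1.0,     # assumed GPU rental price
    concurrent_pages_per_gpu=4,   # assumed batching factor
)
print(f"{hours:,.0f} GPU-hours, about ${dollars:,.0f}")
```

The cost scales linearly with each input, so batching more pages per GPU or cutting per-page latency lowers the bill by the same factor.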


  👤 throwaway888abc Accepted Answer ✓
You should have a look at DMOZ.

https://dmoz-odp.org/

It feels so old to write this, but back in the day this one was important.

Found this on Kaggle: https://www.kaggle.com/datasets/shawon10/url-classification-...


👤 PaulHoule
Common Crawl is full of real junk; you'd need some kind of classifier just to pick out the stuff that's worth classifying...
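
One way such a pre-filter could look, as a minimal sketch only: a tiny naive Bayes classifier in plain Python that sorts pages into "junk" and "keep". The class `TinyNaiveBayes`, the labels, and the four training snippets are all made up for illustration. In practice the labels might come from running an LLM over a small sample, which is one reading of the "student model" idea in the question.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data; real labels would need to come from somewhere
# trustworthy (hand labelling, or an LLM run on a small sample).
texts = [
    "buy cheap pills now click here win money",
    "click here free casino bonus win now",
    "the compiler translates source code into machine instructions",
    "this tutorial explains how gradient descent updates model weights",
]
labels = ["junk", "junk", "keep", "keep"]

model = TinyNaiveBayes().fit(texts, labels)
print(model.predict("win free money click now"))
print(model.predict("how the compiler updates machine code"))
```

A model like this costs microseconds per page, so only the pages it marks "keep" would need to go through the expensive LLM.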