HACKER Q&A
📣 tmaly

Training a model on all HN data?


I just had a thought; maybe dang could chime in. Has anyone considered training or fine-tuning a model on all Hacker News discussions?


  👤 minimaxir Accepted Answer ✓
It's relatively straightforward to download all HN submissions/comments via BigQuery and then fine-tune an LLM; there's just not much point to it.

You can safely assume all modern LLMs have been trained in part on HN data.
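
For anyone who wants to try the BigQuery route anyway, here is a rough sketch of the download step. It assumes the public table is still published as `bigquery-public-data.hacker_news.full` with columns named `by`, `text`, `type`, `deleted` and `dead`; check the current schema before relying on it. The helper names (`build_query`, `fetch_rows`, `to_jsonl`) are made up for illustration, and the `{"text": ...}` JSONL layout is just one common fine-tuning input format, not anything a particular trainer requires.

```python
import json

# Assumed public dataset/table name; verify it in the BigQuery console.
HN_TABLE = "bigquery-public-data.hacker_news.full"


def build_query(limit=1000):
    """Build SQL that selects live comments with non-empty text."""
    return (
        "SELECT id, `by` AS author, parent, text "
        f"FROM `{HN_TABLE}` "
        "WHERE type = 'comment' AND text IS NOT NULL "
        "AND deleted IS NOT TRUE AND dead IS NOT TRUE "
        f"LIMIT {int(limit)}"
    )


def fetch_rows(limit=1000):
    """Run the query; needs google-cloud-bigquery and GCP credentials."""
    # Imported lazily so the rest of this sketch runs without the package.
    from google.cloud import bigquery

    client = bigquery.Client()
    return [dict(row) for row in client.query(build_query(limit)).result()]


def to_jsonl(rows):
    """Serialize rows as one {"text": ...} JSON object per line."""
    return "\n".join(
        json.dumps({"text": r["text"]}) for r in rows if r.get("text")
    )
```

Something like `open("hn.jsonl", "w").write(to_jsonl(fetch_rows(100000)))` would then give you a file to feed into whatever fine-tuning tool you prefer. Note that the comment text in that table is stored as HTML, so you would probably want to strip tags first.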


👤 anigbrowl
HN was part of the training set for ChatGPT. But it might be interesting to train/fine-tune on HN alone. You could weight by karma, or conversely, you might identify shortcomings in the karma system.
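
One possible reading of "weight by karma", as a sketch only: turn each author's karma into a sampling weight for their comments. Everything here is an assumption for illustration. `karma_weights` and the `karma_by_user` mapping are hypothetical, karma is a per-user number you would have to collect separately, and the log damping is just one choice to keep a handful of very high-karma accounts from dominating.

```python
import math


def karma_weights(comments, karma_by_user):
    """Return one normalized sampling weight per comment.

    Each raw weight is 1 + log(1 + karma), with negative karma clamped
    to 0 and unknown authors treated as karma 0, so every comment keeps
    a nonzero chance of being sampled.
    """
    raw = [
        1.0 + math.log1p(max(karma_by_user.get(c["author"], 0), 0))
        for c in comments
    ]
    total = sum(raw)
    return [w / total for w in raw]
```

The weights can be passed straight to `random.choices(comments, weights=w, k=n)` to draw a training sample. Comparing a model trained this way against an unweighted one would be one way to probe the "shortcomings in the karma system" angle.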

👤 pavel_lishin
To what end?