Training a model on all HN data?
I just had a thought, maybe dang could chime in. Has anyone considered training or fine-tuning a model on all of the Hacker News discussions?
It's relatively straightforward to download all HN submissions and comments via BigQuery and then fine-tune an LLM; there's just not much point to it.
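Rough sketch of the pipeline, assuming you've already pulled rows from the public `bigquery-public-data.hacker_news.full` dataset (the actual query needs google-cloud-bigquery and GCP credentials, so the sample rows below are stand-ins with the same field names):

```python
import json

# Stand-in for rows pulled from BigQuery's public HN dataset
# (`bigquery-public-data.hacker_news.full`); field names match that schema.
SAMPLE_ROWS = [
    {"id": 1, "type": "story", "title": "Ask HN: Favorite debugging trick?", "text": None, "parent": None},
    {"id": 2, "type": "comment", "title": None, "text": "printf, every time.", "parent": 1},
    {"id": 3, "type": "comment", "title": None, "text": "Orphan reply.", "parent": 99},
]

def to_finetune_records(rows):
    """Pair each top-level comment with its story title as a prompt/completion
    example. A sketch -- real use would also walk nested reply chains."""
    stories = {r["id"]: r for r in rows if r["type"] == "story"}
    records = []
    for r in rows:
        if r["type"] != "comment":
            continue
        story = stories.get(r["parent"])
        if story is None:
            continue  # parent isn't a story in this batch (nested reply, etc.)
        records.append({"prompt": story["title"], "completion": r["text"]})
    return records

# One JSON object per line -- the JSONL shape most fine-tuning APIs accept.
jsonl = "\n".join(json.dumps(rec) for rec in to_finetune_records(SAMPLE_ROWS))
```

From there it's just feeding the JSONL into whatever fine-tuning stack you prefer.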
You can safely assume all modern LLMs have been trained in part on HN data.
HN was part of the training set for ChatGPT. But it might be interesting to train or fine-tune on HN alone. You could weight examples by karma, or, conversely, use the results to identify shortcomings in the karma system.
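Karma weighting could be as simple as sampling training examples with probability proportional to score. A sketch (field names are hypothetical):

```python
import random

def karma_weighted_sample(comments, k, seed=0):
    """Draw k training examples with probability proportional to karma.
    Karma is floored at 1 so zero-score comments can still appear."""
    rng = random.Random(seed)
    weights = [max(c["karma"], 1) for c in comments]
    return rng.choices(comments, weights=weights, k=k)

comments = [
    {"text": "insightful take", "karma": 200},
    {"text": "throwaway quip", "karma": 1},
]
sample = karma_weighted_sample(comments, k=1000)
```

With a 200:1 weight ratio the high-karma comment dominates the sample, which is exactly the kind of bias you'd then be probing if you wanted to study the karma system itself.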