HACKER Q&A
📣 Marius_Manola

Do you fine-tune LLMs at all?


If you do fine-tune LLMs at all, what for? Will RAG be enough for most use cases?


  👤 PaulHoule Accepted Answer ✓
I make classification models based on BERT-family embeddings and classical ML algorithms from scikit-learn. These take about 3 minutes to train, calibrate, and evaluate, and in the process the system has actually trained 20 or so models and selected the best.
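
Roughly, the pipeline looks like the sketch below; placeholder texts and an assumed sentence-transformers checkpoint, not my actual production code:

```python
# Minimal sketch: BERT-family embeddings + classical scikit-learn classifiers,
# score several candidates and keep the best. Model name and data are placeholders.
from sentence_transformers import SentenceTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

texts = ["great product, would buy again", "arrived broken, total waste"] * 10  # placeholder corpus
labels = [1, 0] * 10                                                            # placeholder labels

# Encode once; the same embeddings are reused for every candidate classifier.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = encoder.encode(texts)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "linear_svc": LinearSVC(),
    "random_forest": RandomForestClassifier(n_estimators=200),
}

# Cross-validated ROC AUC for each candidate; keep the winner.
scores = {
    name: cross_val_score(clf, X, labels, cv=5, scoring="roc_auc").mean()
    for name, clf in candidates.items()
}
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, labels)
print(best_name, scores[best_name])
```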

I've tried fine-tuning classifiers based on BERT-family models and found it takes 30 minutes to train one model, and the results are pretty unreliable. The best fine-tuned models beat my embedding + classical models by a hair, but the average model is worse.

I find it nerve-wracking that I don't have clear guidance on how long to train the model, the way I did back in the old days when I could train networks with early stopping.
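
If the fine-tuning goes through the Hugging Face Trainer, an EarlyStoppingCallback can get back some of that old workflow; the following is just a sketch with placeholder data, not what I actually run, and argument names vary a bit between transformers versions:

```python
# Sketch: BERT fine-tuning with early stopping via the Hugging Face Trainer.
# Dataset contents are placeholders; set num_train_epochs as an upper bound.
import numpy as np
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          EarlyStoppingCallback, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Placeholder data; in practice this is the real labelled corpus.
raw = Dataset.from_dict({"text": ["good"] * 50 + ["bad"] * 50,
                         "label": [1] * 50 + [0] * 50})
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)
data = raw.map(tokenize, batched=True).train_test_split(test_size=0.2)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

args = TrainingArguments(
    output_dir="bert-clf",
    num_train_epochs=10,              # upper bound; early stopping usually ends sooner
    evaluation_strategy="epoch",      # newer transformers versions call this eval_strategy
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["test"],
    compute_metrics=compute_metrics,
    # Stop when the eval metric fails to improve for two evaluations in a row.
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()
```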

I read a lot of papers where people trained a model to do something similar, and usually they copied the training parameters out of some other paper without seeming to really think about them. The "best practice" seems to be that you should build a large number of models with different parameters and pick the best one, but hardly anyone does it because it would take too long, and if you started thinking too hard about how your parameters affect your results your head might just explode.
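
For completeness, the "train many, keep the best" practice is just a loop like the sketch below, reusing the data and compute_metrics from the previous snippet; each run is slow, which is exactly why hardly anyone bothers:

```python
# Sketch: a small sweep over fine-tuning hyperparameters, keeping the best
# model by held-out accuracy. Reuses `data` and `compute_metrics` from above.
import itertools
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

learning_rates = [1e-5, 2e-5, 5e-5]
epoch_counts = [2, 3, 4]

best_score, best_config = -1.0, None
for lr, epochs in itertools.product(learning_rates, epoch_counts):
    args = TrainingArguments(
        output_dir=f"sweep-lr{lr}-ep{epochs}",
        learning_rate=lr,
        num_train_epochs=epochs,
        evaluation_strategy="epoch",
        save_strategy="no",
        report_to="none",
    )
    trainer = Trainer(
        model=AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2),
        args=args,
        train_dataset=data["train"],
        eval_dataset=data["test"],
        compute_metrics=compute_metrics,
    )
    trainer.train()
    score = trainer.evaluate()["eval_accuracy"]
    if score > best_score:
        best_score, best_config = score, (lr, epochs)

print("best config:", best_config, "accuracy:", best_score)
```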

Granted, my problems tend to have an upper limit on possible accuracy, so it might not be possible to turn my 0.81 ROC AUC into 0.97 whatever I do. I am definitely looking for a problem where I can get better accuracy, and I'm also thinking about fine-tuning a T5 to do classification tasks (like putting #hashtags on a Mastodon post), but I am in no hurry.
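
The T5 idea would be to cast tagging as text-to-text, something like the sketch below with made-up posts and a t5-small checkpoint; I haven't built it yet, so treat it as a rough outline:

```python
# Speculative sketch: hashtag labelling as a text-to-text task with T5.
# Posts, tags, and the training loop are toy placeholders.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Each training pair maps a post to the hashtag it should get.
posts = ["Fresh sourdough out of the oven tonight",
         "New BERT paper drops on arXiv"]
tags = ["#baking", "#machinelearning"]

inputs = tokenizer(["tag this post: " + p for p in posts],
                   padding=True, truncation=True, return_tensors="pt")
targets = tokenizer(tags, padding=True, truncation=True, return_tensors="pt")

# Replace pad tokens in the labels with -100 so they are ignored by the loss.
labels = targets.input_ids.clone()
labels[labels == tokenizer.pad_token_id] = -100

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
for _ in range(3):  # toy number of epochs
    loss = model(input_ids=inputs.input_ids,
                 attention_mask=inputs.attention_mask,
                 labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# At inference time, generate the tag text for a new post.
model.eval()
new = tokenizer("tag this post: training a classifier head on embeddings",
                return_tensors="pt")
print(tokenizer.decode(model.generate(**new, max_new_tokens=8)[0],
                       skip_special_tokens=True))
```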