People who switched from GPT to their own models. How was it?
How was your experience switching from GPT to a custom model for a production use case?
A couple of months ago I attended a presentation of an on-prem LLM. An audience member asked if it was using OpenAI in any way.
The presenter, somewhat overeagerly, replied, "Why not ask our new AI?" and went on to type: "Are you an independent model or do you use OpenAI?"
The chatbot answered, in flowery language, that it was indeed using ChatGPT as a backend. Which it was not, and which was kind of the whole point of the presentation.
Obviously talking my own book here, but we've helped dozens of customers make the transition from prompted GPT-4 or GPT-3.5 to their own fine-tuned models at OpenPipe.
The most common reaction I get is "wow, I didn't expect that to work so well with so little effort". For most tasks, a fine-tuned Mistral 7B will consistently outperform GPT-3.5 at a fraction of the cost, and for some use cases will even match or outperform GPT-4 (particularly for narrower tasks like classification, information extraction, summarization -- but a lot of folks have that kind of task). Some aggregate stats are in our blog: https://openpipe.ai/blog/mistral-7b-fine-tune-optimized
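For context, a minimal sketch of the distillation workflow a switch like this usually implies: export your logged GPT requests and responses as chat-format JSONL, then fine-tune a smaller model on them. The field names below follow the common OpenAI chat schema and are an assumption; adapt them to whatever your trainer expects.

```python
import json

# Logged production requests/responses from the prompted GPT model.
logged = [
    {"prompt": "Classify the sentiment: 'Great battery life.'", "completion": "positive"},
    # ... thousands more logged examples
]

# Write chat-format JSONL, one training example per line.
with open("finetune.jsonl", "w") as f:
    for ex in logged:
        record = {
            "messages": [
                {"role": "user", "content": ex["prompt"]},
                {"role": "assistant", "content": ex["completion"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```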
I fine-tuned an LLM to do technical stuff. It works pretty darn well. What I actually discovered is that LLMs are surprisingly difficult to evaluate. And also that GPT-4 isn't that great, in general.
Running Mistral-Instruct-0.1 for call/email summarization, Mixtral for contract mining, and OpenChat (Instruct again) to augment an agentic chatbot equipped with RAG tools.
The experience has been great. The INT8 tradeoffs are acceptable until hardware FP8 (FP4, anyone?) becomes more widely and cheaply available. The on-prem costs have already been absorbed for a few boxes of A100s and legacy V100s running millions of such interactions.
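For the curious, a minimal sketch of one common way to run a model in INT8 on that kind of hardware, using transformers + bitsandbytes. This mirrors the tradeoff described above, not necessarily this poster's exact stack:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tok = AutoTokenizer.from_pretrained(model_id)

# Load weights in 8-bit; bitsandbytes handles the INT8 matmuls.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

inputs = tok("Summarize this call transcript: ...", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```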
I prefer to use local models when running data extraction or processing across 10k or more records. Hosted services get slow and brittle at that scale.
Mistral 7B fine-tunes (OpenChat is my favorite) just chug through the data and get the job done.
Details: using vLLM to run the models, and GPT-4 to condense information for complex prompts (which the local models then execute).
I think the situation will just keep getting better each month.
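A minimal sketch of that vLLM batch setup, assuming an OpenChat-style Mistral 7B fine-tune and a simple extraction prompt (the model name and prompt are placeholders):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="openchat/openchat-3.5-0106")  # any Mistral 7B fine-tune
params = SamplingParams(temperature=0.0, max_tokens=256)

records = ["...", "..."]  # 10k+ rows in practice
prompts = [f"Extract the company name from:\n{r}" for r in records]

# vLLM batches all prompts internally (continuous batching), so this
# chugs through large datasets without per-request HTTP overhead.
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text.strip())
```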
We support both in our app and our enterprise product. The APIs (OpenAI) vs. libraries (e.g. llama.cpp for on-device) are so similar that the switch is basically transparent to the user. We're adding support for other platforms' APIs soon, and everything we've looked at so far is as easy to integrate as OpenAI, except Google, which for some reason complicates everything on Google Cloud.
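As an illustration of how transparent the switch can be, a sketch assuming llama.cpp's bundled server, which exposes an OpenAI-compatible endpoint; only the base URL changes:

```python
from openai import OpenAI

# Hosted: OpenAI() picks up OPENAI_API_KEY from the environment.
# Local:  point the same client at a llama.cpp server, e.g. one started with
#         ./server -m model.gguf --port 8080
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

resp = client.chat.completions.create(
    model="local-model",  # llama.cpp serves whatever model it loaded
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```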
My 2024 prediction is that we'll see far more people moving off of OpenAI once they weigh its cost and latency against (less proven/scaled) competitors. It's often a speed versus quality tradeoff, and I've seen multiple providers that are 3x faster than OpenAI with far more than 1/3 the quality.
I tested a bunch of models while building https://double.bot but ended up back on GPT-4. Other models are fun to play with, but it gets frustrating even if they miss 1/100 questions that GPT-4 gets. I find that right now I get more value implementing features around the model that fix all the GitHub Copilot papercuts (autocomplete that closes brackets properly, auto-import upon accepting suggestions, disabling suggestions when writing comments to be less distracting, midline completions, etc.).
Hopefully open-source models can catch up to GPT-4 in the next six months, once we've fixed all the low-hanging fruit outside of the model itself.
To add to this question: are there LLMs that I can run on my own data that can also provide citations, similar to the way phind.com does for its results? Even better if they are multilingual.
Mistral 7B was great for flights without wifi! Its answers are pretty good for information you need to look up, but its step-by-step instructions are hit or miss when it tries to do the work for you.
Mixed results. I think Llama 2 is pretty bad in general, especially at anything other than English. I've had very good results with Mixtral for chat.
Of course, all of them feel like a Frankenstein compared to actual ChatGPT. They feel similar and work just as well until, sometimes, they put out complete and utter garbage or artifacts and you wonder if they skimped on fine-tuning.
We do a first pass with our own model and then escalate to GPT if we aren't sure of our own model's results.
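A minimal sketch of that escalation pattern, assuming both models sit behind OpenAI-compatible endpoints; the confidence check is a hypothetical placeholder (in practice it might be logprob thresholds, a validator, or a self-critique prompt):

```python
from openai import OpenAI

local = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # e.g. a vLLM server
hosted = OpenAI()  # uses OPENAI_API_KEY

def is_confident(answer: str) -> bool:
    # Hypothetical placeholder: reject empty or visibly hedging answers.
    return bool(answer.strip()) and "not sure" not in answer.lower()

def ask(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    draft = local.chat.completions.create(model="my-finetune", messages=messages)
    answer = draft.choices[0].message.content
    if is_confident(answer):
        return answer
    # Escalate to GPT when the first pass looks shaky.
    fallback = hosted.chat.completions.create(model="gpt-4", messages=messages)
    return fallback.choices[0].message.content
```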
I'm using Mixtral 8x7B (Q5) for my use cases, such as scripting and searching for ideas and/or definitions, which I always need to fact-check.
Currently I use LM Studio on my M2 with 96 GB of RAM, but I'm looking into switching to Ollama or another OSS solution.
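For reference, a minimal sketch of the Ollama route, assuming the ollama Python package and a model pulled with `ollama pull mixtral:8x7b`:

```python
import ollama

# Exact quantization tags (e.g. Q5 variants) vary by Ollama release.
resp = ollama.chat(
    model="mixtral:8x7b",
    messages=[{"role": "user", "content": "Give me a bash one-liner to count lines in all .py files."}],
)
print(resp["message"]["content"])
```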
Does anyone have a tutorial on how to set up a self-hosted model?
It's been OK. I'm running Llama 2 7B and it's … fine. The results I get from GPT-4 aren't much better. This is for general tasks.
Mostly I think I need to use LLMs more effectively.
It would be great if people could share their app demos, the hosts, and the models they used/trained, for better context.
The main hurdle is the lack of multilingual support in open-weights models.
For personal use, I tried this. I no longer use LLMs.