HACKER Q&A
📣 saaspirant

Has anyone deployed LLMs to production?


I have been trying to tune Gemini Flash to do some classification for me, and it's not performing well at all. I changed a lot of prompts, and it still didn't seem to "learn" anything from the training set. The classification embarrassingly lacks common sense.

Has anyone used AI for anything useful? Apart from programming of course.


  👤 muzani Accepted Answer ✓
They're great at first-level customer service. Lots of questions are repetitive, and the models handle them better than humans do. It was the biggest boost to our customer satisfaction rating.

On the other end, I actually canceled a $100/month subscription once through email (it was a company email that I no longer had access to). I gave evidence, and it canceled the subscription within 20 minutes.

Also, Gemini Flash is unreliable. The best cost efficiency today seems to be GPT-4.1. The cheaper models seem to be okay mostly for summarization. Gemini Flash was much better a year ago; still unreliable, but at least it followed instructions.


👤 mooreds
We use it heavily for doc search. We bought Kapa.ai a few years ago and leverage their solution rather than an in-house build.

👤 byoung2
I was having trouble getting GPT-4o to extract data like address, email, phone, and tracking number from random emails in an inbox. Sometimes it would do it perfectly, and other times it would fail miserably on a similar email. Then I tried asking it to first mark up the email with schema.org metadata, and then asked it to extract the data from the schema.org markup. That worked nearly every time.

Maybe there is an extra step you can work into your prompt that would help it get to the proper classification.
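A nice property of that approach is that the second pass barely needs a model at all: once the email carries schema.org markup, the fields can be pulled out deterministically. A minimal sketch of that second step, assuming the first pass returned a JSON-LD block (the email text, schema type, and field values here are invented for illustration):

```python
import json

# Hypothetical output of the first LLM pass: the original email
# annotated with a schema.org JSON-LD block.
marked_up_email = """
Hi, your package is on the way!
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ParcelDelivery",
  "trackingNumber": "1Z999AA10123456784",
  "deliveryAddress": {
    "@type": "PostalAddress",
    "streetAddress": "123 Main St",
    "addressLocality": "Springfield"
  }
}
</script>
"""

def extract_jsonld(text: str) -> dict:
    """Pull the first JSON-LD block out of annotated text and parse it."""
    open_tag = '<script type="application/ld+json">'
    start = text.index(open_tag) + len(open_tag)
    end = text.index('</script>', start)
    return json.loads(text[start:end])

data = extract_jsonld(marked_up_email)
print(data["trackingNumber"])                    # 1Z999AA10123456784
print(data["deliveryAddress"]["streetAddress"])  # 123 Main St
```

Doing the extraction in plain code means the flaky step (free text to structured markup) is the only one the model handles, which matches why the two-pass version was more reliable.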


👤 nkristoffersen
I use roughly 100 billion LLM tokens per month for NLP purposes, a mix of self-hosted and cloud-hosted models. But I have not attempted any fine-tuning, just prompt engineering and (perhaps more importantly) context engineering.

👤 lopesyong77
LLMs in production? Yeah, we've shipped a few. The key is to stop treating them like magic: they're just really fancy pattern matchers. Gemini Flash is fast but dumb as rocks for classification unless you engineer the hell out of your prompts.

What worked for us:

• Start with tiny focused tasks instead of broad classification

• Chain multiple smaller models with strict output formatting

• Build proper evaluation metrics before tuning

• Accept that 90% accuracy is often good enough

For non-coding use, we've had success with document routing and basic sentiment tagging. The real win is combining LLMs with traditional ML - let the LLM handle fuzzy text parsing, then pipe clean data into simpler classifiers.

BTW if you're into audio generation, check out our AI ASMR generator at https://asmrvideo.org - we're using fine-tuned models to create those sweet brain tingles. Way more reliable than trying to make Gemini understand common sense.


👤 incomingpain
I have Microsoft's Phi-4 deployed at https://mapleintel.ca for the AI side. Currently there are over 44,000 IPs in that list.

I tried 'reasoning plus', but it was so much slower.