HACKER Q&A
📣 rkwz

What do you do with Local LLMs?


Quite a few local LLMs have been released in the past few months. Do you run them? If so, what are your use cases?

Personally, I find them good, but very slow (RTX 3050, Mistral 7B) and hard to coax into a consistent output format (JSON, bullet points). Compared to GPT 3.5, it feels like a pointless exercise from a speed and consistency perspective.

Any use cases for local LLMs apart from them being local, so we can feed them sensitive documents?


  👤 gymbeaux Accepted Answer ✓
Other than feeding sensitive documents? I’m not sure. OpenAI has certainly gone out of its way to eliminate the need for local LLMs: they offer fine-tuning of GPT 3.5 and 4, so it’s harder to argue you might get better results from a local model for a particular task with particular data. That said, I don’t have personal experience with the ChatGPT fine-tuning.

It’s fun running models from HuggingFace on my computer. Finally, something that utilizes my computer’s 64GB of RAM and 24GB of VRAM. It’s neat seeing the immense performance difference between CPU (Ryzen 7 5700X) and GPU (RTX 3090) offloading.
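
If anyone wants to see that difference for themselves, here is a quick sketch with llama-cpp-python (the model path is illustrative, and the GPU case assumes a CUDA-enabled build):

    # Time the same prompt with CPU-only inference vs. all layers offloaded to the GPU.
    # Model path is illustrative; the GPU run assumes llama-cpp-python built with CUDA.
    import time
    from llama_cpp import Llama

    PROMPT = "Explain what a GGUF file is in one paragraph."

    for label, n_gpu_layers in [("CPU only", 0), ("all layers on GPU", -1)]:
        llm = Llama(model_path="mistral-7b-instruct-v0.2.Q6_K.gguf",
                    n_gpu_layers=n_gpu_layers, verbose=False)
        start = time.time()
        out = llm(PROMPT, max_tokens=128)
        tokens = out["usage"]["completion_tokens"]
        print(f"{label}: {tokens / (time.time() - start):.1f} tokens/sec")
        del llm  # free the weights before loading the next configuration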

I think as with most “Cloud vs on-prem” arguments, it comes down to cost vs convenience. Building an application on Azure or AWS is as easy as it gets, but if money is scarce, you can’t beat on-prem for raw resources.

I’m writing a program right now that will query ChatGPT 4 with… A LOT of tokens. We project it will cost between $5k and $15k and probably run for around 2 days. *OR* I could feed that same data through a local model running on the RTX 3090 and it’ll cost like $20 in electricity, and take maybe 6 days.
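
Rough sketch of that comparison (the token count, API price, and electricity numbers below are placeholder assumptions, not our real project figures):

    # Back-of-envelope comparison: hosted GPT-4 vs. a local run on the 3090.
    # Every number here is a placeholder assumption, not our actual project data.
    total_tokens = 500_000_000      # assumed volume of tokens to process
    api_price_per_1k = 0.03         # assumed blended $/1K tokens for a GPT-4-class API
    system_watts = 500              # rough whole-system draw under sustained load
    price_per_kwh = 0.25            # assumed electricity rate
    local_days = 6

    api_cost = total_tokens / 1000 * api_price_per_1k
    local_cost = system_watts / 1000 * 24 * local_days * price_per_kwh

    print(f"API run:   ~${api_cost:,.0f}")
    print(f"Local run: ~${local_cost:.0f} in electricity over {local_days} days")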


👤 haxel
I use Mistral (7B v0.2 instruct, 6-bit quantized) to generate the title-text for messages that I send to myself via a Discord bot.

Right now, I'm prompting Mistral to generate these titles in "clickbait" style. I fold the topic of the message and other context into the prompt.

The intention is for the message to pull my attention toward something else I need to do, because I tend to over-focus on whatever I'm doing at the moment.

It doesn't matter whether what I'm doing at the moment is "good" or "bad". Based on probability, I should almost always switch my attention when I receive such a message because I should have switched an hour ago.

To guarantee consistent JSON output, I use a llama.cpp grammar (converted from a JSON schema).
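
Roughly what that looks like with llama-cpp-python (the model filename and schema are illustrative, not my exact setup; the same GBNF grammar can also be passed to llama.cpp directly):

    # Sketch of grammar-constrained title generation; filenames/schema are illustrative.
    import json
    from llama_cpp import Llama, LlamaGrammar

    schema = {
        "type": "object",
        "properties": {"title": {"type": "string"}},
        "required": ["title"],
    }

    # Convert the JSON schema into a llama.cpp GBNF grammar; decoding can then
    # only produce tokens that keep the output valid against the schema.
    grammar = LlamaGrammar.from_json_schema(json.dumps(schema))

    llm = Llama(model_path="mistral-7b-instruct-v0.2.Q6_K.gguf", n_gpu_layers=0)

    prompt = ("[INST] Write a clickbait-style title for this reminder: "
              "'stop what you're doing and review the deployment checklist' [/INST]")
    out = llm(prompt, max_tokens=64, grammar=grammar)
    print(json.loads(out["choices"][0]["text"]))   # always parses, e.g. {"title": "..."}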

Generation is via CPU (Ryzen 5800) because it's an async background operation and also because my 1070 GPU is being used by Stable Diffusion XL Turbo to generate the image that goes along with the message.


👤 _akhe
The most profitable use case for local LLMs will be one where the end user doesn't even know a local LLM is running, in the same way that a user doesn't know what libraries Photoshop is running; to them it's just Photoshop.

For example, let's say some image editing software decided to use Stable Diffusion to fill in image data in one of its Content-Aware tools or something. They would not tell the user to install and run Ollama or sdapi from the CLI; they would install the LLMs when you install the app and talk to them when you use the app. The end user would never know an LLM is being run locally, any more than they know DirectX is running. (Some might.)
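
A rough sketch of what I mean (hypothetical names, with a text model via llama-cpp-python standing in for whatever the app would actually bundle): the weights ship as an asset and the feature is just another function call.

    # Hypothetical sketch: an app feature backed by a model bundled with the install.
    # The user clicks a button; they never see a model, a CLI, or a separate server.
    from pathlib import Path
    from llama_cpp import Llama

    _MODEL_PATH = Path(__file__).parent / "assets" / "bundled-model.Q4_K_M.gguf"
    _llm = None  # loaded lazily on first use so app startup stays fast

    def _model() -> Llama:
        global _llm
        if _llm is None:
            _llm = Llama(model_path=str(_MODEL_PATH), n_ctx=2048, verbose=False)
        return _llm

    def suggest_caption(image_description: str) -> str:
        """Called by the UI layer; to the rest of the app it's just a function."""
        out = _model()(
            f"[INST] Write a one-line caption for an image of: {image_description} [/INST]",
            max_tokens=32,
        )
        return out["choices"][0]["text"].strip()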

I like this use case because image/music/video editing software already requires a good CPU/GPU, and in the case of Photoshop, I'm used to my fans blaring when I run Filter Gallery (lol). As the end user, I would not need to know that LLMs are being invoked as I use the software.

I think this use case is a lot stronger than any cloud-based one as long as it's this expensive to run GPUs in the cloud. And given that present cloud behavior is to use one of the Big 3, anyone looking for cloud AI will end up with OpenAI or another major provider, which in the end means something from Microsoft, Google, etc.


👤 brokenmachine
So after a day up, the only actual answers here are somebody using it to keep themselves from getting too engrossed in whatever they're doing by sending themselves clickbait Discord posts, and someone who wants an offline search engine that hallucinates.

AI seems totally not like a giant bubble.


👤 muzani
Cut off the internet and use it as an offline search engine. That way you can really just lock yourself in a room and build something without the distraction of social media and memes.