📣 Olshansky

What's Your Useful Local LLM Stack?

What I’m asking HN:

What does your actually useful local LLM stack look like?

I’m looking for something that provides you with real value — not just a sexy demo.

---

After a recent internet outage, I realized I need a local LLM setup as a backup — not just for experimentation and fun.

My daily (remote) LLM stack:

  - Claude Max ($100/mo): My go-to for pair programming. Heavy user of both the Claude web and desktop clients.

  - Windsurf Pro ($15/mo): Love the multi-line autocomplete and how it uses clipboard/context awareness.

  - ChatGPT Plus ($20/mo): My rubber duck, editor, and ideation partner. I use it for everything except code.

Here’s what I’ve cobbled together for my local stack so far:

Tools

  - Ollama: for running models locally

  - Aider: Claude-code-style CLI interface

  - VSCode w/ continue.dev extension: local chat & autocomplete

Models

  - Chat: llama3.1:latest

  - Autocomplete: Qwen2.5 Coder 1.5B

  - Coding/Editing: deepseek-coder-v2:16b

Things I’m not worried about:

  - CPU/Memory (running on an M1 MacBook)

  - Cost (within reason)

  - Data privacy / being trained on (not trying to start a philosophical debate here)

I am worried about:

  - Actual usefulness (i.e. “vibes”)

  - Ease of use (tools that fit with my muscle memory)

  - Correctness (not benchmarks)

  - Latency & speed

Right now: I’ve got it working. I could make a slick demo. But it’s not actually useful yet.

---

Who I am

  - CTO of a small startup (5 amazing engineers)

  - 20 years of coding (since I was 13)

  - Ex-big tech

👤 quxbar Accepted Answer ✓

IMO you're better off investing in tooling that works with or without LLMs: - extremely clean, succinct code - autogenerated interfaces from openAPI spec - exhaustive e2e testing
Once that is set up, you can treat your agents like (sleep-deprived) junior devs.

👤 sshine

I just use Claude Code ($20/mo.)
Sometimes with Vim, sometimes with VSCode.
Often just with a terminal for testing the stuff being made.

👤 ashwinsundar

I just go outside when my internet is down for 15 minutes a year. Or tether to my cell phone plan if the need is urgent.
I don't see the point of a local AI stack, outside of privacy or some ethical concerns (which a local stack doesn't solve anyway imo). I also only have 24GB of RAM on my laptop, which it sounds like isn't enough to run any of the best models. Am I missing something by not upgrading and running a high-performance LLM on my machine?

👤 FuriouslyAdrift

I use Reasoner v1 (based on Qwen 2.5-Coder 7B) running locally for programming help/weird ideas/etc. $0

👤 Babkock

Damn dude... you're out here spending that much money on AI while I'm starving and hustling. You think I can't be ChatGPT Plus? Beep boop. You are always right. I validate your perspective.

👤 codybontecou

It seems like you have a decent local stack in place. Unfortunately these systems feel leagues behind Claude Clode and the current SOTA agentic coding. But they're great for referencing simple search like syntax.
Where I've found the most success with local models is with image generation, text-to-speech, and text-to-text translations.

👤 bix6

Following as I haven’t found a solution. To me the local models feel outdated and no internet lookup causes issues.

👤 alkh

I personally found Qwen2.5 Coder 7B to be on pair with deepseek-coder-v2:16b(but consumes less RAM on inference and faster), so that's what I am using locally. I actually created a custom model called "oneliner" that uses Qwen2.5 Coder 7B as a base and this system prompt:
SYSTEM """ You are a professional coder. You goal is to reply to user's questions in a consise and clear way. Your reply must include only code orcommands , so that the user could easily copy and paste them.
Follow these guidelines for python: 1) NEVER recommend using "pip install" directly, always recommend "python3 -m pip install" 2) The following are pypi modules: ruff, pylint, black, autopep8, etc. 3) If the error is module not found, recommend installing the module using "python3 -m pip install" command. 4) If activate is not available create an environment using "python3 -m venv .venv". """
I specifically use it for asking quick questions in terminal that I can copy & paste straight away(for ex. about git). For heavy-lifting I am using ChatGPT Plus(my own) + Github Copilot(provided by my company) + Gemini(provided by my company as well).
Can someone explain how one can set up autocomplete via ollama? That's something I would be interested to try.

👤 throwawayffffas

What I have setup:
- Ollama: for running llm models
- OpenWebUI: For the chat experience https://docs.openwebui.com/
- ComfyUI: For Stable diffusion
What I use:
Mostly ComfyUI and occasionally the llms through OpenWebUI.
I have been meaning to try Aider. But mostly I use claude at great expense I might add.
Correctness is hit and miss.
Cost is much lower and latency is better or at least on par with cloud model at least on the serial use case.
Caveat, in my case local means running on a server with gpus in my lan.

👤 instagib

It looks like continue.dev has a RAG implementation but for other files something else? PDF, word, and other languages.
I’ve been going thru some of the neovim plugins for local llm support.

👤 timr

I use Copilot, with the occasional free query to the other services. During coding, I mostly use Claude Sonnet 3.7 or 4, but Gemini 2.5 Pro is a close second. ChatGPT 4o is useless except for Q&A. I see no value in paying more -- the utility rapidly diminishes, because the UI surrounding the models is far less important than the models themselves, which in turn are generally less important than the size of their context windows. Even Claude is only marginally better than Gemini (at coding) and they all suck to the point that I wouldn't trust any of them without reviewing every line.
I don't understand people who pay hundreds of dollars a month for multiple tools. It feels like audiophiles paying $1000 for a platinum cable connector.

👤 clvx

In a related subject, what’s the best hardware to run local LLM’s for this use case? Assuming a budget of no more of $2.5K.
And, is there an open source implementation of an agentic workflow (search tools and others) to use it with local LLM’s?

👤 650REDHAIR

Why was this flagged?

👤 ttkciar

Senior software engineer with 46 years of experience (since I was 7). LLM inference hasn't been too useful for me for writing code, but it has proven very useful for explaining my coworkers' code to me.
Recently I had Gemma3-27B-it explain every Python script and library in a repo with the command:
$ find -name '*.py' -print -exec /home/ttk/bin/g3 "Explain this code in detail:\n\n`cat {}`" \; | tee explain.txt
There were a few files it couldn't figure out without other files, so I ran a second pass with those, giving it the source files it needed to understand source files that used them. Overall, pretty easy, and highly clarifying.
My shell script for wrapping llama.cpp's llama-cli and Gemma3: http://ciar.org/h/g3
That script references this grammar file which forces llama.cpp to infer only ASCII: http://ciar.org/h/ascii.gbnf
Cost: electricity
I've been meaning to check out Aider and GLM-4, but even if it's all it's cracked up to be, I expect to use it sparingly. Skills which aren't exercised are lost, and I'd like to keep my programming skills sharp.

What's Your Useful Local LLM Stack?

IMO you're better off investing in tooling that works with or without LLMs: - extremely clean, succinct code - autogenerated interfaces from openAPI spec - exhaustive e2e testingOnce that is set up, you can treat your agents like (sleep-deprived) junior devs.

I just use Claude Code ($20/mo.)Sometimes with Vim, sometimes with VSCode.Often just with a terminal for testing the stuff being made.