Gemma 4 dropped two days ago and it's a pretty direct answer to this
question. Google DeepMind built it explicitly for local deployment: the 26B MoE activates only 3.8B parameters during inference, so it decodes at roughly the cost of a 4B dense model while benchmarking close to the 31B Dense, and the smaller E4B variant runs fully offline on an 8GB laptop. The 31B Dense currently ranks third among all open models on the Arena AI leaderboard.
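The "runs at 4B cost" part follows from how decoding works: generating a token is mostly memory-bandwidth bound, so per-token cost tracks the parameters you actually read, not the total you store. A rough sketch of the arithmetic (the parameter counts are the ones quoted above; the 4-bit figure is my assumption, not something specific to this model):

```python
# Back-of-envelope: why an MoE with few active parameters decodes cheaply.
# Token-by-token decoding reads (roughly) every ACTIVE parameter once per
# token, so decode traffic scales with active params, not total params.

GB = 1e9

def decode_bytes_per_token(active_params: float, bytes_per_param: float) -> float:
    """Approximate weight bytes read per generated token."""
    return active_params * bytes_per_param

moe_active = 3.8e9    # active params per token (26B total)
dense_small = 4.0e9   # a ~4B dense model for comparison
q4 = 0.5              # ~0.5 bytes/param at 4-bit quantization (assumed)

print(f"MoE decode traffic:   {decode_bytes_per_token(moe_active, q4) / GB:.2f} GB/token")
print(f"4B dense traffic:     {decode_bytes_per_token(dense_small, q4) / GB:.2f} GB/token")

# The catch: you still have to HOLD all 26B weights, so VRAM/RAM footprint
# is set by total params even though speed tracks active params.
print(f"MoE weight footprint: {26e9 * q4 / GB:.1f} GB at 4-bit")
```

So the MoE moves about as many bytes per token as a 4B dense model (~1.9 GB vs ~2.0 GB at 4-bit), but its weight footprint is still ~13 GB, which is why "fits in memory" and "runs fast" are separate questions.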
The quality-per-parameter gap between local and cloud is closing faster
than most people expected.
That said, "worth it" still depends heavily on your hardware. A 4070 Ti
gets you a very different answer than a 3060.
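To make the hardware dependence concrete, here's a rough decode-speed ceiling you can derive from memory bandwidth alone: if each token reads all active weights once, tok/s can't exceed bandwidth divided by bytes per token. The bandwidth figures below are approximate spec-sheet numbers, and real throughput lands well under this ceiling (KV cache reads, kernel overhead):

```python
# Theoretical decode ceiling: tok/s <= memory_bandwidth / bytes_per_token.
# Bandwidth values are approximate spec-sheet figures; real-world numbers
# are meaningfully lower, so treat this as an upper bound, not a prediction.

def tok_s_ceiling(bandwidth_gb_s: float, active_params: float,
                  bytes_per_param: float) -> float:
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

gpus = {
    "RTX 4070 Ti": 504.0,  # ~GB/s, approximate
    "RTX 3060":    360.0,  # ~GB/s, approximate
}

for name, bw in gpus.items():
    # 3.8B active params at 4-bit (~0.5 bytes/param, assumed)
    ceiling = tok_s_ceiling(bw, 3.8e9, 0.5)
    print(f"{name}: <= ~{ceiling:.0f} tok/s (bandwidth ceiling)")
```

Same model, same quantization, and the ceiling alone differs by ~40% between the two cards, before you even get to compute differences or whether the weights fit in VRAM at all.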
Disclosure: I'm building localllm-advisor.com, free and client-side, which helps answer exactly this type of question: it shows which models fit your GPU (with quantization options and estimated tok/s), or which GPU you'd need to run a specific model. Relevant to the question so I'm mentioning it, but take it for what it is.