I had in mind software developer basics: a powerful CPU to compile code faster, plenty of RAM to appease the tools, and integrated graphics only.
I was thinking something like an AMD Ryzen 5 9600X or an Intel Core i5-14600K: pretty powerful, but not going as far as a 9900X or a 265K, where spending more only really gets you more cores (OK, Core 14?00 → Core Ultra 2?5 is a solid single-core boost) and returns diminish.
But I’m interested in doing local GenAI, and might let that influence my part choice. The bits of GenAI I’ve used locally (up to 8B LLMs, and Stable Diffusion) have sometimes been handy and/or fun, but are all either too slow or too low-quality to use particularly seriously.
The question, then: when building a machine that might be used for some local GenAI, what’s worthwhile, and what’s a waste of time? (Maybe the answer is “just wait, things aren’t ready yet” or “those online services are cheaper than you think, you’ll never break even if you run local”; I don’t know.)
A few more specific questions:
1. Is a small dedicated GPU of any value at all? (Given how speculative such uses are, my feeling is that I’d have a hard time justifying to myself even one with 8GB of VRAM. Certainly nothing like 24GB!) I’ve heard of offloading some layers to a GPU, but it’s hard to find anyone talking about concrete performance effects (the sketch after this list shows one way to measure them yourself).
2. Are NPUs worthwhile, completely overrated, or not even usable just yet (and is that expected to change)? e.g. AMD Ryzen AI in an 8700G, or Intel AI Boost in a Core Ultra 245K.
3. What sorts of speeds can you get out of your mid-sized LLMs (say, 32B) running on CPU only? And does performance scale with more cores?
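For concrete numbers on questions 1 and 3, llama.cpp ships a benchmarking tool that sweeps these parameters for you. A minimal sketch, assuming a built llama.cpp checkout and some GGUF model on disk (the model path is a placeholder):

```sh
# llama-bench accepts comma-separated value lists and benchmarks every
# combination of them:
#   -ngl  = layers offloaded to the GPU (0 = pure CPU),
#   -t    = CPU threads,
#   -p/-n = prompt and generation lengths in tokens.
./llama-bench -m model.gguf -ngl 0,16,32,99 -t 4,8,16 -p 512 -n 128
```

It reports prompt-processing and token-generation rates in tokens per second for each combination, so you can see exactly where partial offload and extra threads stop paying off on your hardware.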
If someone who’s done stuff like this wanted to write a detailed blog post covering these sorts of things (or has already written it, and I just haven’t found it), they’d be my hero for at least three and a quarter hours (discontiguous).
—⁂—
¹ My laptop is an ASUS Zephyrus G15 (2021), GA503QM. It has an NVIDIA RTX 3060 GPU with 6GB of VRAM at 80W TDP. I would genuinely have preferred not to have a dGPU, but no one was selling anything with a good CPU and a half-decent screen without also including a dGPU. Such a waste of money and weight: I’ve used it maybe half a dozen times in all. Dual-GPU is such a bother, especially where it involves NVIDIA and Linux, though I’m glad to see that’s improving.
I have a Ryzen 3 4100. I just tested Qwen2.5-Coder-32B-Instruct-Q3_K_S.gguf with llama.cpp.
CPU-only:
- prompt eval: 54.08 t/s
- inference: 2.69 t/s

CPU + 52/65 layers offloaded to GPU (RTX 3060 12GB):
- prompt eval: 166.79 t/s
- inference: 6.62 t/s
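A minimal sketch of what such a run looks like, assuming a recent llama.cpp build where the CLI binary is llama-cli; the exact command used above wasn’t shared, so everything beyond -m and -ngl here is a guess:

```sh
# CPU-only baseline: -ngl 0 keeps every layer on the CPU.
./llama-cli -m Qwen2.5-Coder-32B-Instruct-Q3_K_S.gguf -ngl 0 -n 128 \
    -p "Write a binary search in Python."

# Same model with 52 of its 65 layers offloaded to the GPU.
./llama-cli -m Qwen2.5-Coder-32B-Instruct-Q3_K_S.gguf -ngl 52 -n 128 \
    -p "Write a binary search in Python."
```

llama-cli prints prompt-eval and eval rates in tokens per second when it finishes, which is where numbers like the ones above come from.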