At home, I did the math, and it's cheaper for me to buy OpenAI credits and use GPT-4 than to invest in graphics cards. I use maybe 5 dollars a month, max.
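Roughly how that break-even works out, as a back-of-the-envelope sketch: the GPU price and power cost below are assumptions, only the ~$5/month API spend is my actual usage.

```python
# Back-of-the-envelope: API credits vs. buying a GPU for local inference.
# Assumed numbers (not from my bill): a used 12GB GPU around $300 and
# roughly $2/month of extra electricity for running it locally.
API_SPEND_PER_MONTH = 5.00    # my actual OpenAI spend, USD
GPU_PRICE = 300.00            # assumption: used RTX 3060 12GB
POWER_COST_PER_MONTH = 2.00   # assumption: extra electricity

# Months until the GPU pays for itself compared to just buying credits
break_even_months = GPU_PRICE / (API_SPEND_PER_MONTH - POWER_COST_PER_MONTH)
print(f"Break-even after ~{break_even_months:.0f} months")  # ~100 months, i.e. 8+ years
```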
- B550MH motherboard
- Ryzen 3 4100 CPU
- 32GB (2x16) RAM cranked up to 3200MHz (prompt generation is memory-bound)
- 256GB M.2 NVMe (helps load models faster)
- Nvidia RTX 3060 12GB
Software-wise, I use llamafile because, on the CPU, its prompt processing is 10-20% faster than llama.cpp's.
Performance "Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf":
- CPU-only: 23.47 t/s (prompt processing), 8.73 t/s (generation)
- GPU: 941.5 t/s (prompt processing), 29.4 t/s (generation)
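llamafile/llama.cpp print similar timing stats at the end of a run, but if you want to sanity-check numbers like these from a script, here's a rough sketch using the llama-cpp-python bindings instead of llamafile (the model path, prompt, and layer-offload setting are placeholders for this config):

```python
# Rough tokens/sec check with llama-cpp-python (not llamafile itself).
# n_gpu_layers=-1 offloads everything to the GPU; set it to 0 for CPU-only.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",  # adjust path
    n_gpu_layers=-1,
    n_ctx=4096,
    verbose=False,
)

prompt = "Explain the difference between prompt processing and token generation."
start = time.time()
out = llm(prompt, max_tokens=256)
elapsed = time.time() - start

generated = out["usage"]["completion_tokens"]
# Note: this is end-to-end time, so it mixes prompt processing and generation.
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} t/s overall")
```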
I just realized that my 32GB M2 Max Mac Studio is pretty good at running relatively large models using Ollama. There's also the Continue.dev VS Code plugin that can use it, but I feel the suggested defaults aren't optimal for this config.
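For what it's worth, a quick way to check what the Mac is actually serving, independent of Continue.dev, is to hit Ollama's local HTTP API directly. The model name below is an assumption; substitute whatever `ollama list` shows.

```python
# Quick check of a local Ollama server (default port 11434).
# "llama3.1:8b" is an assumption; use whatever `ollama list` shows on your machine.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b",
        "prompt": "Write a one-line docstring for a function that reverses a string.",
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
data = resp.json()

print(data["response"])
# eval_count / eval_duration (nanoseconds) gives the generation speed
print(f'{data["eval_count"] / data["eval_duration"] * 1e9:.1f} t/s generation')
```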