Is it feasible to run a model on device for complete privacy?
I tried Gemma, Qwen, and a few others. I need vision and larger context windows for an application I am working on. Results were quite poor: Gemma 4E2B was probably the best of the ones I tried, but it still fell apart and kept hallucinating at ~5000 tokens. Cloud-based models had no problems; even Gemini 3.1 Flash-Lite and GPT-5.4 mini do a lot better and are way faster.
Feasible but too expensive! I get that privacy is a priority for you, but unfortunately, if you want quality models, you'd probably still have to use frontier closed models.
It's technically feasible. It's really just a question of whether this is worth $10,000(s) to you and whether you're willing to spend it.