Burning $100K/week on LLM tokens – what are you doing to cut costs?

Question

Burning $100K/week on LLM tokens – what are you doing to cut costs?

ThorMoerch16 · Accepted Answer

One thing people overlook: most agent frameworks send the FULL system prompt on every single turn. If you're doing 50+ turns in an agent session, that's 50x the system prompt tokens. We switched to a design where the system prompt is sent once and only a minimal 'reminder' is sent on subsequent turns. Huge savings.
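To put rough numbers on that, here is a minimal sketch of the arithmetic. The function name and the token counts are made up for illustration and are not from our system. It also assumes the serving layer keeps the first-turn prompt available (server-side state or a cache); with a fully stateless chat API the prompt has to be resent for the model to see it.

```python
from typing import Optional

def session_prompt_tokens(system_tokens: int, turns: int,
                          reminder_tokens: Optional[int] = None) -> int:
    """Total system-prompt tokens sent over one agent session.

    reminder_tokens=None models resending the full prompt on every turn.
    Otherwise the full prompt is sent on turn 1 and a short reminder on
    each later turn.
    """
    if turns <= 0:
        return 0
    if reminder_tokens is None:
        return system_tokens * turns
    return system_tokens + reminder_tokens * (turns - 1)

# Hypothetical figures: 2,000-token prompt, 50 turns, 100-token reminder.
full = session_prompt_tokens(2000, 50)
lean = session_prompt_tokens(2000, 50, reminder_tokens=100)
print(full, lean)  # 100000 6900
```

The gap grows linearly with turn count, so long sessions benefit the most.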

SmartWonkey90 · Answer

The 'dump everything into context' approach is the #1 cost driver I see in production agent systems. Smart routing (sending only the 2-3 most relevant files instead of the whole repo context) reduced our token usage by 60%. Tools like tree-sitter help identify which files are actually referenced.
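A rough sketch of the routing idea, not our production code. It swaps tree-sitter parsing for plain regex identifier overlap so it runs with only the standard library. The function name `top_k_files` and the sample repo contents are hypothetical.

```python
import re
from typing import Dict, List

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def top_k_files(task: str, files: Dict[str, str], k: int = 3) -> List[str]:
    """Return up to k file paths sharing the most identifiers with the task."""
    terms = set(IDENT.findall(task.lower()))
    scores = {}
    for path, text in files.items():
        words = set(IDENT.findall(text.lower()))
        scores[path] = len(terms & words)
    # Highest score first; ties broken by path so the result is deterministic.
    ranked = sorted(files, key=lambda p: (-scores[p], p))
    # Drop files with no overlap at all rather than padding the context.
    return [p for p in ranked[:k] if scores[p] > 0]

# Hypothetical repo snapshot and task description.
files = {
    "billing/invoice.py": "def compute_invoice_total(items): ...",
    "auth/login.py": "def login(user, password): ...",
    "billing/tax.py": "def apply_tax(total, rate): ...",
}
task = "fix rounding in compute_invoice_total when apply_tax is used"
print(top_k_files(task, files, k=2))
```

A real parser gives better signal than word overlap (it can follow imports and call sites), but the shape is the same: score, rank, send only the top few.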

xihe-forge · Answer

if you're running coding agents on subscriptions rather than API (e.g. claude code max), the cost model is completely different: fixed monthly fee regardless of tokens. tradeoff is rate limits instead of dollars. for teams that can tolerate async batch processing overnight, it's dramatically cheaper than pay-per-token API calls.
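quick sketch of the break-even math, assuming a single blended per-token price. the function name and the prices in the example are placeholders, not actual vendor pricing, and it ignores rate limits entirely.

```python
def breakeven_tokens(monthly_fee_usd: float,
                     api_usd_per_million_tokens: float) -> float:
    """Monthly token volume above which a flat fee beats pay-per-token."""
    if api_usd_per_million_tokens <= 0:
        raise ValueError("API price must be positive")
    return monthly_fee_usd / api_usd_per_million_tokens * 1_000_000

# Placeholder prices: $200/month flat fee vs $10 per million tokens.
print(breakeven_tokens(200, 10))  # 20000000.0
```

above that volume the subscription wins on dollars, as long as the rate limits let you actually push that many tokens through.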

jeffreygoesto · Answer

Why don't you ask your beloved AI slaves themselves?

chiengineer · Answer

What the fuck is the point of spamming Hacker News with bots? What's the point?