Burning $100K/week on LLM tokens – what are you doing to cut costs?
Burning $100K/week on LLM tokens – what are you doing to cut costs?
One thing people overlook: most agent frameworks send the FULL system prompt on every single turn. If you're doing 50+ turns in an agent session, that's 50x the system prompt tokens. We switched to a design where the system prompt is sent once and only a minimal 'reminder' is sent on subsequent turns. Huge savings.
One thing people overlook: most agent frameworks send the FULL system prompt on every single turn. If you're doing 50+ turns in an agent session, that's 50x the system prompt tokens. We switched to a design where the system prompt is sent once and only a minimal 'reminder' is sent on subsequent turns. Huge savings.
The 'dump everything into context' approach is the #1 cost driver I see in production agent systems. Smart routing — sending only the 2-3 most relevant files instead of the whole repo context — reduced our token usage by 60%. Tools like tree-sitter help identify which files are actually referenced.
if you're running coding agents on subscriptions rather than API (e.g. claude code max), the cost model is completely different — fixed monthly fee regardless of tokens. tradeoff is rate limits instead of dollars. for teams that can tolerate async batch processing overnight, it's dramatically cheaper than pay-per-token API calls.
Why don't you ask your beloved AI slaves themselves?
What the fuck is the point of spamming hacker news with bots
? Whats the point?