HACKER Q&A
📣 seenkitty

Burning $100K/week on LLM tokens – what are you doing to cut costs?




  👤 ThorMoerch16 Accepted Answer ✓
One thing people overlook: most agent frameworks send the FULL system prompt on every single turn. If you're doing 50+ turns in an agent session, that's 50x the system prompt tokens. We switched to a design where the system prompt is sent once and only a minimal 'reminder' is sent on subsequent turns. Huge savings.
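To put rough numbers on that, here is a minimal back-of-the-envelope sketch. The 50 turns come from the comment; the 2,000-token system prompt and 100-token reminder are assumed sizes for illustration, not measurements, and `system_prompt_tokens` is a hypothetical helper. It counts only the system-prompt share of each request, not the conversation history.

```python
def system_prompt_tokens(turns: int, full_prompt: int,
                         reminder: int = 0, resend_full: bool = True) -> int:
    """Total system-prompt tokens sent over one agent session."""
    if resend_full:
        # Full prompt attached to every turn.
        return turns * full_prompt
    # Full prompt on the first turn, short reminder on each later turn.
    return full_prompt + (turns - 1) * reminder


TURNS = 50            # session length mentioned in the comment
FULL_PROMPT = 2_000   # assumed system prompt size, in tokens
REMINDER = 100        # assumed reminder size, in tokens

naive = system_prompt_tokens(TURNS, FULL_PROMPT)
trimmed = system_prompt_tokens(TURNS, FULL_PROMPT, REMINDER, resend_full=False)
saved = 1 - trimmed / naive

print(f"naive={naive} trimmed={trimmed} saved={saved:.1%}")
```

How the first-turn prompt stays visible to the model on later turns depends on the framework and API, which the comment doesn't specify, so treat this as the accounting only.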

👤 weldog26
One thing people overlook: most agent frameworks send the FULL system prompt on every single turn. If you're doing 50+ turns in an agent session, that's 50x the system prompt tokens. We switched to a design where the system prompt is sent once and only a minimal 'reminder' is sent on subsequent turns. Huge savings.

👤 SmartWonkey90
The 'dump everything into context' approach is the #1 cost driver I see in production agent systems. Smart routing — sending only the 2-3 most relevant files instead of the whole repo context — reduced our token usage by 60%. Tools like tree-sitter help identify which files are actually referenced.
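A rough sketch of that kind of routing, assuming a plain identifier-overlap score as a stand-in for a real parser: it does not use tree-sitter, and `pick_relevant_files`, the sample repo, and the task text are all made up for illustration.

```python
import re

# Crude identifier matcher; a real setup would use a parser instead.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def identifiers(text: str) -> set[str]:
    """Return the set of identifier-like tokens in a piece of text."""
    return set(IDENT.findall(text))


def pick_relevant_files(task: str, files: dict[str, str], k: int = 3) -> list[str]:
    """Rank files by how many identifiers they share with the task text."""
    wanted = identifiers(task)
    scored = []
    for path, source in files.items():
        overlap = len(wanted & identifiers(source))
        if overlap:
            scored.append((overlap, path))
    # Highest overlap first; ties broken by path so the order is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored[:k]]


repo = {
    "billing/invoice.py": "def compute_invoice_total(items): return sum(i.price for i in items)",
    "billing/tax.py": "def apply_tax(total, rate): return total * (1 + rate)",
    "auth/login.py": "def login(user, password): ...",
}
task = "Fix rounding in compute_invoice_total before apply_tax is called"

selected = pick_relevant_files(task, repo, k=2)
print(selected)
```

Only the selected files go into the prompt. Common words like `in` also count as matches here, so a real version would want stop-word filtering or proper symbol resolution.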

👤 xihe-forge
if you're running coding agents on subscriptions rather than the API (e.g. Claude Code Max), the cost model is completely different: a fixed monthly fee regardless of tokens. the tradeoff is rate limits instead of dollars. for teams that can tolerate async batch processing overnight, it's dramatically cheaper than pay-per-token API calls.
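a quick way to sanity-check that for your own workload is a break-even calculation. minimal sketch below; the $200/month fee and $10 per million tokens are placeholder numbers, not real prices, so plug in your own.

```python
def api_cost(tokens: int, usd_per_million: float) -> float:
    """Pay-per-token cost in USD for a given token volume."""
    return tokens / 1_000_000 * usd_per_million


def break_even_tokens(monthly_fee: float, usd_per_million: float) -> float:
    """Monthly token volume at which a flat fee equals the API bill."""
    return monthly_fee / usd_per_million * 1_000_000


MONTHLY_FEE = 200.0      # hypothetical flat subscription fee, USD
USD_PER_MILLION = 10.0   # hypothetical blended API price per 1M tokens

threshold = break_even_tokens(MONTHLY_FEE, USD_PER_MILLION)
print(f"subscription wins above {threshold:,.0f} tokens/month")
```

this only holds if the workload actually fits inside the subscription's rate limits, which is the tradeoff mentioned above.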

👤 jeffreygoesto
Why don't you ask your beloved AI slaves themselves?

👤 chiengineer
What the fuck is the point of spamming Hacker News with bots? What's the point?