By “misbehaving” I mean things like:
- runaway spend
- latency issues
- prompt loops
- tool abuse or unexpected external calls
- data leakage risks
- cascading failures across downstream services
In most systems I’ve seen, observability is good: you can see logs, traces, and cost dashboards. But the actual shutdown mechanism often ends up being manual: disabling a feature flag, revoking an API key, rolling back a deployment, or rate limiting something upstream.
I am trying to understand what people are doing in practice.
- What is your actual kill mechanism?
- Is it bound to a model endpoint, an agent instance, a workflow, a Kubernetes workload, something else?
- Is shutdown automated under certain conditions, or always human-approved?
- What did you discover only after your first real incident?
Concrete examples would be extremely helpful.
1. Circuit breakers per-agent with token/cost ceilings. If an agent burns through more than X tokens in Y seconds, it gets hard-stopped at the proxy layer before the request even hits the model provider. This catches runaway loops fast (first sketch after this list).
2. Tool-level allowlists with runtime revocation. Each agent has an explicit list of tools/APIs it can call. We can revoke individual tool access without killing the whole agent, which is useful when you discover it's hammering one specific external service (second sketch below).
3. Graceful degradation before kill. For non-critical paths, we drop to a cached/static fallback rather than killing outright. Full kill is reserved for safety-critical cases (data leakage risk, unauthorized external calls); see the last sketch below.
4. The actual kill mechanism is boring on purpose: a feature flag that gates the agent entrypoint, backed by a fast-propagating config system (sub-second). Kubernetes pod kills are too slow when you need to stop something mid-execution (last sketch below).
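A minimal sketch of the item 1 idea, not our actual proxy code; the class name, the thresholds, and the in-memory storage are all illustrative:

```python
# Minimal sketch of a per-agent token circuit breaker at the proxy layer.
# The request path calls record_and_check() before forwarding to the model
# provider; once an agent trips, nothing goes through until a human resets it.
import time
from collections import defaultdict, deque

TOKEN_CEILING = 50_000    # "X tokens" -- illustrative threshold
WINDOW_SECONDS = 60       # "Y seconds" -- illustrative window

class CircuitOpen(Exception):
    """Raised instead of forwarding the request to the provider."""

class TokenCircuitBreaker:
    def __init__(self, ceiling: int = TOKEN_CEILING, window: float = WINDOW_SECONDS):
        self.ceiling = ceiling
        self.window = window
        self.usage = defaultdict(deque)   # agent_id -> deque of (timestamp, tokens)
        self.tripped = set()              # agents that are hard-stopped

    def reset(self, agent_id: str) -> None:
        # Called by a human after the page, never automatically.
        self.tripped.discard(agent_id)
        self.usage[agent_id].clear()

    def record_and_check(self, agent_id: str, tokens: int) -> None:
        if agent_id in self.tripped:
            raise CircuitOpen(f"{agent_id} is hard-stopped")
        now = time.monotonic()
        samples = self.usage[agent_id]
        samples.append((now, tokens))
        # Drop samples that have fallen out of the sliding window.
        while samples and now - samples[0][0] > self.window:
            samples.popleft()
        if sum(t for _, t in samples) > self.ceiling:
            self.tripped.add(agent_id)    # this is where the on-call gets paged
            raise CircuitOpen(
                f"{agent_id} used more than {self.ceiling} tokens in {self.window}s"
            )
```

In practice the counters have to live somewhere shared across proxy replicas (otherwise each replica gets its own budget), but the logic is the same.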
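A similarly simplified sketch of item 2's allowlist-with-revocation; ToolRegistry, the agent name, and the tool names are made up for illustration:

```python
# Per-agent tool allowlists with runtime revocation. The check sits in the
# tool-dispatch path, so pulling a tool takes effect on the agent's next call.
class ToolNotAllowed(Exception):
    pass

class ToolRegistry:
    def __init__(self, allowlists: dict[str, set[str]]):
        # agent_id -> set of tool names the agent may call
        self.allowlists = allowlists

    def revoke(self, agent_id: str, tool: str) -> None:
        # Pull one tool from one agent without touching anything else.
        self.allowlists.get(agent_id, set()).discard(tool)

    def check(self, agent_id: str, tool: str) -> None:
        if tool not in self.allowlists.get(agent_id, set()):
            raise ToolNotAllowed(f"{agent_id} may not call {tool}")

registry = ToolRegistry({"billing-agent": {"crm_lookup", "invoice_api"}})
registry.check("billing-agent", "invoice_api")    # ok
registry.revoke("billing-agent", "invoice_api")   # agent keeps running, tool is gone
# registry.check("billing-agent", "invoice_api")  # would now raise ToolNotAllowed
```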
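And a sketch covering items 3 and 4 together, assuming a single flag with three states (live / degraded / killed); get_flag and the in-memory dict stand in for whatever fast-propagating config client you actually run, they are not a real API:

```python
# A flag at the agent entrypoint with three states instead of two, so
# "degrade to cached fallback" and "hard kill" use the same mechanism.
from enum import Enum

class AgentMode(Enum):
    LIVE = "live"          # normal operation
    DEGRADED = "degraded"  # skip the agent, serve cached/static fallback
    KILLED = "killed"      # refuse the request outright

_FLAGS = {"agent.billing-agent.mode": AgentMode.LIVE}  # stand-in for the config store

def get_flag(name: str) -> AgentMode:
    # Placeholder: in production this reads from the fast-propagating config
    # system, not a local dict. Failing closed when the flag is missing is a
    # deliberate (illustrative) choice.
    return _FLAGS.get(name, AgentMode.KILLED)

def handle_request(agent_id: str, request, run_agent, cached_fallback):
    mode = get_flag(f"agent.{agent_id}.mode")  # checked on every request, not at startup
    if mode is AgentMode.KILLED:
        raise RuntimeError(f"{agent_id} is killed by operator flag")
    if mode is AgentMode.DEGRADED:
        return cached_fallback(request)        # non-critical path still returns something
    return run_agent(request)
```

Checking the flag per request (rather than once at agent startup) is what makes the sub-second propagation matter; a flag read only at boot would be no faster than a pod kill.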
The thing we learned the hard way: observability without automated circuit breakers is just watching a fire. Our first incident was a prompt loop that we could see clearly in traces but took 8 minutes to manually kill because the on-call had to figure out which deployment to roll back. Now the circuit breaker fires automatically and pages the human to decide whether to re-enable.
Biggest gap I still see: there's no good standard for "agent-level observability" the way we have for microservices. Traces help but they don't capture the semantic intent of what an agent was trying to do when it went off the rails.