I might be overthinking this, but:
For a coding agent like Devin, the output of a tool call usually lives in the system environment, e.g. the terminal state.
When assigned a task, Devin would (probably) first pull the GitHub repo onto an ephemeral Firecracker-like container instance that is hardened and isolated. It would then call an LLM API service with the context (user query, terminal state, existing code) and execute the tool calls the LLM recommends.
It'd continue doing so in a loop until its goal (the user query) is reached.
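The loop described above can be sketched roughly like this. To be clear, this is a toy illustration with stubbed functions; the names (`call_llm`, `run_tool`) and the context/message shapes are my assumptions, not Devin's actual internals.

```python
# Hypothetical agent loop: ask the LLM for an action, execute it in the
# sandbox, feed the result back, repeat until the goal is reached.

def call_llm(context):
    # Stub for a real LLM API call. Here it just returns a tool call
    # until the (toy) goal appears in the terminal state.
    if "tests passed" in context["terminal_state"]:
        return {"type": "done"}
    return {"type": "tool_call", "command": "run_tests"}

def run_tool(command):
    # Stub for executing a command inside the sandbox container.
    return "tests passed" if command == "run_tests" else ""

def agent_loop(user_query, max_steps=10):
    context = {"query": user_query, "terminal_state": ""}
    for _ in range(max_steps):
        action = call_llm(context)
        if action["type"] == "done":
            return context
        # The tool output becomes part of the context for the next step.
        context["terminal_state"] = run_tool(action["command"])
    return context  # give up after max_steps rather than loop forever
```

Note that all the meaningful state here lives in `context`, which is exactly what raises the question below: where does that state go when the container dies?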
Now, how do they store effective 'agent state' to guard against container reliability issues? I've always read that stateful API services are a bad idea.
In a way the sandbox service is also an API, but if user sessions are mapped 1:1 to sandboxes, how do you achieve statelessness in the compute layer without storing or caching entire system states somewhere?
I can't speak for other providers, but the way we handle state is by persisting to mounted volumes that are re-mounted on restart. I'd argue there are a few different types of sandboxes: some are designed for pure code execution, such as inside an LLM chat, while others are longer-lived dev environments closer to a standard development setup (we are the latter). If it helps, we did a write-up last year on our experiences trying to build on top of ephemeral-type architectures.
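The persistence pattern described above might look something like this in miniature: the process itself stays stateless, and any state worth keeping is checkpointed to a path backed by a mounted volume, so a restarted container resumes from the last checkpoint. The path and the JSON schema here are illustrative assumptions, not any provider's actual layout.

```python
import json
import os
import tempfile

# Stand-in for a path on a re-mounted volume; a real setup would point
# this at the mount, e.g. something under /mnt or /workspace.
STATE_PATH = os.path.join(tempfile.gettempdir(), "agent_state.json")

def save_state(state):
    # Write to a temp file and rename, so a crash mid-write can't leave
    # a corrupted checkpoint behind (os.replace is atomic on POSIX).
    tmp = STATE_PATH + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, STATE_PATH)

def load_state():
    # On a fresh container the volume may already hold a checkpoint;
    # otherwise start from a clean initial state.
    if os.path.exists(STATE_PATH):
        with open(STATE_PATH) as f:
            return json.load(f)
    return {"step": 0, "history": []}

# Each run picks up where the last one left off, even if the previous
# container was killed between steps.
state = load_state()
state["step"] += 1
state["history"].append("did a thing")
save_state(state)
```

The key design choice is that the compute layer never owns the state: it only rents it from durable storage for the length of one step, which is what lets sessions survive a 1:1 sandbox mapping.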