Replay Log¶

niels-bantilan: "A replay log is essentially a service that records the state of an agent and its subtasks at a super granular level at each step."

Benefits: - Avoid re-executing tasks — expensive LLM calls and tool calls are memoised. - Prevent memory and context loss across crashes. - Data lineage — "You get data lineage just out of the box." - Debuggability — you can step backwards through the agent's state.

Flyte 2 implementation¶

"The first time you hit the LLM, the output of that agent state is stored just to object store in S3 or something. You don't need to write any of your serialization deserialization code. It's just there."

Why this matters¶

Replay logs are the durability primitive under cloud agent fleets. Without them, Zach Lloyd's cloud-agent-primitives become fragile: every spot-instance preemption loses an agent's context. With them, spot instances become a cost optimisation rather than a reliability risk (as demonstrated by Bantilan's deep-research case study).