Graph State Machine
The Engine
The executor is a state machine that receives three inputs: a `GraphSpec` (the declarative graph definition), a `Goal` (the high-level objective), and input data (initial key-value pairs seeded into shared memory). It runs a `while steps < max_steps` loop, advancing one node at a time until the graph reaches a terminal state or a safety limit is hit.
The design is declarative. Define the goal and graph shape, and the executor handles sequencing, branching, data flow, and error recovery. Nodes declare what they read and write. Edges declare when they fire. The executor stitches it all together at runtime.
The executor is deliberately narrow in scope. It does not interact with LLMs directly, does not run tool calls, does not evaluate judge verdicts, and does not manage session lifecycle. Those responsibilities belong to the nodes themselves (EventLoopNode for LLM interaction, JudgeProtocol for evaluation) and the outer ExecutionStream for session management. The executor's only job is graph traversal and state management.
The Main Loop
Each iteration of the main loop performs a fixed ten-step cycle. The executor repeats this cycle until no more nodes are reachable or the step limit is exceeded.
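The shape of that loop can be sketched as follows. This is a minimal illustration, not the real implementation: the graph structure, the `run`/`edges` keys, and the status strings are all hypothetical.

```python
# Minimal sketch of the executor's main loop (all names are illustrative).
# Each node is a dict with a "run" callable and an "edges" map from the
# node's result to the next node id; no next node means a terminal state.
def run(graph, start, max_steps=100):
    current, steps, path = start, 0, []
    while current is not None and steps < max_steps:
        node = graph[current]
        path.append(current)
        result = node["run"]()               # execute the node
        current = node["edges"].get(result)  # resolve the next node
        steps += 1
    status = "completed" if current is None else "max_steps_exceeded"
    return status, path
```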
Edge Resolution
After a node completes, the executor evaluates every outgoing edge to decide where to go next. Five condition types cover the spectrum from unconditional transitions to goal-aware LLM routing.
always
Unconditional transition. The edge fires every time the source node completes, regardless of outcome. Used for linear sequences where every node leads to exactly one successor.
on_success
Fires only when the source node completes without error. The default happy-path connector. If the node raises an exception or the judge escalates, this edge is skipped.
on_failure
Fires only when the source node fails. Provides fallback paths for error handling, routing to recovery nodes or human handoff points. If no failure edge exists, the failure propagates to the graph level.
conditional
Evaluates an expression against node output using safe_eval, a restricted AST evaluator that allows only comparisons, boolean logic, and membership tests. No arbitrary code execution. Expressions reference output keys directly, such as "confidence >= 0.8" or "category in ['billing', 'technical']". Evaluated in declaration order, first match wins.
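A restricted evaluator in this spirit can be built on Python's `ast` module by whitelisting node types. The sketch below is an assumption about how safe_eval might work, not its actual code; it supports exactly the operations named above (comparisons, boolean logic, membership) and rejects everything else.

```python
import ast
import operator

# Hypothetical sketch of a safe_eval-style restricted evaluator: only
# comparisons, boolean logic, and membership tests are permitted.
_CMP = {ast.Eq: operator.eq, ast.NotEq: operator.ne, ast.Lt: operator.lt,
        ast.LtE: operator.le, ast.Gt: operator.gt, ast.GtE: operator.ge,
        ast.In: lambda a, b: a in b, ast.NotIn: lambda a, b: a not in b}

def safe_eval(expr, output):
    def ev(node):
        if isinstance(node, ast.BoolOp):
            vals = (ev(v) for v in node.values)
            return all(vals) if isinstance(node.op, ast.And) else any(vals)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not ev(node.operand)
        if isinstance(node, ast.Compare):
            left, result = ev(node.left), True
            for op, comp in zip(node.ops, node.comparators):
                right = ev(comp)
                result = result and _CMP[type(op)](left, right)
                left = right
            return result
        if isinstance(node, ast.Name):        # output keys resolve directly
            return output[node.id]
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, (ast.List, ast.Tuple)):
            return [ev(e) for e in node.elts]
        raise ValueError(f"disallowed expression: {ast.dump(node)}")
    return ev(ast.parse(expr, mode="eval").body)
```

Function calls, attribute access, and subscripts are absent from the whitelist, so attempts at arbitrary code execution fail at parse traversal time.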
llm_decide
Goal-aware LLM routing. The executor sends the graph goal, current node output, and a list of candidate target nodes to a fast LLM. The model returns a JSON response naming the target node and its reasoning. If the LLM call fails or returns an unrecognised target, a declared fallback edge fires instead. This is the only edge type that costs an LLM call.
Priority rules: Conditional edges are evaluated in declaration order, and the first match wins. This prevents accidental fan-out from multiple conditional edges firing simultaneously. The always type is evaluated last and only fires if no conditional edge matched.
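The first-match ordering can be illustrated with a small resolver. This sketch is a simplification (it handles a single winning edge and takes the expression evaluator as a plain callable); the edge dictionaries and field names are assumptions, not the real schema.

```python
# Illustrative first-match edge resolution: edges are checked in declaration
# order, and an `always` edge fires only if nothing else matched.
def resolve_edge(edges, succeeded, output, evaluate):
    for edge in edges:
        kind = edge["type"]
        if kind == "on_success" and succeeded:
            return edge["target"]
        if kind == "on_failure" and not succeeded:
            return edge["target"]
        if kind == "conditional" and succeeded and evaluate(edge["expr"], output):
            return edge["target"]             # first match wins
    for edge in edges:
        if edge["type"] == "always":          # evaluated last
            return edge["target"]
    return None                               # no edge fired: terminal node
```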
Fan-Out / Fan-In
When edge resolution produces multiple traversable edges from a single node, the executor fans out into parallel branches. Each branch executes independently, and the executor waits for all branches to converge before continuing.
fail_all
If any branch fails, cancel all other branches immediately and propagate the failure. Use when branches are interdependent and partial results are meaningless.
continue_others
If a branch fails, let the remaining branches finish. Collect partial results from successful branches. Use when branches are independent and partial results have value.
wait_all
Wait for all branches to complete regardless of individual success or failure. Collect all results and all errors. Use when you need a complete picture before deciding how to proceed.
Memory conflicts: When parallel branches write to the same memory key, the conflict strategy determines the outcome: `last_wins` (default; the last branch to finish overwrites), `first_wins` (the first branch to finish persists), or `error` (raise an exception on conflict).
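A minimal asyncio sketch of fan-out under the continue_others strategy, with last_wins conflict resolution, might look like this. The function name, branch signature, and memory shape are assumptions for illustration.

```python
import asyncio

# Hypothetical fan-out: run branches concurrently, collect partial results
# on failure (continue_others), cancel siblings and propagate on fail_all.
async def fan_out(branches, memory, strategy="continue_others"):
    tasks = [asyncio.ensure_future(b()) for b in branches]
    results, errors = [], []
    for task in asyncio.as_completed(tasks):
        try:
            results.append(await task)
        except Exception as exc:
            if strategy == "fail_all":        # cancel siblings, propagate
                for t in tasks:
                    t.cancel()
                raise
            errors.append(exc)                # continue_others / wait_all
    for writes in results:                    # completion order => last_wins
        memory.update(writes)
    return results, errors
```

Because results are applied in completion order, the last branch to finish overwrites any shared key, matching the default `last_wins` behaviour described above.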
Retry & Failure
The system has two deliberate retry layers that operate at different levels. Understanding which layer owns retry for a given node type is critical to avoiding catastrophic retry multiplication.
Executor-Level Retries
Applies to non-event-loop nodes (routers, functions, simple processors). Exponential backoff with a default maximum of 3 attempts. The executor catches the exception, waits, and re-runs the entire node from scratch. Each retry gets a fresh NodeContext with clean state.
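The backoff pattern can be sketched as below; the 3-attempt default follows the text, while the base delay and doubling factor are assumptions.

```python
import time

# Sketch of executor-level retry with exponential backoff. Each attempt
# re-runs the node from scratch; the last failure propagates to the caller
# (where on_failure edges would take over).
def run_with_retries(run_node, max_retries=3, base_delay=0.5):
    for attempt in range(max_retries):
        try:
            return run_node()                 # fresh run, clean state
        except Exception:
            if attempt == max_retries - 1:
                raise                         # retries exhausted
            time.sleep(base_delay * 2 ** attempt)
```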
Judge-Level Retries
Applies to event loop nodes (LLM interaction nodes). The judge inside the node handles RETRY/ACCEPT/ESCALATE verdicts internally. The executor overrides max_retries to 0 for these nodes to prevent double-retry: if the executor also retried, a node with 3 executor retries and 50 internal iterations could run 150 LLM turns on a single failure.
When both retry layers are exhausted, on_failure edges provide fallback paths. If no failure edge exists, the node's failure propagates to the graph level and the execution terminates with a failed status. See Judge & Escalation for details on how the judge's RETRY/ACCEPT/ESCALATE verdicts work inside event loop nodes.
Continuous Conversation
By default, the executor threads a single conversation across all nodes in the graph. Each node inherits the accumulated context from every previous node, creating a continuous narrative rather than isolated interactions.
Tool & Output Accumulation
Tool results and node outputs accumulate across the entire graph execution. Each node sees every tool call and output from every previous node in the conversation history. This gives downstream nodes full context about what has already been done.
The Onion Model
System prompts use a three-layer structure. The Identity layer (outermost) is the agent's persona, stable across all nodes. The Narrative layer is the graph-level goal and context. The Focus layer (innermost) is the current node's specific instructions and output requirements. Each node peels to its own focus while retaining the outer layers.
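Assembling the three layers might look like the sketch below; the joining format and function name are assumptions, only the layer roles come from the text.

```python
# Illustrative assembly of the three-layer "onion" system prompt.
def build_system_prompt(identity, narrative, focus):
    return "\n\n".join([
        identity,   # outermost: stable persona across all nodes
        narrative,  # middle: graph-level goal and context
        focus,      # innermost: this node's instructions and output spec
    ])
```

Between nodes, only the innermost layer changes; the identity and narrative layers carry over unchanged, which is what keeps the conversation coherent across phase transitions.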
Phase Transitions
When moving between nodes, the executor inserts transition markers into the conversation: metacognitive reflection on what was accomplished and what comes next. Opportunistic compaction summarises older turns to keep the context window manageable. An alternative "isolated" mode gives each node a fresh conversation with no inherited history.
Checkpointing & Recovery
The executor writes checkpoints at well-defined points so that execution can be resumed after a crash, pause, or cancellation. All checkpoint paths save identical state: memory contents, execution path, visit counts, and the resume node.
| Mechanism | Trigger | Behaviour |
|---|---|---|
| node_start | Before a node begins execution | Saves the current state so that a crash during execution can resume from the start of the failed node rather than replaying the entire graph. |
| node_complete | After a node finishes and outputs are written to memory | Saves the updated state including the node's outputs. On resume, the completed node is skipped and execution continues from the next node. |
| HITL pause | Node is declared in graph.pause_nodes | Execution halts at a human-in-the-loop gate. State is saved and the execution returns a "paused" status. External input is required before resuming. |
| User-requested pause | External cancellation signal via asyncio.Event | The executor checks the pause flag at node boundaries. When set, it finishes the current node, saves state, and halts cleanly. Resumable from the next node. |
| Cancellation | Hard stop requested during execution | Flushes the output accumulator to shared memory so partial work survives. Saves state with the current node as the resume point. Execution terminates immediately. |
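Since every checkpoint path saves the same four fields, the saved state can be pictured as a single serialisable record. The class and field names below are hypothetical; only the field list comes from the text.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical shape of the state saved by every checkpoint mechanism:
# memory contents, execution path, visit counts, and the resume node.
@dataclass
class Checkpoint:
    memory: dict
    path: list
    visit_counts: dict
    resume_node: str

    def dump(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def load(cls, raw: str) -> "Checkpoint":
        return cls(**json.loads(raw))
```

On resume, the executor would rebuild shared memory from `memory` and continue traversal at `resume_node`, skipping any node already recorded in `path`.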
Design Boundaries
The executor is deliberately narrow. It owns graph traversal and state management, and nothing else. Three responsibilities are explicitly excluded, each owned by a different component.
LLM Interaction
Aside from routing decisions for llm_decide edges, the executor never calls an LLM. All conversational LLM interaction is the responsibility of EventLoopNode, which manages the conversation loop, tool dispatch, and response streaming internally. The executor simply calls node.execute() and waits for the result.
Judge Evaluation
Quality evaluation lives inside event loop nodes, not in the executor. The judge renders ACCEPT/RETRY/ESCALATE verdicts within the node's internal loop. The executor only sees the final result: either a successful output or an escalation. See the Judge & Escalation page for details.
Session Lifecycle
Session creation, authentication, billing, and teardown are handled by the outer ExecutionStream. The executor receives a fully initialised context and returns a result. It has no knowledge of who started the execution, how the session was created, or what happens after the graph completes.
Execution Quality Tracking
The executor classifies overall run quality into three levels. "Clean" means all nodes completed successfully. "Degraded" means some nodes failed but the graph reached a terminal state via fallback edges. "Failed" means the graph could not reach a terminal state. This classification is reported alongside the final output, giving callers a machine-readable signal about execution confidence.
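The classification reduces to a small decision function; this sketch assumes per-node result strings and a terminal-state flag, which are illustrative inputs rather than the real API.

```python
# Sketch of the three-level run-quality classification described above.
def classify_quality(node_results, reached_terminal):
    if not reached_terminal:
        return "failed"     # graph never reached a terminal state
    if all(r == "success" for r in node_results):
        return "clean"      # every node completed successfully
    return "degraded"       # some failures, but fallbacks reached an end
```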
