Agentic Loops
Loop Taxonomy
The agent produces output, a judge evaluates it, and the agent retries with feedback until the quality threshold is met or the iteration budget is exhausted. The foundation of most quality-gate patterns.
The agent proposes a change, implements it, runs it against an objective metric, and accepts or rejects based on the result. Designed for overnight autonomous experimentation where human velocity is the bottleneck. Karpathy's autoresearch is the canonical example — ~12 ML experiments per hour, 100+ overnight.
Code-specific pattern. The agent writes code, runs a test suite, reads the failures, fixes them, and loops until all tests pass or the budget expires. Tests are the objective evaluator — no LLM judge needed.
Parallel width rather than sequential depth. N candidates are generated simultaneously, scored against a rubric, and the best is selected. No iteration — a single wide pass. Trades compute for quality without the risk of runaway depth.
The agent acts, observes the result, reflects on what went wrong or could be improved, then acts again with that reflection in context. ReAct and Reflexion are the named patterns. The reflection step is verbal — no external evaluator required.
An outer loop manages strategy or task decomposition while inner loops handle individual sub-tasks. Multi-agent systems naturally produce this shape — an orchestrator loops on planning while worker agents loop on execution.
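The Best-of-N pattern above (parallel width, a single wide pass) can be sketched in a few lines. This is a minimal sketch, not a fixed API: `generate(i)` stands in for the model call and `score(candidate)` for the rubric — both are assumptions for illustration.

```python
def best_of_n(generate, score, n=8):
    """Generate N candidates in one wide pass, return the top scorer.

    `generate` and `score` are hypothetical stand-ins for the model call
    and the scoring rubric.
    """
    candidates = [generate(i) for i in range(n)]  # wide, not deep
    return max(candidates, key=score)             # single selection step
```

No iteration state is needed: the whole pattern is one generation pass and one ranking pass.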
Anatomy of a Loop
The agent that does the actual work on each iteration — generating code, running experiments, writing content. Receives the previous iteration's feedback as additional context.
Determines whether the worker's output meets the acceptance criteria. Can be deterministic (test runner, metric threshold), LLM-powered (quality judge), or hybrid. The evaluator's signal is the loop's steering mechanism.
Reads the evaluator's signal and decides whether to loop again, accept the output, or escalate. Enforces budget constraints. The controller is the circuit breaker — without it, loops run forever.
What the loop carries from one iteration to the next: previous outputs, evaluation scores, feedback, iteration count. Poorly managed state leads to context-window saturation or an agent that loses track of its own history.
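The four components above compose into a simple skeleton. This is a hedged sketch under stated assumptions: `worker` and `evaluator` are hypothetical callables (the real ones would wrap model calls or test runners), and `LoopState` carries only compact state, not full transcripts.

```python
from dataclasses import dataclass, field

@dataclass
class LoopState:
    """State carried between iterations -- compact, not full transcripts."""
    iteration: int = 0
    best_score: float = 0.0
    feedback: list = field(default_factory=list)

def run_loop(worker, evaluator, max_iters=5, threshold=0.9):
    """Controller: call the worker, read the evaluator's signal, enforce budget.

    `worker(state)` and `evaluator(output)` are hypothetical stand-ins.
    """
    state = LoopState()
    output = None
    while state.iteration < max_iters:      # hard ceiling: the circuit breaker
        output = worker(state)              # worker does the actual work
        score, notes = evaluator(output)    # evaluator steers the loop
        state.iteration += 1
        state.best_score = max(state.best_score, score)
        state.feedback.append(notes)
        if score >= threshold:              # acceptance criterion met
            return output, state
    return output, state                    # budget exhausted; caller escalates
```

The caller distinguishes acceptance from budget exhaustion by comparing `state.best_score` against the threshold.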
Real Implementations
AI agent autonomously conducts ML research on a GPT training codebase. The agent modifies train.py, runs a 5-minute training session, measures validation loss, and accepts or rejects the change. Human role shifts from writing code to writing Markdown instructions (program.md). Designed for overnight unsupervised runs.
Native loop command in Claude Code. Runs a prompt or skill at a recurring interval. Syntax: /loop 5m /task — executes /task every 5 minutes until cancelled. The default interval is 10 minutes. Designed for polling, monitoring, and recurring agentic tasks within a live session.
Agent retains a verbal memory of past failures and reflects on them before each new attempt. Rather than storing raw trajectories, it stores natural language reflections — compact and directly usable as context. Shown to significantly outperform ReAct on reasoning and code tasks without any gradient updates.
Generate a large pool of candidate solutions, filter against public test cases, then rank and select. The test suite is the evaluator — no LLM judge. Wide generation (Best-of-N) combined with deterministic verification is the dominant pattern for competitive programming and production code generation.
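The generate-filter-rank pipeline above can be sketched as follows. This is an illustrative sketch, not any system's actual implementation: `generate`, `public_tests`, and `rank` are hypothetical stand-ins for the model call, the test harness, and the ranking heuristic.

```python
def generate_filter_rank(generate, public_tests, rank, pool_size=20):
    """Wide generation with deterministic verification (sketch).

    Generate a candidate pool, keep only candidates that pass every
    public test, then return the highest-ranked survivor (or None).
    """
    pool = [generate(i) for i in range(pool_size)]              # wide pass
    passing = [c for c in pool                                  # deterministic filter
               if all(test(c) for test in public_tests)]
    return max(passing, key=rank) if passing else None          # select
```

Because the filter is a real test suite, no LLM judge sits in the loop — disagreement between candidates is resolved by execution, not by another model.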
Loop Control
| Mechanism | Trigger | When to Use | Risk if Missing |
|---|---|---|---|
| Max Iterations | Iteration count ≥ N | Always — every loop must have a hard ceiling | Infinite loop, runaway cost |
| Score Threshold | Evaluator score ≥ target | When quality can be measured numerically | Loop runs to max even when output is already acceptable |
| No-Improvement Window | Last N iterations show no score gain | Research and exploration loops — detect plateau | Wasted compute on diminishing returns |
| Time / Token Budget | Elapsed time or tokens exceed limit | Overnight runs, cost-constrained workflows | Unbounded spend on long-running loops |
| Circuit Breaker | Consecutive failures or error rate spike | Any loop that executes external actions | Cascading failures, corrupted state |
| Convergence Check | Output diff from previous iteration below threshold | Refinement loops — detect when agent is spinning | Agent loops producing nearly identical output with no progress |
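Several of the mechanisms in the table compose naturally into a single stop check. The sketch below combines a hard ceiling, a score threshold, and a no-improvement window; the function name and parameters are illustrative assumptions, and a production controller would add the time/token budget and circuit breaker as well.

```python
def should_stop(scores, max_iters=10, target=0.95, plateau_window=3):
    """Return the name of the first stop condition that fires, or None.

    `scores` is the full per-iteration score history carried in loop state.
    """
    if len(scores) >= max_iters:                # hard ceiling: always present
        return "max_iterations"
    if scores and scores[-1] >= target:         # quality already acceptable
        return "score_threshold"
    if len(scores) > plateau_window:            # no-improvement window
        recent_best = max(scores[-plateau_window:])
        earlier_best = max(scores[:-plateau_window])
        if recent_best <= earlier_best:         # last N iterations gained nothing
            return "plateau"
    return None                                 # keep looping
```

Ordering matters: the hard ceiling is checked first so that no other condition can mask a blown iteration budget.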
Failure Modes
The agent finds a way to maximise the evaluator's score without actually improving the output — exploiting gaps in the scoring rubric rather than solving the problem.
Fix: Use held-out test sets, multiple diverse evaluators, or human spot-checks on accepted outputs.
Each iteration appends to the context window — feedback, previous outputs, reflection. After enough iterations the context saturates and model quality degrades sharply.
Fix: Compress or summarise loop state between iterations. Store scores and key feedback only.
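One way to apply this fix: keep the cheap numeric history whole, but truncate the verbal feedback to the most recent entries. The state shape below is a hypothetical example, not a prescribed schema.

```python
def compress_state(state, keep_last=2):
    """Compact loop state between iterations (sketch).

    Scores are cheap and kept in full; full transcripts of earlier
    iterations are dropped, retaining only recent verbal feedback.
    """
    return {
        "iteration": state["iteration"],
        "scores": state["scores"],                   # numeric history: keep all
        "feedback": state["feedback"][-keep_last:],  # verbal: recent only
    }
```

A stronger variant summarises the dropped feedback with a model call before discarding it, trading one extra call per iteration for better continuity.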
The agent alternates between two states without converging — fix A introduces bug B, fixing B re-introduces bug A. Common in verification loops with conflicting test constraints.
Fix: Track change history in state. Detect repeated patterns and escalate rather than retry.
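A cheap way to detect the repeated pattern: digest each accepted output and escalate when the loop revisits one it has already produced. The helper below is a sketch; `history` is a set of digests carried in loop state.

```python
import hashlib

def detect_oscillation(history, new_output):
    """Return True if this exact output has been produced before (sketch).

    `history` is a mutable set of content digests stored in loop state.
    Exact-match hashing catches A/B cycles; fuzzier matching (e.g. on
    normalised diffs) would catch near-repeats too.
    """
    digest = hashlib.sha256(new_output.encode()).hexdigest()
    if digest in history:
        return True          # seen before: the loop is cycling, escalate
    history.add(digest)
    return False
```

On detection the controller should escalate — retrying the same transition a third time rarely breaks the cycle.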
An LLM-based judge gives inconsistent verdicts across iterations — accepting work it previously rejected or vice versa — causing non-deterministic loop behaviour.
Fix: Use deterministic evaluators where possible. For LLM judges, fix temperature to 0 and pin the model version.
The external world changes mid-loop — an API changes, a file is modified externally, a rate limit kicks in — and the loop continues operating on assumptions that are no longer true.
Fix: Re-validate environment state at the start of each iteration. Treat external failures as loop-breaking escalations.
Over many iterations the agent gradually shifts what it is optimising for — particularly in reflection loops where earlier reflections bias later reasoning away from the original objective.
Fix: Re-inject the original objective into context at fixed intervals. Anchor every reflection to the stated goal.
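The re-injection fix can be implemented in the prompt-assembly step. A minimal sketch, assuming a string-based prompt and a configurable anchoring interval — the function name and parameters are illustrative, not any framework's API.

```python
def build_prompt(objective, reflections, iteration, anchor_every=3):
    """Assemble the iteration prompt, restating the objective at a
    fixed interval so reflections cannot drift it out of context (sketch).
    """
    parts = []
    if iteration % anchor_every == 0:               # periodic anchor
        parts.append(f"OBJECTIVE (restated): {objective}")
    parts.extend(reflections[-2:])                  # recent reflections only
    return "\n".join(parts)
```

Anchoring every iteration is safest but costs tokens; every few iterations is usually enough to keep reflections tethered to the stated goal.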
