Agentic Loops

Loop Taxonomy

Refinement Loop: Generate → Judge → Retry

The agent produces output, a judge evaluates it, and the agent retries with feedback until the quality threshold is met or the iteration budget is exhausted. The foundation of most quality-gate patterns.

Key Components
Worker Agent · Judge / Evaluator · Feedback Injection · Iteration Budget
Complexity: low · Cost: medium
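The Generate → Judge → Retry cycle can be sketched in a few lines. Here `generate` and `judge` are stubs standing in for the worker LLM and the evaluator; all names are illustrative, not a real API:

```python
def generate(draft, feedback):
    """Worker stub: a real agent would call an LLM with the feedback in context."""
    if feedback is None:
        return draft
    return draft + f" [revised: {feedback}]"

def judge(output):
    """Judge stub: returns (score, feedback); a real judge would rate quality."""
    return output.count("revised"), "add more detail"

def refinement_loop(task, threshold=2, max_iters=5):
    output, feedback = task, None
    for i in range(1, max_iters + 1):    # iteration budget: hard ceiling
        output = generate(output, feedback)
        score, feedback = judge(output)  # the evaluator's signal steers the loop
        if score >= threshold:           # quality gate met: accept
            return output, i
    return output, max_iters             # budget exhausted: return best effort
```

The same skeleton underlies most quality-gate patterns; only the worker, the judge, and the threshold change.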
Research Loop: Propose → Implement → Measure → Accept/Reject

The agent proposes a change, implements it, runs it against an objective metric, and accepts or rejects based on the result. Designed for overnight autonomous experimentation where human velocity is the bottleneck. Karpathy's autoresearch is the canonical example — ~12 ML experiments per hour, 100+ overnight.

Key Components
Proposal Agent · Execution Environment · Objective Metric · Accept/Reject Gate
Complexity: medium · Cost: high
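An accept/reject harness in this spirit (a sketch, not autoresearch's actual code): `propose` and `run_experiment` are injected stubs where the real system would edit the codebase and run a bounded training session that reports validation loss:

```python
def research_loop(baseline_loss, propose, run_experiment, budget=10):
    """Keep a change only if it improves the objective metric."""
    best_loss, accepted = baseline_loss, []
    for _ in range(budget):             # e.g. one overnight run of N experiments
        change = propose()              # proposal agent suggests a tweak
        loss = run_experiment(change)   # bounded run, objectively measured
        if loss < best_loss:            # accept/reject gate
            best_loss = loss
            accepted.append(change)
        # rejected changes are simply discarded; the codebase rolls back
    return best_loss, accepted
```

Because the gate is a numeric comparison rather than a judgment call, the loop needs no LLM judge and can run unsupervised.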
Verification Loop: Write → Test → Fix → Re-test

Code-specific pattern. The agent writes code, runs a test suite, reads the failures, fixes them, and loops until all tests pass or the budget expires. Tests are the objective evaluator — no LLM judge needed.

Key Components
Code Generator · Test Runner · Failure Parser · Fix Agent
Complexity: medium · Cost: medium
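A sketch of the cycle with the test suite as the evaluator. Here `tests` maps test names to predicates over the code, and `generate_fix` stands in for the fix agent; both are illustrative stubs:

```python
def verification_loop(tests, generate_fix, code, max_iters=10):
    """Write → test → fix → re-test until green or out of budget."""
    for i in range(1, max_iters + 1):
        failures = [name for name, passes in tests.items() if not passes(code)]
        if not failures:                     # all tests green: accept
            return code, i
        code = generate_fix(code, failures)  # fix agent sees the parsed failures
    return code, max_iters                   # budget expired: return as-is
```

The evaluator is fully deterministic, which sidesteps judge drift entirely.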
Exploration Loop (Best-of-N): Generate N → Score All → Keep Best

Parallel width rather than sequential depth. N candidates are generated simultaneously, scored against a rubric, and the best is selected. No iteration — a single wide pass. Trades compute for quality without the risk of runaway depth.

Key Components
Parallel Generator · Scoring Rubric · Selection Logic
Complexity: low · Cost: high
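The single wide pass reduces to a few lines. In production the candidates would be generated concurrently and `score` would apply the rubric; both callables here are placeholders:

```python
def best_of_n(generate, score, n=5):
    """Generate N candidates in one pass, keep the highest-scoring one."""
    candidates = [generate(i) for i in range(n)]  # width, not depth
    return max(candidates, key=score)             # no iteration, no runaway risk
```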
Reflection Loop: Act → Observe → Reflect → Act

The agent acts, observes the result, reflects on what went wrong or could be improved, then acts again with that reflection in context. ReAct and Reflexion are the named patterns. The reflection step is verbal — no external evaluator required.

Key Components
Action Agent · Observation Parser · Reflection Step · Updated Context
Complexity: low · Cost: medium
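A sketch of the verbal loop. `act` and `reflect` are stubs; the key point is that only natural-language reflections, not raw trajectories, are carried forward as context:

```python
def reflection_loop(act, reflect, goal, max_iters=5):
    """Act → observe → reflect; reflections accumulate as verbal memory."""
    reflections = []
    for i in range(1, max_iters + 1):
        result, success = act(goal, reflections)  # reflections ride in context
        if success:
            return result, i
        reflections.append(reflect(result))       # verbal memory, no gradients
    return None, max_iters
```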
Nested / Hierarchical Loop: Outer loop coordinates inner loops

An outer loop manages strategy or task decomposition while inner loops handle individual sub-tasks. Multi-agent systems naturally produce this shape — an orchestrator loops on planning while worker agents loop on execution.

Key Components
Orchestrator Loop · Worker Loops · State Synchronisation · Termination Propagation
Complexity: high · Cost: high
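The shape can be sketched as two nested budgets, with inner success propagating up to the outer termination decision. These are illustrative stubs, not a real orchestrator API:

```python
def worker_loop(subtask, budget=3):
    """Inner loop: retry one sub-task until it succeeds or its budget expires."""
    for attempt in range(1, budget + 1):
        if subtask(attempt):
            return True
    return False

def orchestrator_loop(plan, budget=2):
    """Outer loop: (re)plan, dispatch workers, stop when every sub-task succeeds."""
    for _ in range(budget):
        if all(worker_loop(sub) for sub in plan()):  # termination propagates up
            return True
    return False
```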
Anatomy of a Loop

Worker: Executes the task

The agent that does the actual work each iteration — generating code, running experiments, writing content. Receives feedback from the previous iteration as additional context.

Feedback injection format · Context accumulation over iterations · Consistency of output structure
Evaluator: Scores the output

Determines whether the worker's output meets the acceptance criteria. Can be deterministic (test runner, metric threshold), LLM-powered (quality judge), or hybrid. The evaluator's signal is the loop's steering mechanism.

Evaluator bias · Reward hacking · LLM judge consistency across iterations
Controller: Decides continue / stop

Reads the evaluator's signal and decides whether to loop again, accept the output, or escalate. Enforces budget constraints. The controller is the circuit breaker — without it, loops run forever.

Budget enforcement · Escalation routing · Diminishing returns detection
Loop State: Memory across iterations

What the loop carries from one iteration to the next: previous outputs, evaluation scores, feedback, iteration count. Poorly managed state leads to context window saturation or the agent losing track of its own history.

Context window limits · State compression strategy · Score history for trend detection
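One way to keep that state compact, sketched under the assumption that scores are cheap to keep but feedback text is not: retain the full score history for trend detection, but only the most recent feedback items. The class and method names here are illustrative:

```python
class LoopState:
    """Cross-iteration state: score history plus bounded feedback, not transcripts."""

    def __init__(self, max_feedback=2):
        self.scores = []        # full history, cheap: enables trend detection
        self.feedback = []      # only the newest items survive compression
        self.max_feedback = max_feedback

    def record(self, score, feedback):
        self.scores.append(score)
        self.feedback = (self.feedback + [feedback])[-self.max_feedback :]

    def plateaued(self, window=3):
        """True when the last `window` iterations show no score gain."""
        if len(self.scores) <= window:
            return False
        return max(self.scores[-window:]) <= self.scores[-window - 1]
```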
Real Implementations

autoresearch (Andrej Karpathy)
Research Loop

AI agent autonomously conducts ML research on a GPT training codebase. The agent modifies train.py, runs a 5-minute training session, measures validation loss, and accepts or rejects the change. Human role shifts from writing code to writing Markdown instructions (program.md). Designed for overnight unsupervised runs.

Experiments/hr: ~12
Overnight yield: ~100 runs
Budget: 5 min / run
ML Research · Overnight · Unsupervised · Open Source
Claude Code /loop (Anthropic)
Refinement Loop

Native loop command in Claude Code. Runs a prompt or skill on a recurring interval. Syntax: /loop 5m /task — executes /task every 5 minutes until cancelled. Default interval is 10 minutes. Designed for polling, monitoring, and recurring agentic tasks within a live session.

Syntax: /loop 5m /foo
Default: 10 min interval
Scope: Live session
Claude Code · Scheduling · Polling · Skill Loop
Reflexion (academic pattern, Shinn et al.)
Reflection Loop

Agent retains a verbal memory of past failures and reflects on them before each new attempt. Rather than storing raw trajectories, it stores natural language reflections — compact and directly usable as context. Shown to significantly outperform ReAct on reasoning and code tasks without any gradient updates.

Memory type: Verbal reflection
Training: None required
Advantage: No fine-tuning
Research Pattern · Reflection · Memory · No Training
AlphaCode / Competitive Coding (DeepMind / OpenAI)
Verification Loop

Generate a large pool of candidate solutions, filter against public test cases, then rank and select. The test suite is the evaluator — no LLM judge. Wide generation (Best-of-N) combined with deterministic verification is the dominant pattern for competitive programming and production code generation.

Evaluator: Test suite
Pattern: Best-of-N + verify
Judge: Deterministic
Code Generation · Best-of-N · Test Evaluation · DeepMind
Loop Control

| Mechanism | Trigger | When to Use | Risk if Missing |
| --- | --- | --- | --- |
| Max Iterations | Iteration count ≥ N | Always — every loop must have a hard ceiling | Infinite loop, runaway cost |
| Score Threshold | Evaluator score ≥ target | When quality can be measured numerically | Loop runs to max even when output is already acceptable |
| No-Improvement Window | Last N iterations show no score gain | Research and exploration loops — detect plateau | Wasted compute on diminishing returns |
| Time / Token Budget | Elapsed time or tokens exceed limit | Overnight runs, cost-constrained workflows | Unbounded spend on long-running loops |
| Circuit Breaker | Consecutive failures or error rate spike | Any loop that executes external actions | Cascading failures, corrupted state |
| Convergence Check | Output diff from previous iteration below threshold | Refinement loops — detect when agent is spinning | Agent loops on nearly identical output with no progress |
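A controller can combine several of these mechanisms at once. This sketch checks them in priority order and returns the reason it stopped; the `state` dict keys and thresholds are illustrative assumptions:

```python
def should_stop(state, max_iters=10, target=0.9,
                plateau_window=3, max_failures=3):
    """Return a stop reason, or None to keep looping."""
    if state["iteration"] >= max_iters:           # hard ceiling, always present
        return "max_iterations"
    scores = state["scores"]
    if scores and scores[-1] >= target:           # score threshold
        return "target_met"
    if (len(scores) > plateau_window
            and max(scores[-plateau_window:]) <= scores[-plateau_window - 1]):
        return "no_improvement"                   # plateau: diminishing returns
    if state["consecutive_failures"] >= max_failures:
        return "circuit_breaker"                  # error-rate breaker
    return None
```

Returning the reason, not just a boolean, lets the caller route to the right escalation path.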
Failure Modes

Reward Hacking

The agent finds a way to maximise the evaluator's score without actually improving the output — exploiting gaps in the scoring rubric rather than solving the problem.

Fix: Use held-out test sets, multiple diverse evaluators, or human spot-checks on accepted outputs.

Context Accumulation

Each iteration appends to the context window — feedback, previous outputs, reflection. After enough iterations the context saturates and model quality degrades sharply.

Fix: Compress or summarise loop state between iterations. Store scores and key feedback only.

Oscillation

The agent alternates between two states without converging — fix A introduces bug B, fixing B re-introduces bug A. Common in verification loops with conflicting test constraints.

Fix: Track change history in state. Detect repeated patterns and escalate rather than retry.
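One cheap way to track that history is to hash each iteration's output and escalate on a repeat. This is a sketch; a real loop might diff semantically rather than byte-exactly, and the function name is illustrative:

```python
import hashlib

def oscillation_check(history, output):
    """True if this exact output was produced before: escalate, don't retry."""
    digest = hashlib.sha256(output.encode()).hexdigest()
    if digest in history:
        return True        # revisited state, e.g. fix A → bug B → fix B → bug A
    history.add(digest)
    return False
```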

Evaluator Drift

An LLM-based judge gives inconsistent verdicts across iterations — accepting work it previously rejected or vice versa — causing non-deterministic loop behaviour.

Fix: Use deterministic evaluators where possible. For LLM judges, fix temperature to 0 and pin the model version.

Stale Environment

The external world changes mid-loop — an API changes, a file is modified externally, a rate limit kicks in — and the loop continues operating on assumptions that are no longer true.

Fix: Re-validate environment state at the start of each iteration. Treat external failures as loop-breaking escalations.

Goal Drift

Over many iterations the agent gradually shifts what it is optimising for — particularly in reflection loops where earlier reflections bias later reasoning away from the original objective.

Fix: Re-inject the original objective into context at fixed intervals. Anchor every reflection to the stated goal.