Agentic Loops

Loop Taxonomy

Refinement Loop: Generate → Judge → Retry

The agent produces output, a judge evaluates it, and the agent retries with feedback until the quality threshold is met or the iteration budget is exhausted. The foundation of most quality-gate patterns.

Key Components
Worker Agent · Judge / Evaluator · Feedback Injection · Iteration Budget
Complexity: low · Cost: medium
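The Generate → Judge → Retry cycle can be sketched in a few lines. Here `generate` and `judge` are stubs standing in for the worker LLM and the evaluator; all names are illustrative, not a real API:

```python
def generate(draft, feedback):
    """Worker stub: a real agent would call an LLM with the feedback in context."""
    if feedback is None:
        return draft
    return draft + f" [revised: {feedback}]"

def judge(output):
    """Judge stub: returns (score, feedback); a real judge would rate quality."""
    return output.count("revised"), "add more detail"

def refinement_loop(task, threshold=2, max_iters=5):
    output, feedback = task, None
    for i in range(1, max_iters + 1):    # iteration budget: hard ceiling
        output = generate(output, feedback)
        score, feedback = judge(output)  # the evaluator's signal steers the loop
        if score >= threshold:           # quality gate met: accept
            return output, i
    return output, max_iters             # budget exhausted: return best effort
```

The same skeleton underlies most quality-gate patterns; only the worker, the judge, and the threshold change.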
Research Loop: Propose → Implement → Measure → Accept/Reject

The agent proposes a change, implements it, runs it against an objective metric, and accepts or rejects based on the result. Designed for overnight autonomous experimentation where human velocity is the bottleneck. Karpathy's autoresearch is the canonical example — ~12 ML experiments per hour, 100+ overnight.

Key Components
Proposal Agent · Execution Environment · Objective Metric · Accept/Reject Gate
Complexity: medium · Cost: high
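An accept/reject harness in this spirit (a sketch, not autoresearch's actual code): `propose` and `run_experiment` are injected stubs where the real system would edit the codebase and run a bounded training session that reports validation loss:

```python
def research_loop(baseline_loss, propose, run_experiment, budget=10):
    """Keep a change only if it improves the objective metric."""
    best_loss, accepted = baseline_loss, []
    for _ in range(budget):             # e.g. one overnight run of N experiments
        change = propose()              # proposal agent suggests a tweak
        loss = run_experiment(change)   # bounded run, objectively measured
        if loss < best_loss:            # accept/reject gate
            best_loss = loss
            accepted.append(change)
        # rejected changes are simply discarded; the codebase rolls back
    return best_loss, accepted
```

Because the gate is a numeric comparison rather than a judgment call, the loop needs no LLM judge and can run unsupervised.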
Verification Loop: Write → Test → Fix → Re-test

Code-specific pattern. The agent writes code, runs a test suite, reads the failures, fixes them, and loops until all tests pass or the budget expires. Tests are the objective evaluator — no LLM judge needed.

Key Components
Code Generator · Test Runner · Failure Parser · Fix Agent
Complexity: medium · Cost: medium
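A sketch of the cycle with the test suite as the evaluator. Here `tests` maps test names to predicates over the code, and `generate_fix` stands in for the fix agent; both are illustrative stubs:

```python
def verification_loop(tests, generate_fix, code, max_iters=10):
    """Write → test → fix → re-test until green or out of budget."""
    for i in range(1, max_iters + 1):
        failures = [name for name, passes in tests.items() if not passes(code)]
        if not failures:                     # all tests green: accept
            return code, i
        code = generate_fix(code, failures)  # fix agent sees the parsed failures
    return code, max_iters                   # budget expired: return as-is
```

The evaluator is fully deterministic, which sidesteps judge drift entirely.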
Exploration Loop (Best-of-N): Generate N → Score All → Keep Best

Parallel width rather than sequential depth. N candidates are generated simultaneously, scored against a rubric, and the best is selected. No iteration — a single wide pass. Trades compute for quality without the risk of runaway depth.

Key Components
Parallel Generator · Scoring Rubric · Selection Logic
Complexity: low · Cost: high
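The single wide pass reduces to a few lines. In production the candidates would be generated concurrently and `score` would apply the rubric; both callables here are placeholders:

```python
def best_of_n(generate, score, n=5):
    """Generate N candidates in one pass, keep the highest-scoring one."""
    candidates = [generate(i) for i in range(n)]  # width, not depth
    return max(candidates, key=score)             # no iteration, no runaway risk
```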
Reflection Loop: Act → Observe → Reflect → Act

The agent acts, observes the result, reflects on what went wrong or could be improved, then acts again with that reflection in context. ReAct and Reflexion are the named patterns. The reflection step is verbal — no external evaluator required.

Key Components
Action Agent · Observation Parser · Reflection Step · Updated Context
Complexity: low · Cost: medium
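A sketch of the verbal loop. `act` and `reflect` are stubs; the key point is that only natural-language reflections, not raw trajectories, are carried forward as context:

```python
def reflection_loop(act, reflect, goal, max_iters=5):
    """Act → observe → reflect; reflections accumulate as verbal memory."""
    reflections = []
    for i in range(1, max_iters + 1):
        result, success = act(goal, reflections)  # reflections ride in context
        if success:
            return result, i
        reflections.append(reflect(result))       # verbal memory, no gradients
    return None, max_iters
```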
Nested / Hierarchical Loop: Outer loop coordinates inner loops

An outer loop manages strategy or task decomposition while inner loops handle individual sub-tasks. Multi-agent systems naturally produce this shape — an orchestrator loops on planning while worker agents loop on execution.

Key Components
Orchestrator Loop · Worker Loops · State Synchronisation · Termination Propagation
Complexity: high · Cost: high
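The shape can be sketched as two nested budgets, with inner success propagating up to the outer termination decision. These are illustrative stubs, not a real orchestrator API:

```python
def worker_loop(subtask, budget=3):
    """Inner loop: retry one sub-task until it succeeds or its budget expires."""
    for attempt in range(1, budget + 1):
        if subtask(attempt):
            return True
    return False

def orchestrator_loop(plan, budget=2):
    """Outer loop: (re)plan, dispatch workers, stop when every sub-task succeeds."""
    for _ in range(budget):
        if all(worker_loop(sub) for sub in plan()):  # termination propagates up
            return True
    return False
```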
Anatomy of a Loop

Worker: Executes the task

The agent that does the actual work each iteration — generating code, running experiments, writing content. Receives feedback from the previous iteration as additional context.

Feedback injection format · Context accumulation over iterations · Consistency of output structure
Evaluator: Scores the output

Determines whether the worker's output meets the acceptance criteria. Can be deterministic (test runner, metric threshold), LLM-powered (quality judge), or hybrid. The evaluator's signal is the loop's steering mechanism.

Evaluator bias · Reward hacking · LLM judge consistency across iterations
Controller: Decides continue / stop

Reads the evaluator's signal and decides whether to loop again, accept the output, or escalate. Enforces budget constraints. The controller is the circuit breaker — without it, loops run forever.

Budget enforcement · Escalation routing · Diminishing returns detection
Loop State: Memory across iterations

What the loop carries from one iteration to the next: previous outputs, evaluation scores, feedback, iteration count. Poorly managed state leads to context window saturation or the agent losing track of its own history.

Context window limits · State compression strategy · Score history for trend detection
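One way to keep that state compact, sketched under the assumption that scores are cheap to keep but feedback text is not: retain the full score history for trend detection, but only the most recent feedback items. The class and method names here are illustrative:

```python
class LoopState:
    """Cross-iteration state: score history plus bounded feedback, not transcripts."""

    def __init__(self, max_feedback=2):
        self.scores = []        # full history, cheap: enables trend detection
        self.feedback = []      # only the newest items survive compression
        self.max_feedback = max_feedback

    def record(self, score, feedback):
        self.scores.append(score)
        self.feedback = (self.feedback + [feedback])[-self.max_feedback :]

    def plateaued(self, window=3):
        """True when the last `window` iterations show no score gain."""
        if len(self.scores) <= window:
            return False
        return max(self.scores[-window:]) <= self.scores[-window - 1]
```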
Real Implementations

autoresearch (Andrej Karpathy)
Research Loop

AI agent autonomously conducts ML research on a GPT training codebase. The agent modifies train.py, runs a 5-minute training session, measures validation loss, and accepts or rejects the change. Human role shifts from writing code to writing Markdown instructions (program.md). Designed for overnight unsupervised runs.

Experiments/hr: ~12
Overnight yield: ~100 runs
Budget: 5 min / run
ML Research · Overnight · Unsupervised · Open Source
Claude Code /loop (Anthropic)
Refinement Loop

Native loop command in Claude Code. Runs a prompt or skill on a recurring interval. Syntax: /loop 5m /task — executes /task every 5 minutes until cancelled. Default interval is 10 minutes. Designed for polling, monitoring, and recurring agentic tasks within a live session.

Syntax: /loop 5m /foo
Default: 10 min interval
Scope: Live session
Claude Code · Scheduling · Polling · Skill Loop
Reflexion (academic pattern, Shinn et al.)
Reflection Loop

Agent retains a verbal memory of past failures and reflects on them before each new attempt. Rather than storing raw trajectories, it stores natural language reflections — compact and directly usable as context. Shown to significantly outperform ReAct on reasoning and code tasks without any gradient updates.

Memory type: Verbal reflection
Training: None required
Advantage: No fine-tuning
Research Pattern · Reflection · Memory · No Training
AlphaCode / Competitive Coding (DeepMind / OpenAI)
Verification Loop

Generate a large pool of candidate solutions, filter against public test cases, then rank and select. The test suite is the evaluator — no LLM judge. Wide generation (Best-of-N) combined with deterministic verification is the dominant pattern for competitive programming and production code generation.

Evaluator: Test suite
Pattern: Best-of-N + verify
Judge: Deterministic
Code Generation · Best-of-N · Test Evaluation · DeepMind
Loop Control

| Mechanism | Trigger | When to Use | Risk if Missing |
| --- | --- | --- | --- |
| Max Iterations | Iteration count ≥ N | Always — every loop must have a hard ceiling | Infinite loop, runaway cost |
| Score Threshold | Evaluator score ≥ target | When quality can be measured numerically | Loop runs to max even when output is already acceptable |
| No-Improvement Window | Last N iterations show no score gain | Research and exploration loops — detect plateau | Wasted compute on diminishing returns |
| Time / Token Budget | Elapsed time or tokens exceed limit | Overnight runs, cost-constrained workflows | Unbounded spend on long-running loops |
| Circuit Breaker | Consecutive failures or error rate spike | Any loop that executes external actions | Cascading failures, corrupted state |
| Convergence Check | Output diff from previous iteration below threshold | Refinement loops — detect when agent is spinning | Agent loops on nearly identical output with no progress |
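A controller can combine several of these mechanisms at once. This sketch checks them in priority order and returns the reason it stopped; the `state` dict keys and thresholds are illustrative assumptions:

```python
def should_stop(state, max_iters=10, target=0.9,
                plateau_window=3, max_failures=3):
    """Return a stop reason, or None to keep looping."""
    if state["iteration"] >= max_iters:           # hard ceiling, always present
        return "max_iterations"
    scores = state["scores"]
    if scores and scores[-1] >= target:           # score threshold
        return "target_met"
    if (len(scores) > plateau_window
            and max(scores[-plateau_window:]) <= scores[-plateau_window - 1]):
        return "no_improvement"                   # plateau: diminishing returns
    if state["consecutive_failures"] >= max_failures:
        return "circuit_breaker"                  # error-rate breaker
    return None
```

Returning the reason, not just a boolean, lets the caller route to the right escalation path.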
Failure Modes

Reward Hacking

The agent finds a way to maximise the evaluator's score without actually improving the output — exploiting gaps in the scoring rubric rather than solving the problem.

Fix: Use held-out test sets, multiple diverse evaluators, or human spot-checks on accepted outputs.

Context Accumulation

Each iteration appends to the context window — feedback, previous outputs, reflection. After enough iterations the context saturates and model quality degrades sharply.

Fix: Compress or summarise loop state between iterations. Store scores and key feedback only.

Oscillation

The agent alternates between two states without converging — fix A introduces bug B, fixing B re-introduces bug A. Common in verification loops with conflicting test constraints.

Fix: Track change history in state. Detect repeated patterns and escalate rather than retry.
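One cheap way to track that history is to hash each iteration's output and escalate on a repeat. This is a sketch; a real loop might diff semantically rather than byte-exactly, and the function name is illustrative:

```python
import hashlib

def oscillation_check(history, output):
    """True if this exact output was produced before: escalate, don't retry."""
    digest = hashlib.sha256(output.encode()).hexdigest()
    if digest in history:
        return True        # revisited state, e.g. fix A → bug B → fix B → bug A
    history.add(digest)
    return False
```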

Evaluator Drift

An LLM-based judge gives inconsistent verdicts across iterations — accepting work it previously rejected or vice versa — causing non-deterministic loop behaviour.

Fix: Use deterministic evaluators where possible. For LLM judges, fix temperature to 0 and pin the model version.

Stale Environment

The external world changes mid-loop — an API changes, a file is modified externally, a rate limit kicks in — and the loop continues operating on assumptions that are no longer true.

Fix: Re-validate environment state at the start of each iteration. Treat external failures as loop-breaking escalations.

Goal Drift

Over many iterations the agent gradually shifts what it is optimising for — particularly in reflection loops where earlier reflections bias later reasoning away from the original objective.

Fix: Re-inject the original objective into context at fixed intervals. Anchor every reflection to the stated goal.