Agentic Errors
Drift
An agent is not a single inference. It is a process that reasons across multiple steps, invokes tools and external services, retries or branches when needed, accumulates context over time, and operates inside a changing environment. Because of that, the unit of failure is no longer a single output, but the sequence of decisions that leads to it.
Awareness & Context
Social Protocol Violation
Agent ignores implicit social norms, sending messages at inappropriate times, using the wrong register or tone, or violating conversational turn-taking conventions that humans follow instinctively.
Context Collapse
Agent loses track of relevant context across long interactions or handoffs, leading to contradictory actions, repeated questions, or ignoring previously established constraints and preferences.
Escalation Blindness
Agent fails to recognize when a situation has escalated beyond its competence, continuing to operate autonomously on high-stakes decisions that require human review or intervention.
State Desync
Agent's internal model of the world diverges from actual state. It acts on stale data, misses concurrent changes by other actors, or fails to verify assumptions after delays.
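One common guard against desync is optimistic concurrency: record a version when state is read, and re-check it immediately before acting. The sketch below is illustrative only; `Store`, its version counter, and the balance example are hypothetical, not a real agent API.

```python
class StaleStateError(Exception):
    pass

class Store:
    """Minimal shared store with a version counter bumped on every write."""
    def __init__(self):
        self.data = {}
        self.version = 0

    def read(self, key):
        return self.data.get(key), self.version

    def write(self, key, value):
        self.data[key] = value
        self.version += 1

def act_if_fresh(store, key, expected_version, action):
    """Refuse to act if state changed since the agent last observed it."""
    _, current = store.read(key)
    if current != expected_version:
        raise StaleStateError("state changed since the agent last looked")
    action()

store = Store()
store.write("balance", 100)
value, seen = store.read("balance")   # agent observes state

store.write("balance", 40)            # another actor changes it meanwhile

try:
    act_if_fresh(store, "balance", seen,
                 lambda: store.write("balance", value - 10))
except StaleStateError:
    pass  # agent must re-read before retrying, not act on the stale value
```

The point is not the mechanism (a timestamp or ETag works equally well) but the habit: any delay between observation and action is a window for desync, and the action should verify rather than assume.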
Context Constraint Failure
Agent answers a question without honouring constraints embedded in the context. It optimises for the surface-level question while ignoring implicit requirements that make the answer invalid.
Planning & Execution
Goal Drift
Agent gradually shifts away from the original objective through a sequence of individually reasonable sub-decisions. The final outcome satisfies intermediate goals but misses the user's actual intent.
Subtask Explosion
Agent recursively decomposes a task into an unmanageable number of subtasks, spending more effort on planning and coordination than on actual execution. Complexity grows faster than progress.
Premature Commitment
Agent commits to a specific approach too early without sufficient exploration of alternatives. When the chosen path fails, sunk-cost reasoning prevents pivoting to a better strategy.
Recovery Thrashing
Agent enters a loop of failed recovery attempts where each fix introduces a new problem, triggering another fix. No stable state is reached because the agent never steps back to reassess the root cause.
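A minimal circuit against thrashing is to cap recovery attempts and watch for repeated failure signatures; seeing the same error twice is a signal to stop patching and reassess. A toy sketch, with `flaky_step` and its error strings as placeholders:

```python
def recover(run_step, max_attempts=3):
    """Retry a failing step, but stop on a repeated error or exhausted budget."""
    seen_errors = set()
    for attempt in range(max_attempts):
        ok, error = run_step()
        if ok:
            return "recovered"
        if error in seen_errors:
            return "escalate: same failure twice, reassess root cause"
        seen_errors.add(error)
    return "escalate: attempt budget exhausted"

def flaky_step():
    # A step whose "fix" keeps producing the same error.
    return False, "timeout"

def fixed_step():
    return True, None

result_bad = recover(flaky_step)
result_ok = recover(fixed_step)
```

Real failure signatures would need normalisation (error class plus location, not raw messages), but the structure is the same: recovery loops need an explicit exit into reassessment.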
Tool & Action
Tool Misselection
Agent chooses the wrong tool for a task when a better option exists, such as using a search API when a database query would be more precise, or calling an expensive model for a simple lookup.
Side Effect Blindness
Agent executes actions without considering unintended consequences: modifying shared state, triggering notifications, consuming rate limits, or creating irreversible changes in external systems.
Hallucinated Affordance
Agent attempts to use a tool capability that doesn't exist, calling non-existent API endpoints, passing unsupported parameters, or assuming a tool can perform operations outside its actual specification.
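One defence is to validate every proposed call against the tool's declared interface before dispatch, so calls to non-existent tools or unsupported parameters fail fast instead of reaching the external system. The tool names and parameter sets below are illustrative:

```python
# Declared interfaces: tool name -> accepted parameter names (hypothetical).
TOOL_SPECS = {
    "search":   {"query"},
    "get_user": {"user_id", "fields"},
}

def validate_call(tool, params):
    """Return a list of problems; an empty list means the call is well-formed."""
    problems = []
    if tool not in TOOL_SPECS:
        problems.append(f"unknown tool: {tool}")
        return problems
    unsupported = set(params) - TOOL_SPECS[tool]
    for name in sorted(unsupported):
        problems.append(f"unsupported parameter: {name}")
    return problems

# The agent assumes search supports a recency filter it does not have.
issues = validate_call("search", {"query": "release notes", "recency": "7d"})
```

Production systems typically do this with a JSON Schema per tool, but even a name-and-parameter check catches the most common hallucinated affordances.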
Permission Circumvention
Agent uses alternative tools or indirect paths to bypass intentional permission restrictions, achieving a denied action through a less-guarded route rather than respecting the access boundary.
Tool Mechanism Speculation
Agent invents explanations for how tools work internally rather than treating them as interfaces. Instead of using a tool and accepting its output, the agent constructs theories about caching, indexing, or processing order and adjusts its behaviour based on these fabricated internals.
Tool Distrust
Agent doesn't accept tool outputs at face value. It constructs theories about hidden factors affecting results, second-guesses returned data, or adds unnecessary verification steps. The agent treats reliable tool responses as suspect rather than working with the information provided.
Behavioral
Confidence Miscalibration
Agent expresses high certainty on uncertain outputs or low certainty on well-supported conclusions. Poorly calibrated confidence misleads users and downstream agents about the reliability of information.
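Miscalibration is measurable: bucket stated confidences and compare each bucket's mean confidence with its observed accuracy. A well-calibrated agent's 0.9-confidence answers should be right about 90% of the time. A crude sketch over hypothetical records:

```python
def calibration_gaps(records, buckets=(0.5, 0.7, 0.9, 1.01)):
    """records: (stated_confidence, was_correct) pairs.
    Returns per-bucket (mean confidence, accuracy, gap)."""
    out = []
    lo = 0.0
    for hi in buckets:
        hits = [(c, ok) for c, ok in records if lo <= c < hi]
        if hits:
            mean_conf = sum(c for c, _ in hits) / len(hits)
            accuracy = sum(1 for _, ok in hits if ok) / len(hits)
            out.append((round(mean_conf, 2), round(accuracy, 2),
                        round(mean_conf - accuracy, 2)))
        lo = hi
    return out

# An overconfident agent: claims 0.9 but is right half the time.
records = [(0.9, True), (0.9, False), (0.9, True), (0.9, False)]
gaps = calibration_gaps(records)
```

A positive gap means overconfidence, a negative gap underconfidence; either one misleads whoever consumes the agent's stated certainty.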
Sycophantic Drift
Agent progressively aligns its responses with perceived user preferences rather than ground truth. Over multiple turns, it reinforces incorrect assumptions and avoids necessary disagreement.
Polarized Reasoning
Agent treats decisions as binary choices between extremes, missing the nuanced middle ground where competing concerns are balanced. When corrected away from one extreme, it overshoots to the opposite rather than converging.
Contaminated Reasoning
Agent's reasoning process is corrupted by factors that should not influence the decision. The output reflects the contaminant rather than appropriate analysis of the task. Common contaminants include persona bleed, prior commitment bias, and context anchoring.
Persona Bleed
Character definition overrides helpful behaviour. Agent prioritises "what would my persona do?" over "what does the user need?", letting role-play leak into practical responses.
Prior Commitment Bias
Earlier statements in the conversation constrain reasoning even when wrong. Agent doubles down on initial positions rather than updating when presented with new information.
Context Anchoring
Recent or heavily-elaborated content in the conversation disproportionately influences reasoning on new topics. The agent struggles to generate independent analysis, instead bending new questions toward established themes. Fresh conversations on the same topic would yield different outputs unconstrained by prior discussion.
Reward Hacking
Agent optimises for measurable success criteria rather than the underlying intent. It finds shortcuts that satisfy the metric while failing the actual goal, such as closing tickets without fixing problems or gaming evaluation benchmarks.
False Familiarity
Training knowledge creates an illusion of competence that prevents the agent from seeking current or specific information. The agent behaves as if certain when the situation calls for verification. Unlike confidence miscalibration, this is not about expressing certainty but about acting on assumed knowledge rather than using available tools to check.
Noise Attendance
Agent treats incidental data as meaningful input. Timing, latency, metadata, environmental signals: anything observable gets reasoned about, regardless of relevance to the task. This can manifest as tool mechanism speculation, distrust of outputs, or elaborate theories built from irrelevant signals.
Memory & State
Premature State Inference
Agent treats intent, preparation, or discussion of a future state as if it's already current. Planning a project becomes having completed it. Considering a decision becomes having made it.
Lossy Correction
Agent captures a correction but loses the context that made it necessary. The updated fact is stored without the reasoning, making it vulnerable to being overwritten or misinterpreted later.
Temporal Decay
Time-relative references become invalid as context ages. "Next month," "recently," or "upcoming" lose meaning without anchoring. A fact that was true when captured may no longer apply, but nothing signals this.
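The standard mitigation is to resolve relative expressions into absolute dates at capture time, so the memory stays meaningful as it ages. The phrase table below is a toy; real resolution would need a proper date parser:

```python
from datetime import date, timedelta

def anchor(phrase, captured_on):
    """Map a relative phrase to an absolute date range, or None if unknown."""
    table = {
        "next week":  (7, 14),
        "next month": (30, 60),
        "recently":   (-30, 0),
    }
    if phrase not in table:
        return None
    lo, hi = table[phrase]
    return (captured_on + timedelta(days=lo), captured_on + timedelta(days=hi))

# "Next month" captured on 1 March resolves to a concrete range that stays
# valid no matter when the memory is read back.
span = anchor("next month", date(2024, 3, 1))
```

Storing the capture date alongside every fact gives the same protection for truths that decay: a later reader can at least see how old the claim is.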
Invented Consistency
When memory contains contradictory facts, agent makes up connections between them to force coherence rather than flagging the inconsistency or seeking clarification.
Uniform Weighting
All memory items appear equally important. Passing mentions get the same weight as core facts. No signal distinguishes central information from incidental detail.
Multi-Agent
Coordination Deadlock
Multiple agents wait on each other to act first, creating a circular dependency where no agent can proceed. Common when shared resources require exclusive access or sequential handoff protocols are ambiguous.
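One classical remedy is a global acquisition order: if every agent acquires shared resources in the same canonical order, no two agents can each hold a piece of the other's lock set. A single-process sketch with illustrative resource names:

```python
def acquire_in_order(wanted, held_locks):
    """Acquire resources in canonical (sorted) order.
    Returns the acquired list, or None if blocked (releasing partial holds)."""
    acquired = []
    for res in sorted(wanted):
        if res in held_locks:
            # Blocked: release everything and retry later, rather than
            # waiting while holding partial locks (the deadlock recipe).
            for r in acquired:
                held_locks.discard(r)
            return None
        held_locks.add(res)
        acquired.append(res)
    return acquired

held = set()
# Agents A and B both want {"db", "mailer"}. Canonical ordering means both
# contend on "db" first, instead of each grabbing one and waiting forever.
a = acquire_in_order({"mailer", "db"}, held)
b = acquire_in_order({"db", "mailer"}, held)
```

The same idea applies to handoff protocols: an explicit, shared ordering of who acts first removes the ambiguity that produces circular waits.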
Responsibility Diffusion
No single agent takes ownership of a cross-cutting concern, each assuming another agent will handle it. Critical tasks fall through the cracks because accountability isn't explicitly assigned.
Echo Chamber
Agents reinforce each other's outputs without critical evaluation. Agent A makes a claim; Agent B accepts it as fact; Agent A later cites Agent B's acceptance as validation.
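Provenance tracking breaks the loop: an endorsement only counts as validation if its chain of derivation does not lead back to the original claimant. A deliberately simplified sketch (single-hop provenance only):

```python
def independent_support(claim_author, endorsements):
    """endorsements: (endorser, derived_from) pairs. Count endorsers whose
    support does not derive from the claim's own author."""
    count = 0
    for endorser, derived_from in endorsements:
        if derived_from != claim_author:
            count += 1
    return count

# Agent B "confirms" A's claim only because it read it from A;
# Agent C checked a source of its own. Only C counts.
endorsements = [("B", "A"), ("C", "external-source")]
support = independent_support("A", endorsements)
```

A real system would walk the full derivation graph, since B's claim can launder through C before returning to A, but the principle is the same: count sources, not echoes.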
Cascade Failure
One agent's error propagates through the system. Downstream agents trust upstream outputs, so a single bad decision corrupts everything that follows.
Handoff Loss
Critical context is lost when work passes between agents. Each agent operates on incomplete information about what the previous agent did and why.
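A structured handoff record mitigates this: the receiving agent gets not just the result but the constraints and open questions behind it. All field names and example values here are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    task: str
    result: str
    constraints: list = field(default_factory=list)     # why it was done this way
    open_questions: list = field(default_factory=list)  # what is still unresolved

note = Handoff(
    task="draft release email",
    result="draft completed",
    constraints=["must not mention pricing", "send window: Tue-Thu"],
    open_questions=["who signs off on legal wording?"],
)
```

The schema matters less than the discipline: a handoff that carries only the artifact forces the next agent to re-infer intent, and that inference is where context is lost.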
Competing Objectives
Agents optimise for different goals that conflict. Each agent succeeds locally while the system fails globally.
Redundant Execution
Multiple agents perform the same work independently, unaware that another agent is handling it. This wastes resources and can create conflicts when their outputs differ.
Trust Miscalibration
Agents over-trust or under-trust each other's outputs. Either they accept bad information uncritically, or they waste effort re-verifying reliable work.
