Hallucinations & Grounding
Types of Hallucinations
Factual Hallucinations
The model generates statements that contradict established facts or real-world knowledge. This includes incorrect dates, false statistics, non-existent citations, made-up historical events, or wrong scientific claims.
Contextual Hallucinations
The model generates information not supported by the provided context, even if factually true in general. In RAG systems, this occurs when the model ignores retrieved documents and generates from parametric memory.
Instruction Hallucinations
The model fails to follow explicit instructions while appearing to comply. It may claim to have performed a task it didn't do, fabricate tool outputs, or misrepresent its own capabilities.
Entity Hallucinations
The model invents non-existent entities such as fake people, organizations, products, URLs, or API endpoints. These plausible-sounding inventions can lead to broken links and failed integrations.
The Mechanism Behind Entity Hallucinations
Anthropic's interpretability research traced the internal circuitry of Claude to find out why entity hallucinations happen. The answer inverts the conventional explanation.
Refusal is the Default
The conventional view is that models hallucinate because they're trained to always produce output: completion machines that fill gaps with plausible text. Interpretability research found the opposite. Claude has a circuit that is active by default and causes it to state that it lacks sufficient information. The model's natural state is to decline.
Known Entity Recognition Overrides It
What allows Claude to answer at all is a separate "known answer" feature. When Claude recognizes a well-known entity, say Michael Jordan, this feature activates and inhibits the default refusal circuit. Hallucinations happen when this recognition system misfires: an unfamiliar name triggers just enough familiarity that the "known entity" feature incorrectly activates and suppresses refusal, leaving Claude with no actual knowledge and no refusal either.
Anthropic confirmed this by intervention: artificially activating the "known answer" features while asking about unknown entities reliably caused hallucination. Inhibiting the "can't answer" features did the same. Hallucination isn't recklessness — it's a safety default being incorrectly overridden by a misfiring recognition system.
Causes in Agentic Applications
Multi-Turn Context Drift
In long conversations, the model's understanding of context degrades. Early information may be forgotten or conflated. Attention also tends to favor the beginning and end of the context window, so information in the middle is underused, the 'lost in the middle' effect.
Tool Output Misinterpretation
Agents frequently call external tools and must interpret their outputs. Hallucinations occur when models misread responses, assume success when tools failed, or fabricate outputs entirely.
Planning Overconfidence
When generating multi-step plans, models may hallucinate capabilities they don't have, assume tool availability that doesn't exist, or create logically impossible sequences.
Memory System Corruption
Agents using external memory can hallucinate when retrieving or updating memories. They may recall events that didn't happen, merge distinct memories, or write corrupted summaries.
Inter-Agent Communication
In multi-agent systems, hallucinations can propagate between agents. One agent's fabricated output becomes another's trusted input, causing errors to compound across the system.
Reward Hacking Behaviors
Models optimized to appear helpful may generate confident-sounding but false information rather than admitting uncertainty. This produces fluent, convincing hallucinations.
Prevention Techniques
Retrieval-Augmented Generation (RAG)
Ground model outputs in retrieved documents from trusted knowledge bases. RAG reduces hallucinations by providing factual context the model must synthesize rather than generate from memory.
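Leaving the retrieval half aside, the grounding step can be sketched as a prompt that interleaves retrieved passages and instructs the model to answer only from them. The function name and prompt wording below are illustrative, not a specific framework's API:

```python
def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Assemble a RAG prompt that instructs the model to answer only
    from the retrieved passages (wording is illustrative)."""
    context = "\n\n".join(f"[doc {i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the passages below. If they do not contain "
        "the answer, say 'I don't know.'\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Numbering the passages also sets up citation requirements: the model can be asked to reference `[doc N]` for each claim.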
Citation Requirements
Require the model to cite specific sources for each claim. Citations create accountability and enable verification. A prompt such as 'cite the exact passage supporting this claim' forces grounding.
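Citation checking can start with something as simple as verifying that each quoted passage actually appears in its cited source. The claim schema here is an assumption for illustration; production systems would use fuzzy matching rather than exact substrings:

```python
def verify_citations(claims: list[dict], sources: dict[str, str]) -> list[dict]:
    """Check that each claim's quoted passage appears verbatim in the
    cited source. The claim schema ({"text", "source", "quote"}) is an
    assumption for illustration."""
    results = []
    for claim in claims:
        doc = sources.get(claim["source"], "")
        # Exact substring match; real systems would fuzzy-match.
        results.append({"claim": claim["text"], "supported": claim["quote"] in doc})
    return results
```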
Structured Output Constraints
Use JSON schemas, Pydantic models, or other structured formats to constrain generation. Structured outputs reduce the hallucination surface by requiring explicit fields such as 'source_document' or 'confidence_score'.
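A stdlib-only sketch of the idea, standing in for what a Pydantic model or JSON Schema validator would normally do; the required fields are illustrative:

```python
import json

# Required output fields and their types (illustrative; mirrors what a
# Pydantic model or JSON Schema would declare).
REQUIRED_FIELDS = {"answer": str, "source_document": str, "confidence_score": float}

def validate_output(raw: str) -> dict:
    """Parse model output as JSON and enforce the expected fields.
    A validation failure is itself a useful hallucination signal."""
    data = json.loads(raw)
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"missing or mistyped field: {field}")
    return data
```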
Uncertainty Expression
Train and prompt models to express uncertainty rather than hallucinating confident answers. Calibrated models should say 'I don't know' or 'I'm not certain' when appropriate.
Detection Methods
Self-Consistency Checking
Generate multiple responses to the same query and check for consistency. Hallucinations often vary across samples while factual information remains stable.
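The voting step, assuming the samples have already been generated, can be sketched as follows; exact-match normalization is a simplification (real systems often compare answers with embeddings or an NLI model):

```python
from collections import Counter

def consistency_score(samples: list[str]) -> tuple[str, float]:
    """Majority vote over normalized samples. A low agreement fraction
    flags a likely hallucination."""
    normalized = [s.strip().lower() for s in samples]
    answer, count = Counter(normalized).most_common(1)[0]
    return answer, count / len(normalized)
```

For example, three samples of 'Paris', 'paris', 'Lyon' agree at two-thirds; an agreement below some tuned threshold would route the query to a stronger check or a refusal.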
Entailment Verification
Use NLI models to verify that generated claims are entailed by source documents. Claims labeled as 'contradiction' or 'neutral' indicate potential hallucinations.
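The decision logic, assuming an NLI model has already scored the claim against each retrieved passage, reduces to a threshold test; the 0.7 default here is illustrative and needs tuning per model:

```python
def claim_supported(entailment_probs: list[float], threshold: float = 0.7) -> bool:
    """A claim is grounded if at least one retrieved passage entails it.
    entailment_probs holds the NLI entailment probability of the claim
    against each passage; the threshold is an illustrative default."""
    return any(p >= threshold for p in entailment_probs)
```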
External Knowledge Verification
Verify factual claims against external knowledge bases, search engines, or fact-checking APIs. This catches factual hallucinations that internal checks might miss.
LLM-as-Judge Detection
Use a separate LLM to evaluate whether responses are grounded in provided context. The judge model analyzes source documents and generated text, flagging unsupported claims.
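A sketch of the judge's contract: a prompt template plus a parser that fails closed, treating unparseable verdicts as ungrounded. The prompt wording and JSON verdict schema are assumptions, not a specific vendor's format:

```python
import json

# Illustrative judge prompt; the JSON verdict schema is an assumption.
JUDGE_PROMPT = (
    "You are a grounding judge. Given SOURCES and a RESPONSE, reply with "
    'JSON: {{"grounded": true|false, "unsupported_claims": [...]}}.\n\n'
    "SOURCES:\n{sources}\n\nRESPONSE:\n{response}"
)

def parse_verdict(judge_output: str) -> dict:
    """Parse the judge model's JSON verdict, failing closed: anything
    unparseable is treated as ungrounded."""
    try:
        verdict = json.loads(judge_output)
    except json.JSONDecodeError:
        return {"grounded": False, "unsupported_claims": []}
    return {"grounded": bool(verdict.get("grounded", False)),
            "unsupported_claims": verdict.get("unsupported_claims", [])}
```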
Confessions Integration
OpenAI's confession technique reportedly detects hallucinations with around 84% accuracy. After generation, prompt the model to honestly assess whether it fabricated information.
Token Probability Analysis
Analyze token-level probabilities during generation. Low-probability tokens or high entropy sequences may indicate uncertainty and potential hallucination.
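The flagging logic can be sketched as follows, assuming per-step token probability distributions are available from the inference API (many expose them via a logprobs option); the 2.0-bit threshold is illustrative:

```python
import math

def token_entropy(probs: list[float]) -> float:
    """Shannon entropy (bits) of one generation step's token
    distribution; high entropy means the model was uncertain here."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def flag_uncertain_steps(step_entropies: list[float], threshold: float = 2.0) -> list[int]:
    """Indices of generation steps whose entropy exceeds a tunable
    threshold (2.0 bits is an illustrative default)."""
    return [i for i, h in enumerate(step_entropies) if h > threshold]
```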
Detection Methods Comparison
| Method | Latency Impact | Accuracy | Best For |
|---|---|---|---|
| Self-Consistency | High (multiple samples) | Medium | Factual queries, QA |
| NLI Entailment | Low-Medium | High for RAG | Document-grounded responses |
| External Verification | Medium-High | High for facts | Factual claims, citations |
| LLM-as-Judge | Medium | High | Complex grounding assessment |
| Confessions | Low | ~84% (reported) | Post-hoc monitoring |
| Token Probability | Very Low | Medium | Real-time flagging |
Agentic-Specific Patterns
Tool Output Verification
Never trust an agent's claims about tool results without verification. Parse and validate the actual tool responses. Define schemas for expected outputs, and log discrepancies between claimed and actual tool results.
- Schema validation for tool outputs
- Claimed vs actual comparison
- Structured error handling
- Output logging and audit
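The claimed-vs-actual comparison above can be as simple as diffing the fields the agent reported against the logged tool response; the field names are application-specific:

```python
def tool_claim_mismatches(claimed: dict, actual: dict, fields: list[str]) -> list[str]:
    """Compare what the agent claimed a tool returned against the
    logged actual response; return the fields that disagree
    (field names are application-specific)."""
    return [f for f in fields if claimed.get(f) != actual.get(f)]
```

Any non-empty result is a discrepancy to log and, depending on severity, a reason to halt the agent.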
Memory Validation
Implement verification when reading from and writing to agent memory systems. Hash or version memory entries. Cross-check retrieved memories against other sources. Implement decay scoring for older memories.
- Memory entry versioning
- Read/write verification
- Confidence decay over time
- Periodic memory audits
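Two of the ideas above, content hashing to detect silent corruption and confidence decay for older entries, can be sketched as follows; the 30-day half-life is an illustrative default:

```python
import hashlib

def memory_fingerprint(entry: str) -> str:
    """Content hash stored alongside each memory entry; a mismatch on
    read means the entry was corrupted or silently rewritten."""
    return hashlib.sha256(entry.encode()).hexdigest()

def decayed_confidence(base: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Exponentially decay a memory's confidence with age; the 30-day
    half-life is an illustrative default."""
    return base * 0.5 ** (age_days / half_life_days)
```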
Multi-Agent Consensus
In multi-agent systems, require consensus before acting on potentially hallucinated information. Multiple agents independently verify claims before propagation. Trace information provenance through the system.
- Independent verification
- Consensus requirements
- Confidence aggregation
- Provenance tracking
Action Verification Loops
Before executing consequential actions, verify the agent's understanding and plan. Implement 'think-verify-act' patterns where plans are validated before execution. Use human-in-the-loop for high-stakes decisions.
- Pre-action verification
- Think-verify-act pattern
- Human-in-the-loop for high stakes
- Rollback capabilities
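The verify step of the think-verify-act pattern can be sketched as checking a generated plan against the tools that actually exist and flagging high-stakes steps for human approval; the `{"tool": ...}` step schema is an assumption for illustration:

```python
def verify_plan(plan: list[dict], available_tools: set[str],
                high_stakes: set[str]) -> dict:
    """Pre-execution check: every step must name a tool that exists,
    and steps touching high-stakes tools are routed to a human.
    The {"tool": ...} step schema is an assumption for illustration."""
    unknown = [s["tool"] for s in plan if s["tool"] not in available_tools]
    needs_human = [s["tool"] for s in plan if s["tool"] in high_stakes]
    return {"ok": not unknown, "unknown_tools": unknown,
            "requires_human": needs_human}
```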
Production Monitoring
Hallucination Rate Tracking
Track hallucination rates as a key quality metric. Sample production outputs for human evaluation. Implement automated detection on all outputs. Alert on rate increases.
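A sliding-window monitor is enough to get started; the window size, alert threshold, and minimum sample count below are illustrative defaults:

```python
from collections import deque

class HallucinationRateMonitor:
    """Sliding-window hallucination rate with a simple alert rule."""

    def __init__(self, window: int = 1000, alert_rate: float = 0.05,
                 min_samples: int = 20):
        self.results = deque(maxlen=window)
        self.alert_rate = alert_rate
        self.min_samples = min_samples  # avoid alerting on tiny samples

    def record(self, hallucinated: bool) -> None:
        self.results.append(hallucinated)

    @property
    def rate(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 0.0

    def should_alert(self) -> bool:
        return len(self.results) >= self.min_samples and self.rate > self.alert_rate
```

Each detection method above (NLI, confessions, judge) can feed `record()`, so the monitor tracks whichever signal is cheapest to run on every output.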
Citation Verification Logging
Log all citations and verify against source documents. Track citation accuracy over time. Identify frequently hallucinated sources. Build dashboards showing grounding quality.
User Feedback Integration
Collect and analyze user feedback on factual accuracy. Implement 'flag as incorrect' mechanisms. Use feedback to identify hallucination patterns and improve detection.
Defense in Depth: No single technique eliminates hallucinations. Combine prevention (RAG, citations, uncertainty), detection (NLI, confessions, LLM-as-judge), and monitoring (rate tracking, user feedback) for comprehensive coverage.
