Will Percey — Portfolio

Agentic Anatomy

Updated Feb 2026

Reasoning & Planning

Reasoning Engine

Core LLM that processes context windows, generates structured output, and chains thought sequences to reach conclusions.

Planning Strategies

Approaches to breaking down and working through tasks, such as ReAct (interleaving reasoning steps with actions), Plan-and-Execute, reflection loops, and Tree of Thoughts.
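A minimal sketch of a ReAct-style loop follows. It assumes a hypothetical `model` callable that returns either a tool action or a final answer. Here `scripted_model` stands in for an LLM, and the `add` tool is illustrative only. The loop alternates Thought, Action, and Observation until the model answers or the step budget runs out.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> function taking a string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}

def scripted_model(history: list[str]) -> dict:
    """Stand-in for an LLM: acts once, then answers from the latest observation."""
    observations = [h for h in history if h.startswith("Observation:")]
    if not observations:
        return {"thought": "I should compute the sum.", "action": "add", "input": "2+3"}
    result = observations[-1].removeprefix("Observation: ")
    return {"thought": "I have the result.", "answer": result}

def react(question: str, model: Callable[[list[str]], dict], max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(history)                       # reason about the history so far
        history.append(f"Thought: {step['thought']}")
        if "answer" in step:                        # the model decided it is done
            return step["answer"]
        tool = TOOLS[step["action"]]                # act: call the chosen tool
        history.append(f"Action: {step['action']}[{step['input']}]")
        history.append(f"Observation: {tool(step['input'])}")  # observe, then loop
    raise RuntimeError("step budget exhausted without an answer")

print(react("What is 2 + 3?", scripted_model))  # prints 5
```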

Orchestration

Coordination layer managing agent state, routing between components, handling errors, and enforcing policies.

Tools & Skills

Tool Calling

Function calling via schema-driven selection and result parsing. Open protocols like MCP (client-server tool integration) and UTCP (direct agent-to-API calling with pluggable transports) standardise how agents discover and invoke tools.
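The sketch below shows schema-driven dispatch in miniature. It assumes a hypothetical registry in which each tool carries a JSON-Schema-style parameter description, and a model that emits calls as `{"name": ..., "arguments": {...}}`. It is not the MCP or UTCP wire format. The type check covers only a few schema types and is not a full JSON Schema validator.

```python
import json

# Hypothetical tool registry. Each entry pairs a JSON-Schema-style parameter
# description (what the model would be shown) with the function that runs it.
TOOLS = {
    "get_weather": {
        "description": "Current temperature for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "fn": lambda city: {"city": city, "temp_c": 18},  # canned result for the sketch
    }
}

# Minimal mapping from JSON Schema type names to Python types (not a full validator).
JSON_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}

def invoke(raw_call: str) -> str:
    """Parse a model-emitted call, validate its arguments, then dispatch it."""
    call = json.loads(raw_call)
    tool = TOOLS.get(call.get("name"))
    if tool is None:
        return json.dumps({"error": f"unknown tool: {call.get('name')}"})
    schema, args = tool["parameters"], call.get("arguments", {})
    missing = [p for p in schema["required"] if p not in args]
    if missing:
        return json.dumps({"error": f"missing arguments: {missing}"})
    for name, value in args.items():
        prop = schema["properties"].get(name)
        if prop is None:
            return json.dumps({"error": f"unexpected argument: {name}"})
        if not isinstance(value, JSON_TYPES[prop["type"]]):
            return json.dumps({"error": f"{name} should be a {prop['type']}"})
    # Serialise the result so it can be fed back to the model as text.
    return json.dumps(tool["fn"](**args))
```

Validation errors come back as JSON rather than exceptions. The model can then read the error and correct its call on the next turn.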

Skills System

Reusable capability modules that bundle prompts, tools, and logic into composable units an agent can invoke by name.

Document & Image Processing

PDF parsing, OCR, image analysis, vision models, and structured data extraction from unstructured sources.

Audio Processing

Speech-to-text transcription, audio understanding, voice input pipelines, and real-time audio stream ingestion.

Memory & Context

Memory Systems

Short-term working memory (context window), long-term storage (vector databases, graph databases), and episodic recall of past interactions.
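This sketch pairs a short-term buffer with a long-term store queried by similarity. It assumes a toy bag-of-words `embed` function and an in-memory list in place of a real embedding model and vector database. Episodic and graph-based recall are not shown.

```python
import math
from collections import Counter, deque

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class Memory:
    def __init__(self, window: int = 4):
        self.short_term = deque(maxlen=window)            # recent turns kept in context
        self.long_term: list[tuple[Counter, str]] = []    # stand-in for a vector database

    def remember(self, text: str) -> None:
        self.short_term.append(text)                      # oldest entry falls off when full
        self.long_term.append((embed(text), text))

    def recall(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.long_term, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]
```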

Sessions

Persistent agent state across interactions, session lifecycle management, checkpointing, and resumability after interruption.
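A minimal sketch of checkpointing and resuming follows. It assumes session state is small enough to serialise as one JSON file per session, and the `SessionState` fields are hypothetical. Writing to a temporary file and renaming it means an interruption mid-write leaves the previous checkpoint intact.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field

@dataclass
class SessionState:
    session_id: str
    turn: int = 0
    messages: list = field(default_factory=list)

def checkpoint(state: SessionState, directory: str) -> str:
    """Write state to a temp file, then rename it over the previous checkpoint."""
    path = os.path.join(directory, f"{state.session_id}.json")
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(asdict(state), f)
    os.replace(tmp, path)  # atomic rename on POSIX filesystems
    return path

def resume(session_id: str, directory: str) -> SessionState:
    """Reload the last checkpoint, or start a fresh session if none exists."""
    path = os.path.join(directory, f"{session_id}.json")
    if not os.path.exists(path):
        return SessionState(session_id)
    with open(path) as f:
        return SessionState(**json.load(f))
```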

Conversation Management

Strategies for staying within token limits: sliding window (drop the oldest messages), summarisation (compress earlier turns via a secondary model), and hybrid approaches.
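The sketch below shows the sliding-window and hybrid strategies. It assumes a crude whitespace token count and a hypothetical `summarise` callable standing in for the secondary model. The hybrid function gives half the budget to verbatim recent turns, which is an arbitrary choice made for illustration.

```python
from typing import Callable

def count_tokens(text: str) -> int:
    return len(text.split())  # crude proxy; real systems use the model's tokenizer

def sliding_window(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages that fit in the budget, dropping the oldest first."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

def hybrid(messages: list[str], budget: int, summarise: Callable[[list[str]], str]) -> list[str]:
    """Keep recent turns verbatim and replace the dropped prefix with one summary message."""
    recent = sliding_window(messages, budget // 2)   # half the budget for verbatim turns
    dropped = messages[: len(messages) - len(recent)]
    if not dropped:
        return recent
    summary = summarise(dropped)                     # e.g. a call to a cheaper model
    return [f"Summary of earlier turns: {summary}"] + recent
```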

Hooks & Intervention

Tool Hooks

Blocking functions that execute before a tool runs, capturing reasoning context and validating intent against policies before allowing execution.
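A minimal sketch of a blocking pre-tool hook follows. It assumes each proposed call carries the model's stated reasoning, and that hooks return an allow/deny verdict before anything runs. The `no_prod_deletes` policy and the `delete_rows` tool are hypothetical examples.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCall:
    name: str
    args: dict
    reasoning: str  # the model's stated intent, captured for hooks to inspect

@dataclass
class Verdict:
    allow: bool
    reason: str = ""

BeforeToolHook = Callable[[ToolCall], Verdict]

def no_prod_deletes(call: ToolCall) -> Verdict:
    """Hypothetical policy: block destructive calls against production."""
    if call.name == "delete_rows" and call.args.get("env") == "prod":
        return Verdict(False, "destructive action on prod requires approval")
    return Verdict(True)

def run_tool(call: ToolCall, tools: dict[str, Callable[..., object]], hooks: list[BeforeToolHook]) -> dict:
    for hook in hooks:  # hooks run synchronously; any denial stops the call
        verdict = hook(call)
        if not verdict.allow:
            return {"blocked": True, "reason": verdict.reason}
    return {"blocked": False, "result": tools[call.name](**call.args)}
```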

Streaming Hooks

Block-level stream monitors that intercept reasoning and response blocks, allowing a redirect to be injected before problematic content is completed.
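This sketch shows a block-level monitor as a generator. It assumes the stream delivers whole `(type, text)` blocks and uses a hypothetical keyword-based `check_reasoning` policy. Once a block is flagged, the monitor stops passing the original stream through and emits a redirect. The redirect could then be injected as guidance for the model's next attempt.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

Block = Tuple[str, str]  # (block type, text), e.g. ("reasoning", "...")

def monitor(stream: Iterable[Block], check: Callable[[Block], Optional[str]]) -> Iterator[Block]:
    """Pass blocks through until `check` returns a redirect, then stop and emit it."""
    for block in stream:
        redirect = check(block)
        if redirect is not None:
            yield ("redirect", redirect)  # the rest of the original stream is never consumed
            return
        yield block

def check_reasoning(block: Block) -> Optional[str]:
    """Hypothetical policy: redirect if the reasoning plans to bypass authentication."""
    kind, text = block
    if kind == "reasoning" and "bypass the auth" in text.lower():
        return "Do not work around authentication; explain the access request process."
    return None
```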

Lifecycle Hooks

Callbacks at agent lifecycle points (initialisation, turn start, turn end, error) for logging, metrics, and custom orchestration logic.
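A minimal sketch of a lifecycle hook registry follows. It assumes the event names listed above and uses a hypothetical `run_turn` wrapper to show where each event would fire. Callbacks receive keyword payloads, so a logging or metrics hook only needs to accept the fields it uses.

```python
from collections import defaultdict
from typing import Callable, Optional

EVENTS = ("init", "turn_start", "turn_end", "error")

class HookRegistry:
    def __init__(self):
        self._hooks: dict[str, list[Callable[..., None]]] = defaultdict(list)

    def on(self, event: str, fn: Callable[..., None]) -> None:
        if event not in EVENTS:
            raise ValueError(f"unknown event {event!r}")
        self._hooks[event].append(fn)

    def emit(self, event: str, **payload) -> None:
        for fn in self._hooks[event]:  # callbacks run in registration order
            fn(**payload)

def run_turn(hooks: HookRegistry, turn: int, handler: Callable[[], str]) -> Optional[str]:
    """Hypothetical turn wrapper that fires lifecycle events around the agent's work."""
    hooks.emit("turn_start", turn=turn)
    try:
        result = handler()
    except Exception as exc:
        hooks.emit("error", turn=turn, error=exc)
        return None
    hooks.emit("turn_end", turn=turn, result=result)
    return result
```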

Safety & Trust

Guardrails

Input and output stream monitors that enforce content policies in real time, with interrupt capability when violations are detected.

Stream Safeguards

Agents evaluating agents: a monitoring agent intercepts reasoning blocks and redirects the model's approach before harmful or off-policy responses are composed.

Zero Trust

Bidirectional verification: agents validate instructions from humans, and humans verify agent outputs. This includes least-privilege tool access and human-in-the-loop checkpoints.
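The sketch below combines least-privilege access with a human checkpoint. It assumes hypothetical per-agent grants and a set of sensitive tools that always require approval. The `approve` callable stands in for whatever review interface a human would use. Access is denied by default, and approval is never assumed.

```python
from typing import Callable

# Hypothetical grants: each agent may call only the tools it needs.
GRANTS = {
    "researcher": {"search", "read_file"},
    "ops": {"read_file", "restart_service"},
}
NEEDS_APPROVAL = {"restart_service"}  # human-in-the-loop checkpoint for sensitive actions

def authorise(agent: str, tool: str, approve: Callable[[str, str], bool]) -> bool:
    """Deny by default; allow granted tools, and ask a human for sensitive ones."""
    if tool not in GRANTS.get(agent, set()):
        return False
    if tool in NEEDS_APPROVAL:
        return approve(agent, tool)
    return True
```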

Evaluation & Quality

Output Evaluators

LLM-as-judge scoring of final responses against rubrics, assessing hallucination, relevance, and policy compliance.

Trajectory Evaluators

Analysis of the tool call sequence an agent took, checking path efficiency, unnecessary steps, and goal alignment.
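A minimal sketch of trajectory scoring follows. It assumes a reference path of tool names is available for the task, and the three metrics are a hypothetical rubric. Goal alignment checks that the reference calls occur in order. Unnecessary steps are calls beyond what the reference needs, and efficiency is the reference length divided by the actual length.

```python
from collections import Counter

def evaluate_trajectory(actual: list[str], expected: list[str]) -> dict:
    """Score a tool-call sequence against a reference path (hypothetical rubric)."""
    # Goal alignment: expected calls appear in order (other calls may be interleaved).
    it = iter(actual)
    in_order = all(step in it for step in expected)  # `in` consumes the iterator
    # Unnecessary steps: per-tool calls in excess of the reference path.
    surplus = Counter(actual) - Counter(expected)
    return {
        "in_order": in_order,
        "unnecessary": sorted(surplus.elements()),
        "efficiency": round(len(expected) / len(actual), 2) if actual else 0.0,
    }
```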

Helpfulness Evaluators

Task completion metrics, user satisfaction proxies, and whether the agent actually resolved the stated need.

Custom Evaluators

Domain-specific evaluation logic using a Case (input/expected) and Experiment (model + evaluator) structure for systematic testing.
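This sketch mirrors the Case/Experiment structure with two hypothetical dataclasses. It is not any evaluation SDK's actual API. The `model` is any callable under test, and `contains_expected` is an illustrative domain check that returns a score between 0 and 1.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    input: str
    expected: str

@dataclass
class Experiment:
    model: Callable[[str], str]              # system under test
    evaluator: Callable[[str, str], float]   # (output, expected) -> score in [0, 1]

    def run(self, cases: list[Case]) -> float:
        """Run every case through the model and return the mean evaluator score."""
        scores = [self.evaluator(self.model(c.input), c.expected) for c in cases]
        return sum(scores) / len(scores) if scores else 0.0

def contains_expected(output: str, expected: str) -> float:
    """Hypothetical domain check: full credit if the expected answer appears in the output."""
    return 1.0 if expected.lower() in output.lower() else 0.0
```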

Judge & Escalation

Inline quality gate that evaluates worker output against success criteria before the pipeline continues. Tiered assessment: Level 0 checks that expected output keys are present. Level 2 uses a fast LLM to score response quality against the goal's success criteria in a conversation-aware context.
Three verdicts drive the next step:
ACCEPT: output meets criteria; continue to the next node.
RETRY: output is insufficient; return to the worker with feedback.
ESCALATE: the worker cannot satisfy the goal within its retry budget; hand control to an orchestration layer for oversight, replanning, or human intervention.
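A minimal sketch of the gate and its verdicts follows. It assumes a hypothetical `score` callable in place of the fast LLM judge and an arbitrary 0.7 acceptance threshold. Only the Level 0 and Level 2 tiers described above are modelled. ESCALATE is returned when the retry budget is exhausted rather than by the judge itself.

```python
from enum import Enum
from typing import Callable

class Verdict(Enum):
    ACCEPT = "accept"
    RETRY = "retry"
    ESCALATE = "escalate"

def judge(output: dict, required_keys: list[str], score: Callable[[dict], float],
          threshold: float = 0.7) -> Verdict:
    # Level 0: cheap structural check that the expected output keys are present.
    if any(k not in output for k in required_keys):
        return Verdict.RETRY
    # Level 2: `score` stands in for a fast LLM grading against the success criteria.
    return Verdict.ACCEPT if score(output) >= threshold else Verdict.RETRY

def run_node(worker: Callable[[str], dict], required_keys: list[str],
             score: Callable[[dict], float], max_retries: int = 2) -> tuple:
    feedback = ""
    for _ in range(max_retries + 1):
        output = worker(feedback)
        if judge(output, required_keys, score) is Verdict.ACCEPT:
            return Verdict.ACCEPT, output                      # continue to the next node
        feedback = f"insufficient output; required keys: {required_keys}"  # back to worker
    return Verdict.ESCALATE, output                            # hand off for oversight
```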

Multimodal & Accessibility

Bidirectional Streaming

Real-time two-way communication supporting simultaneous audio and text. Voice activity detection, turn-based and push-to-talk modes, BidiAgent pattern.
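Voice activity detection can be sketched as a simple energy-based detector. It assumes mono audio as floats in [-1, 1] and illustrative frame size, threshold, and hangover values. Production systems typically use trained VAD models instead. The hangover keeps a frame marked as speech for a few frames after energy drops, so short pauses do not end the user's turn.

```python
import math

def frame_rms(samples: list[float]) -> float:
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0

def detect_speech(samples: list[float], frame_size: int = 160, threshold: float = 0.02,
                  hangover: int = 3) -> list[bool]:
    """Label each frame as speech or silence from its RMS energy."""
    flags = []
    quiet_run = hangover + 1  # start in the silent state
    for start in range(0, len(samples), frame_size):
        loud = frame_rms(samples[start:start + frame_size]) >= threshold
        quiet_run = 0 if loud else quiet_run + 1
        flags.append(quiet_run <= hangover)  # speech if loud or within the hangover
    return flags
```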

Audio-First Models

Speech-native models (Nova Sonic, OpenAI Realtime, Gemini Live) that process audio directly rather than transcribing to text first, reducing latency and preserving tone.

Accessibility

Alternative interaction modes, screen reader support, audio-primary interfaces for users who cannot use text, and adaptive response formats.