Agentic Anatomy
Reasoning & Planning
Reasoning Engine
Core LLM that processes context windows, generates structured output, and chains thought sequences to reach conclusions.
Planning Strategies
Task decomposition approaches such as ReAct (interleaved reasoning and acting), Plan-and-Execute, reflection loops, and Tree of Thoughts.
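As an illustration of the ReAct pattern named above, here is a minimal sketch of the loop: the model emits a thought and an action, the runtime executes the action and feeds back an observation, and this repeats until a final answer appears. The scripted_model function stands in for a real LLM call, and the add tool and the "Action: name[arg]" text format are assumptions made for this example only.

```python
from typing import Callable

# Hypothetical tool table: names and behaviour are illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}

def scripted_model(history: list[str]) -> str:
    """Stand-in for an LLM call: returns the next Thought/Action or a final answer."""
    observations = [h for h in history if h.startswith("Observation:")]
    if not observations:
        return "Thought: I need to compute the sum.\nAction: add[2+3]"
    return f"Final Answer: {observations[-1].removeprefix('Observation: ')}"

def react(question: str, model: Callable[[list[str]], str], max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = model(history)            # reason: the model proposes the next step
        history.append(step)
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        # Act: parse "Action: name[arg]" from the last line and run the tool.
        action = step.splitlines()[-1].removeprefix("Action: ")
        name, arg = action.rstrip("]").split("[", 1)
        history.append(f"Observation: {TOOLS[name](arg)}")
    return "No answer within step budget"

answer = react("What is 2 + 3?", scripted_model)
```

The step budget is the simplest guard against a loop that never converges; real planners typically add richer stop conditions.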
Orchestration
Coordination layer managing agent state, routing between components, handling errors, and enforcing policies.
Tools & Skills
Tool Calling
Function calling via schema-driven tool selection and result parsing. Open protocols such as MCP (Model Context Protocol; client-server tool integration) and UTCP (Universal Tool Calling Protocol; direct agent-to-API calling with pluggable transports) standardise how agents discover and invoke tools.
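A rough sketch of schema-driven tool calling, independent of any particular protocol: each tool is registered with a JSON-Schema-style parameter description, and a dispatcher validates the model's tool call before running it and returning a parseable result. The get_weather tool, the REGISTRY layout, and the call format are hypothetical; this is not the MCP or UTCP wire format.

```python
import json

def get_weather(city: str) -> str:
    # Hypothetical tool: returns canned data rather than calling a real API.
    return f"Sunny in {city}"

# Each entry pairs a JSON-Schema-style description with the callable it describes.
REGISTRY = {
    "get_weather": {
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "fn": get_weather,
    }
}

def dispatch(tool_call_json: str) -> str:
    """Check a model-emitted tool call against the registry, then run it."""
    call = json.loads(tool_call_json)
    tool = REGISTRY.get(call.get("name"))
    if tool is None:
        return json.dumps({"error": f"unknown tool: {call.get('name')}"})
    args = call.get("arguments", {})
    # Only required-field presence is checked here; a fuller version would
    # validate types and reject unexpected arguments as well.
    missing = [p for p in tool["parameters"]["required"] if p not in args]
    if missing:
        return json.dumps({"error": f"missing arguments: {missing}"})
    return json.dumps({"result": tool["fn"](**args)})

reply = dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}')
```

Returning errors as structured JSON rather than raising lets the model see the failure and retry with corrected arguments.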
Skills System
Reusable capability modules that bundle prompts, tools, and logic into composable units an agent can invoke by name.
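One way to picture a skill is as a small record that bundles a prompt, a list of tool names, and some logic, stored in a registry keyed by name. The Skill and SkillRegistry classes below are hypothetical names chosen for this sketch, not the API of any specific framework.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Skill:
    name: str
    prompt: str                       # instructions bundled with the skill
    run: Callable[[str], str]         # the skill's own logic
    tools: list[str] = field(default_factory=list)  # tool names the skill may use

class SkillRegistry:
    def __init__(self) -> None:
        self._skills: dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def invoke(self, name: str, text: str) -> str:
        # Skills are looked up by name, so the agent only needs the name to call one.
        if name not in self._skills:
            raise KeyError(f"no skill named {name!r}")
        return self._skills[name].run(text)

registry = SkillRegistry()
registry.register(Skill("shout", "Upper-case the input.", str.upper))
result = registry.invoke("shout", "hello")
```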
Document & Image Processing
PDF parsing, OCR, image analysis, vision models, and structured data extraction from unstructured sources.
Audio Processing
Speech-to-text transcription, audio understanding, voice input pipelines, and real-time audio stream ingestion.
Memory & Context
Memory Systems
Short-term working memory (context window), long-term storage (vector databases, graph databases), and episodic recall of past interactions.
Sessions
Persistent agent state across interactions, session lifecycle management, checkpointing, and resumability after interruption.
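Checkpointing and resumability can be sketched with nothing more than a JSON file per session. The Session class, its state layout, and the file naming below are assumptions for illustration; a production system would more likely use a database or object store.

```python
import json
import tempfile
from pathlib import Path

class Session:
    def __init__(self, session_id: str, store_dir: Path) -> None:
        self.session_id = session_id
        self.path = store_dir / f"{session_id}.json"
        self.state = {"turn": 0, "messages": []}

    def add_turn(self, message: str) -> None:
        self.state["messages"].append(message)
        self.state["turn"] += 1
        self.checkpoint()             # persist after every turn

    def checkpoint(self) -> None:
        # Write to a temp file, then rename, so a crash mid-write
        # does not leave a half-written checkpoint behind.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.state))
        tmp.replace(self.path)

    @classmethod
    def resume(cls, session_id: str, store_dir: Path) -> "Session":
        session = cls(session_id, store_dir)
        if session.path.exists():     # no checkpoint means a fresh session
            session.state = json.loads(session.path.read_text())
        return session

with tempfile.TemporaryDirectory() as tmp_dir:
    store = Path(tmp_dir)
    first = Session("demo", store)
    first.add_turn("hello")
    first.add_turn("still there?")
    resumed_state = Session.resume("demo", store).state  # simulates a restart
```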
Conversation Management
Strategies for staying within token limits: sliding window (drop the oldest messages), summarisation (compress older messages via a secondary model), and hybrid approaches.
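The sliding-window and hybrid strategies can be sketched in a few lines. Token counting is approximated here by a whitespace word count, and the summarise argument is a placeholder for a call to a secondary model; both are assumptions made to keep the example self-contained.

```python
from typing import Callable

def count_tokens(text: str) -> int:
    # Crude stand-in: a real system would use the model's own tokenizer.
    return len(text.split())

def sliding_window(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined size fits the budget."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):    # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))       # restore chronological order

def hybrid(messages: list[str], budget: int,
           summarise: Callable[[list[str]], str]) -> list[str]:
    """Keep recent messages verbatim and replace everything older with a summary."""
    recent = sliding_window(messages, budget)
    dropped = messages[: len(messages) - len(recent)]
    if not dropped:
        return recent
    # Note: the summary itself is not counted against the budget in this sketch.
    return [summarise(dropped)] + recent

msgs = ["a b c", "d e", "f"]
windowed = sliding_window(msgs, budget=3)
combined = hybrid(msgs, budget=3, summarise=lambda old: f"[summary of {len(old)} messages]")
```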
Hooks & Intervention
Tool Hooks
Blocking functions that execute before a tool runs, capturing reasoning context and validating intent against policies before allowing execution.
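A minimal sketch of a blocking pre-tool hook: every hook receives the pending call together with the model's stated reasoning, and any hook can veto execution by raising. The ToolCall shape, the ToolBlocked exception, and the example policy are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCall:
    name: str
    args: dict
    reasoning: str                    # the model's stated reason for the call

class ToolBlocked(Exception):
    """Raised by a hook to stop a tool from running."""

Hook = Callable[[ToolCall], None]

def deny_destructive_shell(call: ToolCall) -> None:
    # Hypothetical policy: refuse recursive deletes issued through a shell tool.
    if call.name == "shell" and "rm -rf" in call.args.get("command", ""):
        raise ToolBlocked(f"destructive command blocked (reasoning: {call.reasoning!r})")

# Hypothetical tool table; the shell tool only echoes what it would run.
TOOLS = {"shell": lambda command: f"ran {command}"}

def run_tool(call: ToolCall, tools: dict, hooks: list[Hook]) -> str:
    for hook in hooks:                # every hook must return before the tool runs
        hook(call)
    return tools[call.name](**call.args)

ok = run_tool(ToolCall("shell", {"command": "ls"}, "list files"), TOOLS, [deny_destructive_shell])
```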
Streaming Hooks
Block-level stream monitors that intercept reasoning and response blocks, enabling redirect injection before problematic content completes.
Lifecycle Hooks
Callbacks at agent lifecycle points (initialisation, turn start, turn end, error) for logging, metrics, and custom orchestration logic.
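Lifecycle callbacks amount to a small event registry. In the sketch below, the event names follow the four lifecycle points listed above, while the Agent class and its on/start/turn methods are hypothetical and not taken from any specific framework.

```python
from collections import defaultdict
from typing import Callable

class Agent:
    EVENTS = ("init", "turn_start", "turn_end", "error")

    def __init__(self, respond: Callable[[str], str]) -> None:
        self._respond = respond       # stand-in for the model call
        self._hooks: dict[str, list[Callable[[str, dict], None]]] = defaultdict(list)

    def on(self, event: str, callback: Callable[[str, dict], None]) -> None:
        if event not in self.EVENTS:
            raise ValueError(f"unknown lifecycle event: {event}")
        self._hooks[event].append(callback)

    def _emit(self, event: str, **payload) -> None:
        for callback in self._hooks[event]:
            callback(event, payload)

    def start(self) -> None:
        self._emit("init")

    def turn(self, user_input: str) -> str:
        self._emit("turn_start", input=user_input)
        try:
            output = self._respond(user_input)
        except Exception as exc:
            self._emit("error", error=exc)   # notify hooks, then re-raise
            raise
        self._emit("turn_end", output=output)
        return output

log: list[str] = []
agent = Agent(str.upper)
for name in Agent.EVENTS:
    agent.on(name, lambda event, payload: log.append(event))
agent.start()
reply = agent.turn("hi")
```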
Safety & Trust
Guardrails
Input and output stream monitors that enforce content policies in real time, with interrupt capability when violations are detected.
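To make the interrupt behaviour concrete, here is a sketch of an output-stream guardrail that inspects text as chunks arrive and stops the stream when a blocked term appears, including terms split across chunk boundaries. The blocked-term list and the GuardrailViolation exception are hypothetical; real content policies usually rely on classifiers rather than substring checks.

```python
from typing import Iterable, Iterator

BLOCKED_TERMS = ("password", "ssn")   # hypothetical policy
# Carry enough trailing text to catch a term split across two chunks.
CARRY = max(len(term) for term in BLOCKED_TERMS) - 1

class GuardrailViolation(Exception):
    """Raised to interrupt a stream that violates the content policy."""

def guarded_stream(chunks: Iterable[str]) -> Iterator[str]:
    tail = ""
    for chunk in chunks:
        candidate = (tail + chunk).lower()
        if any(term in candidate for term in BLOCKED_TERMS):
            # Interrupt before the offending chunk is released downstream.
            raise GuardrailViolation("blocked term detected in stream")
        yield chunk
        tail = candidate[-CARRY:]

clean = list(guarded_stream(["hello ", "world"]))
```

Text released before the violation was detected has already reached the consumer, so a stricter variant would hold back the trailing characters until the next chunk has been checked.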
Zero Trust
Bidirectional verification: agents validate instructions from humans, and humans verify agent outputs. Supported by least-privilege tool access and human-in-the-loop checkpoints.
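Least-privilege access and human-in-the-loop checkpoints can be combined in a single authorisation check, sketched below. The agent names, the allowlists, and the approve callback (standing in for a real human approval step) are all hypothetical.

```python
from typing import Callable

# Hypothetical per-agent allowlists: each agent gets only the tools it needs.
ALLOWED_TOOLS = {
    "researcher": {"search"},
    "operator": {"search", "deploy"},
}
# Hypothetical set of tools that always require a human sign-off.
NEEDS_APPROVAL = {"deploy"}

def authorise(agent: str, tool: str, approve: Callable[[str, str], bool]) -> bool:
    """Default-deny check: allowlist first, then human approval for sensitive tools."""
    if tool not in ALLOWED_TOOLS.get(agent, set()):
        return False                  # unknown agents and unlisted tools are refused
    if tool in NEEDS_APPROVAL:
        return approve(agent, tool)   # human-in-the-loop checkpoint
    return True

always_yes = lambda agent, tool: True
always_no = lambda agent, tool: False

researcher_deploy = authorise("researcher", "deploy", always_yes)
operator_deploy_denied = authorise("operator", "deploy", always_no)
operator_deploy_approved = authorise("operator", "deploy", always_yes)
```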
Evaluation & Quality
Output Evaluators
LLM-as-judge scoring of final responses against rubrics, checking for hallucination, relevance, and policy compliance.
Trajectory Evaluators
Analysis of the tool call sequence an agent took, checking path efficiency, unnecessary steps, and goal alignment.
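A trajectory check can be sketched by comparing the tool calls an agent actually made against a reference path. The three measures below (repeated calls, in-order coverage of the reference, and a length ratio) are illustrative choices for this example, not a standard metric set.

```python
def evaluate_trajectory(actual: list[str], reference: list[str]) -> dict:
    """Compare an agent's tool-call sequence with a reference sequence."""
    # Repeated tool names are a crude proxy for unnecessary steps; repeated
    # calls with different arguments may be legitimate in practice.
    repeated_calls = len(actual) - len(set(actual))
    # Goal alignment: the reference steps must appear in order within the actual path.
    remaining = iter(actual)
    covers_reference = all(step in remaining for step in reference)
    # Path efficiency: 1.0 means no more steps than the reference needed.
    efficiency = min(1.0, len(reference) / len(actual)) if actual else 0.0
    return {
        "repeated_calls": repeated_calls,
        "covers_reference": covers_reference,
        "efficiency": efficiency,
    }

report = evaluate_trajectory(
    actual=["search", "search", "fetch", "summarise"],
    reference=["search", "fetch", "summarise"],
)
```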
Helpfulness Evaluators
Task completion metrics, user satisfaction proxies, and whether the agent actually resolved the stated need.
Custom Evaluators
Domain-specific evaluation logic using a Case (input/expected) and Experiment (model + evaluator) structure for systematic testing.
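The Case and Experiment structure described above might look like the following sketch. The class names mirror the terms in the text, but the fields, the run method, and the exact_match evaluator are assumptions and do not reproduce any particular evaluation library.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    input: str
    expected: str

@dataclass
class Experiment:
    model: Callable[[str], str]              # system under test
    evaluator: Callable[[str, str], float]   # (output, expected) -> score in [0, 1]

    def run(self, cases: list[Case]) -> float:
        """Return the mean evaluator score across all cases."""
        scores = [self.evaluator(self.model(case.input), case.expected) for case in cases]
        return sum(scores) / len(scores) if scores else 0.0

def exact_match(output: str, expected: str) -> float:
    # Simplest possible domain evaluator; swap in rubric or judge logic as needed.
    return 1.0 if output.strip() == expected.strip() else 0.0

cases = [Case("a", "A"), Case("b", "x")]
score = Experiment(model=str.upper, evaluator=exact_match).run(cases)
```

Keeping cases separate from the experiment lets the same case set be rerun against different models or evaluators.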
Multimodal & Accessibility
Bidirectional Streaming
Real-time two-way communication supporting simultaneous audio and text, with voice activity detection, turn-based and push-to-talk modes, and the BidiAgent pattern.
Audio-First Models
Speech-native models (Nova Sonic, OpenAI Realtime, Gemini Live) that process audio directly rather than transcribing to text first, reducing latency and preserving tone.
Accessibility
Alternative interaction modes, screen reader support, audio-primary interfaces for users who cannot use text, and adaptive response formats.
