Agentic Zero Trust
Bidirectional Zero Trust
Traditional Zero Trust verifies the humans and services calling into a system; Agentic Bidirectional Zero Trust extends the same scepticism to the model itself, treating it as an untrusted actor inside the perimeter. The agent inherits user permissions, not elevated ones. It is a delegate, not a trusted service account. Every action must be verified, every output must be validated, and every permission must be scoped to exactly what is needed for the current operation.
Core Principles
Never Execute Because the Model Said So
Model output is a suggestion, not an instruction. No action should proceed solely because the model requested it. Every tool call, API request, or state change requires independent authorisation from the control plane, not from the model's reasoning.
Gate Per-Action, Not Per-Session
A session-level approval does not grant blanket permission for all subsequent actions. Each individual action must be evaluated against the current policy context. Permissions are checked at invocation time, not at session start.
Treat Confidence as Untrusted
When a model reports high confidence, treat it as unverified. Confidence scores from language models are not calibrated probabilities. They reflect token likelihood, not factual certainty. Build verification into the pipeline rather than relying on the model's self-assessment.
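A minimal sketch of this principle: the model's self-reported confidence never enters the accept/reject decision; only an independent check does. The names here (`ModelAnswer`, `accept`, `verify`) are illustrative, and `verify` stands in for whatever external ground truth your pipeline has (a lookup, a test suite, a schema validator).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelAnswer:
    content: str
    reported_confidence: float  # token likelihood, not a calibrated probability

def accept(answer: ModelAnswer, verify: Callable[[str], bool]) -> bool:
    # reported_confidence is deliberately unused: verification is external.
    return verify(answer.content)

# A 0.99 self-report is still rejected when the independent check fails.
accept(ModelAnswer("5", 0.99), verify=lambda c: c == "4")   # False
```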
Monitor Patterns, Not Just Calls
Individual tool calls may each appear benign while the sequence reveals adversarial intent. Monitor for behavioural patterns: unusual call ordering, data exfiltration sequences, privilege escalation chains, and systematic probing of boundaries across multiple turns.
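Sequence-level monitoring can be sketched as a sliding window over recent tool calls, checked against known adversarial shapes. The tool names and suspicious sequences below are hypothetical examples, not a canonical list.

```python
from collections import deque

# Each call below is benign alone; only the ordered pair is suspicious.
SUSPICIOUS_SEQUENCES = [
    ("read_secrets", "network_send"),   # exfiltration shape
    ("list_users", "grant_role"),       # privilege-escalation shape
]

class PatternMonitor:
    def __init__(self, window: int = 10):
        self.recent = deque(maxlen=window)  # last N calls across turns

    def observe(self, tool_name: str) -> list:
        """Record a call; return every suspicious subsequence now present."""
        self.recent.append(tool_name)
        return [seq for seq in SUSPICIOUS_SEQUENCES
                if self._contains_in_order(seq)]

    def _contains_in_order(self, seq) -> bool:
        # Subsequence check: each step must appear after the previous one.
        it = iter(self.recent)
        return all(step in it for step in seq)

monitor = PatternMonitor()
monitor.observe("read_secrets")   # returns []: benign in isolation
monitor.observe("network_send")   # returns the exfiltration pair
```

Matching on ordered subsequences rather than exact runs means interleaved innocuous calls do not hide the pattern.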
Assume Compromise at Any Time
Treat every model invocation as potentially compromised by prompt injection, training data poisoning, or adversarial inputs. The model may be acting in good faith or it may be manipulated. Your architecture should produce correct outcomes in both cases.
Design for Graceful Failure
When the trust boundary is breached, the system should degrade safely rather than catastrophically. Denied actions should produce clear feedback. Blocked sequences should not corrupt state. The user should always be able to understand what happened and why.
Implementation Patterns
Action Gating
Every tool invocation passes through a policy gate before execution. The gate evaluates the requested action against the current user's permissions, the agent's delegated scope, rate limits, and contextual constraints. Actions outside the allowed set are rejected before they reach the tool layer.
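The gate described above can be sketched as follows, with permissions checked at invocation time against both the user's rights and the agent's delegated scope. All names (`PolicyGate`, `Action`) are illustrative, not from any particular framework.

```python
from dataclasses import dataclass

@dataclass
class Action:
    tool: str
    user: str

@dataclass
class PolicyGate:
    user_permissions: dict   # user -> set of tools the user may call
    delegated_scope: set     # tools the agent was delegated for this task
    budget: int = 20         # simple rate limit: max actions per session
    used: int = 0

    def authorize(self, action: Action) -> bool:
        """Evaluated per invocation, before the call reaches the tool layer."""
        if self.used >= self.budget:
            return False                          # budget exhausted
        if action.tool not in self.user_permissions.get(action.user, set()):
            return False                          # user lacks the permission
        if action.tool not in self.delegated_scope:
            return False                          # outside delegated scope
        self.used += 1
        return True

gate = PolicyGate(user_permissions={"alice": {"search", "read_doc"}},
                  delegated_scope={"search"})
gate.authorize(Action("search", "alice"))     # True: in both sets
gate.authorize(Action("read_doc", "alice"))   # False: not delegated this task
gate.authorize(Action("delete_db", "alice"))  # False: user lacks permission
```

The agent's effective rights are the intersection of the user's permissions and the delegated scope, so the agent can never exceed either.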
Circuit Breakers
Circuit breakers are automatic trip-wires that halt agent execution when anomalous patterns are detected. Triggers include exceeding action budgets, repeated failures, rapid privilege escalation attempts, or accessing resources outside the expected scope. Once tripped, execution pauses until a human reviews the state.
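A minimal circuit breaker over two of the triggers above (failure count and action budget); the thresholds are placeholders. Note that `reset` is deliberately a separate call, standing in for the human review step.

```python
class CircuitBreaker:
    """Trips open on too many failures or actions; stays open until reset."""

    def __init__(self, max_failures: int = 3, max_actions: int = 50):
        self.max_failures = max_failures
        self.max_actions = max_actions
        self.failures = 0
        self.actions = 0
        self.tripped = False

    def record(self, success: bool) -> None:
        self.actions += 1
        if not success:
            self.failures += 1
        if self.failures >= self.max_failures or self.actions >= self.max_actions:
            self.tripped = True

    def allow(self) -> bool:
        # Checked before every agent step; False means execution is paused.
        return not self.tripped

    def reset(self) -> None:
        # Human-in-the-loop: only called after someone reviews the state.
        self.failures = self.actions = 0
        self.tripped = False
```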
Guardian Agents
A separate model instance monitors the primary agent's behaviour in real time. The guardian has read-only access to the action stream and can flag or block actions that violate policy. Crucially, the guardian operates on a different prompt and context, making coordinated compromise significantly harder.
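The guardian's position in the pipeline can be sketched as below. In a real deployment `review` would be a second model call with its own prompt and context; here it is stubbed as a rule check so the example is self-contained, and all names are illustrative.

```python
class Guardian:
    def __init__(self, blocked_resources: set):
        # Policy held only by the guardian: the primary agent never sees it,
        # so a prompt injection against the primary cannot rewrite it.
        self.blocked = blocked_resources

    def review(self, action: dict) -> str:
        # Read-only: the guardian inspects the action, never mutates it.
        return "block" if action.get("resource") in self.blocked else "allow"

def execute_with_guardian(action: dict, guardian: Guardian, execute):
    """Every action passes the guardian before the executor sees it."""
    if guardian.review(action) == "block":
        raise PermissionError(f"guardian blocked {action['tool']}")
    return execute(action)
```

Because the guardian and the primary agent share no prompt or context, an attacker must compromise both independently to get a blocked action through.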
Do-Nothing as First-Class
Inaction is always a valid response. The system explicitly supports "take no action" as an output rather than forcing the agent to produce a tool call for every turn. This prevents the model from fabricating actions when the correct response is to wait, ask, or decline.
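One way to make inaction first-class is to model it as an explicit member of the agent's output type, so it flows through the same dispatcher as every other outcome. The `Turn` enum and `dispatch` function are hypothetical names for this sketch.

```python
from enum import Enum

class Turn(Enum):
    TOOL_CALL = "tool_call"
    ASK_USER = "ask_user"
    DECLINE = "decline"
    NO_ACTION = "no_action"   # first-class: waiting is a legal outcome

def dispatch(turn: Turn, payload=None) -> str:
    # NO_ACTION takes the same logged, audited path as every other outcome,
    # rather than being treated as an error or forcing a fabricated call.
    if turn is Turn.NO_ACTION:
        return "no action taken"
    if turn is Turn.ASK_USER:
        return f"question for user: {payload}"
    if turn is Turn.DECLINE:
        return "declined"
    return f"executing tool call: {payload}"
```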
Policy Enforcement Layer
A declarative policy layer sits between the agent and all external systems. Policies define what actions are allowed, what data can be accessed, and what conditions must be met. Policies are versioned, auditable, and independent of the model. They cannot be modified by the agent itself.
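A declarative policy can be plain data evaluated outside the model, which is what makes it versionable, auditable, and unwritable by the agent. The rule schema and action names below are an illustrative sketch, not a standard format.

```python
import fnmatch

# Plain data: diffable, versioned, and the agent has no write path to it.
POLICY = {
    "version": "2024-06-01",
    "rules": [
        {"action": "search",     "allow": True},
        {"action": "send_email", "allow": True, "require": "human_approval"},
        {"action": "delete_*",   "allow": False},
    ],
}

def evaluate(policy: dict, action: str, approvals: set) -> bool:
    for rule in policy["rules"]:
        if fnmatch.fnmatch(action, rule["action"]):
            if not rule["allow"]:
                return False              # explicitly denied
            need = rule.get("require")
            return need is None or need in approvals
    return False                          # default-deny: unlisted is rejected

evaluate(POLICY, "search", set())                  # True
evaluate(POLICY, "send_email", set())              # False: approval missing
evaluate(POLICY, "delete_db", {"human_approval"})  # False: explicitly denied
```

First match wins and anything unmatched is denied, so adding a new tool grants nothing until a rule explicitly allows it.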
