Agentic Zero Trust

Bidirectional Zero Trust

Traditional perimeter security assumes threats come from outside the network. Zero Trust drops that assumption and verifies every request regardless of where it originates. Agentic Bidirectional Zero Trust extends this to the model itself, treating it as an untrusted actor inside the perimeter. The agent inherits the user's permissions, never elevated ones: it is a delegate, not a trusted service account. Every action must be verified, every output must be validated, and every permission must be scoped to exactly what the current operation needs.

Core Principles

Never Execute Because the Model Said So

Model output is a suggestion, not an instruction. No action should proceed solely because the model requested it. Every tool call, API request, or state change requires independent authorisation from the control plane, not from the model's reasoning.

Gate Per-Action, Not Per-Session

A session-level approval does not grant blanket permission for all subsequent actions. Each individual action must be evaluated against the current policy context. Permissions are checked at invocation time, not at session start.
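
A minimal sketch of invocation-time checking, assuming a hypothetical in-memory `PermissionStore` (the class, the `invoke` helper, and the action names are illustrative, not part of any real framework). Because the check runs on every call, a grant revoked mid-session blocks the very next action.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionStore:
    """Hypothetical in-memory store of the user's current grants."""
    grants: set[str] = field(default_factory=set)

    def allows(self, action: str) -> bool:
        return action in self.grants


class PermissionDenied(Exception):
    """Raised when an action is not permitted at invocation time."""


def invoke(store: PermissionStore, action: str, tool):
    # The check runs on every call, not once at session start.
    if not store.allows(action):
        raise PermissionDenied(action)
    return tool()


store = PermissionStore(grants={"read_file"})
first = invoke(store, "read_file", lambda: "contents")

store.grants.discard("read_file")  # permission revoked mid-session
try:
    invoke(store, "read_file", lambda: "contents")
    second = "allowed"
except PermissionDenied:
    second = "denied"
```

Here the first call succeeds and the second is denied, even though both happen in the same session.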

Treat Confidence as Untrusted

When a model reports high confidence, treat the claim as unverified. Confidence expressed by a language model is not a reliably calibrated probability: it reflects token likelihood, not factual certainty. Build verification into the pipeline rather than relying on the model's self-assessment.
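
One way to picture this is a pipeline whose acceptance decision never reads the confidence value. The sketch below assumes a hypothetical invoice-extraction task; `ModelClaim` and `verify_total` are illustrative names, and the independent check is a simple recomputation from the line items.

```python
from dataclasses import dataclass


@dataclass
class ModelClaim:
    """Hypothetical shape of a model answer with self-reported confidence."""
    invoice_total: float
    confidence: float  # self-reported; never used for acceptance


def verify_total(claim: ModelClaim, line_items: list[float]) -> bool:
    # Acceptance depends only on an independent recomputation,
    # not on claim.confidence.
    return abs(sum(line_items) - claim.invoice_total) < 0.005


items = [19.99, 5.01, 10.00]
confident_but_wrong = ModelClaim(invoice_total=40.00, confidence=0.99)
hesitant_but_right = ModelClaim(invoice_total=35.00, confidence=0.40)
```

With these inputs the high-confidence claim is rejected and the low-confidence claim is accepted, because only the recomputed sum matters.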

Monitor Patterns, Not Just Calls

Individual tool calls may each appear benign while the sequence reveals adversarial intent. Monitor for behavioural patterns: unusual call ordering, data exfiltration sequences, privilege escalation chains, and systematic probing of boundaries across multiple turns.
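
As a rough illustration of sequence-level monitoring, the sketch below flags one specific pattern: a sensitive read followed by an outbound send within a short window of calls. The tool names and the `SequenceMonitor` class are hypothetical, and a real monitor would track many more patterns than this one.

```python
from collections import deque

# Hypothetical tool names, grouped by role for illustration.
SENSITIVE_READS = {"read_secrets", "export_customers"}
OUTBOUND_SENDS = {"http_post", "send_email"}


class SequenceMonitor:
    """Flags a sensitive read followed by an outbound send within a window."""

    def __init__(self, window: int = 5):
        self.recent = deque(maxlen=window)

    def observe(self, tool_name: str) -> bool:
        # Return True when this call completes a suspicious sequence.
        suspicious = tool_name in OUTBOUND_SENDS and any(
            earlier in SENSITIVE_READS for earlier in self.recent
        )
        self.recent.append(tool_name)
        return suspicious


monitor = SequenceMonitor(window=3)
calls = ["read_secrets", "summarise", "http_post", "http_post"]
flags = [monitor.observe(name) for name in calls]
```

Each call on its own would pass a per-call check. Only the ordering triggers the flag, and an `http_post` with no preceding sensitive read is not flagged.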

Assume Compromise at Any Time

Treat every model invocation as potentially compromised by prompt injection, training data poisoning, or adversarial inputs. The model may be acting in good faith or it may be manipulated. Your architecture should produce correct outcomes in both cases.

Design for Graceful Failure

When the trust boundary is breached, the system should degrade safely rather than catastrophically. Denied actions should produce clear feedback. Blocked sequences should not corrupt state. The user should always be able to understand what happened and why.
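
A small sketch of what safe degradation might look like for a batch of writes, assuming a plain dictionary as state (the `Outcome` type and `apply_batch` function are illustrative). Updates are staged on a copy and committed only if every step is allowed, and a denial carries a readable reason.

```python
import copy
from dataclasses import dataclass


@dataclass
class Outcome:
    ok: bool
    reason: str  # human-readable explanation shown to the user


def apply_batch(state: dict, updates: list[tuple[str, int]],
                writable: set[str]) -> Outcome:
    # Work on a copy so a blocked step cannot leave partial writes behind.
    draft = copy.deepcopy(state)
    for key, value in updates:
        if key not in writable:
            return Outcome(False, f"blocked: writing '{key}' is outside the allowed scope")
        draft[key] = value
    # Commit only after every step has passed.
    state.clear()
    state.update(draft)
    return Outcome(True, "all updates applied")


state = {"quota": 10, "admin": 0}
result = apply_batch(state, [("quota", 20), ("admin", 1)], writable={"quota"})
```

The second update is blocked, so the first is never committed: `state` is left exactly as it was, and `result.reason` says which write was refused.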

Implementation Patterns

Action Gating

Every tool invocation passes through a policy gate before execution. The gate evaluates the requested action against the current user's permissions, the agent's delegated scope, rate limits, and contextual constraints. Actions outside the allowed set are rejected before they reach the tool layer.
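
The gate described above can be sketched as a function that runs before any tool is called. This is a minimal illustration under stated assumptions: `GateContext`, `gate`, and `run_tool` are hypothetical names, the rate limit is a simple call counter, and contextual constraints are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class GateContext:
    """Hypothetical gate inputs; all field names are illustrative."""
    user_permissions: set[str]
    delegated_scope: set[str]
    max_calls: int
    calls_made: int = 0


def gate(ctx: GateContext, action: str) -> tuple[bool, str]:
    # Checks run in order; the first failing check rejects the action.
    if action not in ctx.user_permissions:
        return False, "user lacks permission"
    if action not in ctx.delegated_scope:
        return False, "outside delegated scope"
    if ctx.calls_made >= ctx.max_calls:
        return False, "rate limit exceeded"
    ctx.calls_made += 1
    return True, "allowed"


def run_tool(ctx: GateContext, action: str, tool) -> dict:
    allowed, reason = gate(ctx, action)
    if not allowed:
        # Rejected before the tool layer: tool() is never called.
        return {"status": "rejected", "reason": reason}
    return {"status": "ok", "result": tool()}


ctx = GateContext(user_permissions={"read", "delete"},
                  delegated_scope={"read"}, max_calls=1)
results = [
    run_tool(ctx, "delete", lambda: "deleted"),
    run_tool(ctx, "read", lambda: "data"),
    run_tool(ctx, "read", lambda: "data"),
]
```

The user may delete, but the agent's delegated scope covers only reads, so the delete is rejected. The first read succeeds and the second hits the rate limit.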

Circuit Breakers

Circuit breakers are automatic tripwires that halt agent execution when anomalous patterns are detected. Triggers include exceeding an action budget, repeated failures, rapid privilege escalation attempts, and access to resources outside the expected scope. Once a breaker trips, execution pauses until a human reviews the state.
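
A minimal sketch of such a breaker, covering only two of the triggers (action budget and consecutive failures). The class and method names are hypothetical, and `reset_after_review` stands in for whatever human review step a real system would require.

```python
class CircuitOpen(Exception):
    """Raised when the breaker has tripped and a human must review."""


class CircuitBreaker:
    def __init__(self, action_budget: int, max_failures: int):
        self.action_budget = action_budget
        self.max_failures = max_failures
        self.actions = 0
        self.failures = 0  # consecutive failures
        self.tripped = False

    def record(self, succeeded: bool) -> None:
        # Once tripped, every further action is refused.
        if self.tripped:
            raise CircuitOpen("awaiting human review")
        self.actions += 1
        self.failures = 0 if succeeded else self.failures + 1
        if self.actions > self.action_budget or self.failures >= self.max_failures:
            self.tripped = True

    def reset_after_review(self) -> None:
        # Intended to be called by a human reviewer, never by the agent.
        self.actions = 0
        self.failures = 0
        self.tripped = False


breaker = CircuitBreaker(action_budget=10, max_failures=2)
breaker.record(succeeded=False)
breaker.record(succeeded=False)  # second consecutive failure trips it
```

After the second failure `breaker.tripped` is true, and any further `record` call raises `CircuitOpen` until `reset_after_review` is called.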

Guardian Agents

A separate model instance monitors the primary agent's behaviour in real time. The guardian sees the action stream read-only: it can flag or block actions that violate policy, but it does not alter them. Crucially, the guardian runs with a different prompt and context, so an input that compromises the primary agent is less likely to compromise the guardian as well.

Do-Nothing as a First-Class Action

Inaction is always a valid response. The system explicitly supports "take no action" as an output rather than forcing the agent to produce a tool call for every turn. This prevents the model from fabricating actions when the correct response is to wait, ask, or decline.
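
One way to make inaction explicit is to give it its own type in the agent's output schema. The sketch below assumes hypothetical `ToolCall` and `NoAction` types and a `dispatch` helper. The point is only that "do nothing" is a value the agent can return, with a reason attached.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class NoAction:
    reason: str  # e.g. "waiting for user confirmation"


# Every agent step is one of these two shapes.
AgentStep = Union[ToolCall, NoAction]


def dispatch(step: AgentStep, tools: dict) -> str:
    if isinstance(step, NoAction):
        # Nothing is executed; the reason is surfaced instead.
        return f"no action taken: {step.reason}"
    return tools[step.name](**step.args)


tools = {"echo": lambda text: text}
waited = dispatch(NoAction(reason="need clarification"), tools)
called = dispatch(ToolCall(name="echo", args={"text": "hi"}), tools)
```

Because `NoAction` is part of the schema, the agent is never forced to invent a tool call just to produce well-formed output.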

Policy Enforcement Layer

A declarative policy layer sits between the agent and all external systems. Policies define what actions are allowed, what data can be accessed, and what conditions must be met. Policies are versioned, auditable, and independent of the model. They cannot be modified by the agent itself.
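
A declarative policy can be as simple as immutable data plus a pure evaluation function. The sketch below is illustrative only: the `Policy` fields, the version string, and the path prefixes are assumptions, and the prefix match is deliberately naive (a real layer would normalise paths before comparing them).

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy record; frozen so it cannot be mutated in place."""
    version: str
    allowed_actions: frozenset
    readable_prefixes: tuple


POLICY = Policy(
    version="v1",
    allowed_actions=frozenset({"read"}),
    readable_prefixes=("/docs/public/",),
)


def evaluate(policy: Policy, action: str, resource: str) -> dict:
    allowed = (action in policy.allowed_actions
               and resource.startswith(policy.readable_prefixes))
    # Every decision records the policy version for later audit.
    return {"allowed": allowed, "policy_version": policy.version}


decision = evaluate(POLICY, "read", "/docs/public/a.txt")

try:
    POLICY.version = "v2"  # an in-place edit attempt fails
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Freezing the dataclass only prevents accidental in-process mutation. Keeping policies out of the agent's reach in practice also depends on where they are stored and who can deploy them.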