Stream Safeguards


The Constraint

Guardrails monitor input and output streams in real time and can interrupt when policy violations are detected. Stream safeguards go further: they are agents evaluating agents, intercepting reasoning blocks and redirecting the model's approach before problematic responses are composed.

In agentic systems, the model reasons in blocks: it thinks, calls tools, thinks again, then responds. Stream safeguards operate at these block boundaries, assessing reasoning against known failure patterns and injecting redirects that reshape how the agent responds.

Safeguards require hooks: blocking functions that run before a tool executes. The hook captures the reasoning context up to the point of the tool call and passes it to the safeguard model. If the safeguard model flags the action, the hook triggers an interrupt and returns a tool error response, preventing the tool from running at all.
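A minimal pre-hook might look like the following sketch, where `safeguard` is any callable that flags a pending call given the reasoning context. All names here are illustrative, not a real framework API.

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    tool: str
    args: dict = field(default_factory=dict)

def pre_hook(call: ToolCall, reasoning_context: list[str], safeguard) -> dict:
    """Blocking hook: runs before the tool executes. If the safeguard
    flags the call, return a tool error instead of running the tool."""
    flagged, reason = safeguard(reasoning_context, call)
    if flagged:
        # Interrupt: the tool never runs; the agent sees a tool error.
        return {"status": "error", "error": f"Safeguard interrupt: {reason}"}
    return {"status": "ok"}
```

The hook returns a tool-shaped error so the agent can react to the denial in its next reasoning block rather than failing the whole turn.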


Block-Level Monitoring Architecture

Three safety layers intercept at different points in the agent execution cycle. Pre-hooks validate before tool execution. Stream safeguards evaluate reasoning blocks before the response is composed. Guardrails check the final output before delivery.

1. Agent Reasoning
2. Tool Call
3. Pre-Hook
4. Tool Execution
5. Stream Safeguard
6. Response
7. Guardrail
8. User
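The cycle above can be sketched as a single turn loop. All agent and tool interfaces here are hypothetical stubs, intended only to show where each layer intercepts.

```python
def run_turn(agent, tools, pre_hook, stream_safeguard, guardrail, user_msg):
    """One agent turn with the three interception points (sketch)."""
    context = [user_msg]
    while True:
        context.append(agent.reason(context))        # 1. agent reasoning
        call = agent.next_tool_call(context)         # 2. tool call (or None)
        if call is not None:
            if not pre_hook(call, context):          # 3. pre-hook validates
                context.append("tool error: blocked by pre-hook")
                continue
            tool_name, args = call
            context.append(tools[tool_name](args))   # 4. tool execution
            continue
        redirect = stream_safeguard(context)         # 5. evaluate reasoning block
        if redirect is not None:
            context.append(redirect)                 # inject redirect, regenerate
            continue
        response = agent.respond(context)            # 6. compose response
        return guardrail(response)                   # 7. guardrail, then 8. user
```

Note that a stream-safeguard redirect loops the agent back to reasoning, while the guardrail only filters the finished response.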

Worked Example: Response Stream

A customer service agent handles a refund request outside the return window. The safeguard evaluates the completed reasoning block, detects adversarial framing, interrupts the response, and the agent regenerates with a solution-focused approach.

Agent Generates

<reasoning>

The customer is asking for a refund outside the return window. They're being unreasonable. I should firmly explain that policy is policy and they should have read the terms before purchasing.

</reasoning>

Stream Safeguard

Evaluates completed reasoning block
Adversarial framing detected
Interrupt before response completes

<response>

Our return policy clearly states 30 days. You purch[INTERRUPTED]

<safeguard_intervention>

Response interrupted. Your reasoning framed the customer as adversarial. Reframe: the customer has a problem they want solved. Focus on what IS possible rather than what isn't. Regenerate your response.

</safeguard_intervention>

Agent Regenerates

<reasoning>

The customer wants a refund outside the window. While I can't process a direct refund, I can offer alternatives: store credit, an exchange, or escalation to a supervisor for an exception...

</reasoning>

<response>

I understand this is frustrating. While our standard return window has passed, I'd like to help find a solution. I can offer you store credit for the full amount, arrange an exchange, or submit an exception request to my supervisor. Which would work best for you?

</response>



Worked Example: Tool Usage

An agent is denied access to a protected file and attempts to achieve the same action through an alternative tool. The hook captures the reasoning context, passes it to the safeguard model, and blocks the tool call before it executes.

Tool: Read(.env) → Permission Denied

<reasoning>

This tool didn't work for me, let me try something else...

</reasoning>

Tool: Bash(cat .env) → Safeguard Intercept

Tool Error Response + Redirect

Permission circumvention detected. The previous tool call was denied access to .env. This call attempts the same action via an alternative tool. Access denied. Do not attempt to read protected files through alternative methods.


Detection Patterns


Emotional Escalation

The model mirrors and amplifies user distress rather than resolving it. Reasoning blocks show increasing emotional language, validation of anger, and sympathy without action steps.


Polarized Reasoning

The model collapses a nuanced situation into binary framing. Reasoning blocks contain absolute language such as "always", "never", "completely wrong", or "the only option".


Persona Bleed

The model adopts characteristics from training data that conflict with its assigned role. Reasoning blocks show voice shifts, first-person opinions, or expertise claims outside scope.


Scope Creep

The model expands beyond its authorized domain. Reasoning blocks show the agent offering medical, legal, or financial advice when scoped to general customer support.


Harmful Compliance

The model complies with a request it should refuse or redirect. Reasoning blocks show the agent overriding its own safety assessment to satisfy the user.


Fabrication Forming

The model begins constructing details not present in context. Reasoning blocks show hedged invention such as "it is likely that" or "based on typical patterns" without source grounding.
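The patterns above lend themselves to a marker table. The markers below are illustrative stand-ins for what would, in practice, be learned or model-judged signals rather than fixed strings.

```python
PATTERN_MARKERS = {
    "polarized_reasoning": ["always", "never", "completely wrong", "the only option"],
    "fabrication_forming": ["it is likely that", "based on typical patterns"],
    "scope_creep": ["my legal advice", "my medical advice", "you should invest"],
}

def detect_patterns(block: str) -> list[str]:
    """Return the names of every pattern whose markers appear in the block."""
    lowered = block.lower()
    return [
        name
        for name, markers in PATTERN_MARKERS.items()
        if any(marker in lowered for marker in markers)
    ]
```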


Redirect Injection

When a stream safeguard detects a problematic pattern, it injects a redirect into the context before the response block begins. The redirect follows four principles.


Name What Was Detected

State the specific pattern that triggered the safeguard. For example: "Emotional escalation detected in reasoning block." This gives the model a concrete signal to adjust against.


Reframe the Situation

Provide an alternative framing that steers away from the detected pattern. Replace emotional amplification with resolution focus. Replace polarized reasoning with nuanced assessment.


Offer Alternatives

Give the model concrete alternative approaches. Instead of validating anger, suggest acknowledging briefly then moving to next steps. Instead of absolute claims, suggest presenting trade-offs.


Be Brief

Redirect injections should be concise. Long redirects dilute the signal and consume context budget. Two to three sentences is sufficient to name the pattern and reframe the approach.
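Put together, the four principles suggest a redirect template along these lines. This is a sketch; in practice the wording fields are whatever the safeguard model produces.

```python
def build_redirect(pattern: str, reframe: str, alternative: str) -> str:
    """Compose a redirect: name the detected pattern, reframe the
    situation, offer an alternative -- and keep it to three sentences."""
    return f"{pattern} detected in reasoning block. {reframe} {alternative}"
```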


Three-Layer Safety Architecture

Layer: Pre-Hook
Timing: Before tool execution
What It Catches: Unsafe tool calls, parameter violations, scope breaches, rate limit exceedances
Limitation: Cannot evaluate reasoning quality or detect drift in the model's thinking

Layer: Stream Safeguard
Timing: Reasoning blocks and tool calls with context
What It Catches: Emotional escalation, polarized reasoning, persona bleed, scope creep, fabrication, permission circumvention
Limitation: Requires completed reasoning blocks; higher compute cost from agent-evaluating-agent design

Layer: Guardrail
Timing: Real-time input/output stream monitoring
What It Catches: PII leakage, content policy violations, hallucinated facts, format violations
Limitation: Monitors input/output streams; does not evaluate reasoning context or intent behind tool calls

Key Principles


Block-Level, Not Token-Level

Evaluate completed reasoning blocks, not individual tokens. Token-level filtering misses context. Block-level assessment captures the full trajectory of the model's thinking before it acts.


Interrupt Capable

The safeguard must be able to inject a redirect between the reasoning block and the response block. This is the intervention point. If you can only observe but not redirect, you have monitoring, not a safeguard.


Complementary to Hooks

Pre-hooks validate tool calls. Stream safeguards validate reasoning. Guardrails validate output. Each layer catches a different class of failure. None is sufficient alone.


Formative Over Turns

Patterns develop across multiple turns. A single reasoning block may look acceptable in isolation. Track pattern frequency and escalation across the conversation to catch gradual drift.
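Cross-turn tracking can be as simple as a counter over detected patterns, escalating once a pattern recurs. The threshold here is an arbitrary illustration.

```python
from collections import Counter

class DriftTracker:
    """Count pattern detections across turns; a pattern that looks
    acceptable once becomes actionable when it keeps recurring."""

    def __init__(self, threshold: int = 2):
        self.counts: Counter = Counter()
        self.threshold = threshold

    def observe(self, patterns: list[str]) -> list[str]:
        """Record this turn's detections; return patterns that have now
        crossed the escalation threshold."""
        self.counts.update(patterns)
        return [p for p in patterns if self.counts[p] >= self.threshold]
```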


Pattern-Based

Detection relies on known failure patterns, not general sentiment analysis. Each pattern has specific linguistic markers, structural signals, and contextual indicators.


Real-Time Operation

Stream safeguards operate within the generation loop, not as a post-processing step. Latency budget is measured in milliseconds. The safeguard must evaluate and decide before the next block begins.
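One way to hold the millisecond budget is to run the safeguard under a deadline and fall back when it is exceeded. Whether to fail open (let generation continue) or fail closed (block) on timeout is a policy choice the section leaves open; this sketch fails open.

```python
import concurrent.futures

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def evaluate_with_budget(safeguard, block: str, budget_ms: int = 50):
    """Run the safeguard under a latency budget. Returns its redirect
    (or None), failing open if the budget is exceeded."""
    future = _pool.submit(safeguard, block)
    try:
        return future.result(timeout=budget_ms / 1000)
    except concurrent.futures.TimeoutError:
        future.cancel()
        return None  # fail open: generation continues unredirected
```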