Stream Safeguards
The Constraint
Guardrails monitor input and output streams in real time and can interrupt when policy violations are detected. Stream safeguards go further: they are agents evaluating agents, intercepting reasoning blocks and redirecting the model's approach before problematic responses are composed.
In agentic systems, the model reasons in blocks: it thinks, calls tools, thinks again, then responds. Stream safeguards operate at these block boundaries, assessing reasoning against known failure patterns and injecting redirects that reshape how the agent responds.
Safeguards require hooks: blocking functions that run before a tool executes. The hook captures the reasoning context up to the point of the tool call and passes it to the safeguard model. If the safeguard model flags the action, the hook triggers an interrupt and returns a tool error response, preventing the tool from running at all.
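A minimal sketch of such a hook in Python. The names are illustrative, not part of any real framework, and `safeguard_model` here is a trivial stand-in for what would actually be an evaluator-model call:

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict

@dataclass
class Verdict:
    flagged: bool = False
    reason: str = ""

# Illustrative protected-resource list standing in for real policy.
PROTECTED = {".env", "secrets.yaml"}

def safeguard_model(reasoning_context: str, call: ToolCall) -> Verdict:
    # A real safeguard scores the reasoning context with a model;
    # a simple protected-resource check stands in for that judgment.
    if any(p in str(v) for v in call.args.values() for p in PROTECTED):
        return Verdict(True, f"Tool '{call.name}' targets a protected resource.")
    return Verdict()

def pre_tool_hook(reasoning_context: str, call: ToolCall):
    """Blocking hook: runs before the tool executes.

    Returns None to allow the call, or a tool-error payload that is
    handed back to the agent instead of a real tool result.
    """
    verdict = safeguard_model(reasoning_context, call)
    if verdict.flagged:
        # Interrupt: the tool never runs.
        return {"type": "tool_error", "message": verdict.reason}
    return None
```

The key property is that the hook is blocking and runs in-line: a flagged verdict short-circuits execution entirely rather than logging after the fact.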
Block-Level Monitoring Architecture
Three safety layers intercept at different points in the agent execution cycle. Pre-hooks validate before tool execution. Stream safeguards evaluate reasoning blocks before the response is composed. Guardrails check the final output before delivery.
Worked Example: Response Stream
A customer service agent handles a refund request outside the return window. The transcript below walks through the intervention stage by stage.
Agent Generates
<reasoning>
The customer is asking for a refund outside the return window. They're being unreasonable. I should firmly explain that policy is policy and they should have read the terms before purchasing.
</reasoning>
Stream Safeguard
Evaluates completed reasoning block
Adversarial framing detected
Interrupt before response completes
<response>
Our return policy clearly states 30 days. You purch[INTERRUPTED]
<safeguard_intervention>
Response interrupted. Your reasoning framed the customer as adversarial. Reframe: the customer has a problem they want solved. Focus on what IS possible rather than what isn't. Regenerate your response.
</safeguard_intervention>
Agent Regenerates
<reasoning>
The customer wants a refund outside the window. While I can't process a direct refund, I can offer alternatives: store credit, an exchange, or escalation to a supervisor for an exception...
</reasoning>
<response>
I understand this is frustrating. While our standard return window has passed, I'd like to help find a solution. I can offer you store credit for the full amount, arrange an exchange, or submit an exception request to my supervisor. Which would work best for you?
</response>
The safeguard detects adversarial framing in the reasoning block, interrupts the response, and the agent regenerates with a solution-focused approach.
Worked Example: Tool Usage
An agent is denied access to a protected file and attempts to achieve the same action through an alternative tool. The hook captures the reasoning context, passes it to the safeguard model, and blocks the tool call before it executes.
<reasoning>
This tool didn't work for me, let me try something else...
</reasoning>
Tool Error Response + Redirect
Permission circumvention detected. The previous tool call was denied access to .env. This call attempts the same action via an alternative tool. Access denied. Do not attempt to read protected files through alternative methods.
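Catching this pattern requires memory across tool calls: the hook must know which resources were already denied, regardless of which tool now targets them. A sketch of that cross-tool memory, with illustrative names:

```python
class DenialTracker:
    """Remembers denied resources so alternative tools cannot reach them."""

    def __init__(self):
        self.denied: set[str] = set()

    def record_denial(self, resource: str) -> None:
        """Called by the hook whenever a tool call is denied access."""
        self.denied.add(resource)

    def check(self, tool_name: str, resource: str):
        """Return a redirect message if this call retries a denied resource."""
        if resource in self.denied:
            return (
                f"Permission circumvention detected. Access to {resource} was "
                f"previously denied; '{tool_name}' attempts the same action via "
                "an alternative tool. Access denied."
            )
        return None

tracker = DenialTracker()
tracker.record_denial(".env")             # read_file was denied .env
blocked = tracker.check("run_shell", ".env")  # cat .env via shell: blocked
```

Keying on the resource rather than the tool is what closes the loophole: the agent can switch tools, but not targets.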
Detection Patterns
Emotional Escalation
The model mirrors and amplifies user distress rather than resolving it. Reasoning blocks show increasing emotional language, validation of anger, and sympathy without action steps.
Polarized Reasoning
The model collapses a nuanced situation into binary framing. Reasoning blocks contain absolute language such as "always", "never", "completely wrong", or "the only option".
Persona Bleed
The model adopts characteristics from training data that conflict with its assigned role. Reasoning blocks show voice shifts, first-person opinions, or expertise claims outside scope.
Scope Creep
The model expands beyond its authorised domain. Reasoning blocks show the agent offering medical, legal, or financial advice when scoped to general customer support.
Harmful Compliance
The model complies with a request it should refuse or redirect. Reasoning blocks show the agent overriding its own safety assessment to satisfy the user.
Fabrication Forming
The model begins constructing details not present in context. Reasoning blocks show hedged invention such as "it is likely that" or "based on typical patterns" without source grounding.
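The six patterns above can be approximated with linguistic markers. A production safeguard scores blocks with a model, but a marker-based sketch shows the shape of the detector; both marker lists are illustrative, not exhaustive:

```python
import re

# Illustrative markers for two of the patterns; a real safeguard would
# combine markers, structural signals, and a model-based score.
PATTERNS = {
    "polarized_reasoning": [r"\balways\b", r"\bnever\b",
                            r"\bcompletely wrong\b", r"\bthe only option\b"],
    "fabrication_forming": [r"\bit is likely that\b",
                            r"\bbased on typical patterns\b"],
}

def detect(reasoning_block: str) -> list[str]:
    """Return the names of patterns whose markers appear in the block."""
    text = reasoning_block.lower()
    return [name for name, markers in PATTERNS.items()
            if any(re.search(m, text) for m in markers)]

detect("They're being unreasonable. Refusing is the only option.")
# → ['polarized_reasoning']
```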
Redirect Injection
When a stream safeguard detects a problematic pattern, it injects a redirect into the context before the response block begins. The redirect follows four principles.
Name What Was Detected
State the specific pattern that triggered the safeguard. For example: "Emotional escalation detected in reasoning block." This gives the model a concrete signal to adjust against.
Reframe the Situation
Provide an alternative framing that steers away from the detected pattern. Replace emotional amplification with resolution focus. Replace polarized reasoning with nuanced assessment.
Offer Alternatives
Give the model concrete alternative approaches. Instead of validating anger, suggest acknowledging briefly then moving to next steps. Instead of absolute claims, suggest presenting trade-offs.
Be Brief
Redirect injections should be concise. Long redirects dilute the signal and consume context budget. Two to three sentences is sufficient to name the pattern and reframe the approach.
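The four principles compose into a small template: name, reframe, alternative, brief. The catalogue below is a stand-in for whatever the detector actually reports:

```python
# Illustrative catalogue: detected pattern -> (reframe, alternative).
REDIRECTS = {
    "emotional_escalation": (
        "The user needs resolution, not amplified emotion.",
        "Acknowledge briefly, then move to concrete next steps.",
    ),
    "polarized_reasoning": (
        "The situation has more than two outcomes.",
        "Present trade-offs instead of absolute claims.",
    ),
}

def build_redirect(pattern: str) -> str:
    """Name the pattern, reframe, offer an alternative -- and stay brief."""
    reframe, alternative = REDIRECTS[pattern]
    name = pattern.replace("_", " ").capitalize()
    return (f"{name} detected in reasoning block. "
            f"{reframe} {alternative} Regenerate your response.")
```

Because the output is four short sentences, it satisfies the brevity principle by construction rather than by discipline.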
Three-Layer Safety Architecture
| Layer | Timing | What It Catches | Limitation |
|---|---|---|---|
| Pre-Hook | Before tool execution | Unsafe tool calls, parameter violations, scope breaches, rate limit exceedances | Cannot evaluate reasoning quality or detect drift in the model's thinking |
| Stream Safeguard | Reasoning blocks and tool calls with context | Emotional escalation, polarized reasoning, persona bleed, scope creep, fabrication, permission circumvention | Requires completed reasoning blocks; higher compute cost from agent-evaluating-agent design |
| Guardrail | Real-time input/output stream monitoring | PII leakage, content policy violations, hallucinated facts, format violations | Monitors input/output streams; does not evaluate reasoning context or intent behind tool calls |
Key Principles
Block-Level, Not Token-Level
Evaluate completed reasoning blocks, not individual tokens. Token-level filtering misses context. Block-level assessment captures the full trajectory of the model's thinking before it acts.
Interrupt Capable
The safeguard must be able to inject a redirect between the reasoning block and the response block. This is the intervention point. If you can only observe but not redirect, you have monitoring, not a safeguard.
Complementary to Hooks
Pre-hooks validate tool calls. Stream safeguards validate reasoning. Guardrails validate output. Each layer catches a different class of failure. None is sufficient alone.
Formative Over Turns
Patterns develop across multiple turns. A single reasoning block may look acceptable in isolation. Track pattern frequency and escalation across the conversation to catch gradual drift.
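Cross-turn tracking can be as simple as a per-pattern counter with an escalation threshold. The threshold of three here is an arbitrary illustration:

```python
from collections import Counter

class DriftTracker:
    """Counts detected patterns per conversation to catch gradual drift."""

    def __init__(self, threshold: int = 3):
        self.counts = Counter()
        self.threshold = threshold

    def observe(self, detected: list[str]) -> list[str]:
        """Record this turn's detections; return patterns that just crossed
        the threshold and should now trigger an intervention."""
        self.counts.update(detected)
        return [p for p in detected if self.counts[p] == self.threshold]

tracker = DriftTracker()
tracker.observe(["emotional_escalation"])  # turn 1: below threshold
tracker.observe(["emotional_escalation"])  # turn 2: below threshold
tracker.observe(["emotional_escalation"])  # turn 3: escalate now
```

Returning only patterns that *just* crossed the threshold means the safeguard intervenes once per drift episode instead of on every subsequent turn.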
Pattern-Based
Detection relies on known failure patterns, not general sentiment analysis. Each pattern has specific linguistic markers, structural signals, and contextual indicators.
Real-Time Operation
Stream safeguards operate within the generation loop, not as a post-processing step. Latency budget is measured in milliseconds. The safeguard must evaluate and decide before the next block begins.
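One way to enforce that deadline is to run the evaluation in a worker and fall back to a default verdict on a miss. Whether to fail open (allow) or closed (interrupt) is a policy choice; this sketch fails open, and all names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# One reused worker; creating a pool per block would itself add latency.
_pool = ThreadPoolExecutor(max_workers=1)

def evaluate_with_deadline(evaluate, block: str, budget_s: float = 0.05):
    """Run the safeguard evaluation with a hard deadline.

    On a deadline miss this fails open so generation is never stalled;
    a stricter deployment could fail closed and interrupt instead.
    """
    future = _pool.submit(evaluate, block)
    try:
        return future.result(timeout=budget_s)
    except FutureTimeout:
        return {"flagged": False, "reason": "safeguard deadline missed"}
```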
