Defender protects your AI agents from prompt injection attacks by scanning API tool call responses before they reach your LLM. When an MCP tool returns data from a third-party provider — emails, CRM records, documents — that data could contain instructions designed to hijack your agent’s behavior. Defender intercepts and classifies those responses, and can block high-risk content before it causes harm.

How It Works

In the default Both detection mode, Defender runs a two-stage pipeline on tool call responses:
  • Tier 1 — Pattern matching: Fast rule-based scan that checks for known prompt injection signatures and risky field patterns. Runs on every response with negligible latency.
  • Tier 2 — AI classification: A local ML model (MiniLM) scores the content for novel or subtle attacks that pattern matching would miss. Runs in parallel with Tier 1 on every response, scanning the SFE-filtered payload (or the tier2Fields subset when configured).
You can configure Detection Mode to run only one stage, or skip scanning entirely for responses that exceed the configured size limits (see Advanced Settings below). Risk level and scan metadata are returned alongside every response so you can observe what Defender is seeing, even when not blocking.
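The two-stage pipeline described above can be sketched in TypeScript. This is illustrative only, not StackOne's implementation: the pattern list and the `tier2Score` stub (standing in for the local MiniLM model) are simplified assumptions, while the threshold defaults come from the Advanced Settings table below.

```typescript
// Illustrative sketch of a two-tier scan; names and logic are hypothetical,
// not the actual StackOne Defender implementation.

type RiskLevel = "low" | "medium" | "high";

// Tier 1: fast rule-based scan for known injection signatures.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/i,
  /you are now (a|an) /i,
  /reveal .*system prompt/i,
];

function tier1Scan(content: string): string[] {
  return INJECTION_PATTERNS.filter((p) => p.test(content)).map((p) => p.source);
}

// Tier 2: stand-in for the local ML classifier; returns a score in [0, 1].
// The real model scores semantics; here we fake it for illustration.
function tier2Score(content: string): number {
  return tier1Scan(content).length > 0 ? 0.9 : 0.1;
}

// Thresholds default to the values in the Advanced Settings table.
function classify(score: number, high = 0.8, medium = 0.5): RiskLevel {
  if (score >= high) return "high";
  if (score >= medium) return "medium";
  return "low";
}

async function scanResponse(content: string) {
  // In "Both" mode the two tiers run in parallel on every response.
  const [detections, score] = await Promise.all([
    Promise.resolve(tier1Scan(content)),
    Promise.resolve(tier2Score(content)),
  ]);
  return { riskLevel: classify(score), tier2Score: score, detections };
}
```

The scan metadata shape mirrors the fields mentioned in the FAQ (riskLevel, tier2Score, detections), so observe-only setups can log it without blocking.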

Configuration

Navigate to your project in the StackOne dashboard, then open the Defender tab in project settings.
Defender settings apply project-wide. Per-account and per-request overrides take precedence where supported.
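That precedence can be sketched as follows, with the most specific layer winning. The config shapes here are hypothetical illustrations, not the actual StackOne API.

```typescript
// Hypothetical config shape for illustrating override precedence:
// per-request > per-account > project-wide.
interface DefenderConfig {
  enabled?: boolean;
  blockHighRisk?: boolean;
}

function resolveConfig(
  project: DefenderConfig,
  account?: DefenderConfig,
  request?: DefenderConfig,
): Required<DefenderConfig> {
  return {
    enabled: request?.enabled ?? account?.enabled ?? project.enabled ?? false,
    blockHighRisk:
      request?.blockHighRisk ?? account?.blockHighRisk ?? project.blockHighRisk ?? false,
  };
}
```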

Core Settings

| Setting | Description | Default |
| --- | --- | --- |
| Defender Enabled | Master switch — enables scanning for this project | Off |
| Block High Risk | Automatically block responses classified as high risk | Off |

Advanced Settings

| Setting | Description | Default |
| --- | --- | --- |
| Detection Mode | Both runs pattern + AI. Pattern only skips the ML model. AI only skips pattern matching. Both is recommended. | Both |
| High Risk Threshold | Score (0–1) above which content is classified as high risk | 0.8 |
| Medium Risk Threshold | Score (0–1) above which content is classified as medium risk | 0.5 |
| Large Response Behavior | What to do when a response exceeds the size limits: Skip scanning (default), Block the response, or Scan anyway | Skip scanning |
| Max Response Size | Byte threshold that triggers large response behavior | 1,048,576 (1 MB) |
| Max Response Words | Word count threshold that triggers large response behavior | 10,000 |
| Annotate Tool Results | Wrap sanitized tool results with boundary tags such as [UD-abc123]...[/UD-abc123] (where the suffix is a random per-response ID) so downstream prompts can reason about data boundaries. Pair this with generateBoundaryInstructions() from @stackone/defender in your system prompt when enabled. | Off |
| Semantic Field Extractor | Skip metadata and identifier fields (UUIDs, timestamps, URLs, etc.) before classification to reduce latency and false positives. Recommended. | On |
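The boundary-tag format described under Annotate Tool Results can be sketched like this. The tag format comes from the table above; `annotateToolResult` itself is a hypothetical helper, not the @stackone/defender API (when you enable the real feature, pair it with `generateBoundaryInstructions()` as noted).

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper illustrating the [UD-xxxxxx]...[/UD-xxxxxx] format.
// A random per-response ID means injected content cannot pre-forge the
// closing tag to escape the boundary.
function annotateToolResult(sanitized: string): { tagged: string; boundaryId: string } {
  const boundaryId = `UD-${randomBytes(3).toString("hex")}`;
  return { tagged: `[${boundaryId}]${sanitized}[/${boundaryId}]`, boundaryId };
}
```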

When to Use Defender

  • You are building AI agents or MCP-based workflows that process third-party API responses
  • Your integrations handle sensitive data such as emails, files, calendar events, or CRM records
  • You want to observe risk signals on tool call responses without necessarily blocking them

SDK Configuration

If you’re using the Node.js SDK, you can configure Defender per-toolset directly from your code — override your project’s dashboard setting, opt in with safe defaults, or forcibly disable for trusted internal flows. See Tool Defense 101 for the SDK API.
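A minimal sketch of that per-toolset override logic, with hypothetical names (see Tool Defense 101 for the real SDK API):

```typescript
// Hypothetical illustration of layering a per-toolset setting over the
// project's dashboard setting; not the @stackone/defender API.
type DefenderOverride = "inherit" | "enabled" | "disabled";

function isDefenderActive(projectEnabled: boolean, override: DefenderOverride): boolean {
  if (override === "inherit") return projectEnabled; // follow the dashboard
  return override === "enabled"; // explicit opt-in or forcible disable
}
```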

FAQ

Is Defender enabled by default?
No. It is off by default. Enable it when your agents consume third-party data and you want protection against prompt injection.

How much latency does Defender add?
Tier 1 (pattern matching) adds negligible latency. Tier 2 (AI classification) runs in parallel on every response — the ML model runs locally, so there is no external API call. The Semantic Field Extractor preprocessor trims metadata/identifier fields before Tier 2 to keep latency low; for typical responses the added latency is under 100ms.

What happens when a response is blocked?
The tool call returns an error to your agent indicating the response was blocked. The agent can handle this like any other tool error — retry, skip, or surface it to the user.

Can I observe risk without blocking?
Yes. Leave Block High Risk disabled. Defender still scans and returns riskLevel, tier2Score, and detections in the response metadata, which you can inspect in your logs.

Is my data used to train the model?
No. The classification model runs locally within StackOne’s infrastructure and is never trained on your data.