Defender protects your AI agents from prompt injection attacks by scanning API tool call responses before they reach your LLM. When an MCP tool returns data from a third-party provider — emails, CRM records, documents — that data could contain instructions designed to hijack your agent’s behavior. Defender intercepts and classifies those responses, and can block high-risk content before it causes harm.

How It Works

In the default Both detection mode, Defender runs a two-stage pipeline on tool call responses:
  • Tier 1 — Pattern matching: Fast rule-based scan that checks for known prompt injection signatures and risky field patterns. Runs on every response with negligible latency.
  • Tier 2 — AI classification: A local ML model (MiniLM) scores the content for novel or subtle attacks that pattern matching would miss. Only runs when Tier 1 identifies suspicious fields.
You can configure Detection Mode to run only one stage, or skip scanning entirely for responses that exceed the configured size limits (see Advanced Settings below). Risk level and scan metadata are returned alongside every response so you can observe what Defender is seeing, even when not blocking.
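The two-stage flow above can be sketched as follows. This is a hypothetical illustration, not StackOne's implementation: the pattern list, the stand-in scorer, and the `scanResponse` function are all assumptions; only the tiered control flow and the default thresholds (0.8 high, 0.5 medium) come from this page.

```typescript
type RiskLevel = "low" | "medium" | "high";

interface ScanResult {
  riskLevel: RiskLevel;
  tier1Matched: boolean;
  tier2Score?: number; // present only when Tier 2 ran
}

// Tier 1: fast rule-based scan for known injection signatures (illustrative patterns).
const INJECTION_PATTERNS = [
  /ignore (all )?previous instructions/i,
  /you are now/i,
  /system prompt/i,
];

function tier1PatternScan(text: string): boolean {
  return INJECTION_PATTERNS.some((p) => p.test(text));
}

// Tier 2: placeholder heuristic standing in for the local MiniLM classifier.
function tier2Classify(text: string): number {
  return tier1PatternScan(text) ? 0.9 : 0.1;
}

function scanResponse(
  text: string,
  highThreshold = 0.8,
  mediumThreshold = 0.5,
): ScanResult {
  const tier1Matched = tier1PatternScan(text);
  if (!tier1Matched) {
    // Tier 2 only runs when Tier 1 flags suspicious content,
    // which is why clean responses see negligible latency.
    return { riskLevel: "low", tier1Matched };
  }
  const tier2Score = tier2Classify(text);
  const riskLevel: RiskLevel =
    tier2Score >= highThreshold ? "high"
    : tier2Score >= mediumThreshold ? "medium"
    : "low";
  return { riskLevel, tier1Matched, tier2Score };
}
```

Note that the risk level and scores are returned either way; blocking is a separate decision governed by the Block High Risk setting.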

Configuration

Navigate to your project in the StackOne dashboard, then open the Defender tab in project settings.
Defender settings apply project-wide. Per-account and per-request overrides take precedence where supported.

Core Settings

| Setting | Description | Default |
| --- | --- | --- |
| Defender Enabled | Master switch — enables scanning for this project | Off |
| Block High Risk | Automatically block responses classified as high risk | Off |
| Default Tool Rules | Apply built-in per-tool risk rules (e.g. gmail_* tools are treated as higher risk by default) | Off |

Advanced Settings

| Setting | Description | Default |
| --- | --- | --- |
| Detection Mode | Both runs pattern + AI. Pattern only skips the ML model. AI only skips pattern matching. Both is recommended. | Both |
| High Risk Threshold | Score (0–1) above which content is classified as high risk | 0.8 |
| Medium Risk Threshold | Score (0–1) above which content is classified as medium risk | 0.5 |
| Large Response Behavior | What to do when a response exceeds the size limits: Skip scanning (default), Block the response, or Scan anyway | Skip scanning |
| Max Response Size | Byte threshold that triggers large response behavior | 1,048,576 (1 MB) |
| Max Response Words | Word count threshold that triggers large response behavior | 10,000 |
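The interaction between the size limits and Large Response Behavior can be sketched like this. The function names and the byte/word counting details are assumptions for illustration; the defaults (1 MB, 10,000 words, Skip scanning) come from the table above.

```typescript
type LargeResponseBehavior = "skip" | "block" | "scan";

interface SizeLimits {
  maxBytes: number;                // default 1,048,576 (1 MB)
  maxWords: number;                // default 10,000
  behavior: LargeResponseBehavior; // default "skip"
}

// A response is "large" if it exceeds EITHER threshold.
function exceedsLimits(text: string, limits: SizeLimits): boolean {
  const bytes = new TextEncoder().encode(text).length;
  const words = text.trim().split(/\s+/).length;
  return bytes > limits.maxBytes || words > limits.maxWords;
}

function decideAction(text: string, limits: SizeLimits): "scan" | "skip" | "block" {
  if (!exceedsLimits(text, limits)) return "scan";
  // Large responses follow the configured behavior instead of the normal pipeline.
  return limits.behavior === "scan" ? "scan" : limits.behavior;
}
```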

When to Use Defender

  • You are building AI agents or MCP-based workflows that process third-party API responses
  • Your integrations handle sensitive data such as emails, files, calendar events, or CRM records
  • You want to observe risk signals on tool call responses without necessarily blocking them

FAQ

Is Defender enabled by default?
No. It is off by default. Enable it when your agents consume third-party data and you want protection against prompt injection.
How much latency does Defender add?
Tier 1 (pattern matching) adds negligible latency. Tier 2 (AI classification) only runs when Tier 1 identifies risky fields, and the ML model runs locally — no external API call is made. For typical responses, the added latency is under 50ms.
What happens when a response is blocked?
The tool call returns an error to your agent indicating the response was blocked. The agent can handle this like any other tool error — retry, skip, or surface it to the user.
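A minimal sketch of that handling in an agent loop. The `ToolCallOutcome` shape and the `blocked` flag are assumptions; the source does not specify the error payload.

```typescript
// Hypothetical outcome shape for a tool call; not StackOne's actual payload.
interface ToolCallOutcome {
  ok: boolean;
  data?: unknown;
  error?: { blocked: boolean; reason: string };
}

function handleToolOutcome(outcome: ToolCallOutcome): "use-result" | "surface-to-user" | "retry" {
  if (outcome.ok) return "use-result";
  if (outcome.error?.blocked) {
    // A blocked response will usually be blocked again,
    // so skip or surface it rather than retrying blindly.
    return "surface-to-user";
  }
  return "retry"; // ordinary tool error: retry as usual
}
```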
Can I observe risk signals without blocking anything?
Yes. Leave Block High Risk disabled. Defender still scans and returns riskLevel, tier2Score, and detections in the response metadata, which you can inspect in your logs.
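For example, you might summarize that metadata in your logs. The field names riskLevel, tier2Score, and detections come from this page; the metadata object's exact shape and the detection entries are assumptions.

```typescript
// Assumed shape of the scan metadata attached to each response.
interface ScanMetadata {
  riskLevel: "low" | "medium" | "high";
  tier2Score?: number;   // absent when only Tier 1 ran
  detections: string[];  // assumed to be a list of matched signals
}

function summarizeScan(meta: ScanMetadata): string {
  const score = meta.tier2Score !== undefined ? meta.tier2Score.toFixed(2) : "n/a";
  return `risk=${meta.riskLevel} score=${score} detections=${meta.detections.length}`;
}
```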
Is my data sent to a third party or used for training?
No. The classification model runs locally within StackOne’s infrastructure and is never trained on your data.