Tool Defense 101 - Stackone

Tool Defense protects your AI agents from prompt injection attacks by scanning tool call responses before they reach your LLM. Configure scanning behavior per-toolset from the SDK, or defer to your project’s dashboard setting.

Why Tool Defense
How It Works
When to Use
Examples

The Problem: AI agents that call third-party APIs consume data they didn’t author: emails, CRM notes, calendar events, support tickets, documents. That data can contain content designed to hijack the agent’s behavior, such as hidden instructions, role markers, encoded payloads, or jailbreak phrases.

Indirect prompt injection: an attacker plants instructions in a document or message the agent will eventually read
Cross-account leaks: a malicious record in one tool can attempt to steer the agent toward exfiltrating data from another
Silent failures: without scanning, you never see the attack, only the agent’s surprising downstream behavior

The Solution: Tool Defense scans every tool call response, classifies risk, and surfaces annotations so you can observe, block, or both. Configuration is per-toolset, so different parts of your application can opt in or out independently of project-wide settings.

Tool Defense runs server-side as part of every RPC call your SDK makes. The SDK is the wiring layer that forwards your config and reads back what defender produced.Two-stage pipeline

Tier 1 — Pattern matching: fast rule-based scan for known injection signatures (role markers, encoded payloads, instruction overrides). Adds negligible latency.
Tier 2 — AI classification: a local ML model scores risky fields surfaced by Tier 1. Only runs when Tier 1 flags content; never calls an external API.

Four SDK modes

`defender` option	Wire payload	Behavior
omitted (default)	none	Project dashboard setting controls
`{ useProjectSettings: true }`	none	Same as omitting (explicit, self-documenting)
`{ enabled, blockHighRisk, ... }`	full config	SDK-level config wins, overrides dashboard
`null`	all-false	Defender forcibly disabled, overrides dashboard

For the dashboard side (pattern catalog, risk thresholds, large-response behavior), see the platform-level Defender guide.

Real-world use cases:

Inbox triage: an agent reading a customer’s Gmail inbox sees a malicious “ignore previous instructions” payload buried in an email body; Tier 1 catches the role marker, defender annotates the response, your logs show the risk level
CRM enrichment: an agent pulls contact notes from HubSpot for sales summarization; Tier 2 detects a high-confidence injection attempt and (with blocking enabled) the call returns blocked instead of poisoning the agent’s context
Mixed-trust pipelines: an internal-tooling toolset runs with defender: null for speed; a customer-facing toolset runs with defender: { ...DEFAULT_DEFENDER_CONFIG, blockHighRisk: true } for safety

In every case, the response metadata surfaces risk level and detection signals so the choice is visible, even when nothing is blocked.

Key Features

Per-Toolset Configuration

Different toolsets can opt in or out of scanning independently, overriding your project’s dashboard setting per construction.

Observable by Default

Risk level, sanitized fields, and detection patterns come back in the response metadata even when not blocking.

Runtime Override Warning

The SDK warns once per process when it overrides your dashboard setting, so the choice is visible in your logs.

Mode Introspection

The defenderMode getter exposes the resolved mode (project / disabled / explicit) for tests and observability.

Quick Example

import { StackOneToolSet, DEFAULT_DEFENDER_CONFIG } from '@stackone/ai';

// 1. Default: defer to your project's dashboard defender setting
const dashboardToolset = new StackOneToolSet();

// 2. Explicit opt-in with safe defaults
const scanningToolset = new StackOneToolSet({
  defender: { ...DEFAULT_DEFENDER_CONFIG },
});

// 3. Block on HIGH or CRITICAL risk
const strictToolset = new StackOneToolSet({
  defender: { ...DEFAULT_DEFENDER_CONFIG, blockHighRisk: true },
});

// 4. Forcibly disabled, overrides the dashboard
const offToolset = new StackOneToolSet({ defender: null });

Inspecting the Resolved Mode

Use the defenderMode getter to assert how a toolset will behave:

const toolset = new StackOneToolSet({ defender: null });

toolset.defenderMode; // 'project' | 'disabled' | 'explicit'

When the SDK overrides your project dashboard (modes disabled or explicit), it emits a yellow console.warn once per process per distinct override shape, so the override is visible at runtime without flooding your logs. Set NO_COLOR=1 to suppress color, or FORCE_COLOR=1 to force it when piping output.

Reading the Response

When defender runs, the RPC response includes a defenderMetadata sibling next to data:

const result = await tool.execute({ body: {} });

const metadata = (result as { defenderMetadata?: unknown }).defenderMetadata;
// {
//   applied: true,
//   result: {
//     allowed: true,                                          // false → blocked when blockHighRisk
//     riskLevel: 'low' | 'medium' | 'high' | 'critical',
//     fieldsSanitized: string[],
//     patternsByField: Record<string, string[]>,
//     detections: unknown[],
//     latencyMs: number,
//   },
// }

Next Steps

Tool Defense for the full API reference: all four modes, defenderMetadata shape, override warning behavior
Defender (platform guide) for dashboard configuration, detection pipeline, and risk thresholds
Basic Usage for fetching and executing tools

Documentation Index

​Key Features

Per-Toolset Configuration

Observable by Default

Runtime Override Warning

Mode Introspection

​Quick Example

​Inspecting the Resolved Mode

​Reading the Response

​Next Steps

Key Features

Quick Example

Inspecting the Resolved Mode

Reading the Response

Next Steps