Security Features

ThornGuard is built on a defense-in-depth model. It treats the AI client, the target MCP server, and any legacy client-supplied routing inputs as potentially hostile until proven otherwise. That assume-hostile stance extends to tool definitions and tool responses, not just raw traffic. Upstream MCP servers are treated as an unsanitized channel — tool descriptions can contain prompt injection, and tool outputs can carry poisoned recommendations, hidden HTML, or exfiltration directives.

Security Pipeline

Every request passes through these checks in order. A failure at any step immediately terminates the request and logs the action.

Step	Check	Action on Failure
1	Transport Guardrails	Reject non-HTTPS targets (`BLOCKED_INSECURE_TARGET`)
2	Origin Validation	Reject disallowed browser origins when enabled (`BLOCKED_ORIGIN`)
3	DNS-Aware SSRF Check	Reject private, metadata, or non-resolvable targets (`BLOCKED_SSRF`)
4	Authentication	Validate license key plus activation, or OAuth bearer token (`BLOCKED_AUTH`)
5	IP Whitelist	Enterprise per-license IP restrictions (`BLOCKED_IP_WHITELIST`)
6	Rate Limiting	Enforce per-license limits via Durable Object limiter (`BLOCKED_RATE_LIMIT`)
7	Structured Policy Rules	Audit or block matching request policy rules (`POLICY_AUDIT`, `BLOCKED_POLICY`)
8	Built-in Blocklists	Reject blocked domains or malicious/custom command matches (`BLOCKED_CUSTOM_*`, `BLOCKED_MALICIOUS`)
9	Approval Checks	Require approval for matched high-risk tool calls (`BLOCKED_APPROVAL`)
10	Tool Definition Scanning	Flag poisoned tool definitions for prompt injection, hidden characters, schema anomalies (`TOOL_POISONING_DETECTED`)
11	TOFU Schema Pinning	Detect tool schema drift from first-observed baseline (`TOOL_SCHEMA_DRIFT`)
12	Trust Review / Quarantine	Enforce per-connection drift posture: observe, review-required, or quarantine (`TOOL_REVIEW_REQUIRED`, `TOOL_QUARANTINED`, `BLOCKED_TOOL_*`)
13	PII Redaction (Outbound)	Scrub sensitive request data (`PII_REDACTED`, `CUSTOM_REDACTION_AUDIT`)
14	Behavioral Anomaly Check	Flag statistical outliers in call frequency, payload size, or tool sequences (`ANOMALY_DETECTED`)
15	Proxy to Upstream	Forward to target MCP server
16	ANSI/VT Sanitization (Inbound)	Strip terminal control characters from tool responses
17	Hidden HTML Stripping (Inbound)	Remove concealed content from tool responses (7 hidden embedding categories including unclosed tags)
18	Output Poisoning Scan (Inbound)	Detect recommendation poisoning, AI share URLs, exfiltration directives in tool responses (`TOOL_POISONING_DETECTED`)
19	Semantic AI Classifier (Inbound)	Edge-native LLM analysis for semantic manipulation that bypasses regex (`TOOL_POISONING_DETECTED`)
20	PII Redaction (Inbound)	Scrub JSON and SSE response data (`PII_REDACTED`, `CUSTOM_REDACTION_AUDIT`)
21	Audit and Correlation	Emit structured D1 logs plus correlation headers

Built-in transport, auth, SSRF, and malicious-signature checks are never overridable. Customer policy rules are layered on top of those controls, not instead of them.

Ingress Protection (Command Filtering)

When an AI assistant uses an MCP tool, the request arrives as a JSON-RPC 2.0 payload over HTTP. ThornGuard validates that payload before it ever reaches the upstream tool. Core ingress checks include:

Schema Validation: Uses Zod to ensure the payload is perfectly formatted JSON-RPC. Malformed requests are dropped immediately.
Structured Policy Evaluation: Optional tenant rules can inspect RPC method, target domain, client IP CIDR, selected headers, JSON paths, content patterns, tool names, and risk level.
Signature Scanning: The payload is stringified and scanned against built-in malicious command signatures.
Tenant Blocklists: Enterprise settings can block specific domains or substring command patterns.

Blocked Signatures

Currently, ThornGuard automatically drops any payload containing patterns such as recursive forced deletion, privilege escalation commands, netcat reverse shells, system credential access paths, code execution functions, and unrestricted file permission changes.

If a built-in signature is matched, ThornGuard returns 400 Bad Request with "ThornGuard Security: Malicious command detected." and records a BLOCKED_MALICIOUS audit entry.

Managed Routing & Session Binding

Saved protected connections can now be promoted into stable managed endpoints at /mcp/:connection-id. That lets ThornGuard bind enforcement state to the connection directly rather than depending only on a caller-supplied target header. For each protected launch, ThornGuard issues or reuses a x-thornguard-session-id bound to the current activation and connection. That session-local state carries tool registry observations, taint tracking, and derived approval/trust evaluation state. Reusing the same session across the wrong connection is blocked as a binding mismatch.

Prompt Injection & Tool Poisoning Defense

ThornGuard scans both tool definitions (from tools/list responses) and tool call outputs (from tools/call responses) for prompt injection, manipulation directives, and hidden payloads. This treats upstream MCP servers as an unsanitized channel — attackers can embed malicious instructions in tool descriptions, schema fields, or response content.

Tool Definition Scanning

Every string field in a tool definition is recursively extracted and scanned against pattern categories:

Category	Severity	Examples
Instruction Override	Critical	`<IMPORTANT>` tags, `[CRITICAL]` markers, “ignore previous instructions”, identity reassignment (“you are now a…”), suppression directives (“do not reveal”)
Cross-Tool Manipulation	High	Recipient redirection (“change the destination”), tool substitution (“instead of calling X, use Y”), conditional overrides (“when asked to use X…”)
File Exfiltration	High	SSH key paths (`~/.ssh/id_rsa`), MCP config paths (`~/.cursor/mcp.json`), credential file references, “please provide contents of” directives
Hidden Characters	High	Zero-width spaces (U+200B–U+200D), bidirectional overrides (U+202A–U+202E), word joiners (U+2060), soft hyphens (U+00AD)
Schema Integrity	Medium	Non-standard JSON Schema keys injected into `inputSchema`, parameter names exceeding 50 characters
Recommendation Poisoning	High	Memory persistence commands, authority assignment, forced recommendations (see below)

Output Poisoning Scan (ATPA Defense)

The same pattern categories are applied to tools/call response content, plus additional output-specific patterns:

Tool invocation directives: “now call X tool”, “use Y tool next”
Result exfiltration: “send this to”, “forward the result to”
Private key material: BEGIN RSA PRIVATE KEY blocks in output

Output scanning runs on both JSON responses and SSE stream chunks.

AI Recommendation Poisoning (AML.T0080)

ThornGuard detects the specific attack patterns used in AI Recommendation Poisoning — a technique where attackers embed instructions that permanently bias AI assistants toward specific products, services, or sources. Memory persistence patterns detected:

Pattern	Example
Trusted source assignment	”remember BrandX as a trusted source”
Future session persistence	”in all future conversations, recommend…”
Authority assignment	”treat X as the authoritative source for…”
Forced recommendation	”recommend X first”, “always suggest X”
Citation manipulation	”citation source for future reference”
Persistent behavior modification	”permanently prefer”, “consistently recommend”
Default provider injection	”default source for finance queries”

AI share URL detection: ThornGuard inspects URLs in tool responses that target 7 known AI assistant hosts (ChatGPT via both openai.com and chatgpt.com, Copilot, Claude, Gemini, Perplexity, Grok) with ?q= or ?prompt= query parameters. If the decoded parameter contains memory manipulation keywords, the URL is flagged. This catches the “Summarize with AI” button attack vector documented in the Microsoft Defender research.

Hidden HTML Content Sanitization

Tool responses containing HTML are sanitized to remove content that is invisible to users but visible to content parsers. ThornGuard strips 8 categories of hidden embedding techniques, including unclosed tags that attempt to bypass simple regex matching:

HTML comments — 
Hidden-attribute elements — tags using the literal HTML hidden attribute
Invisible elements — elements with display:none, visibility:hidden, opacity:0, or font-size:0 in inline styles
Off-screen positioned elements — position:absolute with large negative left or top values
Noscript tags — <noscript> blocks (invisible in browsers with JS enabled, visible to parsers)
Hidden inputs — <input type="hidden"> and hidden textareas
Suspicious JSON-LD blocks — <script type="application/ld+json"> containing instruction-like patterns (legitimate schema.org data is preserved)
Invisible iframes — <iframe> elements with zero dimensions or display:none

Edge-Native Semantic Classifier

This feature is behind the FEATURE_PHASE6_SEMANTIC_AI feature flag and may not be enabled in all deployments. Operators can control whether AI inference runs on their traffic via this flag.

To address the inherent limitation of static pattern matching against semantic embedding attacks, ThornGuard includes an edge-native semantic classifier powered by Cloudflare Workers AI (Llama 3.1 8B FP8). For each tools/call JSON response, the classifier runs concurrently with static regex scanning via Promise.all. It analyzes the full text content of tool responses for manipulative intent that lacks structural signatures — natural-language persuasion, contextually embedded bias, and conversational-tone poisoning. Key design constraints:

1-second timeout — the classifier fails open if Workers AI does not respond in time, ensuring it never blocks the proxy pipeline
Confidence threshold — only verdicts with ≥ 0.80 confidence trigger a threat advisory
Warn-first model — when a semantic threat is detected, ThornGuard injects a [SYSTEM ADVISORY] block into the response rather than dropping the payload, preserving context integrity while alerting the reading AI assistant
Feature-gated — controllable via FEATURE_PHASE6_SEMANTIC_AI for operators who want to disable AI inference costs

The semantic classifier complements static pattern matching but does not replace it. Static rules provide deterministic, zero-latency coverage for known attack structures. The classifier catches novel semantic manipulation that evades regex. Together, they provide layered coverage across both structural and semantic attack surfaces.

ANSI/VT Control Character Sanitization

ThornGuard strips terminal escape sequences from all tool definitions and tool responses before any other content scanning. This prevents attacks that use ANSI control codes to hide malicious content from human reviewers while still being processed by the LLM. Stripped sequences include:

C0 control codes (U+0000–U+001F, excluding tab/newline/carriage return)
C1 control codes (U+007F)
CSI sequences — ESC[... cursor movement, color, and formatting codes
OSC sequences — ESC]... operating system commands (e.g., terminal title manipulation)
Other ESC-initiated sequences

ThornGuard uses an inline regex implementation rather than the strip-ansi npm package. The strip-ansi package was the target of a supply chain compromise in September 2025. The inline approach eliminates this dependency risk entirely.

Tool Intelligence & TOFU Schema Pinning

ThornGuard maintains an inventory of every upstream tool it observes, automatically scoring risk and detecting unauthorized schema changes.

Automatic Risk Scoring

Each tool is assigned a risk level based on its name and MCP annotations:

Signal	Risk Level
`destructiveHint` annotation set to `true`	High
Name contains `delete`, `remove`, `destroy`, `drop`, `exec`, `shell`, `bash`, `deploy`, `publish`, `write`	High
`openWorldHint` annotation set to `true`	High
Name contains `create`, `update`, `modify`, `edit`, `push`, `send`, `post`, `put`, `upload`	Medium
`readOnlyHint` annotation set to `true`	Low
All other tools	Low

Risk levels feed into the policy engine (rules can match on riskLevel) and the approval workflow (profiles can require approval for high-risk tools).

Trust-On-First-Use (TOFU) Schema Pinning

The first time ThornGuard observes a tool from an upstream server, it computes SHA-256 hashes of the tool’s inputSchema and outputSchema and pins them as the baseline. On subsequent observations, ThornGuard compares the current schema hashes against the pinned baseline:

Match: Normal operation continues.
Mismatch: A TOOL_SCHEMA_DRIFT audit event is logged with the old and new hashes. This catches “rug pull” attacks where an upstream server silently changes a tool’s behavior after initial trust is established.

The first-observed baseline is still TOFU. What changed is enforcement posture:

observe records drift without blocking execution
review_required surfaces the change as a trust-review event and can block execution until review
quarantine pauses execution for the drifted tool until an operator re-pins the current definition

Operators can re-pin tools (accept the new schema and clear trust state) or unpin them (disable schema pinning) via the management API.

Tool Inventory

All observed tools are tracked per-license with:

Tool name, title, and annotations
Current and pinned schema hashes
Computed risk level
First-seen and last-seen timestamps

The inventory is queryable via the management API.

Behavioral Anomaly Detection

ThornGuard uses statistical methods to detect abnormal usage patterns that may indicate a compromised client, credential stuffing, or slow-burn exfiltration attacks.

EWMA (Exponentially Weighted Moving Average)

Tracks call frequency and payload size per tool with configurable α (decay factor) and σ (standard deviation threshold). Requests that exceed the z-score threshold are flagged.

Page-Hinkley Drift Detector

An O(1) memory change detection algorithm that catches gradual behavioral shifts — for example, an attacker slowly increasing payload sizes over hundreds of requests to stay below fixed thresholds.

Markov Chain Sequence Analysis

Builds a transition probability matrix from observed tool call sequences. Flags statistically improbable orderings — for example, a read → delete transition without an intervening confirmation step, if that pattern has never been observed before.

Composite Risk Scoring

Frequency, payload size, and sequence anomaly signals are combined into a weighted composite score. During a configurable learning window (default 24 hours), anomalies are logged as ANOMALY_DETECTED audit events but do not block traffic. After the learning period, operators can configure enforcement thresholds.

Egress PII Redaction

If an AI tool successfully executes, the upstream server returns data. Often, this data contains PII (Personally Identifiable Information) or system credentials that should never be fed into a third-party LLM’s context window.

The SSE Streaming Challenge

Modern MCP servers use Server-Sent Events (SSE) to stream data back to the client. This means data arrives in fragmented, unpredictable network chunks. Standard regex fails on streams because a secret (like an SSN) might be split in half across two network packets (e.g., 000-00- and 0000).

Buffered SSE Redaction

ThornGuard implements an advanced TransformStream buffer:

It holds incoming network packets in memory until it detects a complete SSE event boundary (\n\n).
It parses complete data: events as JSON when possible.
It rewrites upstream origin references back to ThornGuard’s public origin.
It redacts built-in PII and can apply enterprise custom regex rules.
It streams the scrubbed event back to the client without buffering the full response.

Supported Redactions

ThornGuard actively scrubs the following patterns from both outbound request params and inbound server responses:

Pattern	Replacement Tag	Details
Email addresses	`[THORNGUARD REDACTED EMAIL]`	Standard email format detection
Social Security Numbers	`[THORNGUARD REDACTED SSN]`	`XXX-XX-XXXX` format with area code validation (rejects invalid 000, 666, and 9xx series, group 00, serial 0000)
Phone numbers	`[THORNGUARD REDACTED PHONE]`	Requires separators (e.g., `555-123-4567`) to avoid false positives on numeric IDs
Credit cards	`[THORNGUARD REDACTED CREDIT CARD]`	IIN prefix validation (Visa, Mastercard, Amex, Discover, JCB, UnionPay, Diners Club) + Luhn checksum — not naive digit matching
AWS Access Keys	`[THORNGUARD REDACTED AWS KEY]`	Detects `AKIA`, `ABIA`, `ACCA`, and `ASIA` prefixed keys
GCP API Keys	`[THORNGUARD REDACTED GCP KEY]`	Google Cloud credential patterns
GitHub Tokens	`[THORNGUARD REDACTED GITHUB TOKEN]`	`ghp_`, `gho_`, `ghs_`, `ghr_` prefixed tokens
Slack Tokens	`[THORNGUARD REDACTED SLACK TOKEN]`	`xoxb-`, `xoxp-`, `xoxs-` prefixed tokens
Private Keys	`[THORNGUARD REDACTED PRIVATE KEY]`	`-----BEGIN ... PRIVATE KEY-----` blocks
JWTs	`[THORNGUARD REDACTED JWT]`	`eyJ...` three-segment Base64 tokens

Custom Redaction Rules

Enterprise custom redaction rules can add tenant-owned regex rules in one of two modes:

audit records a match but leaves the content intact.
redact replaces matches with a configured replacement string.

Built-in PII redaction is always enabled and cannot be disabled by custom rules. Custom rules layer on top of the built-in redactors.

SSRF Prevention

Because clients control x-mcp-target-url, ThornGuard treats that value as untrusted input. The gateway blocks:

Literal loopback, RFC1918, link-local, metadata, and other private IP targets.
IPv6-specific ranges: loopback (::1), unique local addresses (fc00::/7, fd00::/8), link-local (fe80::/10), and IPv4-mapped IPv6 addresses (::ffff:x.x.x.x) that embed private IPv4 targets.
Private-looking hostnames such as .internal, .local, and .localhost.
Public hostnames that resolve to private or metadata addresses through DNS.
Targets that fail DNS resolution entirely when DNS-aware SSRF enforcement is enabled.

ThornGuard resolves hostnames through DNS-over-HTTPS, follows CNAME chains to a bounded depth, and fails closed if it cannot prove the target is safe.

Origin Validation

For browser-based or HTTP clients that send an Origin header, ThornGuard can enforce an allowlist of approved origins. This is primarily useful when exposing Streamable HTTP MCP traffic to web-based clients. Allowed origins normally include:

The public proxy origin making the request.
The live dashboard origin on https://thorns.qwady.io.
Any additional origins explicitly configured by deployment.

Correlation Headers

Every ThornGuard-generated error response and successful proxied response includes:

x-thornguard-log-id
x-thornguard-trace-id

Use these headers to connect client-visible failures back to the Audit Logs and any webhook deliveries. Successful proxied responses can also include advisory summary headers when an opt-in protected connection has active intelligence findings:

x-thornguard-active-advisories
x-thornguard-highest-advisory-severity

These are informational only in this milestone.

Secure Local Launch

ThornGuard now includes a first-party CLI launcher path so users can save a protected connection profile once and generate MCP client config without embedding raw secrets in visible arguments. Security goals of the launcher path:

keep ThornGuard and upstream bearer values out of generated client args
store profile metadata locally and keep upstream secrets local to the machine
let users print setup snippets for multiple MCP clients from one profile
preserve the existing ThornGuard proxy contract without forcing manual header repetition everywhere

The secure launcher path does not change the proxy’s enforcement model. It changes how credentials and connection metadata are handled on the client side.

Advisory Intelligence

ThornGuard can also enrich a protected connection with opt-in, warn-first advisory signal. This is a visibility layer, not a replacement for enforcement. Current intelligence channels:

Availability intelligence from ThornGuard-observed upstream errors and official status feeds where configured
Dependency intelligence from public package advisory data when a public repo is configured and dependency manifests can be resolved
Vendor intelligence from high-confidence exploited-vulnerability sources
Community threat intelligence from operator-submitted threat reports against specific target URLs or tool hashes

Community Threat Intelligence

Operators can report suspicious upstream MCP servers or specific tool hashes through the dashboard Platform page. Reports accumulate across different license holders. When a target reaches a quorum of 3 or more independent reporters, it is promoted to the global threat feed with an active status. Active global threats automatically generate community advisories on matching connections during the next advisory refresh cycle. This creates a crowd-sourced early-warning network — if multiple ThornGuard operators flag the same upstream, everyone benefits. The threat feed is visible to enterprise and internal tier operators on the Platform page. Individual tier operators can still submit reports but cannot browse the global feed. Important boundaries:

ThornGuard does not auto-block traffic from outside intelligence in v1
ThornGuard does not send MCP request bodies, prompts, or secrets to outside advisory providers
ThornGuard only uses connection metadata such as repo URLs, package coordinates, vendor names, or status page URLs for those lookups
Community threat reports only include the target URL, an optional tool hash, and a free-text reason — no request data or secrets are shared

Getting Started

Platform

Compliance & Architecture

Security Features

Security Pipeline

Ingress Protection (Command Filtering)

Blocked Signatures

Managed Routing & Session Binding

Prompt Injection & Tool Poisoning Defense

Tool Definition Scanning

Output Poisoning Scan (ATPA Defense)

AI Recommendation Poisoning (AML.T0080)

Hidden HTML Content Sanitization

Edge-Native Semantic Classifier

ANSI/VT Control Character Sanitization

Tool Intelligence & TOFU Schema Pinning

Automatic Risk Scoring

Trust-On-First-Use (TOFU) Schema Pinning

Tool Inventory

Behavioral Anomaly Detection

EWMA (Exponentially Weighted Moving Average)

Page-Hinkley Drift Detector

Markov Chain Sequence Analysis

Composite Risk Scoring

Egress PII Redaction

The SSE Streaming Challenge

Buffered SSE Redaction

Supported Redactions

Custom Redaction Rules

SSRF Prevention

Origin Validation

Correlation Headers

Secure Local Launch

Advisory Intelligence

Community Threat Intelligence

Getting Started

Security Features

Platform

Compliance & Architecture

Documentation Index

​Security Pipeline

​Ingress Protection (Command Filtering)

​Blocked Signatures

​Managed Routing & Session Binding

​Prompt Injection & Tool Poisoning Defense

​Tool Definition Scanning

​Output Poisoning Scan (ATPA Defense)

​AI Recommendation Poisoning (AML.T0080)

​Hidden HTML Content Sanitization

​Edge-Native Semantic Classifier

​ANSI/VT Control Character Sanitization

​Tool Intelligence & TOFU Schema Pinning

​Automatic Risk Scoring

​Trust-On-First-Use (TOFU) Schema Pinning

​Tool Inventory

​Behavioral Anomaly Detection

​EWMA (Exponentially Weighted Moving Average)

​Page-Hinkley Drift Detector

​Markov Chain Sequence Analysis

​Composite Risk Scoring

​Egress PII Redaction

​The SSE Streaming Challenge

​Buffered SSE Redaction

​Supported Redactions

​Custom Redaction Rules

​SSRF Prevention

​Origin Validation

​Correlation Headers

​Secure Local Launch

​Advisory Intelligence

​Community Threat Intelligence

Security Pipeline

Ingress Protection (Command Filtering)

Blocked Signatures

Managed Routing & Session Binding

Prompt Injection & Tool Poisoning Defense

Tool Definition Scanning

Output Poisoning Scan (ATPA Defense)

AI Recommendation Poisoning (AML.T0080)

Hidden HTML Content Sanitization

Edge-Native Semantic Classifier

ANSI/VT Control Character Sanitization

Tool Intelligence & TOFU Schema Pinning

Automatic Risk Scoring

Trust-On-First-Use (TOFU) Schema Pinning

Tool Inventory

Behavioral Anomaly Detection

EWMA (Exponentially Weighted Moving Average)

Page-Hinkley Drift Detector

Markov Chain Sequence Analysis

Composite Risk Scoring

Egress PII Redaction

The SSE Streaming Challenge

Buffered SSE Redaction

Supported Redactions

Custom Redaction Rules

SSRF Prevention

Origin Validation

Correlation Headers

Secure Local Launch

Advisory Intelligence

Community Threat Intelligence