ThornGuard is built on a defense-in-depth model. It treats the AI client, the target MCP server, and any legacy client-supplied routing inputs as potentially hostile until proven otherwise. That assume-hostile stance extends to tool definitions and tool responses, not just raw traffic. Upstream MCP servers are treated as an unsanitized channel: tool descriptions can contain prompt injection, and tool outputs can carry poisoned recommendations, hidden HTML, or exfiltration directives.
Security Pipeline
Every request passes through these checks in order. A failure at any step immediately terminates the request and logs the action.

| Step | Check | Action on Failure |
|---|---|---|
| 1 | Transport Guardrails | Reject non-HTTPS targets (BLOCKED_INSECURE_TARGET) |
| 2 | Origin Validation | Reject disallowed browser origins when enabled (BLOCKED_ORIGIN) |
| 3 | DNS-Aware SSRF Check | Reject private, metadata, or non-resolvable targets (BLOCKED_SSRF) |
| 4 | Authentication | Validate license key plus activation, or OAuth bearer token (BLOCKED_AUTH) |
| 5 | IP Whitelist | Enterprise per-license IP restrictions (BLOCKED_IP_WHITELIST) |
| 6 | Rate Limiting | Enforce per-license limits via Durable Object limiter (BLOCKED_RATE_LIMIT) |
| 7 | Structured Policy Rules | Audit or block matching request policy rules (POLICY_AUDIT, BLOCKED_POLICY) |
| 8 | Built-in Blocklists | Reject blocked domains or malicious/custom command matches (BLOCKED_CUSTOM_*, BLOCKED_MALICIOUS) |
| 9 | Approval Checks | Require approval for matched high-risk tool calls (BLOCKED_APPROVAL) |
| 10 | Tool Definition Scanning | Flag poisoned tool definitions for prompt injection, hidden characters, schema anomalies (TOOL_POISONING_DETECTED) |
| 11 | TOFU Schema Pinning | Detect tool schema drift from first-observed baseline (TOOL_SCHEMA_DRIFT) |
| 12 | Trust Review / Quarantine | Enforce per-connection drift posture: observe, review-required, or quarantine (TOOL_REVIEW_REQUIRED, TOOL_QUARANTINED, BLOCKED_TOOL_*) |
| 13 | PII Redaction (Outbound) | Scrub sensitive request data (PII_REDACTED, CUSTOM_REDACTION_AUDIT) |
| 14 | Behavioral Anomaly Check | Flag statistical outliers in call frequency, payload size, or tool sequences (ANOMALY_DETECTED) |
| 15 | Proxy to Upstream | Forward to target MCP server |
| 16 | ANSI/VT Sanitization (Inbound) | Strip terminal control characters from tool responses |
| 17 | Hidden HTML Stripping (Inbound) | Remove concealed content from tool responses (8 hidden embedding categories including unclosed tags) |
| 18 | Output Poisoning Scan (Inbound) | Detect recommendation poisoning, AI share URLs, exfiltration directives in tool responses (TOOL_POISONING_DETECTED) |
| 19 | Semantic AI Classifier (Inbound) | Edge-native LLM analysis for semantic manipulation that bypasses regex (TOOL_POISONING_DETECTED) |
| 20 | PII Redaction (Inbound) | Scrub JSON and SSE response data (PII_REDACTED, CUSTOM_REDACTION_AUDIT) |
| 21 | Audit and Correlation | Emit structured D1 logs plus correlation headers |
Built-in transport, auth, SSRF, and malicious-signature checks are never
overridable. Customer policy rules are layered on top of those controls, not
instead of them.
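The ordered, fail-fast behavior of the pipeline can be sketched as below. This is a minimal illustration, not ThornGuard's implementation: the `Check`, `ProxyRequest`, and `runPipeline` names are hypothetical, and only two of the 21 steps are modeled.

```typescript
// Hypothetical shapes for illustration; only the audit codes come from the table above.
type CheckResult = { ok: true } | { ok: false; code: string };
interface ProxyRequest { targetUrl: string; licenseKey?: string }
interface Check { name: string; run: (req: ProxyRequest) => CheckResult }

// Run checks in order; the first failure is logged and ends the request.
function runPipeline(req: ProxyRequest, checks: Check[], auditLog: string[]): CheckResult {
  for (const check of checks) {
    const result = check.run(req);
    if (!result.ok) {
      auditLog.push(`${check.name}: ${result.code}`);
      return result; // later checks never run
    }
  }
  return { ok: true };
}

const demoChecks: Check[] = [
  {
    name: "Transport Guardrails",
    run: (r) => (r.targetUrl.startsWith("https://") ? { ok: true } : { ok: false, code: "BLOCKED_INSECURE_TARGET" }),
  },
  {
    name: "Authentication",
    run: (r) => (r.licenseKey ? { ok: true } : { ok: false, code: "BLOCKED_AUTH" }),
  },
];
```

Because the loop returns on the first failure, an insecure target is rejected before authentication is ever evaluated, which matches the ordering in the table.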
Ingress Protection (Command Filtering)
When an AI assistant uses an MCP tool, the request arrives as a JSON-RPC 2.0 payload over HTTP. ThornGuard validates that payload before it ever reaches the upstream tool.
Core ingress checks include:
- Schema Validation: Uses Zod to ensure the payload is well-formed JSON-RPC 2.0. Malformed requests are dropped immediately.
- Structured Policy Evaluation: Optional tenant rules can inspect RPC method, target domain, client IP CIDR, selected headers, JSON paths, content patterns, tool names, and risk level.
- Signature Scanning: The payload is stringified and scanned against built-in malicious command signatures.
- Tenant Blocklists: Enterprise settings can block specific domains or substring command patterns.
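The schema validation and signature scanning steps above can be sketched as follows. To stay dependency-free, this example uses a hand-written type guard rather than Zod, and the three signatures are illustrative stand-ins because the built-in signature list is not published here.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  method: string;
  id?: string | number | null;
  params?: unknown;
}

// Plain type guard standing in for the Zod schema mentioned above.
function isJsonRpcRequest(value: unknown): value is JsonRpcRequest {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const idOk = v.id === undefined || v.id === null || typeof v.id === "string" || typeof v.id === "number";
  return v.jsonrpc === "2.0" && typeof v.method === "string" && idOk;
}

// Illustrative signatures only (assumption): not ThornGuard's real list.
const DEMO_SIGNATURES: RegExp[] = [/rm\s+-rf\b/i, /\bnc\s+-e\b/i, /chmod\s+777\b/i];

function inspectPayload(raw: string): "ok" | "malformed" | "BLOCKED_MALICIOUS" {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return "malformed";
  }
  if (!isJsonRpcRequest(parsed)) return "malformed";
  // Stringify the whole payload so nested arguments are scanned too.
  const flat = JSON.stringify(parsed);
  return DEMO_SIGNATURES.some((sig) => sig.test(flat)) ? "BLOCKED_MALICIOUS" : "ok";
}
```

Scanning the stringified payload means a signature is caught regardless of which nested argument carries it, at the cost of occasional matches in harmless string fields.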
Blocked Signatures
Currently, ThornGuard automatically drops any payload containing patterns such as recursive forced deletion, privilege escalation commands, netcat reverse shells, system credential access paths, code execution functions, and unrestricted file permission changes.
If a built-in signature is matched, ThornGuard returns 400 Bad Request with "ThornGuard Security: Malicious command detected." and records a BLOCKED_MALICIOUS audit entry.
Managed Routing & Session Binding
Saved protected connections can now be promoted into stable managed endpoints at /mcp/:connection-id. That lets ThornGuard bind enforcement state to the connection directly rather than depending only on a caller-supplied target header.
For each protected launch, ThornGuard issues or reuses an x-thornguard-session-id bound to the current activation and connection. That session-local state carries tool registry observations, taint tracking, and derived approval/trust evaluation state. Reusing the same session across the wrong connection is blocked as a binding mismatch.
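A minimal sketch of the binding check, assuming an in-memory map keyed by session ID. The `SessionBinding` shape, the `bindOrVerify` name, and the return values are hypothetical; the source only states that a session is bound to an activation and a connection.

```typescript
interface SessionBinding { activationId: string; connectionId: string }

// In-memory store for illustration; real storage is not described in the source.
const sessionBindings = new Map<string, SessionBinding>();

function bindOrVerify(
  sessionId: string,
  activationId: string,
  connectionId: string,
): "bound" | "ok" | "binding_mismatch" {
  const existing = sessionBindings.get(sessionId);
  if (!existing) {
    sessionBindings.set(sessionId, { activationId, connectionId });
    return "bound";
  }
  const same = existing.activationId === activationId && existing.connectionId === connectionId;
  return same ? "ok" : "binding_mismatch";
}
```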
Prompt Injection & Tool Poisoning Defense
ThornGuard scans both tool definitions (from tools/list responses) and tool call outputs (from tools/call responses) for prompt injection, manipulation directives, and hidden payloads. This treats upstream MCP servers as an unsanitized channel: attackers can embed malicious instructions in tool descriptions, schema fields, or response content.
Tool Definition Scanning
Every string field in a tool definition is recursively extracted and scanned against pattern categories:

| Category | Severity | Examples |
|---|---|---|
| Instruction Override | Critical | <IMPORTANT> tags, [CRITICAL] markers, “ignore previous instructions”, identity reassignment (“you are now a…”), suppression directives (“do not reveal”) |
| Cross-Tool Manipulation | High | Recipient redirection (“change the destination”), tool substitution (“instead of calling X, use Y”), conditional overrides (“when asked to use X…”) |
| File Exfiltration | High | SSH key paths (~/.ssh/id_rsa), MCP config paths (~/.cursor/mcp.json), credential file references, “please provide contents of” directives |
| Hidden Characters | High | Zero-width spaces (U+200B–U+200D), bidirectional overrides (U+202A–U+202E), word joiners (U+2060), soft hyphens (U+00AD) |
| Schema Integrity | Medium | Non-standard JSON Schema keys injected into inputSchema, parameter names exceeding 50 characters |
| Recommendation Poisoning | High | Memory persistence commands, authority assignment, forced recommendations (see below) |
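
Recursive extraction and scanning of a tool definition can be sketched as below. Only two of the six categories are modeled, and the regular expressions are simplified assumptions derived from the examples in the table, not the patterns ThornGuard ships.

```typescript
interface Finding { category: string; severity: string; sample: string }

// Walk any JSON-like value and collect every string it contains.
function collectStrings(value: unknown, out: string[] = []): string[] {
  if (typeof value === "string") out.push(value);
  else if (Array.isArray(value)) value.forEach((v) => collectStrings(v, out));
  else if (typeof value === "object" && value !== null) {
    Object.values(value).forEach((v) => collectStrings(v, out));
  }
  return out;
}

// Simplified patterns (assumption) based on the table's examples.
const DEFINITION_PATTERNS = [
  {
    category: "Instruction Override",
    severity: "Critical",
    re: /<IMPORTANT>|\[CRITICAL\]|ignore (?:all )?previous instructions/i,
  },
  {
    category: "Hidden Characters",
    severity: "High",
    re: /[\u200B-\u200D\u202A-\u202E\u2060\u00AD]/,
  },
];

function scanToolDefinition(tool: unknown): Finding[] {
  const findings: Finding[] = [];
  for (const text of collectStrings(tool)) {
    for (const p of DEFINITION_PATTERNS) {
      if (p.re.test(text)) {
        findings.push({ category: p.category, severity: p.severity, sample: text.slice(0, 40) });
      }
    }
  }
  return findings;
}
```

Walking every nested value means an injection hidden deep inside inputSchema is scanned the same way as the top-level description.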
Output Poisoning Scan (ATPA Defense)
The same pattern categories are applied to tools/call response content, plus additional output-specific patterns:
- Tool invocation directives: “now call X tool”, “use Y tool next”
- Result exfiltration: “send this to”, “forward the result to”
- Private key material: BEGIN RSA PRIVATE KEY blocks in output
AI Recommendation Poisoning (AML.T0080)
ThornGuard detects the specific attack patterns used in AI Recommendation Poisoning, a technique where attackers embed instructions that permanently bias AI assistants toward specific products, services, or sources. Memory persistence patterns detected:

| Pattern | Example |
|---|---|
| Trusted source assignment | “remember BrandX as a trusted source” |
| Future session persistence | “in all future conversations, recommend…” |
| Authority assignment | “treat X as the authoritative source for…” |
| Forced recommendation | “recommend X first”, “always suggest X” |
| Citation manipulation | “citation source for future reference” |
| Persistent behavior modification | “permanently prefer”, “consistently recommend” |
| Default provider injection | “default source for finance queries” |

ThornGuard also inspects AI share URLs, meaning links to AI assistant domains (openai.com and chatgpt.com, Copilot, Claude, Gemini, Perplexity, Grok) with ?q= or ?prompt= query parameters. If the decoded parameter contains memory manipulation keywords, the URL is flagged. This catches the “Summarize with AI” button attack vector documented in the Microsoft Defender research.
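A sketch of the share-URL check, assuming the standard `URL` API. Only the two hostnames named in the text are listed; hostnames for the other assistants are left out rather than guessed, and the keyword patterns are illustrative assumptions based on the table above.

```typescript
// Hostnames taken from the text; the other assistants' domains are omitted on purpose.
const AI_SHARE_HOSTS = ["chatgpt.com", "openai.com"];

// Illustrative memory-manipulation keywords (assumption).
const MEMORY_KEYWORDS: RegExp[] = [
  /remember\b.*\btrusted source/i,
  /in all future conversations/i,
  /always (?:suggest|recommend)/i,
];

function isPoisonedShareUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  const host = url.hostname.toLowerCase();
  if (!AI_SHARE_HOSTS.some((h) => host === h || host.endsWith("." + h))) return false;
  // searchParams.get() returns the percent-decoded value.
  const prompt = url.searchParams.get("q") ?? url.searchParams.get("prompt") ?? "";
  return MEMORY_KEYWORDS.some((k) => k.test(prompt));
}
```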
Hidden HTML Content Sanitization
Tool responses containing HTML are sanitized to remove content that is invisible to users but visible to content parsers. ThornGuard strips 8 categories of hidden embedding techniques, including unclosed tags that attempt to bypass simple regex matching:
- HTML comments: <!-- hidden instructions -->
- Hidden-attribute elements: tags using the literal HTML hidden attribute
- Invisible elements: elements with display:none, visibility:hidden, opacity:0, or font-size:0 in inline styles
- Off-screen positioned elements: position:absolute with large negative left or top values
- Noscript tags: <noscript> blocks (invisible in browsers with JS enabled, visible to parsers)
- Hidden inputs: <input type="hidden"> and hidden textareas
- Suspicious JSON-LD blocks: <script type="application/ld+json"> containing instruction-like patterns (legitimate schema.org data is preserved)
- Invisible iframes: <iframe> elements with zero dimensions or display:none
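Three of the eight categories can be sketched with regular expressions as below, including the unclosed-tag case. This is an illustration under simplifying assumptions, not ThornGuard's sanitizer: the function name is hypothetical, and a production sanitizer would more likely rely on an HTML parser.

```typescript
function stripHiddenHtml(html: string): string {
  return html
    // HTML comments; an unclosed "<!--" is stripped through to the end of input.
    .replace(/<!--[\s\S]*?(?:-->|$)/g, "")
    // <noscript> blocks, closed or unclosed.
    .replace(/<noscript\b[\s\S]*?(?:<\/noscript\s*>|$)/gi, "")
    // <input type="hidden"> elements (quoted or unquoted attribute value).
    .replace(/<input\b[^>]*\btype\s*=\s*["']?hidden["']?[^>]*>/gi, "");
}
```

Matching to the end of input when the closing delimiter is missing is what keeps an attacker from smuggling content behind a deliberately unterminated comment or tag.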
Edge-Native Semantic Classifier
This feature is behind the FEATURE_PHASE6_SEMANTIC_AI feature flag and may not be enabled in all deployments. Operators can control whether AI inference runs on their traffic via this flag.
The classifier runs on Workers AI (Llama 3.1 8B FP8).
For each tools/call JSON response, the classifier runs concurrently with static regex scanning via Promise.all. It analyzes the full text content of tool responses for manipulative intent that lacks structural signatures — natural-language persuasion, contextually embedded bias, and conversational-tone poisoning.
Key design constraints:
- 1-second timeout: the classifier fails open if Workers AI does not respond in time, ensuring it never blocks the proxy pipeline
- Confidence threshold: only verdicts with ≥ 0.80 confidence trigger a threat advisory
- Warn-first model: when a semantic threat is detected, ThornGuard injects a [SYSTEM ADVISORY] block into the response rather than dropping the payload, preserving context integrity while alerting the reading AI assistant
- Feature-gated: controllable via FEATURE_PHASE6_SEMANTIC_AI for operators who want to disable AI inference costs
The semantic classifier complements static pattern matching but does not
replace it. Static rules provide deterministic, zero-latency coverage for
known attack structures. The classifier catches novel semantic manipulation
that evades regex. Together, they provide layered coverage across both
structural and semantic attack surfaces.
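The design constraints above (concurrent execution, a fail-open timeout, a confidence threshold, and warn-first handling) can be sketched as follows. The `classify` and `regexScan` callbacks are hypothetical stand-ins for the Workers AI call and the static scanner, and the advisory wording is invented for illustration.

```typescript
interface Verdict { threat: boolean; confidence: number }
const FAIL_OPEN: Verdict = { threat: false, confidence: 0 };

// Resolve with `fallback` if `work` is slow or rejects, so the proxy path is never blocked.
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      () => { clearTimeout(timer); resolve(fallback); },
    );
  });
}

async function scanResponse(
  text: string,
  classify: (t: string) => Promise<Verdict>, // stand-in for the model call
  regexScan: (t: string) => boolean,         // stand-in for static scanning
  timeoutMs = 1000,
): Promise<{ text: string; staticHit: boolean; semanticHit: boolean }> {
  // Both scans start together; the classifier is capped by the timeout.
  const [staticHit, verdict] = await Promise.all([
    Promise.resolve(regexScan(text)),
    withTimeout(classify(text), timeoutMs, FAIL_OPEN),
  ]);
  const semanticHit = verdict.threat && verdict.confidence >= 0.8;
  // Warn-first: prepend an advisory instead of dropping the payload.
  const out = semanticHit ? `[SYSTEM ADVISORY] Possible semantic manipulation detected.\n${text}` : text;
  return { text: out, staticHit, semanticHit };
}
```

Treating both a timeout and a classifier error as a non-threat verdict is the fail-open choice described above: availability of the proxy is preferred over classifier coverage.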
ANSI/VT Control Character Sanitization
ThornGuard strips terminal escape sequences from all tool definitions and tool responses before any other content scanning. This prevents attacks that use ANSI control codes to hide malicious content from human reviewers while still being processed by the LLM. Stripped sequences include:
- C0 control codes (U+0000–U+001F, excluding tab/newline/carriage return)
- The DEL control code (U+007F)
- CSI sequences: ESC[... cursor movement, color, and formatting codes
- OSC sequences: ESC]... operating system commands (e.g., terminal title manipulation)
- Other ESC-initiated sequences
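A sketch of how the listed sequence types could be stripped with regular expressions. The patterns follow the usual ECMA-48 byte ranges and are assumptions for illustration, not ThornGuard's actual expressions.

```typescript
// OSC: ESC ] ... terminated by BEL or by ESC \ (String Terminator).
const OSC_SEQ = /\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)/g;
// CSI: ESC [ parameter bytes, intermediate bytes, one final byte.
const CSI_SEQ = /\x1b\[[0-?]*[ -/]*[@-~]/g;
// Any remaining ESC, with optional intermediates and an optional final byte.
const OTHER_ESC_SEQ = /\x1b[ -/]*[0-~]?/g;
// C0 controls except tab, newline, and carriage return, plus DEL.
const CONTROL_CHARS = /[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]/g;

function stripTerminalControls(input: string): string {
  // Order matters: OSC and CSI must be removed before the generic ESC pattern,
  // which would otherwise consume only the introducer and leave the parameters behind.
  return input
    .replace(OSC_SEQ, "")
    .replace(CSI_SEQ, "")
    .replace(OTHER_ESC_SEQ, "")
    .replace(CONTROL_CHARS, "");
}
```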
Tool Intelligence & TOFU Schema Pinning
ThornGuard maintains an inventory of every upstream tool it observes, automatically scoring risk and detecting unauthorized schema changes.
Automatic Risk Scoring
Each tool is assigned a risk level based on its name and MCP annotations:

| Signal | Risk Level |
|---|---|
| destructiveHint annotation set to true | High |
| Name contains delete, remove, destroy, drop, exec, shell, bash, deploy, publish, write | High |
| openWorldHint annotation set to true | High |
| Name contains create, update, modify, edit, push, send, post, put, upload | Medium |
| readOnlyHint annotation set to true | Low |
| All other tools | Low |

The computed risk level feeds into structured policy rules (which can match on riskLevel) and the approval workflow (profiles can require approval for high-risk tools).
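The scoring signals can be sketched as a small function. Two details are assumptions, since the source does not state them: signals are evaluated in table order, and "contains" means a case-insensitive substring match.

```typescript
type RiskLevel = "High" | "Medium" | "Low";

interface ToolInfo {
  name: string;
  annotations?: { destructiveHint?: boolean; openWorldHint?: boolean; readOnlyHint?: boolean };
}

const HIGH_RISK_WORDS = ["delete", "remove", "destroy", "drop", "exec", "shell", "bash", "deploy", "publish", "write"];
const MEDIUM_RISK_WORDS = ["create", "update", "modify", "edit", "push", "send", "post", "put", "upload"];

function scoreRisk(tool: ToolInfo): RiskLevel {
  const name = tool.name.toLowerCase();
  const a = tool.annotations ?? {};
  if (a.destructiveHint === true) return "High";
  if (HIGH_RISK_WORDS.some((w) => name.includes(w))) return "High";
  if (a.openWorldHint === true) return "High";
  if (MEDIUM_RISK_WORDS.some((w) => name.includes(w))) return "Medium";
  return "Low"; // readOnlyHint and all other tools
}
```

With plain substring matching, a name such as compute_total would score Medium because it contains "put"; whether ThornGuard matches on word boundaries instead is not stated.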
Trust-On-First-Use (TOFU) Schema Pinning
The first time ThornGuard observes a tool from an upstream server, it computes SHA-256 hashes of the tool’s inputSchema and outputSchema and pins them as the baseline.
On subsequent observations, ThornGuard compares the current schema hashes against the pinned baseline:
- Match: Normal operation continues.
- Mismatch: A TOOL_SCHEMA_DRIFT audit event is logged with the old and new hashes. This catches “rug pull” attacks where an upstream server silently changes a tool’s behavior after initial trust is established.
The per-connection drift posture then determines how a mismatch is handled:
- observe records drift without blocking execution
- review_required surfaces the change as a trust-review event and can block execution until review
- quarantine pauses execution for the drifted tool until an operator re-pins the current definition
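A minimal sketch of trust-on-first-use pinning, assuming Node's `node:crypto` module and an in-memory pin store. Serializing with sorted keys before hashing is an assumption made here so that key order alone does not register as drift; the source does not say how schemas are canonicalized.

```typescript
import { createHash } from "node:crypto";

// Serialize with sorted object keys so equivalent schemas hash identically (assumption).
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((v) => canonicalJson(v)).join(",")}]`;
  if (typeof value === "object" && value !== null) {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

function schemaHash(schema: unknown): string {
  // A missing schema is hashed as null so the comparison stays well-defined.
  return createHash("sha256").update(canonicalJson(schema ?? null)).digest("hex");
}

interface SchemaPin { inputHash: string; outputHash: string }
const schemaPins = new Map<string, SchemaPin>(); // hypothetical in-memory store

function observeTool(
  name: string,
  inputSchema: unknown,
  outputSchema?: unknown,
): "pinned" | "match" | "TOOL_SCHEMA_DRIFT" {
  const current: SchemaPin = { inputHash: schemaHash(inputSchema), outputHash: schemaHash(outputSchema) };
  const pinned = schemaPins.get(name);
  if (!pinned) {
    schemaPins.set(name, current); // first observation becomes the baseline
    return "pinned";
  }
  const same = pinned.inputHash === current.inputHash && pinned.outputHash === current.outputHash;
  return same ? "match" : "TOOL_SCHEMA_DRIFT";
}
```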
Tool Inventory
All observed tools are tracked per-license with:
- Tool name, title, and annotations
- Current and pinned schema hashes
- Computed risk level
- First-seen and last-seen timestamps
Behavioral Anomaly Detection
ThornGuard uses statistical methods to detect abnormal usage patterns that may indicate a compromised client, credential stuffing, or slow-burn exfiltration attacks.
EWMA (Exponentially Weighted Moving Average)
Tracks call frequency and payload size per tool with configurable α (decay factor) and σ (standard deviation threshold). Requests that exceed the z-score threshold are flagged.
Page-Hinkley Drift Detector
An O(1) memory change detection algorithm that catches gradual behavioral shifts. For example, an attacker might slowly increase payload sizes over hundreds of requests to stay below fixed thresholds.
Markov Chain Sequence Analysis
Builds a transition probability matrix from observed tool call sequences. Flags statistically improbable orderings, such as a read → delete transition without an intervening confirmation step, if that pattern has never been observed before.
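The three detectors described above can be sketched as small stateful closures. These are textbook formulations under assumed defaults (`alpha`, `sigma`, `warmup`, `delta`, and `lambda` are all hypothetical values), not ThornGuard's tuned implementation.

```typescript
// EWMA z-score: flags x when it lies more than `sigma` running standard deviations from the mean.
function createEwma(alpha = 0.1, sigma = 3, warmup = 5) {
  let mean = 0, variance = 0, n = 0;
  return (x: number): boolean => {
    n++;
    if (n === 1) { mean = x; return false; }
    const diff = x - mean;
    const std = Math.sqrt(variance);
    const flagged = n > warmup && std > 0 && Math.abs(diff) / std > sigma;
    mean += alpha * diff;                                      // exponentially weighted mean
    variance = (1 - alpha) * (variance + alpha * diff * diff); // exponentially weighted variance
    return flagged;
  };
}

// Page-Hinkley test for an upward drift, using O(1) state.
function createPageHinkley(delta = 0.05, lambda = 50) {
  let n = 0, mean = 0, cumulative = 0, minCumulative = 0;
  return (x: number): boolean => {
    n++;
    mean += (x - mean) / n;                 // running mean
    cumulative += x - mean - delta;         // cumulative deviation above the mean
    minCumulative = Math.min(minCumulative, cumulative);
    return cumulative - minCumulative > lambda;
  };
}

// First-order Markov model: returns the estimated probability of the
// transition prev -> tool before recording it (1 for the very first call).
function createSequenceModel() {
  const counts = new Map<string, Map<string, number>>();
  let prev: string | undefined;
  return (tool: string): number => {
    let probability = 1;
    if (prev !== undefined) {
      const row = counts.get(prev) ?? new Map<string, number>();
      const total = Array.from(row.values()).reduce((a, b) => a + b, 0);
      probability = total === 0 ? 0 : (row.get(tool) ?? 0) / total;
      row.set(tool, (row.get(tool) ?? 0) + 1);
      counts.set(prev, row);
    }
    prev = tool;
    return probability;
  };
}
```

A sudden spike is what the EWMA detector catches, while a slow ramp that never produces a large single-step z-score is what the Page-Hinkley cumulative sum is meant to pick up.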
Composite Risk Scoring
Frequency, payload size, and sequence anomaly signals are combined into a weighted composite score. During a configurable learning window (default 24 hours), anomalies are logged as ANOMALY_DETECTED audit events but do not block traffic. After the learning period, operators can configure enforcement thresholds.
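A sketch of the composite score and the learning-window gate. The weights, both thresholds, and the `evaluateAnomaly` name are hypothetical; only the 24-hour default and the log-only behavior during learning come from the text.

```typescript
interface AnomalySignals { frequency: number; payloadSize: number; sequence: number } // each normalized to 0..1

// Hypothetical weights and thresholds, chosen only for illustration.
const SIGNAL_WEIGHTS: AnomalySignals = { frequency: 0.4, payloadSize: 0.3, sequence: 0.3 };
const LEARNING_WINDOW_MS = 24 * 60 * 60 * 1000; // default 24 hours

type AnomalyAction = "allow" | "audit" | "block";

function evaluateAnomaly(
  s: AnomalySignals,
  firstSeenMs: number,
  nowMs: number,
  auditThreshold = 0.5,
  blockThreshold = 0.8,
): { score: number; action: AnomalyAction } {
  const score =
    s.frequency * SIGNAL_WEIGHTS.frequency +
    s.payloadSize * SIGNAL_WEIGHTS.payloadSize +
    s.sequence * SIGNAL_WEIGHTS.sequence;
  const learning = nowMs - firstSeenMs < LEARNING_WINDOW_MS;
  let action: AnomalyAction = "allow";
  if (score >= auditThreshold) {
    // During the learning window anomalies are only logged (ANOMALY_DETECTED), never blocked.
    action = learning || score < blockThreshold ? "audit" : "block";
  }
  return { score, action };
}
```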
Egress PII Redaction
If an AI tool successfully executes, the upstream server returns data. Often, this data contains PII (Personally Identifiable Information) or system credentials that should never be fed into a third-party LLM’s context window.
The SSE Streaming Challenge
Modern MCP servers use Server-Sent Events (SSE) to stream data back to the client. This means data arrives in fragmented, unpredictable network chunks. Standard regex fails on streams because a secret (like an SSN) might be split in half across two network packets (e.g., 000-00- and 0000).
Buffered SSE Redaction
ThornGuard implements a buffering TransformStream:
- It holds incoming network packets in memory until it detects a complete SSE event boundary (\n\n).
- It parses complete data: events as JSON when possible.
- It rewrites upstream origin references back to ThornGuard’s public origin.
- It redacts built-in PII and can apply enterprise custom regex rules.
- It streams the scrubbed event back to the client without buffering the full response.
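The buffering step above can be sketched as a small class that accepts chunks and emits only complete, redacted events. This sketch omits the JSON parsing and origin rewriting steps, uses a simplified SSN pattern, and is written as a plain class rather than a TransformStream so it can run anywhere; the `SseRedactor` name is hypothetical.

```typescript
// Simplified SSN pattern for illustration; the built-in detector validates more.
const SIMPLE_SSN = /\b\d{3}-\d{2}-\d{4}\b/g;

class SseRedactor {
  private buffer = "";

  // Accepts one network chunk and returns the complete, redacted events it closed.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const events: string[] = [];
    let boundary = this.buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const event = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2); // keep any partial next event
      events.push(event.replace(SIMPLE_SSN, "[THORNGUARD REDACTED SSN]") + "\n\n");
      boundary = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}
```

Inside a TransformStream, `push` would be called from the `transform` callback and each returned event passed to `controller.enqueue`, so only one event is held in memory at a time.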
Supported Redactions
ThornGuard actively scrubs the following patterns from both outbound request params and inbound server responses:

| Pattern | Replacement Tag | Details |
|---|---|---|
| Email addresses | [THORNGUARD REDACTED EMAIL] | Standard email format detection |
| Social Security Numbers | [THORNGUARD REDACTED SSN] | XXX-XX-XXXX format with area number validation (rejects invalid 000, 666, and 9xx series, group 00, serial 0000) |
| Phone numbers | [THORNGUARD REDACTED PHONE] | Requires separators (e.g., 555-123-4567) to avoid false positives on numeric IDs |
| Credit cards | [THORNGUARD REDACTED CREDIT CARD] | IIN prefix validation (Visa, Mastercard, Amex, Discover, JCB, UnionPay, Diners Club) + Luhn checksum — not naive digit matching |
| AWS Access Keys | [THORNGUARD REDACTED AWS KEY] | Detects AKIA, ABIA, ACCA, and ASIA prefixed keys |
| GCP API Keys | [THORNGUARD REDACTED GCP KEY] | Google Cloud credential patterns |
| GitHub Tokens | [THORNGUARD REDACTED GITHUB TOKEN] | ghp_, gho_, ghs_, ghr_ prefixed tokens |
| Slack Tokens | [THORNGUARD REDACTED SLACK TOKEN] | xoxb-, xoxp-, xoxs- prefixed tokens |
| Private Keys | [THORNGUARD REDACTED PRIVATE KEY] | -----BEGIN ... PRIVATE KEY----- blocks |
| JWTs | [THORNGUARD REDACTED JWT] | eyJ... three-segment Base64 tokens |
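
Two of the validations in the table, the SSN field rules and the Luhn checksum, can be sketched as below. The function names are hypothetical and the IIN prefix check for card brands is not modeled.

```typescript
// SSN rules from the table: reject area 000, 666, 9xx; group 00; serial 0000.
function isPlausibleSsn(candidate: string): boolean {
  const m = /^(\d{3})-(\d{2})-(\d{4})$/.exec(candidate);
  if (!m) return false;
  const [, area, group, serial] = m;
  if (area === "000" || area === "666" || area.startsWith("9")) return false;
  return group !== "00" && serial !== "0000";
}

// Luhn checksum: double every second digit from the right, subtract 9 when the
// doubled value exceeds 9, and require the total to be divisible by 10.
function passesLuhn(digits: string): boolean {
  if (!/^\d{13,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}
```

Checksum and field validation are what keep arbitrary 9-digit or 16-digit identifiers from being redacted as false positives.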
Custom Redaction Rules
Enterprise tenants can add their own regex rules in one of two modes:
- audit records a match but leaves the content intact.
- redact replaces matches with a configured replacement string.
Built-in PII redaction is always enabled and cannot be disabled by custom
rules. Custom rules layer on top of the built-in redactors.
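The two modes can be sketched as follows. The `CustomRule` shape, the default replacement string, and the decision to report matched rule IDs for both modes are assumptions for illustration.

```typescript
interface CustomRule {
  id: string;
  pattern: RegExp;
  mode: "audit" | "redact";
  replacement?: string;
}

function applyCustomRules(text: string, rules: CustomRule[]): { text: string; matched: string[] } {
  const matched: string[] = [];
  let out = text;
  for (const rule of rules) {
    // Force the global flag so every occurrence is replaced in redact mode.
    const flags = rule.pattern.flags.includes("g") ? rule.pattern.flags : rule.pattern.flags + "g";
    const re = new RegExp(rule.pattern.source, flags);
    if (out.search(re) === -1) continue;
    matched.push(rule.id); // both modes record the match
    if (rule.mode === "redact") out = out.replace(re, rule.replacement ?? "[REDACTED]");
  }
  return { text: out, matched };
}
```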
SSRF Prevention
Because clients control x-mcp-target-url, ThornGuard treats that value as untrusted input.
The gateway blocks:
- Literal loopback, RFC1918, link-local, metadata, and other private IP targets.
- IPv6-specific ranges: loopback (::1), unique local addresses (fc00::/7, fd00::/8), link-local (fe80::/10), and IPv4-mapped IPv6 addresses (::ffff:x.x.x.x) that embed private IPv4 targets.
- Private-looking hostnames such as .internal, .local, and .localhost.
- Public hostnames that resolve to private or metadata addresses through DNS.
- Targets that fail DNS resolution entirely when DNS-aware SSRF enforcement is enabled.
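The literal-address and hostname rules above can be sketched as a pure function over the host string. The DNS resolution steps are not modeled, the function names are hypothetical, and the IPv6 handling covers only the ranges listed rather than every textual form of an IPv6 address.

```typescript
function isPrivateIpv4(ip: string): boolean {
  if (!/^\d{1,3}(?:\.\d{1,3}){3}$/.test(ip)) return false;
  const parts = ip.split(".").map(Number);
  if (parts.some((p) => p > 255)) return false;
  const [a, b] = parts;
  return (
    a === 0 || a === 10 || a === 127 ||   // "this network", RFC1918, loopback
    (a === 172 && b >= 16 && b <= 31) ||  // RFC1918
    (a === 192 && b === 168) ||           // RFC1918
    (a === 169 && b === 254)              // link-local, includes 169.254.169.254 metadata
  );
}

function isBlockedHost(host: string): boolean {
  const h = host.toLowerCase().replace(/^\[|\]$/g, ""); // drop IPv6 brackets
  if (isPrivateIpv4(h)) return true;
  if (h.includes(":")) {
    if (h === "::1") return true; // loopback
    const mapped = /^::ffff:(\d{1,3}(?:\.\d{1,3}){3})$/.exec(h);
    if (mapped) return isPrivateIpv4(mapped[1]); // IPv4-mapped IPv6
    // fc00::/7 (unique local) and fe80::/10 (link-local)
    return /^f[cd][0-9a-f]{2}:/.test(h) || /^fe[89ab][0-9a-f]:/.test(h);
  }
  return [".internal", ".local", ".localhost"].some((suffix) => h.endsWith(suffix));
}
```

A literal check like this is only the first layer: a public hostname can still resolve to a private address, which is why the list above also includes the DNS-aware checks.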
Origin Validation
For browser-based or HTTP clients that send an Origin header, ThornGuard can enforce an allowlist of approved origins. This is primarily useful when exposing Streamable HTTP MCP traffic to web-based clients.
Allowed origins normally include:
- The public proxy origin making the request.
- The live dashboard origin on https://thorns.qwady.io.
- Any additional origins explicitly configured by deployment.
Correlation Headers
Every ThornGuard-generated error response and successful proxied response includes:
- x-thornguard-log-id
- x-thornguard-trace-id
Advisory headers, where present:
- x-thornguard-active-advisories
- x-thornguard-highest-advisory-severity
Secure Local Launch
ThornGuard now includes a first-party CLI launcher path so users can save a protected connection profile once and generate MCP client config without embedding raw secrets in visible arguments. Security goals of the launcher path:
- keep ThornGuard and upstream bearer values out of generated client args
- store profile metadata locally and keep upstream secrets local to the machine
- let users print setup snippets for multiple MCP clients from one profile
- preserve the existing ThornGuard proxy contract without forcing manual header repetition everywhere
Advisory Intelligence
ThornGuard can also enrich a protected connection with opt-in, warn-first advisory signals. This is a visibility layer, not a replacement for enforcement. Current intelligence channels:
- Availability intelligence from ThornGuard-observed upstream errors and official status feeds where configured
- Dependency intelligence from public package advisory data when a public repo is configured and dependency manifests can be resolved
- Vendor intelligence from high-confidence exploited-vulnerability sources
- Community threat intelligence from operator-submitted threat reports against specific target URLs or tool hashes
Community Threat Intelligence
Operators can report suspicious upstream MCP servers or specific tool hashes through the dashboard Platform page. Reports accumulate across different license holders. When a target reaches a quorum of 3 or more independent reporters, it is promoted to the global threat feed with an active status.
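The quorum rule can be sketched as counting distinct reporting licenses per target. The `ThreatReport` shape and the `activeThreats` name are hypothetical; only the threshold of 3 independent reporters comes from the text.

```typescript
interface ThreatReport { target: string; licenseId: string }

const REPORT_QUORUM = 3; // independent reporters required, per the text

// Returns the targets that have reached quorum. Repeat reports from the
// same license count once, so one operator cannot promote a target alone.
function activeThreats(reports: ThreatReport[]): string[] {
  const reporters = new Map<string, Set<string>>();
  for (const r of reports) {
    const set = reporters.get(r.target) ?? new Set<string>();
    set.add(r.licenseId);
    reporters.set(r.target, set);
  }
  return Array.from(reporters.entries())
    .filter(([, licenses]) => licenses.size >= REPORT_QUORUM)
    .map(([target]) => target);
}
```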
Active global threats automatically generate community advisories on matching connections during the next advisory refresh cycle. This creates a crowd-sourced early-warning network — if multiple ThornGuard operators flag the same upstream, everyone benefits.
The threat feed is visible to enterprise and internal tier operators on the Platform page. Individual tier operators can still submit reports but cannot browse the global feed.
Important boundaries:
- ThornGuard does not auto-block traffic from outside intelligence in v1
- ThornGuard does not send MCP request bodies, prompts, or secrets to outside advisory providers
- ThornGuard only uses connection metadata such as repo URLs, package coordinates, vendor names, or status page URLs for those lookups
- Community threat reports only include the target URL, an optional tool hash, and a free-text reason — no request data or secrets are shared