Skip to main content

Documentation Index

Fetch the complete documentation index at: https://qwady.wiki/llms.txt

Use this file to discover all available pages before exploring further.

ThornGuard is built on a defense-in-depth model. It treats the AI client, the target MCP server, and any legacy client-supplied routing inputs as potentially hostile until proven otherwise. That assume-hostile stance extends to tool definitions and tool responses, not just raw traffic. Upstream MCP servers are treated as an unsanitized channel — tool descriptions can contain prompt injection, and tool outputs can carry poisoned recommendations, hidden HTML, or exfiltration directives.

Security Pipeline

Every request passes through these checks in order. A failure at any step immediately terminates the request and logs the action.
StepCheckAction on Failure
1Transport GuardrailsReject non-HTTPS targets (BLOCKED_INSECURE_TARGET)
2Origin ValidationReject disallowed browser origins when enabled (BLOCKED_ORIGIN)
3DNS-Aware SSRF CheckReject private, metadata, or non-resolvable targets (BLOCKED_SSRF)
4AuthenticationValidate license key plus activation, or OAuth bearer token (BLOCKED_AUTH)
5IP WhitelistEnterprise per-license IP restrictions (BLOCKED_IP_WHITELIST)
6Rate LimitingEnforce per-license limits via Durable Object limiter (BLOCKED_RATE_LIMIT)
7Structured Policy RulesAudit or block matching request policy rules (POLICY_AUDIT, BLOCKED_POLICY)
8Built-in BlocklistsReject blocked domains or malicious/custom command matches (BLOCKED_CUSTOM_*, BLOCKED_MALICIOUS)
9Approval ChecksRequire approval for matched high-risk tool calls (BLOCKED_APPROVAL)
10Tool Definition ScanningFlag poisoned tool definitions for prompt injection, hidden characters, schema anomalies (TOOL_POISONING_DETECTED)
11TOFU Schema PinningDetect tool schema drift from first-observed baseline (TOOL_SCHEMA_DRIFT)
12Trust Review / QuarantineEnforce per-connection drift posture: observe, review-required, or quarantine (TOOL_REVIEW_REQUIRED, TOOL_QUARANTINED, BLOCKED_TOOL_*)
13PII Redaction (Outbound)Scrub sensitive request data (PII_REDACTED, CUSTOM_REDACTION_AUDIT)
14Behavioral Anomaly CheckFlag statistical outliers in call frequency, payload size, or tool sequences (ANOMALY_DETECTED)
15Proxy to UpstreamForward to target MCP server
16ANSI/VT Sanitization (Inbound)Strip terminal control characters from tool responses
17Hidden HTML Stripping (Inbound)Remove concealed content from tool responses (7 hidden embedding categories including unclosed tags)
18Output Poisoning Scan (Inbound)Detect recommendation poisoning, AI share URLs, exfiltration directives in tool responses (TOOL_POISONING_DETECTED)
19Semantic AI Classifier (Inbound)Edge-native LLM analysis for semantic manipulation that bypasses regex (TOOL_POISONING_DETECTED)
20PII Redaction (Inbound)Scrub JSON and SSE response data (PII_REDACTED, CUSTOM_REDACTION_AUDIT)
21Audit and CorrelationEmit structured D1 logs plus correlation headers
Built-in transport, auth, SSRF, and malicious-signature checks are never overridable. Customer policy rules are layered on top of those controls, not instead of them.

Ingress Protection (Command Filtering)

When an AI assistant uses an MCP tool, the request arrives as a JSON-RPC 2.0 payload over HTTP. ThornGuard validates that payload before it ever reaches the upstream tool. Core ingress checks include:
  1. Schema Validation: Uses Zod to ensure the payload is perfectly formatted JSON-RPC. Malformed requests are dropped immediately.
  2. Structured Policy Evaluation: Optional tenant rules can inspect RPC method, target domain, client IP CIDR, selected headers, JSON paths, content patterns, tool names, and risk level.
  3. Signature Scanning: The payload is stringified and scanned against built-in malicious command signatures.
  4. Tenant Blocklists: Enterprise settings can block specific domains or substring command patterns.

Blocked Signatures

Currently, ThornGuard automatically drops any payload containing patterns such as recursive forced deletion, privilege escalation commands, netcat reverse shells, system credential access paths, code execution functions, and unrestricted file permission changes.
If a built-in signature is matched, ThornGuard returns 400 Bad Request with "ThornGuard Security: Malicious command detected." and records a BLOCKED_MALICIOUS audit entry.

Managed Routing & Session Binding

Saved protected connections can now be promoted into stable managed endpoints at /mcp/:connection-id. That lets ThornGuard bind enforcement state to the connection directly rather than depending only on a caller-supplied target header. For each protected launch, ThornGuard issues or reuses a x-thornguard-session-id bound to the current activation and connection. That session-local state carries tool registry observations, taint tracking, and derived approval/trust evaluation state. Reusing the same session across the wrong connection is blocked as a binding mismatch.

Prompt Injection & Tool Poisoning Defense

ThornGuard scans both tool definitions (from tools/list responses) and tool call outputs (from tools/call responses) for prompt injection, manipulation directives, and hidden payloads. This treats upstream MCP servers as an unsanitized channel — attackers can embed malicious instructions in tool descriptions, schema fields, or response content.

Tool Definition Scanning

Every string field in a tool definition is recursively extracted and scanned against pattern categories:
CategorySeverityExamples
Instruction OverrideCritical<IMPORTANT> tags, [CRITICAL] markers, “ignore previous instructions”, identity reassignment (“you are now a…”), suppression directives (“do not reveal”)
Cross-Tool ManipulationHighRecipient redirection (“change the destination”), tool substitution (“instead of calling X, use Y”), conditional overrides (“when asked to use X…”)
File ExfiltrationHighSSH key paths (~/.ssh/id_rsa), MCP config paths (~/.cursor/mcp.json), credential file references, “please provide contents of” directives
Hidden CharactersHighZero-width spaces (U+200B–U+200D), bidirectional overrides (U+202A–U+202E), word joiners (U+2060), soft hyphens (U+00AD)
Schema IntegrityMediumNon-standard JSON Schema keys injected into inputSchema, parameter names exceeding 50 characters
Recommendation PoisoningHighMemory persistence commands, authority assignment, forced recommendations (see below)

Output Poisoning Scan (ATPA Defense)

The same pattern categories are applied to tools/call response content, plus additional output-specific patterns:
  • Tool invocation directives: “now call X tool”, “use Y tool next”
  • Result exfiltration: “send this to”, “forward the result to”
  • Private key material: BEGIN RSA PRIVATE KEY blocks in output
Output scanning runs on both JSON responses and SSE stream chunks.

AI Recommendation Poisoning (AML.T0080)

ThornGuard detects the specific attack patterns used in AI Recommendation Poisoning — a technique where attackers embed instructions that permanently bias AI assistants toward specific products, services, or sources. Memory persistence patterns detected:
PatternExample
Trusted source assignment”remember BrandX as a trusted source”
Future session persistence”in all future conversations, recommend…”
Authority assignment”treat X as the authoritative source for…”
Forced recommendation”recommend X first”, “always suggest X”
Citation manipulation”citation source for future reference”
Persistent behavior modification”permanently prefer”, “consistently recommend”
Default provider injection”default source for finance queries”
AI share URL detection: ThornGuard inspects URLs in tool responses that target 7 known AI assistant hosts (ChatGPT via both openai.com and chatgpt.com, Copilot, Claude, Gemini, Perplexity, Grok) with ?q= or ?prompt= query parameters. If the decoded parameter contains memory manipulation keywords, the URL is flagged. This catches the “Summarize with AI” button attack vector documented in the Microsoft Defender research.

Hidden HTML Content Sanitization

Tool responses containing HTML are sanitized to remove content that is invisible to users but visible to content parsers. ThornGuard strips 8 categories of hidden embedding techniques, including unclosed tags that attempt to bypass simple regex matching:
  1. HTML comments<!-- hidden instructions -->
  2. Hidden-attribute elements — tags using the literal HTML hidden attribute
  3. Invisible elements — elements with display:none, visibility:hidden, opacity:0, or font-size:0 in inline styles
  4. Off-screen positioned elementsposition:absolute with large negative left or top values
  5. Noscript tags<noscript> blocks (invisible in browsers with JS enabled, visible to parsers)
  6. Hidden inputs<input type="hidden"> and hidden textareas
  7. Suspicious JSON-LD blocks<script type="application/ld+json"> containing instruction-like patterns (legitimate schema.org data is preserved)
  8. Invisible iframes<iframe> elements with zero dimensions or display:none

Edge-Native Semantic Classifier

This feature is behind the FEATURE_PHASE6_SEMANTIC_AI feature flag and may not be enabled in all deployments. Operators can control whether AI inference runs on their traffic via this flag.
To address the inherent limitation of static pattern matching against semantic embedding attacks, ThornGuard includes an edge-native semantic classifier powered by Cloudflare Workers AI (Llama 3.1 8B FP8). For each tools/call JSON response, the classifier runs concurrently with static regex scanning via Promise.all. It analyzes the full text content of tool responses for manipulative intent that lacks structural signatures — natural-language persuasion, contextually embedded bias, and conversational-tone poisoning. Key design constraints:
  • 1-second timeout — the classifier fails open if Workers AI does not respond in time, ensuring it never blocks the proxy pipeline
  • Confidence threshold — only verdicts with ≥ 0.80 confidence trigger a threat advisory
  • Warn-first model — when a semantic threat is detected, ThornGuard injects a [SYSTEM ADVISORY] block into the response rather than dropping the payload, preserving context integrity while alerting the reading AI assistant
  • Feature-gated — controllable via FEATURE_PHASE6_SEMANTIC_AI for operators who want to disable AI inference costs
The semantic classifier complements static pattern matching but does not replace it. Static rules provide deterministic, zero-latency coverage for known attack structures. The classifier catches novel semantic manipulation that evades regex. Together, they provide layered coverage across both structural and semantic attack surfaces.

ANSI/VT Control Character Sanitization

ThornGuard strips terminal escape sequences from all tool definitions and tool responses before any other content scanning. This prevents attacks that use ANSI control codes to hide malicious content from human reviewers while still being processed by the LLM. Stripped sequences include:
  • C0 control codes (U+0000–U+001F, excluding tab/newline/carriage return)
  • C1 control codes (U+007F)
  • CSI sequencesESC[... cursor movement, color, and formatting codes
  • OSC sequencesESC]... operating system commands (e.g., terminal title manipulation)
  • Other ESC-initiated sequences
ThornGuard uses an inline regex implementation rather than the strip-ansi npm package. The strip-ansi package was the target of a supply chain compromise in September 2025. The inline approach eliminates this dependency risk entirely.

Tool Intelligence & TOFU Schema Pinning

ThornGuard maintains an inventory of every upstream tool it observes, automatically scoring risk and detecting unauthorized schema changes.

Automatic Risk Scoring

Each tool is assigned a risk level based on its name and MCP annotations:
SignalRisk Level
destructiveHint annotation set to trueHigh
Name contains delete, remove, destroy, drop, exec, shell, bash, deploy, publish, writeHigh
openWorldHint annotation set to trueHigh
Name contains create, update, modify, edit, push, send, post, put, uploadMedium
readOnlyHint annotation set to trueLow
All other toolsLow
Risk levels feed into the policy engine (rules can match on riskLevel) and the approval workflow (profiles can require approval for high-risk tools).

Trust-On-First-Use (TOFU) Schema Pinning

The first time ThornGuard observes a tool from an upstream server, it computes SHA-256 hashes of the tool’s inputSchema and outputSchema and pins them as the baseline. On subsequent observations, ThornGuard compares the current schema hashes against the pinned baseline:
  • Match: Normal operation continues.
  • Mismatch: A TOOL_SCHEMA_DRIFT audit event is logged with the old and new hashes. This catches “rug pull” attacks where an upstream server silently changes a tool’s behavior after initial trust is established.
The first-observed baseline is still TOFU. What changed is enforcement posture:
  • observe records drift without blocking execution
  • review_required surfaces the change as a trust-review event and can block execution until review
  • quarantine pauses execution for the drifted tool until an operator re-pins the current definition
Operators can re-pin tools (accept the new schema and clear trust state) or unpin them (disable schema pinning) via the management API.

Tool Inventory

All observed tools are tracked per-license with:
  • Tool name, title, and annotations
  • Current and pinned schema hashes
  • Computed risk level
  • First-seen and last-seen timestamps
The inventory is queryable via the management API.

Behavioral Anomaly Detection

ThornGuard uses statistical methods to detect abnormal usage patterns that may indicate a compromised client, credential stuffing, or slow-burn exfiltration attacks.

EWMA (Exponentially Weighted Moving Average)

Tracks call frequency and payload size per tool with configurable α (decay factor) and σ (standard deviation threshold). Requests that exceed the z-score threshold are flagged.

Page-Hinkley Drift Detector

An O(1) memory change detection algorithm that catches gradual behavioral shifts — for example, an attacker slowly increasing payload sizes over hundreds of requests to stay below fixed thresholds.

Markov Chain Sequence Analysis

Builds a transition probability matrix from observed tool call sequences. Flags statistically improbable orderings — for example, a readdelete transition without an intervening confirmation step, if that pattern has never been observed before.

Composite Risk Scoring

Frequency, payload size, and sequence anomaly signals are combined into a weighted composite score. During a configurable learning window (default 24 hours), anomalies are logged as ANOMALY_DETECTED audit events but do not block traffic. After the learning period, operators can configure enforcement thresholds.

Egress PII Redaction

If an AI tool successfully executes, the upstream server returns data. Often, this data contains PII (Personally Identifiable Information) or system credentials that should never be fed into a third-party LLM’s context window.

The SSE Streaming Challenge

Modern MCP servers use Server-Sent Events (SSE) to stream data back to the client. This means data arrives in fragmented, unpredictable network chunks. Standard regex fails on streams because a secret (like an SSN) might be split in half across two network packets (e.g., 000-00- and 0000).

Buffered SSE Redaction

ThornGuard implements an advanced TransformStream buffer:
  1. It holds incoming network packets in memory until it detects a complete SSE event boundary (\n\n).
  2. It parses complete data: events as JSON when possible.
  3. It rewrites upstream origin references back to ThornGuard’s public origin.
  4. It redacts built-in PII and can apply enterprise custom regex rules.
  5. It streams the scrubbed event back to the client without buffering the full response.

Supported Redactions

ThornGuard actively scrubs the following patterns from both outbound request params and inbound server responses:
PatternReplacement TagDetails
Email addresses[THORNGUARD REDACTED EMAIL]Standard email format detection
Social Security Numbers[THORNGUARD REDACTED SSN]XXX-XX-XXXX format with area code validation (rejects invalid 000, 666, and 9xx series, group 00, serial 0000)
Phone numbers[THORNGUARD REDACTED PHONE]Requires separators (e.g., 555-123-4567) to avoid false positives on numeric IDs
Credit cards[THORNGUARD REDACTED CREDIT CARD]IIN prefix validation (Visa, Mastercard, Amex, Discover, JCB, UnionPay, Diners Club) + Luhn checksum — not naive digit matching
AWS Access Keys[THORNGUARD REDACTED AWS KEY]Detects AKIA, ABIA, ACCA, and ASIA prefixed keys
GCP API Keys[THORNGUARD REDACTED GCP KEY]Google Cloud credential patterns
GitHub Tokens[THORNGUARD REDACTED GITHUB TOKEN]ghp_, gho_, ghs_, ghr_ prefixed tokens
Slack Tokens[THORNGUARD REDACTED SLACK TOKEN]xoxb-, xoxp-, xoxs- prefixed tokens
Private Keys[THORNGUARD REDACTED PRIVATE KEY]-----BEGIN ... PRIVATE KEY----- blocks
JWTs[THORNGUARD REDACTED JWT]eyJ... three-segment Base64 tokens

Custom Redaction Rules

Enterprise custom redaction rules can add tenant-owned regex rules in one of two modes:
  • audit records a match but leaves the content intact.
  • redact replaces matches with a configured replacement string.
Built-in PII redaction is always enabled and cannot be disabled by custom rules. Custom rules layer on top of the built-in redactors.

SSRF Prevention

Because clients control x-mcp-target-url, ThornGuard treats that value as untrusted input. The gateway blocks:
  • Literal loopback, RFC1918, link-local, metadata, and other private IP targets.
  • IPv6-specific ranges: loopback (::1), unique local addresses (fc00::/7, fd00::/8), link-local (fe80::/10), and IPv4-mapped IPv6 addresses (::ffff:x.x.x.x) that embed private IPv4 targets.
  • Private-looking hostnames such as .internal, .local, and .localhost.
  • Public hostnames that resolve to private or metadata addresses through DNS.
  • Targets that fail DNS resolution entirely when DNS-aware SSRF enforcement is enabled.
ThornGuard resolves hostnames through DNS-over-HTTPS, follows CNAME chains to a bounded depth, and fails closed if it cannot prove the target is safe.

Origin Validation

For browser-based or HTTP clients that send an Origin header, ThornGuard can enforce an allowlist of approved origins. This is primarily useful when exposing Streamable HTTP MCP traffic to web-based clients. Allowed origins normally include:
  • The public proxy origin making the request.
  • The live dashboard origin on https://thorns.qwady.io.
  • Any additional origins explicitly configured by deployment.

Correlation Headers

Every ThornGuard-generated error response and successful proxied response includes:
  • x-thornguard-log-id
  • x-thornguard-trace-id
Use these headers to connect client-visible failures back to the Audit Logs and any webhook deliveries. Successful proxied responses can also include advisory summary headers when an opt-in protected connection has active intelligence findings:
  • x-thornguard-active-advisories
  • x-thornguard-highest-advisory-severity
These are informational only in this milestone.

Secure Local Launch

ThornGuard now includes a first-party CLI launcher path so users can save a protected connection profile once and generate MCP client config without embedding raw secrets in visible arguments. Security goals of the launcher path:
  • keep ThornGuard and upstream bearer values out of generated client args
  • store profile metadata locally and keep upstream secrets local to the machine
  • let users print setup snippets for multiple MCP clients from one profile
  • preserve the existing ThornGuard proxy contract without forcing manual header repetition everywhere
The secure launcher path does not change the proxy’s enforcement model. It changes how credentials and connection metadata are handled on the client side.

Advisory Intelligence

ThornGuard can also enrich a protected connection with opt-in, warn-first advisory signal. This is a visibility layer, not a replacement for enforcement. Current intelligence channels:
  • Availability intelligence from ThornGuard-observed upstream errors and official status feeds where configured
  • Dependency intelligence from public package advisory data when a public repo is configured and dependency manifests can be resolved
  • Vendor intelligence from high-confidence exploited-vulnerability sources
  • Community threat intelligence from operator-submitted threat reports against specific target URLs or tool hashes

Community Threat Intelligence

Operators can report suspicious upstream MCP servers or specific tool hashes through the dashboard Platform page. Reports accumulate across different license holders. When a target reaches a quorum of 3 or more independent reporters, it is promoted to the global threat feed with an active status. Active global threats automatically generate community advisories on matching connections during the next advisory refresh cycle. This creates a crowd-sourced early-warning network — if multiple ThornGuard operators flag the same upstream, everyone benefits. The threat feed is visible to enterprise and internal tier operators on the Platform page. Individual tier operators can still submit reports but cannot browse the global feed. Important boundaries:
  • ThornGuard does not auto-block traffic from outside intelligence in v1
  • ThornGuard does not send MCP request bodies, prompts, or secrets to outside advisory providers
  • ThornGuard only uses connection metadata such as repo URLs, package coordinates, vendor names, or status page URLs for those lookups
  • Community threat reports only include the target URL, an optional tool hash, and a free-text reason — no request data or secrets are shared