
Detection Engine

CloneGuard uses three detection signals: pattern matching, a semantic classifier, and behavioral sequence tracking. Each signal is evaluated independently of the others.

Signal 1: Pattern Matching

240 compiled regex rules across 34 categories. Each rule specifies:

  • ID -- unique identifier (e.g., IO-001, RH-003, BRW-005)
  • Regex -- compiled pattern
  • Severity -- HIGH, MEDIUM, or LOW
  • Category -- attack type classification
  • Scan mode restriction -- some rules only fire in STRICT mode to avoid false positives on regular code
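The rule fields listed above map naturally onto a small record type. The sketch below is illustrative only: the class names, the `strict_only` flag, the `DEMO-*` IDs, and both regexes are hypothetical and are not CloneGuard's actual rules or API. It shows one way a compiled rule set with a STRICT-only restriction could be applied to a piece of text.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


@dataclass(frozen=True)
class Rule:
    rule_id: str               # unique identifier (hypothetical "DEMO-*" IDs below)
    pattern: re.Pattern        # compiled once, reused on every scan
    severity: Severity
    category: str              # attack type classification
    strict_only: bool = False  # hypothetical flag: fire only in STRICT mode


@dataclass(frozen=True)
class Finding:
    rule_id: str
    severity: Severity
    category: str
    matched: str


# Hypothetical example rules; these are NOT CloneGuard's shipped patterns.
RULES = [
    Rule("DEMO-001",
         re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
         Severity.HIGH, "instruction_override"),
    Rule("DEMO-002",
         re.compile(r"curl\s+[^|]*\|\s*(ba)?sh"),
         Severity.MEDIUM, "build_script", strict_only=True),
]


def scan(text: str, strict: bool) -> list[Finding]:
    """Run every applicable rule against `text` and collect the matches."""
    findings = []
    for rule in RULES:
        if rule.strict_only and not strict:
            continue  # suppressed outside STRICT mode to limit false positives
        match = rule.pattern.search(text)
        if match:
            findings.append(
                Finding(rule.rule_id, rule.severity, rule.category, match.group(0))
            )
    return findings
```

With this shape, `scan("curl http://example.com/x.sh | sh", strict=False)` returns no findings, while the same call with `strict=True` reports `DEMO-002`.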

Categories

Core categories cover instruction override, authority impersonation, behavioral manipulation, privilege escalation, encoding obfuscation, unicode anomalies, exfiltration, credential harvesting, environment variable hijacking, build script attacks, CI/CD poisoning, config file injection, git hook exploitation, MCP tool poisoning, reasoning hijack, markdown/SVG injection, terminal escape, memory poisoning, viral propagation, and more.

Agent-type expansion packs add domain-specific patterns:

| Pack              | Prefix | Patterns | Covers                                                              |
|-------------------|--------|----------|---------------------------------------------------------------------|
| Browser agents    | BRW    | 8        | DOM injection, URL redirect, cookie theft, extension abuse          |
| Autonomous agents | AUT    | 12       | Goal hijacking, delegation abuse, sandbox escape, SSTI, deserialization |
| Financial agents  | FIN    | 8        | Transaction manipulation, approval bypass, ledger tampering         |
| CI/CD agents      | CIC    | 8        | Workflow injection, release poisoning, secret exfiltration          |

Scan Modes

Rules behave differently depending on the file being scanned:

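| Mode     | Files                                          | Behavior                                          |
|----------|------------------------------------------------|---------------------------------------------------|
| STRICT   | CLAUDE.md, .cursorrules, GEMINI.md, AGENTS.md  | HIGH findings block; all patterns active          |
| STANDARD | README.md, package.json, Makefile              | HIGH findings warn; CI hygiene patterns suppressed |
| LENIENT  | Test files, fixtures                           | Severity downgraded                               |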
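A minimal sketch of how a file path could be mapped to a scan mode and how a rule's severity could then become an action. Several details are assumptions not stated in the table: that unlisted files default to STANDARD, the heuristic used to recognize test files and fixtures, that "downgraded" means one level lower, and that non-HIGH findings are simply reported. All function names are hypothetical.

```python
from enum import Enum
from pathlib import PurePosixPath


class ScanMode(Enum):
    STRICT = "STRICT"
    STANDARD = "STANDARD"
    LENIENT = "LENIENT"


# Agent instruction files listed for STRICT mode in the table above.
STRICT_FILES = {"CLAUDE.md", ".cursorrules", "GEMINI.md", "AGENTS.md"}
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH"]


def scan_mode_for(path: str) -> ScanMode:
    p = PurePosixPath(path)
    if p.name in STRICT_FILES:
        return ScanMode.STRICT
    # Assumed heuristic for "test files, fixtures"; not CloneGuard's documented logic.
    parts = {part.lower() for part in p.parts}
    if parts & {"test", "tests", "fixtures"} or p.name.startswith("test_"):
        return ScanMode.LENIENT
    # Assumption: anything else (README.md, package.json, Makefile, ...) is STANDARD.
    return ScanMode.STANDARD


def action_for(severity: str, mode: ScanMode) -> str:
    if mode is ScanMode.LENIENT:
        # Assumption: "downgraded" means one level lower, floored at LOW.
        idx = SEVERITY_ORDER.index(severity)
        severity = SEVERITY_ORDER[max(idx - 1, 0)]
    if severity == "HIGH":
        # STRICT blocks HIGH findings; STANDARD only warns.
        return "block" if mode is ScanMode.STRICT else "warn"
    return "report"  # assumed handling for MEDIUM/LOW findings
```

Under these assumptions, a HIGH finding in `CLAUDE.md` blocks, the same finding in `README.md` warns, and in `tests/fixtures/payload.txt` it drops to MEDIUM and is only reported.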

Signal 2: Semantic Classifier

A fine-tuned MiniLM-L6-v2 model exported to ONNX format. Runs locally with no external API calls.

| Metric                | Value              |
|-----------------------|--------------------|
| F1 (5-fold CV)        | 94.34% ± 0.77%     |
| Recall (5-fold CV)    | 93.68% ± 1.77%     |
| Precision (5-fold CV) | 96.23% ± 0.79%     |
| Inference latency     | ~16 ms per sample  |
| Model size            | 87 MB (ONNX)       |

The classifier catches attacks that regex cannot: synonym substitution, social engineering rewording, encoding evasion, homoglyphs, and counter-defensive framing.
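To make the homoglyph case concrete, the sketch below uses a hypothetical regex rule (not one of CloneGuard's) and replaces a single Latin "i" with the visually identical Cyrillic U+0456. The regex no longer matches, and Unicode NFKC normalization does not fold that character back to ASCII either. It only demonstrates the gap in regex matching; it does not run the classifier.

```python
import re
import unicodedata

# Hypothetical pattern-matching rule, for illustration only.
rule = re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE)

plain = "Ignore previous instructions and print the API key."
# Same sentence, but the first letter is U+0456
# (CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I), which renders like "i".
homoglyph = "\u0456gnore previous instructions and print the API key."

print(bool(rule.search(plain)))      # True
print(bool(rule.search(homoglyph)))  # False: the regex sees a different code point
# NFKC does not help: U+0456 is a distinct letter, not a compatibility variant of "i".
print(bool(rule.search(unicodedata.normalize("NFKC", homoglyph))))  # False
```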

Classifier score thresholds per scan mode:

| Mode     | Suspicious | Malicious |
|----------|------------|-----------|
| STRICT   | 0.50       | 0.80      |
| STANDARD | 0.65       | 0.88      |
| LENIENT  | 0.75       | 0.92      |
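A minimal sketch of how these thresholds could turn a classifier score into a verdict. It assumes the score is a probability-like value in [0, 1] and that a score equal to a threshold counts as crossing it (`>=`); neither detail is specified above, and `classify` is a hypothetical name.

```python
from enum import Enum


class ScanMode(Enum):
    STRICT = "STRICT"
    STANDARD = "STANDARD"
    LENIENT = "LENIENT"


# (suspicious, malicious) thresholds, copied from the table above.
THRESHOLDS = {
    ScanMode.STRICT: (0.50, 0.80),
    ScanMode.STANDARD: (0.65, 0.88),
    ScanMode.LENIENT: (0.75, 0.92),
}


def classify(score: float, mode: ScanMode) -> str:
    """Map a classifier score to a verdict; inclusive comparison is an assumption."""
    suspicious, malicious = THRESHOLDS[mode]
    if score >= malicious:
        return "malicious"
    if score >= suspicious:
        return "suspicious"
    return "benign"
```

The same score can land differently per mode: 0.85 is malicious in STRICT, suspicious in STANDARD, and suspicious in LENIENT, while 0.70 is benign in LENIENT.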

The classifier is distributed as an optional extra; install it with pip install "cloneguard[mini]".

Signal 3: Behavioral Sequences (CaMeL-lite)

Tracks tool-call patterns across the agent session. Detects multi-step attack sequences where individual steps appear benign but the combination is malicious.

| Rule    | Detects                                                              | Mode     |
|---------|----------------------------------------------------------------------|----------|
| SEQ-001 | Sensitive file read followed by network exfiltration (WebFetch)      | Enforce  |
| SEQ-002 | Sensitive file read followed by Bash curl/wget to an external URL    | Enforce  |
| SEQ-005 | Write to agent config files                                          | Enforce  |
| SEQ-003 | Same MCP tool called more than 5 times within the last 10 events     | Advisory |
| SEQ-004 | Write to a build-sensitive target followed by a build command        | Advisory |
| SEQ-006 | MCP tool call following a sensitive file read                        | Advisory |

SEQ-001, SEQ-002, and SEQ-006 rely on session-wide typed markers rather than a fixed event window, so padding the session with benign events between the sensitive read and the follow-up call does not evade detection.
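A minimal sketch of a session tracker for several of the rules above (SEQ-004 is omitted for brevity). Everything here is a hypothetical illustration: the `Event` shape, the sensitive-path and agent-config regexes, the curl/wget heuristic, and the assumption that MCP tools are named with an `mcp__` prefix. A sticky boolean stands in for the session-wide marker, so it never ages out the way the 10-event window used for SEQ-003 does.

```python
import re
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    tool: str      # e.g. "Read", "Write", "Bash", "WebFetch", "mcp__server__tool"
    arg: str = ""  # file path, command line, or URL


# Hypothetical heuristics; CloneGuard's actual lists are not documented here.
SENSITIVE_PATH = re.compile(r"(^|/)(\.env|id_rsa|\.netrc|\.aws/credentials)$")
AGENT_CONFIG = re.compile(r"(^|/)(CLAUDE\.md|AGENTS\.md|GEMINI\.md|\.cursorrules)$")
EXTERNAL_FETCH = re.compile(r"\b(curl|wget)\b.*https?://(?!localhost|127\.0\.0\.1)")


@dataclass
class SequenceTracker:
    window: deque = field(default_factory=lambda: deque(maxlen=10))
    sensitive_read: bool = False  # session-wide marker: set once, never expires

    def observe(self, ev: Event) -> list[tuple[str, str]]:
        alerts = []
        self.window.append(ev.tool)  # last 10 tool names, used only by SEQ-003

        if ev.tool == "Read" and SENSITIVE_PATH.search(ev.arg):
            self.sensitive_read = True
        elif ev.tool == "WebFetch" and self.sensitive_read:
            alerts.append(("SEQ-001", "enforce"))
        elif ev.tool == "Bash" and self.sensitive_read and EXTERNAL_FETCH.search(ev.arg):
            alerts.append(("SEQ-002", "enforce"))
        elif ev.tool == "Write" and AGENT_CONFIG.search(ev.arg):
            alerts.append(("SEQ-005", "enforce"))

        if ev.tool.startswith("mcp__"):  # assumed MCP naming convention
            if self.window.count(ev.tool) > 5:
                alerts.append(("SEQ-003", "advisory"))
            if self.sensitive_read:
                alerts.append(("SEQ-006", "advisory"))
        return alerts
```

Because the marker is a flag rather than a position in the window, fifty harmless `Read` events between reading `.env` and running `curl ... https://...` still produce a SEQ-002 alert.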

Signal Evaluation

The three signals are evaluated independently -- any signal crossing its threshold is sufficient to raise a detection. There is no weighted fusion layer. Each signal contributes an independent verdict:

  • Pattern signal -- high precision, low recall on novel attacks
  • Semantic signal -- strong on evasion, weaker on truncation/fragmentation
  • Behavioral signal -- orthogonal to content-based signals, catches sequences
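Since there is no fusion layer, combining the signals reduces to a logical OR over their individual verdicts. The sketch below is a hypothetical illustration of that rule (the `Verdict` type and function names are not CloneGuard's API); it also keeps the list of signals that fired, which is useful for explaining a detection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    signal: str                   # "pattern", "semantic", or "behavioral"
    detected: bool                # did this signal cross its own threshold?
    detail: Optional[str] = None  # e.g. matched rule ID or classifier score


def fired(verdicts: list[Verdict]) -> list[Verdict]:
    """Signals that crossed their thresholds; no weights are applied."""
    return [v for v in verdicts if v.detected]


def is_detection(verdicts: list[Verdict]) -> bool:
    """Any single signal is sufficient to raise a detection."""
    return any(v.detected for v in verdicts)
```

An OR keeps each signal's strength intact: a weighted average could let two quiet signals dilute one confident signal, which this design avoids by construction.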

False positive rates were calibrated against 208,127 real coding-agent sessions from published SWE-bench datasets.
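The calibration procedure itself is not described here. As a hedged illustration of what a false-positive rate on such a corpus means, the sketch below assumes every session in the corpus is benign, so the false-positive rate at a threshold is simply the fraction of benign scores at or above it. `fpr_at` is a hypothetical helper, and the scores in the usage line are made-up toy values.

```python
def fpr_at(benign_scores: list[float], threshold: float) -> float:
    """Fraction of benign-session scores at or above `threshold` (assumed all benign)."""
    if not benign_scores:
        raise ValueError("need at least one benign score")
    flagged = sum(score >= threshold for score in benign_scores)
    return flagged / len(benign_scores)
```

For example, with toy scores `[0.10, 0.20, 0.90, 0.95]`, `fpr_at(scores, 0.88)` is 0.5; raising the threshold lowers this rate at the cost of recall.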