Detection Engine
CloneGuard uses three independent detection signals, each evaluated separately.
Signal 1: Pattern Matching
240 compiled regex rules across 34 categories. Each rule specifies:
- ID -- unique identifier (e.g., IO-001, RH-003, BRW-005)
- Regex -- compiled pattern
- Severity -- HIGH, MEDIUM, or LOW
- Category -- attack type classification
- Scan mode restriction -- some rules only fire in STRICT mode to avoid false positives on regular code
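The rule fields listed above could be modeled roughly as follows. This is a minimal sketch, not CloneGuard's actual data model: the `Rule` class, the `match_rules` helper, the `EX-` rule IDs, the category names, and both regexes are hypothetical and exist only to illustrate how a STRICT-only restriction might gate a rule.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


@dataclass(frozen=True)
class Rule:
    """Hypothetical shape of one pattern rule (field names are assumptions)."""
    rule_id: str            # unique identifier
    pattern: re.Pattern     # compiled regex
    severity: Severity      # HIGH, MEDIUM, or LOW
    category: str           # attack type classification
    strict_only: bool = False  # True if the rule fires only in STRICT mode


# Illustrative rules only; these are not taken from the real rule set.
EXAMPLE_RULES = [
    Rule(
        "EX-001",
        re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I),
        Severity.HIGH,
        "instruction_override",
    ),
    Rule(
        "EX-002",
        re.compile(r"curl\s+[^|]+\|\s*(ba)?sh", re.I),
        Severity.MEDIUM,
        "ci_hygiene",
        strict_only=True,
    ),
]


def match_rules(text, rules, strict):
    """Return the rules whose regex matches, skipping STRICT-only rules otherwise."""
    return [
        rule
        for rule in rules
        if (strict or not rule.strict_only) and rule.pattern.search(text)
    ]
```

Calling `match_rules(text, EXAMPLE_RULES, strict=False)` on ordinary project files would skip `EX-002`, which mirrors the stated goal of avoiding false positives on regular code.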
Categories
Core categories cover instruction override, authority impersonation, behavioral manipulation, privilege escalation, encoding obfuscation, unicode anomalies, exfiltration, credential harvesting, environment variable hijacking, build script attacks, CI/CD poisoning, config file injection, git hook exploitation, MCP tool poisoning, reasoning hijack, markdown/SVG injection, terminal escape, memory poisoning, viral propagation, and more.
Agent-type expansion packs add domain-specific patterns:
| Pack | Prefix | Patterns | Covers |
|---|---|---|---|
| Browser agents | BRW | 8 | DOM injection, URL redirect, cookie theft, extension abuse |
| Autonomous agents | AUT | 12 | Goal hijacking, delegation abuse, sandbox escape, SSTI, deserialization |
| Financial agents | FIN | 8 | Transaction manipulation, approval bypass, ledger tampering |
| CI/CD agents | CIC | 8 | Workflow injection, release poisoning, secret exfiltration |
Scan Modes
Rules behave differently depending on the file being scanned:
| Mode | Files | Behavior |
|---|---|---|
| STRICT | CLAUDE.md, .cursorrules, GEMINI.md, AGENTS.md | HIGH = block; all patterns active |
| STANDARD | README.md, package.json, Makefile | HIGH = warning; CI hygiene patterns suppressed |
| LENIENT | Test files, fixtures | Severity downgraded |
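One way to picture the mode table is as a lookup from file path to scan mode. The sketch below is an assumption about how such a mapping could work, not the tool's real logic: `select_mode`, `STRICT_FILES`, and the heuristics for recognizing test files and fixtures are all hypothetical. The LENIENT severity downgrade is left out because the table does not say how far severities are lowered.

```python
from enum import Enum
from pathlib import PurePosixPath


class ScanMode(Enum):
    STRICT = "STRICT"
    STANDARD = "STANDARD"
    LENIENT = "LENIENT"


# Agent instruction files named in the table above.
STRICT_FILES = {"CLAUDE.md", ".cursorrules", "GEMINI.md", "AGENTS.md"}

# Assumption: directory names that mark test files and fixtures.
TEST_DIRS = {"test", "tests", "fixtures"}

# Handling of HIGH-severity matches, as stated in the table.
HIGH_ACTION = {ScanMode.STRICT: "block", ScanMode.STANDARD: "warning"}


def select_mode(path: str) -> ScanMode:
    """Pick a scan mode from a POSIX-style path (heuristics are illustrative)."""
    p = PurePosixPath(path)
    if p.name in STRICT_FILES:
        return ScanMode.STRICT
    parent_dirs = {part.lower() for part in p.parts[:-1]}
    if parent_dirs & TEST_DIRS or p.name.startswith("test_"):
        return ScanMode.LENIENT
    return ScanMode.STANDARD
```

In this sketch the STRICT check runs first, so an agent instruction file is scanned strictly even if it sits inside a test directory; whether CloneGuard makes the same choice is not stated in the source.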
Signal 2: Semantic Classifier
A fine-tuned MiniLM-L6-v2 model exported to ONNX format. Runs locally with no external API calls.
| Metric | Value |
|---|---|
| F1 (5-fold CV) | 94.34% +/- 0.77% |
| Recall (5-fold CV) | 93.68% +/- 1.77% |
| Precision (5-fold CV) | 96.23% +/- 0.79% |
| Inference speed | ~16ms per sample |
| Model size | 87 MB (ONNX) |
The classifier catches attacks that regex cannot: synonym substitution, social engineering rewording, encoding evasion, homoglyphs, and counter-defensive framing.
Per-ScanMode detection thresholds:
| Mode | Suspicious | Malicious |
|---|---|---|
| STRICT | 0.50 | 0.80 |
| STANDARD | 0.65 | 0.88 |
| LENIENT | 0.75 | 0.92 |
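As a rough illustration of how the threshold table might be applied to a classifier score, consider the sketch below. The `classify` function, the `"benign"` label, and the choice of inclusive (`>=`) comparisons are assumptions; only the numeric thresholds come from the table.

```python
# (suspicious, malicious) thresholds per scan mode, copied from the table above.
THRESHOLDS = {
    "STRICT": (0.50, 0.80),
    "STANDARD": (0.65, 0.88),
    "LENIENT": (0.75, 0.92),
}


def classify(score: float, mode: str) -> str:
    """Map a classifier score in [0, 1] to a label for the given scan mode."""
    suspicious, malicious = THRESHOLDS[mode]
    if score >= malicious:
        return "malicious"
    if score >= suspicious:
        return "suspicious"
    return "benign"


for mode in THRESHOLDS:
    print(mode, classify(0.70, mode))
```

The same score of 0.70 is labeled suspicious in STRICT and STANDARD mode but falls below the LENIENT threshold, which shows how the modes trade sensitivity for fewer false positives.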
Install with `pip install "cloneguard[mini]"`.
Signal 3: Behavioral Sequences (CaMeL-lite)
Tracks tool-call patterns across the agent session. Detects multi-step attack sequences where individual steps appear benign but the combination is malicious.
| Rule | Detects | Mode |
|---|---|---|
| SEQ-001 | Sensitive file read followed by network exfiltration (WebFetch) | Enforce |
| SEQ-002 | Sensitive file read followed by Bash curl/wget to external URL | Enforce |
| SEQ-005 | Write to agent config files | Enforce |
| SEQ-003 | Same MCP tool called >5 times within last 10 events | Advisory |
| SEQ-004 | Write to build-sensitive target followed by build command | Advisory |
| SEQ-006 | MCP tool call following sensitive file read | Advisory |
SEQ-001/002/006 use session-wide typed markers -- padding with benign events between the read and exfiltration steps does not evade detection.
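The difference between a session-wide marker (as in SEQ-001) and a sliding window (as in SEQ-003) can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual implementation: `Event`, `SessionTracker`, the `SENSITIVE_HINTS` substrings, and the tool names used to identify reads are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    tool: str             # e.g. "Read", "WebFetch", or an MCP tool name
    target: str = ""      # file path or URL, if any
    is_mcp: bool = False  # True when the call goes to an MCP tool


# Assumption: substrings that mark a file as sensitive.
SENSITIVE_HINTS = (".env", "id_rsa", ".aws/credentials")


@dataclass
class SessionTracker:
    # Session-wide marker: once set, it stays set for the rest of the session.
    sensitive_read_seen: bool = False
    # Sliding window holding only the last 10 events.
    recent: deque = field(default_factory=lambda: deque(maxlen=10))

    def observe(self, event: Event) -> list:
        """Record one tool call and return the IDs of any rules it triggers."""
        findings = []
        self.recent.append(event)

        if event.tool == "Read" and any(h in event.target for h in SENSITIVE_HINTS):
            self.sensitive_read_seen = True

        # SEQ-001 style: the marker survives any number of benign events.
        if self.sensitive_read_seen and event.tool == "WebFetch":
            findings.append("SEQ-001")

        # SEQ-003 style: same MCP tool more than 5 times in the last 10 events.
        if event.is_mcp:
            same = sum(1 for e in self.recent if e.is_mcp and e.tool == event.tool)
            if same > 5:
                findings.append("SEQ-003")

        return findings
```

Because the marker is a flag on the session and not a position in the window, inserting twenty harmless reads between the sensitive read and the `WebFetch` call does not clear it, whereas the SEQ-003 count only ever considers the ten most recent events.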
Signal Evaluation
The three signals are evaluated independently -- any signal crossing its threshold is sufficient to raise a detection. There is no weighted fusion layer. Each signal contributes an independent verdict:
- Pattern signal -- high precision, low recall on novel attacks
- Semantic signal -- strong on evasion, weaker on truncation/fragmentation
- Behavioral signal -- orthogonal to content-based signals, catches sequences
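Evaluation without a fusion layer amounts to a logical OR over the per-signal verdicts. The sketch below illustrates that idea only; `Verdict` and `evaluate` are hypothetical names, and each signal is assumed to have already applied its own threshold before reporting.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    signal: str       # "pattern", "semantic", or "behavioral"
    detected: bool    # True if this signal crossed its own threshold
    detail: str = ""  # optional context, e.g. a rule ID or score


def evaluate(verdicts):
    """Return (detection_raised, fired_verdicts) with no weighting between signals."""
    fired = [v for v in verdicts if v.detected]
    return bool(fired), fired


raised, fired = evaluate([
    Verdict("pattern", False),
    Verdict("semantic", True, "score above malicious threshold"),
    Verdict("behavioral", False),
])
print(raised, [v.signal for v in fired])
```

A single firing signal is enough to raise a detection here, so a weakness in one signal (for example, regex missing a reworded attack) does not dilute a confident verdict from another.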
False positive rates were calibrated against 208,127 real coding-agent sessions from published SWE-bench datasets.