Architecture Overview

CloneGuard operates as an independent trust boundary between the AI coding agent and the commands it executes. It cannot be disabled by repository content because it runs at the hook layer, outside the agent's control.

Defense Layers

Repository files
       |
  Layer 0: Pre-execution scan (before agent launches)
       |
  Agent starts
       |
  Layer 1: InstructionsLoaded (scans CLAUDE.md, rules files)
       |
  Agent works...
       |
  Layer 2: PostToolUse (scans tool output for injected instructions)
       |
  Layer 3: PreToolUse (gates writes, builds, config changes)
       |
  Tool executes (or is blocked/constrained)

Layer 0: Pre-Execution Wrapper

Scans all high-priority and medium-priority files in the repository before the agent starts. Blocks agent launch if critical findings are detected.

Cannot be disabled by repository content -- executes before the agent reads any files.
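The Layer 0 scan can be sketched as a simple pass over priority-tiered files. This is a minimal illustration, not CloneGuard's actual scanner: the priority list, pattern set, and function names are assumptions.

```python
# Illustrative Layer 0 pre-execution scan. The priority tier and the
# two critical patterns below are invented for this sketch; the real
# scanner covers high- and medium-priority files with its full rule set.
import re

HIGH_PRIORITY = ("CLAUDE.md", ".cursorrules", "Makefile")
CRITICAL_PATTERNS = [re.compile(p, re.I) for p in (
    r"ignore (all )?previous instructions",
    r"curl .*\|\s*(ba)?sh",
)]

def pre_execution_scan(files):
    """Return paths with critical findings; any hit blocks agent launch."""
    findings = []
    for path, content in files.items():
        if path not in HIGH_PRIORITY:
            continue  # the real scanner also covers medium-priority files
        if any(p.search(content) for p in CRITICAL_PATTERNS):
            findings.append(path)
    return findings
```

Because this runs before the agent process starts, nothing the repository contains can alter the scan's behavior.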

Layer 1: InstructionsLoaded

Scans instruction files (CLAUDE.md, .claude/rules/*.md, .cursorrules) when the agent loads them. Uses STRICT scan mode -- HIGH severity findings result in a hard block.
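A STRICT-mode check might look like the following sketch. The rule list and severity labels are invented for illustration; only the hard-block-on-HIGH behavior mirrors the description above.

```python
# Hypothetical STRICT-mode scan of an instruction file. The two rules
# here are placeholders, not CloneGuard's actual rule set.
RULES = [
    ("exfiltrate", "HIGH"),
    ("base64 -d", "MEDIUM"),
]

def scan_instruction_file(content, mode="STRICT"):
    """In STRICT mode, any HIGH-severity finding is a hard block."""
    severities = [sev for needle, sev in RULES if needle in content]
    if mode == "STRICT" and "HIGH" in severities:
        return "block"
    return "warn" if severities else "allow"
```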

Layer 2: PostToolUse

Scans all tool output for injection patterns. Catches payloads injected via tool results -- for example, a cat command reading a malicious file that contains prompt injection.
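The cat example above can be sketched as an output-side check; the marker strings and return shape are illustrative assumptions, not CloneGuard's real detection logic.

```python
# Illustrative PostToolUse check: the agent ran a command (e.g. cat on a
# repository file) and the output carries an injected instruction.
def post_tool_use(tool, output):
    markers = ("ignore previous instructions", "you are now in developer mode")
    if any(m in output.lower() for m in markers):
        return {"decision": "block", "reason": "injection in tool output"}
    return {"decision": "allow"}
```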

Layer 3: PreToolUse

Gates dangerous operations before execution:

  • Blocks writes to protected paths (~/.claude/settings.json, trust stores)
  • Scans content being written to sensitive files (package.json, Makefile)
  • Warns on build commands (npm install, pip install, cargo build)
  • Blocks allowlist manipulation (cloneguard allow in Bash)
  • Blocks bypass attempts (cloneguard --bypass, claude --bypass)

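The gates above can be sketched as a single pre-execution check. The protected path, build-command prefixes, and bypass strings shown are examples drawn from the list, not the full rule set, and the function signature is an assumption.

```python
# Illustrative PreToolUse gate over the operations listed above.
PROTECTED = ("~/.claude/settings.json",)
BUILD_CMDS = ("npm install", "pip install", "cargo build")

def pre_tool_use(tool, args):
    if tool == "Write" and args["path"] in PROTECTED:
        return "block"          # protected path
    if tool == "Bash":
        cmd = args["command"]
        if "cloneguard allow" in cmd or "--bypass" in cmd:
            return "block"      # allowlist manipulation / bypass attempt
        if any(cmd.startswith(b) for b in BUILD_CMDS):
            return "warn"       # build commands run arbitrary scripts
    return "allow"
```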
Detection Signals

Three independent signals are computed for each tool call:

  Signal       Method                          Speed           Strengths
  ----------   -----------------------------   -------------   -----------------------------------------------
  Pattern      240 compiled regex rules        <50ms           Fast, predictable, catches known patterns
  Semantic     MiniLM-L6-v2 ONNX classifier    ~16ms/sample    Catches synonym substitution, encoding evasion,
                                                               social engineering
  Behavioral   CaMeL-lite session-wide         <0.5ms/event    Catches multi-step sequences (read credentials,
               monitoring                                      then exfiltrate)

Signals are evaluated independently -- any signal crossing its threshold is sufficient to raise a detection. See Detection Engine for details.
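Independent evaluation means there is no weighted fusion: each signal is compared against its own threshold, and any single crossing raises a detection. The scores and thresholds in this sketch are made up for illustration.

```python
# Minimal sketch of independent signal evaluation: any one signal
# crossing its own threshold is sufficient. Threshold values invented.
THRESHOLDS = {"pattern": 1.0, "semantic": 0.8, "behavioral": 0.7}

def evaluate(scores):
    """scores: dict mapping signal name -> score."""
    hits = [name for name, score in scores.items()
            if score >= THRESHOLDS[name]]
    return {"detected": bool(hits), "signals": hits}
```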

Enforcement Pipeline

Detection signals (evaluated independently)
       |
  Verdict: SAFE / SUSPICIOUS / MALICIOUS
       |
  Policy engine (YAML rules, per-tool overrides)
       |
  Action: allow / constrain (sandbox) / block

When enforcement is enabled, SUSPICIOUS verdicts can constrain the tool call via OS-level sandboxing (Landlock on Linux, Seatbelt on macOS) without affecting CloneGuard itself. See Enforcement for details.

The default mode is dry-run: detections are logged but no enforcement action is taken.
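The verdict-to-action mapping and dry-run behavior can be sketched as follows. The policy values mirror the pipeline above, but the override mechanism and function signature are illustrative assumptions, not CloneGuard's actual YAML schema.

```python
# Illustrative verdict -> action mapping with per-tool overrides and
# dry-run semantics. Rule format is invented for this sketch.
POLICY = {"SAFE": "allow", "SUSPICIOUS": "constrain", "MALICIOUS": "block"}

def enforce(verdict, dry_run=True, overrides=None):
    """Overrides take precedence; dry-run logs but never enforces."""
    action = (overrides or {}).get(verdict, POLICY[verdict])
    if dry_run and action != "allow":
        return f"log-only (would {action})"
    return action
```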

Audit Trail

Every detection event emits structured NDJSON logs -- one line per event, machine-readable. See Audit for details.
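An NDJSON audit line is simply one compact JSON object followed by a newline, so logs can be appended and streamed line by line. The field names in this sketch are assumptions; CloneGuard's actual event schema may differ.

```python
# Sketch of emitting one NDJSON audit line per detection event.
# Field names (ts, layer, verdict, action) are illustrative.
import json

def audit_line(event):
    return json.dumps(event, separators=(",", ":")) + "\n"

line = audit_line({"ts": "2024-01-01T00:00:00Z", "layer": "PostToolUse",
                   "verdict": "SUSPICIOUS", "action": "log-only"})
```

Because each line is independent JSON, a log file can be processed with ordinary line-oriented tools without parsing the whole file.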