Architecture overview

HYDRA × PDX is a three-layer system: capture, routing, and output, connected by a continuous feedback loop.

Layer 1 — Capture

Two independent data sources feed the pipeline:

HYDRA (passive)

An SSH honeypot exposed on port 2222. Attackers connect, authenticate (any credentials accepted), and interact with a simulated Linux environment. Every command they type is processed by llama-3.3-70b (via Groq API) which generates contextual responses in real time.

HYDRA is not a static replay system. It maintains a mutable virtual filesystem per session (Copy-on-Write), rotates three personas (fintech_trading, crypto_validator, corp_ad), and includes deep anti-fingerprinting to defeat standard honeypot detection techniques.

The command router handles a 9-step pipeline: sanitize → PromptGuard → expand → split → pipes → classify → execute → mutate VFS → log. 65+ built-in commands handle common utilities natively; everything else is routed to the LLM with full VFS context.
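The router's control flow can be sketched as follows. This is an illustrative reduction, not HYDRA's actual code: function names, the PromptGuard keyword list, and the two example builtins are invented, and the expand/split/pipes and VFS-mutation steps are collapsed for brevity.

```python
# Illustrative sketch of the 9-step command router; all names are assumptions.

def sanitize(raw: str) -> str:
    # 1. sanitize: strip non-printable characters and surrounding whitespace.
    return "".join(ch for ch in raw if ch.isprintable() or ch == "\t").strip()

def prompt_guard(cmd: str) -> bool:
    # 2. PromptGuard: reject obvious prompt-injection phrasing before the LLM sees it.
    banned = ("ignore previous", "you are now", "system prompt")
    return not any(b in cmd.lower() for b in banned)

# A tiny stand-in for the 65+ native builtins.
BUILTINS = {"pwd": lambda args, vfs: vfs["cwd"],
            "whoami": lambda args, vfs: vfs["user"]}

def route_command(raw: str, vfs: dict, log: list) -> str:
    cmd = sanitize(raw)
    if not prompt_guard(cmd):
        log.append({"event": "injection_detected", "input": cmd})
        return "bash: syntax error"
    parts = cmd.split()                  # 3-5. expand/split/pipes (collapsed here)
    name, args = parts[0], parts[1:]
    if name in BUILTINS:                 # 6. classify: native builtin vs LLM
        out = BUILTINS[name](args, vfs)  # 7. execute natively
    else:
        out = f"<LLM response for {name!r} with VFS context>"
    log.append({"event": "command_executed", "cmd": cmd})  # 9. log (8. mutate VFS omitted)
    return out
```

A builtin short-circuits the LLM entirely, which keeps latency low for the utilities attackers type most often.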

Each session produces a JSONL file with structured events: auth_attempt, session_start, command_executed, injection_detected, session_end.
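A session log in that shape might be consumed like this. The event names match the list above; every other field (user, persona, cmd, duration_s) is a plausible assumption, not the actual HYDRA schema.

```python
import json

# Hypothetical session events in the JSONL shape described above;
# field names beyond "event" are assumptions.
session_log = """\
{"event": "auth_attempt", "user": "root", "password": "toor", "accepted": true}
{"event": "session_start", "persona": "fintech_trading"}
{"event": "command_executed", "cmd": "cat /etc/shadow"}
{"event": "session_end", "duration_s": 184}
"""

# One JSON object per line: parse lazily, skip blanks.
events = [json.loads(line) for line in session_log.splitlines() if line.strip()]
commands = [e["cmd"] for e in events if e["event"] == "command_executed"]
```

Because each line is a standalone JSON object, downstream tools can stream-process sessions without loading a whole file.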

Burp Suite bridge (active)

A Java extension + Python proxy that connects PDX to Burp Suite. During a web pentest, every HTTP request/response pair is analyzed and converted into the same .pdx delta format used by HYDRA.

This means the same pipeline processes both passive honeypot data and active pentest findings.

Layer 2 — Routing

The SessionClassifier first separates signal from noise: only the ~2.2% of sessions that look human-like pass through. Each session is assigned to one of five noise categories (bot_ephemeral, bot_exec_scanner, bot_dropper, bot_recon, unclassified) or to the single signal category (likely_human).
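A toy heuristic in the spirit of that classifier might look like this. The category names come from the docs; every threshold and feature below is invented for illustration, not taken from the codebase.

```python
# Toy session classifier; thresholds and features are assumptions.

def classify_session(commands: list, duration_s: float) -> str:
    if not commands:
        return "bot_ephemeral"        # connected, did nothing
    if duration_s < 2 and len(commands) > 5:
        return "bot_exec_scanner"     # command burst too fast for a human
    if any("wget" in c or "curl" in c for c in commands) and len(commands) <= 3:
        return "bot_dropper"          # quick drive-by payload fetch
    if set(commands) <= {"uname -a", "id", "whoami"}:
        return "bot_recon"            # pure fingerprinting, nothing else
    if duration_s > 30 and len(commands) >= 3:
        return "likely_human"         # slow, varied interaction = signal
    return "unclassified"
```

The real value of this step is economic: filtering out the ~98% of bot traffic means the expensive downstream analysis only runs on sessions worth learning from.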

The DataRouter then reads each event and classifies it into one or more streams:

  • Defensive — every command_executed and auth_attempt goes here. Used for detection pattern training.
  • Offensive — commands matching known attack patterns (credential access, privilege escalation, lateral movement) go here. Used for attack chain reconstruction.
  • Both — most events qualify for both streams. The same cat /etc/shadow is both a detection alert (defensive) and a credential extraction technique (offensive).

The classification is based on a MITRE ATT&CK mapping with 11 tactics, each having a dual description in the codebase.
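A minimal event-to-stream router consistent with the rules above could look like this. The three regexes are examples standing in for the real MITRE ATT&CK mapping, which covers 11 tactics.

```python
import re

# Example attack patterns; placeholders for the real ATT&CK mapping.
OFFENSIVE_PATTERNS = [
    re.compile(r"/etc/(shadow|passwd)"),    # credential access
    re.compile(r"\bsudo\b|\bchmod \+s\b"),  # privilege escalation
    re.compile(r"\bssh\b.*@"),              # lateral movement
]

def route_event(event: dict) -> set:
    streams = set()
    if event["event"] in ("command_executed", "auth_attempt"):
        streams.add("defensive")            # everything feeds detection training
    cmd = event.get("cmd", "")
    if any(p.search(cmd) for p in OFFENSIVE_PATTERNS):
        streams.add("offensive")            # feeds attack-chain reconstruction
    return streams
```

Note that the streams are additive rather than exclusive, which is exactly how `cat /etc/shadow` ends up in both.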

Layer 3 — Output

Seven training generators produce structured datasets:

Generator          Format                      Purpose
SFT detection      Instruction/output pairs    Train models to identify attack patterns
DPO lure quality   Chosen/rejected pairs       Measure which persona retains attackers longest
SFT attack chains  Instruction/output pairs    Reconstruct offensive TTPs
RAFT kill chains   Multi-step sequences        Complete post-exploitation sequences
ReAct dual         Thought/Action/Observation  Dual-perspective analysis
CoT                Chain-of-thought            5+ step logical reasoning with CWE/CVE refs
JS analysis        Code/verdict pairs          Client-side vulnerability detection

Before export, everything passes through a Quality Pipeline: trigram-based deduplication, quality filtering (with human-validation weighting), curriculum ordering (simple examples first, complex kill chains last), and temporal decay (90-day half-life).
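Two of those stages can be sketched concretely: trigram deduplication and the decayed sample weighting. The 90-day half-life is from the source; the Jaccard threshold and the human-validation multiplier are assumptions.

```python
# Trigram-based near-duplicate detection (Jaccard over character trigrams).
def trigrams(text: str) -> set:
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

def near_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    ta, tb = trigrams(a), trigrams(b)
    jaccard = len(ta & tb) / max(len(ta | tb), 1)
    return jaccard >= threshold

# Temporal decay (90-day half-life) plus an assumed human-validation boost.
HALF_LIFE_DAYS = 90

def sample_weight(quality: float, age_days: float, human_validated: bool) -> float:
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)  # 1.0 today, 0.5 at 90 days
    boost = 1.5 if human_validated else 1.0     # multiplier is an assumption
    return quality * decay * boost
```

Sorting the surviving samples by weight and complexity then yields the curriculum ordering: simple examples first, complex kill chains last.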

Fine-tuning runs locally via Unsloth with LoRA adapters on Qwen or Llama models.

The feedback loop

The pipeline is not one-directional. The PDX Pipeline Orchestrator coordinates a continuous feedback cycle:

  1. SessionAnalyzer parses HYDRA logs into structured deltas
  2. FeedbackEngine clusters attacker behavior into profiles and computes aggregate metrics
  3. LureGenerator translates metrics into concrete filesystem changes (new decoy files, updated credentials, richer .bash_history)
  4. FeedbackStore writes feedback.yaml
  5. FeedbackConsumer (inside HYDRA) polls every 60s and injects updates into the live VFS

The orchestrator can run in three modes: one-shot (--once), watch mode (--watch for continuous monitoring), or auto mode (triggered by HYDRA's AutoTraining when min_sessions threshold is reached).
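Step 5 of the loop could be sketched as a mtime-based poll, shown here with naive key/value parsing instead of a real YAML loader. Everything in this sketch (function name, file format, the way updates are injected) is illustrative, not the actual FeedbackConsumer.

```python
import time
from pathlib import Path

def poll_feedback(path: Path, vfs: dict, interval_s: int = 60, max_polls=None):
    """Hypothetical poll loop: re-read feedback.yaml when it changes and
    inject its entries into the live VFS dict."""
    last_mtime = 0.0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        if path.exists():
            mtime = path.stat().st_mtime
            if mtime > last_mtime:                 # new feedback was written
                last_mtime = mtime
                for line in path.read_text().splitlines():
                    if ":" in line:                # naive "key: value" parsing
                        key, value = line.split(":", 1)
                        vfs[key.strip()] = value.strip()
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
```

Polling on mtime rather than re-parsing every cycle keeps the 60-second loop cheap when nothing has changed.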

The result: the honeypot improves itself continuously. After 48h of feedback, the fintech persona's average engagement rose from ~800 to 1,349 commands per session.

Multi-model analysis

PDX doesn't rely on a single LLM. A 4-tier cascade analyzes each security delta:

Tier 1: Copilot local (7B)     — fast first-pass on every delta
   ↓ uncertain?
Tier 2: Teacher local (32B)    — detailed second-pass
   ↓ still uncertain?
Tier 3: Anthropic API          — when complexity requires it
   ↓ unavailable?
Tier 4: WebChat fallback       — marked REQUIRES HUMAN VALIDATION

Each tier produces a verdict (VULNERABLE, NOT_VULN, INFORMATIONAL, UNCERTAIN, FALSE_POS). When tiers disagree, the conflict is flagged. Nothing is discarded.
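The escalation logic can be sketched as a fold over tiers, where each tier is a callable returning a verdict. The verdict set and tier order follow the diagram; the escalation rule ("only stop on a non-UNCERTAIN answer") and the availability handling are assumptions, and conflict flagging between disagreeing tiers is omitted for brevity.

```python
VERDICTS = {"VULNERABLE", "NOT_VULN", "INFORMATIONAL", "UNCERTAIN", "FALSE_POS"}

def cascade(delta: str, tiers: list) -> dict:
    """Hypothetical 4-tier escalation: each tier is a (name, analyze) pair."""
    history = []
    for name, analyze in tiers:
        try:
            verdict = analyze(delta)
        except ConnectionError:        # e.g. the API tier is unavailable
            continue                   # fall through to the next tier
        history.append((name, verdict))
        if verdict != "UNCERTAIN":
            break                      # confident answer: stop escalating
    final = history[-1][1] if history else "UNCERTAIN"
    return {
        "verdict": final,
        "needs_human": final == "UNCERTAIN",  # the WebChat-fallback marker
        "history": history,                   # nothing is discarded
    }
```

Keeping the full history is what makes the "nothing is discarded" property cheap to implement: disagreements stay visible for later review.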

Data enrichment

8 collectors enrich every observation with external context: NVD/NIST (CVEs), ExploitDB (known exploits), OWASP (web classifications), MITRE ATT&CK (tactics/techniques), Nuclei (detection signatures), CWE (weakness classifications), IETF RFCs (protocol specs), and Linux man pages (command documentation).
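Structurally this is a fan-out: each collector maps an observation to a context fragment, and failures never block the rest. The sketch below is an assumed shape for that dispatcher; the collector outputs are stubbed (T1003.008 is the real ATT&CK technique for /etc/shadow dumping, used here only as sample data).

```python
def enrich(observation: dict, collectors: dict) -> dict:
    """Hypothetical fan-out: run every collector, merge non-empty results."""
    enriched = dict(observation)
    enriched["context"] = {}
    for name, collect in collectors.items():
        try:
            result = collect(observation)
        except Exception:
            result = None              # a failing collector never blocks the rest
        if result:
            enriched["context"][name] = result
    return enriched
```

In the real pipeline each collector would call out to its source (NVD, ExploitDB, and so on); the merge logic stays the same.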

Infrastructure

Component       Location                  Tech
HYDRA           Google Cloud VPS          Python, Paramiko, Groq API
PDX             Local machine             Python, Unsloth, Ollama
Burp bridge     Local (during pentests)   Java extension + Python proxy
Documentation   Cloudflare Pages          MkDocs Material
Tunnel          Cloudflare (cloudflared)  SSH tunnel to VPS