# Architecture overview
HYDRA × PDX is a three-layer system: capture, routing, and output, connected by a continuous feedback loop.
## Layer 1 — Capture
Two independent data sources feed the pipeline:
### HYDRA (passive)
An SSH honeypot exposed on port 2222. Attackers connect, authenticate (any credentials are accepted), and interact with a simulated Linux environment. Every command they type is processed by llama-3.3-70b (via the Groq API), which generates contextual responses in real time.
HYDRA is not a static replay system. It maintains a mutable virtual filesystem per session (Copy-on-Write), rotates three personas (fintech_trading, crypto_validator, corp_ad), and includes deep anti-fingerprinting to defeat standard honeypot detection techniques.
The command router handles a 9-step pipeline: sanitize → PromptGuard → expand → split → pipes → classify → execute → mutate VFS → log. 65+ built-in commands handle common utilities natively; everything else is routed to the LLM with full VFS context.
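The routing decision at the heart of that pipeline can be sketched in a few lines. This is a minimal illustration, not HYDRA's implementation: the helper names, the toy `BUILTINS` set, and the prompt-guard heuristic are all assumptions standing in for the real components.

```python
import shlex

def sanitize(raw: str) -> str:
    """Drop non-printable characters before anything else sees the input."""
    return "".join(ch for ch in raw if ch.isprintable() or ch in "\t ")

def prompt_guard(cmd: str) -> bool:
    """Toy stand-in for PromptGuard: flag obvious prompt-injection attempts."""
    return any(p in cmd.lower() for p in ("ignore previous", "system prompt"))

def split_commands(cmd: str) -> list[str]:
    """Split a chained command line on ';' and '&&' into sub-commands."""
    return [part.strip() for part in cmd.replace("&&", ";").split(";") if part.strip()]

# Stand-ins for the 65+ natively handled utilities.
BUILTINS = {"ls", "pwd", "whoami", "cat", "id"}

def route(raw: str) -> list[tuple[str, str]]:
    """Classify each sub-command as 'builtin' (handled natively),
    'llm' (forwarded with VFS context), or 'injection' (blocked)."""
    cmd = sanitize(raw)
    if prompt_guard(cmd):
        return [("injection", cmd)]
    results = []
    for sub in split_commands(cmd):
        name = shlex.split(sub)[0]
        results.append(("builtin" if name in BUILTINS else "llm", sub))
    return results
```

For example, `route("ls -la && nmap 10.0.0.1")` sends `ls -la` to the native handler and `nmap 10.0.0.1` to the LLM.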
Each session produces a JSONL file with structured events: auth_attempt, session_start, command_executed, injection_detected, session_end.
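A downstream consumer of those logs might group events by type along these lines (the field names in the sample records are illustrative, not the exact HYDRA schema; only the event type names come from the text above):

```python
import json

EVENT_TYPES = ("auth_attempt", "session_start", "command_executed",
               "injection_detected", "session_end")

def load_session(jsonl_text: str) -> dict:
    """Bucket a session's JSONL lines by event type."""
    buckets = {name: [] for name in EVENT_TYPES}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        buckets.setdefault(event["event"], []).append(event)
    return buckets

# Hypothetical session log; field names are assumptions.
sample = "\n".join([
    '{"event": "session_start", "src_ip": "203.0.113.7"}',
    '{"event": "auth_attempt", "username": "root", "password": "toor"}',
    '{"event": "command_executed", "command": "cat /etc/shadow"}',
    '{"event": "session_end", "duration_s": 42}',
])
```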
### Burp Suite bridge (active)
A Java extension plus a Python proxy connect PDX to Burp Suite. During a web pentest, every HTTP request/response pair is analyzed and converted into the same `.pdx` delta format used by HYDRA.
This means the same pipeline processes both passive honeypot data and active pentest findings.
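Conceptually, the conversion looks something like the sketch below. The actual `.pdx` schema is not documented here, so every field name is a placeholder chosen for illustration:

```python
import hashlib

def http_pair_to_delta(request: dict, response: dict) -> dict:
    """Turn one Burp request/response pair into a delta record.
    All field names here are illustrative, not the real .pdx schema."""
    fingerprint = request["method"] + request["url"] + str(response["status"])
    return {
        "source": "burp",
        "delta_id": hashlib.sha256(fingerprint.encode()).hexdigest()[:12],
        "method": request["method"],
        "url": request["url"],
        "status": response["status"],
        # Trivial example flag; real analysis would be far richer.
        "flags": ["server_error"] if response["status"] >= 500 else [],
    }
```

The key point is the shared shape: once a pentest finding and a honeypot event are both expressed as deltas, all downstream routing and generation code is identical.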
## Layer 2 — Routing
The SessionClassifier first separates signal from noise: only 2.2% of sessions (human-like interactions) pass through. It identifies five categories of automated traffic (`bot_ephemeral`, `bot_exec_scanner`, `bot_dropper`, `bot_recon`, `unclassified`) and one signal category (`likely_human`).
The DataRouter then reads each event and classifies it into one or more streams:
- Defensive — every `command_executed` and `auth_attempt` event goes here. Used for detection pattern training.
- Offensive — commands matching known attack patterns (credential access, privilege escalation, lateral movement) go here. Used for attack chain reconstruction.
- Both — most events qualify for both streams. The same `cat /etc/shadow` is both a detection alert (defensive) and a credential extraction technique (offensive).
The classification is based on a MITRE ATT&CK mapping with 11 tactics, each having a dual description in the codebase.
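The stream assignment can be sketched as a small function. This is a minimal illustration of the routing rule described above, not the DataRouter itself; the pattern list is a toy subset standing in for the full MITRE ATT&CK mapping:

```python
# Toy subset of offensive indicators; the real router uses the
# full MITRE ATT&CK mapping across 11 tactics.
OFFENSIVE_PATTERNS = ("cat /etc/shadow", "sudo ", "ssh ", "chmod +s")

def route_event(event: dict) -> set[str]:
    """Assign an event to the defensive and/or offensive streams."""
    streams = set()
    # Every command_executed and auth_attempt feeds detection training.
    if event["event"] in ("command_executed", "auth_attempt"):
        streams.add("defensive")
    # Commands matching known attack patterns also feed chain reconstruction.
    if any(p in event.get("command", "") for p in OFFENSIVE_PATTERNS):
        streams.add("offensive")
    return streams
```

The `cat /etc/shadow` example from the text lands in both streams, as intended.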
## Layer 3 — Output
Seven training generators produce structured datasets:
| Generator | Format | Purpose |
|---|---|---|
| SFT detection | Instruction/output pairs | Train models to identify attack patterns |
| DPO lure quality | Chosen/rejected pairs | Measure which persona retains attackers longest |
| SFT attack chains | Instruction/output pairs | Reconstruct offensive TTPs |
| RAFT kill chains | Multi-step sequences | Complete post-exploitation sequences |
| ReAct dual | Thought/Action/Observation | Dual-perspective analysis |
| CoT | Chain-of-thought | 5+ step logical reasoning with CWE/CVE refs |
| JS analysis | Code/verdict pairs | Client-side vulnerability detection |
Before export, everything passes through a Quality Pipeline: trigram-based deduplication, quality filtering (with human-validation weighting), curriculum ordering (simple examples first, complex kill chains last), and temporal decay (90-day half-life).
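Two of those stages are easy to make concrete. Below is a minimal sketch of the temporal-decay weight (the 90-day half-life from the text) and a trigram-based near-duplicate check; the Jaccard threshold and helper names are assumptions, not the pipeline's actual parameters:

```python
import math

def decay_weight(age_days: float, half_life_days: float = 90.0) -> float:
    """Exponential decay: weight 1.0 for fresh data, 0.5 at the half-life."""
    return 0.5 ** (age_days / half_life_days)

def trigrams(text: str) -> set[str]:
    """Word-level trigrams of a training example."""
    toks = text.split()
    return {" ".join(toks[i:i + 3]) for i in range(len(toks) - 2)}

def is_near_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Jaccard similarity over trigram sets; threshold is illustrative."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return a == b
    return len(ta & tb) / len(ta | tb) >= threshold
```

Under this scheme a 90-day-old example contributes half the weight of a fresh one, and a 180-day-old example a quarter.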
Fine-tuning runs locally via Unsloth with LoRA adapters on Qwen or Llama models.
## The feedback loop
The pipeline is not one-directional. The PDX Pipeline Orchestrator coordinates a continuous feedback cycle:
- SessionAnalyzer parses HYDRA logs into structured deltas
- FeedbackEngine clusters attacker behavior into profiles and computes aggregate metrics
- LureGenerator translates metrics into concrete filesystem changes (new decoy files, updated credentials, richer `.bash_history`)
- FeedbackStore writes `feedback.yaml`
- FeedbackConsumer (inside HYDRA) polls every 60s and injects updates into the live VFS
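The consumer end of that cycle amounts to a mtime-based polling loop. The sketch below is an assumption about the mechanism, not HYDRA's code: it re-reads the feedback file only when it changes and hands the content to an injected callback (the real consumer would parse the YAML and mutate the VFS):

```python
import os
import time

def poll_feedback(path, apply_update, interval_s=60, max_polls=None):
    """Re-read `path` whenever its mtime advances and push the content
    into the live VFS via apply_update. max_polls=None runs forever."""
    last_mtime = 0.0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            mtime = os.path.getmtime(path)
        except FileNotFoundError:
            mtime = 0.0  # store hasn't written feedback yet
        if mtime > last_mtime:
            last_mtime = mtime
            with open(path) as fh:
                apply_update(fh.read())  # real consumer parses YAML here
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
```

Polling (rather than a push channel) keeps HYDRA decoupled from the orchestrator: the honeypot keeps running even if PDX is offline.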
The orchestrator can run in three modes: one-shot (`--once`), watch mode (`--watch`, for continuous monitoring), or auto mode (triggered by HYDRA's AutoTraining when the `min_sessions` threshold is reached).
The result: the honeypot improves itself continuously. After 48h of feedback, the fintech persona's average engagement rose from ~800 to 1,349 commands per session.
## Multi-model analysis
PDX doesn't rely on a single LLM. A 4-tier cascade analyzes each security delta:
```
Tier 1: Copilot local (7B) — fast first-pass on every delta
        ↓ uncertain?
Tier 2: Teacher local (32B) — detailed second-pass
        ↓ still uncertain?
Tier 3: Anthropic API — when complexity requires it
        ↓ unavailable?
Tier 4: WebChat fallback — marked REQUIRES HUMAN VALIDATION
```
Each tier produces a verdict (VULNERABLE, NOT_VULN, INFORMATIONAL, UNCERTAIN, FALSE_POS). When tiers disagree, the conflict is flagged. Nothing is discarded.
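The escalation logic can be captured in a few lines. This is a hedged sketch of the cascade described above, not PDX's implementation; the tier interface (a name plus a callable) and the use of `ConnectionError` for an unavailable backend are assumptions:

```python
UNCERTAIN = "UNCERTAIN"

def cascade(delta, tiers):
    """Run analyzers in order; stop at the first confident verdict.
    Each tier is (name, analyze_fn); analyze_fn returns a verdict string
    or raises ConnectionError if the backend is unavailable.
    Every verdict is recorded so tier disagreements can be flagged."""
    history = []
    for name, analyze in tiers:
        try:
            verdict = analyze(delta)
        except ConnectionError:
            continue  # tier unavailable, fall through to the next one
        history.append((name, verdict))
        if verdict != UNCERTAIN:
            return verdict, history
    # Nothing confident from any tier: hand off to a human.
    return "REQUIRES_HUMAN_VALIDATION", history
```

Returning the full `history` alongside the final verdict is what makes "nothing is discarded" cheap to honor: conflict detection is just a scan over the recorded verdicts.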
## Data enrichment
Eight collectors enrich every observation with external context: NVD/NIST (CVEs), ExploitDB (known exploits), OWASP (web classifications), MITRE ATT&CK (tactics/techniques), Nuclei (detection signatures), CWE (weakness classifications), IETF RFCs (protocol specs), and Linux man pages (command documentation).
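A fan-out over collectors of that kind might look like the sketch below. The collector interface (a callable per source) and the failure policy are assumptions for illustration; the real collectors query external services:

```python
def enrich(delta, collectors):
    """Fan an observation out to every collector; each returns a dict of
    context keyed by its source name. A failing collector never blocks
    enrichment, it just contributes None."""
    context = {}
    for name, collect in collectors.items():
        try:
            context[name] = collect(delta)
        except Exception:
            context[name] = None
    return context
```

Usage with stub collectors (the returned values are dummy data):

```python
collectors = {
    "mitre": lambda d: {"technique": "T1003"},   # OS Credential Dumping
    "broken": lambda d: 1 / 0,                   # simulates a dead backend
}
ctx = enrich({"cmd": "cat /etc/shadow"}, collectors)
# ctx["mitre"] carries the mapping; ctx["broken"] is None.
```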
## Infrastructure
| Component | Location | Tech |
|---|---|---|
| HYDRA | Google Cloud VPS | Python, Paramiko, Groq API |
| PDX | Local machine | Python, Unsloth, Ollama |
| Burp bridge | Local (during pentests) | Java extension + Python proxy |
| Documentation | Cloudflare Pages | MkDocs Material |
| Tunnel | Cloudflare (cloudflared) | SSH tunnel to VPS |