Architecture overview

HYDRA × PDX is a three-layer system: capture, routing, and output, connected by a continuous feedback loop.

Layer 1 — Capture

Two independent data sources feed the pipeline:

HYDRA (passive)

An SSH honeypot exposed on port 2222. Attackers connect, authenticate (any credentials accepted), and interact with a simulated Linux environment. Every command they type is processed by llama-3.3-70b (via Groq API) which generates contextual responses in real time.

HYDRA is not a static replay system. It maintains a mutable virtual filesystem per session (Copy-on-Write), rotates three personas (fintech_trading, crypto_validator, corp_ad), and includes deep anti-fingerprinting to defeat standard honeypot detection techniques.

The command router handles a 9-step pipeline: sanitize → PromptGuard → expand → split → pipes → classify → execute → mutate VFS → log. 65+ built-in commands handle common utilities natively; everything else is routed to the LLM with full VFS context.
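The router's control flow can be sketched as follows. This is an illustrative reduction, not HYDRA's actual code: function names, the PromptGuard keyword list, and the two example builtins are invented, and the expand/split/pipes and VFS-mutation steps are collapsed for brevity.

```python
# Illustrative sketch of the 9-step command router; all names are assumptions.

def sanitize(raw: str) -> str:
    # 1. sanitize: strip non-printable characters and surrounding whitespace.
    return "".join(ch for ch in raw if ch.isprintable() or ch == "\t").strip()

def prompt_guard(cmd: str) -> bool:
    # 2. PromptGuard: reject obvious prompt-injection phrasing before the LLM sees it.
    banned = ("ignore previous", "you are now", "system prompt")
    return not any(b in cmd.lower() for b in banned)

# A tiny stand-in for the 65+ native builtins.
BUILTINS = {"pwd": lambda args, vfs: vfs["cwd"],
            "whoami": lambda args, vfs: vfs["user"]}

def route_command(raw: str, vfs: dict, log: list) -> str:
    cmd = sanitize(raw)
    if not prompt_guard(cmd):
        log.append({"event": "injection_detected", "input": cmd})
        return "bash: syntax error"
    parts = cmd.split()                  # 3-5. expand/split/pipes (collapsed here)
    name, args = parts[0], parts[1:]
    if name in BUILTINS:                 # 6. classify: native builtin vs LLM
        out = BUILTINS[name](args, vfs)  # 7. execute natively
    else:
        out = f"<LLM response for {name!r} with VFS context>"
    log.append({"event": "command_executed", "cmd": cmd})  # 9. log (8. mutate VFS omitted)
    return out
```

A builtin short-circuits the LLM entirely, which keeps latency low for the utilities attackers type most often.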

Each session produces a JSONL file with structured events: auth_attempt, session_start, command_executed, injection_detected, session_end.
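A session log in that shape might be consumed like this. The event names match the list above; every other field (user, persona, cmd, duration_s) is a plausible assumption, not the actual HYDRA schema.

```python
import json

# Hypothetical session events in the JSONL shape described above;
# field names beyond "event" are assumptions.
session_log = """\
{"event": "auth_attempt", "user": "root", "password": "toor", "accepted": true}
{"event": "session_start", "persona": "fintech_trading"}
{"event": "command_executed", "cmd": "cat /etc/shadow"}
{"event": "session_end", "duration_s": 184}
"""

# One JSON object per line: parse lazily, skip blanks.
events = [json.loads(line) for line in session_log.splitlines() if line.strip()]
commands = [e["cmd"] for e in events if e["event"] == "command_executed"]
```

Because each line is a standalone JSON object, downstream tools can stream-process sessions without loading a whole file.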

Burp Suite bridge (active)

A Java extension + Python proxy that connects PDX to Burp Suite. During a web pentest, every HTTP request/response pair is analyzed and converted into the same .pdx delta format used by HYDRA.

This means the same pipeline processes both passive honeypot data and active pentest findings.

Layer 2 — Routing

The SessionClassifier first separates signal from noise: only the ~2.2% of sessions that look human-like pass through. Each session is assigned to one of five noise categories (bot_ephemeral, bot_exec_scanner, bot_dropper, bot_recon, unclassified) or to the single signal category (likely_human).
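A toy heuristic in the spirit of that classifier might look like this. The category names come from the docs; every threshold and feature below is invented for illustration, not taken from the codebase.

```python
# Toy session classifier; thresholds and features are assumptions.

def classify_session(commands: list, duration_s: float) -> str:
    if not commands:
        return "bot_ephemeral"        # connected, did nothing
    if duration_s < 2 and len(commands) > 5:
        return "bot_exec_scanner"     # command burst too fast for a human
    if any("wget" in c or "curl" in c for c in commands) and len(commands) <= 3:
        return "bot_dropper"          # quick drive-by payload fetch
    if set(commands) <= {"uname -a", "id", "whoami"}:
        return "bot_recon"            # pure fingerprinting, nothing else
    if duration_s > 30 and len(commands) >= 3:
        return "likely_human"         # slow, varied interaction = signal
    return "unclassified"
```

The real value of this step is economic: filtering out the ~98% of bot traffic means the expensive downstream analysis only runs on sessions worth learning from.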

The DataRouter then reads each event and classifies it into one or more streams:

  • Defensive — every command_executed and auth_attempt goes here. Used for detection pattern training.
  • Offensive — commands matching known attack patterns (credential access, privilege escalation, lateral movement) go here. Used for attack chain reconstruction.
  • Both — most events qualify for both streams. The same cat /etc/shadow is both a detection alert (defensive) and a credential extraction technique (offensive).

The classification is based on a MITRE ATT&CK mapping with 11 tactics, each having a dual description in the codebase.
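A minimal event-to-stream router consistent with the rules above could look like this. The three regexes are examples standing in for the real MITRE ATT&CK mapping, which covers 11 tactics.

```python
import re

# Example attack patterns; placeholders for the real ATT&CK mapping.
OFFENSIVE_PATTERNS = [
    re.compile(r"/etc/(shadow|passwd)"),    # credential access
    re.compile(r"\bsudo\b|\bchmod \+s\b"),  # privilege escalation
    re.compile(r"\bssh\b.*@"),              # lateral movement
]

def route_event(event: dict) -> set:
    streams = set()
    if event["event"] in ("command_executed", "auth_attempt"):
        streams.add("defensive")            # everything feeds detection training
    cmd = event.get("cmd", "")
    if any(p.search(cmd) for p in OFFENSIVE_PATTERNS):
        streams.add("offensive")            # feeds attack-chain reconstruction
    return streams
```

Note that the streams are additive rather than exclusive, which is exactly how `cat /etc/shadow` ends up in both.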

Layer 3 — Output

Seven training generators produce structured datasets:

Generator          Format                      Purpose
SFT detection      Instruction/output pairs    Train models to identify attack patterns
DPO lure quality   Chosen/rejected pairs       Measure which persona retains attackers longest
SFT attack chains  Instruction/output pairs    Reconstruct offensive TTPs
RAFT kill chains   Multi-step sequences        Complete post-exploitation sequences
ReAct dual         Thought/Action/Observation  Dual-perspective analysis
CoT                Chain-of-thought            5+ step logical reasoning with CWE/CVE refs
JS analysis        Code/verdict pairs          Client-side vulnerability detection

Before export, everything passes through a Quality Pipeline: trigram-based deduplication, quality filtering (with human-validation weighting), curriculum ordering (simple examples first, complex kill chains last), and temporal decay (90-day half-life).
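Two of those stages can be sketched concretely: trigram deduplication and the decayed sample weighting. The 90-day half-life is from the source; the Jaccard threshold and the human-validation multiplier are assumptions.

```python
# Trigram-based near-duplicate detection (Jaccard over character trigrams).
def trigrams(text: str) -> set:
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

def near_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    ta, tb = trigrams(a), trigrams(b)
    jaccard = len(ta & tb) / max(len(ta | tb), 1)
    return jaccard >= threshold

# Temporal decay (90-day half-life) plus an assumed human-validation boost.
HALF_LIFE_DAYS = 90

def sample_weight(quality: float, age_days: float, human_validated: bool) -> float:
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)  # 1.0 today, 0.5 at 90 days
    boost = 1.5 if human_validated else 1.0     # multiplier is an assumption
    return quality * decay * boost
```

Sorting the surviving samples by weight and complexity then yields the curriculum ordering: simple examples first, complex kill chains last.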

Fine-tuning runs locally via Unsloth with LoRA adapters on Qwen or Llama models.

The feedback loop

The pipeline is not one-directional. The PDX Pipeline Orchestrator coordinates a continuous feedback cycle:

  1. SessionAnalyzer parses HYDRA logs into structured deltas
  2. FeedbackEngine clusters attacker behavior into profiles and computes aggregate metrics
  3. LureGenerator translates metrics into concrete filesystem changes (new decoy files, updated credentials, richer .bash_history)
  4. FeedbackStore writes feedback.yaml
  5. FeedbackConsumer (inside HYDRA) polls every 60s and injects updates into the live VFS

The orchestrator can run in three modes: one-shot (--once), watch mode (--watch for continuous monitoring), or auto mode (triggered by HYDRA's AutoTraining when min_sessions threshold is reached).
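Step 5 of the loop could be sketched as a mtime-based poll, shown here with naive key/value parsing instead of a real YAML loader. Everything in this sketch (function name, file format, the way updates are injected) is illustrative, not the actual FeedbackConsumer.

```python
import time
from pathlib import Path

def poll_feedback(path: Path, vfs: dict, interval_s: int = 60, max_polls=None):
    """Hypothetical poll loop: re-read feedback.yaml when it changes and
    inject its entries into the live VFS dict."""
    last_mtime = 0.0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        if path.exists():
            mtime = path.stat().st_mtime
            if mtime > last_mtime:                 # new feedback was written
                last_mtime = mtime
                for line in path.read_text().splitlines():
                    if ":" in line:                # naive "key: value" parsing
                        key, value = line.split(":", 1)
                        vfs[key.strip()] = value.strip()
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
```

Polling on mtime rather than re-parsing every cycle keeps the 60-second loop cheap when nothing has changed.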

The result: the honeypot improves itself continuously. After 48h of feedback, the fintech persona's average engagement rose from ~800 to 1,349 commands per session.

Multi-model analysis

PDX doesn't rely on a single LLM. A 4-tier cascade analyzes each security delta:

Tier 1: Copilot local (7B)     — fast first-pass on every delta
   ↓ uncertain?
Tier 2: Teacher local (32B)    — detailed second-pass
   ↓ still uncertain?
Tier 3: Anthropic API          — when complexity requires it
   ↓ unavailable?
Tier 4: WebChat fallback       — marked REQUIRES HUMAN VALIDATION

Each tier produces a verdict (VULNERABLE, NOT_VULN, INFORMATIONAL, UNCERTAIN, FALSE_POS). When tiers disagree, the conflict is flagged. Nothing is discarded.
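The escalation logic can be sketched as a fold over tiers, where each tier is a callable returning a verdict. The verdict set and tier order follow the diagram; the escalation rule ("only stop on a non-UNCERTAIN answer") and the availability handling are assumptions, and conflict flagging between disagreeing tiers is omitted for brevity.

```python
VERDICTS = {"VULNERABLE", "NOT_VULN", "INFORMATIONAL", "UNCERTAIN", "FALSE_POS"}

def cascade(delta: str, tiers: list) -> dict:
    """Hypothetical 4-tier escalation: each tier is a (name, analyze) pair."""
    history = []
    for name, analyze in tiers:
        try:
            verdict = analyze(delta)
        except ConnectionError:        # e.g. the API tier is unavailable
            continue                   # fall through to the next tier
        history.append((name, verdict))
        if verdict != "UNCERTAIN":
            break                      # confident answer: stop escalating
    final = history[-1][1] if history else "UNCERTAIN"
    return {
        "verdict": final,
        "needs_human": final == "UNCERTAIN",  # the WebChat-fallback marker
        "history": history,                   # nothing is discarded
    }
```

Keeping the full history is what makes the "nothing is discarded" property cheap to implement: disagreements stay visible for later review.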

Data enrichment

8 collectors enrich every observation with external context: NVD/NIST (CVEs), ExploitDB (known exploits), OWASP (web classifications), MITRE ATT&CK (tactics/techniques), Nuclei (detection signatures), CWE (weakness classifications), IETF RFCs (protocol specs), and Linux man pages (command documentation).
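Structurally this is a fan-out: each collector maps an observation to a context fragment, and failures never block the rest. The sketch below is an assumed shape for that dispatcher; the collector outputs are stubbed (T1003.008 is the real ATT&CK technique for /etc/shadow dumping, used here only as sample data).

```python
def enrich(observation: dict, collectors: dict) -> dict:
    """Hypothetical fan-out: run every collector, merge non-empty results."""
    enriched = dict(observation)
    enriched["context"] = {}
    for name, collect in collectors.items():
        try:
            result = collect(observation)
        except Exception:
            result = None              # a failing collector never blocks the rest
        if result:
            enriched["context"][name] = result
    return enriched
```

In the real pipeline each collector would call out to its source (NVD, ExploitDB, and so on); the merge logic stays the same.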

Infrastructure

Component       Location                  Tech
HYDRA           Google Cloud VPS          Python, Paramiko, Groq API
PDX             Local machine             Python, Unsloth, Ollama
Burp bridge     Local (during pentests)   Java extension + Python proxy
Documentation   Cloudflare Pages          MkDocs Material
Tunnel          Cloudflare (cloudflared)  SSH tunnel to VPS