HYDRA × PDX

Dual-use cybersecurity pipeline — LLM-powered honeypot meets security training data generator.

The problem

In cybersecurity, a honeypot is a fake server deliberately exposed on the internet to attract attackers. You let them in, watch what they do, and learn from their techniques.

The problem is that today's honeypots are trivially detectable. An experienced attacker runs uname -r and sees the wrong kernel, or checks /proc/1/cgroup and spots Docker traces. Tools like Cowrie, the most popular SSH honeypot, get fingerprinted in under 30 seconds.

Result: attackers disconnect instantly. Your logs are noise, not intelligence.
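The two checks named above can be turned into a small self-audit that a honeypot operator might run against their own decoy before exposing it. This is a minimal sketch, not part of HYDRA: the function names, the marker list, and the idea of passing in an expected kernel prefix are all assumptions made for illustration. The markers reflect strings that commonly appear in /proc/1/cgroup on cgroup v1 hosts and are not exhaustive.

```python
# Substrings that commonly appear in /proc/1/cgroup when PID 1 runs inside a
# container (cgroup v1 style output). Illustrative list, not exhaustive.
CONTAINER_MARKERS = ("docker", "kubepods", "containerd", "lxc")


def container_markers(cgroup_text: str) -> list[str]:
    """Return the container markers found in the text of /proc/1/cgroup."""
    lowered = cgroup_text.lower()
    return [marker for marker in CONTAINER_MARKERS if marker in lowered]


def kernel_matches(uname_r: str, expected_prefix: str) -> bool:
    """Check that `uname -r` output starts with the kernel version the decoy
    claims to run. The expected prefix is supplied by the caller."""
    return uname_r.strip().startswith(expected_prefix)


def fingerprint_report(uname_r: str, cgroup_text: str, expected_prefix: str) -> dict:
    """Summarise what an attacker would learn from the two checks."""
    return {
        "kernel_ok": kernel_matches(uname_r, expected_prefix),
        "container_markers": container_markers(cgroup_text),
    }


if __name__ == "__main__":
    # Hypothetical command outputs, typed in by hand for the example.
    report = fingerprint_report(
        uname_r="4.19.0-generic\n",
        cgroup_text="12:devices:/docker/abc123\n11:cpu:/docker/abc123\n",
        expected_prefix="5.15",
    )
    print(report)
```

A report with kernel_ok set to False or a non-empty marker list means the decoy fails the same checks an attacker would run in the first seconds of a session.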

The hypothesis

What if the terminal could intelligently answer any command an attacker types — in real time, with memory, and without leaving any trace that it's fake?

And what if the captured data could automatically produce both offensive and defensive training datasets — from the same raw events?

That's what HYDRA × PDX does.
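To make "answers any command, with memory" concrete, here is a minimal sketch of a session object that keeps state between commands and falls back to a model for anything it does not handle itself. Everything in it is hypothetical: the class name, the three built-ins, the default hostname, and the shape of the model callback are assumptions for illustration, and the stand-in model simply returns a "command not found" line so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical callback shape: the model sees the session history plus the new
# command and returns the text to print on the attacker's terminal.
LLMFn = Callable[[list[tuple[str, str]], str], str]


@dataclass
class FakeShellSession:
    llm: LLMFn
    hostname: str = "db-prod-01"  # illustrative persona detail
    cwd: str = "/root"
    history: list[tuple[str, str]] = field(default_factory=list)

    def run(self, command: str) -> str:
        """Answer one command and remember the exchange."""
        command = command.strip()
        if not command:
            return ""
        output = self._builtin(command)
        if output is None:
            # Not a built-in: let the model answer, with the full history.
            output = self.llm(self.history, command)
        self.history.append((command, output))
        return output

    def _builtin(self, command: str):
        """Deterministic answers for a few commands; None means 'not handled'."""
        if command == "pwd":
            return self.cwd
        if command == "hostname":
            return self.hostname
        if command.startswith("cd "):
            self.cwd = command[3:].strip() or self.cwd
            return ""
        return None


def stub_llm(history: list[tuple[str, str]], command: str) -> str:
    """Stand-in for a real model call so the sketch runs without a network."""
    return f"bash: {command.split()[0]}: command not found"


if __name__ == "__main__":
    session = FakeShellSession(llm=stub_llm)
    session.run("cd /var/log")
    print(session.run("pwd"))            # state from the earlier cd is kept
    print(session.run("frobnicate -x"))  # unknown command goes to the model
```

Answering cheap, state-dependent commands deterministically and reserving the model for the long tail is one plausible way to keep responses consistent across a session; whether HYDRA splits the work this way is not stated here.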

How it works

graph TB
    A[Attacker via SSH] --> B[HYDRA Honeypot]
    P[Pentester via Burp] --> C[Burp Extension]
    B --> D[DataRouter]
    C --> D
    D --> E[Defensive stream]
    D --> F[Offensive stream]
    D --> G[Combined ReAct]
    E --> H[Fine-tuning<br/>Unsloth / LoRA]
    F --> H
    G --> H
    H --> |feedback.yaml| B

The system has two data sources:

Source	Type	What it captures
HYDRA	Passive	Attackers connect to a public SSH honeypot. Every command is answered by an LLM in real time. 65+ built-in commands, 3 personas, anti-fingerprinting.
Burp Suite	Active	During web pentests, HTTP deltas flow through a Java extension into the same pipeline.

Both sources produce events in the same .pdx format and converge on a single DataRouter, which routes each event to the defensive stream, the offensive stream, or both at once.
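The routing idea can be sketched in a few lines. The event fields and the two classification rules below are hypothetical stand-ins chosen for illustration; the actual .pdx schema and the DataRouter's real criteria are not described in this page.

```python
from dataclasses import dataclass
from enum import Enum


class Stream(Enum):
    DEFENSIVE = "defensive"
    OFFENSIVE = "offensive"


@dataclass(frozen=True)
class Event:
    source: str    # "hydra" or "burp"
    action: str    # what the actor did (a shell command or an HTTP request)
    observed: str  # what the system returned or logged, possibly empty


def route(event: Event) -> set[Stream]:
    """Toy rules (assumed): an action is a technique worth learning from on
    the offensive side; an observable trace is something a defender could
    detect on. An event with both lands in both streams."""
    streams: set[Stream] = set()
    if event.action:
        streams.add(Stream.OFFENSIVE)
    if event.observed:
        streams.add(Stream.DEFENSIVE)
    return streams


def dispatch(events: list[Event]) -> dict[Stream, list[Event]]:
    """Fan each event out to every stream it was routed to."""
    buckets: dict[Stream, list[Event]] = {stream: [] for stream in Stream}
    for event in events:
        for stream in route(event):
            buckets[stream].append(event)
    return buckets


if __name__ == "__main__":
    events = [
        Event("hydra", "cat /etc/shadow", "permission denied"),
        Event("burp", "GET /admin", ""),
    ]
    for stream, items in dispatch(events).items():
        print(stream.value, len(items))
```

The point of the sketch is that one raw event can be appended to more than one stream without being copied or re-captured, which is what lets both datasets come from the same capture.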

Key numbers

Metric	Value
SSH sessions captured	3,508
Signal sessions (human)	78 (2.2%)
Defensive events generated	8,668
Offensive events generated	4,910
MITRE ATT&CK tactics covered	5/5
Longest session	36.3 minutes
Personas	3 (fintech, crypto, corp AD)
Built-in commands	65+
Training generators	7 formats
Data collectors	8 sources
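As a quick check on the table, the 2.2% figure is the share of signal sessions among all captured SSH sessions. The variable names below are illustrative, and treating the remainder as non-signal sessions is an assumption based on the "Signal sessions (human)" label.

```python
total_sessions = 3508   # SSH sessions captured
signal_sessions = 78    # sessions labeled as signal (human)

signal_rate = signal_sessions / total_sessions
other_sessions = total_sessions - signal_sessions  # assumed non-signal

print(f"signal rate: {signal_rate:.1%}")      # signal rate: 2.2%
print(f"other sessions: {other_sessions}")    # other sessions: 3430
```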

What's in the docs

Architecture

How the full system fits together: capture, routing, output, and feedback loop.

HYDRA

The LLM-powered honeypot: 9-step command pipeline, personas, virtual filesystem, anti-fingerprinting, PromptGuard, feedback loop.

PDX

The pipeline: .pdx format, Delta Vector 16D, DataRouter, Burp bridge, 7 training generators, quality pipeline.

Observations

Guides

Reference

Links