HYDRA × PDX

Dual-use cybersecurity pipeline — LLM-powered honeypot meets security training data generator.


The problem

In cybersecurity, a honeypot is a fake server deliberately exposed on the internet to attract attackers. You let them in, watch what they do, and learn from their techniques.

The problem is that today's honeypots are trivially detectable. An experienced attacker runs uname -r and sees a kernel version that does not match what the server claims to be, or checks /proc/1/cgroup and spots Docker traces. Tools like Cowrie, the most popular SSH honeypot, get fingerprinted in under 30 seconds.

Result: attackers disconnect instantly. Your logs are noise, not intelligence.
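As an illustration of how cheap the /proc/1/cgroup check is, here is a minimal sketch an operator could run against their own honeypot to see what an attacker would see. The marker list and the function names are assumptions made for this example, not part of HYDRA or Cowrie, and the check can miss containers on cgroup v2 hosts, where the file often contains only `0::/`.

```python
from pathlib import Path

# Substrings that commonly show up in cgroup paths inside containers.
# Illustrative assumption only; not an exhaustive fingerprint set.
CONTAINER_MARKERS = ("docker", "containerd", "kubepods", "lxc")


def looks_containerized(cgroup_text: str) -> bool:
    """Return True if a /proc/1/cgroup dump mentions a container runtime."""
    lowered = cgroup_text.lower()
    return any(marker in lowered for marker in CONTAINER_MARKERS)


def check_host(path: str = "/proc/1/cgroup") -> bool:
    """Read the cgroup file if present; a missing file counts as 'no trace'."""
    cgroup_file = Path(path)
    if not cgroup_file.exists():
        return False
    return looks_containerized(cgroup_file.read_text(errors="replace"))
```

A honeypot that wants to survive this check has to return a believable file for that path, which is one reason a static emulation layer is easy to spot.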

The hypothesis

What if the terminal could intelligently answer any command an attacker types — in real time, with memory, and without leaving any trace that it's fake?

And what if the captured data could automatically produce both offensive and defensive training datasets — from the same raw events?

That's what HYDRA × PDX does.

How it works

graph TB
    A[Attacker via SSH] --> B[HYDRA Honeypot]
    P[Pentester via Burp] --> C[Burp Extension]
    B --> D[DataRouter]
    C --> D
    D --> E[Defensive stream]
    D --> F[Offensive stream]
    D --> G[Combined ReAct]
    E --> H[Fine-tuning<br/>Unsloth / LoRA]
    F --> H
    G --> H
    H --> |feedback.yaml| B

The system has two data sources:

| Source | Type | What it captures |
|---|---|---|
| HYDRA | Passive | Attackers connect to a public SSH honeypot. Every command is answered by an LLM in real time. 65+ built-in commands, 3 personas, anti-fingerprinting. |
| Burp Suite | Active | During web pentests, HTTP deltas flow through a Java extension into the same pipeline. |
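The HYDRA row combines built-in commands with LLM answers and session memory. One way to picture that combination is the sketch below. It is not HYDRA's actual pipeline: `Session`, `BUILTINS`, `handle`, and `stub_responder` are hypothetical names, and the stub stands in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]                          # (command, output)
Responder = Callable[[str, List[Turn]], str]    # stand-in for an LLM call


@dataclass
class Session:
    hostname: str = "web-01"                    # illustrative persona detail
    history: List[Turn] = field(default_factory=list)


Builtin = Callable[[Session, str], str]

# Hypothetical built-ins answered locally, without calling the model.
BUILTINS: Dict[str, Builtin] = {
    "hostname": lambda session, command: session.hostname,
    "whoami": lambda session, command: "root",
}


def stub_responder(command: str, history: List[Turn]) -> str:
    """Placeholder for the model; a real one would receive the history as context."""
    return f"(model reply to {command!r} with {len(history)} prior turns)"


def handle(session: Session, command: str,
           builtins: Dict[str, Builtin], fallback: Responder) -> str:
    """Answer one command: built-in if known, otherwise the fallback responder."""
    parts = command.split()
    if not parts:
        output = ""                             # empty line, nothing to print
    elif parts[0] in builtins:
        output = builtins[parts[0]](session, command)
    else:
        output = fallback(command, list(session.history))
    session.history.append((command, output))   # memory for later turns
    return output
```

Calling `handle(Session(), "hostname", BUILTINS, stub_responder)` returns the persona's hostname without touching the model, while an unknown command falls through to the responder together with everything typed so far.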

Both sources emit events in the same .pdx format and feed a single DataRouter, which assigns each event to the defensive stream, the offensive stream, or both.
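To make the routing idea concrete, here is a minimal sketch of a router that can place one event in more than one stream. The `Event` fields and the routing rules are assumptions invented for this example. The real .pdx schema and DataRouter logic are not described here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class Stream(Enum):
    DEFENSIVE = "defensive"
    OFFENSIVE = "offensive"


@dataclass(frozen=True)
class Event:
    source: str   # hypothetical field: "hydra" or "burp"
    kind: str     # hypothetical label, e.g. "command" or "http_delta"
    payload: str


def route(event: Event) -> Set[Stream]:
    """Return every stream the event belongs to (possibly both, possibly none)."""
    streams: Set[Stream] = set()
    if event.source == "hydra":
        # Assumed rule: anything seen on the honeypot is detection material.
        streams.add(Stream.DEFENSIVE)
        if event.kind == "command":
            # Assumed rule: attacker commands also serve as technique examples.
            streams.add(Stream.OFFENSIVE)
    elif event.source == "burp":
        # Assumed rule: pentest traffic feeds the offensive stream.
        streams.add(Stream.OFFENSIVE)
    return streams
```

Returning a set rather than a single label is what lets the same raw event contribute to both datasets.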

Key numbers

| Metric | Value |
|---|---|
| SSH sessions captured | 3,508 |
| Signal sessions (human) | 78 (2.2%) |
| Defensive events generated | 8,668 |
| Offensive events generated | 4,910 |
| MITRE ATT&CK tactics covered | 5/5 |
| Longest session | 36.3 minutes |
| Personas | 3 (fintech, crypto, corp AD) |
| Built-in commands | 65+ |
| Training generators | 7 formats |
| Data collectors | 8 sources |

What's in the docs

Architecture

How the full system fits together — capture, routing, output, and feedback loop.

HYDRA

The LLM-powered honeypot: 9-step command pipeline, personas, virtual filesystem, anti-fingerprinting, PromptGuard, feedback loop.

PDX

The pipeline: .pdx format, Delta Vector 16D, DataRouter, Burp bridge, 7 training generators, quality pipeline.

Observations

What we found in 3,508 sessions: Kinsing botnets, Solana targeting, credential propagation, prompt injection via SSH.

Guides

Quick start, deployment, fine-tuning, troubleshooting.

Reference

MITRE mapping, API, configuration, FAQ, changelog.