DataRouter
The DataRouter is the component that turns raw events into dual-use training data. It reads each HYDRA/Burp event and routes it to the defensive stream, the offensive stream, or both.
The dual-use mapping
Each MITRE ATT&CK tactic covered by the DataRouter (the 11 listed below) has two descriptions in the codebase, one defensive and one offensive:
| Tactic | Defensive perspective | Offensive perspective |
|---|---|---|
| discovery | Detect recon and enumeration | System discovery techniques |
| credential-access | Detect credential extraction | Credential harvesting methods |
| privilege-escalation | Detect privesc attempts | SUID, sudo, kernel exploits |
| lateral-movement | Detect SSH pivots, tunnels | How to move between systems |
| persistence | Detect cron, bashrc injection | How to maintain access |
| defense-evasion | Detect log clearing, timestomp | Anti-forensics techniques |
| execution | Detect remote code execution | Dropper and payload techniques |
| exfiltration | Detect data exfiltration | DNS/HTTP exfil methods |
| collection | Detect sensitive data gathering | What to target first |
| initial-access | Identify access vectors used | Exploitation of access vectors |
| command-and-control | Detect C2 channels | C2 establishment techniques |
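A minimal sketch of how this mapping could be held in code, assuming a plain dictionary keyed by tactic name. `DUAL_USE_MAP`, `Perspectives` and `describe` are hypothetical names and are not taken from `data_router.py`. Only the strings come from the table.

```python
from typing import NamedTuple


class Perspectives(NamedTuple):
    """The two descriptions kept for one ATT&CK tactic."""

    defensive: str
    offensive: str


# Hypothetical name; each entry mirrors one row of the tactic table.
DUAL_USE_MAP: dict[str, Perspectives] = {
    "discovery": Perspectives("Detect recon and enumeration", "System discovery techniques"),
    "credential-access": Perspectives("Detect credential extraction", "Credential harvesting methods"),
    "privilege-escalation": Perspectives("Detect privesc attempts", "SUID, sudo, kernel exploits"),
    "lateral-movement": Perspectives("Detect SSH pivots, tunnels", "How to move between systems"),
    "persistence": Perspectives("Detect cron, bashrc injection", "How to maintain access"),
    "defense-evasion": Perspectives("Detect log clearing, timestomp", "Anti-forensics techniques"),
    "execution": Perspectives("Detect remote code execution", "Dropper and payload techniques"),
    "exfiltration": Perspectives("Detect data exfiltration", "DNS/HTTP exfil methods"),
    "collection": Perspectives("Detect sensitive data gathering", "What to target first"),
    "initial-access": Perspectives("Identify access vectors used", "Exploitation of access vectors"),
    "command-and-control": Perspectives("Detect C2 channels", "C2 establishment techniques"),
}


def describe(tactic: str, stream: str) -> str:
    """Return the description of `tactic` for the "defensive" or "offensive" stream."""
    if stream not in Perspectives._fields:
        raise ValueError(f"unknown stream: {stream!r}")
    return getattr(DUAL_USE_MAP[tactic], stream)
```

A `NamedTuple` keeps the two perspectives side by side. Adding a tactic therefore means adding one entry rather than keeping two parallel tables in sync.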
Classification logic
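```python
# Simplified from data_router.py; pattern and MITRE-tag checks are passed in as booleans
def route(event_type: str, matches_attack_pattern: bool = False, has_mitre_tag: bool = False) -> set[str]:
    """Return the streams ("defensive", "offensive") an event is routed to."""
    if event_type == "auth_attempt":
        return {"defensive", "offensive"}
    if event_type == "command_executed":
        streams = {"defensive"}  # always defensive
        if matches_attack_pattern or has_mitre_tag:
            streams.add("offensive")
        return streams
    if event_type == "injection_detected":
        return {"defensive"}  # defensive only
    return set()  # assumption: other event types are outside this simplified view
```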
Most `command_executed` events qualify for both streams. This is intentional: the same observation teaches a model both how to detect a technique and how to carry it out.
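The condition "matches attack pattern" is not spelled out on this page. The sketch below assumes one common approach: a list of regular expressions, each tagged with a tactic. The patterns are built from example commands elsewhere on this page, and `ATTACK_PATTERNS`, `match_tactic` and `is_offensive` are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical (pattern, tactic) pairs built from commands shown on this page.
ATTACK_PATTERNS: list[tuple[re.Pattern, str]] = [
    (re.compile(r"\bfind\b.*-perm\s+[-/]?4000\b"), "privilege-escalation"),
    (re.compile(r"\bsudo\s+-l\b"), "privilege-escalation"),
    (re.compile(r"/etc/shadow\b"), "credential-access"),
    (re.compile(r"\.aws/credentials\b"), "credential-access"),
    (re.compile(r"\.ssh\b"), "credential-access"),
    (re.compile(r"^\s*uname\b"), "discovery"),
    (re.compile(r"/etc/passwd\b"), "discovery"),
]


def match_tactic(command: str) -> Optional[str]:
    """Return the tactic of the first matching pattern, or None if nothing matches."""
    for pattern, tactic in ATTACK_PATTERNS:
        if pattern.search(command):
            return tactic
    return None


def is_offensive(command: str, mitre_tag: Optional[str] = None) -> bool:
    """A command_executed event also feeds the offensive stream if it matches
    an attack pattern or already carries a MITRE tag."""
    return mitre_tag is not None or match_tactic(command) is not None
```

Patterns are checked in order and the first match wins, so more specific patterns should come first. A production rule set would need far broader coverage than these seven entries.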
Generated datasets
Defensive output
SFT detection patterns — instruction/output pairs that train a model to identify attacks:
```json
{
  "instruction": "An SSH user executes: `find / -perm -4000`. Identify the MITRE tactic and threat level.",
  "output": "Tactic: privilege-escalation\nTechnique: SUID binary enumeration\nThreat: High\nAction: Log, alert, monitor follow-up commands."
}
```
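The sketch below assembles one detection record in this shape from a classified command. `make_detection_example`, `to_jsonl_line` and their parameters are hypothetical; only the field layout follows the sample record. Each record would then become one line of `sft_detection_patterns.jsonl`.

```python
import json


def make_detection_example(
    command: str,
    tactic: str,
    technique: str,
    threat: str,
    action: str,
    actor: str = "An SSH user",  # hypothetical default; other protocols would pass their own
) -> dict[str, str]:
    """Build one instruction/output pair for the detection SFT dataset."""
    return {
        "instruction": f"{actor} executes: `{command}`. Identify the MITRE tactic and threat level.",
        "output": f"Tactic: {tactic}\nTechnique: {technique}\nThreat: {threat}\nAction: {action}",
    }


def to_jsonl_line(record: dict[str, str]) -> str:
    """Serialize one record as a single JSON Lines row (newlines inside strings are escaped)."""
    return json.dumps(record, ensure_ascii=False) + "\n"
```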
DPO lure effectiveness: chosen/rejected pairs that teach a model to judge how well a persona retained an attacker, preferring a specific assessment over a vague one:
```json
{
  "prompt": "Session: 34 commands, 3 MITRE tactics, 94s duration. Evaluate lure effectiveness.",
  "chosen": "Highly productive session. 3 tactics captured. The attacker engaged deeply with the fintech credentials...",
  "rejected": "The session lasted 94s. No specific recommendation."
}
```
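One way to produce such a pair is to generate the chosen response from the session statistics and use a generic, statistics-free response as the rejected one. The sketch below assumes that approach. The `SessionStats` fields, the three-tactic threshold for a "productive" session, and the response templates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SessionStats:
    """Hypothetical per-session summary used to build a DPO prompt."""

    num_commands: int
    tactics: list[str]  # one entry per tagged command; may repeat
    duration_s: int


def make_lure_pair(stats: SessionStats, persona_note: str) -> dict[str, str]:
    """Build one chosen/rejected pair; the chosen answer cites the captured tactics."""
    distinct = sorted(set(stats.tactics))
    prompt = (
        f"Session: {stats.num_commands} commands, {len(distinct)} MITRE tactics, "
        f"{stats.duration_s}s duration. Evaluate lure effectiveness."
    )
    # Assumed rule of thumb: three or more distinct tactics means a productive session.
    verdict = "Highly productive session." if len(distinct) >= 3 else "Low-yield session."
    chosen = f"{verdict} {len(distinct)} tactics captured ({', '.join(distinct)}). {persona_note}"
    rejected = f"The session lasted {stats.duration_s}s. No specific recommendation."
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```

The rejected response is deliberately built only from the duration, so the preference signal rewards answers that use the tactic and engagement evidence.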
Offensive output
SFT attack chains — observed TTPs converted into pentest instructions:
```json
{
  "instruction": "How to perform credential-access on a Linux server?",
  "output": "Technique: Read AWS credentials\nCommand: `cat /root/.aws/credentials`\nContext: Post-exploitation credential harvesting."
}
```
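The sketch below converts observed TTPs into offensive pairs, following the sample layout. The function names are hypothetical. Deduplicating repeated (tactic, command) observations is an assumption about how repeated attacker commands might be handled, not documented behavior.

```python
def make_attack_example(tactic: str, technique: str, command: str, context: str) -> dict[str, str]:
    """Convert one observed TTP into a pentest-style instruction/output pair."""
    return {
        "instruction": f"How to perform {tactic} on a Linux server?",
        "output": f"Technique: {technique}\nCommand: `{command}`\nContext: {context}",
    }


def build_attack_dataset(observations: list[tuple[str, str, str, str]]) -> list[dict[str, str]]:
    """observations holds (tactic, technique, command, context) tuples.

    Assumption: identical (tactic, command) pairs seen in many sessions
    should yield a single training example.
    """
    seen: set[tuple[str, str]] = set()
    examples = []
    for tactic, technique, command, context in observations:
        key = (tactic, command)
        if key in seen:
            continue
        seen.add(key)
        examples.append(make_attack_example(tactic, technique, command, context))
    return examples
```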
RAFT kill chains — complete multi-step post-exploitation sequences from real sessions (5+ commands):
```json
{
  "instruction": "Describe a complete post-exploitation sequence on a Linux server.",
  "output": "1. `uname -a` — discovery\n2. `cat /etc/passwd` — discovery\n3. `ls /root/.ssh` — credential-access\n4. `cat /root/.aws/credentials` — credential-access\n5. `find / -perm -4000` — privilege-escalation"
}
```
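The kill-chain records can be sketched as follows: skip sessions shorter than five commands, then number the remaining commands in execution order together with their tactic. `MIN_COMMANDS` and `make_kill_chain` are hypothetical names. The five-command threshold comes from the description above, and the separator reproduces the one in the sample (U+2014).

```python
from typing import Optional

MIN_COMMANDS = 5  # sessions shorter than this are skipped ("5+ commands")
SEP = " \u2014 "  # separator between command and tactic, as in the sample output


def make_kill_chain(session: list[tuple[str, str]]) -> Optional[dict[str, str]]:
    """session is an ordered list of (command, tactic) pairs from one attacker session."""
    if len(session) < MIN_COMMANDS:
        return None
    steps = [f"{i}. `{cmd}`{SEP}{tactic}" for i, (cmd, tactic) in enumerate(session, start=1)]
    return {
        "instruction": "Describe a complete post-exploitation sequence on a Linux server.",
        "output": "\n".join(steps),
    }
```

Keeping execution order matters here. The value of these records over single-command examples is that they show which step tends to follow which.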
Combined output
ReAct dual-perspective: the same command sequence analyzed from both the offensive and the defensive angle:
```json
{
  "instruction": "Analyze this sequence from both offensive and defensive perspectives:\ncat /etc/shadow\nfind / -perm -4000\nsudo -l",
  "output": "═══ Offensive analysis ═══\nSequence: credential extraction → SUID enumeration → sudo check...\n═══ Defensive analysis ═══\nIoC: shadow file access → SUID scan → sudo enumeration. Recommend: alert on this pattern in SIEM."
}
```
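The sketch below shows how the combined record could be laid out. It assumes the offensive and defensive analyses are produced separately and then joined under the section banners seen in the sample. `make_dual_perspective` is a hypothetical name, and the analysis text is passed in rather than generated.

```python
def make_dual_perspective(commands: list[str], offensive: str, defensive: str) -> dict[str, str]:
    """Join two analyses of the same command sequence into one dual-perspective record."""
    instruction = (
        "Analyze this sequence from both offensive and defensive perspectives:\n"
        + "\n".join(commands)
    )
    # Offensive analysis first, then defensive, matching the sample's banner order.
    output = f"═══ Offensive analysis ═══\n{offensive}\n═══ Defensive analysis ═══\n{defensive}"
    return {"instruction": instruction, "output": output}
```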
Running the DataRouter
```bash
# Split raw HYDRA logs into defensive + offensive events
python -m pdx.training.data_router split

# Generate training datasets
python -m pdx.training.data_router generate --all

# Check status
python -m pdx.training.data_router status
```
Output structure:
```
training_output/data_router/
├── split_stats.json
├── defensive/
│   ├── raw_events.jsonl  (8,668 events)
│   ├── sft_detection_patterns.jsonl
│   └── dpo_lure_quality.jsonl
├── offensive/
│   ├── raw_events.jsonl  (4,910 events)
│   ├── sft_attack_chains.jsonl
│   └── raft_kill_chains.jsonl
└── combined/
    └── react_dual_perspective.jsonl
```
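To inspect the generated files directly, a small script can count records per file. It assumes each `.jsonl` file follows the JSON Lines convention of one JSON object per line. `count_records` is a hypothetical helper, not part of the `status` command. It ignores `split_stats.json` because that file's schema is not documented here.

```python
import json
from pathlib import Path


def count_records(root: str = "training_output/data_router") -> dict[str, int]:
    """Map each .jsonl file under `root` (as a relative path) to its number of records."""
    counts: dict[str, int] = {}
    for path in sorted(Path(root).rglob("*.jsonl")):
        n = 0
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():  # skip blank lines
                    json.loads(line)  # fail loudly on a malformed record
                    n += 1
        counts[path.relative_to(root).as_posix()] = n
    return counts
```

Run `count_records()` from the directory that contains `training_output/`. It returns a mapping keyed by relative path, such as `defensive/raw_events.jsonl`. If each event is stored as one line, the counts for the two `raw_events.jsonl` files should match the totals shown in the tree.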