DataRouter

The DataRouter turns raw events into dual-use training data. It reads each HYDRA/Burp event and routes it to the defensive stream, the offensive stream, or both.

The dual-use mapping

Each MITRE ATT&CK tactic in the codebase has two descriptions, one defensive and one offensive:

Tactic	Defensive perspective	Offensive perspective
`discovery`	Detect recon and enumeration	System discovery techniques
`credential-access`	Detect credential extraction	Credential harvesting methods
`privilege-escalation`	Detect privesc attempts	SUID, sudo, kernel exploits
`lateral-movement`	Detect SSH pivots, tunnels	How to move between systems
`persistence`	Detect cron, bashrc injection	How to maintain access
`defense-evasion`	Detect log clearing, timestomp	Anti-forensics techniques
`execution`	Detect remote code execution	Dropper and payload techniques
`exfiltration`	Detect data exfiltration	DNS/HTTP exfil methods
`collection`	Detect sensitive data gathering	What to target first
`initial-access`	Identify access vectors used	Exploitation of access vectors
`command-and-control`	Detect C2 channels	C2 establishment techniques
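
The table can be read as a lookup from a tactic to a pair of descriptions. The following is a minimal sketch under that assumption: the names `Perspectives`, `DUAL_USE_MAP`, and `describe` are hypothetical and not taken from `data_router.py`, and only three rows of the table are reproduced.

```python
from typing import NamedTuple


class Perspectives(NamedTuple):
    defensive: str
    offensive: str


# Hypothetical structure; rows copied from the table above (subset only).
DUAL_USE_MAP = {
    "discovery": Perspectives(
        "Detect recon and enumeration", "System discovery techniques"
    ),
    "credential-access": Perspectives(
        "Detect credential extraction", "Credential harvesting methods"
    ),
    "privilege-escalation": Perspectives(
        "Detect privesc attempts", "SUID, sudo, kernel exploits"
    ),
}


def describe(tactic: str, stream: str) -> str:
    """Return the description of `tactic` for the given stream."""
    if stream not in ("defensive", "offensive"):
        raise ValueError(f"unknown stream: {stream!r}")
    perspectives = DUAL_USE_MAP[tactic]  # KeyError for tactics outside the map
    return getattr(perspectives, stream)


print(describe("privilege-escalation", "defensive"))
```

Keeping both descriptions in one entry means a tactic cannot gain a defensive description without an offensive one, which matches the dual-use intent.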

Classification logic

# Simplified from data_router.py
if event_type == "auth_attempt":
    → defensive + offensive

elif event_type == "command_executed":
    → always defensive
    → offensive IF matches attack pattern OR has MITRE tag

elif event_type == "injection_detected":
    → defensive only

Most `command_executed` events qualify for both streams. This is intentional: the same observation teaches both detection and technique.
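
The rules above could be written as a runnable function along the following lines. This is a sketch, not the code in `data_router.py`: the event fields `command` and `mitre_tactic`, the `ATTACK_PATTERNS` list, and the function name `classify` are assumptions made for illustration.

```python
import re

# Hypothetical attack patterns; the real list in data_router.py is not shown here.
ATTACK_PATTERNS = [
    re.compile(r"find\s+/\s+-perm\s+-4000"),  # SUID enumeration
    re.compile(r"cat\s+/etc/shadow"),         # credential file read
]


def classify(event: dict) -> set[str]:
    """Return the streams ('defensive', 'offensive') an event is routed to."""
    event_type = event.get("event_type")
    if event_type == "auth_attempt":
        return {"defensive", "offensive"}
    if event_type == "command_executed":
        streams = {"defensive"}  # always defensive
        command = event.get("command", "")  # assumed field name
        matches_pattern = any(p.search(command) for p in ATTACK_PATTERNS)
        has_mitre_tag = bool(event.get("mitre_tactic"))  # assumed field name
        if matches_pattern or has_mitre_tag:
            streams.add("offensive")
        return streams
    if event_type == "injection_detected":
        return {"defensive"}
    return set()  # event types not covered by the rules above


print(classify({"event_type": "command_executed", "command": "find / -perm -4000"}))
```

Returning a set keeps the "both streams" case explicit and makes it easy to count how many events land in each stream.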

Generated datasets

Defensive output

SFT detection patterns — instruction/output pairs that train a model to identify attacks:

{
  "instruction": "An SSH user executes: `find / -perm -4000`. Identify the MITRE tactic and threat level.",
  "output": "Tactic: privilege-escalation\nTechnique: SUID binary enumeration\nThreat: High\nAction: Log, alert, monitor follow-up commands."
}

DPO lure effectiveness — chosen/rejected pairs measuring which persona retains attackers:

{
  "prompt": "Session: 34 commands, 3 MITRE tactics, 94s duration. Evaluate lure effectiveness.",
  "chosen": "Highly productive session. 3 tactics captured. The attacker engaged deeply with the fintech credentials...",
  "rejected": "The session lasted 94s. No specific recommendation."
}
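
Before training on these files, it can help to check that every line carries the fields shown in the samples above. The sketch below assumes one JSON object per line and uses only the keys visible in the two sample records; `REQUIRED_KEYS` and `validate_line` are hypothetical names, not part of the DataRouter.

```python
import json

# Required keys per dataset type, taken from the sample records above.
REQUIRED_KEYS = {
    "sft": {"instruction", "output"},
    "dpo": {"prompt", "chosen", "rejected"},
}


def validate_line(line: str, kind: str) -> dict:
    """Parse one JSONL line and check the fields expected for `kind`."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("expected one JSON object per line")
    missing = REQUIRED_KEYS[kind] - set(record)
    if missing:
        raise ValueError(f"{kind} record is missing keys: {sorted(missing)}")
    for key in REQUIRED_KEYS[kind]:
        value = record[key]
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{kind} record has an empty or non-string {key!r}")
    return record


sample = '{"prompt": "Session: 34 commands.", "chosen": "Productive.", "rejected": "No recommendation."}'
print(validate_line(sample, "dpo")["chosen"])
```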

Offensive output

SFT attack chains — observed TTPs converted into pentest instructions:

{
  "instruction": "How to perform credential-access on a Linux server?",
  "output": "Technique: Read AWS credentials\nCommand: `cat /root/.aws/credentials`\nContext: Post-exploitation credential harvesting."
}

RAFT kill chains — complete multi-step post-exploitation sequences from real sessions (5+ commands):

{
  "instruction": "Describe a complete post-exploitation sequence on a Linux server.",
  "output": "1. `uname -a` — discovery\n2. `cat /etc/passwd` — discovery\n3. `ls /root/.ssh` — credential-access\n4. `cat /root/.aws/credentials` — credential-access\n5. `find / -perm -4000` — privilege-escalation"
}
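
A kill-chain record like the one above could be assembled from a session's tagged commands as follows. This is a sketch under assumptions: the input is taken to be a list of `(command, tactic)` pairs, `build_kill_chain` and `MIN_COMMANDS` are hypothetical names, and the only rule carried over from the text is the 5-command minimum.

```python
MIN_COMMANDS = 5  # threshold stated above ("5+ commands")


def build_kill_chain(steps):
    """Format (command, tactic) pairs as a numbered kill chain.

    Returns None when the session has fewer than MIN_COMMANDS commands.
    """
    if len(steps) < MIN_COMMANDS:
        return None
    # \u2014 is the dash used between command and tactic in the sample record.
    lines = [
        f"{i}. `{command}` \u2014 {tactic}"
        for i, (command, tactic) in enumerate(steps, start=1)
    ]
    return {
        "instruction": "Describe a complete post-exploitation sequence on a Linux server.",
        "output": "\n".join(lines),
    }


session = [
    ("uname -a", "discovery"),
    ("cat /etc/passwd", "discovery"),
    ("ls /root/.ssh", "credential-access"),
    ("cat /root/.aws/credentials", "credential-access"),
    ("find / -perm -4000", "privilege-escalation"),
]
print(build_kill_chain(session)["output"])
```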

Combined output

ReAct dual-perspective — same command sequence analyzed from both angles:

{
  "instruction": "Analyze this sequence from both offensive and defensive perspectives:\ncat /etc/shadow\nfind / -perm -4000\nsudo -l",
  "output": "═══ Offensive analysis ═══\nSequence: credential extraction → SUID enumeration → sudo check...\n═══ Defensive analysis ═══\nIoC: shadow file access → SUID scan → sudo enumeration. Recommend: alert on this pattern in SIEM."
}

Running the DataRouter

# Split raw HYDRA logs into defensive + offensive events
python -m pdx.training.data_router split

# Generate training datasets
python -m pdx.training.data_router generate --all

# Check status
python -m pdx.training.data_router status

Output structure:

training_output/data_router/
├── split_stats.json
├── defensive/
│   ├── raw_events.jsonl          (8,668 events)
│   ├── sft_detection_patterns.jsonl
│   └── dpo_lure_quality.jsonl
├── offensive/
│   ├── raw_events.jsonl          (4,910 events)
│   ├── sft_attack_chains.jsonl
│   └── raft_kill_chains.jsonl
└── combined/
    └── react_dual_perspective.jsonl
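
To confirm what a run produced, the record counts in each file can be tallied with a few lines of Python. This sketch is independent of the `status` subcommand and does not reproduce its output; it assumes the files are JSON Lines with one record per non-blank line, and `count_records` is a hypothetical helper.

```python
from pathlib import Path


def count_records(root):
    """Map each .jsonl file under `root` (as a relative path) to its non-blank line count."""
    root = Path(root)
    counts = {}
    for path in sorted(root.rglob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            n = sum(1 for line in handle if line.strip())
        counts[path.relative_to(root).as_posix()] = n
    return counts
```

Calling `count_records("training_output/data_router")` returns a dictionary keyed by paths such as `defensive/raw_events.jsonl`, which can be compared against the counts shown in the tree above.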