Training generators

PDX includes seven training-data generators. Each produces a different record format suited to a specific fine-tuning objective.

Generator overview

Generator	Format	Source	Purpose
SFT detection	Instruction → output	Defensive	Teach models to identify attack patterns
DPO lure	Chosen vs rejected	Defensive	Measure which persona retains attackers best
SFT attack	Instruction → output	Offensive	Reconstruct offensive TTPs as pentest instructions
RAFT kill chain	Multi-step sequence	Offensive	Complete post-exploitation sequences
ReAct dual	Thought/Action/Observation/Conclusion	Combined	Dual-perspective analysis of same events
CoT	Chain-of-thought	Both	5+ step logical reasoning with CWE/CVE references
JS analysis	Code → verdict	Offensive	Client-side vulnerability detection in JavaScript

SFT detection (defensive)

Generates instruction/output pairs that teach a model to analyze SSH commands through a defensive lens.

{
  "instruction": "A SSH user executes: `cat /etc/shadow`. Identify the MITRE ATT&CK tactic and threat level.",
  "output": "Tactic: credential-access\nTechnique: Credential extraction from shadow file\nThreat: High\nAction: Log, alert, monitor for follow-up privilege escalation.",
  "source": "hydra_defensive",
  "mitre_tactic": "credential-access"
}
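
As a rough illustration of how such a record could be assembled, the sketch below maps a command to the four fields shown above. The rule table and the function name `build_sft_detection_record` are hypothetical; the way PDX actually classifies commands is not described here, and only the single rule taken from the sample record is included.

```python
from typing import Optional

# Hypothetical rule table: command prefix -> (tactic, technique, threat, action).
# Only this one entry comes from the sample record; PDX's real
# classification logic is not described in this document.
DETECTION_RULES = {
    "cat /etc/shadow": (
        "credential-access",
        "Credential extraction from shadow file",
        "High",
        "Log, alert, monitor for follow-up privilege escalation.",
    ),
}


def build_sft_detection_record(command: str) -> Optional[dict]:
    """Return an instruction/output record, or None when no rule matches."""
    for prefix, (tactic, technique, threat, action) in DETECTION_RULES.items():
        if command.startswith(prefix):
            return {
                # Instruction wording copied from the sample record.
                "instruction": (
                    f"A SSH user executes: `{command}`. "
                    "Identify the MITRE ATT&CK tactic and threat level."
                ),
                "output": (
                    f"Tactic: {tactic}\nTechnique: {technique}\n"
                    f"Threat: {threat}\nAction: {action}"
                ),
                "source": "hydra_defensive",
                "mitre_tactic": tactic,
            }
    return None
```

Calling `build_sft_detection_record("cat /etc/shadow")` yields a dictionary with the same keys as the sample; commands without a matching rule return `None` so they can be skipped.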

DPO lure effectiveness (defensive)

Generates preference pairs that measure how well a persona keeps an attacker engaged. The "chosen" response gives a detailed analysis of the session; the "rejected" response is shallow.

{
  "prompt": "Session: 34 commands, 3 MITRE tactics, 217s duration. Evaluate the lure effectiveness.",
  "chosen": "Highly productive session. 3 tactics captured (discovery, credential-access, privilege-escalation). The attacker spent 217s exploring, indicating deep engagement. The AWS credential decoys were accessed. Recommendation: maintain current configuration.",
  "rejected": "The session lasted 217s. No specific recommendation."
}
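
The pair above could be derived from session statistics along the following lines. This is a minimal sketch: the function name, its parameters, and the wording of the "chosen" text are assumptions, and only the prompt and "rejected" templates mirror the sample.

```python
def build_dpo_pair(num_commands, tactics, duration_s, decoys_accessed):
    """Build one preference pair from session statistics (hypothetical inputs)."""
    prompt = (
        f"Session: {num_commands} commands, {len(tactics)} MITRE tactics, "
        f"{duration_s}s duration. Evaluate the lure effectiveness."
    )
    # "chosen": a detailed answer that names the tactics and any decoys touched.
    chosen_parts = [
        f"{len(tactics)} tactics captured ({', '.join(tactics)}).",
        f"The attacker spent {duration_s}s exploring.",
    ]
    if decoys_accessed:
        chosen_parts.append(f"Decoys accessed: {', '.join(decoys_accessed)}.")
    # "rejected": a shallow answer that only restates the duration.
    rejected = f"The session lasted {duration_s}s. No specific recommendation."
    return {
        "prompt": prompt,
        "chosen": " ".join(chosen_parts),
        "rejected": rejected,
    }
```

Keeping the two responses tied to the same prompt is what lets a preference optimizer learn that the detailed evaluation is the better one.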

SFT attack chains (offensive)

Converts observed attacker TTPs into structured pentest instructions.

{
  "instruction": "How to perform credential-access on a Linux server?",
  "output": "Technique: Read AWS credentials from configuration files\nCommand: `cat /root/.aws/credentials`\nContext: Post-exploitation credential harvesting in cloud environments.\nPrecautions: Always operate within scope of authorized engagement.",
  "source": "hydra_offensive_extraction",
  "mitre_tactic": "credential-access"
}

RAFT kill chains (offensive)

Generates complete multi-step exploitation sequences from real sessions that contain at least five commands.

{
  "instruction": "Describe a complete post-exploitation sequence on a Linux server.",
  "output": "Post-exploitation sequence observed:\n1. `uname -a` — Phase: discovery\n2. `cat /etc/passwd` — Phase: discovery\n3. `ls /root/.ssh` — Phase: credential-access\n4. `cat /root/.bash_history` — Phase: credential-access\n5. `find / -perm -4000` — Phase: privilege-escalation\n6. `sudo -l` — Phase: privilege-escalation\n\nTotal: 6 steps covering 3 MITRE ATT&CK tactics.",
  "session_id": "0115acd5",
  "num_commands": 22
}
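
The numbered output above can be sketched as a formatter over (command, phase) pairs. The function name and signature are hypothetical. The sketch also assumes that `num_commands` counts every command in the session while only the classified commands are listed as steps, which would explain why the sample shows 6 steps alongside `num_commands` of 22.

```python
from typing import List, Optional, Tuple

MIN_COMMANDS = 5   # the text requires sessions with at least 5 commands
DASH = "\u2014"    # separator character used in the sample output


def build_raft_record(
    session_id: str,
    steps: List[Tuple[str, str]],   # (command, phase) pairs, in session order
    num_commands: int,              # assumed: total commands in the session
) -> Optional[dict]:
    """Format one kill-chain record, or return None for short sessions."""
    if num_commands < MIN_COMMANDS or not steps:
        return None
    lines = [
        f"{i}. `{cmd}` {DASH} Phase: {phase}"
        for i, (cmd, phase) in enumerate(steps, start=1)
    ]
    # Distinct phases, keeping first-seen order.
    tactics = list(dict.fromkeys(phase for _, phase in steps))
    output = (
        "Post-exploitation sequence observed:\n"
        + "\n".join(lines)
        + f"\n\nTotal: {len(steps)} steps covering "
        + f"{len(tactics)} MITRE ATT&CK tactics."
    )
    return {
        "instruction": "Describe a complete post-exploitation sequence on a Linux server.",
        "output": output,
        "session_id": session_id,
        "num_commands": num_commands,
    }
```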

ReAct dual-perspective (combined)

Analyzes the same command sequence from both offensive and defensive viewpoints using the Thought → Action → Observation → Conclusion format.
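
No sample record is shown for this generator, so the following is only a guess at how the four-part format might be laid out. The field names (`commands`, `offensive`, `defensive`), the bracketed perspective header, and both helper functions are hypothetical.

```python
LABELS = ("Thought", "Action", "Observation", "Conclusion")


def format_react(perspective: str, steps: dict) -> str:
    """Render one perspective as labelled lines in the fixed ReAct order."""
    missing = [label for label in LABELS if label not in steps]
    if missing:
        raise ValueError(f"missing ReAct fields: {missing}")
    body = "\n".join(f"{label}: {steps[label]}" for label in LABELS)
    return f"[{perspective}]\n{body}"


def build_react_dual(commands, offensive: dict, defensive: dict) -> dict:
    """Pair an offensive and a defensive reading of the same commands."""
    return {
        "commands": list(commands),
        "offensive": format_react("offensive", offensive),
        "defensive": format_react("defensive", defensive),
    }
```

Both perspectives share the same `commands` list, so a model sees one event sequence explained twice.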

CoT (Chain-of-Thought)

Produces detailed reasoning chains with a minimum of five logical steps, referencing CWEs and CVEs where applicable. It also includes chain templates, which are pre-built multi-vulnerability exploitation sequences:

Chain name	Vulnerabilities	Combined severity
Cross-origin session manipulation	CORS + missing CSRF + SameSite=None	0.85
Apache path traversal to RCE	Apache 2.4.49 + /cgi-bin/	0.95
XSS to session hijack	Reflected XSS + no HttpOnly + no CSP	0.90
SSRF whitelist bypass	Open redirect + SSRF	0.85
File upload to web shell	PHP upload + dir listing + PHP exec	0.95
JWT algorithm none + IDOR	JWT `alg:none` + IDOR	0.95
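
One way to picture the chain templates is as a lookup from a set of component findings to a named chain. The sketch below encodes the table rows as data and assumes a chain applies only when all of its components are present. That matching rule, the `match_chains` name, and the use of the table's wording as finding labels are illustrative assumptions, not PDX's documented behaviour.

```python
# Rows copied from the table above: (chain name, components, combined severity).
CHAIN_TEMPLATES = [
    ("Cross-origin session manipulation", {"CORS", "missing CSRF", "SameSite=None"}, 0.85),
    ("Apache path traversal to RCE", {"Apache 2.4.49", "/cgi-bin/"}, 0.95),
    ("XSS to session hijack", {"Reflected XSS", "no HttpOnly", "no CSP"}, 0.90),
    ("SSRF whitelist bypass", {"Open redirect", "SSRF"}, 0.85),
    ("File upload to web shell", {"PHP upload", "dir listing", "PHP exec"}, 0.95),
    ("JWT algorithm none + IDOR", {"JWT alg:none", "IDOR"}, 0.95),
]


def match_chains(findings):
    """Return (name, severity) for every chain fully covered by the findings."""
    found = set(findings)
    matches = [
        (name, severity)
        for name, required, severity in CHAIN_TEMPLATES
        if required <= found  # subset test: every component was observed
    ]
    # Highest combined severity first; ties keep table order (sort is stable).
    return sorted(matches, key=lambda m: m[1], reverse=True)
```

For example, findings of reflected XSS, a missing HttpOnly flag, and a missing CSP would match only the "XSS to session hijack" chain.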

JS analysis

Analyzes client-side JavaScript code for vulnerabilities:

Scenario	Type	Verdict	Severity
Hardcoded API key in JS	`CREDENTIAL_EXPOSURE`	VULNERABLE	0.9
`innerHTML = location.hash`	`DOM_XSS`	VULNERABLE	0.85
HTTP fetch with credentials	`INSECURE_TRANSPORT`	VULNERABLE	0.7
JWT in localStorage	`INSECURE_STORAGE`	INFORMATIONAL	0.4
Recursive merge without proto check	`PROTOTYPE_POLLUTION`	UNCERTAIN	0.6
Blacklist-only XSS sanitization	`INSUFFICIENT_SANITIZATION`	VULNERABLE	0.8
Proper CSRF token implementation	`CSRF_PROTECTION`	NOT_VULN	0.05

The JS generator also includes false-positive scenarios: examples where the observed code looks like a vulnerability but is not one. These train models to avoid over-reporting.
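
A code-to-verdict record with the columns from the table above might be modelled as follows. The class, the record keys, and the JavaScript snippets are hypothetical, and the 0 to 1 severity range is an assumption inferred from the values in the table.

```python
from dataclasses import dataclass

# Verdict labels taken from the table above.
VERDICTS = {"VULNERABLE", "INFORMATIONAL", "UNCERTAIN", "NOT_VULN"}


@dataclass(frozen=True)
class JsScenario:
    code: str        # JavaScript snippet under analysis (illustrative)
    vuln_type: str   # e.g. "DOM_XSS"
    verdict: str     # one of VERDICTS
    severity: float  # assumed to lie in [0, 1]

    def to_record(self) -> dict:
        """Validate the scenario and return it as a training record."""
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict}")
        if not 0.0 <= self.severity <= 1.0:
            raise ValueError(f"severity out of range: {self.severity}")
        return {
            "code": self.code,
            "type": self.vuln_type,
            "verdict": self.verdict,
            "severity": self.severity,
        }


SCENARIOS = [
    JsScenario("el.innerHTML = location.hash;", "DOM_XSS", "VULNERABLE", 0.85),
    JsScenario('localStorage.setItem("jwt", token);', "INSECURE_STORAGE", "INFORMATIONAL", 0.4),
]
records = [s.to_record() for s in SCENARIOS]
```

Including `NOT_VULN` among the allowed verdicts lets the same record shape carry the benign and false-positive examples.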