
Training generators

PDX includes 7 training data generators, each producing a different format optimized for a specific fine-tuning objective.

Generator overview

| Generator | Format | Source | Purpose |
| --- | --- | --- | --- |
| SFT detection | Instruction → output | Defensive | Teach models to identify attack patterns |
| DPO lure | Chosen vs rejected | Defensive | Measure which persona retains attackers best |
| SFT attack | Instruction → output | Offensive | Reconstruct offensive TTPs as pentest instructions |
| RAFT kill chain | Multi-step sequence | Offensive | Complete post-exploitation sequences |
| ReAct dual | Thought/Action/Observation | Combined | Dual-perspective analysis of same events |
| CoT | Chain-of-thought | Both | 5+ step logical reasoning with CWE/CVE references |
| JS analysis | Code → verdict | Offensive | Client-side vulnerability detection in JavaScript |

SFT detection (defensive)

Generates instruction/output pairs that teach a model to analyze SSH commands through a defensive lens.

{
  "instruction": "An SSH user executes: `cat /etc/shadow`. Identify the MITRE ATT&CK tactic and threat level.",
  "output": "Tactic: credential-access\nTechnique: Credential extraction from shadow file\nThreat: High\nAction: Log, alert, monitor for follow-up privilege escalation.",
  "source": "hydra_defensive",
  "mitre_tactic": "credential-access"
}
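
A record of this shape could be assembled roughly as follows. This is a minimal sketch, not PDX's implementation: the `TACTIC_RULES` lookup and the `build_detection_record` function are hypothetical, and the fixed "Action" line is copied from the sample rather than derived. Only the field names and sample values come from the example above.

```python
import json
from typing import Optional

# Hypothetical lookup: command substring -> (tactic, technique, threat level).
# A real generator would derive these from classified session data.
TACTIC_RULES = {
    "/etc/shadow": ("credential-access", "Credential extraction from shadow file", "High"),
}


def build_detection_record(command: str) -> Optional[dict]:
    """Return an SFT detection record for `command`, or None if no rule matches."""
    for needle, (tactic, technique, threat) in TACTIC_RULES.items():
        if needle in command:
            return {
                "instruction": (
                    f"An SSH user executes: `{command}`. "
                    "Identify the MITRE ATT&CK tactic and threat level."
                ),
                "output": (
                    f"Tactic: {tactic}\nTechnique: {technique}\nThreat: {threat}\n"
                    # Assumption: the recommended action is fixed text, as in the sample.
                    "Action: Log, alert, monitor for follow-up privilege escalation."
                ),
                "source": "hydra_defensive",
                "mitre_tactic": tactic,
            }
    return None


record = build_detection_record("cat /etc/shadow")
print(json.dumps(record, indent=2))
```

Returning `None` for unmatched commands keeps unclassified input out of the dataset instead of emitting a record with a guessed tactic.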

DPO lure effectiveness (defensive)

Generates preference pairs that measure persona engagement quality. The "chosen" response provides a detailed analysis; the "rejected" response is shallow.

{
  "prompt": "Session: 34 commands, 3 MITRE tactics, 217s duration. Evaluate the lure effectiveness.",
  "chosen": "Highly productive session. 3 tactics captured (discovery, credential-access, privilege-escalation). The attacker spent 217s exploring, indicating deep engagement. The AWS credential decoys were accessed. Recommendation: maintain current configuration.",
  "rejected": "The session lasted 217s. No specific recommendation."
}
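
The pair above can be derived from a handful of session statistics. The sketch below assumes a hypothetical `build_lure_pair` helper that receives the command count, the observed tactics, and the duration; how PDX actually scores sessions or words the chosen response is not stated here, so the chosen text is a simplified stand-in.

```python
from typing import List


def build_lure_pair(num_commands: int, tactics: List[str], duration_s: int) -> dict:
    """Build one DPO preference pair from session statistics (hypothetical helper)."""
    prompt = (
        f"Session: {num_commands} commands, {len(tactics)} MITRE tactics, "
        f"{duration_s}s duration. Evaluate the lure effectiveness."
    )
    # Chosen: names the tactics and ties the duration to engagement.
    chosen = (
        f"{len(tactics)} tactics captured ({', '.join(tactics)}). "
        f"The attacker spent {duration_s}s across {num_commands} commands, "
        "indicating sustained engagement."
    )
    # Rejected: restates one number and gives no analysis.
    rejected = f"The session lasted {duration_s}s. No specific recommendation."
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


pair = build_lure_pair(34, ["discovery", "credential-access", "privilege-escalation"], 217)
print(pair["prompt"])
```

Both responses are built from the same statistics, so the only difference the model can learn from is the depth of the analysis.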

SFT attack chains (offensive)

Converts observed attacker TTPs into structured pentest instructions.

{
  "instruction": "How to perform credential-access on a Linux server?",
  "output": "Technique: Read AWS credentials from configuration files\nCommand: `cat /root/.aws/credentials`\nContext: Post-exploitation credential harvesting in cloud environments.\nPrecautions: Always operate within scope of authorized engagement.",
  "source": "hydra_offensive_extraction",
  "mitre_tactic": "credential-access"
}

RAFT kill chains (offensive)

Generates complete multi-step exploitation sequences from real sessions with 5+ commands.

{
  "instruction": "Describe a complete post-exploitation sequence on a Linux server.",
  "output": "Post-exploitation sequence observed:\n1. `uname -a` — Phase: discovery\n2. `cat /etc/passwd` — Phase: discovery\n3. `ls /root/.ssh` — Phase: credential-access\n4. `cat /root/.bash_history` — Phase: credential-access\n5. `find / -perm -4000` — Phase: privilege-escalation\n6. `sudo -l` — Phase: privilege-escalation\n\nTotal: 6 steps covering 3 MITRE ATT&CK tactics.",
  "session_id": "0115acd5",
  "num_commands": 22
}
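
The sequence formatting and the five-command threshold can be sketched as below. The function name and the `(command, tactic)` input shape are hypothetical. The sketch also assumes that `num_commands` counts every command in the session while the output lists only the classified ones, which would explain 22 versus 6 in the sample; that reading is a guess, not something this page confirms.

```python
from typing import List, Optional, Tuple

MIN_COMMANDS = 5   # the "5+ commands" threshold from the text above
SEP = " \u2014 "   # em dash separator, matching the sample output


def build_kill_chain_record(
    session_id: str,
    steps: List[Tuple[str, str]],
    num_commands: int,
) -> Optional[dict]:
    """Format classified (command, tactic) steps as a RAFT kill-chain record."""
    if num_commands < MIN_COMMANDS or not steps:
        return None  # session too short to form a kill chain
    lines = [
        f"{i}. `{cmd}`{SEP}Phase: {tactic}"
        for i, (cmd, tactic) in enumerate(steps, start=1)
    ]
    # dict.fromkeys de-duplicates tactics while preserving first-seen order.
    distinct_tactics = list(dict.fromkeys(tactic for _, tactic in steps))
    output = (
        "Post-exploitation sequence observed:\n"
        + "\n".join(lines)
        + f"\n\nTotal: {len(steps)} steps covering "
        + f"{len(distinct_tactics)} MITRE ATT&CK tactics."
    )
    return {
        "instruction": "Describe a complete post-exploitation sequence on a Linux server.",
        "output": output,
        "session_id": session_id,
        "num_commands": num_commands,
    }


steps = [
    ("uname -a", "discovery"),
    ("cat /etc/passwd", "discovery"),
    ("ls /root/.ssh", "credential-access"),
    ("cat /root/.bash_history", "credential-access"),
    ("find / -perm -4000", "privilege-escalation"),
    ("sudo -l", "privilege-escalation"),
]
record = build_kill_chain_record("0115acd5", steps, num_commands=22)
print(record["output"])
```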

ReAct dual-perspective (combined)

Analyzes the same command sequence from both offensive and defensive viewpoints using the Thought → Action → Observation → Conclusion format.
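
This page gives no sample record for this generator, so the following is only an illustration of what a dual-perspective trace might look like. The field names, the `[Offensive]`/`[Defensive]` labels, and both helper functions are assumptions; only the Thought, Action, Observation, Conclusion ordering comes from the description above.

```python
from typing import Dict, List


def react_trace(perspective: str, thought: str, action: str,
                observation: str, conclusion: str) -> str:
    """Render one perspective in Thought/Action/Observation/Conclusion order."""
    return "\n".join([
        f"[{perspective}]",
        f"Thought: {thought}",
        f"Action: {action}",
        f"Observation: {observation}",
        f"Conclusion: {conclusion}",
    ])


def build_dual_record(commands: List[str],
                      offensive: Dict[str, str],
                      defensive: Dict[str, str]) -> dict:
    """Combine both perspectives on the same command sequence (hypothetical schema)."""
    joined = "; ".join(f"`{c}`" for c in commands)
    return {
        "instruction": f"Analyze this command sequence from both perspectives: {joined}",
        "output": react_trace("Offensive", **offensive)
        + "\n\n"
        + react_trace("Defensive", **defensive),
    }


record = build_dual_record(
    ["uname -a", "sudo -l"],
    offensive={
        "thought": "Listing sudo rights shows which commands the user can run as root.",
        "action": "Run `sudo -l`.",
        "observation": "The output lists the permitted sudo commands.",
        "conclusion": "Any permitted command that can spawn a shell is an escalation path.",
    },
    defensive={
        "thought": "Enumerating sudo rights is consistent with privilege-escalation recon.",
        "action": "Correlate `sudo -l` with earlier discovery commands in the session.",
        "observation": "The command follows `uname -a` in the same session.",
        "conclusion": "Tag as privilege-escalation and alert on follow-up sudo use.",
    },
)
print(record["output"])
```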

CoT — Chain-of-Thought

Produces detailed reasoning chains with a minimum of 5 logical steps, referencing CWEs and CVEs where applicable. It also includes chain templates, which are pre-built multi-vulnerability exploitation sequences:

| Chain name | Vulnerabilities | Combined severity |
| --- | --- | --- |
| Cross-origin session manipulation | CORS + missing CSRF + SameSite=None | 0.85 |
| Apache path traversal to RCE | Apache 2.4.49 + /cgi-bin/ | 0.95 |
| XSS to session hijack | Reflected XSS + no HttpOnly + no CSP | 0.90 |
| SSRF whitelist bypass | Open redirect + SSRF | 0.85 |
| File upload to web shell | PHP upload + dir listing + PHP exec | 0.95 |
| JWT algorithm none + IDOR | JWT alg:none + IDOR | 0.95 |
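
One way to hold these templates in code, together with the five-step minimum, is sketched below. The `ChainTemplate` dataclass and both helper functions are hypothetical; the names, vulnerability lists, and severities are taken from the table, and how PDX stores them internally is not described here.

```python
from dataclasses import dataclass
from typing import List, Tuple

MIN_REASONING_STEPS = 5  # minimum chain length stated above


@dataclass(frozen=True)
class ChainTemplate:
    name: str
    vulnerabilities: Tuple[str, ...]
    combined_severity: float


CHAIN_TEMPLATES = [
    ChainTemplate("Cross-origin session manipulation",
                  ("CORS", "missing CSRF", "SameSite=None"), 0.85),
    ChainTemplate("Apache path traversal to RCE",
                  ("Apache 2.4.49", "/cgi-bin/"), 0.95),
    ChainTemplate("XSS to session hijack",
                  ("Reflected XSS", "no HttpOnly", "no CSP"), 0.90),
    ChainTemplate("SSRF whitelist bypass",
                  ("Open redirect", "SSRF"), 0.85),
    ChainTemplate("File upload to web shell",
                  ("PHP upload", "dir listing", "PHP exec"), 0.95),
    ChainTemplate("JWT algorithm none + IDOR",
                  ("JWT alg:none", "IDOR"), 0.95),
]


def has_enough_steps(steps: List[str]) -> bool:
    """True if the chain has at least MIN_REASONING_STEPS non-empty steps."""
    return len([s for s in steps if s.strip()]) >= MIN_REASONING_STEPS


def templates_at_or_above(threshold: float) -> List[ChainTemplate]:
    """Select templates whose combined severity meets the threshold."""
    return [t for t in CHAIN_TEMPLATES if t.combined_severity >= threshold]


for template in templates_at_or_above(0.95):
    print(template.name, template.combined_severity)
```

Filtering by severity is just one example of what a structured template list allows, such as sampling the highest-severity chains more often.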

JS analysis

Analyzes client-side JavaScript code for vulnerabilities:

| Scenario | Type | Verdict | Severity |
| --- | --- | --- | --- |
| Hardcoded API key in JS | CREDENTIAL_EXPOSURE | VULNERABLE | 0.9 |
| innerHTML = location.hash | DOM_XSS | VULNERABLE | 0.85 |
| HTTP fetch with credentials | INSECURE_TRANSPORT | VULNERABLE | 0.7 |
| JWT in localStorage | INSECURE_STORAGE | INFORMATIONAL | 0.4 |
| Recursive merge without proto check | PROTOTYPE_POLLUTION | UNCERTAIN | 0.6 |
| Blacklist-only XSS sanitization | INSUFFICIENT_SANITIZATION | VULNERABLE | 0.8 |
| Proper CSRF token implementation | CSRF_PROTECTION | NOT_VULN | 0.05 |

The JS generator also includes false positive scenarios — examples where the observation looks like a vulnerability but isn't. This trains models to avoid over-reporting.
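
To check how well a scenario set guards against over-reporting, one option is to count verdicts and see what share of examples is not labeled VULNERABLE. The sketch below encodes the scenarios listed above as tuples; the `SCENARIOS` list and both helper functions are hypothetical and are not part of PDX.

```python
from collections import Counter
from typing import List, Tuple

# (scenario, type, verdict, severity), copied from the scenarios listed above.
Scenario = Tuple[str, str, str, float]

SCENARIOS: List[Scenario] = [
    ("Hardcoded API key in JS", "CREDENTIAL_EXPOSURE", "VULNERABLE", 0.9),
    ("innerHTML = location.hash", "DOM_XSS", "VULNERABLE", 0.85),
    ("HTTP fetch with credentials", "INSECURE_TRANSPORT", "VULNERABLE", 0.7),
    ("JWT in localStorage", "INSECURE_STORAGE", "INFORMATIONAL", 0.4),
    ("Recursive merge without proto check", "PROTOTYPE_POLLUTION", "UNCERTAIN", 0.6),
    ("Blacklist-only XSS sanitization", "INSUFFICIENT_SANITIZATION", "VULNERABLE", 0.8),
    ("Proper CSRF token implementation", "CSRF_PROTECTION", "NOT_VULN", 0.05),
]


def verdict_counts(scenarios: List[Scenario]) -> Counter:
    """Count how many scenarios carry each verdict."""
    return Counter(verdict for _, _, verdict, _ in scenarios)


def non_vulnerable_share(scenarios: List[Scenario]) -> float:
    """Fraction of scenarios whose verdict is anything other than VULNERABLE."""
    if not scenarios:
        return 0.0
    others = sum(1 for _, _, verdict, _ in scenarios if verdict != "VULNERABLE")
    return others / len(scenarios)


counts = verdict_counts(SCENARIOS)
print(dict(counts))
print(f"non-VULNERABLE share: {non_vulnerable_share(SCENARIOS):.2f}")
```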