# Fine-tuning
PDX generates training datasets that can be used to fine-tune LLMs for cybersecurity tasks — both defensive (detection, alerting) and offensive (pentest assistance, TTP reconstruction).
## Requirements
| Resource | Minimum | Recommended |
|---|---|---|
| GPU VRAM | 8 GB | 16 GB+ |
| RAM | 16 GB | 32 GB |
| Disk | 20 GB | 50 GB |
| Framework | Unsloth | Unsloth |
## Supported base models
| Model | Size | Best for |
|---|---|---|
| Qwen 2.5 | 7B / 14B | Fast iteration, good multilingual |
| Llama 3.3 | 8B / 70B | English-focused, strong reasoning |
## Quick fine-tune

```bash
python training/finetune_pdx.py \
  --dataset training_output/data_router/defensive/sft_detection_patterns.jsonl \
  --model qwen \
  --epochs 3 \
  --rank 16
```
### Options

| Flag | Default | Description |
|---|---|---|
| `--dataset` | — | Path to JSONL training data |
| `--model` | `qwen` | Base model (`qwen` or `llama`) |
| `--epochs` | `3` | Number of training epochs |
| `--rank` | `16` | LoRA rank (higher = more capacity, more VRAM) |
| `--resume` | `false` | Resume from checkpoint |
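To make the `--rank` trade-off concrete, here is a rough sketch of how LoRA rank drives the trainable-parameter count. The dimensions are illustrative (roughly a 7B-class model such as Qwen 2.5), not values read from the PDX script:

```python
def lora_trainable_params(rank, hidden=3584, layers=28, projections=4):
    """Rough trainable-parameter count for LoRA on attention projections.

    Each adapted weight matrix W (hidden x hidden) gains two low-rank
    factors A (hidden x rank) and B (rank x hidden), i.e. 2 * hidden * rank
    extra parameters. Model dimensions here are illustrative assumptions.
    """
    per_matrix = 2 * hidden * rank
    return per_matrix * projections * layers

for r in (8, 16, 32):
    print(f"rank {r:>2}: ~{lora_trainable_params(r) / 1e6:.1f}M trainable params")
```

Doubling the rank doubles the adapter size (and the memory for its gradients and optimizer state), which is why higher ranks cost more VRAM.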
## Training a defensive model

```bash
# Generate defensive datasets first
python -m pdx.training.data_router generate --defensive

# Train on detection patterns
python training/finetune_pdx.py \
  --dataset training_output/data_router/defensive/sft_detection_patterns.jsonl \
  --model qwen --epochs 5 --rank 16

# Train on lure effectiveness (DPO)
python training/finetune_pdx.py \
  --dataset training_output/data_router/defensive/dpo_lure_quality.jsonl \
  --model qwen --epochs 3 --rank 16
```
After training, the model can:
- Identify MITRE ATT&CK tactics from SSH command sequences
- Score the threat level of observed commands
- Evaluate which persona configurations are most effective
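Before launching a run, it can help to sanity-check the JSONL files for malformed records. The field names below (`prompt`/`completion` for SFT, `prompt`/`chosen`/`rejected` for DPO) are an assumption about the PDX export schema, not documented fields, so adjust them to match the actual files:

```python
import json

# Assumed record layouts -- verify against the actual PDX exports.
SFT_KEYS = {"prompt", "completion"}
DPO_KEYS = {"prompt", "chosen", "rejected"}

def check_jsonl(path, required):
    """Yield (line_number, missing_keys) for each malformed record."""
    with open(path, encoding="utf-8") as fh:
        for i, line in enumerate(fh, 1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = required - record.keys()
            if missing:
                yield i, missing
```

A run over a DPO file would use `check_jsonl(path, DPO_KEYS)`; an empty result means every record has the required fields.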
## Training an offensive model

```bash
# Generate offensive datasets first
python -m pdx.training.data_router generate --offensive

# Train on attack chains
python training/finetune_pdx.py \
  --dataset training_output/data_router/offensive/sft_attack_chains.jsonl \
  --model llama --epochs 5 --rank 32

# Train on kill chains (RAFT)
python training/finetune_pdx.py \
  --dataset training_output/data_router/offensive/raft_kill_chains.jsonl \
  --model llama --epochs 3 --rank 32
```
After training, the model can:
- Reconstruct post-exploitation sequences from partial observations
- Suggest next steps in a pentest based on current position
- Map observed commands to MITRE ATT&CK techniques
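As a toy illustration of the last capability, the lookup below maps command names to real ATT&CK technique IDs. A fine-tuned model generalizes far beyond a static table like this; the mapping here is deliberately simplified and only for illustration:

```python
# Toy command -> MITRE ATT&CK lookup. The technique IDs are real ATT&CK
# identifiers; the first-token matching heuristic is a deliberate
# simplification of what a fine-tuned model learns to do in context.
TECHNIQUE_HINTS = {
    "whoami": "T1033",          # System Owner/User Discovery
    "crontab": "T1053.003",     # Scheduled Task/Job: Cron
    "ssh-keygen": "T1098.004",  # Account Manipulation: SSH Authorized Keys
    "curl": "T1105",            # Ingress Tool Transfer
}

def map_commands(commands):
    """Return (command, technique_id) pairs, 'unknown' if unmapped."""
    return [(c, TECHNIQUE_HINTS.get(c.split()[0], "unknown")) for c in commands]
```

For example, `map_commands(["whoami", "curl http://evil/x"])` tags the first command as discovery and the second as ingress tool transfer.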
## Training a dual-perspective model

```bash
# Generate combined datasets
python -m pdx.training.data_router generate --combined

# Train on dual-perspective analysis
python training/finetune_pdx.py \
  --dataset training_output/data_router/combined/react_dual_perspective.jsonl \
  --model qwen --epochs 5 --rank 32
```
This produces a model that can analyze the same command sequence from both offensive and defensive perspectives, which is the most distinctive output of the PDX pipeline.
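A minimal sketch of how such a model might be queried from both sides. The prompt templates are hypothetical, not the actual ReAct schema used in the PDX dataset:

```python
# Hypothetical prompt templates -- the real dual-perspective records in the
# PDX dataset use their own ReAct format; these strings are illustrative.
DEFENSIVE = "As a defender, analyze this SSH session and flag suspicious activity:\n{session}"
OFFENSIVE = "As a red-team operator, reconstruct the attack chain from this session:\n{session}"

def dual_prompts(session):
    """Build one prompt per perspective for the same command sequence."""
    return {
        "defensive": DEFENSIVE.format(session=session),
        "offensive": OFFENSIVE.format(session=session),
    }
```

Both prompts carry the identical session transcript, so the two completions can be compared directly.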
## VRAM management

The fine-tuning script includes automatic VRAM safety checks:

```
[VRAM] Total: 15.8 GB | Used: 2.1 GB | Free: 13.7 GB
[VRAM] Estimated requirement for Qwen-7B + LoRA rank 16: ~6 GB
[VRAM] Status: OK — proceeding
```
If Ollama is running, the script will warn you — Ollama and fine-tuning compete for VRAM.
### For limited VRAM (8 GB)

Use `--rank 8` and `--model qwen` (7B). This configuration needs roughly 6 GB of VRAM, leaving headroom for the training batch.
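As a back-of-the-envelope check on the ~6 GB estimate above, the arithmetic below assumes QLoRA-style training: 4-bit quantized base weights, fp16 LoRA adapters with fp32 Adam state, and a flat overhead for activations. None of these constants come from the PDX script itself:

```python
def estimate_vram_gb(params_b, bits=4, lora_params_m=13, overhead_gb=1.5):
    """Back-of-the-envelope VRAM estimate for QLoRA-style fine-tuning.

    Assumed breakdown (illustrative, not measured from the PDX script):
    - base weights quantized to `bits` bits
    - LoRA adapters at ~12 bytes per trainable parameter
      (fp16 weights + fp16 grads + fp32 Adam m and v)
    - a flat overhead for activations and CUDA context
    """
    base_gb = params_b * bits / 8          # params_b in billions -> GB
    adapter_gb = lora_params_m * 12 / 1000  # millions of params -> GB
    return base_gb + adapter_gb + overhead_gb

print(f"Qwen-7B, rank-16 LoRA: ~{estimate_vram_gb(7):.1f} GB")
```

With these assumptions a 7B model lands near the ~6 GB figure the script reports, which is why it fits on an 8 GB card but leaves little room for a concurrent Ollama instance.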