Delta Vector 16D

Every security observation in PDX is encoded as a 16-dimensional vector, with each dimension scored between 0.0 and 1.0. This is what makes PDX datasets fundamentally different from flat label systems.

The 16 dimensions

# | Dimension | What it measures | High value means
1 | severity | Raw gravity of the observation | Critical finding
2 | confidence | Certainty of the verdict | Multiple tiers agreed
3 | exploitability | Ease of real-world exploitation | Script-kiddie exploitable
4 | auth_relevance | Impact on authentication/authorization | Auth bypass possible
5 | data_exposure | Level of sensitive data exposed | PII, credentials, keys
6 | injection_surface | Available injection surface area | Multiple injection points
7 | config_weakness | Configuration weakness detected | Default/weak config
8 | crypto_weakness | Cryptographic weakness | Broken or weak crypto
9 | logic_flaw | Application logic vulnerability | Business logic bypass
10 | timing_anomaly | Exploitable timing difference | Timing side-channel
11 | version_risk | Risk from known-vulnerable version | Unpatched CVE
12 | chain_potential | Chainability with other deltas | Useful in exploit chain
13 | persistence | Post-exploitation persistence capability | Can maintain access
14 | noise_level | False positive probability | Likely false positive
15 | novelty | How new/unusual the technique is | Never seen before
16 | context_dependency | How much context affects exploitability | Stack-dependent
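
As a minimal sketch of the constraints stated above (16 dimensions, each between 0.0 and 1.0), the following Python snippet lists the dimension names in table order and checks a candidate vector. The names `DELTA_LABELS` and `validate_delta` are hypothetical, chosen by analogy with `FP_LABELS`; they are not taken from the PDX codebase.

```python
# Dimension names in the order given in the table above.
# DELTA_LABELS is a hypothetical name, not confirmed by the PDX source.
DELTA_LABELS = [
    "severity", "confidence", "exploitability", "auth_relevance",
    "data_exposure", "injection_surface", "config_weakness", "crypto_weakness",
    "logic_flaw", "timing_anomaly", "version_risk", "chain_potential",
    "persistence", "noise_level", "novelty", "context_dependency",
]


def validate_delta(vector):
    """Return the vector as a list of floats, or raise ValueError."""
    values = [float(v) for v in vector]
    if len(values) != len(DELTA_LABELS):
        raise ValueError(
            f"expected {len(DELTA_LABELS)} dimensions, got {len(values)}"
        )
    for name, value in zip(DELTA_LABELS, values):
        # NaN fails this comparison too, so it is rejected as out of range.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}={value} is outside [0.0, 1.0]")
    return values
```

Rejecting out-of-range values rather than clipping them is a design choice of this sketch: a score above 1.0 more likely signals an upstream bug than a very severe finding.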

Why 16 dimensions

A traditional vulnerability scanner outputs: "XSS, severity: high." That's a single label.

A PDX delta for the same finding encodes, among its 16 dimensions: severity 0.8, confidence 0.7, exploitability 0.9, chain_potential 0.85 (because the target also has a cookie missing the HttpOnly flag), noise_level 0.15, and novelty 0.3. The model doesn't just learn "this is XSS"; it learns the full semantics of the observation.
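
The XSS example can be written out as a full vector. This is only a sketch: the text gives six of the sixteen scores, so the remaining dimensions are filled with a default of 0.0 here, which is an assumption of this example and not a value from PDX. The helper `delta_from_scores` and the list name `DELTA_LABELS` are likewise hypothetical.

```python
# Dimension order follows the table above; DELTA_LABELS is a hypothetical name.
DELTA_LABELS = [
    "severity", "confidence", "exploitability", "auth_relevance",
    "data_exposure", "injection_surface", "config_weakness", "crypto_weakness",
    "logic_flaw", "timing_anomaly", "version_risk", "chain_potential",
    "persistence", "noise_level", "novelty", "context_dependency",
]


def delta_from_scores(scores, default=0.0):
    """Build an ordered 16-value vector from a {dimension: score} mapping.

    Dimensions missing from `scores` get `default` (an assumption made
    for illustration; the source does not say how gaps are handled).
    """
    unknown = set(scores) - set(DELTA_LABELS)
    if unknown:
        raise KeyError(f"unknown dimensions: {sorted(unknown)}")
    return [float(scores.get(name, default)) for name in DELTA_LABELS]


# The six scores quoted in the text for the XSS finding.
xss = delta_from_scores({
    "severity": 0.8,
    "confidence": 0.7,
    "exploitability": 0.9,
    "chain_potential": 0.85,
    "noise_level": 0.15,
    "novelty": 0.3,
})

for name, value in zip(DELTA_LABELS, xss):
    print(f"{name:20s} {value:.2f}")
```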

The same vector serves both training streams:

  • Defensive: a high noise_level means "be careful, this might be a false positive"
  • Offensive: a high chain_potential means "on its own this vulnerability is medium severity, but chained with others it becomes critical"
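
To make the two readings concrete, here is a small sketch that inspects one delta from each point of view. The source describes what a model learns from these dimensions, not explicit rules, so the function names and both thresholds (0.5 and 0.7) are purely illustrative assumptions.

```python
def defensive_reading(delta, noise_threshold=0.5):
    """True when the finding looks like a probable false positive.

    The threshold is a hypothetical cut-off, not a PDX constant.
    """
    return delta["noise_level"] >= noise_threshold


def offensive_reading(delta, chain_threshold=0.7):
    """True when the finding looks useful as a link in an exploit chain.

    The threshold is a hypothetical cut-off, not a PDX constant.
    """
    return delta["chain_potential"] >= chain_threshold


# Scores quoted for the XSS example earlier on this page.
xss = {
    "severity": 0.8,
    "confidence": 0.7,
    "exploitability": 0.9,
    "chain_potential": 0.85,
    "noise_level": 0.15,
    "novelty": 0.3,
}

print("likely false positive:", defensive_reading(xss))
print("worth chaining:", offensive_reading(xss))
```

With these assumed thresholds, the XSS delta is not treated as noise (0.15 is below 0.5) and is flagged as chainable (0.85 is above 0.7).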

Fingerprint Vector (FP_LABELS)

In addition to the delta vector, PDX also captures a 16-dimension fingerprint of the target environment:

# | Dimension | What it measures
1 | stack_complexity | How complex the technology stack is
2 | exposure_surface | External attack surface size
3 | auth_sophistication | Quality of auth implementation
4 | waf_strength | WAF/filtering effectiveness
5 | patch_recency | How recently patched
6 | api_surface | API endpoint count and exposure
7 | crypto_maturity | Quality of crypto implementation
8 | error_verbosity | How much info errors leak
9 | session_strength | Session management quality
10 | input_validation | Input validation thoroughness
11 | infrastructure_age | How old the infrastructure is
12 | monitoring_presence | Whether monitoring is detected
13 | cdn_proxy_layers | CDN/proxy layers present
14 | custom_code_ratio | Custom vs framework code
15 | documentation_leak | Internal docs exposed
16 | historical_vuln_density | Past vulnerability density

Concatenating the delta and fingerprint vectors gives a 32-dimensional space (16 + 16) that captures both "what was found" and "in what context". This lets models learn that the same vulnerability has different implications depending on the target environment.
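
The sketch below reads the 32-dimensional space as the delta vector followed by the fingerprint vector, since that is the combination that yields 16 + 16 = 32 values. The ordering (delta first), the `combine` helper, and the placeholder scores are assumptions made for illustration.

```python
DELTA_DIMS = 16  # size of the delta vector described on this page

# Fingerprint dimension names in the order given in the table above.
FP_LABELS = [
    "stack_complexity", "exposure_surface", "auth_sophistication",
    "waf_strength", "patch_recency", "api_surface", "crypto_maturity",
    "error_verbosity", "session_strength", "input_validation",
    "infrastructure_age", "monitoring_presence", "cdn_proxy_layers",
    "custom_code_ratio", "documentation_leak", "historical_vuln_density",
]


def combine(delta, fingerprint):
    """Concatenate a delta and a fingerprint into one 32-value list."""
    if len(delta) != DELTA_DIMS:
        raise ValueError(f"delta must have {DELTA_DIMS} values")
    if len(fingerprint) != len(FP_LABELS):
        raise ValueError(f"fingerprint must have {len(FP_LABELS)} values")
    return [float(v) for v in delta] + [float(v) for v in fingerprint]


# Same finding (placeholder scores), two different target environments.
delta = [0.5] * DELTA_DIMS
weak_waf = [0.0] * len(FP_LABELS)
strong_waf = list(weak_waf)
strong_waf[FP_LABELS.index("waf_strength")] = 0.9

a = combine(delta, weak_waf)
b = combine(delta, strong_waf)
print(len(a), a == b)
```

The two combined vectors share their first 16 values and differ only in the fingerprint half, which is how one finding can map to different points depending on the environment.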

Standardization goal

The long-term goal is to establish the .pdx delta vector as an open standard for cybersecurity training data. A .pdx file is:

  • Readable by the multi-model router
  • Exportable to JSONL for fine-tuning
  • Compatible with the Burp Suite bridge
  • Usable with any LoRA fine-tuning framework (Unsloth, PEFT, axolotl)
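
As a rough illustration of the JSONL export mentioned above, the sketch below serializes already-parsed records to one JSON object per line. The internal layout of a .pdx file is not described on this page, so the record fields (`delta`, `fingerprint`) and the function `records_to_jsonl` are hypothetical and do not reflect the actual PDX exporter.

```python
import json


def records_to_jsonl(records):
    """Serialize records to JSONL text: one JSON object per line.

    Each record is assumed (hypothetically) to be a dict holding a
    16-value "delta" list and a 16-value "fingerprint" list.
    """
    lines = []
    for index, record in enumerate(records):
        for key in ("delta", "fingerprint"):
            if len(record[key]) != 16:
                raise ValueError(f"record {index}: {key} must have 16 values")
        lines.append(json.dumps(record, sort_keys=True))
    return "".join(line + "\n" for line in lines)


example = [
    {"delta": [0.0] * 16, "fingerprint": [0.0] * 16},
    {"delta": [1.0] * 16, "fingerprint": [0.5] * 16},
]
text = records_to_jsonl(example)
print(text, end="")
```

Each output line parses independently with `json.loads`, which is the property line-oriented fine-tuning pipelines typically rely on.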