Feedback loop¶
The feedback loop is HYDRA's self-improvement mechanism. Every 60 seconds, accumulated session data is analyzed and pushed back to the honeypot as configuration adjustments — new lures, refined prompts, and adapted personas.
How it works¶
```mermaid
graph LR
    S[HYDRA sessions] --> A[SessionAnalyzer]
    A --> F[FeedbackEngine]
    F --> P[AttackerProfiles]
    F --> M[AggregateMetrics]
    F --> PP[PromptPatches]
    F --> L[LureGenerator]
    L --> FS[FeedbackStore]
    FS -->|feedback.yaml| H[HYDRA FeedbackConsumer]
    H -->|runtime update| S
```
Pipeline components¶
SessionAnalyzer¶
Converts raw HYDRA JSONL logs into structured PDX deltas. Each session is parsed into individual events with MITRE tags, timing data, and classification metadata.
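A minimal sketch of the JSONL-to-events step. The field names (`ts`, `cmd`, `mitre`) are illustrative assumptions — the real HYDRA log schema may differ:

```python
import json

def parse_session(jsonl_path):
    """Parse one HYDRA JSONL session log into structured event dicts.

    Assumes each line is a JSON object with hypothetical keys
    ts / cmd / mitre; adapt to the actual log schema.
    """
    events = []
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the log
            raw = json.loads(line)
            events.append({
                "timestamp": raw.get("ts"),
                "command": raw.get("cmd"),
                "mitre_tactic": raw.get("mitre"),
            })
    return events
```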
FeedbackEngine¶
The core analysis module. It processes accumulated sessions to produce:
Attacker profiles — Behavioral clustering groups sessions by TTP patterns. A session that focuses on credential harvesting (`cat /etc/shadow`, `find *.pem`) is profiled differently from one performing network reconnaissance (`ss -tlnp`, `ip addr`). Profiles influence which lures are most relevant.
Aggregate metrics — Frequency counts of TTPs, commands, MITRE tactics, and credential patterns across all sessions. These metrics drive lure prioritization — if 60% of attackers look for AWS credentials, the feedback loop ensures AWS-related decoy files are prominent.
Prompt patches — Suggested adjustments to the LLM system prompt based on observed attacker behavior. If attackers frequently trigger inconsistent responses for a specific command category, the prompt patch adds context to handle it better.
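To illustrate the profiling idea, here is a heavily simplified sketch: a static command-to-tactic map and a dominant-tactic vote stand in for the engine's real behavioral clustering, and the profile labels mirror those in the feedback format below.

```python
from collections import Counter

# Hypothetical command-to-tactic lookup; the real engine works
# from MITRE-tagged events produced by the SessionAnalyzer.
TACTIC_OF = {
    "cat /etc/shadow": "credential-access",
    "find *.pem": "credential-access",
    "uname -a": "discovery",
    "ip addr": "discovery",
}

def profile_session(commands):
    """Assign a coarse profile label from the session's dominant tactic."""
    tactics = Counter(TACTIC_OF.get(c, "unknown") for c in commands)
    dominant, _ = tactics.most_common(1)[0]
    return {
        "credential-access": "credential_hunter",
        "discovery": "recon_scanner",
    }.get(dominant, "unclassified")
```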
LureGenerator¶
Translates aggregate metrics into concrete filesystem changes:
- New decoy files placed in frequently-explored directories
- Updated credential content matching observed targeting patterns
- Adjusted file timestamps to match realistic modification dates
- New entries in `.bash_history` reflecting the persona's "daily activity"
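A sketch of how a single lure recommendation might be materialized on disk. The helper name `place_lure`, the decoy content template, and the 30-day default age are assumptions, not the generator's actual API:

```python
import os
import time
from pathlib import Path

def place_lure(root, rec, age_days=30):
    """Materialize one lure_recommendations entry under the VFS root.

    `rec` mirrors the feedback.yaml entries: a target path and a
    content template identifier.
    """
    dest = Path(root) / rec["path"].lstrip("/")
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(f"# decoy: {rec['content']}\n")  # placeholder content
    # Backdate the timestamp so the file looks long-lived, not freshly planted.
    mtime = time.time() - age_days * 86400
    os.utime(dest, (mtime, mtime))
    return dest
```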
FeedbackStore¶
Writes the accumulated feedback to `data/feedback.yaml` — the bridge between the PDX analysis pipeline and the live HYDRA instance.
FeedbackConsumer¶
Runs inside HYDRA and polls `feedback.yaml` every 60 seconds. When changes are detected:
- New lure files are injected into the VFS blueprint
- Prompt patches are appended to the LLM system prompt
- Persona weights are adjusted based on engagement metrics
- Changes take effect for the next session — active sessions are never disrupted
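The polling behaviour can be sketched as follows. The `apply_fn` callback stands in for the four updates above, and the class shape is an assumption — the real consumer also parses the YAML and patches the live honeypot:

```python
import os
import time

class FeedbackConsumer:
    """Poll a feedback file and re-apply it only when its mtime changes."""

    def __init__(self, path, apply_fn, interval_s=60):
        self.path = path
        self.apply_fn = apply_fn
        self.interval_s = interval_s
        self._last_mtime = None

    def poll_once(self):
        """Return True if the file changed and feedback was applied."""
        try:
            mtime = os.path.getmtime(self.path)
        except FileNotFoundError:
            return False  # no feedback generated yet
        if mtime != self._last_mtime:
            self._last_mtime = mtime
            # Applied to new sessions only; active sessions are untouched.
            self.apply_fn(self.path)
            return True
        return False

    def run(self):
        while True:
            self.poll_once()
            time.sleep(self.interval_s)
```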
The `feedback.yaml` format¶
```yaml
version: "1.1.0"
generated_at: "2026-04-10T14:30:00Z"
sessions_analyzed: 78
attacker_profiles:
  - profile_id: "credential_hunter"
    sessions: 34
    primary_tactics: ["credential-access", "discovery"]
    avg_duration_s: 127
    common_commands: ["cat /etc/shadow", "find *.pem", "ls .ssh"]
  - profile_id: "recon_scanner"
    sessions: 22
    primary_tactics: ["discovery"]
    avg_duration_s: 45
    common_commands: ["uname -a", "cat /etc/passwd", "ip addr"]
aggregate_metrics:
  top_commands:
    - command: "cat /etc/shadow"
      count: 65
      tactic: "credential-access"
    - command: "uname -a"
      count: 755
      tactic: "discovery"
  top_credentials:
    - username: "root"
      count: 1146
    - username: "sol"
      count: 213
lure_recommendations:
  - path: "/root/.config/gcloud/credentials.json"
    content: "decoy_gcp_service_account"
    reason: "Cloud credential targeting trending upward"
  - path: "/opt/app/.env.production"
    content: "decoy_database_url"
    reason: "High engagement with .env files across personas"
prompt_patches:
  - target: "system_prompt"
    action: "append"
    content: "When the user runs docker commands, simulate a Docker 24.0.7 environment with 3 running containers."
    reason: "12 sessions attempted Docker enumeration with no coherent response"
```
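Once the file is loaded, prompt patches are simple to apply. A minimal sketch that handles only the `append` action shown above — `apply_prompt_patches` is a hypothetical helper name, not part of the documented API:

```python
def apply_prompt_patches(system_prompt, patches):
    """Apply prompt_patches entries (as dicts) to a base system prompt.

    Only target=system_prompt with action=append is handled here;
    other actions are ignored.
    """
    for patch in patches:
        if patch.get("target") == "system_prompt" and patch.get("action") == "append":
            system_prompt = system_prompt.rstrip() + "\n\n" + patch["content"]
    return system_prompt
```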
Observing the loop in action¶
The feedback loop's impact is measurable through persona engagement trends:
- Before feedback: `fintech_trading` averaged 800 commands per session
- After 48h of feedback: `fintech_trading` averaged 1,349 commands per session
- The improvement came from two changes: adding GCP credential decoys (matching observed cloud targeting) and enriching `.bash_history` with more database administration commands
Configuration¶
The feedback loop is configured in `hydra_link.yaml`:
```yaml
hydra:
  feedback_store: "data/feedback.yaml"
  auto_collect:
    enabled: true
    min_sessions: 10  # Minimum sessions before generating feedback
    pdx_destination: "training_output/hydra_collected"
```
The `min_sessions` threshold prevents feedback generation from sample sizes that are too small: with fewer than 10 sessions, observed patterns are more likely to be noise than signal.
Running the orchestrator¶
The feedback loop is managed by the PDX Pipeline Orchestrator:
```bash
# One-shot: analyze current logs and generate feedback
python -m pdx.pipeline.orchestrator --logs-dir logs/ --once

# Watch mode: continuously monitor for new sessions
python -m pdx.pipeline.orchestrator --logs-dir logs/ --watch

# Auto mode: triggered by HYDRA's AutoTraining
# (configured in hydra_link.yaml)
```
In watch mode, the orchestrator polls the logs directory for new `.jsonl` files and triggers the full pipeline automatically: session analysis, feedback generation, and store update.
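The watch loop can be sketched as a simple directory poll. The `watch_logs` name, the `handle_session` callback, and the polling interval are illustrative assumptions; the real orchestrator runs the full analyze → generate → store pipeline per session:

```python
import time
from pathlib import Path

def watch_logs(logs_dir, handle_session, interval_s=5, max_polls=None):
    """Poll logs_dir for new .jsonl session files and hand each to the pipeline.

    max_polls bounds the loop for testing; pass None to run forever.
    Returns the set of files seen so far.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in sorted(Path(logs_dir).glob("*.jsonl")):
            if path not in seen:  # only newly appeared sessions
                seen.add(path)
                handle_session(path)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
    return seen
```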