Feedback loop

The feedback loop is HYDRA's self-improvement mechanism. Every 60 seconds, accumulated session data is analyzed and pushed back to the honeypot as configuration adjustments — new lures, refined prompts, and adapted personas.

How it works

graph LR
    S[HYDRA sessions] --> A[SessionAnalyzer]
    A --> F[FeedbackEngine]
    F --> P[AttackerProfiles]
    F --> M[AggregateMetrics]
    F --> PP[PromptPatches]
    F --> L[LureGenerator]
    L --> FS[FeedbackStore]
    FS --> |feedback.yaml| H[HYDRA FeedbackConsumer]
    H --> |runtime update| S

Pipeline components

SessionAnalyzer

Converts raw HYDRA JSONL logs into structured PDX deltas. Each session is parsed into individual events with MITRE tags, timing data, and classification metadata.
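In outline, that parsing step might look like the following sketch. The function name, field names, and the one-line tactic classification are illustrative assumptions, not HYDRA's actual API:

```python
import json

def parse_session(jsonl_text: str) -> list[dict]:
    """Turn one raw HYDRA JSONL session into a list of structured events.

    Hypothetical sketch: real MITRE tagging and classification metadata
    are far richer than the single keyword rule shown here.
    """
    events = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines in the log
        raw = json.loads(line)
        command = raw.get("command", "")
        events.append({
            "command": command,
            "timestamp": raw.get("ts"),
            # Illustrative tagging only, to show the shape of the output.
            "mitre_tactic": ("credential-access" if "shadow" in command
                             else "discovery"),
        })
    return events
```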

FeedbackEngine

The core analysis module. It processes accumulated sessions to produce:

Attacker profiles — Behavioral clustering groups sessions by TTP patterns. A session that focuses on credential harvesting (cat /etc/shadow, find *.pem) is profiled differently from one performing network reconnaissance (ss -tlnp, ip addr). Profiles influence which lures are most relevant.
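A minimal sketch of that clustering decision, assuming a keyword-based rule over the session's commands (the real profiling presumably uses richer behavioral features; the profile names are taken from the example feedback.yaml below):

```python
def profile_session(commands: list[str]) -> str:
    """Assign a session to a behavioral profile.

    Illustrative rule only: two or more credential-related commands
    classify the session as a credential hunter.
    """
    cred_markers = ("shadow", ".pem", ".ssh", "credentials")
    hits = sum(any(m in cmd for m in cred_markers) for cmd in commands)
    return "credential_hunter" if hits >= 2 else "recon_scanner"
```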

Aggregate metrics — Frequency counts of TTPs, commands, MITRE tactics, and credential patterns across all sessions. These metrics drive lure prioritization — if 60% of attackers look for AWS credentials, the feedback loop ensures AWS-related decoy files are prominent.
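The frequency-counting step can be sketched with a `Counter`; the event shape and return keys here are assumptions modeled on the feedback.yaml format shown below:

```python
from collections import Counter

def aggregate_metrics(sessions: list[list[dict]]) -> dict:
    """Count commands and MITRE tactics across all analyzed sessions."""
    commands = Counter()
    tactics = Counter()
    for session in sessions:
        for event in session:
            commands[event["command"]] += 1
            tactics[event["mitre_tactic"]] += 1
    return {
        "top_commands": commands.most_common(5),
        "top_tactics": tactics.most_common(3),
    }
```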

Prompt patches — Suggested adjustments to the LLM system prompt based on observed attacker behavior. If attackers frequently trigger inconsistent responses in a specific command category, a prompt patch adds the context the LLM needs to handle that category coherently.

LureGenerator

Translates aggregate metrics into concrete filesystem changes:

  • New decoy files placed in frequently-explored directories
  • Updated credential content matching observed targeting patterns
  • Adjusted file timestamps to match realistic modification dates
  • New entries in .bash_history reflecting the persona's "daily activity"
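A simplified sketch of placing decoy files from lure recommendations. Note this writes to a directory tree for illustration, whereas the real LureGenerator feeds the VFS blueprint; the template table and function name are assumptions:

```python
from pathlib import Path

# Hypothetical content templates keyed by the "content" field of a
# lure recommendation (see the feedback.yaml format below).
DECOY_TEMPLATES = {
    "decoy_gcp_service_account":
        '{"type": "service_account", "project_id": "prod-billing"}',
}

def place_lures(recommendations: list[dict], vfs_root: Path) -> list[Path]:
    """Materialize each recommended lure under vfs_root."""
    written = []
    for rec in recommendations:
        target = vfs_root / rec["path"].lstrip("/")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(DECOY_TEMPLATES.get(rec["content"], ""))
        written.append(target)
    return written
```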

FeedbackStore

Writes the accumulated feedback to data/feedback.yaml — the bridge between the PDX analysis pipeline and the live HYDRA instance.
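Since the consumer polls the file while the pipeline rewrites it, a write-then-rename update avoids readers ever seeing a half-written store. A minimal sketch of that pattern (JSON is used here for a self-contained example; the real store is YAML):

```python
import json
import os
import tempfile
from pathlib import Path

def write_store(payload: dict, store_path: Path) -> None:
    """Atomically replace the feedback store with a new payload."""
    fd, tmp = tempfile.mkstemp(dir=store_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f, indent=2)
    # Atomic on POSIX: a concurrent reader sees the old file or the
    # new one, never a partial write.
    os.replace(tmp, store_path)
```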

FeedbackConsumer

Runs inside HYDRA and polls feedback.yaml every 60 seconds. When changes are detected:

  1. New lure files are injected into the VFS blueprint
  2. Prompt patches are appended to the LLM system prompt
  3. Persona weights are adjusted based on engagement metrics
  4. Changes take effect for the next session — active sessions are never disrupted
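The change-detection part of that polling loop can be sketched with an mtime check; the class name and interface are assumptions, since the docs do not show FeedbackConsumer's internals:

```python
from pathlib import Path

class FeedbackPoller:
    """Report True once per new version of the feedback store."""

    def __init__(self, store: Path, interval_s: int = 60):
        self.store = store
        self.interval_s = interval_s  # 60s matches the documented cadence
        self._last_mtime = 0.0

    def changed(self) -> bool:
        try:
            mtime = self.store.stat().st_mtime
        except FileNotFoundError:
            return False  # store not written yet
        if mtime > self._last_mtime:
            self._last_mtime = mtime
            return True
        return False
```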

The feedback.yaml format

version: "1.1.0"
generated_at: "2026-04-10T14:30:00Z"
sessions_analyzed: 78

attacker_profiles:
  - profile_id: "credential_hunter"
    sessions: 34
    primary_tactics: ["credential-access", "discovery"]
    avg_duration_s: 127
    common_commands: ["cat /etc/shadow", "find *.pem", "ls .ssh"]

  - profile_id: "recon_scanner"
    sessions: 22
    primary_tactics: ["discovery"]
    avg_duration_s: 45
    common_commands: ["uname -a", "cat /etc/passwd", "ip addr"]

aggregate_metrics:
  top_commands:
    - command: "cat /etc/shadow"
      count: 65
      tactic: "credential-access"
    - command: "uname -a"
      count: 755
      tactic: "discovery"
  top_credentials:
    - username: "root"
      count: 1146
    - username: "sol"
      count: 213

lure_recommendations:
  - path: "/root/.config/gcloud/credentials.json"
    content: "decoy_gcp_service_account"
    reason: "Cloud credential targeting trending upward"
  - path: "/opt/app/.env.production"
    content: "decoy_database_url"
    reason: "High engagement with .env files across personas"

prompt_patches:
  - target: "system_prompt"
    action: "append"
    content: "When the user runs docker commands, simulate a Docker 24.0.7 environment with 3 running containers."
    reason: "12 sessions attempted Docker enumeration with no coherent response"
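Applying the prompt_patches section above might look like the following sketch; only the "append" action shown in the example is handled, and the function name is an assumption:

```python
def apply_prompt_patches(system_prompt: str, patches: list[dict]) -> str:
    """Fold feedback prompt patches into the LLM system prompt.

    Illustrative: handles only target=system_prompt, action=append,
    the one combination shown in the documented format.
    """
    for patch in patches:
        if (patch.get("target") == "system_prompt"
                and patch.get("action") == "append"):
            system_prompt = system_prompt.rstrip() + "\n\n" + patch["content"]
    return system_prompt
```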

Observing the loop in action

The feedback loop's impact is measurable through persona engagement trends:

  • Before feedback: fintech_trading averaged 800 commands per session
  • After 48h of feedback: fintech_trading averaged 1,349 commands per session
  • The improvement — roughly a 69% increase — came from two changes: adding GCP credential decoys (matching observed cloud targeting) and enriching .bash_history with more database administration commands

Configuration

The feedback loop is configured in hydra_link.yaml:

hydra:
  feedback_store: "data/feedback.yaml"

  auto_collect:
    enabled: true
    min_sessions: 10          # Minimum sessions before generating feedback
    pdx_destination: "training_output/hydra_collected"

The min_sessions threshold prevents feedback generation from too-small sample sizes. With fewer than 10 sessions, observed patterns may not be statistically meaningful.

Running the orchestrator

The feedback loop is managed by the PDX Pipeline Orchestrator:

# One-shot: analyze current logs and generate feedback
python -m pdx.pipeline.orchestrator --logs-dir logs/ --once

# Watch mode: continuously monitor for new sessions
python -m pdx.pipeline.orchestrator --logs-dir logs/ --watch

# Auto mode: triggered by HYDRA's AutoTraining
# (configured in hydra_link.yaml)

In watch mode, the orchestrator polls the logs directory for new .jsonl files and triggers the full pipeline automatically — session analysis, feedback generation, and store update.
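The discovery half of watch mode can be sketched as follows, assuming the orchestrator tracks which .jsonl files it has already processed; the function name and the seen-set interface are illustrative:

```python
from pathlib import Path

def find_new_sessions(logs_dir: Path, seen: set[Path]) -> list[Path]:
    """Return unprocessed .jsonl session logs and mark them as seen."""
    new = sorted(p for p in logs_dir.glob("*.jsonl") if p not in seen)
    seen.update(new)
    return new
```

Each path returned would then be fed through the full pipeline: session analysis, feedback generation, and store update.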