FAQ
General
Is HYDRA legal to deploy?
Yes — a honeypot is a server you own, exposed on the internet to attract connections. You're not attacking anyone; you're observing who attacks you. All decoy credentials are fictional and non-functional. However, always check your local jurisdiction's laws regarding data collection and monitoring. In the EU, GDPR applies to IP addresses — consider logging policies carefully.
Can HYDRA replace a commercial honeypot?
HYDRA is a research project, not a production-grade enterprise product. It demonstrates that LLM-powered honeypots produce higher-quality engagement data than static honeypots. For production deployment, you'd want monitoring, alerting, and hardening beyond what HYDRA currently provides.
How much does it cost to run?
The VPS costs roughly €5–10/month (1 vCPU, 1 GB RAM). The Groq API free tier handles moderate traffic. Under heavy bot traffic (thousands of sessions per day) you may hit rate limits; alternatives are the paid Groq tier or a local LLM via Ollama. No GPU is needed on the VPS.
What happens if the Groq API goes down?
Built-in commands (65+) continue to work without any API dependency. Only unknown commands that require LLM generation would fail. In practice, most bot traffic only triggers built-in commands (`uname`, `ls`, `cat`), so a temporary API outage has minimal impact on data collection.
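The fallback behavior described above can be sketched as a two-tier dispatch: built-ins are answered from a local table and never touch the API, and an API failure degrades to a plausible shell error. This is a hypothetical sketch, not HYDRA's actual code; the names and outputs are assumptions.

```python
# Hypothetical sketch: built-in commands never touch the LLM API,
# so an outage only affects unknown commands.
BUILTINS = {
    "uname": lambda args: "Linux web-prod-01 5.15.0-91-generic x86_64 GNU/Linux",
    "whoami": lambda args: "root",
}

def handle_command(line: str, llm_call) -> str:
    cmd, *args = line.split()
    if cmd in BUILTINS:                    # no API dependency
        return BUILTINS[cmd](args)
    try:
        return llm_call(line)              # only unknown commands need the LLM
    except Exception:
        # API down: degrade to a believable shell error instead of crashing
        return f"-bash: {cmd}: command not found"
```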
Technical
Why Groq and not OpenAI/Anthropic?
Speed. Groq's inference latency is 50–100ms for llama-3.3-70b, compared to 500–2000ms for equivalent models on other providers. In a honeypot, response latency must feel natural — too fast or too slow is suspicious. Groq hits the sweet spot.
Why not run the LLM locally?
You can, via Ollama. PDX supports local models through the LocalCopilotEngine (7B) and LocalTeacherEngine (32B). However, running a 70B model locally requires significant GPU resources, and the VPS hosting HYDRA typically doesn't have a GPU. The architecture separates capture (VPS, no GPU) from analysis (local, GPU optional).
How does HYDRA handle concurrent sessions?
Each session gets an independent VFS fork (Copy-on-Write) and a separate LLM context. Sessions are completely isolated — one attacker's actions never affect another's. The Groq API handles concurrent requests natively. Under very heavy load, the LRU cache prevents redundant API calls.
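A copy-on-write VFS fork can be sketched as an overlay: the base tree is shared and never mutated, and each session records only its own writes. This is a minimal illustration of the isolation property described above, not HYDRA's actual implementation.

```python
class VFSFork:
    """Copy-on-write view over a shared base filesystem tree (sketch)."""

    def __init__(self, base: dict):
        self._base = base    # shared across sessions, never mutated
        self._delta = {}     # this session's writes only

    def read(self, path: str) -> str:
        return self._delta.get(path, self._base[path])

    def write(self, path: str, data: str) -> None:
        self._delta[path] = data   # base untouched: other sessions unaffected

base = {"/etc/hostname": "web-prod-01\n"}
a, b = VFSFork(base), VFSFork(base)
a.write("/etc/hostname", "pwned\n")
# b still sees the original file: one attacker's actions never leak to another
```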
Can attackers break out of HYDRA?
No. HYDRA doesn't execute real commands — everything is simulated. The VFS is an in-memory data structure, not a real filesystem. Network commands (`ssh`, `scp`, `wget`) are intercepted and produce simulated output. There's no way to reach the host OS, the network, or any real resource through HYDRA.
What's the difference between .pdx and STIX/SARIF?
STIX and SARIF are reporting formats — they describe findings for human readers. The .pdx format is a training format — every observation carries a 16-dimensional vector that a model can learn from. The goal is different: .pdx optimizes for machine learning, not human reporting.
How are MITRE ATT&CK tags assigned?
Via 20+ regex/heuristic patterns in the DataRouter. For example, `cat /etc/shadow` matches the credential-access tactic, and `find / -perm -4000` matches privilege-escalation. The matching is conservative — ambiguous commands are tagged only with `discovery` unless they match a specific pattern.
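The pattern-matching approach can be sketched as an ordered list of (regex, tactic) pairs with a conservative default. The patterns below are an illustrative subset, not the DataRouter's actual rule set.

```python
import re

# Illustrative subset of regex -> MITRE tactic rules (patterns are assumptions)
TACTIC_PATTERNS = [
    (re.compile(r"\bcat\s+/etc/(shadow|passwd)\b"), "credential-access"),
    (re.compile(r"\bfind\s+/\s+.*-perm\s+-4000\b"), "privilege-escalation"),
    (re.compile(r"\b(wget|curl)\b.*https?://"),     "command-and-control"),
]

def tag_command(cmd: str) -> list:
    tags = [tactic for pattern, tactic in TACTIC_PATTERNS if pattern.search(cmd)]
    return tags or ["discovery"]   # conservative default for ambiguous commands
```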
Data & Privacy
Does HYDRA store attacker IP addresses?
Yes — IP addresses are logged in the JSONL session files. They're used for deduplication and behavioral analysis. If you need to comply with GDPR or similar regulations, you can hash or anonymize IPs by modifying the logger configuration.
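One way to anonymize while keeping deduplication intact is a keyed hash: the same IP always maps to the same token, but the raw address is never written to disk. This is a sketch of the general technique, not HYDRA's logger configuration; the key handling is an assumption.

```python
import hashlib
import hmac

# Assumed deployment-specific secret; rotate it to break long-term linkability
SECRET = b"rotate-me-periodically"

def pseudonymize_ip(ip: str) -> str:
    """Stable pseudonym for an IP: dedup still works, raw IP is never logged."""
    return hmac.new(SECRET, ip.encode(), hashlib.sha256).hexdigest()[:16]
```

A plain unkeyed hash would be reversible by brute force over the IPv4 space, which is why the keyed variant is preferable.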
Are the decoy credentials real?
No. All AWS keys, Solana keypairs, database passwords, and other credentials in HYDRA's personas are completely fictional. They cannot be used to access any real system. This is a fundamental ethical requirement for responsible honeypot operation.
Can I use the training data commercially?
The training data generated by PDX from HYDRA sessions is derived from attacker behavior observed on your own infrastructure. You own this data. However, consult a lawyer regarding any specific commercial use case, especially if your jurisdiction has specific rules about data collected via honeypots.
PDX Pipeline
Why 7 generators? Isn't SFT enough?
Different fine-tuning objectives require different data formats. SFT teaches factual associations (command → tactic). DPO teaches preferences (which persona is better). RAFT teaches multi-step reasoning (kill chains). ReAct teaches dual-perspective analysis. Each format trains a different capability. A model fine-tuned on all formats outperforms one trained on SFT alone.
How does curriculum ordering work?
Training entries are sorted from simple to complex: single-command observations first, then multi-step sequences (3–5 commands), then complex kill chains (5+ commands covering multiple MITRE tactics), and finally edge cases (false positives, prompt injections). This follows the curriculum learning principle — models converge faster and more stably when they learn easy patterns before hard ones.
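The ordering above amounts to sorting by a complexity key. A minimal sketch, assuming hypothetical field names for session entries:

```python
def complexity(entry: dict) -> tuple:
    """Sort key: edge cases last, then by command count and tactic breadth."""
    return (
        entry.get("edge_case", False),   # False sorts before True
        len(entry["commands"]),          # single commands first
        len(set(entry["tactics"])),      # fewer MITRE tactics = simpler
    )

sessions = [
    {"commands": ["uname -a"] * 5, "tactics": ["discovery", "privilege-escalation"], "edge_case": False},
    {"commands": ["ls"], "tactics": ["discovery"], "edge_case": False},
]
sessions.sort(key=complexity)   # easy -> hard
```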
What's the 90-day decay?
Older training data gradually loses weight using a half-life decay function. A vulnerability observed 6 months ago is less relevant than one observed yesterday — attack patterns evolve. The decay ensures the training set stays current. Rare, high-value observations (like the GLaDOS prompt injection) use negative decay — they become more important over time.
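A half-life decay function is one line: weight halves every 90 days. The sketch below also shows that flipping the sign of the half-life makes the same formula grow instead of shrink, which is one way to model the "negative decay" mentioned above (an assumption about the implementation).

```python
def decay_weight(age_days: float, half_life_days: float = 90.0) -> float:
    """Exponential half-life decay: weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)
```

With the default half-life, a fresh observation has weight 1.0, a 90-day-old one 0.5, and a 180-day-old one 0.25; a negative half-life inverts the curve so weight doubles per period instead.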