BlastCage isolates your AI agent's execution from your credentials, memory, and files using hardware-separated Clean/Dirty zone architecture. Open source core.
Chained vulnerabilities allow attackers to execute arbitrary commands on the host with user-level permissions.
API keys, OAuth tokens, and user profiles are stored in plaintext Markdown and JSON files under ~/.clawdbot/.
Malicious instructions hidden in a forwarded WhatsApp message persist in memory for weeks, enabling delayed multi-turn attack chains.
Hundreds of backdoored skills on ClawHub. One proof-of-concept was downloaded by 16 developers in 7 countries within 8 hours.
| Attack Vector | Default Moltbot | With BlastCage | How |
|---|---|---|---|
| Exposed admin panels | VULNERABLE | BLOCKED | B has no external ports. A has no persistent admin UI. |
| Plaintext credential theft | VULNERABLE | BLOCKED | Credentials in encrypted Vault on B. A only gets short-lived tokens. |
| Auth bypass via reverse proxy | VULNERABLE | BLOCKED | B exposes zero network ports. No auth to bypass. |
| Persistent memory poisoning | VULNERABLE | BLOCKED | A has no persistent memory. Memory lives in B, unreachable from A. |
| Info-stealer malware | VULNERABLE | BLOCKED | Nothing to steal on A. No credentials, no history, VM destroyed after task. |
| RCE (CVE-2026-25253) | FULL COMPROMISE | CONTAINED | RCE lands in ephemeral sandbox with no credentials and allowlist egress. |
| Malicious skills/plugins | FULL COMPROMISE | CONTAINED | Malicious code runs in sandbox. Cannot reach B's data or exfiltrate to non-allowlisted hosts. |
| Indirect prompt injection | FULL COMPROMISE | MITIGATED | Cannot prevent injection, but blast radius limited to single session with no credentials. |
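The CONTAINED rows above rely on the Dirty Zone being a throwaway sandbox: destroyed after the task, nothing mounted, egress filtered. As an illustration only (the image name and network name below are hypothetical; BlastCage's actual runtime wiring is not specified here), a one-shot container invocation with those properties could look like this:

```python
def sandbox_command(task_id: str, image: str = "blastcage/agent-sandbox") -> list[str]:
    """Build a one-shot Docker invocation for a Dirty Zone task:
    --rm destroys the container when the task ends, no volumes are
    mounted (no credentials or history to steal), and the custom
    network is where egress allowlist rules would be enforced."""
    return [
        "docker", "run", "--rm",
        "--name", f"dirty-{task_id}",
        "--network", "egress-allowlist",  # hypothetical pre-built filtered network
        "--read-only",                    # no persistent writes inside the sandbox
        image,
    ]

print(" ".join(sandbox_command("t42")))
```

Note what is absent: no `-v` mounts and no long-lived state, so an attacker who lands RCE inside this container finds nothing to steal and nowhere to persist.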
No architecture can fully prevent indirect prompt injection — it's a fundamental limitation of current LLM technology. What we do is reduce the blast radius: a successful attack on a default Moltbot gives the attacker everything. A successful attack through BlastCage gives them access to one ephemeral container with no credentials, no memory, no files, and an egress allowlist — for the duration of a single task.
BlastCage is an open-source trust-zone isolation architecture for AI agents. It separates agent execution (the "Dirty Zone") from your credentials, memory, and files (the "Clean Zone") using a deterministic filter gateway. Even if the agent's runtime is fully compromised, the attacker gets nothing of value.
Credentials live in an encrypted Vault on Server B (Clean Zone), which has no external network interface. The AI agent on Server A (Dirty Zone) never sees raw credentials — it only receives short-lived, scoped tokens through the filter gateway. When the task ends, the Dirty Zone is destroyed.
No architecture can fully prevent indirect prompt injection — it's a fundamental limitation of current LLM technology. What BlastCage does is reduce the blast radius: a successful injection gives the attacker access to one ephemeral container with no credentials, no memory, no files, and an egress allowlist — for the duration of a single task.
Yes. The full Clean/Filter/Dirty architecture is open source and ships as a Docker Compose project. You can deploy it on your own servers. The managed service ($9/month for founding members) handles deployment, updates, monitoring, and key management for you.
Docker isolates processes, but a containerized agent still has access to mounted credentials, persistent memory, and unrestricted egress. BlastCage enforces three-zone hardware separation: the agent's runtime physically cannot reach the credential store, and all communication passes through a deterministic (non-AI) filter gateway with JSON Schema enforcement and allowlist egress.
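The key property of the filter gateway is that it is deterministic: the same request always gets the same verdict, with no LLM in the decision path. A toy sketch of that idea (the schema, field names, and allowlist entries are invented for illustration; the real gateway uses full JSON Schema enforcement):

```python
from urllib.parse import urlsplit

# Hypothetical policy; in practice this lives in gateway configuration.
EGRESS_ALLOWLIST = {"api.github.com", "api.openai.com"}
REQUEST_SCHEMA = {"action": str, "url": str}  # minimal stand-in for a JSON Schema

def filter_request(req: dict) -> tuple[bool, str]:
    """Deterministic (non-AI) gateway check: same input, same verdict.
    1) Shape check: unknown or missing fields are rejected outright.
    2) Egress check: the destination host must be on the allowlist."""
    if set(req) != set(REQUEST_SCHEMA):
        return False, "schema: unexpected or missing fields"
    for field, typ in REQUEST_SCHEMA.items():
        if not isinstance(req[field], typ):
            return False, f"schema: {field} must be {typ.__name__}"
    host = urlsplit(req["url"]).hostname or ""
    if host not in EGRESS_ALLOWLIST:
        return False, f"egress: {host} not on allowlist"
    return True, "ok"

print(filter_request({"action": "GET", "url": "https://api.github.com/repos"}))
print(filter_request({"action": "GET", "url": "https://attacker.example/exfil"}))
```

An exfiltration attempt to a non-allowlisted host fails closed at the gateway, which is why a compromised agent in the Dirty Zone cannot phone home even with full code execution.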