Five AI Agent Failures in 36 Days. Zero Times the Agent Caught It.

grith team · 8 min read · security

In 36 days, five public failures hit AI agents and AI-agent infrastructure: Meta, Mercor, CrewAI, Vercel, and Bitwarden.[1][2][3][4][5][6][7][8]

Different exploit classes. Same result. The system acted first. Someone else noticed later.

[Figure: timeline of the five incidents across 36 days, centered on a large zero for agent self-detections]
Five incidents, five different exploit classes, zero times the agent caught the failure itself.

That is the part worth paying attention to.

Not one of these incidents required a new class of exploit. The bugs were familiar: supply chain compromise, OAuth abuse, excessive authority, unsafe fallback behavior, arbitrary file read, SSRF, remote code execution.

The pattern was not novelty. The pattern was that, at the moment the unsafe action happened, there was no independent enforcement layer separating the thing that wanted to act from the thing deciding whether the action was safe.

And not once did an agent catch itself. In the public reporting on all five, detection came from security teams, affected humans, or outside researchers, never from the agent or framework independently stopping itself. That claim is an inference from the incident reports below, not a vendor statement.

1. Bitwarden CLI / Shai-Hulud

On April 23, Bitwarden said a malicious @bitwarden/cli@2026.4.0 package had been distributed through npm between 5:57 PM and 7:30 PM ET on April 22.[5] Bitwarden said the incident affected the npm delivery path for the CLI only, not the legitimate CLI codebase, end-user vault data, or Bitwarden production systems.[5]

In the same public thread, a Bitwarden community moderator posted npm stats showing 334 downloads of the malicious version.[5]

StepSecurity's analysis found a preinstall hook that downloaded the Bun runtime and launched an obfuscated bw1.js credential stealer.[6] The payload harvested SSH keys, GitHub and npm tokens, shell history, environment variables, cloud credentials, and GitHub Actions secrets, encrypted the data with AES-256-GCM, and sent it to audit.checkmarx.cx.[6]

The malware also explicitly targeted AI tooling. StepSecurity said it enumerated configurations for Claude Code, Kiro, Cursor, Codex CLI, and Aider, treating files such as ~/.claude.json and MCP configuration files as first-class exfiltration targets.[6] If it found a usable GitHub token, it escalated again by enumerating accessible repositories and injecting malicious GitHub Actions workflows.[6]

The package ran. The malware executed. Detection came later.[5][6]
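
The load-bearing detail is the preinstall hook: npm runs lifecycle scripts automatically during install, before any human reads the code. Below is a minimal audit sketch of the kind of gate that catches this class, refusing an extracted package that declares install-time scripts. The script and its workflow are illustrative assumptions, not part of npm's tooling or Bitwarden's release pipeline.

```python
#!/usr/bin/env python3
"""Refuse npm packages that declare install-time lifecycle scripts.

Hypothetical pre-install audit: point it at an extracted package
directory (e.g. the output of `npm pack` plus `tar -x`) before
installing. Not part of npm or any vendor's pipeline.
"""
import json
import sys
from pathlib import Path

# Hooks npm runs automatically on `npm install`. The malicious
# @bitwarden/cli release reportedly used `preinstall` to fetch the
# Bun runtime and launch its stealer.
RISKY_HOOKS = ("preinstall", "install", "postinstall")

def audit(package_dir: str) -> list[str]:
    manifest = json.loads((Path(package_dir) / "package.json").read_text())
    scripts = manifest.get("scripts", {})
    return [f"{hook}: {scripts[hook]}" for hook in RISKY_HOOKS if hook in scripts]

if __name__ == "__main__":
    findings = audit(sys.argv[1])
    if findings:
        print("refusing install, lifecycle scripts declared:")
        for finding in findings:
            print("  " + finding)
        sys.exit(1)
    print("no install-time scripts declared")
```

The blunter version of the same control is `ignore-scripts=true` in `.npmrc`, which stops npm from running lifecycle scripts entirely, at the cost of breaking the minority of packages that legitimately need them.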

2. Vercel / Context.ai

On April 19, Vercel disclosed that an attacker had gained unauthorized access to certain internal systems after compromising Context.ai, a third-party AI tool used by a Vercel employee.[7]

According to Vercel's bulletin, the attacker used that access to take over the employee's Google Workspace account, pivot into a Vercel environment, and enumerate and decrypt non-sensitive environment variables.[7] Vercel said a limited subset of customers initially had non-sensitive environment variables exposed, later identified additional compromised accounts, and published the compromised Google OAuth app client ID as an IOC.[7]

Vercel also warned that the same OAuth app compromise may have affected hundreds of users across many organizations.[7] Wiz, summarizing Context.ai's disclosure and Vercel's bulletin, said Context.ai believed OAuth tokens for some consumer users were likely compromised and that at least one Vercel employee had granted the affected OAuth application broad "Allow All" permissions.[8]

Public reporting also described a seller claiming to be associated with ShinyHunters offering alleged Vercel data on BreachForums for $2 million, though that part sits outside what Vercel itself has confirmed.[9]

The OAuth grant did not self-limit. The AI tool did not self-revoke. The incident was found after the attacker had already used the delegated access.[7][8]
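
Nothing at grant time asked whether the requested scopes made sense for the tool. A grant-time policy sketch follows; the scope strings are real Google OAuth scopes, but the allow-list and the gate itself are hypothetical illustrations, not a control Context.ai or Vercel described.

```python
"""Fail-closed OAuth consent check: deny any scope outside an allow-list."""

# Scopes a coding assistant plausibly needs; everything else is denied.
# The allow-list contents are an assumption for illustration.
ALLOWED_SCOPES = {
    "https://www.googleapis.com/auth/userinfo.email",
    "https://www.googleapis.com/auth/drive.readonly",
}

def grant_is_safe(requested_scopes: set[str]) -> bool:
    """Return False if any requested scope exceeds the allow-list."""
    excessive = requested_scopes - ALLOWED_SCOPES
    for scope in sorted(excessive):
        print(f"DENY: scope not on allow-list: {scope}")
    return not excessive

# An "Allow All"-style consent requests far more than the list permits.
requested = {
    "https://www.googleapis.com/auth/userinfo.email",
    "https://mail.google.com/",                # full Gmail access
    "https://www.googleapis.com/auth/drive",   # full Drive access
}
assert not grant_is_safe(requested)
```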

3. Meta's Internal Agent

This one was reported on March 18, not in April. The date matters because it breaks the tidy one-month timeline and makes the point stronger: the pattern is not confined to a single bad month.[1]

TechCrunch, summarizing reporting from The Information, said a Meta employee posted a technical question on an internal forum. Another engineer asked an internal AI agent to analyze it. The agent posted a response without being asked to publish anything.[1]

The employee who asked the original question followed that advice. According to the public reports, the result was that massive amounts of company and user-related data became available to engineers who were not authorized to access it for about two hours. Meta classified the incident as a Sev 1, its second-highest internal severity level.[1][2]

Meta confirmed an incident took place and said no user data was mishandled.[2]

No attacker was needed here. The agent acted inside its own trust boundary. A human trusted the output. Nothing independent stopped the action before it landed.[1][2]

4. Mercor / LiteLLM

The public victim here was Mercor. On March 31, Mercor told TechCrunch it was "one of thousands of companies" affected by the compromise of the open-source LiteLLM project and that it had moved promptly to contain and remediate the incident.[3]

The technical root on the public record is the March 24 LiteLLM package compromise; there is no separately confirmed Mercor-specific RCE advisory. LiteLLM's GitHub incident thread said malicious versions 1.82.7 and 1.82.8 were published to PyPI after an attacker gained access to a maintainer account.[4]

Version 1.82.7 embedded its payload in litellm/proxy/proxy_server.py. Version 1.82.8 added a malicious .pth file that could trigger on any Python startup.[4] LiteLLM said the packages stole SSH keys, environment variables, cloud credentials, Kubernetes credentials, CI/CD secrets, and other sensitive material, then exfiltrated them to models.litellm.cloud.[4]
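
The .pth mechanism is worth pausing on, because it is legitimate CPython behavior: the interpreter's site module processes .pth files in site-packages at startup and executes any line that begins with `import`. Below is a short scanner for that mechanism, as a sketch; it only reports executable .pth lines and leaves judgment to a human.

```python
"""List .pth files in site-packages whose lines execute at startup.

CPython's `site` module runs any .pth line beginning with "import",
which is the startup-execution mechanism the malicious LiteLLM
1.82.8 release reportedly abused.
"""
import site
from pathlib import Path

def executable_pth_lines() -> list[tuple[Path, str]]:
    hits = []
    roots = site.getsitepackages() + [site.getusersitepackages()]
    for root in map(Path, roots):
        if not root.is_dir():
            continue
        for pth in root.glob("*.pth"):
            for line in pth.read_text(errors="replace").splitlines():
                # Only lines starting with "import " or "import\t" run code;
                # plain path entries are inert.
                if line.startswith(("import ", "import\t")):
                    hits.append((pth, line.strip()))
    return hits

if __name__ == "__main__":
    for path, line in executable_pth_lines():
        print(f"{path}: {line}")
```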

That is why Mercor matters, but also why Mercor is not the whole story. LiteLLM is shared infrastructure. When it was compromised, the blast radius was every environment that trusted it enough to route model traffic, secrets, and automation through it.[3][4]

The framework did not catch its own compromise. Humans did, after publication and installation.[3][4]

5. CrewAI

On March 30, CERT/CC published VU#221883 covering four CrewAI vulnerabilities: CVE-2026-2275, CVE-2026-2285, CVE-2026-2286, and CVE-2026-2287.[10]

CERT described the cluster as including remote code execution, arbitrary local file read, and SSRF.[10] CVE-2026-2275 came from the Code Interpreter Tool falling back to SandboxPython when Docker could not be reached, enabling code execution through arbitrary C function calls.[10] CVE-2026-2285 exposed arbitrary local file read through the JSON loader. CVE-2026-2286 allowed SSRF through RAG search tooling. CVE-2026-2287 covered the failure to keep enforcing Docker isolation, allowing another fallback into a less isolated mode.[10]

CERT's summary was direct: an attacker who can influence a CrewAI agent with the Code Interpreter Tool enabled may exploit the issues through prompt injection and chain them together.[10] At publication time, CERT said no complete patch was available for all disclosed vulnerabilities.[10]

Prompt injection was the steering wheel. The real failure was the runtime's ability to slide from "agent tool execution" into "host-level action" without an independent, fail-closed decision point.
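
The fail-closed version is easy to state: if the sandbox is unavailable, the answer is no. A minimal sketch of that decision point follows; `execute_tool_code` and its Docker probe are hypothetical stand-ins for a framework's real executor, not CrewAI's patched behavior.

```python
"""Fail-closed tool execution: no working sandbox, no execution."""
import shutil
import subprocess

class SandboxUnavailable(RuntimeError):
    """Raised instead of falling back to in-process execution."""

def docker_ready() -> bool:
    """Probe the Docker daemon; treat any failure as unavailable."""
    if shutil.which("docker") is None:
        return False
    probe = subprocess.run(["docker", "info"], capture_output=True)
    return probe.returncode == 0

def run_in_docker(code: str) -> str:
    """Run untrusted code in a throwaway container with no network."""
    result = subprocess.run(
        ["docker", "run", "--rm", "--network=none",
         "python:3.12-alpine", "python", "-c", code],
        capture_output=True, text=True, timeout=60,
    )
    return result.stdout

def execute_tool_code(code: str) -> str:
    if not docker_ready():
        # The vulnerable pattern slid into a weaker in-process mode here.
        # Fail closed instead: refuse outright.
        raise SandboxUnavailable("Docker unreachable; refusing to run tool code")
    return run_in_docker(code)
```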

The Pattern

Five incidents. Five different immediate causes.

Bitwarden was a supply chain compromise plus credential theft. Vercel was OAuth abuse and lateral movement through delegated trust. Meta was excessive authority and unsafe autonomous action. LiteLLM was upstream package compromise in shared infrastructure. CrewAI was unsafe defaults and exploitable tool execution paths.

The common thread was not the exploit. It was the absence of a separate layer that could say no.

Bitwarden's malware ran inside the install path. Context.ai's access rode an already-trusted OAuth grant. Meta's agent published inside the same boundary as the data it could influence. LiteLLM's malicious code ran inside the framework process. CrewAI's tool execution and fallbacks lived inside the same runtime that was supposed to be safe.

Same shape every time: the system evaluating whether the action was safe shared fate with the system taking the action.

The Missing Layer

The missing control is architectural.

If the only system deciding whether an agent action is safe is the agent, you do not have agent security. You have optimism.

That is the problem grith is built around: enforcement below the agent, at the syscall layer, where file reads, process spawns, network connections, and credential access can be evaluated before they complete.

The model can be wrong. The prompt can be poisoned. The package can be compromised. The runtime can still be stopped, because the enforcement layer does not share fate with the thing it is checking.
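
grith's proxy is not public yet, but the primitive underneath it is ordinary kernel machinery. As a toy, Linux-only illustration of enforcement that does not share fate with the code it polices, the sketch below drops a child process into seccomp strict mode: the kernel then permits only read, write, _exit, and sigreturn, and kills the process on any other syscall. This demonstrates the layer, not grith's implementation.

```python
"""Kernel-enforced fail-closed execution, in miniature (Linux only)."""
import ctypes
import os

PR_SET_SECCOMP = 22      # prctl option
SECCOMP_MODE_STRICT = 1  # only read/write/_exit/sigreturn allowed afterwards

pid = os.fork()
if pid == 0:
    # Child: hand the veto to the kernel, then misbehave.
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0:
        os._exit(1)  # strict mode unsupported; bail before the demo
    os.write(1, b"child: write() still allowed\n")
    os.open("/etc/passwd", os.O_RDONLY)  # openat(2): kernel sends SIGKILL
    os.write(1, b"never reached\n")
    os._exit(0)  # unreachable

_, status = os.waitpid(pid, 0)
# -9 here means the child was killed by SIGKILL: the kernel, not the
# child's own code, stopped the disallowed action.
print("child exit status:", os.waitstatus_to_exitcode(status))
```

The agent, the prompt, and the package are all untrusted inputs to that decision; the kernel's answer does not depend on any of them being honest.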

Footnotes

  1. TechCrunch: Meta is having trouble with rogue AI agents

  2. ITPro: Meta engineer trusted advice from an AI agent, ended up exposing user data

  3. TechCrunch: Mercor says it was hit by cyberattack tied to compromise of open source LiteLLM project

  4. LiteLLM GitHub issue #24518: compromised PyPI package timeline and status

  5. Bitwarden statement on the Checkmarx supply chain incident

  6. StepSecurity: Bitwarden CLI hijacked on npm

  7. Vercel April 2026 security incident bulletin

  8. Wiz: Context.ai OAuth token compromise

  9. TechRadar: Vercel identifies more accounts with evidence of prior compromise exposed during security incident

  10. CERT/CC VU#221883: CrewAI contains multiple vulnerabilities including SSRF, RCE and local file read
