The Day an AI Agent Merged Malicious Code (And What We Learned)


Sam Partee
JUNE 23, 2025
3 MIN READ
THOUGHT LEADERSHIP

Yesterday started like any other day. Coffee, standup, code review. Then I heard about an incident that made me drop everything.

An organization's AI agent had been compromised. Not through some exotic zero-day or sophisticated attack vector. No, this was far more elegant—and terrifying. Their LLM-powered browser agent had autonomously merged a malicious pull request on GitHub. As a real employee. With real permissions.

The attack vector? A carefully crafted email sitting in the user's inbox, containing instructions the agent interpreted as legitimate commands.

The Model Did Nothing Wrong

Here's what keeps me up at night: the LLM performed flawlessly. It read content, understood intent, and took action—exactly as designed. The catastrophic failure wasn't in the AI. It was in the architecture.

Someone gave an AI agent unrestricted browser access with the full privileges of a logged-in user. It's like handing your car keys to someone who's great at reading maps but has never heard of traffic laws.

The State of Agent Security (Spoiler: It's Bad)

I've reviewed dozens of agent architectures over the past year. The security models range from "non-existent" to "fingers crossed." The typical approach:

  1. Give the agent broad access
  2. Hope it makes good decisions
  3. ???
  4. Profit (or get pwned)

This is fundamentally broken. We're building agents like it's 2010 and nobody's heard of the principle of least privilege.

What Actually Works

After helping architect over 100 LLM-based applications at Redis and now building Arcade.dev, here's what I've learned about making agents that won't burn down your infrastructure:

1. Principle of Least Privilege (But Actually Do It)

Your agent should never have more access than absolutely necessary for the current task.

  • Reading emails? Read-only access to the inbox
  • Sending a message? Access to compose, not delete
  • Reviewing PRs? Read access, never merge

Put simply: giving an agent your user token is a really bad idea. Don't believe me? Wait until attackers get better at prompt injection. You'll regret it.
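A minimal sketch of task-scoped credentials, in Python. The scope names (`mail.read`, `repo.read`, and so on), the `TASK_SCOPES` table, and the `ScopedToken` type are hypothetical illustrations, not any provider's real API; real OAuth providers define their own scope strings. The idea is that the agent receives a token minted for the current task only, so a merge attempt fails even if an injected instruction asks for one.

```python
from dataclasses import dataclass

# Hypothetical scope names; real providers (Gmail, GitHub, ...) define their own.
TASK_SCOPES: dict[str, frozenset[str]] = {
    "read_emails": frozenset({"mail.read"}),
    "send_message": frozenset({"mail.compose"}),
    "review_prs": frozenset({"repo.read"}),
}


@dataclass(frozen=True)
class ScopedToken:
    """A credential minted for one task, carrying only that task's scopes."""
    user: str
    scopes: frozenset[str]


def token_for_task(user: str, task: str) -> ScopedToken:
    # Unknown tasks get no scopes at all (deny by default).
    return ScopedToken(user, TASK_SCOPES.get(task, frozenset()))


def authorize(token: ScopedToken, required_scope: str) -> None:
    # Every tool call checks its required scope before doing anything.
    if required_scope not in token.scopes:
        raise PermissionError(f"{token.user}: scope {required_scope!r} not granted")


token = token_for_task("alice", "review_prs")
authorize(token, "repo.read")       # allowed: reviewing needs read access
try:
    authorize(token, "repo.merge")  # rejected: merging was never granted
except PermissionError as err:
    print(err)                      # prints: alice: scope 'repo.merge' not granted
```

Unknown tasks map to an empty scope set, so forgetting to register a task fails closed rather than open.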

2. Execution Sandboxing

Running tools in isolated environments isn't paranoia—it's table stakes. Whether it's containerization, separate processes, or API-level restrictions, you need barriers between what an agent attempts and what damage it can cause.

At Arcade.dev, every tool runs in its own sandboxed environment. An agent can't even see tools it shouldn't have access to, let alone execute them.
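One cheap layer of that barrier is process isolation. The sketch below, assuming tool code arrives as a Python source string, runs it in a fresh interpreter with an empty environment (so no inherited tokens), a throwaway working directory, and a hard timeout. The function name `run_tool_sandboxed` is hypothetical, and this is only a starting point: a separate process is not a security boundary by itself, so production systems layer containers or VMs on top.

```python
import subprocess
import sys
import tempfile


def run_tool_sandboxed(code: str, timeout_s: float = 5.0) -> str:
    """Run tool code in a separate Python process and return its stdout.

    Isolation here is limited to: a fresh interpreter in isolated mode (-I,
    which ignores PYTHON* variables and user site-packages), an empty
    environment so secrets like GITHUB_TOKEN are not inherited, a temporary
    working directory, and a timeout.
    """
    with tempfile.TemporaryDirectory() as workdir:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],
            cwd=workdir,
            env={},              # no inherited credentials or API keys
            capture_output=True,
            text=True,
            timeout=timeout_s,   # raises subprocess.TimeoutExpired if exceeded
        )
    if result.returncode != 0:
        raise RuntimeError(f"tool failed: {result.stderr.strip()}")
    return result.stdout
```

Stripping the environment matters for the attack in this post: a tool process that never sees the user's credentials cannot use them, no matter what an injected prompt asks for.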

3. Audit Everything (And I Mean Everything)

Every tool call, every decision, every piece of data accessed needs to be logged. Not just for compliance, but for forensics when things go wrong. And they will go wrong.

This isn't your typical observability. You need to capture:

  • The reasoning behind each tool call
  • All parameters passed
  • Complete results returned
  • Most importantly: who approved it
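Here is a sketch of an audit record that captures those four items, written as JSON Lines (one JSON object per line) to any text sink. The `AuditRecord` fields, the tool name, and the in-memory `io.StringIO` sink are assumptions for illustration; in practice the sink would be an append-only file or log pipeline that the agent itself cannot modify or truncate.

```python
import io
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Optional, TextIO


@dataclass
class AuditRecord:
    user: str                   # whose authority the call ran under
    tool: str
    reasoning: str              # the agent's stated reason for the call
    params: dict[str, Any]      # all parameters passed
    result: Any                 # the complete result returned
    approved_by: Optional[str]  # human approver, or None if no approval was required
    timestamp: float = field(default_factory=time.time)


class AuditLog:
    """Append-only JSON Lines log: one self-contained record per tool call."""

    def __init__(self, sink: TextIO) -> None:
        self.sink = sink

    def append(self, record: AuditRecord) -> None:
        # default=str keeps logging from failing on non-JSON results (e.g. datetimes).
        self.sink.write(json.dumps(asdict(record), default=str) + "\n")
        self.sink.flush()


# Example with an in-memory sink; the tool name and data are hypothetical.
log = AuditLog(io.StringIO())
log.append(AuditRecord(
    user="alice",
    tool="github.get_pull_request",
    reasoning="User asked for a summary of the open PR",
    params={"repo": "acme/api", "number": 42},
    result={"title": "Fix retry logic", "state": "open"},
    approved_by=None,
))
```

Writing one complete record per call, rather than scattering the reasoning, parameters, and result across separate log lines, makes it possible to reconstruct exactly what happened from a single entry during forensics.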

4. Human Circuit Breakers

Critical actions need human approval. Period.

Your agent can draft the email, prepare the code commit, or plan the database migration. But a human should pull the trigger. Yes, it reduces autonomy. It also prevents your agent from accidentally dropping prod because someone asked it to "clean up the test database" in an ambiguously worded Slack message.
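A minimal sketch of a circuit breaker, assuming every tool call passes through a single `execute` function. The `CRITICAL_ACTIONS` set, the tool names, and the `approve` callback are hypothetical; `approve` stands in for whatever channel reaches a person (a Slack button, a review queue, a CLI prompt). Routine reads run straight through, while critical actions stop unless a human says yes.

```python
from typing import Any, Callable

# Hypothetical set of actions that must never run without a human sign-off.
CRITICAL_ACTIONS = {"github.merge_pull_request", "db.drop_table", "email.send"}


class ApprovalRequired(Exception):
    """Raised when a critical action is attempted without human approval."""


def execute(
    action: str,
    params: dict[str, Any],
    run: Callable[[dict[str, Any]], Any],
    approve: Callable[[str, dict[str, Any]], bool],
) -> Any:
    """Run routine actions directly; pause for a human on critical ones.

    `approve` must be answered by a person, never by the agent itself.
    """
    if action in CRITICAL_ACTIONS and not approve(action, params):
        raise ApprovalRequired(f"{action} was not approved")
    return run(params)


# Example: a reviewer who has not responded is treated as "no".
no_response = lambda action, params: False
summary = execute("github.get_pull_request", {"number": 42}, lambda p: "summary", no_response)
```

The key design choice is that a missing or negative answer blocks the action: the breaker fails closed.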

The Uncomfortable Truth

The browser agent attack revealed what we've been avoiding: we're giving AI agents capabilities without corresponding safeguards. It's irresponsible engineering.

The platforms getting this right treat security as foundational architecture, not a checkbox feature. They:

  • Enforce user-scoped permissions by default
  • Validate every action against policy
  • Provide audit trails without you having to think about it
  • Make the secure path the easy path
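"Validate every action against policy" can start as small as a deny-by-default rule table that inspects the arguments of each call, not just the tool name. The `Policy` class, the `acme/` organization, and the `@acme.example` domain below are made up for illustration; real deployments often use a dedicated policy engine instead.

```python
from dataclasses import dataclass
from typing import Any, Callable

# A rule decides whether one specific call (tool plus arguments) is allowed.
Rule = Callable[[dict[str, Any]], bool]


@dataclass
class Policy:
    rules: dict[str, Rule]  # tool name -> rule; tools with no rule are denied

    def check(self, tool: str, args: dict[str, Any]) -> bool:
        rule = self.rules.get(tool)
        return rule is not None and rule(args)


# Hypothetical policy: read PRs only in one org, email only internal addresses.
policy = Policy(rules={
    "github.get_pull_request": lambda a: str(a.get("repo", "")).startswith("acme/"),
    "email.send": lambda a: str(a.get("to", "")).endswith("@acme.example"),
})
```

Under this policy, a merge tool is denied simply because no rule exists for it, and an email to an outside address is denied even though sending email is allowed in general.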

Building Trust Through Architecture

Here's the thing: building trustworthy agents isn't about constraining AI capabilities. It's about channeling them safely.

At Arcade.dev, we've built our entire platform around this principle. Every integration uses OAuth with fine-grained permissions. Every tool call is scoped to the user who authorized it. Every action is logged and auditable. Not because we don't trust AI, but because we understand that security is what enables trust.

The agent that merged malicious code wasn't poorly trained. It wasn't using an inferior model. It was just given too much power with too little oversight.

Don't let this be your cautionary tale.


Sam Partee is CTO and co-founder at Arcade.dev, where he's building the infrastructure for secure, production-ready AI agents. Previously, he was Principal AI Engineer at Redis, where he helped architect over 100 LLM-based applications. He's a major contributor to LangChain and LlamaIndex, and he believes that the future of AI is in what it can do, not just what it can say.

Want to build agents that act without breaking everything? Check out Arcade.dev or find me on Twitter.
