AgentKit Ships, But Production Agents Still Need Authentication

Shub Argha
OCTOBER 6, 2025

OpenAI just dropped AgentKit at DevDay, and the demos look clean—visual workflow builders, embedded chat interfaces, evaluation frameworks. Ramp went from blank canvas to live buyer agent in hours instead of months. LY Corporation built a multi-agent workflow in under two hours.

But here's what the launch post doesn't tell you: most of those demos will hit a wall before production.

What AgentKit Actually Shipped

AgentKit is three things bundled together:

Agent Builder gives you a visual canvas for composing agent logic. Drag-and-drop nodes, inline eval config, full versioning. You can see the entire workflow instead of debugging orchestration code at 2 AM.

ChatKit handles the annoying parts of chat UI—streaming responses, thread management, thinking indicators. Embed it in your product, customize the theme, ship.

Evals + RFT lets you measure agent performance with datasets, trace grading, and automated prompt optimization. Reinforcement fine-tuning, offered on o4-mini and in private beta on GPT-5, trains models to call the right tools at the right time.

It's a legitimate step forward for agent development velocity. Workflows that used to take months to build now take hours.

The Authentication and Authorization Wall

Here's the problem: AgentKit makes it easy to build agents. It doesn't solve how those agents actually authenticate and authorize in production.

AgentKit ships with native MCP (Model Context Protocol) support. MCP standardizes how agents connect to tools and data sources. Dropbox, Google Drive, SharePoint, Microsoft Teams—all available through the Connector Registry.

But MCP was designed for local, single-user scenarios. YOUR personal Claude Desktop connecting to YOUR personal filesystem. One user, one machine, no auth complexity.

99% of MCP servers today are still built that way—even the hosted ones. They work great for demos. They break in production when you need:

  • Authentication: Secure OAuth flows so agents can access third-party APIs on behalf of users
  • Authorization: Per-user permissions ensuring agents only access what each user is allowed to see
  • Token management: Production-grade refresh logic, secure storage, and scope validation
  • Audit trails: Logs showing which AI action happened on behalf of which user, with what permissions
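To make the last item in that list concrete, here is a minimal sketch of what one audit-trail entry could look like. The field names (`user_id`, `action`, `scopes`, `outcome`) and the tool name in the comment are illustrative assumptions, not a schema defined by AgentKit, MCP, or Arcade.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """One agent action, attributed to the user it was performed for."""
    user_id: str    # the end user the agent acted on behalf of
    action: str     # hypothetical tool name, e.g. "gmail.send_email"
    scopes: tuple   # OAuth scopes in effect when the action ran
    outcome: str    # "allowed", "denied", or "error"
    timestamp: str  # ISO 8601 timestamp in UTC


def audit_line(user_id, action, scopes, outcome, now=None):
    """Serialize one audit record as a single JSON line for an append-only log."""
    now = now or datetime.now(timezone.utc)
    record = AuditRecord(user_id, action, tuple(scopes), outcome, now.isoformat())
    return json.dumps(asdict(record), sort_keys=True)
```

Writing one self-describing line per action is what lets you answer "which user, which action, which permissions" later without reconstructing state from application logs.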

This is what kills 70% of AI projects before they ship. Not the agent logic. The auth layer.

What Production-Ready Agent Auth Actually Requires

The gap between "works in a demo" and "ships to production" comes down to a few hard problems:

OAuth flows that don't suck. Your agent needs to access Gmail, Slack, Salesforce—tools that require OAuth. Building and maintaining OAuth integrations for dozens of services is months of work. Then you need to handle token refresh, scope changes, API versioning, and edge cases that only surface at scale.
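For a sense of what even the first step of one OAuth integration involves, here is a rough sketch of building an authorization-code request with PKCE (RFC 7636, S256 method) using only the Python standard library. The endpoint, client ID, redirect URI, and scope names you would pass in are placeholders, and real providers differ in the parameters they require.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def challenge_for(verifier):
    """Derive the S256 code challenge for a PKCE code verifier (RFC 7636)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair():
    """Return a fresh (code_verifier, code_challenge) pair."""
    verifier = secrets.token_urlsafe(64)  # 86 URL-safe chars, within the 43-128 limit
    return verifier, challenge_for(verifier)


def build_authorization_url(authorize_endpoint, client_id, redirect_uri,
                            scopes, state, code_challenge):
    """Build the URL the user visits to grant the agent access."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),        # scopes are space-delimited
        "state": state,                   # echoed back to guard against CSRF
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
    }
    return f"{authorize_endpoint}?{urlencode(params)}"
```

This covers only the redirect. The code exchange, refresh, revocation, and provider-specific quirks still have to be handled for every service the agent touches.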

Per-user authorization, not bot tokens. Most agent demos use a single API key or bot token with admin access. That's fine for a prototype. In production, you need every agent action to map to a specific user with appropriate permissions. Your legal team will ask: "Which user authorized this action? What were their permissions? Can we prove it in an audit?"
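A minimal sketch of the difference, assuming a hypothetical mapping from tool names to required OAuth scopes: every call is checked against the specific user's grant, and anything not explicitly covered is denied. The tool and scope names are made up for illustration.

```python
from dataclasses import dataclass


class NotAuthorized(Exception):
    """Raised when a user's grant does not cover the requested tool."""


@dataclass(frozen=True)
class UserGrant:
    user_id: str
    scopes: frozenset  # scopes this user actually consented to


# Hypothetical mapping from tool name to the scopes it needs.
REQUIRED_SCOPES = {
    "gmail.send_email": frozenset({"gmail.send"}),
    "gmail.read_inbox": frozenset({"gmail.readonly"}),
}


def authorize(grant, tool_name):
    """Raise NotAuthorized unless this user's grant covers the tool."""
    required = REQUIRED_SCOPES.get(tool_name)
    if required is None:
        # Deny by default: an unregistered tool is never callable.
        raise NotAuthorized(f"unknown tool {tool_name!r}")
    missing = required - grant.scopes
    if missing:
        raise NotAuthorized(
            f"user {grant.user_id} lacks scopes {sorted(missing)} for {tool_name}"
        )
```

With a shared bot token there is no `grant` to check, which is exactly why the audit question cannot be answered afterwards.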

Token management that doesn't leak credentials. Storing OAuth tokens securely, refreshing them before expiry, handling revocation—this is all undifferentiated heavy lifting. Get it wrong and you're in breach of compliance requirements. Get it right and you've built something every other AI team also needs to build.
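As a rough illustration of "refreshing before expiry," here is a sketch of a per-user token manager. The `refresh_fn` callback stands in for a provider-specific refresh call, and the in-memory dict is an assumption made to keep the example self-contained. A production store would encrypt tokens at rest.

```python
import time
from dataclasses import dataclass


@dataclass
class Token:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp at which the access token expires


class TokenManager:
    """Hands out valid access tokens per user, refreshing shortly before expiry."""

    def __init__(self, refresh_fn, skew_seconds=60, clock=time.time):
        self._refresh_fn = refresh_fn  # hypothetical: Token -> fresh Token
        self._skew = skew_seconds      # refresh this many seconds early
        self._clock = clock
        self._tokens = {}              # user_id -> Token (in-memory for the sketch)

    def put(self, user_id, token):
        self._tokens[user_id] = token

    def revoke(self, user_id):
        """Forget a user's tokens, e.g. after they disconnect the integration."""
        self._tokens.pop(user_id, None)

    def get_access_token(self, user_id):
        token = self._tokens.get(user_id)
        if token is None:
            raise KeyError(f"no token stored for user {user_id}")
        if token.expires_at - self._clock() <= self._skew:
            token = self._refresh_fn(token)
            self._tokens[user_id] = token
        return token.access_token
```

Refreshing slightly early avoids handing the agent a token that expires mid-request. The sketch leaves out concurrency, retry on refresh failure, and refresh-token rotation, all of which show up at scale.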

Production infrastructure. Monitoring which agent actions succeeded or failed. Rate limiting to avoid hammering APIs. Logging for debugging. Evaluation hooks to measure performance. None of this is glamorous, but all of it is mandatory for production deployment.
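Rate limiting is the easiest of these to show in a few lines. Below is a sketch of a token bucket, one common way to cap how fast an agent calls an upstream API. The rate and capacity values you choose are up to you, and nothing here is specific to AgentKit or Arcade.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_second`."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if the call may proceed."""
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

In practice you would likely keep one bucket per user and per upstream service, so one busy user cannot exhaust the quota for everyone else.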

How Arcade.dev Solves This

This is exactly what Arcade.dev was built for—giving AI agents secure, authenticated access to real tools.

Pre-built OAuth integrations for Gmail, Slack, Notion, Stripe, Salesforce, and 100+ other services. Built and maintained by the team that shipped auth at scale (Okta, Stormpath). You don't write OAuth code. You configure scopes and let Arcade handle the rest.

Per-user authorization by default. Every agent action happens as the end user, with their permissions, through their authorized connection. No shared bot tokens, no admin access hacks.

Production-grade token management. Refresh logic, secure storage, scope validation, error handling. Your agent code never touches raw credentials.

Observability and evaluation. Monitoring, logging, rate limiting, and eval hooks built in. Everything you need to run agents at scale.

The architecture is straightforward: you build your agent workflows (in AgentKit, LangGraph, CrewAI, whatever), and Arcade provides the authenticated tool layer underneath. Your agent calls arcade.send_email() and Arcade handles the OAuth flow, token refresh, and user authorization.
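The shape of that split can be sketched in a few lines. Everything below (`ToolLayer`, `AuthorizationRequired`, `fake_send_email`) is hypothetical and written only to show the pattern. It is not Arcade's SDK, and the `arcade.send_email()` call above should be read as shorthand rather than a documented signature.

```python
class AuthorizationRequired(Exception):
    """Signals that the user must complete an OAuth flow before this tool can run."""

    def __init__(self, user_id, tool_name):
        super().__init__(f"{user_id} has not authorized {tool_name}")
        self.user_id = user_id
        self.tool_name = tool_name


class ToolLayer:
    """Hypothetical authenticated tool layer sitting under the agent."""

    def __init__(self, tools, tokens_by_user):
        self._tools = tools            # tool name -> function(access_token, **kwargs)
        self._tokens = tokens_by_user  # user_id -> access token

    def call(self, user_id, tool_name, **kwargs):
        token = self._tokens.get(user_id)
        if token is None:
            # The agent never sees credentials; it only learns that auth is needed.
            raise AuthorizationRequired(user_id, tool_name)
        return self._tools[tool_name](token, **kwargs)


def fake_send_email(access_token, to, subject):
    """Stand-in for a real provider call; it only echoes its arguments."""
    return {"sent_to": to, "subject": subject}
```

The agent side then reduces to `layer.call(user_id, "send_email", to=..., subject=...)` plus handling `AuthorizationRequired` by prompting the user to connect their account.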

Why This Matters Now

AgentKit just made it dramatically easier to build agent workflows. That's going to create a surge of teams hitting the authentication wall in the next few weeks.

You'll see it when you try to connect your agent to a user's Gmail account. Or when your compliance team asks how you're handling OAuth token storage. Or when you realize your demo works great with your own API keys but breaks when you try to deploy it for multiple users.

The good news: this is a solved problem. You don't need to build your own OAuth infrastructure. You don't need to maintain integrations for dozens of services. You don't need to figure out per-user authorization from scratch.

Arcade.dev handles the auth layer so you can focus on building agent workflows.


We're shipping MCP support for Arcade.dev next week—making it even easier to connect agent frameworks like AgentKit to authenticated tools. Join our Discord for updates or sign up for Arcade.dev to get on our mailing list.

If you're building agents today and already hitting auth problems, we're here. Get started with our quickstart guide or reach out directly—we're helping teams ship production agents every day.
