OpenAI just dropped AgentKit at DevDay, and the demos look clean—visual workflow builders, embedded chat interfaces, evaluation frameworks. Ramp went from blank canvas to live buyer agent in hours instead of months. LY Corporation built a multi-agent workflow in under two hours.
But here's what the launch post doesn't tell you: most of those demos will hit a wall before production.
What AgentKit Actually Shipped
AgentKit is three things bundled together:
Agent Builder gives you a visual canvas for composing agent logic. Drag-and-drop nodes, inline eval config, full versioning. You can see the entire workflow instead of debugging orchestration code at 2 AM.
ChatKit handles the annoying parts of chat UI—streaming responses, thread management, thinking indicators. Embed it in your product, customize the theme, ship.
Evals + RFT lets you measure agent performance with datasets, trace grading, and automated prompt optimization. Reinforcement fine-tuning on o4-mini and GPT-5 (private beta) trains models to call the right tools at the right time.
It's a legitimate step forward for agent development velocity: workflows that used to take months to build now take hours.
The Authentication and Authorization Wall
Here's the problem: AgentKit makes it easy to build agents. It doesn't solve how those agents authenticate to third-party services, or how their actions get authorized, once they're in production.
AgentKit ships with native MCP (Model Context Protocol) support. MCP standardizes how agents connect to tools and data sources. Dropbox, Google Drive, SharePoint, Microsoft Teams—all available through the Connector Registry.
But MCP was designed for local, single-user scenarios. YOUR personal Claude Desktop connecting to YOUR personal filesystem. One user, one machine, no auth complexity.
99% of MCP servers today are still built that way—even the hosted ones. They work great for demos. They break in production when you need:
- Authentication: Secure OAuth flows so agents can access third-party APIs on behalf of users
- Authorization: Per-user permissions ensuring agents only access what each user is allowed to see
- Token management: Production-grade refresh logic, secure storage, and scope validation
- Audit trails: Logs showing which AI action happened on behalf of which user, with what permissions
This is what kills 70% of AI projects before they ship. Not the agent logic. The auth layer.
What Production-Ready Agent Auth Actually Requires
The gap between "works in a demo" and "ships to production" comes down to a few hard problems:
OAuth flows that don't suck. Your agent needs to access Gmail, Slack, Salesforce—tools that require OAuth. Building and maintaining OAuth integrations for dozens of services is months of work. Then you need to handle token refresh, scope changes, API versioning, and edge cases that only surface at scale.
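To give a sense of the moving parts, here is a minimal sketch of only the first step of one OAuth 2.0 authorization-code flow with PKCE, using the Python standard library. The endpoint, client ID, redirect URI, and scope names are placeholders rather than values from any real provider, and a real integration still needs the callback handler, the code exchange, and error handling on top of this.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) using the S256 method (RFC 7636)."""
    verifier = secrets.token_urlsafe(64)  # 86 URL-safe chars, inside the 43-128 limit
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # Base64url without padding, as the PKCE spec requires.
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def build_authorization_url(
    authorize_endpoint: str,
    client_id: str,
    redirect_uri: str,
    scopes: list[str],
    state: str,
    code_challenge: str,
) -> str:
    """Build the URL the user visits to grant the agent access."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # scopes are space-delimited in OAuth 2.0
        "state": state,             # echoed back by the provider to guard against CSRF
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
    }
    return f"{authorize_endpoint}?{urlencode(params)}"


# Placeholder values for illustration only.
verifier, challenge = make_pkce_pair()
url = build_authorization_url(
    "https://auth.example.com/authorize",
    "client-123",
    "https://app.example.com/callback",
    ["mail.read", "mail.send"],
    secrets.token_urlsafe(16),
    challenge,
)
```

The verifier stays on your server and is sent later with the code exchange. Multiply this by every provider's quirks and you get the months of work described above.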
Per-user authorization, not bot tokens. Most agent demos use a single API key or bot token with admin access. That's fine for a prototype. In production, you need every agent action to map to a specific user with appropriate permissions. Your legal team will ask: "Which user authorized this action? What were their permissions? Can we prove it in an audit?"
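A rough sketch of what "every action maps to a specific user" can look like in code. The names here (run_as_user, AuditRecord, the grants dictionary, the scope strings) are invented for illustration and are not taken from AgentKit or any other SDK. The idea is simply that the permission check and the audit entry happen in one place, before the tool runs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


class PermissionDenied(Exception):
    """Raised when a user lacks the scope an action requires."""


@dataclass(frozen=True)
class AuditRecord:
    user_id: str
    action: str
    required_scope: str
    allowed: bool
    timestamp: str  # ISO 8601, UTC


def run_as_user(
    user_id: str,
    action: str,
    required_scope: str,
    grants: dict[str, set[str]],
    audit_log: list[AuditRecord],
    tool: Callable[[], str],
) -> str:
    """Run `tool` only if `user_id` holds `required_scope`, logging either way."""
    allowed = required_scope in grants.get(user_id, set())
    # Record the decision before acting, so denied attempts are auditable too.
    audit_log.append(
        AuditRecord(
            user_id=user_id,
            action=action,
            required_scope=required_scope,
            allowed=allowed,
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
    )
    if not allowed:
        raise PermissionDenied(f"{user_id} lacks scope {required_scope!r} for {action}")
    return tool()
```

With something like this in the call path, the audit log can answer the legal team's questions directly: who acted, what they asked for, and whether their permissions covered it.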
Token management that doesn't leak credentials. Storing OAuth tokens securely, refreshing them before expiry, handling revocation—this is all undifferentiated heavy lifting. Get it wrong and you're in breach of compliance requirements. Get it right and you've built something every other AI team also needs to build.
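Here is a minimal sketch of the "refresh before expiry" and "handle revocation" pieces, under some loud assumptions: TokenStore and Token are hypothetical names, the in-memory dictionary stands in for an encrypted store, and refresh_fn stands in for whatever call your provider's token endpoint requires.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp, in seconds


class TokenStore:
    """Hypothetical per-user token store that refreshes shortly before expiry."""

    def __init__(self, refresh_fn: Callable[[str], Token], skew_seconds: float = 60.0):
        # Assumption: a real deployment would encrypt these at rest.
        self._tokens: dict[str, Token] = {}
        self._refresh_fn = refresh_fn
        self._skew = skew_seconds

    def save(self, user_id: str, token: Token) -> None:
        self._tokens[user_id] = token

    def revoke(self, user_id: str) -> None:
        # Forget the user's credentials; later lookups fail until they reconnect.
        self._tokens.pop(user_id, None)

    def get_access_token(self, user_id: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        token = self._tokens.get(user_id)
        if token is None:
            raise KeyError(f"no token stored for user {user_id!r}")
        # Refresh when the token is within `skew` seconds of expiring.
        if token.expires_at - now <= self._skew:
            token = self._refresh_fn(token.refresh_token)
            self._tokens[user_id] = token
        return token.access_token
```

The skew window is a design choice: refreshing slightly early avoids handing out a token that expires mid-request. This sketch leaves out concurrency, refresh failures, and scope validation, which is where much of the real work lives.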
Production infrastructure. Monitoring which agent actions succeeded or failed. Rate limiting to avoid hammering APIs. Logging for debugging. Evaluation hooks to measure performance. None of this is glamorous, but all of it is mandatory for production deployment.
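As one example of that unglamorous work, rate limiting is often done with a token bucket. The sketch below is a generic single-process version, not something AgentKit or any particular provider ships. The class name and parameters are illustrative, and the clock is injectable so the behavior can be tested without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_second`."""

    def __init__(
        self,
        rate_per_second: float,
        capacity: int,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._rate = rate_per_second
        self._capacity = float(capacity)
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        """Return True and spend one token if a call may proceed right now."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

In practice you would keep one bucket per user and upstream API, and queue or retry calls when allow() returns False rather than dropping them.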
How Arcade.dev Solves This
This is exactly what Arcade.dev was built for—giving AI agents secure, authenticated access to real tools.
Pre-built OAuth integrations for Gmail, Slack, Notion, Stripe, Salesforce, and 100+ other services. Built and maintained by the team that shipped auth at scale (Okta, Stormpath). You don't write OAuth code. You configure scopes and let Arcade handle the rest.
Per-user authorization by default. Every agent action happens as the end user, with their permissions, through their authorized connection. No shared bot tokens, no admin access hacks.
Production-grade token management. Refresh logic, secure storage, scope validation, error handling. Your agent code never touches raw credentials.
Observability and evaluation. Monitoring, logging, rate limiting, and eval hooks built in. Everything you need to run agents at scale.
The architecture is straightforward: you build your agent workflows (in AgentKit, LangGraph, CrewAI, whatever), and Arcade provides the authenticated tool layer underneath. Your agent calls arcade.send_email() and Arcade handles the OAuth flow, token refresh, and user authorization.
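One way to picture that layering, as a hypothetical sketch and explicitly not Arcade's actual SDK: the agent passes a user ID and the tool arguments, and the layer underneath owns the credentials. Every name below (ToolLayer, ToolResult, connect, send_fn, connect_url) is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urlencode


@dataclass
class ToolResult:
    status: str  # "completed" or "authorization_required"
    output: Optional[str] = None
    authorization_url: Optional[str] = None


class ToolLayer:
    """Hypothetical stand-in for an authenticated tool layer (not a real SDK)."""

    def __init__(self, send_fn: Callable[[str, str, str, str], str], connect_url: str):
        # user_id -> access token; agent code never reads this mapping.
        self._credentials: dict[str, str] = {}
        self._send_fn = send_fn          # assumed transport: (token, to, subject, body) -> id
        self._connect_url = connect_url  # placeholder page where a user grants access

    def connect(self, user_id: str, access_token: str) -> None:
        """Called after the user completes the OAuth flow."""
        self._credentials[user_id] = access_token

    def send_email(self, user_id: str, to: str, subject: str, body: str) -> ToolResult:
        token = self._credentials.get(user_id)
        if token is None:
            # No connection yet: hand back a link instead of failing silently.
            url = f"{self._connect_url}?{urlencode({'user': user_id})}"
            return ToolResult("authorization_required", authorization_url=url)
        return ToolResult("completed", output=self._send_fn(token, to, subject, body))
```

The point of the shape is that agent code only ever sees a ToolResult. Whether the user has connected an account, and which token is used, is decided below the line the agent can reach.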
Why This Matters Now
AgentKit just made it dramatically easier to build agent workflows. That's going to create a surge of teams hitting the authentication wall in the next few weeks.
You'll see it when you try to connect your agent to a user's Gmail account. Or when your compliance team asks how you're handling OAuth token storage. Or when you realize your demo works great with your own API keys but breaks when you try to deploy it for multiple users.
The good news: this is a solved problem. You don't need to build your own OAuth infrastructure. You don't need to maintain integrations for dozens of services. You don't need to figure out per-user authorization from scratch.
Arcade.dev handles the auth layer so you can focus on building agent workflows.
We're shipping MCP support for Arcade.dev next week—making it even easier to connect agent frameworks like AgentKit to authenticated tools. Join our Discord for updates or sign up for Arcade.dev to get on our mailing list.
If you're building agents today and already hitting auth problems, we're here. Get started with our quickstart guide or reach out directly—we're helping teams ship production agents every day.