I was recently in Amsterdam meeting with some of the largest enterprises, and they all raised the same challenge: how do you give AI agents access to more tools without everything falling apart?
The issue is that as soon as agents hit 20-30 tools, token costs become untenable and selection accuracy plummets. The pain has been acute enough that many teams have tried building their own workarounds with RAG pipelines, only to hit performance walls.
That's why I'm excited about Anthropic's recently announced Tool Search Tool, which represents a major step forward in solving this common challenge for AI agents.
What did Anthropic actually release?
Announced as one of three new beta features, Anthropic’s Tool Search Tool allows Claude models to dynamically discover and load tools on demand instead of adding every single tool definition to the context window upfront.
Before, a Claude model had to keep every possible tool definition in its working memory at all times. Now it can offload that knowledge and search through it when needed. It's the difference between keeping everything in your head and looking things up in a dictionary. Giving Claude models a “dictionary” of tools reduces the taxing load of holding everything in memory while also improving accuracy.
Let's dive deeper into the two primary constraints it addresses:
Token bloat: In its announcement, Anthropic provides a concrete example of a five-server setup:
- GitHub: 35 tools (~26K tokens)
- Slack: 11 tools (~21K tokens)
- Sentry: 5 tools (~3K tokens)
- Grafana: 5 tools (~3K tokens)
- Splunk: 2 tools (~2K tokens)
That's 58 tools consuming approximately 55K tokens before the conversation even begins. Add additional servers like Jira (which alone uses ~17K tokens) and you quickly approach 100K+ token overhead. This token consumption directly impacts both response latency and operational costs.
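For a sense of scale, the arithmetic is easy to restate. Here's a quick sketch that simply re-adds the announcement's estimates (no new numbers introduced):

```python
# Approximate tool-definition sizes from Anthropic's announcement.
# Every definition is sent with each request, so this cost is paid
# before the conversation even begins.
servers = {
    "GitHub":  {"tools": 35, "tokens": 26_000},
    "Slack":   {"tools": 11, "tokens": 21_000},
    "Sentry":  {"tools": 5,  "tokens": 3_000},
    "Grafana": {"tools": 5,  "tokens": 3_000},
    "Splunk":  {"tools": 2,  "tokens": 2_000},
}

total_tools = sum(s["tools"] for s in servers.values())    # 58 tools
total_tokens = sum(s["tokens"] for s in servers.values())  # ~55,000 tokens
print(f"{total_tools} tools -> ~{total_tokens:,} tokens of upfront context per request")
```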
Prior to this release, agents began experiencing reliability issues at around 20 tools. To put this in perspective, the GitHub toolkit alone contains 18 tools, and Gmail has 10-13. This created a severe practical constraint: organizations couldn't deploy agents capable of handling multiple systems simultaneously.
Accuracy: Tool selection accuracy was another critical constraint. As the number of tools increased, the model's ability to select the correct tool decreased significantly. This was particularly problematic when tools had similar names or overlapping functionality.
How Anthropic solved this for Claude
The solution is straightforward: mark tools with defer_loading: true. Those tools remain discoverable but don't consume context until Claude actually needs them. Claude searches using either regex or keyword ranking (BM25), then only loads what it needs.
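To make the pattern concrete, here's a minimal sketch using the Anthropic Python SDK. Treat the beta flag, the tool-search tool type string, and the model name as placeholders: they illustrate the shape of the request described in the announcement, and the exact identifiers should be taken from Anthropic's documentation.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Tools marked with defer_loading stay discoverable through tool search,
# but their full definitions aren't loaded into context until needed.
deferred_tools = [
    {
        "name": "github_create_issue",
        "description": "Create an issue in a GitHub repository.",
        "input_schema": {
            "type": "object",
            "properties": {
                "repo": {"type": "string"},
                "title": {"type": "string"},
            },
            "required": ["repo", "title"],
        },
        "defer_loading": True,  # discoverable via search, no upfront token cost
    },
    # ...dozens more definitions, all deferred the same way...
]

response = client.beta.messages.create(
    model="claude-opus-4-1",                 # illustrative model name
    max_tokens=1024,
    betas=["advanced-tool-use-2025-11-20"],  # placeholder beta flag; check Anthropic's docs
    tools=[
        # The tool search tool itself; this type string is a placeholder.
        {"type": "tool_search_tool_regex_20251119", "name": "tool_search"},
        *deferred_tools,
    ],
    messages=[
        {"role": "user", "content": "Open a GitHub issue for the Sentry error spike."}
    ],
)
print(response.content)
```

When Claude decides it needs a GitHub tool, it searches the deferred catalog (via regex or BM25 keyword ranking), loads only the matching definitions into context, and then calls them as usual.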
The results are compelling: an 85% reduction in token usage while maintaining access to your full tool library, plus significant accuracy improvements on MCP evaluations, with Opus 4 improving from 49% to 74% with this enabled.
Why are we excited about this at Arcade?
While this capability represents a significant leap forward, it also introduces critical infrastructure challenges that organizations must address when running and scaling agents in production. As agents gain access to any number of tools, enterprises must ensure that agents can connect to those tools securely, that the tools are optimized for agents, and that governance and control hold up at scale. That’s where Arcade’s MCP runtime can help.
1. Secure Agent Authorization
Agent authorization is one of the hardest challenges to solve and is why most AI projects never go beyond a single-user demo. Arcade ensures agents can take actions on any system with controlled, user-specific permissions. It integrates with your existing OAuth, IdP, and user access flows, so you get granular controls for your agents out of the box.
2. Agent-Optimized Tools
Most MCP servers and tools just wrap existing APIs, which leads to poor accuracy and disgruntled users. You can give Claude access to a thousand tools, but if they're poorly built, it doesn't matter: bad tool definitions lead to bad tool selection. Arcade provides the largest catalog of agent-optimized MCP tools out of the box. Our tools outperform because we've done the hard work of making them actually work, not just wrapping APIs, but building tools specifically designed to handle agent intent with better reliability and lower cost.
3. Governance at Scale
More tool access unlocks more use cases, which means more agents and more teams deploying them across your organization. This agent and MCP sprawl makes it hard to know whether teams are rebuilding existing servers or breaking workflows as they push upgrades. The Arcade MCP runtime centralizes control and governance of all your MCP tools, improves discovery and access of these tools across teams, and enables safe testing and versioning. It also provides visibility into what every agent accesses on behalf of each user across each service, ultimately accelerating trusted production deployments across the board.
Tool search limitations to consider
It’s important to call out a few limitations of Anthropic's Tool Search Tool.
First, this tool is exclusively available for Claude. This means if you’re using Anthropic for your large model but another vendor for your small models (a pretty common pattern), this feature won’t work across both. This will be particularly painful for teams using coding agents or IDE assistants, where the feature will only work on a subset of the available models.
Second, broad framework support will require time. Currently, implementation requires using Anthropic's SDK directly with special beta headers and flags. This capability is not yet supported in LangChain or other popular frameworks.
Time to start building
Anthropic has helped to eliminate a major constraint on AI agent capabilities.
However, the critical question isn't whether your agent can access a thousand tools; it's whether it should, and whether you can manage that safely and effectively, particularly when agents have access to critical production systems.
That's where Arcade comes into play. As the runtime for MCP, Arcade is the only platform that delivers secure agent authorization, high-accuracy tools, and centralized governance together. We give you the ability to deploy multi-user AI agents that take actions across any system with granular permissions and complete visibility, no complex infrastructure required.
Building production AI agents? Try Arcade’s MCP runtime for free so you can ship faster and scale with control.