What does Anthropic's Tool Search for Claude mean for you?

Alex Salazar
DECEMBER 2, 2025
4 MIN READ
THOUGHT LEADERSHIP

I was recently in Amsterdam meeting with some of the largest enterprises, and they all raised the same challenge: how do you give AI agents access to more tools without everything falling apart?

The issue is that as soon as they hit 20-30 tools, token costs become untenable and selection accuracy plummets. The pain is acute enough that many teams have tried to build their own workarounds with RAG pipelines, only to hit performance walls.

That's why I'm excited about Anthropic's recently announced Tool Search Tool, which represents a major step forward in solving this common challenge for AI agents.

What did Anthropic actually release?

Announced as one of three new beta features, Anthropic's Tool Search Tool allows Claude models to dynamically discover and load tools on demand instead of having every single tool definition added to the context window upfront.

Before, these models had to keep every possible tool definition in working memory at all times. Now they can offload that catalog and search it when needed. It's like the difference between memorizing a dictionary and looking words up. Keeping tools in a "dictionary" reduces the taxing load of holding all those definitions in context while also improving accuracy.

Let's look more closely at the two primary constraints it addresses:

Token bloat: In its announcement, Anthropic gives a concrete example of a five-server setup:

  • GitHub: 35 tools (~26K tokens)
  • Slack: 11 tools (~21K tokens)
  • Sentry: 5 tools (~3K tokens)
  • Grafana: 5 tools (~3K tokens)
  • Splunk: 2 tools (~2K tokens)

That's 58 tools consuming approximately 55K tokens before the conversation even begins. Add more servers like Jira (which alone uses ~17K tokens) and you quickly approach 100K+ tokens of overhead. This token consumption directly impacts both response latency and operational costs.
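
If you want to see how much overhead your own tool set incurs, you can measure it before sending a real request. Here's a minimal sketch using the token-counting endpoint in Anthropic's Python SDK; the tool definition and model name below are hypothetical stand-ins for your own.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical stand-in for one of your real MCP tool definitions.
tools = [
    {
        "name": "github_create_issue",
        "description": "Create an issue in a GitHub repository.",
        "input_schema": {
            "type": "object",
            "properties": {
                "repo": {"type": "string", "description": "Repository as owner/repo."},
                "title": {"type": "string", "description": "Issue title."},
                "body": {"type": "string", "description": "Issue body in Markdown."},
            },
            "required": ["repo", "title"],
        },
    },
    # ...the rest of your tool definitions
]

# Count input tokens for the request without actually running it, so you
# can see how much context the tool definitions alone consume.
count = client.messages.count_tokens(
    model="claude-sonnet-4-5",  # assumption: substitute the model you actually use
    tools=tools,
    messages=[{"role": "user", "content": "Open an issue for the login bug."}],
)
print(count.input_tokens)
```

Running this with and without your tool list makes the per-tool overhead explicit.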

Prior to this release, agents began experiencing reliability issues after approximately 20 tools. To put this in perspective, the GitHub toolkit alone contains 18 tools, and Gmail has 10-13. This created a severe practical constraint: organizations couldn't deploy agents capable of handling multiple systems simultaneously.

Accuracy: Tool selection accuracy was another critical constraint. As the number of tools increased, the model's ability to select the correct tool decreased significantly. This was particularly problematic when tools had similar names or overlapping functionality.

How Anthropic solved this for Claude

The solution is straightforward: mark tools with defer_loading: true. Those tools remain discoverable but don't consume context until Claude actually needs them. Claude searches the deferred tools using either regex matching or keyword ranking (BM25), then loads only the ones it needs.
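
Here's roughly what that looks like with Anthropic's Python SDK. This is a sketch based on the beta announcement: the beta flag and tool-type strings below are the identifiers documented at launch, so treat them as assumptions and confirm them against the current docs before relying on them.

```python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-opus-4-5",  # assumption: use whichever Claude model you run
    max_tokens=1024,
    betas=["advanced-tool-use-2025-11-20"],  # beta flag named in the announcement
    tools=[
        # The search tool itself stays in context so Claude can discover the rest.
        # A BM25 variant is also available for keyword ranking.
        {
            "type": "tool_search_tool_regex_20251119",
            "name": "tool_search_tool_regex",
        },
        # Deferred tools are indexed for search but consume no context tokens
        # until Claude actually loads them.
        {
            "name": "github_create_issue",
            "description": "Create an issue in a GitHub repository.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "repo": {"type": "string"},
                    "title": {"type": "string"},
                },
                "required": ["repo", "title"],
            },
            "defer_loading": True,
        },
        # ...dozens more deferred tool definitions
    ],
    messages=[{"role": "user", "content": "File a GitHub issue for the login bug."}],
)
print(response.content)
```

In this flow, Claude calls the search tool first, and only the matching deferred definitions get expanded into context before the actual tool call happens.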

The results are compelling: an 85% reduction in token usage while maintaining access to your full tool library, plus significant accuracy improvements on MCP evaluations, with Opus 4 improving from 49% to 74% once tool search is enabled.

Why are we excited about this at Arcade?

While this capability represents a significant leap forward, it also introduces critical infrastructure challenges that organizations must address when running and scaling agents in production. Now that agents can access any number of tools, enterprises must ensure those connections are secure, the tools are optimized for agents, and governance and control hold up at scale. That's where Arcade's MCP runtime can help.

1. Secure Agent Authorization

Agent authorization is one of the hardest challenges to solve, and it's why most AI projects never go beyond a single-user demo. Arcade ensures agents can take actions on any system with controlled, user-specific permissions. It integrates with existing OAuth, IdP, and user access flows, so you get granular controls for your agents out of the box.

2. Agent-Optimized Tools

Most MCP servers and tools just wrap existing APIs, which leads to poor accuracy and disgruntled users. You can give Claude access to a thousand tools, but if they're poorly built, it doesn't matter: bad tool definitions lead to bad tool selection. Arcade provides the largest catalog of agent-optimized MCP tools out of the box. They outperform because we've done the hard work of making them work for agents, not just wrapping APIs but designing tools around agent intent for better reliability and lower costs.
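
To make that concrete, here's a hypothetical before-and-after of the same capability. Both definitions are illustrative (neither is from Arcade's catalog); the point is how much the name, description, and parameter constraints influence tool selection.

```python
# A thin API wrapper: vague name, no guidance, raw pass-through parameters.
# The model has little to go on when deciding whether or how to call this.
bad_tool = {
    "name": "post_v2",
    "description": "Calls the v2 endpoint.",
    "input_schema": {
        "type": "object",
        "properties": {"payload": {"type": "string"}},
    },
}

# An agent-optimized definition: intent-based name, a description that says
# when to use it (and when not to), and constrained parameters the model
# can fill reliably.
good_tool = {
    "name": "slack_send_message",
    "description": (
        "Send a message to a Slack channel. Use this when the user asks to "
        "notify a team or post an update. Do not use for direct messages."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "channel": {
                "type": "string",
                "description": "Channel name without '#', e.g. 'eng-alerts'.",
            },
            "text": {"type": "string", "description": "Message body, plain text."},
        },
        "required": ["channel", "text"],
    },
}
```

The second definition tells the model when to reach for the tool and how to fill its parameters, which matters most precisely when many tools have similar names or overlapping functionality.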

3. Governance at Scale

More tool access unlocks more use cases, which means more agents and more teams deploying them across your organization. This agent and MCP sprawl makes it hard to know whether teams are rebuilding existing servers or breaking workflows as they push upgrades. The Arcade MCP runtime centralizes the control and governance of all your MCP tools, improves their discovery and access across teams, and enables safe testing and versioning. It also gives you visibility into what every agent accesses on behalf of each user across each service, ultimately accelerating trusted production deployments across the board.

Tool search limitations to consider

It’s important to call out some limitations of Anthropic's Tool Search Tool.

First, this tool is exclusively available for Claude. If you're using Anthropic for your large model but another vendor for your small models (a pretty common pattern), this feature won't work across both. It will also be particularly painful for teams using coding agents or IDE assistants, where the feature works on only a subset of the available models.

Second, broad framework support will require time. Currently, implementation requires using Anthropic's SDK directly with special beta headers and flags. This capability is not yet supported in LangChain or other popular frameworks.

Time to start building

Anthropic has helped to eliminate a major constraint on AI agent capabilities.

However, the critical question isn't whether your agent can access a thousand tools; it's whether it should, and whether you can manage that safely and effectively, particularly when agents have access to critical production systems.

That's where Arcade comes into play. As the runtime for MCP, Arcade is the only platform that delivers secure agent authorization, high-accuracy tools, and centralized governance together. We give you the ability to deploy multi-user AI agents that take actions across any system with granular permissions and complete visibility, no complex infrastructure required.


Building production AI agents? Try Arcade’s MCP runtime for free so you can ship faster and scale with control.
