TL;DR: Anthropic's new Tool Search is a step in the right direction, but if you're running 4,000+ tools across multiple services, it might not be ready for prime time.
The promise
Anthropic's Tool Search promises to let Claude "access thousands of tools without consuming its context window." Music to our ears. At Arcade, we maintain thousands of agent-optimized tools across Gmail, Slack, GitHub, HubSpot, Salesforce, and dozens more platforms. If anyone was going to stress-test this feature, it was us.
So we did! Source code and full results →
The setup
We loaded 4,027 tools into Anthropic's beta and ran 25 straightforward tasks. The kind of requests your agent should nail 100% of the time on smaller tool sets:
- "Send an email to my colleague about the project update."
- "Post a message to the #general channel in Slack."
- "Schedule a meeting for tomorrow at 2pm."
Nothing tricky. No ambiguous edge cases. Just everyday agentic workflows.
We tested both of Anthropic's built-in search modes:
# Regex-based search
search_tool = [{"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"}]
# BM25-based search
search_tool = [{"type": "tool_search_tool_bm25_20251119", "name": "tool_search_tool_bm25"}]
Then we checked: did the correct tool even appear in the top-K results?
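In a request, a search tool entry like the ones above sits alongside the rest of the tool catalog. Here is a rough sketch of how that list could be assembled. It assumes the `defer_loading` flag described in Anthropic's beta documentation; the `catalog` entries, their placeholder schemas, and the `build_tools` helper are hypothetical and are not taken from the benchmark code.

```python
from copy import deepcopy

# BM25 search tool entry, as shown above.
search_tool = [{"type": "tool_search_tool_bm25_20251119", "name": "tool_search_tool_bm25"}]

# Hypothetical two-tool catalog; descriptions and schemas are placeholders.
catalog = [
    {
        "name": "Gmail_SendEmail",
        "description": "Send an email through Gmail.",
        "input_schema": {"type": "object", "properties": {}},
    },
    {
        "name": "Slack_SendMessage",
        "description": "Post a message to a Slack channel.",
        "input_schema": {"type": "object", "properties": {}},
    },
]


def build_tools(search_tool, catalog):
    """Return the search tool followed by every catalog tool marked as deferred.

    Assumption: deferred tools carry `"defer_loading": True` so they stay out
    of the context window until the search tool surfaces them.
    """
    deferred = [{**deepcopy(tool), "defer_loading": True} for tool in catalog]
    return list(search_tool) + deferred


tools = build_tools(search_tool, catalog)
```

The helper copies each definition instead of mutating it, so the same catalog can be reused for both the regex and the BM25 runs.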
The results
To keep this as fair as possible, we measured only retrieval success: whether the right tool showed up in the search results. We didn't test whether Claude would select that tool or fill in the parameters correctly.
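As an illustration of that metric, here is a minimal sketch of a retrieval hit-rate calculation. The `retrieval_hit_rate` function, the default `k=5`, the stand-in `fake_search`, and the `Placeholder_*` tool names are all assumptions made for the example; the real harness is in the linked source code.

```python
from typing import Callable, Sequence


def retrieval_hit_rate(
    cases: Sequence[tuple[str, str]],
    search: Callable[[str], list[str]],
    k: int = 5,
) -> float:
    """Fraction of cases whose expected tool name appears in the top-k results."""
    if not cases:
        raise ValueError("at least one test case is required")
    hits = 0
    for prompt, expected_tool in cases:
        top_k = search(prompt)[:k]  # keep only the k highest-ranked names
        if expected_tool in top_k:
            hits += 1
    return hits / len(cases)


# Two (prompt, expected tool) pairs.
cases = [
    ("Schedule a meeting for tomorrow at 2pm.", "GoogleCalendar_CreateEvent"),
    ("Send an email to my colleague about the project update.", "Gmail_SendEmail"),
]

# Canned results standing in for a real tool search call.
canned_results = {
    cases[0][0]: ["GoogleCalendar_CreateEvent", "Placeholder_ToolA"],
    cases[1][0]: ["Placeholder_ToolB", "Placeholder_ToolC"],
}


def fake_search(prompt: str) -> list[str]:
    """Hypothetical retriever that returns pre-recorded tool names."""
    return canned_results.get(prompt, [])


rate = retrieval_hit_rate(cases, fake_search)
```

Because only membership in the top-k list is checked, the number says nothing about whether the model would go on to pick the tool or fill in its parameters.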
Where it worked and where it struggled
Tool search handled some requests flawlessly:
- ✅ GoogleCalendar_CreateEvent
- ✅ GoogleDocs_CreateBlankDocument
- ✅ Github_CreateIssue
- ✅ Spotify_PlayTrackByName
- ✅ Salesforce_CreateContact
- ✅ MicrosoftTeams_SendMessageToChannel
However, it struggled to retrieve some of the most common tools:
- ❌ Gmail_SendEmail - Couldn't find "send email" in a Gmail prompt
- ❌ Slack_SendMessage - Missed "post a message to Slack"
- ❌ Zendesk_CreateTicket - Ticket creation? Never heard of it
- ❌ ClickUp_CreateTask - Task creation tools exist. Just not in the results.
- ❌ Youtube_SearchVideos - Returned Youtube_SearchForVideos instead. Close, but no cigar.
When "send an email" can't find Gmail_SendEmail, there's still work to do.
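The Youtube case shows how strict exact-name scoring treats a near-duplicate as a failure. When reviewing results, one possible way to separate near misses from outright misses is a fuzzy name comparison. The sketch below uses Python's standard `difflib`; the `classify_result` helper and the `0.8` cutoff are assumptions for illustration, not part of how the benchmark was scored.

```python
import difflib


def classify_result(expected: str, results: list[str], cutoff: float = 0.8) -> str:
    """Label a search result list as 'hit', 'near_miss', or 'miss'.

    'hit' means the expected name is present verbatim; 'near_miss' means a
    returned name is at least `cutoff` similar to the expected one.
    """
    if expected in results:
        return "hit"
    close = difflib.get_close_matches(expected, results, n=1, cutoff=cutoff)
    return "near_miss" if close else "miss"


label = classify_result(
    "Youtube_SearchVideos",
    ["Youtube_SearchForVideos", "Gmail_SendEmail"],
)
```

A near miss still counts against retrieval accuracy under exact matching, but flagging it separately makes it easier to tell a duplicate-tool problem from a search problem.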
What this means
This is certainly a move in the right direction. The architecture is sound: defer loading tools into the model’s context window to sidestep the long-standing context-bloat problem, and instead discover them just-in-time, keeping interactions with a model lightweight. And especially important to enterprises: the token savings are real.
But ~60% retrieval accuracy isn't ready for prime time when you're building agents that need to reliably take real-world actions. Enterprises need to be able to trust the results of their agents. And having roughly four in ten tool searches fail before you even get to selection and parameterization doesn’t instill that trust.
We believe that Anthropic has identified a real problem, and we’re happy to see progress made in this space. Arcade is committed to delivering the MCP runtime and agent-optimized tools that help enterprises deploy agents that can take actions reliably for any model and for any number of tools. While our customers have already been able to improve the reliability of their production agents through Arcade, stay tuned for some exciting updates that will continue to push the boundaries of what’s possible.
Ready to build? Get started with Arcade →



