The Hidden Cost of Connecting AI to Your Tools
I've been building AI automations for about a year now. Workflows that pull data from one system, process it, and push it somewhere else — the kind of thing that used to take a junior analyst half a day and now takes 30 seconds.
But here's something I didn't fully appreciate until I read Anthropic's latest engineering post: the way most AI agents connect to tools is wildly inefficient. Not "could be slightly better" inefficient. 98.7% wasted tokens inefficient.
Let me break down what's happening and why it matters for your costs.
The Problem: Your Agent Is Loading Everything
When you connect an AI agent to external tools — Google Drive, Salesforce, Slack, your ERP — the standard approach is to load all tool definitions into the model's context window upfront.
Think of it this way: imagine you walk into a library to look up one fact. Instead of going to the relevant shelf, the librarian dumps every book in the building onto your desk and says "it's in there somewhere."
That's what's happening with most AI agent setups. Connected to 50 tools across a few integrations? That's 150,000+ tokens consumed before the model even reads your actual request. You're paying for the model to process tool descriptions it will never use.
The Second Problem Is Worse
Even after the agent figures out which tools to use, there's another cost multiplier most people don't think about.
Say you ask your AI agent: "Download the meeting transcript from Google Drive and attach it to the Salesforce lead."
Simple enough. Two tool calls. But here's what actually happens:
- The agent calls Google Drive and gets the transcript — full text flows into the context window
- The agent needs to call Salesforce with that transcript — so it writes the entire transcript back out
A two-hour meeting transcript might be 25,000 words. That's roughly 50,000 tokens passing through the model just to copy a file from one system to another. The model isn't doing anything useful with the content. It's just the middleman moving bytes around.
For a finance team running this on 20 meetings a month, you're looking at a million unnecessary tokens just on the pass-through. At current API pricing, that adds up fast.
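The back-of-the-envelope math above can be checked in a few lines. The token counts follow the transcript example; the price per million tokens is an illustrative assumption, not any provider's actual rate:

```python
# Pass-through cost of copying transcripts through the model's context.
# Token counts are from the example above; the price is an assumed
# illustrative blended rate, not a real quote.
tokens_per_transcript = 50_000      # ~25,000 words read in, then written back out
meetings_per_month = 20

wasted_tokens = tokens_per_transcript * meetings_per_month
print(wasted_tokens)                # 1,000,000 tokens of pure pass-through

assumed_price_per_million = 10.0    # assumed blended input/output rate, USD
monthly_cost = wasted_tokens / 1_000_000 * assumed_price_per_million
print(f"${monthly_cost:.2f}/month on this one workflow")
```

And that's a single workflow; the same pass-through tax applies to every multi-system automation your team runs.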
The Fix: Code Execution
Anthropic's engineering team published an approach that cuts this dramatically. Instead of giving the model direct access to every tool, you present the tools as code files on a filesystem.
The agent then:
- Browses the filesystem to find relevant tools (reads only what it needs)
- Writes code to move data directly between systems
- Executes the code in a sandbox — data flows tool-to-tool without passing through the model
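The "tools as files" idea in the first step can be sketched in a few lines. Assume each integration gets a directory and each tool a wrapper file; the layout and filenames here are illustrative, not Anthropic's actual scheme:

```python
# Minimal sketch: tool definitions live on disk, one wrapper file per tool,
# one directory per integration. The agent lists paths cheaply and reads a
# definition only when it decides it needs that tool.
import os

TOOLS_DIR = "tools"

def list_tools(root: str = TOOLS_DIR) -> list[str]:
    """Enumerate tool files by path only; no definitions enter the context yet."""
    paths = []
    for integration in sorted(os.listdir(root)):
        for tool_file in sorted(os.listdir(os.path.join(root, integration))):
            paths.append(f"{integration}/{tool_file}")
    return paths

def read_tool(path: str, root: str = TOOLS_DIR) -> str:
    """Load a single tool's definition on demand."""
    with open(os.path.join(root, path)) as f:
        return f.read()
```

The baseline cost of 50 connected tools drops from 50 full definitions in context to one short list of file paths, plus only the definitions the agent actually opens.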
The result on their benchmark: 150,000 tokens dropped to 2,000 tokens. Same outcome. 98.7% reduction.
The meeting transcript example becomes trivially efficient: the agent writes a few lines of code that read from Google Drive and write to Salesforce. The transcript content never enters the context window at all.
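Those "few lines of code" might look like the sketch below. The two wrapper functions are stand-ins for MCP tool bindings; the real names and signatures depend on your integrations, and the IDs are placeholders:

```python
# Hypothetical sketch of the code an agent writes in its sandbox. The
# wrappers are stand-ins for Google Drive and Salesforce tool bindings,
# stubbed here so the example runs on its own.

def gdrive_get_document(document_id: str) -> str:
    """Stand-in for a Google Drive tool binding; in practice, an API call."""
    return "[full transcript text]"

def salesforce_attach_file(lead_id: str, filename: str, content: str) -> str:
    """Stand-in for a Salesforce tool binding; in practice, an API call."""
    return f"attached {filename} ({len(content)} chars) to lead {lead_id}"

# The agent's generated code: the transcript moves system-to-system
# inside the sandbox and never enters the model's context.
transcript = gdrive_get_document(document_id="doc-123")       # placeholder ID
result = salesforce_attach_file(
    lead_id="lead-456",                                       # placeholder ID
    filename="meeting-transcript.txt",
    content=transcript,
)
print(result)  # only this short confirmation returns to the model
```

The model sees the one-line confirmation, not the 25,000-word transcript.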
Why This Matters Even If You're Not a Developer
I know what you're thinking: "I'm a finance manager, not a developer. Why do I care about token efficiency?"
Three reasons:
Cost. If your team is using AI workflows that touch multiple systems — pulling data from your ERP, enriching it, pushing it to a reporting tool — you're paying per token. A 98% reduction in token usage is a 98% reduction in API costs for those workflows. When your CFO asks "what does this AI stuff cost us?", the architecture is what determines whether the answer is "actually pretty cheap" or "more than the analyst it replaced."
Speed. Fewer tokens mean faster responses. If your month-end close automation takes 45 seconds per report because it's loading 150,000 tokens of tool definitions, this approach could cut that to under 2 seconds. Multiply by 50 reports and you're saving real time.
Reliability. This is the one people underestimate. When models have to copy large data payloads between tool calls, they sometimes make mistakes — dropped characters, truncated content, formatting errors. When data flows directly between systems via code, the model never touches it. Fewer tokens through the model = fewer opportunities for error.
What You Should Do About It
If you're building AI automations today, or evaluating vendors who are:
Ask about architecture. "How do your AI workflows connect to our tools?" is a better question than "which model do you use?" The model matters less than how efficiently it's wired up.
Watch your token usage. Most API dashboards show token consumption per request. If you're seeing requests that consume 100,000+ tokens for simple operations, there's probably a pass-through problem.
Prefer tools that use MCP. The Model Context Protocol is becoming the standard for AI-to-tool connections. Tools built on MCP will be the first to benefit from efficiency improvements like code execution. It's the difference between building on a standard and building on a custom integration that nobody maintains.
Don't over-connect. Every tool you connect to your AI agent adds to the baseline cost. Connect what you actually use. You can always add more later.
The Bigger Picture
We're at an inflection point in AI automation. The first wave was "can AI do this task?" The answer is mostly yes. The second wave — the one we're in now — is "can AI do this task efficiently enough to be worth it at scale?"
Anthropic's code execution approach is one answer, but the principle applies broadly: the most valuable AI implementations aren't the ones with the fanciest models. They're the ones with the smartest architecture.
Build efficient. Scale confident.
At Skillpress, we build AI skills that connect to your existing tools without the token bloat. Our skills are designed for practitioners — finance analysts, ops managers, HR coordinators — who need AI that works reliably at production scale. Explore our skills library.