{"id":15030,"date":"2026-05-27T17:59:56","date_gmt":"2026-05-27T17:59:56","guid":{"rendered":"https:\/\/temperies.com\/?p=15030"},"modified":"2026-05-27T17:59:56","modified_gmt":"2026-05-27T17:59:56","slug":"the-hidden-trap-of-mcps","status":"publish","type":"post","link":"https:\/\/temperies.com\/es\/2026\/05\/27\/the-hidden-trap-of-mcps\/","title":{"rendered":"The Hidden Trap of MCPs"},"content":{"rendered":"<h1><strong>The Hidden Trap of MCPs: Federated vs. Indexed Architectures in AI Agents<\/strong><\/h1>\n\n\n\n<p>The standardization of the Model Context Protocol (MCP) has democratized how we connect our Large Language Models (LLMs) to the real world. With just a few lines of code, an agent can read repositories, query databases, or search Google Drive. It would seem that the integration problem is solved.<\/p>\n\n\n\n<p>However, as engineering teams move from proof of concept to production with multi-agent architectures, they hit an unforgiving wall: <strong>cost scalability and latency<\/strong>.<\/p>\n\n\n\n<p>The debate is no longer <em>how<\/em> to connect the information, but <em>when and where<\/em> to structure it. In this article, we&#8217;ll break down the difference between federated and indexed access, the real-world lifecycle of tokens in a tool-using flow, and the architectures needed to keep your API bill from spiraling out of control.<\/p>\n\n\n\n<h2><strong>The Problem: The &#8220;Fixed Tax&#8221; of Schemas<\/strong><\/h2>\n\n\n\n<p>To understand the cost, we first need to debunk a common myth: when you connect an MCP server to your agent, you&#8217;re not sending credentials or configurations to the LLM. You&#8217;re sending <strong>the tool schema<\/strong>.<\/p>\n\n\n\n<p>Every time the agent needs to think, the orchestrator injects a JSON instruction manual into the <em>system prompt<\/em> (or the API&#8217;s equivalent tool-definition field, which is billed as input tokens all the same), describing what each tool does and which parameters it accepts. 
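<\/p>\n\n\n\n<p>To make this tax concrete, here is a minimal Python sketch. It assumes 20 hypothetical tools, each with a small made-up JSON schema, and approximates token counts with a rough 4-characters-per-token heuristic instead of a real tokenizer. Real MCP schemas are usually longer, and exact counts depend on the model and on how the provider renders tool definitions. The names <code>TOOLS<\/code>, <code>approx_tokens<\/code>, <code>schema_tax<\/code> and <code>input_tokens_per_call<\/code> are illustrative only.<\/p>

```python
import json

# Hypothetical MCP-style tool schemas: names, descriptions and fields are
# illustrative assumptions, not taken from any real MCP server.
TOOLS = [
    {
        'name': f'tool_{i}',
        'description': 'Searches an internal system and returns matching records.',
        'input_schema': {
            'type': 'object',
            'properties': {'query': {'type': 'string'}},
            'required': ['query'],
        },
    }
    for i in range(20)
]


def approx_tokens(text):
    # Rough heuristic (assumption): about 4 characters per token.
    # A real count depends on the model tokenizer.
    return len(text) // 4


def schema_tax(tools):
    # Tokens spent serializing every tool schema into a single request.
    return sum(approx_tokens(json.dumps(tool)) for tool in tools)


def input_tokens_per_call(tools, history):
    # Each model call re-sends ALL schemas plus the conversation so far,
    # whether or not the model ends up calling any tool.
    return schema_tax(tools) + approx_tokens(history)


if __name__ == '__main__':
    tax = schema_tax(TOOLS)
    print('schema tokens per call:', tax)
    print('schema tokens over a 3-call ReAct loop:', 3 * tax)
```

\n\n\n\n<p>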
If your agent has access to 20 tools (searching CVs, cross-referencing timesheets, reading compliance policies, etc.), you&#8217;re sending thousands of &#8220;instruction&#8221; tokens <em>with every call<\/em> to the model, whether or not the agent ends up using any of those tools.<\/p>\n\n\n\n<p>We call this the <strong>token tax<\/strong>.<\/p>\n\n\n\n<h2><strong>Two Architectural Paths: Federated vs. Indexed<\/strong><\/h2>\n\n\n\n<p>How do we give the agent the company&#8217;s context? There are two diametrically opposed approaches.<\/p>\n\n\n\n<h3><strong>1. The Federated Approach (Raw Real-Time Search)<\/strong><\/h3>\n\n\n\n<p>In this model, the agent has an MCP adapter for each platform (Slack, Drive, Jira). When the user asks a question, the agent &#8220;translates&#8221; the query into calls to the individual APIs, collects the raw results, reads them all, discards the garbage, and synthesizes an answer.<\/p>\n\n\n\n<ul><li><strong>The pain:<\/strong> The agent does a librarian&#8217;s work by brute force. If the API returns noise, the LLM spends tokens processing that noise.<\/li><li><strong>The cost:<\/strong> Recent evaluations (such as those run by Glean using Claude) have shown that even when the federated approach produces an excellent answer, it consumes almost <strong>twice as many tokens<\/strong> (~83k vs. ~43k), because it requires iterations, error loops, and processing multiple irrelevant documents returned by basic search engines.<\/li><\/ul>\n\n\n\n<h3><strong>2. The Indexed Approach (Centralized Context Layer)<\/strong><\/h3>\n\n\n\n<p>Here, an external system (such as an advanced RAG engine or a knowledge graph) has already indexed, vectorized, and linked the data from Drive, Slack, and Jira.<\/p>\n\n\n\n<ul><li><strong>The advantage:<\/strong> The agent has <strong>a single master MCP<\/strong> connected. It makes one call, and the search engine returns a semantically clean, ranked block of context. 
One archivist, one catalog.<\/li><\/ul>\n\n\n\n<h2><strong>Anatomy of a Query: The Complete Flow (and Its Cost)<\/strong><\/h2>\n\n\n\n<p>To see why the &#8220;out-of-the-box&#8221; federated approach is dangerous, let&#8217;s walk through the standard ReAct (<em>Reasoning and Acting<\/em>) flow of an agent facing a complex query, for example: <em>&#8220;Review this month&#8217;s hours report and compare the tasks with the seniority level in the employee&#8217;s CV.&#8221;<\/em><\/p>\n\n\n\n<ol><li><strong>Initial Context Load (Input Tokens &#8211; High):<\/strong> The orchestrator sends the LLM your prompt, the system rules (&#8220;you are a strict auditor&#8221;), and <strong>the JSON schemas of all available MCP tools<\/strong>.<\/li><li><strong>The Decision (Output Tokens &#8211; Low):<\/strong> The LLM reasons and decides that it needs to search Drive. It responds with a JSON tool call: <code>{\"tool_call\": \"buscar_drive\", \"args\": {\"query\": \"employee CV\"}}<\/code>.<\/li><li><strong>Execution (Latency, 0 Tokens):<\/strong> The orchestrator pauses the LLM, calls the MCP server, runs the search, and retrieves text fragments.<\/li><li><strong>The Synthesis (Input Tokens &#8211; Critical):<\/strong> A <em>second<\/em> call is made to the LLM, sending all the context from Step 1 + the decision history + <strong>the extracted documents<\/strong>. The context window grows dramatically.<\/li><li><strong>Iteration (The Snowball):<\/strong> If the LLM notices that it has retrieved the CV but is still missing the hours report, it decides to search again. Steps 3 and 4 are repeated. 
On the third call to the LLM, you pay again for the model to re-read the MCP schemas and the CV it had already retrieved.<\/li><li><strong>Final Response (Output Tokens &#8211; Medium):<\/strong> The model finally drafts the audit analysis.<\/li><\/ol>\n\n\n\n<h2><strong>Architectural Evolution: Saving the Federated Model<\/strong><\/h2>\n\n\n\n<p>If you can&#8217;t build a global indexed layer and must use federated MCPs, injecting dozens of tools into a single monolithic agent is a recipe for failure. The solution lies in segmenting the intelligence.<\/p>\n\n\n\n<h3><strong>Improvement 1: The Supervisor Pattern and Specialized Nodes<\/strong><\/h3>\n\n\n\n<p>Using frameworks like LangGraph, we abandon the idea of a &#8220;know-it-all&#8221; agent and build a graph of small, specialist agents. The key move: <strong>each specialist agent has only the tools it needs &#8220;bound&#8221; (connected) to it.<\/strong><\/p>\n\n\n\n<p>A conceptual example for a corporate audit system:<br><br><img fetchpriority=\"high\" width=\"2816\" height=\"1536\" class=\"wp-image-15031\" style=\"width: 1200px;\" src=\"https:\/\/temperies.com\/wp-content\/uploads\/2026\/05\/MCPs.png\" alt=\"\" srcset=\"https:\/\/temperies.com\/wp-content\/uploads\/2026\/05\/MCPs.png 2816w, https:\/\/temperies.com\/wp-content\/uploads\/2026\/05\/MCPs-768x419.png 768w, https:\/\/temperies.com\/wp-content\/uploads\/2026\/05\/MCPs-1536x838.png 1536w, https:\/\/temperies.com\/wp-content\/uploads\/2026\/05\/MCPs-2048x1117.png 2048w, https:\/\/temperies.com\/wp-content\/uploads\/2026\/05\/MCPs-18x10.png 18w\" sizes=\"(max-width: 2816px) 100vw, 2816px\" \/><\/p>\n\n\n\n<p>When the query about cross-referencing timesheets with CVs arrives, the Supervisor routes the flow to <code>auditor_personal<\/code>. This node does its job with only <em>its own<\/em> tool schemas in its prompt, saving thousands of tokens by ignoring the compliance, billing, and infrastructure tools that exist 
within the company.<\/p>\n\n\n\n<h3><strong>Improvement 2: Tool RAG (Dynamic Tool Retrieval)<\/strong><\/h3>\n\n\n\n<p>For massive ecosystems (e.g., 50+ MCP tools), pre-assigning tools to specialist agents can become cumbersome. The solution is &#8220;Tool RAG&#8221;: instead of indexing documents, <strong>we vectorize the tool descriptions<\/strong>.<\/p>\n\n\n\n<ol><li>The user prompt arrives.<\/li><li>A fast embedding model compares the prompt against the indexed tool descriptions.<\/li><li>Only the 3 most relevant tools are retrieved (e.g., <code>mcp_jira<\/code>, <code>mcp_gitlab<\/code>).<\/li><li>The main agent is initialized on the fly with <em>only<\/em> those 3 schemas injected for that specific conversation.<\/li><\/ol>\n\n\n\n<h2><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>How your AI accesses your data is now a financial decision, not just a technical one.<\/p>\n\n\n\n<p>Protocols like MCP are brilliant connectors, but relying solely on flat federated architectures leads to prohibitive bills and unacceptable latency, driven by ReAct cycles and massive context injection.<\/p>\n\n\n\n<p>The architecture of the future (and of the scalable present) is hybrid: <strong>use static federated MCPs to execute actions<\/strong> (creating tickets, approving pull requests, modifying infrastructure), but rely on <strong>indexed context layers or dynamic routers<\/strong> for knowledge retrieval. Architecture will always beat brute force.<\/p>","protected":false},"excerpt":{"rendered":"<p>The Hidden Trap of MCPs: Federated vs. Indexed Architectures in AI Agents The standardization of the Model Context Protocol (MCP) has democratized how we connect our Large Language Models (LLMs) to the real world. With just a few lines of code, an agent can read repositories, query databases, or search Google Drive. 
It would seem&hellip;<\/p>","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[54],"tags":[55],"_links":{"self":[{"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/posts\/15030"}],"collection":[{"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/comments?post=15030"}],"version-history":[{"count":1,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/posts\/15030\/revisions"}],"predecessor-version":[{"id":15032,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/posts\/15030\/revisions\/15032"}],"wp:attachment":[{"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/media?parent=15030"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/categories?post=15030"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/tags?post=15030"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}