Moving From Tool-Calling to Code Execution
In the past year, AI systems have started behaving less like static models and more like dynamic collaborators, connecting to tools, APIs, databases, and cloud systems on our behalf. Yet as this ecosystem grows, we’re hitting an increasingly familiar ceiling: context overload.
Large models can plan, reason, and synthesise beautifully, but when they must ingest, for example:
- 400 tool schemas,
- 80 pages of argument definitions, and
- a 5 MB JSON payload from a database query,
they become slow, fragile, and error-prone. In computational biology and genomics, where workflows routinely involve thousands of rows of gene-disease hits, expression matrices, variant tables, and other data types, the problem becomes even more severe.
Recently, the Model Context Protocol (MCP) standardised how agents interact with external systems. But the real breakthrough comes from a different pattern:
Don’t let the model call tools directly.
Let the model write Python code that calls those tools.
This small inversion has major implications in every domain where datasets are large, schemas are detailed, and workflows are multi-step.
The Context Bottleneck in Multi-Step Biological Queries
To ground this in something familiar, consider a typical (and here deliberately simplified) bioinformatics workflow:
Start from a gene → look up its disease associations → examine its expression in the tissue most relevant to that disease.
This is a mock but representative multi-step reasoning task that appears constantly in genomics, and it exposes a fundamental limitation of traditional direct tool-calling.
In a direct tool-calling setup, the model is forced to absorb far more detail than it needs. A GWAS Catalogue query might return hundreds or thousands of association rows, each with effect sizes, p-values, disease ontology annotations, sample sizes, and study metadata. An Expression Atlas or GTEx lookup might return an entire expression vector across dozens of tissues. All of this flows through the model’s context, even though the model only needs a tiny fraction of the information, typically just the top hit and the relevant tissue.
This is where context overloading becomes a real bottleneck. The model must interpret large, structured biological datasets in its own context window, carry them between tool calls, and reason over them token-by-token. It’s slow, expensive, error-prone, and ultimately unnecessary.
The alternative is simpler and far more scalable:
The model shouldn’t manipulate the data directly; it should describe the workflow by writing the code that performs it.
In this pattern, the model would outline the logic (e.g. in Python), and the execution environment would carry it out: retrieving the data, processing it, and returning only the distilled result. Intent stays with the model; computation moves into code.
A Lightweight Bioinformatics Example: GWAS → Tissue → Gene Expression
Here’s a compact example of how such a workflow could look in practice.
Scenario
For a given gene (e.g., BRCA1),
- fetch all disease associations from the GWAS Catalogue,
- identify the top associated trait,
- infer the tissue most relevant to that trait,
- and then query Expression Atlas (or GTEx) for the gene’s expression in that tissue.
We’ll keep the MCP tool wrappers simple and assume two hypothetical MCP servers:
- gwas_catalog.search_associations(gene: str)
- expression_atlas.get_expression(gene: str, tissue: str)
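For concreteness, here is one hypothetical way those wrapper modules could look. Everything below (the module paths, the dataclass-style input/output models, and field names like rows and tpm) is an assumption chosen to match the example script later in this post, not a real API:

```python
# servers/gwas_catalog.py -- hypothetical MCP wrapper (illustrative only)
from dataclasses import dataclass, field


@dataclass
class SearchAssociationsInput:
    gene: str


@dataclass
class SearchAssociationsOutput:
    # One dict per association, e.g.
    # {"trait": ..., "pvalue": ..., "mapped_tissue": ..., ...}
    rows: list[dict] = field(default_factory=list)


async def search_associations(params: SearchAssociationsInput) -> SearchAssociationsOutput:
    """Forward the query to the GWAS Catalogue MCP server (plumbing omitted)."""
    ...


# servers/expression_atlas.py -- hypothetical MCP wrapper (illustrative only)
@dataclass
class GetExpressionInput:
    gene: str
    tissue: str


@dataclass
class GetExpressionOutput:
    tpm: float  # median expression (TPM) in the requested tissue


async def get_expression(params: GetExpressionInput) -> GetExpressionOutput:
    """Forward the query to the Expression Atlas MCP server (plumbing omitted)."""
    ...
```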
Below we contrast:
- the inefficient direct-tool-calling pattern, and
- the efficient Python code-execution pattern.
1. The Inefficient Way: Direct Tool Calling
What the model has to do:
- Load both tool schemas into context
- Receive the entire GWAS table
- Extract the top association through natural-language reasoning inside the model
- Call the gene-expression tool with manually constructed arguments
- Receive the entire expression matrix
This is what the direct tool-calling flow looks like (conceptually):
```
LLM → tool.call(gwas_catalog.search_associations)
    ← returns a 2000-row GWAS table into model context
LLM filters / reads / infers the top tissue inside its context
LLM → tool.call(expression_atlas.get_expression)
    ← returns the entire expression dataset into context
```
Pain Points:
- Two large biological datasets pass through the model.
- Schemas for both tools reside in the model.
- Filtering and logic happen in the model.
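To put a rough number on the first pain point: the row contents below are invented, and the four-characters-per-token rule is only a heuristic, but the order of magnitude is the message.

```python
import json

# One invented GWAS association row, repeated 2000 times.
row = {
    "trait": "breast carcinoma",
    "pvalue": 2e-12,
    "mapped_tissue": "mammary tissue",
    "study": "GCST000001",
    "sample_size": 120000,
}
payload = json.dumps([row] * 2000)

# ~4 characters per token is a common rule of thumb for English/JSON text.
print(f"{len(payload):,} characters ≈ {len(payload) // 4:,} tokens")
# -> tens of thousands of tokens for a single intermediate tool result
```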
2. The Efficient Way: Code Execution + MCP
Here, the model writes a short Python script and sends it to the execution environment.
All heavy operations happen outside the model.
Python script generated by the model:
```python
# scripts/gene_gwas_expression.py
import asyncio

from servers.gwas_catalog import search_associations, SearchAssociationsInput
from servers.expression_atlas import get_expression, GetExpressionInput


async def main():
    gene = "BRCA1"

    # 1. Fetch associations from the GWAS Catalogue
    associations = await search_associations(
        SearchAssociationsInput(gene=gene)
    )
    rows = associations.rows  # list[dict], stays OUTSIDE the model context
    if not rows:
        print(f"No GWAS hits found for {gene}.")
        return

    # 2. Sort by p-value and take the top association
    rows_sorted = sorted(rows, key=lambda r: r.get("pvalue", 1))
    top_hit = rows_sorted[0]
    top_trait = top_hit.get("trait")
    associated_tissue = top_hit.get("mapped_tissue")

    print(f"Top GWAS trait for {gene}: {top_trait}")
    print(f"Associated tissue: {associated_tissue}")

    if not associated_tissue:
        print("No tissue information available for the top trait.")
        return

    # 3. Query expression in the associated tissue
    expr = await get_expression(
        GetExpressionInput(gene=gene, tissue=associated_tissue)
    )

    # Only send a summary back to the model
    print(f"Expression of {gene} in {associated_tissue}:")
    print(f"Median TPM: {expr.tpm}")


asyncio.run(main())
```
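How does that script actually run? A minimal execution harness can be a subprocess that runs the generated file and captures stdout; only the captured text is returned to the model. The sketch below is hypothetical and leaves out everything a production harness needs (sandboxing, resource limits, network policy):

```python
# harness.py -- hypothetical minimal execution harness (sketch only)
import subprocess
import sys


def run_model_script(path: str, timeout_s: int = 60) -> str:
    """Run a model-generated script; only its stdout goes back to the model."""
    proc = subprocess.run(
        [sys.executable, path],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if proc.returncode != 0:
        # Return a short error message, not a full traceback-plus-data dump.
        return f"Script failed: {proc.stderr.strip()[-500:]}"
    return proc.stdout


summary = run_model_script("scripts/gene_gwas_expression.py")
# `summary` (a few printed lines) is all that re-enters the model's context.
```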
What the model sees:
Just the final summary:
```
Top GWAS trait for BRCA1: breast cancer
Associated tissue: mammary tissue
Expression of BRCA1 in mammary tissue:
Median TPM: 1.284
```
Benefits:
- The GWAS table never enters the context.
- The entire expression table never enters the context.
- The schemas are small and loaded on demand (one way to do this is sketched below).
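On that last point, one option is to expose the wrappers as ordinary modules on disk and let the generated code discover and import only what it needs. A hypothetical sketch:

```python
import importlib
from pathlib import Path


def list_servers(root: str = "servers") -> list[str]:
    """Let the model browse available wrappers instead of front-loading
    every tool schema into its context."""
    return sorted(p.stem for p in Path(root).glob("*.py") if p.stem != "__init__")


def load_tool(server: str, tool: str):
    """Import a single wrapper module only when the generated code needs it."""
    module = importlib.import_module(f"servers.{server}")
    return getattr(module, tool)


# e.g. list_servers() -> ["expression_atlas", "gwas_catalog"], after which the
# generated script imports just the one tool it needs:
search_associations = load_tool("gwas_catalog", "search_associations")
```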
Why This Matters (in Bioinformatics and Beyond)
In biology (and many other domains), almost every workflow involves:
- high-dimensional data,
- multiple databases,
- several layers of cross-referencing, and
- complex domain logic.
Doing this with direct tool-calling forces the LLM to juggle raw data structures far larger and more intricate than anything a simple metadata or file-retrieval API would return.
Code execution solves this elegantly:
- All heavy lifting happens in Python
- The model works with summaries
- No large tables enter context
- No schema overload
- No fragile sequence of multiple tool calls the model must orchestrate by hand
This pattern is cleaner, faster, and safer, and it is closer to how humans design computational workflows.
A Final Thought
The more I experiment with this architecture, the more I’m convinced that agentic AI in biology won’t scale through bigger models alone. It will scale through better software engineering practices and better interfaces:
- tools that speak a common protocol (MCP), and
- models that express intent through code rather than micromanagement via natural language.
This separation is not only computationally efficient; it’s cognitively elegant: freeing the model from shuttling data around and letting it operate as the lightweight, efficient reasoning layer it’s meant to be.
