How Skills and MCP impact context in modern harnesses

There are tons of arguments around skills and MCPs. Most of them are quite pointless because they focus on particular implementations. One aspect I do consider important, though, is how exactly the harness (1) presents the skill/MCP capabilities to the model and (2) what it puts into the model’s context, and how, once the model selects one of those capabilities.

Let’s quickly recap what Skills and MCPs are.

Skills

Skills are reusable prompts distributed as markdown files with a preamble and a set of bundled resources: additional documents or scripts. A detailed specification can be found here. The additional documents and scripts normally shouldn’t be loaded during an interaction unless the harness considers them particularly important for its work. I’m ignoring them in this post.

Diving deeper into SKILL.md.

The SKILL.md file consists of a preamble (frontmatter) and a body. The name and description fields are of particular importance: they are loaded into the first prompt of every session. The preamble lets the harness quickly scan a skill package and extract a short summary that it later injects into the prompt. This mechanism of progressive disclosure makes skills quite economical in terms of consumed context. When the model decides to use a skill, or is forced to, the harness loads the full body of the skill.
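To make the split between preamble and body concrete, here is a minimal sketch of how a harness might read only the name and description from a SKILL.md file. It assumes a simple `key: value` frontmatter delimited by `---` lines and uses a hand-rolled parser instead of a real YAML library; `SkillSummary`, `split_skill` and `summarize` are hypothetical names, not part of any harness.

```python
from dataclasses import dataclass


@dataclass
class SkillSummary:
    name: str
    description: str


def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    fields: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # closing delimiter of the frontmatter
            return fields, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:  # keep only simple "key: value" lines
            fields[key.strip()] = value.strip()
    return {}, text  # unterminated frontmatter: treat everything as body


def summarize(text: str) -> SkillSummary:
    """Return only what would go into the first prompt."""
    fields, _body = split_skill(text)
    return SkillSummary(fields.get("name", ""), fields.get("description", ""))


SKILL_MD = """---
name: demo-skill
description: Does something useful
---
# Demo skill

Long instructions that are only loaded when the skill is activated.
"""

summary = summarize(SKILL_MD)
fields, body = split_skill(SKILL_MD)
print(summary)
print(len(body), "characters of body stay out of the first prompt")
```

Only `summary` would reach the first prompt in this sketch; `body` is read later, when the skill is actually used.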

What this really means is that the cost saving in the first prompt should be quite significant, because the name and description are short (at most 64 + 1024 characters per skill, according to the spec). Once a skill is loaded, though, it’s the wild west: skills can be written in absolutely horrendous ways.
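A back-of-the-envelope calculation shows the size of the saving. The per-skill limits are the ones quoted above; the number of skills and the average body size are made-up values for illustration only.

```python
MAX_NAME_CHARS = 64            # per-skill limit quoted above
MAX_DESCRIPTION_CHARS = 1024   # per-skill limit quoted above


def metadata_chars(num_skills: int) -> int:
    """Worst-case characters of name + description for num_skills skills."""
    return num_skills * (MAX_NAME_CHARS + MAX_DESCRIPTION_CHARS)


def eager_chars(num_skills: int, avg_body_chars: int) -> int:
    """Characters if every skill body were loaded up front as well."""
    return metadata_chars(num_skills) + num_skills * avg_body_chars


# Hypothetical setup: 20 skills with bodies of about 6,000 characters each.
lazy = metadata_chars(20)
eager = eager_chars(20, 6_000)
print(lazy, eager)  # 21760 141760
```

Even with every name and description at its maximum length, the metadata-only prompt in this made-up setup is a small fraction of what loading all bodies would cost.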

Here is how Codex CLI works with skills:

  1. Skills are discovered by going through a set of directories.
  2. For every skill, Codex parses only the frontmatter at first.
  3. When a new session starts, Codex creates the initial prompt, and the relevant part is added like this:
<skills_instructions>
## Skills
A skill is a set of local instructions to follow that is stored in a `SKILL.md` file...
### Available skills
- demo-skill: Does something useful (file: /.../SKILL.md)
- another-skill: Does something stupid (file: /stupid-skill/SKILL.md)

### How to use skills
- Discovery: ...
- Trigger rules: ...
- How to use a skill (progressive disclosure):
  1) After deciding to use a skill, open its `SKILL.md`.
  ...
</skills_instructions>
  4. This list of skills is not of infinite length: it is capped at 8k characters or 2% of the available context window. If the list is too long, descriptions are trimmed first, and then some skills might be dropped.
  5. When a skill is activated, it is appended to the user message like this:
  <skill>
  <name>demo-skill</name>
  <path>/path/to/SKILL.md</path>
  --- frontmatter ---
  ...
  # full skill body
  ...
  </skill>

It becomes a user message, which is very similar to the user copy-pasting the contents of SKILL.md into the harness.
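The trimming step can be sketched as follows. This is not Codex's actual algorithm, only one plausible reading of "trim descriptions first, then drop skills": the budget is passed in as a plain character count (how it is derived from the 8k / 2% rule is left out), descriptions get an equal share of the spare budget, and skills are dropped from the end of the list. `Skill`, `render_line` and `fit_skills` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    path: str


def render_line(skill: Skill, description: str) -> str:
    """One entry of the skills list, in the format shown above."""
    return f"- {skill.name}: {description} (file: {skill.path})"


def fit_skills(skills: list[Skill], budget_chars: int) -> list[str]:
    """Render skill lines within budget_chars (newlines are not counted)."""
    kept = list(skills)
    # Last resort: drop trailing skills until description-free lines fit.
    while kept and sum(len(render_line(s, "")) for s in kept) > budget_chars:
        kept.pop()
    if not kept:
        return []
    full = [render_line(s, s.description) for s in kept]
    if len(kept) == len(skills) and sum(len(line) for line in full) <= budget_chars:
        return full  # everything fits untouched
    # Otherwise trim every description to an equal share of the spare budget.
    spare = budget_chars - sum(len(render_line(s, "")) for s in kept)
    share = spare // len(kept)
    return [render_line(s, s.description[:share]) for s in kept]


skills = [
    Skill("demo-skill", "Does something useful", "/.../SKILL.md"),
    Skill("another-skill", "Does something stupid", "/stupid-skill/SKILL.md"),
]
for budget in (10_000, 90, 40):
    print(budget, fit_skills(skills, budget))
```

With a generous budget both lines come back intact, with a tight one the descriptions are cut, and with a very tight one the second skill disappears entirely.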

MCP

MCP is a protocol that describes how to let LLMs interact with external systems. The protocol leaves a lot to implementations, so their quality varies from high to garbage.

There are MCP clients and MCP servers. The harness is the MCP client, and an MCP server is a service that runs somewhere, either on the same machine or elsewhere. I’m leaving the complexity of MCP aside and focusing on its context-related properties.

Fundamentally, MCP operates at the level of LLM tools. LLM tools are usually special instructions that allow the LLM to generate text describing a well-defined action, also known as a tool call. Tool definitions get special treatment during model training and are usually presented to the model in a special way. For details about how tools are presented to the model, search for the relevant template format, which might be Harmony, Anthropic, or ChatML.

The brute-force way for MCP to work is for the server to give the client the set of all tools available within that MCP server. The MCP client then puts all of those tools into the LLM context, and the harness starts answering user queries. The main issue with this approach is that tool definitions can be quite wordy, and hence take a lot of context, because they describe the full JSON Schema of a given function.
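To get a feel for how wordy a tool definition is, the sketch below serializes one made-up tool and compares it with a one-line summary of the same tool. The `name`, `description` and `inputSchema` fields follow the shape of an MCP tool definition; the `create_issue` tool itself and all of its parameters are hypothetical.

```python
import json

# Hypothetical tool, shaped like an MCP tool definition.
create_issue_tool = {
    "name": "create_issue",
    "description": "Create an issue in a project tracker.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "project": {"type": "string", "description": "Project key, e.g. CORE."},
            "title": {"type": "string", "description": "Short summary of the issue."},
            "body": {"type": "string", "description": "Full description in markdown."},
            "labels": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Labels to attach to the issue.",
            },
        },
        "required": ["project", "title"],
    },
}

# Size of the full definition versus a skill-style "name: description" line.
definition_chars = len(json.dumps(create_issue_tool))
summary_chars = len(
    f"- {create_issue_tool['name']}: {create_issue_tool['description']}"
)
print(definition_chars, summary_chars)
```

Even this small four-parameter tool is several times longer than its summary line, and a server with dozens of tools multiplies that difference.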

Both clients and servers have decided that something needs to be done about this. As a result, it is not a trivial matter to understand how exactly adding an MCP server will affect the way the harness works on the user side.

Let’s start with the client side.

MCP Clients

Let’s consider how Codex does it. Codex maintains a list of connected MCP servers.

When a user starts a new session, Codex looks for the tools that the MCP servers advertise. Depending on a feature flag such as “tool_search” or, currently, “tool_search_always_defer_mcp_tools”, and on the number of available tools, Codex decides whether to present all tools to the model as is (so-called direct presentation) or to hide them (deferred presentation) behind a special tool called “tool_search”.

Let’s consider what happens when all tools are presented. Every tool that the MCP servers advertise is converted into the tool format that the given LLM understands. This includes the tool name, the tool description, and the tool input/output schemas with their descriptions. Those tools are then included at every model turn. Compared to skills, direct presentation of MCP tools obviously takes more context. That is not necessarily bad: this method of presentation might help with recall, since the model always has information about the available tools close at hand at every turn.

If tools are deferred, a special tool called “tool_search” is generated. In its description Codex puts the list of available tool sources, and this description is seen by the LLM. In the background Codex also builds a BM25 index to enable actual search over the tools from the MCP servers. This index is not directly available to the LLM and does not affect the context. By default tool_search retrieves the top 8 matching tools.

Then, in each turn, the tool_search tool can be called. It searches the corresponding real MCP tools and puts some number of them into the messages.
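The deferred path can be illustrated with a small BM25 index over tool names and descriptions. This is a generic textbook BM25 in plain Python, not Codex's implementation; the tool catalog, the `ToolIndex` class and the `k1`/`b` values are assumptions, and only the default limit of 8 comes from the text above.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToolIndex:
    """Tiny BM25 index; lives outside the model's context."""

    def __init__(self, tools: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.names = list(tools)
        self.docs = [Counter(tokenize(f"{n} {d}")) for n, d in tools.items()]
        self.lengths = [sum(doc.values()) for doc in self.docs]
        self.avg_len = sum(self.lengths) / max(len(self.docs), 1)
        self.df = Counter(term for doc in self.docs for term in doc)
        self.k1, self.b = k1, b

    def score(self, terms: list[str], i: int) -> float:
        doc, n, total = self.docs[i], len(self.docs), 0.0
        for term in terms:
            tf = doc.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n - self.df[term] + 0.5) / (self.df[term] + 0.5))
            norm = tf + self.k1 * (1 - self.b + self.b * self.lengths[i] / self.avg_len)
            total += idf * tf * (self.k1 + 1) / norm
        return total

    def search(self, query: str, limit: int = 8) -> list[str]:
        """Return up to `limit` tool names, best match first."""
        terms = tokenize(query)
        scored = [(self.score(terms, i), name) for i, name in enumerate(self.names)]
        hits = sorted((h for h in scored if h[0] > 0), key=lambda h: -h[0])
        return [name for _, name in hits[:limit]]


# Hypothetical tools advertised by connected MCP servers.
index = ToolIndex({
    "create_issue": "Create an issue in a project tracker",
    "list_issues": "List issues of a project",
    "send_email": "Send an email message",
    "read_file": "Read a file from disk",
})
print(index.search("send email"))
```

Only the definitions of the tools returned by `search` would be added to the conversation; the index and the rest of the catalog cost no context.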

MCP Servers

There are plenty of ways to develop an MCP server. Popular options include FastMCP, @modelcontextprotocol/sdk, and probably many others.

Let’s consider FastMCP’s features, as it seems to be one of the popular choices for developing MCP servers.

First, it implements a pattern we have already explored: tool search. In this case FastMCP replaces all available tools with two tools, search_tools and call_tool. Compared to the Codex client implementation, search_tools can use regexps instead of BM25, but the basic logic still holds. By default it returns 5 tools; the LLM is shown those tools and decides what to call using the call_tool tool. There is also a pinning option that always exposes certain tools in addition to search_tools and call_tool. I find it very thoughtful that the search tool filters tools based on whether a tool is exposed to a particular role.
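The server-side pattern can be mimicked in a few lines of plain Python. This sketch deliberately does not use the FastMCP API; it only imitates the idea of two proxy tools plus pinned tools. The catalog, the handlers and `listed_tools` are hypothetical, and role-based filtering is left out.

```python
import re
from typing import Callable

# Hypothetical catalog: tool name -> (description, handler).
CATALOG: dict[str, tuple[str, Callable[..., object]]] = {
    "add": ("Add two numbers", lambda a, b: a + b),
    "multiply": ("Multiply two numbers", lambda a, b: a * b),
    "echo": ("Return the given text unchanged", lambda text: text),
}
PINNED = ["echo"]  # always exposed next to the two proxy tools


def listed_tools() -> list[str]:
    """Tool names the model sees up front, regardless of catalog size."""
    return ["search_tools", "call_tool", *PINNED]


def search_tools(pattern: str, max_results: int = 5) -> list[dict[str, str]]:
    """Regex search over names and descriptions, capped at max_results."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = [
        {"name": name, "description": description}
        for name, (description, _handler) in CATALOG.items()
        if rx.search(name) or rx.search(description)
    ]
    return hits[:max_results]


def call_tool(name: str, arguments: dict[str, object]) -> object:
    """Dispatch to the real tool that search_tools revealed."""
    _description, handler = CATALOG[name]
    return handler(**arguments)


print(listed_tools())
print(search_tools("numbers"))
print(call_tool("multiply", {"a": 3, "b": 4}))
```

However many tools the catalog holds, the up-front listing in this sketch stays at two proxies plus the pinned tools, at the price of an extra search turn before each new tool is used.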

The second option, Code Mode, is much more radical. The Code Mode idea is now implemented in several frameworks, and the details of how it works are broadly similar across them.

What is required for any kind of Code Mode to work?

A. The LLM needs to know which tools are available, though not necessarily in the form of LLM tools! Just the names of the tools might be enough.

B. The LLM is presented with an option to write a snippet of code in Python, TypeScript, or any other language supported by the Code Mode executor.

C. The LLM provides a code snippet to the code_mode tool, and this snippet is executed in a special environment on the MCP server side. Specific objects within this snippet are replaced by the original MCP tools that were shown to the LLM in A.

FastMCP’s default implementation satisfies those requirements as follows:

  1. The LLM is provided with 3 tools: search, get_schema and execute.
  2. search allows the LLM to find the names of tools. Next, the LLM calls get_schema with the names of the selected tools to get their details: the input/output schema, and maybe some additional data compared to the short summary presented by search.
  3. The LLM writes a snippet of Python code containing function calls like await call_tool(<original_mcp_tool>, params) and passes it to execute.
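The execute step can be sketched as a function that wraps the model's snippet in an async function and hands it a `call_tool` dispatcher. This is an illustration of the idea only, not FastMCP's executor: the two tools are made up, and plain `exec` is not a sandbox, so a real implementation has to isolate the snippet properly.

```python
import asyncio
import textwrap


# Hypothetical tools standing in for the original MCP tools.
async def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 20}


async def to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32


TOOLS = {"get_weather": get_weather, "to_fahrenheit": to_fahrenheit}


async def call_tool(name: str, params: dict):
    """Dispatch a call made inside the snippet to the underlying tool."""
    return await TOOLS[name](**params)


async def execute(snippet: str):
    """Run a model-written snippet; its `return` value is the tool result."""
    source = "async def __snippet__(call_tool):\n" + textwrap.indent(snippet, "    ")
    namespace: dict = {}
    exec(source, namespace)  # NOT a sandbox; shown only to illustrate the flow
    return await namespace["__snippet__"](call_tool)


# What the LLM might send to execute: two tool calls chained in code.
SNIPPET = """
weather = await call_tool("get_weather", {"city": "Oslo"})
return await call_tool("to_fahrenheit", {"celsius": weather["temp_c"]})
"""

result = asyncio.run(execute(SNIPPET))
print(result)  # 68.0
```

In this sketch two tool calls happen inside a single execute call, and only the final value returns to the model; the intermediate `weather` dictionary never enters the context.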

FastMCP provides several ways to decrease the number of LLM turns spent on the search, get_schema and execute tool calls by preloading more and more data about the tools: first into the search tool, which makes get_schema unnecessary, and ultimately into the execute tool, by putting all available information into its description.

So the question of how MCP impacts context size is now non-trivial, as there are so many possible client × server interactions.

Summary

After this dive I conclude that the impact of MCP servers on the context consumption of LLMs is challenging to estimate, given the interplay of client and server. My suggestion would be to study the source of your MCP client, if possible, to understand how MCP tools are presented, and to carefully vet the MCP servers you are using. Tool search and Code Mode should probably be mutually exclusive options, turned on only on the client side or only on the server side. According to the MCP documentation, this should be the client’s responsibility.

The context consumption model of skills, on the other hand, seems to be significantly simpler.