How Skills, Agents, and MCP Servers Work Together in Claude Code
You've probably hit this wall before: you paste your entire codebase into an AI chat, add a detailed prompt, and get back... nothing useful. Or worse, the model starts hallucinating function names that don't exist because it lost track of what it read 50,000 tokens ago.
This is the context window problem. And it's why throwing more code at an LLM doesn't make it smarter.
Claude Code solves this differently. Not by having a bigger context window (though that helps), but by being smart about how it uses the space it has. It does this through three things working together: skills, sub-agents, and MCP servers.
Let me show you how these pieces fit together. Once you understand this, you'll stop fighting the AI and start working with it.
The Context Window Problem
Here's something that took me a while to internalize: LLMs don't actually "remember" the beginning of a long conversation the way you'd expect. The context window is more like a spotlight than a filing cabinet. As you add more tokens, earlier content gets less attention. Paste 100,000 tokens of code, and the model is essentially skimming by the time it reaches your question.
This explains a lot of frustrating AI behavior. The model seems smart at first, then gets confused. It forgets things you told it earlier. It hallucinates details from files it "read" but clearly didn't retain.
The fix isn't a bigger context window. The fix is not needing one.
How Claude Code Actually Thinks
Claude Code is an agent. That sounds fancy, but it just means it can plan, execute, observe results, and iterate. When you ask it to "fix the failing tests in my project," it doesn't try to understand your entire codebase at once. Instead, it works in focused steps:
- Read the project structure to understand what's where
- Run the tests to see what's actually failing
- Look at the specific failing test and the code it tests
- Make a fix
- Run the tests again
- If still broken, try something else
Each step works with only the context it needs. The agent maintains a working summary of what it's learned, not a raw log of everything it's seen. This is how Claude Code can iterate 20 times on a complex bug without running out of context space.
Think of it like how you'd actually debug something. You don't hold your entire codebase in your head. You focus on one area, form a hypothesis, test it, and adjust. Claude Code works the same way.
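Here's the shape of that loop as a rough sketch. The helpers are hypothetical stand-ins for the model calls and tool executions Claude Code performs internally; the point is that the agent acts on a compact summary and folds each observation back into it, rather than dragging the raw transcript forward.

```typescript
// Conceptual sketch only; runStep and summarize are hypothetical stand-ins,
// not Claude Code internals.

interface StepResult {
  action: string;       // e.g. "ran the test suite"
  observation: string;  // raw output, which can be thousands of lines
}

// Stand-in: decide and execute one focused step given the summary so far.
async function runStep(workingSummary: string): Promise<StepResult> {
  return { action: 'ran tests', observation: 'FAIL test_login: KeyError ...' };
}

// Stand-in: fold a new observation into a bounded summary instead of
// appending the raw output to the conversation.
async function summarize(workingSummary: string, step: StepResult): Promise<string> {
  return `${workingSummary}\n- ${step.action}: ${step.observation.slice(0, 200)}`;
}

async function workOnTask(goal: string, maxSteps = 20): Promise<string> {
  let workingSummary = `Goal: ${goal}`;
  for (let i = 0; i < maxSteps; i++) {
    const step = await runStep(workingSummary);
    workingSummary = await summarize(workingSummary, step);
    if (workingSummary.includes('tests pass')) break; // stop condition is illustrative
  }
  return workingSummary;
}
```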
Sub-Agents - The Real Game Changer
Here's where it gets really interesting. Claude Code can spawn sub-agents.
A sub-agent is a separate Claude instance with its own clean context window. You can tell Claude Code to spin one up, give it a focused task, let it work independently, and get back a summary of what it found. The sub-agent's full context never pollutes your main conversation.
This is huge. Instead of one agent trying to hold everything:
You: "Analyze this codebase for security issues, performance problems, and test coverage"
Main agent spawns sub-agents:
├── Sub-agent 1: Security analysis (own context, reads security-relevant files)
├── Sub-agent 2: Performance review (own context, focuses on hot paths)
└── Sub-agent 3: Test coverage (own context, examines test files)
Each sub-agent returns a summary to the main agent
Main agent synthesizes everything into one response
The sub-agents can read thousands of lines of code each. But the main agent only receives their condensed findings. Your main context stays clean while still getting deep analysis across multiple areas.
Think of it like delegating to junior developers. You don't need to read every line they reviewed. You need their conclusions and any red flags they found. Sub-agents work the same way.
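In code terms, this is a fan-out/fan-in pattern. The sketch below is only conceptual (the task type and runner are hypothetical, not Claude Code's actual API), but it shows why the main context stays small:

```typescript
// Conceptual sketch of the delegation pattern, not Claude Code's actual API.
// Each sub-agent works in its own context; only a short summary comes back.

interface SubAgentTask {
  name: string;
  instructions: string;
}

// Stand-in for spawning a fresh agent instance with an empty context window.
async function runSubAgent(task: SubAgentTask): Promise<string> {
  // ...reads files, calls tools, fills its own context with raw detail...
  return `${task.name}: 2 findings, 1 worth fixing before release`;
}

async function reviewCodebase(): Promise<string> {
  const tasks: SubAgentTask[] = [
    { name: 'security', instructions: 'Audit auth and input handling' },
    { name: 'performance', instructions: 'Profile the hot paths' },
    { name: 'coverage', instructions: 'Map untested modules' },
  ];

  // Fan out: the sub-agents can work in parallel.
  const summaries = await Promise.all(tasks.map(runSubAgent));

  // Fan in: the main agent synthesizes from summaries, never from raw analysis.
  return summaries.join('\n');
}
```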
How to Actually Use Sub-Agents
You can explicitly ask Claude Code to spawn a sub-agent:
- "Spawn a sub-agent to analyze the authentication module in depth"
- "Use a sub-agent to review all the database queries for N+1 problems"
- "Have a sub-agent go through the test files and report back on coverage gaps"
The sub-agent gets its own fresh context, does the deep work, and returns a summary. Your main conversation doesn't get bloated with all the raw analysis.
This is especially powerful for large codebases. Instead of Claude Code losing track of things halfway through, you get focused analysis that actually scales.
Where Skills Come In
So Claude Code can break down tasks, spawn sub-agents, and work in focused steps. But how does it know how to do specific things well? That's where skills come in.
Skills are packaged instructions that Claude reads before tackling specific types of tasks. When you ask Claude Code to create a PowerPoint presentation, it first reads the skill file at /mnt/skills/public/pptx/SKILL.md. This file contains best practices, common pitfalls, and exact code patterns that actually work.
A skill typically includes:
- A SKILL.md file with detailed instructions
- Code templates and examples
- Known limitations and workarounds
- Output format specifications
The key insight is that skills are loaded on demand. Claude doesn't waste context window space on PowerPoint instructions when you're debugging Python. It loads the relevant skill when it needs it, uses that expertise for the task, and moves on.
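A rough sketch of that on-demand idea (the lookup map and helper are hypothetical; only the pptx path comes from the example above):

```typescript
import { readFile } from 'node:fs/promises';

// Hypothetical map of task types to skill files; only the pptx path is
// taken from the example above.
const SKILLS: Record<string, string> = {
  pptx: '/mnt/skills/public/pptx/SKILL.md',
};

async function buildTaskContext(task: string, userRequest: string): Promise<string> {
  const skillPath = SKILLS[task];
  // The skill's instructions cost tokens only when the task actually needs them.
  const skill = skillPath ? await readFile(skillPath, 'utf8') : '';
  return [skill, userRequest].filter(Boolean).join('\n\n');
}
```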
This is different from tools. Tools are functions Claude calls to get data back, like web_search or bash_tool. Skills are knowledge packages that change how Claude approaches problems. The distinction matters because tools cost tokens in the system prompt (more tools = less room for your code), while skills are loaded only when relevant.
| Aspect | Tools | Skills |
|---|---|---|
| How it works | Call function, get result | Load instructions, modify behavior |
| Token cost | Always present in context | Loaded on demand |
| Best for | Discrete actions (API calls, file ops) | Complex workflows (document creation) |
Sub-agents can load skills too. A sub-agent analyzing security can load security-specific skills without those instructions cluttering up your main conversation.
The Iteration Loop
One pattern you'll see constantly is the iteration loop. Claude Code generates something, checks if it works, and fixes issues until it's right.
1. Generate code
2. Run it (or lint it, or type-check it)
3. See if it works
4. If not, analyze the error and try again
5. Repeat until it passes
This is why Claude Code can fix bugs that a single-shot prompt would miss. It's not just generating code and hoping. It's testing its own work and responding to real feedback.
When Claude Code creates a React component and then runs the dev server to check for errors, it's using this loop. When it writes a function and runs the tests to verify it works, same thing. The generation and verification steps work together.
This also explains why Claude Code sometimes takes longer than you'd expect. It's not slow. It's thorough. It's running your tests, checking for type errors, and iterating until things actually work.
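A minimal sketch of that loop, where the check command and the fix step are placeholders for whatever verification (tests, linter, tsc) and model call you'd actually use:

```typescript
// Hypothetical sketch of a generate/verify loop; not Claude Code's internals.
import { execSync } from 'node:child_process';

// Run a verification command and capture its output, pass or fail.
function check(command: string): { ok: boolean; output: string } {
  try {
    return { ok: true, output: execSync(command, { encoding: 'utf8' }) };
  } catch (err: any) {
    return { ok: false, output: String(err.stdout ?? err.message) };
  }
}

async function generateUntilGreen(attemptFix: (error: string) => Promise<void>) {
  for (let attempt = 1; attempt <= 5; attempt++) {
    const result = check('npm test');   // run the verification step
    if (result.ok) return;              // it works, stop
    await attemptFix(result.output);    // analyze the real error and patch the code
  }
  throw new Error('Still failing after 5 attempts');
}
```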
MCP Servers - Connecting to External Systems
So far we've talked about how Claude Code thinks (agentic steps, sub-agents) and what it knows (skills). But what about connecting to external systems? That's where MCP comes in.
The Model Context Protocol is how Claude Code talks to databases, GitHub, file systems, CI/CD pipelines, and pretty much anything else. Think of MCP as USB-C for AI: a standard interface that lets any compatible AI system connect to any compatible service.
MCP servers expose three types of capabilities:
- Tools - Functions Claude can call, like querying a database or creating a pull request
- Resources - Data sources Claude can read, like file contents or database schemas
- Prompts - Reusable templates for common interactions
For web development, the most useful MCP servers include:
- Filesystem server - Read and write files in your project (sandboxed for safety)
- GitHub server - Manage repos, PRs, issues, and workflows
- Database servers - Query PostgreSQL, MySQL, SQLite through natural language
Here's what a simple MCP server looks like in TypeScript:
```typescript
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

// Placeholder for whatever analysis you actually want to expose.
async function analyzeFile(filePath: string) {
  return { filePath, complexity: 0 };
}

const server = new McpServer({
  name: 'code-analyzer',
  version: '1.0.0'
});

server.registerTool(
  'analyze_complexity',
  {
    description: 'Analyze cyclomatic complexity of a file',
    inputSchema: { filePath: z.string() }
  },
  async ({ filePath }) => {
    const result = await analyzeFile(filePath);
    return { content: [{ type: 'text', text: JSON.stringify(result) }] };
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);
```
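Tools are only one of the three capability types listed above. The same server can also expose resources; here's a sketch of what that might look like, assuming the current SDK's registerResource API (the URI and returned text are placeholders):

```typescript
// Added to the same server as above, before connect(). Placeholder values only.
server.registerResource(
  'db-schema',
  'schema://main',
  {
    description: 'Current database schema',
    mimeType: 'text/plain'
  },
  async (uri) => ({
    contents: [{ uri: uri.href, text: 'CREATE TABLE users (...);' /* placeholder */ }]
  })
);
```

Because this example uses the stdio transport, the client launches the server as a subprocess and talks to it over stdin/stdout, which keeps local setups simple.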
The power of MCP servers comes from how they work with sub-agents. A sub-agent can query a database through MCP, analyze thousands of rows, and return just the relevant findings. The raw data never touches your main context.
Putting It All Together
Let's walk through a real example. You ask Claude Code: "Review this PR and check if the database migrations are safe."
Here's what actually happens:
- Main agent loads relevant skills for code review
- It spawns a sub-agent to analyze the PR changes in depth
- That sub-agent calls GitHub MCP server to fetch PR details
- Another sub-agent reads migration files and queries current database schema through MCP
- Sub-agents return focused summaries: "Migration adds index, looks safe" or "Warning: this drops a column that's still referenced"
- Main agent synthesizes sub-agent findings into a clear response
Notice how the heavy lifting happens in sub-agents with their own contexts. The main agent orchestrates and summarizes. Your conversation stays clean while getting deep analysis.
Why This Matters For Your Workflow
Understanding this architecture changes how you work with Claude Code:
You stop fighting the context window. Once you realize that sub-agents handle deep analysis separately, you stop trying to paste your entire codebase into the main conversation. Let the sub-agents do the heavy reading.
You explicitly use sub-agents for big tasks. Instead of hoping Claude Code figures it out, tell it: "Use a sub-agent to analyze the payment module" or "Spawn sub-agents for security, performance, and test review." You're working with the architecture instead of against it.
You understand why some tasks work better than others. Tasks that can be delegated to sub-agents scale great. Tasks requiring constant back-and-forth in one context hit limits faster.
You trust the iteration loop. When fixing bugs, let Claude iterate. It will often find issues you didn't anticipate by actually running the code and observing failures. Don't interrupt it after the first attempt.
Practical Tips
Explicitly ask for sub-agents on big tasks. "Spawn a sub-agent to review the auth system" works better than hoping Claude Code will figure out it needs to delegate.
Let Claude read skills first. If you're creating documents or working with specific file formats, Claude performs better when it reads the relevant skill file before starting.
Connect MCP servers for repetitive integrations. Instead of copy-pasting database outputs or GitHub PR details into your prompts, connect the relevant MCP servers. Claude pulls exactly what it needs.
Check the ecosystem before building custom tools. The MCP ecosystem has grown to over 5,000 community servers covering Stripe, Cloudflare, CI systems, and most services you'd want to integrate. Someone probably already built what you need.
Give feedback when it's wrong. When Claude Code produces wrong results, tell it what went wrong. The iteration loop works for your feedback too.
The Bigger Picture
MCP was recently donated to the Linux Foundation, with OpenAI, Google, and Microsoft joining as supporting members. This signals that the protocol is becoming industry standard infrastructure, not just an Anthropic thing.
The combination of sub-agents (parallel work in separate contexts), skills (domain expertise loaded on demand), and MCP servers (clean connections to external systems) represents how modern AI coding assistants actually work. It's not magic. It's good architecture solving real constraints.
The sub-agent pattern especially is worth understanding deeply. It's the difference between an AI that chokes on large codebases and one that scales. When you ask Claude Code to analyze something big, it's not stuffing everything into one context and praying. It's delegating to focused sub-agents, each with clean context, and synthesizing their findings.
Understanding these pieces helps you work with the system instead of against it. You'll get better results, hit fewer walls, and know when to trust the AI versus when to take over manually.
