Harness engineering

Harness engineering is the practice of improving a coding agent’s output quality and reliability by shaping the software around it, not just the prompt you type into the chat. Instead of treating the coding agent as a black box, you tune its harness: the instructions, tools, context, hooks, and integrations around the model, so that correct behavior becomes easier and more repeatable.

In practice, that means making sure the right information, tools, and constraints show up at the right time. This page walks through several common ways to do that and the tradeoffs that come with them.

AGENTS.md

AGENTS.md is the lightest-weight way to steer an agent inside a repository. It is not the only mechanism available, but it is usually the first place where project-specific guidance belongs.

Good and bad practices

Let’s start by citing the rules from the Claude docs. They are pretty good:

✅ Include	❌ Exclude
Bash commands Claude can’t guess	Anything Claude can figure out by reading code
Code style rules that differ from defaults	Standard language conventions Claude already knows
Testing instructions and preferred test runners	Detailed API documentation (link to docs instead)
Repository etiquette (branch naming, PR conventions)	Information that changes frequently
Architectural decisions specific to your project	Long explanations or tutorials
Developer environment quirks (required env vars)	File-by-file descriptions of the codebase
Common gotchas or non-obvious behaviors	Self-evident practices like “write clean code”

On top of that, we would add the following:

✅ Include	❌ Exclude
Only crucial and frequently needed hints	Anything that is one `ls` or `cat` call away
Terser language throughout the file	Rules better handled by hooks or extensions

As your codebase and AGENTS.md grow, it will make sense to move chunks of that file into either skills or separate files in the docs/ directory, while AGENTS.md becomes mostly a table of contents.

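Below, you can see an example of what NOT to do. Everything in it is something the agent can discover on its own by reading package.json or listing the src/ directory: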

This is a Next.js project showcasing pet grooming practices.

# Command Reference

```bash
# Install dependencies
npm install

# Clean build artifacts
npm run clean

# Type checking
npm run typecheck

# Linting
npm run lint
```

# Structure

- `src/SCREENS.ts`: Screen name constants
- `src/ROUTES.ts`: Route definitions and builders
- `src/NAVIGATORS.ts`: Navigator configuration

Skills

Agent Skills is an open standard for extending AI agents with specialized capabilities. Skills package domain-specific knowledge and workflows that agents can use to perform specific tasks.

A skill is just a well-named and well-described directory containing a SKILL.md file with arbitrary Markdown content. Agents initially see only the names and descriptions of the available skills, and they must explicitly load a full skill definition before using it. You can mention a skill explicitly in your prompt, or the model may decide to load one by itself. What goes inside a skill is up to your imagination.
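The two-step loading described above can be sketched in a few lines of Python. This is a minimal sketch, not any harness’s actual loader. It assumes each skill sits in its own directory, that SKILL.md starts with a `---`-delimited front matter block containing `name` and `description` keys, and that the directory name matches the skill name. The helper names `read_front_matter`, `list_skills`, and `load_skill` are hypothetical.

```python
from pathlib import Path


def read_front_matter(text: str) -> dict[str, str]:
    """Parse simple `key: value` pairs from a leading '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def list_skills(root: Path) -> list[dict[str, str]]:
    """Build the cheap index: only the name and description of each skill."""
    index = []
    for skill_file in sorted(root.glob("*/SKILL.md")):
        meta = read_front_matter(skill_file.read_text(encoding="utf-8"))
        index.append({
            # Fall back to the directory name if the front matter has no name.
            "name": meta.get("name", skill_file.parent.name),
            "description": meta.get("description", ""),
        })
    return index


def load_skill(root: Path, name: str) -> str:
    """Read the full SKILL.md only when the skill is actually needed.

    Assumption: the skill's directory is named after the skill.
    """
    return (root / name / "SKILL.md").read_text(encoding="utf-8")
```

The point of the split is cost: `list_skills` output is small enough to sit in every conversation, while `load_skill` is paid for only when a task calls for that skill.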

Choosing the right skills

Skills are domain-specific by nature. We can’t tell you upfront what skills you might need without knowing what you’re working on. What we can tell you is that unlike other harness-specific mechanisms, skills are largely portable and can be used in many creative ways. You may use them to save repetitive prompts, build more elaborate rules, provide hard-to-reach up-to-date documentation about your toolchain or dependencies, or define agent personalities.

Where to find skills

skills.sh

The skills.sh directory is a great place to look for valuable skills. The accompanying npx skills CLI installs them in the right format for 40+ agent harnesses, including Claude Code, Cursor, Amp, Codex, Gemini CLI, GitHub Copilot, and many more.

npx skills CLI

Skill repositories

There are many GitHub repositories that collect useful skills, similar to the Awesome X list repositories. Many companies use these repositories as part of their marketing strategy, providing skills that offer useful guidance for their products.

For example:

anthropics/skills (Anthropic)
openai/skills (OpenAI)
software-mansion-labs/skills (Software Mansion)
GitHub Topic: agent-skills

Skills are forkable and your own

Skills are meant to be amended by you or your agent, so that they fit your project, machine, and taste.

Many harnesses have skills for creating new skills or updating/forking existing ones.
Don’t be afraid to fork a third-party skill. If the “upstream” skill is updated, tell your agent to update your fork.

Security considerations

Before you try a new skill, always read its entire source and think about its security implications. Skills are powerful because they can tell the agent to do almost anything, and that is exactly what makes them risky.

The surrounding ecosystem is still very young, and many skill-based attacks are happening in the wild. Be especially cautious when updating third-party skills: you never know when an upstream repository has been compromised and an attacker has inserted prompt injections.

These writeups show how this can go wrong in practice:

Weaponizing Claude Skills with MedusaLocker (Inga Cherny, 2025-12-02). A seemingly harmless skill step in a GIF workflow was used to fetch and execute MedusaLocker ransomware, showing how a reviewed skill can still hide second-stage execution.
Hidden Unicode Instructions in Skills (Johann Rehberger, 2026-02-11). Hidden Unicode Tag characters were used to smuggle invisible instructions into a skill, which means visual source review alone may miss malicious behavior.
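Because invisible characters defeat visual review, a mechanical scan is a useful complement to reading a skill’s source. The sketch below is one possible check, not a complete defense: it flags code points in the Unicode Tags block (U+E0000 to U+E007F) plus anything in the general category `Cf` (format characters such as zero-width spaces and bidirectional overrides). The function name `find_hidden_characters` is hypothetical, and a clean result says nothing about second-stage downloads or plainly visible malicious instructions.

```python
import unicodedata


def find_hidden_characters(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for characters that render invisibly."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            code = ord(ch)
            # The Tags block mirrors ASCII invisibly; it was used for smuggling.
            in_tags_block = 0xE0000 <= code <= 0xE007F
            # "Cf" covers zero-width spaces, joiners, and bidi overrides.
            is_format_char = unicodedata.category(ch) == "Cf"
            if in_tags_block or is_format_char:
                findings.append((line_no, col, f"U+{code:04X}"))
    return findings
```

Expect some false positives: zero-width joiners, for example, are also category `Cf` and appear in legitimate emoji sequences, so treat a finding as a prompt to look closer rather than as proof of an attack.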

MCP

MCP is a protocol through which AI applications can connect to data sources (local and remote), tools (applications and services), and workflows (like domain-specialized models). MCP servers plug new tools into your agent to extend its capabilities beyond the filesystem, bash commands, and web browsing.

MCP servers can run locally (like npx @playwright/mcp), allowing your agent to interact with your local environment, or they can be remote HTTP-based servers, like Linear MCP, that connect your agent to remote services.

Pros and cons

One problem that MCP solves and that CLIs and skills do not is authentication: you can easily connect to MCP services using API keys or OAuth with the UX provided by your harness.

The downside of MCP is that, in practice, harnesses inject information about all MCP tools into the system prompt up front. This will sound less magical in later chapters, but in a nutshell: too many MCP servers will make your agent dumber, and the harness will take a long time to start. Claude Code is experimenting with lazy loading of MCP tools, but this is still an unstable feature that is also not available elsewhere.

Another problem is that MCP tools are not easily composable. Agents are trained heavily on Bash, and they are excellent at piping, awk, and jq. They can’t use these tools on MCP outputs.

What MCPs might I use?

The current consensus is to use MCPs mostly for connecting to external services, like Linear, Figma, Slack, or Sentry. A good starting point for seeing which popular MCPs are available is the GitHub MCP Registry.

Some MCPs used to be popular, but now have more efficient alternatives in the form of CLIs or Skills. Examples include:

GitHub MCP provides far too many tools and overloads the agent’s context. Models have great knowledge of the gh CLI and can do a lot by leveraging its JSON output and piping through Bash commands.
Context7 can be effectively replaced with domain-dedicated skills akin to vercel-react-best-practices, or by reading the docs directly, since more and more documentation websites support Content-Type: text/markdown responses.

Security considerations

MCP servers have security considerations similar to those of Skills and any other CLI or remote service. You are unlikely to use many MCPs, and the ones most commonly used are access points to well-known services. So… in such cases just make sure your agents will not perform destructive actions on your behalf 🙂.

Subagents

[Figure: Subagents visualized]

Subagents are extra agent runs that the main agent starts to handle a narrower piece of work. Their biggest advantage is not that they magically become a “database expert” or a “frontend engineer”. It is that they keep context separated.

This matters because long agent conversations degrade quickly. Exploration notes, false starts, and verbose tool output all compete with the actual task for attention. Offloading a bounded investigation to a subagent keeps the main thread smaller and gives you a cleaner final result.
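The context separation can be illustrated with a toy model. Nothing here calls a real agent: `Thread`, `run_subagent`, and `delegate` are hypothetical names, and the "exploration" is simulated filler. The sketch only shows the shape of the exchange, where the subagent accumulates a large scratch context and a single short summary crosses back into the main thread.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    """A conversation context, modeled as a list of messages."""
    messages: list[str] = field(default_factory=list)

    def size(self) -> int:
        """Total characters held in this context."""
        return sum(len(m) for m in self.messages)


def run_subagent(task: str) -> tuple[str, int]:
    """Hypothetical subagent: works in its own thread, returns a summary.

    Also returns the scratch context size, purely for illustration.
    """
    scratch = Thread([f"task: {task}"])
    # Stand-in for verbose exploration: tool output, false starts, notes.
    for i in range(50):
        scratch.messages.append(f"search output chunk {i}: " + "x" * 200)
    summary = "summary: 3 call sites found, all in one module"
    return summary, scratch.size()


def delegate(main: Thread, task: str) -> int:
    """Run the task in a subagent; only its summary enters the main thread."""
    summary, scratch_size = run_subagent(task)
    main.messages.append(summary)
    return scratch_size
```

After a call to `delegate`, the main thread has grown by one short message, while the thousands of characters of exploration stay in the discarded scratch thread.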

Create custom subagents - Claude Code Docs (Anthropic)
Cursor Subagents (Cursor)
Codex Multi-agents (OpenAI)

Hooks

Hooks are the part of the harness that runs deterministic logic around the agent. Different tools expose them differently, but the idea is the same: when something should happen every time, do not rely on the model to remember it.

Hooks are a better home for repetitive enforcement than AGENTS.md. Formatting code, running linters, asking for approval before sensitive commands, posting notifications, or opening a pull request are all examples of work that can often be attached to a well-placed hook.

A good hook reduces prompt clutter instead of adding to it. If the formatter succeeds, the agent usually does not need to hear about it. If a check fails, then the failure should come back with enough detail to guide the next step. This kind of back-pressure is useful because it keeps routine success paths out of the model’s way while still surfacing actionable problems. Written instructions should cover the cases that require judgment. Hooks should cover the cases that do not.
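The "quiet on success, detailed on failure" behavior can be sketched as a small wrapper script. This is a minimal sketch under stated assumptions: `run_check` and `main` are hypothetical names, and the way a harness passes events to a hook and interprets its exit code and stderr differs between tools, so check your harness’s hooks reference before wiring it up.

```python
import subprocess
import sys


def run_check(command: list[str]) -> tuple[int, str]:
    """Run a check and return (exit_code, message_for_agent)."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        # Success path: say nothing, so the agent's context stays clean.
        return 0, ""
    # Failure path: pass back the tool's own output so the agent can act on it.
    details = (result.stderr or result.stdout).strip()
    return result.returncode, f"{' '.join(command)} failed:\n{details}"


def main(argv: list[str]) -> int:
    """Entry point for a hook script; argv is the check command to run."""
    if not argv:
        print("usage: hook.py <command> [args...]", file=sys.stderr)
        return 1
    code, message = run_check(argv)
    if message:
        # Assumption: the harness surfaces a failing hook's stderr to the agent.
        print(message, file=sys.stderr)
    return code
```

To use it as a script, end the file with `sys.exit(main(sys.argv[1:]))` and register something like `python hook.py npm run lint` as the hook command, using whatever registration format your harness expects.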

Hooks reference - Claude Code Docs (Anthropic)
Hooks Docs (Cursor)

When to use what

If you are unsure which mechanism to reach for, use this rule of thumb:

Tool	Use it when
AGENTS.md	You constantly repeat specific lightweight information in your prompts.
Skills	You need reusable, named knowledge or workflows, or anything else not covered by the other mechanisms.
MCP	You need authenticated access to an external service that you use very frequently.
Subagents	You can delegate a bounded or parallelizable task to keep the main thread smaller.
Hooks	You have deterministic, mechanical logic meant to happen every time without depending on the model to remember it.