Autoresearch

You’ve already seen how a closed feedback loop gives agents more autonomy: tests and scripts let them self-correct without waiting for you. Autoresearch takes that idea further. Instead of fixing one known problem, the agent explores a space of possible improvements, runs experiments, and keeps what works.

It works especially well for optimization tasks where the goal can be expressed as a number.

How it works

You give the agent two things:

A task description — what to optimize, what constraints to respect, and what “success” means.
A benchmark script — something the agent runs after each experiment to get a measurable result.
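A benchmark script can be very small. The sketch below is a hypothetical example, not taken from any project referenced here: `workload` stands in for whatever code is being optimized, and the script prints a single number (median seconds per run) that the agent can read after each experiment.

```python
import statistics
import time


def workload() -> int:
    # Hypothetical stand-in for the code under optimization.
    return sum(i * i for i in range(10_000))


def benchmark(runs: int = 5) -> float:
    """Return the median wall-clock time of `workload` in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Print one number on its own line so it is easy to parse.
    print(f"{benchmark():.6f}")
```

Taking the median of several runs is one possible choice here; it makes the reported number less sensitive to a single slow run.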

The agent then runs a loop: propose a change, apply it, measure it, keep it or revert it, repeat. Each experiment is isolated, so results stay interpretable.
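The loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of any particular tool: `autoresearch_loop`, `propose`, and `measure` are hypothetical names, the "state" is a single number rather than a codebase, and a lower score is assumed to be better.

```python
import random
from typing import Callable, List, Tuple


def autoresearch_loop(
    state: float,
    propose: Callable[[float], float],
    measure: Callable[[float], float],
    iterations: int,
) -> Tuple[float, List[float]]:
    """Keep a proposed change only if it lowers the measured score."""
    best_score = measure(state)
    history = [best_score]
    for _ in range(iterations):
        candidate = propose(state)   # propose and apply one change
        score = measure(candidate)   # run the benchmark
        if score < best_score:       # keep it...
            state, best_score = candidate, score
        # ...otherwise revert: `state` is simply left unchanged
        history.append(best_score)
    return state, history


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy problem: find x that minimizes (x - 3)^2, starting from 0.
    final, history = autoresearch_loop(
        state=0.0,
        propose=lambda x: x + rng.uniform(-1.0, 1.0),
        measure=lambda x: (x - 3.0) ** 2,
        iterations=200,
    )
    print(f"final x = {final:.3f}, score = {history[-1]:.6f}")
```

Because each iteration changes exactly one thing and is measured against the current best, the recorded history never gets worse, and every accepted step can be attributed to a single proposal.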

…

Why it works

Three conditions make autoresearch effective:

A measurable goal. “Make it faster” becomes actionable when the agent can run a script and read a number. Without a benchmark, there’s no feedback loop.
A robust test suite. Tests let the agent discard changes that break correctness. Without them, the agent can’t safely move fast.
Isolated experiments. Trying one change at a time keeps results interpretable. When everything changes at once, you can’t tell what worked.

These conditions are not limited to performance work. Autoresearch can help with any goal you can express as script output.

karpathy/autoresearch. Andrej Karpathy. 2026-03-06.
Shopify/liquid: Performance: 53% faster parse+render, 61% fewer allocations. Simon Willison. 2026-03-13.

…
