Autoresearch

You’ve already seen how a closed feedback loop makes agents more autonomous — tests and scripts let them self-correct without waiting for you. Autoresearch takes that idea further. Instead of fixing one known problem, the agent explores a space of potential improvements on its own, running experiments and keeping what works.

It’s particularly effective for optimization tasks where you can express the goal as a number.

You give the agent two things:

  • A task description — what to optimize, what constraints to respect, and what “success” means.
  • A benchmark script — something the agent runs after each experiment to get a measurable result.
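A benchmark script can be as simple as a program that prints a single number. A minimal sketch in Python, where `work()` is a hypothetical stand-in for the real workload being optimized:

```python
import time


def work() -> int:
    # Hypothetical workload under optimization; replace with the
    # real code path you want the agent to speed up.
    return sum(range(100_000))


def benchmark() -> float:
    # Time the workload and return elapsed seconds. The agent runs
    # this after each experiment and reads the single number.
    start = time.perf_counter()
    work()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"{benchmark():.6f}")
```

The only contract that matters is stable, machine-readable output: one number the agent can compare across runs.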

The agent then runs a loop: propose a change, apply it, measure it, keep it or revert it, repeat. Each experiment is isolated, so results stay interpretable.
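The loop above can be sketched in a few lines. This is an illustrative skeleton, not a prescribed implementation; the `propose`, `apply_change`, `revert`, and `benchmark` callables are assumed hooks into whatever tooling the agent actually uses:

```python
from typing import Callable, List


def autoresearch(
    propose: Callable[[], str],           # yields one candidate change
    apply_change: Callable[[str], None],  # applies it to the codebase
    revert: Callable[[str], None],        # undoes it
    benchmark: Callable[[], float],       # lower is better here
    iterations: int = 10,
) -> List[str]:
    """Propose -> apply -> measure -> keep or revert, repeated."""
    best = benchmark()
    kept: List[str] = []
    for _ in range(iterations):
        change = propose()
        apply_change(change)  # one change at a time keeps results interpretable
        score = benchmark()
        if score < best:      # improvement: keep the change
            best = score
            kept.append(change)
        else:                 # regression or no gain: back it out
            revert(change)
    return kept
```

Because each candidate is applied and measured alone before the next one is proposed, every kept change is individually accountable for a measured improvement.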

Three conditions make autoresearch effective:

  • A measurable goal. “Make it faster” becomes actionable when the agent can run a script and read a number. Without a benchmark, there’s no feedback loop.
  • A robust test suite. Tests let the agent discard changes that break correctness. Without them, the agent can’t safely move fast.
  • Isolated experiments. Trying one change at a time keeps results interpretable. If everything changes at once, you can’t tell what worked.

These conditions apply broadly — autoresearch works for performance, but also for any goal you can express as a script output.
