Autoresearch

You’ve already seen how a closed feedback loop gives agents more autonomy: tests and scripts let them self-correct without waiting for you. Autoresearch takes that idea further. Instead of fixing one known problem, the agent explores a space of possible improvements, runs experiments, and keeps what works.

It works especially well for optimization tasks where the goal can be expressed as a number, such as run time or memory use.

You give the agent two things:

  • A task description — what to optimize, what constraints to respect, and what “success” means.
  • A benchmark script — something the agent runs after each experiment to get a measurable result.
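
As an illustration only, a benchmark script can be as small as the sketch below. The `workload` function is a hypothetical stand-in for whatever code is being optimized, and the choice of median wall-clock time in milliseconds is an assumption, not something the approach requires.

```python
import statistics
import time


def workload() -> int:
    # Hypothetical stand-in for the code under optimization.
    return sum(i * i for i in range(100_000))


def benchmark(fn, repeats: int = 5) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


if __name__ == "__main__":
    # Print a single number so the result is trivial to parse.
    print(f"{benchmark(workload):.3f}")
```

Printing one number and nothing else keeps the feedback unambiguous: the agent runs the script, reads the value, and compares it with the previous run.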

The agent then runs a loop: propose a change, apply it, measure it, keep it or revert it, repeat. Each experiment is isolated, so results stay interpretable.
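
The loop can be sketched in a few lines. This is a toy model under stated assumptions: a "change" is a small dictionary of settings, `fake_cost` is a made-up metric where lower is better, and "revert" simply means leaving the current best untouched. A real agent would edit files and run the benchmark script instead.

```python
from typing import Callable, Iterable

Config = dict[str, int]


def run_experiments(
    baseline: Config,
    proposals: Iterable[Config],
    measure: Callable[[Config], float],
) -> tuple[Config, float]:
    """Try each proposed change on its own; keep it only if the score drops."""
    best = dict(baseline)
    best_score = measure(best)
    for change in proposals:
        candidate = {**best, **change}  # apply one change to the current best
        score = measure(candidate)      # measure it
        if score < best_score:          # keep it if it improved the score
            best, best_score = candidate, score
        # otherwise revert: `best` is left unchanged
    return best, best_score


def fake_cost(cfg: Config) -> float:
    # Hypothetical metric with its minimum at batch=8, workers=4.
    return abs(cfg["batch"] - 8) + abs(cfg["workers"] - 4)


best, score = run_experiments(
    {"batch": 1, "workers": 1},
    [{"batch": 8}, {"workers": 16}, {"workers": 4}],
    fake_cost,
)
print(best, score)  # {'batch': 8, 'workers': 4} 0
```

The second proposal makes the score worse and is discarded, so the third one is still measured against a known-good state.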

Three conditions make autoresearch effective:

  • A measurable goal. “Make it faster” becomes actionable when the agent can run a script and read a number. Without a benchmark, there’s no feedback loop.
  • A robust test suite. Tests let the agent discard changes that break correctness. Without them, the agent can’t safely move fast.
  • Isolated experiments. Trying one change at a time keeps results interpretable. When everything changes at once, you can’t tell what worked.

These conditions are not limited to performance work. Autoresearch can help with any goal you can express as script output.

