Closing the loop

Efficient agent workflows depend on a closed feedback loop. Agents should be able to gather signals from tests, logs, and runtime checks continuously, without waiting for manual input on every step.

The tips below focus on building that loop so agents can diagnose and fix issues more autonomously.

  1. Make sure your agent writes a test for any regression it finds before attempting to fix the actual code. If it doesn’t do this by itself, consider telling it to in your AGENTS.md.
  2. Pay attention to how models assert the expected state. Many models tend to write leaky assertions, which only catch the exact issue they just reasoned about.
  1. Make your app tee logs to a *.log file. This will allow agents to observe runtime behavior. Models are also good at adding their own temporary logs while debugging.
  2. Make it easy for an agent to connect to your database via psql or sqlite3. You can even use this interface in place of database GUIs.
  3. Tidewave.
  1. Models are trained on a lot of Bash. They breathe it and are very productive when they can process data through shell one-liners or quick Python scripts.
  2. If you build a quick library for some remote API:
    1. Try to make it easy for agents to play with this API.
    2. In a non-interactive language (Go, Rust, Swift, etc.), consider asking your agent to whip up a quick CLI on top of your code.
    3. In JS, Python, Elixir, or Ruby, agents can efficiently use REPLs or one-off scripts.
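
To make the point about leaky assertions concrete, here is a minimal sketch. The `apply_discount` function and the bug it once had (a negative price) are hypothetical and exist only for illustration. The first check passes for almost any implementation that avoids the reported symptom; the second pins down the full expected state.

```python
def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical function under test: return the discounted price in cents."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be in [0, 100], got {percent}")
    return price_cents - (price_cents * percent) // 100


def leaky_check() -> None:
    # Leaky: only rules out the exact symptom from the bug report
    # (a negative price). A function that always returns 0 would pass too.
    assert apply_discount(1000, 100) >= 0


def strict_check() -> None:
    # Strict: asserts exact values across the range, plus rejection of
    # invalid input, so unrelated breakage is caught as well.
    assert apply_discount(1000, 0) == 1000
    assert apply_discount(1000, 25) == 750
    assert apply_discount(1000, 100) == 0
    for bad_percent in (-1, 101):
        try:
            apply_discount(1000, bad_percent)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for percent={bad_percent}")
```

When reviewing agent-written tests, a quick question to ask is whether a trivially wrong implementation would still pass.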

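One way to tee logs to a file, sketched with Python's standard `logging` module. The logger name `myapp` and the default path `app.log` are assumptions for illustration; the idea is simply that every record goes both to the terminal and to a file an agent can read or `tail`.

```python
import logging
import sys


def configure_logging(log_path: str = "app.log") -> logging.Logger:
    """Send every log record to both stderr and a file (a tee for logs)."""
    logger = logging.getLogger("myapp")  # hypothetical application logger name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # avoid duplicate lines via the root logger

    # Remove handlers from earlier calls so repeated setup does not duplicate output.
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)
        old_handler.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers = (
        logging.StreamHandler(sys.stderr),
        logging.FileHandler(log_path, encoding="utf-8"),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

An agent can then run the app, reproduce the problem, and inspect the file with ordinary shell tools instead of asking you to paste terminal output.
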
If you’re building a CLI that agents will use, design it for non-interactive use from the start:

  • Make it non-interactive. Every input needs a flag equivalent. Don’t drop into interactive prompts mid-execution; agents can’t press arrow keys.
  • Make --help useful. Include examples in every subcommand’s help output. Agents pattern-match off examples faster than they read descriptions.
  • Accept stdin. Agents think in pipelines and want to chain commands. Don’t require positional args in unusual orders.
  • Fail fast with actionable errors. If a required flag is missing, show the correct invocation immediately. Agents self-correct well when you give them something to work with.
  • Make commands idempotent. Agents retry often. Running the same command twice should be a no-op, not a duplicate action.
  • Add --dry-run for destructive actions. Let agents validate a plan before committing to it.
  • Add --yes to skip confirmations. Make the safe path the default, but allow bypassing it.
  • Use a predictable command structure. Pick a pattern (e.g. resource + verb) and use it everywhere. If an agent learns mycli service list, it should be able to guess mycli deploy list.
  • Return data on success. Output IDs and URLs, not just a success message.
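
The checklist above can be sketched with `argparse`. Everything here is hypothetical: the `mycli` name is borrowed from the example in the list, the `service create` and `service delete` subcommands, the in-memory store, and the `example.com` URLs are made up for illustration. Refusing to delete without `--yes` (instead of prompting) is one possible reading of "make the safe path the default".

```python
import argparse
import json
import sys

# Hypothetical in-memory store standing in for a real backend. It only
# lives for one process; a real CLI would talk to an API or database.
SERVICES: dict[str, dict] = {}

EXAMPLES = """examples:
  mycli service create --name api
  echo api | mycli service create --name -
  mycli service delete --name api --dry-run
  mycli service delete --name api --yes
"""


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="mycli",
        epilog=EXAMPLES,  # examples show up in --help output
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    # Predictable structure: resource first, then verb.
    resources = parser.add_subparsers(dest="resource", required=True)
    service = resources.add_parser("service", help="manage services")
    verbs = service.add_subparsers(dest="verb", required=True)

    create = verbs.add_parser("create", help="create a service (idempotent)")
    create.add_argument("--name", required=True,
                        help="service name, or '-' to read it from stdin")

    delete = verbs.add_parser("delete", help="delete a service")
    delete.add_argument("--name", required=True,
                        help="service name, or '-' to read it from stdin")
    delete.add_argument("--dry-run", action="store_true",
                        help="print the plan without deleting anything")
    delete.add_argument("--yes", action="store_true",
                        help="confirm the deletion up front")
    return parser


def main(argv=None, stdin=None) -> int:
    args = build_parser().parse_args(argv)
    stdin = stdin if stdin is not None else sys.stdin
    name = stdin.readline().strip() if args.name == "-" else args.name
    if not name:
        print("error: empty service name; pass --name NAME or pipe a name "
              "with --name -", file=sys.stderr)
        return 2

    if args.verb == "create":
        created = name not in SERVICES
        # Idempotent: a retry returns the existing record, not a duplicate.
        record = SERVICES.setdefault(
            name, {"id": f"svc-{name}", "url": f"https://example.com/services/{name}"}
        )
        print(json.dumps({**record, "created": created}))  # data, not "Success!"
        return 0

    # Remaining verb: delete.
    if args.dry_run:
        print(json.dumps({"would_delete": name, "exists": name in SERVICES}))
        return 0
    if not args.yes:
        # Fail fast with the exact invocation instead of prompting.
        print("error: refusing to delete without confirmation; rerun as: "
              f"mycli service delete --name {name} --yes", file=sys.stderr)
        return 2
    existed = SERVICES.pop(name, None) is not None
    print(json.dumps({"deleted": name, "existed": existed}))
    return 0
```

To run this as a script, call `raise SystemExit(main())` under the usual `if __name__ == "__main__":` guard. Missing required flags are handled by `argparse` itself, which prints the usage line and exits with status 2.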

Also, take a look at these skills:

  • tmux skill (Armin Ronacher, 2026-01-23): useful if you make or use interactive CLIs. Agents are pretty good at using GDB/LLDB via tmux.

Current frontier models are surprisingly capable of browsing websites, clicking around, and observing what happens, provided they are given the right tools.

Make it easy for agents to spawn a new instance of the app by themselves, ideally on a separate port so that multiple agents can work in parallel.
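
A minimal sketch of one way to do this: ask the OS for a free port, then start the app with that port. It assumes the app reads its port from a `PORT` environment variable, which is a common convention but not something every app supports; `spawn_app` and `find_free_port` are hypothetical helper names.

```python
import os
import socket
import subprocess


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def spawn_app(command: list[str]) -> tuple[subprocess.Popen, int]:
    """Start `command` with PORT set to a free port; return the process and port."""
    port = find_free_port()
    # Assumption: the app picks up its listening port from the PORT variable.
    env = {**os.environ, "PORT": str(port)}
    process = subprocess.Popen(command, env=env)
    return process, port
```

There is a small race here: another process could claim the port between `find_free_port` returning and the app binding to it. For local development that is usually acceptable; if it is not, have the app bind to port 0 itself and print the port it received.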

You can also tell the agent to play with a phone simulator.