Closing the loop

Efficient agent workflows depend on a closed feedback loop. Agents should be able to gather signals from tests, logs, and runtime checks continuously, without waiting for manual input at every step.

The tips below focus on building that loop so agents can diagnose and fix issues more autonomously.

Tests, tests, tests

Make sure your agent writes a test that reproduces any regression it finds before it attempts to fix the actual code. If it doesn’t do this on its own, consider adding an instruction to your AGENTS.md.
Pay attention to how models assert the expected state. Many models tend to write leaky assertions, which only catch the exact issue they just reasoned about.
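To make the difference concrete, here is a minimal sketch of a leaky assertion next to a stricter one. The `Cart` class and the regression it describes (removing a missing SKU used to raise `KeyError`) are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cart:
    """Hypothetical shopping cart used only for this illustration."""
    items: dict[str, int] = field(default_factory=dict)

    def add(self, sku: str, qty: int = 1) -> None:
        self.items[sku] = self.items.get(sku, 0) + qty

    def remove(self, sku: str) -> None:
        # Assumed regression fix: removing a missing SKU used to raise KeyError.
        self.items.pop(sku, None)


def test_remove_missing_sku_leaky() -> None:
    # Leaky: only proves the crash is gone, says nothing about the cart afterwards.
    cart = Cart()
    cart.add("apple", 2)
    cart.remove("pear")  # passes as long as no exception is raised


def test_remove_missing_sku_full_state() -> None:
    # Stricter: pins the whole expected state, so a "fix" that empties the cart
    # or drops unrelated items would fail here.
    cart = Cart()
    cart.add("apple", 2)
    cart.remove("pear")
    assert cart.items == {"apple": 2}
    cart.remove("apple")
    assert cart.items == {}
```

The leaky test would still pass if `remove` were "fixed" by clearing the whole cart. The second test would not.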

…

Backend logs and database access

Make your app tee logs to a *.log file. This lets agents observe runtime behavior. Models are also good at adding their own temporary logs while debugging.
Make it easy for an agent to connect to your database with psql or sqlite3. You can even use this interface instead of a database GUI.
Tidewave.
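As one possible way to tee logs, the sketch below uses Python's standard `logging` module to send every record to both the console and a file. The logger name `app` and the default path `app.log` are placeholders, not names from any particular project.

```python
import logging
import sys


def configure_logging(log_path: str = "app.log") -> logging.Logger:
    """Send every log record to both stderr and a file (a simple 'tee')."""
    logger = logging.getLogger("app")  # "app" is a placeholder logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid stacking duplicate handlers on repeat calls

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers = (
        logging.StreamHandler(sys.stderr),                 # what a human sees
        logging.FileHandler(log_path, encoding="utf-8"),   # what an agent can read later
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

After calling `configure_logging()` once at startup, an agent can read or `tail` the file while the app keeps printing to the terminal as before.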

…

Leverage CLIs

Models are trained a lot on Bash. They breathe it and can be very productive when they process data through shell one-liners or quick Python scripts.
If you build a quick library for some remote API:
1. Try to make it easy for agents to play with this API.
2. In a non-interactive language (Go, Rust, Swift, etc.), consider asking your agent to whip up a quick CLI on top of your code.
3. In JS, Python, Elixir, or Ruby, agents can efficiently use REPLs or one-off scripts.

If you’re building a CLI that agents will use, design it for non-interactive use from the start:

Make it non-interactive. Every input needs a flag equivalent. Don’t drop into interactive prompts mid-execution — agents can’t press arrow keys.
Make --help useful. Include examples in each subcommand’s help output. Agents pattern-match off examples faster than they read descriptions.
Accept stdin. Agents think in pipelines and want to chain commands. Don’t require positional args in unusual orders.
Fail fast with actionable errors. If a required flag is missing, show the correct invocation immediately. Agents self-correct well when you give them something to work with.
Make commands idempotent. Agents retry often. Running the same command a second time should be a no-op, not a duplicate action.
Add --dry-run for destructive actions. Let agents validate a plan before committing to it.
Add --yes to skip confirmations. Make the safe path the default, but allow bypassing it.
Use a predictable command structure. Pick a pattern, such as resource + verb, and use it everywhere. If an agent learns mycli service list, it should be able to guess mycli deploy list.
Return data on success. Output IDs and URLs, not just a success message.

Building CLIs for agents Eric Zakariasson. 2026-03-25

Also, take a look at these skills:

tmux skill Armin Ronacher. 2026-01-23 - useful if you make or use interactive CLIs. Agents are pretty good at using GDB/LLDB via tmux.

…

Automating web frontend QA

Current frontier models are surprisingly capable of browsing websites, clicking around, and observing what happens, provided they are given the right tools.

Try using one of these tools:

Make it easy for agents to spawn a new instance of the app by themselves. Ideally, each instance should run on a separate port so multiple agents can work in parallel.
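One way to give each instance its own port is to let the operating system pick a free one. The sketch below assumes the app accepts its port as the last command-line argument. That convention is an assumption, and both helper names are hypothetical.

```python
import socket
import subprocess


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def spawn_instance(command: list[str], host: str = "127.0.0.1") -> tuple[subprocess.Popen, int]:
    """Start `command` with a freshly chosen port appended as its last argument."""
    port = find_free_port(host)
    # Caveat: there is a small window between releasing the port here and the
    # child process binding it, so another process could grab it first.
    process = subprocess.Popen([*command, str(port)])
    return process, port
```

For example, `spawn_instance([sys.executable, "-m", "http.server"])` starts Python's built-in static file server on a free port as a stand-in for a real app. The caller gets back the process handle and the port, so each agent can point its browser tooling at its own instance and terminate it afterwards.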

Also, take a look at these skills:

vercel-react-best-practices skill Vercel. 2026-01-16

…

Automating mobile app QA

You can also tell the agent to play with a phone simulator.

Take a look at:

Argent Software Mansion.
Radon AI
XcodeBuildMCP Sentry.
callstackincubator/agent-device: CLI to control iOS and Android devices for AI agents Callstack.

Also, have a look at these skills:

expo/skills Expo.
react-native-best-practices skill Callstack.

…
