The Engineering of Intent, Chapter 11: The Art of Agentic Debugging

This is Part 11 of a series walking through my book The Engineering of Intent. In the previous chapter, we walked through the five-layer quality gate stack that catches the structural problems. This chapter is about the bugs that slip through — the ones that don’t reproduce locally and whose causes are spread across hundreds of files you didn’t personally write.


Debugging in the AI-Native Regime Is Archaeology

In the AI-native regime, debugging is archaeology. The code may have been written by an agent you supervised loosely. The commit message may not describe what you were actually thinking that Tuesday afternoon. Chapter 11 gives you the tools to reconstruct causation after the fact — which is most of the job now.


The Self-Correction Loop

Reproduce, capture the full trace, hand it to the agent, ask for hypotheses and diagnostics, iterate. The loop converges in a few iterations when the human insists on root cause before accepting fixes. It fails when the human says “just fix it.”
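The loop above can be sketched in a few lines of Python. This is a hedged illustration, not the book's implementation: `ask` stands in for any agent call that returns text, and `confirm` stands in for the human's root-cause check (both names are hypothetical).

```python
def self_correction_loop(ask, confirm, trace, max_rounds=5):
    """Iterate hypotheses against a failure trace until the human
    confirms a root cause, or give up after max_rounds.

    ask:     callable(prompt: str) -> str, any agent/LLM wrapper
    confirm: callable(hypotheses: str) -> bool, the human gate that
             insists on root cause before accepting a fix
    """
    context = trace
    for _ in range(max_rounds):
        hypotheses = ask(
            "Given this failure trace, list hypotheses and the "
            "diagnostic that would discriminate between them:\n" + context
        )
        if confirm(hypotheses):
            return hypotheses          # root cause accepted; now fix
        # Feed the rejection back in rather than starting over
        context += "\nRejected hypotheses:\n" + hypotheses
    return None                        # did not converge; restate the problem
```

The key design point is that `confirm` sits between hypothesis and fix: the loop cannot exit on “just fix it,” only on an accepted root cause.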

💡 Key idea: Before re-reading the code, ask the agent to generate a Mermaid diagram of the control flow. Many bugs that are invisible in text become immediately apparent in a diagram. This one trick has saved me dozens of hours over the past year, and it’s the cheapest move in the chapter.

Bisection, Observability, and the Caching Heisenbug

Two disciplines the chapter argues are non-negotiable:

  • Learn git bisect cold. Scripts that automate bisection on common failure modes pay for themselves within a month. In AI-native codebases, the offending commit is often second-order; bisect to the commit, then use the agent to trace the true cause.
  • Observability as substrate. Agentic debugging is only as good as the traces you can feed. Structured logs, distributed traces, state event histories. If you cannot produce the full sequence of events for a reported bug in one query, fix observability before you fix the bug.
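Automating bisection mostly means writing a predicate that `git bisect run` can call per commit. A minimal sketch, assuming your reproducer is a command that exits nonzero on failure; the names `bisect_verdict` and `run_reproducer`, the tag `v1.4.0`, and the pytest invocation are all hypothetical:

```python
import subprocess

def bisect_verdict(test_returncode: int) -> int:
    """Map a reproducer's exit code to what `git bisect run` expects:
    0 = good commit, 1-124 = bad commit, 125 = cannot test here (skip)."""
    if test_returncode == 0:
        return 0
    if test_returncode >= 126:   # harness itself failed to start; skip commit
        return 125
    return 1

def run_reproducer(cmd):
    """Run the smallest test that reproduces the failure."""
    return subprocess.run(cmd).returncode

# Saved as e.g. bisect_check.py and driven by (sketch):
#   git bisect start && git bisect bad HEAD && git bisect good v1.4.0
#   git bisect run python bisect_check.py
# where bisect_check.py ends with:
#   sys.exit(bisect_verdict(run_reproducer(
#       ["python", "-m", "pytest", "tests/test_cache.py", "-x", "-q"])))
```

Exit code 125 is the piece people miss: it tells bisect to skip commits where the build itself is broken, instead of mislabeling them good or bad.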

“An intermittent bug for one customer. Hours of manual investigation turned up nothing. The agent, given the request history and cache configuration, hypothesized an edge case in cache key computation conflating users whose IDs differed by trailing whitespace. Reproduced in fifteen minutes, fixed in twenty more, and codified a ‘trim at the boundary’ convention that made that class of bug impossible.”
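The shape of that bug, and of the convention that killed it, can be shown in a few lines. A minimal sketch, assuming the cache key is a hash of the user ID; `cache_key` and `normalize_id` are hypothetical names, not the book's code:

```python
import hashlib

def cache_key(user_id: str) -> str:
    # Key computed from the raw ID: "alice" and "alice " produce
    # different keys, so a stray trailing space quietly splits (or,
    # on the lookup path, crosses) one user's cache entries.
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()

def normalize_id(raw: str) -> str:
    # "Trim at the boundary": strip whitespace once, where input
    # enters the system, so every later layer sees one canonical ID.
    return raw.strip()
```

The convention matters more than the fix: normalizing at the boundary means no downstream layer ever has to wonder whether an ID is raw or clean.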


The Three Debugging Anti-Patterns

  • The just-fix-it anti-pattern — accepting a fix without confirming the root cause.
  • The context-overflow anti-pattern — adding more files when the question itself is malformed.
  • The swap-the-model anti-pattern — reaching for a different model instead of restating the problem.

⚠ What not to do in a live incident: Do not let agents execute autonomous commands against production without human confirmation. Do not refactor while debugging — those are two operations, not one. Do not close the incident before conventions are updated to prevent the class of bug. Incidents that close without a convention update will recur, usually worse, usually in a less convenient week.

Next up — Chapter 12: The GenDD Pod. Part V of the book zooms out from the individual and the codebase to the team. Chapter 12 introduces the GenDD Pod — the team structure that actually ships AI-native work sustainably — and the handful of roles whose job descriptions have changed the most in the last two years.


📖 Want the full picture?

The chapter walks the full Reproduce → Hypothesize → Diagnose → Root Cause → Fix loop, the bisection-under-velocity playbook, the observability checklist, the caching heisenbug case study in full, the three debugging anti-patterns with recovery tactics, and the live-incident runbook.

Get The Engineering of Intent on Amazon →

2026-04-27

Sho Shimoda

I share and organize what I’ve learned and experienced.