The Engineering of Intent, Chapter 10: The Five-Layer Quality Gate Stack

This is Part 10 of a series walking through my book The Engineering of Intent. In the previous chapter, we covered context engineering — shaping what the agent reads at the start of every session. Part IV of the book turns to defense-in-depth: what catches the inevitable mistakes before they reach production.


Every AI-Generated Change Must Pass Five Gates Before a Human Sees It

Every AI-generated change must pass through a stack of automated checks before a human even sees it. The stack has five layers. None of them is optional.

Chapter 10 walks through each layer, tunes the strictness dial, and works through the fintech case study where this exact stack cut cycle time from two hours to twenty-two minutes and reduced production incidents threefold.


The Five Layers, Compressed

  1. Static Linting. Catches hallucinated imports, unused variables, obvious bugs. Modern linters run on every change without noticeable latency. The most valuable lint rule in AI-native projects is the “unknown import” rule — agents occasionally invent libraries or reference APIs that do not exist.
  2. Strict Type Checking. Types are the fastest form of executable documentation. When an agent changes a function signature, a strict type checker immediately identifies every caller that needs updating. Strict everywhere possible: TypeScript strict, Pyright strict, Sorbet with strict sigils.
  3. SAST and Security Scans. AI-generated code has characteristic vulnerabilities: string-concatenated SQL, sensitive data in debug logs, “temporarily” disabled CSRF. Semgrep, Snyk, CodeQL. Maintain a custom ruleset for your organization’s anti-patterns.
  4. Automated Test Synthesis. Agent-written tests cover the happy path; adversarial tests require human judgment. Require both. The combination covers more surface than either alone.
  5. Agentic End-to-End Testing. Autonomous browser agents exercise the application as users would. Use as a final gate before merge, not a per-commit check. Keep the scope small.
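Layer 1's unknown-import rule is simple enough to sketch. The snippet below is an illustrative check, not any particular linter's implementation: it walks a module's AST and flags imports that don't resolve in the current environment, which is exactly the symptom of an agent inventing a library.

```python
import ast
import importlib.util

def find_unknown_imports(source: str) -> list[str]:
    """Flag imports whose top-level module cannot be resolved in this
    environment -- a common symptom of an agent hallucinating a library."""
    unknown = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # resolve only the top-level package
            try:
                if importlib.util.find_spec(root) is None:
                    unknown.append(name)
            except (ImportError, ValueError):
                unknown.append(name)
    return unknown
```

A real deployment would run this against installed dependencies in CI rather than the linter's own environment, but the principle is the same: an import that resolves nowhere is a hard failure, not a warning.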
💡 Key idea: The layers aren’t interchangeable. They’re ordered by speed and determinism: fast deterministic checks run on every commit; slow probabilistic checks run on merge queues with explicit retry budgets. A team I worked with went from a 30% flake rate to under 5% by separating these two tiers, which made the signal trustworthy again.
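The two-tier split can be made concrete. This is a minimal sketch under assumed names (`Check`, `run_commit_tier`, `run_merge_tier` are illustrative, not from any CI product): deterministic checks run on every commit with no retries, while probabilistic checks run only at the merge queue with an explicit retry budget.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Check:
    name: str
    run: Callable[[], bool]
    deterministic: bool

def run_commit_tier(checks: list[Check]) -> bool:
    """Per-commit tier: fast deterministic checks only, zero retries.
    A failure here is a real failure, never a flake."""
    return all(c.run() for c in checks if c.deterministic)

def run_merge_tier(checks: list[Check], retry_budget: int = 2) -> bool:
    """Merge-queue tier: probabilistic checks get an explicit retry budget,
    so a transient flake does not block the queue -- but only up to the budget."""
    for check in checks:
        if check.deterministic:
            continue  # already gated at commit time
        if not any(check.run() for _ in range(1 + retry_budget)):
            return False
    return True
```

The key design choice is that the retry budget is explicit and bounded: retries hide flakes only in the tier where flakes are expected, and never in the deterministic tier.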

The Anti-Patterns That Quietly Kill Gates

  • The aspirational gate that exists in warning-only mode forever. Delete it or fix it.
  • The Swiss-army gate that bundles everything into one slow, serial job. Parallelize.
  • The override culture where the bypass button is used weekly. Instrument overrides; expect fewer than one a month on a healthy team.
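"Instrument overrides" can be as simple as grouping bypass events by month and flagging anything above the healthy baseline. A hedged sketch (the function name and threshold default are illustrative):

```python
from collections import Counter
from datetime import date

def flag_override_months(overrides: list[date], threshold: int = 1) -> list[str]:
    """Group gate bypasses by month and return months above the threshold.
    A healthy team should see fewer than one override per month."""
    by_month = Counter(d.strftime("%Y-%m") for d in overrides)
    return sorted(month for month, count in by_month.items() if count > threshold)
```

Feeding this from your CI audit log turns "override culture" from a vibe into a number you can review monthly.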

“Gate strictness is a dial, not a switch. Run a monthly gate review: classify every firing as true or false positive; tune any rule with over ten percent false positives. Do not add a new rule in response to a single incident — wait for a second occurrence to prove the class exists.”
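The monthly gate review described above can be mechanized. Assuming each gate firing has been classified as a true or false positive during the review, this sketch (illustrative names throughout) surfaces the rules that exceed the ten-percent false-positive cutoff:

```python
def rules_to_tune(firings: list[tuple[str, bool]],
                  max_fp_rate: float = 0.10) -> list[str]:
    """Given (rule_name, was_true_positive) pairs from one month of gate
    firings, return the rules whose false-positive rate exceeds the cutoff."""
    totals: dict[str, int] = {}
    false_pos: dict[str, int] = {}
    for rule, was_true in firings:
        totals[rule] = totals.get(rule, 0) + 1
        if not was_true:
            false_pos[rule] = false_pos.get(rule, 0) + 1
    return sorted(
        rule for rule, total in totals.items()
        if false_pos.get(rule, 0) / total > max_fp_rate
    )
```

The classification step still requires human judgment; the point of the script is only that the tuning decision becomes arithmetic rather than argument.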


The Reviewer Agent Pattern

A reviewer agent, run independently of the author agent, reads the diff and emits comments. Give it a different model when possible — different failure modes approximate classical pair review. Reviewer agents catch consistency violations across large diffs and Context Pack contradictions. They do not replace humans on product taste.
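The independence constraint is the load-bearing part of the pattern, so it is worth enforcing in code. This is a minimal sketch under an assumed "prompt in, text out" model interface (the `Model` type, function names, and prompt wording are all hypothetical, not any real provider's API):

```python
from typing import Callable

# Hypothetical interface: a model is just "prompt in, text out".
Model = Callable[[str], str]

def review_diff(diff: str, reviewer: Model,
                author_model_name: str, reviewer_model_name: str) -> list[str]:
    """Run a reviewer agent independent of the author agent. Refuses to run
    if both roles share a model -- same model, same blind spots."""
    if reviewer_model_name == author_model_name:
        raise ValueError("reviewer must use a different model than the author")
    prompt = (
        "You did not write this change. Review the diff for consistency "
        "violations across files and contradictions with the Context Pack:\n"
        + diff
    )
    # One comment per non-empty line of the reviewer's response.
    return [line for line in reviewer(prompt).splitlines() if line.strip()]
```

Making the model-diversity check a hard error, rather than a convention, mirrors the rest of the stack: constraints that live in documentation get skipped; constraints that live in code get enforced.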

⚠️ What not to do: Do not ship without types where the language supports them. Do not rely on code coverage as a quality signal. Do not let the author agent silence its own lint violations. Do not combine reviewer and author agents into one run. Do not let legacy exceptions accumulate without a retirement budget. Each of these failures invalidates a layer of the stack; each has cost a team I’ve worked with a production incident.

Next up — Chapter 11: The Art of Agentic Debugging. Gates catch the structural problems. But the interesting bugs in AI-native systems are the ones that slip through gates and then don’t reproduce locally. Chapter 11 is about the self-correction loop, bisection under velocity, observability as substrate, and the incident debugging practices that scale to AI-native throughput.


📖 Want the full picture?

The chapter walks each layer in depth with concrete tool recommendations and configuration snippets, the flaky-pipeline case study (30% to sub-5% flake rate), the regulated fintech rollout over a quarter with month-by-month sequencing, the full anti-pattern catalog, and the reviewer agent pattern with the model-diversity configuration that catches bugs a single model would miss.

Get The Engineering of Intent on Amazon →

2026-04-26

Sho Shimoda

I share and organize what I’ve learned and experienced.