Loop Engineering, Chapter 4: From Laptop Loops to LLMOps

This is Part 4 of a series walking through my book Loop Engineering: Scaling and Governing Agentic AI. In the previous chapter, we chose an orchestration framework. This one is about graduating from a script that works to a system you can operate.

Almost anyone can build what I call a laptop loop: a script on a personal machine that wakes up, does agentic work, and produces something useful. The laptop loop is a genuine achievement and a terrible production system at the same time, and holding both of those thoughts at once is the start of this chapter. It runs under a personal credential that should never have touched a server. Its spending is unmonitored, so a loop stuck retrying a $0.10 call can become a $10,000 surprise before anyone notices. It crashes mid-run and leaves the world half-finished. It has no memory of yesterday and no record an auditor could inspect.
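The arithmetic behind that surprise is short: at $0.10 per call, $10,000 is 100,000 retries, a count an unattended loop can reach without anyone noticing. Here is a minimal sketch of the guard a laptop loop usually lacks. It assumes a hypothetical `call_model` callable and illustrative limits; none of these names come from the book, and the sketch assumes every attempt is billed whether or not it succeeds.

```python
from decimal import Decimal


class BudgetExceeded(RuntimeError):
    """Raised when the next attempt would push spend past the ceiling."""


def run_with_budget(call_model, cost_per_call, ceiling, max_attempts):
    """Retry a flaky call, stopping at an attempt cap or a spend ceiling.

    `call_model` is a hypothetical zero-argument callable that raises on
    failure. Assumption: every attempt is charged, including failed ones.
    Returns (result, total_spent) on success.
    """
    spent = Decimal("0")
    for attempt in range(1, max_attempts + 1):
        # Check the ceiling before spending, not after.
        if spent + cost_per_call > ceiling:
            raise BudgetExceeded(
                f"stopped at ${spent} after {attempt - 1} attempts"
            )
        spent += cost_per_call
        try:
            return call_model(), spent
        except Exception:
            continue  # retry, but only while the budget allows it
    raise RuntimeError(f"gave up after {max_attempts} attempts, spent ${spent}")
```

With a ceiling of `Decimal("0.50")` and a call that always fails, this stops after five attempts instead of one hundred thousand. Which number you choose matters less than the fact that a number exists and something enforces it.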

The scale multiplier is the part people underestimate. On your laptop, under your eye, every one of those failure modes is a tolerable annoyance. The moment the same loop runs unattended for someone else, each one becomes an operational emergency, because nobody is watching and the loop acts faster than any human could step in.

Why LLMOps is not just MLOps

It is tempting to assume the discipline that managed machine-learning systems will carry over, and some of it does. But managing a language system differs in four ways that matter. The model is external and mutable — it changes underneath you on a vendor's schedule. The output is language, which resists the clean numerical metrics MLOps was built around. The cost is a perpetual, variable token bill rather than a fixed training run. And the system takes actions in the world, which carries a kind of risk a prediction never did. LLMOps is the practice that grows up around those four differences, and pretending it is just MLOps with a new model is how teams get surprised.

💡 Key idea: The agent should never hold a provider key. Policy, budgets, and rate limits belong in one place that every agent passes through, rather than scattered piecemeal through application code where any one of them can be forgotten.

That one place is the AI gateway: a single door between every agent and every model provider. It centralizes model fallbacks so one provider's outage does not stop the system, enforces rate limits and spend ceilings centrally, vaults credentials so they never live in agent code, and gives you observability across every model, agent, and tool call at once. The chapter pairs it with a treatment of observability as a flight recorder — cost and token usage attached to every span — so that when something goes wrong you can reconstruct exactly what happened and what it cost.
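To make the shape concrete, here is a rough in-process sketch of that single door. It is not the declarative config from the book. It assumes hypothetical provider clients, plain callables that return `(text, tokens)` and keep their own credentials, so the agent only ever holds a reference to the gateway. The names `Gateway`, `Span`, and `complete` are mine, the per-1k-token prices are illustrative, and rate limiting is left out to keep it short.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Span:
    """One flight-recorder entry: who called what, and what it cost."""
    agent: str
    provider: str
    tokens: int
    cost_usd: float
    ok: bool
    started_at: float


class SpendCeilingReached(RuntimeError):
    """Raised when the gateway has already spent its budget."""


@dataclass
class Gateway:
    # Ordered (name, client, usd_per_1k_tokens) tuples; the first is primary.
    # Each client is a hypothetical callable holding its own credential.
    providers: list
    spend_ceiling_usd: float
    spent_usd: float = 0.0
    spans: list = field(default_factory=list)

    def complete(self, agent, prompt):
        # Enforce the ceiling once, here, for every agent.
        if self.spent_usd >= self.spend_ceiling_usd:
            raise SpendCeilingReached(f"spent ${self.spent_usd:.2f}")
        last_error = None
        for name, client, usd_per_1k in self.providers:
            started = time.time()
            try:
                text, tokens = client(prompt)
            except Exception as exc:
                # Record the failure, then fall back to the next provider.
                self.spans.append(Span(agent, name, 0, 0.0, False, started))
                last_error = exc
                continue
            cost = tokens / 1000 * usd_per_1k
            self.spent_usd += cost
            self.spans.append(Span(agent, name, tokens, cost, True, started))
            return text
        raise RuntimeError("all providers failed") from last_error
```

Because every call leaves a span, `sum(s.cost_usd for s in gateway.spans if s.agent == "agent-a")` answers what one agent cost, and the failed spans show which outage triggered a fallback. One simplification to note: the ceiling is checked before the call, so a single expensive request can overshoot it. A real gateway would likely estimate cost up front or reserve budget before dispatching.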

Tomorrow: operating a loop safely is not the same as trusting it. Chapter 5 opens the governance section with the security principle most agentic systems get wrong.

📖 Get the book

The full chapter — the laptop-loop failure catalog, the LLMOps-versus-MLOps breakdown, a declarative gateway config, and an observability sketch with cost accounting — in one place.

Get Loop Engineering on Amazon →

2026-06-18

Sho Shimoda
