The Forward Deployed Engineer, Chapter 5: The AI and Agentic Frontier
This is Part 5 of a series walking through my book The Forward Deployed Engineer. In the previous chapter, we set the technical bar. This one adds the AI-specific layer on top — the skills that separate a senior FDE in 2026 from a senior FDE three years ago.
The first wave of LLM deployments, which ran roughly from late 2022 through 2024, was dominated by chatbots. A model would sit behind an interface, take a question, and produce an answer. The shape was simple, the failure modes were mostly visible, and the operational discipline a team needed to deploy one was a manageable extension of what they already knew. The 2026 wave is something different. Systems plan, retrieve, call tools, escalate, and act inside operations. Agentic is the word, and the architectural line between a chatbot and an agent is sharper than it sounds. Almost every skill that was “nice to have” in 2023 is load-bearing now, and the FDE who hasn’t kept up is, in practice, two product generations behind.
The dominant pattern for grounding agents in customer-specific data is still some form of retrieval-augmented generation (RAG), and it is worth being precise about why it works and where it breaks. RAG works because it sidesteps the fine-tuning problem: you don’t need to retrain the model on the customer’s data if you can retrieve the relevant slice at inference time. But the retrieval stack has its own failure modes, and they are the ones that quietly degrade deployments long after the launch demo. Chunking strategies that worked on the design partners’ documents stop working when the documents get longer. Recency rules that worked in February have quietly gone stale by June. Retrieval poisoning, whether deliberate or accidental, is a vector almost nobody is testing for. In most deployments I’ve worked on, the deployment actually depends on the retrieval stack, not the model.
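To make the recency failure concrete, here is a minimal sketch of a toy retriever with a fixed freshness window. Everything in it is an assumption for illustration: the `Chunk` type, the word-overlap `score`, the 90-day window, and the dates are all hypothetical, and a real stack would use embeddings and a proper index. The point is only that the same rule that returns the right chunk in February returns nothing in June, without any error being raised.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Chunk:
    text: str
    updated: date  # last time the source document changed


def score(query: str, chunk: Chunk) -> int:
    """Toy relevance: number of query words that also appear in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.text.lower().split())
    return len(query_words & chunk_words)


def retrieve(query: str, chunks: list, today: date,
             max_age_days: int, k: int = 2) -> list:
    """Return up to k relevant chunks updated within the freshness window."""
    cutoff = today - timedelta(days=max_age_days)
    fresh = [c for c in chunks if c.updated >= cutoff]
    ranked = sorted(fresh, key=lambda c: score(query, c), reverse=True)
    return [c for c in ranked[:k] if score(query, c) > 0]


# Hypothetical corpus: both documents were last updated in January.
corpus = [
    Chunk("refund policy allows returns within 30 days", date(2026, 1, 10)),
    Chunk("shipping policy covers domestic orders", date(2026, 1, 15)),
]

# The same 90-day rule, evaluated four months apart.
feb = retrieve("refund policy", corpus, today=date(2026, 2, 15), max_age_days=90)
june = retrieve("refund policy", corpus, today=date(2026, 6, 15), max_age_days=90)

print(len(feb), len(june))
```

In June the policy document is still current, but the rule treats it as expired and the model answers with no grounding at all. A check that alerts on empty or shrinking retrieval results is one inexpensive way to catch this.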
Multi-Agent Orchestration and the Evals Discipline
The biggest architectural shift of the last two years has been the move from single-model deployments to multi-agent systems — planner-executor pairs, supervisor-worker fleets, peer-review loops, and the agent fleets that some of the more aggressive deployments now run. Each pattern has its own strengths and its own failure surface, and what mature teams have learned is that the orchestration discipline is what makes them work. Timeouts, budgets, tie-breaking protocols, escalation paths. Without those, a multi-agent system is a runaway loop waiting to happen, and the cost-per-call math turns nasty in a hurry when it does.
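The orchestration guards described above can be sketched as a loop that checks a step budget, a cost budget, and a deadline before every agent step, and escalates instead of continuing when any of them is exhausted. This is a minimal illustration, not a reference design: `Budget`, `Outcome`, `run_agent`, and the `step` callable are hypothetical names, and the callable stands in for whatever planner-executor call a real system would make.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Budget:
    max_steps: int
    max_cost_cents: int
    deadline_s: float


@dataclass
class Outcome:
    status: str  # "done" or "escalated"
    reason: str
    steps: int
    cost_cents: int


def run_agent(step: Callable[[int], Tuple[bool, int]], budget: Budget,
              clock: Callable[[], float] = time.monotonic) -> Outcome:
    """Run `step` until it reports completion or a budget runs out.

    `step(i)` is a stand-in for one agent action; it returns
    (finished, cost_in_cents).
    """
    start = clock()
    steps = 0
    cost = 0
    while True:
        # Guards run before each step, so a runaway loop stops at the limit.
        if steps >= budget.max_steps:
            return Outcome("escalated", "step budget exhausted", steps, cost)
        if cost >= budget.max_cost_cents:
            return Outcome("escalated", "cost budget exhausted", steps, cost)
        if clock() - start >= budget.deadline_s:
            return Outcome("escalated", "deadline exceeded", steps, cost)
        finished, step_cost = step(steps)
        steps += 1
        cost += step_cost
        if finished:
            return Outcome("done", "task completed", steps, cost)


budget = Budget(max_steps=5, max_cost_cents=100, deadline_s=60.0)

# An agent that finishes on its third step, at 2 cents per step.
ok = run_agent(lambda i: (i == 2, 2), budget)

# An agent that never finishes: the step budget turns it into an escalation.
runaway = run_agent(lambda i: (False, 2), budget)

print(ok.status, ok.steps, runaway.status, runaway.reason)
```

An "escalated" outcome is where a human handoff or a supervisor agent would take over. The sketch deliberately leaves that part out.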
Of all the new disciplines, the one that most distinguishes a senior FDE in 2026 from one in 2023 is evals. The model is whoever shipped this week. The evals are what tell you whether anything has actually improved. The chapter treats evals as a product surface rather than a test artifact, because that is the right operational stance — you build the eval suite the way you build a feature, you maintain it the way you maintain a service, and you ship it the way you ship code. The failure modes are recognizable. Overfitting to the eval, where the system passes the suite and fails in production. Eval debt, where the suite stops reflecting the customer’s actual workflow because nobody updated it after the workflow drifted. And the “our eval looks great” trap, where a flat eval curve hides a falling production curve because the eval is no longer measuring the right thing. The Eval-Customer Split we get to in Chapter 7 depends on the foundation laid here.
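One way to picture eval debt is to run the same scoring function over the maintained suite and over a labeled sample of recent production traffic, then compare the two pass rates. The sketch below assumes such a labeled sample exists. The names `EvalCase`, `pass_rate`, and `suite_is_stale`, the exact-match scoring, and the 0.10 tolerance are hypothetical choices for illustration, not the method from the book.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str


def pass_rate(system: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of cases where the system's answer matches exactly."""
    if not cases:
        raise ValueError("cannot score an empty set of cases")
    passed = sum(
        1 for c in cases
        if system(c.prompt).strip().lower() == c.expected.strip().lower()
    )
    return passed / len(cases)


def suite_is_stale(eval_rate: float, prod_rate: float,
                   tolerance: float = 0.10) -> bool:
    """Flag the suite when it scores much better than production does."""
    return (eval_rate - prod_rate) > tolerance


# Stand-in for the deployed system: it only knows the original workflow.
answers = {
    "reset password": "send reset link",
    "cancel order": "open cancellation form",
}


def system(prompt: str) -> str:
    return answers.get(prompt, "unknown")


# The suite was written for the original workflow and never updated.
suite = [EvalCase(p, a) for p, a in answers.items()]

# Production has drifted: customers now ask for things the suite never covers.
production_sample = suite + [
    EvalCase("pause subscription", "open pause form"),
    EvalCase("change plan", "open plan picker"),
]

eval_rate = pass_rate(system, suite)
prod_rate = pass_rate(system, production_sample)
print(eval_rate, prod_rate, suite_is_stale(eval_rate, prod_rate))
```

The suite still reports a perfect score while half the production sample fails. That is the flat eval curve hiding a falling production curve, in miniature.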
Latency, Cost, and the Model Underneath
The chapter closes on the operational economics that the agentic architectures expose. The cost-per-call math that worked in the prototype breaks at scale; a system that calls the model three times per user action is cheap at a hundred users and ruinous at ten thousand. The latency budget that worked in the prototype breaks in the critical path; a 1.2-second response is fine for a chat interface and unusable for a real-time copilot. And the discipline of model-agnostic deployment — building so that the underlying model can be swapped without rebuilding the system — matters more in 2026 than it did in 2023, because in 2026 the underlying model changes every quarter and the deployments that didn’t plan for that are the deployments now being torn apart and rebuilt. Tomorrow: the soft stack, which is the part of the skillset that all this technical fluency rests on top of.
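The cost arithmetic and the model-swap discipline from this section can be sketched together. The example below assumes a hypothetical `ChatModel` interface that application code depends on instead of any vendor SDK. The prices, user counts, and actions per day are invented numbers chosen to keep the arithmetic easy to follow, and `StubModel` is a placeholder for a real adapter.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatModel(Protocol):
    """Hypothetical interface the rest of the system codes against."""
    name: str
    cents_per_call: int

    def complete(self, prompt: str) -> str: ...


@dataclass
class StubModel:
    """Placeholder adapter; a real one would wrap a vendor client."""
    name: str
    cents_per_call: int

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] reply to: {prompt}"


def monthly_cost_usd(model: ChatModel, users: int,
                     actions_per_user_per_day: int,
                     calls_per_action: int = 3, days: int = 30) -> float:
    """Model spend per month, in dollars, for a given usage level."""
    calls = users * actions_per_user_per_day * calls_per_action * days
    return calls * model.cents_per_call / 100


def answer(model: ChatModel, question: str) -> str:
    """Application code sees only the interface, never a specific vendor."""
    return model.complete(question)


model_a = StubModel(name="model-a", cents_per_call=1)
model_b = StubModel(name="model-b", cents_per_call=2)

# Three calls per user action, 20 actions per user per day (assumed).
prototype = monthly_cost_usd(model_a, users=100, actions_per_user_per_day=20)
at_scale = monthly_cost_usd(model_a, users=10_000, actions_per_user_per_day=20)
print(prototype, at_scale)  # 1800.0 180000.0

# Swapping the model is a one-argument change, and the cost model follows.
print(answer(model_b, "where is my order?"))
print(monthly_cost_usd(model_b, users=10_000, actions_per_user_per_day=20))
```

Under these assumed numbers the bill grows a hundredfold with the user count, and the only lever left at scale is fewer or cheaper calls per action. Keeping the model behind an interface is what makes it possible to pull that lever without a rebuild.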
📖 Get the book
The full AI-and-agentic treatment — RAG patterns, multi-agent architectures, evals as a discipline, adversarial testing, and model-agnostic deployment — in one place.
Sho Shimoda
I share and organize what I’ve learned and experienced.