The Forward Deployed Engineer, Chapter 5: The AI and Agentic Frontier

This is Part 5 of a series walking through my book The Forward Deployed Engineer. In the previous chapter, we set the technical bar. This one adds the AI-specific layer on top — the skills that separate a senior FDE in 2026 from a senior FDE three years ago.

The first wave of LLM deployments, which ran roughly from late 2022 through 2024, was dominated by chatbots. A model would sit behind an interface, take a question, and produce an answer. The shape was simple, the failure modes were mostly visible, and the operational discipline a team needed to deploy one was a manageable extension of what they already knew. The 2026 wave is something different. Systems plan, retrieve, call tools, escalate, and act inside operations. Agentic is the word, and the architectural line between a chatbot and an agent is sharper than it sounds. Almost every skill that was “nice to have” in 2023 is load-bearing now, and the FDE who hasn’t kept up is, in practice, two product generations behind.
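
The architectural line is easiest to see in code. What follows is a minimal sketch, not any framework's API: `scripted_model` is a hypothetical stand-in for an LLM call, and `lookup_order` is a made-up tool. The chatbot makes one call and returns whatever text comes back. The agent loops: it executes the tool the model asks for and feeds the result back, until the model produces a final answer or a step limit forces an escalation.

```python
from typing import Callable

# Hypothetical tool registry; the tool and its output are illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_order": lambda order_id: f"order {order_id}: shipped",
}


def scripted_model(history: list[str]) -> str:
    """Stand-in for an LLM call (an assumption, not a real API).

    Replies with either 'TOOL <name> <arg>' or 'FINAL <answer>'.
    """
    last = history[-1]
    if last.startswith("user: "):
        return "TOOL lookup_order 1234"
    if last.startswith("tool: "):
        return "FINAL " + last.removeprefix("tool: ")
    return "FINAL unsure"


def chatbot(question: str) -> str:
    # One call, one reply. A tool request in the reply is just text; nothing executes it.
    return scripted_model([f"user: {question}"])


def agent(question: str, max_steps: int = 5) -> str:
    history = [f"user: {question}"]
    for _ in range(max_steps):
        reply = scripted_model(history)
        if reply.startswith("FINAL "):
            return reply.removeprefix("FINAL ")
        # Parse 'TOOL <name> <arg>' and actually run the tool.
        _, tool_name, arg = reply.split(" ", 2)
        history.append(f"tool: {TOOLS[tool_name](arg)}")
    # Step limit reached: hand off to a human instead of looping forever.
    return "ESCALATE: step limit reached"
```

The loop only terminates safely because of `max_steps`. Drop it, and a model that keeps requesting tools never returns.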

The dominant pattern for grounding agents in customer-specific data is still some form of retrieval-augmented generation (RAG), and it is worth being precise about why it works and where it breaks. RAG works because it sidesteps the fine-tuning problem: you don’t need to retrain the model on the customer’s data if you can retrieve the relevant slice at inference time. But the retrieval stack has its own failure modes, and they are the ones that quietly degrade deployments long after the launch demo. Chunking strategies that worked on the design partners’ documents stop working when the documents get longer. Recency rules that worked in February quietly go stale by June. Retrieval poisoning, deliberate or accidental, is a vector almost nobody is testing for. In most deployments I’ve worked on, the retrieval stack, not the model, is what the deployment actually depends on.
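
A minimal sketch of the retrieval side, under loud assumptions: a naive fixed-size word chunker, a toy word-overlap score standing in for a real embedding model, and a hypothetical `max_age_days` recency rule. None of these names come from a specific library. The point is that chunk size and the recency window are plain parameters, tuned once against one corpus, and nothing in the code warns you when they stop fitting.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    updated: date  # last-modified date of the source document


def chunk_document(doc_id: str, text: str, updated: date, size: int = 200) -> list[Chunk]:
    """Split a document into fixed-size word windows (deliberately naive)."""
    words = text.split()
    return [
        Chunk(doc_id, " ".join(words[i:i + size]), updated)
        for i in range(0, len(words), size)
    ]


def score(query: str, chunk: Chunk) -> int:
    # Toy relevance: number of shared lowercase words. A real stack would use embeddings.
    return len(set(query.lower().split()) & set(chunk.text.lower().split()))


def retrieve(query: str, chunks: list[Chunk], today: date,
             k: int = 3, max_age_days: int = 120) -> list[Chunk]:
    # Recency rule: ignore chunks whose source is older than max_age_days.
    fresh = [c for c in chunks if (today - c.updated).days <= max_age_days]
    # Stable sort, so ties keep their original order.
    ranked = sorted(fresh, key=lambda c: score(query, c), reverse=True)
    # Keep the top k, dropping anything with no overlap at all.
    return [c for c in ranked[:k] if score(query, c) > 0]
```

With `size=4`, the query "refund window" against a January 2026 policy and a policy from January 2025 returns only the first chunk of the newer one. Widen `max_age_days` to 500 and the stale policy comes back alongside it with an identical score. Poisoned content would enter through the same `chunks` list, which is why it is worth testing what lands there.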

Multi-Agent Orchestration and the Evals Discipline

The biggest architectural shift of the last two years has been the move from single-model deployments to multi-agent systems — planner-executor pairs, supervisor-worker fleets, peer-review loops, and the agent fleets that some of the more aggressive deployments now run. Each pattern has its own strengths and its own failure surface, and what mature teams have learned is that the orchestration discipline is what makes them work. Timeouts, budgets, tie-breaking protocols, escalation paths. Without those, a multi-agent system is a runaway loop waiting to happen, and the cost-per-call math turns nasty in a hurry when it does.
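
Here is one way to put those guardrails around a peer-review loop. It is a sketch under stated assumptions: `draft` and `review` are hypothetical callables standing in for executor and reviewer agents, each round is assumed to cost exactly two model calls, and the per-call price is a made-up input. It covers call and cost budgets, a wall-clock deadline, and escalation. Tie-breaking between disagreeing agents is left out.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Budget:
    max_calls: int        # hard cap on model calls for the whole task
    max_cost_usd: float   # hard cap on spend
    deadline_s: float     # wall-clock limit for the whole task
    calls: int = 0
    cost_usd: float = 0.0
    started: float = field(default_factory=time.monotonic)

    def stop_reason(self, n_calls: int, cost: float) -> Optional[str]:
        """Return why the next step must not run, or None if it may."""
        if self.calls + n_calls > self.max_calls:
            return "call budget exhausted"
        if self.cost_usd + cost > self.max_cost_usd:
            return "cost budget exhausted"
        if time.monotonic() - self.started > self.deadline_s:
            return "deadline exceeded"
        return None

    def charge(self, n_calls: int, cost: float) -> None:
        self.calls += n_calls
        self.cost_usd += cost


def review_loop(task: str,
                draft: Callable[[str, int], str],
                review: Callable[[str], bool],
                budget: Budget,
                cost_per_call: float) -> tuple[str, str]:
    """Executor drafts, reviewer checks; finish on approval, escalate on any limit."""
    round_no = 0
    while True:
        # Assumption: each round is exactly one draft call plus one review call.
        reason = budget.stop_reason(2, 2 * cost_per_call)
        if reason is not None:
            return "ESCALATE", reason
        candidate = draft(task, round_no)
        approved = review(candidate)
        budget.charge(2, 2 * cost_per_call)
        if approved:
            return "DONE", candidate
        round_no += 1
```

A reviewer that never approves cannot keep the loop running forever: it exhausts a budget and the task comes back as an escalation instead of a bill. The deadline is only checked between rounds, so a single hung call still needs its own timeout at the client layer.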

Of all the new disciplines, the one that most distinguishes a senior FDE in 2026 from one in 2023 is evals. The model is whoever shipped this week. The evals are what tell you whether anything has actually improved. The chapter treats evals as a product surface rather than a test artifact, because that is the right operational stance — you build the eval suite the way you build a feature, you maintain it the way you maintain a service, and you ship it the way you ship code. The failure modes are recognizable. Overfitting to the eval, where the system passes the suite and fails in production. Eval debt, where the suite stops reflecting the customer’s actual workflow because nobody updated it after the workflow drifted. And the “our eval looks great” trap, where a flat eval curve hides a falling production curve because the eval is no longer measuring the right thing. The Eval-Customer Split we get to in Chapter 7 depends on the foundation laid here.
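
A sketch of what treating evals as a product surface can look like in code. It assumes exact-match grading, a hypothetical `workflow_version` tag on every case, and a labeled sample of production traffic to compare against; gathering that sample is the hard part and is not shown. The report flags two of the failure modes above. The first is a suite that scores well above production, which points to a possible overfit or an eval that has stopped measuring the right thing. The second is a set of cases written for a workflow the customer no longer runs, which is eval debt. The 10-point `max_gap` threshold is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str
    workflow_version: str  # hypothetical tag: the customer workflow this case was written for


def pass_rate(system: Callable[[str], str], cases: list[EvalCase]) -> float:
    # Exact-match grading keeps the sketch short; real suites often need rubric or model-based graders.
    if not cases:
        return 0.0
    passed = sum(1 for c in cases if system(c.prompt).strip() == c.expected)
    return passed / len(cases)


def health_report(system: Callable[[str], str],
                  suite: list[EvalCase],
                  production_sample: list[EvalCase],
                  current_workflow: str,
                  max_gap: float = 0.10) -> dict:
    suite_score = pass_rate(system, suite)
    prod_score = pass_rate(system, production_sample)
    return {
        "suite_pass_rate": suite_score,
        "production_pass_rate": prod_score,
        # Suite far ahead of labeled production traffic: overfit, or measuring the wrong thing.
        "suspected_overfit": suite_score - prod_score > max_gap,
        # Eval debt: cases still tied to a workflow the customer has moved past.
        "stale_cases": sum(1 for c in suite if c.workflow_version != current_workflow),
    }
```

Running this on every model swap and tracking the two pass rates side by side over time is one way to catch a flat eval curve sitting on top of a falling production curve.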

💡 Key idea: Evals are the single capability that distinguishes a senior FDE in 2026 from a senior FDE in 2023. The model is whoever shipped this week. The evals are what tell you whether anything has actually improved.

Latency, Cost, and the Model Underneath

The chapter closes on the operational economics that agentic architectures expose. The cost-per-call math that worked in the prototype breaks at scale: a system that calls the model three times per user action is cheap at a hundred users and ruinous at ten thousand. The latency budget that worked in the prototype breaks in the critical path: a 1.2-second response is fine for a chat interface and unusable for a real-time copilot. And the discipline of model-agnostic deployment, building so that the underlying model can be swapped without rebuilding the system, matters more in 2026 than it did in 2023. The underlying model now changes every quarter, and the deployments that didn’t plan for that are the ones being torn apart and rebuilt. Tomorrow: the soft stack, the part of the skillset that all this technical fluency rests on.
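
The arithmetic behind the cost and latency claims is simple enough to sketch. Every number here is hypothetical: price per million tokens, tokens per call, per-call latency, and usage rates. Substitute your provider's pricing and your own measurements. Keeping the model's economics in a `ModelProfile` value (a name made up for this sketch) is one small piece of model-agnostic design: swapping backends changes data, not code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    # Hypothetical backend numbers; replace with real pricing and measured latency.
    name: str
    usd_per_million_tokens: float
    latency_ms_per_call: int


def monthly_cost_usd(model: ModelProfile, users: int, actions_per_user_per_day: int,
                     calls_per_action: int, tokens_per_call: int, days: int = 30) -> float:
    # Cost scales linearly with every factor, so 100x the users means 100x the bill.
    calls = users * actions_per_user_per_day * calls_per_action * days
    return calls * tokens_per_call * model.usd_per_million_tokens / 1_000_000


def action_latency_ms(model: ModelProfile, calls_per_action: int) -> int:
    # Sequential calls in the critical path add up; parallel calls would cost
    # roughly the slowest single call instead.
    return model.latency_ms_per_call * calls_per_action


def fits_budget(model: ModelProfile, calls_per_action: int, budget_ms: int) -> bool:
    return action_latency_ms(model, calls_per_action) <= budget_ms
```

With made-up inputs of 20 actions per user per day, 2,000 tokens per call, and $5 per million tokens, 100 users cost $1,800 a month and 10,000 users cost $180,000. Three sequential 400 ms calls add up to 1,200 ms, which is comfortable for a chat reply and far too slow for a copilot with a budget of a few hundred milliseconds. Swapping in a different `ModelProfile` reruns the same math for a new backend.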

📖 Get the book

The full AI-and-agentic treatment — RAG patterns, multi-agent architectures, evals as a discipline, adversarial testing, and model-agnostic deployment — in one place.

Get The Forward Deployed Engineer on Amazon →

2026-05-31

forward-deployed-engineer

agentic-ai

rag

evals

multi-agent

model-agnostic

book-series

Sho Shimoda

I share and organize what I’ve learned and experienced.

