The Engineering of Intent, Chapter 37: Context Scaling — Just-In-Time Retrieval for Million-Line Codebases

This is Part 37 of a series walking through my book The Engineering of Intent. In the previous chapter, we took the long view. Chapter 37 opens the Advanced Topics section of the book — the four chapters that extend core practice into harder territory. First up: Context Packs when the codebase is too big to hand-curate.

Hand-Authored Context Packs Don’t Scale

Chapters 9 and 23 assumed a tractable codebase. At five million lines, thirty teams, seven hundred services, that assumption breaks. Three limits hit at once:

Breadth. A bug in checkout might touch pricing, tax, inventory, fulfillment. No single AGENTS.md can enumerate which files matter for every possible ticket.
Drift. Every merge moves files. A pack perfect six months ago is subtly wrong today, catastrophically wrong next year. Human maintenance doesn’t keep pace.
Specificity. “Fix the null pointer in checkout” and “add a discount code to checkout” need different context. A single pack serves neither well.

The fix: stop authoring packs by hand. Start retrieving them by query.

The Just-In-Time Context Pattern

JIT Context: instead of pre-specifying what the agent sees, specify a retrieval function that runs at task time and fetches the most relevant subset. Four steps:

1. The agent is given the task plus a small, durable core pack (10–20K tokens: AGENTS.md, conventions, top-level architecture).
2. The agent (or a retrieval sidecar) issues queries against a vector index of the codebase, fetching the 10–50 most relevant files or snippets.
3. Retrieved materials are appended to the context for this task only.
4. When the task completes, the retrieved context is discarded. The next task retrieves its own.
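The four steps above can be sketched in a few lines. This is a minimal illustration, not the book's implementation: `Snippet`, `Retriever`, `build_task_context`, and `run_task` are hypothetical names, and the retriever is assumed to be any function that maps a task description to a ranked list of snippets.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Snippet:
    path: str
    text: str


# Hypothetical retrieval function: (task description, limit) -> snippets.
# In practice this would query a vector index; here it is just a callable.
Retriever = Callable[[str, int], List[Snippet]]


def build_task_context(task: str, core_pack: str, retrieve: Retriever,
                       max_snippets: int = 50) -> str:
    """Steps 1-3: durable core pack, the task, and freshly retrieved material."""
    snippets = retrieve(task, max_snippets)
    retrieved = "\n\n".join(f"# {s.path}\n{s.text}" for s in snippets)
    return f"{core_pack}\n\n## Task\n{task}\n\n## Retrieved\n{retrieved}"


def run_task(task: str, core_pack: str, retrieve: Retriever,
             agent: Callable[[str], str]) -> str:
    """Step 4: the context is a local value, so it is dropped on return."""
    context = build_task_context(task, core_pack, retrieve)
    return agent(context)
```

The design point is that nothing retrieved is stored on a long-lived object. Each call to `run_task` rebuilds the context from the core pack and a fresh retrieval, so one task's files cannot leak into the next.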

💡 Key discipline: Step 3 — appended for this task only. Context that persists across tasks drifts. Context rebuilt per task stays fresh. This is the opposite of the naive instinct to keep adding things to the agent’s “memory” so it “knows more over time.” Memory accumulates staleness. Retrieval stays current.

MCP as the Delivery Mechanism

The retrieval layer sits behind an MCP server exposing three or four tools: semantic search, find-related-files-for-a-path, find-files-by-team, find-recent-changes-in-directory. The agent calls these during the task; responses inject into context on the fly.

Two advantages of doing this via MCP rather than custom glue. First, the agent decides what to retrieve, so retrieval follows the task’s true shape, which varies ticket by ticket. Second, the retrieval server is shared across agents, frameworks, and IDEs. Shared retrieval is shared ground truth. Divergent retrieval is divergent reality.
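To make the tool surface concrete, here is a toy in-memory stand-in for the four tools. It is a sketch under loud assumptions: the class, method names, and signatures are invented for illustration and are not an MCP SDK API, "semantic search" is faked with word overlap, and "related" simply means "same directory". A real server would expose equivalent operations as MCP tools backed by an index.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileRecord:
    path: str
    team: str
    last_changed: int  # hypothetical field: days since the file last changed
    text: str


class RetrievalTools:
    """Toy version of the four retrieval tools; all names are illustrative."""

    def __init__(self, files: List[FileRecord]):
        self._files = files

    def semantic_search(self, query: str, limit: int = 10) -> List[str]:
        # Placeholder ranking: count shared lowercase words. A real server
        # would rank by embedding similarity instead.
        words = set(query.lower().split())
        scored = [(len(words & set(f.text.lower().split())), f.path)
                  for f in self._files]
        ranked = sorted((s for s in scored if s[0] > 0),
                        key=lambda s: (-s[0], s[1]))
        return [path for _, path in ranked[:limit]]

    def related_files(self, path: str) -> List[str]:
        # Placeholder notion of "related": files in the same directory.
        directory = path.rsplit("/", 1)[0]
        return sorted(f.path for f in self._files
                      if f.path != path
                      and f.path.rsplit("/", 1)[0] == directory)

    def files_by_team(self, team: str) -> List[str]:
        return sorted(f.path for f in self._files if f.team == team)

    def recent_changes(self, directory: str, max_age_days: int = 7) -> List[str]:
        prefix = directory.rstrip("/") + "/"
        return sorted(f.path for f in self._files
                      if f.path.startswith(prefix)
                      and f.last_changed <= max_age_days)
```

Keeping the surface this small is what makes a single shared server practical: every agent, framework, and IDE sees the same four operations and the same ranking.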

“One team owns the retrieval MCP server. Product teams consume it. Do not let each team build their own — you will end up with seven retrieval systems with subtly different ranking, and debugging cross-team context problems becomes impossible.”

The Three Governors You Will Need

JIT Context has a failure mode static packs don’t: runaway retrieval. An agent with retrieval tools will, if untuned, retrieve until the context is full of plausible but marginally useful material. Three governors:

Hard token budget per task (~60K regardless of query count). Forces selectivity; prevents “retrieve everything, read nothing.”
Relevance threshold. Discard candidates scoring below a similarity of ~0.7 before injection. Tune the cutoff against a labeled eval set.
Observability. Log every query, every retrieved file, every task outcome. Monthly, refine retrieval against outcomes. Retrieval systems that aren’t tuned against outcomes drift toward mediocrity within a quarter.
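A rough sketch of how the three governors might combine into one filter. The `Candidate` type, the `govern` function, and the logger name are hypothetical; the 60K budget and 0.7 threshold are the approximate defaults quoted above, and the similarity score is assumed to lie in [0, 1] with higher meaning more relevant.

```python
import logging
from dataclasses import dataclass
from typing import Iterable, List

logger = logging.getLogger("retrieval")  # hypothetical logger name


@dataclass(frozen=True)
class Candidate:
    path: str
    score: float  # similarity, assumed in [0, 1], higher is more relevant
    tokens: int   # token count of the snippet


def govern(candidates: Iterable[Candidate],
           token_budget: int = 60_000,
           min_score: float = 0.7) -> List[Candidate]:
    """Apply the relevance threshold, then the hard token budget, logging each decision."""
    kept: List[Candidate] = []
    used = 0
    # Highest-scoring candidates get first claim on the budget.
    for c in sorted(candidates, key=lambda c: c.score, reverse=True):
        if c.score < min_score:
            logger.info("drop %s: score %.2f below threshold", c.path, c.score)
            continue
        if used + c.tokens > token_budget:
            logger.info("drop %s: %d tokens would exceed budget", c.path, c.tokens)
            continue
        kept.append(c)
        used += c.tokens
        logger.info("keep %s: score %.2f, %d tokens used", c.path, c.score, used)
    return kept
```

One design choice worth noting: a candidate that does not fit is skipped rather than ending the loop, so a smaller, lower-ranked snippet can still use the remaining budget. Stopping at the first overflow is an equally defensible policy.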

A Concrete Pipeline (from a 3.8M-line Codebase)

1. On every merge to main, a worker embeds changed files using a 1536-dim model, with a full re-embed weekly. Embeddings are stored in pgvector with an HNSW index. Cost: under $20/week.
2. At task start, the agent gets a 14K-token core pack, then calls the MCP server with three queries: semantic, git-blame proximity, symbol-name extraction.
3. The server returns up to 30 reranked candidates. The agent selects 8–12 and labels each primary, related, or possible. Total context is typically 40–60K tokens.
4. If the agent can’t complete the task with the retrieved context, it re-queries after reading its first file. The second pass usually succeeds.
5. The task outcome is logged. A monthly review checks which retrieved files were actually referenced in agent output. Unused files are candidates for rank downweighting; referenced-but-not-retrieved files are candidates for new retrieval strategies.
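The monthly review in the last step reduces to two set differences per task. The sketch below assumes a hypothetical `TaskLog` record holding the paths retrieved for a task and the paths the agent's output actually referenced; how "referenced" is detected is left open here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class TaskLog:
    task_id: str
    retrieved: FrozenSet[str]   # paths injected into context
    referenced: FrozenSet[str]  # paths the agent's output actually used


def review(logs: Iterable[TaskLog]) -> Tuple[Counter, Counter]:
    """Count, per path, how often it was retrieved-but-unused or used-but-missed."""
    unused: Counter = Counter()
    missed: Counter = Counter()
    for log in logs:
        # Retrieved but never referenced: downweighting candidates.
        unused.update(log.retrieved - log.referenced)
        # Referenced but never retrieved: gaps in the retrieval strategy.
        missed.update(log.referenced - log.retrieved)
    return unused, missed
```

Calling `unused.most_common(20)` then gives a shortlist for downweighting, and `missed.most_common(20)` a shortlist of files the current queries fail to surface.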

⚠ When JIT Context is overkill: Do not build retrieval infrastructure for a 50K-line codebase. Three or four AGENTS.md files plus the Chapter 23 recipes will serve you better at zero operational cost. The rough threshold where JIT becomes worth it: ~500K lines of active code, or ~15 active engineers, or more than 20 services. The transitional phase between 500K and 2M is where teams often try to get by with “smart grep” — it works surprisingly well for a while and then stops working suddenly, usually during an incident.
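The rough threshold can be written down as a one-line check. This is only a restatement of the rule of thumb above; the function name is hypothetical, and the approximate figures are treated as hard cutoffs purely for illustration.

```python
def jit_context_worth_it(active_lines: int, active_engineers: int,
                         services: int) -> bool:
    """Rule of thumb: any one of the three signals is enough."""
    return (active_lines >= 500_000
            or active_engineers >= 15
            or services > 20)
```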

Next up — Chapter 38: Multi-Agent Conflict Resolution. Once you have multiple agents collaborating on the same codebase, the question stops being “can they work together” and starts being “how do we break ties.” Chapter 38 walks the protocols for agentic tie-breaking.

📖 Want the full picture?

The chapter walks the full index/query/promotion pipeline, the MCP server design, the three governors with tuning guidance, the concrete 3.8M-line pipeline with numbers, and the scale-threshold decision chart.

Get The Engineering of Intent on Amazon →

2026-05-23


Sho Shimoda
