The Engineering of Intent, Chapter 37: Context Scaling — Just-In-Time Retrieval for Million-Line Codebases
This is Part 37 of a series walking through my book The Engineering of Intent. In the previous chapter, we took the long view. Chapter 37 opens the Advanced Topics section of the book — the four chapters that extend core practice into harder territory. First up: Context Packs when the codebase is too big to hand-curate.
Hand-Authored Context Packs Don’t Scale
Chapters 9 and 23 assumed a tractable codebase. At five million lines, thirty teams, seven hundred services, that assumption breaks. Three limits hit at once:
- Breadth. A bug in checkout might touch pricing, tax, inventory, fulfillment. No single AGENTS.md can enumerate which files matter for every possible ticket.
- Drift. Every merge moves files. A pack perfect six months ago is subtly wrong today, catastrophically wrong next year. Human maintenance doesn’t keep pace.
- Specificity. “Fix the null pointer in checkout” and “add a discount code to checkout” need different context. A single pack serves neither well.
The fix: stop authoring packs by hand. Start retrieving them by query.
The Just-In-Time Context Pattern
JIT Context: instead of pre-specifying what the agent sees, specify a retrieval function that runs at task time and fetches the most relevant subset. Four steps:
- Agent given the task + a small durable core pack (10–20K tokens: AGENTS.md, conventions, top-level architecture).
- Agent (or a retrieval sidecar) issues queries against a vector index of the codebase, fetching the 10–50 most relevant files or snippets.
- Retrieved materials appended to the context for this task only.
- When the task completes, the retrieved context is discarded. Next task retrieves its own.
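The four steps above can be sketched in a few lines. This is a minimal illustration, not the book's implementation: `TaskContext`, `run_task`, and `keyword_retriever` are hypothetical names, and the keyword matcher is a toy stand-in for a real vector index.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Snippet:
    path: str
    text: str


@dataclass
class TaskContext:
    core_pack: str                                          # durable, same for every task
    retrieved: list[Snippet] = field(default_factory=list)  # lives for one task only

    def render(self) -> str:
        blocks = [self.core_pack] + [f"## {s.path}\n{s.text}" for s in self.retrieved]
        return "\n\n".join(blocks)


Retriever = Callable[[str, int], list[Snippet]]  # (query, top_k) -> snippets
Agent = Callable[[str, str], str]                # (task, context) -> result


def run_task(task: str, core_pack: str, retrieve: Retriever,
             agent: Agent, top_k: int = 20) -> str:
    ctx = TaskContext(core_pack)            # step 1: task + core pack
    ctx.retrieved = retrieve(task, top_k)   # steps 2-3: query, then append
    return agent(task, ctx.render())        # step 4: ctx is dropped on return


def keyword_retriever(corpus: dict[str, str]) -> Retriever:
    """Toy stand-in for a vector index: rank files by words shared with the query."""
    def retrieve(query: str, top_k: int) -> list[Snippet]:
        words = set(query.lower().split())
        scored = [(len(words & set(text.lower().split())), path)
                  for path, text in corpus.items()]
        hits = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
        return [Snippet(path, corpus[path]) for _, path in hits[:top_k]]
    return retrieve
```

Nothing retrieved for one task survives into the next call to `run_task`; only the core pack is passed in again.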
MCP as the Delivery Mechanism
The retrieval layer sits behind an MCP server exposing three or four tools: semantic search, find-related-files-for-a-path, find-files-by-team, find-recent-changes-in-directory. The agent calls these during the task; responses inject into context on the fly.
Two advantages of doing this via MCP rather than custom glue: the agent decides what to retrieve (matches planning to the task’s true shape, which varies ticket by ticket); and the retrieval server is shared across agents, frameworks, IDEs. Shared retrieval is shared ground truth. Divergent retrieval is divergent reality.
“One team owns the retrieval MCP server. Product teams consume it. Do not let each team build their own — you will end up with seven retrieval systems with subtly different ranking, and debugging cross-team context problems becomes impossible.”
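To make the tool surface concrete, here is a transport-agnostic sketch of the four tools named above. It deliberately uses no MCP SDK: `RetrievalTools`, `FileRecord`, and the `call` dispatcher are hypothetical, and each handler is a simple in-memory stand-in for whatever index a real server would query.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileRecord:
    path: str
    team: str
    changed_days_ago: int  # hypothetical metadata; a real server might derive it from git


class RetrievalTools:
    """Hypothetical tool surface; a real server would back these with an index."""

    def __init__(self, files: list[FileRecord]) -> None:
        self._files = files

    def semantic_search(self, query: str, limit: int = 10) -> list[str]:
        # Stand-in for vector search: count query words that appear in the path.
        words = set(query.lower().split())
        scored = [(sum(w in f.path.lower() for w in words), f.path) for f in self._files]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
        return [path for _, path in ranked[:limit]]

    def find_related_files_for_path(self, path: str) -> list[str]:
        # Stand-in for "related": other files in the same directory.
        parent = PurePosixPath(path).parent
        return [f.path for f in self._files
                if PurePosixPath(f.path).parent == parent and f.path != path]

    def find_files_by_team(self, team: str) -> list[str]:
        return [f.path for f in self._files if f.team == team]

    def find_recent_changes_in_directory(self, directory: str,
                                         max_age_days: int = 7) -> list[str]:
        prefix = directory.rstrip("/") + "/"
        return [f.path for f in self._files
                if f.path.startswith(prefix) and f.changed_days_ago <= max_age_days]

    def call(self, tool: str, **arguments) -> list[str]:
        # Single entry point, the way a tool-calling agent addresses a server.
        handlers = {
            "semantic_search": self.semantic_search,
            "find_related_files_for_path": self.find_related_files_for_path,
            "find_files_by_team": self.find_files_by_team,
            "find_recent_changes_in_directory": self.find_recent_changes_in_directory,
        }
        if tool not in handlers:
            raise KeyError(f"unknown tool: {tool}")
        return handlers[tool](**arguments)
```

Because every consumer goes through the same `call` entry point, ranking changes made by the owning team reach all agents at once.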
The Three Governors You Will Need
JIT Context has a failure mode static packs don’t: runaway retrieval. An agent with retrieval tools will, if untuned, retrieve until the context is full of plausible but marginally useful material. Three governors:
- Hard token budget per task (~60K regardless of query count). Forces selectivity; prevents “retrieve everything, read nothing.”
- Relevance threshold. Discard candidates that score below a similarity of ~0.7 before injection. Tune the cutoff against a labeled eval set.
- Observability. Log every query, every retrieved file, every task outcome. Monthly, refine retrieval against outcomes. Retrieval systems that aren’t tuned against outcomes drift toward mediocrity within a quarter.
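The three governors compose naturally into one admission function. The sketch below is an assumption about how they might fit together: `Candidate`, `govern`, and the logger name are hypothetical, and the 60K budget and 0.7 cutoff are just the rough figures quoted above, used as defaults.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("retrieval")  # hypothetical logger name


@dataclass(frozen=True)
class Candidate:
    path: str
    score: float  # similarity reported by the retrieval layer
    tokens: int   # size of the snippet once injected


def govern(candidates: list[Candidate], used_tokens: int = 0,
           budget_tokens: int = 60_000,
           min_score: float = 0.7) -> tuple[list[Candidate], int]:
    """Admit candidates best-first; return (admitted, updated token usage)."""
    admitted: list[Candidate] = []
    for c in sorted(candidates, key=lambda c: c.score, reverse=True):
        if c.score < min_score:
            # Governor 2: relevance threshold.
            log.info("drop %s: score %.2f below %.2f", c.path, c.score, min_score)
            continue
        if used_tokens + c.tokens > budget_tokens:
            # Governor 1: hard per-task budget.
            log.info("skip %s: %d tokens would exceed budget", c.path, c.tokens)
            continue
        admitted.append(c)
        used_tokens += c.tokens
        # Governor 3: every decision is logged for later review.
        log.info("admit %s: score %.2f, %d tokens", c.path, c.score, c.tokens)
    return admitted, used_tokens
```

Passing the returned `used_tokens` into the next call keeps the budget per task rather than per query, so a re-query cannot quietly reset it.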
A Concrete Pipeline (from a 3.8M-line Codebase)
- On every merge to main, a worker embeds changed files using a 1536-dim model. Full re-embed weekly. Stored in pgvector with HNSW. Cost: under $20/week.
- At task start, agent gets a 14K-token core pack, then calls the MCP server with three queries: semantic, git-blame proximity, symbol-name extraction.
- Server returns up to 30 reranked candidates. Agent selects 8–12 labeled primary / related / possible. Total context typically 40–60K.
- If agent can’t complete with retrieved context, it re-queries after reading its first file. Usually succeeds.
- Task outcome logged. Monthly review of which retrieved files were actually referenced in agent output. Unused = candidates for rank downweighting. Referenced-but-not-retrieved = candidates for new retrieval strategies.
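The monthly review in the last step reduces to two set differences per task. A minimal sketch, assuming each logged task records which paths were retrieved and which were referenced in the agent's output; `TaskLog` and `review` are hypothetical names, not part of the pipeline described above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskLog:
    task_id: str
    retrieved: frozenset[str]   # paths injected into context
    referenced: frozenset[str]  # paths that appear in the agent's output


def review(logs: list[TaskLog]) -> tuple[Counter, Counter]:
    """Count, across tasks, files retrieved but unused and files used but not retrieved."""
    unused: Counter = Counter()
    missed: Counter = Counter()
    for entry in logs:
        unused.update(entry.retrieved - entry.referenced)  # downweighting candidates
        missed.update(entry.referenced - entry.retrieved)  # new-strategy candidates
    return unused, missed
```

Sorting either counter with `most_common()` gives a ranked worklist: paths that are repeatedly retrieved and ignored at the top of one, paths the agent had to find on its own at the top of the other.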
On a smaller codebase, AGENTS.md files plus the Chapter 23 recipes will serve you better at zero operational cost. The rough threshold where JIT becomes worth it: ~500K lines of active code, or ~15 active engineers, or more than 20 services. The transitional phase between 500K and 2M lines is where teams often try to get by with “smart grep”. It works surprisingly well for a while and then stops working suddenly, usually during an incident.
Next up: Chapter 38, Multi-Agent Conflict Resolution. Once you have multiple agents collaborating on the same codebase, the question stops being “can they work together” and starts being “how do we break ties.” Chapter 38 walks the protocols for agentic tie-breaking.
📖 Want the full picture?
The chapter walks the full index/query/promotion pipeline, the MCP server design, the three governors with tuning guidance, the concrete 3.8M-line pipeline with numbers, and the scale-threshold decision chart.
Sho Shimoda