The Engineering of Intent, Chapter 37: Context Scaling — Just-In-Time Retrieval for Million-Line Codebases

This is Part 37 of a series walking through my book The Engineering of Intent. In the previous chapter, we took the long view. Chapter 37 opens the Advanced Topics section of the book — the four chapters that extend core practice into harder territory. First up: Context Packs when the codebase is too big to hand-curate.


Hand-Authored Context Packs Don’t Scale

Chapters 9 and 23 assumed a tractable codebase. At five million lines, thirty teams, seven hundred services, that assumption breaks. Three limits hit at once:

  • Breadth. A bug in checkout might touch pricing, tax, inventory, fulfillment. No single AGENTS.md can enumerate which files matter for every possible ticket.
  • Drift. Every merge moves files. A pack perfect six months ago is subtly wrong today, catastrophically wrong next year. Human maintenance doesn’t keep pace.
  • Specificity. “Fix the null pointer in checkout” and “add a discount code to checkout” need different context. A single pack serves neither well.

The fix: stop authoring packs by hand. Start retrieving them by query.


The Just-In-Time Context Pattern

JIT Context: instead of pre-specifying what the agent sees, specify a retrieval function that runs at task time and fetches the most relevant subset. Four steps:

  1. Agent given the task + a small durable core pack (10–20K tokens: AGENTS.md, conventions, top-level architecture).
  2. Agent (or a retrieval sidecar) issues queries against a vector index of the codebase, fetching 10–50 most relevant files or snippets.
  3. Retrieved materials appended to the context for this task only.
  4. When the task completes, the retrieved context is discarded. Next task retrieves its own.
💡 Key discipline: retrieved context is appended for this task only (Step 3) and discarded when the task completes (Step 4). Context that persists across tasks drifts. Context rebuilt per task stays fresh. This is the opposite of the naive instinct to keep adding things to the agent’s “memory” so it “knows more over time.” Memory accumulates staleness. Retrieval stays current.
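
The four steps can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: `Snippet`, `TaskContext`, `run_task`, and the `retrieve` and `agent` callables are all hypothetical names standing in for whatever retrieval sidecar and agent runtime you use. The point it shows is that the context object is built per task and never reused.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Snippet:
    path: str
    text: str


# Hypothetical signatures: retrieve(task, top_k) -> snippets, agent(task, context) -> result.
Retriever = Callable[[str, int], List[Snippet]]
Agent = Callable[[str, str], str]


@dataclass
class TaskContext:
    core_pack: str                               # small, durable, hand-authored
    retrieved: List[Snippet] = field(default_factory=list)

    def render(self) -> str:
        parts = [self.core_pack]
        parts += [f"# {s.path}\n{s.text}" for s in self.retrieved]
        return "\n\n".join(parts)


def run_task(task: str, core_pack: str, retrieve: Retriever, agent: Agent,
             top_k: int = 30) -> str:
    ctx = TaskContext(core_pack=core_pack)       # step 1: task + core pack
    ctx.retrieved = retrieve(task, top_k)        # steps 2-3: query, append for this task
    result = agent(task, ctx.render())
    # Step 4: ctx is local to this call, so nothing retrieved here
    # leaks into the next task.
    return result
```

Because `TaskContext` is created inside `run_task`, there is no shared state for stale material to accumulate in; the only thing that survives between tasks is the core pack.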

MCP as the Delivery Mechanism

The retrieval layer sits behind an MCP server exposing three or four tools: semantic search, find-related-files-for-a-path, find-files-by-team, find-recent-changes-in-directory. The agent calls these during the task; responses inject into context on the fly.

Two advantages of doing this via MCP rather than custom glue. First, the agent decides what to retrieve, so retrieval follows the task’s true shape, which varies ticket by ticket. Second, the retrieval server is shared across agents, frameworks, and IDEs. Shared retrieval is shared ground truth. Divergent retrieval is divergent reality.
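
To make the tool surface concrete, here is a rough sketch of the four tools as plain Python methods over an in-memory index. Everything in it is an assumption for illustration: the names `RetrievalTools`, `FileRecord`, and `call_tool` are hypothetical, the word-overlap ranking is a stand-in for real vector search, and "related" is reduced to "same directory". Wiring these methods into an actual MCP server is deliberately left out.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class FileRecord:
    path: str
    team: str
    last_changed: date
    text: str


class RetrievalTools:
    def __init__(self, files: List[FileRecord]):
        self._files = files

    def semantic_search(self, query: str, limit: int = 10) -> List[str]:
        # Stand-in for embedding search: rank by number of shared words.
        words = set(query.lower().split())
        scored = [(len(words & set(f.text.lower().split())), f.path)
                  for f in self._files]
        ranked = sorted((s for s in scored if s[0] > 0),
                        key=lambda s: (-s[0], s[1]))
        return [path for _, path in ranked[:limit]]

    def find_related_files(self, path: str) -> List[str]:
        # Simplest possible notion of "related": files in the same directory.
        parent = PurePosixPath(path).parent
        return sorted(f.path for f in self._files
                      if PurePosixPath(f.path).parent == parent and f.path != path)

    def find_files_by_team(self, team: str) -> List[str]:
        return sorted(f.path for f in self._files if f.team == team)

    def find_recent_changes(self, directory: str, since: date) -> List[str]:
        prefix = directory.rstrip("/") + "/"
        return sorted(f.path for f in self._files
                      if f.path.startswith(prefix) and f.last_changed >= since)

    def call_tool(self, name: str, arguments: Dict[str, Any]) -> List[str]:
        # One dispatch point, so every agent and IDE hits the same ranking logic.
        tools: Dict[str, Callable[..., List[str]]] = {
            "semantic_search": self.semantic_search,
            "find_related_files": self.find_related_files,
            "find_files_by_team": self.find_files_by_team,
            "find_recent_changes": self.find_recent_changes,
        }
        if name not in tools:
            raise KeyError(f"unknown tool: {name}")
        return tools[name](**arguments)
```

The single `call_tool` entry point is the part worth keeping even if every method body changes: it is what makes the ranking behaviour identical for every consumer.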

“One team owns the retrieval MCP server. Product teams consume it. Do not let each team build their own — you will end up with seven retrieval systems with subtly different ranking, and debugging cross-team context problems becomes impossible.”


The Three Governors You Will Need

JIT Context has a failure mode static packs don’t: runaway retrieval. An agent with retrieval tools will, if untuned, retrieve until the context is full of plausible but marginally useful material. Three governors:

  1. Hard token budget per task (~60K regardless of query count). Forces selectivity; prevents “retrieve everything, read nothing.”
  2. Relevance threshold. Discard below a similarity score of ~0.7 before injection. Tune against a labeled eval set.
  3. Observability. Log every query, every retrieved file, every task outcome. Monthly, refine retrieval against outcomes. Retrieval systems that aren’t tuned against outcomes drift toward mediocrity within a quarter.
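
A minimal sketch of the three governors applied together, under stated assumptions: `Candidate`, `govern`, and `approx_tokens` are hypothetical names, the four-characters-per-token estimate is a crude placeholder for a real tokenizer, and the defaults simply mirror the ~60K budget and ~0.7 threshold quoted above. It filters by score, fills the budget greedily from the best candidate down, and logs every decision.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Iterable, List

logger = logging.getLogger("retrieval")


@dataclass(frozen=True)
class Candidate:
    path: str
    text: str
    score: float  # similarity score, higher is more relevant


def approx_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token. Swap in a real tokenizer.
    return max(1, len(text) // 4)


def govern(candidates: Iterable[Candidate],
           token_budget: int = 60_000,
           min_score: float = 0.7,
           count_tokens: Callable[[str], int] = approx_tokens) -> List[Candidate]:
    kept: List[Candidate] = []
    used = 0
    for c in sorted(candidates, key=lambda c: c.score, reverse=True):
        # Governor 2: relevance threshold.
        if c.score < min_score:
            logger.info("drop %s: score %.2f below %.2f", c.path, c.score, min_score)
            continue
        # Governor 1: hard token budget, independent of how many queries ran.
        cost = count_tokens(c.text)
        if used + cost > token_budget:
            logger.info("drop %s: %d tokens would exceed budget", c.path, cost)
            continue
        kept.append(c)
        used += cost
        # Governor 3: observability. Every injected file is logged.
        logger.info("keep %s: score %.2f, %d tokens", c.path, c.score, cost)
    return kept
```

Skipping an oversized candidate rather than stopping at it lets smaller, still-relevant files use the remaining budget; whether that is the right trade-off is something to check against your own eval set.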

A Concrete Pipeline (from a 3.8M-line Codebase)

  • On every merge to main, a worker embeds changed files using a 1536-dim model. Full re-embed weekly. Stored in pgvector with HNSW. Cost: under $20/week.
  • At task start, agent gets a 14K-token core pack, then calls the MCP server with three queries: semantic, git-blame proximity, symbol-name extraction.
  • Server returns up to 30 reranked candidates. Agent selects 8–12 labeled primary / related / possible. Total context typically 40–60K.
  • If the agent can’t complete the task with the retrieved context, it re-queries after reading its first file. The second pass usually succeeds.
  • Task outcome logged. Monthly review of which retrieved files were actually referenced in agent output. Unused = candidates for rank downweighting. Referenced-but-not-retrieved = candidates for new retrieval strategies.
⚠ When JIT Context is overkill: Do not build retrieval infrastructure for a 50K-line codebase. Three or four AGENTS.md files plus the Chapter 23 recipes will serve you better at zero operational cost. The rough threshold where JIT becomes worth it: ~500K lines of active code, or ~15 active engineers, or more than 20 services. The transitional phase between 500K and 2M is where teams often try to get by with “smart grep”. It works surprisingly well for a while and then stops working suddenly, usually during an incident.
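
The monthly review step in the pipeline above comes down to two set differences per task. The sketch below assumes a hypothetical `TaskLog` record holding which files were retrieved and which were actually referenced in the agent's output; how "referenced" is detected is left open. It tallies both kinds of miss across tasks so the most frequent offenders surface first.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class TaskLog:
    task_id: str
    retrieved: FrozenSet[str]   # files injected into context
    referenced: FrozenSet[str]  # files the agent's output actually used


def review(logs: Iterable[TaskLog]) -> Tuple[Counter, Counter]:
    unused: Counter = Counter()   # retrieved but never referenced
    missed: Counter = Counter()   # referenced but not retrieved
    for log in logs:
        unused.update(log.retrieved - log.referenced)
        missed.update(log.referenced - log.retrieved)
    return unused, missed
```

Files that rank high in `unused` are candidates for rank downweighting; files that rank high in `missed` point at gaps a new retrieval strategy might close.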

Next up — Chapter 38: Multi-Agent Conflict Resolution. Once you have multiple agents collaborating on the same codebase, the question stops being “can they work together” and starts being “how do we break ties.” Chapter 38 walks the protocols for agentic tie-breaking.


📖 Want the full picture?

The chapter walks the full index/query/promotion pipeline, the MCP server design, the three governors with tuning guidance, the concrete 3.8M-line pipeline with numbers, and the scale-threshold decision chart.


2026-05-23

Sho Shimoda

I share and organize what I’ve learned and experienced.