The Engineering of Intent, Chapter 31: Vibe Coding in Data and ML

This is Part 31 of a series walking through my book The Engineering of Intent. In the previous chapter, we covered frontend work. Chapter 31 turns to data engineering and applied ML — where AI-native practice meets statistical thinking, reproducibility, and slow feedback loops.


Some Practices Transfer Directly. Some Need Adaptation. Some Don’t Transfer at All.

Data and ML is the domain where the gap between prototype and production is widest — and where agents both help and mislead the most. Chapter 31 maps which practices to keep, which to adapt, and which to retire.


Three Subdomains

  1. ETL Pipelines. Benefits from AI-native velocity. But: always run on a small sample, then a medium one, then the full dataset. Agents produce pipelines that look right on the sample and blow up at scale. Declare the SLA, idempotency mode, and replay policy in every job file; these three declarations prevent the three most common operational disasters.
  2. Feature Engineering. Agents are good at generating variations on a theme. The caveat is brutal: agents have no concept of a train/test split, and they will produce features that leak target information if you don’t enforce the discipline. Every feature ships with a spec declaring its source, time semantics, and leakage guarantee.
  3. Model Evaluation. Use agents for generating evaluation suites, synthesizing adversarial examples, and sanity-checking reported metrics. Do not use them as the final arbiter of whether a model is ready. Calibration plots, fairness audits, and conversations with domain experts remain human responsibilities.
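To make the ETL advice concrete, here is a minimal sketch of a job declaration plus a staged run. It is an illustration under assumptions, not the book's convention: `JobDeclaration`, `IdempotencyMode`, `run_staged`, the stage sizes, and the example job are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Sequence


class IdempotencyMode(Enum):
    # Hypothetical modes for illustration; a real pipeline may need others.
    OVERWRITE_PARTITION = "overwrite_partition"  # a rerun replaces the output partition
    UPSERT_BY_KEY = "upsert_by_key"              # a rerun merges rows on a primary key


@dataclass(frozen=True)
class JobDeclaration:
    """The three declarations the text asks for, kept next to the job itself."""
    name: str
    sla_minutes: int               # assumed unit: wall-clock minutes for the full run
    idempotency: IdempotencyMode
    replay_policy: str             # assumed to be free text, e.g. "replay by date partition"

    def validate(self) -> None:
        if self.sla_minutes <= 0:
            raise ValueError(f"{self.name}: SLA must be a positive number of minutes")
        if not self.replay_policy.strip():
            raise ValueError(f"{self.name}: replay policy must be declared")


def run_staged(
    job: JobDeclaration,
    transform: Callable[[Sequence[dict]], List[dict]],
    rows: Sequence[dict],
    stages: Sequence[Optional[int]] = (100, 10_000, None),
) -> List[int]:
    """Run the transform on a small sample, a medium sample, then everything.

    A stage of None means the full dataset. Returns the output size per stage
    so a human can compare them before trusting the full run.
    """
    job.validate()  # refuse to run a job whose declarations are missing
    output_sizes = []
    for limit in stages:
        batch = rows if limit is None else rows[:limit]
        output_sizes.append(len(transform(batch)))
    return output_sizes


# Hypothetical job and data, for demonstration only.
job = JobDeclaration(
    name="daily_orders",
    sla_minutes=45,
    idempotency=IdempotencyMode.OVERWRITE_PARTITION,
    replay_policy="replay by date partition, last 7 days",
)
rows = [{"order_id": i, "amount": i * 1.5} for i in range(25_000)]
counts = run_staged(job, lambda batch: [r for r in batch if r["amount"] > 0], rows)
print(counts)  # [99, 9999, 24999]
```

Taking a prefix of the rows is the simplest possible sampling strategy and is used here only to keep the sketch short; a random or stratified sample would usually be more representative.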
💡 The leakage catch: An agent-generated fraud feature included “average transaction amount in the next 24 hours” — pulled from a dataset with post-transaction data. Caught because the feature-spec convention required a time-semantics declaration, and the declaration didn’t match what the feature actually did. Without the convention, the feature would have shipped and the model would have been quietly broken in production.
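The kind of check that catches this can be sketched in a few lines. This is one possible shape for a feature spec, assuming time semantics are declared as a window relative to prediction time; `FeatureSpec`, `audit_feature`, the field names, and the sample timestamps are hypothetical and not taken from the book.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Sequence


@dataclass(frozen=True)
class FeatureSpec:
    name: str
    source: str
    # Declared time semantics: the feature may read events whose offset from
    # prediction time lies in [window_start, window_end]. Offsets <= 0 are the past.
    window_start: timedelta
    window_end: timedelta

    def declares_no_leakage(self) -> bool:
        # The leakage guarantee holds only if the window ends at or before prediction time.
        return self.window_end <= timedelta(0)


def audit_feature(
    spec: FeatureSpec,
    prediction_time: datetime,
    event_times_read: Sequence[datetime],
) -> List[str]:
    """Compare what the feature actually read against what its spec declares."""
    problems = []
    if not spec.declares_no_leakage():
        problems.append(f"{spec.name}: declared window extends past prediction time")
    earliest = prediction_time + spec.window_start
    latest = prediction_time + spec.window_end
    for t in event_times_read:
        if not (earliest <= t <= latest):
            problems.append(
                f"{spec.name}: read event at {t.isoformat()} outside declared window"
            )
    return problems


# Hypothetical example: the spec claims a trailing 24-hour window, but the
# implementation also read a transaction from three hours in the future.
spec = FeatureSpec(
    name="avg_txn_amount_24h",
    source="transactions",
    window_start=timedelta(hours=-24),
    window_end=timedelta(0),
)
t0 = datetime(2024, 1, 10, 12, 0)
problems = audit_feature(spec, t0, [t0 - timedelta(hours=2), t0 + timedelta(hours=3)])
print(problems)
```

The audit only works if the pipeline can report which event timestamps a feature touched, which is itself an assumption about the surrounding tooling. The point of the sketch is the comparison between declaration and behavior, which is what surfaced the fraud-feature problem described above.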

The Churn Pipeline Case Study

“AI-native speed in ML is real but is bottlenecked by evaluation and shadow-testing, which cannot be compressed much. The pipeline part can be eight days instead of a month. The validation part remains weeks. Plan accordingly.”

A subscription company I advised wanted a churn-prediction model. Classical estimate: a month to first shipped version. Actual:

  • Day 1: Feature specs extracted from product conversations. Agent-generated PM questionnaires synthesized a feature-intent document; the data scientist dropped two features and added three. Final: 24 features.
  • Days 2–4: Feature pipelines implemented. Leakage audit caught two issues.
  • Day 5: Baseline logistic regression.
  • Days 6–7: Gradient-boosted model. Calibration required post-hoc correction. Feature-importance review identified a “too important” feature, which turned out to be leaky and was removed.
  • Day 8: Shadow deployment (predictions logged, not acted on).
  • Weeks 2–3: Shadow data validated stability. Shipped in week nine including the shadow period.
⚠ The pattern to internalize: The pipeline part compresses dramatically. The validation part doesn’t. Teams that promise their executives “AI-native model in two weeks” ship broken models in two weeks. Teams that promise “pipeline in one week, validated rollout by week nine” ship working models. Say the right thing at the start.
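The timeline mentions that calibration needed post-hoc correction. As a rough sketch of the numbers behind a calibration plot, the following computes a binned reliability table and an expected calibration error in plain Python. The function names, the equal-width binning, and the toy scores are assumptions made for this example; the output is meant as input to human review, not as a readiness verdict.

```python
from typing import List, Sequence, Tuple

# One row per non-empty bin: (mean predicted probability, observed positive rate, count)
ReliabilityRow = Tuple[float, float, int]


def reliability_table(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 5
) -> List[ReliabilityRow]:
    """Group predictions into equal-width probability bins and compare
    the mean prediction in each bin with the observed positive rate."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if members:
            n = len(members)
            mean_p = sum(p for p, _ in members) / n
            rate = sum(y for _, y in members) / n
            table.append((mean_p, rate, n))
    return table


def expected_calibration_error(table: Sequence[ReliabilityRow]) -> float:
    """Count-weighted average gap between predicted and observed rates."""
    total = sum(n for _, _, n in table)
    return sum(n * abs(mean_p - rate) for mean_p, rate, n in table) / total


# Toy scores, invented for illustration: the model says 0.9 for four customers
# but only two of them churned, so the high-score bin is overconfident.
probs = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
labels = [1, 1, 0, 0, 0, 0, 0, 0]
table = reliability_table(probs, labels)
ece = expected_calibration_error(table)
for mean_p, rate, n in table:
    print(f"predicted={mean_p:.2f} observed={rate:.2f} n={n}")
print(f"ECE={ece:.3f}")
```

A large gap in any bin is a prompt to look at the plot and talk to a domain expert. It does not by itself say which post-hoc correction to apply.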

Next up — Chapter 32: Vibe Coding in Platform and Infrastructure. Platform work is slow-moving by nature and its users are internal engineers — a different set of constraints from product code. Chapter 32 walks infrastructure-as-code, CI evolution, and the deployment-tool rebuild that dropped deploy times from 40 minutes to 12.


📖 Want the full picture?

The chapter walks each of the three subdomains with specific conventions, the fraud-feature leakage case study, the full churn-pipeline timeline, and the evaluation practices that keep data work honest when velocity is tempting you to ship early.

Get The Engineering of Intent on Amazon →

2026-05-17

Sho Shimoda

I share and organize what I’ve learned and experienced.