A new Google whitepaper — The New SDLC With Vibe Coding, by Addy Osmani, Shubham Saboo, and Sokratis Kartakis — opens with a claim that sounds like marketing and turns out to be the most useful thing in it: the biggest shift in software engineering isn’t a new language, framework, or cloud. It’s the move from writing code to expressing intent, and trusting machines to turn that intent into working software.

The numbers back the framing. As of early 2026, the paper reports that 85% of professional developers regularly use AI coding agents, 51% use them daily, and roughly 41% of all new code is AI-generated.

But the sharpest idea in the document is counterintuitive, and it's the one worth building your strategy around: the model you're paying so much attention to is the smallest part of the system. The paper's closing line states the consequence plainly: generation is solved; verification, judgment, and direction are the new craft.

Here’s the framework, and where I think it’s right, useful, and quietly selling you something.

[Infographic: “The model is only 10%” (AI Dispatch · Field Notes; Google · Osmani, Saboo & Kartakis · May 2026; thorstenmeyerai.com). Panels: a spectrum, not a binary (vibe coding, structured AI-assisted, agentic engineering); Agent = Model + Harness (model ~10%, harness ~90%); the economics as a token-cost problem (CapEx vs OpEx); key figures (85% of devs use AI coding agents, 51% daily; 41% of new code is AI-generated; +19% time on some tasks per METR); and the author's read.]

The spectrum, not the binary

The first thing the paper does well is kill a lazy word. “Vibe coding” — Andrej Karpathy’s February 2025 phrase for giving in to the vibes, accepting whatever the AI returns, and pasting errors back until it works — got stretched to cover every AI-assisted workflow until it meant nothing.

The fix is to treat it as one end of a spectrum, not a category. At the casual end: vibe coding — quick prompts, “does it seem to work?”, minimal review, fine for prototypes and disposable scripts. At the disciplined end: agentic engineering (a term Karpathy himself reached for in early 2026), where AI is a powerful implementation engine running inside formal specs, automated tests, evals, CI/CD gates, and human oversight of architecture.

The differentiator is not whether you use AI. It's how much structure, verification, and judgment surround the AI's output. The paper's litmus test is a good one: telling a CTO your team is vibe coding the payment system should set off alarms; telling them you practice agentic engineering, with tests proving correctness, is a different conversation entirely.

And the line between the two is verification. Tests check the deterministic parts (this input yields that output); evals check the non-deterministic parts (did the agent take a sensible path, pick the right tools, hit the quality bar). Without both, the paper argues, you’re vibe coding no matter how clever your prompts are.
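To make the distinction concrete, here is a minimal Python sketch; the discount function, the trajectory format, and the rubric criteria are hypothetical illustrations, not taken from the paper. The test pins one exact output for one input, while the eval scores the path an agent took against criteria that allow many acceptable runs.

```python
from dataclasses import dataclass


def apply_discount(price_cents: int, percent: int) -> int:
    """Deterministic business logic: the same input must always give the same output."""
    return price_cents - (price_cents * percent) // 100


def test_apply_discount() -> None:
    # A test: one input, one exact expected output.
    assert apply_discount(10_000, 15) == 8_500


@dataclass
class Step:
    tool: str  # name of the tool the agent called (hypothetical trajectory format)
    ok: bool   # whether that call succeeded


def eval_trajectory(
    steps: list[Step], required: set[str], forbidden: set[str], max_steps: int
) -> dict[str, bool]:
    """An eval: there is no single correct path, so score the path against rubric criteria."""
    used = {s.tool for s in steps}
    return {
        "used_required_tools": required <= used,
        "avoided_forbidden_tools": not (used & forbidden),
        "within_step_budget": len(steps) <= max_steps,
        "no_failed_calls": all(s.ok for s in steps),
    }


if __name__ == "__main__":
    test_apply_discount()
    run = [Step("read_file", True), Step("run_tests", True),
           Step("edit_file", True), Step("run_tests", True)]
    print(eval_trajectory(run, required={"run_tests"}, forbidden={"git_push"}, max_steps=6))
```

Returning per-criterion results instead of a single pass/fail is a deliberate choice here: with non-deterministic behavior, knowing which bar a run missed is usually more useful than one verdict.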

Agent = Model + Harness

Now the idea that should change where you spend your money.

When teams hit a wall with an agent, the reflex is to blame the model and wait for the next one. The paper calls this wrong, and expensive. A running agent is the model plus the harness — the prompts, rule files, tools, context policies, hooks, sandboxes, sub-agents, and observability wrapped around it. By the paper’s rough split, the model is something like 10% of what determines behavior; the harness is the other 90%.

The evidence is concrete. On a public benchmark (Terminal Bench 2.0), one team moved a coding agent from outside the Top 30 to the Top 5 by changing only the harness, with the same model underneath. A separate LangChain experiment raised an agent's score by 13.7 points through changes to only the prompt, tools, and middleware. The everyday version: when an agent misbehaves, the cause is usually a missing tool, a vague rule, an absent guardrail, or a context window stuffed with noise. Most agent failures, examined honestly, are configuration failures.
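One way to act on that last sentence is to treat the harness as data you can inspect. In this sketch the `Harness` fields mirror the components listed above, and the checks (required tools present, a vague-word lint for rules, at least one guardrail hook, a context budget) are hypothetical heuristics for illustration, not a method from the paper. Word counts stand in for real tokenizer counts.

```python
from dataclasses import dataclass, field
from typing import Callable

# Words that often mark a rule the agent cannot act on (illustrative, not exhaustive).
VAGUE_WORDS = {"properly", "appropriately", "clean", "good", "nice", "best"}


@dataclass
class Harness:
    """The parts of a running agent you control; the model itself is not in here."""
    rules: list[str]                      # rule-file lines, e.g. from an AGENTS.md
    tools: dict[str, Callable[..., str]]  # tool name -> implementation
    guardrails: list[Callable[[str], bool]] = field(default_factory=list)  # False blocks an action
    context: list[str] = field(default_factory=list)  # documents loaded into every prompt


def rough_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words, not real model tokens.
    return len(text.split())


def audit(h: Harness, needed_tools: set[str], context_budget: int) -> list[str]:
    """Flag missing tools, vague rules, absent guardrails, and an overstuffed context."""
    problems = []
    for name in sorted(needed_tools - set(h.tools)):
        problems.append(f"missing tool: {name}")
    for rule in h.rules:
        words = {w.strip(".,").lower() for w in rule.split()}
        if words & VAGUE_WORDS:
            problems.append(f"vague rule: {rule!r}")
    if not h.guardrails:
        problems.append("no guardrail hooks")
    used = sum(rough_tokens(doc) for doc in h.context)
    if used > context_budget:
        problems.append(f"context over budget: {used} > {context_budget}")
    return problems


if __name__ == "__main__":
    h = Harness(
        rules=["Write code properly.", "Run `pytest -q` before every commit."],
        tools={"read_file": lambda path: "...", "edit_file": lambda path: "..."},
        context=["lorem " * 500],
    )
    for p in audit(h, needed_tools={"read_file", "edit_file", "run_tests"}, context_budget=200):
        print(p)
```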

The strategic punchline is the part most readers will skim past and shouldn’t: the harness is your surface area, not the model provider’s. The behavior you experience in Claude Code, Cursor, Codex, or any other tool is dominated by scaffolding you can build, own, and improve — which means your durable advantage lives there, not in whichever frontier model happens to be ahead this quarter.

Context engineering is the real skill

If the harness is where the work is, context engineering is the core discipline. The insight: code quality depends far less on clever prompting than on the quality of the context you give the agent — the same information a competent new teammate would need to do the job.

The paper names six kinds of context — instructions, knowledge, memory, examples, tools, and guardrails — and one architectural decision that matters more than any prompt: what’s loaded always (static context, expensive because every token rides along every time) versus on demand (dynamic context, cheap because you only pay when you need it). The pattern that makes this scale is Agent Skills: packaged procedural knowledge the agent loads only when a task calls for it, so a generalist can flex into a specialist without carrying the token weight of every capability at once.
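A minimal sketch of the static/dynamic split, assuming a hypothetical registry in which each skill declares trigger keywords. Production skill systems often decide relevance differently, for example by showing the model a short description of each skill and letting it choose what to load, so the keyword match here is a stand-in; word counts again approximate tokens.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    triggers: set[str]  # keywords that make this skill relevant (hypothetical selection rule)
    body: str           # procedural knowledge, loaded only when triggered


# Static context: rides along with every request, so keep it short.
STATIC_CONTEXT = "Project uses Python 3.12. Run tests with pytest. Never commit secrets."

SKILLS = [
    Skill("db-migrations", {"migration", "schema"}, "Steps for writing reversible migrations. " * 40),
    Skill("release", {"release", "changelog"}, "Steps for tagging and publishing a release. " * 40),
    Skill("frontend", {"css", "component"}, "Conventions for UI components and styling. " * 40),
]


def words(text: str) -> int:
    return len(text.split())  # rough token proxy


def build_context(task: str, skills: list[Skill]) -> str:
    """Static context always loads; a skill body loads only if the task mentions a trigger."""
    task_words = set(task.lower().split())
    loaded = [s.body for s in skills if s.triggers & task_words]
    return "\n\n".join([STATIC_CONTEXT, *loaded])


if __name__ == "__main__":
    on_demand = build_context("Add a migration for the new orders schema", SKILLS)
    everything = "\n\n".join([STATIC_CONTEXT, *(s.body for s in SKILLS)])
    print(words(on_demand), "words on demand vs", words(everything), "if every skill were static")
```

For the migration task only one skill body is loaded, so the prompt carries under a third of the words it would if every skill were static context.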

The reframe is from “how do I trick the AI into good code?” to “what would a new hire need to know, and how do I encode it?” That’s a healthier question, and it’s the bridge between vibe coding and agentic engineering.

The economics, where it gets real

This is the section engineering leaders should read twice, because it reframes AI development as a total-cost-of-ownership problem, and in the AI era cost is dictated by the token economy.

Vibe coding looks free — a subscription and some prompts, near-zero upfront cost. But it carries a heavy, compounding operating cost: a high token burn rate (dumping huge unstructured context and looping the model to fix its own unverified mistakes), a maintenance tax (six months later, someone reverse-engineers AI “spaghetti”), and security remediation (fast generation means fast vulnerabilities). The paper marks a crossover point where ad-hoc prompting ends up costing 3–10x more per feature than the disciplined alternative.
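The crossover the paper describes is simple arithmetic once each approach is written as an upfront cost plus a cost per feature. All numbers below are hypothetical, chosen only to show the shape of the curves; the 6x gap in marginal cost is an arbitrary pick that happens to fall inside the paper's 3–10x range.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    capex: float             # one-time cost: specs, test and eval suites, context structure
    opex_per_feature: float  # recurring cost per feature: tokens, rework, remediation

    def total(self, features: int) -> float:
        return self.capex + self.opex_per_feature * features


def crossover(low_capex: CostModel, high_capex: CostModel) -> float:
    """Break-even feature count, after which the high-CapEx option is cheaper in total.

    Solves low.capex + low.opex * n == high.capex + high.opex * n for n.
    """
    if low_capex.opex_per_feature <= high_capex.opex_per_feature:
        return float("inf")  # the upfront investment never pays back
    return (high_capex.capex - low_capex.capex) / (
        low_capex.opex_per_feature - high_capex.opex_per_feature
    )


if __name__ == "__main__":
    # Hypothetical figures in arbitrary currency units.
    vibe = CostModel(capex=0, opex_per_feature=600)
    agentic = CostModel(capex=10_000, opex_per_feature=100)
    print(f"break-even after {crossover(vibe, agentic):.0f} features")
    for n in (5, 20, 50):
        print(n, vibe.total(n), agentic.total(n))
```

With these inputs the lines cross at 20 features: before that, vibe coding is cheaper in total; after it, every additional feature widens the gap in favor of the upfront investment.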

Agentic engineering inverts the curve: high upfront cost (designing schemas, building test and eval suites, structuring context) in exchange for a low marginal cost per feature thereafter. Two levers do the heavy lifting. Context engineering as a financial strategy — a dense, high-signal payload (a precise AGENTS.md, real guardrails) raises first-pass success and kills the trial-and-error loop. And intelligent model routing — sending hard work (architecture, initial implementation) to big expensive models while routing deterministic work (test generation, review, CI monitoring) to small cheap ones, instead of paying frontier prices to fix a typo.
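Routing can be as simple as a lookup from task type to model tier. In this sketch the tier names, per-token prices, and task categories are hypothetical placeholders (real prices vary by provider and change often); the split follows the paper's examples, with hard work going to the large model and deterministic work to the small one.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tiers and prices per million tokens; substitute your providers' real numbers.
PRICE_PER_MTOK = {"frontier": 15.00, "small": 0.50}

# Following the paper's examples: architecture and first implementation are hard;
# test generation, review, and CI monitoring go to the cheap tier.
ROUTES = {
    "architecture": "frontier",
    "implementation": "frontier",
    "test_generation": "small",
    "review": "small",
    "ci_monitoring": "small",
}


@dataclass
class Task:
    kind: str
    tokens: int


def route(task: Task) -> str:
    # Unknown kinds default to the frontier tier rather than risk under-serving a hard task.
    return ROUTES.get(task.kind, "frontier")


def cost(tasks: list[Task], router: Callable[[Task], str]) -> float:
    return sum(PRICE_PER_MTOK[router(t)] * t.tokens / 1_000_000 for t in tasks)


if __name__ == "__main__":
    week = [
        Task("architecture", 200_000),
        Task("implementation", 800_000),
        Task("test_generation", 1_500_000),
        Task("review", 1_000_000),
        Task("ci_monitoring", 2_500_000),
    ]
    routed = cost(week, route)
    all_frontier = cost(week, lambda t: "frontier")
    print(f"routed: ${routed:.2f}  all-frontier: ${all_frontier:.2f}")
```

With these placeholder prices the routed week costs $17.50 against $90.00 for sending everything to the frontier tier. Defaulting unknown task types to the expensive tier is a conservative choice in this sketch; a team could equally default to the cheap tier and escalate on failure.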

That last point is the quiet thesis of the whole document: in a token economy, the team that owns its harness and routes intelligently across models controls its costs. The team that treats one expensive model as the whole system does not.

The 80% problem and the new bottleneck

A useful dose of realism: the 80% problem. Agents produce roughly 80% of a feature fast; the last 20% — edge cases, error handling, integration points, subtle correctness — needs context current models often lack. Worse, the nature of AI errors has shifted from obvious syntax mistakes to insidious conceptual ones: wrong business-logic assumptions, missed edge cases, decisions that create long-term maintenance burdens. They’re hard to catch precisely because the code looks right and passes basic tests.
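A small illustration of that failure mode, using a hypothetical bill-splitting helper: the first version reads naturally and passes the obvious test, but it quietly loses money whenever the total does not divide evenly. The business rule assumed here (shares sum exactly to the total and differ by at most one cent) is an example, not something from the paper.

```python
def split_bill_naive(total_cents: int, people: int) -> list[int]:
    # Looks right, and passes the happy-path test below.
    return [total_cents // people] * people


def split_bill(total_cents: int, people: int) -> list[int]:
    """Split so shares sum exactly to the total and differ by at most one cent."""
    if people <= 0:
        raise ValueError("people must be positive")
    base, remainder = divmod(total_cents, people)
    # The first `remainder` people each absorb one extra cent.
    return [base + 1 if i < remainder else base for i in range(people)]


if __name__ == "__main__":
    assert split_bill_naive(10_000, 4) == [2_500] * 4  # basic test: passes
    print(sum(split_bill_naive(10_000, 3)))            # edge case: 9999, a cent disappears
    print(split_bill(10_000, 3))                       # [3334, 3333, 3333]
```

The naive version is the kind of conceptual error the paper describes: nothing about it looks wrong, and only a test written with the business rule in mind catches it.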

The paper is honest enough to cite the counter-evidence to its own optimism. Industry surveys claim 25–39% productivity gains — but a METR study found experienced developers took 19% longer on certain tasks, because the time saved typing was spent verifying, debugging, and correcting. AI doesn’t remove implementation work so much as convert it from writing into reviewing and directing. The bottleneck moves from typing to specification and verification — which is why the developer’s role splits into two modes the paper calls conductor (hands-on, real-time, in the IDE) and orchestrator (async, delegating to background agents, reviewing outcomes not keystrokes).

My read

Credit where it’s due: this is the clearest articulation I’ve seen of how serious AI-assisted development actually works. The spectrum, the harness, context-as-the-real-skill, the factory model (where your output is the system that builds software, not the software), and especially the token-economy framing are durable, decision-useful mental models — and most of them are tool-agnostic. The core argument would hold if every product name in the paper vanished.

Which is the one caveat worth stating plainly: this is a Google document, and a funnel. The concepts are vendor-neutral; the on-ramps are not. Every worked example routes toward Gemini, Jules, Antigravity, the ADK, and Google’s new Agents CLI. None of that makes the framework wrong — it makes it a map drawn by a party that also sells the territory. Read the ideas, then pick your own tools.

And here’s the part the paper points at but doesn’t quite say out loud, which matters most if you’re a builder rather than a buyer: if the harness is 90% of the system and it’s your surface area, then your moat and your cost control both live in the scaffolding — the rule files, the evals, the routing, the skills — not in the rented model. That’s an argument for owning your harness and staying able to route across models, including cheaper and self-hosted ones. The paper frames this as token economics; it’s also independence. The teams that build their harness once and refine it many times aren’t just spending less. They’re less captive.

The honest risk isn’t that any of this fails. It’s the line the paper lands on its own: AI amplifies the engineering culture it lands in. Drop agentic tooling into a shop with weak tests, vague specs, and casual review, and you don’t get a faster good team — you get a faster bad one, generating technical debt quicker than anyone can pay it down. The gap between “it seems to work” and “it works correctly under all conditions” is exactly where the undisciplined will drown, and most shops are more undisciplined than they think.

Generation is solved. Verification, judgment, and direction are the new craft. The work now is to build the system that exercises them — and to make sure you, not your vendor, own it.

Source: Osmani, A., Saboo, S., and Kartakis, S., “The New SDLC With Vibe Coding,” Google, May 2026. Figures (developer-adoption rates, productivity studies, benchmark results) are the paper’s own, drawn from cited industry sources including METR and LangChain. Analysis and opinions are the author’s.

The Model Is Only 10%: The Real Lesson of the New SDLC

Author: Thorsten Meyer

