AI Product | February 17, 2026 | By Avidan Nadav

MVP Walks. MVQ Decides Whether You Ship.

Minimum viable product tells you how small you can build. It says nothing about whether the thing is good enough to trust. For AI features, that second question is the one that ships you or sinks you.

The first AI feature I watched get pulled after launch demoed beautifully. Clean UI, slick suggestion, the room nodded along. Four days later it was switched off, because out in the wild it had been cheerfully telling customers the wrong refund window, and nobody had ever checked how often it would.

Here's the uncomfortable part. It passed every test we had. We just had the wrong tests. We measured whether it worked. We never measured whether it was good enough to trust.

Those are different questions. The gap between them is where AI features quietly go to die.

Why MVP stopped being enough

We inherited the MVP from Eric Ries and the lean-startup years, and it was the right tool for its moment. Build the smallest thing that teaches you something, ship it, learn, repeat. It worked because software back then was deterministic. Same input, same output, every time. Once the thing was built, quality was a given — the code either did the job or threw an error you could see in a log.

AI broke that quietly. The same input can produce a great answer on Tuesday and an embarrassing one on Wednesday. "It works" is no longer a yes-or-no. It's a distribution. And MVP has nothing to say about distributions. It was designed for a world that doesn't exist anymore.

The missing half has a name. Marily Nika calls it minimum viable quality — the bar a probabilistic feature has to clear before it's safe to put in front of a person.

ℹ️ Info

MVQ in one line: the quality threshold an AI feature must clear before it creates value instead of harm. MVP is about scope. MVQ is about acceptable risk. You need both.

MVP measures scope. MVQ measures risk.

This isn't a new idea so much as an old one we forgot to bring with us. Fields that ship things that can hurt people (aviation, medicine, manufacturing) moved past "does it work, yes or no" generations ago. They think in failure rates and failure severity. A parachute that opens 97% of the time is not 97% of a good parachute. It's a deathtrap with good marketing.

Software got to skip that discipline for decades, because deterministic systems failed in ways you could enumerate and catch. AI dragged us back into the world the aviation engineers have always lived in, and most product teams haven't noticed they've crossed the border.

A feature that's wrong 30% of the time isn't a smaller version of a good feature. It's a liability with a clean UI.

Take the support-reply drafter that got me into this. The MVP was obvious and cheap — one button, one suggested reply, a week of work. But the MVQ was the entire game. Draft a confident, wrong answer about a refund, and you didn't save an agent thirty seconds. You manufactured a trust incident and a screenshot headed for social.

The MVP shipped. The MVQ decided whether it should have.
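
To put rough arithmetic on that trade-off, here is a minimal sketch. The benefit and harm figures are invented for illustration, not measurements from the drafter, and the function names are hypothetical. It assumes every right answer is worth the same small gain and every wrong one costs the same larger amount.

```python
def net_value_per_use(accuracy: float, benefit: float, harm: float) -> float:
    """Expected value of one use: gains from right answers minus the cost of wrong ones."""
    return accuracy * benefit - (1 - accuracy) * harm


def break_even_accuracy(benefit: float, harm: float) -> float:
    """Accuracy at which the expected value of a use is exactly zero."""
    return harm / (benefit + harm)


# Hypothetical units: a good draft saves 1 unit of agent time; a confident,
# wrong refund answer costs 40 units in cleanup and lost trust.
BENEFIT, HARM = 1.0, 40.0

print(net_value_per_use(0.70, BENEFIT, HARM))  # negative: each use loses value
print(break_even_accuracy(BENEFIT, HARM))      # accuracy needed just to break even
```

With numbers like these, the feature has to be right far more often than 70% of the time before it stops doing net harm. The more lopsided the harm, the higher the break-even point climbs.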

Why teams skip it anyway

Because MVP is fun and MVQ is uncomfortable.

MVP is a planning exercise. You sit in a room, draw boxes, cut scope, feel productive. MVQ forces you to say out loud how often your feature is allowed to be wrong, and then go measure whether it clears that bar. Most teams never write the number down — because writing it down means you might miss it, and missing a number you committed to feels worse than never having one.

⚠️ Warning

"It demos well" is the most dangerous quality bar in AI product. A demo is the least representative sample of how a probabilistic system behaves, because you unconsciously feed it the inputs you already know it handles.

What an MVQ looks like in practice

It's a threshold, tied to a consequence, that you commit to before you see the results. Usually there are three: a floor, below which the feature does net harm and can't ship; a bar, where it creates real value and which becomes your launch target; and a ceiling, past which more accuracy costs more than it returns, so you stop chasing it.

The discipline isn't the numbers themselves. It's setting them while you're clear-headed, so the results can't lobby you into moving them later. (That sentence is the whole article, honestly.) I wrote a step-by-step for setting those three thresholds if you want the worked version.
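
One way to make the commitment hard to wriggle out of is to write the three thresholds down as data before any evaluation runs. The sketch below is one possible shape, not a prescribed format. The class name, the verdict strings, and the numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the thresholds cannot be edited after the fact
class MVQ:
    floor: float    # below this, the feature does net harm and cannot ship
    bar: float      # launch target: the feature creates real value here
    ceiling: float  # past this, more accuracy costs more than it returns

    def __post_init__(self) -> None:
        if not 0.0 <= self.floor <= self.bar <= self.ceiling <= 1.0:
            raise ValueError("expected 0 <= floor <= bar <= ceiling <= 1")

    def verdict(self, measured: float) -> str:
        """Map a measured quality score to the consequence committed to in advance."""
        if measured < self.floor:
            return "do not ship"
        if measured < self.bar:
            return "above the floor, below launch target"
        if measured < self.ceiling:
            return "ship"
        return "ship and stop optimizing"


# Hypothetical thresholds for a support-reply drafter.
refund_mvq = MVQ(floor=0.90, bar=0.97, ceiling=0.995)
print(refund_mvq.verdict(0.93))
```

Freezing the dataclass is a small design choice with a point: once results come in, changing a threshold means creating a new object in a visible commit, not quietly nudging a number.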


MVQ changes what "done" means

On a deterministic feature, done is when the acceptance criteria pass. On an AI feature, done is when the output distribution clears the MVQ across a real sample of inputs — including the ugly ones.

So the work doesn't stop at "it functions." It stops when you've shoved the feature into its failure modes on purpose and confirmed it still holds the floor when the world gets weird. The arithmetic of prioritization can tell you what to build. Only MVQ tells you whether the thing you built is safe to hand to a human.
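
A rough sketch of what that check could look like in code follows. It assumes each output has already been graded pass or fail and grouped into named slices of inputs. The Wilson score interval is my choice for being conservative on small samples, not something the MVQ idea prescribes. The slice names and counts are hypothetical.

```python
import math


def wilson_lower_bound(passes: int, total: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a pass rate (z=1.96 is ~95%)."""
    if total == 0:
        return 0.0
    p = passes / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


def clears_mvq(results: dict[str, tuple[int, int]], floor: float, bar: float) -> bool:
    """results maps slice name -> (passes, total).

    Done means the overall pass rate clears the bar AND every slice,
    including the ugly ones, still holds the floor.
    """
    total_pass = sum(p for p, _ in results.values())
    total_n = sum(n for _, n in results.values())
    overall_ok = wilson_lower_bound(total_pass, total_n) >= bar
    slices_ok = all(wilson_lower_bound(p, n) >= floor for p, n in results.values())
    return overall_ok and slices_ok


# Hypothetical graded results: fine on typical tickets, weak on adversarial ones.
graded = {"typical": (980, 1000), "adversarial": (170, 200)}
print(clears_mvq(graded, floor=0.90, bar=0.95))  # False
```

Checking the lower bound instead of the raw pass rate means a small, lucky sample cannot clear the bar on its own, and checking each slice separately keeps a strong average from hiding a weak corner.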

If your roadmap is full of AI features and not one of them has a written quality threshold, you don't have a roadmap. You have a list of incidents that haven't picked their dates yet.