PlaybooksFebruary 24, 2026By Avidan Nadav

How to Set Your Three MVQ Thresholds

MVQ only works if you turn it into numbers you commit to before launch. Here's how to set the floor, the bar, and the ceiling for an AI feature, with a worked example.

Table of contents

MVQ is a great idea that dies the moment it stays an idea. "We should hit a minimum viable quality" is a slogan you can nod at forever. Three written numbers you committed to before launch is a tool that makes decisions for you.

By the end of this you'll have those three numbers for one AI feature — a floor, a bar, and a ceiling. With them, "should we ship?" stops being a meeting and becomes a comparison.

ℹ️ Info

The three thresholds: the floor (below it, the feature does net harm — do not ship), the bar (where it creates real value — your launch target), and the ceiling (where more accuracy costs more than it returns — stop optimizing). Set all three before you see the results.

Borrow the idea that already runs the internet

Before we set numbers, steal a concept from the people who keep large systems alive: Google's site reliability engineers and their error budget.

The insight that made SRE famous is that the right reliability target is almost never 100%. Chasing those last fractions of a percent costs more than anyone gains from them, so you pick an explicit budget — say 99.9% — and you spend the remaining 0.1% on shipping faster. The number is decided up front, in the cold light of day, and then it governs behavior when things get heated.

MVQ is that same move, pointed at quality instead of uptime. The MVQ framing comes from Marily Nika; the mechanic for turning it into committed numbers is pure error-budget thinking. Here's the recipe.

First, pick the metric you'll threshold on

You can't set a number without a unit. So decide what you're measuring before you decide the level. Usually it's one of:

  • Accuracy / pass rate on your eval set — "clears the rubric on X% of cases."
  • Critical failure rate — "produces a high-harm error at most 1 in N times."

For anything where a single bad output is dangerous, threshold on the critical failure rate, not average accuracy. Average accuracy is a liar's metric here — it cheerfully hides the rare catastrophe inside a comfortable-looking number.

Threshold 1: The floor (below this, don't ship)

The floor is the line where the feature does net harm. Below it, you're better off shipping nothing.

Set it by finishing this sentence: "This feature is actively worse than nothing if it ever…" The answer points straight at the floor.

For our support-reply drafter: "…if it gives a confidently wrong policy answer more than rarely." Quantify "rarely" against the consequence. Maybe the floor is no more than 1 critical error per 100 drafts. Below that, it stays off. No debate — because you set the line while you were thinking clearly instead of while you were attached to a launch date and three people were staring at you.

Threshold 2: The bar (your launch target)

The bar is where the feature creates real value for most users. This is what you're actually aiming to clear.

Set it by asking: "At what quality does a typical user come away better off — often enough that they'd keep using it?" It sits above the floor and below perfect.

For the drafter, maybe the bar is 85% of drafts need only light editsand the floor on critical errors still holds. Note the "and." The bar is a quality target; the floor is a safety gate. You clear both or you don't ship. A feature can be delightful on average and still sit below the floor on rare catastrophic errors, and that feature does not go out, no matter how good the average looks in the deck.

Threshold 3: The ceiling (where you stop)

This is the one teams forget, and forgetting it quietly burns quarters.

The ceiling is the point past which more accuracy costs more than it returns. Chasing 99% when 92% already clears the bar and the floor is, more often than not, a quarter spent buying a rounding error no user will ever feel.

Set it by asking: "Past what point would a user not notice the improvement — or not pay for it?" That's where you declare done and walk away. The ceiling is permission to stop, and stopping on time is a competitive advantage most teams never grant themselves.

💡 Tip

Write all three numbers down before you run the eval, and have someone else sign off. The entire power of MVQ is that the thresholds are fixed before the results exist to argue with them. A number you can move under pressure isn't a threshold. It's a suggestion.

The artifact: your MVQ card

One small block, pinned to the launch ticket:

Feature: Support reply drafter
Metric:  critical-error rate + light-edit pass rate

FLOOR    ≤ 1 critical error / 100 drafts   → below = do not ship
BAR      ≥ 85% need only light edits        → launch target
CEILING  ~95% light-edit rate               → stop optimizing past here

Set by: [name]   Date: [before the eval run]

How to actually use it

Run your eval. Lay the result next to the card.

  • Below the floor? Don't ship. Fix it or add a guardrail, then re-run.
  • Above the floor, below the bar? Keep working. You're close, not ready.
  • Above the bar, below the ceiling? Ship it. Stop polishing.
  • At the ceiling? Definitely stop. Go spend that energy on the next feature.

That's the whole tool. It replaces the launch debate — the one usually won by whoever's most senior or most exhausted — with a comparison anyone on the team can run and agree on. Set the numbers while you're clear-headed, and let them make the call when you're not.