Snowflake Adaptive:
one less thing to plan.
Snowflake shipped Adaptive Compute. The pitch is that you stop managing warehouses. We ran the numbers (50 concurrent queries across nine warehouse configurations) to see if the scaling claim holds, and whether the operational reduction is worth the move.
Five findings, up front.
Adaptive isn't really about speed or cost. It's about not having to plan. No size, no cluster count, no scaling policy, no auto-suspend. Set a per-query ceiling and a concurrency floor.
We stress-tested the riskiest claim. It holds. Adaptive matched or beat every classic configuration we tried at matched compute strength.
QTM sets the floor, not the ceiling. From our credit numbers: QTM=10 guarantees ~10 units and the pool over-provisions above; QTM=0 can spin up one unit per concurrent query.
No single sweet spot. QTM=4 is cheapest (0.39 cr). QTM=10 is the latency optimum: 37s for 0.48 cr. QTM=0 is fastest: 21s wall clock and a 17s p50, at 3.5× the credits of QTM=10.
Adopt unless your testing shows cost spikes. The operational reduction is real. The performance is at least as good. The unanswered questions are real but not blockers.
The framing matters: this isn't "adaptive is faster." Sometimes classic is competitive, when properly planned. The win is that nobody actually plans classic properly, and adaptive turns that planning into a couple of CREATE WAREHOUSE parameters.
A routing target, not a box.
A classic Snowflake warehouse is a thing you size. You pick a T-shirt label, configure clusters, decide on auto-suspend, configure Query Acceleration Service, and live with whatever quarterly tuning meetings result.
An adaptive warehouse asks you for two things and removes everything else.
Classic: five+ decisions (legacy)

    CREATE WAREHOUSE analytics_wh WITH
      WAREHOUSE_SIZE = LARGE
      WAREHOUSE_TYPE = STANDARD
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 3
      SCALING_POLICY = STANDARD
      AUTO_SUSPEND = 60
      AUTO_RESUME = TRUE
      ENABLE_QUERY_ACCELERATION = TRUE
      QUERY_ACCELERATION_MAX_SCALE_FACTOR = 8;

Adaptive: two decisions (recommended)

    CREATE ADAPTIVE WAREHOUSE analytics_wh WITH
      MAX_QUERY_PERFORMANCE_LEVEL = LARGE
      QUERY_THROUGHPUT_MULTIPLIER = 4;
    -- everything else is gone.
What the two dials mean
MAX_QUERY_PERFORMANCE_LEVEL (MAX_QPL) is the ceiling on per-query compute. Snowflake routes each query into a shared pool and sizes it from XS up to your ceiling. Most queries don't need the ceiling. The ones that do, get it.
QUERY_THROUGHPUT_MULTIPLIER (QTM) is the floor on guaranteed concurrency. At QTM=4, Snowflake commits to running at least four max-sized queries in parallel. Smaller queries pack alongside; bursts above the floor may queue. Accepts any non-negative integer (default 2); 0 means unlimited burst (best-effort, no floor commitment).
What's not there
Look at what's absent. No size column in SHOW WAREHOUSES (it's null). No STARTED or SUSPENDED state; it shows ENABLED. No min_cluster_count, no started_clusters. No AUTO_SUSPEND or AUTO_RESUME to configure. No Query Acceleration Service settings. The abstraction is the absence.
If you've used Databricks job compute or AWS Lambda, the model is familiar. You set bounds, the platform allocates compute per request. Snowflake's version arrived later than the others, but the shape is the same.
Why this matters
Every parameter you don't set is a tuning meeting you don't have to attend. The classic warehouse model required teams to first pick a warehouse type (Standard, Standard Gen 2, Snowpark-Optimized) and a scaling policy (Standard or Economy), then on top of that, decide warehouse size, cluster count, suspend timer, and Query Acceleration Service configuration. Often per workload. Often re-tuned quarterly.
Adaptive collapses the type-and-policy decision tree entirely and reduces the runtime parameters to two. Even those two don't really need ongoing tuning: set them conservatively high, and you've capped your worst-case spend. Roughly: worst-case daily spend ≈ MAX_QPL × QTM × per-hour rate × peak hours.
Classic doesn't give you a clean ceiling like that; auto-scaling, suspend timers, and per-cluster billing all interact in ways that make "what's the worst case" hard to answer without modelling. Faster or cheaper is secondary. The win is no longer having those meetings.
Does it actually scale?
Operational simplicity is a real benefit, but only if the platform actually works under load. The riskiest claim Snowflake makes about adaptive is that the shared pool scales elastically to absorb burst concurrency without you orchestrating it.
We ran 50 concurrent queries (a per-row math computation over a 100M-row fact table joined with a 10K-row dimension, scanning ~14 GB per query) against nine warehouse configurations to verify. Same data, same role, same Python test harness. One warmup query per warehouse before the burst. Then we measured wall clock, per-query latency, queue times, and total credits.
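A minimal sketch of what a burst harness along these lines could look like, not the harness we actually ran. It assumes a hypothetical zero-argument `run_query` callable that blocks until one query finishes (for instance, a thin wrapper around a Snowflake connector call); the function name, parameters, and result keys are illustrative.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_burst(run_query, n_queries=50, arrival_window_s=0.5):
    """Fire n_queries spread across arrival_window_s and time each one.

    run_query is a hypothetical callable that blocks until a single
    query has completed. It is an assumption of this sketch.
    """
    gap = arrival_window_s / n_queries

    def timed(_):
        start = time.perf_counter()
        run_query()
        return time.perf_counter() - start

    burst_start = time.perf_counter()
    # One worker per query so the client never throttles the burst itself.
    with ThreadPoolExecutor(max_workers=n_queries) as pool:
        futures = []
        for i in range(n_queries):
            futures.append(pool.submit(timed, i))
            if i < n_queries - 1:
                time.sleep(gap)  # spread arrivals evenly over the window
        latencies = [f.result() for f in futures]
    wall = time.perf_counter() - burst_start

    return {
        "wall_s": wall,
        "p50_s": statistics.median(latencies),
        "latencies_s": latencies,
    }
```

Queue times and credits are not visible from the client side; in a setup like this they would have to come from Snowflake's own query and metering history after the run.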
The nine setups
| Setup | Type | Config |
|---|---|---|
| classic_m_mc1 | Classic Std Gen 2 | Single cluster, no scale-out |
| classic_m_mc4_auto | Classic Std Gen 2 | MIN=1, MAX=4, auto-scale on demand |
| classic_m_mc10_auto | Classic Std Gen 2 | MIN=1, MAX=10, auto-scale on demand |
| classic_m_mc4_pre | Classic Std Gen 2 | MIN=MAX=4, all clusters running before burst |
| classic_m_mc10_pre | Classic Std Gen 2 | MIN=MAX=10, all clusters running before burst |
| adaptive_m_qtm2 | Adaptive | MAX_QPL=M, QTM=2 (Snowflake default) |
| adaptive_m_qtm4 | Adaptive | MAX_QPL=M, QTM=4 |
| adaptive_m_qtm10 | Adaptive | MAX_QPL=M, QTM=10 |
| adaptive_m_qtm0 | Adaptive | MAX_QPL=M, QTM=0 (unlimited burst) |
All nine cap at Medium-class per-query compute. The classic setups span the four ways you might choose to handle 50 concurrent queries today, including the pre-warmed configurations almost nobody actually deploys. The adaptive setups span the QTM dial.
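As a rough illustration, the nine setups can be rendered as DDL with a few lines of Python. The statement shapes copy the two CREATE examples earlier in this post; the helper names (`classic_ddl`, `adaptive_ddl`, `SETUPS`) are illustrative, and Gen 2 selection, auto-suspend, and other settings are deliberately left out, so this is a starting point rather than the exact configuration used.

```python
def classic_ddl(name, min_clusters, max_clusters, size="MEDIUM"):
    """Classic multi-cluster warehouse; only the knobs varied in the test."""
    return (
        f"CREATE WAREHOUSE {name} WITH "
        f"WAREHOUSE_SIZE = {size} "
        f"MIN_CLUSTER_COUNT = {min_clusters} "
        f"MAX_CLUSTER_COUNT = {max_clusters} "
        f"SCALING_POLICY = STANDARD;"
    )


def adaptive_ddl(name, qtm, max_qpl="MEDIUM"):
    """Adaptive warehouse: a per-query ceiling plus a concurrency floor."""
    return (
        f"CREATE ADAPTIVE WAREHOUSE {name} WITH "
        f"MAX_QUERY_PERFORMANCE_LEVEL = {max_qpl} "
        f"QUERY_THROUGHPUT_MULTIPLIER = {qtm};"
    )


# Setup names match the table above; MIN == MAX means pre-warmed.
SETUPS = {
    "classic_m_mc1": classic_ddl("classic_m_mc1", 1, 1),
    "classic_m_mc4_auto": classic_ddl("classic_m_mc4_auto", 1, 4),
    "classic_m_mc10_auto": classic_ddl("classic_m_mc10_auto", 1, 10),
    "classic_m_mc4_pre": classic_ddl("classic_m_mc4_pre", 4, 4),
    "classic_m_mc10_pre": classic_ddl("classic_m_mc10_pre", 10, 10),
    "adaptive_m_qtm2": adaptive_ddl("adaptive_m_qtm2", 2),
    "adaptive_m_qtm4": adaptive_ddl("adaptive_m_qtm4", 4),
    "adaptive_m_qtm10": adaptive_ddl("adaptive_m_qtm10", 10),
    "adaptive_m_qtm0": adaptive_ddl("adaptive_m_qtm0", 0),
}

for name, ddl in SETUPS.items():
    print(ddl)
```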
Auto-scaling didn't help. Pre-warming did.
The classic results showed something we did not predict: permitting more cluster capacity made things slower.
Auto-scaling was too slow. Pre-warming with too few clusters was uneven. Pre-warming with enough clusters was expensive. The dispatcher was the bottleneck. Capacity rarely was.
| Setup | Wall | p50 latency | Queued | Credits |
|---|---|---|---|---|
| classic_m_mc1 | 234s | 148s | 42/50 | 0.45 |
| classic_m_mc4_auto | 308s | 114s | 42/50 | 2.00 |
| classic_m_mc10_auto (worst) | 380s | 163s | 42/50 | 2.79 |
| classic_m_mc4_pre | 271s | 76s | 21/50 | 2.03 |
| classic_m_mc10_pre (best of classic) | 30s | 29s | 0/50 | 1.53 |
The single-cluster baseline finished in 234s. Allowing up to 10 clusters of auto-scaling capacity stretched the same workload to 380s, slower than doing nothing. SCALING_POLICY = STANDARD waits for sustained queueing before adding clusters, and our 500ms burst finishes its arrival before the auto-scaler has decided to act. New clusters arrive cold, with no shared SSD cache, and start picking up the tail of an already-serializing queue.
Six of ten clusters never started
The per-cluster breakdown for classic_m_mc10_auto tells the whole story:
Six of ten allowed clusters never started during the 380-second window. The auto-scaler simply didn't react fast enough.
Pre-warming worked, when pushed
The bottom row of the headline table is the same physical Medium-class compute with the same query, same data, same harness. The only difference: MIN_CLUSTER_COUNT = 10, so all ten clusters were running before the burst arrived. We polled SHOW WAREHOUSES until started_clusters = 10 before firing. Result: 30 seconds wall clock, perfect load distribution (5 queries per cluster), zero queueing.
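The "poll until all clusters are up" step could be sketched like this. It assumes a hypothetical `get_started_clusters` callable that runs SHOW WAREHOUSES and returns the `started_clusters` value for the warehouse under test; the poll interval and timeout defaults are placeholders, not the values we used.

```python
import time


def wait_for_clusters(get_started_clusters, target=10, poll_s=2.0,
                      timeout_s=300.0, sleep=time.sleep):
    """Block until at least `target` clusters report as started.

    get_started_clusters is a hypothetical callable wrapping
    SHOW WAREHOUSES; `sleep` is injectable so the loop can be tested.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        started = get_started_clusters()
        if started >= target:
            return started
        if time.monotonic() >= deadline:
            raise TimeoutError(f"only {started} of {target} clusters started")
        sleep(poll_s)
```

Only fire the burst once this returns; otherwise a "pre-warmed" run silently degrades into an auto-scaling run.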
The dispatcher was the bottleneck, not capacity
The classic_m_mc4_pre row is more interesting than the bottom row. It had all four clusters running before the burst arrived, and still took 271 seconds. Look at how the burst distributed:
| mc4_pre cluster | Queries handled |
|---|---|
| CL 1 | 32 |
| CL 2 | 1 |
| CL 3 | 11 |
| CL 4 | 6 |
Cluster 1 got 32 of 50 queries. Clusters 2–4 were running, idle, and visible, but the intra-warehouse dispatcher concentrated load on cluster 1 until it was visibly saturated. Same root cause as the auto-scaling failure: Snowflake's classic load balancer is reluctant to fan out.
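To put a number on that skew, here is the arithmetic on the per-cluster counts from the table above. The "imbalance" ratio (busiest cluster versus a perfectly even split) is our own illustrative metric, not something Snowflake reports.

```python
# Queries handled per cluster in the classic_m_mc4_pre run (from the table).
queries_per_cluster = {"CL 1": 32, "CL 2": 1, "CL 3": 11, "CL 4": 6}

total = sum(queries_per_cluster.values())        # 50 queries in the burst
even_share = total / len(queries_per_cluster)    # 12.5 per cluster if balanced

busiest = max(queries_per_cluster, key=queries_per_cluster.get)
busiest_share = queries_per_cluster[busiest] / total       # 0.64
imbalance = queries_per_cluster[busiest] / even_share      # 2.56

print(f"{busiest} handled {busiest_share:.0%} of the burst, "
      f"{imbalance:.2f}x an even split")
```

By the same measure the pre-warmed ten-cluster run, at five queries per cluster, sits at exactly 1.0.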
⟶ The lesson "Classic works if you so vastly over-provision that the dispatcher can't ruin it." — field note, 17 May 2026 · eu-west-1
The caveats
This finding is bounded by our setup. We tested on a trial account on a quiet morning, with queries that solo in ~30 seconds (mid-weight, not multi-minute), a 500ms arrival burst (more aggressive than most real workloads), and SCALING_POLICY = STANDARD only. For multi-minute queries, the auto-scaler has more time to react and may engage productively. If you've measured this on an Enterprise account or with longer-running queries, we'd be glad to hear what you saw.
Classic warehouses have two ways to lose under burst, and both come from the dispatcher. The auto-scaler waits too long to add clusters. The intra-warehouse load balancer concentrates queries on one cluster even when others are idle. Adding more cluster permission doesn't help with either; capacity wasn't the missing piece.
Pre-warming worked, but only when over-provisioned to the point where the dispatcher couldn't ruin it. The classic warehouse model assumes sustained load, and bursts are a different problem it never solved well.
One dial. Four ways to spend.
Four adaptive runs, same workload, same MAX_QPL = Medium. The only variable: the QTM floor.
Wall clock by QTM
Same workload, 50 concurrent queries, MAX_QPL=M. Wall and p50 are in seconds; the right column is credits billed.
| Setup | Wall | p50 | Overload queued | Provisioning wait | Credits |
|---|---|---|---|---|---|
| adaptive_m_qtm2 | 115s | 66s | 42/50 | 0.2s · 48/50 | 0.74 |
| adaptive_m_qtm4 (cheapest) | 65s | 37s | 34/50 | 1.7s · 48/50 | 0.39 |
| adaptive_m_qtm10 (balanced) | 37s | 21s | 10/50 | 1.0s · 47/50 | 0.48 |
| adaptive_m_qtm0 (premium) | 21s | 17s | 0/50 | 0.0s · 6/50 | 1.67 |
How QTM allocates compute
Look at the "overload queued" column. At QTM=2, Snowflake commits to running two max-sized queries concurrently, and 42 of the 50 queue. At QTM=4, 34 queue. At QTM=10, only ten queue. At QTM=0, nothing queues: the pool absorbs all 50 simultaneously.
The compute-units inference, from credits: QTM=2 billed roughly 4–6 units of work; QTM=10 billed roughly 10–12; QTM=0 billed many tens of units, closer to one worker per concurrent query plus ramp-up overhead, not a clean N=50. The mechanism we read out of this: QTM=N is a floor the pool over-provisions above when capacity allows, and QTM=0 is "give every query its own worker, regardless of count."
QTM isn't a speed dial. It's a cost dial. Higher QTM means more compute reserved above your floor, faster bursts, bigger bill. QTM=0 is "I do not care what it costs."
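One way to turn the four runs into a decision is to fix a wall-clock budget and take the cheapest setting that meets it. The sketch below uses only the wall-clock and credit figures from the table above; the function name and the budgets are illustrative, and a single run per setting is thin evidence, so treat the output as a prompt for your own testing.

```python
# Wall clock and credits per adaptive run, copied from the table above.
ADAPTIVE_RUNS = [
    {"setup": "adaptive_m_qtm2", "wall_s": 115, "credits": 0.74},
    {"setup": "adaptive_m_qtm4", "wall_s": 65, "credits": 0.39},
    {"setup": "adaptive_m_qtm10", "wall_s": 37, "credits": 0.48},
    {"setup": "adaptive_m_qtm0", "wall_s": 21, "credits": 1.67},
]


def cheapest_within(runs, max_wall_s):
    """Return the lowest-credit run that finished within max_wall_s, or None."""
    eligible = [r for r in runs if r["wall_s"] <= max_wall_s]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r["credits"])


for budget_s in (120, 40, 25):
    pick = cheapest_within(ADAPTIVE_RUNS, budget_s)
    print(f"budget {budget_s}s -> {pick['setup']} at {pick['credits']} cr")
```

With these numbers a 120-second budget lands on QTM=4, a 40-second budget on QTM=10, and a 25-second budget on QTM=0.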
The decision: pick the universe your workload lives in
The hidden cost: provisioning time
Next to the overload-queue column is another one: provisioning wait. Almost every adaptive query waits a small amount for compute to spin up before executing. Typically a second or two (0.2s to 1.7s in our runs). Tiny per query, but real, and not advertised. At QTM=0 the cost almost disappears (only 6 of 50 queries waited), suggesting unlimited-burst mode pre-allocates more aggressively. Classic warehouses don't show this column at all; the clusters are either running or they're not.
QTM=2 and QTM=0 both saw ~88% cache hits. QTM=4 saw 72%. QTM=10 saw only 24%. A single warmup query seems to seed one slot in the pool, not all of them. If you're benchmarking adaptive against classic, run a real warm-up burst rather than a single query; your results will look very different.
Pool capacity is real, but invisible
Here's something the data hints at but Snowflake doesn't document: the adaptive pool has bounds. We observed QTM=0 absorbing 50 concurrent queries with zero overload queueing. What we don't know is what happens at 500. Or 5,000. Or what happens when the pool is contending with high load from other accounts in the same region.
The adaptive pool is shared across customers. "Unlimited" is permission to ask for elasticity, not a guarantee of capacity. On our quiet trial account at 9 AM, the pool delivered. On a busy Enterprise account at end-of-quarter close, it may not deliver the same way. This is a real risk that's hard to test in advance: we can't load-test the global pool from our seat. We can only flag it.
All nine, ranked by credits per query.
Same workload, same data, different shapes of compute. Sorted by what each query cost on average.
| Setup | Wall | Total credits | Credits / query | Trade-off |
|---|---|---|---|---|
| adaptive_m_qtm4 (cheapest) | 65s | 0.39 | 0.008 | Cheap, decent speed |
| classic_m_mc1 | 234s | 0.45 | 0.009 | Cheapest classic, slowest overall |
| adaptive_m_qtm10 (balanced) | 37s | 0.48 | 0.010 | Best latency at acceptable cost |
| adaptive_m_qtm2 | 115s | 0.74 | 0.015 | Heavy queueing |
| classic_m_mc10_pre | 30s | 1.53 | 0.031 | Fast, requires heavy planning |
| adaptive_m_qtm0 (premium) | 21s | 1.67 | 0.033 | Fastest, premium price |
| classic_m_mc4_auto | 308s | 2.00 | 0.040 | Slow and expensive |
| classic_m_mc4_pre | 271s | 2.03 | 0.041 | Dispatcher pinned 32/50 to cluster 1 |
| classic_m_mc10_auto (worst) | 380s | 2.79 | 0.056 | Worst of every world |
A 7× cost difference for the same work, same data, same nominal compute strength. The variable isn't capacity. It's how compute gets allocated. — §6, the cost picture
Three configurations are defensibly good, and they live in three different metric universes: qtm4 for billing, qtm10 for latency, qtm0 for raw speed. The pre-warmed classic option (classic_m_mc10_pre) finishes fast, but at close to qtm0's premium price and with the dispatcher caveat: you only get that result by over-provisioning so heavily that the dispatcher can't matter. The remaining five configurations are actively bad choices on this workload.
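For anyone who wants to redo the ranking with their own numbers, the arithmetic behind the table is short. The figures below are the wall-clock and total-credit values reported above; swap in your own measurements.

```python
N_QUERIES = 50

# setup -> (wall clock in seconds, total credits), from the table above.
RUNS = {
    "classic_m_mc1": (234, 0.45),
    "classic_m_mc4_auto": (308, 2.00),
    "classic_m_mc10_auto": (380, 2.79),
    "classic_m_mc4_pre": (271, 2.03),
    "classic_m_mc10_pre": (30, 1.53),
    "adaptive_m_qtm2": (115, 0.74),
    "adaptive_m_qtm4": (65, 0.39),
    "adaptive_m_qtm10": (37, 0.48),
    "adaptive_m_qtm0": (21, 1.67),
}

# Rank by total credits; credits per query is just total / 50.
ranked = sorted(RUNS.items(), key=lambda item: item[1][1])
for setup, (wall_s, credits) in ranked:
    print(f"{setup:<22} {wall_s:>4}s  {credits:.2f} cr  "
          f"{credits / N_QUERIES:.3f} cr/query")

# Spread between the most and least expensive configuration.
spread = ranked[-1][1][1] / ranked[0][1][1]
print(f"most expensive / cheapest = {spread:.1f}x")
```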
Open questions worth knowing.
Every benchmark is bounded by its conditions. These are the honest gaps in our findings, in the order they're most likely to bite a production decision.
Q1 · Pool behaviour at scale
The adaptive pool is shared across customers in a region. Our test ran on a quiet morning. When adaptive adoption ramps up and many accounts simultaneously fire bursts during business hours, the pool will face contention we can't measure today. This may be the biggest unknown.
Q2 · Heavier, longer queries
Our queries solo in ~30 seconds. For multi-minute analytical queries, classic auto-scaling has more time to react before the burst is over. The relative gap should shrink. We don't know by how much.
Q3 · Per-query cost decomposition
QUERY_ATTRIBUTION_HISTORY.credits_attributed_compute was NULL for all 450 of our queries (classic and adaptive) within an hour of completion. Docs note this view "may be NULL for adaptive during Public Preview." Until per-query cost lands at GA, all adaptive cost claims are at warehouse level.
Q4 · Mixed workloads
We tested a uniform burst of 50 identical queries. A realistic workload mixes 100ms dashboard queries with 30-second heavy queries. Adaptive's per-query sizing should help here (small queries get small allocation), but we didn't measure it.
Q5 · The pool's internal anatomy
Our credit numbers suggest QTM=N allocates roughly 1–2× N compute units depending on workload, and QTM=0 allocates up to one per concurrent query. The exact pool composition, per-region capacity, and dispatch algorithm aren't documented.
Did you try this? Tell us what you saw.
Did you measure adaptive on your workload?
This was one experiment on one trial account on one morning. Your account, your queries, your concurrency profile, your business hours: almost certainly different. If you've measured adaptive on your workload, we'd be glad to hear what you found.
A few things in particular we'd love to compare against:
- Enterprise account vs trial: does the pool feel different?
- Multi-minute queries: does classic auto-scaling catch up?
- QTM values we didn't test: does QTM=20 or QTM=50 shift the sweet spot?
- Production workloads with mixed query sizes: what does per-query sizing actually save?
- End-of-quarter or peak-hours runs: pool contention from other accounts?
If your numbers are dramatically different from ours (faster, slower, much cheaper, much more expensive), let us know at darshan.meel@gmail.com. We'll happily update this post with a note and credit your data.
What we'll test next (next post: late June 2026): a heavier, more realistic shape. Hold an adaptive L warehouse at QTM=2 busy with two long-running queries, then fire a 50-query burst on top. Does the pool absorb the burst on top of saturated floor capacity? Compare against a pre-warmed classic L with 2 clusters running the same two long queries, hit by the same burst. That tells us what happens when the floor is already spoken for.
Should you move?
The framing we'd defend from the data: default to adopting adaptive, unless your testing surfaces specific cost spikes. The operational reduction is real. The performance, in matched comparisons, is at least as good as classic done well, and dramatically better than classic done badly (which is to say, most classic deployments).
Not a save-money-or-go-faster story. A remove-planning-overhead story — provided your testing shows nothing breaks. As long as it doesn't make things worse for your workload, the operational benefit is the reason to adopt.
A practical implication of having only two dials: you can set them conservatively high once and largely stop tuning. MAX_QPL × QTM × per-hour rate × peak hours gives you a worst-case daily spend you can reason about. If that number is acceptable, you don't need to re-tune later. Actual cost will be lower, because each query only pays for what it used. Classic never gave you that.
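A back-of-envelope version of that ceiling, as a sketch. It folds "MAX_QPL × per-hour rate" into a single input, the credits per hour one query burns at your ceiling size, which you would need to take from your own contract or Snowflake's published rates; the 4.0 used below is a placeholder. The `overprovision_factor` is our assumption, reflecting the 1–2× over-provisioning our credit numbers hinted at, and QTM=0 is rejected because it has no floor to multiply.

```python
def worst_case_daily_credits(credits_per_hour_at_ceiling, qtm, peak_hours,
                             overprovision_factor=1.0):
    """Rough upper bound on daily credits for one adaptive warehouse.

    credits_per_hour_at_ceiling: hourly rate for one query at MAX_QPL
        (an input you supply; not documented here).
    overprovision_factor: assumed head-room above the QTM floor; our
        readings suggested somewhere between 1 and 2.
    """
    if qtm == 0:
        raise ValueError("QTM=0 is unlimited burst; this bound does not apply")
    return credits_per_hour_at_ceiling * qtm * overprovision_factor * peak_hours


# Placeholder rate of 4.0 credits/hour, QTM=10, 8 peak hours a day.
print(worst_case_daily_credits(4.0, 10, 8))                            # 320.0
print(worst_case_daily_credits(4.0, 10, 8, overprovision_factor=2.0))  # 640.0
```

If the pessimistic figure is still acceptable, that is the case for setting the dials once and leaving them alone.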
- Bursty BI and analyst workloads. Dashboards refreshing at 9 AM, analyst queries arriving in waves.
- Mixed-shape workloads on one warehouse. Small dashboards alongside occasional heavy queries.
- Teams managing 10+ warehouses with quarterly sizing reviews. Operational reduction alone is worth the move.
- New deployments where you'd be guessing the right size anyway.
- Production needing predictability. Use QTM=N matching your concurrency floor. Don't use QTM=0 without measuring pool behaviour at your levels.
- Heavy ETL with predictable cadence. If your classic warehouses are already fully utilised at the right size, savings may be marginal. Run a cost A/B.
- Sparse workloads with long idle gaps. Per-query billing may not amortise as cheaply as classic + auto-suspend.
- One bad query dominates the bill. Adaptive doesn't fix bad SQL. Fix the SQL.
- Hard concurrent compute budgets. MAX_QPL caps per-query, not aggregate. Resource monitors still required.
- Standard Edition accounts (Adaptive requires Enterprise+).
- Snowpark-Optimized and Interactive warehouses.
- Regions where Adaptive Public Preview isn't yet live.
Our honest take
If your workload looks like a typical BI or analyst pattern, adaptive at QTM=10 (or higher, sized to your concurrency expectations) is the right default going forward. It costs less than pre-warmed classic for similar performance, and it removes the planning overhead that classic always demanded but most teams never invested in. QTM=0 is for when you need fastest possible burst response and accept the premium. Lower QTM values trade burst speed for cost; pick by which one your workload is judged on.
If you're risk-averse: keep one classic warehouse for predictable production batch work, migrate ad-hoc and analyst workloads to adaptive first, and let one billing cycle of data tell you whether to expand. The cost evidence will surface fast if there's a problem.
Three things that would close the loop.
If anyone at Snowflake reads this, three product asks based on what we couldn't measure:
01 · Per-query cost visibility for adaptive warehouses
Currently NULL in QUERY_ATTRIBUTION_HISTORY, well beyond the documented latency window. The product docs say this is "planned for GA"; please get there. Without it, every cost claim about adaptive is warehouse-level guesswork.
02 · Pool capacity and utilisation visibility
A simple "current pool load" view per region would let customers understand whether QTM=0 will deliver elasticity at their burst time, or contend with neighbour workloads. Today this is invisible.
03 · Documentation of the QTM-to-compute-units mapping
Our credit numbers seem to suggest a 1–2× provisioning multiplier on QTM, with QTM=0 behaving differently again. Customers would benefit from understanding this so they can capacity-plan rather than treat QTM as a black box.
None of this is critical for adoption; customers will move to adaptive because it works. But these gaps will create FinOps confusion and capacity-planning anxiety that doesn't need to exist.
If you want the harness, warehouse configs, queries, and credit-parsing scripts, or you'd like to compare notes on adaptive numbers from your own account, email darshan.meel@gmail.com for the setup data and runbook to reproduce on your own account.
— field notes from a quiet morning on eu-west-1