Snowflake Adaptive: one less thing to plan

01 · The short version

Five findings, up front.

01 — Frame

Adaptive isn't really about speed or cost. It's about not having to plan. No size, no cluster count, no scaling policy, no auto-suspend. Set a per-query ceiling and a concurrency floor.

02 — Scaling

We stress-tested the riskiest claim. It holds. Adaptive matched or beat every classic configuration we tried at matched compute strength.

03 — The dial

QTM sets the floor, not the ceiling. Our credit numbers suggest QTM=10 guarantees ~10 units, with the pool over-provisioning above that, while QTM=0 can spin up roughly one unit per concurrent query.

04 — Trade-offs

No single sweet spot. QTM=4 is cheapest. QTM=10 is the latency optimum at acceptable cost: 37s for 0.48 cr. QTM=0 is fastest (21s wall, 17s median) at about 3.5× the cost of QTM=10.

05 — Verdict

Adopt unless your testing shows cost spikes. The operational reduction is real. The performance is at least as good. The unanswered questions are real but not blockers.

The framing matters: this isn't "adaptive is faster." Sometimes classic is competitive, when properly planned. The win is that nobody actually plans classic properly, and adaptive turns that planning into a couple of CREATE WAREHOUSE parameters.

Preview-feature disclaimer: Adaptive Compute is in Public Preview as of 2026. Region availability, defaults, syntax, and behaviour may change. Everything below is sourced from docs.snowflake.com or our own runs on a quiet trial account at a single point in time. Where the post talks about how the platform actually behaves under load (particularly how the QTM dial maps to allocated compute), those are inferences from credit numbers, not statements from Snowflake's docs. Verify before relying on anything specific in production.

02 · The product

A routing target, not a box.

A classic Snowflake warehouse is a thing you size. You pick a T-shirt label, configure clusters, decide on auto-suspend, configure Query Acceleration Service, and live with whatever quarterly tuning meetings result.

An adaptive warehouse asks you for two things and removes everything else.

Classic (five+ decisions, legacy):
CREATE WAREHOUSE analytics_wh WITH
    WAREHOUSE_SIZE      = LARGE
    WAREHOUSE_TYPE      = STANDARD
    MIN_CLUSTER_COUNT   = 1
    MAX_CLUSTER_COUNT   = 3
    SCALING_POLICY      = STANDARD
    AUTO_SUSPEND        = 60
    AUTO_RESUME         = TRUE
    ENABLE_QUERY_ACCELERATION = TRUE
    QUERY_ACCELERATION_MAX_SCALE_FACTOR = 8;

Adaptive (two decisions, recommended):
CREATE ADAPTIVE WAREHOUSE analytics_wh
    WITH MAX_QUERY_PERFORMANCE_LEVEL = LARGE
         QUERY_THROUGHPUT_MULTIPLIER = 4;

-- everything else is gone.

What the two dials mean

MAX_QUERY_PERFORMANCE_LEVEL (MAX_QPL) is the ceiling on per-query compute. Snowflake routes each query into a shared pool and sizes it from XS up to your ceiling. Most queries don't need the ceiling. The ones that do get it.

QUERY_THROUGHPUT_MULTIPLIER (QTM) is the floor on guaranteed concurrency. At QTM=4, Snowflake commits to running at least four max-sized queries in parallel. Smaller queries pack alongside; bursts above the floor may queue. Accepts any non-negative integer (default 2); 0 means unlimited burst (best-effort, no floor commitment).

What's not there

Look at what's absent. No size column in SHOW WAREHOUSES (it's null). No STARTED or SUSPENDED state; it shows ENABLED. No min_cluster_count, no started_clusters. No AUTO_SUSPEND or AUTO_RESUME to configure. No Query Acceleration Service settings. The abstraction is the absence.

If you've used Databricks job compute or AWS Lambda, the model is familiar. You set bounds, the platform allocates compute per request. Snowflake's version arrived later than the others, but the shape is the same.

Why this matters

Every parameter you don't set is a tuning meeting you don't have to attend. The classic warehouse model required teams to first pick a warehouse type (Standard, Standard Gen 2, Snowpark-Optimized) and a scaling policy (Standard or Economy), then on top of that, decide warehouse size, cluster count, suspend timer, and Query Acceleration Service configuration. Often per workload. Often re-tuned quarterly.

Adaptive collapses the type-and-policy decision tree entirely and reduces the runtime parameters to two. Even those two don't really need ongoing tuning: set them conservatively high and you have a rough worst-case spend to plan against:

credits/hour at MAX_QPL × QTM × peak hours ≈ worst-case daily spend (QTM ≥ 1)

Classic doesn't give you a clean ceiling like that; auto-scaling, suspend timers, and per-cluster billing all interact in ways that make "what's the worst case" hard to answer without modelling. Faster or cheaper is secondary. The win is no longer having those meetings.
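The estimate above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the per-size rates are the classic Standard warehouse credits per hour, and we assume (not confirmed by Snowflake) that adaptive MAX_QPL tiers bill at the same rates. `worst_case_daily_credits` is a hypothetical helper. Treat the result as a planning estimate rather than a hard cap, since our own credit numbers in §5 suggest the pool can provision above the QTM floor.

```python
# Credits per hour for classic Standard warehouses, by size.
# ASSUMPTION: adaptive MAX_QPL tiers bill at these same rates; check current pricing.
CREDITS_PER_HOUR = {
    "XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8,
    "XLARGE": 16, "XXLARGE": 32, "XXXLARGE": 64, "X4LARGE": 128,
}


def worst_case_daily_credits(max_qpl: str, qtm: int, peak_hours: float) -> float:
    """Rough ceiling: QTM max-sized queries running flat out for every peak hour."""
    if qtm <= 0:
        # QTM=0 (unlimited burst) has no floor to multiply by, so no finite estimate.
        raise ValueError("QTM=0 has no finite worst-case estimate")
    return CREDITS_PER_HOUR[max_qpl.upper()] * qtm * peak_hours


print(worst_case_daily_credits("LARGE", 4, 10))  # 320 (8 credits/h × 4 × 10 h)
```

If the number that comes out is acceptable, the dials can stay where they are; if not, lower QTM or MAX_QPL and recompute.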

03 · The stress test

Does it actually scale?

Operational simplicity is a real benefit, but only if the platform actually works under load. The riskiest claim Snowflake makes about adaptive is that the shared pool scales elastically to absorb burst concurrency without you orchestrating it.

We ran 50 concurrent queries (a per-row math computation over a 100M-row fact table joined with a 10K-row dimension, scanning ~14 GB per query) against nine warehouse configurations to verify. Same data, same role, same Python test harness. One warmup query per warehouse before the burst. Then we measured wall clock, per-query latency, queue times, and total credits.
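Our harness isn't reproduced here, but the burst shape it fires can be sketched. This is a hypothetical minimal version, not the harness we ran: `fire_burst` and `run_query` are made-up names, and `run_query` is a stand-in for whatever actually executes one SQL statement (for example a cursor from snowflake-connector-python). Latencies are measured client-side, so they include any warehouse queueing; separating queued time from execution time needs Snowflake's query history.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fire_burst(run_query: Callable[[int], None], n_queries: int = 50,
               arrival_window_s: float = 0.5) -> dict:
    """Fire n_queries concurrently, spreading their arrival over arrival_window_s.

    run_query is a hypothetical placeholder: in a real harness it would execute
    one SQL statement. Latencies are client-side, so they include queueing.
    """
    def timed(i: int) -> float:
        start = time.perf_counter()
        run_query(i)
        return time.perf_counter() - start

    gap = arrival_window_s / n_queries
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_queries) as pool:
        futures = []
        for i in range(n_queries):
            futures.append(pool.submit(timed, i))
            time.sleep(gap)  # stagger submissions across the arrival window
        latencies = [f.result() for f in futures]
    wall_s = time.perf_counter() - wall_start
    return {"wall_s": wall_s, "p50_s": statistics.median(latencies),
            "max_s": max(latencies)}


# Usage with a stand-in "query" that just sleeps for 50 ms:
result = fire_burst(lambda i: time.sleep(0.05), n_queries=10, arrival_window_s=0.1)
print(result)
```

One thread per query keeps the client from becoming the bottleneck, so any queueing you see comes from the warehouse, not the harness.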

The nine setups

Setup	Type	Config
classic_m_mc1	Classic Std Gen 2	Single cluster, no scale-out
classic_m_mc4_auto	Classic Std Gen 2	MIN=1, MAX=4, auto-scale on demand
classic_m_mc10_auto	Classic Std Gen 2	MIN=1, MAX=10, auto-scale on demand
classic_m_mc4_pre	Classic Std Gen 2	MIN=MAX=4, all clusters running before burst
classic_m_mc10_pre	Classic Std Gen 2	MIN=MAX=10, all clusters running before burst
adaptive_m_qtm2	Adaptive	MAX_QPL=M, QTM=2 (Snowflake default)
adaptive_m_qtm4	Adaptive	MAX_QPL=M, QTM=4
adaptive_m_qtm10	Adaptive	MAX_QPL=M, QTM=10
adaptive_m_qtm0	Adaptive	MAX_QPL=M, QTM=0 (unlimited burst)

All nine cap at Medium-class per-query compute. The five classic setups cover the usual ways to handle 50 concurrent queries today (a single cluster, auto-scaling, pre-warming), including the pre-warmed configurations almost nobody actually deploys. The four adaptive setups span the QTM dial.

04 · Classic under burst

Auto-scaling didn't help. Pre-warming did.

The classic results showed something we did not predict: permitting more cluster capacity made things slower.

Auto-scaling was too slow. Pre-warming with too few clusters was uneven. Pre-warming with enough clusters was expensive. The dispatcher was the bottleneck. Capacity rarely was.

Setup	Wall	p50 latency	Queued	Credits
classic_m_mc1	234s	148s	42/50	0.45
classic_m_mc4_auto	308s	114s	42/50	2.00
classic_m_mc10_auto worst	380s	163s	42/50	2.79
classic_m_mc4_pre	271s	76s	21/50	2.03
classic_m_mc10_pre best of classic	30s	29s	0/50	1.53

The single-cluster baseline finished in 234s. Allowing up to 10 clusters of auto-scaling capacity stretched the same workload to 380s, slower than doing nothing. Even with SCALING_POLICY = STANDARD, the auto-scaler adds clusters incrementally rather than all at once, and all 50 queries in our 500ms burst had arrived long before it acted. New clusters arrive cold, with an empty local SSD cache, and start picking up the tail of an already-serializing queue.

Six of ten clusters never started

The per-cluster breakdown for classic_m_mc10_auto tells the whole story:

Per-cluster active time (figure): CL 1 · 152s, CL 2 · 101s, CL 4 · 217s, CL 5 · 287s. CL 3 and CL 6–10: no active time recorded.

Six of ten allowed clusters never started during the 380-second window. The auto-scaler simply didn't react fast enough.

Pre-warming worked, when pushed

The bottom row of the headline table is the same physical Medium-class compute with the same query, same data, same harness. The only difference: MIN_CLUSTER_COUNT = 10, so all ten clusters were running before the burst arrived. We polled SHOW WAREHOUSES until started_clusters = 10 before firing. Result: 30 seconds wall clock, perfect load distribution (5 queries per cluster), zero queueing.
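The post doesn't show how the polling was done; a minimal sketch of the wait-until-warm step might look like this. `wait_for_clusters` is a hypothetical helper, and `get_started_clusters` is a placeholder for reading the `started_clusters` column from `SHOW WAREHOUSES LIKE '<warehouse_name>'` through whatever client you use.

```python
import time
from typing import Callable


def wait_for_clusters(get_started_clusters: Callable[[], int], target: int,
                      timeout_s: float = 300.0, poll_s: float = 5.0) -> float:
    """Poll until at least `target` clusters report as started.

    Returns the seconds waited; raises TimeoutError if the target is never reached.
    get_started_clusters is a placeholder for reading started_clusters from
    SHOW WAREHOUSES LIKE '<warehouse_name>'.
    """
    start = time.monotonic()
    while True:
        if get_started_clusters() >= target:
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            raise TimeoutError(f"fewer than {target} clusters after {timeout_s}s")
        time.sleep(poll_s)


# Usage with a fake warehouse that brings one more cluster up per poll:
state = {"started": 7}


def fake_started() -> int:
    state["started"] += 1
    return state["started"]


waited = wait_for_clusters(fake_started, target=10, poll_s=0.01)
```

Only fire the burst once this returns; otherwise the "pre-warmed" run quietly becomes an auto-scaling run.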

The dispatcher was the bottleneck, not capacity

The second-to-last row of the table is more interesting than the last row. classic_m_mc4_pre had all four clusters running before the burst arrived, and still took 271 seconds. Look at how the burst distributed:

mc4_pre cluster	Queries handled
CL 1	32
CL 2	1
CL 3	11
CL 4	6

Cluster 1 got 32 of 50 queries. Clusters 2–4 were already running and available, yet cluster 2 handled a single query: the intra-warehouse dispatcher concentrated load on cluster 1 until it was visibly saturated. Same root cause as the auto-scaling failure: Snowflake's classic load balancer is reluctant to fan out.
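One way to put a single number on the skew in the per-cluster table above is to divide the busiest cluster's load by the mean load. `load_imbalance` is a name we made up, and max-over-mean is just one common skew summary, not a Snowflake metric.

```python
def load_imbalance(queries_per_cluster: list[int]) -> float:
    """Busiest cluster's load divided by the mean load; 1.0 is a perfectly even spread."""
    mean = sum(queries_per_cluster) / len(queries_per_cluster)
    return max(queries_per_cluster) / mean


mc4_pre = [32, 1, 11, 6]  # per-cluster counts from the table above
mc10_pre = [5] * 10       # 5 queries on each of ten clusters
print(round(load_imbalance(mc4_pre), 2))  # 2.56
print(load_imbalance(mc10_pre))           # 1.0
```

Cluster 1 carried about 2.6× its fair share on mc4_pre, while mc10_pre sat at exactly 1.0.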

The lesson: "Classic works if you so vastly over-provision that the dispatcher can't ruin it." (field note, 17 May 2026 · eu-west-1)

The caveats

This finding is bounded by our setup: a trial account on a quiet morning; queries that run solo in ~30 seconds (mid-weight, not multi-minute); a 500ms arrival burst (more aggressive than most real workloads); and SCALING_POLICY = STANDARD only. For multi-minute queries, the auto-scaler has more time to react and may engage productively. If you've measured this on an Enterprise account or with longer-running queries, we'd be glad to hear what you saw.

The classic story, honestly

Classic warehouses have two ways to lose under burst, and both come from the dispatcher. The auto-scaler waits too long to add clusters. The intra-warehouse load balancer concentrates queries on one cluster even when others have spare capacity. Permitting more clusters doesn't help with either; capacity wasn't the missing piece.

Pre-warming worked, but only when over-provisioned to the point where the dispatcher couldn't ruin it. The classic warehouse model assumes sustained load, and bursts are a different problem it never solved well.

05 · Adaptive's QTM dial

One dial. Four ways to spend.

Four adaptive runs, same workload, same MAX_QPL = Medium. The only variable: the QTM floor.

Wall clock by QTM (figure): same workload, 50 concurrent queries, MAX_QPL=M. QTM=2 (floor) 115s · 0.74 cr; QTM=4 (cheap) 65s · 0.39 cr; QTM=10 (balanced) 37s · 0.48 cr; QTM=0 (burst) 21s · 1.67 cr.

Setup	Wall	p50	Overload queued	Provisioning wait	Credits
adaptive_m_qtm2	115s	66s	42/50	0.2s · 48/50	0.74
adaptive_m_qtm4 cheapest	65s	37s	34/50	1.7s · 48/50	0.39
adaptive_m_qtm10 balanced	37s	21s	10/50	1.0s · 47/50	0.48
adaptive_m_qtm0 premium	21s	17s	0/50	0.0s · 6/50	1.67

How QTM allocates compute

Look at the "overload queued" column. At QTM=2, Snowflake commits to running two max-sized queries concurrently, yet only 42 of 50 queued: eight started immediately. At QTM=4, 16 started and 34 queued; at QTM=10, 40 started and 10 queued. At QTM=0, nothing queued: the pool absorbed all 50 simultaneously.

The compute-units inference, from credits: QTM=2 billed roughly 4–6 units of work; QTM=10 billed roughly 10–12; QTM=0 billed many tens of units, closer to one worker per concurrent query plus ramp-up overhead, not a clean N=50. The mechanism we read out of this: QTM=N is a floor the pool over-provisions above when capacity allows, and QTM=0 is "give every query its own worker, regardless of count."
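The units inference above is simple arithmetic on the credit numbers, sketched below. Two assumptions, both ours: an adaptive Medium unit bills like a classic Standard Medium (4 credits/hour), and credits accrue only over the measured wall-clock window. `implied_medium_units` is a hypothetical helper.

```python
MEDIUM_CREDITS_PER_HOUR = 4  # ASSUMPTION: adaptive Medium bills like a classic Standard Medium

# QTM -> (wall-clock seconds, credits billed, queries that hit the overload queue)
runs = {2: (115, 0.74, 42), 4: (65, 0.39, 34), 10: (37, 0.48, 10), 0: (21, 1.67, 0)}


def implied_medium_units(wall_s: float, credits: float) -> float:
    """Average Medium-equivalent units needed to bill `credits` over `wall_s` seconds."""
    credits_per_hour = credits / (wall_s / 3600)
    return credits_per_hour / MEDIUM_CREDITS_PER_HOUR


for qtm, (wall_s, credits, queued) in runs.items():
    units = implied_medium_units(wall_s, credits)
    print(f"QTM={qtm:>2}: ~{units:.1f} units, {50 - queued} of 50 started without queueing")
```

On these numbers it prints roughly 5.8, 5.4, 11.7 and 71.6 units for QTM=2, 4, 10 and 0, in line with the ranges above.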

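QTM isn't a speed dial. It's a cost dial. A higher QTM raises the floor and speeds up bursts, but the bill doesn't rise in a straight line: on our numbers QTM=2 cost more than QTM=4, apparently because its longer queue held similar compute for longer. QTM=0 is "I do not care what it costs."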

The decision: pick the universe your workload lives in

Cost optimum

QTM=4

Cheapest per query. Pick this when billing dominates the tuning conversation.

0.008 cr per query · 65s wall

Sweet spot

QTM=10

Latency optimum at acceptable cost. 40 of 50 queries finish in under 30s. Pick this when a human or a dashboard is waiting.

0.010 cr per query (about +23% vs QTM=4) · 37s wall

Burst

QTM=0

Fastest possible. Pick this for Sunday backfills, end-of-month rebuilds, demos. The premium is real.

0.033 cr per query (about 3.5× QTM=10) · 21s wall

Reading the QTM dial: there's no single sweet spot, only trade-offs. The decision is which metric your workload is judged on, not which QTM is "right."

The hidden cost: provisioning time

Next to the overload-queue column sits another one: provisioning wait. Almost every adaptive query waits briefly for compute to spin up before executing: between 0.2s and 1.7s in our runs, affecting 47–48 of 50 queries at QTM=2, 4 and 10. Tiny per query, but real, and not advertised. At QTM=0 the cost almost disappears (only 6 of 50 queries waited), suggesting unlimited-burst mode pre-allocates more aggressively. Classic warehouses don't show this column at all; the clusters are either running or they're not.

Cache-warmup observation: one thing the data showed but we don't have a clean story for is that cache hit rate doesn't correlate cleanly with QTM. QTM=2 and QTM=0 both saw ~88% cache hits. QTM=4 saw 72%. QTM=10 saw only 24%. A single warmup query seems to seed one slot in the pool, not all of them. If you're benchmarking adaptive against classic, run a real warm-up burst rather than a single query; your results will look very different.

Pool capacity is real, but invisible

Here's something the data hints at but Snowflake doesn't document: the adaptive pool has bounds. We observed QTM=0 absorbing 50 concurrent queries with zero overload queueing. What we don't know is what happens at 500. Or 5,000. Or what happens when the pool is contending with high load from other accounts in the same region.

The adaptive pool is shared across customers. "Unlimited" is permission to ask for elasticity, not a guarantee of capacity. On our quiet trial account at 9 AM, the pool delivered. On a busy Enterprise account at end-of-quarter close, it may not deliver the same way. This is a real risk that's hard to test in advance: we can't load-test the global pool from our seat. We can only flag it.

06 · The cost picture

All nine, ranked by credits per query.

Same workload, same data, different shapes of compute. Sorted by what each query cost on average.

Setup	Wall	Total credits	Credits / query	Trade-off
adaptive_m_qtm4 cheapest	65s	0.39	0.008	Cheap, decent speed
classic_m_mc1	234s	0.45	0.009	Cheapest classic, but slow
adaptive_m_qtm10 balanced	37s	0.48	0.010	Best latency at acceptable cost
adaptive_m_qtm2	115s	0.74	0.015	Heavy queueing
classic_m_mc10_pre	30s	1.53	0.031	Fast, requires heavy planning
adaptive_m_qtm0 premium	21s	1.67	0.033	Fastest, premium price
classic_m_mc4_auto	308s	2.00	0.040	Slow and expensive
classic_m_mc4_pre	271s	2.03	0.041	Dispatcher pinned 32/50 to cluster 1
classic_m_mc10_auto worst	380s	2.79	0.056	Worst of every world

A 7× cost difference for the same work, same data, same nominal compute strength. The variable isn't capacity. It's how compute gets allocated.

Three configurations are defensibly good, and they live in three different metric universes: qtm4 for billing, qtm10 for latency, qtm0 for raw speed. The pre-warmed classic option (classic_m_mc10_pre) finishes fast, but at close to qtm0's premium price and with the dispatcher caveat: you only get that result by over-provisioning so heavily that the dispatcher can't matter. The remaining five configurations are actively bad choices on this workload.
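The per-query column and the headline ratios follow directly from the totals in the table above. A quick check in Python, using only those figures and assuming each total covers exactly the 50 burst queries:

```python
# Total credits billed for the same 50-query burst, from the table above.
total_credits = {
    "adaptive_m_qtm4": 0.39, "classic_m_mc1": 0.45, "adaptive_m_qtm10": 0.48,
    "adaptive_m_qtm2": 0.74, "classic_m_mc10_pre": 1.53, "adaptive_m_qtm0": 1.67,
    "classic_m_mc4_auto": 2.00, "classic_m_mc4_pre": 2.03, "classic_m_mc10_auto": 2.79,
}
N_QUERIES = 50

per_query = {name: cr / N_QUERIES for name, cr in total_credits.items()}
ranked = sorted(per_query, key=per_query.get)  # cheapest per query first

spread = max(total_credits.values()) / min(total_credits.values())
burst_premium = total_credits["adaptive_m_qtm0"] / total_credits["adaptive_m_qtm10"]
latency_premium = total_credits["adaptive_m_qtm10"] / total_credits["adaptive_m_qtm4"]

print(ranked[0], ranked[-1])  # adaptive_m_qtm4 classic_m_mc10_auto
print(f"{spread:.1f}x {burst_premium:.1f}x {latency_premium - 1:+.0%}")  # 7.2x 3.5x +23%
```

The unrounded ratios are slightly different from the rounded per-query figures: QTM=0 costs about 3.5× QTM=10, and QTM=10 about 23% more than QTM=4.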

07 · What we can't tell you

Open questions worth knowing.

Every benchmark is bounded by its conditions. These are the honest gaps in our findings, in the order they're most likely to bite a production decision.

Q1 · Pool behaviour at scale

The adaptive pool is shared across customers in a region. Our test ran on a quiet morning. When adaptive adoption ramps up and many accounts simultaneously fire bursts during business hours, the pool will face contention we can't measure today. This may be the biggest unknown.

Q2 · Heavier, longer queries

Our queries solo in ~30 seconds. For multi-minute analytical queries, classic auto-scaling has more time to react before the burst is over. The relative gap should shrink. We don't know by how much.

Q3 · Per-query cost decomposition

QUERY_ATTRIBUTION_HISTORY.credits_attributed_compute was NULL for all 450 of our queries (classic and adaptive) within an hour of completion. Docs note this view "may be NULL for adaptive during Public Preview." Until per-query cost lands at GA, all adaptive cost claims are at warehouse level.
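While per-query attribution stays NULL, one rough fallback is to split warehouse-level credits by each query's share of execution time. This is a generic proportional-allocation technique, not Snowflake's attribution method, and `apportion_credits` is a hypothetical helper. The inputs would have to come from elsewhere (for example warehouse credits from WAREHOUSE_METERING_HISTORY and execution times from QUERY_HISTORY, assuming both report adaptive warehouses during the preview). It also ignores that adaptive sizes each query differently, so treat the result as an approximation.

```python
def apportion_credits(warehouse_credits: float,
                      exec_seconds: dict[str, float]) -> dict[str, float]:
    """Split warehouse-level credits across queries in proportion to execution time.

    A rough stand-in while per-query attribution is NULL; it ignores differences
    in how much compute each query was actually given.
    """
    total = sum(exec_seconds.values())
    if total == 0:
        raise ValueError("no execution time to apportion against")
    return {qid: warehouse_credits * s / total for qid, s in exec_seconds.items()}


# Usage: two hypothetical queries sharing a 0.48-credit window.
shares = apportion_credits(0.48, {"q1": 10.0, "q2": 30.0})
print(shares)
```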

Q4 · Mixed workloads

We tested a uniform burst of 50 identical queries. A realistic workload mixes 100ms dashboard queries with 30-second heavy queries. Adaptive's per-query sizing should help here (small queries get small allocation), but we didn't measure it.

Q5 · The pool's internal anatomy

Our credit numbers suggest QTM=N allocates roughly 1–3× N compute units depending on workload, and QTM=0 allocates up to one per concurrent query. The exact pool composition, per-region capacity, and dispatch algorithm aren't documented.

08 · Over to you

Did you try this? Tell us what you saw.

Reader challenge

Did you measure adaptive on your workload?

This was one experiment on one trial account on one morning. Your account, your queries, your concurrency profile, your business hours: almost certainly different. If you've measured adaptive on your workload, we'd be glad to hear what you found.

A few things in particular we'd love to compare against:

Enterprise account vs trial: does the pool feel different?
Multi-minute queries: does classic auto-scaling catch up?
QTM values we didn't test: does QTM=20 or QTM=50 shift the sweet spot?
Production workloads with mixed query sizes: what does per-query sizing actually save?
End-of-quarter or peak-hours runs: pool contention from other accounts?

If your numbers are dramatically different from ours (faster, slower, much cheaper, much more expensive), let us know at darshan.meel@gmail.com. We'll happily update this post with a note and credit your data.

What we'll test next (next post, late June 2026): a heavier, more realistic shape. Hold an adaptive L warehouse at QTM=2 busy with two long-running queries, then fire a 50-query burst on top. Does the pool absorb the burst on top of saturated floor capacity? Compare against a pre-warmed classic L with 2 clusters running the same two long queries, hit by the same burst. That tells us what happens when the floor is already spoken for.

09 · Adoption

Should you move?

The framing we'd defend from the data: default to adopting adaptive, unless your testing surfaces specific cost spikes. The operational reduction is real. The performance, in matched comparisons, is at least as good as classic done well, and dramatically better than classic done badly (which is to say, most classic deployments).

Not a save-money-or-go-faster story. A remove-planning-overhead story — provided your testing shows nothing breaks. As long as it doesn't make things worse for your workload, the operational benefit is the reason to adopt.

A practical implication of having only two dials: you can set them conservatively high once and largely stop tuning. The credits/hour at MAX_QPL × QTM × peak hours gives you a rough worst-case daily spend you can reason about. If that number is acceptable, you don't need to re-tune later. Actual cost will be lower, because each query only pays for what it used. Classic never gave you that.

Strong yes

Bursty BI and analyst workloads. Dashboards refreshing at 9 AM, analyst queries arriving in waves.
Mixed-shape workloads on one warehouse. Small dashboards alongside occasional heavy queries.
Teams managing 10+ warehouses with quarterly sizing reviews. Operational reduction alone is worth the move.
New deployments where you'd be guessing the right size anyway.

Probably yes, with measurement

Production needing predictability. Use QTM=N matching your concurrency floor. Don't use QTM=0 without measuring pool behaviour at your levels.
Heavy ETL with predictable cadence. If your classic warehouses are already fully utilised at the right size, savings may be marginal. Run a cost A/B.

Be careful with

Sparse workloads with long idle gaps. Per-query billing may not amortise as cheaply as classic + auto-suspend.
One bad query dominates the bill. Adaptive doesn't fix bad SQL. Fix the SQL.
Hard concurrent compute budgets. MAX_QPL caps per-query, not aggregate. Resource monitors still required.

Can't yet

Standard Edition accounts (Adaptive requires Enterprise+).
Snowpark-Optimized and Interactive warehouses.
Regions where Adaptive Public Preview isn't yet live.

Our honest take

If your workload looks like a typical BI or analyst pattern, adaptive at QTM=10 (or higher, sized to your concurrency expectations) is the right default going forward. It costs less than pre-warmed classic for similar performance, and it removes the planning overhead that classic always demanded but most teams never invested in. QTM=0 is for when you need fastest possible burst response and accept the premium. Lower QTM values trade burst speed for cost; pick by which one your workload is judged on.

If you're risk-averse: keep one classic warehouse for predictable production batch work, migrate ad-hoc and analyst workloads to adaptive first, and let one billing cycle of data tell you whether to expand. The cost evidence will surface fast if there's a problem.

For Snowflake

Three things that would close the loop.

If anyone at Snowflake reads this, three product asks based on what we couldn't measure:

01
Per-query cost visibility for adaptive warehouses

Currently NULL in QUERY_ATTRIBUTION_HISTORY, well beyond the documented latency window. The product docs say this is "planned for GA"; please get there. Without it, every cost claim about adaptive is warehouse-level guesswork.
02
Pool capacity and utilisation visibility

A simple "current pool load" view per region would let customers understand whether QTM=0 will deliver elasticity at their burst time, or contend with neighbour workloads. Today this is invisible.
03
Documentation of the QTM-to-compute-units mapping

Our credit numbers seem to suggest a 1–3× provisioning multiplier on QTM, with QTM=0 behaving differently again. Customers would benefit from understanding this so they can capacity-plan rather than treat QTM as a black box.

None of this is critical for adoption; customers will move to adaptive because it works. But these gaps will create FinOps confusion and capacity-planning anxiety that doesn't need to exist.

Get the setup data

If you want the harness, warehouse configs, queries, and credit-parsing scripts to reproduce this on your own account, or you'd like to compare notes on your own adaptive numbers, email darshan.meel@gmail.com for the setup data and runbook.

· · ·

— field notes from a quiet morning on eu-west-1

Darshan Singh

writes Crosshire Journal · crosshire.ch · May 2026