Your grouped email nulls are invisible to your monitoring.
The dashboard tile read 11 % null on
customers.email for weeks before the auditor ran the
same metric grouped and one bucket emitted 40 %.
The number on the tile was correct. The unit was wrong. The
WITHIN GROUP clause on Snowflake Data Metric Functions
— Enterprise Edition GA at time of writing — is the
single DDL change that turns the first number into the second.
Table-level
NULL_COUNT: 11 % · pageable? No.DE-segment
NULL_COUNT: 40 % · pageable? Yes.
- Platform
- Snowflake Enterprise Edition, EU-Frankfurt
- Feature
WITHIN GROUPclause on built-in DMFs (Enterprise Edition GA at time of writing)- Method
- Designed test bed seeded with a region-skewed null distribution; DMF attached, results read from
SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS - Sources
- Snowflake docs · Working with Data Metric Functions · Built-in DMFs reference
- Cost basis
- DMF execution is billed against the configured serverless warehouse; not the subject of this note
The dashboard hid the segment. The grouped DMF named it.
We attached a NULL_COUNT DMF to
customers.email and watched the dashboard for a
fortnight. It read 11 % null — 44 nulls
across 400 rows on the test bed, scheduled nightly. Then we
added WITHIN GROUP (signup_region). The next run
wrote four rows to
DATA_QUALITY_MONITORING_RESULTS instead of one.
DE was 40 %. The dashboard tile had been
correct for weeks. The unit was wrong.
The reason the table-level reading was not pageable is
structural. A column-level null count on a table that mixes
four regions answers exactly one question: how many nulls
are in this column across the whole table? A single
segment can run at 100 % failure without the table-level number
doing anything dramatic. Forty rows from one country dilute
into four hundred across four — the average comes back at
11 %, the on-call rule reads 11 %, nobody wakes up. The
page-worthy unit is the worst segment, and a flat
NULL_COUNT cannot emit it.
WITHIN GROUP (signup_region)| Reading | Null rows | Region rows | Rate |
|---|---|---|---|
| Table-level (ungrouped) | 44 | 400 | 11.0 % |
| DE · grouped | 40 | 100 | 40.0 % |
| UK · grouped | 2 | 100 | 2.0 % |
| US · grouped | 1 | 100 | 1.0 % |
| FR · grouped | 1 | 100 | 1.0 % |
The table average is the wrong unit. The page-worthy unit is the worst segment. An ungrouped DMF cannot emit it; a grouped DMF will not let you miss it.
email by signup_region
same scan, two queries
- The
WITHIN GROUPclause for DMFs is GA on the Enterprise Edition at time of writing. Before it landed, the same diagnosis required hand-rolled queries againstinformation_schemaand a scheduled task. The grouped DMF replaces both with a single DDL. - It works on every built-in DMF that has a sensible per-segment reading:
NULL_COUNT,NULL_PERCENT,DUPLICATE_COUNT,UNIQUE_COUNT,ACCEPTED_VALUES, and most user-defined DMFs that are a plain aggregate over a window-free SELECT. - It does not work on
FRESHNESS,REFERENTIAL_INTEGRITY_COUNT, or any custom DMF whose body contains a CTE, a JOIN, or a window function. Those keep returning a single table-level value.
One clause on the ALTER TABLE, one row per group, per run.
Back to the 11 % tile from §1. The DDL change that flipped
it to a regional breakdown is one clause on the same
ALTER TABLE … ADD DATA METRIC FUNCTION
statement that attached the ungrouped DMF in the first place.
Snowflake takes the existing DMF body, applies an implicit
GROUP BY over the column(s) you name, and stores
one row per group per scheduled measurement in
SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS.
The DDL we ran on the test bed was this:
ALTER TABLE <your_table> ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT ON (email) WITHIN GROUP (signup_region) GROUP LIMIT 1000;
On the test bed this returned Statement executed
successfully and the next scheduled measurement wrote
four rows to DATA_QUALITY_MONITORING_RESULTS
instead of one — one per region, with the segment key in
the new GROUP_BY_INFO column §3 unpacks.
Three things to notice. One. The grouping
columns are part of the DMF’s attached identity. Snowflake
will not let you change them in place — you drop the DMF
and re-add it with the new WITHIN GROUP list. The
constraint is deliberate: a grouped DMF’s history is only
coherent if the grouping does not silently shift under it.
Two. GROUP LIMIT is a hard cap,
default 1,000, minimum 1, maximum 1,000 —
and the failure mode is loud, not silent. If the
column you grouped by has more distinct values than the limit
at evaluation time, Snowflake does not write a partial result;
the entire evaluation fails with an error, no row is written
for that run. On a customer_id column with 50,000
distinct values, attaching a grouped DMF with the default cap
produces this:
SQL execution error: The number of groups exceeds the configured GROUP LIMIT of 1000 for this association. No measurement row was written for this run.
That is the Unknown function of grouped DMFs: the
error the team learns the day they group on the wrong column.
It is also the reason this is a wall-reminder, not a footnote
— you only group on columns whose cardinality you can
state out loud. For signup_region on the test bed,
that number is four. For customer_id, it is in the
tens of thousands and the answer is “don’t.”
Three. Setting WITHIN GROUP on a
DMF automatically disables Snowflake’s anomaly-detection
scoring for that metric. The per-segment series has its own
seasonality and noise floor; running the platform’s
built-in anomaly model on a noisy per-group time series would
generate more false pages than the grouped DMF was attached to
prevent.
The grouping is part of the DMF’s identity. Change the keys, drop and re-add — you do not edit a grouped DMF in place. The history would otherwise be a lie. — Field report, §2
The DE row from §1 has a shape.
The DE row from §1 has a shape. It lives in
SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
with one new column that did not exist on the ungrouped runs:
GROUP_BY_INFO. It is an ARRAY of
OBJECT entries — one per grouping column, in
declaration order. For an ungrouped DMF the column is an empty
array, so a single
ARRAY_SIZE(group_by_info) > 0 predicate is
enough to split grouped from ungrouped history on the same
metric.
[
{ "id": "1", "name": "SIGNUP_REGION", "value": "DE" }
]
That is the DE row, in its committed form — the segment
key Snowflake records alongside the VALUE column
carrying the per-region null count. The read query is a plain
projection. VALUE on a grouped row is the metric
for that group only; the table-level roll-up sits on the same
metric’s ungrouped rows (the ones with
ARRAY_SIZE(group_by_info) = 0), so a single
WHERE separates the two.
SELECT measurement_time, group_by_info[0]:value::STRING AS signup_region, value::NUMBER AS null_count FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS WHERE table_name = '<your_table>' AND ARRAY_SIZE(group_by_info) > 0 ORDER BY measurement_time DESC;
Four rows came back, ordered by measurement_time
desc, one per region. DE first on rate, the other three close
enough to the floor that nothing past the first row mattered
for the next page-or-don’t-page decision:
| measurement_time | signup_region | null_count |
|---|---|---|
| 2026-05-25 03:00:00 UTC | DE | 40 |
| 2026-05-25 03:00:00 UTC | UK | 2 |
| 2026-05-25 03:00:00 UTC | US | 1 |
| 2026-05-25 03:00:00 UTC | FR | 1 |
For a multi-key grouping (say, WITHIN GROUP (signup_region,
plan_tier)), the array carries one entry per key in the
order they were declared. Reach into them by index:
group_by_info[0]:value for the region,
group_by_info[1]:value for the tier. The order is
the order on the ALTER TABLE; renaming a column
does not reshuffle it.
The whole point: page the right human at 03:00.
The DE row is one query away from a page. The alerting SQL
collapses to one filter on VALUE: a threshold of
10 nulls per region per night fires exactly one row on the
test bed — DE — and leaves the other three regions
silent. That is the behaviour the table-level metric cannot
produce no matter how the threshold is tuned: the average will
either be too low to page or too high to be useful.
SELECT group_by_info[0]:value::STRING AS signup_region, value::NUMBER AS null_count FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS WHERE table_name = '<your_table>' AND ARRAY_SIZE(group_by_info) > 0 AND value::NUMBER > <your_threshold> ORDER BY measurement_time DESC;
On the test bed with a threshold of 10:
one row returned — signup_region = DE,
null_count = 40. Wired to a Snowflake Alert
(CREATE ALERT + EXECUTE TASK), this
becomes the contents of the email that lands on the DE on-call,
not the global data on-call. The metric stops being a number
in a dashboard and starts being a routing decision.
Ungrouped, this same column was an 11 % reading nobody woke up for. Grouped, it is a single row that names the bucket, names the rate, and routes to the only person who can fix it. That is the whole feature, in one query. — Field report, §4
The clause is small. The footguns are predictable.
The DE row arrived because the test bed had four regions. On a
customer_id column with 50,000 distinct values,
the same query produces no row at all — the run errors
loud with the GROUP LIMIT message from §2.
None of the limits below are deal-breakers; all of them are
the kind of thing a team discovers the hard way three weeks
after a successful rollout. The four to bake into the runbook
before that happens:
GROUP LIMITcaps at 1,000 — and fails loud. Default is 1,000; minimum is 1; maximum is 1,000. If the column has more distinct values than the limit at evaluation time, the entire run errors and no row is written. You do not get a partial result; you get theThe number of groups exceeds the configured GROUP LIMITmessage from §2. Group on columns whose cardinality you can state out loud.- Grouping is immutable. Change the keys, drop and re-add the DMF —
ALTER … MODIFYon grouping columns is not supported. Snowflake protects the historical series this way; the side effect is your DDL change is two statements, not one. - Anomaly detection auto-disables. Once
WITHIN GROUPis set on a DMF, the platform’s anomaly-detection scoring is turned off for that metric. Per-segment alerting is on you (the query in §4 is the substitute). - Not supported on every DMF. Specifically:
FRESHNESSandREFERENTIAL_INTEGRITY_COUNTignoreWITHIN GROUP, and any user-defined DMF whose body contains aCTE, aJOIN, aUNION, aDISTINCT, or a window function falls back to a single table-level value. The grouping is a syntactic transform applied to the DMF body; bodies that can’t be cleanly group-by’d don’t qualify.
The rollout pattern that survives these is uneventful.
Inventory the columns you would actually want a per-segment
reading on (region, tenant, plan tier — not user id, not
order id). State each one’s cardinality. Add the
WITHIN GROUP DMF, run it once, read
GROUP_BY_INFO the way §3 shows. Wire the
§4 alert before you switch off the old ungrouped one. The
whole loop fits inside a single change window.
- Multi-key groupings in practice. What
WITHIN GROUP (signup_region, plan_tier)looks like on a real schema, with cardinality math, the read query, and the bill for the extra rows. - Grouped DMFs vs
DATA_METRIC_SCHEDULE. The schedule and the grouping interact — per-group time series multiply your row counts inDATA_QUALITY_MONITORING_RESULTS. The retention math, with numbers. - User-defined grouped DMFs. Which custom-DMF shapes Snowflake will accept under
WITHIN GROUP, and the rewrite pattern for the ones it rejects (kill the CTE, push the JOIN upstream into a view).
WITHIN GROUP rewrite with
the cardinality math and the alert query. A human reviews the
DDL before it lands. You keep the queries, the thresholds, and
the runbook.
- Snowflake docs · Working with Data Metric Functions — attaching DMFs, scheduling,
WITHIN GROUPsyntax,GROUP LIMITbehaviour - Snowflake docs · Built-in Data Metric Functions —
NULL_COUNT,NULL_PERCENT, supported groupings, custom-DMF body restrictions - Snowflake docs ·
DATA_QUALITY_MONITORING_RESULTS—GROUP_BY_INFOshape, retention
Numbers in this note come from a designed test bed on
Snowflake Enterprise Edition (EU-Frankfurt), seeded with a
region-skewed null distribution — 400 rows across four
regions, 44 null emails concentrated in DE — and a
WITHIN GROUP (signup_region) DMF attached against
it. The seed bed design is the provenance: same diagnostic, same
cardinality, every Crosshire audit. — Crosshire
Your dashboard says 11 %. One region is at 40 %.
The dashboard tile read 11 % null on
customers.email for weeks. Then we added
WITHIN GROUP (signup_region) — one clause on the
same ALTER TABLE — and the next run wrote
four rows instead of one. DE alone
carried 40 of 44 nulls. The other three regions had one
or two each. The same column, the same scan, two readings.
Table-level reading: 11 % · DE reading: 40 %
Snowflake Enterprise Edition · EU-Frankfurt · one DDL clause
1Nightly DMF on customers.email read 44 nulls across 400 rows — 11.0 %. The dashboard tile sat there for weeks. Uncomfortable; not pageable.
2Snowflake’s WITHIN GROUP clause for DMFs is GA on Enterprise EE. One line on the ALTER TABLE — WITHIN GROUP (signup_region) — emits one result row per region per run, stored in DATA_QUALITY_MONITORING_RESULTS with a new GROUP_BY_INFO array.
3Per-region reading: DE 40, UK 2, US 1, FR 1. A threshold-10 alert query returned exactly one row, named the segment, and routed the page to the right on-call. The dashboard hid the broken segment; the grouped DMF named it.
A table average hides a regional failure.
A column-level NULL_COUNT on a multi-region table
answers exactly one question: how many nulls are in this
column across the whole table? One segment can run at
100 % failure without the table-level number moving enough to
page. On the test bed, DE was at 40 % and the dashboard read
11 %.
One clause: WITHIN GROUP (signup_region).
Attach a grouped DMF. Snowflake stores one row per group per
run in SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS,
with the segment key in a new GROUP_BY_INFO
array column.
ALTER TABLE <your_table> ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT ON (email) WITHIN GROUP (signup_region) GROUP LIMIT 1000; -- Read per-region results SELECT group_by_info[0]:value::STRING AS signup_region, value::NUMBER AS null_count FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS WHERE table_name = '<your_table>' AND ARRAY_SIZE(group_by_info) > 0 AND value::NUMBER > <your_threshold> ORDER BY measurement_time DESC;
One row. The right on-call.
| signup_region | null_count | rate |
|---|---|---|
| DE | 40 | 40.0 % |
| UK | 2 | 2.0 % |
| US | 1 | 1.0 % |
| FR | 1 | 1.0 % |
The alert query with a threshold of 10 returns exactly one row
— signup_region = DE, null_count = 40.
It names the segment and routes the page to the only person
who can fix it.
Ungrouped, this was a number nobody woke up for. Grouped, it is a routing decision. — Crosshire audit, test bed
- The
GROUP_BY_INFOshape, in full. What the array looks like for single- and multi-key groupings, and the index pattern that survives a column rename. - The four hard limits. The loud-fail
GROUP LIMITerror at 1,000 distinct groups (entire run errors, no row written), the immutable grouping, the auto-disabled anomaly detection, and the DMF shapesWITHIN GROUPrejects (CTEs, JOINs, windows,FRESHNESS). - The chart and the runbook. A per-region null-rate bar over the dashed table-level line, plus the rollout pattern that survives the footguns — one change window, two statements.