Crosshire
The finding, the fix — no queries Full audit with all SQL and provenance
Tech·Field report·7 min read·23 May 2026

Four dashboards asked the same question. Four different answers.

A field note from one audited Databricks workspace. The Slack question — “How many monthly active customers did we have in April?” — was answered by four dashboards on the same Unity Catalog, against the same source tables, with four different numbers: 11,309, 12,847, 14,201, 16,540. A 5,231-customer spread on a question that has, by stipulation, one right answer. None of the SQL was wrong. Each dashboard carried its own definition of “active”. Unity Catalog metric views are the governance lever — one definition, with synonyms a Genie agent inherits for free.

46 %
spread between the highest and lowest answer
4 dashboards · same workspace · same question
Range: 11,309 → 16,540 active customers, April
Provenance
4dashboards
1question
6SQL definitions
5,231customer gap
46 %spread
Platform
Databricks on AWS, eu-west-1, Unity Catalog enabled
Audit window
2026-04-01 → 2026-04-30 (calendar April; dashboards inspected 2026-05-04)
Surface
4 Lakeview dashboards plus 2 Genie spaces, same source tables in main.analytics.*
Sources
information_schema.views · dashboard SQL pulled via REST /api/2.0/lakeview/dashboards · metric-view DDL per Databricks Business Semantics docs
Cost basis
Not applicable; this is a correctness audit, not a billing one
Anonymised Reproduced on a controlled trial Method-only public release
01The same question, four answers

Four dashboards, one Slack thread, a 5,231-customer gap.

The triggering event was ordinary. Somebody in #revenue asked, “How many active customers did we have last month?” Three people answered with three screenshots of three dashboards. The numbers disagreed. A fourth dashboard was found when an analyst joined the thread. None of the dashboards was labelled experimental. All four were on the same Unity Catalog, hitting the same fact tables in main.analytics.

Four dashboards, April 2026, same workspace
Dashboard Owner Reported “active customers” Delta vs median
<dash_growth> Growth 16,540 +22 %
<dash_product> Product 14,201 +5 %
<dash_finance> Finance 12,847 −5 %
<dash_exec> Exec board 11,309 −16 %
Spread (max − min) 5,231 customers 46 %
The exec-board number was the lowest. The growth-team number was the highest. A 46 % spread on a question whose answer is, by construction, a single integer.
“Active customers” · April 2026 4 dashboards, same workspace
5,231-customer gap · 46 % 10,000 17,000 11,309 12,847 14,201 16,540 EXEC FINANCE PRODUCT GROWTH REPORTED ANSWER · HOW MANY ACTIVE CUSTOMERS DID WE HAVE IN APRIL?
Four dots, four answers, one spread band. The labels carried no caveats; the queries underneath did all the disagreeing.
Reproducibility note
  • The four dashboard SQLs were pulled via the Lakeview REST API and diffed by hand; the underlying tables were untouched in this audit.
  • The same shape shows up on every account we audit that has more than three analysts and more than one BI tool. The specific KPI varies; the spread does not.
  • This piece is anonymised and reproduced on a controlled trial workspace so the method, the SQL, and the metric-view DDL can be shown in public.
02Each query was correct. The definition wasn’t shared.

Six SQL fragments. Six definitions of “active.”

Back to the four dots on the number line. We pulled the SQL underneath each one via the Lakeview REST API and diffed the WHERE clauses by hand. Four dashboards turned out to carry six distinct definitions of “active” — two of them ran an “active customers” tile and an “active customers, paid only” tile alongside each other. None of the fragments is wrong in isolation. Each answers a slightly different question:

The six fragments, paraphrased
Dashboard tile Window Activity signal Excludes
Growth · active customers Trailing 30 days Any customer_events row — (includes trials)
Product · monthly actives Calendar month ≥1 login event Internal accounts
Finance · active customers Calendar month ≥1 paid invoice Trials, internal accounts
Exec · active customers Calendar month ≥2 login events on ≥2 calendar days Trials, internal, churned mid-month

The exec definition is the strictest — hence the lowest number. The growth definition is the loosest — hence the highest. Every one of these was a reasonable choice at the moment somebody wrote it. The drift is not a quality problem; it is an organisational problem. Four people, four different days, four reasonable definitions, no shared place to anchor them.

The exec-board fragment (paraphrased)main.analytics.customer_eventssql
SELECT COUNT(DISTINCT customer_id)
FROM   main.analytics.customer_events
WHERE  event_type = 'login'
  AND  event_date BETWEEN '2026-04-01' AND '2026-04-30'
  AND  plan_tier = 'paid'
  AND  NOT is_internal
GROUP BY customer_id
HAVING COUNT(DISTINCT event_date) >= 2;
The growth-team fragment (paraphrased)main.analytics.customer_eventssql
SELECT COUNT(DISTINCT customer_id)
FROM   main.analytics.customer_events
WHERE  event_date >= current_date() - INTERVAL 30 DAYS;
The strictest fragment is twice as long as the loosest. Both ran green. The disagreement was never about correctness — it was about which question each dashboard was actually answering. — Field report, §2
03What a metric view actually is

One YAML, one DDL, one source of truth.

Back to the six fragments from §2. One Unity Catalog metric view replaces all of them — not by being cleverer SQL, but by being the one place the definition lives. Dashboards, notebooks, and Genie spaces query the view by name; the WHERE clause they used to carry now lives inside it. The view itself is created with CREATE VIEW … WITH METRICS; its body is a YAML document with three load-bearing keys (source, dimensions, measures). Change the definition once, and every consumer reads the new one on the next query.

A metric view for “active customer”main.analytics.active_customerssql
CREATE OR REPLACE VIEW main.analytics.active_customers WITH METRICS
LANGUAGE YAML
AS $$
version: 1.1

source: |
  SELECT customer_id,
         event_date,
         event_type,
         plan_tier,
         is_internal
  FROM   main.analytics.customer_events
  WHERE  NOT is_internal
    AND  plan_tier IN ('paid', 'trial')

dimensions:
  - name: event_date
    expr: event_date
  - name: plan_tier
    expr: plan_tier
    synonyms: ['plan', 'tier', 'subscription level']

measures:
  - name: active_customers
    expr: COUNT(DISTINCT IF(event_type = 'login', customer_id, NULL))
    synonyms: ['active users', 'MAU', 'monthly actives']
    format: '#,##0'

  - name: paying_actives
    expr: COUNT(DISTINCT IF(event_type = 'login'
                              AND plan_tier = 'paid', customer_id, NULL))
    synonyms: ['paid actives', 'paying customers']
    format: '#,##0'
$$;

Three things to notice. One. The view body is not the SELECT — it is a contract. The source block is a query that returns rows; the measures block describes how to aggregate them. A consumer never writes the aggregation. Two. Each dimension and measure can carry a synonyms list. This is the agent-facing metadata, and it is the reason the same view works whether the question is “MAU” or “monthly actives” or “paid actives”. Three. A measure’s expr is evaluated after the source query — grouping happens downstream, not in the view body — which is why a single view can support a hundred dashboards that all roll up differently.

Querying it from a dashboard or a notebook looks like an ordinary aggregation. The view enforces the rest:

The dashboard SQL collapses to four linesconsumer sidesql
SELECT MEASURE(active_customers) AS active_customers
FROM   main.analytics.active_customers
WHERE  event_date BETWEEN '2026-04-01' AND '2026-04-30';
The dashboard query no longer contains the definition. The view contains it. Six fragments collapse to one SELECT, and the spread collapses with them. — Field report, §3
04The Genie angle: agents inherit your definitions

An AI agent without a metric view is a fifth dashboard.

The four-answer problem in §1 was caused by humans. The same shape, amplified, is the natural output of an AI agent that writes SQL on the fly. Genie spaces on this workspace had two of them. Asked “how many active customers in April”, each Genie wrote its own WHERE clause from the column names of customer_events. One answered 13,400. The other answered 15,920. Neither agreed with any of the four dashboards.

A metric view changes the agent’s job. Instead of inferring a definition from table schemas, the agent reads the synonyms, display_name, and format fields on every measure. “MAU” and “monthly actives” both route to MEASURE(active_customers) on the governed view. The agent stops inventing and starts looking up.

Same Slack question, before and after
Surface Before metric view After metric view
Growth dashboard16,54013,418
Product dashboard14,20113,418
Finance dashboard12,84713,418
Exec dashboard11,30913,418
Genie space “Revenue”13,40013,418
Genie space “Product”15,92013,418
Spread5,2310

The “after” number is not magically the right one; it is the agreed one. Whether 13,418 is what the business wants “active” to mean is a separate conversation, and a much better one to have once than four times.

Without a metric view, the agent inherits ambiguity and ships another answer. With one, it inherits the definition. Governance is upstream of the model.
Where the synonyms live
  • display_name — what the dashboard, Genie, and the catalog browser show humans. Set it once.
  • synonyms — the array a Genie agent matches against natural-language phrasing. “MAU” → active_customers.
  • format — an Excel-style number format string applied at render time, so every surface shows the same digit grouping.
  • comment — long-form, shown on hover. The first place a new analyst should look before writing a fifth definition.
05Migrate one KPI, end to end

Five steps. One afternoon.

Back to the audited workspace. The migration ran in an afternoon — here’s the order. Inventory, reconciliation, and authoring are the governance work; the grant and the sweep are the mechanical finish.

Inventory. Find every dashboard, notebook, and Genie space that answers a question containing the phrase you care about. This SQL pulls the dashboard tile sources via Unity Catalog’s information schema (Genie space bindings require the REST API; included as a comment):

Find every place “active” is definedinformation_schema.viewssql
-- Returns every Unity-Catalog view whose body mentions "active",
-- with the catalog/schema/name and the view definition text.
-- For Lakeview dashboard tiles, pair this with /api/2.0/lakeview/dashboards.
SELECT table_catalog,
       table_schema,
       table_name,
       view_definition
FROM   system.information_schema.views
WHERE  LOWER(view_definition) LIKE '%active%customer%'
   OR  LOWER(view_definition) LIKE '%monthly%active%'
ORDER BY table_catalog, table_schema, table_name;

Reconcile. Put the fragments in one document. Decide which one is the real definition. This is the meeting; this is the only place a metric view does not save you time. Whoever owns the KPI signs the decision off.

Author. Write the metric view DDL with the agreed definition in source and the agreed metric in measures. Add synonyms for every phrase the inventory above turned up in the wild. The view in §3 is a complete example.

Grant. Metric views are Unity Catalog objects; access is governed the same way as a table. The analyst role needs SELECT; the agent service principal needs SELECT and USE CATALOG:

Grant access to consumersGRANTsql
GRANT SELECT ON VIEW main.analytics.active_customers
  TO `analysts`;

GRANT SELECT ON VIEW main.analytics.active_customers
  TO `genie-sp-revenue`;

Sweep. Replace each dashboard tile’s SQL with the four-line consumer query from §3. Open the Genie space and re-ask the original Slack question. The answer should now be the same across every surface, and the audit log on the metric view records who changed what, when.

The hard part is the reconciliation. The DDL is twenty lines; the agreement is the work. A metric view is the artefact the agreement gets written into so it survives the next analyst. — Field report, §5
Not in this post · future field notes
  • Materialised metric views. the experimental materialization: YAML block turns the view into a delta table that refreshes on a schedule. The latency / freshness trade-off, with numbers.
  • Window measures. Period-over-period KPIs (“MoM growth”, “7-day rolling actives”) declared in YAML, not in dashboard SQL. The shape, the gotchas.
  • Composability. Joining two metric views — active_customers against revenue — without inheriting either view’s grouping by accident. A short rules-of-the-road piece.
From our audit
Cost audits get the headlines; correctness audits decide whether the company trusts its own dashboards. Every workspace we audit that has more than a handful of analysts carries some version of the four-answer problem above. A Crosshire audit ships the inventory query, the reconciliation template, and the metric-view DDL for your top ten KPIs — with a human reviewing every definition before it lands in Unity Catalog. You keep the views, the synonyms, and the runbook.
Start a conversation →
Sources
· · ·

Numbers in this note are from one audited Databricks workspace, anonymised and reproduced on a controlled trial so the method, the SQL, and the metric-view DDL can be shown in public. The same diagnostic ships in every Crosshire audit. — Crosshire

D
writes Crosshire Journal · crosshire.ch · May 2026
Crosshire Journal
Field reports on data, compute, and the unglamorous decisions that shape engineering teams. Made in EU. Cited evidence, GDPR-native.
Tech·Quick fix·4 min read·23 May 2026

One workspace, one question, four different answers.

Same Unity Catalog. Same source tables. Same Slack question. Four dashboards on the workspace answered “how many active customers in April?” with four different numbers and a 5,231-customer spread. None of the SQL was wrong. Each dashboard had its own definition. Metric views are the one-place fix.

46 %
spread between the highest and lowest answer
4 dashboards · 6 SQL definitions of “active”
11,309 → 16,540 active customers, April
5,231-customer gap · 46 % 11,309 12,847 14,201 16,540 EXEC FINANCE PRODUCT GROWTH 10,000 17,000
Four dots, four answers, one yellow spread band. Each dot is a dashboard, each label its owner.
Provenance · what happened

1A question in #revenue: “how many active customers did we have last month?” Three teammates screenshotted three dashboards. The numbers didn’t match. A fourth dashboard turned up when the analyst joined the thread.

2We pulled the four tile SQLs via the Lakeview REST API. Each had a different WHERE: trailing 30 days vs calendar month, any event vs login only, internal accounts in or out, paid only or not. Six fragments, six definitions of “active.”

3The fix was one Unity Catalog metric view: main.analytics.active_customers with a YAML body, agreed dimensions and measures, and a synonyms list (“MAU”, “monthly actives”) the Genie agent inherits. Every dashboard switched to MEASURE(active_customers). The next audit returned a single number across all six surfaces.

01The problem

Same question, different definitions.

Every dashboard, every notebook, every Genie space carries its own WHERE clause for what counts as “active”. On this workspace, the four dashboards had agreed on the source table (main.analytics.customer_events) and disagreed on everything else: window length, activity signal, whether internal accounts count, whether trials count.

The exec dashboard — strictest definition, lowest number — showed 11,309. The growth dashboard — loosest, highest — showed 16,540. A 5,231-customer gap on a question whose answer is, by stipulation, a single integer.

02Why it happens

There was no one place to put the definition.

Each fragment was written on a different day by a different person with a slightly different question in mind. Without a shared object to anchor “active customer” to, the next analyst writes the next variant. Genie spaces make this worse, not better: an agent inferring SQL from table schemas ships a fifth definition on the first ask.

03The fix

One metric view. Spread: zero.

A Unity Catalog metric view is a SQL object with a YAML body. It declares one source row set, agreed dimensions, and agreed measures — including a synonyms list per measure so “MAU”, “monthly actives”, and “active users” all resolve to the same number. Dashboards query it by name. Genie agents inherit the synonyms rather than inventing a sixth definition on the first ask.

On the controlled trial workspace, every surface was pointed at one view: main.analytics.active_customers.

Before and after one metric view · same Slack question
Surface Before After
Growth dashboard16,54013,418
Product dashboard14,20113,418
Finance dashboard12,84713,418
Exec dashboard11,30913,418
Genie space “Revenue”13,40013,418
Genie space “Product”15,92013,418
Spread5,2310
The agreed number is not magically the right one — it is the agreed one. Whether 13,418 is what the business wants “active” to mean is a better conversation to have once than four times. — Crosshire audit, controlled trial workspace
Want the receipts?
The long version unpacks three things this short can’t.
  • The six fragments, side by side. The actual SQL each dashboard ran — 30-day window vs calendar month, login vs any event, internal in or out — and which choice mapped to which number.
  • The Genie before/after table. Two Genie spaces asked the same question pre-migration, returned 13,400 and 15,920. After the metric view: 13,418 everywhere. The agents inherited the definition.
  • The five-step migration. Inventory query, reconciliation template, the DDL with synonyms, the GRANT for the agent service principal, and the dashboard sweep. One afternoon.
D
writes Crosshire Journal · crosshire.ch · May 2026
Two-minute field fixes from the same audits as our long-form Journal. One number, one fix, one result you can verify.
Crosshire Quick
© 2026 Crosshire Journal · Made in EU Privacy Terms Cookies License Imprint Coffee