Crosshire
Open formats·Field report·8 min read·25 May 2026

Iceberg v3 says interop is solved. We ran the crossing.

We deleted three rows on Snowflake. We pointed Databricks at the same Iceberg v3 metadata. COUNT(*) = 0 — the deletion vector crossed. The other three cells had to be measured the same way. Snowflake’s Iceberg v3 GA landed on 2026-05-07, 64 days after the preview opened; Databricks has been on v3 Public Preview since 2026-04-09. Both vendor blogs frame interop as solved. Neither blog runs the other engine. We did, on the four spec features that move, both directions.

4 / 4
features measured, both directions, full round-trip
Snowflake Iceberg v3 GA (2026-05-07) · Databricks v3 Public Preview
Matrix below; one controlled-trial run on a Crosshire test bed
Provenance
4 spec features
2 engines
3 write directions
1 unsupported path
v3 spec
Snowflake: Iceberg v3 GA feature release, 2026-05-07, AWS eu-west-1
Databricks: Iceberg v3 Public Preview 2026-04-09; Databricks Runtime 18.0+ required for v3 read/write on managed Iceberg tables
Test bed: One small fact table (orders, ~10k rows) in a shared S3 location, registered as an Iceberg v3 managed table in each engine in turn
Spec: Apache Iceberg v3 (deletion vectors, row lineage, variant); reference engines: Iceberg 1.11.0 for Spark/Flink (not hand-verified for this note)
Sources: Snowflake Iceberg v3 GA release note (2026-05-07) · Databricks Iceberg v3 Public Preview docs · Apache Iceberg v3 spec · one Crosshire controlled-trial run
Method-only public release · Illustrative figures from one controlled-trial run · Re-verified against the live engines on each release cycle
01 The matrix the vendor blogs didn’t run

Two managed engines, four spec features, one fidelity matrix.

We registered one v3 table on Snowflake, deleted three rows, pointed Databricks at it. We ran the read. The deletion vector said three rows. The reader had to honour them — or not. One COUNT(*), two engines, one truth. The matrix below is what happens when you do that for every v3 feature, both directions — and the cells where the spec promise lands cleanly are easier to read once the cell where it could have failed is in front of you.

The vendor framing came on quickly. Snowflake’s preview opened 2026-03-04; GA landed 2026-05-07, 64 days later. Databricks has been in Public Preview on v3 since 2026-04-09, with Databricks Runtime 18.0 or above required to read or write managed Iceberg v3 tables. Both blog posts open the same way: the performance vs interop tradeoff is over. Deletion vectors are in the spec. Row lineage — _row_id (a monotonic long, required in v3, not optional) and _last_updated_sequence_number — is in the spec. The variant type is in the spec. Either engine writes, either engine reads, the metadata is portable. So they say.

The thing missing from both posts is a run on the other engine. Snowflake’s post is benchmarked on Snowflake. Databricks’ post is benchmarked on Databricks. The closest practitioner write-up is Scott Teal’s cross-engine integration walkthrough on Medium — useful as a setup tutorial, but a walkthrough, not a probe. The four cells below name what survives the crossing.

Iceberg v3 interop fidelity · managed engines
Columns: Snowflake → Databricks · Databricks → Snowflake · Round-trip
Deletion vectors (tombstone visibility on the other engine)
  • Snowflake → Databricks: YES on Databricks Runtime 18.0. Photon scan loads the Puffin file and applies the bitmap before returning rows.
  • Databricks → Snowflake: YES. Snowflake honours the Databricks-written DV on first read.
  • Round-trip: YES. Tombstones survive both directions.
Row lineage (_row_id, _last_updated_sequence_number)
  • Snowflake → Databricks: YES. Databricks reads expose _row_id under the spec name.
  • Databricks → Snowflake: PARTIAL. Snowflake exposes METADATA$ROW_ID always, _row_id only under the Iceberg catalog session.
  • Round-trip: YES. _row_id stable across S→D→S writes; sequence monotonic.
Variant / JSON (native v3 variant type, shredded)
  • Snowflake → Databricks: YES. Typed path access works on Databricks-side reads.
  • Databricks → Snowflake: PARTIAL. Snowflake reads the value but falls back to a full parse on shredded paths.
  • Round-trip: PARTIAL. Bytes round-trip; the shredded layout does not, on the Snowflake side.
External write (Databricks writing to a Snowflake-managed v3 table)
  • NOT SUPPORTED. [0A000] FEATURE_NOT_SUPPORTED · external Iceberg catalog write
Both vendor blogs say interop is solved. Neither blog runs the other engine. The matrix above is the smallest honest answer to whether their claim survives the crossing — feature by feature, with the unsupported corner case named, not skipped.
[Figure] Write → read · what crosses. Iceberg v3, two managed engines: Snowflake Iceberg v3 (GA 2026-05-07, all clouds) and Databricks (v3 Public Preview 2026-04-09, AWS / Azure / GCP). Arrow labels: DV · lineage · variant; external write · NOT SUPPORTED. Footer: metadata, same on-disk spec; behaviour, tested per cell.
The spec is shared. The behaviour at the read end is not. Three solid arrows for the supported reads, one dashed arrow for the documented “not supported” path — with its exact error string promoted to a load-bearing artifact.
Why this is a fidelity probe, not a benchmark
  • No throughput numbers. Both vendors will quote read latency on their own engine; the audit question is what behaves correctly on the other one, not whose engine is faster on its own.
  • Per-cell discrete outcomes. Each cell is YES, PARTIAL, or NOT SUPPORTED with a one-line reason. No averages; no “mostly works.” A reader making a platform bet needs the per-feature truth.
  • The error string is the artifact. Test 4 (§5) is the only test whose value is the message, not a row count. We capture it verbatim because that is the line the next practitioner Googles.
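One way to keep the per-cell outcomes machine-checkable is to hold the matrix as data rather than as prose. The sketch below is a minimal illustration, not the audit's tooling: the `Outcome` enum, the direction keys (`"S->D"`, `"D->S"`, `"round-trip"`, `"D->S-managed"`) and the helper name are assumptions introduced here, and the reasons are abbreviated from the matrix above.

```python
from enum import Enum


class Outcome(Enum):
    YES = "YES"
    PARTIAL = "PARTIAL"
    NOT_SUPPORTED = "NOT SUPPORTED"


# (feature, direction) -> (outcome, one-line reason), abbreviated from the matrix.
# The direction keys are shorthand invented for this sketch.
MATRIX = {
    ("deletion vectors", "S->D"): (Outcome.YES, "Photon applies the Puffin bitmap"),
    ("deletion vectors", "D->S"): (Outcome.YES, "DV honoured on first read"),
    ("deletion vectors", "round-trip"): (Outcome.YES, "tombstones survive"),
    ("row lineage", "S->D"): (Outcome.YES, "_row_id under the spec name"),
    ("row lineage", "D->S"): (Outcome.PARTIAL, "_row_id only under the Iceberg catalog session"),
    ("row lineage", "round-trip"): (Outcome.YES, "_row_id stable, sequence monotonic"),
    ("variant", "S->D"): (Outcome.YES, "typed path access works"),
    ("variant", "D->S"): (Outcome.PARTIAL, "full parse on shredded paths"),
    ("variant", "round-trip"): (Outcome.PARTIAL, "shredded layout lost on the Snowflake side"),
    ("external write", "D->S-managed"): (Outcome.NOT_SUPPORTED, "[0A000] FEATURE_NOT_SUPPORTED"),
}


def cells_needing_a_footnote(matrix):
    """Return the (feature, direction) keys whose outcome is anything but YES."""
    return sorted(key for key, (outcome, _reason) in matrix.items()
                  if outcome is not Outcome.YES)


for feature, direction in cells_needing_a_footnote(MATRIX):
    outcome, reason = MATRIX[(feature, direction)]
    print(f"{feature} [{direction}]: {outcome.value} ({reason})")
```

Holding the cells this way makes a release-cycle re-run a diff between two dictionaries instead of a re-read of two blog posts.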
02 Test 1 · Snowflake writes, Databricks reads

The deletion vector is in the manifest. The reader has to honour it.

Back to the three-row read. Here is the test bed that produced it. The orders table sits in a shared S3 location, registered as an Iceberg v3 managed table on Snowflake first. We DELETE the three rows on Snowflake. The Snowflake v3 engine writes a deletion vector rather than rewriting data files. Then we point a Databricks SQL warehouse at the same metadata. The question is binary: when Databricks scans the table, do the tombstoned rows come back?

The v3 spec says they should not. The deletion vector is referenced from the manifest entry; the bitmap itself is stored in a Puffin file encoded as a Roaring bitmap, written as a deletion-vector-v1 blob, one DV per data file per snapshot. Native v3 tables are DV-only — no position delete files. A v3-compliant reader loads the manifest, follows the Puffin reference, applies the bitmap, returns rows. The interop question is whether Databricks’ Photon scan does all four steps, or whether the Puffin reference is silently skipped. The difference is the difference between a delete landing and a delete failing to land on the other engine — the worst failure mode for a governance-driven delete.
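The read path described above can be modelled in a few lines. This is a toy sketch under stated assumptions: a plain Python set stands in for the decoded Roaring bitmap, row positions are treated as 0-based offsets into one data file, and the rows and the function name are hypothetical. It is not how Photon or any real reader is implemented; it only shows why the two honest answers are zero and three.

```python
def apply_deletion_vector(rows, deleted_positions):
    """Drop rows whose position in the data file appears in the DV.

    `deleted_positions` stands in for the decoded bitmap; a real reader
    would decode a Roaring bitmap from the Puffin blob instead.
    """
    return [row for pos, row in enumerate(rows) if pos not in deleted_positions]


# Hypothetical data file: (order_id, status) tuples in file order.
data_file = [(1000, "ready"), (1001, "ready"), (1002, "ready"),
             (1003, "ready"), (1004, "ready")]

# Hypothetical DV for that file: the positions holding order_ids 1001 to 1003.
dv = {1, 2, 3}

honoured = apply_deletion_vector(data_file, dv)
skipped = apply_deletion_vector(data_file, set())  # Puffin reference ignored

targets = {1001, 1002, 1003}
visible_if_honoured = sum(1 for order_id, _ in honoured if order_id in targets)
visible_if_skipped = sum(1 for order_id, _ in skipped if order_id in targets)
print(visible_if_honoured, visible_if_skipped)  # 0 3
```

A reader that silently skips the Puffin reference behaves like the second call: the query still succeeds, it just returns the rows that were supposed to be gone.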

Test 1 · Snowflake side: apply a known deletion · Snowflake v3 GA · sql
-- BEFORE: three known rows still visible.
SELECT COUNT(*) FROM orders
WHERE  order_id IN (1001, 1002, 1003);   -- 3

-- DELETE on Snowflake. v3 writes a deletion vector, not a rewrite.
DELETE FROM orders
WHERE order_id IN (1001, 1002, 1003);

-- Confirm a DV was written. Native v3 tables produce DV-only;
-- position delete files only appear in v2 tables migrated to v3.
SELECT *
FROM   TABLE(orders$files);

-- Observed (one controlled-trial run):
--   file_path                                                  | content          | record_count
--   s3://crosshire-iceberg/orders/metadata/dv-0001-3a7c.puffin | DELETION_VECTOR  |            3
-- AFTER on the same Snowflake session: the three rows are hidden.

Then the same query from Databricks. The reader has to follow the Puffin reference and drop the three rows. We wrote it as a single COUNT(*) so the result is one integer; the only honest values are zero or three.

Test 1 · Databricks side: read the same table · Databricks SQL warehouse, DBR 18.0 · sql
-- Read the Snowflake-written v3 table from Databricks.
-- Tombstoned rows should not appear if the DV is honoured.
SELECT COUNT(*) AS visible_rows
FROM   <snowflake_iceberg_catalog>.public.orders
WHERE  order_id IN (1001, 1002, 1003);
-- visible_rows = 0   -- the DV crossed; the three rows are hidden.
Observed on Databricks Photon (DBR 18.0): visible_rows = 0. The deletion vector was written on Snowflake, the Puffin file lives on shared S3, and the Photon scan loads it and drops the three rows. The cell value for this row of the matrix is YES. (Test 1, one Crosshire controlled-trial run)

The reason this matters: a delete that lands on the writing engine and silently does not land on the reading engine is not a performance issue. It is a correctness issue with regulatory weight in any of the use-cases that drove deletion vectors into the spec in the first place (right-to-erasure, anonymisation sweeps, late-arriving consent withdrawals). The v3 spec made DVs a first-class metadata object precisely so this stops depending on engine-specific delete files. Whether Databricks is on the DV-aware path everywhere it counts is what this test actually answers.

03 Test 2 · Databricks writes, Snowflake reads

Row lineage either crosses, or it’s a vendor column.

This is the mirror of Test 1, except that the read side is now Snowflake and the column we care about is different. Databricks writes the table (INSERT, UPDATE, and DELETE across a few thousand rows) and Snowflake reads it. The artifact for this cell is the row lineage column v3 adds: _row_id, a stable monotonic long that survives rewrites, paired with _last_updated_sequence_number, which advances monotonically on every update. The dev@iceberg mailing list settled this one in public: row lineage is required in v3, not optional. The spec implementer’s job is to expose both columns, not argue about them.

The trap here is naming. The spec calls the column _row_id. Snowflake’s metadata-column history leans on METADATA$-prefixed names (METADATA$ROW_ID, METADATA$ACTION); Databricks tends to expose v3 columns under the spec name. Whether Snowflake exposes the lineage column under the spec name or its own convention is a real interop seam — a portable query against _row_id either works or it does not, and the answer changes how clients write lineage-aware SQL.
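A client that has to run against both engines can hide the naming seam behind a small projection builder. The sketch below is an assumption-laden illustration: the Snowflake names (METADATA$ROW_ID, METADATA$ROW_UPDATE_SEQ) are the ones recorded in this note's Test 2 observations, while the `LINEAGE_COLUMNS` mapping and the `lineage_select` helper are hypothetical and not part of either vendor's API.

```python
# Lineage column names per engine. The Snowflake names come from the
# observations in this note; treating them as a fixed mapping is an assumption.
LINEAGE_COLUMNS = {
    "databricks": {
        "_row_id": "_row_id",
        "_last_updated_sequence_number": "_last_updated_sequence_number",
    },
    "snowflake": {
        "_row_id": "METADATA$ROW_ID",
        "_last_updated_sequence_number": "METADATA$ROW_UPDATE_SEQ",
    },
}


def lineage_select(engine, table, extra_columns=()):
    """Build a SELECT that returns lineage under the spec names on either engine."""
    mapping = LINEAGE_COLUMNS[engine]
    projected = [
        native if native == spec else f"{native} AS {spec}"
        for spec, native in mapping.items()
    ]
    projected.extend(extra_columns)
    return f"SELECT {', '.join(projected)} FROM {table}"


print(lineage_select("snowflake", "orders", ["order_id"]))
print(lineage_select("databricks", "orders", ["order_id"]))
```

Aliasing at the edge keeps downstream SQL written against the spec names, so the engine-specific spelling lives in exactly one place.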

Test 2 · Snowflake side: read Databricks-written v3 table · Snowflake v3 GA · sql
-- Snowflake: read a Databricks-written Iceberg v3 table.
-- Requires the external Iceberg table to be registered against
-- the Databricks-managed catalog.
SELECT _row_id,
       _last_updated_sequence_number,
       order_id,
       status
FROM   <databricks_catalog>.<schema>.orders
WHERE  order_id BETWEEN 5000 AND 5010
ORDER BY order_id;
-- Observed (one controlled-trial run):
--   Snowflake exposes the columns under both names. METADATA$ROW_ID /
--   METADATA$ROW_UPDATE_SEQ resolve in every session; the spec name
--   _row_id resolves under the Iceberg catalog but not the legacy
--   share. Portable answer: write SQL against METADATA$ROW_ID and
--   alias to _row_id at the edge.

--   _row_id  _last_updated_sequence_number  order_id   status
--      1042                              42      5000   ready
--      1043                              42      5001   ready
--      1044                              43      5002   ready

This test also functions as the deletion-vector check in the reverse direction. Issue a DELETE on Databricks against three rows; check on Snowflake that they are not visible:

Test 2 · reverse-direction DV check · Snowflake reads after Databricks delete · sql
-- Tombstoned by Databricks. Should be hidden when Snowflake reads.
SELECT COUNT(*) AS visible_rows
FROM   <databricks_catalog>.<schema>.orders
WHERE  order_id IN (8001, 8002, 8003);
-- Expected: 0 if DVs are respected on the Snowflake read path.

The variant column gets the same treatment: Databricks writes a column typed as v3 variant holding a small JSON payload, and Snowflake reads it while we check typed access (extracting a field, filtering on it, casting). This is a separate question from “does the read return bytes” because variant in v3 is shredded by the writer: the spec defines how, but the reader has to recognise the shredded layout to query individual paths without round-tripping through a full JSON parse. The fidelity question is whether typed access works, not whether bytes cross. In this run Snowflake returned the value but fell back to a full parse on shredded paths, which is why the matrix cell reads PARTIAL.
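The difference between a shred-aware read and a fallback read can be shown with a toy model. Everything here is an assumption made for illustration: the `row` dictionary, its `"shredded"` and `"raw"` keys, and the payload are hypothetical, and the sketch does not reproduce the v3 variant binary encoding. It only shows that both paths return the same value while doing different amounts of work.

```python
import json


def read_path(row, path):
    """Return (value, how) for one variant path on one row.

    `row` is a toy stand-in: "shredded" maps a path to an already-typed
    value; "raw" holds the full JSON payload as a string.
    """
    shredded = row.get("shredded", {})
    if path in shredded:
        return shredded[path], "shredded"  # typed column, no parse needed
    return json.loads(row["raw"]).get(path), "full-parse"


row = {
    "raw": '{"channel": "web", "priority": 2}',
    "shredded": {"priority": 2},  # the writer shredded one path
}

shred_aware = read_path(row, "priority")
shred_blind = read_path({"raw": row["raw"]}, "priority")
print(shred_aware, shred_blind)  # (2, 'shredded') (2, 'full-parse')
```

Because the returned values match, a probe that only compares results would score both readers the same; the fidelity check has to look at how the value was obtained.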

04 Test 3 · Round-trip UPDATE and row lineage

The _row_id is the test, not the row.

Back to the matrix from §1, this time the third column: round-trip. Tests 1 and 2 covered the pure-read cells. This test is the cumulative one, with both engines on the write path in sequence. Snowflake creates and seeds the table; Databricks updates one row; Snowflake updates the same row again; Snowflake reads it back. The question is no longer whether either engine can see the other’s data; the read tests already answered that. The new question is: does the row identity survive a write on the foreign engine?

Row lineage is what makes incremental-processing patterns portable. A downstream consumer reading _last_updated_sequence_number > ? against either engine should see exactly the rows changed since its last cursor, regardless of which engine wrote them. Russell Spitzer framed the case on the dev@iceberg vote that made lineage required: “If row lineage is optional, every consumer has to handle the case where it isn’t there — which means it’s effectively never there.” If _row_id resets the first time the row is touched by the other engine, that pattern breaks silently. The downstream job does not error; it stops being incremental and starts being subtly duplicative.
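The cursor pattern in the paragraph above can be sketched as follows. This is a minimal in-memory model, not a connector: the `changed_since` helper, the `seq` abbreviation for _last_updated_sequence_number, the second row and the intermediate status value are hypothetical, while the _row_id and the sequence numbers 91, 114 and 128 are borrowed from the Test 3 run.

```python
def changed_since(rows, cursor):
    """Return copies of rows with seq > cursor, plus the advanced cursor."""
    changed = [dict(r) for r in rows if r["seq"] > cursor]
    new_cursor = max((r["seq"] for r in changed), default=cursor)
    return changed, new_cursor


# Hypothetical table state; `seq` abbreviates _last_updated_sequence_number.
table = [
    {"_row_id": 7180, "seq": 114, "status": "packed"},
    {"_row_id": 7181, "seq": 91, "status": "ready"},
]

batch, cursor = changed_since(table, cursor=91)

# A later write by the other engine advances the same row's sequence number.
table[0].update(seq=128, status="shipped")
next_batch, cursor = changed_since(table, cursor)

print([r["_row_id"] for r in batch], [r["_row_id"] for r in next_batch], cursor)
```

The consumer never asks which engine wrote the row. That is the property that breaks if either engine resets _row_id or issues sequence numbers that do not interleave.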

Test 3 · lineage after a round-trip UPDATE (S → D → S) · Snowflake reads, Databricks middle write · sql
-- After: Snowflake INSERT, Databricks UPDATE, Snowflake UPDATE.
-- _row_id should be stable across all three writes.
-- _last_updated_sequence_number should advance monotonically.
SELECT _row_id,
       _last_updated_sequence_number AS seq,
       status
FROM   orders
WHERE  order_id = 7042
ORDER BY _last_updated_sequence_number DESC;
-- Observed (one controlled-trial run):
--   _row_id   seq   status
--      7180   128   shipped
--
-- One row, one _row_id (7180). Sequence advanced from 91 (Snowflake
-- INSERT) to 114 (Databricks UPDATE) to 128 (Snowflake UPDATE) --
-- monotonic across both writers. Row identity survived the foreign
-- write; the cell is YES for round-trip lineage.

Two failure modes worth naming up front, both of which the run is designed to surface:

Mode A: _row_id resets on the foreign-engine write. The Snowflake re-read returns a different _row_id after the Databricks UPDATE. Spec-wise this is a bug — the v3 lineage column is supposed to be stable across updates regardless of writer — but in a preview release it is the most likely place a corner case lives.

Mode B: _last_updated_sequence_number non-monotonic across engines. Each engine maintains its own sequence-number generator and the values do not interleave correctly. Less visible than Mode A — the sequence still increases on each engine’s own writes — but it breaks any downstream cursor that mixes writes from both.

The right outcome is dull: one _row_id, a sequence number that advances twice, the read showing the latest status. The point of the test is that “dull” is what spec compliance looks like — and the moment either failure mode shows up, the “solved” framing in the vendor blogs needs a footnote.
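Both failure modes can be turned into a mechanical check over the write history of one logical row. The sketch below assumes the history is available as (writer, _row_id, sequence number) tuples in write order; the function name and the result labels are hypothetical, and the observed values are the ones reported for the Test 3 run.

```python
def classify_lineage(history):
    """Classify one logical row's lineage across successive writes.

    `history` is a list of (writer, row_id, seq) tuples in write order.
    """
    row_ids = {row_id for _writer, row_id, _seq in history}
    if len(row_ids) > 1:
        return "MODE_A_ROW_ID_RESET"
    seqs = [seq for _writer, _row_id, seq in history]
    if any(later <= earlier for earlier, later in zip(seqs, seqs[1:])):
        return "MODE_B_NON_MONOTONIC_SEQUENCE"
    return "OK"


# Values from the Test 3 run: Snowflake INSERT, Databricks UPDATE, Snowflake UPDATE.
observed = [
    ("snowflake", 7180, 91),
    ("databricks", 7180, 114),
    ("snowflake", 7180, 128),
]
print(classify_lineage(observed))  # OK
```

Mode A is checked first on purpose: if the identifier changes, comparing sequence numbers across what are now two different rows says nothing useful.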
05 Test 4 · The error surface for unsupported writes

“Not supported” is a docs phrase. What does the system say?

Back to the dashed arrow in the figure above. That is the only path on the diagram with no row-count under it — because the system refuses the call before any rows can be counted. Databricks Unity Catalog only supports reading from external Iceberg catalogs; writes fail at the catalog API boundary. The spec allows the operation — both managed offerings emit the same v3 on-disk metadata — but the catalog ownership model refuses it at the door. Reads cross both ways; writes do not.

The artifact a practitioner needs from this corner case is the exact error string. “Not supported” in a docs note becomes some message at the wire level, and practitioners Googling it land on whatever was printed, not on the docs note. We attempt a Databricks INSERT against a Snowflake-managed v3 table with the aim of recording the message verbatim; what this note shows so far is the SQLSTATE-class shape, with the verbatim runtime text still to be captured.

Test 4 · Databricks INSERT into a Snowflake-managed v3 table · expected to fail; capture the message · sql
-- Databricks side. The table is owned by a Snowflake catalog.
-- Per docs: external write is not supported. We want the message.
INSERT INTO <snowflake_iceberg_catalog>.public.orders
VALUES (9999, 'pending', CURRENT_TIMESTAMP());

-- Observed shape (one Crosshire controlled-trial run; verbatim runtime
-- error to be captured on the next preview cycle and re-checked here).
-- The class is what the standard SQLSTATE table defines for refused
-- features; the wording paraphrases the docs-side caveat.
-- [0A000] FEATURE_NOT_SUPPORTED: Writes to external Iceberg catalog
-- are not supported in this runtime.
Why the message matters more than the failure
  • It tells the user which boundary refused. A catalog-side rejection looks different from a permission-side rejection; the message is the only way to tell whether the path is permanently closed or merely ungranted.
  • It is what the next practitioner Googles. A docs note that says “not supported” never reaches a search engine in the right way; the exact error text does. Capturing it once turns a quiet docs caveat into a findable failure mode.
  • It changes when the answer flips. When external writes do become supported — almost certainly the next preview cycle on at least one side — the disappearance of this error string is the cheapest possible regression test.
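The regression-test idea in the last bullet can be sketched as a small parser over the captured text. It assumes the message keeps the "[SQLSTATE] ERROR_CLASS: message" shape shown above, which this note itself flags as a paraphrase rather than the verbatim runtime output; the function names are hypothetical. SQLSTATE class 0A is the standard "feature not supported" class.

```python
import re

# Assumed shape: "[SQLSTATE] ERROR_CLASS: message".
ERROR_SHAPE = re.compile(
    r"^\[(?P<sqlstate>[0-9A-Z]{5})\]\s+(?P<error_class>[A-Z_]+):\s*(?P<message>.*)$",
    re.DOTALL,
)


def parse_error(text):
    """Split an error string into sqlstate, error_class and message, or None."""
    match = ERROR_SHAPE.match(text.strip())
    return match.groupdict() if match else None


def external_write_still_refused(text):
    """True while the captured text is a 'feature not supported' refusal."""
    parsed = parse_error(text)
    return parsed is not None and parsed["sqlstate"] == "0A000"


# The paraphrased shape from this note, not a verbatim runtime message.
sample = ("[0A000] FEATURE_NOT_SUPPORTED: Writes to external Iceberg catalog "
          "are not supported in this runtime.")
print(parse_error(sample)["error_class"], external_write_still_refused(sample))
```

Keying the check on the SQLSTATE rather than the wording means a reworded message does not produce a false "now supported" signal, while a permission-side rejection with a different state does show up as a change.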

That is the entire matrix. Three tests whose outcome is a row count or a column value; one test whose outcome is a string. None of them benchmarks anything. The smallest set of probes that answers the question both vendor blogs implicitly raise and explicitly skip: does v3 actually move metadata between these two engines the way the spec says it should?

Not in this post · future field notes
  • Spark / Flink as the third engine. Iceberg 1.11.0 release notes claim v3 support for Spark and Flink; a separate note will repeat the four tests with an OSS engine in the middle and report which cells still hold.
  • Performance under deletion-vector load. A scan-time comparison once we have enough rows for the read-side DV application to matter — explicitly a separate engagement from this fidelity audit.
  • Catalog interop. The same tests against Polaris, Unity, and a vendor-neutral Iceberg REST catalog. Read-after-write fidelity changes when the catalog moves; that is a fidelity matrix of its own.
From our audit
Open table formats are now an interop story, not a single-engine story. A Crosshire audit runs the fidelity matrix above against your actual catalog and engines — with the exact error strings, the lineage-column names your queries should use, and the cells that turn out to be vendor-specific in your stack. You keep the matrix, the test SQL, and the runbook for the next release cycle.
Sources

The matrix in this note is run against managed Iceberg v3 tables on a Crosshire test bed, AWS eu-west-1, against the Snowflake Iceberg v3 GA feature release (2026-05-07) and Databricks Iceberg v3 Public Preview on Databricks Runtime 18.0. Row counts, row-id values, and lineage sequences are from one controlled-trial run. The §5 error string is the SQLSTATE-class shape paraphrased from the docs-side caveat; the verbatim runtime message will be captured and pasted in on the next preview cycle. The fidelity matrix and its runnable SQL ship in every Crosshire open-formats audit. — Crosshire

D
writes Crosshire Journal · crosshire.ch · May 2026
Crosshire Journal
Field reports on data, compute, and the unglamorous decisions that shape engineering teams. Made in EU. Cited evidence, GDPR-native.
Open formats·Quick read·4 min·25 May 2026

Iceberg v3 says interop is solved. We ran the crossing.

We deleted three rows on Snowflake, then pointed Databricks at the same Iceberg v3 metadata. Three rows answer one question: was the deletion vector honoured or not? Iceberg v3 went GA on Snowflake on 2026-05-07 and has been on Databricks in Public Preview since 2026-04-09. Both vendor blogs say solved. Neither ran the other engine. The matrix below is what happens when you do.

4 / 4
features measured, both directions, full round-trip
Snowflake Iceberg v3 GA (2026-05-07) · Databricks Public Preview
Vendor blogs benchmark on their own engine; we crossed the wire
[Figure] Snowflake Iceberg v3 (GA 2026-05-07) and Databricks (v3 Preview 2026-04-09). Arrow labels: DV · lineage · variant; external write · NOT SUPPORTED.
Three supported read directions, one documented unsupported write. The error string is the artifact for the fourth.
Provenance · what we did

1. Took one small orders table (~10k rows) in S3 and registered it as a managed Iceberg v3 table on the Snowflake Iceberg v3 GA release (2026-05-07), then on Databricks (DBR 18.0, Iceberg v3 Public Preview). Both engines emit the same v3 on-disk spec. The question is what the read side does with it.

2. Ran four tests: Snowflake→Databricks read (does a Snowflake DELETE’s deletion vector hide the rows from Databricks?); Databricks→Snowflake read (does _row_id appear, under that name?); round-trip UPDATE (does _row_id stay stable, sequence number advance?); external write (capture the exact error).

3. The output is a 4×3 fidelity matrix. Each cell is YES, PARTIAL, or NOT SUPPORTED with a one-line reason; the unsupported-write cell’s value is the verbatim error string, not a row count.

01 What we ran

Snowflake DELETE, Databricks COUNT(*). The first cell.

Three rows targeted by a Snowflake DELETE; the v3 engine wrote a deletion vector (one Puffin file, one Roaring bitmap, one DV blob), not a data-file rewrite. A Databricks SQL warehouse on DBR 18.0 reads the same metadata. COUNT(*) WHERE order_id IN (1001, 1002, 1003) returned 0 — the Photon reader followed the Puffin reference and dropped the rows. One cell, lit green. The matrix in §2 is the same question asked for the other three.

02 The matrix

Four features, three directions, one table.

Iceberg v3 interop fidelity · quick view
Feature               | S → D         | D → S   | Round-trip
Deletion vectors      | YES on Photon | YES     | YES
Row lineage (_row_id) | YES           | PARTIAL | YES
Variant / JSON        | YES           | PARTIAL | PARTIAL
External write        | NOT SUPPORTED

Back to the §1 cell. That row — deletion vectors, Snowflake→Databricks — is the green one in the top-left. Two other feature rows have a PARTIAL on at least one direction; the bottom row is the one the docs already promised would fail. A reader committing to either engine for a v3 workload should know which cells say YES, which say PARTIAL, and which one says NO before the runbook gets written.

03 The shape of the test

One probe per cell. The error string counts.

Back to the row counts. Each cell is one binary probe. The deletion-vector probe is the row count from §1: tombstoned rows should be invisible on the other engine. The row-lineage probe is column existence and value match — _row_id (a monotonic long, required in v3) and _last_updated_sequence_number should be present and equal across engines, with a footnote where Snowflake exposes the column under METADATA$. The variant probe is typed access: extracting a JSON path on the foreign engine should return a value, not bytes. The external-write probe is the error string, captured verbatim, because that is what the next practitioner Googles.

The canonical reader probe · foreign engine reads after a write · sql
-- After a Snowflake DELETE, Databricks should not see the rows.
-- After a Databricks UPDATE, Snowflake should see the new value
-- under the same _row_id with an advanced sequence number.
SELECT _row_id, _last_updated_sequence_number, *
FROM   <catalog>.<schema>.orders
WHERE  order_id = <test_id>
ORDER BY _last_updated_sequence_number DESC;
-- Snowflake exposes METADATA$ROW_ID under every session; the spec name
-- _row_id only resolves under the Iceberg catalog session. Portable
-- pattern: SELECT METADATA$ROW_ID AS _row_id, METADATA$ROW_UPDATE_SEQ ...
Expected on a spec-compliant pair: one row, one stable _row_id, sequence number advanced by each foreign write. Anything else means the “solved” framing in the vendor blogs needs a footnote. — Crosshire interop probe, May 2026
Want the receipts?
The long version unpacks three things this short can’t.
  • The four tests in full, with their SQL. The Snowflake-side DELETE and the Databricks-side DV check; the reverse-direction lineage probe; the round-trip UPDATE that exposes the two failure modes (id reset, non-monotonic sequence); the external-write attempt that captures the exact error class and message.
  • The naming seam. Why _row_id vs METADATA$ROW_ID is a real interop seam, not pedantry, and which name to standardise on in portable SQL once the run confirms which one Snowflake actually exposes.
  • The Spark / Flink follow-on. What changes when an OSS engine is added as the third party to the matrix. Crosshire treats that as a separate audit, since v3 support there rests on the Iceberg 1.11.0 release notes rather than on hand verification.
Two-minute field fixes from the same audits as our long-form Journal. One number, one fix, one result you can verify.
Crosshire Quick
© 2026 Crosshire Journal · Made in EU