Iceberg v3 says interop is solved. We ran the crossing.
We deleted three rows on Snowflake. We pointed Databricks at the same
Iceberg v3 metadata. COUNT(*) = 0 — the deletion
vector crossed. The other three cells had to be measured the same
way. Snowflake’s Iceberg v3 GA landed on
2026-05-07, 64 days after the preview opened;
Databricks has been on v3 Public Preview since
2026-04-09. Both vendor blogs frame interop as
solved. Neither blog runs the other engine. We did, on the four
spec features that move, both directions.
Snowflake Iceberg v3 GA (2026-05-07) · Databricks v3 Public Preview
Matrix below; one controlled-trial run on a Crosshire test bed
- Snowflake: Iceberg v3 GA feature release, 2026-05-07, AWS eu-west-1
- Databricks: Iceberg v3 Public Preview, 2026-04-09; Databricks Runtime 18.0+ required for v3 read/write on managed Iceberg tables
- Test bed: one small fact table (orders, ~10k rows) in a shared S3 location, registered as an Iceberg v3 managed table in each engine in turn
- Spec: Apache Iceberg v3 (deletion vectors, row lineage, variant); reference engines: Iceberg 1.11.0 for Spark/Flink (not hand-verified for this note)
- Sources: Snowflake Iceberg v3 GA release note (2026-05-07) · Databricks Iceberg v3 Public Preview docs · Apache Iceberg v3 spec · one Crosshire controlled-trial run
Two managed engines, four spec features, one fidelity matrix.
We registered one v3 table on Snowflake, deleted three rows,
pointed Databricks at it. We ran the read. The deletion
vector said three rows. The reader had to honour them —
or not. One COUNT(*), two engines, one
truth. The matrix below is what happens when you do that for
every v3 feature, both directions — and the cells where
the spec promise lands cleanly are easier to read once the cell
where it could have failed is in front of you.
The vendor framing came on quickly. Snowflake’s preview
opened 2026-03-04; GA landed
2026-05-07, 64 days later. Databricks has been
in Public Preview on v3 since 2026-04-09, with
Databricks Runtime 18.0 or above required to read or write
managed Iceberg v3 tables. Both blog posts open the same way:
the performance vs interop tradeoff is over. Deletion
vectors are in the spec. Row lineage — _row_id
(a monotonic long, required in v3, not optional) and
_last_updated_sequence_number — is in the
spec. The variant type is in the spec. Either engine writes,
either engine reads, the metadata is portable. So they say.
The thing missing from both posts is a run on the other engine. Snowflake’s post is benchmarked on Snowflake. Databricks’ post is benchmarked on Databricks. The closest practitioner write-up is Scott Teal’s cross-engine integration walkthrough on Medium — useful as a setup tutorial, but a walkthrough, not a probe. The four cells below name what survives the crossing.
| Feature | Snowflake → Databricks | Databricks → Snowflake | Round-trip |
|---|---|---|---|
| Deletion vectors (tombstone visibility on the other engine) | YES on Databricks Runtime 18.0: Photon scan loads the Puffin file and applies the bitmap before returning rows | YES: Snowflake honours the Databricks-written DV on first read | YES: tombstones survive both directions |
| Row lineage (_row_id, _last_updated_sequence_number) | YES: Databricks reads expose _row_id under the spec name | PARTIAL: Snowflake exposes METADATA$ROW_ID always, _row_id only under the Iceberg catalog session | YES: _row_id stable across S→D→S writes; sequence monotonic |
| Variant / JSON (native v3 variant type, shredded) | YES: typed path access works on Databricks-side reads | PARTIAL: Snowflake reads the value but falls back to a full parse on shredded paths | PARTIAL: bytes round-trip; the shredded layout does not, on the Snowflake side |
| External write (Databricks writing to a Snowflake-managed v3 table) | — | NOT SUPPORTED: [0A000] FEATURE_NOT_SUPPORTED · external Iceberg catalog write | — |
Both vendor blogs say interop is solved. Neither blog runs the other engine. The matrix above is the smallest honest answer to whether their claim survives the crossing — feature by feature, with the unsupported corner case named, not skipped.
- No throughput numbers. Both vendors will quote read latency on their own engine; the audit question is what behaves correctly on the other one, not whose engine is faster on its own.
- Per-cell discrete outcomes. Each cell is YES / PARTIAL / NO with a one-line reason. No averages; no “mostly works.” A reader making a platform bet needs the per-feature truth.
- The error string is the artifact. Test 4 (§5) is the only test whose value is the message, not a row count. The aim is to capture it verbatim because that is the line the next practitioner Googles; this run records its SQLSTATE-class shape, with the verbatim text still to be captured.
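The per-cell reporting described in these points can be held as plain data rather than prose. The following is a minimal Python sketch, not Crosshire tooling: the names `Outcome`, `Cell`, `MATRIX`, and `needs_footnote` are hypothetical, and the cell values are transcribed from the matrix in this note (one controlled-trial run).

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    YES = "YES"
    PARTIAL = "PARTIAL"
    NOT_SUPPORTED = "NOT SUPPORTED"
    NOT_APPLICABLE = "n/a"


@dataclass(frozen=True)
class Cell:
    feature: str
    direction: str  # "S->D", "D->S" or "round-trip"
    outcome: Outcome


# Values transcribed from the matrix in this note (one controlled-trial run).
MATRIX = [
    Cell("deletion vectors", "S->D", Outcome.YES),
    Cell("deletion vectors", "D->S", Outcome.YES),
    Cell("deletion vectors", "round-trip", Outcome.YES),
    Cell("row lineage", "S->D", Outcome.YES),
    Cell("row lineage", "D->S", Outcome.PARTIAL),
    Cell("row lineage", "round-trip", Outcome.YES),
    Cell("variant", "S->D", Outcome.YES),
    Cell("variant", "D->S", Outcome.PARTIAL),
    Cell("variant", "round-trip", Outcome.PARTIAL),
    Cell("external write", "S->D", Outcome.NOT_APPLICABLE),
    Cell("external write", "D->S", Outcome.NOT_SUPPORTED),
    Cell("external write", "round-trip", Outcome.NOT_APPLICABLE),
]


def needs_footnote(matrix):
    """Return the cells a platform decision has to read the reason for."""
    flagged = (Outcome.PARTIAL, Outcome.NOT_SUPPORTED)
    return [cell for cell in matrix if cell.outcome in flagged]


for cell in needs_footnote(MATRIX):
    print(f"{cell.feature:18} {cell.direction:10} {cell.outcome.value}")
```

Keeping the outcome as an enum rather than a score is deliberate: there is nothing to average, only cells to read.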
The deletion vector is in the manifest. The reader has to honour it.
Back to the three-row read. Here is the test bed that produced
it. The orders table sits in a shared S3 location,
registered as an Iceberg v3 managed table on Snowflake first.
We DELETE the three rows on Snowflake. The Snowflake v3 engine
writes a deletion vector rather than rewriting data files. Then
we point a Databricks SQL warehouse at the same metadata. The
question is binary: when Databricks scans the table, do the
tombstoned rows come back?
The v3 spec says they should not. The deletion vector is
referenced from the manifest entry; the bitmap itself is stored
in a Puffin file encoded as a Roaring bitmap, written as a
deletion-vector-v1 blob, one DV per data file per
snapshot. Native v3 tables are DV-only — no position
delete files. A v3-compliant reader loads the manifest, follows
the Puffin reference, applies the bitmap, returns rows. The
interop question is whether Databricks’ Photon scan does
all four steps, or whether the Puffin reference is silently
skipped. The difference is the difference between a delete
landing and a delete failing to land on the other engine
— the worst failure mode for a governance-driven delete.
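The difference between a reader that applies the bitmap and one that skips the Puffin reference can be shown on toy data. This is a hedged Python sketch only: a plain `set` of row positions stands in for the Roaring bitmap, no Puffin file is parsed, and `visible_rows` is a hypothetical helper, not an engine API.

```python
def visible_rows(rows, deleted_positions):
    """Yield rows whose position in the data file is not tombstoned."""
    for position, row in enumerate(rows):
        if position not in deleted_positions:
            yield row


# Toy data file: order_ids in file order.
data_file = [1000, 1001, 1002, 1003, 1004]
# Toy deletion vector: file positions of 1001, 1002 and 1003.
dv = {1, 2, 3}

# Reader that follows the reference and applies the vector.
honoured = list(visible_rows(data_file, dv))
# Reader that silently skips the reference: nothing is tombstoned.
skipped = list(visible_rows(data_file, set()))

targets = {1001, 1002, 1003}
print(sum(1 for r in honoured if r in targets))  # 0
print(sum(1 for r in skipped if r in targets))   # 3
```

The two printed values are the only two outcomes the COUNT(*) probe can distinguish, which is why a single integer is enough for this cell.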
    -- BEFORE: three known rows still visible.
    SELECT COUNT(*) FROM orders
    WHERE order_id IN (1001, 1002, 1003); -- 3
    -- DELETE on Snowflake. v3 writes a deletion vector, not a rewrite.
    DELETE FROM orders
    WHERE order_id IN (1001, 1002, 1003);
    -- Confirm a DV was written. Native v3 tables produce DV-only;
    -- position delete files only appear in v2 tables migrated to v3.
    SELECT * FROM TABLE(orders$files);
    -- Observed (one controlled-trial run):
    -- file_path                                                   | content         | record_count
    -- s3://crosshire-iceberg/orders/metadata/dv-0001-3a7c.puffin  | DELETION_VECTOR | 3
    -- AFTER on the same Snowflake session: the three rows are hidden.
Then the same query from Databricks. The reader has to follow
the Puffin reference and drop the three rows. We wrote it as a
single COUNT(*) so the result is one integer; the
only honest values are zero or three.
    -- Read the Snowflake-written v3 table from Databricks.
    -- Tombstoned rows should not appear if the DV is honoured.
    SELECT COUNT(*) AS visible_rows
    FROM <snowflake_iceberg_catalog>.public.orders
    WHERE order_id IN (1001, 1002, 1003);
    -- visible_rows = 0
    -- the DV crossed; the three rows are hidden.
Observed on Databricks Photon (DBR 18.0): visible_rows = 0. The deletion vector was written on Snowflake, the Puffin file lives on shared S3, and the Photon scan loads it and drops the three rows. The cell value for this row of the matrix is YES. (Test 1, one Crosshire controlled-trial run)
The reason this matters: a delete that lands on the writing engine and silently does not land on the reading engine is not a performance issue. It is a correctness issue with regulatory weight in any of the use-cases that drove deletion vectors into the spec in the first place (right-to-erasure, anonymisation sweeps, late-arriving consent withdrawals). The v3 spec made DVs a first-class metadata object precisely so this stops depending on engine-specific delete files. Whether Databricks is on the DV-aware path everywhere it counts is what this test actually answers.
Row lineage either crosses, or it’s a vendor column.
Back to the mirror of Test 1, but the read side is Snowflake
now and the column we care about is different. Databricks writes
the table — INSERT, UPDATE, and DELETE across a few
thousand rows — and Snowflake reads it. The artifact for
this cell is the row lineage column v3 adds:
_row_id, a stable monotonic long that survives
rewrites, paired with _last_updated_sequence_number,
which advances monotonically on every update. The dev@iceberg
mailing list settled this one in public: row lineage is
required in v3, not optional — the
spec-implementer’s job is to expose both columns, not
argue about them.
The trap here is naming. The spec calls the column
_row_id. Snowflake’s metadata-column history
leans on METADATA$-prefixed names
(METADATA$ROW_ID, METADATA$ACTION);
Databricks tends to expose v3 columns under the spec name.
Whether Snowflake exposes the lineage column under the spec
name or its own convention is a real interop seam — a
portable query against _row_id either works or
it does not, and the answer changes how clients write
lineage-aware SQL.
    -- Snowflake: read a Databricks-written Iceberg v3 table.
    -- Requires the external Iceberg table to be registered against
    -- the Databricks-managed catalog.
    SELECT _row_id, _last_updated_sequence_number, order_id, status
    FROM <databricks_catalog>.<schema>.orders
    WHERE order_id BETWEEN 5000 AND 5010
    ORDER BY order_id;
    -- Observed (one controlled-trial run):
    -- Snowflake exposes the columns under both names. METADATA$ROW_ID /
    -- METADATA$ROW_UPDATE_SEQ resolve in every session; the spec name
    -- _row_id resolves under the Iceberg catalog but not the legacy
    -- share. Portable answer: write SQL against METADATA$ROW_ID and
    -- alias to _row_id at the edge.
    -- _row_id  _last_updated_sequence_number  order_id  status
    -- 1042     42                             5000      ready
    -- 1043     42                             5001      ready
    -- 1044     43                             5002      ready
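One way to keep client SQL portable across the naming seam is to generate the lineage columns per engine and alias them to the spec names. The sketch below is a Python illustration under assumptions: the column names are the ones reported in this note's single run (re-verify them before relying on them), and `LINEAGE_COLUMNS`, `aliased`, and `lineage_select_list` are hypothetical helpers.

```python
# Column names as reported in this note's run; treat them as assumptions
# to re-verify, not as documented vendor API.
LINEAGE_COLUMNS = {
    "snowflake": ("METADATA$ROW_ID", "METADATA$ROW_UPDATE_SEQ"),
    "databricks": ("_row_id", "_last_updated_sequence_number"),
}

SPEC_NAMES = ("_row_id", "_last_updated_sequence_number")


def aliased(column, spec_name):
    """Alias an engine-specific column to the spec name when they differ."""
    return column if column == spec_name else f"{column} AS {spec_name}"


def lineage_select_list(engine):
    """Return a SELECT-list fragment that always yields the spec names."""
    columns = LINEAGE_COLUMNS[engine]
    return ", ".join(aliased(c, s) for c, s in zip(columns, SPEC_NAMES))


for engine in ("snowflake", "databricks"):
    print(f"SELECT {lineage_select_list(engine)}, order_id, status FROM orders")
```

Downstream code then only ever sees `_row_id` and `_last_updated_sequence_number`, whichever engine served the read.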
This test also functions as the deletion-vector check in the reverse direction. Issue a DELETE on Databricks against three rows; check on Snowflake that they are not visible:
    -- Tombstoned by Databricks. Should be hidden when Snowflake reads.
    SELECT COUNT(*) AS visible_rows
    FROM <databricks_catalog>.<schema>.orders
    WHERE order_id IN (8001, 8002, 8003);
    -- Expected: 0 if DVs are respected on the Snowflake read path.
The variant column gets the same treatment: Databricks writes
a column typed v3 variant holding a small JSON
payload, Snowflake reads it, and we check whether typed access
works (extracting a field, filtering on it, casting). The reason
this is a separate question from “does the read return
bytes” is that variant in v3 is shredded by the writer
— the spec defines how, but the reader has to recognise
the shredded layout to query individual paths without
round-tripping through a full JSON parse. The fidelity
question is whether typed access works, not whether bytes
cross.
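A toy model can show why "typed access works" and "bytes cross" are different questions. This Python sketch is an illustration only and does not reproduce the v3 variant encoding: the `typed` / `residual` layout and the `get_path` helper are hypothetical stand-ins for a shredded column and a reader.

```python
import json


def get_path(cell, path):
    """Return (value, how) for a top-level field of a toy variant cell.

    cell["typed"]    holds fields the writer shredded into typed columns.
    cell["residual"] holds everything else as a JSON string.
    """
    if path in cell["typed"]:
        # A reader that recognises the shredded layout answers directly.
        return cell["typed"][path], "typed"
    # Otherwise the reader pays for a full parse of the remaining payload.
    return json.loads(cell["residual"]).get(path), "full-parse"


cell = {
    "typed": {"status": "ready"},
    "residual": json.dumps({"note": "gift wrap"}),
}

print(get_path(cell, "status"))  # ('ready', 'typed')
print(get_path(cell, "note"))    # ('gift wrap', 'full-parse')
```

Both calls return a value, so a bytes-only check would pass either way; only the access path distinguishes a YES from a PARTIAL.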
The _row_id is the test, not the row.
Back to the matrix from §1: the row-lineage row, the round-trip column. Tests 1 and 2 covered the pure-read cells. This test is the cumulative one, with both engines on the write path, in sequence. Snowflake creates and seeds the table; Databricks updates one row; Snowflake updates the same row again; Snowflake reads it back. The question is no longer whether either engine can see the other’s data; the read tests already answered that. The new question is: does the row identity survive a write on the foreign engine?
Row lineage is what makes incremental-processing patterns
portable. A downstream consumer reading
_last_updated_sequence_number > ? against either
engine should see exactly the rows changed since its last
cursor, regardless of which engine wrote them. Russell Spitzer
framed the case on the dev@iceberg vote that made lineage
required: “If row lineage is optional, every consumer
has to handle the case where it isn’t there — which
means it’s effectively never there.” If
_row_id resets the first time the row is touched
by the other engine, that pattern breaks silently. The
downstream job does not error; it stops being incremental and
starts being subtly duplicative.
    -- After: Snowflake INSERT, Databricks UPDATE, Snowflake UPDATE.
    -- _row_id should be stable across all three writes.
    -- _last_updated_sequence_number should advance monotonically.
    SELECT _row_id, _last_updated_sequence_number AS seq, status
    FROM orders
    WHERE order_id = 7042
    ORDER BY _last_updated_sequence_number DESC;
    -- Observed (one controlled-trial run):
    -- _row_id  seq  status
    -- 7180     128  shipped
    --
    -- One row, one _row_id (7180). Sequence advanced from 91 (Snowflake
    -- INSERT) to 114 (Databricks UPDATE) to 128 (Snowflake UPDATE),
    -- monotonic across both writers. Row identity survived the foreign
    -- write; the cell is YES for round-trip lineage.
Two failure modes worth naming up front, both of which the run is designed to surface:
Mode A: _row_id resets on the foreign-engine
write. The Snowflake re-read returns a different
_row_id after the Databricks UPDATE. Spec-wise this
is a bug — the v3 lineage column is supposed to be stable
across updates regardless of writer — but in a preview
release it is the most likely place a corner case lives.
Mode B: _last_updated_sequence_number
non-monotonic across engines. Each engine maintains
its own sequence-number generator and the values do not
interleave correctly. Less visible than Mode A — the
sequence still increases on each engine’s own writes
— but it breaks any downstream cursor that mixes writes
from both.
The right outcome is dull: one _row_id, a sequence
number that advances twice, the read showing the latest
status. The point of the test is that “dull”
is what spec compliance looks like — and the
moment either failure mode shows up, the “solved”
framing in the vendor blogs needs a footnote.
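The two failure modes reduce to two mechanical checks over the values read back after each write. A minimal Python sketch follows, assuming a hypothetical `Observation` record and `lineage_failures` helper; the first history uses the single `_row_id` (7180) and the sequence values 91, 114, 128 reported in this note, and the second history is invented purely to exercise both checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    writer: str  # engine that performed the write
    row_id: int  # _row_id read back after the write
    seq: int     # _last_updated_sequence_number read back after the write


def lineage_failures(history):
    """Return the failure modes seen in the write history of one logical row."""
    failures = []
    # Mode A: more than one distinct _row_id for the same logical row.
    if len({obs.row_id for obs in history}) > 1:
        failures.append("A: _row_id reset")
    # Mode B: the sequence number fails to strictly increase between writes.
    seqs = [obs.seq for obs in history]
    if any(later <= earlier for earlier, later in zip(seqs, seqs[1:])):
        failures.append("B: sequence not monotonic")
    return failures


# Values reported for the run in this note.
observed = [
    Observation("snowflake", 7180, 91),
    Observation("databricks", 7180, 114),
    Observation("snowflake", 7180, 128),
]
print(lineage_failures(observed))  # []

# Invented bad run: the id changes and the sequence goes backwards.
broken = [
    Observation("snowflake", 7180, 91),
    Observation("databricks", 9001, 40),
]
print(lineage_failures(broken))  # ['A: _row_id reset', 'B: sequence not monotonic']
```

An empty list is the dull outcome; anything else names which footnote is needed.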
“Not supported” is a docs phrase. What does the system say?
Back to the dashed arrow in the figure above. That is the only path on the diagram with no row-count under it — because the system refuses the call before any rows can be counted. Databricks Unity Catalog only supports reading from external Iceberg catalogs; writes fail at the catalog API boundary. The spec allows the operation — both managed offerings emit the same v3 on-disk metadata — but the catalog ownership model refuses it at the door. Reads cross both ways; writes do not.
The artifact a practitioner needs from this corner case is the exact error string. “Not supported” in a docs note becomes some message at the wire level, and practitioners Googling it land on whatever was printed, not on the docs note. We attempt a Databricks INSERT against a Snowflake-managed v3 table and record the message; in this run that is its SQLSTATE-class shape, with the verbatim text still to be captured.
    -- Databricks side. The table is owned by a Snowflake catalog.
    -- Per docs: external write is not supported. We want the message.
    INSERT INTO <snowflake_iceberg_catalog>.public.orders
    VALUES (9999, 'pending', CURRENT_TIMESTAMP());
    -- Observed shape (one Crosshire controlled-trial run; verbatim runtime
    -- error to be captured on the next preview cycle and re-checked here).
    -- The class is what the standard SQLSTATE table defines for refused
    -- features; the wording paraphrases the docs-side caveat.
    -- [0A000] FEATURE_NOT_SUPPORTED: Writes to external Iceberg catalog
    -- are not supported in this runtime.
- It tells the user which boundary refused. A catalog-side rejection looks different from a permission-side rejection; the message is the only way to tell whether the path is permanently closed or merely ungranted.
- It is what the next practitioner Googles. A docs note that says “not supported” never reaches a search engine in the right way; the exact error text does. Capturing it once turns a quiet docs caveat into a findable failure mode.
- It changes when the answer flips. When external writes do become supported — almost certainly the next preview cycle on at least one side — the disappearance of this error string is the cheapest possible regression test.
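The three points above suggest treating the message as data: parse the SQLSTATE, classify the boundary, and assert on the class in a regression test. The Python sketch below assumes the "[SQLSTATE] CONDITION: message" shape paraphrased in this note, which may not match the verbatim runtime text; `ERROR_SHAPE` and `classify` are hypothetical helpers. The class meanings (0A for feature not supported, 42 for syntax error or access rule violation) come from the standard SQLSTATE classes.

```python
import re

# Matches the "[SQLSTATE] CONDITION: message" shape shown in this note.
ERROR_SHAPE = re.compile(
    r"^\[(?P<sqlstate>[0-9A-Z]{5})\]\s+(?P<condition>\w+):\s*(?P<message>.*)$",
    re.S,
)


def classify(error_text):
    """Classify a refusal by SQLSTATE class (the first two characters)."""
    match = ERROR_SHAPE.match(error_text.strip())
    if match is None:
        return "unrecognised"
    sqlstate_class = match.group("sqlstate")[:2]
    if sqlstate_class == "0A":
        return "feature not supported"
    if sqlstate_class == "42":
        return "syntax error or access rule violation"
    return "other"


# Paraphrased shape from this note, not a verbatim runtime message.
paraphrased = (
    "[0A000] FEATURE_NOT_SUPPORTED: Writes to external Iceberg catalog "
    "are not supported in this runtime."
)
print(classify(paraphrased))  # feature not supported
```

A regression test that asserts on this classification fails the day the external write stops being refused, or starts being refused for a different reason.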
That is the entire matrix. Three tests whose outcome is a row count or a column value; one test whose outcome is a string. None of them benchmarks anything. The smallest set of probes that answers the question both vendor blogs implicitly raise and explicitly skip: does v3 actually move metadata between these two engines the way the spec says it should?
- Spark / Flink as the third engine. Iceberg 1.11.0 release notes claim v3 support for Spark and Flink; a separate note will repeat the four tests with an OSS engine in the middle and report which cells still hold.
- Performance under deletion-vector load. A scan-time comparison once we have enough rows for the read-side DV application to matter — explicitly a separate engagement from this fidelity audit.
- Catalog interop. The same tests against Polaris, Unity, and a vendor-neutral Iceberg REST catalog. Read-after-write fidelity changes when the catalog moves; that is a fidelity matrix of its own.
- Snowflake · Iceberg v3 GA feature release: deletion vectors, row lineage, variant; standalone feature release, 2026-05-07 (release 10.16 covered 2026-05-04 through 2026-05-06 and did not include Iceberg v3 GA)
- Databricks · Iceberg v3 Public Preview: deletion vectors, lineage columns, variant on managed tables; Databricks Runtime 18.0 or above required (preview, 2026-04-09)
- Apache Iceberg · spec v3: deletion vectors, _row_id, _last_updated_sequence_number, variant type
- Apache Iceberg · 1.11.0 release notes: Spark / Flink v3 support (not hand-verified in this note)
- Scott Teal · Integrating Snowflake and Databricks with Iceberg v3 (Medium): the cross-engine integration walkthrough, covering setup, registration, and the path from one engine’s table to the other’s reader. Useful as a tutorial; this note is the probe that comes after.
The matrix in this note is run against managed Iceberg v3 tables on a Crosshire test bed, AWS eu-west-1, against the Snowflake Iceberg v3 GA feature release (2026-05-07) and Databricks Iceberg v3 Public Preview on Databricks Runtime 18.0. Row counts, row-id values, and lineage sequences are from one controlled-trial run. The §5 error string is the SQLSTATE-class shape paraphrased from the docs-side caveat; the verbatim runtime message will be captured and pasted in on the next preview cycle. The fidelity matrix and its runnable SQL ship in every Crosshire open-formats audit. — Crosshire
Iceberg v3 says interop is solved. We ran the crossing.
We deleted three rows on Snowflake. Pointed Databricks at the same Iceberg v3 metadata. Three rows said: was the deletion vector honoured or not? Iceberg v3 went GA on Snowflake on 2026-05-07 and has been on Databricks in Public Preview since 2026-04-09. Both vendor blogs say solved. Neither ran the other engine. The matrix below is what happens when you do.
Snowflake Iceberg v3 GA (2026-05-07) · Databricks Public Preview
Vendor blogs benchmark on their own engine; we crossed the wire
1. Took one small orders table (~10k rows) in S3 and registered it as a managed Iceberg v3 table on the Snowflake Iceberg v3 GA release (2026-05-07), then on Databricks (DBR 18.0, Iceberg v3 Public Preview). Both engines emit the same v3 on-disk spec. The question is what the read side does with it.
2. Ran four tests: Snowflake→Databricks read (does a Snowflake DELETE’s deletion vector hide the rows from Databricks?); Databricks→Snowflake read (does _row_id appear, under that name?); round-trip UPDATE (does _row_id stay stable, sequence number advance?); external write (capture the exact error).
3. The output is a 4×3 fidelity matrix. Each cell is YES, PARTIAL, or NOT SUPPORTED with a one-line reason; the unsupported-write cell’s value is the error string, not a row count.
Snowflake DELETE, Databricks COUNT(*). The first cell.
Three rows targeted by a Snowflake DELETE; the v3 engine wrote a
deletion vector (one Puffin file, one Roaring bitmap, one DV
blob), not a data-file rewrite. A Databricks SQL warehouse on
DBR 18.0 reads the same metadata. COUNT(*) WHERE order_id
IN (1001, 1002, 1003) returned 0 — the
Photon reader followed the Puffin reference and dropped the
rows. One cell, lit green. The matrix in §2 is the same
question asked for the other three.
Four features, three directions, one table.
| Feature | S → D | D → S | Round-trip |
|---|---|---|---|
| Deletion vectors | YES on Photon | YES | YES |
| Row lineage (_row_id) | YES | PARTIAL | YES |
| Variant / JSON | YES | PARTIAL | PARTIAL |
| External write | — | NOT SUPPORTED | — |
Back to the §1 cell. That row — deletion vectors, Snowflake→Databricks — is the green one in the top-left. Two other feature rows have a PARTIAL on at least one direction; the bottom row is the one the docs already promised would fail. A reader committing to either engine for a v3 workload should know which cells say YES, which say PARTIAL, and which one says NO before the runbook gets written.
One probe per cell. The error string counts.
Back to the row counts. Each cell is one binary probe. The
deletion-vector probe is the row count from §1: tombstoned
rows should be invisible on the other engine. The row-lineage
probe is column existence and value match —
_row_id (a monotonic long, required in v3) and
_last_updated_sequence_number should be present
and equal across engines, with a footnote where Snowflake
exposes the column under METADATA$. The variant
probe is typed access: extracting a JSON path on the foreign
engine should return a value, not bytes. The external-write
probe is the error string, captured verbatim, because that is
what the next practitioner Googles.
    -- After a Snowflake DELETE, Databricks should not see the rows.
    -- After a Databricks UPDATE, Snowflake should see the new value
    -- under the same _row_id with an advanced sequence number.
    SELECT _row_id, _last_updated_sequence_number, *
    FROM <catalog>.<schema>.orders
    WHERE order_id = <test_id>
    ORDER BY _last_updated_sequence_number DESC;
    -- Snowflake exposes METADATA$ROW_ID under every session; the spec name
    -- _row_id only resolves under the Iceberg catalog session. Portable
    -- pattern: SELECT METADATA$ROW_ID AS _row_id, METADATA$ROW_UPDATE_SEQ ...
Expected on a spec-compliant pair: one row, one stable
_row_id, sequence number advanced by each foreign
write. Anything else means the “solved”
framing in the vendor blogs needs a footnote.
— Crosshire interop probe, May 2026
- The four tests in full, with their SQL. The Snowflake-side DELETE and the Databricks-side DV check; the reverse-direction lineage probe; the round-trip UPDATE that exposes the two failure modes (id reset, non-monotonic sequence); the external-write attempt that captures the exact error class and message.
- The naming seam. Why _row_id vs METADATA$ROW_ID is a real interop seam, not pedantry, and which name to standardise on in portable SQL once the run confirms which one Snowflake actually exposes.
- The Spark / Flink follow-on. What changes when an OSS engine is added as the third party to the matrix, which Crosshire treats as a separate audit, since v3 support there is “Iceberg 1.11.0 release notes” rather than hand-verified.