Crosshire
Tech·Agentic analytics·Part 1 of 5·9 min read·8 Jun 2026

The $15 lab: what four AI agents found in one Snowflake account.

A notebook I forgot to close burned 43.86 credits on one warehouse in four days, 98.9% of them idle. An AI agent found it for $1.77 and named the function responsible, a notebook-container heartbeat called SYSTEM$NOTEBOOK_CONTAINER_PARAMS. That finding is the least interesting thing here. The lab around it cost under $15, and what it measured contradicts both loud opinions about agentic analysis: “you need a framework” and “just prompt a frontier model”.

$1.77
one agent run that found the idle warehouse
98.9% idle · COMPUTE_WH · 4 days
43.86 credits metered, 0.47 of real query work
COMPUTE_WH · 16–20 May 2026 · 53 metered hours
COMPUTE_WH, 4 days | Credits
Metered (billed) | 43.86
Attributed to query work | 0.47
Idle (unattributed), 98.9% | 43.39
In this report
  1. A notebook nobody closed, found for $1.77
  2. The finding, with receipts
  3. Why the cheapness is the method
  4. The four components
  5. The experimental variable, and the 2×2
Series · Skills are the floor, not the ceiling
  1. The $15 lab: the anchor finding and the 2×2 design
  2. Skills vs model: the directed question and three laws (soon)
  3. Union beats everyone: the open-ended sweep (soon)
  4. Variance and budget death: traces as referee (soon)
  5. The operating model: prompt, tiers, economics (soon)
Provenance
43.86 credits, 4 days
0.47 query work
98.9% idle
$1.77 to find it
4 agent configs
Account
A seeded Crosshire trial Snowflake account, not a customer
Window
16–20 May 2026, ~4 days of metered data (47,420 queries, 39 warehouses)
Lab
17 ACCOUNT_USAGE CSVs → one local DuckDB → four agents on the Claude Agent SDK
Sources
WAREHOUSE_METERING_HISTORY · QUERY_ATTRIBUTION_HISTORY · QUERY_HISTORY · SERVICES · nine on-disk trace JSONLs
Cost basis
$3.00/credit Snowflake list rate, disclosed as an estimate
Seeded trial account · Every number from SQL · Method-only public release
01 A notebook nobody closed

An agent found it for $1.77. I never did.

A notebook I forgot to close burned 43.86 credits on COMPUTE_WH in four days: 98.9% idle, 0.47 credits of actual query work. A skilled frontier agent found it, named the function doing the polling (SYSTEM$NOTEBOOK_CONTAINER_PARAMS, a notebook-container heartbeat), and wrote it up for $1.77 in a 14-minute run. I had been staring at the same account for a week and had not seen it.

That discovery is the least interesting thing in this series. The lab around it is. The same investigation ran through four agent configurations — two model tiers, two levels of preparation — with every tool call traced and every claim adjudicated against SQL. The result contradicted both loud opinions in this space and converged on a third one neither camp emphasises: what matters is who carries the causal model — the model, or the playbook.

02 The finding, with receipts

A warehouse bills for being awake, not for working.

The invoice shows one number per warehouse. It does not show you how much of that number was a warehouse doing nothing. The table at the top of this report splits COMPUTE_WH over the four-day window: 43.86 credits metered, 0.47 attributed to query work, the rest — 98.9% — idle.
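The split itself is plain arithmetic once the two totals sit side by side. A minimal sketch using the figures from the table at the top of this report; the function name `idle_split` is illustrative and not taken from the lab's code:

```python
def idle_split(metered: float, attributed: float) -> tuple[float, float]:
    """Return (idle_credits, pct_idle) for one warehouse.

    metered    -- credits billed for the warehouse being awake
    attributed -- credits attributed to actual query work
    """
    if metered <= 0:
        raise ValueError("metered credits must be positive")
    idle = metered - attributed
    return round(idle, 2), round(100 * idle / metered, 1)


# Figures for COMPUTE_WH over the four-day window, as reported above.
idle, pct = idle_split(metered=43.86, attributed=0.47)
print(f"idle={idle} credits, pct_idle={pct}%")
```

The guard on `metered` matters in practice: a warehouse with no metered credits in the window has no meaningful idle percentage, so it is better to fail loudly than to divide by zero.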

COMPUTE_WH is an X-Small with AUTO_SUSPEND = 60s, and it was the only warehouse still awake when the snapshot was taken. The reason is one object: a notebook container service, DARSHANSINGHCROSSHIRE_SERVICE_1, configured with AUTO_SUSPEND_SECS = 0 — it never suspends. For four days it polled around the clock: 3,982 calls to SYSTEM$NOTEBOOK_CONTAINER_PARAMS and 18,683 file-poll operations (GET_FILES plus LIST_FILES), at roughly 300–400 queries an hour, every hour, overnight included.

That cadence quietly defeats the 60-second auto-suspend. Of the 37,399 inter-query gaps on COMPUTE_WH, only 120 (0.32%) were longer than 60 seconds. The notebook polled faster than the warehouse could fall asleep, so it billed 53 hours awake for 0.47 credits of real work. That one warehouse is 53.2% of the account’s warehouse credits, about half the account’s compute, spent on a heartbeat. At the $3.00/credit list rate (a disclosed estimate), that idle burn keeps compounding for as long as the notebook is left running unattended; the recurring waste is the subject of Part 5.
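The gap count can be sketched in a few lines. This is an illustration under stated assumptions, not the query the lab ran: the timestamps below are synthetic, the function name is hypothetical, and gaps are measured start-to-start for simplicity (a stricter version would measure from one query's end to the next query's start).

```python
from datetime import datetime, timedelta


def gaps_over_threshold(start_times, threshold_s=60):
    """Return (total_gaps, gaps_longer_than_threshold) for one warehouse."""
    ordered = sorted(start_times)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    long_gaps = sum(1 for g in gaps if g > threshold_s)
    return len(gaps), long_gaps


# Synthetic example (not account data): a poll every 10 seconds,
# with a single 5-minute pause in the middle.
t0 = datetime(2026, 5, 16, 0, 0, 0)
polls = [t0 + timedelta(seconds=10 * i) for i in range(180)]
resume = polls[-1] + timedelta(minutes=5)
polls += [resume + timedelta(seconds=10 * i) for i in range(180)]

total, long_gaps = gaps_over_threshold(polls)
print(f"{long_gaps} of {total} gaps exceed 60s ({100 * long_gaps / total:.2f}%)")
```

Only the gaps longer than the threshold give the warehouse a chance to suspend, which is why a steady poll every few seconds keeps it billing around the clock.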

Every number in that table is real, and the warehouse was never broken. It did exactly what an always-on container tells it to do. The bill is one number; idle is not a line on it.
[Figure: COMPUTE_WH, metered vs attributed. 43.86 credits metered over 4 days: 43.39 idle (98.9%), 0.47 credits attributed to query work. Panel title: A warehouse that never slept.]
One bar, one warehouse. The pale block is idle metered time; the thin cap at the bottom is the query work the bill was supposedly for. The polling notebook kept the whole bar lit.
03 Why the cheapness is the method

If reproducing it costs $15, people reproduce it.

The interesting claim is not that an agent found an idle warehouse. It is that the whole experiment — one account export, four agent configurations, full traces — ran for under $15 on my own Snowflake bill, with no warehouse spun up for any of the agent’s curiosity. The nine runs I logged sum to $8.43; the rest were cheap directed runs I did not keep. That is the entire cost of the lab.

The cheapness is the contribution I actually care about. A finding you cannot reproduce is an anecdote. If reproducing an experiment costs $15 and an afternoon, people reproduce it, argue with it, and improve it. That is the bar a field note should clear: not “trust me”, but “here is the cheap rig, run it yourself”.

04 The four components

One export, one DuckDB, two agents, every event traced.

One export. Seventeen CSVs from Snowflake’s ACCOUNT_USAGE schema — query and warehouse metering, query attribution, access history, storage, logins, users, grants, resource monitors, account parameters. The snapshot: 47,420 queries, 39 warehouses, about four days of a trial account, for one-time cents. Pointing an agent at the live views instead would spin a warehouse on every curious query; exporting once makes curiosity free.

One local DuckDB. A Python script loads the CSVs, parses the timestamps, and flattens the access-history JSON into a lineage table. From there every agent question costs zero Snowflake credits, forever. One hygiene step is worth stealing: the export runs query texts through AST-based literal redaction (with sqlglot) so the value in every literal is replaced with a placeholder. Metadata about a query is not the same as the data inside it, and the export should enforce that line.
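The lineage-flattening step might look roughly like the sketch below. It assumes the ACCESS_HISTORY columns BASE_OBJECTS_ACCESSED and OBJECTS_MODIFIED arrive as JSON arrays of objects carrying an `objectName` key; the edge layout `(query_id, source, target)`, the function name, and the sample row are hypothetical, since the lab's actual lineage schema is not shown here.

```python
import json


def lineage_edges(query_id, base_objects_json, objects_modified_json):
    """Flatten one ACCESS_HISTORY row into (query_id, source, target) edges.

    Every object the query read is paired with every object it wrote.
    Read-only queries produce no edges.
    """
    def names(raw):
        # Empty or missing cells are treated as an empty array.
        objects = json.loads(raw) if raw else []
        return [o["objectName"] for o in objects if o.get("objectName")]

    sources = names(base_objects_json)
    targets = names(objects_modified_json)
    return [(query_id, s, t) for s in sources for t in targets]


# Hypothetical row, shaped like the JSON arrays in ACCESS_HISTORY.
row = {
    "QUERY_ID": "q-001",
    "BASE_OBJECTS_ACCESSED": '[{"objectName": "DB.RAW.ORDERS", "objectDomain": "Table"}]',
    "OBJECTS_MODIFIED": '[{"objectName": "DB.MART.DAILY_ORDERS", "objectDomain": "Table"}]',
}
edges = lineage_edges(
    row["QUERY_ID"], row["BASE_OBJECTS_ACCESSED"], row["OBJECTS_MODIFIED"]
)
print(edges)
```

Loading the resulting tuples into a local table gives the agent a lineage relation it can join against without touching the raw JSON again.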

Two agents on the Claude Agent SDK. The SDK is the engine inside Claude Code exposed as a library: it gives you the full agentic loop (tool execution, context management, file access), and you supply a system prompt and options. The entire “framework” here is two roughly 80-line Python files. The framework question settles in one paragraph: for “agent investigates database”, the loop you would build with an orchestration framework is the loop the SDK already gives you, minus a dependency tree. Deterministic glue stays ordinary code; the agentic behaviour comes from the vendor harness; nothing in between needs a framework.

Full tracing. A thin wrapper around the SDK’s message stream writes every event to a JSONL file — the exact prompt, the system prompt, every tool call with the full SQL, every result, every error, the model’s thinking, the final cost — one object per line, flushed immediately so an interrupted run loses nothing. The trace is for epistemics, not debugging. Every dispute in this series — did the agent cheat, did a prompt edit take effect, did it ever read the attribution column — was settled by grepping a trace. Agent claims without traces are vibes.
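A minimal sketch of such a trace writer, independent of any SDK: it takes already-extracted event fields and appends them as JSON lines. The class name, the `ts` and `kind` fields, and the sample events are assumptions for illustration, not the lab's wrapper or the SDK's message format.

```python
import io
import json
import time


class TraceWriter:
    """Append one JSON object per line and flush after every event."""

    def __init__(self, stream):
        self.stream = stream

    def write(self, kind, **payload):
        event = {"ts": time.time(), "kind": kind, **payload}
        self.stream.write(json.dumps(event, default=str) + "\n")
        self.stream.flush()  # an interrupted run keeps everything written so far


def grep_trace(lines, needle):
    """Return parsed events whose serialized line contains `needle`."""
    return [json.loads(line) for line in lines if needle in line]


# In-memory stream for the demo; a real run would open a .jsonl file in append mode.
buf = io.StringIO()
trace = TraceWriter(buf)
trace.write("prompt", text="why is COMPUTE_WH expensive?")
trace.write("tool_call", sql="SELECT SUM(CREDITS_ATTRIBUTED_COMPUTE) FROM query_attribution_history")
trace.write("result", cost_usd=1.77)

hits = grep_trace(buf.getvalue().splitlines(), "CREDITS_ATTRIBUTED_COMPUTE")
print(len(hits), hits[0]["kind"])
```

Because each line is a complete JSON object, a question like "did it ever read the attribution column" reduces to a substring search over the file.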

05 The experimental variable, and the 2×2

Two levels of preparation, two model tiers, one grid.

The two agents differ in exactly one dimension. Skilled gets the database schema in its system prompt (table descriptions, join keys, unit conventions, known gotchas) plus five SKILL.md playbooks that encode how I would investigate: cost-and-idle analysis, impact analysis, query antipatterns, credit-spike triage, and an open-ended anomaly-discovery checklist. Naive gets one sentence (“here is a Snowflake account export to analyze”), no schema, no playbooks, and, after a contamination incident that is the subject of Part 4, an isolated temp directory so it cannot read the skills off disk.

Cross that one variable with two model tiers — Opus (the frontier model) and Sonnet (the cheap one) — and you get a 2×2: four configurations, same database, same question, same turn budget, full traces. The cheap tier here is Sonnet, not the smallest model; the point was to compare a frontier model against a genuinely capable cheaper one, not against a toy.

The 2×2 · one variable, two tiers
 | Opus · frontier | Sonnet · cheap
Skilled · schema + 5 playbooks | The expected winner | Does the playbook rescue the cheap model?
Naive · one sentence | Can the frontier model compensate for no knowledge? | The expected loser
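The grid can be written down as data so that every cell provably shares the same settings apart from the two crossed variables. A small sketch; the class, the tier labels, and especially the `TURN_BUDGET` value are placeholders, since the actual turn budget is not stated here:

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AgentConfig:
    model: str        # model tier label, not an API model identifier
    preparation: str  # "skilled" or "naive"
    max_turns: int    # identical in every cell


MODELS = ("opus", "sonnet")
PREPARATIONS = ("skilled", "naive")
TURN_BUDGET = 40  # placeholder value, assumed for illustration

# Row-major order: skilled row first, then naive, matching the grid above.
grid = [AgentConfig(m, p, TURN_BUDGET) for p, m in product(PREPARATIONS, MODELS)]
for cfg in grid:
    print(f"{cfg.preparation}-{cfg.model}")
```

Generating the cells from one product keeps the comparison honest: adding a third tier or preparation level changes one tuple, not four hand-written configurations.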

The hypothesis I would have bet on: skilled-Opus wins everything, naive-Sonnet loses everything, and the diagonals — does the playbook rescue the cheap model, does the frontier model compensate for missing knowledge — are the only interesting cells. That hypothesis was half right. The wrong half is where every useful lesson in this series came from, and it starts with a cheap, naive agent that produced a confident, fluent, completely wrong answer.

Two rules the lab does not relax
The model never produces a number. Every credit, every percent in this note comes from a SQL query against the DuckDB. The model reads numbers and writes prose around them; it does not compute the figures it reports. When the agent says “98.9% idle,” that percentage was in its input, returned by a query the trace records.
A human ratifies everything. No finding ships because an agent asserted it. Each one is checked against the row that produced it before it appears here. The agent is a fast, tireless reader of the account; the judgement of what is true stays with a person.
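The first rule lends itself to a mechanical pre-check before the human review. The sketch below is one possible way to do it, not the lab's implementation: it flags any number in the agent's prose that does not appear verbatim in a recorded query result. The function names and sample strings are hypothetical, and matching is by string, so 98.90 and 98.9 would count as different.

```python
import re

# A number: digits with optional thousands separators and an optional decimal
# part, not glued to a preceding word character (so SERVICE_1 is ignored).
NUMBER = re.compile(r"(?<![\w.])\d[\d,]*(?:\.\d+)?")


def numbers_in(text):
    """Extract numeric tokens from text, with thousands separators removed."""
    return {m.group().replace(",", "") for m in NUMBER.finditer(text)}


def unsourced_numbers(prose, query_results):
    """Return numbers that appear in the prose but in no recorded query result."""
    sourced = set()
    for result in query_results:
        sourced |= numbers_in(result)
    return numbers_in(prose) - sourced


# Hypothetical trace excerpt and two candidate sentences.
results = ["metered=43.86 attributed=0.47 pct_idle=98.9"]
ok = "COMPUTE_WH metered 43.86 credits and was 98.9% idle."
bad = "COMPUTE_WH metered 43.86 credits and was 97.5% idle."

print(unsourced_numbers(ok, results))
print(unsourced_numbers(bad, results))
```

A check like this narrows what the reviewer has to look at; it does not replace the second rule, because a number can be present in a result and still be attached to the wrong claim.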

That is the Crosshire audit method in a petri dish: a cheap, reproducible rig, every claim traceable to a query, and a person standing between the model’s prose and the published number. The published Snowflake login-count note asserts that discipline; this series is the experiment behind it. The agent also flagged the account’s zero-MFA posture and its adaptive-versus-classic warehouse billing — both covered in depth in those notes, so this series sends you there rather than re-treading them.

Next in the series · Part 2
  • The 2×2 on a directed question. Skilled and naive, Opus and Sonnet, all asked “why is COMPUTE_WH expensive?” — with the traces side by side.
  • The three laws that fell out of it. What the playbook buys, what the frontier model buys, and where the two stop substituting for each other.
  • The cell that was confident and wrong. The cheap, naive configuration that wrote a fluent answer with the wrong number — and why it is the most expensive cell to trust.
From our audit
This lab is the public twin of how a Crosshire audit runs: one export, every finding sourced to the row that produced it, a human signing off each number before it lands. If a warehouse on your account is billing for being awake rather than for working, the same method finds it — with the query attached so you can check the math yourself.

Numbers in this note come from a seeded Crosshire trial Snowflake account — not a customer — snapshot 16–20 May 2026, about four days of metered data, re-derived against the same DuckDB the agents queried. Credit-to-dollar figures use the $3.00/credit list rate as a disclosed estimate. The model read every number; a human ratified every one before it shipped. — Crosshire

D
writes Crosshire Journal · crosshire.ch · June 2026
Crosshire Journal
Field reports on data, compute, and the unglamorous decisions that shape engineering teams. Made in EU. Cited evidence, GDPR-native.
Tech·Agentic analytics·Part 1 of 5·2 min read·8 Jun 2026

A notebook nobody closed. Found for $1.77.

An AI agent read one Snowflake account export and found a notebook left running — 98.9% of one warehouse’s credits, burned idle — for $1.77. Here is the finding and the query that proves it.

[Figure: COMPUTE_WH, 43.86 credits over 4 days: 43.39 idle (98.9%), 0.47 credits of real query work.]
One warehouse, four days. The pale block is idle metered time; the thin cap is the query work the bill was for.
Provenance · what happened

  1. Exported one Snowflake account to CSV (17 ACCOUNT_USAGE files, 47,420 queries, 39 warehouses, ~4 days) for one-time cents.
  2. Loaded the CSVs into a local DuckDB and pointed four AI agents at it. Every agent question cost zero Snowflake credits; the whole lab ran under $15.
  3. The skilled frontier agent found the idle notebook for $1.77 and named the function keeping it awake: SYSTEM$NOTEBOOK_CONTAINER_PARAMS.

01 The problem

A notebook left running, 98.9% idle.

COMPUTE_WH metered 43.86 credits over four days. Only 0.47 were attributed to query work; the other 98.9% — 43.39 credits — was idle, about half the account’s warehouse credits. The cause was a notebook container set to never auto-suspend, polling around the clock so the 60-second auto-suspend almost never fired.

02 Why it was missed

The bill is one number. Idle isn’t a line on it.

The invoice shows total credits per warehouse. It does not split them into “working” and “awake but doing nothing”. A warehouse bills for being awake, not for working. The split only appears when you put metered credits next to attributed credits, which the query below does for every warehouse you have.

03 Find your own

Metered vs attributed, one query.

Run this on your own account. Any warehouse with a high pct_idle is billing for being awake; a service with AUTO_SUSPEND set too high, or to zero (which disables suspension entirely), is the usual cause.

Idle vs working, per warehouse · ACCOUNT_USAGE · sql
```sql
-- Credits billed per warehouse over the last 30 days.
WITH metered AS (
  SELECT WAREHOUSE_NAME, SUM(CREDITS_USED) AS credits
  FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
  WHERE START_TIME >= DATEADD('day', -30, CURRENT_TIMESTAMP())
  GROUP BY 1
),
-- Credits attributed to query execution over the same window.
attributed AS (
  SELECT WAREHOUSE_NAME, SUM(CREDITS_ATTRIBUTED_COMPUTE) AS credits
  FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ATTRIBUTION_HISTORY
  WHERE START_TIME >= DATEADD('day', -30, CURRENT_TIMESTAMP())
  GROUP BY 1
)
-- pct_idle is the share of metered credits with no query work attributed.
SELECT m.WAREHOUSE_NAME,
       ROUND(m.credits, 2) AS metered_credits,
       ROUND(COALESCE(a.credits, 0), 2) AS query_credits,
       ROUND(100 * (1 - COALESCE(a.credits, 0) / NULLIF(m.credits, 0)), 1) AS pct_idle
FROM metered m
LEFT JOIN attributed a USING (WAREHOUSE_NAME)
ORDER BY metered_credits DESC;
```
Want the receipts?
The long version adds three things this short can’t.
  • The 2×2 grid and its three laws. Skilled and naive, Opus and Sonnet, on the same account — what the playbook versus the frontier model each buys.
  • The cell that was confident and wrong. A cheap, naive run that wrote a fluent answer with the wrong number.
  • The operating model. The $15 rig, full tracing, and the two rules that turn it into a process.
Two-minute field fixes from the same audits as our long-form Journal. One number, one fix, one result you can verify.
Crosshire Quick
© 2026 Crosshire Journal · Made in EU