Skip to main content

Production validation results

Everything on this page was measured against a live Snowflake account (Snowflake server 10.20) using the official snowflake-connector-python — not an emulator, not a mock. The harness is in the repo (scripts/live-pilot.sh) so you can reproduce every number against your own account.

Functional matrix (all green)

ValidatedDetail
TLS terminationReal driver verifying chukei's certificate chain end-to-end; OCSP soft-fails open (JDBC needs ocspFailOpen=true)
Auth: passwordLogin + full session lifecycle through the proxy
Auth: key-pair (JWT)RSA key registered via ALTER USER through chukei, then authenticated with private_key
Auth: programmatic access tokenPAT minted through the proxy, authenticated via token= + authenticator=PROGRAMMATIC_ACCESS_TOKEN
Statement shapesSHOW / DESCRIBE / USE / ALTER SESSION, and PUT + LIST + GET file transfers via a stage (presigned-URL responses pass through untouched)
Fail-open on unparseable SQLA MATCH_RECOGNIZE query chukei's parser cannot handle executed correctly via verbatim passthrough
Cache correctnessHit on repeated deterministic read; no reuse across different literals; RANDOM() never cached; INSERT invalidates the cached COUNT
Chunked results200,000-row result passed through intact (chunked downloads are never cached by design)
Concurrency12 parallel sessions × 8 mixed queries: no errors, no lost writes
Credential hygieneTrace-level audit: credentials appear nowhere in logs, cache, or ledger

13.5-hour soak

Continuous mixed workload (4 workers: repeated dashboard reads, randomized literals, non-deterministic queries, writes with readback, 50k-row results):

MetricResult
Queries processed~120,000 (60,001 cache hits / 59,895 passthrough)
Cache blame mismatches0 — every sampled cache hit re-executed against live Snowflake matched (25% sampling)
Client errors0 across all workers, all night
Memory12 MB → 37 MB during initial cache fill, then flat for 11 hours
Panics / crashes0
Session-token rotationSessions ran straight through Snowflake's ~4-hour rotations without re-login or error

Measured failure behaviour (we drill this, not assume it)

  • A chukei restart faster than the driver's retry budget (~10 s for the Python connector) is invisible to running clients.
  • A sustained 30 s outage produced its first client error at t+15 s — each query burns its retry budget before failing. Clients do not silently fall back to Snowflake.
  • After restart, existing sessions resumed without re-login (39 consecutive recovered queries on the same session) — chukei holds no session state.
  • Kill switch verified live: CHUKEI_PLUGINS_CACHE_ENABLED=false bypasses the cache instantly while queries keep flowing.

Enforce-mode suspend (live)

Validated end-to-end against a real warehouse, observed directly on Snowflake (not through the proxy):

  • chukei's sweeper executed ALTER WAREHOUSE SUSPEND via its OPERATE-scoped service account ~10 seconds after the model's min_observations-th arrival — attributed to the service user in QUERY_HISTORY with status SUCCESS.
  • The suspend was real: the next query's QUEUED_PROVISIONING_TIME (93 ms) shows it had to resume the warehouse chukei had put to sleep.
  • Long-running and async queries validated through the proxy: a 50-second query via the driver's result-polling flow, and the explicit execute_async / status-poll / get_results_from_sfqid API.
  • JDBC validated over TLS with ocspFailOpen=true: login, cached repeat reads, 100k-row chunked results.

The validation loop also caught and fixed three shipped bugs — a TLS panic in doctor, metadata statements and cache hits resetting the suspend idle model, and the session warehouse never being populated from live login traffic. Each now has a permanent regression test.

Signed evidence

The soak's savings ledger was exported as an Ed25519-signed evidence bundle and verified:

VERIFIED chukei-savings--24h--20260612T052028Z
kind: savings-ledger tool: chukei 0.2.0
corpus: savings.db (60,997 rows, sha256-pinned)

Every chukei deployment can produce the same artifact with chukei savings --evidence — the methodology (canonical wall-clock × credit rate × 0.7 conservative factor) is embedded in the bundle.

Proxy overhead (local benchmark)

Measured on commodity hardware against a mock upstream (so it isolates chukei's own cost, not network/Snowflake latency): passthrough 56k qps at p99 4 ms (+2 ms over baseline), cache hits ~97k qps, 17.7 MB binary. The enforced budget is +5 ms p99, alerted via chukei_proxy_overhead_seconds.

Reproduce it

scripts/live-pilot.sh --tls --stages "core shapes concurrency" runs the functional matrix against your account; scripts/soak.sh runs the soak; scripts/soak-report.sh is the morning gate.