Skip to main content

Production pilot guide

chukei is a transparent proxy: your drivers keep their credentials and SQL and just point at a different host. Rollback is repointing them back — chukei holds no state your queries depend on.

1. Start from the conservative profile

Use config/customer-pilot.yaml: cache with strict determinism gate and 5% blame sampling, no result data persisted at rest, suspend in suggest-only, router off. Three values to change: TLS cert/key, account locator, your $/credit rate.

2. TLS

Clients must trust the certificate for the hostname they'll use (e.g. chukei.internal.example.com) — an internal-CA cert is the normal choice. JDBC enforces OCSP against the proxy cert; add ocspFailOpen=true to the JDBC URL for the pilot. The Python connector and snowsql soft-fail by default (validated against Snowflake 10.20).

chukei doctor --config chukei.yaml
# ✓ config ✓ tls ✓ upstream ✓ listen ✓ savings ✓ cache

3. Cut over a subset first

Pilot with one team or workload via explicit host override, not an account-wide DNS change. See Examples for per-driver snippets.

4. Watch

:9090/metrics, Prometheus format:

SignalThresholdAction
chukei_cache_blame_mismatches_total> 0Page. Set CHUKEI_PLUGINS_CACHE_ENABLED=false, restart, file a bug with the fingerprint.
chukei_circuit_breaker_fast_fails_totalclimbingSnowflake-side trouble; check status.
p99 chukei_proxy_overhead_seconds> 0.005File a bug with /metrics output.
/healthznon-200Restart policy handles it; if flapping, roll back.

5. Rollback (rehearse once)

Point the driver host back at <account>.snowflakecomputing.com. Per-plugin kill switches: CHUKEI_PLUGINS_CACHE_ENABLED=false, CHUKEI_PLUGINS_REWRITE_ENABLED=false, CHUKEI_PLUGINS_SUSPEND_ENABLED=false.

Measured restart behaviour: a restart faster than the driver's retry budget (~10s for the Python connector) is invisible to running clients; longer outages surface as connection errors only after each query burns its retry budget. Existing sessions resume after restart without re-login.

Known pilot limitations

  • Large chunked results are never cached (they pass through unmodified — validated to 200k rows).
  • Suspend stays suggest-only in week one; enforce mode needs a CHUKEI_SUSPENDER service account.
  • One instance per deployment (per-instance cache).
  • The savings ledger deliberately under-claims (×0.7) and says so in every report.