Query fingerprinting
Caching a query result safely requires answering two questions: is this the same query I've seen before? and is it safe to cache at all? chukei answers the first with two fingerprints and the second with a strict determinism gate. This is the machinery behind deterministic query caching — the property that lets chukei's cache reuse results across users and tools where Snowflake's native result cache cannot.
Hard and soft fingerprints
raw SQL
│
v
parse (sqlparser, Snowflake dialect)
│
v
normalise whitespace / case / alias / literal folding
│
v
hard fingerprint = blake3(normalised AST)
│
v
exact match in cache? -- yes --> CACHE HIT
│ no
v
soft fingerprint = feature vector
│ (tables / predicates / aggregations / joins)
v
semantically-equal match? -- yes --> CACHE HIT
│ no
v
CACHE MISS -> forward to Snowflake
- The hard fingerprint is a blake3 hash of the normalised AST. Two queries
that differ only in whitespace, casing, alias names, or literal formatting
produce the same hard fingerprint — so
select * from tandSELECT * FROM Tcollide intentionally. - The soft fingerprint is a feature vector (tables touched, predicate shape, aggregations, joins) used to recognise semantically equivalent queries that normalisation alone can't unify, and to cluster related queries for attribution and routing decisions.
Normalisation is what makes cross-tool reuse possible: a Tableau extract and a dbt model that issue the same logical query share a cache entry even though the generated SQL text differs.
The determinism gate
Before anything is cacheable, the gate must pass. It vetoes (see the plugin bus) any query that is not provably deterministic:
| Vetoed | Why |
|---|---|
RANDOM(), UUID_STRING() | non-deterministic by definition |
CURRENT_TIMESTAMP(), CURRENT_DATE, LOCALTIME | time-varying |
writes (INSERT/UPDATE/MERGE/DDL) | mutate state; also trigger invalidation |
| sequences, session-dependent functions | per-session results |
| anything the parser cannot fully understand | unknown → unsafe |
The rule is when in doubt, miss. A false negative costs one normal query; a false positive serves a wrong answer — so the gate is deliberately conservative. The full rule set lives in the determinism gate reference.
Invalidation
A write to a table invalidates every cached entry whose hard fingerprint touched that table (tracked via the soft fingerprint's table set). Chunked results are never cached. The whole scheme is verified continuously in production by blame mode — see is a proxy safe? for the measured 60,000-hits / 0-mismatch soak record.
Related
- Snowflake query caching guide — the concept, for cost readers
- The plugin bus — how the determinism veto wins
- Fail-open design