Caching System
GradientHarbor uses a three-tier caching architecture to ensure fast data retrieval without unnecessary recomputation or repeated database queries. Each layer serves a different purpose and operates at a different level of the stack.
Architecture Overview
┌──────────────────────────────────────────────────┐
│ User Request │
└──────────────────────┬───────────────────────────┘
│
▼
┌──────────────────────────────────────────────────┐
│ Layer 1: Browser Cache (IndexedDB) │
│ ───────────────────────────────────── │
│ First-page results cached locally │
│ TTL: 30 days │ Max: 500 entries │ LRU eviction │
│ Hit? → Return instantly, no network request │
└──────────────────────┬───────────────────────────┘
│ miss
▼
┌──────────────────────────────────────────────────┐
│ Layer 2: S3 Result Cache (Backend) │
│ ───────────────────────────────────── │
│ Full query results stored as paged JSON │
│ Immutable per execution │ 60-day lifecycle │
│ Hit? → Fetch only needed pages from S3 │
└──────────────────────┬───────────────────────────┘
│ miss (new query)
▼
┌──────────────────────────────────────────────────┐
│ Layer 3: Data Warehouse Cache (External) │
│ ───────────────────────────────────── │
│ Implicit caching at the warehouse level │
│ Snowflake, BigQuery, PostgreSQL query cache │
│ Hit? → DWH serves from its internal cache │
└──────────────────────┬───────────────────────────┘
│ miss
▼
Full Computation
(warehouse scans data)

Layer 1: Browser Cache (IndexedDB)
The frontend caches the first page of query results in the browser's IndexedDB storage.
How it works:
- When a query executes and returns results, the first page (up to 1,000 rows) is stored in IndexedDB
- Subsequent views of the same query results are served instantly from the local cache
- No network request is made if the cache entry is valid
Configuration:
| Setting | Value |
|---|---|
| TTL | 30 days |
| Max entries | 500 |
| Eviction | LRU (Least Recently Used) — prunes to 400 when limit hit |
| Scope | Per-browser, per-execution ID |
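The eviction policy in the table above can be sketched as a pure function. This is an illustrative model, not GradientHarbor's actual implementation: entry shape, names, and the prune trigger are assumptions based on the stated limits (500 max, prune to 400, least-recently-used first).

```typescript
// Sketch of the LRU eviction described above: once the store exceeds
// 500 entries, prune back down to 400 by dropping the least recently
// used entries. Names and shapes are illustrative assumptions.
interface CacheEntry {
  executionId: string;
  lastAccessed: number; // epoch ms, updated on every read
}

const MAX_ENTRIES = 500;
const PRUNE_TARGET = 400;

function pruneIfNeeded(entries: CacheEntry[]): CacheEntry[] {
  if (entries.length <= MAX_ENTRIES) return entries;
  // Keep the 400 most recently accessed entries; evict the rest.
  return [...entries]
    .sort((a, b) => b.lastAccessed - a.lastAccessed)
    .slice(0, PRUNE_TARGET);
}
```

Pruning to 400 rather than 499 means eviction runs in batches instead of on every write past the limit, a common way to amortize the cost of sorting the store.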
What triggers cache reads:
- Opening a dashboard with previously-loaded query cells
- Revisiting a chat conversation with inline query results
- Any query result view where offset = 0 (first page)
What bypasses the cache:
- Pagination (offset > 0) — goes directly to S3
- Re-executing a query (creates a new execution ID)
- Switching organizations (cache is cleared)
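The read and bypass rules above reduce to a small predicate. The sketch below assumes a hypothetical entry shape and lookup signature; only the rules themselves (first page only, per-execution scope, 30-day TTL) come from this document.

```typescript
// Illustrative Layer 1 read path: a lookup hits only when the request is
// for the first page (offset 0), the stored entry belongs to the same
// execution ID, and its 30-day TTL has not elapsed. Entry shape and
// function names are assumptions for this example.
const TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

interface StoredPage {
  executionId: string;
  storedAt: number; // epoch ms
  rows: unknown[];
}

function shouldUseBrowserCache(
  entry: StoredPage | undefined,
  executionId: string,
  offset: number,
  now: number
): boolean {
  if (offset > 0) return false;                        // pagination bypasses Layer 1
  if (!entry) return false;                            // miss: fall through to S3
  if (entry.executionId !== executionId) return false; // re-execution means a new ID
  return now - entry.storedAt < TTL_MS;                // TTL check
}
```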
Layer 2: S3 Result Cache (Backend)
After a query executes, the full result set is stored in Amazon S3 as paged JSON files.
Storage structure:
s3://{bucket}/query_execution_results/{org_id}/{execution_id}/
├── manifest.json # Metadata: columns, row count, page count
└── pages/
├── 0.json # Rows 0-999
├── 1.json # Rows 1000-1999
    └── ...           # Up to 10,000 rows total

How it works:
- Query results are uploaded immediately after execution completes
- A manifest.json file stores metadata (columns, total rows, page size, execution time)
- Results are split into pages of 1,000 rows each
- When a client requests results, only the needed pages are fetched from S3
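The mapping from a requested row window to page files is simple arithmetic: with 1,000 rows per page, rows [offset, offset + limit) live in pages floor(offset / 1000) through floor((offset + limit − 1) / 1000). A minimal sketch, with illustrative function names:

```typescript
// Given a row window (offset, limit) and the fixed page size of 1,000
// rows, compute which S3 page files need to be fetched. The function
// name is an assumption; the page-size constant comes from the docs.
const PAGE_SIZE = 1000;

function pagesForWindow(offset: number, limit: number): number[] {
  if (limit <= 0) return [];
  const first = Math.floor(offset / PAGE_SIZE);
  const last = Math.floor((offset + limit - 1) / PAGE_SIZE);
  const pages: number[] = [];
  for (let p = first; p <= last; p++) pages.push(p);
  return pages;
}
```

For example, a view showing rows 1,500-2,499 needs only pages/1.json and pages/2.json; the other pages are never downloaded.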
Key properties:
| Property | Value |
|---|---|
| Page size | 1,000 rows |
| Max rows | 10,000 per query |
| Retention | 60 days (S3 lifecycle policy) |
| Storage class | Transitions to Intelligent-Tiering after 7 days |
| Mutability | Immutable — each execution produces a new entry |
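For concreteness, a manifest for a 2,500-row result might look like the following. The field names here are illustrative assumptions, not the actual schema; the document only states that the manifest records columns, total rows, page size, and execution time.

```json
{
  "columns": ["month", "revenue"],
  "row_count": 2500,
  "page_size": 1000,
  "page_count": 3,
  "execution_time_ms": 1840
}
```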
Why paged storage matters:
- Dashboard pagination only fetches the pages needed for the current view
- Initial page load is fast — only page 0 is fetched
- Large result sets don't block the UI — pages load on demand
Layer 3: Data Warehouse Cache (External)
Most data warehouses maintain their own internal query result cache. GradientHarbor benefits from this implicitly.
Snowflake
Snowflake caches query results for 24 hours. If the same query is re-executed and the underlying data hasn't changed, Snowflake returns the cached result without consuming warehouse credits.
Benefits:
- No compute cost for repeated queries
- Sub-second response times for cached results
- Automatic invalidation when data changes
BigQuery
BigQuery caches query results in temporary tables for approximately 24 hours. Cached results are free of charge — no bytes are scanned.
Benefits:
- Zero cost for repeated identical queries
- Automatic for all queries (no configuration needed)
PostgreSQL
PostgreSQL uses its buffer cache (shared_buffers) and OS page cache to keep frequently accessed data in memory. While not a query result cache, it significantly speeds up repeated queries against the same tables.
How the Layers Work Together
Consider a user opening a dashboard with a revenue chart:
First visit: All three layers miss
- DWH executes the query (Layer 3 — may use buffer cache)
- Results are stored in S3 (Layer 2)
- First page is cached in IndexedDB (Layer 1)
Second visit (same browser): Layer 1 hit
- IndexedDB returns the cached first page instantly
- No network request needed
- Dashboard appears immediately
Visit from another browser: Layer 1 miss, Layer 2 hit
- IndexedDB has no cache for this execution
- S3 serves the stored results (only page 0 fetched)
- Results are cached in this browser's IndexedDB
Re-executing the query: New execution ID, all layers miss for new results
- A new execution creates a new S3 entry
- DWH may serve from its own cache (Layer 3) if data hasn't changed
- New results are cached in S3 and IndexedDB
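The walkthrough above is a straightforward fall-through cascade. The sketch below models each layer as a lookup function; all names, signatures, and return shapes are assumptions for illustration, and the warehouse step is collapsed into a single callback (which, per Layer 3, may itself be served from the warehouse's internal cache).

```typescript
// Minimal model of the three-layer lookup cascade: try IndexedDB, then
// S3, then fall through to executing the query. Names are illustrative.
type Lookup = (executionId: string) => string | undefined;

function resolveResult(
  executionId: string,
  browserCache: Lookup,  // Layer 1: IndexedDB
  s3Cache: Lookup,       // Layer 2: S3 result store
  runQuery: () => string // Layer 3 fall-through: warehouse executes
): { source: string; result: string } {
  const local = browserCache(executionId);
  if (local !== undefined) return { source: "indexeddb", result: local };
  const remote = s3Cache(executionId);
  if (remote !== undefined) return { source: "s3", result: remote };
  return { source: "warehouse", result: runQuery() };
}
```

A real implementation would also write each miss's result back into the faster layers, as the "Visit from another browser" scenario describes.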
Cache Invalidation
| Layer | Invalidation Mechanism |
|---|---|
| IndexedDB | TTL (30 days), LRU eviction, cleared on org switch/logout |
| S3 | Immutable — each execution is a new entry. Old entries auto-expire after 60 days |
| DWH | Automatic — warehouse invalidates when underlying data changes |
TIP
There is no manual cache purge button. To get fresh data, simply re-execute the query — this creates a new execution with a new cache entry at all layers.