Caching System
GradientHarbor uses a three-tier caching architecture to ensure fast data retrieval without unnecessary recomputation or repeated database queries. Each layer serves a different purpose and operates at a different level of the stack.
Architecture Overview
┌──────────────────────────────────────────────────┐
│ User Request │
└──────────────────────┬───────────────────────────┘
│
▼
┌──────────────────────────────────────────────────┐
│ Layer 1: Browser Cache (IndexedDB) │
│ ───────────────────────────────────── │
│ First-page results cached locally │
│ TTL: 30 days │ Max: 500 entries │ LRU eviction │
│ Hit? → Return instantly, no network request │
└──────────────────────┬───────────────────────────┘
│ miss
▼
┌──────────────────────────────────────────────────┐
│ Layer 2: S3 Result Cache (Backend) │
│ ───────────────────────────────────── │
│ Full query results stored as paged JSON │
│ Immutable per execution │ 60-day lifecycle │
│ Hit? → Fetch only needed pages from S3 │
└──────────────────────┬───────────────────────────┘
│ miss (new query)
▼
┌──────────────────────────────────────────────────┐
│ Layer 3: Data Warehouse Cache (External) │
│ ───────────────────────────────────── │
│ Implicit caching at the warehouse level │
│ Snowflake, BigQuery, PostgreSQL query cache │
│ Hit? → DWH serves from its internal cache │
└──────────────────────┬───────────────────────────┘
│ miss
▼
Full Computation
(warehouse scans data)

Layer 1: Browser Cache (IndexedDB)
The frontend caches the first page of query results in the browser's IndexedDB storage.
How it works:
- When a query executes and returns results, the first page (up to 1,000 rows) is stored in IndexedDB
- Subsequent views of the same query results are served instantly from the local cache
- No network request is made if the cache entry is valid
Configuration:
| Setting | Value |
|---|---|
| TTL | 30 days |
| Max entries | 500 |
| Eviction | LRU (Least Recently Used) — prunes to 400 when limit hit |
| Scope | Per-browser, per-execution ID |
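The eviction policy in the table above can be sketched as a pure function. This is an illustrative model, not GradientHarbor's actual implementation: entry shape, names, and the prune trigger are assumptions based on the stated limits (500 max, prune to 400, least-recently-used first).

```typescript
// Sketch of the LRU eviction described above: once the store exceeds
// 500 entries, prune back down to 400 by dropping the least recently
// used entries. Names and shapes are illustrative assumptions.
interface CacheEntry {
  executionId: string;
  lastAccessed: number; // epoch ms, updated on every read
}

const MAX_ENTRIES = 500;
const PRUNE_TARGET = 400;

function pruneIfNeeded(entries: CacheEntry[]): CacheEntry[] {
  if (entries.length <= MAX_ENTRIES) return entries;
  // Keep the 400 most recently accessed entries; evict the rest.
  return [...entries]
    .sort((a, b) => b.lastAccessed - a.lastAccessed)
    .slice(0, PRUNE_TARGET);
}
```

Pruning to 400 rather than 499 means eviction runs in batches instead of on every write past the limit, a common way to amortize the cost of sorting the store.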
What triggers cache reads:
- Opening a dashboard with previously-loaded query cells
- Revisiting a chat conversation with inline query results
- Any query result view where offset = 0 (first page)
What bypasses the cache:
- Pagination (offset > 0) — goes directly to S3
- Re-executing a query (creates a new execution ID)
- Switching organizations (cache is cleared)
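The read and bypass rules above reduce to a small predicate. The sketch below assumes a hypothetical entry shape and lookup signature; only the rules themselves (first page only, per-execution scope, 30-day TTL) come from this document.

```typescript
// Illustrative Layer 1 read path: a lookup hits only when the request is
// for the first page (offset 0), the stored entry belongs to the same
// execution ID, and its 30-day TTL has not elapsed. Entry shape and
// function names are assumptions for this example.
const TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

interface StoredPage {
  executionId: string;
  storedAt: number; // epoch ms
  rows: unknown[];
}

function shouldUseBrowserCache(
  entry: StoredPage | undefined,
  executionId: string,
  offset: number,
  now: number
): boolean {
  if (offset > 0) return false;                        // pagination bypasses Layer 1
  if (!entry) return false;                            // miss: fall through to S3
  if (entry.executionId !== executionId) return false; // re-execution means a new ID
  return now - entry.storedAt < TTL_MS;                // TTL check
}
```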
Layer 2: S3 Result Cache (Backend)
After a query executes, the full result set is stored in Amazon S3 as paged JSON files.
Storage structure:
s3://{bucket}/query_execution_results/{org_id}/{execution_id}/
├── manifest.json # Metadata: columns, row count, page count
└── pages/
├── 0.json # Rows 0-999
├── 1.json # Rows 1000-1999
    └── ...           # Up to 10,000 rows total

How it works:
- Query results are uploaded immediately after execution completes
- A manifest.json file stores metadata (columns, total rows, page size, execution time)
- Results are split into pages of 1,000 rows each
- When a client requests results, only the needed pages are fetched from S3
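The mapping from a requested row window to page files is simple arithmetic: with 1,000 rows per page, rows [offset, offset + limit) live in pages floor(offset / 1000) through floor((offset + limit − 1) / 1000). A minimal sketch, with illustrative function names:

```typescript
// Given a row window (offset, limit) and the fixed page size of 1,000
// rows, compute which S3 page files need to be fetched. The function
// name is an assumption; the page-size constant comes from the docs.
const PAGE_SIZE = 1000;

function pagesForWindow(offset: number, limit: number): number[] {
  if (limit <= 0) return [];
  const first = Math.floor(offset / PAGE_SIZE);
  const last = Math.floor((offset + limit - 1) / PAGE_SIZE);
  const pages: number[] = [];
  for (let p = first; p <= last; p++) pages.push(p);
  return pages;
}
```

For example, a view showing rows 1,500-2,499 needs only pages/1.json and pages/2.json; the other pages are never downloaded.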
Key properties:
| Property | Value |
|---|---|
| Page size | 1,000 rows |
| Max rows | 10,000 per query |
| Retention | 60 days (S3 lifecycle policy) |
| Storage class | Transitions to Intelligent-Tiering after 7 days |
| Mutability | Immutable — each execution produces a new entry |
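For concreteness, a manifest for a 2,500-row result might look like the following. The field names here are illustrative assumptions, not the actual schema; the document only states that the manifest records columns, total rows, page size, and execution time.

```json
{
  "columns": ["month", "revenue"],
  "row_count": 2500,
  "page_size": 1000,
  "page_count": 3,
  "execution_time_ms": 1840
}
```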
Why paged storage matters:
- Dashboard pagination only fetches the pages needed for the current view
- Initial page load is fast — only page 0 is fetched
- Large result sets don't block the UI — pages load on demand
Layer 3: Data Warehouse Cache (External)
Most data warehouses maintain their own internal query result cache. GradientHarbor benefits from this implicitly.
Snowflake
Snowflake caches query results for 24 hours. If the same query is re-executed and the underlying data hasn't changed, Snowflake returns the cached result without consuming warehouse credits.
Benefits:
- No compute cost for repeated queries
- Sub-second response times for cached results
- Automatic invalidation when data changes
BigQuery
BigQuery caches query results in temporary tables for approximately 24 hours. Cached results are free of charge — no bytes are scanned.
Benefits:
- Zero cost for repeated identical queries
- Automatic for all queries (no configuration needed)
PostgreSQL
PostgreSQL uses its buffer cache (shared_buffers) and OS page cache to keep frequently accessed data in memory. While not a query result cache, it significantly speeds up repeated queries against the same tables.
How the Layers Work Together
Consider a user opening a dashboard with a revenue chart:
First visit: All three layers miss
- DWH executes the query (Layer 3 — may use buffer cache)
- Results are stored in S3 (Layer 2)
- First page is cached in IndexedDB (Layer 1)
Second visit (same browser): Layer 1 hit
- IndexedDB returns the cached first page instantly
- No network request needed
- Dashboard appears immediately
Visit from another browser: Layer 1 miss, Layer 2 hit
- IndexedDB has no cache for this execution
- S3 serves the stored results (only page 0 fetched)
- Results are cached in this browser's IndexedDB
Re-executing the query: New execution ID, all layers miss for new results
- A new execution creates a new S3 entry
- DWH may serve from its own cache (Layer 3) if data hasn't changed
- New results are cached in S3 and IndexedDB
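The walkthrough above is a straightforward fall-through cascade. The sketch below models each layer as a lookup function; all names, signatures, and return shapes are assumptions for illustration, and the warehouse step is collapsed into a single callback (which, per Layer 3, may itself be served from the warehouse's internal cache).

```typescript
// Minimal model of the three-layer lookup cascade: try IndexedDB, then
// S3, then fall through to executing the query. Names are illustrative.
type Lookup = (executionId: string) => string | undefined;

function resolveResult(
  executionId: string,
  browserCache: Lookup,  // Layer 1: IndexedDB
  s3Cache: Lookup,       // Layer 2: S3 result store
  runQuery: () => string // Layer 3 fall-through: warehouse executes
): { source: string; result: string } {
  const local = browserCache(executionId);
  if (local !== undefined) return { source: "indexeddb", result: local };
  const remote = s3Cache(executionId);
  if (remote !== undefined) return { source: "s3", result: remote };
  return { source: "warehouse", result: runQuery() };
}
```

A real implementation would also write each miss's result back into the faster layers, as the "Visit from another browser" scenario describes.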
Cache Invalidation
| Layer | Invalidation Mechanism |
|---|---|
| IndexedDB | TTL (30 days), LRU eviction, cleared on org switch/logout |
| S3 | Immutable — each execution is a new entry. Old entries auto-expire after 60 days |
| DWH | Automatic — warehouse invalidates when underlying data changes |
TIP
There is no manual cache purge button. To get fresh data, simply re-execute the query — this creates a new execution with a new cache entry at all layers.