
Caching System

GradientHarbor uses a three-tier caching architecture to ensure fast data retrieval without unnecessary recomputation or repeated database queries. Each layer serves a different purpose and operates at a different level of the stack.

Architecture Overview

┌──────────────────────────────────────────────────┐
│                   User Request                    │
└──────────────────────┬───────────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────────┐
│  Layer 1: Browser Cache (IndexedDB)              │
│  ─────────────────────────────────────           │
│  First-page results cached locally               │
│  TTL: 30 days │ Max: 500 entries │ LRU eviction  │
│  Hit? → Return instantly, no network request     │
└──────────────────────┬───────────────────────────┘
                       │ miss
                       ▼
┌──────────────────────────────────────────────────┐
│  Layer 2: S3 Result Cache (Backend)              │
│  ─────────────────────────────────────           │
│  Full query results stored as paged JSON         │
│  Immutable per execution │ 60-day lifecycle      │
│  Hit? → Fetch only needed pages from S3          │
└──────────────────────┬───────────────────────────┘
                       │ miss (new query)
                       ▼
┌──────────────────────────────────────────────────┐
│  Layer 3: Data Warehouse Cache (External)        │
│  ─────────────────────────────────────           │
│  Implicit caching at the warehouse level         │
│  Snowflake, BigQuery, PostgreSQL internal caches │
│  Hit? → DWH serves from its internal cache       │
└──────────────────────┬───────────────────────────┘
                       │ miss
                       ▼
                  Full Computation
              (warehouse scans data)

Layer 1: Browser Cache (IndexedDB)

The frontend caches the first page of query results in the browser's IndexedDB storage.

How it works:

  • When a query executes and returns results, the first page (up to 1,000 rows) is stored in IndexedDB
  • Subsequent views of the same query results are served instantly from the local cache
  • No network request is made if the cache entry is valid

Configuration:

Setting       Value
TTL           30 days
Max entries   500
Eviction      LRU (Least Recently Used); prunes to 400 entries when the limit is hit
Scope         Per-browser, per-execution ID
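
The eviction settings above can be illustrated with a small in-memory model. This is only a sketch, not the frontend's actual IndexedDB code: the class name `FirstPageCache` and its methods are hypothetical, and it assumes pruning is triggered once the entry count exceeds 500.

```python
import time
from collections import OrderedDict

TTL_SECONDS = 30 * 24 * 60 * 60  # 30-day TTL from the settings above
MAX_ENTRIES = 500                # entry limit from the settings above
PRUNE_TARGET = 400               # size after pruning, from the settings above


class FirstPageCache:
    """In-memory stand-in for the IndexedDB store (hypothetical class)."""

    def __init__(self, now=time.time):
        # Maps execution_id -> (stored_at, rows); oldest-used entries come first.
        self._entries = OrderedDict()
        self._now = now

    def put(self, execution_id, rows):
        self._entries[execution_id] = (self._now(), rows)
        self._entries.move_to_end(execution_id)
        # Assumption: pruning starts once the limit is exceeded.
        if len(self._entries) > MAX_ENTRIES:
            while len(self._entries) > PRUNE_TARGET:
                self._entries.popitem(last=False)  # drop least recently used

    def get(self, execution_id):
        entry = self._entries.get(execution_id)
        if entry is None:
            return None
        stored_at, rows = entry
        if self._now() - stored_at > TTL_SECONDS:
            del self._entries[execution_id]  # expired entry counts as a miss
            return None
        self._entries.move_to_end(execution_id)  # mark as recently used
        return rows

    def __len__(self):
        return len(self._entries)
```

Pruning down to 400 rather than 499 means eviction runs once per hundred or so inserts instead of on every insert after the limit is reached.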

What triggers cache reads:

  • Opening a dashboard with previously-loaded query cells
  • Revisiting a chat conversation with inline query results
  • Any query result view where offset = 0 (first page)

What bypasses the cache:

  • Pagination (offset > 0) — goes directly to S3
  • Re-executing a query (creates a new execution ID)
  • Switching organizations (cache is cleared)
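
The read and bypass rules above reduce to a small decision for any view of an existing execution's results. The sketch below is illustrative only; `Source` and `choose_source` are hypothetical names, and `has_valid_entry` stands for "an unexpired entry exists for this execution ID in this browser".

```python
from enum import Enum


class Source(Enum):
    BROWSER_CACHE = "browser_cache"
    S3 = "s3"


def choose_source(offset: int, has_valid_entry: bool) -> Source:
    """Pick where a view of an existing execution's results is served from."""
    # Only the first page (offset 0) is ever stored in the browser cache.
    if offset == 0 and has_valid_entry:
        return Source.BROWSER_CACHE
    # Pagination, expired entries, and cleared caches all fall through to S3.
    return Source.S3
```

Re-executing a query is not covered by this function: it produces a new execution ID, so no browser entry exists for it yet.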

Layer 2: S3 Result Cache (Backend)

After a query executes, the full result set is stored in Amazon S3 as paged JSON files.

Storage structure:

s3://{bucket}/query_execution_results/{org_id}/{execution_id}/
├── manifest.json          # Metadata: columns, row count, page count
└── pages/
    ├── 0.json             # Rows 0-999
    ├── 1.json             # Rows 1000-1999
    └── ...                # Up to 10,000 rows total

How it works:

  • Query results are uploaded immediately after execution completes
  • A manifest.json file stores metadata (columns, total rows, page size, execution time)
  • Results are split into pages of 1,000 rows each
  • When a client requests results, only the needed pages are fetched from S3
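
As a rough sketch of the layout described above, the following builds the object keys and JSON bodies for one execution without touching S3. The helper name `build_objects` and the manifest field names are assumptions, as is the choice to truncate anything beyond the 10,000-row cap.

```python
import json

PAGE_SIZE = 1_000   # rows per page, from the text above
MAX_ROWS = 10_000   # per-query cap, from the text above
PREFIX = "query_execution_results"


def build_objects(org_id, execution_id, columns, rows):
    """Return {object_key: json_body} for one execution (hypothetical helper)."""
    rows = rows[:MAX_ROWS]  # assumption: rows past the cap are dropped
    base = f"{PREFIX}/{org_id}/{execution_id}"
    pages = [rows[i:i + PAGE_SIZE] for i in range(0, len(rows), PAGE_SIZE)]
    objects = {
        f"{base}/pages/{n}.json": json.dumps(page)
        for n, page in enumerate(pages)
    }
    manifest = {  # field names are illustrative, not the real schema
        "columns": columns,
        "row_count": len(rows),
        "page_size": PAGE_SIZE,
        "page_count": len(pages),
    }
    objects[f"{base}/manifest.json"] = json.dumps(manifest)
    return objects
```

Because each execution ID gets its own prefix, nothing is ever overwritten, which is what makes the stored results immutable.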

Key properties:

Property        Value
Page size       1,000 rows
Max rows        10,000 per query
Retention       60 days (S3 lifecycle policy)
Storage class   Transitions to Intelligent-Tiering after 7 days
Mutability      Immutable; each execution produces a new entry

Why paged storage matters:

  • Dashboard pagination only fetches the pages needed for the current view
  • Initial page load is fast — only page 0 is fetched
  • Large result sets don't block the UI — pages load on demand
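
To make the on-demand behavior concrete, here is one way to work out which page files a given row range touches. The function name `pages_for_range` and its parameters are hypothetical; only the 1,000-row page size comes from this document.

```python
PAGE_SIZE = 1_000  # rows per page, from the table above


def pages_for_range(offset: int, limit: int, total_rows: int) -> list[int]:
    """Return the page indices covering rows [offset, offset + limit)."""
    if limit <= 0 or offset < 0 or offset >= total_rows:
        return []
    last_row = min(offset + limit, total_rows) - 1
    first_page = offset // PAGE_SIZE
    last_page = last_row // PAGE_SIZE
    return list(range(first_page, last_page + 1))
```

A view showing rows 0 to 49 needs only page 0, while a window that straddles row 1,000 needs pages 0 and 1.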

Layer 3: Data Warehouse Cache (External)

Most data warehouses maintain their own internal query result cache. GradientHarbor benefits from this implicitly.

Snowflake

Snowflake caches query results for 24 hours. If the same query is re-executed and the underlying data hasn't changed, Snowflake returns the cached result without consuming warehouse credits.

Benefits:

  • No compute cost for repeated queries
  • Sub-second response times for cached results
  • Automatic invalidation when data changes

BigQuery

BigQuery caches query results in temporary tables for approximately 24 hours. Cached results are free of charge — no bytes are scanned.

Benefits:

  • Zero cost for repeated identical queries
  • Enabled by default (no configuration needed)

PostgreSQL

PostgreSQL uses its buffer cache (shared_buffers) and OS page cache to keep frequently accessed data in memory. While not a query result cache, it significantly speeds up repeated queries against the same tables.

How the Layers Work Together

Consider a user opening a dashboard with a revenue chart:

  1. First visit: All three layers miss

    • DWH executes the query (Layer 3 — may use buffer cache)
    • Results are stored in S3 (Layer 2)
    • First page is cached in IndexedDB (Layer 1)
  2. Second visit (same browser): Layer 1 hit

    • IndexedDB returns the cached first page instantly
    • No network request needed
    • Dashboard appears immediately
  3. Visit from another browser: Layer 1 miss, Layer 2 hit

    • IndexedDB has no cache for this execution
    • S3 serves the stored results (only page 0 fetched)
    • Results are cached in this browser's IndexedDB
  4. Re-executing the query: New execution ID, so Layers 1 and 2 miss for the new results

    • A new execution creates a new S3 entry
    • DWH may serve from its own cache (Layer 3) if data hasn't changed
    • New results are cached in S3 and IndexedDB
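
The four scenarios above can be replayed with a toy model in which plain dictionaries stand in for IndexedDB and S3. Everything here (`Backend`, `run_query`, `view_first_page`, the `exec-N` ID format) is hypothetical and only mirrors the flow described in this section; the warehouse is simulated by passing the rows in directly.

```python
PAGE_SIZE = 1_000


class Backend:
    """Toy stand-in for the warehouse plus the S3 result cache."""

    def __init__(self):
        self.s3 = {}          # execution_id -> list of pages
        self.executions = 0   # number of warehouse executions so far

    def execute(self, rows):
        # Each execution gets a fresh ID, so stored results are never overwritten.
        self.executions += 1
        execution_id = f"exec-{self.executions}"
        pages = [rows[i:i + PAGE_SIZE] for i in range(0, len(rows), PAGE_SIZE)]
        self.s3[execution_id] = pages or [[]]  # keep an empty page 0 for empty results
        return execution_id


def run_query(browser_cache, backend, rows):
    """Execute, store all pages in 'S3', and cache page 0 in this browser."""
    execution_id = backend.execute(rows)
    browser_cache[execution_id] = backend.s3[execution_id][0]
    return execution_id


def view_first_page(browser_cache, backend, execution_id):
    """Return (rows, name of the layer that served them)."""
    if execution_id in browser_cache:
        return browser_cache[execution_id], "indexeddb"
    page = backend.s3[execution_id][0]   # Layer 1 miss: fetch only page 0
    browser_cache[execution_id] = page   # then cache it in this browser
    return page, "s3"
```

Usage, following the numbered steps: after `run_query` in browser A, `view_first_page` for A reports `"indexeddb"`; the first view from browser B reports `"s3"` and the second reports `"indexeddb"`; a second `run_query` returns a different execution ID and adds a second entry to `backend.s3`.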

Cache Invalidation

Layer       Invalidation Mechanism
IndexedDB   TTL (30 days), LRU eviction, cleared on org switch/logout
S3          Immutable: each execution is a new entry. Old entries auto-expire after 60 days
DWH         Automatic: the warehouse invalidates when underlying data changes

TIP

There is no manual cache purge button. To get fresh data, simply re-execute the query — this creates a new execution with a new cache entry at all layers.