Query Execution

GradientHarbor executes SQL queries directly against your database; it does not download or replicate your tables. Your data stays where it is, queries run on your warehouse's compute engine, and results reflect the data as it is when the query runs.

How It Works

┌──────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  User Query  │     │  GradientHarbor  │     │  Your Database   │
│  (SQL)       │────▶│  (pass-through)  │────▶│  (executes SQL)  │
│              │     │                  │◀────│  (returns rows)  │
│              │◀────│  (format + cache)│     │                  │
└──────────────┘     └──────────────────┘     └──────────────────┘

When you or the AI agent runs a query:

1. GradientHarbor connects to your database using stored credentials.
2. The SQL is sent as-is to your database's query engine.
3. Your database executes the query using its own optimizer, indexes, and compute.
4. Results are returned to GradientHarbor (up to 10,000 rows).
5. Results are cached in S3 for fast subsequent access.
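
The steps above can be sketched roughly as follows. This is a minimal illustration, not GradientHarbor's implementation: SQLite stands in for your database, an in-memory dict stands in for the S3 cache, and `run_query`, `MAX_ROWS`, `result_cache`, and the SHA-256 cache key are hypothetical names and choices made for this example.

```python
import hashlib
import sqlite3

MAX_ROWS = 10_000  # row cap stated in this document

# Stand-in for the S3 result cache (assumption: keyed by a hash of the SQL text).
result_cache: dict[str, list[tuple]] = {}


def run_query(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Send SQL unchanged to the database and return at most MAX_ROWS rows."""
    key = hashlib.sha256(sql.encode("utf-8")).hexdigest()
    if key in result_cache:
        return result_cache[key]       # served from cache, no database work

    cursor = conn.execute(sql)         # the database plans and executes the query
    rows = cursor.fetchmany(MAX_ROWS)  # only the capped result set is transferred
    result_cache[key] = rows
    return rows


# Demo with an in-memory SQLite database standing in for a warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, "click" if i % 2 else "view") for i in range(25_000)],
)

all_rows = run_query(conn, "SELECT id, kind FROM events")
summary = run_query(
    conn, "SELECT kind, COUNT(*) FROM events GROUP BY kind ORDER BY kind"
)
```

In this sketch the unfiltered query is truncated at the row cap, while the aggregate query returns one row per group regardless of table size, which is why pushing the computation to the database keeps transfers small.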

INFO

GradientHarbor never downloads full tables, creates replicas, or moves data out of your warehouse. All computation happens on your database.

Execution Properties

Property	Value
Max rows	10,000 per query
Timeout	2 minutes
Access mode	Read-only (`default_transaction_read_only=on`)
Result format	Paged JSON (1,000 rows per page)
Cancellation	Supported (PostgreSQL via `pg_cancel_backend`, BigQuery via job cancellation)
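
To make these limits concrete, here is a small sketch of how they could be expressed in code. The constants come from the table above; everything else is an assumption for illustration: the function names, the shape of the JSON page, and the use of PostgreSQL's `statement_timeout` setting to enforce the 2-minute limit are not confirmed by this document.

```python
import json

MAX_ROWS = 10_000      # "Max rows" from the table above
PAGE_SIZE = 1_000      # "Result format": rows per JSON page
TIMEOUT_SECONDS = 120  # "Timeout": 2 minutes


def postgres_session_options() -> str:
    """Build a libpq `options` string for a read-only, time-limited session.

    default_transaction_read_only comes from the table above; pairing it with
    statement_timeout (in milliseconds) is an assumption of this sketch.
    """
    return (
        "-c default_transaction_read_only=on "
        f"-c statement_timeout={TIMEOUT_SECONDS * 1000}"
    )


def to_pages(columns: list[str], rows: list[tuple]) -> list[str]:
    """Cap rows at MAX_ROWS and serialize them as JSON pages of PAGE_SIZE rows."""
    capped = rows[:MAX_ROWS]
    total_pages = max(1, -(-len(capped) // PAGE_SIZE))  # ceiling division
    pages = []
    for page in range(total_pages):
        chunk = capped[page * PAGE_SIZE:(page + 1) * PAGE_SIZE]
        pages.append(json.dumps({
            "page": page + 1,            # hypothetical page envelope
            "total_pages": total_pages,
            "columns": columns,
            "rows": [list(r) for r in chunk],
        }))
    return pages


# A 12,345-row result is capped at 10,000 rows and split into 1,000-row pages.
pages = to_pages(["id"], [(i,) for i in range(12_345)])
```

With these numbers, a result that hits the row cap always fits in ten pages, so a client can fetch the first page immediately and request the rest on demand.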

Storage Recommendations

Since GradientHarbor queries your database directly, query performance depends entirely on your database's capabilities. Choosing the right storage engine makes a significant difference.

Recommended: Columnar / Analytics Storage

For best performance with analytical queries (aggregations, filters on large tables, GROUP BY), use a columnar storage engine:

┌────────────────────────────────────────────────────┐
│                Recommended Engines                 │
├───────────────────┬────────────────────────────────┤
│ Snowflake         │ Cloud data warehouse, auto-    │
│                   │ scaling, columnar storage      │
├───────────────────┼────────────────────────────────┤
│ BigQuery          │ Serverless, pay-per-query,     │
│                   │ columnar storage               │
├───────────────────┼────────────────────────────────┤
│ ClickHouse        │ Open-source columnar OLAP,     │
│                   │ extremely fast aggregations    │
├───────────────────┼────────────────────────────────┤
│ Redshift          │ AWS columnar data warehouse,   │
│                   │ MPP architecture               │
├───────────────────┼────────────────────────────────┤
│ Databricks SQL    │ Lakehouse architecture,        │
│                   │ Delta Lake + Photon engine     │
└───────────────────┴────────────────────────────────┘

Why columnar storage matters:

- Analytical queries typically read a few columns across millions of rows.
- Columnar engines scan only the columns referenced in your query.
- Aggregations (SUM, COUNT, AVG) can be orders of magnitude faster than on row-based storage.
- Compression is much more effective because the values in a column share a type and are often similar.
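
The toy example below illustrates the scanning difference. It is not a benchmark and does not model any real engine: the table, the two in-memory layouts, and the "values read" counter are all made up for illustration.

```python
# The same three-column table in two layouts.
row_store = [
    {"order_id": i, "region": "EU" if i % 2 else "US", "amount": i * 1.5}
    for i in range(1_000)
]
column_store = {
    "order_id": [r["order_id"] for r in row_store],
    "region": [r["region"] for r in row_store],
    "amount": [r["amount"] for r in row_store],
}


def sum_amount_row_store(rows):
    """SUM(amount) over a row layout: every row, with all its fields, is visited."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # a stored row carries all of its columns
        total += row["amount"]
    return total, values_read


def sum_amount_column_store(columns):
    """SUM(amount) over a column layout: only the referenced column is scanned."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


row_total, row_values = sum_amount_row_store(row_store)
col_total, col_values = sum_amount_column_store(column_store)
```

Both functions return the same sum, but the row layout touches three values per row while the column layout touches one. The gap widens as tables gain more columns.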

Supported: Transactional Databases

GradientHarbor also works with traditional row-based (OLTP) databases:

┌────────────────────────────────────────────────────┐
│                 Supported Engines                  │
├───────────────────┬────────────────────────────────┤
│ PostgreSQL        │ Excellent for small-medium     │
│                   │ datasets with proper indexing  │
├───────────────────┼────────────────────────────────┤
│ MySQL             │ Common OLTP database, works    │
│                   │ well for indexed queries       │
├───────────────────┼────────────────────────────────┤
│ MongoDB (SQL)     │ Document DB via SQL interface, │
│                   │ good for semi-structured data  │
└───────────────────┴────────────────────────────────┘

WARNING

Transactional databases are optimized for row-level operations (INSERT, UPDATE, DELETE), not full-table scans. For large datasets (10M+ rows), analytical queries may be slow without proper indexes. Consider using a columnar warehouse for heavy analytics workloads.

Optimization Tips

- Add indexes on columns used in WHERE and GROUP BY clauses.
- Create materialized views for frequently run aggregation queries.
- Partition large tables by date or another common filter dimension.
- Use your warehouse's caching: Snowflake and BigQuery cache repeated query results automatically.
- Set billing caps: configure maximumBytesBilled on BigQuery to prevent runaway costs.
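
As an illustration of the first tip, the sketch below checks whether a filter can use an index by comparing query plans before and after the index is created. SQLite is used only because it ships with Python; it is not one of the engines listed above, and the `orders` table, the index name, and the `plan` helper are hypothetical. On PostgreSQL or MySQL the equivalent check would use that engine's own `EXPLAIN` output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 500, float(i)) for i in range(5_000)],
)

QUERY = "SELECT SUM(total) FROM orders WHERE customer_id = 42"


def plan(sql: str) -> str:
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # fourth column holds the detail text


plan_before = plan(QUERY)  # without an index the planner has to scan the table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(QUERY)   # with the index the planner can search by customer_id

print(plan_before)
print(plan_after)
```

Running the same comparison against your own database before and after adding an index is a quick way to confirm that a slow query actually benefits from it.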

Result Caching

Query results are cached at multiple levels to avoid recomputation. See Caching System for details on the three-tier caching architecture (warehouse → S3 → browser).

Column Type Detection

GradientHarbor normalizes database-specific column types into four universal types:

Universal Type	Maps From
string	VARCHAR, TEXT, CHAR, UUID, JSON, etc.
number	INTEGER, BIGINT, FLOAT, DECIMAL, NUMERIC, etc.
date	DATE, TIMESTAMP, TIMESTAMPTZ, DATETIME, etc.
boolean	BOOLEAN, BOOL

This normalization ensures consistent formatting and chart rendering regardless of which database you connect.
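
A mapping like this could be sketched as a simple lookup. The code below is an assumption-laden illustration, not GradientHarbor's implementation: it covers only the types named in the table, and `normalize_type`, the stripping of length or precision suffixes, and the fallback to `string` for unlisted types are hypothetical choices.

```python
import re

# Lookup built only from the types named in the table above.
UNIVERSAL_TYPES = {
    "string": {"VARCHAR", "TEXT", "CHAR", "UUID", "JSON"},
    "number": {"INTEGER", "BIGINT", "FLOAT", "DECIMAL", "NUMERIC"},
    "date": {"DATE", "TIMESTAMP", "TIMESTAMPTZ", "DATETIME"},
    "boolean": {"BOOLEAN", "BOOL"},
}
_LOOKUP = {db: uni for uni, dbs in UNIVERSAL_TYPES.items() for db in dbs}


def normalize_type(db_type: str, default: str = "string") -> str:
    """Map a database column type such as 'NUMERIC(10,2)' to a universal type.

    Falling back to `default` for unlisted types is an assumption of this sketch.
    """
    # Drop a trailing (length) or (precision, scale) before the lookup.
    base = re.sub(r"\(.*\)", "", db_type).strip().upper()
    return _LOOKUP.get(base, default)


normalized = [
    normalize_type(t)
    for t in ("varchar(255)", "NUMERIC(10,2)", "timestamptz", "BOOL")
]
```

Keeping the mapping in one table makes it straightforward to add engine-specific types later without touching the formatting or charting code that consumes the universal types.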
