Query Execution
GradientHarbor executes SQL queries directly against your database; it does not download or replicate your tables. Your data stays where it is, queries run on your warehouse's compute engine, and results reflect the current state of your data.
How It Works
┌──────────────┐     ┌──────────────────┐     ┌──────────────────┐
│  User Query  │     │  GradientHarbor  │     │  Your Database   │
│    (SQL)     │────▶│  (pass-through)  │────▶│  (executes SQL)  │
│              │     │                  │◀────│  (returns rows)  │
│              │◀────│  (format + cache)│     │                  │
└──────────────┘     └──────────────────┘     └──────────────────┘
When you or the AI agent runs a query:
- GradientHarbor connects to your database using stored credentials
- The SQL is sent as-is to your database's query engine
- Your database executes the query using its own optimizer, indexes, and compute
- Results are returned to GradientHarbor (up to 10,000 rows)
- Results are cached in S3 for fast subsequent access
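The flow above can be sketched roughly as follows. This is an illustration under stated assumptions, not GradientHarbor's actual implementation: sqlite3 stands in for your database, an in-memory dict stands in for the S3 cache, and the names run_query, MAX_ROWS, and _cache are hypothetical.

```python
import hashlib
import sqlite3

MAX_ROWS = 10_000  # row cap described in the steps above
_cache = {}        # hypothetical stand-in for the S3 result cache


def run_query(conn, sql):
    """Send SQL as-is to the database, cap the result, and cache it."""
    key = hashlib.sha256(sql.encode("utf-8")).hexdigest()
    if key in _cache:
        return _cache[key]             # cache hit: no database round trip
    cursor = conn.execute(sql)         # the database plans and runs the query
    columns = [d[0] for d in cursor.description]
    rows = cursor.fetchmany(MAX_ROWS)  # pull back at most MAX_ROWS rows
    result = {"columns": columns, "rows": rows}
    _cache[key] = result
    return result


# Demo: an in-memory SQLite database stands in for a warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, "click") for i in range(25_000)],
)

result = run_query(conn, "SELECT id, kind FROM events ORDER BY id")
print(len(result["rows"]))  # 10000
```

Keying the cache on a hash of the SQL text is only one possible design; the real cache key may also depend on the connection, parameters, or freshness rules.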
INFO
GradientHarbor never downloads full tables or creates replicas. All computation happens on your database; only query results (up to 10,000 rows per query) are returned and cached.
Execution Properties
| Property | Value |
|---|---|
| Max rows | 10,000 per query |
| Timeout | 2 minutes |
| Access mode | Read-only (default_transaction_read_only=on) |
| Result format | Paged JSON (1,000 rows per page) |
| Cancellation | Supported (PostgreSQL via pg_cancel_backend, BigQuery via job cancellation) |
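To make the paging property concrete, here is a minimal sketch of splitting a capped result set into JSON pages of 1,000 rows. The function name to_pages and the page shape (page, columns, rows keys) are assumptions for illustration; the actual response format is not specified here.

```python
import json

PAGE_SIZE = 1_000  # rows per page, from the table above


def to_pages(columns, rows, page_size=PAGE_SIZE):
    """Split a result set into JSON-encoded pages of at most page_size rows."""
    pages = []
    for start in range(0, len(rows), page_size):
        chunk = rows[start:start + page_size]
        pages.append(json.dumps({
            "page": start // page_size + 1,  # 1-based page number
            "columns": columns,
            "rows": chunk,
        }))
    return pages


rows = [[i, f"user_{i}"] for i in range(2_500)]
pages = to_pages(["id", "name"], rows)
print(len(pages))                          # 3
print(len(json.loads(pages[-1])["rows"]))  # 500
```

At the 10,000-row cap, a full result would span ten such pages.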
Storage Recommendations
Since GradientHarbor queries your database directly, query performance depends entirely on your database's capabilities. Choosing the right storage engine makes a significant difference.
Recommended: Columnar / Analytics Storage
For best performance with analytical queries (aggregations, filters on large tables, GROUP BY), use a columnar storage engine:
┌────────────────────────────────────────────────────┐
│                Recommended Engines                 │
├───────────────────┬────────────────────────────────┤
│ Snowflake         │ Cloud data warehouse, auto-    │
│                   │ scaling, columnar storage      │
├───────────────────┼────────────────────────────────┤
│ BigQuery          │ Serverless, pay-per-query,     │
│                   │ columnar storage               │
├───────────────────┼────────────────────────────────┤
│ ClickHouse        │ Open-source columnar OLAP,     │
│                   │ extremely fast aggregations    │
├───────────────────┼────────────────────────────────┤
│ Redshift          │ AWS columnar data warehouse,   │
│                   │ MPP architecture               │
├───────────────────┼────────────────────────────────┤
│ Databricks SQL    │ Lakehouse architecture,        │
│                   │ Delta Lake + Photon engine     │
└───────────────────┴────────────────────────────────┘
Why columnar storage matters:
- Analytical queries typically read a few columns across millions of rows
- Columnar engines only scan the columns referenced in your query
- Aggregations (SUM, COUNT, AVG) over large tables can be orders of magnitude faster than on row-based storage
- Compression is much more effective on columnar data
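The difference between the two layouts can be illustrated with a toy model. This sketch only counts how many values each layout has to read to sum one field; it is not how any of the engines above are implemented, and the data and function names are made up for the example.

```python
# Row layout: every record keeps all of its fields together.
rows = [
    {"order_id": 1, "customer": "a", "region": "eu", "amount": 10.0},
    {"order_id": 2, "customer": "b", "region": "us", "amount": 25.5},
    {"order_id": 3, "customer": "a", "region": "eu", "amount": 4.5},
]

# Column layout: each field becomes its own array.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_amount_row_store(rows):
    """Sum one field from a row layout; whole records are read to reach it."""
    total, values_read = 0.0, 0
    for row in rows:
        values_read += len(row)  # the full record is loaded
        total += row["amount"]
    return total, values_read


def sum_amount_column_store(columns):
    """Sum one field from a column layout; only that column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


print(sum_amount_row_store(rows))        # (40.0, 12)
print(sum_amount_column_store(columns))  # (40.0, 3)
```

Both layouts return the same total, but the row layout reads four values per record while the column layout reads one. The gap widens with wider tables and more rows.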
Supported: Transactional Databases
GradientHarbor also works with traditional row-based (OLTP) databases:
┌────────────────────────────────────────────────────┐
│                 Supported Engines                  │
├───────────────────┬────────────────────────────────┤
│ PostgreSQL        │ Excellent for small-medium     │
│                   │ datasets with proper indexing  │
├───────────────────┼────────────────────────────────┤
│ MySQL             │ Common OLTP database, works    │
│                   │ well for indexed queries       │
├───────────────────┼────────────────────────────────┤
│ MongoDB (SQL)     │ Document DB via SQL interface, │
│                   │ good for semi-structured data  │
└───────────────────┴────────────────────────────────┘
WARNING
Transactional databases are optimized for row-level operations (INSERT, UPDATE, DELETE), not full-table scans. For large datasets (10M+ rows), analytical queries may be slow without proper indexes. Consider using a columnar warehouse for heavy analytics workloads.
Optimization Tips
- Add indexes on columns used in WHERE and GROUP BY clauses
- Create materialized views for frequently run aggregation queries
- Partition large tables by date or another common filter dimension
- Use your warehouse's caching: Snowflake and BigQuery cache repeated query results automatically
- Set billing caps: configure maximumBytesBilled on BigQuery to prevent runaway costs
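As a small illustration of the indexing tip, the sketch below shows a query plan changing from a full scan to an index lookup once an index exists on the filtered column. SQLite is used here only because it ships with Python; it is a stand-in, and plan output on PostgreSQL or MySQL looks different. The table, index, and helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 1000, float(i)) for i in range(50_000)],
)

QUERY = "SELECT SUM(amount) FROM orders WHERE customer_id = 42"


def plan(conn, sql):
    """Return the planner's description of how it would run the query."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in plan_rows)  # last field is the detail text


before = plan(conn, QUERY)  # no index yet: the planner scans the table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, QUERY)   # the planner can now search via the index

print("before:", before)
print("after: ", after)
```

On your own database, the equivalent check is to run the engine's EXPLAIN command on a slow query before and after adding the index.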
Result Caching
Query results are cached at multiple levels to avoid recomputation. See Caching System for details on the three-tier caching architecture (warehouse → S3 → browser).
Column Type Detection
GradientHarbor normalizes database-specific column types into four universal types:
| Universal Type | Maps From |
|---|---|
| string | VARCHAR, TEXT, CHAR, UUID, JSON, etc. |
| number | INTEGER, BIGINT, FLOAT, DECIMAL, NUMERIC, etc. |
| date | DATE, TIMESTAMP, TIMESTAMPTZ, DATETIME, etc. |
| boolean | BOOLEAN, BOOL |
This normalization ensures consistent formatting and chart rendering regardless of which database you connect.
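A minimal sketch of this kind of normalization is shown below. The mapping covers only the types listed in the table; the function name normalize_type, the handling of length and precision arguments, and the fallback to string for unlisted types are assumptions, not documented GradientHarbor behavior.

```python
import re

# Hypothetical lookup built from the table above; the "etc." entries are not covered.
_TYPE_MAP = {
    "string": {"VARCHAR", "TEXT", "CHAR", "UUID", "JSON"},
    "number": {"INTEGER", "BIGINT", "FLOAT", "DECIMAL", "NUMERIC"},
    "date": {"DATE", "TIMESTAMP", "TIMESTAMPTZ", "DATETIME"},
    "boolean": {"BOOLEAN", "BOOL"},
}


def normalize_type(db_type):
    """Map a database column type such as 'VARCHAR(255)' to a universal type."""
    # Drop arguments like "(255)" or "(10, 2)" and normalize case.
    base = re.sub(r"\(.*\)", "", db_type).strip().upper()
    for universal, names in _TYPE_MAP.items():
        if base in names:
            return universal
    return "string"  # assumed fallback for unlisted types; not stated in the source


print(normalize_type("varchar(255)"))    # string
print(normalize_type("NUMERIC(10, 2)"))  # number
print(normalize_type("timestamptz"))     # date
print(normalize_type("bool"))            # boolean
```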