Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/spiceai/spiceai/llms.txt

Use this file to discover all available pages before exploring further.

The PostgreSQL accelerator provides full-featured relational database acceleration by connecting to an external PostgreSQL instance. It’s optimized for OLTP workloads requiring advanced database features, multi-dataset federation, and high concurrency.

When to Use PostgreSQL

  • Advanced features: Need triggers, stored procedures, extensions
  • Multi-dataset federation: Query across multiple accelerated tables
  • High concurrency: Multiple concurrent writers
  • Existing infrastructure: Already running PostgreSQL
  • Large datasets: Datasets > 100GB with proper indexing

Configuration

Basic Setup

datasets:
  - name: orders
    from: s3://data-lake/orders/
    acceleration:
      enabled: true
      engine: postgres
      params:
        pg_host: localhost
        pg_port: 5432
        pg_db: spice_acceleration
        pg_user: spice
        pg_pass: ${secrets:pg_password}

With SSL/TLS

params:
  pg_host: db.example.com
  pg_port: 5432
  pg_db: spice
  pg_user: spice
  pg_pass: ${secrets:pg_password}
  pg_sslmode: require
  pg_sslrootcert: /path/to/ca-cert.pem

SSL Modes

Configure SSL connection security:
ModeDescription
disableNo SSL (not recommended for production)
preferUse SSL if available, fallback to non-SSL
requireRequire SSL, but don’t verify CA certificate
verify-caRequire SSL and verify CA certificate
verify-fullRequire SSL and verify hostname matches cert
params:
  pg_sslmode: verify-full
  pg_sslrootcert: /etc/ssl/certs/ca-cert.pem

Connection Pooling

Pool Configuration

params:
  pg_connection_pool_size: 20      # Max connections (default: 10)
  pg_connection_pool_min: 5        # Min idle connections (default: 5)
The connection pool:
  • Reuses connections for better performance
  • Limits concurrent connections to PostgreSQL
  • Lazily creates connections up to min idle

Pool Size Guidelines

  • Small deployments: 10-20 connections
  • Medium deployments: 20-50 connections
  • Large deployments: 50-100 connections
  • Ensure PostgreSQL max_connections > pool size

Refresh Modes

Full Refresh

Replaces all data:
acceleration:
  enabled: true
  engine: postgres
  refresh_mode: full
  refresh_interval: 1h

Append Mode

Appends only new data:
acceleration:
  enabled: true
  engine: postgres
  refresh_mode: append
  refresh_interval: 5m
  time_column: created_at

Primary Keys and Constraints

Primary Key

acceleration:
  enabled: true
  engine: postgres
  primary_key: id

Composite Primary Key

primary_key:
  - customer_id
  - order_id

On Conflict Behavior

acceleration:
  enabled: true
  engine: postgres
  primary_key: id
  on_conflict: upsert
Uses PostgreSQL’s ON CONFLICT DO UPDATE for upserts.

Indexing

Create indexes in PostgreSQL for better query performance:
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_created_at ON orders(created_at);
Spice respects existing PostgreSQL indexes.

Federation

Query across multiple PostgreSQL-accelerated datasets:
datasets:
  - name: orders
    acceleration:
      engine: postgres
      params:
        pg_host: localhost
        pg_db: spice
  
  - name: customers
    acceleration:
      engine: postgres
      params:
        pg_host: localhost
        pg_db: spice
Federated query:
SELECT 
  o.order_id,
  c.customer_name,
  o.total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at > NOW() - INTERVAL '1 day';

Performance Characteristics

Query Performance

OperationPerformanceNotes
Point queriesExcellentWith proper indexes
InsertsExcellentBatch inserts recommended
UpdatesExcellentIn-place updates
DeletesExcellentFast with indexes
Full table scanGoodBetter with parallel query
AggregationsGoodOptimized for OLTP
Complex joinsExcellentAdvanced query planner

Network Latency

PostgreSQL is an external service. Minimize latency:
  • Co-locate Spice and PostgreSQL in same datacenter
  • Use connection pooling to reduce connection overhead
  • Consider local accelerators (DuckDB, SQLite) if network is slow

ACID Transactions

PostgreSQL provides full ACID guarantees:
BEGIN;
  INSERT INTO orders (id, customer_id, amount) VALUES (1, 100, 50.00);
  UPDATE customers SET total_spent = total_spent + 50.00 WHERE id = 100;
COMMIT;

Advanced Features

Triggers

Create PostgreSQL triggers for automation:
CREATE TRIGGER update_modified_timestamp
BEFORE UPDATE ON orders
FOR EACH ROW
EXECUTE FUNCTION update_modified_column();

Extensions

Use PostgreSQL extensions:
CREATE EXTENSION IF NOT EXISTS pg_trgm;  -- Trigram matching
CREATE INDEX idx_name_trgm ON customers USING gin(name gin_trgm_ops);

Partitioning

Leverage PostgreSQL native partitioning:
CREATE TABLE orders (
    order_id SERIAL,
    customer_id INTEGER,
    order_date DATE,
    amount NUMERIC
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_2024_q1 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

Example Configurations

High-Concurrency Transactional

datasets:
  - name: orders
    from: kafka://localhost:9092/orders
    acceleration:
      enabled: true
      engine: postgres
      params:
        pg_host: localhost
        pg_port: 5432
        pg_db: spice
        pg_user: spice
        pg_pass: ${secrets:pg_password}
        pg_connection_pool_size: 50
        pg_connection_pool_min: 10
      primary_key: order_id
      on_conflict: upsert
      refresh_interval: 30s

Secure Remote Connection

datasets:
  - name: customers
    from: s3://data-lake/customers/
    acceleration:
      enabled: true
      engine: postgres
      params:
        pg_host: db.production.example.com
        pg_port: 5432
        pg_db: spice_prod
        pg_user: spice_app
        pg_pass: ${secrets:prod_pg_password}
        pg_sslmode: verify-full
        pg_sslrootcert: /etc/ssl/certs/production-ca.pem
      refresh_interval: 5m

Multi-Dataset Federation

datasets:
  - name: orders
    acceleration:
      engine: postgres
      params:
        pg_host: postgres-1.internal
        pg_db: spice
  
  - name: customers
    acceleration:
      engine: postgres
      params:
        pg_host: postgres-1.internal
        pg_db: spice
  
  - name: products
    acceleration:
      engine: postgres
      params:
        pg_host: postgres-1.internal
        pg_db: spice

Monitoring

Connection Pool Metrics

SELECT * FROM runtime.metrics 
WHERE name LIKE 'postgres_pool_%';

Dataset Metrics

SELECT 
  dataset_name,
  value as row_count
FROM runtime.metrics
WHERE name = 'acceleration_rows';

Parameters

ParameterTypeDescriptionDefault
pg_hoststringPostgreSQL server hostname-
pg_portintegerPostgreSQL server port5432
pg_dbstringDatabase name-
pg_userstringUsername (use secrets)-
pg_passstringPassword (use secrets)-
pg_sslmodestringSSL mode (disable, require, etc.)prefer
pg_sslrootcertstringPath to CA certificate-
pg_connection_pool_sizeintegerMaximum pool connections10
pg_connection_pool_minintegerMinimum idle connections5

Secrets Management

Always use secrets for credentials:
params:
  pg_user: spice_app
  pg_pass: ${secrets:postgres_password}
Configure secrets in spicepod.yml:
secrets:
  - from: env
    name: postgres_password

Limitations

  • Requires external PostgreSQL instance
  • Network latency affects performance
  • Connection pool size limited by PostgreSQL max_connections
  • Not ideal for pure analytical workloads (consider DuckDB)

When to Use DuckDB Instead

Consider DuckDB if:
  • Analytical queries with heavy aggregations
  • No need for external database
  • Want embedded solution
  • Don’t need multi-user concurrent writes

When to Use SQLite Instead

Consider SQLite if:
  • Lightweight embedded database sufficient
  • Single-file storage preferred
  • Dataset < 10GB
  • No need for advanced PostgreSQL features

Next Steps