What is Data Acceleration?
Data acceleration materializes data from remote sources into local storage for low-latency queries. Think of it as an active cache, or a CDN for databases, that prefetches and stores working sets of data close to your application.
Key Difference from Caching:
- Traditional Cache: Fetches data on a cache miss (reactive)
- Spice Acceleration: Prefetches and materializes data on a schedule, on a trigger, or from change data capture (CDC) (proactive)
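The reactive/proactive difference can be modeled in a few lines of plain Python. This is a conceptual sketch, not Spice code: CountingSource, ReactiveCache, and ProactiveAcceleration are hypothetical stand-ins, and the source is a dict that counts how often it is read.

```python
# Conceptual model only: contrasts a reactive cache with proactive acceleration.
# The "source" is a plain dict standing in for a remote database (hypothetical).

class CountingSource:
    """Stand-in for a remote source that counts how often it is read."""
    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def scan(self):
        self.reads += 1
        return dict(self.rows)


class ReactiveCache:
    """Fetches from the source only on a cache miss."""
    def __init__(self, source):
        self.source = source
        self.entries = {}

    def get(self, key):
        if key not in self.entries:          # miss: go to the source now
            self.entries[key] = self.source.get(key)
        return self.entries[key]


class ProactiveAcceleration:
    """Materializes the working set ahead of time; queries never hit the source."""
    def __init__(self, source):
        self.source = source
        self.local = {}

    def refresh(self):                       # runs on a schedule/trigger, not per query
        self.local = self.source.scan()

    def get(self, key):
        return self.local.get(key)


src_a = CountingSource({1: "a", 2: "b", 3: "c"})
cache = ReactiveCache(src_a)
for k in (1, 2, 3, 1):
    cache.get(k)

src_b = CountingSource({1: "a", 2: "b", 3: "c"})
accel = ProactiveAcceleration(src_b)
accel.refresh()
for k in (1, 2, 3, 1):
    accel.get(k)

print(src_a.reads, src_b.reads)  # 3 1
```

The cache pays one source round trip per distinct key at query time, while the accelerated copy pays a single bulk read before any query arrives.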
Dual-Engine Acceleration
Spice supports both OLAP (analytical) and OLTP (transactional) acceleration engines at the dataset level:
| Engine | Mode | Type | Best For |
|---|---|---|---|
| arrow | memory | OLAP | Fast analytical queries, large scans |
| duckdb | memory, file | OLAP | Complex analytics, aggregations, joins |
| cayenne | file | OLAP | High compression, S3-backed, multi-file scale |
| sqlite | memory, file | OLTP | Point queries, indexes, ACID transactions |
| postgres | N/A | OLTP | Shared acceleration, remote OLTP access |
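For a quick programmatic view, the table can be transcribed into a dictionary and filtered by workload type and mode. This is an illustrative helper (ENGINES and candidate_engines are hypothetical names), not part of Spice; it only restates the table.

```python
# The engine table, encoded as data (transcribed from the table; not a Spice API).
ENGINES = {
    "arrow":    {"modes": {"memory"},         "type": "OLAP"},
    "duckdb":   {"modes": {"memory", "file"}, "type": "OLAP"},
    "cayenne":  {"modes": {"file"},           "type": "OLAP"},
    "sqlite":   {"modes": {"memory", "file"}, "type": "OLTP"},
    "postgres": {"modes": set(),              "type": "OLTP"},  # mode is N/A (external database)
}

def candidate_engines(workload_type, mode=None):
    """Return engines of the given type that support `mode` (None means any mode)."""
    return sorted(
        name for name, info in ENGINES.items()
        if info["type"] == workload_type and (mode is None or mode in info["modes"])
    )

print(candidate_engines("OLAP", "file"))    # ['cayenne', 'duckdb']
print(candidate_engines("OLTP", "memory"))  # ['sqlite']
```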
OLAP vs OLTP Engines
OLAP (Analytical) - Optimized for:
- Large table scans
- Aggregations (SUM, AVG, GROUP BY)
- Complex joins
- Columnar storage
- Compression
OLTP (Transactional) - Optimized for:
- Point queries (single-row lookups)
- Index-based retrieval
- ACID transactions
- Row-based storage
- Concurrent updates
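The storage-layout difference behind these two lists can be shown with plain Python data structures. This is a toy model under the assumption that a dict keyed by id approximates a row store and a dict of lists approximates a column store; real engines add compression, indexes, and vectorized execution.

```python
# Conceptual sketch: the same three rows stored row-wise (OLTP-style) and column-wise (OLAP-style).
rows = [
    {"id": 1, "region": "us", "amount": 10.0},
    {"id": 2, "region": "eu", "amount": 25.0},
    {"id": 3, "region": "us", "amount": 5.0},
]

# Row-based layout: each record is stored together, so a point lookup touches one record.
row_store = {r["id"]: r for r in rows}

# Columnar layout: each column is stored contiguously, so an aggregate reads only the columns it needs.
column_store = {col: [r[col] for r in rows] for col in ("id", "region", "amount")}

def point_lookup(store, key):
    return store[key]                       # one record, all of its columns

def total_by_region(columns, region):
    # Reads just two columns ("region" and "amount"); "id" is never touched.
    return sum(a for g, a in zip(columns["region"], columns["amount"]) if g == region)

print(point_lookup(row_store, 2))           # {'id': 2, 'region': 'eu', 'amount': 25.0}
print(total_by_region(column_store, "us"))  # 15.0
```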
Acceleration Engines
Arrow (In-Memory)
Best for: Fast analytical queries on datasets that fit in memory
- Zero-copy reads
- Columnar format
- SIMD-optimized compute
- No persistence (ephemeral)
DuckDB (OLAP)
Best for: Complex analytical queries with aggregations and joins
Modes:
- memory: In-memory database (fast, ephemeral)
- file: Persistent to disk (survives restarts)
Features:
- Excellent for GROUP BY, JOIN, aggregations
- Compressed columnar storage
- ACID transactions
- Can query Parquet directly
Cayenne (Vortex + SQLite)
Best for: High compression, multi-file scaling, S3-backed acceleration
Features:
- DuckDB-comparable performance
- No single-file size limits
- Compressed columnar format (Vortex)
- Primary key support for upserts/deletes
- CRUD operations
Limitations:
- File mode only (no memory mode)
- No secondary indexes
- No snapshot support
- Some Arrow types unsupported (Interval, Duration, Map)
SQLite (OLTP)
Best for: Point queries, indexed lookups, small-to-medium datasets
Modes:
- memory: In-memory (fast, ephemeral)
- file: Persistent to disk
Features:
- Fast indexed queries
- ACID transactions
- Row-based storage
- Supports indexes and unique constraints
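The unique-constraint and transaction behavior listed above is standard SQLite behavior, which Python's built-in sqlite3 module can demonstrate. This sketch uses a plain SQLite database, not a Spice acceleration; the users table is hypothetical.

```python
import sqlite3

# Illustrative only: plain sqlite3 from Python's standard library, not Spice's SQLite accelerator.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)")
conn.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")
conn.commit()

# A unique-constraint violation aborts the whole transaction (atomicity).
try:
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO users (id, email) VALUES (2, 'b@example.com')")
        conn.execute("INSERT INTO users (id, email) VALUES (3, 'a@example.com')")  # duplicate email
except sqlite3.IntegrityError as exc:
    print("rolled back:", exc)

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 1: the row with id 2 was rolled back as well
```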
PostgreSQL (OLTP)
Best for: Shared acceleration across multiple Spice instances, remote OLTP
- Shared across Spice instances
- Full PostgreSQL feature set
- Adds network latency compared with embedded engines
- Concurrent writes from multiple runtimes
Acceleration Modes
Memory Mode
Data stored in RAM (ephemeral):
- Fastest query performance
- Zero disk I/O
- Lost on restart
- Limited by available RAM
- Cold start requires full refresh
File Mode
Data persisted to disk:
- Survives restarts
- Larger datasets (disk-bound)
- Fast cold starts (no initial load)
- Slower than memory (disk I/O)
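The restart behavior of the two modes can be reproduced with Python's sqlite3 module by closing and reopening a connection. This is a stand-in experiment: closing the connection plays the role of a runtime restart, and accel.db is a hypothetical temporary file, not a path Spice uses.

```python
import os
import sqlite3
import tempfile

# Illustrative only: SQLite stands in for any engine that supports both modes.
def write_then_reopen(path):
    """Create a table, insert a row, close the connection, then count rows after reopening."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (42)")
    conn.commit()
    conn.close()                                               # simulates a runtime restart

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")   # a new in-memory DB starts empty
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return count

with tempfile.TemporaryDirectory() as tmp:
    file_rows = write_then_reopen(os.path.join(tmp, "accel.db"))
memory_rows = write_then_reopen(":memory:")

print(file_rows, memory_rows)  # 1 0
```

The empty in-memory result is why memory mode needs a full refresh on every cold start.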
File Create Mode
Always start fresh (truncate on startup).
Refresh Strategies
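The strategies that follow differ in what each refresh does to the local copy. The sketch below models three of them in plain Python, assuming rows are dicts with a hypothetical id key and ts time column, and assuming append detects new rows by comparing that time column. The function names are illustrative; this is not how Spice implements refresh.

```python
# Conceptual model of three refresh strategies; not Spice's implementation.

def full_refresh(local, source_rows):
    """Replace the entire local copy with the current source contents."""
    return list(source_rows)

def append_refresh(local, source_rows, time_column="ts"):
    """Add only rows newer than the newest row already materialized."""
    high_water = max((r[time_column] for r in local), default=None)
    new_rows = [r for r in source_rows if high_water is None or r[time_column] > high_water]
    return local + new_rows

def changes_refresh(local, change_events, key="id"):
    """Apply a stream of CDC events: ("upsert", row) or ("delete", key_value)."""
    by_key = {r[key]: r for r in local}
    for op, payload in change_events:
        if op == "upsert":
            by_key[payload[key]] = payload
        elif op == "delete":
            by_key.pop(payload, None)
    return list(by_key.values())

source = [{"id": 1, "ts": 100}, {"id": 2, "ts": 200}]
local = full_refresh([], source)           # both rows copied
source.append({"id": 3, "ts": 300})
local = append_refresh(local, source)      # only id 3 is added
local = changes_refresh(local, [("delete", 1), ("upsert", {"id": 2, "ts": 250})])
print(sorted(r["id"] for r in local))      # [2, 3]
```

Full refresh re-reads everything, which suits small datasets; append and changes only move the delta, which is why they scale to time-series and frequently updated tables.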
Full Refresh
Replace the entire dataset.
Append Refresh
Add only new data.
Changes Refresh (CDC)
Stream incremental changes.
Caching Refresh
Query-driven caching.
Refresh Scheduling
Interval-Based
Intervals use the units s (seconds), m (minutes), h (hours), and d (days).
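A minimal sketch of how such interval strings map to durations, assuming the form is a whole number followed by one unit letter (for example 10s or 1h). parse_interval is a hypothetical helper; Spice's own parser may accept more formats.

```python
import re
from datetime import timedelta

# Hypothetical helper: converts an interval string using the units above into a timedelta.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

def parse_interval(text):
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})

for raw in ("30s", "5m", "1h", "1d"):
    print(raw, parse_interval(raw).total_seconds())
# 30s 30.0
# 5m 300.0
# 1h 3600.0
# 1d 86400.0
```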
Cron-Based
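Cron scheduling runs a refresh whenever the clock matches a five-field expression (minute, hour, day of month, month, day of week). The matcher below is a deliberately simplified sketch, not Spice's scheduler: it assumes only *, */n, and single numbers per field, and it ignores ranges, lists, names, and the standard rule that day-of-month and day-of-week are OR-ed when both are restricted.

```python
from datetime import datetime

# Field order and the smallest legal value of each field (used for */n steps).
_FIELDS = (("minute", 0), ("hour", 0), ("day", 1), ("month", 1), ("dow", 0))

def _field_matches(field, value, minimum):
    """Match one cron field; only '*', '*/n', and a single integer are supported."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return (value - minimum) % int(field[2:]) == 0
    return value == int(field)

def cron_matches(expr, when):
    """True if datetime `when` matches a simplified five-field cron expression."""
    parts = expr.split()
    if len(parts) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    values = {
        "minute": when.minute,
        "hour": when.hour,
        "day": when.day,
        "month": when.month,
        "dow": (when.weekday() + 1) % 7,   # cron counts Sunday as 0; Python counts Monday as 0
    }
    # Simplification: every field must match (real cron ORs day-of-month and day-of-week
    # when both are restricted).
    return all(
        _field_matches(part, values[name], minimum)
        for part, (name, minimum) in zip(parts, _FIELDS)
    )

print(cron_matches("0 */6 * * *", datetime(2024, 1, 1, 6, 0)))   # True
print(cron_matches("0 */6 * * *", datetime(2024, 1, 1, 7, 0)))   # False
print(cron_matches("30 2 * * 1", datetime(2024, 1, 1, 2, 30)))   # True (2024-01-01 is a Monday)
```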
Custom SQL Refresh
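A custom refresh SQL statement narrows what each refresh pulls from the source, for example only recent or only open rows. The sketch below imitates that with two sqlite3 in-memory databases, one standing in for the source and one for the acceleration. The orders table, its columns, the query, and the refresh helper are hypothetical; the SQL Spice accepts for refresh queries is not shown here.

```python
import sqlite3

source = sqlite3.connect(":memory:")   # stands in for the remote source
source.execute("CREATE TABLE orders (id INTEGER, status TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "open", 10.0), (2, "closed", 20.0), (3, "open", 30.0)],
)

accel = sqlite3.connect(":memory:")    # stands in for the local acceleration
accel.execute("CREATE TABLE orders (id INTEGER, status TEXT, amount REAL)")

# Hypothetical custom refresh SQL: materialize only open orders.
refresh_sql = "SELECT id, status, amount FROM orders WHERE status = 'open'"

def refresh(accel_conn, source_conn, sql):
    rows = source_conn.execute(sql).fetchall()
    with accel_conn:                   # replace the local contents in one transaction
        accel_conn.execute("DELETE FROM orders")
        accel_conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

refresh(accel, source, refresh_sql)
print(accel.execute("SELECT id FROM orders ORDER BY id").fetchall())  # [(1,), (3,)]
```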
Refresh Jitter
Add randomness to refresh timing to prevent a thundering herd of simultaneous refreshes.
Acceleration Snapshots
Bootstrap accelerations from S3 for fast cold starts.
Configuration:
- enabled: Bootstrap from snapshots and create new ones
- bootstrap_only: Only load from snapshots; don't create new ones
- create_only: Only create snapshots; don't bootstrap from them
- disabled: No snapshot usage
Benefits:
- Cold starts in seconds vs. minutes/hours
- Ephemeral compute with persistent recovery
- Reduced source database load on restarts
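The four snapshot settings each toggle two independent actions: loading from an existing snapshot and writing new ones. The sketch below encodes that as a lookup table; SNAPSHOT_BEHAVIOR and startup_plan are hypothetical names, and the step descriptions are a simplified assumption about what a cold start does, not Spice's exact sequence.

```python
# Hypothetical mapping of the snapshot settings above to the two actions they control.
SNAPSHOT_BEHAVIOR = {
    "enabled":        {"bootstrap": True,  "create": True},
    "bootstrap_only": {"bootstrap": True,  "create": False},
    "create_only":    {"bootstrap": False, "create": True},
    "disabled":       {"bootstrap": False, "create": False},
}

def startup_plan(setting, snapshot_available):
    """Describe what a cold start would do under a given setting (conceptual only)."""
    behavior = SNAPSHOT_BEHAVIOR[setting]
    steps = []
    if behavior["bootstrap"] and snapshot_available:
        steps.append("load snapshot from S3")
    else:
        steps.append("full load from source")
    if behavior["create"]:
        steps.append("write new snapshots")
    return steps

print(startup_plan("bootstrap_only", snapshot_available=True))
# ['load snapshot from S3']
print(startup_plan("create_only", snapshot_available=True))
# ['full load from source', 'write new snapshots']
```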
Indexes and Primary Keys
Indexes (SQLite/PostgreSQL)
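Indexes let the engine seek directly to matching rows instead of scanning the whole table. With Python's built-in sqlite3 module (a plain SQLite database, not a Spice acceleration; the orders table and idx_orders_customer index are hypothetical), EXPLAIN QUERY PLAN shows whether a lookup uses an index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT * FROM orders WHERE customer_id = 7"

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)   # expected: a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)    # expected: a search using the new index

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, but the index name appears only after the index exists.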
Primary Keys
Primary keys enable:
- Upsert behavior
- Deduplication
- Update/delete by key (Cayenne)
Upsert and Conflict Resolution
Handle duplicate rows on insert:
- drop: Drop conflicting rows (default)
- upsert: Update existing rows
- upsert_dedup: Deduplicate before upsert
- upsert_dedup_by_row_id: Deduplicate using internal row IDs
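The first three options can be modeled as operations on a table keyed by primary key. This is a conceptual sketch with simplified, assumed semantics: apply_batch is a hypothetical helper, plain upsert is assumed to reject a batch that repeats a key (to show why a dedup step exists), upsert_dedup is assumed to keep the last duplicate, and upsert_dedup_by_row_id is not modeled.

```python
# Conceptual model of on-insert conflict handling for a table keyed by a primary key.
# Semantics are simplified assumptions, not Spice's implementation.

def apply_batch(table, batch, mode="drop", key="id"):
    table = dict(table)                           # primary key -> row
    if mode == "upsert_dedup":
        deduped = {}
        for row in batch:                         # assumption: later duplicates win
            deduped[row[key]] = row
        batch, mode = list(deduped.values()), "upsert"
    elif mode == "upsert":
        keys = [row[key] for row in batch]
        if len(keys) != len(set(keys)):           # assumption: duplicates in one batch are an error
            raise ValueError("batch contains duplicate keys; use upsert_dedup")
    for row in batch:
        if row[key] in table and mode == "drop":
            continue                              # keep the existing row, discard the new one
        table[row[key]] = row                     # insert, or overwrite on upsert
    return table

existing = {1: {"id": 1, "qty": 5}}
batch = [{"id": 1, "qty": 9}, {"id": 2, "qty": 1}, {"id": 2, "qty": 3}]

print(apply_batch(existing, batch[:2], mode="drop")[1]["qty"])      # 5
print(apply_batch(existing, batch[:2], mode="upsert")[1]["qty"])    # 9
print(apply_batch(existing, batch, mode="upsert_dedup")[2]["qty"])  # 3
```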
Data Retention
Automatically expire old data.
Partitioning
Partition data for better query performance.
Ready State
Control when a dataset becomes queryable:
- on_load: The dataset is ready after the initial load completes (default)
- on_registration: The dataset is ready immediately and falls back to federated queries until the load completes
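The two ready states differ only in what happens to a query that arrives before the initial load finishes. A conceptual sketch, in which run_query, the two query functions, and the error message are hypothetical:

```python
# Conceptual sketch of the two ready_state behaviors; all names here are hypothetical.
def run_query(sql, ready_state, accel_loaded, query_accelerated, query_federated):
    if accel_loaded:
        return query_accelerated(sql)       # normal path once the initial load is done
    if ready_state == "on_registration":
        return query_federated(sql)         # fall back to the source until loaded
    # on_load (default): the dataset is not yet queryable
    raise RuntimeError("dataset not ready: initial load still in progress")

def from_acceleration(sql):
    return "from acceleration"

def from_source(sql):
    return "from source"

print(run_query("SELECT 1", "on_registration", False, from_acceleration, from_source))  # from source
print(run_query("SELECT 1", "on_load", True, from_acceleration, from_source))           # from acceleration
```

on_registration trades slower (federated) answers during startup for availability; on_load guarantees every answer comes from the acceleration.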
Real-World Example
From the included spicepod.yml:
Query Results Caching
Separate from acceleration, Spice also caches SQL query results.
Best for:
- Repeated identical queries
- Dashboard auto-refresh
- High-frequency reads
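A results cache keyed by the exact SQL text with a time-to-live captures the idea. This is a conceptual sketch: ResultsCache, its ttl_seconds parameter, and the injectable clock are hypothetical and do not correspond to Spice configuration keys.

```python
import time

# Conceptual sketch of a query-results cache keyed by exact SQL text, with a time-to-live.
class ResultsCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}                  # sql -> (stored_at, result)

    def get_or_run(self, sql, run):
        now = self.clock()
        hit = self.entries.get(sql)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]                  # fresh cached result
        result = run(sql)                  # miss or expired: execute the query
        self.entries[sql] = (now, result)
        return result

# A fake clock makes expiry deterministic for the example.
fake_now = [0.0]
executions = []

def execute(sql):
    executions.append(sql)
    return [("row",)]

cache = ResultsCache(ttl_seconds=1.0, clock=lambda: fake_now[0])
cache.get_or_run("SELECT * FROM t", execute)   # executes
cache.get_or_run("SELECT * FROM t", execute)   # served from cache
fake_now[0] = 2.0
cache.get_or_run("SELECT * FROM t", execute)   # expired, executes again
print(len(executions))  # 2
```

Because the key is the exact query text, a dashboard that re-issues the same SQL on every auto-refresh is served from the cache until the entry expires.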
Monitoring Acceleration
Query acceleration metrics.
Performance Comparison
Typical Query Latencies:
| Scenario | Latency |
|---|---|
| Remote PostgreSQL (federated) | 100-500ms |
| Arrow acceleration (memory) | 1-10ms |
| DuckDB acceleration (memory) | 5-50ms |
| SQLite acceleration (file) | 10-100ms |
| Cayenne acceleration (file) | 10-100ms |
When to consider acceleration:
- Queries taking >100ms
- Repeated access to the same data
- High query frequency (>10 QPS)
- Source rate limits
- Network latency issues
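The indicators above can be turned into a rough checklist. This is a hypothetical helper that reuses the thresholds from the list (100ms and 10 QPS); WorkloadStats and acceleration_signals are illustrative names, and the result is a rule of thumb, not a Spice feature.

```python
from dataclasses import dataclass

@dataclass
class WorkloadStats:
    typical_latency_ms: float      # typical query latency against the source
    queries_per_second: float
    repeated_access: bool          # the same data is read over and over
    source_rate_limited: bool
    high_network_latency: bool

def acceleration_signals(stats):
    """Return the indicators from the list above that this workload exhibits."""
    signals = []
    if stats.typical_latency_ms > 100:
        signals.append("queries taking >100ms")
    if stats.repeated_access:
        signals.append("repeated access to the same data")
    if stats.queries_per_second > 10:
        signals.append("high query frequency (>10 QPS)")
    if stats.source_rate_limited:
        signals.append("source rate limits")
    if stats.high_network_latency:
        signals.append("network latency issues")
    return signals

stats = WorkloadStats(typical_latency_ms=250, queries_per_second=40,
                      repeated_access=True, source_rate_limited=False,
                      high_network_latency=False)
print(acceleration_signals(stats))
# ['queries taking >100ms', 'repeated access to the same data', 'high query frequency (>10 QPS)']
```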
Best Practices
- Choose the right engine: OLAP for analytics, OLTP for point queries
- Use appropriate refresh mode: Append for time-series, full for small datasets
- Enable snapshots: For large datasets with slow initial loads
- Add jitter: Prevent refresh storms across multiple instances
- Monitor refresh duration: Alert if refresh takes too long
- Set retention policies: Prevent unbounded growth
- Set ready_state to on_registration: Keep datasets queryable (via federation) during startup
- Partition large datasets: Enable partition pruning
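The benefit of jitter shows up in a small simulation. This sketch assumes every instance starts at the same moment with a 60-second interval and draws a uniform random delay of up to 30 seconds; refresh_times, peak_per_second, and the numbers are hypothetical, not Spice defaults.

```python
import random

# Conceptual sketch: instances sharing one refresh interval all fire at once
# unless each adds a random delay.
def refresh_times(instances, interval_s, jitter_max_s, rng):
    """Next refresh time (in seconds) for each instance, all started at t=0."""
    return [interval_s + rng.uniform(0, jitter_max_s) for _ in range(instances)]

def peak_per_second(times):
    """Largest number of refreshes that start within the same one-second bucket."""
    buckets = {}
    for t in times:
        buckets[int(t)] = buckets.get(int(t), 0) + 1
    return max(buckets.values())

rng = random.Random(0)                     # seeded for reproducibility
no_jitter = refresh_times(50, 60, 0, rng)
with_jitter = refresh_times(50, 60, 30, rng)

print(peak_per_second(no_jitter))          # 50: every instance hits the source at t=60
print(peak_per_second(with_jitter))        # far fewer simultaneous refreshes
```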
Next Steps
Acceleration Engines
Detailed engine configurations
Data Federation
Query without acceleration
Spicepods
Configuration reference
Snapshots Guide
Fast cold starts with snapshots