Overview
The Spice runtime (spiced) can be configured through command-line arguments, environment variables, and spicepod configuration files. This guide covers all available configuration options.
Command-Line Arguments
Network Binding
HTTP Server
```shell
spiced --http 0.0.0.0:8090
```
- Default: `127.0.0.1:8090`
- Description: HTTP/REST API endpoint for queries, health checks, and management
- Protocol: HTTP/1.1
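Once the runtime is listening, SQL can be issued directly against the HTTP endpoint. A minimal sketch, assuming a locally running `spiced` on the default port and a hypothetical dataset named `orders`:

```shell
# Query the runtime over HTTP; assumes spiced is running locally
# and a dataset named `orders` is loaded (illustrative dataset name).
curl -X POST http://127.0.0.1:8090/v1/sql \
  -H "Content-Type: text/plain" \
  -d "SELECT COUNT(*) FROM orders"
```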
Flight Server
```shell
spiced --flight 0.0.0.0:50051
```
- Default: `127.0.0.1:50051`
- Description: Arrow Flight SQL and Flight RPC endpoint
- Protocol: gRPC (HTTP/2)
Metrics Server
```shell
spiced --metrics 0.0.0.0:9090
```
- Default: Not exposed by default
- Description: Prometheus metrics endpoint
- Protocol: HTTP/1.1
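With metrics enabled, a Prometheus server can scrape the endpoint directly. A minimal `prometheus.yml` scrape job as a sketch, assuming metrics are exposed on port 9090 as in the command above (the job name and target host are illustrative):

```yaml
scrape_configs:
  - job_name: spice
    static_configs:
      - targets: ["spiced-host:9090"]
    scrape_interval: 15s
```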
Cluster Mode
See Distributed Query for detailed cluster configuration.
Scheduler
```shell
spiced \
  --role scheduler \
  --node-bind-address 0.0.0.0:50052 \
  --node-advertise-address scheduler.example.com
```
Executor
```shell
spiced \
  --role executor \
  --scheduler-address https://scheduler.example.com:50052 \
  --node-bind-address 0.0.0.0:50052 \
  --node-advertise-address executor-1.example.com
```
mTLS Configuration
```shell
spiced \
  --node-mtls-ca-certificate-file /path/to/ca-cert.pem \
  --node-mtls-certificate-file /path/to/node-cert.pem \
  --node-mtls-key-file /path/to/node-key.pem
```
Environment Variables
Secrets
Environment variables prefixed with `SPICE_SECRET_` are available as secrets:
```shell
export SPICE_SECRET_DATABASE_PASSWORD="mypassword"
export SPICE_SECRET_API_KEY="sk-1234567890"
```
Reference them in `spicepod.yaml`:
```yaml
secrets:
  - from: env
    name: env

datasets:
  - from: postgres:my_table
    name: my_table
    params:
      connection_string: postgres://user:${env:SPICE_SECRET_DATABASE_PASSWORD}@host/db
```
Data Connector Credentials
Many connectors use standard environment variables:
```shell
# AWS S3
export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
export AWS_REGION="us-west-2"

# Azure
export AZURE_STORAGE_ACCOUNT_NAME="myaccount"
export AZURE_STORAGE_ACCOUNT_KEY="mykey"

# Google Cloud
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
```
Spicepod Configuration
Runtime Settings
Configure runtime behavior in `spicepod.yaml`:
```yaml
version: v1
kind: Spicepod
name: my_app

runtime:
  # Query result caching
  caching:
    sql_results:
      enabled: true
      max_size: 128MB
      item_ttl: 1s
    search_results:
      enabled: true
      max_size: 128MB
      item_ttl: 60s
    embeddings:
      enabled: true
      max_size: 1GB
      item_ttl: 3600s

  # Task history settings
  task_history:
    enabled: true
    retention_period: 24h
    max_task_runs: 1000

  # Query settings
  query:
    max_concurrent_queries: 100
    timeout: 300s
```
Acceleration Configuration
In-Memory (Arrow)
```yaml
datasets:
  - from: postgres:orders
    name: orders
    acceleration:
      enabled: true
      engine: arrow # In-memory, fastest
      refresh_mode: full
      refresh_check_interval: 10s
```
Best for:
- Small to medium datasets (< 10GB)
- Highest query performance
- Frequent updates
File-Based (DuckDB)
```yaml
datasets:
  - from: s3://my-bucket/data/
    name: large_dataset
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      params:
        duckdb_file: /data/large_dataset.db
      refresh_mode: full
      refresh_check_interval: 1h
```
Best for:
- Large datasets (10GB - 1TB)
- Persistent storage required
- Analytical queries (OLAP)
File-Based (SQLite)
```yaml
datasets:
  - from: postgres:transactions
    name: transactions
    acceleration:
      enabled: true
      engine: sqlite
      mode: file
      params:
        sqlite_file: /data/transactions.db
      refresh_mode: append
      refresh_check_interval: 5s
```
Best for:
- Transactional workloads (OLTP)
- Point lookups and inserts
- Row-level updates
Spice Cayenne (Vortex)
```yaml
datasets:
  - from: s3://analytics/clickstream/
    name: clickstream
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      params:
        cayenne_path: /data/clickstream
      refresh_mode: full
      refresh_check_interval: 1h
```
Best for:
- Very large datasets (> 1TB)
- Columnar storage with compression
- DuckDB-comparable performance
PostgreSQL Acceleration
```yaml
datasets:
  - from: s3://warehouse/sales/
    name: sales
    acceleration:
      enabled: true
      engine: postgres
      params:
        connection_string: postgres://user:pass@postgres-host:5432/spice
      refresh_mode: full
      refresh_check_interval: 30m
```
Best for:
- Shared acceleration across multiple Spice instances
- Transactional consistency requirements
- Existing PostgreSQL infrastructure
Refresh Modes
```yaml
refresh_mode: full    # Replace all data on each refresh
refresh_mode: append  # Add new data only
refresh_mode: changes # CDC-based incremental updates
```
Caching Configuration
Query Result Cache
```yaml
runtime:
  caching:
    sql_results:
      enabled: true
      max_size: 256MB # Maximum cache size
      item_ttl: 10s   # Time-to-live per cached result
```
Caches identical SQL query results to avoid re-execution.
Search Result Cache
```yaml
runtime:
  caching:
    search_results:
      enabled: true
      max_size: 128MB
      item_ttl: 300s # 5 minutes
```
Caches vector and text search results.
Embeddings Cache
```yaml
runtime:
  caching:
    embeddings:
      enabled: true
      max_size: 1GB
      item_ttl: 86400s # 24 hours
```
Caches generated embeddings to avoid recomputation.
Memory Settings
```yaml
runtime:
  memory:
    # Limit total memory usage for accelerated tables
    max_acceleration_memory: 16GB
    # Memory pool for query execution
    query_execution_memory: 4GB
```
Parallelism
```yaml
runtime:
  parallelism:
    # Number of threads for query execution
    # Default: number of CPU cores
    num_threads: 8
    # Number of threads for data refresh
    refresh_threads: 4
```
Connection Pooling
```yaml
datasets:
  - from: postgres:orders
    name: orders
    params:
      connection_string: postgres://host/db
      # Connection pool settings
      max_connections: 10
      min_connections: 2
      connection_timeout: 30s
```
Security Configuration
Authentication
See the Authentication documentation for details on configuring authentication.
TLS/SSL
Configure TLS for HTTP and Flight endpoints:
```shell
spiced \
  --tls-certificate-file /path/to/cert.pem \
  --tls-key-file /path/to/key.pem
```
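For local testing, a self-signed certificate and key pair can be generated with OpenSSL (production deployments should use certificates issued by a trusted CA):

```shell
# Generate a self-signed certificate and private key for local TLS testing
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout key.pem -out cert.pem \
  -days 365 -subj "/CN=localhost"

# Inspect the subject of the resulting certificate
openssl x509 -in cert.pem -noout -subject
```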
OpenTelemetry Export
Export runtime metrics to OpenTelemetry collectors:
```yaml
runtime:
  otel_exporter:
    endpoint: http://otel-collector:4317
    push_interval: 60s
    metrics:
      - spice_runtime_*
      - dataset_*
```
Supported protocols:
- gRPC: `http://host:4317` or `https://host:4317`
- HTTP: `http://host:4318/v1/metrics`
Resource Limits
Query Limits
```yaml
runtime:
  query:
    max_concurrent_queries: 100
    default_timeout: 300s
    max_memory_per_query: 2GB
```
Dataset Limits
```yaml
datasets:
  - from: s3://large-bucket/data/
    name: large_data
    params:
      max_partition_size: 1GB
      max_file_size: 100MB
```
Health Check Configuration
The runtime provides two health endpoints:
- `/health`: returns "ok" when the runtime is alive
- `/v1/ready`: returns ready status once all datasets are loaded
Configure readiness behavior:
```yaml
runtime:
  readiness:
    # Don't wait for all datasets to load
    wait_for_datasets: false
```
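These endpoints map naturally onto Kubernetes probes. A sketch of liveness and readiness probes against the default HTTP port (the container name and port values are illustrative, not prescribed by Spice):

```yaml
containers:
  - name: spiced
    ports:
      - containerPort: 8090
    livenessProbe:
      httpGet:
        path: /health
        port: 8090
    readinessProbe:
      httpGet:
        path: /v1/ready
        port: 8090
```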
Logging Configuration
Control log output:
```shell
# Set log level
export RUST_LOG=info

# Detailed logging for specific components
export RUST_LOG=runtime=debug,datafusion=info

# JSON-formatted logs
export RUST_LOG_FORMAT=json
```
Log levels: `error`, `warn`, `info`, `debug`, `trace`
Complete Configuration Example
```yaml
version: v1
kind: Spicepod
name: production-app

runtime:
  caching:
    sql_results:
      enabled: true
      max_size: 512MB
      item_ttl: 30s
    search_results:
      enabled: true
      max_size: 256MB
      item_ttl: 300s
    embeddings:
      enabled: true
      max_size: 2GB
      item_ttl: 86400s
  query:
    max_concurrent_queries: 100
    default_timeout: 600s
  task_history:
    enabled: true
    retention_period: 168h # 7 days
  otel_exporter:
    endpoint: http://otel-collector:4317
    push_interval: 60s

datasets:
  - from: postgres:transactions
    name: transactions
    params:
      connection_string: ${env:POSTGRES_URL}
      max_connections: 20
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      params:
        duckdb_file: /data/transactions.db
      refresh_mode: append
      refresh_check_interval: 10s

  - from: s3://analytics/clickstream/
    name: clickstream
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: cayenne
      mode: file
      params:
        cayenne_path: /data/clickstream
      refresh_mode: full
      refresh_check_interval: 1h
```
Next Steps