What is Data Federation?
Data federation lets you query data across multiple disparate sources through a single SQL interface, without moving or copying the data. Spice acts as a unified query layer that routes queries to the appropriate backend systems, so a single query can span multiple sources.
How Federated Queries Work
Query Execution Flow
Query Push-Down Optimization
Spice pushes computation down to the source systems where possible, for example by applying filters at the source (filter push-down).
Supported Data Connectors
Spice supports 30+ data connectors across databases, warehouses, lakes, and files.
Databases
| Connector | Status | Use Case |
|---|---|---|
| postgres | Stable | Transactional data, operational queries |
| mysql | Stable | Web applications, CMS systems |
| mssql | Beta | Enterprise SQL Server deployments |
| oracle | Alpha | Legacy enterprise systems |
| clickhouse | Alpha | Real-time analytics |
| mongodb | Alpha | Document databases |
| scylladb | Alpha | Wide-column NoSQL |
| dynamodb | Release Candidate | AWS key-value store |
Data Warehouses
| Connector | Status | Use Case |
|---|---|---|
| snowflake | Beta | Cloud data warehouse |
| databricks | Beta/Stable | Lakehouse analytics |
| dremio | Stable | Data lakehouse |
| spice.ai | Stable | Spice Cloud Platform |
Data Lakes & Files
| Connector | Status | Description |
|---|---|---|
| s3 | Stable | Parquet, CSV from S3 |
| delta_lake | Stable | Delta Lake format |
| iceberg | Beta | Apache Iceberg tables |
| file | Stable | Local Parquet/CSV |
| abfs | Alpha | Azure Blob Storage |
| gcs | Alpha | Google Cloud Storage |
| glue | Alpha | AWS Glue Catalog |
Other Sources
| Connector | Status | Use Case |
|---|---|---|
| github | Stable | GitHub issues, PRs, stargazers |
| graphql | Release Candidate | GraphQL APIs |
| http/https | Alpha | REST APIs returning Parquet/CSV/JSON |
| kafka | Alpha | Streaming data |
| debezium | Alpha | Change Data Capture (CDC) |
Configuring Federated Datasets
Define datasets in your spicepod.yaml file.
Using Secrets
Never hardcode credentials. Use secret references instead. Supported secret stores:
- Environment variables (env secret store, default)
- AWS Secrets Manager
- Azure Key Vault
- Kubernetes secrets
- HashiCorp Vault
Federated Query Examples
Cross-Database Join
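A minimal sketch of a cross-database join, assuming two hypothetical datasets have been defined in spicepod.yaml: customers (backed by the postgres connector) and orders (backed by the mysql connector). The dataset and column names are illustrative and are not taken from the Spice documentation.

```sql
-- Hypothetical datasets: customers (postgres), orders (mysql).
-- Both are referenced by dataset name; Spice resolves each to its source.
SELECT
    c.customer_id,
    c.name,
    o.order_id,
    o.total
FROM customers AS c
JOIN orders AS o
    ON o.customer_id = c.customer_id
WHERE o.order_date >= DATE '2024-01-01'  -- selective filter, a push-down candidate
ORDER BY o.total DESC
LIMIT 10;
```

The filter on orders keeps the amount of data pulled from the MySQL side small before the join runs, which is the situation where federation tends to perform well.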
Aggregating Across Sources
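One way to aggregate across sources is to combine rows with UNION ALL and then group the result. The sketch below assumes two hypothetical datasets, web_orders and store_orders, that live in different backends but share the columns order_date and total; none of these names come from the Spice documentation.

```sql
-- Hypothetical datasets: web_orders and store_orders, each from a different source.
WITH all_orders AS (
    SELECT 'web' AS channel, order_date, total FROM web_orders
    UNION ALL
    SELECT 'store' AS channel, order_date, total FROM store_orders
)
SELECT
    channel,
    date_trunc('month', order_date) AS order_month,
    COUNT(*)   AS order_count,
    SUM(total) AS revenue
FROM all_orders
GROUP BY channel, date_trunc('month', order_date)
ORDER BY order_month, channel;
```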
Querying S3 Data Lake
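A dataset backed by the s3 connector is queried like any other table. The sketch below assumes a hypothetical dataset named events, defined over Parquet files in S3, with event_type and event_time columns; the names are illustrative only.

```sql
-- Hypothetical dataset: events (s3 connector, Parquet files).
SELECT
    event_type,
    COUNT(*) AS event_count
FROM events
WHERE event_time >= TIMESTAMP '2024-01-01 00:00:00'  -- limits the data scanned
GROUP BY event_type
ORDER BY event_count DESC;
```

Selecting only the columns the query needs matters here because Parquet is a columnar format, so unused columns do not have to be read.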
Distributed Multi-Node Query
Scale federated queries across multiple nodes using Apache Ballista. Distributed execution provides:
- Parallel data scanning across partitions
- Distributed aggregations
- Improved performance on large datasets (TB+)
Performance Considerations
When Federation Works Well
- Selective queries with strong filters
- Pre-aggregated source data
- Small result sets after filtering
- Push-down compatible operations
When to Use Acceleration
Consider data acceleration when:
- Queries access the same data repeatedly
- Source queries are slow (seconds)
- Network latency is high
- Source systems have rate limits
- Low-latency queries are required (<100 ms)
Catalog Connectors
Catalog connectors expose entire catalogs for federated query:
| Catalog | Status | Description |
|---|---|---|
| spice.ai | Stable | Spice Cloud Platform |
| unity_catalog | Stable | Databricks Unity Catalog |
| databricks | Beta | Databricks Spark Connect |
| iceberg | Beta | Apache Iceberg REST Catalog |
| glue | Alpha | AWS Glue Data Catalog |
Monitoring Federated Queries
Query performance metrics include:
- Query duration
- Rows scanned
- Data transferred
- Push-down effectiveness
Best Practices
- Use specific filters: Reduce data scanned at the source
- Select only needed columns: Minimize network transfer
- Leverage push-down: Let sources do the heavy lifting
- Monitor query plans: Use EXPLAIN to verify push-down
- Consider acceleration: For frequently accessed data
- Use secrets: Never hardcode credentials
- Partition large datasets: Enable predicate push-down on partitions
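The first four practices can be combined in one statement. The sketch below assumes a hypothetical dataset named orders; it selects only two columns, applies a specific filter, and prefixes the query with EXPLAIN so the plan can be inspected instead of executing the query.

```sql
-- Hypothetical dataset: orders.
EXPLAIN
SELECT order_id, total                   -- only the columns needed
FROM orders
WHERE order_date >= DATE '2024-01-01';   -- specific filter
```

When reading the resulting plan, check whether the filter and the column selection appear in the portion sent to the source rather than being applied afterwards. The exact plan format depends on the connector and the Spice version.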
Next Steps
- Data Acceleration: Materialize federated data locally for faster queries
- Data Connectors: Browse all available data connectors
- Spicepods: Learn Spicepod configuration format
- Query Federation Guide: Detailed federation feature guide