The S3 connector enables Spice to query data stored in Amazon S3 and S3-compatible storage systems (MinIO, Wasabi, etc.). It supports Parquet and CSV formats with automatic schema detection and query push-down.
## Status

Stable - Production-ready with comprehensive testing.

## Supported Features
- Parquet and CSV file formats
- Automatic schema inference
- Predicate push-down for Parquet files
- Partition pruning
- S3 and S3-compatible endpoints
- Multiple authentication methods
- Data acceleration
- Globbing patterns for multiple files
## Configuration
### Basic Configuration
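A minimal sketch of a spicepod.yaml dataset entry; the bucket, path, and dataset name are placeholders:

```yaml
datasets:
  - from: s3://my-bucket/data/events.parquet
    name: events
    params:
      file_format: parquet
```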
### With Authentication
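A sketch using key-based authentication; the `${secrets:...}` references assume the key pair is stored in the Spice secrets store rather than written inline:

```yaml
datasets:
  - from: s3://my-bucket/data/events.parquet
    name: events
    params:
      file_format: parquet
      s3_auth: key
      s3_key: ${secrets:S3_KEY}       # access key ID from the secrets store
      s3_secret: ${secrets:S3_SECRET} # secret access key from the secrets store
      s3_region: us-east-1
```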
### S3-Compatible Endpoint
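A sketch pointing at a local S3-compatible endpoint; the endpoint URL and credential names are placeholders, and `allow_http` is set because the endpoint is plain HTTP:

```yaml
datasets:
  - from: s3://my-bucket/data/
    name: local_data
    params:
      file_format: parquet
      s3_endpoint: http://localhost:9000  # custom S3-compatible endpoint
      allow_http: "true"                  # required for non-TLS endpoints
      s3_auth: key
      s3_key: ${secrets:MINIO_KEY}
      s3_secret: ${secrets:MINIO_SECRET}
```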
### With Acceleration
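A sketch enabling local acceleration for the dataset; the refresh interval shown is illustrative:

```yaml
datasets:
  - from: s3://my-bucket/data/events.parquet
    name: events
    params:
      file_format: parquet
    acceleration:
      enabled: true
      refresh_check_interval: 10m  # re-check the source every 10 minutes
```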
## Parameters

| Parameter | Description |
| --- | --- |
| `file_format` | File format: `parquet` or `csv` |
| `s3_auth` | Authentication method: `default` (AWS default credential chain), `key` (access key and secret), or `role` (IAM role) |
| `s3_key` | AWS access key ID (when `s3_auth: key`) |
| `s3_secret` | AWS secret access key (when `s3_auth: key`) |
| `s3_region` | AWS region for the S3 bucket |
| `s3_endpoint` | Custom S3-compatible endpoint URL |
| `allow_http` | Allow HTTP connections (use with custom endpoints) |
| `client_timeout` | Timeout for S3 operations (e.g., `60s`, `5m`) |

### CSV-Specific Parameters

| Parameter | Description |
| --- | --- |
| `csv_has_header` | Whether the CSV file has a header row |
| `csv_delimiter` | CSV field delimiter |
| `csv_quote` | CSV quote character |
## Authentication

### Default Credentials Chain
Uses the AWS default credential chain (environment variables, IAM role, etc.):
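A sketch, assuming only the auth method and region need to be set explicitly:

```yaml
params:
  file_format: parquet
  s3_auth: default   # resolve credentials via the AWS default chain
  s3_region: us-west-2
```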
### Access Keys

Explicitly provide an access key and secret:
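A sketch, assuming the key pair is kept in the Spice secrets store rather than inline:

```yaml
params:
  file_format: parquet
  s3_auth: key
  s3_key: ${secrets:AWS_ACCESS_KEY_ID}
  s3_secret: ${secrets:AWS_SECRET_ACCESS_KEY}
```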
### IAM Role

Use an IAM role (recommended for EC2/EKS):
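A sketch; with `role`, no key material appears in the spicepod at all:

```yaml
params:
  file_format: parquet
  s3_auth: role   # assume the instance/pod IAM role
```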
## Use Cases

### Query Parquet Data Lake
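A sketch querying all Parquet files under a warehouse prefix with a glob pattern; the bucket layout is hypothetical:

```yaml
datasets:
  - from: s3://data-lake/warehouse/sales/*.parquet
    name: sales
    params:
      file_format: parquet
```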
### CSV Analytics with Acceleration
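A sketch combining the CSV parameters with acceleration for fast repeated queries; file path and delimiter values are placeholders:

```yaml
datasets:
  - from: s3://analytics-bucket/exports/users.csv
    name: users
    params:
      file_format: csv
      csv_has_header: "true"
      csv_delimiter: ","
    acceleration:
      enabled: true
```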
### Multi-Region Partitioned Data
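A sketch with Hive-style region/date partitions encoded in the key prefix so partition pruning can skip irrelevant paths; the bucket names and partition layout are hypothetical:

```yaml
datasets:
  - from: s3://events-us/events/region=us-east-1/
    name: events_us_east
    params:
      file_format: parquet
      s3_region: us-east-1
  - from: s3://events-eu/events/region=eu-west-1/
    name: events_eu_west
    params:
      file_format: parquet
      s3_region: eu-west-1
```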
### MinIO/S3-Compatible Storage
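A sketch for a self-hosted MinIO deployment; the endpoint, bucket, and secret names are placeholders:

```yaml
datasets:
  - from: s3://local-bucket/data/
    name: minio_data
    params:
      file_format: parquet
      s3_endpoint: http://minio.internal:9000
      allow_http: "true"   # MinIO endpoint here is plain HTTP
      s3_auth: key
      s3_key: ${secrets:MINIO_ROOT_USER}
      s3_secret: ${secrets:MINIO_ROOT_PASSWORD}
```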
## Performance Tips
- Use Parquet: Parquet format provides columnar storage with compression and predicate push-down
- Enable Acceleration: For frequently queried data, enable acceleration for sub-second queries
- Partition Data: Organize data by date/region for partition pruning
- Use Globbing: Query multiple files efficiently with patterns like `*.parquet` (see the sketch after this list)
- Regional Proximity: Use S3 buckets in the same region as your Spice runtime
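As referenced in the globbing tip above, a sketch that covers many files with a single pattern; the path layout is hypothetical:

```yaml
datasets:
  - from: s3://my-bucket/logs/2024/*/*.parquet  # one dataset over many monthly files
    name: logs_2024
    params:
      file_format: parquet
```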
## Limitations
- Write operations are not supported (read-only connector)
- Schema changes in source files require dataset refresh
- Very large files (>10GB) may benefit from partitioning