Spice provides enterprise-grade search capabilities combining vector similarity, full-text, and keyword search for both structured and unstructured data.
Search Types
Spice supports three primary search methods:
- Vector Similarity Search - Semantic search using embeddings and distance metrics
- Full-Text Search - BM25-powered text search with Tantivy
- Keyword Search - Traditional exact and partial keyword matching
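To give a feel for what BM25 ranking does in full-text search, here is a minimal, self-contained Python sketch of the standard BM25 formula. It is an illustration only, not Tantivy's or Spice's implementation: the function name `bm25_scores`, the whitespace tokenization, and the parameters `k1=1.2` and `b=0.75` are assumptions chosen because they are common defaults.

```python
import math
from collections import Counter

def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with BM25.

    k1 and b are commonly used defaults, assumed here for illustration.
    """
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        # Longer-than-average documents are penalized through this factor
        length_norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            # Rare terms get a higher inverse document frequency
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

docs = [
    "fast vector search".split(),
    "full text search with bm25 ranking".split(),
    "columnar data format".split(),
]
scores = bm25_scores(["bm25", "search"], docs)
```

In this toy corpus the second document matches both query terms and scores highest, the first matches only "search", and the third matches nothing and scores zero.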
Hybrid Search
Combine multiple search methods using Reciprocal Rank Fusion (RRF) to achieve better relevance than any single method alone. Hybrid search merges results from vector and text search, reranking by relative position rather than raw scores.
SQL-Native Search
All search capabilities are exposed through SQL using User-Defined Table Functions (UDTFs):
- Vector Search UDTF - vector_search()
- Text Search UDTF - text_search()
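For intuition about the Reciprocal Rank Fusion step that hybrid search relies on, the following Python sketch fuses two ranked result lists by position alone. It is a minimal illustration, not Spice's implementation: the function name `reciprocal_rank_fusion`, the document ids, and the smoothing constant `k=60` (a widely used default) are assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids into one ranking.

    Each list contributes 1 / (k + rank) per document, so only the
    position matters, never the raw score. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical result lists, best match first
vector_hits = ["doc_a", "doc_b", "doc_c"]
text_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, text_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Here `doc_b` ends up first because it ranks near the top of both lists, ahead of `doc_a`, which is first in one list but third in the other. Documents that appear in only one list still receive a score and are kept in the fused ranking.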
Vector Storage Options
Spice supports multiple vector storage backends:
- Amazon S3 Vectors - Petabyte-scale vector storage (recommended for production)
- pgvector - PostgreSQL extension for vector operations
- duckdb_vector - DuckDB with vector extension
- sqlite_vec - SQLite with vector extension
Embedding Generation
Generate embeddings automatically using:
- AWS Bedrock - Amazon Titan, Cohere embeddings
- HuggingFace - Open-source embedding models
- Model2Vec - 500x faster static embeddings
- OpenAI - OpenAI embedding models
Distance Metrics
Supported vector distance metrics:
- Cosine Similarity - Normalized dot product (default)
- Euclidean Distance - L2 distance
- Dot Product - Raw inner product
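The three metrics above can be compared with a small Python sketch. This is plain illustrative math on Python lists, not how Spice or any storage backend computes distances; the function names and example vectors are hypothetical.

```python
import math

def dot_product(a, b):
    """Raw inner product: grows with both alignment and magnitude."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """L2 distance: straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Dot product normalized by both vector lengths, so only direction matters."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)

query = [1.0, 0.0]
same_direction = [2.0, 0.0]   # same direction as the query, twice the length
orthogonal = [0.0, 3.0]       # perpendicular to the query

cos_same = cosine_similarity(query, same_direction)
cos_orth = cosine_similarity(query, orthogonal)
l2_same = euclidean_distance(query, same_direction)
dot_same = dot_product(query, same_direction)
```

Cosine similarity treats `same_direction` as a perfect match despite its different length, while Euclidean distance and dot product both reflect the difference in magnitude. That is why the choice of metric should match how the embedding model was trained and whether its vectors are normalized.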
Special Columns
Search queries return special columns:
- _score - Relevance score (0.0 to 1.0 for vectors, float for text)
- _value - The matched content from the search column
- _match - Specific substring match (for chunked searches)
Architecture
Spice search is built on:
- Apache DataFusion - SQL query engine and execution
- Apache Arrow - Columnar data format for zero-copy operations
- Tantivy - Full-text search library (BM25)
- Amazon S3 Vectors - Distributed vector storage
Use Cases
Retrieval-Augmented Generation (RAG)
Search for relevant context to ground LLM responses.
Semantic Document Search
Find documents by meaning, not just keywords.
Multi-Column Search
Search across multiple embedded columns.
Getting Started
- Configure a dataset with embeddings
- Enable search indexing
- Query using vector_search() or text_search() UDTFs
- Combine with standard SQL for filtering and sorting