Full-text search uses BM25 ranking with the Tantivy search engine to find documents by keyword relevance.
text_search() UDTF
The `text_search()` user-defined table function (UDTF) performs full-text search over indexed text columns.
Parameters
- table_name (required) - Name of the table with indexed text
- query (required) - Search query string
- column_name (optional) - Specific column to search
- limit (optional) - Maximum results to return (default: 1000)
- include_score (optional) - Include the _score column (default: true)
Basic Usage
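When queries are built from application code, the call is easy to get wrong or leave open to injection. Below is a minimal Python sketch that assembles a `text_search()` call from the parameters listed above. It assumes positional arguments in the listed order, the table and column passed as bare identifiers, and the query passed as a single-quoted string literal; check these assumptions against the Spice SQL reference. `build_text_search_sql`, `sql_identifier`, and `sql_string_literal` are hypothetical helpers, not part of any Spice client library.

```python
import re

# Hypothetical helpers for illustration only; not a Spice client API.
# Plain identifiers, optionally dotted (e.g. schema.table).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*")


def sql_string_literal(value: str) -> str:
    """Quote text as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def sql_identifier(name: str) -> str:
    """Accept only plain identifiers so user input cannot inject SQL."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def build_text_search_sql(table_name, query, column_name=None, limit=None,
                          include_score=None):
    """Build a SELECT over text_search(), ASSUMING positional arguments in the
    order table_name, query, column_name, limit, include_score."""
    optional = [column_name, limit, include_score]
    # Positional arguments can only be omitted from the end of the list.
    last_given = max((i for i, v in enumerate(optional) if v is not None), default=-1)
    if any(v is None for v in optional[: last_given + 1]):
        raise ValueError("optional arguments may only be omitted from the end")

    args = [sql_identifier(table_name), sql_string_literal(query)]
    if column_name is not None:
        args.append(sql_identifier(column_name))
    if limit is not None:
        args.append(str(int(limit)))
    if include_score is not None:
        args.append("true" if include_score else "false")
    return f"SELECT * FROM text_search({', '.join(args)})"


print(build_text_search_sql("docs", "rust search engine", "body", 10))
```

Validating identifiers and doubling quotes keeps arbitrary user text confined to the query string literal.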
BM25 Ranking
BM25 (Best Matching 25) is a probabilistic ranking function that scores documents based on:
- Term Frequency (TF) - How often query terms appear in the document
- Inverse Document Frequency (IDF) - Rarity of terms across all documents
- Document Length - Scores are normalized by the average document length, so longer documents are not favored simply because they contain more words
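To show how the three factors above combine, here is a small Python sketch of the standard Okapi BM25 formula. The constants k1 = 1.2 and b = 0.75 are common defaults assumed here; Tantivy's exact constants, IDF variant, and field-norm handling may differ, so treat the numbers as illustrative rather than what Spice returns.

```python
import math
from collections import Counter

# Common Okapi BM25 defaults (assumed, not confirmed for Spice/Tantivy).
K1 = 1.2   # term-frequency saturation
B = 0.75   # strength of document-length normalization


def idf(n_docs: int, doc_freq: int) -> float:
    """Inverse document frequency: rarer terms get larger weights."""
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_scores(docs: list[list[str]], query: list[str]) -> list[float]:
    """Score each tokenized document against a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Longer-than-average documents get a larger denominator.
        norm = K1 * (1 - B + B * len(doc) / avg_len)
        score = 0.0
        for term in set(query):  # each distinct query term counted once
            if tf[term]:
                score += idf(n_docs, df[term]) * tf[term] * (K1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [
    "rust search engine".split(),
    "python web framework".split(),
    "fast rust rust library".split(),
]
print(bm25_scores(docs, ["rust"]))
```

The third document scores highest because "rust" appears twice, but the term-frequency factor saturates, so doubling occurrences does not double the score.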
Query Syntax
Text search supports simple, space-delimited queries.
Configuration
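Enable full-text search in your spicepod.yaml by listing the columns to index under `search.full_text.columns`; the dataset must also be accelerated (`acceleration.enabled: true`).
Multi-Column Indexing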
Index multiple columns for comprehensive search.
Integration with SQL
Filtering Results
Joining with Tables
Aggregations
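Because `text_search()` returns an ordinary table, its output can be filtered, joined, and aggregated like any other relation. The Python sketch below mirrors those three SQL operations (WHERE, JOIN, GROUP BY) on in-memory rows to make the data flow concrete. The rows, the `id` key, and the `products` table are hypothetical sample data, not output from Spice.

```python
from collections import defaultdict

# Hypothetical rows as a text_search() call might return them (id plus _score).
hits = [
    {"id": 1, "_score": 2.4},
    {"id": 2, "_score": 0.3},
    {"id": 3, "_score": 1.7},
    {"id": 4, "_score": 1.1},
]
# Hypothetical second table, keyed by id.
products = {
    1: {"name": "Rust in Action", "category": "books"},
    2: {"name": "Rust remover", "category": "hardware"},
    3: {"name": "Programming Rust", "category": "books"},
    4: {"name": "Rusty nail set", "category": "hardware"},
}

# Filtering: analogous to WHERE _score > 1.0
filtered = [h for h in hits if h["_score"] > 1.0]

# Joining: analogous to an inner JOIN products ON products.id = hits.id
joined = [{**h, **products[h["id"]]} for h in filtered if h["id"] in products]

# Aggregation: analogous to SELECT category, COUNT(*), AVG(_score) ... GROUP BY category
groups = defaultdict(list)
for row in joined:
    groups[row["category"]].append(row["_score"])
summary = {cat: (len(s), sum(s) / len(s)) for cat, s in groups.items()}
print(summary)
```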
Tantivy Indexing
Spice uses Tantivy, a full-text search library written in Rust, providing:
- Fast indexing and searching
- Low memory footprint
- BM25 ranking algorithm
- Simple tokenization and lowercasing
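"Simple tokenization and lowercasing" means text is split into alphanumeric runs and folded to lowercase before indexing, and each term then points to the documents that contain it (an inverted index). The Python sketch below approximates both steps; the regex only approximates a Tantivy-style simple tokenizer (it may differ on Unicode edge cases), and this is an illustration, not Tantivy's code.

```python
import re
from collections import defaultdict

# Approximation of a "simple" tokenizer: split on anything that is not a
# letter or digit (underscores included), then lowercase each token.
TOKEN = re.compile(r"[^\W_]+")


def tokenize(text: str) -> list[str]:
    return [t.lower() for t in TOKEN.findall(text)]


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


docs = {1: "Spice.ai: SQL + Search!", 2: "Full-text SEARCH with Tantivy"}
index = build_inverted_index(docs)
print(tokenize("Full-text SEARCH!"))  # ['full', 'text', 'search']
print(sorted(index["search"]))  # [1, 2]
```

Looking up a term is then a single map access, which is why keyword queries against indexed terms are fast. It is also why special characters in a query have no effect: they never become terms.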
Index Storage
Full-text indexes are stored alongside accelerated data.
Performance
Index Creation
Indexes are built during dataset acceleration.
Query Performance
Full-text search is optimized for:
- Small to medium datasets (fewer than 10M documents)
- Keyword-based queries (not semantic search)
- Fast exact matching (microseconds for indexed terms)
Optimization Tips
- Limit results: Use the limit parameter in the UDTF
- Pre-filter: Apply WHERE clauses to reduce the result set
- Accelerate data: Enable acceleration for the dataset
- Selective columns: Only index columns you need to search
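Limiting results helps because a ranker only has to track the best k matches rather than order every hit. The Python sketch below shows that general top-k technique with a bounded heap; it illustrates why a small limit is cheap and makes no claim about how Spice implements the limit parameter internally.

```python
import heapq


def top_k(scored_rows, k):
    """Return the k highest-scoring rows without sorting the whole result set.

    heapq.nlargest maintains a heap of at most k items, so the cost is
    O(n log k) instead of O(n log n) for a full sort.
    """
    return heapq.nlargest(k, scored_rows, key=lambda row: row["_score"])


rows = [{"id": i, "_score": s} for i, s in enumerate([0.4, 2.1, 1.3, 0.9, 3.0])]
print([r["id"] for r in top_k(rows, 2)])  # [4, 1]
```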
Special Columns
_score
BM25 relevance score (higher = more relevant).
_value
The matched content from the search column.
Examples
Document Search
Log Analysis
E-commerce Product Search
FAQ Search
Limitations
- No boolean operators: AND, OR, NOT are treated as regular terms
- No phrase search: Multi-word queries are tokenized into OR terms
- No wildcards: Pattern matching not supported
- No filters in UDTF: Use SQL WHERE clauses instead
- Single-column queries: Search one column per query
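The first two limitations follow from how queries are processed: every query is tokenized and lowercased, and a document matches if it contains any resulting term. The Python sketch below models those OR semantics (without scoring) to show why "AND" acts as an ordinary word and why word order is ignored. It is a simplified model of the documented behavior, not Spice's query parser.

```python
import re


def tokenize(text: str) -> list[str]:
    # Split on non-alphanumeric characters and lowercase (approximation).
    return [t.lower() for t in re.findall(r"[^\W_]+", text)]


def matching_docs(docs: dict[int, str], query: str) -> set[int]:
    """OR semantics: a document matches if it shares any term with the query."""
    terms = set(tokenize(query))
    return {doc_id for doc_id, text in docs.items() if terms & set(tokenize(text))}


docs = {1: "rust web server", 2: "python data tools", 3: "server and client"}
# "AND" becomes the term "and", so document 3 matches as well.
print(sorted(matching_docs(docs, "rust AND python")))
# Multi-word queries are not phrases: word order makes no difference.
print(matching_docs(docs, "web server") == matching_docs(docs, "server web"))
```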
Comparison: Text Search vs Vector Search
| Feature | Full-Text Search (BM25) | Vector Search |
|---|---|---|
| Best for | Exact keyword matching | Semantic similarity |
| Query type | Keywords, terms | Natural language |
| Speed | Very fast | Fast (depends on index) |
| Setup | Automatic indexing | Requires embeddings |
| Storage | Tantivy index | Vector store |
| Scoring | BM25 (TF-IDF based) | Distance metrics |
| Relevance | Keyword frequency | Semantic meaning |
Troubleshooting
No results found
- Verify the column is indexed: Check `search.full_text.columns` in spicepod.yaml
- Check query terms: Try individual words
- Verify data is loaded: Run `SELECT COUNT(*) FROM table`
Unexpected results
- Remember that queries are tokenized and lowercased
- Special characters are removed during tokenization
- BM25 scores are not normalized to a fixed range, so compare them only within the results of a single query
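Raw BM25 scores depend on corpus statistics and on the query, so a score of 3.0 has no fixed meaning across queries. If you need a bounded range, for example before blending with vector-search scores in a hybrid setup, one common approach is min-max rescaling within each result set. The Python sketch below shows that technique; it is a general post-processing step you would apply yourself, not a Spice feature.

```python
def normalize_scores(scores: list[float]) -> list[float]:
    """Rescale one query's BM25 scores to [0, 1] with min-max normalization.

    Only meaningful within a single result set: the same raw score can
    mean different things for different queries or index states.
    """
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]  # all results tie; treat them as equally relevant
    return [(s - lo) / (hi - lo) for s in scores]


print(normalize_scores([2.0, 4.0, 3.0]))  # [0.0, 1.0, 0.5]
```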
Index not created
- Ensure `acceleration.enabled: true` is set for the dataset
- Check logs for indexing errors
- Verify column data types (text/string columns only)
See Also
- Vector Search - Semantic search with embeddings
- Hybrid Search - Combine text and vector search
- Keyword Search - Simple pattern matching