The spice search command provides an interactive REPL for performing semantic search across datasets using embeddings.
Usage
Run `spice search` to start the REPL. The options below adjust its behavior.
Options
| Flag | Default | Description |
|---|---|---|
| `-l, --limit <NUM>` | `10` | Maximum number of search results |
| `--cache-control <MODE>` | `cache` | Cache control: `cache` or `no-cache` |
| `--model <NAME>` | (default) | Embedding model to use for search |
| `--endpoint <URL>` | `http://localhost:8090` | Remote Spice instance HTTP endpoint |
| `--headers <KEY:VALUE>` | - | Custom HTTP headers (can be specified multiple times) |
| `-o, --output <FORMAT>` | `table` | Output format: `table` or `json` |
Global Options
Inherits global flags:
- `--api-key <KEY>` - API key for authentication
- `--cloud` - Connect to Spice Cloud
Prerequisites
Datasets must have embeddings enabled:
```yaml
datasets:
  - from: postgres:documents
    name: docs
    embeddings:
      - column: content
        model: minilm
```
Load Embedding Model
```yaml
models:
  - from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2
    name: minilm
```
Examples
Basic Search
Start the REPL by running `spice search`.
Output:
```
Welcome to the Spice.ai search REPL! Enter your search queries.
search> machine learning tutorials
Results:
+------+-------------------------------------+--------+---------+
| Rank | Match                               | Score  | Dataset |
+------+-------------------------------------+--------+---------+
| 1    | Introduction to Machine Learning... | 0.8923 | docs    |
| 2    | Deep Learning Tutorial for Begin... | 0.8745 | docs    |
| 3    | ML Fundamentals: A Complete Guid... | 0.8621 | docs    |
+------+-------------------------------------+--------+---------+
Time: 0.124 seconds. 3 results.
```
Search with Primary Keys
When datasets have primary keys, they’re displayed:
Input:
```
search> database optimization
Results:
+------+-----+-------------------------------------+--------+---------+
| Rank | Key | Match                               | Score  | Dataset |
+------+-----+-------------------------------------+--------+---------+
| 1    | 42  | Database Indexing Strategies for... | 0.9012 | docs    |
| 2    | 87  | Query Optimization Techniques in... | 0.8834 | docs    |
| 3    | 123 | Performance Tuning for PostgreS...  | 0.8756 | docs    |
+------+-----+-------------------------------------+--------+---------+
Time: 0.098 seconds. 3 results.
```
JSON Output
Input:
```
search> api documentation
```
Output:
```json
{
  "results": [
    {
      "matches": {
        "content": "REST API Documentation Guide"
      },
      "score": 0.9123,
      "dataset": "docs",
      "primary_key": {
        "id": 15
      }
    },
    {
      "matches": {
        "content": "GraphQL API Reference"
      },
      "score": 0.8845,
      "dataset": "docs",
      "primary_key": {
        "id": 27
      }
    }
  ],
  "duration_ms": 82
}
```
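The JSON format is convenient for scripting. As a minimal sketch, the response shape above (pasted in here as sample data; in a script it would come from the command's stdout) can be filtered by score:

```python
import json

# Sample response in the shape produced by `spice search -o json`
# (taken from the example above).
raw = """
{
  "results": [
    {"matches": {"content": "REST API Documentation Guide"},
     "score": 0.9123, "dataset": "docs", "primary_key": {"id": 15}},
    {"matches": {"content": "GraphQL API Reference"},
     "score": 0.8845, "dataset": "docs", "primary_key": {"id": 27}}
  ],
  "duration_ms": 82
}
"""

response = json.loads(raw)

# Keep only results at or above a chosen similarity threshold.
strong = [r for r in response["results"] if r["score"] >= 0.9]

for r in strong:
    print(r["primary_key"]["id"], r["matches"]["content"])
```

The threshold of `0.9` is an arbitrary illustration; pick a cutoff that suits your data.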
Limit Results
Start the REPL with a lower limit, then search:

```
spice search --limit 5
search> kubernetes deployment
```

Returns only the top 5 results.
Custom Model
Specify an embedding model with the `--model` flag, e.g. `spice search --model minilm`.
Disable Cache
Force fresh results:

```bash
spice search --cache-control no-cache
```

Output:

```
search> example query
...
Time: 0.234 seconds. 10 results.
```

Note: no "(cached)" indicator is shown.
Cached Results
With default cache mode:

First query:

```
search> example query
Time: 0.234 seconds. 10 results.
```

Repeat query:

```
search> example query
Time: 0.003 seconds. 10 results (cached).
```
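The behavior above resembles a simple query-keyed cache: the first lookup computes, repeats return instantly. A toy sketch of that idea (not Spice's actual implementation) keyed on the query text:

```python
# Toy query cache, illustrative only: real caches also consider limits,
# models, and expiry.
cache: dict[str, list[str]] = {}

def search(query: str) -> tuple[list[str], bool]:
    """Return (results, was_cached); first call computes, repeats hit the cache."""
    if query in cache:
        return cache[query], True
    results = [f"result for {query!r}"]  # stand-in for the real search work
    cache[query] = results
    return results, False

_, cached_first = search("example query")
_, cached_repeat = search("example query")
```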
Remote Runtime
Connect to a remote Spice instance:

```bash
spice search --endpoint http://remote-host:8090
```

With custom HTTP headers:

```bash
spice search \
  --endpoint https://api.example.com \
  --headers "Authorization:Bearer token123" \
  --headers "X-Tenant-ID:acme"
```
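Each `--headers` value is a single `KEY:VALUE` string. The CLI's exact parsing isn't documented here, but the natural approach, splitting on the first colon only so values like `Bearer` tokens keep their own colons and spaces, can be sketched as:

```python
def parse_header(arg: str) -> tuple[str, str]:
    """Split a KEY:VALUE argument on the first colon only."""
    key, _, value = arg.partition(":")
    return key, value

# The two --headers arguments from the example above:
headers = dict(parse_header(h) for h in [
    "Authorization:Bearer token123",
    "X-Tenant-ID:acme",
])
```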
REPL Commands
| Command | Description |
|---|---|
| `<query>` | Perform semantic search |
| `.clear` | Clear the screen |
| `exit`, `quit` | Exit the REPL |
| `.exit`, `.quit` | Exit the REPL |
| `Ctrl+C` | Cancel or exit |
| `Ctrl+D` | Exit REPL |
Search Query Features
Natural Language
Use plain English queries:
```
search> how to configure kubernetes
search> best practices for API design
search> troubleshooting docker containers
```
Long Queries
Multi-sentence queries work:
```
search> I need information about setting up continuous integration pipelines with GitHub Actions for a Python project
```
Keywords
Keyword searches also work, e.g. `search> docker networking`.
Result Columns
Results display in a table with:
- Rank: 1-based result ranking
- Key: Primary key value(s) (if dataset has primary key)
- Match: Matched text (first 3 lines, truncated)
- Score: Similarity score (0.0 - 1.0)
- Dataset: Source dataset name
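Scores in embedding-based search are commonly cosine similarities between the query vector and each document vector; Spice's exact scoring function is not specified here, but the general idea can be sketched as:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative vectors (real embeddings have hundreds of dimensions):
query_vec = [0.1, 0.3, 0.5]
doc_vec = [0.2, 0.6, 1.0]  # same direction as the query, scaled by 2

score = cosine_similarity(query_vec, doc_vec)  # close to 1.0
```

Because cosine similarity ignores vector length, a document whose embedding points the same way as the query scores near 1.0 regardless of magnitude.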
Match Truncation
Long text is truncated:
- First 3 lines shown
- Long lines truncated with `...`
- Multiple matches separated by `;`
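The truncation rules above can be sketched as follows; the 3-line limit and `;` separator come from the behavior described, while the column width is an arbitrary choice for illustration:

```python
def truncate_match(text: str, width: int = 35, max_lines: int = 3) -> str:
    """Keep the first `max_lines` lines, truncating long lines with '...'."""
    lines = text.splitlines()[:max_lines]
    return "\n".join(
        line if len(line) <= width else line[: width - 3] + "..."
        for line in lines
    )

def join_matches(matches: list[str]) -> str:
    """Multiple matched columns are separated by ';'."""
    return ";".join(truncate_match(m) for m in matches)
```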
Multiple Datasets
When searching across multiple datasets:
```
+------+-----+-----------------------------+--------+-----------+
| Rank | Key | Match                       | Score  | Dataset   |
+------+-----+-----------------------------+--------+-----------+
| 1    | 42  | Machine learning overview   | 0.9123 | docs      |
| 2    | 15  | ML fundamentals             | 0.8956 | tutorials |
| 3    | 87  | Introduction to ML          | 0.8834 | docs      |
+------+-----+-----------------------------+--------+-----------+
```
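Results arrive as one ranked list across all datasets. If you need per-dataset views client-side, a small sketch over the JSON output shape (sample data taken from the table above):

```python
from collections import defaultdict

# Ranked results spanning two datasets, as in the table above.
results = [
    {"match": "Machine learning overview", "score": 0.9123, "dataset": "docs"},
    {"match": "ML fundamentals", "score": 0.8956, "dataset": "tutorials"},
    {"match": "Introduction to ML", "score": 0.8834, "dataset": "docs"},
]

by_dataset = defaultdict(list)
for r in results:
    by_dataset[r["dataset"]].append(r)

# Best-scoring result per dataset (results arrive ranked, so the first wins).
best = {name: rs[0]["match"] for name, rs in by_dataset.items()}
```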
History
The REPL maintains search history in ~/.spice/search_history.txt:
- Navigate history with the Up/Down arrow keys
- Reverse-search history with Ctrl+R
- History persists across sessions
Environment Variables
| Variable | Description |
|---|---|
| `SPICE_API_KEY` | API key for authentication |
Exit Codes
| Code | Description |
|---|---|
| `0` | Normal exit |
| `1` | Connection error or runtime unavailable |
Troubleshooting
No Results
```
search> example query
No results.
```
Possible causes:
- No datasets with embeddings configured
- Datasets haven’t been indexed yet
- Query doesn’t match any content
Solution:
Check dataset configuration:
```yaml
datasets:
  - from: postgres:documents
    name: docs
    embeddings:
      - column: content
        model: minilm
```
Connection Error
```
Error: Failed to connect to runtime at http://localhost:8090
```

Ensure the runtime is running, e.g. by starting it with `spice run`.
Model Not Found
```
Error: Model 'nonexistent' not found
```

Verify the model is configured in `spicepod.yaml` and loaded.
Slow Searches
The first search may be slow while embeddings are generated; subsequent searches use the cache:

```
search> query
Time: 2.345 seconds. 10 results.           # First search
search> query
Time: 0.012 seconds. 10 results (cached).  # Cached
```
Search API
The search REPL uses the `/v1/search` HTTP API. Use it programmatically:

```bash
curl -X POST http://localhost:8090/v1/search \
  -H "Content-Type: application/json" \
  -d '{
    "text": "machine learning",
    "limit": 10
  }'
```
Response:
```json
{
  "results": [
    {
      "matches": {
        "content": "Machine Learning Overview"
      },
      "score": 0.9123,
      "dataset": "docs",
      "primary_key": {"id": 42}
    }
  ],
  "duration_ms": 82
}
```
See Search API Reference for full documentation.
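From a script, the same `/v1/search` call can be made with the standard library. A sketch, with the endpoint and payload taken from the curl example above (the function is defined but not executed here, since it needs a running runtime):

```python
import json
import urllib.request

def search(text: str, limit: int = 10, endpoint: str = "http://localhost:8090"):
    """POST a search query to /v1/search and return the parsed JSON response."""
    body = json.dumps({"text": text, "limit": limit}).encode("utf-8")
    req = urllib.request.Request(
        f"{endpoint}/v1/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The request body matching the curl example above:
payload = json.dumps({"text": "machine learning", "limit": 10})
```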
Advanced Configuration
Multiple Embedding Columns
Search across multiple columns:
```yaml
embeddings:
  - column: title
    model: minilm
  - column: content
    model: minilm
  - column: summary
    model: minilm
```
Different Models per Column
```yaml
embeddings:
  - column: english_text
    model: minilm-en
  - column: french_text
    model: minilm-fr
```
Hybrid Search
Combine semantic and keyword search (configure in `spicepod.yaml`):

```yaml
embeddings:
  - column: content
    model: minilm
    hybrid:
      enabled: true
      weight: 0.7  # 70% semantic, 30% keyword
```
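The `weight: 0.7` setting blends semantic and keyword relevance. Assuming a straightforward linear interpolation (the exact formula Spice uses is not specified here), the combination can be sketched as:

```python
def hybrid_score(semantic: float, keyword: float, weight: float = 0.7) -> float:
    """Linear blend: `weight` toward the semantic score, the rest toward keyword relevance."""
    return weight * semantic + (1.0 - weight) * keyword

# 70% semantic, 30% keyword, as in the config above:
score = hybrid_score(semantic=0.9, keyword=0.5)
```

A higher weight favors embedding similarity; lowering it gives exact keyword matches more influence on the final ranking.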