Spice provides OpenAI-compatible HTTP APIs that work seamlessly with the OpenAI SDK and other compatible clients. This allows you to use familiar tools and libraries while leveraging Spice’s data-grounded AI capabilities.
## Available Endpoints

Spice exposes two primary OpenAI-compatible endpoints:

- `POST /v1/chat/completions` - Chat completion with streaming support
- `POST /v1/embeddings` - Generate text embeddings

These endpoints accept the same request format and return the same response structure as OpenAI’s API.
## Chat Completions

### Basic Usage

```bash
curl -X POST http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "What is Spice.ai?"}
    ]
  }'
```
### Streaming Responses

Enable streaming for real-time token generation:

```bash
curl -X POST http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Count to 10"}
    ],
    "stream": true
  }'
```
### With OpenAI Python SDK

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8090/v1",
    api_key="not-needed"  # Spice handles authentication
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain vector databases"}
    ],
    temperature=0.7,
    max_tokens=150
)

print(response.choices[0].message.content)
```
### Streaming with Python SDK

```python
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about data"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
### JavaScript/TypeScript SDK

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8090/v1',
  apiKey: 'not-needed'
});

const completion = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'user', content: 'What is RAG?' }
  ]
});

console.log(completion.choices[0].message.content);
```
## Embeddings

### Basic Usage

```bash
curl -X POST http://localhost:8090/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "machine learning algorithms"
  }'
```
### With Python SDK

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8090/v1",
    api_key="not-needed"
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="semantic search with vector databases"
)

embedding = response.data[0].embedding
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
```
### Batch Embeddings

```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "First document about AI",
        "Second document about ML",
        "Third document about data science"
    ]
)

for i, data in enumerate(response.data):
    print(f"Document {i}: {len(data.embedding)} dimensions")
```
### JavaScript/TypeScript SDK

```typescript
const embedding = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'retrieval augmented generation'
});

console.log(embedding.data[0].embedding);
```
## Request Parameters

### Chat Completion Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `model` | string | Model name as defined in `spicepod.yaml` (required) |
| `messages` | array | List of message objects with `role` and `content` (required) |
| `temperature` | number | Sampling temperature (0-2). Higher values = more random. Default: `1` |
| `max_tokens` | integer | Maximum tokens to generate. Default varies by model |
| `max_completion_tokens` | integer | Maximum completion tokens (OpenAI models only) |
| `stream` | boolean | Enable streaming responses. Default: `false` |
| `top_p` | number | Nucleus sampling parameter (0-1). Default: `1` |
| `frequency_penalty` | number | Penalize repeated tokens (-2 to 2). Default: `0` |
| `presence_penalty` | number | Penalize new tokens (-2 to 2). Default: `0` |
| `stop` | string or array | Stop sequences to end generation |
| `tools` | array | Available tools for function calling (MCP integration) |
| `tool_choice` | string or object | Control tool selection behavior |
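To make the table concrete, here is a minimal sketch (plain Python, no SDK required) of assembling a request body that combines several of these parameters. The helper `build_chat_payload` and the model name `my-model` are hypothetical, introduced only for illustration:

```python
# Hypothetical helper that assembles the JSON body for
# POST /v1/chat/completions; "my-model" stands in for a model
# name defined in spicepod.yaml.
def build_chat_payload(model, messages, **params):
    allowed = {
        "temperature", "max_tokens", "max_completion_tokens", "stream",
        "top_p", "frequency_penalty", "presence_penalty", "stop",
        "tools", "tool_choice",
    }
    unknown = set(params) - allowed
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    return {"model": model, "messages": messages, **params}

payload = build_chat_payload(
    "my-model",
    [{"role": "user", "content": "Hello"}],
    temperature=0.2,  # low temperature -> more deterministic output
    top_p=0.9,
    stop=["\n\n"],    # stop generation at the first blank line
)
```

The payload dict can then be sent as the JSON body of a `POST` to `/v1/chat/completions`.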
### Embedding Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `model` | string | Embedding model name (required) |
| `input` | string or array | Text(s) to embed (required) |
| `encoding_format` | string | Return format: `"float"` or `"base64"`. Default: `"float"` |
| `dimensions` | integer | Output dimension (supported models only) |
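With `encoding_format: "base64"`, the embedding arrives as a base64 string rather than a JSON array. A decoding sketch, under the assumption (matching OpenAI's wire format, which compatible servers typically mirror) that the payload is packed little-endian float32:

```python
import base64
import struct

# Decode a base64-encoded embedding into a list of floats.
# Assumption: the payload is packed little-endian float32, as in
# OpenAI's base64 embedding encoding.
def decode_base64_embedding(b64: str) -> list[float]:
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip a small vector to illustrate the format. These values
# are exactly representable as float32, so equality holds.
vec = [0.25, -1.0, 3.5]
encoded = base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode()
print(decode_base64_embedding(encoded))  # [0.25, -1.0, 3.5]
```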
## Structured Outputs

For GPT-4o models from OpenAI, Spice supports structured outputs using a JSON schema:

```python
from pydantic import BaseModel

class QueryResult(BaseModel):
    sql: str
    explanation: str

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Generate SQL to find top customers"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "query_result",
            "schema": QueryResult.model_json_schema()
        }
    }
)
```
## Reasoning Effort

For GPT-5, o3, and o4 models, control reasoning depth with the `reasoning_effort` parameter:

```python
response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "user", "content": "Solve this complex problem..."}
    ],
    reasoning_effort="high"  # low, medium, high
)
```
## Model Gateway

Spice acts as a gateway to multiple providers while presenting a unified OpenAI-compatible interface:

```yaml
models:
  # OpenAI
  - from: openai:gpt-4o-mini
    name: openai-model
    params:
      openai_api_key: ${secrets:openai_key}

  # Anthropic (via gateway)
  - from: anthropic:claude-3-5-sonnet-20241022
    name: claude-model
    params:
      anthropic_api_key: ${secrets:anthropic_key}

  # xAI
  - from: xai:grok-2-1212
    name: grok-model
    params:
      xai_api_key: ${secrets:xai_key}
```
All models are accessible through the same `/v1/chat/completions` endpoint:

```python
# Use any configured model
response = client.chat.completions.create(
    model="claude-model",  # or "grok-model", "openai-model"
    messages=[{"role": "user", "content": "Hello"}]
)
```
## Rate Limiting

Spice includes built-in rate limiting for API providers. Configure usage tiers for OpenAI:

```yaml
models:
  - from: openai:gpt-4o-mini
    name: my-model
    params:
      openai_api_key: ${secrets:key}
      openai_usage_tier: tier3  # free, tier1, tier2, tier3, tier4, tier5
```
**Rate Limits by Tier:**

| Tier | Requests/min | Concurrent Requests |
| --- | --- | --- |
| Free | 100 | 1 |
| Tier 1 | 3,000 | 35 |
| Tier 2 | 5,000 | 60 |
| Tier 3 | 5,000 | 60 |
| Tier 4 | 10,000 | 125 |
| Tier 5 | 10,000 | 125 |
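When a request exceeds the tier's limit, the provider returns a `rate_limit_exceeded` error, and the usual client-side remedy is to retry with exponential backoff and jitter. A minimal sketch of such a delay schedule (the function name and constants are illustrative, not Spice defaults):

```python
import random

# Illustrative backoff schedule: the delay doubles per attempt up to a
# cap, plus up to 10% random jitter so concurrent clients don't retry
# in lockstep. Constants are example values, not Spice defaults.
def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0.0, delay * 0.1)

delays = [round(backoff_delay(n), 2) for n in range(6)]
```

A retry loop would sleep for `backoff_delay(attempt)` seconds after each `rate_limit_exceeded` response before resending the request.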
## Error Handling

Spice returns OpenAI-compatible error responses:

```json
{
  "error": {
    "message": "Model 'invalid-model' not found",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}
```
Common error codes:

- `model_not_found` - Specified model doesn’t exist
- `invalid_request_error` - Malformed request
- `rate_limit_exceeded` - Too many requests
- `authentication_error` - Missing or invalid API key (for upstream providers)
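Because the error envelope matches OpenAI's, a client can map it onto a typed exception. A sketch using only the standard library (the `SpiceAPIError` class is hypothetical, defined here for illustration):

```python
import json

# Hypothetical exception type for the OpenAI-style error envelope
# {"error": {"message": ..., "type": ..., "code": ...}}.
class SpiceAPIError(Exception):
    def __init__(self, message: str, error_type: str, code: str):
        super().__init__(message)
        self.type = error_type
        self.code = code

# Raise a SpiceAPIError if the response body carries an error envelope.
def raise_for_error(body: str) -> None:
    doc = json.loads(body)
    if "error" in doc:
        err = doc["error"]
        raise SpiceAPIError(err.get("message", ""), err.get("type", ""), err.get("code", ""))

body = """{"error": {"message": "Model 'invalid-model' not found",
           "type": "invalid_request_error", "code": "model_not_found"}}"""
try:
    raise_for_error(body)
except SpiceAPIError as e:
    print(e.code)  # model_not_found
```

A caller can then branch on `e.code` - for example, retrying on `rate_limit_exceeded` but failing fast on `model_not_found`.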
## Health Checks

Spice performs health checks on model initialization:

```bash
# Check if the runtime and its models are ready
curl http://localhost:8090/health
```
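For scripted startup, a client can poll `/health` until the runtime is ready. A standard-library sketch, under the assumption that `/health` returns HTTP 200 once initialization completes (confirm against your Spice version):

```python
import time
import urllib.error
import urllib.request

# Poll the Spice health endpoint until it responds, or give up.
# Assumption: /health returns HTTP 200 once all models are initialized.
def wait_until_healthy(base_url: str = "http://localhost:8090",
                       attempts: int = 10, interval: float = 1.0) -> bool:
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # runtime not up yet; retry after a short pause
        time.sleep(interval)
    return False
```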
## Caching

Spice caches both requests and results for improved performance:

```yaml
models:
  - from: openai:gpt-4o-mini
    name: cached-model
    params:
      openai_api_key: ${secrets:key}
    caching:
      enabled: true
      max_size: 128MiB
      ttl: 1h
```
## Next Steps

- **Model Providers** - Configure different model providers
- **MCP Integration** - Add function calling with MCP
- **RAG** - Build RAG applications
- **Embeddings** - Generate embeddings at scale