
Spice provides OpenAI-compatible HTTP APIs that work seamlessly with the OpenAI SDK and other compatible clients. This allows you to use familiar tools and libraries while leveraging Spice’s data-grounded AI capabilities.

Available Endpoints

Spice exposes two primary OpenAI-compatible endpoints:
  • POST /v1/chat/completions - Chat completion with streaming support
  • POST /v1/embeddings - Generate text embeddings
These endpoints accept the same request format and return the same response structure as OpenAI’s API.

Chat Completions

Basic Usage

curl -X POST http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "What is Spice.ai?"}
    ]
  }'

Streaming Responses

Enable streaming for real-time token generation:
curl -X POST http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Count to 10"}
    ],
    "stream": true
  }'

With OpenAI Python SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8090/v1",
    api_key="not-needed"  # Spice handles authentication
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain vector databases"}
    ],
    temperature=0.7,
    max_tokens=150
)

print(response.choices[0].message.content)

Streaming with Python SDK

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about data"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
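Each streamed chunk carries only an incremental delta fragment, and the final chunk's delta may have no content. A minimal sketch of accumulating fragments into the full reply, using stand-in chunk objects in place of a live stream (the `fake_chunk` helper is hypothetical, mimicking the SDK's object shape):

```python
from types import SimpleNamespace

def collect_stream(stream):
    """Join the delta fragments of a chat-completion stream into one string."""
    parts = []
    for chunk in stream:
        # Guard against the final chunk, whose delta content may be None.
        content = chunk.choices[0].delta.content
        if content is not None:
            parts.append(content)
    return "".join(parts)

# Stand-in chunks for illustration (not a live API call):
def fake_chunk(text):
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

reply = collect_stream([fake_chunk("Data "), fake_chunk("flows"), fake_chunk(None)])
print(reply)  # Data flows
```

The same function works unchanged on the real `stream` object above, since the SDK chunks expose the same `choices[0].delta.content` path.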

JavaScript/TypeScript SDK

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8090/v1',
  apiKey: 'not-needed'
});

const completion = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'user', content: 'What is RAG?' }
  ]
});

console.log(completion.choices[0].message.content);

Embeddings

Basic Usage

curl -X POST http://localhost:8090/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "machine learning algorithms"
  }'

With Python SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8090/v1",
    api_key="not-needed"
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="semantic search with vector databases"
)

embedding = response.data[0].embedding
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

Batch Embeddings

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "First document about AI",
        "Second document about ML",
        "Third document about data science"
    ]
)

for i, data in enumerate(response.data):
    print(f"Document {i}: {len(data.embedding)} dimensions")
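Batch embeddings are typically compared pairwise, for example to rank documents against a query. A minimal cosine-similarity sketch over toy vectors standing in for `response.data[i].embedding` (the vectors here are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors standing in for real embeddings:
doc_ai = [0.9, 0.1, 0.0]
doc_ml = [0.8, 0.2, 0.1]
doc_cooking = [0.0, 0.1, 0.9]

print(cosine_similarity(doc_ai, doc_ml) > cosine_similarity(doc_ai, doc_cooking))  # True
```

In practice you would substitute the vectors returned in `response.data`, or hand similarity scoring off to a vector database.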

JavaScript/TypeScript SDK

const embedding = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'retrieval augmented generation'
});

console.log(embedding.data[0].embedding);

Request Parameters

Chat Completion Parameters

Parameter | Type | Description
--------- | ---- | -----------
model | string | Model name as defined in spicepod.yaml (required)
messages | array | List of message objects with role and content (required)
temperature | number | Sampling temperature (0-2). Higher values = more random. Default: 1
max_tokens | integer | Maximum tokens to generate. Default varies by model
max_completion_tokens | integer | Maximum completion tokens (OpenAI models only)
stream | boolean | Enable streaming responses. Default: false
top_p | number | Nucleus sampling parameter (0-1). Default: 1
frequency_penalty | number | Penalize repeated tokens (-2 to 2). Default: 0
presence_penalty | number | Penalize new tokens (-2 to 2). Default: 0
stop | string or array | Stop sequences to end generation
tools | array | Available tools for function calling (MCP integration)
tool_choice | string or object | Control tool selection behavior
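These parameters combine into a single JSON request body. A sketch of one such body (the values are illustrative choices, not defaults):

```python
import json

# A chat-completion request body combining common parameters
# (model name and message content are placeholders):
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Summarize this dataset"}],
    "temperature": 0.2,         # low randomness for factual output
    "max_tokens": 256,
    "top_p": 1,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "stop": ["\n\n"],           # end generation at a blank line
    "stream": False,
}

print(json.dumps(payload, indent=2))
```

This dict is exactly what the SDK serializes for you; with raw HTTP you would POST it to /v1/chat/completions as shown in the curl examples.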

Embedding Parameters

Parameter | Type | Description
--------- | ---- | -----------
model | string | Embedding model name (required)
input | string or array | Text(s) to embed (required)
encoding_format | string | Return format: “float” or “base64”. Default: “float”
dimensions | integer | Output dimension (supported models only)
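With encoding_format set to “base64”, the embedding arrives as a base64 string rather than a JSON array, which is cheaper to transfer. A decoding sketch, assuming the little-endian float32 packing that OpenAI's API uses for this format (the round-trip values below are made up):

```python
import base64
import struct

def decode_base64_embedding(b64):
    """Decode a base64 embedding string into a list of floats,
    assuming little-endian float32 packing (OpenAI's base64 layout)."""
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip check with made-up, float32-exact values:
values = [0.25, -1.5, 3.0]
encoded = base64.b64encode(struct.pack(f"<{len(values)}f", *values)).decode()
print(decode_base64_embedding(encoded))  # [0.25, -1.5, 3.0]
```

In a real response you would decode `response.data[0].embedding` when the request set `encoding_format="base64"`.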

Structured Outputs

For GPT-4o models from OpenAI, Spice supports structured outputs using JSON schema:
from pydantic import BaseModel

class QueryResult(BaseModel):
    sql: str
    explanation: str

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Generate SQL to find top customers"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "query_result",
            "schema": QueryResult.model_json_schema()
        }
    }
)

Reasoning Effort

For GPT-5, o3, and o4 models, control reasoning depth:
response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "user", "content": "Solve this complex problem..."}
    ],
    reasoning_effort="high"  # low, medium, high
)

Model Gateway

Spice acts as a gateway to multiple providers while presenting a unified OpenAI-compatible interface:
models:
  # OpenAI
  - from: openai:gpt-4o-mini
    name: openai-model
    params:
      openai_api_key: ${secrets:openai_key}

  # Anthropic (via gateway)
  - from: anthropic:claude-3-5-sonnet-20241022
    name: claude-model
    params:
      anthropic_api_key: ${secrets:anthropic_key}

  # xAI
  - from: xai:grok-2-1212
    name: grok-model
    params:
      xai_api_key: ${secrets:xai_key}
All models are accessible through the same /v1/chat/completions endpoint:
# Use any configured model
response = client.chat.completions.create(
    model="claude-model",  # or "grok-model", "openai-model"
    messages=[{"role": "user", "content": "Hello"}]
)

Rate Limiting

Spice includes built-in rate limiting for API providers. Configure usage tiers for OpenAI:
models:
  - from: openai:gpt-4o-mini
    name: my-model
    params:
      openai_api_key: ${secrets:key}
      openai_usage_tier: tier3  # free, tier1, tier2, tier3, tier4, tier5
Rate Limits by Tier:
Tier | Requests/min | Concurrent Requests
---- | ------------ | -------------------
Free | 100 | 1
Tier 1 | 3,000 | 35
Tier 2 | 5,000 | 60
Tier 3 | 5,000 | 60
Tier 4 | 10,000 | 125
Tier 5 | 10,000 | 125
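If a client also needs to pace itself against these limits, the table maps directly to a lookup. A minimal sketch (the tier keys mirror the openai_usage_tier values above; the pacing arithmetic is an illustration, not part of Spice):

```python
# Per-tier OpenAI limits from the table above: (requests/min, concurrent requests).
TIER_LIMITS = {
    "free":  (100, 1),
    "tier1": (3_000, 35),
    "tier2": (5_000, 60),
    "tier3": (5_000, 60),
    "tier4": (10_000, 125),
    "tier5": (10_000, 125),
}

def limits_for(tier):
    """Look up (requests/min, concurrency) for a configured usage tier."""
    return TIER_LIMITS[tier]

rpm, concurrency = limits_for("tier3")
min_interval_s = 60 / rpm  # smallest safe gap between requests from one client
print(rpm, concurrency)    # 5000 60
```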

Error Handling

Spice returns OpenAI-compatible error responses:
{
  "error": {
    "message": "Model 'invalid-model' not found",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}
Common error codes:
  • model_not_found - Specified model doesn’t exist
  • invalid_request_error - Malformed request
  • rate_limit_exceeded - Too many requests
  • authentication_error - Missing or invalid API key (for upstream providers)
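Because the error shape matches OpenAI's, clients can branch on the code field. A minimal sketch that parses the example body above (the handling message is illustrative):

```python
import json

def classify_error(body):
    """Extract (code, message) from an OpenAI-style error response body."""
    err = json.loads(body).get("error", {})
    return err.get("code"), err.get("message")

# The example error body from above:
body = '{"error": {"message": "Model \'invalid-model\' not found", "type": "invalid_request_error", "code": "model_not_found"}}'

code, message = classify_error(body)
if code == "model_not_found":
    print(f"Check the model names in spicepod.yaml: {message}")
```

The OpenAI SDKs raise typed exceptions for these responses instead, so with the SDK you would catch those rather than parse the body yourself.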

Health Checks

Spice performs health checks on model initialization:
# Check if model is ready
curl http://localhost:8090/health

Caching

Spice caches both requests and results for improved performance:
models:
  - from: openai:gpt-4o-mini
    name: cached-model
    params:
      openai_api_key: ${secrets:key}
    caching:
      enabled: true
      max_size: 128MiB
      ttl: 1h

Next Steps

  • Model Providers - Configure different model providers
  • MCP Integration - Add function calling with MCP
  • RAG - Build RAG applications
  • Embeddings - Generate embeddings at scale