Spice provides OpenAI-compatible HTTP APIs that work seamlessly with the OpenAI SDK and other compatible clients. This allows you to use familiar tools and libraries while leveraging Spice’s data-grounded AI capabilities.
## Available Endpoints

Spice exposes two primary OpenAI-compatible endpoints:

- `POST /v1/chat/completions` - Chat completion with streaming support
- `POST /v1/embeddings` - Generate text embeddings

These endpoints accept the same request format and return the same response structure as OpenAI’s API.
## Chat Completions

### Basic Usage

```bash
curl -X POST http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "What is Spice.ai?"}
    ]
  }'
```
### Streaming Responses

Enable streaming for real-time token generation:

```bash
curl -X POST http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Count to 10"}
    ],
    "stream": true
  }'
```
### With OpenAI Python SDK

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8090/v1",
    api_key="not-needed"  # Spice handles authentication
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain vector databases"}
    ],
    temperature=0.7,
    max_tokens=150
)

print(response.choices[0].message.content)
```
### Streaming with Python SDK

```python
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about data"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
### JavaScript/TypeScript SDK

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8090/v1',
  apiKey: 'not-needed'
});

const completion = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'user', content: 'What is RAG?' }
  ]
});

console.log(completion.choices[0].message.content);
```
## Embeddings

### Basic Usage

```bash
curl -X POST http://localhost:8090/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "machine learning algorithms"
  }'
```
### With Python SDK

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8090/v1",
    api_key="not-needed"
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="semantic search with vector databases"
)

embedding = response.data[0].embedding
print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
```
### Batch Embeddings

```python
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[
        "First document about AI",
        "Second document about ML",
        "Third document about data science"
    ]
)

for i, data in enumerate(response.data):
    print(f"Document {i}: {len(data.embedding)} dimensions")
```
### JavaScript/TypeScript SDK

```typescript
const embedding = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'retrieval augmented generation'
});

console.log(embedding.data[0].embedding);
```
## Request Parameters

### Chat Completion Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `model` | string | Model name as defined in `spicepod.yaml` (required) |
| `messages` | array | List of message objects with `role` and `content` (required) |
| `temperature` | number | Sampling temperature (0-2). Higher values = more random. Default: `1` |
| `max_tokens` | integer | Maximum tokens to generate. Default varies by model |
| `max_completion_tokens` | integer | Maximum completion tokens (OpenAI models only) |
| `stream` | boolean | Enable streaming responses. Default: `false` |
| `top_p` | number | Nucleus sampling parameter (0-1). Default: `1` |
| `frequency_penalty` | number | Penalize repeated tokens (-2 to 2). Default: `0` |
| `presence_penalty` | number | Penalize new tokens (-2 to 2). Default: `0` |
| `stop` | string or array | Stop sequences to end generation |
| `tools` | array | Available tools for function calling (MCP integration) |
| `tool_choice` | string or object | Control tool selection behavior |
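To make the table concrete, here is a minimal sketch (plain Python, no SDK required) of assembling a request body that combines several of these parameters. The helper `build_chat_payload` and the model name `my-model` are hypothetical, introduced only for illustration:

```python
# Hypothetical helper that assembles the JSON body for
# POST /v1/chat/completions; "my-model" stands in for a model
# name defined in spicepod.yaml.
def build_chat_payload(model, messages, **params):
    allowed = {
        "temperature", "max_tokens", "max_completion_tokens", "stream",
        "top_p", "frequency_penalty", "presence_penalty", "stop",
        "tools", "tool_choice",
    }
    unknown = set(params) - allowed
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    return {"model": model, "messages": messages, **params}

payload = build_chat_payload(
    "my-model",
    [{"role": "user", "content": "Hello"}],
    temperature=0.2,  # low temperature -> more deterministic output
    top_p=0.9,
    stop=["\n\n"],    # stop generation at the first blank line
)
```

The payload dict can then be sent as the JSON body of a `POST` to `/v1/chat/completions`.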
### Embedding Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `model` | string | Embedding model name (required) |
| `input` | string or array | Text(s) to embed (required) |
| `encoding_format` | string | Return format: `"float"` or `"base64"`. Default: `"float"` |
| `dimensions` | integer | Output dimension (supported models only) |
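With `encoding_format: "base64"`, the embedding arrives as a base64 string rather than a JSON array. A decoding sketch, under the assumption (matching OpenAI's wire format, which compatible servers typically mirror) that the payload is packed little-endian float32:

```python
import base64
import struct

# Decode a base64-encoded embedding into a list of floats.
# Assumption: the payload is packed little-endian float32, as in
# OpenAI's base64 embedding encoding.
def decode_base64_embedding(b64: str) -> list[float]:
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip a small vector to illustrate the format. These values
# are exactly representable as float32, so equality holds.
vec = [0.25, -1.0, 3.5]
encoded = base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode()
print(decode_base64_embedding(encoded))  # [0.25, -1.0, 3.5]
```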
## Structured Outputs

For GPT-4o models from OpenAI, Spice supports structured outputs using a JSON schema:

```python
from pydantic import BaseModel

class QueryResult(BaseModel):
    sql: str
    explanation: str

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Generate SQL to find top customers"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "query_result",
            "schema": QueryResult.model_json_schema()
        }
    }
)
```
## Reasoning Effort

For GPT-5, o3, and o4 models, control reasoning depth with the `reasoning_effort` parameter:

```python
response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "user", "content": "Solve this complex problem..."}
    ],
    reasoning_effort="high"  # low, medium, high
)
```
## Model Gateway

Spice acts as a gateway to multiple providers while presenting a unified OpenAI-compatible interface:

```yaml
models:
  # OpenAI
  - from: openai:gpt-4o-mini
    name: openai-model
    params:
      openai_api_key: ${secrets:openai_key}

  # Anthropic (via gateway)
  - from: anthropic:claude-3-5-sonnet-20241022
    name: claude-model
    params:
      anthropic_api_key: ${secrets:anthropic_key}

  # xAI
  - from: xai:grok-2-1212
    name: grok-model
    params:
      xai_api_key: ${secrets:xai_key}
```
All models are accessible through the same `/v1/chat/completions` endpoint:

```python
# Use any configured model
response = client.chat.completions.create(
    model="claude-model",  # or "grok-model", "openai-model"
    messages=[{"role": "user", "content": "Hello"}]
)
```
## Rate Limiting

Spice includes built-in rate limiting for API providers. Configure usage tiers for OpenAI:

```yaml
models:
  - from: openai:gpt-4o-mini
    name: my-model
    params:
      openai_api_key: ${secrets:key}
      openai_usage_tier: tier3  # free, tier1, tier2, tier3, tier4, tier5
```
**Rate Limits by Tier:**

| Tier | Requests/min | Concurrent Requests |
| --- | --- | --- |
| Free | 100 | 1 |
| Tier 1 | 3,000 | 35 |
| Tier 2 | 5,000 | 60 |
| Tier 3 | 5,000 | 60 |
| Tier 4 | 10,000 | 125 |
| Tier 5 | 10,000 | 125 |
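When a request exceeds the tier's limit, the provider returns a `rate_limit_exceeded` error, and the usual client-side remedy is to retry with exponential backoff and jitter. A minimal sketch of such a delay schedule (the function name and constants are illustrative, not Spice defaults):

```python
import random

# Illustrative backoff schedule: the delay doubles per attempt up to a
# cap, plus up to 10% random jitter so concurrent clients don't retry
# in lockstep. Constants are example values, not Spice defaults.
def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0.0, delay * 0.1)

delays = [round(backoff_delay(n), 2) for n in range(6)]
```

A retry loop would sleep for `backoff_delay(attempt)` seconds after each `rate_limit_exceeded` response before resending the request.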
## Error Handling

Spice returns OpenAI-compatible error responses:

```json
{
  "error": {
    "message": "Model 'invalid-model' not found",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}
```
Common error codes:

- `model_not_found` - Specified model doesn’t exist
- `invalid_request_error` - Malformed request
- `rate_limit_exceeded` - Too many requests
- `authentication_error` - Missing or invalid API key (for upstream providers)
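Because the error envelope matches OpenAI's, a client can map it onto a typed exception. A sketch using only the standard library (the `SpiceAPIError` class is hypothetical, defined here for illustration):

```python
import json

# Hypothetical exception type for the OpenAI-style error envelope
# {"error": {"message": ..., "type": ..., "code": ...}}.
class SpiceAPIError(Exception):
    def __init__(self, message: str, error_type: str, code: str):
        super().__init__(message)
        self.type = error_type
        self.code = code

# Raise a SpiceAPIError if the response body carries an error envelope.
def raise_for_error(body: str) -> None:
    doc = json.loads(body)
    if "error" in doc:
        err = doc["error"]
        raise SpiceAPIError(err.get("message", ""), err.get("type", ""), err.get("code", ""))

body = """{"error": {"message": "Model 'invalid-model' not found",
           "type": "invalid_request_error", "code": "model_not_found"}}"""
try:
    raise_for_error(body)
except SpiceAPIError as e:
    print(e.code)  # model_not_found
```

A caller can then branch on `e.code` - for example, retrying on `rate_limit_exceeded` but failing fast on `model_not_found`.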
## Health Checks

Spice performs health checks on model initialization:

```bash
# Check if the runtime and its models are ready
curl http://localhost:8090/health
```
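For scripted startup, a client can poll `/health` until the runtime is ready. A standard-library sketch, under the assumption that `/health` returns HTTP 200 once initialization completes (confirm against your Spice version):

```python
import time
import urllib.error
import urllib.request

# Poll the Spice health endpoint until it responds, or give up.
# Assumption: /health returns HTTP 200 once all models are initialized.
def wait_until_healthy(base_url: str = "http://localhost:8090",
                       attempts: int = 10, interval: float = 1.0) -> bool:
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # runtime not up yet; retry after a short pause
        time.sleep(interval)
    return False
```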
## Caching

Spice caches both requests and results for improved performance:

```yaml
models:
  - from: openai:gpt-4o-mini
    name: cached-model
    params:
      openai_api_key: ${secrets:key}
    caching:
      enabled: true
      max_size: 128MiB
      ttl: 1h
```
## Next Steps

- **Model Providers** - Configure different model providers
- **MCP Integration** - Add function calling with MCP
- **RAG** - Build RAG applications
- **Embeddings** - Generate embeddings at scale