Spice provides an OpenAI-compatible Chat Completions API at /v1/chat/completions, allowing you to use the OpenAI SDKs and other compatible client libraries to interact with your configured language models.
Endpoint
POST /v1/chat/completions
Authentication
Include your Spice API key in the request headers:
Authorization: Bearer <your-api-key>
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The name of the language model to use (e.g., gpt-4o, gpt-4o-mini) |
| messages | array | Yes | Array of message objects with role and content |
| stream | boolean | No | Whether to stream the response (default: false) |
| temperature | number | No | Sampling temperature between 0 and 2 |
| max_tokens | integer | No | Maximum number of tokens to generate |
| top_p | number | No | Nucleus sampling parameter |
| frequency_penalty | number | No | Penalty for token frequency |
| presence_penalty | number | No | Penalty for token presence |
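For example, a request that exercises the optional sampling parameters might look like the following. This is a sketch using the OpenAI Python SDK, assuming a local Spice instance on port 8090 with a gpt-4o-mini model configured (as in the examples further below); the prompt content is illustrative.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8090/v1", api_key="<your-api-key>")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Suggest three names for a data pipeline."}],
    temperature=0.7,        # sampling temperature between 0 and 2
    top_p=0.9,              # nucleus sampling
    max_tokens=200,         # cap on generated tokens
    frequency_penalty=0.5,  # discourage verbatim repetition
    presence_penalty=0.2,   # encourage introducing new topics
)

print(response.choices[0].message.content)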
Message Roles
system - System instructions for the model
user - User messages
assistant - Assistant responses
developer - Developer-level instructions (supported by some models)
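For example, a multi-turn conversation combines these roles in a single messages array. The snippet below is a sketch shown as a Python list to match the SDK examples later in this page; the dataset name and message content are illustrative, not part of any default configuration.

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What datasets are available?"},
    {"role": "assistant", "content": "The taxi_trips dataset is configured."},
    {"role": "user", "content": "Summarize it for me."}
]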
Non-Streaming Response
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o-mini",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21,
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  }
}
Streaming Response
When stream: true, the response is sent as Server-Sent Events (SSE):
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: [DONE]
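If you are not using an SDK, the stream can be consumed directly over HTTP. The following is a minimal sketch using Python's requests library, assuming a local Spice instance on port 8090 with a gpt-4o model configured; the prompt content is illustrative.

import json
import requests

response = requests.post(
    "http://localhost:8090/v1/chat/completions",
    headers={"Authorization": "Bearer <your-api-key>"},
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Say hello."}],
        "stream": True,
    },
    stream=True,  # keep the connection open and read the SSE body incrementally
)

for raw in response.iter_lines():
    if not raw or not raw.startswith(b"data: "):
        continue  # skip keep-alive blank lines and non-data fields
    payload = raw[len(b"data: "):].decode("utf-8")
    if payload == "[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)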
Advanced: Completion Progress Tracking
Spice supports tracking completion progress through an optional header:
x-spiceai-completion-progress: enabled
When enabled with streaming, this includes intermediate progress events in the SSE stream alongside the completion chunks.
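With the OpenAI Python SDK, per-request headers can be attached via extra_headers. The sketch below shows opting in to progress events; the exact progress event payloads you receive depend on your Spice version and configuration, and the prompt content is illustrative.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8090/v1",
    api_key="<your-api-key>"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize last month's sales."}],
    stream=True,
    # Opt in to intermediate progress events alongside the completion chunks
    extra_headers={"x-spiceai-completion-progress": "enabled"},
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")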
Examples
cURL
curl -X POST http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain quantum computing in simple terms."
      }
    ],
    "stream": false,
    "temperature": 0.7,
    "max_tokens": 500
  }'
OpenAI Python SDK
from openai import OpenAI

# Point the OpenAI client to your Spice instance
client = OpenAI(
    base_url="http://localhost:8090/v1",
    api_key="<your-api-key>"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7,
    max_tokens=100
)

print(response.choices[0].message.content)
Streaming Example (Python)
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8090/v1",
    api_key="<your-api-key>"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Write a haiku about data."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
OpenAI Node.js SDK
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8090/v1',
  apiKey: '<your-api-key>'
});

const completion = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Tell me a joke about databases.' }
  ],
  temperature: 0.8,
  max_tokens: 150
});

console.log(completion.choices[0].message.content);
Streaming Example (Node.js)
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8090/v1',
  apiKey: '<your-api-key>'
});

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Count from 1 to 5.' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
Error Responses
Model Not Found (404)
{
  "error": "model 'gpt-5' not found"
}
API Error (4xx/5xx)
{
  "message": "Invalid API key provided",
  "type": "invalid_request_error",
  "param": null,
  "code": "invalid_api_key"
}
Status codes follow OpenAI conventions:
400 - Invalid request parameters
401 - Invalid API key
402 - Insufficient quota
404 - Model not found
429 - Rate limit exceeded
500 - Internal server error
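With the OpenAI Python SDK, these statuses surface as typed exceptions. The following is a minimal sketch of handling the common cases; the prompt content is illustrative.

import openai

client = openai.OpenAI(base_url="http://localhost:8090/v1", api_key="<your-api-key>")

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)
except openai.AuthenticationError as err:  # 401 - invalid API key
    print(f"Authentication failed: {err}")
except openai.NotFoundError as err:        # 404 - model not found
    print(f"Model not found: {err}")
except openai.RateLimitError as err:       # 429 - rate limit exceeded
    print(f"Rate limited: {err}")
except openai.APIStatusError as err:       # any other 4xx/5xx
    print(f"API error {err.status_code}: {err}")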
Supported Models
The available models depend on your Spice configuration. Common models include:
- OpenAI models: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
- Anthropic models: claude-3-5-sonnet, claude-3-opus, claude-3-haiku
- Open-source models: configure custom models in your Spicepod
List available models using the Models API.
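Assuming the Models API exposes the OpenAI-compatible GET /v1/models endpoint (see the Models API documentation for details), the configured models can be listed with the same Python client; a sketch:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8090/v1", api_key="<your-api-key>")

# Print the IDs of the models currently configured in the Spicepod
for model in client.models.list():
    print(model.id)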