Spice supports multiple model providers for both LLM inference and embeddings, including hosted APIs and local model serving with hardware acceleration.
## Supported Providers

### LLM Providers

- **OpenAI** - GPT-4, GPT-4o, and GPT-3.5-turbo models
- **Anthropic** - Claude 3 Opus, Sonnet, and Haiku models
- **xAI** - Grok models
- **AWS Bedrock** - Amazon Nova models
- **Azure OpenAI** - Azure-hosted OpenAI models
- **Databricks** - Models via Databricks serving endpoints
- **Google** - Gemini models
- **Perplexity** - Perplexity API models
- **File** - Local GGUF/GGML/SafeTensor models
- **HuggingFace** - Models from the HuggingFace Hub
- **Spice.ai** - Models from the Spice.ai Cloud Platform
### Embedding Providers

- **OpenAI** - text-embedding-3-small, text-embedding-3-large
- **AWS Bedrock** - Amazon Titan, Cohere, and Nova embeddings
- **Azure OpenAI** - Azure-hosted embedding models
- **Google** - Gemini embedding models
- **File** - Local ONNX models
- **HuggingFace** - ONNX-compatible models from the HuggingFace Hub
- **Model2Vec** - Static embeddings (500x faster)
Models are configured in `spicepod.yaml`:

```yaml
version: v1
kind: Spicepod
name: my-app

models:
  - from: <provider>:<model_id>
    name: <local_name>
    params:
      <provider_params>

embeddings:
  - from: <provider>:<model_id>
    name: <local_name>
    params:
      <provider_params>
```
## OpenAI

### Configuration

```yaml
models:
  - from: openai:gpt-4o-mini
    name: chat-model
    params:
      openai_api_key: ${secrets:openai_key}
      openai_org_id: org-xxx # Optional
      openai_project_id: proj-xxx # Optional
      openai_usage_tier: tier3 # Optional: free, tier1-5

embeddings:
  - from: openai:text-embedding-3-small
    name: text-embedding
    params:
      openai_api_key: ${secrets:openai_key}
      openai_usage_tier: tier3
```
### Available Models

**Chat Models:**

- `gpt-4o` - Latest GPT-4 Optimized
- `gpt-4o-mini` - Efficient GPT-4o variant
- `gpt-4-turbo` - GPT-4 Turbo
- `gpt-3.5-turbo` - Fast and cost-effective

**Embedding Models:**

- `text-embedding-3-small` - 1536 dimensions
- `text-embedding-3-large` - 3072 dimensions
- `text-embedding-ada-002` - Legacy model
### Parameters

| Parameter | Description | Required |
|---|---|---|
| `openai_api_key` | OpenAI API key | Yes |
| `openai_api_base` | Custom API endpoint | No |
| `openai_org_id` | Organization ID | No |
| `openai_project_id` | Project ID | No |
| `openai_usage_tier` | Rate limit tier (free, tier1-5) | No |
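Once a model is registered, it can be queried through Spice's OpenAI-compatible HTTP API (the Model Discovery section below uses `http://localhost:8090`). A minimal sketch, assuming that endpoint and the `chat-model` name from the configuration above; the `build_chat_request` helper is illustrative, not part of Spice:

```python
import json
import urllib.request

# Illustrative helper (not part of Spice): build an OpenAI-style chat
# completion request body for the model registered in spicepod.yaml.
def build_chat_request(model: str, prompt: str) -> dict:
    return {
        # Use the spicepod `name`, not the provider model id.
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("chat-model", "Summarize the quarterly numbers.")

# Sending it to a running Spice instance (not executed here):
# req = urllib.request.Request(
#     "http://localhost:8090/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

Because the API is OpenAI-compatible, the same request can also be made through the official OpenAI SDK by pointing its `base_url` at the Spice endpoint.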
## Anthropic

### Configuration

```yaml
models:
  - from: anthropic:claude-3-5-sonnet-20241022
    name: claude
    params:
      anthropic_api_key: ${secrets:anthropic_key}
```
### Available Models

- `claude-3-5-sonnet-20241022` - Most capable Claude model
- `claude-3-5-haiku-20241022` - Fast and efficient
- `claude-3-opus-20240229` - Most capable Claude 3 model
- `claude-3-sonnet-20240229` - Balanced performance
- `claude-3-haiku-20240307` - Fastest responses
### Parameters

| Parameter | Description | Required |
|---|---|---|
| `anthropic_api_key` | Anthropic API key | Yes |
| `anthropic_api_base` | Custom API endpoint | No |
## xAI

### Configuration

```yaml
models:
  - from: xai:grok-2-1212
    name: grok
    params:
      xai_api_key: ${secrets:xai_key}
```

### Available Models

- `grok-2-1212` - Latest Grok model
- `grok-vision-beta` - Vision capabilities
## AWS Bedrock

### Configuration

```yaml
models:
  - from: bedrock:amazon.nova-pro-v1:0
    name: nova-pro
    params:
      aws_region: us-east-1
      aws_access_key_id: ${secrets:aws_access_key}
      aws_secret_access_key: ${secrets:aws_secret}

embeddings:
  - from: bedrock:amazon.titan-embed-text-v2:0
    name: titan-embed
    params:
      aws_region: us-east-1
      aws_access_key_id: ${secrets:aws_access_key}
      aws_secret_access_key: ${secrets:aws_secret}
      normalize: true
      dimensions: 512
```
### Available Models

**Chat Models:**

- `amazon.nova-pro-v1:0` - Amazon Nova Pro
- `amazon.nova-lite-v1:0` - Amazon Nova Lite
- `amazon.nova-micro-v1:0` - Amazon Nova Micro
- `anthropic.claude-3-5-sonnet-20241022-v2:0` - Claude via Bedrock

**Embedding Models:**

- `amazon.titan-embed-text-v2:0` - Titan Text Embeddings v2
- `cohere.embed-english-v3` - Cohere English embeddings
- `cohere.embed-multilingual-v3` - Cohere multilingual embeddings
- `amazon.nova-multimodal-embed-v2:0` - Nova multimodal embeddings
### Parameters

**Common:**

| Parameter | Description | Required |
|---|---|---|
| `aws_region` | AWS region | Yes |
| `aws_access_key_id` | AWS access key | Yes |
| `aws_secret_access_key` | AWS secret key | Yes |

**Titan Embeddings:**

| Parameter | Description | Default |
|---|---|---|
| `normalize` | Normalize embeddings | false |
| `dimensions` | Output dimensions | 512 |

**Cohere Embeddings:**

| Parameter | Description | Default |
|---|---|---|
| `truncate` | Truncation mode | NONE |
| `input_type` | Input type | SEARCH_DOCUMENT |
| `embedding_type` | Embedding type (float, int8) | FLOAT |

**Nova Embeddings:**

| Parameter | Description | Default |
|---|---|---|
| `dimensions` | Output dimensions | 1024 |
| `embedding_purpose` | Purpose (query, storage) | STORAGE |
| `truncation_mode` | Truncation mode | NONE |
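The `normalize` parameter scales each embedding vector to unit length (L2 norm of 1), so that the dot product of two vectors equals their cosine similarity. A minimal sketch of the math:

```python
import math

def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm, as `normalize: true` requests
    for Titan embeddings."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# A 2D toy vector; real Titan embeddings have 512+ dimensions.
v = normalize([3.0, 4.0])
# → [0.6, 0.8], whose L2 norm is 1
```

With normalized vectors, similarity search reduces to a plain dot product, which most vector stores evaluate fastest.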
## Azure OpenAI

### Configuration

```yaml
models:
  - from: azure:gpt-4o-mini
    name: azure-chat
    params:
      azure_api_key: ${secrets:azure_key}
      azure_api_base: https://your-resource.openai.azure.com
      azure_api_version: 2024-02-15-preview
      azure_deployment_name: my-gpt4-deployment

embeddings:
  - from: azure:text-embedding-3-small
    name: azure-embed
    params:
      azure_api_key: ${secrets:azure_key}
      azure_api_base: https://your-resource.openai.azure.com
      azure_api_version: 2024-02-15-preview
      azure_deployment_name: my-embedding-deployment
```
### Parameters

| Parameter | Description | Required |
|---|---|---|
| `azure_api_key` | Azure OpenAI API key | Yes |
| `azure_api_base` | Azure endpoint URL | Yes |
| `azure_api_version` | API version | Yes |
| `azure_deployment_name` | Deployment name | Yes |
| `azure_entra_token` | Azure AD token (alternative auth) | No |
## Local Models (File)

### Configuration

```yaml
models:
  - from: file:models/Llama-3.2-1B-Instruct-Q4_K_M.gguf
    name: local-llm

embeddings:
  - from: file:models/all-MiniLM-L6-v2/
    name: local-embed
```

**LLM Formats:**

- **GGUF** - Quantized llama.cpp format (recommended)
- **GGML** - Legacy llama.cpp format
- **SafeTensor** - Hugging Face SafeTensor format

**Embedding Formats:**

- **ONNX** - Open Neural Network Exchange format
### Hardware Acceleration

Spice automatically detects and utilizes available hardware:

- **NVIDIA GPUs** - CUDA acceleration for GGUF/GGML models
- **Apple Silicon** - Metal acceleration on M1/M2/M3 chips
- **CPU** - Optimized CPU inference with SIMD
### Example: Local Llama Model

```yaml
models:
  - from: file:./models/Llama-3.2-3B-Instruct-Q4_K_M.gguf
    name: llama-local
```

```bash
# Download a GGUF model
wget https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf \
  -P models/

# Start Spice
spice run
```
## HuggingFace

### Configuration

```yaml
models:
  - from: huggingface:Qwen/Qwen2.5-0.5B-Instruct
    name: qwen
    params:
      huggingface_token: ${secrets:hf_token} # Optional

embeddings:
  - from: huggingface:sentence-transformers/all-MiniLM-L6-v2
    name: minilm
    params:
      huggingface_token: ${secrets:hf_token} # Optional
```

Models are automatically downloaded from the HuggingFace Hub on first use.

### Parameters

| Parameter | Description | Required |
|---|---|---|
| `huggingface_token` | HuggingFace API token | No (for public models) |
## Model2Vec

Model2Vec provides static embeddings that are 500x faster than transformer models:

### Configuration

```yaml
embeddings:
  - from: model2vec:minishlab/potion-base-8M
    name: fast-embed
    params:
      huggingface_token: ${secrets:hf_token} # Optional
      normalize: true
      parallelism: 4
      embed_max_token_length: 512
      embed_batch_size: 1024
```

### Available Models

- `minishlab/potion-base-8M` - 256 dimensions
- `minishlab/potion-multilingual-128M` - Multilingual support

### Parameters

| Parameter | Description | Default |
|---|---|---|
| `normalize` | Normalize embeddings | true |
| `parallelism` | Number of threads | CPU cores |
| `embed_max_token_length` | Max token length | 512 |
| `embed_batch_size` | Batch size | 1024 |
Model2Vec is ideal for:

- High-throughput embedding pipelines
- Real-time search applications
- Resource-constrained environments
- CPU-only deployments
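Model2Vec's speed comes from replacing the transformer forward pass with a static lookup: each token maps to a precomputed vector, and the text embedding is the mean of its token vectors. A toy sketch of the idea (the vectors below are made up, not the real potion-base-8M weights):

```python
# Toy static-embedding table illustrating the Model2Vec approach:
# no neural network at inference time, just lookups and mean pooling.
TOKEN_VECTORS = {
    "fast": [1.0, 0.0],
    "search": [0.0, 1.0],
    "engine": [1.0, 1.0],
}

def embed(text: str) -> list[float]:
    """Mean-pool the static vectors of the known tokens in `text`."""
    vecs = [TOKEN_VECTORS[t] for t in text.split() if t in TOKEN_VECTORS]
    dims = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dims)]

e = embed("fast search engine")
# Mean of [1,0], [0,1], [1,1] → [2/3, 2/3]
```

Because every input reduces to table lookups and an average, throughput is bounded by memory bandwidth rather than compute, which is why CPU-only deployments work well.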
## Google (Gemini)

### Configuration

```yaml
models:
  - from: google:gemini-1.5-pro
    name: gemini
    params:
      google_api_key: ${secrets:google_key}

embeddings:
  - from: google:text-embedding-004
    name: gemini-embed
    params:
      google_api_key: ${secrets:google_key}
```
## Databricks

### Configuration

```yaml
models:
  - from: databricks:databricks-meta-llama-3-1-70b-instruct
    name: llama-databricks
    params:
      databricks_host: https://your-workspace.databricks.com
      databricks_token: ${secrets:databricks_token}
```
## Rate Limiting

Configure rate limits to avoid throttling:

```yaml
models:
  - from: openai:gpt-4o-mini
    name: rate-limited
    params:
      openai_api_key: ${secrets:key}
      openai_usage_tier: tier3 # Automatic rate limiting
```

Built-in rate controllers manage:

- Requests per minute
- Concurrent requests
- Exponential backoff on errors
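Exponential backoff doubles the wait after each consecutive failure, usually up to a cap. A sketch of the retry-delay schedule (illustrative, not Spice's internal controller; the base, factor, and cap values are assumptions):

```python
def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   max_delay: float = 30.0, attempts: int = 6) -> list[float]:
    """Delay (seconds) before retry n: base * factor**n, capped at max_delay."""
    return [min(base * factor ** n, max_delay) for n in range(attempts)]

delays = backoff_delays()
# → [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

Production retry loops typically add random jitter to each delay so that many clients hitting the same rate limit do not retry in lockstep.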
## Caching

Enable response caching for improved performance:

```yaml
models:
  - from: openai:gpt-4o-mini
    name: cached-model
    params:
      openai_api_key: ${secrets:key}
    caching:
      enabled: true
      max_size: 128MiB
      ttl: 1h
```
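The `ttl` setting bounds how long a cached response may be served before it is fetched fresh. The eviction idea can be sketched as (an illustrative TTL cache, not Spice's implementation):

```python
import time

class TtlCache:
    """Minimal TTL cache illustrating the `ttl` setting above."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        # key -> (expiry timestamp, cached value)
        self.store: dict[str, tuple[float, str]] = {}

    def put(self, key: str, value: str) -> None:
        self.store[key] = (time.monotonic() + self.ttl, value)

    def get(self, key: str):
        entry = self.store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self.store[key]  # expired: evict and report a miss
            return None
        return value

cache = TtlCache(ttl_seconds=3600)  # ttl: 1h
cache.put("prompt-hash", "cached completion")
```

A real response cache would also enforce the `max_size` bound, typically by evicting least-recently-used entries once the limit is reached.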
## Health Checks

Spice performs automatic health checks on model initialization:

- Tests model connectivity
- Validates credentials
- Ensures model availability

Health check logs:

```
2026-03-03T10:15:30Z INFO Model 'gpt-4o-mini' health check passed
2026-03-03T10:15:31Z ERROR Model 'invalid-model' health check failed: model not found
```
## Model Discovery

Spice can list available models from providers:

```bash
# OpenAI models
curl http://localhost:8090/v1/models
```

Response:

```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-4o-mini",
      "object": "model",
      "owned_by": "openai"
    }
  ]
}
```
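The response follows OpenAI's model-list schema, so extracting the model ids is a one-liner (shown here against the sample response above):

```python
import json

# The sample /v1/models response from above.
response_body = """
{
  "object": "list",
  "data": [
    {"id": "gpt-4o-mini", "object": "model", "owned_by": "openai"}
  ]
}
"""

# Collect the id of every model the runtime reports.
models = [m["id"] for m in json.loads(response_body)["data"]]
# → ["gpt-4o-mini"]
```

Checking whether a spicepod model name appears in this list is a quick way to confirm it registered correctly.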
## Best Practices

### Credential Management

Use Spice's secret management:

```yaml
models:
  - from: openai:gpt-4o-mini
    name: secure-model
    params:
      openai_api_key: ${secrets:SPICE_SECRET_OPENAI_API_KEY}
```
### Provider Selection

- **OpenAI** - Best for general-purpose tasks and structured outputs
- **Anthropic** - Longer context windows, strong reasoning
- **Local Models** - Privacy, offline operation, cost control
- **Model2Vec** - High-throughput embeddings, CPU efficiency
- **AWS Bedrock** - Enterprise compliance, AWS integration

### Performance

- **Use appropriate model sizes** - Smaller models for simple tasks
- **Enable caching** - Reduce redundant API calls
- **Configure rate limits** - Avoid throttling
- **Local models for high volume** - CUDA/Metal acceleration
- **Model2Vec for embeddings** - 500x faster than transformers
## Troubleshooting

### Model Not Found

```yaml
# Ensure model name matches spicepod.yaml
models:
  - from: openai:gpt-4o-mini
    name: chat-model # Use this name in API calls
```

### Authentication Errors

```bash
# Check secret configuration
spice secrets list

# Set secret
spice secrets set SPICE_SECRET_OPENAI_API_KEY your-key
```

### Rate Limiting

```yaml
# Configure usage tier
params:
  openai_usage_tier: tier3 # Adjust based on your OpenAI tier
```
## Next Steps

- **OpenAI Compatibility** - Use models with the OpenAI SDK
- **Embeddings** - Generate embeddings at scale
- **MCP Integration** - Add function calling