
Embeddings convert text into high-dimensional vectors that capture semantic meaning, enabling semantic search, similarity matching, and retrieval-augmented generation (RAG) workflows.

Overview

Spice provides embeddings through:
  1. OpenAI-Compatible API - POST /v1/embeddings endpoint
  2. Automatic Dataset Embedding - Embed columns during data ingestion
  3. Multiple Providers - OpenAI, Bedrock, HuggingFace, Model2Vec, local models
  4. Hardware Acceleration - CUDA/Metal for local models
  5. Caching - Request and result caching for performance

Configuration

Define embedding models in spicepod.yaml:
version: v1
kind: Spicepod
name: my-app

embeddings:
  - from: openai:text-embedding-3-small
    name: text-embedding
    params:
      openai_api_key: ${secrets:openai_key}

  - from: model2vec:minishlab/potion-base-8M
    name: fast-embed
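
After the spicepod loads, you can confirm both models are registered by listing them through the OpenAI-compatible models endpoint (a quick check; this assumes Spice includes embedding models in its GET /v1/models listing):
from openai import OpenAI

# Point the OpenAI client at the local Spice runtime; no real key is required.
client = OpenAI(base_url="http://localhost:8090/v1", api_key="not-needed")

# Both names from the spicepod ('text-embedding', 'fast-embed') should appear.
for model in client.models.list():
    print(model.id)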

Embedding API

Single Text

curl -X POST http://localhost:8090/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding",
    "input": "machine learning algorithms"
  }'
Response:
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0234, -0.0145, 0.0678, ...]
    }
  ],
  "model": "text-embedding",
  "usage": {
    "prompt_tokens": 4,
    "total_tokens": 4
  }
}

Batch Embeddings

curl -X POST http://localhost:8090/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding",
    "input": [
      "First document about AI",
      "Second document about ML",
      "Third document about data"
    ]
  }'

Python SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8090/v1",
    api_key="not-needed"
)

response = client.embeddings.create(
    model="text-embedding",
    input="semantic search with vector databases"
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")

Batch with Python

texts = [
    "Retrieval augmented generation",
    "Vector similarity search",
    "Semantic knowledge graphs"
]

response = client.embeddings.create(
    model="text-embedding",
    input=texts
)

for i, data in enumerate(response.data):
    print(f"Text {i}: {len(data.embedding)} dimensions")

Automatic Dataset Embedding

Spice can automatically embed dataset columns during ingestion:
datasets:
  - from: postgres:documents
    name: documents
    acceleration:
      enabled: true
      engine: duckdb
    columns:
      - name: content
        embeddings:
          - from: text-embedding
            row_id:
              - id
This configuration:
  1. Embeds the content column using the text-embedding model
  2. Uses id as the row identifier
  3. Stores embeddings for vector search
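
Once the embeddings are stored, the column is queryable with vector_search over Spice's HTTP SQL interface (a sketch assuming the runtime exposes POST /v1/sql accepting a raw SQL body and returning JSON rows):
import requests

# Assumed endpoint: POST /v1/sql with a raw SQL body, returning JSON rows.
sql = "SELECT id, content FROM vector_search(documents, 'machine learning', 10)"
resp = requests.post("http://localhost:8090/v1/sql", data=sql)
resp.raise_for_status()

for row in resp.json():
    print(row["id"], row["content"][:60])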

Full-Text Search Integration

Combine embeddings with full-text search:
columns:
  - name: description
    embeddings:
      - from: text-embedding
        row_id:
          - id
    full_text_search:
      enabled: true
      row_ids:
        - id
Now both vector and full-text search are available:
-- Vector search
SELECT * FROM vector_search(documents, 'machine learning', 10);

-- Full-text search
SELECT * FROM text_search(documents, 'neural networks', 10);

-- Hybrid search with RRF
SELECT * FROM (
  SELECT * FROM vector_search(documents, 'AI algorithms', rank_weight => 1.5)
  UNION ALL
  SELECT * FROM text_search(documents, 'deep learning', rank_weight => 1.2)
) ORDER BY _score DESC LIMIT 10;
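
The hybrid query relies on reciprocal rank fusion (RRF), which scores each row by its rank in every result list rather than by raw scores, so rows ranked well by both searches rise to the top. A minimal standalone sketch of the scoring (not Spice's internal implementation; k=60 is the conventional smoothing constant):
def rrf_merge(vector_hits, text_hits, k=60, vector_weight=1.5, text_weight=1.2):
    """Fuse two ranked lists of row ids via weighted reciprocal rank fusion."""
    scores = {}
    for weight, hits in ((vector_weight, vector_hits), (text_weight, text_hits)):
        for rank, row_id in enumerate(hits, start=1):
            # Each list contributes weight / (k + rank) for every row it ranks.
            scores[row_id] = scores.get(row_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# 'b' and 'a' appear in both lists, so they outrank the single-list hits.
print(rrf_merge(["a", "b", "c"], ["b", "d", "a"]))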

Embedding Providers

OpenAI

embeddings:
  - from: openai:text-embedding-3-small
    name: openai-small
    params:
      openai_api_key: ${secrets:openai_key}
      openai_usage_tier: tier3
Available Models:
  • text-embedding-3-small - 1536 dimensions, efficient
  • text-embedding-3-large - 3072 dimensions, highest quality
  • text-embedding-ada-002 - Legacy model

AWS Bedrock

Titan Embeddings

embeddings:
  - from: bedrock:amazon.titan-embed-text-v2:0
    name: titan-embed
    params:
      aws_region: us-east-1
      aws_access_key_id: ${secrets:aws_access_key}
      aws_secret_access_key: ${secrets:aws_secret}
      normalize: true
      dimensions: 512
Parameters:
  • normalize - Normalize vectors (default: false)
  • dimensions - Output dimensions: 256, 512, or 1024

Cohere Embeddings

embeddings:
  - from: bedrock:cohere.embed-english-v3
    name: cohere-embed
    params:
      aws_region: us-east-1
      aws_access_key_id: ${secrets:aws_access_key}
      aws_secret_access_key: ${secrets:aws_secret}
      truncate: END
      input_type: SEARCH_DOCUMENT
      embedding_type: FLOAT
Parameters:
  • truncate - NONE, START, END (default: NONE)
  • input_type - SEARCH_DOCUMENT, SEARCH_QUERY, CLASSIFICATION, CLUSTERING
  • embedding_type - FLOAT, INT8

Nova Multimodal Embeddings

embeddings:
  - from: bedrock:amazon.nova-multimodal-embed-v2:0
    name: nova-embed
    params:
      aws_region: us-east-1
      aws_access_key_id: ${secrets:aws_access_key}
      aws_secret_access_key: ${secrets:aws_secret}
      dimensions: 1024
      embedding_purpose: STORAGE
      truncation_mode: NONE
Parameters:
  • dimensions - Output dimensions: 256, 384, 1024
  • embedding_purpose - STORAGE or QUERY
  • truncation_mode - NONE or TRUNCATE

Model2Vec (500x Faster)

Model2Vec provides static embeddings that trade a small amount of accuracy for dramatically higher throughput:
embeddings:
  - from: model2vec:minishlab/potion-base-8M
    name: fast-embed
    params:
      normalize: true
      parallelism: 4
      embed_max_token_length: 512
      embed_batch_size: 1024
Available Models:
  • minishlab/potion-base-8M - 256 dimensions, English
  • minishlab/potion-multilingual-128M - 256 dimensions, multilingual
Performance Comparison:
Model Type          Embeddings/sec   Use Case
Transformer (GPU)   ~1,000           High accuracy
Transformer (CPU)   ~100             General use
Model2Vec           ~50,000          High throughput
When to Use Model2Vec:
  • High-volume embedding pipelines
  • Real-time search applications
  • CPU-only deployments
  • Cost-sensitive workloads
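
To verify the throughput gap on your own hardware, time both configured models through the same endpoint (a rough benchmark sketch using the model names from the configuration section above):
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8090/v1", api_key="not-needed")
texts = [f"sample document {i} for throughput testing" for i in range(256)]

for model in ("text-embedding", "fast-embed"):
    start = time.perf_counter()
    client.embeddings.create(model=model, input=texts)
    elapsed = time.perf_counter() - start
    print(f"{model}: {len(texts) / elapsed:.0f} embeddings/sec")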

HuggingFace

embeddings:
  - from: huggingface:sentence-transformers/all-MiniLM-L6-v2
    name: minilm
    params:
      huggingface_token: ${secrets:hf_token}  # Optional
Models are automatically downloaded from HuggingFace Hub.
Popular Models:
  • sentence-transformers/all-MiniLM-L6-v2 - Fast, 384 dimensions
  • sentence-transformers/all-mpnet-base-v2 - High quality, 768 dimensions
  • BAAI/bge-small-en-v1.5 - Efficient, 384 dimensions

Local ONNX Models

embeddings:
  - from: file:models/all-MiniLM-L6-v2/
    name: local-embed
Requires ONNX model files in the specified directory.

Google (Gemini)

embeddings:
  - from: google:text-embedding-004
    name: gemini-embed
    params:
      google_api_key: ${secrets:google_key}

Vector Storage

Spice supports multiple vector storage backends:

Amazon S3 Vectors (Petabyte-Scale)

datasets:
  - from: postgres:documents
    name: documents
    acceleration:
      enabled: true
    columns:
      - name: content
        embeddings:
          - from: text-embedding
            row_id:
              - id
            vector_store: s3_vectors
            vector_store_params:
              bucket: my-vectors
              region: us-east-1
S3 Vectors provides:
  • Petabyte-scale vector storage
  • Managed embedding lifecycle
  • Native SQL integration
  • Hybrid search with RRF

PostgreSQL (pgvector)

columns:
  - name: content
    embeddings:
      - from: text-embedding
        row_id:
          - id
        vector_store: pgvector
        vector_store_params:
          connection_string: ${secrets:postgres_url}

DuckDB Vector

columns:
  - name: content
    embeddings:
      - from: text-embedding
        row_id:
          - id
        vector_store: duckdb_vector

SQLite Vector

columns:
  - name: content
    embeddings:
      - from: text-embedding
        row_id:
          - id
        vector_store: sqlite_vec

Distance Metrics

Supported similarity metrics:
  • Cosine Similarity (default) - Best for normalized vectors
  • Euclidean Distance - L2 distance
  • Dot Product - Inner product similarity
Configure in vector search:
SELECT * 
FROM vector_search(
    documents, 
    'query text', 
    10,
    distance_metric => 'euclidean'
);
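
The metrics differ in how they compare vectors; a small sketch computing all three between two embeddings fetched from the API (for normalized vectors, cosine similarity and dot product rank results identically):
import math

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8090/v1", api_key="not-needed")

# Embed two phrases and compare them under each supported metric.
resp = client.embeddings.create(
    model="text-embedding",
    input=["vector databases", "semantic search"],
)
a, b = resp.data[0].embedding, resp.data[1].embedding

dot = sum(x * y for x, y in zip(a, b))  # dot product
cosine = dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
euclidean = math.dist(a, b)             # L2 distance
print(f"dot={dot:.4f}  cosine={cosine:.4f}  euclidean={euclidean:.4f}")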

Chunking

For large documents, configure chunking:
columns:
  - name: long_content
    embeddings:
      - from: text-embedding
        row_id:
          - id
        chunking:
          enabled: true
          chunk_size: 512
          chunk_overlap: 50
          strategy: recursive  # or 'fixed'
Chunking Strategies:
  • recursive - Split on sentence boundaries
  • fixed - Fixed-size chunks
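
The fixed strategy's parameters are easy to illustrate: each chunk is chunk_size units long and consecutive chunks share chunk_overlap units, so the window advances by chunk_size - chunk_overlap. A character-based sketch of the semantics (an illustration, not Spice's chunker):
def fixed_chunks(text, chunk_size=512, chunk_overlap=50):
    """Split text into fixed-size chunks; consecutive chunks share an overlap."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = fixed_chunks("x" * 1200, chunk_size=512, chunk_overlap=50)
print([len(c) for c in chunks])  # [512, 512, 276]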

Caching

Enable embedding caching for performance:
embeddings:
  - from: openai:text-embedding-3-small
    name: cached-embed
    params:
      openai_api_key: ${secrets:key}
    caching:
      enabled: true
      max_size: 256MiB
      ttl: 24h
Cache keys are based on:
  • Model name
  • Input text
  • Model parameters
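
Because the key covers all three, repeating an identical request should be served from the cache; one quick way to observe this is to time the same call twice (assuming the cached-embed model configured above):
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8090/v1", api_key="not-needed")

# The second, identical request should return noticeably faster on a cache hit.
for attempt in ("cold", "warm"):
    start = time.perf_counter()
    client.embeddings.create(model="cached-embed", input="identical input text")
    print(f"{attempt}: {time.perf_counter() - start:.3f}s")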

Batch Processing

Optimize batch embedding performance:
from openai import OpenAI
import time

client = OpenAI(base_url="http://localhost:8090/v1", api_key="not-needed")

# Process in batches
batch_size = 100
texts = [...]  # Large list of texts

all_embeddings = []
for i in range(0, len(texts), batch_size):
    batch = texts[i:i+batch_size]
    response = client.embeddings.create(
        model="text-embedding",
        input=batch
    )
    all_embeddings.extend([d.embedding for d in response.data])
    time.sleep(0.1)  # Rate limiting

Performance Optimization

Hardware Acceleration

Local Models with CUDA:
embeddings:
  - from: file:models/all-MiniLM-L6-v2/
    name: gpu-embed
    params:
      device: cuda  # Automatic GPU detection
Apple Silicon with Metal:
embeddings:
  - from: file:models/all-MiniLM-L6-v2/
    name: metal-embed
    params:
      device: metal  # Automatic M1/M2/M3 optimization

Parallelism

Configure parallel embedding execution:
embeddings:
  - from: model2vec:minishlab/potion-base-8M
    name: parallel-embed
    params:
      parallelism: 8  # Number of threads

Rate Limiting

For API providers:
embeddings:
  - from: openai:text-embedding-3-small
    name: rate-limited
    params:
      openai_api_key: ${secrets:key}
      openai_usage_tier: tier3

Embedding Dimensions

Different models produce different dimensions:
Provider      Model                    Dimensions
OpenAI        text-embedding-3-small   1536
OpenAI        text-embedding-3-large   3072
Bedrock       titan-embed-text-v2      256-1024
Bedrock       cohere-embed-v3          1024
Model2Vec     potion-base-8M           256
HuggingFace   all-MiniLM-L6-v2         384

Health Checks

Spice performs health checks on embedding models:
2026-03-03T10:15:30Z INFO Embedding model 'text-embedding' health check passed
The health check embeds a test string to verify:
  • Model availability
  • Credential validity
  • Network connectivity
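
You can run the same verification manually by embedding a short probe string and treating any error as a failed check (a sketch; the probe text is arbitrary):
from openai import APIError, OpenAI

client = OpenAI(base_url="http://localhost:8090/v1", api_key="not-needed")

try:
    client.embeddings.create(model="text-embedding", input="health check probe")
    print("embedding model healthy")
except APIError as err:
    # Surfaces unavailable models, invalid credentials, and network failures.
    print(f"embedding model unhealthy: {err}")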

Error Handling

Common errors and solutions:

Model Not Found

# Ensure model name matches
embeddings:
  - from: openai:text-embedding-3-small
    name: text-embedding  # Use this name in API calls

Rate Limiting

import time

from openai import OpenAI, RateLimitError

client = OpenAI(base_url="http://localhost:8090/v1", api_key="not-needed")

try:
    response = client.embeddings.create(...)
except RateLimitError:
    # Implement backoff
    time.sleep(60)
    response = client.embeddings.create(...)

Dimension Mismatch

# Ensure consistent dimensions across pipeline
embeddings:
  - from: openai:text-embedding-3-small  # 1536 dims
    name: embed

columns:
  - name: content
    embeddings:
      - from: embed  # Must match dimensions
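
A cheap safeguard is to embed a probe string at startup and assert that the vector length matches what the rest of the pipeline expects (a sketch assuming the 1536-dimension model configured above):
from openai import OpenAI

EXPECTED_DIMS = 1536  # text-embedding-3-small

client = OpenAI(base_url="http://localhost:8090/v1", api_key="not-needed")
probe = client.embeddings.create(model="embed", input="dimension probe")
actual = len(probe.data[0].embedding)

assert actual == EXPECTED_DIMS, f"expected {EXPECTED_DIMS} dims, got {actual}"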

Best Practices

  1. Choose Appropriate Models
    • OpenAI for accuracy
    • Model2Vec for throughput
    • Local models for privacy
  2. Enable Caching
    • Reduces API costs
    • Improves response time
    • Especially important for RAG
  3. Batch Processing
    • Process multiple texts together
    • Respect rate limits
    • Use appropriate batch sizes
  4. Monitor Performance
    • Track embedding latency
    • Monitor cache hit rates
    • Adjust parallelism
  5. Vector Storage Selection
    • S3 Vectors for petabyte scale
    • pgvector for PostgreSQL integration
    • DuckDB for analytical workloads

Next Steps

RAG

Build retrieval-augmented generation workflows

Model Providers

Configure embedding providers

Vector Search

Query embeddings with vector_search

OpenAI API

Use embeddings with OpenAI SDK