Embeddings convert text into high-dimensional vectors that capture semantic meaning, enabling semantic search, similarity matching, and retrieval-augmented generation (RAG) workflows.
Overview
Spice provides embeddings through:
- **OpenAI-Compatible API** - `POST /v1/embeddings` endpoint
- **Automatic Dataset Embedding** - embed columns during data ingestion
- **Multiple Providers** - OpenAI, Bedrock, HuggingFace, Model2Vec, and local models
- **Hardware Acceleration** - CUDA and Metal for local models
- **Caching** - request and result caching for performance
Configuration
Define embedding models in `spicepod.yaml`:

```yaml
version: v1
kind: Spicepod
name: my-app

embeddings:
  - from: openai:text-embedding-3-small
    name: text-embedding
    params:
      openai_api_key: ${secrets:openai_key}

  - from: model2vec:minishlab/potion-base-8M
    name: fast-embed
```
Embedding API
Single Text
```bash
curl -X POST http://localhost:8090/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding",
    "input": "machine learning algorithms"
  }'
```
Response:
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0234, -0.0145, 0.0678, ...]
    }
  ],
  "model": "text-embedding",
  "usage": {
    "prompt_tokens": 4,
    "total_tokens": 4
  }
}
```
Batch Embeddings
```bash
curl -X POST http://localhost:8090/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding",
    "input": [
      "First document about AI",
      "Second document about ML",
      "Third document about data"
    ]
  }'
```
Python SDK
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8090/v1",
    api_key="not-needed"
)

response = client.embeddings.create(
    model="text-embedding",
    input="semantic search with vector databases"
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")
```
Batch with Python
```python
texts = [
    "Retrieval augmented generation",
    "Vector similarity search",
    "Semantic knowledge graphs"
]

response = client.embeddings.create(
    model="text-embedding",
    input=texts
)

for i, data in enumerate(response.data):
    print(f"Text {i}: {len(data.embedding)} dimensions")
```
Automatic Dataset Embedding
Spice can automatically embed dataset columns during ingestion:
```yaml
datasets:
  - from: postgres:documents
    name: documents
    acceleration:
      enabled: true
      engine: duckdb
    columns:
      - name: content
        embeddings:
          - from: text-embedding
            row_id:
              - id
```
This configuration:

- Embeds the `content` column using the `text-embedding` model
- Uses `id` as the row identifier
- Stores embeddings for vector search
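Once ingested, the embedded dataset can be queried like any other table. A minimal sketch using Spice's HTTP SQL endpoint (assuming the default HTTP port 8090, a plain-text `/v1/sql` endpoint returning JSON rows, and the `vector_search` function covered in the next section):

```python
import requests

# Assumes Spice's plain-text SQL endpoint at /v1/sql returning JSON rows.
resp = requests.post(
    "http://localhost:8090/v1/sql",
    data="SELECT * FROM vector_search(documents, 'machine learning', 10)",
    headers={"Content-Type": "text/plain"},
)
resp.raise_for_status()
for row in resp.json():
    print(row)
```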
Full-Text Search Integration
Combine embeddings with full-text search:
```yaml
columns:
  - name: description
    embeddings:
      - from: text-embedding
        row_id:
          - id
    full_text_search:
      enabled: true
      row_ids:
        - id
```
Now both vector and full-text search are available:
```sql
-- Vector search
SELECT * FROM vector_search(documents, 'machine learning', 10);

-- Full-text search
SELECT * FROM text_search(documents, 'neural networks', 10);

-- Hybrid search with RRF
SELECT * FROM (
  SELECT * FROM vector_search(documents, 'AI algorithms', rank_weight => 1.5)
  UNION ALL
  SELECT * FROM text_search(documents, 'deep learning', rank_weight => 1.2)
) ORDER BY _score DESC LIMIT 10;
```
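For context, reciprocal rank fusion (RRF) merges multiple rankings by summing a score that decays with rank. The sketch below illustrates the idea only; it is not Spice's internal implementation, and the weights mirror the `rank_weight` values in the query above:

```python
def rrf_scores(rankings, k=60, weights=None):
    """Reciprocal Rank Fusion: each ranker contributes weight / (k + rank)."""
    weights = weights or [1.0] * len(rankings)
    scores = {}
    for ranking, w in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + w / (k + rank)
    return scores

vector_hits = ["doc3", "doc1", "doc7"]  # from vector_search
text_hits = ["doc1", "doc9", "doc3"]    # from text_search
fused = rrf_scores([vector_hits, text_hits], weights=[1.5, 1.2])
for doc_id, score in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(doc_id, round(score, 4))
```

Documents found by both rankers (like `doc1` and `doc3` here) accumulate score from each, so they float to the top of the fused result.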
Embedding Providers
OpenAI
```yaml
embeddings:
  - from: openai:text-embedding-3-small
    name: openai-small
    params:
      openai_api_key: ${secrets:openai_key}
      openai_usage_tier: tier3
```
Available Models:

- `text-embedding-3-small` - 1536 dimensions, efficient
- `text-embedding-3-large` - 3072 dimensions, highest quality
- `text-embedding-ada-002` - legacy model
AWS Bedrock
Titan Embeddings
```yaml
embeddings:
  - from: bedrock:amazon.titan-embed-text-v2:0
    name: titan-embed
    params:
      aws_region: us-east-1
      aws_access_key_id: ${secrets:aws_access_key}
      aws_secret_access_key: ${secrets:aws_secret}
      normalize: true
      dimensions: 512
```
Parameters:

- `normalize` - normalize vectors (default: `false`)
- `dimensions` - output dimensions: 256, 512, or 1024
Cohere Embeddings
```yaml
embeddings:
  - from: bedrock:cohere.embed-english-v3
    name: cohere-embed
    params:
      aws_region: us-east-1
      aws_access_key_id: ${secrets:aws_access_key}
      aws_secret_access_key: ${secrets:aws_secret}
      truncate: END
      input_type: SEARCH_DOCUMENT
      embedding_type: FLOAT
```
Parameters:

- `truncate` - `NONE`, `START`, or `END` (default: `NONE`)
- `input_type` - `SEARCH_DOCUMENT`, `SEARCH_QUERY`, `CLASSIFICATION`, or `CLUSTERING`
- `embedding_type` - `FLOAT` or `INT8`
Nova Multimodal Embeddings
```yaml
embeddings:
  - from: bedrock:amazon.nova-multimodal-embed-v2:0
    name: nova-embed
    params:
      aws_region: us-east-1
      aws_access_key_id: ${secrets:aws_access_key}
      aws_secret_access_key: ${secrets:aws_secret}
      dimensions: 1024
      embedding_purpose: STORAGE
      truncation_mode: NONE
```
Parameters:

- `dimensions` - output dimensions: 256, 384, or 1024
- `embedding_purpose` - `STORAGE` or `QUERY`
- `truncation_mode` - `NONE` or `TRUNCATE`
Model2Vec (500x Faster)
Model2Vec provides static embeddings that are significantly faster than transformer-based models:
```yaml
embeddings:
  - from: model2vec:minishlab/potion-base-8M
    name: fast-embed
    params:
      normalize: true
      parallelism: 4
      embed_max_token_length: 512
      embed_batch_size: 1024
```
Available Models:

- `minishlab/potion-base-8M` - 256 dimensions, English
- `minishlab/potion-multilingual-128M` - 256 dimensions, multilingual
Performance Comparison:

| Model Type | Embeddings/sec | Use Case |
|---|---|---|
| Transformer (GPU) | ~1,000 | High accuracy |
| Transformer (CPU) | ~100 | General use |
| Model2Vec | ~50,000 | High throughput |
When to Use Model2Vec:

- High-volume embedding pipelines
- Real-time search applications
- CPU-only deployments
- Cost-sensitive workloads
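Throughput is easy to verify on your own hardware. A rough benchmark against the local endpoint (a sketch; it assumes the `fast-embed` model from the configuration above):

```python
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8090/v1", api_key="not-needed")
texts = ["sample sentence for benchmarking"] * 1024

# Time a single large batch and report embeddings per second.
start = time.perf_counter()
client.embeddings.create(model="fast-embed", input=texts)
elapsed = time.perf_counter() - start
print(f"{len(texts) / elapsed:,.0f} embeddings/sec")
```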
HuggingFace
```yaml
embeddings:
  - from: huggingface:sentence-transformers/all-MiniLM-L6-v2
    name: minilm
    params:
      huggingface_token: ${secrets:hf_token} # Optional
```
Models are automatically downloaded from HuggingFace Hub.
Popular Models:

- `sentence-transformers/all-MiniLM-L6-v2` - fast, 384 dimensions
- `sentence-transformers/all-mpnet-base-v2` - high quality, 768 dimensions
- `BAAI/bge-small-en-v1.5` - efficient, 384 dimensions
Local ONNX Models
```yaml
embeddings:
  - from: file:models/all-MiniLM-L6-v2/
    name: local-embed
```
Requires ONNX model files in the specified directory.
Google (Gemini)
```yaml
embeddings:
  - from: google:text-embedding-004
    name: gemini-embed
    params:
      google_api_key: ${secrets:google_key}
```
Vector Storage
Spice supports multiple vector storage backends:
Amazon S3 Vectors (Petabyte-Scale)
```yaml
datasets:
  - from: postgres:documents
    name: documents
    acceleration:
      enabled: true
    columns:
      - name: content
        embeddings:
          - from: text-embedding
            row_id:
              - id
            vector_store: s3_vectors
            vector_store_params:
              bucket: my-vectors
              region: us-east-1
```
S3 Vectors provides:

- Petabyte-scale vector storage
- Managed embedding lifecycle
- Native SQL integration
- Hybrid search with RRF
PostgreSQL (pgvector)
```yaml
columns:
  - name: content
    embeddings:
      - from: text-embedding
        row_id:
          - id
        vector_store: pgvector
        vector_store_params:
          connection_string: ${secrets:postgres_url}
```
DuckDB Vector
```yaml
columns:
  - name: content
    embeddings:
      - from: text-embedding
        row_id:
          - id
        vector_store: duckdb_vector
```
SQLite Vector
```yaml
columns:
  - name: content
    embeddings:
      - from: text-embedding
        row_id:
          - id
        vector_store: sqlite_vec
```
Distance Metrics
Supported similarity metrics:

- **Cosine Similarity** (default) - best for normalized vectors
- **Euclidean Distance** - L2 distance
- **Dot Product** - inner product similarity
Configure in vector search:
```sql
SELECT *
FROM vector_search(
  documents,
  'query text',
  10,
  distance_metric => 'euclidean'
);
```
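For intuition about how the metrics differ, the sketch below computes all three with plain NumPy (independent of Spice). For L2-normalized vectors, dot product and cosine similarity coincide, which is why cosine is the safe default:

```python
import numpy as np

a = np.array([0.1, 0.8, 0.3])
b = np.array([0.2, 0.7, 0.4])

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
euclidean = np.linalg.norm(a - b)  # L2 distance
dot = np.dot(a, b)                 # inner product
print(f"cosine={cosine:.4f} euclidean={euclidean:.4f} dot={dot:.4f}")

# After normalization, dot product equals cosine similarity.
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
print(f"normalized dot={np.dot(a_n, b_n):.4f}")
```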
Chunking
For large documents, configure chunking:
```yaml
columns:
  - name: long_content
    embeddings:
      - from: text-embedding
        row_id:
          - id
        chunking:
          enabled: true
          chunk_size: 512
          chunk_overlap: 50
          strategy: recursive # or 'fixed'
```
Chunking Strategies:

- `recursive` - split on sentence boundaries
- `fixed` - fixed-size chunks
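To see how `chunk_size` and `chunk_overlap` interact, here is a simplified fixed-size chunker (illustrative only; Spice's `recursive` strategy additionally respects sentence boundaries):

```python
def fixed_chunks(tokens, chunk_size=512, overlap=50):
    """Slide a window of chunk_size tokens, stepping by chunk_size - overlap."""
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

tokens = [f"tok{i}" for i in range(1200)]
chunks = fixed_chunks(tokens)
print(len(chunks))  # 3 chunks: tokens 0-511, 462-973, 924-1199
```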
Caching
Enable embedding caching for performance:
```yaml
embeddings:
  - from: openai:text-embedding-3-small
    name: cached-embed
    params:
      openai_api_key: ${secrets:key}
    caching:
      enabled: true
      max_size: 256MiB
      ttl: 24h
```
Cache keys are based on:

- Model name
- Input text
- Model parameters
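A sketch of how such a key could be derived; this is illustrative, not Spice's exact scheme:

```python
import hashlib
import json

def cache_key(model: str, text: str, params: dict) -> str:
    """Stable digest over model name, input text, and sorted parameters."""
    payload = json.dumps(
        {"model": model, "input": text, "params": params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()

print(cache_key("text-embedding", "machine learning", {"dimensions": 1536}))
```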
Batch Processing
Optimize batch embedding performance:
```python
from openai import OpenAI
import time

client = OpenAI(base_url="http://localhost:8090/v1", api_key="not-needed")

# Process in batches
batch_size = 100
texts = [...]  # Large list of texts

all_embeddings = []
for i in range(0, len(texts), batch_size):
    batch = texts[i:i + batch_size]
    response = client.embeddings.create(
        model="text-embedding",
        input=batch
    )
    all_embeddings.extend([d.embedding for d in response.data])
    time.sleep(0.1)  # Rate limiting
```
Hardware Acceleration
Local Models with CUDA:
```yaml
embeddings:
  - from: file:models/all-MiniLM-L6-v2/
    name: gpu-embed
    params:
      device: cuda # Automatic GPU detection
```
Apple Silicon with Metal:
```yaml
embeddings:
  - from: file:models/all-MiniLM-L6-v2/
    name: metal-embed
    params:
      device: metal # Automatic M1/M2/M3 optimization
```
Parallelism
Configure parallel embedding execution:
```yaml
embeddings:
  - from: model2vec:minishlab/potion-base-8M
    name: parallel-embed
    params:
      parallelism: 8 # Number of threads
```
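`parallelism` controls server-side embedding threads. On the client side, a thread pool can keep those threads busy by issuing batches concurrently (a sketch using the SDK setup from earlier):

```python
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8090/v1", api_key="not-needed")
texts = [f"document {i}" for i in range(1000)]
batches = [texts[i:i + 100] for i in range(0, len(texts), 100)]

# Issue batches concurrently; the server-side `parallelism` setting
# determines how many embedding threads service them.
with ThreadPoolExecutor(max_workers=8) as pool:
    responses = list(pool.map(
        lambda b: client.embeddings.create(model="parallel-embed", input=b),
        batches,
    ))

embeddings = [d.embedding for r in responses for d in r.data]
print(len(embeddings))  # 1000
```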
Rate Limiting
For API providers:
```yaml
embeddings:
  - from: openai:text-embedding-3-small
    name: rate-limited
    params:
      openai_api_key: ${secrets:key}
      openai_usage_tier: tier3
```
Embedding Dimensions
Different models produce different dimensions:
| Provider | Model | Dimensions |
|---|---|---|
| OpenAI | `text-embedding-3-small` | 1536 |
| OpenAI | `text-embedding-3-large` | 3072 |
| Bedrock | `titan-embed-text-v2` | 256-1024 |
| Bedrock | `cohere-embed-v3` | 1024 |
| Model2Vec | `potion-base-8M` | 256 |
| HuggingFace | `all-MiniLM-L6-v2` | 384 |
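When in doubt, confirm a model's output dimensions at runtime before wiring it into a pipeline:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8090/v1", api_key="not-needed")

# Embed a short probe string and inspect the vector length.
response = client.embeddings.create(model="text-embedding", input="probe")
print(len(response.data[0].embedding))  # e.g. 1536 for text-embedding-3-small
```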
Health Checks
Spice performs health checks on embedding models:
```text
2026-03-03T10:15:30Z INFO Embedding model 'text-embedding' health check passed
```
The health check embeds a test string to verify:

- Model availability
- Credential validity
- Network connectivity
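The same check can be reproduced manually through the OpenAI-compatible API (a sketch):

```python
from openai import APIError, OpenAI

client = OpenAI(base_url="http://localhost:8090/v1", api_key="not-needed")

try:
    response = client.embeddings.create(model="text-embedding", input="health check")
    assert len(response.data[0].embedding) > 0
    print("embedding model 'text-embedding' is healthy")
except APIError as err:
    print(f"health check failed: {err}")
```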
Error Handling
Common errors and solutions:
Model Not Found
```yaml
# Ensure the model name matches the name used in API calls
embeddings:
  - from: openai:text-embedding-3-small
    name: text-embedding # Use this name in API calls
```
Rate Limiting
```python
import time

from openai import RateLimitError

try:
    response = client.embeddings.create(...)
except RateLimitError:
    # Implement backoff
    time.sleep(60)
    response = client.embeddings.create(...)
```
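For production use, exponential backoff with a retry cap is more robust than a fixed sleep (a sketch):

```python
import time
from openai import RateLimitError

def embed_with_backoff(client, model, texts, max_retries=5):
    """Retry on rate limits, doubling the wait after each attempt."""
    for attempt in range(max_retries):
        try:
            return client.embeddings.create(model=model, input=texts)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, ...
```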
Dimension Mismatch
```yaml
# Ensure consistent dimensions across the pipeline
embeddings:
  - from: openai:text-embedding-3-small # 1536 dims
    name: embed

columns:
  - name: content
    embeddings:
      - from: embed # Must match dimensions
```
Best Practices
**Choose Appropriate Models**

- OpenAI for accuracy
- Model2Vec for throughput
- Local models for privacy

**Enable Caching**

- Reduces API costs
- Improves response time
- Especially important for RAG

**Batch Processing**

- Process multiple texts together
- Respect rate limits
- Use appropriate batch sizes

**Monitor Performance**

- Track embedding latency
- Monitor cache hit rates
- Adjust parallelism

**Vector Storage Selection**

- S3 Vectors for petabyte scale
- pgvector for PostgreSQL integration
- DuckDB for analytical workloads
Next Steps
- **RAG** - build retrieval-augmented generation workflows
- **Model Providers** - configure embedding providers
- **Vector Search** - query embeddings with `vector_search`
- **OpenAI API** - use embeddings with the OpenAI SDK