Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/spiceai/spiceai/llms.txt

Use this file to discover all available pages before exploring further.

The ClickHouse connector enables Spice to query ClickHouse databases with SQL push-down support for real-time analytical workloads.

Status

Alpha - Early access with ongoing development

Supported Features

  • SQL query push-down
  • Connection pooling
  • Native ClickHouse protocol
  • Data acceleration
  • Schema introspection
  • High-performance analytical queries

Configuration

Basic Configuration

version: v1
kind: Spicepod
name: clickhouse-demo

datasets:
  - from: clickhouse:my_table
    name: my_table
    params:
      clickhouse_host: localhost
      clickhouse_tcp_port: 9000
      clickhouse_db: default
      clickhouse_user: default
      clickhouse_pass: ${secrets:clickhouse_password}

With SSL/TLS

datasets:
  - from: clickhouse:events
    name: events
    params:
      clickhouse_host: clickhouse.example.com
      clickhouse_tcp_port: 9440
      clickhouse_db: analytics
      clickhouse_user: app_user
      clickhouse_pass: ${secrets:clickhouse_password}
      clickhouse_secure: true

With Data Acceleration

datasets:
  - from: clickhouse:metrics.system_logs
    name: system_logs
    params:
      clickhouse_host: ${env:CLICKHOUSE_HOST}
      clickhouse_tcp_port: 9000
      clickhouse_db: logs
      clickhouse_user: reader
      clickhouse_pass: ${secrets:clickhouse_password}
    acceleration:
      enabled: true
      engine: arrow
      refresh_interval: 30s

With Connection Pool

datasets:
  - from: clickhouse:analytics.user_events
    name: user_events
    params:
      clickhouse_host: clickhouse-cluster.internal
      clickhouse_tcp_port: 9000
      clickhouse_db: analytics
      clickhouse_user: spice
      clickhouse_pass: ${secrets:clickhouse_password}
      connection_pool_size: 20

Parameters

clickhouse_host
string
required
ClickHouse server hostname or IP address
clickhouse_tcp_port
integer
default:"9000"
ClickHouse native protocol port (9000 for non-SSL, 9440 for SSL)
clickhouse_db
string
required
Database name
clickhouse_user
string
required
Database user
clickhouse_pass
string
Database password (if required)
clickhouse_secure
boolean
default:"false"
Use SSL/TLS for connection
connection_pool_size
integer
default:"10"
Maximum number of connections in the pool
client_timeout
duration
default:"30s"
Query timeout (e.g., 60s, 5m)

Authentication

Password Authentication

params:
  clickhouse_user: myuser
  clickhouse_pass: ${secrets:clickhouse_password}

No Authentication

For local development:
params:
  clickhouse_user: default
  clickhouse_pass: ""

Use Cases

Real-Time Analytics Dashboard

datasets:
  - from: clickhouse:events.page_views
    name: page_views
    params:
      clickhouse_host: analytics.internal
      clickhouse_tcp_port: 9000
      clickhouse_db: events
      clickhouse_user: dashboard
      clickhouse_pass: ${secrets:clickhouse_password}
    acceleration:
      enabled: true
      engine: arrow
      refresh_interval: 10s
Real-time queries:
SELECT 
  toStartOfInterval(timestamp, INTERVAL 5 MINUTE) as time_bucket,
  page_path,
  COUNT(*) as views,
  COUNT(DISTINCT user_id) as unique_users
FROM page_views
WHERE timestamp >= NOW() - INTERVAL 1 HOUR
GROUP BY time_bucket, page_path
ORDER BY time_bucket DESC, views DESC;

Log Analytics

datasets:
  - from: clickhouse:logs.application_logs
    name: app_logs
    params:
      clickhouse_host: ${env:CLICKHOUSE_HOST}
      clickhouse_tcp_port: 9000
      clickhouse_db: logs
      clickhouse_user: log_reader
      clickhouse_pass: ${secrets:clickhouse_password}
Query logs:
SELECT 
  level,
  COUNT(*) as count,
  COUNT(DISTINCT service_name) as affected_services
FROM app_logs
WHERE timestamp >= NOW() - INTERVAL 1 HOUR
  AND level IN ('ERROR', 'CRITICAL')
GROUP BY level
ORDER BY count DESC;

Federated Query with Operational Database

datasets:
  # ClickHouse for analytics
  - from: clickhouse:analytics.user_behavior
    name: user_behavior
    params:
      clickhouse_host: clickhouse.internal
      clickhouse_tcp_port: 9000
      clickhouse_db: analytics
      clickhouse_user: analyst
      clickhouse_pass: ${secrets:clickhouse_password}

  # PostgreSQL for user data
  - from: postgres:public.users
    name: users
    params:
      pg_host: postgres.internal
      pg_db: app
      pg_user: readonly
      pg_pass: ${secrets:pg_password}
Join across systems:
SELECT 
  u.email,
  u.signup_date,
  COUNT(b.event_id) as total_events,
  MAX(b.timestamp) as last_activity
FROM users u
LEFT JOIN user_behavior b ON u.id = b.user_id
WHERE b.timestamp >= NOW() - INTERVAL 7 DAY
GROUP BY u.email, u.signup_date
HAVING total_events > 100
ORDER BY total_events DESC;

Time-Series Metrics

datasets:
  - from: clickhouse:metrics.server_metrics
    name: server_metrics
    params:
      clickhouse_host: metrics-db.internal
      clickhouse_tcp_port: 9000
      clickhouse_db: metrics
      clickhouse_user: monitoring
      clickhouse_pass: ${secrets:clickhouse_password}
    acceleration:
      enabled: true
      engine: duckdb
      mode: memory
      refresh_interval: 15s
Monitoring queries:
SELECT 
  server_name,
  AVG(cpu_usage) as avg_cpu,
  MAX(memory_usage) as max_memory,
  AVG(disk_io) as avg_disk_io
FROM server_metrics
WHERE timestamp >= NOW() - INTERVAL 5 MINUTE
GROUP BY server_name
HAVING avg_cpu > 80  -- Alert condition
ORDER BY avg_cpu DESC;

Performance Tips

  1. Query Push-down: Spice pushes WHERE clauses and aggregations to ClickHouse for optimal performance
  2. Acceleration: Enable for frequently queried data to reduce ClickHouse load
  3. Partition Keys: Ensure ClickHouse tables use proper partition keys for time-series data
  4. Connection Pooling: Adjust pool size based on query concurrency
  5. MergeTree Tables: Use ClickHouse MergeTree engines for best analytical performance

ClickHouse-Specific Features

Materialized Views

Query ClickHouse materialized views directly:
datasets:
  - from: clickhouse:analytics.hourly_stats_mv
    name: hourly_stats
    params:
      clickhouse_host: clickhouse.internal
      clickhouse_tcp_port: 9000
      clickhouse_db: analytics
      clickhouse_user: reader
      clickhouse_pass: ${secrets:clickhouse_password}

Distributed Tables

Connect to ClickHouse distributed tables:
datasets:
  - from: clickhouse:events.events_distributed
    name: all_events
    params:
      clickhouse_host: clickhouse-proxy.internal
      clickhouse_tcp_port: 9000
      clickhouse_db: events
      clickhouse_user: spice
      clickhouse_pass: ${secrets:clickhouse_password}

Data Type Mapping

ClickHouse types are automatically mapped to Arrow types:
  • Int8, Int16, Int32, Int64Int8, Int16, Int32, Int64
  • UInt8, UInt16, UInt32, UInt64UInt8, UInt16, UInt32, UInt64
  • Float32, Float64Float32, Float64
  • String, FixedStringUtf8
  • Date, Date32Date32
  • DateTime, DateTime64Timestamp
  • Array(T)List
  • Nullable(T) → Nullable Arrow type

Limitations

  • Write operations not supported (read-only)
  • Some ClickHouse-specific functions may not work in federated queries
  • Very large result sets may require pagination
  • Alpha status - expect API changes