
Deploying Druid

Introduction

Apache Druid is a high-performance, real-time analytics database designed for fast aggregations and exploratory queries on event-driven data. Built to power interactive dashboards and high-concurrency analytics applications, Druid delivers sub-second query latencies even when analyzing trillions of events. With its column-oriented storage, distributed architecture, and support for streaming and batch ingestion, Druid has become the go-to choice for organizations that need to analyze large-scale time-series data in real time.

Druid combines the best aspects of data warehouses, time-series databases, and search systems into a unified platform. It supports both SQL and native queries, making it accessible to analysts while providing the performance that engineers demand. Companies like Netflix, Airbnb, and Reddit rely on Druid to power their analytics at massive scale.

Key Features:

  • Lightning-Fast Queries: Optimized column-oriented storage with bitmap indexes enables sub-second aggregations over billions of rows
  • Real-Time Streaming: Native support for Apache Kafka and Amazon Kinesis allows continuous data ingestion with immediate query availability
  • Scalable Architecture: Horizontally scalable with separate compute for ingestion, storage, and queries
  • Time-Optimized: Built specifically for time-series analytics with advanced time-based partitioning and rollup capabilities
  • Flexible Schema: Support for nested JSON data and schema evolution without downtime
  • High Availability: Automatic failover, replication, and self-healing capabilities ensure continuous operation
  • SQL Support: A familiar SQL layer (Druid SQL) that covers most analytical query patterns, with extensions for time-series operations
  • Approximate Algorithms: Built-in support for HyperLogLog, theta sketches, and other probabilistic data structures
  • Rich Ecosystem: Native integrations with Kafka, Hadoop, S3, and modern data infrastructure

This comprehensive guide walks you through deploying Apache Druid on Klutch.sh using Docker, including single-server and clustered configurations, metadata storage setup, deep storage integration, and production-ready best practices for operating Druid at scale.

Why Deploy Druid on Klutch.sh

Deploying Apache Druid on Klutch.sh provides several advantages for real-time analytics workloads:

  • Simplified Infrastructure: Klutch.sh automatically detects your Dockerfile and handles container orchestration, letting you focus on Druid configuration rather than infrastructure management
  • TCP Traffic Support: Native TCP routing exposes your app on port 8000 and forwards it to the internal Druid port you choose, such as the router (8888), broker (8082), or coordinator (8081)
  • Persistent Storage: Attach persistent volumes for local segment cache and metadata, ensuring data durability across deployments and enabling fast query performance
  • Environment Management: Securely configure Druid properties through environment variables without exposing sensitive credentials in your repository
  • Vertical Scaling: Easily adjust CPU and memory resources to match your query concurrency and data volume requirements
  • GitHub Integration: Deploy directly from GitHub with automatic rebuilds when you update Druid configuration or dependencies
  • Cost Efficiency: Start with single-server Druid deployments for development and testing, then scale to clustered configurations as your data grows
  • Multi-Region Support: Deploy Druid instances in regions close to your data sources and users for optimal latency
  • HTTP and TCP: Support both HTTP REST API access and native query protocols on the same deployment

Prerequisites

Before deploying Druid on Klutch.sh, ensure you have:

  • A Klutch.sh account
  • A GitHub account for repository hosting
  • Docker installed locally for testing (optional but recommended)
  • Basic understanding of data warehousing and analytics concepts
  • Familiarity with SQL and time-series data
  • (Recommended) A PostgreSQL database for metadata storage in production
  • (Recommended) Object storage (S3-compatible) for deep storage in production
  • (Optional) A Kafka cluster for streaming ingestion

Understanding Druid Architecture

Apache Druid uses a distributed, microservices-based architecture with specialized processes for different functions:

Core Processes

Master Server Processes:

  • Coordinator: Manages data availability and segment balancing across Historical nodes
  • Overlord: Manages data ingestion workloads and task distribution to MiddleManager nodes

Query Server Processes:

  • Broker: Routes queries to appropriate data nodes and merges results
  • Router: Optional API gateway that provides a unified endpoint for the Druid cluster

Data Server Processes:

  • Historical: Serves queries on immutable, historical data segments
  • MiddleManager: Ingests data and creates new segments
  • Indexer: Alternative to MiddleManager for simplified deployment

External Dependencies

  • Deep Storage: Long-term storage for segments (S3, HDFS, local filesystem, etc.)
  • Metadata Storage: Stores segment metadata and system configuration (PostgreSQL, MySQL, or Derby)
  • ZooKeeper: Coordinates internal service discovery and leader election

Deployment Modes

Micro-Quickstart (Development): All processes in a single JVM with Derby metadata storage

Single-Server (Small Production): Multiple JVM processes on one machine with external metadata storage

Clustered (Large Production): Distributed processes across multiple machines for high availability

For Klutch.sh deployments, we’ll focus on single-server configurations that can scale vertically, with external metadata and deep storage for production use.

Preparing Your Repository

To deploy Druid on Klutch.sh, you’ll create a GitHub repository with a Dockerfile and configuration files.

Step 1: Create Repository Structure

Create a new directory for your Druid deployment:

Terminal window
mkdir druid-klutch
cd druid-klutch
git init

Create the following directory structure:

druid-klutch/
├── Dockerfile
├── docker-compose.yml # For local testing only
├── conf/
│ ├── druid/
│ │ └── cluster/
│ │ └── _common/
│ │ ├── common.runtime.properties
│ │ └── log4j2.xml
│ └── supervise/
│ └── single-server/
│ └── micro-quickstart.conf
├── scripts/
│ └── start-druid.sh
└── README.md

Step 2: Create the Dockerfile

Create a production-ready Dockerfile in your repository root:

# Use official Apache Druid image as base
FROM apache/druid:28.0.0
# Set environment variables for Java heap settings
# These will be overridden by Klutch.sh environment variables
ENV DRUID_XMX=1g
ENV DRUID_XMS=1g
ENV DRUID_MAXNEWSIZE=250m
ENV DRUID_NEWSIZE=250m
ENV DRUID_MAXDIRECTMEMORYSIZE=400m
# Set Druid service to start (single-server mode)
# Options: micro-quickstart, small, medium, large, xlarge
ENV DRUID_SINGLE_SERVER_TYPE=micro-quickstart
# Set working directory
WORKDIR /opt/druid
# Copy custom configurations
# (COPY does not support shell redirection, so copy the whole directory)
COPY conf/druid/cluster/_common/ conf/druid/cluster/_common/
# Copy custom startup script
COPY scripts/start-druid.sh /opt/druid/scripts/
RUN chmod +x /opt/druid/scripts/start-druid.sh
# Create directories for persistent storage
RUN mkdir -p /opt/druid/var/druid/segments \
&& mkdir -p /opt/druid/var/druid/segment-cache \
&& mkdir -p /opt/druid/var/druid/task \
&& mkdir -p /opt/druid/var/tmp
# Expose Druid ports
# 8081: Coordinator
# 8082: Broker
# 8083: Historical
# 8090: Overlord
# 8091: MiddleManager
# 8888: Router (unified API endpoint)
EXPOSE 8081 8082 8083 8090 8091 8888
# Health check on router endpoint
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD curl -f http://localhost:8888/status/health || exit 1
# Use custom start script
CMD ["/opt/druid/scripts/start-druid.sh"]
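
If you have Docker installed locally, it's worth a quick sanity check of the image before pushing. The commands below assume you run them from the repository root and that your machine has enough free RAM for the default heap settings.

Terminal window
# Build the image locally
docker build -t druid-klutch .
# Run it and expose the router port
docker run --rm -p 8888:8888 druid-klutch
# In another terminal, confirm the router responds
curl http://localhost:8888/status/health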

Step 3: Create Common Configuration

Create conf/druid/cluster/_common/common.runtime.properties:

# Extensions
druid.extensions.loadList=["druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "postgresql-metadata-storage", "druid-kafka-indexing-service", "druid-s3-extensions"]
# Logging
druid.startup.logging.logProperties=true
# Zookeeper
# For single-server, use embedded ZooKeeper
druid.zk.service.host=localhost
druid.zk.paths.base=/druid
# Metadata storage (Derby for quickstart, PostgreSQL for production)
# Override these with environment variables in Klutch.sh
druid.metadata.storage.type=derby
druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/var/druid/metadata.db;create=true
druid.metadata.storage.connector.host=localhost
druid.metadata.storage.connector.port=1527
druid.metadata.storage.connector.createTables=true
# Deep storage (local for quickstart, S3 for production)
druid.storage.type=local
druid.storage.storageDirectory=/opt/druid/var/druid/segments
# Indexing service logs
druid.indexer.logs.type=file
druid.indexer.logs.directory=/opt/druid/var/druid/indexing-logs
# Service discovery
druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator
# Monitoring
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=noop
# Storage type of double columns
druid.indexing.doubleStorage=double
# SQL
druid.sql.enable=true
# Lookups
druid.lookup.enableLookupSyncOnStartup=false

Step 4: Create Startup Script

Create scripts/start-druid.sh:

#!/bin/bash
set -e

echo "Starting Apache Druid in single-server mode: ${DRUID_SINGLE_SERVER_TYPE}"

# Override metadata storage if PostgreSQL credentials provided
if [ -n "$POSTGRES_HOST" ]; then
  echo "Configuring PostgreSQL metadata storage..."
  cat >> /opt/druid/conf/druid/cluster/_common/common.runtime.properties <<EOF
# PostgreSQL metadata storage (production)
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://${POSTGRES_HOST}:${POSTGRES_PORT:-5432}/${POSTGRES_DB:-druid}
druid.metadata.storage.connector.user=${POSTGRES_USER:-druid}
druid.metadata.storage.connector.password=${POSTGRES_PASSWORD}
druid.metadata.storage.connector.createTables=true
EOF
fi

# Override deep storage if S3 credentials provided
if [ -n "$S3_BUCKET" ]; then
  echo "Configuring S3 deep storage..."
  cat >> /opt/druid/conf/druid/cluster/_common/common.runtime.properties <<EOF
# S3 deep storage (production)
druid.storage.type=s3
druid.storage.bucket=${S3_BUCKET}
druid.storage.baseKey=${S3_BASE_KEY:-druid/segments}
druid.s3.accessKey=${S3_ACCESS_KEY}
druid.s3.secretKey=${S3_SECRET_KEY}
druid.s3.endpoint.url=${S3_ENDPOINT:-}
EOF
fi

# Set Java heap sizes from environment variables (fall back to defaults)
export DRUID_XMX=${DRUID_XMX:-1g}
export DRUID_XMS=${DRUID_XMS:-1g}
export DRUID_MAXNEWSIZE=${DRUID_MAXNEWSIZE:-250m}
export DRUID_NEWSIZE=${DRUID_NEWSIZE:-250m}
export DRUID_MAXDIRECTMEMORYSIZE=${DRUID_MAXDIRECTMEMORYSIZE:-400m}

# Start Druid in single-server mode
exec /opt/druid/bin/start-${DRUID_SINGLE_SERVER_TYPE}

Step 5: Create Docker Compose for Local Testing

Create docker-compose.yml for local development and testing:

version: "3.8"

services:
  postgres:
    image: postgres:16-alpine
    container_name: druid-postgres
    environment:
      POSTGRES_DB: druid
      POSTGRES_USER: druid
      POSTGRES_PASSWORD: druid_password
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U druid"]
      interval: 10s
      timeout: 5s
      retries: 5

  druid:
    build: .
    container_name: druid-single
    ports:
      - "8888:8888" # Router
      - "8081:8081" # Coordinator
      - "8082:8082" # Broker
      - "8083:8083" # Historical
      - "8090:8090" # Overlord
      - "8091:8091" # MiddleManager
    environment:
      # Java heap settings
      DRUID_XMX: 2g
      DRUID_XMS: 2g
      DRUID_MAXNEWSIZE: 500m
      DRUID_NEWSIZE: 500m
      DRUID_MAXDIRECTMEMORYSIZE: 1g
      # Server type
      DRUID_SINGLE_SERVER_TYPE: micro-quickstart
      # PostgreSQL metadata storage
      POSTGRES_HOST: postgres
      POSTGRES_PORT: 5432
      POSTGRES_DB: druid
      POSTGRES_USER: druid
      POSTGRES_PASSWORD: druid_password
    volumes:
      - druid_data:/opt/druid/var
    depends_on:
      postgres:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8888/status/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

volumes:
  postgres_data:
  druid_data:

Step 6: Create Documentation

Create a README.md:

# Apache Druid on Klutch.sh
Real-time analytics database for fast queries at scale.
## Features
- Sub-second query latencies on large datasets
- Real-time streaming ingestion from Kafka
- Column-oriented storage with bitmap indexes
- SQL and native query support
- Horizontal scalability
- High availability with automatic failover
## Local Development
Test locally with Docker Compose:
```bash
docker-compose up -d
```

Access Druid web console at: http://localhost:8888

Production Deployment on Klutch.sh

Required Environment Variables

Set these in the Klutch.sh dashboard:

Java Heap Configuration:

  • DRUID_XMX: Maximum heap size (e.g., 4g)
  • DRUID_XMS: Initial heap size (e.g., 4g)
  • DRUID_MAXNEWSIZE: Max new generation size (e.g., 1g)
  • DRUID_NEWSIZE: Initial new generation size (e.g., 1g)
  • DRUID_MAXDIRECTMEMORYSIZE: Max direct memory (e.g., 2g)

PostgreSQL Metadata Storage (Recommended for production):

  • POSTGRES_HOST: PostgreSQL host
  • POSTGRES_PORT: PostgreSQL port (default: 5432)
  • POSTGRES_DB: Database name (e.g., druid)
  • POSTGRES_USER: Database user
  • POSTGRES_PASSWORD: Database password

S3 Deep Storage (Recommended for production):

  • S3_BUCKET: S3 bucket name
  • S3_BASE_KEY: Base path in bucket (default: druid/segments)
  • S3_ACCESS_KEY: AWS access key
  • S3_SECRET_KEY: AWS secret key
  • S3_ENDPOINT: S3 endpoint (optional, for S3-compatible storage)

Persistent Volumes

Attach a persistent volume for local caching and temporary storage:

  • Mount Path: /opt/druid/var
  • Recommended Size: 50GB-200GB depending on query volume

Traffic Configuration

  • Traffic Type: Select HTTP for web console and API access
  • Internal Port: 8888 (Router endpoint)

Alternative for programmatic access:

  • Traffic Type: TCP for native Druid client connections
  • Internal Port: 8082 (Broker endpoint)

Usage

Web Console

Access the Druid web console to:

  • Load data through the data loader wizard
  • Execute SQL queries
  • Monitor ingestion tasks
  • View datasources and segments

SQL Queries

Query Druid using standard SQL:

SELECT
TIME_FLOOR(__time, 'PT1H') AS hour,
COUNT(*) AS event_count,
SUM(bytes_sent) AS total_bytes
FROM events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
GROUP BY 1
ORDER BY 1 DESC

Streaming Ingestion

Ingest data from Kafka using supervisor specs or the web console data loader.

License

Apache License 2.0

Step 7: Initialize Git and Push to GitHub

Terminal window
# Add all files
git add .
# Commit
git commit -m "Initial Druid deployment configuration"
# Add GitHub remote (replace with your repository URL)
git remote add origin https://github.com/yourusername/druid-klutch.git
# Push to GitHub
git branch -M main
git push -u origin main

Deploying to Klutch.sh

Now that your repository is prepared, follow these steps to deploy Apache Druid on Klutch.sh.

Deployment Steps

  1. **Navigate to Klutch.sh Dashboard**

    Visit klutch.sh/app and log in to your account.

  2. **Create a New Project**

    Click “New Project” and give it a name like “Druid Analytics” to organize your deployment.

  3. **Create a New App**

    Click “New App” or “Create App” and select GitHub as your source.

  4. **Connect Your Repository**
    • Authenticate with GitHub if not already connected
    • Select your Druid repository from the list
    • Choose the main branch for deployment
  5. **Configure Application Settings**
    • App Name: Choose a unique name (e.g., druid-analytics)
    • Traffic Type: Select HTTP for web console access
    • Internal Port: Set to 8888 (Druid Router endpoint)

    For programmatic access via native Druid protocol, you can alternatively use:

    • Traffic Type: TCP
    • Internal Port: 8082 (Broker endpoint)
  6. **Set Environment Variables**

    Configure these environment variables in the Klutch.sh dashboard:

    Required - Java Heap Configuration:

    DRUID_XMX=4g
    DRUID_XMS=4g
    DRUID_MAXNEWSIZE=1g
    DRUID_NEWSIZE=1g
    DRUID_MAXDIRECTMEMORYSIZE=2g

    Recommended - PostgreSQL Metadata Storage:

    First, deploy PostgreSQL using our PostgreSQL guide, then configure:

    POSTGRES_HOST=your-postgres-app.klutch.sh
    POSTGRES_PORT=8000
    POSTGRES_DB=druid
    POSTGRES_USER=druid
    POSTGRES_PASSWORD=your-secure-password

    Recommended - S3 Deep Storage:

    S3_BUCKET=your-druid-bucket
    S3_BASE_KEY=druid/segments
    S3_ACCESS_KEY=your-access-key
    S3_SECRET_KEY=your-secret-key

    For S3-compatible storage (MinIO, Wasabi, etc.):

    S3_ENDPOINT=https://s3.your-provider.com

    Optional - Server Sizing:

    DRUID_SINGLE_SERVER_TYPE=small

    Options: micro-quickstart, small, medium, large, xlarge

  7. **Attach Persistent Volume**

    Critical for local segment caching and temporary storage:

    • Click “Add Volume” in the Volumes section
    • Mount Path: /opt/druid/var
    • Size: 50GB minimum, 100-200GB recommended for production

    This volume stores:

    • Segment cache for fast query performance
    • Task logs and temporary ingestion files
    • ZooKeeper data for embedded mode
  8. **Deploy Application**

    Click “Create” or “Deploy” to start the deployment. Klutch.sh will:

    • Automatically detect your Dockerfile
    • Build the Docker image with your Druid configuration
    • Attach the persistent volume
    • Start your Druid container
    • Assign a URL for external access

    The first deployment takes 3-5 minutes as Druid initializes metadata tables and starts all services.

  9. **Verify Deployment**

    Once deployed, your Druid instance will be available at:

    https://your-app-name.klutch.sh

    Access the Druid web console by visiting this URL in your browser. You should see:

    • The Druid console home page
    • Available datasources (empty on first deployment)
    • Status indicators showing all services running
  10. **Test Database Connection**

    Verify Druid is running properly:

    Via Web Console:

    • Navigate to the Query view
    • Execute a test query: SELECT 1
    • Verify successful execution

    Via HTTP API:

    Terminal window
    curl https://your-app-name.klutch.sh/status/health

    Expected response: {"status":"healthy"}

    Via SQL endpoint:

    Terminal window
    curl -X POST \
    -H 'Content-Type: application/json' \
    https://your-app-name.klutch.sh/druid/v2/sql \
    -d '{"query": "SELECT 1"}'

Connecting to Druid

Once deployed, you can connect to Druid from your applications using various methods and client libraries.

Connection URL Formats

HTTP REST API:

https://example-app.klutch.sh/druid/v2

SQL endpoint:

https://example-app.klutch.sh/druid/v2/sql

Web Console:

https://example-app.klutch.sh

Native Query (TCP traffic):

If deployed with TCP traffic on broker port:

example-app.klutch.sh:8000
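
Native (JSON) queries also travel over HTTP, posted to the /druid/v2 endpoint. As a rough sketch, a minimal timeseries query against a hypothetical events datasource looks like this:

Terminal window
curl -X POST \
  -H 'Content-Type: application/json' \
  https://example-app.klutch.sh/druid/v2 \
  -d '{
    "queryType": "timeseries",
    "dataSource": "events",
    "granularity": "hour",
    "intervals": ["2024-01-01/2024-01-02"],
    "aggregations": [{"type": "count", "name": "count"}]
  }'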

Example Connection Code

Python (using pydruid)

from pydruid.client import PyDruid
from pydruid.utils.aggregators import doublesum
from pydruid.utils.filters import Dimension

# Connect to Druid
druid = PyDruid(
    'https://example-app.klutch.sh',
    'druid/v2/'
)

# Execute a native Druid timeseries query
result = druid.timeseries(
    datasource='events',
    granularity='hour',
    intervals='2024-01-01/2024-01-02',
    aggregations={'count': doublesum('count')},
    filter=Dimension('country') == 'US'
)
print(result)

Python (using SQL with requests)

import requests
import json

# SQL query endpoint
url = 'https://example-app.klutch.sh/druid/v2/sql'

# Execute SQL query
query = {
    "query": """
        SELECT
          TIME_FLOOR(__time, 'PT1H') AS hour,
          COUNT(*) AS event_count,
          SUM(bytes_sent) AS total_bytes
        FROM events
        WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
        GROUP BY 1
        ORDER BY 1 DESC
    """
}

response = requests.post(
    url,
    headers={'Content-Type': 'application/json'},
    data=json.dumps(query)
)

results = response.json()
for row in results:
    print(f"Hour: {row['hour']}, Events: {row['event_count']}, Bytes: {row['total_bytes']}")

Node.js (using axios)

const axios = require('axios');

// Druid SQL endpoint
const druidUrl = 'https://example-app.klutch.sh/druid/v2/sql';

// Execute SQL query
async function queryDruid() {
  const query = {
    query: `
      SELECT
        TIME_FLOOR(__time, 'PT1H') AS hour,
        COUNT(*) AS event_count
      FROM events
      WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
      GROUP BY 1
      ORDER BY 1 DESC
    `
  };

  try {
    const response = await axios.post(druidUrl, query, {
      headers: { 'Content-Type': 'application/json' }
    });
    console.log('Query results:', response.data);
    return response.data;
  } catch (error) {
    console.error('Query failed:', error.message);
    throw error;
  }
}

// Run query
queryDruid();

Java (using Druid SQL JDBC)

import java.sql.*;

public class DruidExample {
    public static void main(String[] args) {
        String url = "jdbc:avatica:remote:url=https://example-app.klutch.sh/druid/v2/sql/avatica/";

        try (Connection conn = DriverManager.getConnection(url)) {
            String sql = """
                SELECT
                  TIME_FLOOR(__time, 'PT1H') AS hour,
                  COUNT(*) AS event_count
                FROM events
                WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
                GROUP BY 1
                ORDER BY 1 DESC
                """;

            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.println(
                        "Hour: " + rs.getTimestamp("hour") +
                        ", Events: " + rs.getLong("event_count")
                    );
                }
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}

Go (using HTTP client)

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io/ioutil"
    "net/http"
)

type SQLQuery struct {
    Query string `json:"query"`
}

type QueryResult struct {
    Hour       string `json:"hour"`
    EventCount int64  `json:"event_count"`
}

func main() {
    druidURL := "https://example-app.klutch.sh/druid/v2/sql"

    query := SQLQuery{
        Query: `
            SELECT
              TIME_FLOOR(__time, 'PT1H') AS hour,
              COUNT(*) AS event_count
            FROM events
            WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
            GROUP BY 1
            ORDER BY 1 DESC
        `,
    }

    jsonData, _ := json.Marshal(query)

    resp, err := http.Post(
        druidURL,
        "application/json",
        bytes.NewBuffer(jsonData),
    )
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, _ := ioutil.ReadAll(resp.Body)

    var results []QueryResult
    json.Unmarshal(body, &results)

    for _, r := range results {
        fmt.Printf("Hour: %s, Events: %d\n", r.Hour, r.EventCount)
    }
}

Ruby (using HTTP client)

require 'net/http'
require 'json'
require 'uri'

# Druid SQL endpoint
uri = URI('https://example-app.klutch.sh/druid/v2/sql')

# SQL query
query = {
  query: <<~SQL
    SELECT
      TIME_FLOOR(__time, 'PT1H') AS hour,
      COUNT(*) AS event_count
    FROM events
    WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
    GROUP BY 1
    ORDER BY 1 DESC
  SQL
}

# Execute query
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true

request = Net::HTTP::Post.new(uri.path)
request['Content-Type'] = 'application/json'
request.body = query.to_json

response = http.request(request)
results = JSON.parse(response.body)

results.each do |row|
  puts "Hour: #{row['hour']}, Events: #{row['event_count']}"
end

PHP (using cURL)

<?php
// Druid SQL endpoint
$druidUrl = 'https://example-app.klutch.sh/druid/v2/sql';

// SQL query
$query = [
    'query' => "
        SELECT
          TIME_FLOOR(__time, 'PT1H') AS hour,
          COUNT(*) AS event_count
        FROM events
        WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
        GROUP BY 1
        ORDER BY 1 DESC
    "
];

// Execute query using cURL
$ch = curl_init($druidUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($query));
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'Content-Type: application/json'
]);

$response = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($httpCode === 200) {
    $results = json_decode($response, true);
    foreach ($results as $row) {
        echo sprintf(
            "Hour: %s, Events: %d\n",
            $row['hour'],
            $row['event_count']
        );
    }
} else {
    echo "Query failed with HTTP code: $httpCode\n";
}

Getting Started with Druid

After deploying Druid on Klutch.sh, follow these steps to load data and run your first queries.

Loading Sample Data

The easiest way to get started is through the Druid web console’s data loader:

  1. **Access Web Console**

    Navigate to https://your-app-name.klutch.sh

  2. **Open Data Loader**

    Click “Load data” from the home page or navigate to the Ingestion view.

  3. **Choose Data Source**

    Select from various options:

    • Local disk: Upload a file directly
    • HTTP: Load data from a URL
    • Inline: Paste data directly
    • Kafka: Connect to a Kafka topic
    • Amazon Kinesis: Stream from Kinesis

    For testing, choose “Example data” to load a sample dataset.

  4. **Configure Ingestion**

    Follow the wizard to:

    • Parse your data format (JSON, CSV, etc.)
    • Define time column and parsing format
    • Configure dimensions and metrics
    • Set rollup and partitioning options
    • Review and submit ingestion task
  5. **Monitor Ingestion**

    Watch the task progress in the Ingestion view. Once complete, your data will be queryable immediately.

Running Your First Query

Execute SQL queries through the web console:

  1. **Navigate to Query View**

    Click “Query” in the top navigation.

  2. **Write Your SQL**

    Enter a SQL query:

    SELECT
    TIME_FLOOR(__time, 'PT1H') AS hour,
    COUNT(*) AS events
    FROM wikipedia
    GROUP BY 1
    ORDER BY 1 DESC
    LIMIT 24
  3. **Execute Query**

    Click “Run” or press Ctrl+Enter (Cmd+Enter on Mac).

  4. **View Results**

    Results appear in a table below the query editor. You can:

    • Export results to CSV or JSON
    • Visualize data with built-in charts
    • Save queries for later use

Streaming Ingestion from Kafka

To ingest real-time data from Kafka, you’ll need a Kafka cluster. Deploy one using our Kafka deployment guide.

Create a supervisor spec for streaming ingestion:

{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "events",
      "timestampSpec": {
        "column": "timestamp",
        "format": "iso"
      },
      "dimensionsSpec": {
        "dimensions": [
          "user_id",
          "event_type",
          "country",
          "device"
        ]
      },
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        },
        {
          "type": "longSum",
          "name": "bytes_sent",
          "fieldName": "bytes"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "MINUTE",
        "rollup": true
      }
    },
    "ioConfig": {
      "topic": "events",
      "consumerProperties": {
        "bootstrap.servers": "your-kafka-app.klutch.sh:8000"
      },
      "taskCount": 1,
      "replicas": 1,
      "taskDuration": "PT1H"
    },
    "tuningConfig": {
      "type": "kafka",
      "maxRowsPerSegment": 5000000
    }
  }
}

Submit this spec through the web console or API:

Terminal window
curl -X POST \
-H 'Content-Type: application/json' \
https://your-app-name.klutch.sh/druid/indexer/v1/supervisor \
-d @supervisor-spec.json
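
Once the supervisor is submitted, you can manage it through the Overlord API; the supervisor ID defaults to the datasource name (events in the example spec above).

Terminal window
# List running supervisors
curl https://your-app-name.klutch.sh/druid/indexer/v1/supervisor
# Check the status of the events supervisor
curl https://your-app-name.klutch.sh/druid/indexer/v1/supervisor/events/status
# Pause ingestion without deleting the supervisor
curl -X POST https://your-app-name.klutch.sh/druid/indexer/v1/supervisor/events/suspend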

Batch Ingestion from Files

Ingest data from local files or cloud storage:

{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "events",
      "timestampSpec": {
        "column": "timestamp",
        "format": "iso"
      },
      "dimensionsSpec": {
        "dimensions": ["user_id", "event_type"]
      },
      "metricsSpec": [
        {"type": "count", "name": "count"}
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "HOUR"
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "http",
        "uris": ["https://example.com/data.json"]
      },
      "inputFormat": {
        "type": "json"
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxRowsPerSegment": 5000000
    }
  }
}
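
Save the spec to a file (batch-spec.json here is just a placeholder name) and submit it to the Overlord task API; the response includes a task ID you can poll for status.

Terminal window
# Submit the batch ingestion task
curl -X POST \
  -H 'Content-Type: application/json' \
  https://your-app-name.klutch.sh/druid/indexer/v1/task \
  -d @batch-spec.json
# Poll the task status using the ID from the response
curl https://your-app-name.klutch.sh/druid/indexer/v1/task/{taskId}/status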

Advanced Configuration

Metadata Storage Configuration

For production deployments, use PostgreSQL for metadata storage instead of embedded Derby.

First, deploy PostgreSQL following our PostgreSQL guide. Then configure Druid to use it:

Environment Variables:

POSTGRES_HOST=your-postgres-app.klutch.sh
POSTGRES_PORT=8000
POSTGRES_DB=druid
POSTGRES_USER=druid
POSTGRES_PASSWORD=your-secure-password

The startup script automatically configures Druid to use PostgreSQL when these variables are set.

Manual Configuration (in common.runtime.properties):

druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://your-postgres-app.klutch.sh:8000/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=your-secure-password
druid.metadata.storage.connector.createTables=true
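
To confirm Druid is really writing to PostgreSQL rather than falling back to Derby, you can look for the Druid metadata tables after the first startup (druid_segments, druid_tasks, and so on, assuming the default table prefix):

Terminal window
psql -h your-postgres-app.klutch.sh -p 8000 -U druid -d druid -c "\dt druid_*"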

Deep Storage Configuration

Configure S3 or S3-compatible storage for segment archival:

AWS S3:

S3_BUCKET=my-druid-segments
S3_BASE_KEY=production/segments
S3_ACCESS_KEY=AKIAIOSFODNN7EXAMPLE
S3_SECRET_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

S3-Compatible Storage (MinIO, Wasabi, etc.):

S3_BUCKET=druid-segments
S3_BASE_KEY=segments
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin
S3_ENDPOINT=https://minio.example.com

Manual Configuration (in common.runtime.properties):

druid.storage.type=s3
druid.storage.bucket=my-druid-segments
druid.storage.baseKey=production/segments
druid.s3.accessKey=AKIAIOSFODNN7EXAMPLE
druid.s3.secretKey=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
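
Before relying on deep storage, it's worth checking that the credentials can actually reach the bucket; one way is the AWS CLI (add --endpoint-url for S3-compatible providers):

Terminal window
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
aws s3 ls s3://my-druid-segments/production/segments/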

Java Heap Tuning

Adjust Java heap sizes based on your workload:

Small Workload (2-4GB RAM available):

DRUID_XMX=2g
DRUID_XMS=2g
DRUID_MAXNEWSIZE=500m
DRUID_NEWSIZE=500m
DRUID_MAXDIRECTMEMORYSIZE=1g

Medium Workload (8-16GB RAM available):

DRUID_XMX=8g
DRUID_XMS=8g
DRUID_MAXNEWSIZE=2g
DRUID_NEWSIZE=2g
DRUID_MAXDIRECTMEMORYSIZE=4g

Large Workload (32GB+ RAM available):

DRUID_XMX=16g
DRUID_XMS=16g
DRUID_MAXNEWSIZE=4g
DRUID_NEWSIZE=4g
DRUID_MAXDIRECTMEMORYSIZE=8g

Guidelines:

  • Set DRUID_XMX and DRUID_XMS to the same value to avoid heap resizing
  • Allocate 50-75% of available RAM to heap memory
  • Reserve RAM for direct memory and OS cache
  • New generation size should be 25-30% of max heap

Query Performance Tuning

Optimize query performance with these settings:

Enable Query Caching:

Add to common.runtime.properties:

# Enable caching
druid.cache.type=caffeine
druid.cache.sizeInBytes=256000000
druid.cache.expireAfter=3600000
# Broker cache config
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
# Historical cache config
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true

Segment Cache Size:

Adjust how much of the segment cache Historical nodes keep in memory:

# In Historical node config
druid.segmentCache.locations=[{"path":"/opt/druid/var/druid/segment-cache","maxSize":10737418240}]
druid.server.maxSize=10737418240

Parallel Query Processing:

# Enable parallel query processing
druid.processing.buffer.sizeBytes=134217728
druid.processing.numThreads=7
druid.processing.numMergeBuffers=2

Security Configuration

Enable authentication and authorization:

Basic Authentication:

# Enable basic auth
# (requires the druid-basic-security extension in druid.extensions.loadList,
# plus an escalator configuration for internal service-to-service requests)
druid.auth.authenticatorChain=["basic"]
druid.auth.authenticator.basic.type=basic
druid.auth.authenticator.basic.initialAdminPassword=admin123
druid.auth.authenticator.basic.initialInternalClientPassword=internal123
druid.auth.authenticator.basic.credentialsValidator.type=metadata
druid.auth.authenticator.basic.skipOnFailure=false
# Enable authorization
druid.auth.authorizers=["basic"]
druid.auth.authorizer.basic.type=basic
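
With basic auth enabled, every HTTP request must carry credentials. For example, querying the SQL endpoint as the admin user defined above:

Terminal window
curl -u admin:admin123 \
  -H 'Content-Type: application/json' \
  https://your-app-name.klutch.sh/druid/v2/sql \
  -d '{"query": "SELECT 1"}'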

TLS/SSL:

To enable HTTPS for Druid endpoints:

# Enable TLS
druid.enablePlaintextPort=false
druid.enableTlsPort=true
druid.tls.keyStorePath=/path/to/keystore.jks
druid.tls.keyStorePassword=keystorePassword
druid.tls.certAlias=druid

Note: When deployed on Klutch.sh, HTTPS is provided automatically by the platform. Internal Druid communication can use plaintext.

Monitoring and Metrics

Druid emits metrics that can be consumed by monitoring systems:

Enable Prometheus Metrics:

Add to common.runtime.properties:

# Requires the prometheus-emitter extension in druid.extensions.loadList
druid.emitter=composing
druid.emitter.composing.emitters=["prometheus"]
druid.emitter.prometheus.strategy=exporter
druid.emitter.prometheus.port=9090
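
With the exporter strategy, the extension serves metrics over HTTP on the configured port, conventionally at the standard Prometheus /metrics path; a quick check from inside the container might look like:

Terminal window
curl http://localhost:9090/metrics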

Key Metrics to Monitor:

  • query/time: Query execution time
  • query/bytes: Bytes processed per query
  • segment/scan/pending: Pending segment scans
  • jvm/mem/used: JVM memory usage
  • ingest/events/processed: Events ingested
  • segment/count: Total segments
  • segment/size: Total segment size

Health Check Endpoints:

Terminal window
# Overall cluster health
curl https://your-app-name.klutch.sh/status/health
# Coordinator status
curl https://your-app-name.klutch.sh/druid/coordinator/v1/leader
# Datasources
curl https://your-app-name.klutch.sh/druid/coordinator/v1/datasources

Production Best Practices

Resource Allocation

CPU Requirements:

  • Minimum: 2 CPU cores for micro-quickstart
  • Recommended: 4-8 CPU cores for production workloads
  • Druid scales well with CPU - more cores enable better query parallelism

Memory Requirements:

  • Minimum: 4GB RAM for testing
  • Small production: 8-16GB RAM
  • Medium production: 32-64GB RAM
  • Large production: 128GB+ RAM

Storage Requirements:

  • Persistent volume for segment cache: 50-200GB
  • Deep storage (S3): Based on data retention policy
  • Metadata storage (PostgreSQL): 10-50GB depending on datasources

Sizing Formula:

Required RAM = (Heap Memory + Direct Memory + OS Cache)
Heap Memory ≈ 50-60% of total RAM
Direct Memory ≈ 20-30% of total RAM
OS Cache ≈ 20-30% of total RAM
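
As a worked example, an instance with 8GB of RAM could be split roughly into 4GB of heap, 2GB of direct memory, and ~2GB left for the OS page cache:

DRUID_XMX=4g
DRUID_XMS=4g
DRUID_MAXNEWSIZE=1g
DRUID_NEWSIZE=1g
DRUID_MAXDIRECTMEMORYSIZE=2g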

High Availability Setup

For production deployments requiring high availability:

Multiple Druid Instances:

  • Deploy multiple single-server Druid instances behind a load balancer
  • Each instance can serve queries independently
  • Share the same metadata storage and deep storage

External Dependencies:

  • Use managed PostgreSQL with replication for metadata
  • Use cloud object storage (S3) for deep storage with built-in redundancy
  • Deploy external ZooKeeper cluster for coordination (advanced)

Health Checks:

  • Configure load balancer health checks on /status/health
  • Set up monitoring alerts for service availability
  • Implement automatic failover for coordinator/overlord roles

Backup and Recovery

Metadata Backup:

Regular backups of PostgreSQL metadata database:

Terminal window
# Backup metadata
pg_dump -h your-postgres-app.klutch.sh -p 8000 -U druid druid > druid_metadata_backup.sql
# Restore metadata
psql -h your-postgres-app.klutch.sh -p 8000 -U druid druid < druid_metadata_backup.sql
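
To keep backups current, you can schedule the dump from any host with PostgreSQL client tools installed; a minimal cron sketch (the backup path and schedule are placeholders):

Terminal window
# Nightly metadata dump at 02:00 (add via crontab -e)
0 2 * * * PGPASSWORD=your-secure-password pg_dump -h your-postgres-app.klutch.sh -p 8000 -U druid druid > /backups/druid_metadata_$(date +\%F).sql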

Segment Backup:

Deep storage automatically serves as segment backup. Segments are immutable and safe in S3.

Disaster Recovery Plan:

  1. Maintain regular metadata database backups
  2. Ensure deep storage has versioning enabled
  3. Document configuration in version control (your GitHub repository)
  4. Test recovery procedures regularly
  5. Keep runbooks for common failure scenarios

Segment Retention:

Configure retention rules to automatically drop old data. Rules are managed per datasource through the coordinator (web console or API) rather than as runtime properties, and the coordinator re-applies them on each run (controlled by druid.coordinator.period). For example, to keep the last 90 days with two replicas and drop everything older:

[
  {"type": "loadByPeriod", "period": "P90D", "tieredReplicants": {"_default_tier": 2}},
  {"type": "dropForever"}
]
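
If you prefer the API over the web console, a sketch of applying these rules to a datasource named events:

Terminal window
curl -X POST \
  -H 'Content-Type: application/json' \
  https://your-app-name.klutch.sh/druid/coordinator/v1/rules/events \
  -d '[
    {"type": "loadByPeriod", "period": "P90D", "tieredReplicants": {"_default_tier": 2}},
    {"type": "dropForever"}
  ]'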

Security Hardening

Authentication:

  • Enable basic authentication with strong passwords
  • Rotate credentials regularly
  • Use separate credentials for internal and external access

Authorization:

  • Implement role-based access control (RBAC)
  • Restrict datasource access by user role
  • Audit query and ingestion operations

Network Security:

  • Use HTTPS for all external communications (provided by Klutch.sh)
  • Restrict database access to Druid’s IP address
  • Use VPC peering for cloud resources when possible

Secrets Management:

  • Store sensitive credentials in environment variables
  • Never commit secrets to version control
  • Rotate database and S3 credentials periodically

Query Limits:

Prevent resource exhaustion from expensive queries:

# Cap per-query timeout (milliseconds)
druid.server.http.maxQueryTimeout=300000
# Size of the Broker's connection pool to data servers
druid.broker.http.numConnections=20
# Limit rows materialized by subqueries
druid.server.http.maxSubqueryRows=100000
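
Clients can also request a shorter timeout per query through the SQL query context (capped by the server-side maximum above), for example:

Terminal window
curl -X POST \
  -H 'Content-Type: application/json' \
  https://your-app-name.klutch.sh/druid/v2/sql \
  -d '{"query": "SELECT COUNT(*) FROM events", "context": {"timeout": 30000}}'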

Performance Optimization

Segment Optimization:

  • Use appropriate segment granularity (HOUR, DAY, WEEK)
  • Smaller segments improve parallelism
  • Larger segments reduce metadata overhead
  • Aim for 5-10 million rows per segment (see the auto-compaction sketch below)
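
One way to keep segments near that target is the coordinator's auto-compaction; a minimal sketch of enabling it for a datasource named events via the coordinator API (tuning fields can be added, and exact options vary by Druid version):

Terminal window
curl -X POST \
  -H 'Content-Type: application/json' \
  https://your-app-name.klutch.sh/druid/coordinator/v1/config/compaction \
  -d '{"dataSource": "events"}'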

Query Optimization:

  • Use filters to reduce data scanned
  • Leverage rollup for pre-aggregation
  • Create appropriate indexes on filter columns
  • Avoid SELECT * queries

Ingestion Optimization:

  • Batch ingestion: Use parallel ingestion for large datasets
  • Streaming ingestion: Tune taskCount and taskDuration
  • Enable rollup to reduce segment size
  • Use appropriate queryGranularity for your use case

Caching Strategy:

  • Enable caching on Broker and Historical nodes
  • Set appropriate cache expiration times
  • Monitor cache hit rates
  • Size cache based on working set

Monitoring and Alerting

Key Metrics to Track:

  1. Query Performance:

    • Average query time
    • 95th/99th percentile latency
    • Query failures
  2. Ingestion Health:

    • Ingestion task success rate
    • Lag for streaming ingestion
    • Segment creation rate
  3. Resource Utilization:

    • JVM heap usage
    • Direct memory usage
    • CPU utilization
    • Disk I/O
  4. Cluster Health:

    • Service availability
    • Segment availability
    • Failed tasks

Alerting Thresholds:

Critical:
- Any service down > 1 minute
- Query failure rate > 5%
- JVM heap usage > 90%
- Disk usage > 85%
Warning:
- Query latency p95 > 5 seconds
- Ingestion lag > 10 minutes
- Heap usage > 75%
- Cache hit rate < 50%

Scaling Strategies

Vertical Scaling:

Start with vertical scaling for simplicity:

  • Increase CPU cores for better query parallelism
  • Add RAM for larger segment cache
  • Adjust Java heap sizes proportionally
  • Monitor resource utilization to identify bottlenecks

Horizontal Scaling:

When vertical scaling is insufficient:

  • Deploy dedicated Historical nodes for queries
  • Deploy dedicated MiddleManager nodes for ingestion
  • Separate Coordinator and Overlord from data nodes
  • Use external ZooKeeper cluster

Data Tiering:

Optimize costs with hot/cold data tiers:

  • Recent data on fast SSD storage (hot tier)
  • Historical data on cheaper storage (cold tier)
  • Configure rules for automatic tier movement
  • Use different replica counts per tier

Troubleshooting

Issue: Druid Fails to Start

Symptoms: Container starts but Druid processes don’t initialize

Possible Causes and Solutions:

  1. Insufficient Memory:

Check the Java heap configuration and review the container logs in the Klutch.sh dashboard for "OutOfMemoryError" or "Cannot reserve enough space for object heap".

Solution: Increase heap size or container memory:

DRUID_XMX=4g
DRUID_XMS=4g
  2. Metadata Storage Connection Failed:

Check PostgreSQL connectivity:

Terminal window
# Test PostgreSQL reachability (requires PostgreSQL client tools)
pg_isready -h your-postgres-app.klutch.sh -p 8000

Solution: Verify PostgreSQL credentials and network connectivity:

POSTGRES_HOST=your-postgres-app.klutch.sh
POSTGRES_PORT=8000
POSTGRES_DB=druid
POSTGRES_USER=druid
POSTGRES_PASSWORD=correct-password
  3. Port Conflicts:

Check if ports are already in use:

Solution: Ensure no other services are using Druid’s ports (8081-8091, 8888).

  4. Persistent Volume Issues:

Verify volume is mounted correctly:

Terminal window
# Check if volume is accessible
ls -la /opt/druid/var

Solution: Ensure persistent volume is attached at /opt/druid/var.

Issue: Slow Query Performance

Symptoms: Queries take longer than expected

Troubleshooting Steps:

  1. Check Query Plan:

Use EXPLAIN PLAN to understand query execution:

EXPLAIN PLAN FOR
SELECT COUNT(*) FROM events WHERE country = 'US'
  2. Verify Segment Pruning:

Ensure time filters enable segment pruning:

-- Good: Uses time filter
SELECT COUNT(*) FROM events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
-- Bad: Scans all segments
SELECT COUNT(*) FROM events
  3. Check Segment Cache:

Verify Historical nodes are caching segments:

Terminal window
curl https://your-app-name.klutch.sh/druid/coordinator/v1/servers?simple
  4. Monitor Resource Usage:

Check CPU and memory in Klutch.sh dashboard:

  • High CPU: Increase cores or optimize query
  • High memory: Increase heap size or reduce cache size
  5. Optimize Segment Size:

Merge small segments:

Terminal window
curl -X POST https://your-app-name.klutch.sh/druid/coordinator/v1/compact/tasks \
-H 'Content-Type: application/json' \
-d '{"dataSource": "events"}'

Issue: Ingestion Task Fails

Symptoms: Data not appearing in datasource, failed tasks in console

Common Causes:

  1. Invalid Data Format:

Check task logs for parsing errors:

Terminal window
curl https://your-app-name.klutch.sh/druid/indexer/v1/task/{taskId}/log

Solution: Verify input format matches data:

{
  "inputFormat": {
    "type": "json",
    "flattenSpec": {
      "useFieldDiscovery": true
    }
  }
}
  2. Timestamp Parsing Failed:

Ensure timestamp format is correct:

{
  "timestampSpec": {
    "column": "timestamp",
    "format": "iso"
  }
}

Common formats:

  • iso: ISO 8601 (e.g., 2024-01-01T12:00:00Z)
  • millis: Unix milliseconds
  • auto: Auto-detect format
  • yyyy-MM-dd HH:mm:ss: Custom format
  3. Insufficient Resources:

Check MiddleManager capacity:

Terminal window
curl https://your-app-name.klutch.sh/druid/indexer/v1/workers

Solution: Adjust worker capacity or increase container resources.

  4. Kafka Connection Issues:

For Kafka ingestion, verify connectivity:

Terminal window
# Kafka is not an HTTP service, so test TCP reachability of the broker port
nc -zv your-kafka-app.klutch.sh 8000

Solution: Check Kafka broker configuration and network access.

Issue: High Memory Usage

Symptoms: Container running out of memory, JVM crashes

Solutions:

  1. Reduce Heap Size:

If heap is too large, reduce it:

DRUID_XMX=4g
DRUID_XMS=4g

Ensure: Heap + Direct Memory + OS < Total Container RAM

  2. Reduce Segment Cache:

Limit segment cache size in common.runtime.properties:

druid.segmentCache.locations=[{"path":"/opt/druid/var/druid/segment-cache","maxSize":5368709120}]
  3. Reduce Processing Buffer:

Lower processing buffer size:

druid.processing.buffer.sizeBytes=67108864
druid.processing.numThreads=4
  4. Enable Caching Limits:

Configure cache eviction:

druid.cache.type=caffeine
druid.cache.sizeInBytes=128000000
  5. Increase Container Resources:

Scale up container in Klutch.sh dashboard to provide more RAM.

Issue: Cannot Connect to Druid

Symptoms: Unable to access web console or API

Troubleshooting Steps:

  1. Verify Deployment Status:

Check Klutch.sh dashboard for deployment status and logs.

  2. Check Port Configuration:

Ensure internal port is set correctly:

  • HTTP traffic: Port 8888 (Router)
  • TCP traffic: Port 8082 (Broker)
  3. Test Health Endpoint:
Terminal window
curl https://your-app-name.klutch.sh/status/health

Expected: {"status":"healthy"}

  4. Check Firewall Rules:

Ensure no firewall blocking traffic to Klutch.sh domain.

  5. Verify Service Status:

Check if all Druid services started:

Terminal window
curl https://your-app-name.klutch.sh/druid/coordinator/v1/servers

Issue: Segments Not Loading

Symptoms: Data ingested but not queryable

Troubleshooting Steps:

  1. Check Segment Availability:
Terminal window
curl https://your-app-name.klutch.sh/druid/coordinator/v1/datasources/{datasource}/loadstatus
  2. Verify Deep Storage Access:

Ensure S3 credentials are correct and bucket is accessible.

  3. Check Historical Node Capacity:
Terminal window
curl https://your-app-name.klutch.sh/druid/coordinator/v1/servers?simple
  4. Review Coordinator Logs:

Check for segment assignment errors in logs.

  5. Force Segment Load:

Manually trigger segment loading:

Terminal window
curl -X POST https://your-app-name.klutch.sh/druid/coordinator/v1/datasources/{datasource}

Additional Resources

  • PostgreSQL - Deploy PostgreSQL for Druid metadata storage
  • Kafka - Stream real-time data into Druid
  • ClickHouse - Alternative analytics database
  • Metabase - Visualize Druid data with dashboards

You now have Apache Druid running on Klutch.sh! Your real-time analytics database is ready to ingest streaming data, serve fast queries, and power interactive dashboards. Start loading data through the web console, configure metadata and deep storage for production, and scale your deployment as your analytics needs grow.