Deploying Druid
Introduction
Apache Druid is a high-performance, real-time analytics database designed for fast aggregations and exploratory queries on event-driven data. Built to power interactive dashboards and high-concurrency analytics applications, Druid delivers sub-second query latencies even when analyzing trillions of events. With its column-oriented storage, distributed architecture, and support for streaming and batch ingestion, Druid has become the go-to choice for organizations that need to analyze large-scale time-series data in real time.
Druid combines the best aspects of data warehouses, time-series databases, and search systems into a unified platform. It supports both SQL and native queries, making it accessible to analysts while providing the performance that engineers demand. Companies like Netflix, Airbnb, and Reddit rely on Druid to power their analytics at massive scale.
Key Features:
- Lightning-Fast Queries: Optimized column-oriented storage with bitmap indexes enables sub-second aggregations over billions of rows
- Real-Time Streaming: Native support for Apache Kafka and Amazon Kinesis allows continuous data ingestion with immediate query availability
- Scalable Architecture: Horizontally scalable with separate compute for ingestion, storage, and queries
- Time-Optimized: Built specifically for time-series analytics with advanced time-based partitioning and rollup capabilities
- Flexible Schema: Support for nested JSON data and schema evolution without downtime
- High Availability: Automatic failover, replication, and self-healing capabilities ensure continuous operation
- SQL Support: Full ANSI SQL compatibility with extensions for time-series operations
- Approximate Algorithms: Built-in support for HyperLogLog, theta sketches, and other probabilistic data structures
- Rich Ecosystem: Native integrations with Kafka, Hadoop, S3, and modern data infrastructure
This comprehensive guide walks you through deploying Apache Druid on Klutch.sh using Docker, including single-server and clustered configurations, metadata storage setup, deep storage integration, and production-ready best practices for operating Druid at scale.
Why Deploy Druid on Klutch.sh
Deploying Apache Druid on Klutch.sh provides several advantages for real-time analytics workloads:
- Simplified Infrastructure: Klutch.sh automatically detects your Dockerfile and handles container orchestration, letting you focus on Druid configuration rather than infrastructure management
- TCP Traffic Support: Native TCP routing allows direct connections to Druid’s query endpoints on port 8000, with internal routing to Druid’s coordinator (8081), broker (8082), and router (8888) ports
- Persistent Storage: Attach persistent volumes for local segment cache and metadata, ensuring data durability across deployments and enabling fast query performance
- Environment Management: Securely configure Druid properties through environment variables without exposing sensitive credentials in your repository
- Vertical Scaling: Easily adjust CPU and memory resources to match your query concurrency and data volume requirements
- GitHub Integration: Deploy directly from GitHub with automatic rebuilds when you update Druid configuration or dependencies
- Cost Efficiency: Start with single-server Druid deployments for development and testing, then scale to clustered configurations as your data grows
- Multi-Region Support: Deploy Druid instances in regions close to your data sources and users for optimal latency
- HTTP and TCP: Support both HTTP REST API access and native query protocols on the same deployment
Prerequisites
Before deploying Druid on Klutch.sh, ensure you have:
- A Klutch.sh account
- A GitHub account for repository hosting
- Docker installed locally for testing (optional but recommended)
- Basic understanding of data warehousing and analytics concepts
- Familiarity with SQL and time-series data
- (Recommended) A PostgreSQL database for metadata storage in production
- (Recommended) Object storage (S3-compatible) for deep storage in production
- (Optional) A Kafka cluster for streaming ingestion
Understanding Druid Architecture
Apache Druid uses a distributed, microservices-based architecture with specialized processes for different functions:
Core Processes
Master Server Processes:
- Coordinator: Manages data availability and segment balancing across Historical nodes
- Overlord: Manages data ingestion workloads and task distribution to MiddleManager nodes
Query Server Processes:
- Broker: Routes queries to appropriate data nodes and merges results
- Router: Optional API gateway that provides a unified endpoint for the Druid cluster
Data Server Processes:
- Historical: Serves queries on immutable, historical data segments
- MiddleManager: Ingests data and creates new segments
- Indexer: Alternative to MiddleManager for simplified deployment
External Dependencies
- Deep Storage: Long-term storage for segments (S3, HDFS, local filesystem, etc.)
- Metadata Storage: Stores segment metadata and system configuration (PostgreSQL, MySQL, or Derby)
- ZooKeeper: Coordinates internal service discovery and leader election
Deployment Modes
Micro-Quickstart (Development): All processes in a single JVM with Derby metadata storage
Single-Server (Small Production): Multiple JVM processes on one machine with external metadata storage
Clustered (Large Production): Distributed processes across multiple machines for high availability
For Klutch.sh deployments, we’ll focus on single-server configurations that can scale vertically, with external metadata and deep storage for production use.
Preparing Your Repository
To deploy Druid on Klutch.sh, you’ll create a GitHub repository with a Dockerfile and configuration files.
Step 1: Create Repository Structure
Create a new directory for your Druid deployment:
```bash
mkdir druid-klutch
cd druid-klutch
git init
```

Create the following directory structure:

```
druid-klutch/
├── Dockerfile
├── docker-compose.yml        # For local testing only
├── conf/
│   ├── druid/
│   │   └── cluster/
│   │       └── _common/
│   │           ├── common.runtime.properties
│   │           └── log4j2.xml
│   └── supervise/
│       └── single-server/
│           └── micro-quickstart.conf
├── scripts/
│   └── start-druid.sh
└── README.md
```

Step 2: Create the Dockerfile
Create a production-ready Dockerfile in your repository root:
```dockerfile
# Use official Apache Druid image as base
FROM apache/druid:28.0.0

# Set environment variables for Java heap settings
# These will be overridden by Klutch.sh environment variables
ENV DRUID_XMX=1g
ENV DRUID_XMS=1g
ENV DRUID_MAXNEWSIZE=250m
ENV DRUID_NEWSIZE=250m
ENV DRUID_MAXDIRECTMEMORYSIZE=400m

# Set Druid service to start (single-server mode)
# Options: micro-quickstart, small, medium, large, xlarge
ENV DRUID_SINGLE_SERVER_TYPE=micro-quickstart

# Set working directory
WORKDIR /opt/druid

# Copy custom configurations (COPY cannot use shell redirection, so copy the directory)
COPY conf/druid/cluster/_common/ conf/druid/cluster/_common/

# Copy custom startup script
COPY scripts/start-druid.sh /opt/druid/scripts/
RUN chmod +x /opt/druid/scripts/start-druid.sh

# Create directories for persistent storage
RUN mkdir -p /opt/druid/var/druid/segments \
    && mkdir -p /opt/druid/var/druid/segment-cache \
    && mkdir -p /opt/druid/var/druid/task \
    && mkdir -p /opt/druid/var/tmp

# Expose Druid ports
# 8081: Coordinator
# 8082: Broker
# 8083: Historical
# 8090: Overlord
# 8091: MiddleManager
# 8888: Router (unified API endpoint)
EXPOSE 8081 8082 8083 8090 8091 8888

# Health check on router endpoint
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8888/status/health || exit 1

# Use custom start script
CMD ["/opt/druid/scripts/start-druid.sh"]
```
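If you have Docker locally, it is worth confirming that the image builds and the Router answers before pushing; the image tag and port mapping below are only examples:

```bash
# Build the image and run it in the foreground (example tag)
docker build -t druid-klutch .
docker run --rm -p 8888:8888 druid-klutch

# In another terminal, check the Router health endpoint
curl http://localhost:8888/status/health
```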
Step 3: Create Common Configuration

Create `conf/druid/cluster/_common/common.runtime.properties`:
```properties
# Extensions
druid.extensions.loadList=["druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "postgresql-metadata-storage", "druid-kafka-indexing-service", "druid-s3-extensions"]

# Logging
druid.startup.logging.logProperties=true

# Zookeeper
# For single-server, use embedded ZooKeeper
druid.zk.service.host=localhost
druid.zk.paths.base=/druid

# Metadata storage (Derby for quickstart, PostgreSQL for production)
# Override these with environment variables in Klutch.sh
druid.metadata.storage.type=derby
druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/var/druid/metadata.db;create=true
druid.metadata.storage.connector.host=localhost
druid.metadata.storage.connector.port=1527
druid.metadata.storage.connector.createTables=true

# Deep storage (local for quickstart, S3 for production)
druid.storage.type=local
druid.storage.storageDirectory=/opt/druid/var/druid/segments

# Indexing service logs
druid.indexer.logs.type=file
druid.indexer.logs.directory=/opt/druid/var/druid/indexing-logs

# Service discovery
druid.selectors.indexing.serviceName=druid/overlord
druid.selectors.coordinator.serviceName=druid/coordinator

# Monitoring
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"]
druid.emitter=noop

# Storage type of double columns
druid.indexing.doubleStorage=double

# SQL
druid.sql.enable=true

# Lookups
druid.lookup.enableLookupSyncOnStartup=false
```
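As an aside, recent versions of the official `apache/druid` image can also map environment variables of the form `druid_a_b_c` onto runtime properties when the image's default entrypoint is used; treat this as an assumption to verify against the image documentation for your Druid version, and note that the custom start script in this guide appends to `common.runtime.properties` directly instead. A sketch:

```bash
# Hypothetical alternative: pass properties as env vars through the image's
# default entrypoint (verify the mapping behaviour for your image version).
docker run --rm \
  -e druid_metadata_storage_type=postgresql \
  -e druid_sql_enable=true \
  -p 8888:8888 apache/druid:28.0.0 router
```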
Step 4: Create Startup Script

Create `scripts/start-druid.sh`:
```bash
#!/bin/bash
set -e

echo "Starting Apache Druid in single-server mode: ${DRUID_SINGLE_SERVER_TYPE}"

# Override metadata storage if PostgreSQL credentials provided
if [ ! -z "$POSTGRES_HOST" ]; then
  echo "Configuring PostgreSQL metadata storage..."
  cat >> /opt/druid/conf/druid/cluster/_common/common.runtime.properties <<EOF

# PostgreSQL metadata storage (production)
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://${POSTGRES_HOST}:${POSTGRES_PORT:-5432}/${POSTGRES_DB:-druid}
druid.metadata.storage.connector.user=${POSTGRES_USER:-druid}
druid.metadata.storage.connector.password=${POSTGRES_PASSWORD}
druid.metadata.storage.connector.createTables=true
EOF
fi

# Override deep storage if S3 credentials provided
if [ ! -z "$S3_BUCKET" ]; then
  echo "Configuring S3 deep storage..."
  cat >> /opt/druid/conf/druid/cluster/_common/common.runtime.properties <<EOF

# S3 deep storage (production)
druid.storage.type=s3
druid.storage.bucket=${S3_BUCKET}
druid.storage.baseKey=${S3_BASE_KEY:-druid/segments}
druid.s3.accessKey=${S3_ACCESS_KEY}
druid.s3.secretKey=${S3_SECRET_KEY}
druid.s3.endpoint.url=${S3_ENDPOINT:-}
EOF
fi

# Set Java heap sizes from environment variables
export DRUID_XMX=${DRUID_XMX:-1g}
export DRUID_XMS=${DRUID_XMS:-1g}
export DRUID_MAXNEWSIZE=${DRUID_MAXNEWSIZE:-250m}
export DRUID_NEWSIZE=${DRUID_NEWSIZE:-250m}
export DRUID_MAXDIRECTMEMORYSIZE=${DRUID_MAXDIRECTMEMORYSIZE:-400m}

# Start Druid in single-server mode
exec /opt/druid/bin/start-${DRUID_SINGLE_SERVER_TYPE}
```
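Because the script builds configuration with heredocs, a quick syntax check before committing can catch quoting mistakes:

```bash
bash -n scripts/start-druid.sh && echo "syntax OK"
```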
Step 5: Create Docker Compose for Local Testing

Create `docker-compose.yml` for local development and testing:
version: "3.8"
services: postgres: image: postgres:16-alpine container_name: druid-postgres environment: POSTGRES_DB: druid POSTGRES_USER: druid POSTGRES_PASSWORD: druid_password ports: - "5432:5432" volumes: - postgres_data:/var/lib/postgresql/data healthcheck: test: ["CMD-SHELL", "pg_isready -U druid"] interval: 10s timeout: 5s retries: 5
druid: build: . container_name: druid-single ports: - "8888:8888" # Router - "8081:8081" # Coordinator - "8082:8082" # Broker - "8083:8083" # Historical - "8090:8090" # Overlord - "8091:8091" # MiddleManager environment: # Java heap settings DRUID_XMX: 2g DRUID_XMS: 2g DRUID_MAXNEWSIZE: 500m DRUID_NEWSIZE: 500m DRUID_MAXDIRECTMEMORYSIZE: 1g
# Server type DRUID_SINGLE_SERVER_TYPE: micro-quickstart
# PostgreSQL metadata storage POSTGRES_HOST: postgres POSTGRES_PORT: 5432 POSTGRES_DB: druid POSTGRES_USER: druid POSTGRES_PASSWORD: druid_password volumes: - druid_data:/opt/druid/var depends_on: postgres: condition: service_healthy healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8888/status/health"] interval: 30s timeout: 10s retries: 3 start_period: 60s
volumes: postgres_data: druid_data:Step 6: Create Documentation
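With the Dockerfile and compose file in place, a local smoke test might look like the following (service names match the compose file above):

```bash
docker compose up -d
docker compose logs -f druid                 # watch startup; Ctrl+C to stop tailing
curl http://localhost:8888/status/health     # should report a healthy Router
```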
Step 6: Create Documentation

Create a `README.md`:
# Apache Druid on Klutch.sh
Real-time analytics database for fast queries at scale.
## Features
- Sub-second query latencies on large datasets
- Real-time streaming ingestion from Kafka
- Column-oriented storage with bitmap indexes
- SQL and native query support
- Horizontal scalability
- High availability with automatic failover
## Local Development
Test locally with Docker Compose:
```bash
docker-compose up -d
```

Access the Druid web console at: http://localhost:8888
Production Deployment on Klutch.sh
Required Environment Variables
Set these in the Klutch.sh dashboard:
Java Heap Configuration:
- `DRUID_XMX`: Maximum heap size (e.g., `4g`)
- `DRUID_XMS`: Initial heap size (e.g., `4g`)
- `DRUID_MAXNEWSIZE`: Max new generation size (e.g., `1g`)
- `DRUID_NEWSIZE`: Initial new generation size (e.g., `1g`)
- `DRUID_MAXDIRECTMEMORYSIZE`: Max direct memory (e.g., `2g`)
PostgreSQL Metadata Storage (Recommended for production):
- `POSTGRES_HOST`: PostgreSQL host
- `POSTGRES_PORT`: PostgreSQL port (default: `5432`)
- `POSTGRES_DB`: Database name (e.g., `druid`)
- `POSTGRES_USER`: Database user
- `POSTGRES_PASSWORD`: Database password
S3 Deep Storage (Recommended for production):
- `S3_BUCKET`: S3 bucket name
- `S3_BASE_KEY`: Base path in bucket (default: `druid/segments`)
- `S3_ACCESS_KEY`: AWS access key
- `S3_SECRET_KEY`: AWS secret key
- `S3_ENDPOINT`: S3 endpoint (optional, for S3-compatible storage)
Persistent Volumes
Attach a persistent volume for local caching and temporary storage:
- Mount Path: `/opt/druid/var`
- Recommended Size: 50-200 GB depending on query volume
Traffic Configuration
- Traffic Type: Select HTTP for web console and API access
- Internal Port: `8888` (Router endpoint)
Alternative for programmatic access:
- Traffic Type: TCP for native Druid client connections
- Internal Port: `8082` (Broker endpoint)
Usage
Web Console
Access the Druid web console to:
- Load data through the data loader wizard
- Execute SQL queries
- Monitor ingestion tasks
- View datasources and segments
SQL Queries
Query Druid using standard SQL:
```sql
SELECT
  TIME_FLOOR(__time, 'PT1H') AS hour,
  COUNT(*) AS event_count,
  SUM(bytes_sent) AS total_bytes
FROM events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
GROUP BY 1
ORDER BY 1 DESC
```

Streaming Ingestion
Ingest data from Kafka using supervisor specs or the web console data loader.
License
Apache License 2.0
Step 7: Initialize Git and Push to GitHub
```bash
# Add all files
git add .

# Commit
git commit -m "Initial Druid deployment configuration"

# Add GitHub remote (replace with your repository URL)
git remote add origin https://github.com/yourusername/druid-klutch.git

# Push to GitHub
git branch -M main
git push -u origin main
```

Deploying to Klutch.sh
Now that your repository is prepared, follow these steps to deploy Apache Druid on Klutch.sh.
Deployment Steps
- **Navigate to Klutch.sh Dashboard**
Visit klutch.sh/app and log in to your account.
- **Create a New Project**
Click “New Project” and give it a name like “Druid Analytics” to organize your deployment.
- **Create a New App**
Click “New App” or “Create App” and select GitHub as your source.
- **Connect Your Repository**
- Authenticate with GitHub if not already connected
- Select your Druid repository from the list
- Choose the `main` branch for deployment
- **Configure Application Settings**
- App Name: Choose a unique name (e.g., `druid-analytics`)
- Traffic Type: Select HTTP for web console access
- Internal Port: Set to `8888` (Druid Router endpoint)
For programmatic access via native Druid protocol, you can alternatively use:
- Traffic Type: TCP
- Internal Port: `8082` (Broker endpoint)
- **Set Environment Variables**
Configure these environment variables in the Klutch.sh dashboard:
Required - Java Heap Configuration:
```
DRUID_XMX=4g
DRUID_XMS=4g
DRUID_MAXNEWSIZE=1g
DRUID_NEWSIZE=1g
DRUID_MAXDIRECTMEMORYSIZE=2g
```

Recommended - PostgreSQL Metadata Storage:
First, deploy PostgreSQL using our PostgreSQL guide, then configure:
```
POSTGRES_HOST=your-postgres-app.klutch.sh
POSTGRES_PORT=8000
POSTGRES_DB=druid
POSTGRES_USER=druid
POSTGRES_PASSWORD=your-secure-password
```

Recommended - S3 Deep Storage:
```
S3_BUCKET=your-druid-bucket
S3_BASE_KEY=druid/segments
S3_ACCESS_KEY=your-access-key
S3_SECRET_KEY=your-secret-key
```

For S3-compatible storage (MinIO, Wasabi, etc.):
```
S3_ENDPOINT=https://s3.your-provider.com
```

Optional - Server Sizing:
```
DRUID_SINGLE_SERVER_TYPE=small
```

Options: `micro-quickstart`, `small`, `medium`, `large`, `xlarge`

- **Attach Persistent Volume**
Critical for local segment caching and temporary storage:
- Click “Add Volume” in the Volumes section
- Mount Path: `/opt/druid/var`
- Size: 50 GB minimum, 100-200 GB recommended for production
This volume stores:
- Segment cache for fast query performance
- Task logs and temporary ingestion files
- ZooKeeper data for embedded mode
- **Deploy Application**
Click “Create” or “Deploy” to start the deployment. Klutch.sh will:
- Automatically detect your Dockerfile
- Build the Docker image with your Druid configuration
- Attach the persistent volume
- Start your Druid container
- Assign a URL for external access
The first deployment takes 3-5 minutes as Druid initializes metadata tables and starts all services.
- **Verify Deployment**
Once deployed, your Druid instance will be available at:
```
https://your-app-name.klutch.sh
```

Access the Druid web console by visiting this URL in your browser. You should see:
- The Druid console home page
- Available datasources (empty on first deployment)
- Status indicators showing all services running
- **Test Database Connection**
Verify Druid is running properly:
Via Web Console:
- Navigate to the Query view
- Execute a test query: `SELECT 1`
- Verify successful execution
Via HTTP API:
```bash
curl https://your-app-name.klutch.sh/status/health
```

Expected response:
{"status":"healthy"}Via SQL endpoint:
```bash
curl -X POST \
  -H 'Content-Type: application/json' \
  https://your-app-name.klutch.sh/druid/v2/sql \
  -d '{"query": "SELECT 1"}'
```
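If you want a predictable field name in the JSON response, alias the expression; assuming Druid's default object result format, the call and expected output look roughly like this:

```bash
# Alias the expression so the response has a stable key
curl -s -X POST \
  -H 'Content-Type: application/json' \
  https://your-app-name.klutch.sh/druid/v2/sql \
  -d '{"query": "SELECT 1 AS ok"}'
# Expected response (shape may vary slightly by version): [{"ok":1}]
```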
Connecting to Druid
Once deployed, you can connect to Druid from your applications using various methods and client libraries.
Connection URL Formats
HTTP REST API:
```
https://example-app.klutch.sh/druid/v2
```

SQL endpoint:

```
https://example-app.klutch.sh/druid/v2/sql
```

Web Console:

```
https://example-app.klutch.sh
```

Native Query (TCP traffic):

If deployed with TCP traffic on the broker port:

```
example-app.klutch.sh:8000
```

Example Connection Code
Python (using pydruid)
```python
from pydruid.client import PyDruid
from pydruid.utils.aggregators import doublesum
from pydruid.utils.filters import Dimension

# Connect to Druid
druid = PyDruid('https://example-app.klutch.sh', 'druid/v2/')

# Execute a native Druid timeseries query
result = druid.timeseries(
    datasource='events',
    granularity='hour',
    intervals='2024-01-01/2024-01-02',
    aggregations={'count': doublesum('count')},
    filter=Dimension('country') == 'US'
)

print(result)
```

Python (using SQL with requests)
```python
import requests
import json

# SQL query endpoint
url = 'https://example-app.klutch.sh/druid/v2/sql'

# Execute SQL query
query = {
    "query": """
        SELECT
          TIME_FLOOR(__time, 'PT1H') AS hour,
          COUNT(*) AS event_count,
          SUM(bytes_sent) AS total_bytes
        FROM events
        WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
        GROUP BY 1
        ORDER BY 1 DESC
    """
}

response = requests.post(
    url,
    headers={'Content-Type': 'application/json'},
    data=json.dumps(query)
)

results = response.json()
for row in results:
    print(f"Hour: {row['hour']}, Events: {row['event_count']}, Bytes: {row['total_bytes']}")
```

Node.js (using axios)
```javascript
const axios = require('axios');

// Druid SQL endpoint
const druidUrl = 'https://example-app.klutch.sh/druid/v2/sql';

// Execute SQL query
async function queryDruid() {
  const query = {
    query: `
      SELECT
        TIME_FLOOR(__time, 'PT1H') AS hour,
        COUNT(*) AS event_count
      FROM events
      WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
      GROUP BY 1
      ORDER BY 1 DESC
    `
  };

  try {
    const response = await axios.post(druidUrl, query, {
      headers: { 'Content-Type': 'application/json' }
    });

    console.log('Query results:', response.data);
    return response.data;
  } catch (error) {
    console.error('Query failed:', error.message);
    throw error;
  }
}

// Run query
queryDruid();
```

Java (using Druid SQL JDBC)
```java
import java.sql.*;

public class DruidExample {
    public static void main(String[] args) {
        String url = "jdbc:avatica:remote:url=https://example-app.klutch.sh/druid/v2/sql/avatica/";

        try (Connection conn = DriverManager.getConnection(url)) {
            String sql = """
                SELECT
                  TIME_FLOOR(__time, 'PT1H') AS hour,
                  COUNT(*) AS event_count
                FROM events
                WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
                GROUP BY 1
                ORDER BY 1 DESC
                """;

            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {

                while (rs.next()) {
                    System.out.println(
                        "Hour: " + rs.getTimestamp("hour")
                        + ", Events: " + rs.getLong("event_count")
                    );
                }
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
```

Go (using HTTP client)
```go
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io/ioutil"
    "net/http"
)

type SQLQuery struct {
    Query string `json:"query"`
}

type QueryResult struct {
    Hour       string `json:"hour"`
    EventCount int64  `json:"event_count"`
}

func main() {
    druidURL := "https://example-app.klutch.sh/druid/v2/sql"

    query := SQLQuery{
        Query: `
            SELECT
              TIME_FLOOR(__time, 'PT1H') AS hour,
              COUNT(*) AS event_count
            FROM events
            WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
            GROUP BY 1
            ORDER BY 1 DESC
        `,
    }

    jsonData, _ := json.Marshal(query)

    resp, err := http.Post(
        druidURL,
        "application/json",
        bytes.NewBuffer(jsonData),
    )
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, _ := ioutil.ReadAll(resp.Body)

    var results []QueryResult
    json.Unmarshal(body, &results)

    for _, r := range results {
        fmt.Printf("Hour: %s, Events: %d\n", r.Hour, r.EventCount)
    }
}
```

Ruby (using HTTP client)
```ruby
require 'net/http'
require 'json'
require 'uri'

# Druid SQL endpoint
uri = URI('https://example-app.klutch.sh/druid/v2/sql')

# SQL query
query = {
  query: <<~SQL
    SELECT
      TIME_FLOOR(__time, 'PT1H') AS hour,
      COUNT(*) AS event_count
    FROM events
    WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
    GROUP BY 1
    ORDER BY 1 DESC
  SQL
}

# Execute query
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true

request = Net::HTTP::Post.new(uri.path)
request['Content-Type'] = 'application/json'
request.body = query.to_json

response = http.request(request)
results = JSON.parse(response.body)

results.each do |row|
  puts "Hour: #{row['hour']}, Events: #{row['event_count']}"
end
```

PHP (using cURL)
```php
<?php

// Druid SQL endpoint
$druidUrl = 'https://example-app.klutch.sh/druid/v2/sql';

// SQL query
$query = [
    'query' => "
        SELECT
          TIME_FLOOR(__time, 'PT1H') AS hour,
          COUNT(*) AS event_count
        FROM events
        WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '24' HOUR
        GROUP BY 1
        ORDER BY 1 DESC
    "
];

// Execute query using cURL
$ch = curl_init($druidUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($query));
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'Content-Type: application/json'
]);

$response = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($httpCode === 200) {
    $results = json_decode($response, true);

    foreach ($results as $row) {
        echo sprintf(
            "Hour: %s, Events: %d\n",
            $row['hour'],
            $row['event_count']
        );
    }
} else {
    echo "Query failed with HTTP code: $httpCode\n";
}
```

Getting Started with Druid
After deploying Druid on Klutch.sh, follow these steps to load data and run your first queries.
Loading Sample Data
The easiest way to get started is through the Druid web console’s data loader:
- **Access Web Console**
Navigate to `https://your-app-name.klutch.sh`
- **Open Data Loader**
Click “Load data” from the home page or navigate to the Ingestion view.
- **Choose Data Source**
Select from various options:
- Local disk: Upload a file directly
- HTTP: Load data from a URL
- Inline: Paste data directly
- Kafka: Connect to a Kafka topic
- Amazon Kinesis: Stream from Kinesis
For testing, choose “Example data” to load a sample dataset.
- **Configure Ingestion**
Follow the wizard to:
- Parse your data format (JSON, CSV, etc.)
- Define time column and parsing format
- Configure dimensions and metrics
- Set rollup and partitioning options
- Review and submit ingestion task
- **Monitor Ingestion**
Watch the task progress in the Ingestion view. Once complete, your data will be queryable immediately.
Running Your First Query
Execute SQL queries through the web console:
- **Navigate to Query View**
Click “Query” in the top navigation.
- **Write Your SQL**
Enter a SQL query:
```sql
SELECT
  TIME_FLOOR(__time, 'PT1H') AS hour,
  COUNT(*) AS events
FROM wikipedia
GROUP BY 1
ORDER BY 1 DESC
LIMIT 24
```

- **Execute Query**
Click “Run” or press Ctrl+Enter (Cmd+Enter on Mac).
- **View Results**
Results appear in a table below the query editor. You can:
- Export results to CSV or JSON
- Visualize data with built-in charts
- Save queries for later use
Streaming Ingestion from Kafka
To ingest real-time data from Kafka, you’ll need a Kafka cluster. Deploy one using our Kafka deployment guide.
Create a supervisor spec for streaming ingestion:
{ "type": "kafka", "spec": { "dataSchema": { "dataSource": "events", "timestampSpec": { "column": "timestamp", "format": "iso" }, "dimensionsSpec": { "dimensions": [ "user_id", "event_type", "country", "device" ] }, "metricsSpec": [ { "type": "count", "name": "count" }, { "type": "longSum", "name": "bytes_sent", "fieldName": "bytes" } ], "granularitySpec": { "type": "uniform", "segmentGranularity": "HOUR", "queryGranularity": "MINUTE", "rollup": true } }, "ioConfig": { "topic": "events", "consumerProperties": { "bootstrap.servers": "your-kafka-app.klutch.sh:8000" }, "taskCount": 1, "replicas": 1, "taskDuration": "PT1H" }, "tuningConfig": { "type": "kafka", "maxRowsPerSegment": 5000000 } }}Submit this spec through the web console or API:
```bash
curl -X POST \
  -H 'Content-Type: application/json' \
  https://your-app-name.klutch.sh/druid/indexer/v1/supervisor \
  -d @supervisor-spec.json
```
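To confirm the supervisor was accepted, you can list active supervisors through the Overlord API; the response is a JSON array of supervisor IDs, though the exact shape can vary by Druid version:

```bash
# The new "events" supervisor should appear in the list
curl https://your-app-name.klutch.sh/druid/indexer/v1/supervisor
```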
Batch Ingestion from Files

Ingest data from local files or cloud storage:
{ "type": "index_parallel", "spec": { "dataSchema": { "dataSource": "events", "timestampSpec": { "column": "timestamp", "format": "iso" }, "dimensionsSpec": { "dimensions": ["user_id", "event_type"] }, "metricsSpec": [ {"type": "count", "name": "count"} ], "granularitySpec": { "type": "uniform", "segmentGranularity": "DAY", "queryGranularity": "HOUR" } }, "ioConfig": { "type": "index_parallel", "inputSource": { "type": "http", "uris": ["https://example.com/data.json"] }, "inputFormat": { "type": "json" } }, "tuningConfig": { "type": "index_parallel", "maxRowsPerSegment": 5000000 } }}Advanced Configuration
Advanced Configuration

Metadata Storage Configuration
For production deployments, use PostgreSQL for metadata storage instead of embedded Derby.
First, deploy PostgreSQL following our PostgreSQL guide. Then configure Druid to use it:
Environment Variables:
```
POSTGRES_HOST=your-postgres-app.klutch.sh
POSTGRES_PORT=8000
POSTGRES_DB=druid
POSTGRES_USER=druid
POSTGRES_PASSWORD=your-secure-password
```

The startup script automatically configures Druid to use PostgreSQL when these variables are set.
Manual Configuration (in common.runtime.properties):
```properties
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://your-postgres-app.klutch.sh:8000/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=your-secure-password
druid.metadata.storage.connector.createTables=true
```

Deep Storage Configuration
Configure S3 or S3-compatible storage for segment archival:
AWS S3:
```
S3_BUCKET=my-druid-segments
S3_BASE_KEY=production/segments
S3_ACCESS_KEY=AKIAIOSFODNN7EXAMPLE
S3_SECRET_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```

S3-Compatible Storage (MinIO, Wasabi, etc.):
```
S3_BUCKET=druid-segments
S3_BASE_KEY=segments
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin
S3_ENDPOINT=https://minio.example.com
```

Manual Configuration (in common.runtime.properties):
```properties
druid.storage.type=s3
druid.storage.bucket=my-druid-segments
druid.storage.baseKey=production/segments
druid.s3.accessKey=AKIAIOSFODNN7EXAMPLE
druid.s3.secretKey=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```

Java Heap Tuning
Adjust Java heap sizes based on your workload:
Small Workload (2-4GB RAM available):
```
DRUID_XMX=2g
DRUID_XMS=2g
DRUID_MAXNEWSIZE=500m
DRUID_NEWSIZE=500m
DRUID_MAXDIRECTMEMORYSIZE=1g
```

Medium Workload (8-16GB RAM available):

```
DRUID_XMX=8g
DRUID_XMS=8g
DRUID_MAXNEWSIZE=2g
DRUID_NEWSIZE=2g
DRUID_MAXDIRECTMEMORYSIZE=4g
```

Large Workload (32GB+ RAM available):

```
DRUID_XMX=16g
DRUID_XMS=16g
DRUID_MAXNEWSIZE=4g
DRUID_NEWSIZE=4g
DRUID_MAXDIRECTMEMORYSIZE=8g
```

Guidelines:
- Set `DRUID_XMX` and `DRUID_XMS` to the same value to avoid heap resizing
- Allocate 50-75% of available RAM to heap memory
- Reserve RAM for direct memory and OS cache
- New generation size should be 25-30% of max heap
Query Performance Tuning
Optimize query performance with these settings:
Enable Query Caching:
Add to common.runtime.properties:
```properties
# Enable caching
druid.cache.type=caffeine
druid.cache.sizeInBytes=256000000
druid.cache.expireAfter=3600000

# Broker cache config
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true

# Historical cache config
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
```

Segment Cache Size:

Adjust how much segment data Historical processes keep in their local on-disk segment cache:

```properties
# In Historical node config
druid.segmentCache.locations=[{"path":"/opt/druid/var/druid/segment-cache","maxSize":10737418240}]
druid.server.maxSize=10737418240
```

Parallel Query Processing:

```properties
# Enable parallel query processing
druid.processing.buffer.sizeBytes=134217728
druid.processing.numThreads=7
druid.processing.numMergeBuffers=2
```

Security Configuration
Enable authentication and authorization:
Basic Authentication:
```properties
# Enable basic auth (requires the druid-basic-security extension in druid.extensions.loadList)
druid.auth.authenticatorChain=["basic"]
druid.auth.authenticator.basic.type=basic
druid.auth.authenticator.basic.initialAdminPassword=admin123
druid.auth.authenticator.basic.initialInternalClientPassword=internal123
druid.auth.authenticator.basic.credentialsValidator.type=metadata
druid.auth.authenticator.basic.skipOnFailure=false

# Enable authorization
druid.auth.authorizers=["basic"]
druid.auth.authorizer.basic.type=basic
```

TLS/SSL:

To enable HTTPS for Druid endpoints:

```properties
# Enable TLS
druid.enablePlaintextPort=false
druid.enableTlsPort=true
druid.server.https.keyStorePath=/path/to/keystore.jks
druid.server.https.keyStorePassword=keystorePassword
druid.server.https.certAlias=druid
```

Note: When deployed on Klutch.sh, HTTPS is provided automatically by the platform. Internal Druid communication can use plaintext.
Monitoring and Metrics
Druid emits metrics that can be consumed by monitoring systems:
Enable Prometheus Metrics:
Add to common.runtime.properties:
```properties
# Requires the prometheus-emitter extension in druid.extensions.loadList
druid.emitter=composing
druid.emitter.composing.emitters=["prometheus"]
druid.emitter.prometheus.strategy=exporter
druid.emitter.prometheus.port=9090
```

Key Metrics to Monitor:

- `query/time`: Query execution time
- `query/bytes`: Bytes processed per query
- `segment/scan/pending`: Pending segment scans
- `jvm/mem/used`: JVM memory usage
- `ingest/events/processed`: Events ingested
- `segment/count`: Total segments
- `segment/size`: Total segment size
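With the exporter strategy above, the prometheus emitter serves metrics over HTTP on the configured port; a quick check from inside the container might look like this, assuming the conventional `/metrics` path (verify against the prometheus-emitter documentation for your version):

```bash
# Scrape the emitter's HTTP endpoint (path assumed; adjust if your version differs)
curl -s http://localhost:9090/metrics | head
```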
Health Check Endpoints:
```bash
# Overall cluster health
curl https://your-app-name.klutch.sh/status/health

# Coordinator leader
curl https://your-app-name.klutch.sh/druid/coordinator/v1/leader

# Datasources
curl https://your-app-name.klutch.sh/druid/coordinator/v1/datasources
```

Production Best Practices
Resource Allocation
CPU Requirements:
- Minimum: 2 CPU cores for micro-quickstart
- Recommended: 4-8 CPU cores for production workloads
- Druid scales well with CPU - more cores enable better query parallelism
Memory Requirements:
- Minimum: 4GB RAM for testing
- Small production: 8-16GB RAM
- Medium production: 32-64GB RAM
- Large production: 128GB+ RAM
Storage Requirements:
- Persistent volume for segment cache: 50-200GB
- Deep storage (S3): Based on data retention policy
- Metadata storage (PostgreSQL): 10-50GB depending on datasources
Sizing Formula:
```
Required RAM = Heap Memory + Direct Memory + OS Cache

Heap Memory   ≈ 50-60% of total RAM
Direct Memory ≈ 20-30% of total RAM
OS Cache      ≈ 20-30% of total RAM
```
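As a rough worked example, a 16 GB container split according to this formula might be configured as follows (illustrative numbers, in line with the medium-workload settings earlier):

```bash
# 16 GB container, split per the sizing formula above (illustrative)
DRUID_XMX=8g                   # heap ~50% of RAM
DRUID_XMS=8g
DRUID_MAXDIRECTMEMORYSIZE=4g   # direct memory ~25% of RAM
# The remaining ~4 GB is left unallocated for the OS page cache
```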
High Availability Setup

For production deployments requiring high availability:
Multiple Druid Instances:
- Deploy multiple single-server Druid instances behind a load balancer
- Each instance can serve queries independently
- Share the same metadata storage and deep storage
External Dependencies:
- Use managed PostgreSQL with replication for metadata
- Use cloud object storage (S3) for deep storage with built-in redundancy
- Deploy external ZooKeeper cluster for coordination (advanced)
Health Checks:
- Configure load balancer health checks on `/status/health`
- Set up monitoring alerts for service availability
- Implement automatic failover for coordinator/overlord roles
Backup and Recovery
Metadata Backup:
Regular backups of PostgreSQL metadata database:
```bash
# Backup metadata
pg_dump -h your-postgres-app.klutch.sh -p 8000 -U druid druid > druid_metadata_backup.sql

# Restore metadata
psql -h your-postgres-app.klutch.sh -p 8000 -U druid druid < druid_metadata_backup.sql
```
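To run the backup on a schedule, a cron entry on any host that can reach the database is one option; the schedule, target path, and use of `PGPASSWORD` below are illustrative:

```bash
# Nightly 03:00 metadata backup, compressed and dated (example crontab entry)
0 3 * * * PGPASSWORD=your-secure-password pg_dump -h your-postgres-app.klutch.sh -p 8000 -U druid druid | gzip > /backups/druid_metadata_$(date +\%F).sql.gz
```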
Segment Backup:

Deep storage automatically serves as segment backup. Segments are immutable and safe in S3.
Disaster Recovery Plan:
- Maintain regular metadata database backups
- Ensure deep storage has versioning enabled
- Document configuration in version control (your GitHub repository)
- Test recovery procedures regularly
- Keep runbooks for common failure scenarios
Segment Retention:
Configure retention policies to automatically drop old data:
Retention rules are managed by the Coordinator rather than set in runtime properties: define them in the web console (Datasources → Retention) or push them through the Coordinator API. The Coordinator applies them on each run (its run period is controlled by `druid.coordinator.period`, e.g. `PT300S`). For example, to keep the last 90 days with two replicas and drop everything older:

```json
[
  {"type": "loadByPeriod", "period": "P90D", "tieredReplicants": {"_default_tier": 2}},
  {"type": "dropForever"}
]
```
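If you prefer automation over the console, the same rules can be pushed to the Coordinator's rules endpoint; the datasource name and payload below are examples:

```bash
# Set retention rules for the "events" datasource via the Coordinator API
curl -X POST \
  -H 'Content-Type: application/json' \
  https://your-app-name.klutch.sh/druid/coordinator/v1/rules/events \
  -d '[
    {"type": "loadByPeriod", "period": "P90D", "tieredReplicants": {"_default_tier": 2}},
    {"type": "dropForever"}
  ]'
```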
Security Hardening

Authentication:
- Enable basic authentication with strong passwords
- Rotate credentials regularly
- Use separate credentials for internal and external access
Authorization:
- Implement role-based access control (RBAC)
- Restrict datasource access by user role
- Audit query and ingestion operations
Network Security:
- Use HTTPS for all external communications (provided by Klutch.sh)
- Restrict database access to Druid’s IP address
- Use VPC peering for cloud resources when possible
Secrets Management:
- Store sensitive credentials in environment variables
- Never commit secrets to version control
- Rotate database and S3 credentials periodically
Query Limits:
Prevent resource exhaustion from expensive queries:
```properties
# Limit query time
druid.server.http.maxQueryTimeout=300000

# Limit concurrent queries
druid.broker.http.numConnections=20

# Limit result size
druid.server.http.maxSubqueryRows=100000
```

Performance Optimization
Segment Optimization:
- Use appropriate segment granularity (HOUR, DAY, WEEK)
- Smaller segments improve parallelism
- Larger segments reduce metadata overhead
- Aim for 5-10 million rows per segment
Query Optimization:
- Use filters to reduce data scanned
- Leverage rollup for pre-aggregation
- Create appropriate indexes on filter columns
- Avoid SELECT * queries
Ingestion Optimization:
- Batch ingestion: Use parallel ingestion for large datasets
- Streaming ingestion: Tune taskCount and taskDuration
- Enable rollup to reduce segment size
- Use appropriate queryGranularity for your use case
Caching Strategy:
- Enable caching on Broker and Historical nodes
- Set appropriate cache expiration times
- Monitor cache hit rates
- Size cache based on working set
Monitoring and Alerting
Key Metrics to Track:
- Query Performance:
  - Average query time
  - 95th/99th percentile latency
  - Query failures
- Ingestion Health:
  - Ingestion task success rate
  - Lag for streaming ingestion
  - Segment creation rate
- Resource Utilization:
  - JVM heap usage
  - Direct memory usage
  - CPU utilization
  - Disk I/O
- Cluster Health:
  - Service availability
  - Segment availability
  - Failed tasks
Alerting Thresholds:
Critical:
- Any service down > 1 minute
- Query failure rate > 5%
- JVM heap usage > 90%
- Disk usage > 85%

Warning:
- Query latency p95 > 5 seconds
- Ingestion lag > 10 minutes
- Heap usage > 75%
- Cache hit rate < 50%

Scaling Strategies
Vertical Scaling:
Start with vertical scaling for simplicity:
- Increase CPU cores for better query parallelism
- Add RAM for larger segment cache
- Adjust Java heap sizes proportionally
- Monitor resource utilization to identify bottlenecks
Horizontal Scaling:
When vertical scaling is insufficient:
- Deploy dedicated Historical nodes for queries
- Deploy dedicated MiddleManager nodes for ingestion
- Separate Coordinator and Overlord from data nodes
- Use external ZooKeeper cluster
Data Tiering:
Optimize costs with hot/cold data tiers:
- Recent data on fast SSD storage (hot tier)
- Historical data on cheaper storage (cold tier)
- Configure rules for automatic tier movement
- Use different replica counts per tier
Troubleshooting
Issue: Druid Fails to Start
Symptoms: Container starts but Druid processes don’t initialize
Possible Causes and Solutions:
- Insufficient Memory:
Check Java heap configuration:
```bash
# View container logs
# Look for "OutOfMemoryError" or "Cannot reserve enough space for object heap"
```

Solution: Increase heap size or container memory:

```
DRUID_XMX=4g
DRUID_XMS=4g
```

- Metadata Storage Connection Failed:
Check PostgreSQL connectivity:
```bash
# Test PostgreSQL connection from the Druid container
curl https://your-postgres-app.klutch.sh:8000
```

Solution: Verify PostgreSQL credentials and network connectivity:

```
POSTGRES_HOST=your-postgres-app.klutch.sh
POSTGRES_PORT=8000
POSTGRES_DB=druid
POSTGRES_USER=druid
POSTGRES_PASSWORD=correct-password
```

- Port Conflicts:
Check if ports are already in use:
Solution: Ensure no other services are using Druid’s ports (8081-8091, 8888).
- Persistent Volume Issues:
Verify volume is mounted correctly:
```bash
# Check if volume is accessible
ls -la /opt/druid/var
```

Solution: Ensure the persistent volume is attached at `/opt/druid/var`.
Issue: Slow Query Performance
Symptoms: Queries take longer than expected
Troubleshooting Steps:
- Check Query Plan:
Use EXPLAIN PLAN to understand query execution:
```sql
EXPLAIN PLAN FOR
SELECT COUNT(*) FROM events WHERE country = 'US'
```

- Verify Segment Pruning:
Ensure time filters enable segment pruning:
```sql
-- Good: Uses time filter
SELECT COUNT(*) FROM events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR

-- Bad: Scans all segments
SELECT COUNT(*) FROM events
```

- Check Segment Cache:
Verify Historical nodes are caching segments:
```bash
curl https://your-app-name.klutch.sh/druid/coordinator/v1/servers?simple
```

- Monitor Resource Usage:
Check CPU and memory in Klutch.sh dashboard:
- High CPU: Increase cores or optimize query
- High memory: Increase heap size or reduce cache size
- Optimize Segment Size:
Merge small segments:
```bash
# Enable auto-compaction for the datasource so small segments get merged
curl -X POST https://your-app-name.klutch.sh/druid/coordinator/v1/config/compaction \
  -H 'Content-Type: application/json' \
  -d '{"dataSource": "events"}'
```

Issue: Ingestion Task Fails
Symptoms: Data not appearing in datasource, failed tasks in console
Common Causes:
- Invalid Data Format:
Check task logs for parsing errors:
```bash
curl https://your-app-name.klutch.sh/druid/indexer/v1/task/{taskId}/log
```

Solution: Verify the input format matches your data:

```json
{
  "inputFormat": {
    "type": "json",
    "flattenSpec": {
      "useFieldDiscovery": true
    }
  }
}
```

- Timestamp Parsing Failed:
Ensure timestamp format is correct:
{ "timestampSpec": { "column": "timestamp", "format": "iso" }}Common formats:
iso: ISO 8601 (e.g., 2024-01-01T12:00:00Z)millis: Unix millisecondsauto: Auto-detect formatyyyy-MM-dd HH:mm:ss: Custom format
- Insufficient Resources:
Check MiddleManager capacity:
```bash
curl https://your-app-name.klutch.sh/druid/indexer/v1/workers
```

Solution: Adjust worker capacity or increase container resources.
- Kafka Connection Issues:
For Kafka ingestion, verify connectivity:
```bash
# From the Druid container
curl your-kafka-app.klutch.sh:8000
```

Solution: Check Kafka broker configuration and network access.
Issue: High Memory Usage
Symptoms: Container running out of memory, JVM crashes
Solutions:
- Reduce Heap Size:
If heap is too large, reduce it:
```
DRUID_XMX=4g
DRUID_XMS=4g
```

Ensure: Heap + Direct Memory + OS Cache < Total Container RAM
- Reduce Segment Cache:
Limit segment cache size in common.runtime.properties:
```properties
druid.segmentCache.locations=[{"path":"/opt/druid/var/druid/segment-cache","maxSize":5368709120}]
```

- Reduce Processing Buffer:
Lower processing buffer size:
```properties
druid.processing.buffer.sizeBytes=67108864
druid.processing.numThreads=4
```

- Enable Caching Limits:
Configure cache eviction:
```properties
druid.cache.type=caffeine
druid.cache.sizeInBytes=128000000
```

- Increase Container Resources:
Scale up container in Klutch.sh dashboard to provide more RAM.
Issue: Cannot Connect to Druid
Symptoms: Unable to access web console or API
Troubleshooting Steps:
- Verify Deployment Status:
Check Klutch.sh dashboard for deployment status and logs.
- Check Port Configuration:
Ensure internal port is set correctly:
- HTTP traffic: Port 8888 (Router)
- TCP traffic: Port 8082 (Broker)
- Test Health Endpoint:
```bash
curl https://your-app-name.klutch.sh/status/health
```

Expected: `{"status":"healthy"}`
- Check Firewall Rules:
Ensure no firewall blocking traffic to Klutch.sh domain.
- Verify Service Status:
Check if all Druid services started:
```bash
curl https://your-app-name.klutch.sh/druid/coordinator/v1/servers
```

Issue: Segments Not Loading
Symptoms: Data ingested but not queryable
Troubleshooting Steps:
- Check Segment Availability:
```bash
curl https://your-app-name.klutch.sh/druid/coordinator/v1/datasources/{datasource}/loadstatus
```

- Verify Deep Storage Access:
Ensure S3 credentials are correct and bucket is accessible.
- Check Historical Node Capacity:
```bash
curl https://your-app-name.klutch.sh/druid/coordinator/v1/servers?simple
```

- Review Coordinator Logs:
Check for segment assignment errors in logs.
- Force Segment Load:
Manually trigger segment loading:
```bash
curl -X POST https://your-app-name.klutch.sh/druid/coordinator/v1/datasources/{datasource}
```

Additional Resources
- Apache Druid Documentation
- Druid SQL Documentation
- Data Ingestion Guide
- API Reference
- Druid Docker Image
- Druid GitHub Repository
- Performance Tuning Guide
- Security Overview
Related Guides
- PostgreSQL - Deploy PostgreSQL for Druid metadata storage
- Kafka - Stream real-time data into Druid
- ClickHouse - Alternative analytics database
- Metabase - Visualize Druid data with dashboards
You now have Apache Druid running on Klutch.sh! Your real-time analytics database is ready to ingest streaming data, serve fast queries, and power interactive dashboards. Start loading data through the web console, configure metadata and deep storage for production, and scale your deployment as your analytics needs grow.