Deploying DevLake

Introduction

Apache DevLake is a powerful open-source dev data platform that helps engineering teams collect, analyze, and visualize development metrics from multiple data sources. It integrates with popular tools like GitHub, GitLab, Jira, Jenkins, and more to provide comprehensive insights into your engineering processes, DORA metrics, sprint analytics, and code quality.

Deploying DevLake on Klutch.sh gives you a managed, scalable platform for running your engineering metrics dashboard with support for persistent storage, secure environment variables, and production-grade reliability. This guide walks you through deploying DevLake using a Dockerfile, configuring databases, setting up persistent volumes, and following production best practices.

Prerequisites

A Klutch.sh account
A GitHub repository for your DevLake deployment
Basic familiarity with Docker and environment variables
A MySQL or PostgreSQL database (DevLake requires an external database for production)
API tokens/credentials for the data sources you want to connect (GitHub, GitLab, Jira, etc.)

Understanding DevLake Architecture

DevLake consists of several components:

Config UI: Web interface for configuring data sources and connections
API Server: Backend API that handles data collection and transformations
Database: MySQL or PostgreSQL for storing collected metrics and configurations
Grafana: Visualization dashboard for viewing metrics and analytics (optional but recommended)

For production deployments, you’ll need:

A DevLake instance (API + Config UI)
A MySQL or PostgreSQL database
Persistent storage for configuration and logs
(Optional) Grafana for advanced visualizations

1. Prepare Your Repository

Create a new GitHub repository for your DevLake deployment or use an existing one. Your repository should contain:

devlake-deployment/
├── Dockerfile
├── .env.example
└── README.md

Important: Never commit secrets or credentials to your repository. Use Klutch.sh environment variables for all sensitive data.
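
The layout above can be scaffolded from a shell. This is a minimal sketch, assuming a POSIX shell; the function name scaffold_repo and the placeholder values are illustrative, and the variable names are the ones used later in this guide. The .gitignore entry keeps a local .env file with real credentials out of version control.

```shell
# Hypothetical helper: create the repository skeleton in the given directory.
scaffold_repo() {
  dir="$1"
  mkdir -p "$dir" || return 1

  # Placeholder values only; real secrets belong in Klutch.sh environment variables.
  cat > "$dir/.env.example" <<'EOF'
DB_URL=mysql://username:password@host:3306/dbname?charset=utf8mb4&parseTime=True&loc=UTC
ENCRYPTION_SECRET=replace-with-a-random-128-character-string
PORT=8080
EOF

  # Ignore a local .env file, adding the rule only if it is not already present.
  grep -qxF '.env' "$dir/.gitignore" 2>/dev/null || printf '.env\n' >> "$dir/.gitignore"

  # Create stubs only when missing, so existing files are never overwritten.
  [ -e "$dir/Dockerfile" ] || : > "$dir/Dockerfile"
  [ -e "$dir/README.md" ] || printf '# DevLake deployment\n' > "$dir/README.md"
}

# Usage: scaffold_repo devlake-deployment
```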

2. Sample Dockerfile

Create a Dockerfile in your repository root. Klutch.sh will automatically detect and use it for deployment.

FROM apache/devlake:latest

# Set working directory
WORKDIR /app

# Expose ports
# 4000 - Config UI
# 8080 - API Server
EXPOSE 4000 8080

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:4000/api/ping || exit 1

# The base image already includes the entrypoint
# No need to override CMD unless customizing startup

Notes:

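In the upstream Docker Compose setup, apache/devlake provides the API server and the Config UI ships as a separate image (apache/devlake-config-ui); confirm which components your image serves before relying on the ports below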
Port 4000 is for the Config UI (web interface)
Port 8080 is for the API server
Pin to a specific version tag (e.g., apache/devlake:v0.20.0) for production to ensure reproducible builds
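
The pinning advice can be enforced before each push. The sketch below is one possible check, not part of DevLake or Klutch.sh; check_pinned_tag is a hypothetical name. It reads only the second field of each FROM line, so it does not handle FROM --platform=... or registries with a port number.

```shell
# Hypothetical helper: succeed only if every FROM line names an explicit
# tag (or digest) other than "latest".
check_pinned_tag() {
  dockerfile="$1"
  # Collect the image reference (second field) of each FROM instruction.
  images=$(awk 'toupper($1) == "FROM" { print $2 }' "$dockerfile")
  [ -n "$images" ] || { echo "no FROM instruction found" >&2; return 1; }

  for image in $images; do
    case "$image" in
      *:latest) echo "unpinned image: $image" >&2; return 1 ;;
      *:*)      ;;  # explicit tag or digest, accept
      *)        echo "image has no tag (defaults to latest): $image" >&2; return 1 ;;
    esac
  done
}

# Usage: check_pinned_tag Dockerfile && git push origin main
```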

3. Deploying to Klutch.sh

Follow these steps to deploy DevLake on Klutch.sh:

Push Your Repository

Push your repository (including the Dockerfile) to GitHub.

git add .
git commit -m "Add DevLake Dockerfile"
git push origin main

Create a New Project

In Klutch.sh, create a new project to hold your DevLake app.

Set Up Database

Before deploying DevLake, set up a MySQL or PostgreSQL database. You can use a managed database service or deploy one on Klutch.sh. Note the following connection details:

Database host
Database port (usually 3306 for MySQL, 5432 for PostgreSQL)
Database name
Database user
Database password

Create the DevLake App

Create a new app in your project:

Select Repository: Choose your GitHub repository with the Dockerfile
Select Branch: Choose the branch you want to deploy (e.g., main)
Traffic Type: Select HTTP
Internal Port: Set to 4000 (this is the Config UI port)
Region: Choose your preferred region
Compute: Select appropriate resources (minimum 2GB RAM recommended)

Configure Environment Variables

Add the following environment variables in the Klutch.sh app settings (mark sensitive values as secrets):

Database Configuration (MySQL example):

DB_URL=mysql://username:password@host:3306/dbname?charset=utf8mb4&parseTime=True&loc=UTC

Database Configuration (PostgreSQL example):

DB_URL=postgres://username:password@host:5432/dbname?sslmode=disable
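
Passwords containing characters such as @, : or / will break a hand-written connection string. The sketch below assembles DB_URL from the connection details noted earlier and percent-encodes the user name and password. It assumes ASCII credentials and assumes DevLake parses DB_URL as a standard URL; urlencode and build_db_url are hypothetical helper names.

```shell
# Percent-encode every character outside the RFC 3986 "unreserved" set.
# Assumes ASCII input; multi-byte characters are not handled in this sketch.
urlencode() {
  rest="$1"
  out=""
  while [ -n "$rest" ]; do
    remainder="${rest#?}"        # everything after the first character
    c="${rest%"$remainder"}"     # the first character itself
    rest="$remainder"
    case "$c" in
      [A-Za-z0-9._~-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Build a DB_URL using the query parameters shown in this guide.
# Arguments: scheme user password host port dbname
build_db_url() {
  scheme="$1" user="$2" pass="$3" host="$4" port="$5" name="$6"
  case "$scheme" in
    mysql)    params="charset=utf8mb4&parseTime=True&loc=UTC" ;;
    postgres) params="sslmode=disable" ;;
    *) echo "unsupported scheme: $scheme" >&2; return 1 ;;
  esac
  printf '%s://%s:%s@%s:%s/%s?%s\n' \
    "$scheme" "$(urlencode "$user")" "$(urlencode "$pass")" "$host" "$port" "$name" "$params"
}

# Usage: build_db_url mysql devlake 'p@ss' db.internal 3306 lake
```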

API Configuration:

PORT=8080
ENABLE_GRAFANA=true
GRAFANA_ENDPOINT=https://your-grafana-instance.klutch.sh

Encryption Key (generate a random 128-character string):

ENCRYPTION_SECRET=your-random-128-character-encryption-key
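
One way to produce such a key is sketched below, assuming a Unix-like system with /dev/urandom. Restricting the output to uppercase letters is a choice made here to avoid quoting problems in shells and env files, and generate_encryption_secret is a hypothetical name.

```shell
# Print a random 128-character string of uppercase letters.
generate_encryption_secret() {
  # Read a fixed chunk of random bytes, keep only A-Z, and cut to 128 characters.
  # 16384 random bytes yield roughly 1,600 letters, far more than needed.
  head -c 16384 /dev/urandom | LC_ALL=C tr -dc 'A-Z' | head -c 128
  echo
}

# Usage: generate_encryption_secret
```

Generate the value once and store it as a secret in Klutch.sh. Keep the same value across redeployments, since it is the key DevLake is configured with.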

Attach Persistent Volume

DevLake stores configuration files and logs that should persist across deployments.

In your app settings, attach a persistent volume:

Mount Path: /app/.config
Size: At least 5GB (adjust based on your data volume)

This ensures your data source configurations and logs are preserved during restarts and updates.

Deploy the App

Click “Create” to deploy. Klutch.sh will:

Build the Docker image from your Dockerfile
Deploy the container with your configuration
Mount the persistent volume
Make the app available at example-app.klutch.sh
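
The first start can take a while because migrations run before the app answers requests. The sketch below retries an arbitrary probe command until it succeeds; wait_until_ready is a hypothetical helper, and the example probe reuses the /api/ping path from the Dockerfile health check.

```shell
# Retry a probe command until it succeeds or the attempts run out.
# Usage: wait_until_ready <max_attempts> <delay_seconds> <command> [args...]
wait_until_ready() {
  max_attempts="$1"; delay="$2"; shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $attempt attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the final one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  echo "not ready after $max_attempts attempt(s)" >&2
  return 1
}

# Example probe (not run here):
# wait_until_ready 30 10 curl -fsS https://example-app.klutch.sh/api/ping
```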

4. Initial Configuration

Once deployed, access your DevLake instance at your app URL (e.g., https://example-app.klutch.sh).

Access the Config UI

Open your DevLake app URL in a browser. You’ll see the DevLake Config UI.

Set Up Database Connection

On first launch, DevLake will automatically use the DB_URL environment variable to connect to your database and run migrations.

Configure Data Sources

Add connections to your data sources:

Click “Connections” in the sidebar
Click “Add Connection”
Select your data source type (GitHub, GitLab, Jira, Jenkins, etc.)
Enter the required credentials:
- GitHub: Personal Access Token with repo access
- GitLab: Personal Access Token with API access
- Jira: Username and API token
- Jenkins: Username and API token

Security Tip: Use tokens with minimal required permissions for each data source.

Create a Project

Navigate to “Projects” in the Config UI
Click “Create Project”
Add your repositories and boards
Configure the metrics you want to track

Run Data Collection

Go to “Blueprints” (data collection pipelines)
Create a new blueprint or use a template
Configure the collection frequency
Run the blueprint to start collecting data

5. Setting Up Grafana (Optional)

For advanced visualizations, you can deploy Grafana alongside DevLake:

Deploy Grafana on Klutch.sh

Create a separate Grafana app on Klutch.sh using the official Grafana image.

Configure Grafana Connection

Set the GRAFANA_ENDPOINT environment variable in your DevLake app to point to your Grafana instance.

Import DevLake Dashboards

DevLake provides pre-built Grafana dashboards for DORA metrics, sprint analytics, and more. Import these dashboards from the DevLake GitHub repository.

6. Environment Variables Reference

The most commonly used environment variables for DevLake:

Required:

DB_URL=mysql://user:pass@host:3306/dbname?charset=utf8mb4&parseTime=True&loc=UTC
ENCRYPTION_SECRET=your-128-character-secret

Optional:

PORT=8080
API_TIMEOUT=120s
ENABLE_GRAFANA=true
GRAFANA_ENDPOINT=https://grafana.example.com
LOG_LEVEL=info

For PostgreSQL:

DB_URL=postgres://user:pass@host:5432/dbname?sslmode=disable
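
A quick pre-deployment check of the two required variables might look like the following sketch. check_devlake_env is a hypothetical helper; it tests only that the values are present and plausibly shaped, not that the database is reachable.

```shell
# Report problems with the required DevLake settings; return non-zero if any are found.
check_devlake_env() {
  status=0

  case "${DB_URL:-}" in
    mysql://*|postgres://*) ;;
    "") echo "DB_URL is not set" >&2; status=1 ;;
    *)  echo "DB_URL must start with mysql:// or postgres://" >&2; status=1 ;;
  esac

  if [ -z "${ENCRYPTION_SECRET:-}" ]; then
    echo "ENCRYPTION_SECRET is not set" >&2; status=1
  elif [ "${#ENCRYPTION_SECRET}" -lt 128 ]; then
    echo "ENCRYPTION_SECRET is shorter than 128 characters" >&2; status=1
  fi

  return "$status"
}

# Usage: check_devlake_env && echo "required variables look fine"
```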

7. Sample Code: Getting Started with DevLake API

DevLake exposes a REST API that you can use to programmatically manage connections and trigger data collection.

Create a Connection (Example: GitHub)

# Connections are created per plugin. Confirm the path and fields against the API docs for your DevLake version.
curl -X POST https://example-app.klutch.sh/api/plugins/github/connections \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-github",
    "endpoint": "https://api.github.com/",
    "token": "ghp_your_token_here",
    "rateLimitPerHour": 5000
  }'

Trigger a Data Collection

curl -X POST https://example-app.klutch.sh/api/blueprints/:blueprintId/trigger
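
The :blueprintId segment is a placeholder for the ID of an existing blueprint. A small sketch for filling it in safely follows; blueprint_trigger_url is a hypothetical helper, and it assumes blueprint IDs are numeric.

```shell
# Build the trigger URL for a blueprint, rejecting anything but a numeric ID.
blueprint_trigger_url() {
  base_url="${1%/}"      # drop one trailing slash, if present
  blueprint_id="$2"
  case "$blueprint_id" in
    ""|*[!0-9]*) echo "blueprint ID must be numeric" >&2; return 1 ;;
  esac
  printf '%s/api/blueprints/%s/trigger\n' "$base_url" "$blueprint_id"
}

# Usage (not run here):
# curl -fsS -X POST "$(blueprint_trigger_url https://example-app.klutch.sh 1)"
```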

Query Collected Metrics

curl -X GET https://example-app.klutch.sh/api/metrics/dora

8. Persistent Storage Best Practices

DevLake requires persistent storage for:

Configuration files: Data source connections and settings
Logs: Application and collection logs
Temporary data: Cache and intermediate processing files

Recommended mount paths:

/app/.config - Main configuration directory (required)
/app/logs - Log files (optional but recommended)

Volume sizing:

Minimum: 5GB for basic usage
Recommended: 20GB+ for production with multiple data sources
Large deployments: 50GB+ depending on data volume
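
To see how close a volume is to these sizes, a check like the sketch below can run inside the container or from a scheduled job. Both function names are hypothetical, and the 80 percent default threshold is an arbitrary choice for illustration.

```shell
# Print the used percentage (without the % sign) of the filesystem holding a path.
volume_usage_percent() {
  # df -P prints a header line, then: filesystem, blocks, used, available, capacity, mount point.
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn and return non-zero when usage reaches a threshold (default 80 percent).
check_volume_usage() {
  path="$1"; threshold="${2:-80}"
  used=$(volume_usage_percent "$path")
  case "$used" in
    ""|*[!0-9]*) echo "could not read usage for $path" >&2; return 1 ;;
  esac
  if [ "$used" -ge "$threshold" ]; then
    echo "WARNING: $path is ${used}% full" >&2
    return 1
  fi
  echo "$path is ${used}% full"
}

# Usage: check_volume_usage /app/.config 80
```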

9. Production Best Practices

Use External Database

Always use a dedicated MySQL or PostgreSQL database for production. Do not use SQLite (which is only suitable for testing).

Database recommendations:

MySQL 8.0+ or PostgreSQL 13+
At least 2GB RAM for the database
Regular backups (daily recommended)
Connection pooling enabled

Secure Your Deployment

Use strong encryption keys (128+ characters)
Rotate API tokens regularly
Restrict database access to DevLake’s IP only
Enable SSL/TLS for database connections
Use Klutch.sh’s secret management for all credentials

Monitor Resource Usage

DevLake can be resource-intensive when collecting large amounts of data:

Monitor CPU and memory usage
Scale vertically (more RAM/CPU) if needed
Adjust collection frequency based on your needs
Use incremental collection instead of full syncs when possible

Backup Strategy

Database: Schedule regular database backups (automated recommended)
Configuration: Backup the /app/.config volume regularly
Logs: Archive logs to object storage for long-term retention
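
The configuration backup can be as simple as a timestamped tarball. The sketch below assumes tar with gzip support is available; backup_config and the /backups destination are hypothetical, and the mysqldump line in the comments is only an illustration of a matching database dump.

```shell
# Archive a configuration directory into a timestamped tarball and print the archive path.
backup_config() {
  src_dir="$1"; dest_dir="$2"
  [ -d "$src_dir" ] || { echo "not a directory: $src_dir" >&2; return 1; }
  mkdir -p "$dest_dir" || return 1

  archive="$dest_dir/devlake-config-$(date -u +%Y%m%dT%H%M%SZ).tar.gz"
  # -C keeps paths in the archive relative to the source directory.
  tar -czf "$archive" -C "$src_dir" . || return 1
  printf '%s\n' "$archive"
}

# Usage (not run here): backup_config /app/.config /backups
# A MySQL dump can accompany it, for example:
# mysqldump --single-transaction -h your-db-host -u username -p dbname > devlake-db.sql
```

Keep the destination outside the directory being archived, otherwise each backup would include the previous ones.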

Performance Optimization

Set appropriate API_TIMEOUT values for large data collections
Use webhooks instead of polling where supported
Schedule data collections during off-peak hours
Limit the number of concurrent collections

10. Troubleshooting

DevLake Won’t Start

Check database connectivity:

# Test database connection
mysql -h your-db-host -u username -p

Verify environment variables:

Ensure DB_URL is correctly formatted
Check that ENCRYPTION_SECRET is set
Verify database exists and user has proper permissions
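
To check the DB_URL format mechanically, the sketch below splits it into its parts so they can be passed to a database client. parse_db_url is a hypothetical helper that expects the scheme://user:pass@host:port/dbname shape used in this guide and never prints the password.

```shell
# Split a DB_URL and print "scheme host port dbname user".
parse_db_url() {
  url="$1"
  case "$url" in
    mysql://*@*:*/*|postgres://*@*:*/*) ;;
    *) echo "DB_URL does not look like scheme://user:pass@host:port/dbname" >&2; return 1 ;;
  esac

  scheme="${url%%://*}"
  rest="${url#*://}"
  rest="${rest%%[?]*}"           # drop the query string
  userinfo="${rest%@*}"          # text before the last "@"
  location="${rest##*@}"         # host:port/dbname
  user="${userinfo%%:*}"
  hostport="${location%%/*}"
  dbname="${location#*/}"
  host="${hostport%:*}"
  port="${hostport##*:}"

  case "$port" in
    ""|*[!0-9]*) echo "port is not numeric: $port" >&2; return 1 ;;
  esac
  [ -n "$host" ] && [ -n "$dbname" ] || { echo "host or database name is empty" >&2; return 1; }

  printf '%s %s %s %s %s\n' "$scheme" "$host" "$port" "$dbname" "$user"
}

# Usage (not run here), feeding the parts to the mysql client:
# set -- $(parse_db_url "$DB_URL") && mysql -h "$2" -P "$3" -u "$5" -p "$4"
```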

Data Collection Failing

Common issues:

Invalid or expired API tokens
Rate limiting from data sources
Network connectivity issues
Insufficient permissions on tokens

Solutions:

Regenerate API tokens with proper scopes
Lower rateLimitPerHour so DevLake stays within the source’s quota, or reduce the collection frequency
Check Klutch.sh network logs
Verify token permissions in the data source settings

High Memory Usage

DevLake can consume significant memory during data collection:

Reduce the number of concurrent collections
Increase the app’s memory allocation
Use incremental collection instead of full syncs
Schedule collections during off-peak hours

Database Migration Errors

If you see database migration errors:

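Check the DevLake logs for the migration that failed
Migrations run automatically on startup, so restarting the app retries them
If a migration stays stuck, it may need manual intervention; back up the database before changing anything by hand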

11. Upgrading DevLake

To upgrade DevLake to a newer version:

Update Dockerfile

Edit your Dockerfile to use the new version tag:

# Update the version tag here
FROM apache/devlake:v0.21.0
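
The tag can also be changed with a script, which helps when upgrades are automated. This is a sketch under the assumption that the Dockerfile has a line starting with FROM apache/devlake: as in this guide; set_devlake_version is a hypothetical helper.

```shell
# Rewrite the tag on the "FROM apache/devlake:<tag>" line of a Dockerfile.
set_devlake_version() {
  dockerfile="$1"; new_tag="$2"
  # Accept only characters that are valid in a Docker tag.
  case "$new_tag" in
    ""|*[!A-Za-z0-9._-]*) echo "invalid tag: $new_tag" >&2; return 1 ;;
  esac
  grep -q '^FROM apache/devlake:' "$dockerfile" || {
    echo "no 'FROM apache/devlake:<tag>' line in $dockerfile" >&2; return 1; }

  # Write to a temporary file first; sed -i is not portable across GNU and BSD sed.
  tmp_file=$(mktemp) || return 1
  if sed "s|^FROM apache/devlake:[^ ]*|FROM apache/devlake:$new_tag|" "$dockerfile" > "$tmp_file" \
     && cat "$tmp_file" > "$dockerfile"; then
    result=0
  else
    result=1
  fi
  rm -f "$tmp_file"
  return "$result"
}

# Usage: set_devlake_version Dockerfile v0.21.0
```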

Commit and Push

git add Dockerfile
git commit -m "Upgrade DevLake to v0.21.0"
git push origin main

Deploy Update

Klutch.sh will automatically rebuild and redeploy with the new version. The database migrations will run automatically on startup.

Verify Upgrade

Check the DevLake UI to confirm the new version
Verify all data sources are still connected
Run a test data collection
Check Grafana dashboards (if using)

Summary

Deploying Apache DevLake on Klutch.sh provides a powerful platform for engineering metrics and analytics. With Dockerfile-based deployment, persistent storage, and production-ready configurations, you can track DORA metrics, sprint performance, code quality, and more across all your development tools.

Key takeaways:

Use the official apache/devlake Docker image
Configure external MySQL/PostgreSQL for production
Attach persistent volumes for configuration and logs
Secure all credentials using Klutch.sh environment variables
Monitor resource usage and scale as needed
Back up the database and configuration regularly

For questions or issues, refer to the DevLake community or Klutch.sh documentation.