Deploying DevLake
Introduction
Apache DevLake is a powerful open-source dev data platform that helps engineering teams collect, analyze, and visualize development metrics from multiple data sources. It integrates with popular tools like GitHub, GitLab, Jira, Jenkins, and more to provide comprehensive insights into your engineering processes, DORA metrics, sprint analytics, and code quality.
Deploying DevLake on Klutch.sh gives you a managed, scalable platform for running your engineering metrics dashboard with support for persistent storage, secure environment variables, and production-grade reliability. This guide walks you through deploying DevLake using a Dockerfile, configuring databases, setting up persistent volumes, and following production best practices.
Prerequisites
- A Klutch.sh account
- A GitHub repository for your DevLake deployment
- Basic familiarity with Docker and environment variables
- A MySQL or PostgreSQL database (DevLake requires an external database for production)
- API tokens/credentials for the data sources you want to connect (GitHub, GitLab, Jira, etc.)
Understanding DevLake Architecture
DevLake consists of several components:
- Config UI: Web interface for configuring data sources and connections
- API Server: Backend API that handles data collection and transformations
- Database: MySQL or PostgreSQL for storing collected metrics and configurations
- Grafana: Visualization dashboard for viewing metrics and analytics (optional but recommended)
For production deployments, you’ll need:
- A DevLake instance (API + Config UI)
- A MySQL or PostgreSQL database
- Persistent storage for configuration and logs
- (Optional) Grafana for advanced visualizations
1. Prepare Your Repository
Create a new GitHub repository for your DevLake deployment or use an existing one. Your repository should contain:

    devlake-deployment/
    ├── Dockerfile
    ├── .env.example
    └── README.md

Important: Never commit secrets or credentials to your repository. Use Klutch.sh environment variables for all sensitive data.
2. Sample Dockerfile
Create a Dockerfile in your repository root. Klutch.sh will automatically detect and use it for deployment.

    FROM apache/devlake:latest

    # Set working directory
    WORKDIR /app

    # Expose ports
    # 4000 - Config UI
    # 8080 - API Server
    EXPOSE 4000 8080

    # Health check
    HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
      CMD curl -f http://localhost:4000/api/ping || exit 1

    # The base image already includes the entrypoint
    # No need to override CMD unless customizing startup

Notes:
- This guide assumes the apache/devlake image serves both the Config UI and the API server. Upstream's Docker Compose setup runs the Config UI from a separate apache/devlake-config-ui image, so confirm what your chosen tag serves before relying on port 4000.
- Port 4000 is for the Config UI (web interface)
- Port 8080 is for the API server
- The health check assumes curl is installed in the image
- Pin to a specific version tag (e.g., apache/devlake:v0.20.0) for production to ensure reproducible builds
3. Deploying to Klutch.sh
Follow these steps to deploy DevLake on Klutch.sh:
Push Your Repository
Push your repository (including the Dockerfile) to GitHub.

    git add .
    git commit -m "Add DevLake Dockerfile"
    git push origin main

Create a New Project
Log in to the Klutch.sh dashboard and create a new project.
Set Up Database
Before deploying DevLake, set up a MySQL or PostgreSQL database. You can use a managed database service or deploy one on Klutch.sh. Note the following connection details:
- Database host
- Database port (usually 3306 for MySQL, 5432 for PostgreSQL)
- Database name
- Database user
- Database password
Create the DevLake App
Create a new app in your project:
- Select Repository: Choose your GitHub repository with the Dockerfile
- Select Branch: Choose the branch you want to deploy (e.g., main)
- Traffic Type: Select HTTP
- Internal Port: Set to 4000 (this is the Config UI port)
- Region: Choose your preferred region
- Compute: Select appropriate resources (minimum 2GB RAM recommended)
Configure Environment Variables
Add the following environment variables in the Klutch.sh app settings (mark sensitive values as secrets):
Database Configuration (MySQL example):

    DB_URL=mysql://username:password@host:3306/dbname?charset=utf8mb4&parseTime=True&loc=UTC

Database Configuration (PostgreSQL example):

    DB_URL=postgres://username:password@host:5432/dbname?sslmode=disable

The PostgreSQL example disables TLS for brevity. For production, use sslmode=require or stricter if your database supports it.
API Configuration:

    PORT=8080
    ENABLE_GRAFANA=true
    GRAFANA_ENDPOINT=https://your-grafana-instance.klutch.sh

Encryption Key (generate a random 128-character string):

    ENCRYPTION_SECRET=your-random-128-character-encryption-key

Attach Persistent Volume
DevLake stores configuration files and logs that should persist across deployments.
In your app settings, attach a persistent volume:
- Mount Path: /app/.config
- Size: At least 5GB (adjust based on your data volume)
This ensures your data source configurations and logs are preserved during restarts and updates.
Deploy the App
Click “Create” to deploy. Klutch.sh will:
- Build the Docker image from your Dockerfile
- Deploy the container with your configuration
- Mount the persistent volume
- Make the app available at example-app.klutch.sh
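Passwords containing characters such as "@" or "/" will break a hand-written DB_URL, and the encryption key needs to be exactly as random as the guide asks. The following is a minimal Python sketch of both tasks, not part of DevLake itself: the helper names, the sample host db.example.internal, and the sample credentials are hypothetical, and it assumes DevLake decodes percent-encoded credentials the way standard URL parsers do. Uppercase letters are an arbitrary alphabet chosen here because they need no escaping.

```python
import secrets
import string
from urllib.parse import quote, urlencode


def build_db_url(scheme, user, password, host, port, dbname, params):
    """Assemble a DB_URL, percent-encoding the credentials."""
    # safe="" makes quote() escape "/", "@" and ":" inside credentials too
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"{scheme}://{auth}@{host}:{port}/{dbname}?{urlencode(params)}"


def generate_encryption_secret(length=128):
    """Return a random string of uppercase letters (128 characters by default)."""
    return "".join(secrets.choice(string.ascii_uppercase) for _ in range(length))


# Hypothetical values for illustration only
mysql_url = build_db_url(
    "mysql", "devlake", "p@ss/word", "db.example.internal", 3306, "lake",
    {"charset": "utf8mb4", "parseTime": "True", "loc": "UTC"},
)
print(mysql_url)
print(len(generate_encryption_secret()))
```

Paste the printed values into the Klutch.sh environment variable settings rather than committing them to the repository.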
4. Initial Configuration
Once deployed, access your DevLake instance at your app URL (e.g., https://example-app.klutch.sh).
Access the Config UI
Open your DevLake app URL in a browser. You’ll see the DevLake Config UI.
Set Up Database Connection
On first launch, DevLake will automatically use the DB_URL environment variable to connect to your database and run migrations.
Configure Data Sources
Add connections to your data sources:
- Click “Connections” in the sidebar
- Click “Add Connection”
- Select your data source type (GitHub, GitLab, Jira, Jenkins, etc.)
- Enter the required credentials:
  - GitHub: Personal Access Token with repo access
  - GitLab: Personal Access Token with API access
  - Jira: Username and API token
  - Jenkins: Username and API token
Security Tip: Use tokens with minimal required permissions for each data source.
Create a Project
- Navigate to “Projects” in the Config UI
- Click “Create Project”
- Add your repositories and boards
- Configure the metrics you want to track
Run Data Collection
- Go to “Blueprints” (data collection pipelines)
- Create a new blueprint or use a template
- Configure the collection frequency
- Run the blueprint to start collecting data
5. Setting Up Grafana (Optional)
For advanced visualizations, you can deploy Grafana alongside DevLake:
Deploy Grafana on Klutch.sh
Create a separate Grafana app on Klutch.sh using the official Grafana image.
Configure Grafana Connection
Set the GRAFANA_ENDPOINT environment variable in your DevLake app to point to your Grafana instance.
Import DevLake Dashboards
DevLake provides pre-built Grafana dashboards for DORA metrics, sprint analytics, and more. Import these dashboards from the DevLake GitHub repository.
6. Environment Variables Reference
Here are the main DevLake environment variables used in this guide:
Required:

    DB_URL=mysql://user:pass@host:3306/dbname?charset=utf8mb4&parseTime=True&loc=UTC
    ENCRYPTION_SECRET=your-128-character-secret

Optional:

    PORT=8080
    API_TIMEOUT=120s
    ENABLE_GRAFANA=true
    GRAFANA_ENDPOINT=https://grafana.example.com
    LOG_LEVEL=info

For PostgreSQL:

    DB_URL=postgres://user:pass@host:5432/dbname?sslmode=disable

7. Sample Code: Getting Started with DevLake API
DevLake exposes a REST API that you can use to programmatically manage connections and trigger data collection.
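The same kind of call can be scripted instead of typed by hand. As a rough Python sketch using only the standard library, the function below builds the blueprint trigger request used in this section without sending it. The helper name build_trigger_request and the blueprint ID 1 are hypothetical, and the path is taken from this guide rather than verified against a particular DevLake version.

```python
import urllib.request

BASE_URL = "https://example-app.klutch.sh"  # placeholder app URL from this guide


def build_trigger_request(base_url, blueprint_id):
    """Build (but do not send) a POST request that triggers a blueprint."""
    url = f"{base_url.rstrip('/')}/api/blueprints/{blueprint_id}/trigger"
    return urllib.request.Request(url, method="POST")


req = build_trigger_request(BASE_URL, 1)  # 1 is a hypothetical blueprint ID
print(req.get_method(), req.full_url)

# To actually send it against a running instance:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.status)
```

Keeping request construction separate from sending makes the URL easy to inspect before anything touches a live instance.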
Create a Connection (Example: GitHub)
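
    curl -X POST https://example-app.klutch.sh/api/connections \
      -H "Content-Type: application/json" \
      -d '{
        "name": "my-github",
        "plugin": "github",
        "connectionId": 1,
        "endpoint": "https://api.github.com/",
        "token": "ghp_your_token_here",
        "rateLimitPerHour": 5000
      }'

Trigger a Data Collection
Replace :blueprintId with the numeric ID of your blueprint.

    curl -X POST https://example-app.klutch.sh/api/blueprints/:blueprintId/trigger

Query Collected Metrics

    curl -X GET https://example-app.klutch.sh/api/metrics/dora

Endpoint paths can differ between DevLake versions (recent releases namespace connection endpoints by plugin, for example /api/plugins/github/connections), so confirm each path against the API reference for your version.
8. Persistent Storage Best Practices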
DevLake requires persistent storage for:
- Configuration files: Data source connections and settings
- Logs: Application and collection logs
- Temporary data: Cache and intermediate processing files
Recommended mount paths:
- /app/.config: Main configuration directory (required)
- /app/logs: Log files (optional but recommended)
Volume sizing:
- Minimum: 5GB for basic usage
- Recommended: 20GB+ for production with multiple data sources
- Large deployments: 50GB+ depending on data volume
9. Production Best Practices
Use External Database
Always use a dedicated MySQL or PostgreSQL database for production. Do not use SQLite (which is only suitable for testing).
Database recommendations:
- MySQL 8.0+ or PostgreSQL 13+
- At least 2GB RAM for the database
- Regular backups (daily recommended)
- Connection pooling enabled
Secure Your Deployment
- Use strong encryption keys (128+ characters)
- Rotate API tokens regularly
- Restrict database access to DevLake’s IP only
- Enable SSL/TLS for database connections
- Use Klutch.sh’s secret management for all credentials
Monitor Resource Usage
DevLake can be resource-intensive when collecting large amounts of data:
- Monitor CPU and memory usage
- Scale vertically (more RAM/CPU) if needed
- Adjust collection frequency based on your needs
- Use incremental collection instead of full syncs when possible
Backup Strategy
- Database: Schedule regular database backups (automated recommended)
- Configuration: Back up the /app/.config volume regularly
- Logs: Archive logs to object storage for long-term retention
Performance Optimization
- Set appropriate API_TIMEOUT values for large data collections
- Use webhooks instead of polling where supported
- Schedule data collections during off-peak hours
- Limit the number of concurrent collections
10. Troubleshooting
DevLake Won’t Start
Check database connectivity:

    # Test database connection
    mysql -h your-db-host -u username -p

Verify environment variables:
- Ensure DB_URL is correctly formatted
- Check that ENCRYPTION_SECRET is set
- Verify database exists and user has proper permissions
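The first two checks are easy to automate before a deploy. The sketch below is an illustration, not a DevLake tool: check_env is a hypothetical helper that inspects a dictionary of settings and reports what looks wrong. It assumes the two DB_URL schemes shown in this guide (mysql and postgres) and the 128-character key length recommended above. It cannot tell whether the database exists or whether the user has permissions.

```python
from urllib.parse import urlsplit

SUPPORTED_SCHEMES = {"mysql", "postgres"}  # the schemes used in this guide


def check_env(env):
    """Return a list of human-readable problems found in the settings."""
    problems = []
    db_url = env.get("DB_URL", "")
    if not db_url:
        problems.append("DB_URL is not set")
    else:
        parts = urlsplit(db_url)
        if parts.scheme not in SUPPORTED_SCHEMES:
            problems.append(f"unsupported DB_URL scheme: {parts.scheme!r}")
        if not parts.hostname:
            problems.append("DB_URL has no host")
        if not parts.path.strip("/"):
            problems.append("DB_URL has no database name")
    secret = env.get("ENCRYPTION_SECRET", "")
    if not secret:
        problems.append("ENCRYPTION_SECRET is not set")
    elif len(secret) < 128:
        problems.append("ENCRYPTION_SECRET is shorter than 128 characters")
    return problems


sample = {
    "DB_URL": "mysql://user:pass@host:3306/dbname?charset=utf8mb4",
    "ENCRYPTION_SECRET": "short",
}
print(check_env(sample))
```

To check the real environment, pass dict(os.environ) after importing os. An empty list means none of these particular problems were found.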
Data Collection Failing
Common issues:
- Invalid or expired API tokens
- Rate limiting from data sources
- Network connectivity issues
- Insufficient permissions on tokens
Solutions:
- Regenerate API tokens with proper scopes
- Lower rateLimitPerHour so DevLake stays under the data source's quota, or reduce collection frequency
- Check Klutch.sh network logs
- Verify token permissions in the data source settings
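To get a feel for what a rateLimitPerHour value implies, the arithmetic can be worked through directly. This is a simplified sketch that assumes requests are spread evenly across the hour, which is not necessarily how DevLake schedules its calls. The function names and the 20,000-request workload are hypothetical.

```python
def seconds_between_requests(rate_limit_per_hour):
    """Average spacing that keeps a client at or under the hourly budget."""
    if rate_limit_per_hour <= 0:
        raise ValueError("rate_limit_per_hour must be positive")
    return 3600 / rate_limit_per_hour


def hours_to_collect(total_requests, rate_limit_per_hour):
    """Lower bound on collection time when the rate limit is the bottleneck."""
    if rate_limit_per_hour <= 0:
        raise ValueError("rate_limit_per_hour must be positive")
    return total_requests / rate_limit_per_hour


print(seconds_between_requests(5000))  # 0.72
print(hours_to_collect(20000, 5000))   # 4.0
```

Halving the limit doubles both numbers, which is the trade-off to weigh when a data source starts rejecting requests.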
High Memory Usage
DevLake can consume significant memory during data collection:
- Reduce the number of concurrent collections
- Increase the app’s memory allocation
- Use incremental collection instead of full syncs
- Schedule collections during off-peak hours
Database Migration Errors
If you see database migration errors:
- Check migration status in the DevLake logs
- Migrations run automatically on startup
- If stuck, you may need to manually reset migrations
11. Upgrading DevLake
To upgrade DevLake to a newer version:
Update Dockerfile
Edit your Dockerfile to use the new version tag:

    # Update version here
    FROM apache/devlake:v0.21.0

Commit and Push

    git add Dockerfile
    git commit -m "Upgrade DevLake to v0.21.0"
    git push origin main

Deploy Update
Klutch.sh will automatically rebuild and redeploy with the new version. The database migrations will run automatically on startup.
Verify Upgrade
- Check the DevLake UI to confirm the new version
- Verify all data sources are still connected
- Run a test data collection
- Check Grafana dashboards (if using)
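If upgrades happen often, the Dockerfile edit can be scripted. The following is a small Python sketch under the assumption that the Dockerfile contains a tagged "FROM apache/devlake:<tag>" line as in this guide. The helper bump_devlake_tag is hypothetical, and it raises an error rather than silently leaving the file unchanged.

```python
import re

# Matches the tagged base-image line, capturing everything before the tag
FROM_LINE = re.compile(r"^(FROM\s+apache/devlake):\S+", re.MULTILINE)


def bump_devlake_tag(dockerfile_text, new_tag):
    """Replace the tag on the apache/devlake FROM line; fail if none is found."""
    updated, count = FROM_LINE.subn(
        lambda m: f"{m.group(1)}:{new_tag}", dockerfile_text
    )
    if count == 0:
        raise ValueError("no tagged 'FROM apache/devlake:<tag>' line found")
    return updated


before = "FROM apache/devlake:v0.20.0\nWORKDIR /app\n"
print(bump_devlake_tag(before, "v0.21.0"))
```

In practice the text would be read from and written back to the Dockerfile (for example with pathlib.Path.read_text and write_text) before committing.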
12. Resources
- Apache DevLake Official Documentation
- DevLake GitHub Repository
- DevLake Getting Started Guide
- Klutch.sh Quick Start Guide
- Klutch.sh Volumes Guide
- Klutch.sh Deployments Guide
Summary
Deploying Apache DevLake on Klutch.sh provides a powerful platform for engineering metrics and analytics. With Dockerfile-based deployment, persistent storage, and production-ready configurations, you can track DORA metrics, sprint performance, code quality, and more across all your development tools.
Key takeaways:
- Use the official apache/devlake Docker image
- Configure external MySQL/PostgreSQL for production
- Attach persistent volumes for configuration and logs
- Secure all credentials using Klutch.sh environment variables
- Monitor resource usage and scale as needed
- Back up the database and configuration regularly
For questions or issues, refer to the DevLake community or Klutch.sh documentation.