Deploying Dagu
Dagu is a modern workflow orchestration platform that brings simplicity and power to task automation. Built with Go and featuring a clean web interface, Dagu allows you to define workflows as Directed Acyclic Graphs (DAGs) using simple YAML configuration files. Unlike heavyweight workflow tools that require complex setup and maintenance, Dagu focuses on being lightweight, fast, and easy to understand while still providing the essential features needed for production workflows.
What makes Dagu stand out is its approach to workflow definition. Each DAG is defined in a single YAML file with clear syntax for defining tasks, dependencies, schedules, and conditions. The built-in web UI provides real-time monitoring, execution history, logs, and manual triggering capabilities. Features like conditional execution, parameterized workflows, notification hooks, and retry logic make Dagu suitable for everything from simple cron-replacement tasks to complex data pipelines and multi-step automation workflows.
Why Deploy Dagu on Klutch.sh?
Klutch.sh provides an excellent platform for hosting Dagu with several key advantages:
- Simple Docker Deployment: Deploy your Dockerfile and Klutch.sh automatically handles containerization and orchestration
- Persistent Storage: Attach volumes for DAG definitions, execution history, and logs with guaranteed durability
- Automatic HTTPS: All deployments come with automatic SSL certificates for secure web UI access
- Resource Scalability: Scale CPU and memory resources based on workflow complexity
- Zero Server Management: Focus on building workflows, not managing infrastructure
- Cost-Effective: Pay only for resources used, scale based on workflow execution needs
- Always-On Scheduling: Keep your scheduler running 24/7 without managing servers
Prerequisites
Before deploying Dagu, ensure you have:
- A Klutch.sh account (sign up at klutch.sh)
- Git installed locally
- Basic understanding of workflow orchestration and DAGs
- Familiarity with Docker and container concepts
- Knowledge of YAML syntax
- Understanding of cron expressions for scheduling
- Basic shell scripting knowledge (for task definitions)
Understanding Dagu’s Architecture
Dagu uses a straightforward architecture designed for simplicity and reliability:
Core Components
Scheduler: The heart of Dagu that:
- Monitors DAG files for changes
- Evaluates schedules and triggers workflows
- Manages workflow execution state
- Handles retry logic and failure recovery
- Processes manual triggers from web UI
- Maintains execution history
- Enforces workflow dependencies
Executor: Responsible for running individual tasks:
- Executes shell commands
- Manages environment variables
- Captures stdout/stderr logs
- Reports task status back to scheduler
- Handles timeouts and cancellations
- Supports parallel task execution
- Manages working directories
Web Server: Provides the user interface:
- DAG visualization and monitoring
- Real-time execution status
- Log viewing and search
- Manual workflow triggering
- Execution history browser
- Configuration editor
- System status dashboard
Data Store: Simple file-based storage:
- DAG definitions stored as YAML files
- Execution history in SQLite database
- Task logs as individual files
- Configuration in single file
- No complex database setup required
- Easy backup and restore
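Because everything is file-based, the on-disk layout is easy to reason about and to back up. A rough sketch of the directory structure used throughout this guide (paths match the Dockerfile and configuration shown later; exact filenames may vary by Dagu version):

```
/home/dagu/.dagu/
├── config.yaml   # Global configuration
├── dags/         # One YAML file per workflow (DAG definitions)
├── logs/         # Per-execution task logs
├── data/         # SQLite database (e.g., dagu.db) and working data
└── history/      # Execution history records
```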
DAG Structure
DAG Definition: Each workflow is defined in a YAML file:
- Metadata (name, description, tags)
- Schedule definition (cron expression)
- Environment variables
- Task definitions
- Dependencies between tasks
- Error handling rules
- Notification configurations
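Putting those pieces together, a minimal DAG file looks roughly like this (a sketch that mirrors the fuller examples later in this guide; the workflow and step names are placeholders):

```yaml
name: nightly_cleanup              # Metadata
description: Remove temporary files
tags:
  - maintenance
schedule: "0 1 * * *"              # Cron schedule: 1 AM daily
env:
  - TMP_DIR: /tmp/app              # Environment variable
steps:
  - name: remove_files             # Task definition
    command: rm -rf ${TMP_DIR}/*
  - name: report
    command: echo "Cleanup finished"
    depends:
      - remove_files               # Dependency between tasks
mailOn:
  failure: true                    # Notification configuration
```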
Task Types:
- Command: Execute shell commands
- HTTP: Make HTTP/HTTPS requests
- Email: Send email notifications
- Signal: Send signals to other DAGs
- Wait: Pause execution for duration
Dependency Model:
- Tasks define dependencies on other tasks
- Parallel execution when no dependencies
- Sequential execution for dependent tasks
- Conditional execution based on status
- Support for task groups
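In practice, any steps without a `depends` entry run in parallel, while dependent steps wait for their upstream tasks. A small illustrative sketch (step names and scripts are hypothetical; the `preconditions` syntax follows the conditional workflow example later in this guide):

```yaml
steps:
  - name: extract_a          # No dependencies: runs in parallel with extract_b
    command: ./extract_a.sh
  - name: extract_b
    command: ./extract_b.sh
  - name: merge              # Waits for both extracts to complete
    command: ./merge.sh
    depends:
      - extract_a
      - extract_b
  - name: publish            # Only runs when the condition matches
    command: ./publish.sh
    depends:
      - merge
    preconditions:
      - condition: "$ENVIRONMENT"
        expected: "production"
```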
Execution Flow
- Scheduler reads DAG definitions from filesystem
- Evaluates schedule expressions against current time
- Checks preconditions and parameters
- Creates execution record in database
- Queues tasks respecting dependencies
- Executor picks up tasks and runs them
- Results logged to files and database
- Notifications sent based on configuration
- Web UI updates in real-time
- Execution completes with final status
Configuration System
Global Configuration: System-wide settings
- Server port and host
- DAG directory location
- Log directory path
- Execution history retention
- Timezone settings
- Authentication options
DAG Configuration: Per-workflow settings
- Schedule expressions
- Retry policies
- Timeout values
- Environment variables
- Working directories
- Log retention
Environment Variables: Support for:
- Global environment variables
- DAG-level variables
- Task-level variables
- Runtime parameter substitution
- Secrets management
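The layering works the way you would expect: values defined at the DAG level are visible to every step, and `params` provide defaults that can be overridden at runtime. A brief sketch based on the syntax used in the examples below (variable names are placeholders):

```yaml
env:
  - DATA_DIR: /home/dagu/.dagu/data   # DAG-level variable, visible to all steps
  - BUILD_ID: $(date +%s)             # Command substitution at runtime

params:
  - TARGET: staging                   # Default; override with -p TARGET=production

steps:
  - name: deploy
    command: ./deploy.sh --target ${TARGET} --workdir ${DATA_DIR}
```

Secrets are best supplied as container environment variables (for example via the Klutch.sh dashboard) rather than hardcoded in DAG files, as covered in the security section later.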
Monitoring and Logging
Execution Tracking:
- Real-time status updates
- Start and end timestamps
- Task-level status
- Exit codes and signals
- Resource usage tracking
Log Management:
- Separate log files per execution
- Task-level log capture
- Searchable log viewer
- Log rotation and retention
- Export capabilities
Notifications:
- Webhook notifications
- Email alerts
- Slack integration
- Custom notification handlers
- Configurable triggers (success, failure, start)
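Notification triggers are configured per DAG. A minimal sketch combining email on failure with a webhook call as a final step (the SMTP host and webhook URL are placeholders; the full syntax appears in the example DAGs below):

```yaml
steps:
  - name: main_task
    command: ./run-job.sh
  - name: webhook_notify
    command: |
      curl -X POST https://hooks.example.com/dagu \
        -H 'Content-Type: application/json' \
        -d '{"text":"Job finished"}'
    depends:
      - main_task

mailOn:
  failure: true   # Email only when something breaks
  success: false

smtp:
  host: smtp.example.com
  port: "587"
  username: alerts@example.com
  password: app-password
```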
Installation and Setup
Step 1: Create the Dockerfile
Create a Dockerfile in your project root:
```dockerfile
FROM golang:1.21-alpine AS builder

# Install build dependencies
RUN apk add --no-cache git make

# Set working directory
WORKDIR /app

# Clone Dagu repository
RUN git clone https://github.com/dagu-dev/dagu.git .

# Build Dagu
RUN make build

# Final stage
FROM alpine:latest

# Install runtime dependencies
RUN apk add --no-cache \
    ca-certificates \
    tzdata \
    curl \
    bash \
    sqlite

# Create dagu user
RUN addgroup -g 1000 dagu && \
    adduser -D -u 1000 -G dagu dagu

# Create directories
RUN mkdir -p /home/dagu/.dagu/dags \
    /home/dagu/.dagu/logs \
    /home/dagu/.dagu/data \
    /home/dagu/.dagu/history && \
    chown -R dagu:dagu /home/dagu

# Copy binary from builder
COPY --from=builder /app/bin/dagu /usr/local/bin/dagu

# Switch to dagu user
USER dagu
WORKDIR /home/dagu

# Copy configuration
COPY --chown=dagu:dagu config.yaml /home/dagu/.dagu/config.yaml

# Expose web UI port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1

# Start Dagu
CMD ["dagu", "start-all"]
```
Step 2: Create Dagu Configuration
Create config.yaml for Dagu settings:
```yaml
# Dagu Configuration

# Server settings
host: 0.0.0.0
port: 8080

# Base configuration directory
baseConfig: /home/dagu/.dagu

# DAG directory
dags: /home/dagu/.dagu/dags

# Log directory
logDir: /home/dagu/.dagu/logs

# Data directory for SQLite database
dataDir: /home/dagu/.dagu/data

# Execution history retention
historyRetentionDays: 30

# Timezone
location: UTC

# Authentication (basic auth)
isAuthToken: false
# For production, enable authentication:
# isAuthToken: true
# authToken:
#   - username: admin
#     password: your-secure-password

# Logging
logLevel: info
logFormat: text

# Execution settings
maxActiveRuns: 1000
maxCleanUpTimeSec: 60

# Web UI settings
navbarColor: "#1f2937"
navbarTitle: "Dagu Workflow Orchestrator"

# API settings
apiBaseURL: ""

# TLS settings (if needed)
tls:
  certFile: ""
  keyFile: ""

# SMTP settings for email notifications
smtp:
  host: ""
  port: ""
  username: ""
  password: ""

# Webhook settings
webhook:
  url: ""
  headers: {}
```
Step 3: Create Example DAG Files
Create dags/hello_world.yaml:
```yaml
name: hello_world
description: Simple hello world workflow
tags:
  - example
  - basic

schedule: "0 * * * *" # Run every hour

params:
  - NAME: "World" # Default value; override at runtime with -p NAME=...

env:
  - LOG_LEVEL: info

steps:
  - name: say_hello
    command: echo "Hello, ${NAME}!"

  - name: show_date
    command: date
    depends:
      - say_hello

  - name: create_file
    command: echo "Execution completed at $(date)" > /tmp/hello_output.txt
    depends:
      - show_date
```
Create dags/data_pipeline.yaml:
```yaml
name: data_pipeline
description: Example data processing pipeline
tags:
  - data
  - pipeline

schedule: "0 2 * * *" # Run daily at 2 AM

env:
  - DATA_DIR: /home/dagu/.dagu/data
  - LOG_FILE: /home/dagu/.dagu/logs/pipeline.log

steps:
  - name: fetch_data
    command: |
      echo "Fetching data from API..."
      curl -s https://api.example.com/data > ${DATA_DIR}/raw_data.json

  - name: validate_data
    command: |
      echo "Validating data..."
      if [ ! -s "${DATA_DIR}/raw_data.json" ]; then
        echo "Error: Data file is empty"
        exit 1
      fi
      echo "Data validation passed"
    depends:
      - fetch_data
    continueOn:
      failure: false

  - name: transform_data
    command: |
      echo "Transforming data..."
      # Add your transformation logic here
      cat ${DATA_DIR}/raw_data.json | jq '.' > ${DATA_DIR}/processed_data.json
    depends:
      - validate_data
    retryPolicy:
      limit: 3
      intervalSec: 30

  - name: load_data
    command: |
      echo "Loading data..."
      # Add your load logic here
      echo "Data loaded successfully" >> ${LOG_FILE}
    depends:
      - transform_data

  - name: cleanup
    command: |
      echo "Cleaning up temporary files..."
      rm -f ${DATA_DIR}/raw_data.json
    depends:
      - load_data

mailOn:
  failure: true
  success: false

smtp:
  host: smtp.gmail.com
  port: "587"
  username: your-email@gmail.com
  password: your-app-password
  from: noreply@example.com
  to: admin@example.com
```
Create dags/conditional_workflow.yaml:
```yaml
name: conditional_workflow
description: Workflow with conditional execution
tags:
  - advanced
  - conditional

schedule: "0 */6 * * *" # Run every 6 hours

env:
  - ENVIRONMENT: production

steps:
  - name: check_environment
    command: |
      if [ "$ENVIRONMENT" = "production" ]; then
        echo "Running in production"
        exit 0
      else
        echo "Not production environment"
        exit 1
      fi

  - name: production_task
    command: echo "Executing production task..."
    depends:
      - check_environment
    preconditions:
      - condition: "$ENVIRONMENT"
        expected: "production"

  - name: backup_task
    command: |
      echo "Creating backup..."
      tar -czf backup_$(date +%Y%m%d).tar.gz /home/dagu/.dagu/data
    depends:
      - production_task
    retryPolicy:
      limit: 3
      intervalSec: 60

  - name: notification
    command: |
      curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
        -H 'Content-Type: application/json' \
        -d '{"text":"Workflow completed successfully"}'
    depends:
      - backup_task
    continueOn:
      failure: true
```
Step 4: Create Health Check Script
Create healthcheck.sh:
```bash
#!/bin/bash
set -e

# Check if Dagu server is responding
if curl -f -s http://localhost:8080/health > /dev/null 2>&1; then
  echo "Dagu is healthy"
  exit 0
else
  echo "Dagu is not responding"
  exit 1
fi
```
Make it executable:
```bash
chmod +x healthcheck.sh
```
Step 5: Create Docker Compose for Local Development
Create docker-compose.yml:
```yaml
version: '3.8'

services:
  dagu:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    volumes:
      - dagu-dags:/home/dagu/.dagu/dags
      - dagu-logs:/home/dagu/.dagu/logs
      - dagu-data:/home/dagu/.dagu/data
      - dagu-history:/home/dagu/.dagu/history
      # For local development, you can bind-mount local DAGs instead of the named volume:
      # - ./dags:/home/dagu/.dagu/dags:ro
    environment:
      - TZ=UTC
    restart: unless-stopped

volumes:
  dagu-dags:
  dagu-logs:
  dagu-data:
  dagu-history:
```
Step 6: Create DAG Management Script
Create dag-manager.sh:
```bash
#!/bin/bash

DAG_DIR="/home/dagu/.dagu/dags"
COMMAND=$1
DAG_FILE=$2

case "$COMMAND" in
  list)
    echo "Available DAGs:"
    ls -1 $DAG_DIR/*.yaml 2>/dev/null || echo "No DAGs found"
    ;;
  validate)
    if [ -z "$DAG_FILE" ]; then
      echo "Usage: $0 validate <dag-file>"
      exit 1
    fi
    dagu validate $DAG_DIR/$DAG_FILE
    ;;
  run)
    if [ -z "$DAG_FILE" ]; then
      echo "Usage: $0 run <dag-file>"
      exit 1
    fi
    dagu start $DAG_DIR/$DAG_FILE
    ;;
  status)
    if [ -z "$DAG_FILE" ]; then
      echo "Usage: $0 status <dag-file>"
      exit 1
    fi
    dagu status $DAG_DIR/$DAG_FILE
    ;;
  stop)
    if [ -z "$DAG_FILE" ]; then
      echo "Usage: $0 stop <dag-file>"
      exit 1
    fi
    dagu stop $DAG_DIR/$DAG_FILE
    ;;
  logs)
    if [ -z "$DAG_FILE" ]; then
      echo "Usage: $0 logs <dag-file>"
      exit 1
    fi
    dagu logs $DAG_DIR/$DAG_FILE
    ;;
  *)
    echo "Usage: $0 {list|validate|run|status|stop|logs} [dag-file]"
    exit 1
    ;;
esac
```
Make it executable:
```bash
chmod +x dag-manager.sh
```
Step 7: Initialize Git Repository
```bash
git init
git add Dockerfile config.yaml dags/ healthcheck.sh docker-compose.yml dag-manager.sh
git commit -m "Initial Dagu deployment configuration"
```
Step 8: Test Locally
Before deploying to Klutch.sh, test locally:
```bash
# Build and start container
docker-compose up -d

# Check logs
docker-compose logs -f dagu

# Access Dagu at http://localhost:8080

# Validate DAG
docker-compose exec dagu dagu validate /home/dagu/.dagu/dags/hello_world.yaml

# Manually run DAG
docker-compose exec dagu dagu start /home/dagu/.dagu/dags/hello_world.yaml

# Check DAG status
docker-compose exec dagu dagu status /home/dagu/.dagu/dags/hello_world.yaml
```
Deploying to Klutch.sh
Step 1: Push Repository to GitHub
Create a new repository and push:
```bash
git remote add origin https://github.com/yourusername/dagu-klutch.git
git branch -M master
git push -u origin master
```
Step 2: Deploy Dagu to Klutch.sh
- Navigate to klutch.sh/app
- Click "New Project" and select "Import from GitHub"
- Authorize Klutch.sh to access your GitHub repositories
- Select your Dagu repository
- Klutch.sh will automatically detect the Dockerfile
Step 3: Configure Traffic Settings
- In the project settings, select **HTTP** as the traffic type
- Set the internal port to **8080**
- Klutch.sh will automatically provision an HTTPS endpoint
Step 4: Add Persistent Storage
Dagu requires persistent storage for DAGs, logs, and execution history:
- In your project settings, navigate to the "Storage" section
- Add a volume with mount path: `/home/dagu/.dagu/dags` and size: `5GB` (for DAG definitions)
- Add a volume with mount path: `/home/dagu/.dagu/logs` and size: `10GB` (for execution logs)
- Add a volume with mount path: `/home/dagu/.dagu/data` and size: `5GB` (for database)
- Add a volume with mount path: `/home/dagu/.dagu/history` and size: `5GB` (for execution history)
Storage recommendations:
- Light usage (< 10 DAGs): 5GB dags, 5GB logs, 2GB data, 2GB history
- Medium usage (10-50 DAGs): 10GB dags, 20GB logs, 5GB data, 5GB history
- Heavy usage (50+ DAGs): 20GB dags, 50GB logs, 10GB data, 10GB history
Step 5: Configure Environment Variables (Optional)
Add environment variables in Klutch.sh dashboard if needed:
- `TZ`: Your timezone (e.g., `America/New_York`, `Europe/London`)
- `DAGU_LOG_LEVEL`: Logging level (`debug`, `info`, `warn`, `error`)
- `DAGU_MAX_ACTIVE_RUNS`: Maximum concurrent workflow runs (default: 1000)
Step 6: Deploy Dagu
- Review your configuration settings in Klutch.sh
- Click "Deploy" to start the deployment
- Monitor build logs for any errors
- Wait for initialization (typically 2-3 minutes)
- Once deployed, Dagu will be available at `your-app.klutch.sh`
Step 7: Upload DAG Files
After deployment, upload your DAG files:
- Access your Dagu deployment at `https://your-app.klutch.sh`
- Navigate to the DAGs directory via the web UI or API
- Upload your YAML DAG files
- Dagu will automatically detect and load them
Alternatively, use the container terminal in Klutch.sh:
```bash
# Create a new DAG file
cat > /home/dagu/.dagu/dags/my_workflow.yaml <<EOF
name: my_workflow
description: My custom workflow
schedule: "0 9 * * *"

steps:
  - name: task1
    command: echo "Task 1 executed"
EOF

# Validate the DAG
dagu validate /home/dagu/.dagu/dags/my_workflow.yaml
```
Getting Started with Dagu
Accessing the Web UI
After deployment:
- Navigate to Your Deployment: Visit `https://your-app.klutch.sh`
- View Dashboard: The main dashboard shows:
  - List of all DAGs
  - Current execution status
  - Recent runs
  - System statistics
- Navigate DAGs: Click on any DAG to see:
  - DAG visualization graph
  - Execution history
  - Schedule information
  - Task details
Creating Your First DAG
Let’s create a simple daily reporting workflow:
- Create a new file `dags/daily_report.yaml`:
```yaml
name: daily_report
description: Daily system report workflow
tags:
  - monitoring
  - reporting

schedule: "0 8 * * *" # Run daily at 8 AM

env:
  - REPORT_DIR: /home/dagu/.dagu/data/reports
  - REPORT_DATE: $(date +%Y-%m-%d)

steps:
  - name: create_report_directory
    command: mkdir -p ${REPORT_DIR}

  - name: gather_system_info
    command: |
      echo "System Report - ${REPORT_DATE}" > ${REPORT_DIR}/report_${REPORT_DATE}.txt
      echo "========================" >> ${REPORT_DIR}/report_${REPORT_DATE}.txt
      echo "" >> ${REPORT_DIR}/report_${REPORT_DATE}.txt
      echo "System Uptime:" >> ${REPORT_DIR}/report_${REPORT_DATE}.txt
      uptime >> ${REPORT_DIR}/report_${REPORT_DATE}.txt
      echo "" >> ${REPORT_DIR}/report_${REPORT_DATE}.txt
      echo "Disk Usage:" >> ${REPORT_DIR}/report_${REPORT_DATE}.txt
      df -h >> ${REPORT_DIR}/report_${REPORT_DATE}.txt
    depends:
      - create_report_directory

  - name: count_dags
    command: |
      DAG_COUNT=$(ls -1 /home/dagu/.dagu/dags/*.yaml | wc -l)
      echo "" >> ${REPORT_DIR}/report_${REPORT_DATE}.txt
      echo "Total DAGs: ${DAG_COUNT}" >> ${REPORT_DIR}/report_${REPORT_DATE}.txt
    depends:
      - gather_system_info

  - name: send_notification
    command: |
      echo "Daily report generated: ${REPORT_DIR}/report_${REPORT_DATE}.txt"
      # Add webhook or email notification here
    depends:
      - count_dags

mailOn:
  success: true
  failure: true
```
- Validate the DAG:
```bash
dagu validate /home/dagu/.dagu/dags/daily_report.yaml
```
- Test run the DAG manually:
```bash
dagu start /home/dagu/.dagu/dags/daily_report.yaml
```
- Monitor execution in the web UI
- View logs for each task
- Check the generated report file
Understanding DAG Syntax
Basic Structure:
```yaml
name: workflow_name        # Required: Unique identifier
description: Description   # Optional: Human-readable description
tags:                      # Optional: Tags for organization
  - tag1
  - tag2
schedule: "0 * * * *"      # Optional: Cron expression
```
Environment Variables:
```yaml
env:
  - VAR_NAME: value        # Static value
  - DYNAMIC: $(command)    # Command substitution
params:                    # Parameters that can be passed at runtime
  - PARAM_NAME: "default_value"
```
Task Definition:
```yaml
steps:
  - name: task_name        # Required: Task identifier
    command: |             # Required: Command to execute
      echo "Hello"
      echo "World"
    dir: /path/to/dir      # Optional: Working directory
    depends:               # Optional: Task dependencies
      - task1
      - task2
    continueOn:            # Optional: Continue on error
      failure: true
    retryPolicy:           # Optional: Retry configuration
      limit: 3
      intervalSec: 30
    output: OUTPUT_VAR     # Optional: Capture output
```
Scheduling:
```yaml
# Cron format: minute hour day month dayofweek
schedule: "0 * * * *"    # Every hour
schedule: "0 0 * * *"    # Daily at midnight
schedule: "0 9 * * 1-5"  # Weekdays at 9 AM
schedule: "*/15 * * * *" # Every 15 minutes
schedule: "0 0 1 * *"    # First day of month
```
Running Workflows
Manual Execution:
Via web UI:
- Navigate to DAGs list
- Click on DAG name
- Click “Run” button
- Optionally provide parameters
- Monitor execution in real-time
Via command line:
```bash
# Start a DAG
dagu start /home/dagu/.dagu/dags/my_workflow.yaml

# Start with parameters
dagu start /home/dagu/.dagu/dags/my_workflow.yaml \
  -p NAME=John -p ENV=production

# Start specific task
dagu start /home/dagu/.dagu/dags/my_workflow.yaml \
  --step task_name
```
Scheduled Execution:
Dagu automatically runs DAGs based on their schedule:
schedule: "0 2 * * *" # Runs automatically daily at 2 AMStopping Workflows:
```bash
# Stop a running workflow
dagu stop /home/dagu/.dagu/dags/my_workflow.yaml

# Stop via web UI: Click "Stop" button on running workflow
```
Monitoring Workflows
Real-Time Status:
- Web UI shows live updates
- Task status indicators (running, success, failed)
- Progress bars for long-running tasks
- Execution time tracking
Viewing Logs:
Via web UI:
- Click on DAG
- Select execution from history
- Click on task name
- View stdout/stderr logs
Via command line:
```bash
# View logs for latest run
dagu logs /home/dagu/.dagu/dags/my_workflow.yaml

# View logs for specific execution
dagu logs /home/dagu/.dagu/dags/my_workflow.yaml \
  --req-id 20240101-120000-abc123
```
Execution History:
- View past executions with status
- Filter by success/failure
- Search by date range
- Export execution data
Handling Failures
Retry Logic:
```yaml
steps:
  - name: unreliable_task
    command: ./might-fail.sh
    retryPolicy:
      limit: 5         # Retry up to 5 times
      intervalSec: 60  # Wait 60 seconds between retries
```
Continue on Failure:
```yaml
steps:
  - name: optional_task
    command: ./cleanup.sh
    continueOn:
      failure: true # Don't fail entire workflow if this fails
```
Error Notifications:
```yaml
mailOn:
  failure: true  # Send email on failure
  success: false # Don't send on success

smtp:
  host: smtp.gmail.com
  port: "587"
  username: your-email@gmail.com
  password: your-app-password
  from: dagu@example.com
  to: admin@example.com
```
Production Best Practices
Security Configuration
- Enable Authentication:
Update config.yaml:
```yaml
isAuthToken: true
authToken:
  - username: admin
    password: your-secure-password-here
  - username: viewer
    password: another-password
```
Generate secure passwords:
```bash
openssl rand -base64 32
```
- Restrict File Permissions:
```bash
chmod 600 /home/dagu/.dagu/config.yaml
chmod 700 /home/dagu/.dagu/dags
```
- Use Environment Variables for Secrets:
Instead of hardcoding secrets in DAG files:
```yaml
env:
  - API_KEY: ${API_KEY}         # Read from environment
  - DB_PASSWORD: ${DB_PASSWORD} # Avoid hardcoding
```
Set environment variables in the Klutch.sh dashboard.
- Limit DAG File Access:
```yaml
# In DAG files, avoid using sudo or privileged commands
steps:
  - name: safe_task
    command: echo "Safe operation"
    # Avoid: sudo rm -rf /
```
Performance Optimization
- Limit Concurrent Executions:
```yaml
# In config.yaml
maxActiveRuns: 100 # Adjust based on available resources

# In DAG files
maxActiveRuns: 1 # Prevent concurrent runs of same DAG
```
- Optimize Long-Running Tasks:
```yaml
steps:
  - name: long_task
    command: ./long-process.sh
    timeout: 3600  # Kill after 1 hour
    output: RESULT # Capture output for next task
```
- Use Task Parallelization:
```yaml
steps:
  # These run in parallel (no dependencies)
  - name: task1
    command: ./process1.sh

  - name: task2
    command: ./process2.sh

  - name: task3
    command: ./process3.sh

  # This runs after all above complete
  - name: aggregate
    command: ./combine-results.sh
    depends:
      - task1
      - task2
      - task3
```
- Resource Management:
For resource-intensive workflows, adjust Klutch.sh resources:
- Increase CPU allocation for parallel tasks
- Increase memory for data processing tasks
- Monitor resource usage in dashboard
Backup Strategy
- DAG Files Backup:
```bash
#!/bin/bash
BACKUP_DIR="/backups/dags_$(date +%Y%m%d_%H%M%S)"
mkdir -p $BACKUP_DIR

# Backup DAG files
cp -r /home/dagu/.dagu/dags $BACKUP_DIR/

# Compress backup
tar -czf ${BACKUP_DIR}.tar.gz $BACKUP_DIR
rm -rf $BACKUP_DIR

# Keep only last 30 days
find /backups -name "dags_*.tar.gz" -mtime +30 -delete

echo "DAG backup complete: ${BACKUP_DIR}.tar.gz"
```
- Database Backup:
```bash
#!/bin/bash
BACKUP_DIR="/backups/db_$(date +%Y%m%d_%H%M%S)"
mkdir -p $BACKUP_DIR

# Backup SQLite database
cp /home/dagu/.dagu/data/dagu.db $BACKUP_DIR/

# Compress backup
gzip $BACKUP_DIR/dagu.db

echo "Database backup complete: ${BACKUP_DIR}/dagu.db.gz"
```
- Automated Backup DAG:
Create dags/backup_system.yaml:
```yaml
name: backup_system
description: Automated backup workflow
schedule: "0 3 * * *" # Daily at 3 AM

steps:
  - name: backup_dags
    command: bash /home/dagu/backup-dags.sh

  - name: backup_database
    command: bash /home/dagu/backup-database.sh
    depends:
      - backup_dags

  - name: verify_backups
    command: |
      ls -lh /backups/*.tar.gz | tail -5
      ls -lh /backups/*.gz | tail -5
    depends:
      - backup_database

mailOn:
  failure: true
```
Log Management
- Configure Log Retention:
```yaml
# In config.yaml
historyRetentionDays: 30 # Keep 30 days of history
```
- Implement Log Rotation:
Create dags/cleanup_logs.yaml:
```yaml
name: cleanup_logs
description: Clean up old log files
schedule: "0 4 * * 0" # Weekly on Sunday at 4 AM

steps:
  - name: remove_old_logs
    command: |
      find /home/dagu/.dagu/logs -name "*.log" -mtime +30 -delete
      echo "Cleaned up logs older than 30 days"

  - name: report_disk_usage
    command: |
      echo "Current disk usage:"
      du -sh /home/dagu/.dagu/logs
      du -sh /home/dagu/.dagu/data
```
- Centralize Important Logs:
```yaml
steps:
  - name: critical_task
    command: |
      ./important-process.sh 2>&1 | tee -a /home/dagu/.dagu/logs/critical.log
```
Monitoring and Alerting
- Health Check DAG:
Create dags/health_check.yaml:
```yaml
name: health_check
description: System health monitoring
schedule: "*/30 * * * *" # Every 30 minutes

steps:
  - name: check_disk_space
    command: |
      USAGE=$(df -h /home/dagu/.dagu | awk 'NR==2 {print $5}' | sed 's/%//')
      if [ $USAGE -gt 80 ]; then
        echo "ALERT: Disk usage at ${USAGE}%"
        exit 1
      fi
      echo "Disk usage OK: ${USAGE}%"

  - name: check_dags_count
    command: |
      COUNT=$(ls -1 /home/dagu/.dagu/dags/*.yaml 2>/dev/null | wc -l)
      echo "Total DAGs: ${COUNT}"
    depends:
      - check_disk_space

  - name: check_recent_failures
    command: |
      # Check for failed runs in last 24 hours
      echo "Checking recent failures..."
      # Add logic to query database
    depends:
      - check_dags_count

mailOn:
  failure: true
```
- Webhook Notifications:
```yaml
steps:
  - name: notify_completion
    command: |
      curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
        -H 'Content-Type: application/json' \
        -d '{
          "text": "Workflow ${DAG_NAME} completed",
          "username": "Dagu",
          "icon_emoji": ":white_check_mark:"
        }'
```
- Custom Metrics:
```yaml
steps:
  - name: record_metrics
    command: |
      # Record execution metrics
      echo "{\"timestamp\":\"$(date -Iseconds)\",\"dag\":\"${DAG_NAME}\",\"status\":\"success\"}" \
        >> /home/dagu/.dagu/data/metrics.jsonl
```
Maintenance Tasks
- Database Optimization:
Create dags/maintenance.yaml:
```yaml
name: maintenance
description: Database maintenance tasks
schedule: "0 5 * * 0" # Weekly on Sunday at 5 AM

steps:
  - name: vacuum_database
    command: |
      sqlite3 /home/dagu/.dagu/data/dagu.db "VACUUM;"
      echo "Database vacuumed"

  - name: check_integrity
    command: |
      sqlite3 /home/dagu/.dagu/data/dagu.db "PRAGMA integrity_check;"
    depends:
      - vacuum_database

  - name: analyze_tables
    command: |
      sqlite3 /home/dagu/.dagu/data/dagu.db "ANALYZE;"
      echo "Database analyzed"
    depends:
      - check_integrity
```
- Cleanup Old Executions:
Dagu automatically cleans up based on `historyRetentionDays`, but you can add custom cleanup:
```yaml
steps:
  - name: cleanup_old_files
    command: |
      find /home/dagu/.dagu/history -type f -mtime +60 -delete
      echo "Cleaned up history files older than 60 days"
```
Troubleshooting
DAG Not Running on Schedule
Symptoms: Scheduled DAG doesn’t execute at expected time
Solutions:
- Check Schedule Syntax:
```bash
# Validate cron expression
dagu validate /home/dagu/.dagu/dags/my_workflow.yaml

# Test online: https://crontab.guru/
```
- Verify Timezone:
```yaml
# In config.yaml
location: America/New_York # Ensure correct timezone

# In DAG
schedule: "0 9 * * *" # 9 AM in configured timezone
```
- Check Scheduler Status:
```bash
# View Dagu logs
docker logs dagu-container

# Look for scheduler errors
grep "scheduler" /home/dagu/.dagu/logs/dagu.log
```
- Verify DAG is Active:
  - DAG files must be in `/home/dagu/.dagu/dags`
  - Files must have the `.yaml` extension
  - No syntax errors in YAML
Task Fails with Command Not Found
Symptoms: Task fails with “command not found” error
Solutions:
- Use Full Paths:
```yaml
steps:
  - name: task
    command: /usr/bin/python3 script.py # Full path
    # Instead of: python3 script.py
```
- Install Dependencies in Dockerfile:
```dockerfile
RUN apk add --no-cache \
    python3 \
    nodejs \
    curl
```
- Check PATH:
```yaml
env:
  - PATH: /usr/local/bin:/usr/bin:/bin
```
- Verify Command Exists:
```yaml
steps:
  - name: check_command
    command: which python3 || echo "python3 not found"
```
Web UI Not Accessible
Symptoms: Cannot access Dagu web interface
Solutions:
- Check Container Status:
```bash
# Verify container is running
docker ps | grep dagu

# Check logs
docker logs dagu-container
```
- Verify Port Configuration:
- Ensure internal port is 8080
- Check Klutch.sh traffic settings
- Verify HTTPS endpoint is active
- Check Health Endpoint:
```bash
curl http://localhost:8080/health
```
- Review Firewall/Network:
- Ensure HTTP traffic type is selected
- Check Klutch.sh networking settings
Logs Not Appearing
Symptoms: Task logs not visible in web UI
Solutions:
- Check Log Directory Permissions:
```bash
ls -la /home/dagu/.dagu/logs
chown -R dagu:dagu /home/dagu/.dagu/logs
```
- Verify Log Path:
```yaml
# In config.yaml
logDir: /home/dagu/.dagu/logs # Must be writable
```
- Ensure Persistent Storage:
- Verify logs volume is mounted
- Check volume size hasn’t exceeded limit
- Check Disk Space:
```bash
df -h /home/dagu/.dagu/logs
```
Task Hangs Indefinitely
Symptoms: Task runs forever without completing
Solutions:
- Add Timeout:
```yaml
steps:
  - name: task
    command: ./long-running-script.sh
    timeout: 1800 # 30 minutes
```
- Debug Long-Running Command:
```yaml
steps:
  - name: debug_task
    command: |
      set -x # Enable debug output
      ./script.sh
```
- Check for Blocking Input:
```yaml
# Ensure commands don't wait for user input
steps:
  - name: non_interactive
    command: yes | ./script.sh # Auto-answer prompts
```
- Review Process State:
```bash
# Inside container
ps aux | grep <process>
```
Database Locked Error
Symptoms: “database is locked” error in logs
Solutions:
- Limit Concurrent Writes:
```yaml
# In config.yaml
maxActiveRuns: 50 # Reduce concurrent operations
```
- Check for Abandoned Connections:
```bash
# Restart Dagu if needed
dagu restart
```
- Increase Database Timeout:
```yaml
# In config.yaml or DAG
database:
  timeout: 30000 # 30 seconds
```
Advanced Configuration
Parameterized Workflows
Create reusable workflows with parameters:
```yaml
name: parameterized_workflow
description: Workflow with runtime parameters

params:
  - ENVIRONMENT: development # Default values; override at runtime
  - TARGET_DIR: /tmp

steps:
  - name: use_parameters
    command: |
      echo "Environment: ${ENVIRONMENT}"
      echo "Target directory: ${TARGET_DIR}"
      mkdir -p ${TARGET_DIR}/${ENVIRONMENT}

  - name: conditional_step
    command: |
      if [ "${ENVIRONMENT}" = "production" ]; then
        echo "Running production workflow"
      else
        echo "Running non-production workflow"
      fi
    depends:
      - use_parameters
```
Run with parameters:
```bash
dagu start /home/dagu/.dagu/dags/parameterized_workflow.yaml \
  -p ENVIRONMENT=production \
  -p TARGET_DIR=/data
```
Signal Communication Between DAGs
Trigger DAGs from other DAGs:
```yaml
name: parent_dag
steps:
  - name: trigger_child
    command: |
      dagu start /home/dagu/.dagu/dags/child_dag.yaml

  - name: wait_for_child
    command: |
      while ! dagu status /home/dagu/.dagu/dags/child_dag.yaml | grep -q "finished"; do
        sleep 5
      done
    depends:
      - trigger_child
```
HTTP Request Tasks
Make API calls within workflows:
```yaml
name: api_workflow
steps:
  - name: fetch_data
    command: |
      curl -X GET https://api.example.com/data \
        -H "Authorization: Bearer ${API_TOKEN}" \
        -o /tmp/response.json

  - name: process_response
    command: |
      cat /tmp/response.json | jq '.results[] | .id' > /tmp/ids.txt
    depends:
      - fetch_data

  - name: post_results
    command: |
      curl -X POST https://api.example.com/results \
        -H "Content-Type: application/json" \
        -d @/tmp/ids.txt
    depends:
      - process_response
```
Dynamic Task Generation
Generate tasks dynamically based on data:
```yaml
name: dynamic_tasks
steps:
  - name: list_files
    command: |
      ls /data/*.csv > /tmp/files.txt
    output: FILES

  - name: process_files
    command: |
      while read file; do
        echo "Processing: $file"
        ./process.sh "$file"
      done < /tmp/files.txt
    depends:
      - list_files
```
Integration with External Tools
Docker-in-Docker:
```yaml
steps:
  - name: run_container
    command: |
      docker run --rm alpine:latest echo "Hello from Docker"
```
Git Integration:
```yaml
steps:
  - name: clone_repo
    command: |
      git clone https://github.com/user/repo.git /tmp/repo

  - name: run_tests
    command: |
      cd /tmp/repo
      ./run-tests.sh
    depends:
      - clone_repo
```
Database Operations:
```yaml
steps:
  - name: query_database
    command: |
      psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "
        SELECT * FROM users WHERE created > NOW() - INTERVAL '1 day';
      " > /tmp/new_users.txt
```
Custom Notification Handlers
Slack Integration:
```yaml
name: workflow_with_slack
steps:
  - name: main_task
    command: ./process.sh

handlerOn:
  success:
    - name: notify_slack_success
      command: |
        curl -X POST ${SLACK_WEBHOOK_URL} \
          -H 'Content-Type: application/json' \
          -d '{
            "text": "✅ Workflow succeeded",
            "attachments": [{
              "color": "good",
              "fields": [
                {"title": "DAG", "value": "${DAG_NAME}", "short": true},
                {"title": "Time", "value": "$(date)", "short": true}
              ]
            }]
          }'

  failure:
    - name: notify_slack_failure
      command: |
        curl -X POST ${SLACK_WEBHOOK_URL} \
          -H 'Content-Type: application/json' \
          -d '{
            "text": "❌ Workflow failed",
            "attachments": [{
              "color": "danger",
              "fields": [
                {"title": "DAG", "value": "${DAG_NAME}", "short": true},
                {"title": "Time", "value": "$(date)", "short": true}
              ]
            }]
          }'
```
PagerDuty Integration:
```yaml
handlerOn:
  failure:
    - name: trigger_pagerduty
      command: |
        curl -X POST https://events.pagerduty.com/v2/enqueue \
          -H 'Content-Type: application/json' \
          -d '{
            "routing_key": "${PAGERDUTY_KEY}",
            "event_action": "trigger",
            "payload": {
              "summary": "Dagu workflow failed: ${DAG_NAME}",
              "severity": "error",
              "source": "dagu",
              "custom_details": {
                "dag": "${DAG_NAME}",
                "time": "$(date)"
              }
            }
          }'
```
Additional Resources
- Official Dagu Website
- Dagu GitHub Repository
- Dagu Documentation
- Example DAG Files
- Cron Expression Helper
- Klutch.sh Documentation
- Persistent Storage Guide
- Networking Configuration
Conclusion
Dagu provides a modern, lightweight approach to workflow orchestration that prioritizes simplicity without sacrificing functionality. By deploying on Klutch.sh, you benefit from automatic HTTPS, persistent storage, and simple Docker-based deployment while maintaining the flexibility and power that Dagu offers for workflow automation.
The platform’s file-based configuration and clean web interface make it accessible to developers who need reliable task scheduling and workflow management without the complexity of enterprise orchestration platforms. Whether you’re replacing cron jobs, building data pipelines, automating deployments, or managing complex multi-step processes, Dagu scales to meet your needs while remaining easy to understand and maintain.
Start with simple scheduled tasks and expand to complex workflows with dependencies, conditional execution, and retry logic as your automation requirements grow. The YAML-based DAG definitions are version-controllable, shareable, and self-documenting, making workflow management a straightforward part of your development process.
Deploy Dagu today and bring clarity and reliability to your workflow automation while keeping complexity at bay.