Deploying Operational.co

Introduction

Operational.co is an open-source incident management and on-call platform designed to help engineering teams respond to and resolve incidents efficiently. It provides tools for managing on-call rotations, escalation policies, incident tracking, and post-incident reviews, creating a structured approach to handling production issues.

Built for modern DevOps and SRE practices, Operational.co integrates with monitoring tools and alerting systems to streamline incident response workflows. Teams can define escalation policies, manage on-call schedules, and track incidents from detection through resolution and review.

Key highlights of Operational.co:

  • Incident Management: Create, track, and resolve incidents with structured workflows
  • On-Call Scheduling: Define and manage on-call rotations for teams
  • Escalation Policies: Configure automatic escalation when incidents are not acknowledged
  • Alert Integration: Connect with monitoring tools and alerting systems
  • Status Pages: Communicate incident status to stakeholders
  • Post-Mortems: Document and learn from incidents with structured reviews
  • Team Management: Organize responders into teams with specific responsibilities
  • Notification Channels: Alert via email, SMS, Slack, and other channels
  • API Access: Programmatic access for automation and integration
  • Self-Hosted: Complete control over your incident data

This guide walks through deploying Operational.co on Klutch.sh using Docker, configuring integrations, and setting up incident management for your team.

Why Deploy Operational.co on Klutch.sh

Deploying Operational.co on Klutch.sh provides several advantages:

Simplified Deployment: Klutch.sh builds and deploys your incident management platform from your GitHub repository, and each push triggers a new deployment.

Persistent Storage: Attach persistent volumes for database and configuration. Your incident history and configurations survive container restarts.

HTTPS by Default: Klutch.sh provides automatic SSL certificates for secure access to your incident platform.

Always-On Availability: Your incident management system runs 24/7, essential for receiving alerts and managing on-call schedules.

GitHub Integration: Store configuration in Git for version-controlled infrastructure.

Scalable Resources: Allocate resources based on team size and alert volume.

Custom Domains: Use your organization’s domain for professional incident management URLs.

Prerequisites

Before deploying Operational.co on Klutch.sh, ensure you have:

  • A Klutch.sh account
  • A GitHub account with a repository for your configuration
  • Basic familiarity with Docker and containerization concepts
  • SMTP server or email service for notifications
  • (Optional) Slack workspace for chat notifications

Understanding Operational.co Architecture

Operational.co consists of several components:

API Server: Handles incident management, user authentication, and business logic.

Web Interface: Dashboard for managing incidents, schedules, and team configuration.

Background Workers: Process notifications, escalations, and scheduled tasks.

Database: Stores incidents, users, schedules, and configuration. PostgreSQL is recommended.

Cache: Redis for session management and background job queues.

Preparing Your Repository

Create a GitHub repository containing your Dockerfile and configuration.

Repository Structure

operational-deploy/
├── Dockerfile
├── .dockerignore
└── README.md

Creating the Dockerfile

Create a Dockerfile for Operational.co:

FROM node:18-alpine
WORKDIR /app
# Install build tools: git to fetch the source, python3/make/g++ for native npm modules
RUN apk add --no-cache git python3 make g++
# Clone Operational.co repository
RUN git clone https://github.com/operational-co/operational.git .
# Install Node.js dependencies
RUN npm install
# Build the application
RUN npm run build
# Environment configuration
ENV NODE_ENV=production
ENV PORT=3000
# Expose the application port
EXPOSE 3000
# Start the application
CMD ["npm", "start"]

Environment Variables Reference

| Variable | Required | Description |
|---|---|---|
| DATABASE_URL | Yes | PostgreSQL connection string |
| REDIS_URL | Yes | Redis connection string |
| SECRET_KEY | Yes | Application secret for sessions |
| SMTP_HOST | Yes | SMTP server hostname |
| SMTP_PORT | Yes | SMTP server port |
| SMTP_USER | Yes | SMTP username |
| SMTP_PASS | Yes | SMTP password |
| APP_URL | Yes | Public URL of the application |
| SLACK_WEBHOOK_URL | No | Slack webhook for notifications |
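
A missing variable tends to surface only as a startup failure, so it can help to check the set before deploying. The following is a minimal sketch of such a check in Python. The function name `missing_variables` is illustrative and not part of Operational.co; the variable names come from the table above.

```python
import os

# Required variables from the reference table above.
# SLACK_WEBHOOK_URL is optional, so it is deliberately not checked.
REQUIRED = [
    "DATABASE_URL", "REDIS_URL", "SECRET_KEY",
    "SMTP_HOST", "SMTP_PORT", "SMTP_USER", "SMTP_PASS", "APP_URL",
]


def missing_variables(env):
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_variables(os.environ)
    if missing:
        print("Missing required variables: " + ", ".join(missing))
    else:
        print("All required variables are set.")
```

Run it in a shell where the same variables are exported to see which ones still need values.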

Deploying Operational.co on Klutch.sh

Follow these steps to deploy your incident management platform:

    Generate Security Keys

    Generate a secret key for your deployment and keep it for the SECRET_KEY variable:

    # Generate secret key
    openssl rand -hex 32

    Deploy Required Services

    Operational.co requires:

    1. PostgreSQL Database: Deploy a PostgreSQL instance
    2. Redis: Deploy a Redis instance for caching and job queues

    Note the connection URLs for configuration.

    Push Your Repository to GitHub

    Initialize and push your repository:

    git init
    git add Dockerfile .dockerignore README.md
    git commit -m "Initial Operational.co configuration"
    # Make sure the local branch is named main before pushing
    git branch -M main
    git remote add origin https://github.com/yourusername/operational-deploy.git
    git push -u origin main

    Create a New Project on Klutch.sh

    Navigate to the Klutch.sh dashboard and create a new project named “operational” or “incident-management”.

    Create a New App

    Within your project, create a new app. Connect your GitHub account and select the repository containing your Dockerfile.

    Configure HTTP Traffic

    In the deployment settings:

    • Select HTTP as the traffic type
    • Set the internal port to 3000

    Set Environment Variables

    Configure your instance:

    | Variable | Value |
    |---|---|
    | DATABASE_URL | Your PostgreSQL connection string |
    | REDIS_URL | Your Redis connection string |
    | SECRET_KEY | Your generated secret key |
    | SMTP_HOST | Your SMTP server |
    | SMTP_PORT | SMTP port (typically 587) |
    | SMTP_USER | SMTP username |
    | SMTP_PASS | SMTP password |
    | APP_URL | https://your-app-name.klutch.sh |

    Attach Persistent Volumes

    Add persistent storage:

    | Mount Path | Recommended Size | Purpose |
    |---|---|---|
    | /app/data | 10 GB | Application data and uploads |

    Deploy Your Application

    Click Deploy to start the build process.

    Complete Initial Setup

    Access Operational.co at https://your-app-name.klutch.sh:

    1. Create your admin account
    2. Set up your organization
    3. Configure notification channels
    4. Add team members
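
A malformed connection string is an easy mistake to make when copying values between services. As a rough sanity check on the PostgreSQL and Redis URLs noted in the steps above, a sketch like the following parses each URL and reports obvious problems. The accepted schemes and the function name `check_connection_url` are assumptions for illustration, and the hostnames in the usage note are placeholders.

```python
from urllib.parse import urlsplit

# Accepted schemes are an assumption; adjust to what your services hand out.
EXPECTED_SCHEMES = {
    "DATABASE_URL": {"postgres", "postgresql"},
    "REDIS_URL": {"redis", "rediss"},
}


def check_connection_url(name, url):
    """Return a list of problems found in a connection URL (empty if none)."""
    parts = urlsplit(url)
    problems = []
    if parts.scheme not in EXPECTED_SCHEMES[name]:
        problems.append(f"{name}: unexpected scheme {parts.scheme!r}")
    if not parts.hostname:
        problems.append(f"{name}: no hostname")
    # A PostgreSQL URL normally names the database in its path.
    if name == "DATABASE_URL" and not parts.path.strip("/"):
        problems.append(f"{name}: no database name in the path")
    return problems
```

For example, `check_connection_url("DATABASE_URL", "postgresql://user:pw@db.example.internal:5432/operational")` returns an empty list. The check only inspects the string; it does not open a connection.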

Initial Configuration

Creating Your Organization

Set up your organization structure:

  1. Create your organization
  2. Add teams (e.g., Backend, Frontend, Infrastructure)
  3. Invite team members
  4. Assign roles and permissions

Configuring Notification Channels

Set up how alerts are delivered:

  1. Email: Configure SMTP for email notifications
  2. Slack: Add Slack webhook URL for channel notifications
  3. SMS: Configure SMS provider for urgent alerts
  4. Phone: Set up voice calls for critical incidents

Setting Up On-Call Schedules

Create on-call rotations:

  1. Navigate to Schedules
  2. Create a new schedule
  3. Define rotation type (daily, weekly, custom)
  4. Add team members to the rotation
  5. Set rotation times and handoff procedures
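
To make the rotation idea concrete, here is a small sketch of how a round-robin schedule resolves to a person at a given moment. It is not Operational.co's scheduling code: the function `on_call_at` and the assumption of fixed-length shifts with no overrides are illustrative only.

```python
from datetime import datetime, timedelta


def on_call_at(members, rotation_start, moment, shift=timedelta(weeks=1)):
    """Return who is on call at `moment` in a simple round-robin rotation.

    members: responders in handoff order
    rotation_start: when the first member's first shift began
    shift: length of one shift (weekly by default)
    """
    if moment < rotation_start:
        raise ValueError("moment is before the rotation starts")
    # Whole shifts completed since the start decide whose turn it is.
    shifts_elapsed = (moment - rotation_start) // shift
    return members[shifts_elapsed % len(members)]
```

With three members and a weekly handoff on Monday at 09:00, the first member is on call again three weeks after the start. Passing `shift=timedelta(days=1)` models a daily rotation.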

Incident Management

Creating Incidents

When an incident occurs:

  1. Create a new incident manually or via API/integration
  2. Set severity level
  3. Assign to on-call responder
  4. Add initial description and impact

Incident Workflow

Standard incident lifecycle:

  1. Triggered: Incident is created
  2. Acknowledged: Responder confirms awareness
  3. Investigating: Active investigation in progress
  4. Resolved: Issue has been fixed
  5. Post-Mortem: Review and documentation
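
The lifecycle above can be read as a simple linear state machine. The sketch below models it that way, assuming an incident only ever moves to the next stage. Whether Operational.co allows skipping stages or reopening incidents is not covered here, so treat the transition rule and the function names as assumptions.

```python
# Lifecycle stages in the order listed above.
STAGES = ["triggered", "acknowledged", "investigating", "resolved", "post-mortem"]


def can_transition(current, target):
    """Allow only a move to the next stage (assumption: no skipping or going back)."""
    return STAGES.index(target) == STAGES.index(current) + 1


def advance(current):
    """Return the stage that follows `current`, or None at the end of the lifecycle."""
    position = STAGES.index(current)
    return STAGES[position + 1] if position + 1 < len(STAGES) else None
```

A check like `can_transition("triggered", "resolved")` returns False under this rule, which is useful when validating status changes made by automation.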

Escalation Policies

Configure automatic escalation:

  1. Create an escalation policy
  2. Define escalation levels
  3. Set timeout periods for each level
  4. Assign responders at each level

Example escalation:

  • Level 1: Primary on-call (5-minute timeout)
  • Level 2: Secondary on-call (10-minute timeout)
  • Level 3: Team lead (15-minute timeout)
  • Level 4: Engineering manager
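
To see how the timeouts in this example add up, the sketch below works out which level is responsible after a given number of unacknowledged minutes. It assumes each timeout starts when its level is notified, so the waits are cumulative (level 2 at 5 minutes, level 3 at 15, level 4 at 30). That reading and the function name `level_at` are assumptions, not documented Operational.co behavior.

```python
# (responder, minutes to wait for an acknowledgement before escalating).
# None marks the last level: there is nobody further to escalate to.
POLICY = [
    ("Primary on-call", 5),
    ("Secondary on-call", 10),
    ("Team lead", 15),
    ("Engineering manager", None),
]


def level_at(minutes_unacknowledged, policy=POLICY):
    """Return (level number, responder) responsible after the given wait."""
    deadline = 0
    for number, (responder, timeout) in enumerate(policy, start=1):
        if timeout is None:
            return number, responder
        deadline += timeout
        if minutes_unacknowledged < deadline:
            return number, responder
    # Policies without an open-ended last level stay on their final responder.
    return len(policy), policy[-1][0]
```

Under these assumptions an incident left unacknowledged for 20 minutes sits with the team lead, and it reaches the engineering manager at the 30-minute mark.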

Integration Setup

Monitoring Integration

Connect with monitoring tools:

  1. Configure webhook endpoints
  2. Map alert severity to incident priority
  3. Test integration with sample alerts
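
Severity mapping is usually a small lookup table kept next to the webhook handler. The sketch below shows one possible shape. Both the alert labels and the incident severity values are hypothetical, so replace them with what your monitoring tool sends and what your instance accepts.

```python
# Hypothetical mapping from a monitoring tool's alert labels to incident severity.
# Both sides are illustrative; check the values your tools actually send and accept.
SEVERITY_MAP = {
    "critical": "high",
    "error": "high",
    "warning": "medium",
    "info": "low",
}


def incident_severity(alert_label, default="medium"):
    """Translate an alert label into an incident severity, ignoring case."""
    return SEVERITY_MAP.get(alert_label.strip().lower(), default)
```

Falling back to a default for unknown labels means an unexpected alert still creates an incident instead of being dropped.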

Slack Integration

Enable Slack notifications:

  1. Create a Slack app or incoming webhook
  2. Add webhook URL to configuration
  3. Configure which events trigger notifications
  4. Test with a sample incident

API Integration

Use the API for automation:

# Create an incident via API
curl -X POST https://your-app-name.klutch.sh/api/incidents \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Database connection errors",
    "severity": "high",
    "description": "Multiple services reporting database timeouts"
  }'
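
For automation written in Python, the same request can be built with the standard library. This sketch mirrors the curl example above: the endpoint, headers, and fields are taken from it, while the helper name `build_incident_request` is illustrative. The response format is not described here, so the sketch stops at constructing the request.

```python
import json
import urllib.request


def build_incident_request(base_url, api_key, title, severity, description):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {"title": title, "severity": severity, "description": description}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/incidents",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(request, timeout=10)` and handle `urllib.error.HTTPError` for non-2xx responses.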

Status Pages

Creating a Status Page

Communicate with stakeholders:

  1. Navigate to Status Pages
  2. Create a new status page
  3. Add components to monitor
  4. Configure public URL

Updating Status

During incidents:

  1. Link incident to affected components
  2. Update component status
  3. Post status updates for subscribers

Post-Incident Reviews

Creating Post-Mortems

Learn from incidents:

  1. After resolution, create a post-mortem
  2. Document timeline of events
  3. Identify root cause
  4. Define action items
  5. Share with team

Post-Mortem Template

Standard sections:

  • Summary: Brief description of what happened
  • Impact: Who/what was affected and for how long
  • Timeline: Chronological events during the incident
  • Root Cause: Why the incident occurred
  • Resolution: How the incident was resolved
  • Action Items: Follow-up tasks to prevent recurrence
  • Lessons Learned: What the team learned
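
If you draft post-mortems outside the web interface, the template can be generated as a plain-text outline. The sketch below does that using the section names and prompts listed above; the function name `postmortem_skeleton` and the output layout are illustrative choices.

```python
# Section names and prompts from the template above.
SECTIONS = [
    ("Summary", "Brief description of what happened"),
    ("Impact", "Who/what was affected and for how long"),
    ("Timeline", "Chronological events during the incident"),
    ("Root Cause", "Why the incident occurred"),
    ("Resolution", "How the incident was resolved"),
    ("Action Items", "Follow-up tasks to prevent recurrence"),
    ("Lessons Learned", "What the team learned"),
]


def postmortem_skeleton(incident_title):
    """Return a plain-text post-mortem outline with one prompt per section."""
    lines = [f"Post-Mortem: {incident_title}", ""]
    for name, prompt in SECTIONS:
        lines += [name, f"  ({prompt})", ""]
    return "\n".join(lines)
```

Calling `print(postmortem_skeleton("Database connection errors"))` produces an outline that responders can fill in while the details are still fresh.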

Best Practices

On-Call Management

  • Define clear escalation policies
  • Ensure adequate coverage across time zones
  • Rotate fairly to prevent burnout
  • Provide runbooks for common issues

Incident Response

  • Acknowledge quickly to stop escalation
  • Communicate status regularly
  • Focus on resolution first, investigation later
  • Document everything during the incident

Continuous Improvement

  • Review all incidents in post-mortems
  • Track action items to completion
  • Share learnings across teams
  • Update runbooks based on incidents

Troubleshooting Common Issues

Notifications Not Sending

Symptoms: Alerts not reaching responders.

Solutions:

  • Verify SMTP configuration
  • Check notification channel settings
  • Verify recipient contact information
  • Review application logs
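
One way to verify the SMTP configuration independently of the application is to attempt a login with the same credentials. The sketch below does this with Python's standard `smtplib`. It assumes the server expects STARTTLS, as is typical on port 587, and the helper names are illustrative.

```python
import smtplib


def smtp_settings(env):
    """Read the SMTP_* variables from a mapping and validate the port."""
    port = int(env["SMTP_PORT"])  # raises ValueError if the port is not a number
    if not 0 < port < 65536:
        raise ValueError(f"SMTP_PORT out of range: {port}")
    return env["SMTP_HOST"], port, env["SMTP_USER"], env["SMTP_PASS"]


def check_smtp_login(host, port, user, password, timeout=10):
    """Connect, upgrade to TLS, and log in; raises an smtplib or OS error on failure.

    Assumes the server expects STARTTLS, as is typical on port 587.
    """
    with smtplib.SMTP(host, port, timeout=timeout) as server:
        server.starttls()
        server.login(user, password)
```

Calling `check_smtp_login(*smtp_settings(os.environ))` after `import os` either returns quietly or raises an exception whose message points at the failing step, such as connection, TLS, or authentication.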

On-Call Not Escalating

Symptoms: Unacknowledged incidents not escalating.

Solutions:

  • Verify escalation policy configuration
  • Check timeout settings
  • Ensure on-call schedule is active
  • Verify responders are in rotation

Integration Not Working

Symptoms: External alerts not creating incidents.

Solutions:

  • Verify webhook URL is correct
  • Check authentication tokens
  • Review incoming webhook logs
  • Test with sample payloads

Conclusion

Deploying Operational.co on Klutch.sh provides a comprehensive incident management solution for your engineering team. With on-call scheduling, escalation policies, and incident tracking, Operational.co brings structure to your incident response process.

The combination of persistent storage for incident history, always-on availability for alert reception, and HTTPS by default makes Klutch.sh well-suited for hosting Operational.co. Whether you manage a small team or a large engineering organization, a self-hosted incident platform keeps your incident data and configuration under your own control.

Start with basic on-call schedules and incident tracking, then expand with integrations and status pages as your needs grow. With Operational.co on Klutch.sh, you own your incident management infrastructure.