Deploying Operational.co

Introduction

Operational.co is an open-source incident management and on-call platform designed to help engineering teams respond to and resolve incidents efficiently. It provides tools for managing on-call rotations, escalation policies, incident tracking, and post-incident reviews, creating a structured approach to handling production issues.

Built for DevOps and SRE teams, Operational.co integrates with monitoring tools and alerting systems so that alerts feed a single incident workflow, from detection through resolution and review.

Key features of Operational.co:

Incident Management: Create, track, and resolve incidents with structured workflows
On-Call Scheduling: Define and manage on-call rotations for teams
Escalation Policies: Configure automatic escalation when incidents are not acknowledged
Alert Integration: Connect with monitoring tools and alerting systems
Status Pages: Communicate incident status to stakeholders
Post-Mortems: Document and learn from incidents with structured reviews
Team Management: Organize responders into teams with specific responsibilities
Notification Channels: Alert via email, SMS, Slack, and other channels
API Access: Programmatic access for automation and integration
Self-Hosted: Complete control over your incident data

This guide walks through deploying Operational.co on Klutch.sh using Docker, configuring integrations, and setting up incident management for your team.

Why Deploy Operational.co on Klutch.sh

Deploying Operational.co on Klutch.sh provides several advantages:

Simplified Deployment: Klutch.sh builds and deploys your incident management platform directly from your repository. Each push to GitHub triggers a new deployment.

Persistent Storage: Attach persistent volumes for database and configuration. Your incident history and configurations survive container restarts.

HTTPS by Default: Klutch.sh provides automatic SSL certificates for secure access to your incident platform.

Always-On Availability: Your incident management system runs 24/7, essential for receiving alerts and managing on-call schedules.

GitHub Integration: Store configuration in Git for version-controlled infrastructure.

Scalable Resources: Allocate resources based on team size and alert volume.

Custom Domains: Use your organization’s domain for professional incident management URLs.

Prerequisites

Before deploying Operational.co on Klutch.sh, ensure you have:

A Klutch.sh account
A GitHub account with a repository for your configuration
Basic familiarity with Docker and containerization concepts
SMTP server or email service for notifications
(Optional) Slack workspace for chat notifications

Understanding Operational.co Architecture

Operational.co consists of several components:

API Server: Handles incident management, user authentication, and business logic.

Web Interface: Dashboard for managing incidents, schedules, and team configuration.

Background Workers: Process notifications, escalations, and scheduled tasks.

Database: Stores incidents, users, schedules, and configuration. This guide uses PostgreSQL.

Cache: Redis, used for caching, session management, and background job queues.

Preparing Your Repository

Create a GitHub repository containing your Dockerfile and configuration.

Repository Structure

operational-deploy/
├── Dockerfile
├── .dockerignore
└── README.md
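
The layout above lists a .dockerignore file without showing its contents. A minimal sketch, assuming you want to keep Git metadata, local secrets, and logs out of the build context; the entries and the write_dockerignore helper are illustrative assumptions, not requirements of Operational.co or Klutch.sh:

```shell
# Write a minimal .dockerignore into the given directory (default: current directory).
# The entries below are illustrative assumptions; adjust them to your repository.
write_dockerignore() {
  target_dir="${1:-.}"
  cat > "$target_dir/.dockerignore" <<'EOF'
.git
.gitignore
.env
*.log
EOF
}
```

Run write_dockerignore from the repository root before the first commit, then review the file and add any other paths that should stay out of the image.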

Creating the Dockerfile

Create a Dockerfile for Operational.co:

FROM node:18-alpine

WORKDIR /app

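# Install build tools: git for cloning, python3/make/g++ for native npm modules
RUN apk add --no-cache git python3 make g++

# Clone the Operational.co repository (shallow clone keeps the image smaller)
RUN git clone --depth 1 https://github.com/operational-co/operational.git .

# Install Node.js dependencies
RUN npm install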

# Build the application
RUN npm run build

# Environment configuration
ENV NODE_ENV=production
ENV PORT=3000

# Expose the application port
EXPOSE 3000

# Start the application
CMD ["npm", "start"]

Environment Variables Reference

Variable	Required	Description
`DATABASE_URL`	Yes	PostgreSQL connection string
`REDIS_URL`	Yes	Redis connection string
`SECRET_KEY`	Yes	Application secret for sessions
`SMTP_HOST`	Yes	SMTP server hostname
`SMTP_PORT`	Yes	SMTP server port
`SMTP_USER`	Yes	SMTP username
`SMTP_PASS`	Yes	SMTP password
`APP_URL`	Yes	Public URL of the application
`SLACK_WEBHOOK_URL`	No	Slack webhook for notifications
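
A missing variable is easier to catch before deployment than after. The following is a small sketch of a pre-flight check over the required variables in the table above; the check_required_env function is a hypothetical helper, not part of Operational.co:

```shell
# Report required variables that are unset or empty.
# Returns 0 when everything is set, 1 otherwise.
check_required_env() {
  missing=""
  for name in DATABASE_URL REDIS_URL SECRET_KEY SMTP_HOST SMTP_PORT SMTP_USER SMTP_PASS APP_URL; do
    # Indirect lookup of the variable whose name is stored in $name (POSIX-compatible)
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing required variables:$missing" >&2
    return 1
  fi
  return 0
}
```

For example, `check_required_env && echo "configuration looks complete"` prints the confirmation only when all eight variables have values.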

Deploying Operational.co on Klutch.sh

Follow these steps to deploy your incident management platform:

Generate a Secret Key

Generate a random value to use as SECRET_KEY:

# Generate a 32-byte secret, printed as 64 hex characters
openssl rand -hex 32
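
If you script this step, it can help to verify the result before storing it. A sketch under the assumption that a 64-character hex string is what you want for SECRET_KEY; both function names are hypothetical, and the /dev/urandom fallback is an assumption for systems without openssl:

```shell
# Generate a 64-character hex secret (32 random bytes).
# Uses openssl when available, otherwise falls back to /dev/urandom.
generate_secret_key() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# Succeed only if the argument is exactly 64 hexadecimal characters.
is_valid_secret_key() {
  [ "${#1}" -eq 64 ] || return 1
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;
  esac
  return 0
}
```

Typical usage is `key=$(generate_secret_key)` followed by `is_valid_secret_key "$key"` before pasting the value into the Klutch.sh environment settings.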

Deploy Required Services

Operational.co requires:

PostgreSQL Database: Deploy a PostgreSQL instance
Redis: Deploy a Redis instance for caching and job queues

Note the connection URLs for configuration.
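
Connection URLs usually take a form such as postgresql://user:password@host:5432/dbname and redis://host:6379, where every component shown here is a placeholder. The sketch below checks only the URL scheme, which catches the common mistake of swapping the two values; the function names are hypothetical, and the set of schemes Operational.co accepts is an assumption:

```shell
# Return 0 if the URL starts with a scheme commonly used for PostgreSQL.
is_postgres_url() {
  case "$1" in
    postgres://?*|postgresql://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Return 0 if the URL starts with a Redis scheme (rediss:// is Redis over TLS).
is_redis_url() {
  case "$1" in
    redis://?*|rediss://?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A scheme check does not prove the service is reachable; it only confirms that each value has the expected shape.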

Push Your Repository to GitHub

Initialize and push your repository:

git init
git add Dockerfile .dockerignore README.md
git commit -m "Initial Operational.co configuration"
git branch -M main
git remote add origin https://github.com/yourusername/operational-deploy.git
git push -u origin main

Create a New Project on Klutch.sh

Navigate to the Klutch.sh dashboard and create a new project named “operational” or “incident-management”.

Create a New App

Within your project, create a new app. Connect your GitHub account and select the repository containing your Dockerfile.

Configure HTTP Traffic

In the deployment settings:

Select HTTP as the traffic type
Set the internal port to 3000

Set Environment Variables

Configure your instance:

Variable	Value
`DATABASE_URL`	Your PostgreSQL connection string
`REDIS_URL`	Your Redis connection string
`SECRET_KEY`	Your generated secret key
`SMTP_HOST`	Your SMTP server
`SMTP_PORT`	SMTP port (typically 587)
`SMTP_USER`	SMTP username
`SMTP_PASS`	SMTP password
`APP_URL`	`https://your-app-name.klutch.sh`

Attach Persistent Volumes

Add persistent storage:

Mount Path	Recommended Size	Purpose
`/app/data`	10 GB	Application data and uploads

Deploy Your Application

Click Deploy to start the build process.

Complete Initial Setup

Open https://your-app-name.klutch.sh in your browser and complete the following:

Create your admin account
Set up your organization
Configure notification channels
Add team members

Initial Configuration

Creating Your Organization

Set up your organization structure:

Create your organization
Add teams (e.g., Backend, Frontend, Infrastructure)
Invite team members
Assign roles and permissions

Configuring Notification Channels

Set up how alerts are delivered:

Email: Configure SMTP for email notifications
Slack: Add Slack webhook URL for channel notifications
SMS: Configure SMS provider for urgent alerts
Phone: Set up voice calls for critical incidents

Setting Up On-Call Schedules

Create on-call rotations:

Navigate to Schedules
Create a new schedule
Define rotation type (daily, weekly, custom)
Add team members to the rotation
Set rotation times and handoff procedures
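
To make the rotation idea concrete, the sketch below computes who is on call at a given moment for a fixed-length rotation. It illustrates the arithmetic only and is not how Operational.co implements schedules; the on_call_at function and the member names in the usage note are hypothetical:

```shell
# Print the member who is on call at time $2, for a rotation that started at
# time $1 (both in Unix epoch seconds), with shifts lasting $3 seconds each.
# The remaining arguments are the rotation members, in order.
on_call_at() {
  start=$1
  now=$2
  shift_seconds=$3
  shift 3
  count=$#
  [ "$count" -gt 0 ] || return 1
  [ "$now" -ge "$start" ] || return 1
  # Completed shifts since the rotation began, wrapped around the roster
  index=$(( (now - start) / shift_seconds % count ))
  # Drop `index` members from the front; the next one is on call
  shift "$index"
  printf '%s\n' "$1"
}
```

For a weekly rotation, pass 604800 as the shift length, for example `on_call_at "$rotation_start" "$(date +%s)" 604800 alice bob carol`.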

Incident Management

Creating Incidents

When an incident occurs:

Create a new incident manually or via API/integration
Set severity level
Assign to on-call responder
Add initial description and impact

Incident Workflow

Standard incident lifecycle:

Triggered: Incident is created
Acknowledged: Responder confirms awareness
Investigating: Active investigation in progress
Resolved: Issue has been fixed
Post-Mortem: Review and documentation
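
The lifecycle above can be read as a simple state machine. The following sketch assumes a strictly linear progression and lowercase state names, both of which are simplifications for illustration; the next_state function is hypothetical:

```shell
# Print the state that follows the given one in the lifecycle.
# Returns 1 for an unknown state or when the lifecycle has ended.
next_state() {
  case "$1" in
    triggered)     echo acknowledged ;;
    acknowledged)  echo investigating ;;
    investigating) echo resolved ;;
    resolved)      echo post-mortem ;;
    *)             return 1 ;;
  esac
}
```

Calling `next_state acknowledged` prints investigating, while `next_state post-mortem` fails because no further state exists.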

Escalation Policies

Configure automatic escalation:

Create an escalation policy
Define escalation levels
Set timeout periods for each level
Assign responders at each level

Example escalation:

Level 1: Primary on-call (5-minute timeout)
Level 2: Secondary on-call (10-minute timeout)
Level 3: Team lead (15-minute timeout)
Level 4: Engineering manager
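
The example policy can be traced with a short sketch. It assumes each timeout starts when its level is notified, so level 2 is reached after 5 minutes, level 3 after 15, and level 4 after 30; that cumulative interpretation and the escalation_level function are assumptions for illustration:

```shell
# Print the escalation level responsible for an incident that has been
# unacknowledged for the given number of minutes.
escalation_level() {
  elapsed=$1
  level=1
  deadline=0
  # Timeouts for levels 1-3 from the example; level 4 is the final level.
  for timeout in 5 10 15; do
    deadline=$((deadline + timeout))
    if [ "$elapsed" -lt "$deadline" ]; then
      break
    fi
    level=$((level + 1))
  done
  echo "$level"
}
```

Under these assumptions, an incident left unacknowledged for 20 minutes sits with the team lead at level 3.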

Integration Setup

Monitoring Integration

Connect with monitoring tools:

Configure webhook endpoints
Map alert severity to incident priority
Test integration with sample alerts
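
Severity mapping is usually a small lookup table. The sketch below shows one possible mapping; the input labels, the priority names, and the map_severity function are all hypothetical and should be replaced with the values your monitoring tool and your Operational.co instance actually use:

```shell
# Map a monitoring tool's severity label to an incident priority.
# Both the input labels and the priority names are hypothetical.
map_severity() {
  # Normalize to lowercase so "CRITICAL" and "critical" are treated alike
  label=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$label" in
    critical|page) echo high ;;
    warning|warn)  echo medium ;;
    info|notice)   echo low ;;
    *)             echo medium ;;  # unknown labels default to medium
  esac
}
```

Defaulting unknown labels to a middle priority is a design choice: it avoids paging on unrecognized alerts without silently dropping them to the lowest tier.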

Slack Integration

Enable Slack notifications:

Create a Slack app or incoming webhook
Add webhook URL to configuration
Configure which events trigger notifications
Test with a sample incident

API Integration

Use the API for automation:

# Create an incident via API
curl -X POST https://your-app-name.klutch.sh/api/incidents \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Database connection errors",
    "severity": "high",
    "description": "Multiple services reporting database timeouts"
  }'
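
When the title or description comes from a script or an alert, embedded quotes can break a hand-written JSON body. The following sketch builds the payload with basic escaping; it handles only backslashes and double quotes (not newlines or control characters), and json_escape and incident_payload are hypothetical helper names. For arbitrary input, a dedicated tool such as jq is the more robust choice.

```shell
# Escape backslashes and double quotes for use inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print a JSON body with the same three fields as the request above.
# Arguments: title, severity, description.
incident_payload() {
  printf '{"title":"%s","severity":"%s","description":"%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}
```

The output can be passed to curl in place of the inline body, for example with `-d "$(incident_payload "$title" high "$description")"`.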

Status Pages

Creating a Status Page

Communicate with stakeholders:

Navigate to Status Pages
Create a new status page
Add components to monitor
Configure public URL

Updating Status

During incidents:

Link incident to affected components
Update component status
Post status updates for subscribers
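
A status page typically summarizes its components by showing the worst current status. The sketch below illustrates that rule with three hypothetical status names (operational, degraded, outage); the names and the overall_status function are assumptions and may differ from what Operational.co uses:

```shell
# Print the most severe status among the arguments.
# Severity order assumed here: operational < degraded < outage.
overall_status() {
  worst=operational
  for status in "$@"; do
    case "$status" in
      outage)   worst=outage ;;
      degraded) [ "$worst" = outage ] || worst=degraded ;;
    esac
  done
  echo "$worst"
}
```

For example, `overall_status operational degraded operational` prints degraded.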

Post-Incident Reviews

Creating Post-Mortems

Learn from incidents:

After resolution, create a post-mortem
Document timeline of events
Identify root cause
Define action items
Share with team

Post-Mortem Template

Standard sections:

Summary: Brief description of what happened
Impact: Who/what was affected and for how long
Timeline: Chronological events during the incident
Root Cause: Why the incident occurred
Resolution: How the incident was resolved
Action Items: Follow-up tasks to prevent recurrence
Lessons Learned: What the team learned
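
If you keep post-mortems as text files alongside your configuration, a skeleton with the sections above can be generated rather than retyped. This is a convenience sketch; the postmortem_template function and the placeholder text are hypothetical:

```shell
# Print a plain-text post-mortem skeleton for the given incident title.
postmortem_template() {
  printf 'Post-Mortem: %s\n\n' "$1"
  for section in "Summary" "Impact" "Timeline" "Root Cause" "Resolution" "Action Items" "Lessons Learned"; do
    printf '%s\n(to be completed)\n\n' "$section"
  done
}
```

For example, `postmortem_template "Database connection errors" > postmortem.txt` writes a file with all seven sections ready to fill in.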

Best Practices

On-Call Management

Define clear escalation policies
Ensure adequate coverage across time zones
Rotate fairly to prevent burnout
Provide runbooks for common issues

Incident Response

Acknowledge quickly to stop escalation
Communicate status regularly
Restore service first; leave root-cause investigation until the incident is resolved
Document everything during the incident

Continuous Improvement

Review all incidents in post-mortems
Track action items to completion
Share learnings across teams
Update runbooks based on incidents

Troubleshooting Common Issues

Notifications Not Sending

Symptoms: Alerts not reaching responders.

Solutions:

Verify SMTP configuration
Check notification channel settings
Verify recipient contact information
Review application logs

Incidents Not Escalating

Symptoms: Unacknowledged incidents not escalating.

Solutions:

Verify escalation policy configuration
Check timeout settings
Ensure on-call schedule is active
Verify responders are in rotation

Integration Not Working

Symptoms: External alerts not creating incidents.

Solutions:

Verify webhook URL is correct
Check authentication tokens
Review incoming webhook logs
Test with sample payloads

Conclusion

Deploying Operational.co on Klutch.sh provides a comprehensive incident management solution for your engineering team. With on-call scheduling, escalation policies, and incident tracking, Operational.co brings structure to your incident response process.

The combination of persistent storage for incident history, always-on availability for alert reception, and HTTPS by default makes Klutch.sh well-suited for hosting Operational.co. Whether you manage a small team or a large engineering organization, a self-hosted incident platform gives you direct control over your incident data and how it is stored.

Start with basic on-call schedules and incident tracking, then expand with integrations and status pages as your needs grow. With Operational.co on Klutch.sh, you own your incident management infrastructure.