
Deploying a CKAN Data Portal

Built with Python, CKAN provides a comprehensive platform for publishing, organizing, sharing, and accessing datasets with a powerful web interface, full-featured API, advanced search capabilities, and rich visualization tools. Whether you’re building a government open data portal, an enterprise data hub, or a community data platform, CKAN delivers the features and flexibility needed to manage data assets at scale.

CKAN powers hundreds of data portals worldwide, including official government data portals like Canada’s open.canada.ca/data, Australia’s data.gov.au, Singapore’s data.gov.sg, and humanitarian data platforms like data.humdata.org. It is recognized as a Digital Public Good by the UN-endorsed Digital Public Goods Alliance and is listed as supporting 9 of the 17 Sustainable Development Goals. With support for organizing datasets into custom collections, managing resources with multiple file formats, controlling access with fine-grained permissions, extending functionality through plugins, and integrating with external systems, CKAN provides enterprise-grade capabilities for data governance and management.

Key Features

  • Dataset Management: Organize, catalog, and manage datasets with rich metadata and custom fields
  • Resource Management: Handle multiple file formats and data types within datasets, with versioning support
  • Full-Text Search: Powerful search and filtering capabilities with faceted search for discovering datasets
  • Data Publishing: Easy-to-use interface for data custodians to publish datasets with metadata and documentation
  • Data Access Control: Fine-grained permissions system for controlling who can view, edit, or publish datasets
  • Custom Metadata: Extend metadata schemas with custom fields and validation rules for domain-specific needs
  • Organizations and Groups: Organize datasets by organization, department, or thematic group
  • Data API: Complete REST API for programmatic access to catalog metadata and resources
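  • Harvesting: Automatic harvesting of datasets from other CKAN instances and external sources through the ckanext-harvest extension
  • Data Preview: Built-in previews for common formats such as text, JSON, and images, with tabular previews for data loaded into the DataStore; interactive maps require an extension such as ckanext-geoview
  • Data Dictionary: Describe column names, types, and definitions for resources loaded into the DataStore
  • Activity Stream: Track dataset changes with a complete audit trail of modifications (provided by the activity plugin, which must be enabled in CKAN 2.10 and later)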
  • User Management: Role-based access control with different permission levels for administrators and data managers
  • Data Retention: Implement lifecycle and retention policies for archived and historical datasets through extensions or external processes
  • Multi-language Support: Translate portal content and metadata into multiple languages
  • Workflow Management: Add custom review, approval, and publication workflows through extensions
  • Integration APIs: REST API for integration with external applications and reporting tools
  • Extensibility: Plugin system for extending functionality with custom features and visualizations
  • Mobile Responsive: Responsive web design for accessing data portals on all devices
  • Open Standards: Support for DCAT, RDF, and other open data standards for interoperability

Prerequisites

To deploy CKAN on Klutch.sh, ensure you have the following:

  • An active Klutch.sh account with access to the dashboard at klutch.sh/app
  • A GitHub repository for version control
  • Understanding of data portal concepts and open data standards
  • PostgreSQL database for CKAN metadata storage (can be deployed separately or on the same instance)
  • Solr instance for the search index (CKAN requires Solr for dataset search)
  • Redis instance for caching and job queue management
  • Familiarity with Docker and containerization concepts
  • Access to manage persistent storage volumes for dataset files and uploads


Deployment Steps

  1. Create the Dockerfile

    Create a Dockerfile in the root directory of your repository:

    FROM ckan/ckan-base:2.11
    # The base image may run as the non-root ckan user, so switch to root for installs
    USER root
    # Install additional system dependencies (postgresql-client provides pg_isready)
    RUN apt-get update && \
        apt-get install -y --no-install-recommends \
            git \
            curl \
            ca-certificates \
            postgresql-client && \
        rm -rf /var/lib/apt/lists/*
    # CKAN itself is already installed in ckan/ckan-base; pin the image tag above
    # instead of reinstalling CKAN with pip
    # Create CKAN storage directories
    RUN mkdir -p /var/lib/ckan/storage && \
        chown -R ckan:ckan /var/lib/ckan
    # Copy any custom configuration files
    COPY ckan-config.ini /etc/ckan/production.ini
    COPY entrypoint.sh /entrypoint.sh
    RUN chmod +x /entrypoint.sh
    # Run CKAN as the unprivileged ckan user
    USER ckan
    # Expose CKAN web port
    EXPOSE 5000
    # Set working directory
    WORKDIR /var/lib/ckan
    # Define volumes for persistent storage
    VOLUME ["/var/lib/ckan"]
    # Health check against the status_show API action (start period covers database init)
    HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
        CMD curl -f http://localhost:5000/api/3/action/status_show || exit 1
    # Use custom entrypoint for initialization
    ENTRYPOINT ["/entrypoint.sh"]
    # Default command: bind to all interfaces so the platform can reach CKAN
    CMD ["ckan", "-c", "/etc/ckan/production.ini", "run", "--host", "0.0.0.0", "--port", "5000"]

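    This Dockerfile builds on the official CKAN base image, adds the PostgreSQL client tools the entrypoint uses to wait for the database, and configures the port, volume, and health check. Note that ckan run starts CKAN's built-in development server; for heavier production traffic, consider serving CKAN through a WSGI server such as uWSGI instead.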

  2. Create an Entrypoint Script

    Create an entrypoint.sh file for initialization:

    #!/bin/bash
    set -e
    CKAN_INI=/etc/ckan/production.ini
    ADMIN_USER="${CKAN_ADMIN_USERNAME:-admin}"
    echo "Starting CKAN initialization..."
    # Fail fast if the database URL is missing instead of waiting forever
    : "${CKAN_SQLALCHEMY_URL:?CKAN_SQLALCHEMY_URL must be set}"
    # Wait for database to be ready (pg_isready accepts a full connection URI via -d)
    echo "Waiting for PostgreSQL database..."
    until pg_isready -q -d "$CKAN_SQLALCHEMY_URL"; do
        echo "Database not ready, retrying..."
        sleep 5
    done
    echo "Database is ready!"
    # Initialize the database; errors now stop the container instead of being ignored
    echo "Initializing CKAN database..."
    ckan -c "$CKAN_INI" db init
    # Create the admin user if it doesn't exist (whole-word match on the username)
    if ! ckan -c "$CKAN_INI" user list | grep -qw "$ADMIN_USER"; then
        # Require an explicit password instead of falling back to a weak default
        : "${CKAN_ADMIN_PASSWORD:?CKAN_ADMIN_PASSWORD must be set}"
        echo "Creating default admin user..."
        ckan -c "$CKAN_INI" user add \
            "$ADMIN_USER" \
            password="$CKAN_ADMIN_PASSWORD" \
            email="${CKAN_ADMIN_EMAIL:-admin@example.com}" \
            fullname="Administrator"
        # Grant sysadmin rights
        ckan -c "$CKAN_INI" sysadmin add "$ADMIN_USER"
    fi
    echo "CKAN initialization complete!"
    # Execute main command
    exec "$@"

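    The Dockerfile above already copies this script, marks it executable, and sets it as the ENTRYPOINT, so the CMD (the ckan run command) is passed to the script as "$@" and started by the final exec line.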

  3. Create CKAN Configuration File

    Create a ckan-config.ini file for CKAN settings:

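    # CKAN Configuration File
    [app:main]
    use = egg:ckan
    debug = false
    # Secret used to sign sessions, CSRF tokens, and API tokens (CKAN 2.10+);
    # replace with a long random string and keep it out of version control
    SECRET_KEY = change-me-to-a-long-random-string
    # Database URL - the CKAN_SQLALCHEMY_URL environment variable overrides this
    sqlalchemy.url = postgresql://ckan:password@postgres-host/ckan
    # Redis URL - the CKAN_REDIS_URL environment variable overrides ckan.redis.url
    cache_url = redis://redis-host:6379/1
    ckan.redis.url = redis://redis-host:6379/0
    # Solr search index (required) - the CKAN_SOLR_URL environment variable overrides this
    solr_url = http://solr-host:8983/solr/ckan
    ckan.site_id = default
    # Site title and description
    ckan.site_title = Open Data Portal
    ckan.site_description = Sharing data openly
    ckan.site_url = https://example-app.klutch.sh
    # API token settings
    api_token.nbf_leeway = 0
    api_token.jwt.algorithm = HS256
    # Storage settings
    ckan.storage_path = /var/lib/ckan/storage
    # Maximum resource upload size in MB (an inline # comment would become part of the value)
    ckan.max_resource_size = 500
    # Plugins configuration: activity is a separate plugin from CKAN 2.10 on, and
    # recline_view is no longer part of core. Add datastore and datapusher (or
    # xloader) only after setting ckan.datastore.write_url, ckan.datastore.read_url,
    # and the DataPusher service settings, or CKAN may fail to start.
    ckan.plugins = activity text_view image_view
    # View plugins created automatically for new resources
    ckan.views.default_views = text_view image_view
    # Session settings (the session backend has changed in recent CKAN releases;
    # check the session options in the configuration reference for your version)
    beaker.session.key = CKAN
    beaker.session.type = ext:redis
    beaker.session.url = redis://redis-host:6379/2
    # Email settings - the CKAN_SMTP_SERVER, CKAN_SMTP_USER, and CKAN_SMTP_PASSWORD
    # environment variables override the matching smtp.* values below
    smtp.server = smtp.example.com:587
    smtp.starttls = true
    smtp.user = notifications@example.com
    smtp.password = change-me
    smtp.mail_from = ckan@example.com
    email_to = admin@example.com
    error_email_from = ckan@example.com
    # Authorization settings
    ckan.auth.create_user_via_api = false
    ckan.auth.create_unowned_dataset = false
    ckan.auth.user_create_organizations = true
    # Locale settings
    ckan.locale_default = en
    ckan.locale_order = en pt_BR
    [loggers]
    keys = root, ckan, ckanext
    [handlers]
    keys = console
    [formatters]
    keys = generic
    [logger_root]
    level = WARNING
    handlers = console
    [logger_ckan]
    level = INFO
    handlers = console
    qualname = ckan
    [logger_ckanext]
    level = DEBUG
    handlers = console
    qualname = ckanext
    [handler_console]
    class = StreamHandler
    args = (sys.stderr,)
    level = NOTSET
    formatter = generic
    [formatter_generic]
    format = %(asctime)s %(levelname)-5.5s [%(name)s] %(message)s
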
  4. Create Environment Configuration File

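    Create an .env.example file that documents the configuration variables. CKAN core reads only some of these names directly from the environment (for example CKAN_SQLALCHEMY_URL, CKAN_REDIS_URL, CKAN_SOLR_URL, CKAN_SITE_URL, CKAN_STORAGE_PATH, and the CKAN_SMTP_* settings), and the entrypoint script reads CKAN_ADMIN_PASSWORD and CKAN_ADMIN_EMAIL. The remaining names take effect only if you map them into production.ini yourself or through an extension such as ckanext-envvars: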

    # CKAN Core Configuration
    CKAN_SITE_TITLE=Open Data Portal
    CKAN_SITE_DESCRIPTION=Sharing data openly
    CKAN_SITE_URL=https://example-app.klutch.sh
    # Database Configuration
    # PostgreSQL connection for CKAN metadata
    CKAN_SQLALCHEMY_URL=postgresql://ckan:secure-password@postgres-host:5432/ckan
    # Redis Configuration
    # Redis for caching and job queue
    CKAN_REDIS_URL=redis://redis-host:6379/0
    # Solr Configuration
    # Solr core holding the CKAN search index (required)
    CKAN_SOLR_URL=http://solr-host:8983/solr/ckan
    # Admin User Configuration
    CKAN_ADMIN_USERNAME=admin
    CKAN_ADMIN_PASSWORD=secure-admin-password
    CKAN_ADMIN_EMAIL=admin@example.com
    # API Configuration
    CKAN_API_TOKEN_EXPIRATION_DAYS=730
    # Maximum upload size in MB (CKAN maps this name onto ckan.max_resource_size)
    CKAN_MAX_UPLOAD_SIZE_MB=500
    # Storage Configuration
    CKAN_STORAGE_PATH=/var/lib/ckan/storage
    # Security Settings
    CKAN_SECRET_KEY=your-secret-key-change-this
    # Email/SMTP Configuration (optional); the server value includes the port
    CKAN_SMTP_SERVER=smtp.example.com:587
    CKAN_SMTP_STARTTLS=true
    CKAN_SMTP_USER=notifications@example.com
    CKAN_SMTP_PASSWORD=smtp-password
    CKAN_EMAIL_TO=admin@example.com
    # Authentication Settings
    CKAN_AUTH_CREATE_USER_VIA_API=false
    CKAN_AUTH_CREATE_UNOWNED_DATASET=false
    # Plugins Configuration
    CKAN_PLUGINS=activity text_view image_view
    # Localization
    CKAN_LOCALE_DEFAULT=en
    CKAN_LOCALE_ORDER=en pt_BR
    # Logging Level
    CKAN_LOG_LEVEL=info
    # Session Settings
    CKAN_BEAKER_SESSION_TYPE=redis
    CKAN_BEAKER_SESSION_URL=redis-host:6379
    # Users can create organizations
    CKAN_USER_CREATE_ORGANIZATIONS=true
    # User registration
    CKAN_USER_REGISTER_ENABLED=true

  5. Create a .gitignore File

    Create a .gitignore file to exclude sensitive files:

    # Environment variables
    .env
    .env.local
    .env.*.local
    # CKAN data and storage
    /var/lib/ckan/
    data/
    uploads/
    # IDE and editor files
    .vscode/
    .idea/
    *.swp
    *.swo
    *~
    # OS files
    .DS_Store
    Thumbs.db
    # Python
    __pycache__/
    *.py[cod]
    *$py.class
    *.egg-info/
    dist/
    build/
    # Logs
    *.log
    logs/
    # Node (if frontend customization)
    node_modules/
    npm-debug.log

  6. Push to GitHub

    Initialize a Git repository and push your files:

    git init
    git add Dockerfile entrypoint.sh ckan-config.ini .env.example .gitignore
    git commit -m "Initial CKAN deployment setup"
    git branch -M main
    git remote add origin https://github.com/YOUR_USERNAME/YOUR_REPOSITORY.git
    git push -u origin main

    Replace YOUR_USERNAME and YOUR_REPOSITORY with your GitHub username and repository name.

  7. Deploy Database Infrastructure

    Before deploying CKAN, you need PostgreSQL, Solr, and Redis instances. You can either:

    Option A: Deploy on Klutch.sh

    1. Log in to klutch.sh/app
    2. Create a PostgreSQL app by selecting a PostgreSQL Docker image
    3. Configure database name as ckan and create a user ckan
    4. Similarly, create a Redis app
    5. Create a Solr app from an image that ships the CKAN Solr schema, such as the ckan/ckan-solr images maintained by the CKAN project
    6. Note the database, Solr, and Redis connection details

    Option B: Use Managed Services

    • Use your hosting provider’s managed PostgreSQL and Redis services, plus a Solr instance configured with the CKAN schema
    • Note the connection URLs for configuration in CKAN
  8. Deploy CKAN on Klutch.sh

    1. Log in to your Klutch.sh dashboard at klutch.sh/app
    2. Click Create App and select GitHub
    3. Connect your GitHub account and select your CKAN repository
    4. Configure deployment settings:
      • App Name: ckan-portal (or your preferred name)
      • Branch: Select main
      • Build Command: Leave default
      • Start Command: Leave default or customize via environment variables
    5. In environment variables, set:
      • CKAN_SQLALCHEMY_URL: Your PostgreSQL connection string
      • CKAN_SOLR_URL: Your Solr core URL (for example http://solr-host:8983/solr/ckan)
      • CKAN_REDIS_URL: Your Redis connection URL
      • CKAN_ADMIN_PASSWORD: Secure admin password
      • CKAN_SITE_URL: Your deployment URL
      • Other CKAN configuration variables from .env.example
    6. Click Deploy and wait for deployment to complete

    Klutch.sh automatically detects the Dockerfile and uses it for building and deploying your CKAN instance.

  9. Configure Persistent Storage

    After deployment, attach persistent volumes for data persistence:

    1. Go to your deployed CKAN app in the Klutch.sh dashboard
    2. Navigate to Storage or Volumes section
    3. Click Add Persistent Volume
    4. Configure volumes:
      • Mount Path: /var/lib/ckan/storage
      • Size: 100GB (adjust based on expected data)
    5. Save and restart the container

    This ensures all uploaded datasets, files, and resources persist across container updates and restarts.

  10. Configure Network Traffic

    Set up HTTP traffic for your CKAN portal:

    1. In your app settings, go to Network section
    2. Set Traffic Type to HTTP
    3. Internal port should be 5000 (CKAN’s default)
    4. Your CKAN portal will be accessible at https://example-app.klutch.sh
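Once the network settings are in place, you can confirm that CKAN itself, and not just the container, is answering by calling the status_show API action. The following is a standard-library sketch; the base URL is this guide's example placeholder, and the fields it reads (ckan_version, site_url, extensions) are the ones status_show returns in recent CKAN releases.

```python
import json
import urllib.request


def summarize_status(payload: dict) -> str:
    """Turn a status_show response body into a one-line summary."""
    if not payload.get("success"):
        raise RuntimeError(f"status_show reported failure: {payload.get('error')}")
    result = payload["result"]
    # "extensions" lists the plugins the running configuration loaded
    extensions = ", ".join(result.get("extensions", [])) or "none"
    return (
        f"CKAN {result.get('ckan_version', 'unknown')} at "
        f"{result.get('site_url', 'unknown')} (plugins: {extensions})"
    )


def fetch_status(base_url: str, timeout: float = 10.0) -> dict:
    """GET /api/3/action/status_show and decode the JSON body."""
    url = base_url.rstrip("/") + "/api/3/action/status_show"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, print(summarize_status(fetch_status("https://example-app.klutch.sh"))) prints the CKAN version, site URL, and enabled plugins, which is also a quick way to confirm which plugins the deployed configuration actually loaded.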

Initial Setup and Configuration

After your CKAN deployment is running, follow these steps to set up your data portal:

Access the Web Interface

Open your browser and navigate to https://example-app.klutch.sh. You’ll see the CKAN home page with the dataset catalog.

Log in as Administrator

  1. Click the Log In button in the top navigation
  2. Use the admin credentials you set in environment variables
  3. You’ll be logged in as sysadmin with full administrative privileges

Create Your First Organization

  1. Go to Organizations in the top navigation
  2. Click Create Organization
  3. Fill in organization details:
    • Name: Organization identifier (e.g., health-dept)
    • Title: Display name (e.g., Health Department)
    • Description: Brief description of the organization
    • Image: Optional organization logo
  4. Click Create Organization
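The Name field becomes part of the organization's URL, and CKAN applies the same rules to dataset names. The sketch below derives a name from a title and checks it; the rules encoded here (2 to 100 characters, lowercase ASCII letters, digits, hyphens and underscores, and a few reserved words) are an assumption based on CKAN's name validator, so treat the form's own validation message as authoritative.

```python
import re
import unicodedata

# Assumed to mirror CKAN's name rules: 2-100 characters drawn from lowercase
# ASCII letters, digits, "-" and "_", excluding a few reserved words.
NAME_PATTERN = re.compile(r"[a-z0-9_-]{2,100}")
RESERVED_NAMES = {"new", "edit", "search"}


def slugify(title: str) -> str:
    """Derive a URL-safe CKAN name from a display title."""
    # Decompose accented characters, then drop anything that is not ASCII
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of disallowed characters into a single hyphen
    slug = re.sub(r"[^a-z0-9_-]+", "-", ascii_title.lower()).strip("-")
    return slug[:100].strip("-")


def is_valid_name(name: str) -> bool:
    """Check a candidate organization or dataset name against the rules above."""
    return bool(NAME_PATTERN.fullmatch(name)) and name not in RESERVED_NAMES
```

For example, slugify("Health Department") returns "health-department", which is_valid_name accepts.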

Create a Dataset

  1. Click Datasets → Add Dataset
  2. Fill in dataset metadata:
    • Title: Dataset name
    • Description: Detailed description
    • Tags: Keywords for searchability
    • Organization: Select the organization that owns the data
    • Visibility: Public or Private
  3. Click Next to add resources
  4. Upload or link data files:
    • File: Upload CSV, JSON, Excel, or other formats
    • Format: Specify the data format
    • Description: Describe the resource content
  5. Click Finish to publish the dataset
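Each field in the Add Dataset form corresponds to a key in the body of the package_create API action, which helps when moving from the web interface to scripted publishing. This hypothetical helper (build_dataset_payload is not part of CKAN) shows the mapping for the fields listed above: Visibility becomes the private flag, and tags are sent as a list of objects.

```python
from typing import Iterable


def build_dataset_payload(
    name: str,
    title: str,
    owner_org: str,
    description: str = "",
    tags: Iterable[str] = (),
    private: bool = False,
) -> dict:
    """Map the Add Dataset form fields onto a package_create request body."""
    return {
        "name": name,                               # URL name of the dataset
        "title": title,                             # Title
        "notes": description,                       # Description
        "owner_org": owner_org,                     # Organization
        "private": private,                         # Visibility: True = Private
        "tags": [{"name": tag} for tag in tags],    # Tags
    }
```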

Configure Data Preview

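  1. Preview plugins are enabled in configuration rather than from the web interface: edit the ckan.plugins line in production.ini and redeploy
  2. Ensure the view plugins you need are listed:
    • text_view (for plain text, JSON, and XML)
    • image_view (for images)
    • datatables_view (for CSV and other tabular data loaded into the DataStore; requires the datastore plugin and DataPusher or XLoader)
  3. Add the same view plugins to ckan.views.default_views so that new resources get previews automatically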

Set Up User Accounts

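  1. Go to Admin Panel → Users
  2. Click Create User
  3. Fill in user details:
    • Name: Username for login
    • Email: User email address
    • Password: Initial password
    • Sysadmin: Check to grant admin access
  4. Click Create User
  5. To grant sysadmin rights from the command line instead, run ckan -c /etc/ckan/production.ini sysadmin add USERNAME in the CKAN container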

Configure Portal Settings

  1. Go to Admin Panel → Config
  2. Customize:
    • Site Title: Your portal name
    • Site Description: Portal tagline
    • Logo: Custom logo image
    • Custom CSS: Additional styling
  3. Click Save to apply changes

Environment Variables

Basic Configuration

CKAN_SITE_TITLE=Open Data Portal
CKAN_SITE_URL=https://example-app.klutch.sh
CKAN_SQLALCHEMY_URL=postgresql://ckan:password@postgres:5432/ckan
CKAN_REDIS_URL=redis://redis:6379/0
CKAN_SOLR_URL=http://solr:8983/solr/ckan
CKAN_ADMIN_PASSWORD=secure-password
CKAN_SECRET_KEY=your-secret-key

These variables control the basic CKAN configuration including site settings, database connections, and administrative credentials.
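Not every name in these blocks is read by CKAN itself: CKAN core maps a fixed set of environment variables onto configuration options at startup, while names such as CKAN_SITE_TITLE or CKAN_SECRET_KEY only matter if something else maps them. The sketch below checks a .env-style file against that set. The CORE_ENV_VARS table is an assumption based on CKAN's CONFIG_FROM_ENV_VARS mapping in ckan/config/environment.py for the 2.9 to 2.11 releases; check it against the source of the version you deploy.

```python
from typing import Dict, List, Tuple

# Assumption: environment variables CKAN core maps onto config options at
# startup (CONFIG_FROM_ENV_VARS, CKAN 2.9 to 2.11). Verify for your version.
CORE_ENV_VARS = {
    "CKAN_SQLALCHEMY_URL": "sqlalchemy.url",
    "CKAN_DATASTORE_WRITE_URL": "ckan.datastore.write_url",
    "CKAN_DATASTORE_READ_URL": "ckan.datastore.read_url",
    "CKAN_REDIS_URL": "ckan.redis.url",
    "CKAN_SOLR_URL": "solr_url",
    "CKAN_SITE_ID": "ckan.site_id",
    "CKAN_SITE_URL": "ckan.site_url",
    "CKAN_STORAGE_PATH": "ckan.storage_path",
    "CKAN_DATAPUSHER_URL": "ckan.datapusher.url",
    "CKAN_SMTP_SERVER": "smtp.server",
    "CKAN_SMTP_STARTTLS": "smtp.starttls",
    "CKAN_SMTP_USER": "smtp.user",
    "CKAN_SMTP_PASSWORD": "smtp.password",
    "CKAN_SMTP_MAIL_FROM": "smtp.mail_from",
    "CKAN_MAX_UPLOAD_SIZE_MB": "ckan.max_resource_size",
}


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so URLs containing "=" stay intact
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def classify(env: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split names into those CKAN core reads and those it ignores."""
    read = sorted(name for name in env if name in CORE_ENV_VARS)
    ignored = sorted(name for name in env if name not in CORE_ENV_VARS)
    return read, ignored


def inline_comment_keys(env: Dict[str, str]) -> List[str]:
    """Names whose value contains " #", which some loaders keep as part of the value."""
    return sorted(name for name, value in env.items() if " #" in value)
```

For example, classify(parse_env(open(".env.example").read())) returns the names CKAN reads directly and the names that still need a mapping.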

Production Configuration

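For production deployments, you can extend the configuration with the variables below. As with .env.example, CKAN core reads only some of them directly (the database, Redis, Solr, site URL, storage, upload size, and SMTP settings); the rest take effect only if your entrypoint or an extension such as ckanext-envvars maps them into production.ini: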

# Advanced Configuration for Production Deployments
# Database Connection
CKAN_SQLALCHEMY_URL=postgresql://ckan:secure-password@postgres.example.com:5432/ckan
CKAN_SQLALCHEMY_POOL_SIZE=10
CKAN_SQLALCHEMY_POOL_RECYCLE=3600
CKAN_SQLALCHEMY_POOL_PRE_PING=true
# Redis Configuration
CKAN_REDIS_URL=redis://redis.example.com:6379/0
CKAN_REDIS_SESSIONS_URL=redis://redis.example.com:6379/1
# Site Configuration
CKAN_SITE_TITLE=National Open Data Portal
CKAN_SITE_DESCRIPTION=Share and access government datasets
CKAN_SITE_URL=https://data.example.gov
CKAN_SITE_LOGO=/images/custom-logo.png
# User Authentication
CKAN_ADMIN_USERNAME=admin
CKAN_ADMIN_PASSWORD=very-secure-password
CKAN_AUTH_CREATE_USER_VIA_API=false
CKAN_AUTH_ROLES_THAT_CASCADE_TO_SUB_GROUPS=admin editor
# Security Settings
CKAN_SECRET_KEY=your-very-long-secret-key-minimum-32-chars
CKAN_BEAKER_SESSION_SECRET=session-secret-key
CKAN_BEAKER_SESSION_TYPE=redis
CKAN_BEAKER_SESSION_URL=redis.example.com:6379
# Storage Configuration
CKAN_STORAGE_PATH=/var/lib/ckan/storage
# Maximum upload size in MB (1000 = 1 GB); CKAN maps this name onto ckan.max_resource_size
CKAN_MAX_UPLOAD_SIZE_MB=1000
# Email/SMTP Configuration
CKAN_SMTP_SERVER=smtp.example.com
CKAN_SMTP_PORT=587
CKAN_SMTP_USER=noreply@example.com
CKAN_SMTP_PASSWORD=smtp-password
CKAN_SMTP_TLS=true
CKAN_ERROR_EMAIL_FROM=ckan-errors@example.com
CKAN_EMAIL_TO=admin@example.com
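# Plugins (install as needed; harvest and ckan_harvester come from ckanext-harvest,
# and datastore and datapusher need their own database URLs and DataPusher service)
CKAN_PLUGINS=activity text_view image_view datastore datapusher harvest ckan_harvester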
# Organization Settings
CKAN_USER_CREATE_ORGANIZATIONS=true
CKAN_USER_REGISTER_ENABLED=true
CKAN_USER_RESET_PASSWORD_ENABLED=true
# Search Configuration (CKAN requires Solr for dataset search)
CKAN_SOLR_URL=http://solr.example.com:8983/solr/ckan
# API Configuration
CKAN_API_TOKEN_EXPIRATION_DAYS=730
CKAN_API_TOKEN_NBFLEEWAY=0
# Localization
CKAN_LOCALE_DEFAULT=en
CKAN_LOCALE_ORDER=en fr es
CKAN_LOCALES_FILTERED_OUT=
# Logging
CKAN_LOG_LEVEL=info
CKAN_PROPAGATE_ERRORS=true
# Views and Previews
CKAN_VIEWS_DEFAULT_VIEWS=true
# Activity Tracking
CKAN_ACTIVITY_STREAM_ENABLED=true
CKAN_ACTIVITY_STREAM_SQL_ENABLED=true

To apply these variables in Klutch.sh:

  1. Go to your app settings
  2. Navigate to Environment Variables
  3. Add each variable with appropriate values for your deployment
  4. Click Save and redeploy if necessary

Code Examples

Bash: Automated Data Harvest Script

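This script registers remote CKAN instances as harvest sources and triggers harvest jobs. It requires the ckanext-harvest extension (the harvest and ckan_harvester plugins), with its gather and fetch consumers running alongside CKAN to process the jobs: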

#!/bin/bash
# CKAN Automated Data Harvest Script
# Registers remote CKAN sources with ckanext-harvest and triggers harvest jobs
set -euo pipefail
CKAN_API_URL="https://example-app.klutch.sh/api/3"
CKAN_API_KEY="your-api-token"
HARVEST_LOG="/var/log/ckan-harvest.log"
echo "$(date): Starting CKAN harvest process..." >> "$HARVEST_LOG"
# Function to register a remote CKAN instance as a harvest source
harvest_from_remote() {
    local remote_url=$1
    local source_name=$2
    local slug
    slug=$(echo "$source_name" | tr ' ' '-' | tr '[:upper:]' '[:lower:]')
    echo "Harvesting from: $remote_url" >> "$HARVEST_LOG"
    # Create harvest source; a failure (for example, a source that already
    # exists from a previous run) is logged instead of aborting the script
    curl -sS --fail -X POST "$CKAN_API_URL/action/harvest_source_create" \
        -H "X-CKAN-API-Key: $CKAN_API_KEY" \
        -H "Content-Type: application/json" \
        -d "{
            \"title\": \"$source_name\",
            \"name\": \"$slug\",
            \"url\": \"$remote_url\",
            \"source_type\": \"ckan\"
        }" >> "$HARVEST_LOG" 2>&1 \
        || echo "Could not create source $slug (it may already exist)" >> "$HARVEST_LOG"
}
# Function to run pending harvest jobs
run_harvest() {
    echo "Running harvest jobs..." >> "$HARVEST_LOG"
    curl -sS --fail -X POST "$CKAN_API_URL/action/harvest_jobs_run" \
        -H "X-CKAN-API-Key: $CKAN_API_KEY" \
        -H "Content-Type: application/json" \
        -d "{}" >> "$HARVEST_LOG" 2>&1
}
# Harvest from configured sources; the CKAN harvester expects each
# portal's base URL, not its /api/3 endpoint
harvest_from_remote "https://remote-ckan.example.com" "Remote Open Data"
harvest_from_remote "https://other-data-portal.org" "Other Data Portal"
# Run the harvest
run_harvest
echo "$(date): Harvest process completed." >> "$HARVEST_LOG"
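Building the JSON body through shell string interpolation breaks as soon as a title contains a double quote. A safer approach is to serialize a dictionary. In this sketch, harvest_source_body is a hypothetical helper; stripping a trailing /api or /api/N from the URL reflects the assumption that ckanext-harvest's CKAN harvester expects a portal's base URL.

```python
import json
import re


def harvest_source_body(title: str, remote_url: str, source_type: str = "ckan") -> str:
    """Build a JSON body for ckanext-harvest's harvest_source_create action."""
    # Lowercase the title and collapse disallowed characters into hyphens
    name = re.sub(r"[^a-z0-9_-]+", "-", title.lower()).strip("-")
    # Assumption: the harvester wants the base URL, so drop /api or /api/<n>
    base_url = re.sub(r"/api(/\d+)?/?$", "", remote_url.rstrip("/"))
    return json.dumps(
        {"title": title, "name": name, "url": base_url, "source_type": source_type}
    )
```

The returned string can be sent with requests.post(url, data=body, headers=...) or written to a file and passed to curl as -d @file.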

Python: CKAN API Client for Dataset Management

This example shows how to interact with CKAN’s API to create and manage datasets programmatically:

#!/usr/bin/env python3
from typing import Any, Dict, List, Optional

import requests


class CKANClient:
    """Client for interacting with the CKAN Action API"""
    def __init__(self, base_url: str, api_key: str, timeout: float = 30.0):
        self.base_url = base_url.rstrip('/')
        self.api_key = api_key
        self.timeout = timeout
        # CKAN reads API tokens from this header (the Authorization header also works)
        self.headers = {
            'X-CKAN-API-Key': self.api_key,
            'Content-Type': 'application/json'
        }
    def _make_request(self, action: str, method: str = 'GET',
                      data: Optional[Dict] = None,
                      params: Optional[Dict] = None) -> Any:
        """Call an API action and return the "result" part of the response"""
        url = f"{self.base_url}/api/3/action/{action}"
        if method == 'GET':
            # requests URL-encodes the query parameters
            response = requests.get(url, headers=self.headers, params=params,
                                    timeout=self.timeout)
        elif method == 'POST':
            response = requests.post(url, headers=self.headers, json=data,
                                     timeout=self.timeout)
        else:
            raise ValueError(f"Unsupported HTTP method: {method}")
        response.raise_for_status()
        return response.json()['result']
    def create_dataset(self, name: str, title: str, owner_org: str, **kwargs) -> Dict:
        """Create a new dataset"""
        payload = {'name': name, 'title': title, 'owner_org': owner_org, **kwargs}
        return self._make_request('package_create', method='POST', data=payload)
    def create_resource(self, package_id: str, url: str, name: str, **kwargs) -> Dict:
        """Add a resource (by URL) to a dataset"""
        payload = {'package_id': package_id, 'url': url, 'name': name, **kwargs}
        return self._make_request('resource_create', method='POST', data=payload)
    def search_datasets(self, query: str, rows: int = 10) -> Dict:
        """Search for datasets; returns a dict with count and results"""
        data = {'q': query, 'rows': rows}
        return self._make_request('package_search', method='POST', data=data)
    def get_dataset(self, dataset_id: str) -> Dict:
        """Get dataset details by id or name"""
        return self._make_request('package_show', params={'id': dataset_id})
    def list_organizations(self) -> List[Dict]:
        """List all organizations with full details, not just their names"""
        return self._make_request('organization_list', params={'all_fields': 'true'})
    def get_user_info(self, username: str) -> Dict:
        """Get information about a user by name or id"""
        return self._make_request('user_show', params={'id': username})


# Usage example
if __name__ == '__main__':
    # Initialize client with an API token generated from your user profile
    client = CKANClient(
        base_url='https://example-app.klutch.sh',
        api_key='your-api-key'
    )
    # Create a new dataset
    dataset = client.create_dataset(
        name='population-statistics',
        title='Population Statistics 2024',
        owner_org='statistics-department',
        notes='Annual population data by region',
        tags=[{'name': 'population'}, {'name': 'statistics'}]
    )
    print(f"Created dataset: {dataset['id']}")
    # Add a resource
    resource = client.create_resource(
        package_id=dataset['id'],
        url='https://example.com/population-data.csv',
        name='Population Data 2024',
        format='CSV',
        description='CSV file containing population figures'
    )
    print(f"Added resource: {resource['id']}")
    # Search datasets
    results = client.search_datasets('population', rows=5)
    print(f"Found {results['count']} datasets matching 'population'")
    # List organizations (all_fields=true returns dicts that include 'title')
    orgs = client.list_organizations()
    print(f"Available organizations: {[org['title'] for org in orgs]}")
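package_search returns at most rows results per call (the count field reports the total number of matches), so listing every match means paging with the start parameter. This sketch walks the pages through any callable that accepts q, rows, and start keyword arguments and returns package_search's result dictionary; taking a callable is a design choice so the helper stays independent of a particular client.

```python
from typing import Callable, Dict, Iterator


def iter_search_results(
    search: Callable[..., Dict], query: str, page_size: int = 100
) -> Iterator[Dict]:
    """Yield every dataset matching query by walking package_search pages."""
    start = 0
    while True:
        page = search(q=query, rows=page_size, start=start)
        results = page["results"]
        yield from results
        start += len(results)
        # Stop on an empty page or once every counted match has been seen
        if not results or start >= page["count"]:
            break
```

For instance, a small function that POSTs {"q": q, "rows": rows, "start": start} to /api/3/action/package_search and returns the response's result field can be passed as search.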

cURL: REST API Examples

Execute these cURL commands to interact with your CKAN instance. Replace your-api-key with an API token, which you can generate from the API Tokens tab of your user profile:

# Set variables
CKAN_URL="https://example-app.klutch.sh"
API_KEY="your-api-key"
# Get site information
curl -X GET "$CKAN_URL/api/3/action/status_show" \
-H "X-CKAN-API-Key: $API_KEY"
# Search datasets
curl -X POST "$CKAN_URL/api/3/action/package_search" \
-H "X-CKAN-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"q": "population",
"rows": 10
}'
# Create a new dataset
curl -X POST "$CKAN_URL/api/3/action/package_create" \
-H "X-CKAN-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "environmental-data",
"title": "Environmental Monitoring Data",
"owner_org": "environmental-agency",
"notes": "Air quality and pollution monitoring",
"tags": [
{"name": "environment"},
{"name": "air-quality"}
]
}'
# Get organization information
curl -X GET "$CKAN_URL/api/3/action/organization_show?id=environmental-agency" \
-H "X-CKAN-API-Key: $API_KEY"
# List all datasets
curl -X POST "$CKAN_URL/api/3/action/package_list" \
-H "X-CKAN-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"limit": 20,
"offset": 0
}'
# Create a resource (attach file to dataset)
curl -X POST "$CKAN_URL/api/3/action/resource_create" \
-H "X-CKAN-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"package_id": "environmental-data",
"url": "https://example.com/air-quality.csv",
"name": "Air Quality Data",
"format": "CSV"
}'
# Get user information
curl -X GET "$CKAN_URL/api/3/action/user_show?id=admin" \
-H "X-CKAN-API-Key: $API_KEY"
# Modify dataset
curl -X POST "$CKAN_URL/api/3/action/package_patch" \
-H "X-CKAN-API-Key: $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"id": "environmental-data",
"notes": "Updated: Air quality and pollution monitoring data with real-time updates"
}'
# Get activity stream (requires the activity plugin in CKAN 2.10 and later)
curl -X GET "$CKAN_URL/api/3/action/dashboard_activity_list" \
-H "X-CKAN-API-Key: $API_KEY"

Best Practices

Data Organization

  • Use Organizations: Group related datasets by organization or department for better organization and access control
  • Consistent Naming: Use descriptive, consistent naming conventions for datasets and resources
  • Complete Metadata: Fill in comprehensive metadata including descriptions, keywords, and license information
  • Resource Versioning: Upload new versions of resources with clear versioning to track data evolution
  • Data Quality: Validate data format and quality before publishing to ensure portal credibility

Access Control and Security

  • User Permissions: Assign appropriate roles (Admin, Editor, Member) based on responsibilities
  • Organization Ownership: Ensure datasets are assigned to appropriate organizations with clear ownership
  • Public vs Private: Mark sensitive data as private and restrict access to authorized users only
  • API Token Management: Create separate API tokens for different applications and revoke tokens that are no longer needed or may have been exposed
  • HTTPS/SSL: Ensure your CKAN portal uses HTTPS for secure communication

Portal Management

  • Regular Updates: Keep CKAN and plugins updated to the latest versions for security and feature improvements
  • Backup Strategy: Regularly back up PostgreSQL database and file storage to protect against data loss
  • Monitoring: Monitor portal performance, disk space, and database size for production health
  • User Support: Provide documentation and guidance for users on how to find, understand, and use datasets
  • Data Governance: Establish clear policies for data publication, retention, and archival
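A backup strategy for this deployment covers two things: the PostgreSQL database and the files under the storage path. This sketch only builds the commands (pg_dump in custom format plus a tar archive of /var/lib/ckan/storage); the /backups directory and the file naming are hypothetical, and you would run each command with subprocess.run(cmd, check=True) on a machine that has pg_dump, tar, and access to the volume.

```python
from datetime import datetime, timezone
from typing import List, Optional


def backup_commands(
    database_url: str,
    storage_path: str = "/var/lib/ckan/storage",
    backup_dir: str = "/backups",
    now: Optional[datetime] = None,
) -> List[List[str]]:
    """Return argv lists for a database dump and a storage archive."""
    # UTC timestamp shared by both files so they can be restored together
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    dump = [
        "pg_dump",
        "--format=custom",
        f"--file={backup_dir}/ckan-db-{stamp}.dump",
        f"--dbname={database_url}",  # pg_dump accepts a connection URI here
    ]
    archive = [
        "tar",
        "-czf",
        f"{backup_dir}/ckan-storage-{stamp}.tar.gz",
        "-C",
        storage_path,
        ".",
    ]
    return [dump, archive]
```

Passing the connection URL on the command line exposes the password in the process list; on shared hosts, a password file (~/.pgpass or PGPASSFILE) is a safer alternative.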

Performance Optimization

  • Caching: Use Redis caching to improve search and API response times
  • Database Tuning: Configure PostgreSQL connection pooling and indexes for optimal performance
  • Search Optimization: Size and tune Solr (memory, core configuration) as your catalog grows, since CKAN relies on Solr for all dataset search
  • CDN: Consider using a CDN for static assets and file downloads
  • Resource Limits: Configure appropriate upload size limits and query timeouts

Integration and Automation

  • API Integration: Use CKAN API to integrate with external systems and workflows
  • Data Harvesting: Set up automatic harvesting from other data portals to aggregate data
  • Webhooks: Implement event-driven workflows for dataset updates and notifications
  • Automated Backups: Configure scheduled backups to cloud storage or dedicated backup servers
  • Monitoring Alerts: Set up alerts for deployment issues, database problems, and storage warnings

Troubleshooting

Issue: Database Connection Fails on Startup

Solution: Verify that PostgreSQL is running and accessible from the CKAN container. Check CKAN_SQLALCHEMY_URL environment variable for correct connection string. Ensure database user has proper permissions on the CKAN database.
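A malformed CKAN_SQLALCHEMY_URL is a common cause. This standard-library sketch splits the URL into the parts you can compare with your PostgreSQL app's settings without printing the password; 5432 is assumed when no port is given because it is PostgreSQL's default. Special characters such as @ or : in the password must be percent-encoded in the URL.

```python
from urllib.parse import unquote, urlsplit


def describe_database_url(url: str) -> dict:
    """Break a SQLAlchemy-style PostgreSQL URL into its parts for checking."""
    parts = urlsplit(url)
    if not parts.scheme.startswith("postgresql"):
        raise ValueError(f"expected a postgresql:// URL, got scheme {parts.scheme!r}")
    return {
        "driver": parts.scheme,
        "user": unquote(parts.username or ""),
        "password_set": bool(parts.password),  # never echo the password itself
        "host": parts.hostname,
        "port": parts.port or 5432,
        "database": parts.path.lstrip("/"),
    }
```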

Issue: Redis Connection Errors

Solution: Confirm Redis is running and reachable from CKAN. Check CKAN_REDIS_URL environment variable. Verify firewall rules allow connections from CKAN container to Redis instance.
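To tell a network problem apart from a configuration problem, you can check whether the PostgreSQL, Redis, and Solr ports accept TCP connections from inside the CKAN container. This sketch uses only the standard library; the host names in the usage line are this guide's placeholders, and 5432, 6379, and 8983 are the default ports for PostgreSQL, Redis, and Solr.

```python
import socket
from typing import Dict, Tuple


def check_endpoints(
    endpoints: Dict[str, Tuple[str, int]], timeout: float = 3.0
) -> Dict[str, str]:
    """Try a TCP connection to each (host, port) and report the outcome."""
    report = {}
    for label, (host, port) in endpoints.items():
        try:
            # The connection is closed immediately; only reachability matters
            with socket.create_connection((host, port), timeout=timeout):
                report[label] = "reachable"
        except OSError as exc:  # covers refused, timed out, and DNS failures
            report[label] = f"unreachable ({exc.__class__.__name__})"
    return report
```

For example, check_endpoints({"postgres": ("postgres-host", 5432), "redis": ("redis-host", 6379), "solr": ("solr-host", 8983)}). A reachable port rules out firewall and DNS issues but not wrong credentials.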

Issue: Persistent Storage Data Lost

Solution: Ensure persistent volume is properly mounted at /var/lib/ckan/storage before deployment. Verify volume attachment in Klutch.sh settings. Check that volume has sufficient free space.

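Issue: Slow Search Performance

Solution: Dataset search is served by Solr, so slow searches with large catalogs usually point to an undersized or overloaded Solr instance; allocate more memory to Solr and check the health of the CKAN core. Ensure PostgreSQL has proper indexes, and monitor and optimize slow database queries.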

Issue: File Upload Failures

Solution: Check that persistent storage is writable by the user CKAN runs as. Verify that ckan.max_resource_size (in MB, which the CKAN_MAX_UPLOAD_SIZE_MB environment variable can override) is high enough for your uploads. Check that the /var/lib/ckan/storage directory exists and is writable.
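A quick check, run as the same user CKAN runs as, separates permission problems from space problems. This sketch verifies that the storage directory exists, that a file can be created in it, and that some free space remains; the 500 MB threshold is an arbitrary default matching one maximum-size upload under this guide's ckan.max_resource_size = 500.

```python
import os
import shutil
import tempfile
from typing import List


def check_storage(path: str = "/var/lib/ckan/storage", min_free_mb: int = 500) -> List[str]:
    """Return problems that would make resource uploads to path fail."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    try:
        # Creating (and automatically deleting) a temp file proves write access
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"cannot write to {path}: {exc.strerror}")
    free_mb = shutil.disk_usage(path).free // (1024 * 1024)
    if free_mb < min_free_mb:
        problems.append(f"only {free_mb} MB free in {path}")
    return problems
```

An empty list means the directory looks ready for uploads.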

Updating CKAN

To update CKAN to a newer version:

  1. Go to your Klutch.sh app dashboard
  2. Navigate to Deployments section
  3. Select the latest deployment and note the current version
  4. Update the Dockerfile to use a newer CKAN image tag
  5. Commit and push changes to GitHub
  6. Klutch.sh automatically redeploys with the new version
  7. The entrypoint runs ckan db init on every start, which applies pending database migrations
  8. Verify deployment is healthy and data is intact

Always back up your database and file storage before updating.

Use Cases

Government Open Data Portals

Deploy CKAN to create official government data portals for publishing datasets from multiple agencies. Enable citizens and researchers to discover and access government data with full-text search, dataset previews, and download capabilities.

Enterprise Data Hub

Build an internal data portal for organizations to catalog and share datasets across departments. Control data access with permissions, track data lineage with activity streams, and integrate with business intelligence tools.

Humanitarian Data Platform

Create a humanitarian data platform to share datasets related to disasters, health, food security, and development. Enable organizations to quickly publish and discover critical data during emergencies.

Scientific Data Repository

Deploy CKAN for research institutions to manage and share scientific datasets. Support versioning, metadata standards, and integration with research workflows and collaboration tools.

Thematic Data Hub

Build specialized data portals focused on specific domains like climate data, biodiversity, public health, or urban planning. Customize metadata schemas and visualizations for domain-specific needs.


Conclusion

CKAN empowers organizations to build comprehensive open data portals that make data accessible, discoverable, and usable for citizens, researchers, and businesses. By deploying CKAN on Klutch.sh, you gain a scalable and maintainable platform for data governance and sharing backed by a proven, production-tested system used by governments and enterprises worldwide.

The combination of CKAN’s powerful data management capabilities with Klutch.sh’s simplified deployment model makes it straightforward to establish an enterprise-grade data portal. Whether you’re launching a government open data initiative, creating an internal enterprise data hub, or building a humanitarian data platform, CKAN provides the tools, API capabilities, and extensibility needed for effective data management and discovery.

Start by deploying CKAN with PostgreSQL, Solr, and Redis infrastructure, configure persistent storage for reliable data management, and gradually expand your portal with additional datasets and customizations. Leverage CKAN’s comprehensive API for automation, plugin system for extending functionality, and active community for support and best practices.