Deploying a CKAN Data Portal

Built with Python, CKAN provides a comprehensive platform for publishing, organizing, sharing, and accessing datasets with a powerful web interface, full-featured API, advanced search capabilities, and rich visualization tools. Whether you’re building a government open data portal, an enterprise data hub, or a community data platform, CKAN delivers the features and flexibility needed to manage data assets at scale.

CKAN powers hundreds of data portals worldwide, including official government data portals like Canada’s open.canada.ca/data, Australia’s data.gov.au, Singapore’s data.gov.sg, and humanitarian data platforms like data.humdata.org. It has been recognized as a Digital Public Good by the United Nations, officially supporting 9 of the 17 Sustainable Development Goals. With support for organizing datasets into custom collections, managing resources with multiple file formats, controlling access with fine-grained permissions, extending functionality through plugins, and integrating with external systems, CKAN provides enterprise-grade capabilities for data governance and management.

Key Features

Dataset Management: Organize, catalog, and manage datasets with rich metadata and custom fields
Resource Management: Handle multiple file formats and data types within datasets, with versioning support
Full-Text Search: Powerful search and filtering capabilities with faceted search for discovering datasets
Data Publishing: Easy-to-use interface for data custodians to publish datasets with metadata and documentation
Data Access Control: Fine-grained permissions system for controlling who can view, edit, or publish datasets
Custom Metadata: Extend metadata schemas with custom fields and validation rules for domain-specific needs
Organizations and Groups: Organize datasets by organization, department, or thematic group
Data API: Complete REST API for programmatic access to catalog metadata and resources
Harvesting: Automatic harvesting of datasets from other CKAN instances and external sources
Data Preview: Built-in previews for common data formats including CSV, JSON, and interactive maps
Data Dictionary: Create and maintain data dictionaries for describing dataset structure and column definitions
Activity Stream: Track dataset changes and activity with complete audit trail of modifications
User Management: Role-based access control with different permission levels for administrators and data managers
Data Retention: Configure data lifecycle and retention policies for archived and historical datasets
Multi-language Support: Translate portal content and metadata into multiple languages
Workflow Management: Define custom workflows for dataset review, approval, and publication
Integration APIs: REST API for integration with external applications and reporting tools
Extensibility: Plugin system for extending functionality with custom features and visualizations
Mobile Responsive: Responsive web design for accessing data portals on all devices
Open Standards: Support for DCAT, RDF, and other open data standards for interoperability

Prerequisites

To deploy CKAN on Klutch.sh, ensure you have the following:

An active Klutch.sh account with access to the dashboard at klutch.sh/app
A GitHub repository for version control
Understanding of data portal concepts and open data standards
PostgreSQL database for CKAN metadata storage (can be deployed separately or on same instance)
Redis instance for caching and job queue management
Solr instance for dataset search (CKAN builds and queries its search index in Solr)
Familiarity with Docker and containerization concepts
Access to manage persistent storage volumes for dataset files and uploads

Important Considerations

Deployment Steps

Create the Dockerfile

Create a Dockerfile in the root directory of your repository:

FROM ckan/ckan-base:2.11

# Install additional system dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    git \
    curl \
    ca-certificates && \
    rm -rf /var/lib/apt/lists/*

# Create CKAN storage directories
RUN mkdir -p /var/lib/ckan/storage && \
    chown -R ckan:ckan /var/lib/ckan

# Install core CKAN
RUN pip install --no-cache-dir \
    ckan==2.11.4

# Copy any custom configuration files
COPY ckan-config.ini /etc/ckan/production.ini
COPY entrypoint.sh /entrypoint.sh

RUN chmod +x /entrypoint.sh

# Expose CKAN web port
EXPOSE 5000

# Set working directory
WORKDIR /var/lib/ckan

# Define volumes for persistent storage
VOLUME ["/var/lib/ckan"]

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
    CMD curl -f http://localhost:5000/ || exit 1

# Use custom entrypoint for initialization
ENTRYPOINT ["/entrypoint.sh"]

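# Default command (bind to all interfaces so the platform can reach port 5000;
# "ckan run" listens on localhost only by default)
CMD ["ckan", "-c", "/etc/ckan/production.ini", "run", "--host", "0.0.0.0"]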

This Dockerfile uses the official CKAN base image and configures the necessary ports, volumes, and dependencies.

Create an Entrypoint Script

Create an entrypoint.sh file for initialization:

#!/bin/bash
set -e

echo "Starting CKAN initialization..."

# Wait for database to be ready.
# pg_isready accepts a full connection URI via -d, so the URL needs no manual parsing.
echo "Waiting for PostgreSQL database..."
until pg_isready -d "$CKAN_SQLALCHEMY_URL" >/dev/null 2>&1; do
  echo "Database not ready, retrying..."
  sleep 5
done

echo "Database is ready!"

# Initialize database if needed
echo "Initializing CKAN database..."
ckan -c /etc/ckan/production.ini db init || true

# Create default admin user if it doesn't exist
if ! ckan -c /etc/ckan/production.ini user list | grep -q admin; then
  echo "Creating default admin user..."
  ckan -c /etc/ckan/production.ini user add \
    admin \
    password="${CKAN_ADMIN_PASSWORD:?Set CKAN_ADMIN_PASSWORD (CKAN requires at least 8 characters)}" \
    email="${CKAN_ADMIN_EMAIL:-admin@example.com}" \
    fullname="Administrator"

  # Grant sysadmin rights
  ckan -c /etc/ckan/production.ini sysadmin add admin
fi

echo "CKAN initialization complete!"

# Execute main command
exec "$@"

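The Dockerfile above already marks the script as executable and sets it as the ENTRYPOINT. The Dockerfile CMD is passed to the script as arguments and started by the final exec "$@" line, so no change to CMD is needed.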
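
The readiness check depends on reading the host and port out of CKAN_SQLALCHEMY_URL correctly. As a minimal sketch, assuming a standard postgresql:// URL, the same parts can be extracted with Python's standard library instead of shell parameter expansion. The function name db_endpoint is hypothetical and not part of CKAN.

```python
from urllib.parse import urlsplit


def db_endpoint(sqlalchemy_url: str) -> dict:
    """Split a postgresql:// URL into the parts a readiness check needs."""
    parts = urlsplit(sqlalchemy_url)
    if not parts.scheme.startswith("postgres"):
        raise ValueError(f"not a PostgreSQL URL: {parts.scheme!r}")
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # fall back to PostgreSQL's default port
        "user": parts.username,
        "database": parts.path.lstrip("/"),
    }
```

Calling db_endpoint with the URL from .env.example yields the host postgres-host, port 5432, user ckan, and database ckan, which can then be passed to a connection test.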

Create CKAN Configuration File

Create a ckan-config.ini file for CKAN settings:

# CKAN Configuration File
[app:main]
use = egg:ckan
debug = false

# Database URL - set via environment variable CKAN_SQLALCHEMY_URL
sqlalchemy.url = postgresql://ckan:password@postgres-host/ckan

# Redis URL - set via environment variable CKAN_REDIS_URL
cache_url = redis://redis-host:6379/1
ckan.redis.url = redis://redis-host:6379/0

# Site title and description
ckan.site_title = Open Data Portal
ckan.site_description = Sharing data openly
ckan.site_url = https://example-app.klutch.sh

# API Token Lifetime
api_token.nbf_leeway = 0
api_token.algorithm = HS256

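# Storage settings (max_resource_size is in MB).
# Keep comments on their own line: trailing "# ..." text becomes part of the value.
ckan.storage_path = /var/lib/ckan/storage
ckan.max_resource_size = 500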

# Plugins configuration
ckan.plugins = text_view image_view recline_view datastore datapusher

# Session settings
beaker.session.key = CKAN
beaker.session.type = ext:redis
beaker.session.url = redis://redis-host:6379/1
beaker.session.sa.url = postgresql://ckan:password@postgres-host/ckan_sessions

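# Email settings - configure for notifications.
# INI files do not expand ${VAR} placeholders. CKAN reads CKAN_SMTP_SERVER,
# CKAN_SMTP_USER and CKAN_SMTP_PASSWORD from the environment, and those
# override the values below when set.
smtp.server = smtp.example.com
smtp.user = notifications@example.com
smtp.password = change-me
email_to = admin@example.com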
error_email_from = ckan@example.com

# Authorization settings
ckan.auth.create_user_via_api = false
ckan.auth.create_unowned_dataset = false

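# Default resource views and organization creation
# (default_views takes a space-separated list of view plugins, not a boolean)
ckan.views.default_views = image_view text_view
ckan.auth.user_create_organizations = true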

# Locale settings
ckan.locale_default = en
ckan.locale_order = en pt_BR

[loggers]
keys = root, ckan, ckanext

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = WARNING
handlers = console

[logger_ckan]
level = INFO
handlers = console
qualname = ckan

[logger_ckanext]
level = DEBUG
handlers = console
qualname = ckanext

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(asctime)s %(levelname)-5.5s [%(name)s] %(message)s
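
Before committing the configuration file, it can help to check it mechanically. The following is a rough sketch using Python's configparser; the function name check_ckan_ini and the list of required keys are illustrative assumptions, not a CKAN tool. It flags missing settings and values that have swallowed a trailing comment, since this parser does not strip inline comments by default.

```python
import configparser

# Keys this sketch treats as mandatory (an assumption for illustration).
REQUIRED_KEYS = ("sqlalchemy.url", "ckan.site_url", "ckan.storage_path")


def check_ckan_ini(text: str) -> list:
    """Return a list of human-readable problems found in [app:main]."""
    # interpolation=None keeps logging formats such as %(asctime)s intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("app:main"):
        return ["missing [app:main] section"]
    main = parser["app:main"]
    problems = []
    for key in REQUIRED_KEYS:
        if not main.get(key, "").strip():
            problems.append(f"{key} is not set")
    for key, value in main.items():
        # A trailing comment ends up inside the value instead of being ignored.
        if " #" in value or " ;" in value:
            problems.append(f"{key} has an inline comment in its value")
    return problems
```

Reading ckan-config.ini from disk and passing its text to check_ckan_ini returns an empty list when none of these problems are present.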

Create Environment Configuration File

Create an .env.example file with all configuration variables:

# CKAN Core Configuration
CKAN_SITE_TITLE=Open Data Portal
CKAN_SITE_DESCRIPTION=Sharing data openly
CKAN_SITE_URL=https://example-app.klutch.sh

# Database Configuration
# PostgreSQL connection for CKAN metadata
CKAN_SQLALCHEMY_URL=postgresql://ckan:secure-password@postgres-host:5432/ckan

# Redis Configuration
# Redis for caching and job queue
CKAN_REDIS_URL=redis://redis-host:6379/0

# Admin User Configuration
CKAN_ADMIN_USERNAME=admin
CKAN_ADMIN_PASSWORD=secure-admin-password
CKAN_ADMIN_EMAIL=admin@example.com

# API Configuration
CKAN_API_TOKEN_EXPIRATION_DAYS=730
CKAN_MAX_RESOURCE_SIZE=500

# Storage Configuration
CKAN_STORAGE_PATH=/var/lib/ckan/storage

# Security Settings
CKAN_SECRET_KEY=your-secret-key-change-this

# Email/SMTP Configuration (optional)
CKAN_SMTP_SERVER=smtp.example.com
CKAN_SMTP_PORT=587
CKAN_SMTP_USER=notifications@example.com
CKAN_SMTP_PASSWORD=smtp-password
CKAN_EMAIL_TO=admin@example.com

# Authentication Settings
CKAN_AUTH_CREATE_USER_VIA_API=false
CKAN_AUTH_CREATE_UNOWNED_DATASET=false

# Plugins Configuration
CKAN_PLUGINS=text_view image_view recline_view datastore datapusher

# Localization
CKAN_LOCALE_DEFAULT=en
CKAN_LOCALE_ORDER=en pt_BR

# Logging Level
CKAN_LOG_LEVEL=info

# Session Settings
CKAN_BEAKER_SESSION_TYPE=redis
CKAN_BEAKER_SESSION_URL=redis-host:6379

# Users can create organizations
CKAN_USER_CREATE_ORGANIZATIONS=true

# User registration
CKAN_USER_REGISTER_ENABLED=true
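
The placeholder secrets in .env.example should be replaced before deployment. As a hedged sketch, the helper below generates a random secret with Python's secrets module and reports variables that are missing or still hold a placeholder. The function names and the choice of required variables are assumptions made for illustration.

```python
import secrets

# Variables this sketch treats as required, and placeholder values taken
# from the example file above.
REQUIRED_VARS = ("CKAN_SITE_URL", "CKAN_SQLALCHEMY_URL",
                 "CKAN_REDIS_URL", "CKAN_ADMIN_PASSWORD")
PLACEHOLDERS = ("your-secret-key-change-this", "secure-password",
                "secure-admin-password")


def new_secret_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random string suitable for CKAN_SECRET_KEY."""
    return secrets.token_urlsafe(num_bytes)


def env_problems(env: dict) -> list:
    """List required variables that are unset and values left as placeholders."""
    problems = [f"{name} is missing"
                for name in REQUIRED_VARS if not env.get(name)]
    problems += [f"{name} still holds a placeholder value"
                 for name, value in env.items() if value in PLACEHOLDERS]
    return problems
```

Passing dict(os.environ) to env_problems inside the container gives a quick pre-flight report, and new_secret_key() produces a value to paste into the Klutch.sh environment settings.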

Create a .gitignore File

Create a .gitignore file to exclude sensitive files:

# Environment variables
.env
.env.local
.env.*.local

# CKAN data and storage
/var/lib/ckan/
data/
uploads/

# IDE and editor files
.vscode/
.idea/
*.swp
*.swo
*~

# OS files
.DS_Store
Thumbs.db

# Python
__pycache__/
*.py[cod]
*$py.class
*.egg-info/
dist/
build/

# Logs
*.log
logs/

# Docker
.dockerignore

# Node (if frontend customization)
node_modules/
npm-debug.log

Push to GitHub

Initialize a Git repository and push your files:

git init
git add Dockerfile entrypoint.sh ckan-config.ini .env.example .gitignore
git commit -m "Initial CKAN deployment setup"
git branch -M main
git remote add origin https://github.com/YOUR_USERNAME/YOUR_REPOSITORY.git
git push -u origin main

Replace YOUR_USERNAME and YOUR_REPOSITORY with your GitHub username and repository name.

Deploy Database Infrastructure

Before deploying CKAN, you need PostgreSQL, Redis, and Solr instances. You can either:

Option A: Deploy on Klutch.sh

1. Log in to klutch.sh/app
2. Create a PostgreSQL app by selecting a PostgreSQL Docker image
3. Configure the database name as ckan and create a user named ckan
4. Similarly, create a Redis app
5. Create a Solr app from an image that ships the CKAN schema (for example ckan/ckan-solr)
6. Note the PostgreSQL, Redis, and Solr connection details

Option B: Use Managed Services

- Use your hosting provider’s managed PostgreSQL and Redis services
- Note the connection URLs for configuration in CKAN

Deploy CKAN on Klutch.sh

1. Log in to your Klutch.sh dashboard at klutch.sh/app
2. Click Create App and select GitHub
3. Connect your GitHub account and select your CKAN repository
4. Configure deployment settings:
  - App Name: ckan-portal (or your preferred name)
  - Branch: Select main
  - Build Command: Leave default
  - Start Command: Leave default or customize via environment variables
5. In environment variables, set:
  - CKAN_SQLALCHEMY_URL: Your PostgreSQL connection string
  - CKAN_REDIS_URL: Your Redis connection URL
  - CKAN_SOLR_URL: Your Solr core URL
  - CKAN_ADMIN_PASSWORD: Secure admin password
  - CKAN_SITE_URL: Your deployment URL
  - Other CKAN configuration variables from .env.example
6. Click Deploy and wait for deployment to complete

Klutch.sh automatically detects the Dockerfile and uses it for building and deploying your CKAN instance.

Configure Persistent Storage

After deployment, attach persistent volumes for data persistence:

1. Go to your deployed CKAN app in the Klutch.sh dashboard
2. Navigate to Storage or Volumes section
3. Click Add Persistent Volume
4. Configure volumes:
  - Mount Path: /var/lib/ckan/storage
  - Size: 100GB (adjust based on expected data)
5. Save and restart the container

This ensures all uploaded datasets, files, and resources persist across container updates and restarts.
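
Once the volume is attached, the mount can be checked from inside the container. This is a small sketch rather than a Klutch.sh or CKAN feature: the function name storage_report and the 1 GB default threshold are assumptions. It reports free space and whether the current user can create files under the storage path.

```python
import os
import shutil
import tempfile


def storage_report(path: str, min_free_gb: float = 1.0) -> dict:
    """Check that a storage directory exists, is writable, and has free space."""
    if not os.path.isdir(path):
        return {"exists": False, "writable": False,
                "free_gb": 0.0, "enough_space": False}
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    try:
        # Creating (and automatically deleting) a temp file proves write access.
        with tempfile.NamedTemporaryFile(dir=path):
            writable = True
    except OSError:
        writable = False
    return {"exists": True, "writable": writable,
            "free_gb": round(free_gb, 2), "enough_space": free_gb >= min_free_gb}
```

Running storage_report("/var/lib/ckan/storage") as the CKAN user shows at a glance whether uploads are likely to succeed.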

Configure Network Traffic

Set up HTTP traffic for your CKAN portal:

1. In your app settings, go to Network section
2. Set Traffic Type to HTTP
3. Internal port should be 5000 (CKAN’s default)
4. Your CKAN portal will be accessible at https://example-app.klutch.sh
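
After traffic is routed, a quick smoke test is to call CKAN's status_show API action. The sketch below uses only the standard library; the function name site_status and the injectable fetch parameter (added so the function can be exercised without a network) are illustrative assumptions.

```python
import json
from urllib.request import urlopen


def site_status(base_url: str, fetch=None, timeout: float = 10.0) -> dict:
    """Call status_show on a CKAN site and return its 'result' object."""
    url = base_url.rstrip("/") + "/api/3/action/status_show"
    if fetch is None:
        # Default fetcher performs a real HTTP GET.
        def fetch(target):
            with urlopen(target, timeout=timeout) as response:
                return response.read().decode("utf-8")
    body = json.loads(fetch(url))
    if not body.get("success"):
        raise RuntimeError(f"status_show reported failure: {body.get('error')}")
    return body["result"]
```

For example, site_status("https://example-app.klutch.sh")["ckan_version"] should report the version installed by the Dockerfile once the portal is up.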

Initial Setup and Configuration

After your CKAN deployment is running, follow these steps to set up your data portal:

Access the Web Interface

Open your browser and navigate to https://example-app.klutch.sh. You’ll see the CKAN home page with the dataset catalog.

Log in as Administrator

1. Click the Log In button in the top navigation
2. Use the admin credentials you set in environment variables
3. You’ll be logged in as sysadmin with full administrative privileges

Create Your First Organization

1. Go to Organizations in the top navigation
2. Click Create Organization
3. Fill in organization details:
  - Name: Organization identifier (e.g., health-dept)
  - Title: Display name (e.g., Health Department)
  - Description: Brief description of the organization
  - Image: Optional organization logo
4. Click Create Organization

Create a Dataset

1. Click Datasets → Add Dataset
2. Fill in dataset metadata:
  - Title: Dataset name
  - Description: Detailed description
  - Tags: Keywords for searchability
  - Organization: Select the organization that owns the data
  - Visibility: Public or Private
3. Click Next to add resources
4. Upload or link data files:
  - File: Upload CSV, JSON, Excel, or other formats
  - Format: Specify the data format
  - Description: Describe the resource content
5. Click Finish to publish the dataset
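
Each dataset also gets a URL name derived from its title. CKAN's API documentation requires dataset names to be 2 to 100 characters long and to contain only lowercase alphanumeric characters, hyphens, and underscores. The helper below is a sketch of how a title could be turned into such a name; slugify_title and is_valid_name are hypothetical helpers, not CKAN functions.

```python
import re

# Dataset name rule: 2-100 chars, lowercase alphanumerics, '-' and '_'.
NAME_RE = re.compile(r"[a-z0-9_-]{2,100}")


def slugify_title(title: str) -> str:
    """Turn a human-readable title into a candidate dataset name."""
    slug = re.sub(r"[^a-z0-9_-]+", "-", title.lower()).strip("-")
    # Trim to the length limit, then drop any hyphen left at the cut.
    return slug[:100].strip("-")


def is_valid_name(name: str) -> bool:
    """Return True if the whole string satisfies the naming rule."""
    return NAME_RE.fullmatch(name) is not None
```

With these helpers, the title "Population Statistics 2024" becomes population-statistics-2024, which passes is_valid_name.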

Configure Data Preview

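1. Open ckan-config.ini and check the ckan.plugins setting; CKAN enables plugins from this setting, not from the web interface
2. Ensure preview plugins are listed:
  - Text View (for CSV, JSON)
  - Image View
  - Recline View (for tabular data)
3. Redeploy after any change; this enables interactive data previews in the portal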

Set Up User Accounts

1. Go to Admin Panel → Users
2. Click Create User
3. Fill in user details:
  - Name: Username for login
  - Email: User email address
  - Password: Initial password
  - Sysadmin: Check to grant admin access
4. Click Create User

Configure Portal Settings

1. Go to Admin Panel → Config
2. Customize:
  - Site Title: Your portal name
  - Site Description: Portal tagline
  - Logo: Custom logo image
  - Custom CSS: Additional styling
3. Click Save to apply changes

Environment Variables

Basic Configuration

CKAN_SITE_TITLE=Open Data Portal
CKAN_SITE_URL=https://example-app.klutch.sh
CKAN_SQLALCHEMY_URL=postgresql://ckan:password@postgres:5432/ckan
CKAN_REDIS_URL=redis://redis:6379/0
CKAN_ADMIN_PASSWORD=secure-password
CKAN_SECRET_KEY=your-secret-key

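These variables control the basic CKAN configuration including site settings, database connections, and administrative credentials. CKAN itself reads CKAN_SITE_URL, CKAN_SQLALCHEMY_URL, and CKAN_REDIS_URL directly from the environment, and entrypoint.sh consumes CKAN_ADMIN_PASSWORD. Variables such as CKAN_SITE_TITLE and CKAN_SECRET_KEY take effect only if something maps them to the matching configuration option, for example the ckanext-envvars extension, which expects its own naming scheme (such as CKAN__SITE_TITLE).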
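
Database passwords that contain characters such as @, :, or / must be percent-encoded inside a connection URL, or the URL will be parsed incorrectly. The following is a minimal sketch of assembling CKAN_SQLALCHEMY_URL safely; build_sqlalchemy_url is a hypothetical helper name.

```python
from urllib.parse import quote


def build_sqlalchemy_url(user: str, password: str, host: str,
                         database: str, port: int = 5432) -> str:
    """Assemble a postgresql:// URL, percent-encoding the credentials."""
    # safe="" forces every reserved character to be encoded.
    user_part = quote(user, safe="")
    password_part = quote(password, safe="")
    return f"postgresql://{user_part}:{password_part}@{host}:{port}/{database}"
```

For instance, a password of p@ss:w/rd is written into the URL as p%40ss%3Aw%2Frd, so the host portion after the final @ stays unambiguous.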

Production Configuration

For production deployments, set these environment variables in Klutch.sh for advanced customization:

# Advanced Configuration for Production Deployments

# Database Connection
CKAN_SQLALCHEMY_URL=postgresql://ckan:secure-password@postgres.example.com:5432/ckan
CKAN_SQLALCHEMY_POOL_SIZE=10
CKAN_SQLALCHEMY_POOL_RECYCLE=3600
CKAN_SQLALCHEMY_POOL_PRE_PING=true

# Redis Configuration
CKAN_REDIS_URL=redis://redis.example.com:6379/0
CKAN_REDIS_SESSIONS_URL=redis://redis.example.com:6379/1

# Site Configuration
CKAN_SITE_TITLE=National Open Data Portal
CKAN_SITE_DESCRIPTION=Share and access government datasets
CKAN_SITE_URL=https://data.example.gov
CKAN_SITE_LOGO=/images/custom-logo.png

# User Authentication
CKAN_ADMIN_USERNAME=admin
CKAN_ADMIN_PASSWORD=very-secure-password
CKAN_AUTH_CREATE_USER_VIA_API=false
CKAN_AUTH_ROLES_THAT_CASCADE_TO_SUB_GROUPS=admin editor

# Security Settings
CKAN_SECRET_KEY=your-very-long-secret-key-minimum-32-chars
CKAN_BEAKER_SESSION_SECRET=session-secret-key
CKAN_BEAKER_SESSION_TYPE=redis
CKAN_BEAKER_SESSION_URL=redis.example.com:6379

# Storage Configuration (size is in MB; 1000 = 1 GB max upload)
CKAN_STORAGE_PATH=/var/lib/ckan/storage
CKAN_MAX_RESOURCE_SIZE=1000

# Email/SMTP Configuration
CKAN_SMTP_SERVER=smtp.example.com
CKAN_SMTP_PORT=587
CKAN_SMTP_USER=noreply@example.com
CKAN_SMTP_PASSWORD=smtp-password
CKAN_SMTP_TLS=true
CKAN_ERROR_EMAIL_FROM=ckan-errors@example.com
CKAN_EMAIL_TO=admin@example.com

# Plugins (install as needed)
CKAN_PLUGINS=text_view image_view recline_view datastore datapusher harvest ckan_harvester

# Organization Settings
CKAN_USER_CREATE_ORGANIZATIONS=true
CKAN_USER_REGISTER_ENABLED=true
CKAN_USER_RESET_PASSWORD_ENABLED=true

# Search Configuration (CKAN serves dataset search from Solr)
CKAN_SOLR_URL=http://solr.example.com:8983/solr/ckan

# API Configuration
CKAN_API_TOKEN_EXPIRATION_DAYS=730
CKAN_API_TOKEN_NBFLEEWAY=0

# Localization
CKAN_LOCALE_DEFAULT=en
CKAN_LOCALE_ORDER=en fr es
CKAN_LOCALES_FILTERED_OUT=

# Logging
CKAN_LOG_LEVEL=info
CKAN_PROPAGATE_ERRORS=true

# Views and Previews
CKAN_VIEWS_DEFAULT_VIEWS=true

# Activity Tracking
CKAN_ACTIVITY_STREAM_ENABLED=true
CKAN_ACTIVITY_STREAM_SQL_ENABLED=true

To apply these variables in Klutch.sh:

1. Go to your app settings
2. Navigate to Environment Variables
3. Add each variable with appropriate values for your deployment
4. Click Save and redeploy if necessary

Code Examples

Bash: Automated Data Harvest Script

This script automatically harvests and imports datasets from remote CKAN instances:

#!/bin/bash

# CKAN Automated Data Harvest Script
# Harvests datasets from remote CKAN sources

set -e

CKAN_API_URL="https://example-app.klutch.sh/api/3"
CKAN_API_KEY="your-api-key"
HARVEST_LOG="/var/log/ckan-harvest.log"

echo "$(date): Starting CKAN harvest process..." >> "$HARVEST_LOG"

# Function to harvest from a remote CKAN instance
harvest_from_remote() {
  local remote_url=$1
  local source_name=$2

  echo "Harvesting from: $remote_url" >> "$HARVEST_LOG"

  # Create harvest source (the harvest_* actions require the ckanext-harvest extension)
  curl -X POST "$CKAN_API_URL/action/harvest_source_create" \
    -H "X-CKAN-API-Key: $CKAN_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{
      \"title\": \"$source_name\",
      \"name\": \"$(echo "$source_name" | tr ' ' '-' | tr '[:upper:]' '[:lower:]')\",
      \"url\": \"$remote_url\",
      \"source_type\": \"ckan\"
    }" >> "$HARVEST_LOG" 2>&1
}

# Function to run harvest job
run_harvest() {
  echo "Running harvest jobs..." >> "$HARVEST_LOG"

  curl -X POST "$CKAN_API_URL/action/harvest_jobs_run" \
    -H "X-CKAN-API-Key: $CKAN_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{}" >> "$HARVEST_LOG" 2>&1
}

# Harvest from configured sources (pass each remote site's base URL, not its /api/3 path)
harvest_from_remote "https://remote-ckan.example.com" "Remote Open Data"
harvest_from_remote "https://other-data-portal.org" "Other Data Portal"

# Run the harvest
run_harvest

echo "$(date): Harvest process completed." >> "$HARVEST_LOG"

Python: CKAN API Client for Dataset Management

This example shows how to interact with CKAN’s API to create and manage datasets programmatically:

#!/usr/bin/env python3

import requests
import json
from typing import Dict, Optional

class CKANClient:
    """Client for interacting with CKAN API"""

    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url.rstrip('/')
        self.api_key = api_key
        self.headers = {
            'X-CKAN-API-Key': self.api_key,
            'Content-Type': 'application/json'
        }

    def _make_request(self, endpoint: str, method: str = 'GET', data: Optional[Dict] = None) -> Dict:
        """Make API request to CKAN"""
        url = f"{self.base_url}/api/3/action/{endpoint}"

        if method == 'GET':
            response = requests.get(url, headers=self.headers)
        elif method == 'POST':
            response = requests.post(url, headers=self.headers, json=data)
        else:
            raise ValueError(f"Unsupported HTTP method: {method}")

        response.raise_for_status()
        return response.json()

    def create_dataset(self, name: str, title: str, owner_org: str, **kwargs) -> Dict:
        """Create a new dataset"""
        payload = {
            'name': name,
            'title': title,
            'owner_org': owner_org,
            **kwargs
        }
        result = self._make_request('package_create', method='POST', data=payload)
        return result['result']

    def create_resource(self, package_id: str, url: str, name: str, **kwargs) -> Dict:
        """Add a resource to a dataset"""
        payload = {
            'package_id': package_id,
            'url': url,
            'name': name,
            **kwargs
        }
        result = self._make_request('resource_create', method='POST', data=payload)
        return result['result']

    def search_datasets(self, query: str, rows: int = 10) -> Dict:
        """Search for datasets"""
        data = {
            'q': query,
            'rows': rows
        }
        result = self._make_request('package_search', method='POST', data=data)
        return result['result']

    def get_dataset(self, dataset_id: str) -> Dict:
        """Get dataset details"""
        result = self._make_request(f'package_show?id={dataset_id}')
        return result['result']

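    def list_organizations(self) -> list:
        """List all organizations with full details"""
        # all_fields=True returns dicts (including 'title'); the default is a list of names
        result = self._make_request('organization_list', method='POST', data={'all_fields': True})
        return result['result']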

    def get_user_info(self) -> Dict:
        """Get current user information"""
        result = self._make_request('user_show?id=me')
        return result['result']

# Usage example
if __name__ == '__main__':
    # Initialize client
    client = CKANClient(
        base_url='https://example-app.klutch.sh',
        api_key='your-api-key'
    )

    # Create a new dataset
    dataset = client.create_dataset(
        name='population-statistics',
        title='Population Statistics 2024',
        owner_org='statistics-department',
        notes='Annual population data by region',
        tags=[{'name': 'population'}, {'name': 'statistics'}]
    )
    print(f"Created dataset: {dataset['id']}")

    # Add a resource
    resource = client.create_resource(
        package_id=dataset['id'],
        url='https://example.com/population-data.csv',
        name='Population Data 2024',
        format='CSV',
        description='CSV file containing population figures'
    )
    print(f"Added resource: {resource['id']}")

    # Search datasets
    results = client.search_datasets('population', rows=5)
    print(f"Found {results['count']} datasets matching 'population'")

    # List organizations
    orgs = client.list_organizations()
    print(f"Available organizations: {[org['title'] for org in orgs]}")
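
The package_search action returns one page of results at a time, along with a total count, and accepts start and rows parameters for paging. The generator below is a sketch of walking through every page. The search argument is an assumed callable that takes (query, rows, start) and returns the action's result dictionary; it is not a method of the client above.

```python
from typing import Callable, Dict, Iterator


def iter_datasets(search: Callable[[str, int, int], Dict],
                  query: str, page_size: int = 100) -> Iterator[Dict]:
    """Yield every dataset matching a query, fetching one page at a time."""
    start = 0
    while True:
        page = search(query, page_size, start)
        results = page["results"]
        if not results:
            return
        yield from results
        start += len(results)
        # Stop once the offset has reached the total reported by the server.
        if start >= page["count"]:
            return
```

A suitable search callable could POST {'q': query, 'rows': rows, 'start': start} to package_search and return the 'result' field of the response.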

cURL: REST API Examples

Execute these cURL commands to interact with your CKAN instance:

# Set variables
CKAN_URL="https://example-app.klutch.sh"
API_KEY="your-api-key"

# Get site information
curl -X GET "$CKAN_URL/api/3/action/status_show" \
  -H "X-CKAN-API-Key: $API_KEY"

# Search datasets
curl -X POST "$CKAN_URL/api/3/action/package_search" \
  -H "X-CKAN-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "q": "population",
    "rows": 10
  }'

# Create a new dataset
curl -X POST "$CKAN_URL/api/3/action/package_create" \
  -H "X-CKAN-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "environmental-data",
    "title": "Environmental Monitoring Data",
    "owner_org": "environmental-agency",
    "notes": "Air quality and pollution monitoring",
    "tags": [
      {"name": "environment"},
      {"name": "air-quality"}
    ]
  }'

# Get organization information
curl -X GET "$CKAN_URL/api/3/action/organization_show?id=environmental-agency" \
  -H "X-CKAN-API-Key: $API_KEY"

# List all datasets
curl -X POST "$CKAN_URL/api/3/action/package_list" \
  -H "X-CKAN-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "limit": 20,
    "offset": 0
  }'

# Create a resource (attach file to dataset)
curl -X POST "$CKAN_URL/api/3/action/resource_create" \
  -H "X-CKAN-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "package_id": "environmental-data",
    "url": "https://example.com/air-quality.csv",
    "name": "Air Quality Data",
    "format": "CSV"
  }'

# Get user information
curl -X GET "$CKAN_URL/api/3/action/user_show?id=admin" \
  -H "X-CKAN-API-Key: $API_KEY"

# Modify dataset
curl -X POST "$CKAN_URL/api/3/action/package_patch" \
  -H "X-CKAN-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "environmental-data",
    "notes": "Updated: Air quality and pollution monitoring data with real-time updates"
  }'

# Get activity stream
curl -X GET "$CKAN_URL/api/3/action/dashboard_activity_list" \
  -H "X-CKAN-API-Key: $API_KEY"
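
Every action response is a JSON envelope with a success flag, plus either a result or an error object. As a sketch of handling that envelope in a script, the helper below returns the result or raises with the error details. The names unwrap and CKANAPIError are assumptions introduced for this example.

```python
import json


class CKANAPIError(Exception):
    """Raised when a CKAN action response has success set to false."""


def unwrap(body: str):
    """Parse an action API response and return its 'result' payload."""
    payload = json.loads(body)
    if payload.get("success") is True:
        return payload["result"]
    error = payload.get("error", {})
    kind = error.get("__type", "Unknown error")
    # Validation errors carry per-field messages instead of 'message'.
    detail = error.get("message", error)
    raise CKANAPIError(f"{kind}: {detail}")
```

Piping the output of any of the cURL commands above into a script that calls unwrap makes failures such as a missing dataset stop the script instead of passing silently.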

Best Practices

Data Organization

Use Organizations: Group related datasets by organization or department for clearer structure and access control
Consistent Naming: Use descriptive, consistent naming conventions for datasets and resources
Complete Metadata: Fill in comprehensive metadata including descriptions, keywords, and license information
Resource Versioning: Upload new versions of resources with clear versioning to track data evolution
Data Quality: Validate data format and quality before publishing to ensure portal credibility

Access Control and Security

User Permissions: Assign appropriate roles (Admin, Editor, Member) based on responsibilities
Organization Ownership: Ensure datasets are assigned to appropriate organizations with clear ownership
Public vs Private: Mark sensitive data as private and restrict access to authorized users only
API Key Management: Create separate API keys for different applications and rotate them regularly
HTTPS/SSL: Ensure your CKAN portal uses HTTPS for secure communication

Portal Management

Regular Updates: Keep CKAN and plugins updated to the latest versions for security and feature improvements
Backup Strategy: Regularly back up PostgreSQL database and file storage to protect against data loss
Monitoring: Monitor portal performance, disk space, and database size for production health
User Support: Provide documentation and guidance for users on how to find, understand, and use datasets
Data Governance: Establish clear policies for data publication, retention, and archival

Performance Optimization

Caching: Use Redis caching to improve search and API response times
Database Tuning: Configure PostgreSQL connection pooling and indexes for optimal performance
Search Optimization: CKAN serves dataset search from Solr, so size and tune the Solr instance as the catalog grows
CDN: Consider using a CDN for static assets and file downloads
Resource Limits: Configure appropriate upload size limits and query timeouts

Integration and Automation

API Integration: Use CKAN API to integrate with external systems and workflows
Data Harvesting: Set up automatic harvesting from other data portals to aggregate data
Webhooks: Implement event-driven workflows for dataset updates and notifications
Automated Backups: Configure scheduled backups to cloud storage or dedicated backup servers
Monitoring Alerts: Set up alerts for deployment issues, database problems, and storage warnings

Troubleshooting

Issue: Database Connection Fails on Startup

Solution: Verify that PostgreSQL is running and accessible from the CKAN container. Check CKAN_SQLALCHEMY_URL environment variable for correct connection string. Ensure database user has proper permissions on the CKAN database.

Issue: Redis Connection Errors

Solution: Confirm Redis is running and reachable from CKAN. Check CKAN_REDIS_URL environment variable. Verify firewall rules allow connections from CKAN container to Redis instance.

Issue: Persistent Storage Data Lost

Solution: Ensure persistent volume is properly mounted at /var/lib/ckan/storage before deployment. Verify volume attachment in Klutch.sh settings. Check that volume has sufficient free space.

Issue: Slow Dataset Search

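Solution: CKAN serves dataset search from Solr, so check that the Solr instance referenced by CKAN_SOLR_URL is healthy and has enough memory for the size of your catalog. If results are stale or incomplete, rebuild the index with ckan -c /etc/ckan/production.ini search-index rebuild. Also monitor PostgreSQL query performance, since dataset pages and API calls still read from the database.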

Issue: File Upload Failures

Solution: Check that persistent storage has write permissions for the CKAN user. Verify CKAN_MAX_RESOURCE_SIZE is set high enough for your uploads. Check that /var/lib/ckan/storage directory exists and is writable.

Updating CKAN

To update CKAN to a newer version:

1. Go to your Klutch.sh app dashboard
2. Navigate to Deployments section
3. Select the latest deployment and note the current version
4. Update the Dockerfile to use a newer CKAN image tag, and change the pinned ckan== version to match
5. Commit and push changes to GitHub
6. Klutch.sh automatically redeploys with the new version
7. On startup, entrypoint.sh runs ckan db init; if the release notes for the new version call for it, also run ckan -c /etc/ckan/production.ini db upgrade
8. Verify deployment is healthy and data is intact

Always back up your database and file storage before updating.
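
Because the Dockerfile names the CKAN version twice (the base image tag and the pinned pip package), the two can drift apart during an update. The check below is a rough sketch, assuming the ckan/ckan-base image name and a ckan== pin as used in this guide; the function names are hypothetical.

```python
import re


def ckan_versions(dockerfile_text: str) -> tuple:
    """Return (base image tag, pinned pip version) found in a Dockerfile."""
    image = re.search(r"^FROM\s+ckan/ckan-base:(\S+)", dockerfile_text, re.M)
    pinned = re.search(r"\bckan==(\d+(?:\.\d+)*)", dockerfile_text)
    return (image.group(1) if image else None,
            pinned.group(1) if pinned else None)


def major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings such as '2.11' or '2.11.4'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def versions_agree(image_tag: str, pinned: str) -> bool:
    """True when the image tag and the pip pin share major.minor."""
    return major_minor(image_tag) == major_minor(pinned)
```

Running ckan_versions on the Dockerfile from this guide gives ("2.11", "2.11.4"), which versions_agree accepts; a mismatch such as 2.12 against 2.11.4 is rejected.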

Use Cases

Government Open Data Portals

Deploy CKAN to create official government data portals for publishing datasets from multiple agencies. Enable citizens and researchers to discover and access government data with full-text search, dataset previews, and download capabilities.

Enterprise Data Hub

Build an internal data portal for organizations to catalog and share datasets across departments. Control data access with permissions, track data lineage with activity streams, and integrate with business intelligence tools.

Humanitarian Data Platform

Create a humanitarian data platform to share datasets related to disasters, health, food security, and development. Enable organizations to quickly publish and discover critical data during emergencies.

Scientific Data Repository

Deploy CKAN for research institutions to manage and share scientific datasets. Support versioning, metadata standards, and integration with research workflows and collaboration tools.

Thematic Data Hub

Build specialized data portals focused on specific domains like climate data, biodiversity, public health, or urban planning. Customize metadata schemas and visualizations for domain-specific needs.

Additional Resources

Official CKAN Documentation: https://docs.ckan.org
CKAN GitHub Repository: https://github.com/ckan/ckan
CKAN Community Chat: https://gitter.im/ckan/chat
CKAN Extension Development: https://docs.ckan.org/en/latest/extensions/index.html
API Documentation: https://docs.ckan.org/en/latest/api/index.html
CKAN Showcase Sites: https://ckan.org/showcase
CKAN Blog: https://ckan.org/blog
Stack Overflow CKAN Tag: https://stackoverflow.com/questions/tagged/ckan

Conclusion

CKAN empowers organizations to build comprehensive open data portals that make data accessible, discoverable, and usable for citizens, researchers, and businesses. By deploying CKAN on Klutch.sh, you gain a scalable and maintainable platform for data governance and sharing backed by a proven, production-tested system used by governments and enterprises worldwide.

The combination of CKAN’s powerful data management capabilities with Klutch.sh’s simplified deployment model makes it straightforward to establish an enterprise-grade data portal. Whether you’re launching a government open data initiative, creating an internal enterprise data hub, or building a humanitarian data platform, CKAN provides the tools, API capabilities, and extensibility needed for effective data management and discovery.

Start by deploying CKAN with PostgreSQL and Redis infrastructure, configure persistent storage for reliable data management, and gradually expand your portal with additional datasets and customizations. Leverage CKAN’s comprehensive API for automation, plugin system for extending functionality, and active community for support and best practices.