Deploying a CKAN Data Portal
Built with Python, CKAN provides a comprehensive platform for publishing, organizing, sharing, and accessing datasets with a powerful web interface, full-featured API, advanced search capabilities, and rich visualization tools. Whether you’re building a government open data portal, an enterprise data hub, or a community data platform, CKAN delivers the features and flexibility needed to manage data assets at scale.
CKAN powers hundreds of data portals worldwide, including official government data portals like Canada’s open.canada.ca/data, Australia’s data.gov.au, Singapore’s data.gov.sg, and humanitarian data platforms like data.humdata.org. It has been recognized as a Digital Public Good by the United Nations, officially supporting 9 of the 17 Sustainable Development Goals. With support for organizing datasets into custom collections, managing resources with multiple file formats, controlling access with fine-grained permissions, extending functionality through plugins, and integrating with external systems, CKAN provides enterprise-grade capabilities for data governance and management.
Key Features
- Dataset Management: Organize, catalog, and manage datasets with rich metadata and custom fields
- Resource Management: Handle multiple file formats and data types within datasets, with versioning support
- Full-Text Search: Powerful search and filtering capabilities with faceted search for discovering datasets
- Data Publishing: Easy-to-use interface for data custodians to publish datasets with metadata and documentation
- Data Access Control: Fine-grained permissions system for controlling who can view, edit, or publish datasets
- Custom Metadata: Extend metadata schemas with custom fields and validation rules for domain-specific needs
- Organizations and Groups: Organize datasets by organization, department, or thematic group
- Data API: Complete REST API for programmatic access to catalog metadata and resources
- Harvesting: Automatic harvesting of datasets from other CKAN instances and external sources
- Data Preview: Built-in previews for common data formats including CSV, JSON, and interactive maps
- Data Dictionary: Create and maintain data dictionaries for describing dataset structure and column definitions
- Activity Stream: Track dataset changes and activity with complete audit trail of modifications
- User Management: Role-based access control with different permission levels for administrators and data managers
- Data Retention: Configure data lifecycle and retention policies for archived and historical datasets
- Multi-language Support: Translate portal content and metadata into multiple languages
- Workflow Management: Define custom workflows for dataset review, approval, and publication
- Integration APIs: REST API for integration with external applications and reporting tools
- Extensibility: Plugin system for extending functionality with custom features and visualizations
- Mobile Responsive: Responsive web design for accessing data portals on all devices
- Open Standards: Support for DCAT, RDF, and other open data standards for interoperability
Prerequisites
To deploy CKAN on Klutch.sh, ensure you have the following:
- An active Klutch.sh account with access to the dashboard at klutch.sh/app
- A GitHub repository for version control
- Understanding of data portal concepts and open data standards
- PostgreSQL database for CKAN metadata storage (can be deployed separately or on same instance)
- Redis instance for caching and job queue management
- Familiarity with Docker and containerization concepts
- Access to manage persistent storage volumes for dataset files and uploads
Deployment Steps
Create the Dockerfile
Create a Dockerfile in the root directory of your repository:

    FROM ckan/ckan-base:2.11

    # Install additional system dependencies
    RUN apt-get update && \
        apt-get install -y --no-install-recommends \
            git \
            curl \
            ca-certificates && \
        rm -rf /var/lib/apt/lists/*

    # Create CKAN storage directories
    RUN mkdir -p /var/lib/ckan/storage && \
        chown -R ckan:ckan /var/lib/ckan

    # Install core CKAN
    RUN pip install --no-cache-dir \
        ckan==2.11.4

    # Copy any custom configuration files
    COPY ckan-config.ini /etc/ckan/production.ini
    COPY entrypoint.sh /entrypoint.sh
    RUN chmod +x /entrypoint.sh

    # Expose CKAN web port
    EXPOSE 5000

    # Set working directory
    WORKDIR /var/lib/ckan

    # Define volumes for persistent storage
    VOLUME ["/var/lib/ckan"]

    # Health check
    HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
        CMD curl -f http://localhost:5000/ || exit 1

    # Use custom entrypoint for initialization
    ENTRYPOINT ["/entrypoint.sh"]

    # Default command. Bind to all interfaces: "ckan run" listens on
    # localhost by default, which is unreachable from outside the container.
    CMD ["ckan", "-c", "/etc/ckan/production.ini", "run", "--host", "0.0.0.0"]

This Dockerfile uses the official CKAN base image and configures the necessary ports, volumes, and dependencies.
Create an Entrypoint Script
Create an entrypoint.sh file for initialization:

    #!/bin/bash
    set -e

    echo "Starting CKAN initialization..."

    # Wait for database to be ready.
    # pg_isready accepts a full connection URI through -d, so the URL does
    # not need to be split into host and user by hand.
    echo "Waiting for PostgreSQL database..."
    until pg_isready -d "${CKAN_SQLALCHEMY_URL}" >/dev/null 2>&1; do
        echo "Database not ready, retrying..."
        sleep 5
    done
    echo "Database is ready!"

    # Initialize database if needed
    echo "Initializing CKAN database..."
    ckan -c /etc/ckan/production.ini db init || true

    # Create default admin user if it doesn't exist
    if ! ckan -c /etc/ckan/production.ini user list | grep -q admin; then
        echo "Creating default admin user..."
        ckan -c /etc/ckan/production.ini user add \
            admin \
            password="${CKAN_ADMIN_PASSWORD:-admin}" \
            email="${CKAN_ADMIN_EMAIL:-admin@example.com}" \
            fullname="Administrator"

        # Grant sysadmin rights
        ckan -c /etc/ckan/production.ini sysadmin add admin
    fi

    echo "CKAN initialization complete!"

    # Execute main command
    exec "$@"

The Dockerfile already marks this script as executable and sets it as the ENTRYPOINT, so the CMD needs no further change. pg_isready ships with the PostgreSQL client tools; install them in the image if they are missing.
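If the image lacks the PostgreSQL client tools, the wait-for-database step can be approximated in Python with only the standard library. This is a minimal sketch: the helper names `parse_db_endpoint` and `wait_for_port` are hypothetical, and a successful TCP connection only shows that the port is open, not that PostgreSQL is ready to accept queries.

```python
import socket
import time
from urllib.parse import urlsplit


def parse_db_endpoint(url: str, default_port: int = 5432) -> tuple[str, int]:
    """Extract (host, port) from a SQLAlchemy-style PostgreSQL URL."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no host found in database URL: {url!r}")
    return parts.hostname, parts.port or default_port


def wait_for_port(host: str, port: int, attempts: int = 30, delay: float = 5.0) -> bool:
    """Return True once a TCP connection succeeds, False after all attempts fail."""
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=3):
                return True
        except OSError:
            # Connection refused, timeout, or DNS failure: wait and retry.
            time.sleep(delay)
    return False
```

A startup script could call `wait_for_port(*parse_db_endpoint(os.environ["CKAN_SQLALCHEMY_URL"]))` and exit with an error if it returns False.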
Create CKAN Configuration File
Create a ckan-config.ini file for CKAN settings:

    # CKAN Configuration File
    [app:main]
    use = egg:ckan
    debug = false

    # Database URL - set via environment variable CKAN_SQLALCHEMY_URL
    sqlalchemy.url = postgresql://ckan:password@postgres-host/ckan

    # Redis URL - set via environment variable CKAN_REDIS_URL
    cache_url = redis://redis-host:6379/1
    ckan.redis.url = redis://redis-host:6379/0

    # Site title and description
    ckan.site_title = Open Data Portal
    ckan.site_description = Sharing data openly
    ckan.site_url = https://example-app.klutch.sh

    # API Token Lifetime
    api_token.nbf_leeway = 0
    api_token.algorithm = HS256

    # Storage settings
    ckan.storage_path = /var/lib/ckan/storage
    # Maximum resource size in MB (kept on its own line because the INI
    # parser does not strip trailing "#" comments from values)
    ckan.max_resource_size = 500

    # Plugins configuration
    ckan.plugins = text_view image_view recline_view datastore datapusher

    # Session settings
    beaker.session.key = CKAN
    beaker.session.type = ext:memcached
    beaker.session.url = redis-host:6379
    beaker.session.sa.url = postgresql://ckan:password@postgres-host/ckan_sessions

    # Email settings - configure for notifications.
    # The ${...} values are placeholders: the INI file does not expand shell
    # variables, so replace them or set the matching environment variables.
    smtp.server = ${CKAN_SMTP_SERVER}
    smtp.user = ${CKAN_SMTP_USER}
    smtp.password = ${CKAN_SMTP_PASSWORD}
    email_to = ${CKAN_EMAIL_TO}
    error_email_from = ckan@example.com

    # Authorization settings
    ckan.auth.create_user_via_api = false
    ckan.auth.create_unowned_dataset = false

    # Views and organization creation
    ckan.views.default_views = True
    ckan.user.create_organizations = True

    # Locale settings
    ckan.locale_default = en
    ckan.locale_order = en pt_BR

    [loggers]
    keys = root, ckan, ckanext

    [handlers]
    keys = console

    [formatters]
    keys = generic

    [logger_root]
    level = WARNING
    handlers = console

    [logger_ckan]
    level = INFO
    handlers = console
    qualname = ckan

    [logger_ckanext]
    level = DEBUG
    handlers = console
    qualname = ckanext

    [handler_console]
    class = StreamHandler
    args = (sys.stderr,)
    level = NOTSET
    formatter = generic

    [formatter_generic]
    format = %(asctime)s %(levelname)-5.5s [%(name)s] %(message)s

Create Environment Configuration File
Create an .env.example file with all configuration variables:

    # CKAN Core Configuration
    CKAN_SITE_TITLE=Open Data Portal
    CKAN_SITE_DESCRIPTION=Sharing data openly
    CKAN_SITE_URL=https://example-app.klutch.sh

    # Database Configuration
    # PostgreSQL connection for CKAN metadata
    CKAN_SQLALCHEMY_URL=postgresql://ckan:secure-password@postgres-host:5432/ckan

    # Redis Configuration
    # Redis for caching and job queue
    CKAN_REDIS_URL=redis://redis-host:6379/0

    # Admin User Configuration
    CKAN_ADMIN_USERNAME=admin
    CKAN_ADMIN_PASSWORD=secure-admin-password
    CKAN_ADMIN_EMAIL=admin@example.com

    # API Configuration
    CKAN_API_TOKEN_EXPIRATION_DAYS=730
    CKAN_MAX_RESOURCE_SIZE=500

    # Storage Configuration
    CKAN_STORAGE_PATH=/var/lib/ckan/storage

    # Security Settings
    CKAN_SECRET_KEY=your-secret-key-change-this

    # Email/SMTP Configuration (optional)
    CKAN_SMTP_SERVER=smtp.example.com
    CKAN_SMTP_PORT=587
    CKAN_SMTP_USER=notifications@example.com
    CKAN_SMTP_PASSWORD=smtp-password
    CKAN_EMAIL_TO=admin@example.com

    # Authentication Settings
    CKAN_AUTH_CREATE_USER_VIA_API=false
    CKAN_AUTH_CREATE_UNOWNED_DATASET=false

    # Plugins Configuration
    CKAN_PLUGINS=text_view image_view recline_view datastore datapusher

    # Localization
    CKAN_LOCALE_DEFAULT=en
    CKAN_LOCALE_ORDER=en pt_BR

    # Logging Level
    CKAN_LOG_LEVEL=info

    # Session Settings
    CKAN_BEAKER_SESSION_TYPE=redis
    CKAN_BEAKER_SESSION_URL=redis-host:6379

    # Users can create organizations
    CKAN_USER_CREATE_ORGANIZATIONS=true

    # User registration
    CKAN_USER_REGISTER_ENABLED=true

Create a .gitignore File
Create a .gitignore file to exclude sensitive files:

    # Environment variables
    .env
    .env.local
    .env.*.local

    # CKAN data and storage
    /var/lib/ckan/
    data/
    uploads/

    # IDE and editor files
    .vscode/
    .idea/
    *.swp
    *.swo
    *~

    # OS files
    .DS_Store
    Thumbs.db

    # Python
    __pycache__/
    *.py[cod]
    *$py.class
    *.egg-info/
    dist/
    build/

    # Logs
    *.log
    logs/

    # Docker
    .dockerignore

    # Node (if frontend customization)
    node_modules/
    npm-debug.log

Push to GitHub
Initialize a Git repository and push your files:
    git init
    git add Dockerfile entrypoint.sh ckan-config.ini .env.example .gitignore
    git commit -m "Initial CKAN deployment setup"
    git branch -M main
    git remote add origin https://github.com/YOUR_USERNAME/YOUR_REPOSITORY.git
    git push -u origin main

Replace YOUR_USERNAME and YOUR_REPOSITORY with your GitHub username and repository name.
Deploy Database Infrastructure
Before deploying CKAN, you need PostgreSQL and Redis instances. You can either:
Option A: Deploy on Klutch.sh
- Log in to klutch.sh/app
- Create a PostgreSQL app by selecting a PostgreSQL Docker image
- Configure database name as ckan and create a user ckan
- Similarly, create a Redis app
- Note the database and Redis connection details
Option B: Use Managed Services
- Use your hosting provider’s managed PostgreSQL and Redis services
- Note the connection URLs for configuration in CKAN
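Passwords generated by managed services often contain characters such as `@` or `/` that break a connection URL if pasted in unescaped. The sketch below shows one way to assemble the two URLs safely; the function names `postgres_url` and `redis_url` are hypothetical helpers, and the host names are placeholders.

```python
from urllib.parse import quote


def postgres_url(user: str, password: str, host: str, database: str, port: int = 5432) -> str:
    """Build a PostgreSQL URL, percent-encoding reserved characters in each part."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


def redis_url(host: str, db: int = 0, port: int = 6379) -> str:
    """Build a Redis URL that selects a numbered database."""
    return f"redis://{host}:{port}/{db}"


# Placeholder values for illustration only.
print(postgres_url("ckan", "p@ss/word", "postgres-host", "ckan"))
print(redis_url("redis-host", db=0))
```

The resulting strings are the values to store in CKAN_SQLALCHEMY_URL and CKAN_REDIS_URL.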
Deploy CKAN on Klutch.sh
- Log in to your Klutch.sh dashboard at klutch.sh/app
- Click Create App and select GitHub
- Connect your GitHub account and select your CKAN repository
- Configure deployment settings:
  - App Name: ckan-portal (or your preferred name)
  - Branch: Select main
  - Build Command: Leave default
  - Start Command: Leave default or customize via environment variables
- In environment variables, set:
  - CKAN_SQLALCHEMY_URL: Your PostgreSQL connection string
  - CKAN_REDIS_URL: Your Redis connection URL
  - CKAN_ADMIN_PASSWORD: Secure admin password
  - CKAN_SITE_URL: Your deployment URL
  - Other CKAN configuration variables from .env.example
- Click Deploy and wait for deployment to complete
Klutch.sh automatically detects the Dockerfile and uses it for building and deploying your CKAN instance.
Configure Persistent Storage
After deployment, attach persistent volumes for data persistence:
- Go to your deployed CKAN app in the Klutch.sh dashboard
- Navigate to Storage or Volumes section
- Click Add Persistent Volume
- Configure volumes:
  - Mount Path: /var/lib/ckan/storage
  - Size: 100GB (adjust based on expected data)
- Save and restart the container
This ensures all uploaded datasets, files, and resources persist across container updates and restarts.
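To confirm from inside the container that the volume is mounted and usable, a short check along these lines can help. It is a sketch using only the standard library; `check_storage` and the 1 GiB default threshold are illustrative choices, not CKAN requirements.

```python
import os
import shutil
import tempfile


def check_storage(path: str, min_free_gb: float = 1.0) -> list[str]:
    """Return a list of problems found with the storage path (empty if none)."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    try:
        # Creating and removing a temporary file proves the process can write here.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"{path} is not writable: {exc}")
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GiB free, below {min_free_gb} GiB")
    return problems
```

Calling `check_storage("/var/lib/ckan/storage")` as the CKAN user returns an empty list when the mount is present, writable, and has enough free space.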
Configure Network Traffic
Set up HTTP traffic for your CKAN portal:
- In your app settings, go to Network section
- Set Traffic Type to HTTP
- Internal port should be 5000 (CKAN’s default)
- Your CKAN portal will be accessible at https://example-app.klutch.sh
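Once traffic is routed, the portal can be probed through CKAN's `status_show` API action, which reports the running version. The following is a minimal sketch with the standard library; the helper names are hypothetical, and the URL is the placeholder used throughout this guide.

```python
import json
import urllib.request


def status_url(site_url: str) -> str:
    """Build the status_show endpoint for a CKAN site."""
    return site_url.rstrip("/") + "/api/3/action/status_show"


def parse_status(body: bytes) -> str:
    """Return the reported CKAN version, or raise if the call failed."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN reported failure: {payload.get('error')}")
    return payload["result"]["ckan_version"]


def fetch_ckan_version(site_url: str, timeout: float = 10.0) -> str:
    """Call status_show over HTTP and return the CKAN version string."""
    with urllib.request.urlopen(status_url(site_url), timeout=timeout) as resp:
        return parse_status(resp.read())
```

For example, `fetch_ckan_version("https://example-app.klutch.sh")` raises an exception if the portal is unreachable or returns an error, which makes it usable in a monitoring job.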
Initial Setup and Configuration
After your CKAN deployment is running, follow these steps to set up your data portal:
Access the Web Interface
Open your browser and navigate to https://example-app.klutch.sh. You’ll see the CKAN home page with the dataset catalog.
Log in as Administrator
- Click the Log In button in the top navigation
- Use the admin credentials you set in environment variables
- You’ll be logged in as sysadmin with full administrative privileges
Create Your First Organization
- Go to Organizations in the top navigation
- Click Create Organization
- Fill in organization details:
  - Name: Organization identifier (e.g., health-dept)
  - Title: Display name (e.g., Health Department)
  - Description: Brief description of the organization
  - Image: Optional organization logo
- Click Create Organization
Create a Dataset
- Click Datasets → Add Dataset
- Fill in dataset metadata:
- Title: Dataset name
- Description: Detailed description
- Tags: Keywords for searchability
- Organization: Select the organization that owns the data
- Visibility: Public or Private
- Click Next to add resources
- Upload or link data files:
- File: Upload CSV, JSON, Excel, or other formats
- Format: Specify the data format
- Description: Describe the resource content
- Click Finish to publish the dataset
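When datasets are created in bulk, the URL name has to be derived from the title. The sketch below assumes the usual CKAN naming constraints (lowercase ASCII letters, digits, hyphens, and underscores, between 2 and 100 characters); `dataset_slug` is a hypothetical helper, so check the limits against your CKAN version.

```python
import re
import unicodedata


def dataset_slug(title: str, max_length: int = 100) -> str:
    """Turn a human-readable title into a CKAN-style dataset name."""
    # Decompose accented characters and drop anything that is not ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Replace every run of disallowed characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9_-]+", "-", ascii_title.lower()).strip("-")
    slug = slug[:max_length].rstrip("-")
    if len(slug) < 2:
        raise ValueError(f"title {title!r} does not yield a usable dataset name")
    return slug


print(dataset_slug("Population Statistics 2024"))
```

The result can be passed as the `name` field when creating the dataset through the API.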
Configure Data Preview
- Go to Admin Panel → Extensions
- Ensure preview plugins are enabled:
- Text View (for CSV, JSON)
- Image View
- Recline View (for tabular data)
- This enables interactive data previews in the portal
Set Up User Accounts
- Go to Admin Panel → Users
- Click Create User
- Fill in user details:
- Name: Username for login
- Email: User email address
- Password: Initial password
- Sysadmin: Check to grant admin access
- Click Create User
Configure Portal Settings
- Go to Admin Panel → Config
- Customize:
- Site Title: Your portal name
- Site Description: Portal tagline
- Logo: Custom logo image
- Custom CSS: Additional styling
- Click Save to apply changes
Environment Variables
Basic Configuration
    CKAN_SITE_TITLE=Open Data Portal
    CKAN_SITE_URL=https://example-app.klutch.sh
    CKAN_SQLALCHEMY_URL=postgresql://ckan:password@postgres:5432/ckan
    CKAN_REDIS_URL=redis://redis:6379/0
    CKAN_ADMIN_PASSWORD=secure-password
    CKAN_SECRET_KEY=your-secret-key

These variables control the basic CKAN configuration including site settings, database connections, and administrative credentials.
Production Configuration
For production deployments, use these environment variables with Nixpacks for advanced customization:
    # Advanced Configuration for Production Deployments

    # Database Connection
    CKAN_SQLALCHEMY_URL=postgresql://ckan:secure-password@postgres.example.com:5432/ckan
    CKAN_SQLALCHEMY_POOL_SIZE=10
    CKAN_SQLALCHEMY_POOL_RECYCLE=3600
    CKAN_SQLALCHEMY_POOL_PRE_PING=true

    # Redis Configuration
    CKAN_REDIS_URL=redis://redis.example.com:6379/0
    CKAN_REDIS_SESSIONS_URL=redis://redis.example.com:6379/1

    # Site Configuration
    CKAN_SITE_TITLE=National Open Data Portal
    CKAN_SITE_DESCRIPTION=Share and access government datasets
    CKAN_SITE_URL=https://data.example.gov
    CKAN_SITE_LOGO=/images/custom-logo.png

    # User Authentication
    CKAN_ADMIN_USERNAME=admin
    CKAN_ADMIN_PASSWORD=very-secure-password
    CKAN_AUTH_CREATE_USER_VIA_API=false
    CKAN_AUTH_ROLES_THAT_CASCADE_TO_SUB_GROUPS=admin editor

    # Security Settings
    CKAN_SECRET_KEY=your-very-long-secret-key-minimum-32-chars
    CKAN_BEAKER_SESSION_SECRET=session-secret-key
    CKAN_BEAKER_SESSION_TYPE=redis
    CKAN_BEAKER_SESSION_URL=redis.example.com:6379

    # Storage Configuration
    # Maximum upload size in MB (1000 = 1GB)
    CKAN_STORAGE_PATH=/var/lib/ckan/storage
    CKAN_MAX_RESOURCE_SIZE=1000

    # Email/SMTP Configuration
    CKAN_SMTP_SERVER=smtp.example.com
    CKAN_SMTP_PORT=587
    CKAN_SMTP_USER=noreply@example.com
    CKAN_SMTP_PASSWORD=smtp-password
    CKAN_SMTP_TLS=true
    CKAN_ERROR_EMAIL_FROM=ckan-errors@example.com
    CKAN_EMAIL_TO=admin@example.com

    # Plugins (install as needed)
    CKAN_PLUGINS=text_view image_view recline_view datastore datapusher harvest ckan_harvester

    # Organization Settings
    CKAN_USER_CREATE_ORGANIZATIONS=true
    CKAN_USER_REGISTER_ENABLED=true
    CKAN_USER_RESET_PASSWORD_ENABLED=true

    # Search Configuration
    # Use db, or solr if using Solr search
    CKAN_SEARCH_BACKEND=db

    # API Configuration
    CKAN_API_TOKEN_EXPIRATION_DAYS=730
    CKAN_API_TOKEN_NBFLEEWAY=0

    # Localization
    CKAN_LOCALE_DEFAULT=en
    CKAN_LOCALE_ORDER=en fr es
    CKAN_LOCALES_FILTERED_OUT=

    # Logging
    CKAN_LOG_LEVEL=info
    CKAN_PROPAGATE_ERRORS=true

    # Views and Previews
    CKAN_VIEWS_DEFAULT_VIEWS=true

    # Activity Tracking
    CKAN_ACTIVITY_STREAM_ENABLED=true
    CKAN_ACTIVITY_STREAM_SQL_ENABLED=true

Comments are kept on their own lines because a trailing "# ..." pasted into a dashboard value field becomes part of the value.

To apply these variables in Klutch.sh:
- Go to your app settings
- Navigate to Environment Variables
- Add each variable with appropriate values for your deployment
- Click Save and redeploy if necessary
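A mistyped connection string usually surfaces only as a failed deployment, so it can be worth validating the core variables beforehand. The sketch below checks the variables this guide sets; `validate_env` is a hypothetical helper, and the 8-character password rule is an assumption based on CKAN's usual default minimum length.

```python
from urllib.parse import urlsplit

# Variable name -> URL schemes accepted for it.
REQUIRED_URLS = {
    "CKAN_SQLALCHEMY_URL": ("postgresql", "postgres"),
    "CKAN_REDIS_URL": ("redis", "rediss"),
    "CKAN_SITE_URL": ("http", "https"),
}


def validate_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems with the given environment mapping."""
    errors = []
    for name, schemes in REQUIRED_URLS.items():
        value = env.get(name, "")
        if not value:
            errors.append(f"{name} is not set")
            continue
        parts = urlsplit(value)
        if parts.scheme not in schemes or not parts.hostname:
            errors.append(f"{name} must be a URL with scheme {' or '.join(schemes)}")
    # Assumed minimum length; adjust if your CKAN configuration differs.
    if len(env.get("CKAN_ADMIN_PASSWORD", "")) < 8:
        errors.append("CKAN_ADMIN_PASSWORD should be at least 8 characters")
    return errors
```

Running `validate_env(dict(os.environ))` (after `import os`) in a pre-deploy script returns an empty list when the core settings look well formed.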
Code Examples
Bash: Automated Data Harvest Script
This script automatically harvests and imports datasets from remote CKAN instances:
    #!/bin/bash

    # CKAN Automated Data Harvest Script
    # Harvests datasets from remote CKAN sources (requires the harvest plugins)

    set -e

    CKAN_API_URL="https://example-app.klutch.sh/api/3"
    CKAN_API_KEY="your-api-key"
    HARVEST_LOG="/var/log/ckan-harvest.log"

    echo "$(date): Starting CKAN harvest process..." >> "$HARVEST_LOG"

    # Function to harvest from a remote CKAN instance
    harvest_from_remote() {
        local remote_url=$1
        local source_name=$2
        local source_slug
        # Derive a URL-safe name: spaces to hyphens, then lowercase
        source_slug=$(echo "$source_name" | tr ' ' '-' | tr '[:upper:]' '[:lower:]')

        echo "Harvesting from: $remote_url" >> "$HARVEST_LOG"

        # Create harvest source.
        # Recent CKAN releases read API tokens from the Authorization header by default.
        curl -X POST "$CKAN_API_URL/action/harvest_source_create" \
            -H "Authorization: $CKAN_API_KEY" \
            -H "Content-Type: application/json" \
            -d "{
                \"title\": \"$source_name\",
                \"name\": \"$source_slug\",
                \"url\": \"$remote_url\",
                \"source_type\": \"ckan\"
            }" >> "$HARVEST_LOG" 2>&1
    }

    # Function to run harvest job
    run_harvest() {
        echo "Running harvest jobs..." >> "$HARVEST_LOG"

        curl -X POST "$CKAN_API_URL/action/harvest_jobs_run" \
            -H "Authorization: $CKAN_API_KEY" \
            -H "Content-Type: application/json" \
            -d "{}" >> "$HARVEST_LOG" 2>&1
    }

    # Harvest from configured sources
    harvest_from_remote "https://remote-ckan.example.com/api/3" "Remote Open Data"
    harvest_from_remote "https://other-data-portal.org/api/3" "Other Data Portal"

    # Run the harvest
    run_harvest

    echo "$(date): Harvest process completed." >> "$HARVEST_LOG"

Python: CKAN API Client for Dataset Management
This example shows how to interact with CKAN’s API to create and manage datasets programmatically:
    #!/usr/bin/env python3

    import requests
    from typing import Dict, List, Optional


    class CKANClient:
        """Client for interacting with CKAN API"""

        def __init__(self, base_url: str, api_key: str):
            self.base_url = base_url.rstrip('/')
            self.api_key = api_key
            # Recent CKAN releases read API tokens from the Authorization header by default
            self.headers = {
                'Authorization': self.api_key,
                'Content-Type': 'application/json'
            }

        def _make_request(self, endpoint: str, method: str = 'GET',
                          data: Optional[Dict] = None,
                          params: Optional[Dict] = None) -> Dict:
            """Make API request to CKAN"""
            url = f"{self.base_url}/api/3/action/{endpoint}"

            if method == 'GET':
                response = requests.get(url, headers=self.headers, params=params, timeout=30)
            elif method == 'POST':
                response = requests.post(url, headers=self.headers, json=data, timeout=30)
            else:
                raise ValueError(f"Unsupported HTTP method: {method}")

            response.raise_for_status()
            return response.json()

        def create_dataset(self, name: str, title: str, owner_org: str, **kwargs) -> Dict:
            """Create a new dataset"""
            payload = {'name': name, 'title': title, 'owner_org': owner_org, **kwargs}
            result = self._make_request('package_create', method='POST', data=payload)
            return result['result']

        def create_resource(self, package_id: str, url: str, name: str, **kwargs) -> Dict:
            """Add a resource to a dataset"""
            payload = {'package_id': package_id, 'url': url, 'name': name, **kwargs}
            result = self._make_request('resource_create', method='POST', data=payload)
            return result['result']

        def search_datasets(self, query: str, rows: int = 10) -> Dict:
            """Search for datasets"""
            data = {'q': query, 'rows': rows}
            result = self._make_request('package_search', method='POST', data=data)
            return result['result']

        def get_dataset(self, dataset_id: str) -> Dict:
            """Get dataset details"""
            result = self._make_request('package_show', params={'id': dataset_id})
            return result['result']

        def list_organizations(self) -> List[Dict]:
            """List all organizations with their full details"""
            # Without all_fields, organization_list returns only names (strings)
            result = self._make_request('organization_list', params={'all_fields': 'true'})
            return result['result']

        def get_user_info(self, user_id: str) -> Dict:
            """Get information about a user by name or id"""
            result = self._make_request('user_show', params={'id': user_id})
            return result['result']


    # Usage example
    if __name__ == '__main__':
        # Initialize client
        client = CKANClient(
            base_url='https://example-app.klutch.sh',
            api_key='your-api-key'
        )

        # Create a new dataset
        dataset = client.create_dataset(
            name='population-statistics',
            title='Population Statistics 2024',
            owner_org='statistics-department',
            notes='Annual population data by region',
            tags=[{'name': 'population'}, {'name': 'statistics'}]
        )
        print(f"Created dataset: {dataset['id']}")

        # Add a resource
        resource = client.create_resource(
            package_id=dataset['id'],
            url='https://example.com/population-data.csv',
            name='Population Data 2024',
            format='CSV',
            description='CSV file containing population figures'
        )
        print(f"Added resource: {resource['id']}")

        # Search datasets
        results = client.search_datasets('population', rows=5)
        print(f"Found {results['count']} datasets matching 'population'")

        # List organizations
        orgs = client.list_organizations()
        print(f"Available organizations: {[org['title'] for org in orgs]}")

cURL: REST API Examples
Execute these cURL commands to interact with your CKAN instance:
    # Set variables
    CKAN_URL="https://example-app.klutch.sh"
    API_KEY="your-api-key"

    # Recent CKAN releases read API tokens from the Authorization header by default.

    # Get site information
    curl -X GET "$CKAN_URL/api/3/action/status_show" \
      -H "Authorization: $API_KEY"

    # Search datasets
    curl -X POST "$CKAN_URL/api/3/action/package_search" \
      -H "Authorization: $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "q": "population",
        "rows": 10
      }'

    # Create a new dataset
    curl -X POST "$CKAN_URL/api/3/action/package_create" \
      -H "Authorization: $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "name": "environmental-data",
        "title": "Environmental Monitoring Data",
        "owner_org": "environmental-agency",
        "notes": "Air quality and pollution monitoring",
        "tags": [
          {"name": "environment"},
          {"name": "air-quality"}
        ]
      }'

    # Get organization information
    curl -X GET "$CKAN_URL/api/3/action/organization_show?id=environmental-agency" \
      -H "Authorization: $API_KEY"

    # List all datasets
    curl -X POST "$CKAN_URL/api/3/action/package_list" \
      -H "Authorization: $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "limit": 20,
        "offset": 0
      }'

    # Create a resource (link a file URL to a dataset)
    curl -X POST "$CKAN_URL/api/3/action/resource_create" \
      -H "Authorization: $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "package_id": "environmental-data",
        "url": "https://example.com/air-quality.csv",
        "name": "Air Quality Data",
        "format": "CSV"
      }'

    # Get user information
    curl -X GET "$CKAN_URL/api/3/action/user_show?id=admin" \
      -H "Authorization: $API_KEY"

    # Modify dataset
    curl -X POST "$CKAN_URL/api/3/action/package_patch" \
      -H "Authorization: $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "id": "environmental-data",
        "notes": "Updated: Air quality and pollution monitoring data with real-time updates"
      }'

    # Get activity stream
    curl -X GET "$CKAN_URL/api/3/action/dashboard_activity_list" \
      -H "Authorization: $API_KEY"

Best Practices
Data Organization
- Use Organizations: Group related datasets by organization or department for better organization and access control
- Consistent Naming: Use descriptive, consistent naming conventions for datasets and resources
- Complete Metadata: Fill in comprehensive metadata including descriptions, keywords, and license information
- Resource Versioning: Upload new versions of resources with clear versioning to track data evolution
- Data Quality: Validate data format and quality before publishing to ensure portal credibility
Access Control and Security
- User Permissions: Assign appropriate roles (Admin, Editor, Member) based on responsibilities
- Organization Ownership: Ensure datasets are assigned to appropriate organizations with clear ownership
- Public vs Private: Mark sensitive data as private and restrict access to authorized users only
- API Token Management: Create separate API tokens for different applications and rotate them regularly
- HTTPS/SSL: Ensure your CKAN portal uses HTTPS for secure communication
Portal Management
- Regular Updates: Keep CKAN and plugins updated to the latest versions for security and feature improvements
- Backup Strategy: Regularly back up PostgreSQL database and file storage to protect against data loss
- Monitoring: Monitor portal performance, disk space, and database size for production health
- User Support: Provide documentation and guidance for users on how to find, understand, and use datasets
- Data Governance: Establish clear policies for data publication, retention, and archival
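The backup recommendation above could be scripted roughly as follows. This is a sketch that assumes `pg_dump` and `tar` are available where the script runs; `backup_commands`, `run_backup`, and the output file names are illustrative, not part of CKAN.

```python
import datetime
import subprocess
from typing import Optional


def backup_commands(db_url: str, storage_path: str, out_dir: str,
                    today: Optional[datetime.date] = None) -> list[list[str]]:
    """Build the pg_dump and tar command lines for one dated backup."""
    stamp = (today or datetime.date.today()).isoformat()
    dump = [
        "pg_dump",
        "--format=custom",  # compressed archive restorable with pg_restore
        f"--file={out_dir}/ckan-db-{stamp}.dump",
        f"--dbname={db_url}",  # pg_dump accepts a connection URI here
    ]
    archive = [
        "tar", "-czf", f"{out_dir}/ckan-storage-{stamp}.tar.gz",
        "-C", storage_path, ".",
    ]
    return [dump, archive]


def run_backup(db_url: str, storage_path: str, out_dir: str) -> None:
    """Run both commands, raising CalledProcessError if either fails."""
    for cmd in backup_commands(db_url, storage_path, out_dir):
        subprocess.run(cmd, check=True)
```

Separating command construction from execution keeps the file naming testable without touching a real database.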
Performance Optimization
- Caching: Use Redis caching to improve search and API response times
- Database Tuning: Configure PostgreSQL connection pooling and indexes for optimal performance
- Search Optimization: Use Solr for advanced search capabilities if database search is insufficient
- CDN: Consider using a CDN for static assets and file downloads
- Resource Limits: Configure appropriate upload size limits and query timeouts
Integration and Automation
- API Integration: Use CKAN API to integrate with external systems and workflows
- Data Harvesting: Set up automatic harvesting from other data portals to aggregate data
- Webhooks: Implement event-driven workflows for dataset updates and notifications
- Automated Backups: Configure scheduled backups to cloud storage or dedicated backup servers
- Monitoring Alerts: Set up alerts for deployment issues, database problems, and storage warnings
Troubleshooting
Issue: Database Connection Fails on Startup
Solution: Verify that PostgreSQL is running and accessible from the CKAN container. Check CKAN_SQLALCHEMY_URL environment variable for correct connection string. Ensure database user has proper permissions on the CKAN database.
Issue: Redis Connection Errors
Solution: Confirm Redis is running and reachable from CKAN. Check CKAN_REDIS_URL environment variable. Verify firewall rules allow connections from CKAN container to Redis instance.
Issue: Persistent Storage Data Lost
Solution: Ensure persistent volume is properly mounted at /var/lib/ckan/storage before deployment. Verify volume attachment in Klutch.sh settings. Check that volume has sufficient free space.
Issue: Slow Dataset Search
Solution: Database search slows with large datasets. Consider switching to Solr for full-text search. Ensure PostgreSQL has proper indexes. Monitor database query performance and optimize slow queries.
Issue: File Upload Failures
Solution: Check that persistent storage has write permissions for the CKAN user. Verify CKAN_MAX_RESOURCE_SIZE is set high enough for your uploads. Check that /var/lib/ckan/storage directory exists and is writable.
Updating CKAN
To update CKAN to a newer version:
- Go to your Klutch.sh app dashboard
- Navigate to Deployments section
- Select the latest deployment and note the current version
- Update the Dockerfile to use a newer CKAN image tag
- Commit and push changes to GitHub
- Klutch.sh automatically redeploys with the new version
- CKAN runs database migrations automatically on startup
- Verify deployment is healthy and data is intact
Always back up your database and file storage before updating.
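The image tag change in the Dockerfile can be scripted, for example as part of a release checklist. The sketch below assumes the Dockerfile starts from `ckan/ckan-base` as in this guide; the helper names are hypothetical, and the tag in the usage note is a placeholder rather than a statement about which tags exist.

```python
import re

# Matches the base-image line used in this guide, capturing prefix and tag.
FROM_LINE = re.compile(r"^(FROM\s+ckan/ckan-base:)(\S+)", re.MULTILINE)


def current_tag(dockerfile: str) -> str:
    """Return the ckan-base tag currently referenced by the Dockerfile text."""
    match = FROM_LINE.search(dockerfile)
    if match is None:
        raise ValueError("no 'FROM ckan/ckan-base:<tag>' line found")
    return match.group(1 + 1)


def set_tag(dockerfile: str, new_tag: str) -> str:
    """Return the Dockerfile text with the ckan-base tag replaced."""
    updated, count = FROM_LINE.subn(lambda m: m.group(1) + new_tag, dockerfile, count=1)
    if count == 0:
        raise ValueError("no 'FROM ckan/ckan-base:<tag>' line found")
    return updated
```

For instance, `set_tag(open("Dockerfile").read(), "NEW_TAG")` returns the updated text, which can then be written back and committed.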
Use Cases
Government Open Data Portals
Deploy CKAN to create official government data portals for publishing datasets from multiple agencies. Enable citizens and researchers to discover and access government data with full-text search, dataset previews, and download capabilities.
Enterprise Data Hub
Build an internal data portal for organizations to catalog and share datasets across departments. Control data access with permissions, track data lineage with activity streams, and integrate with business intelligence tools.
Humanitarian Data Platform
Create a humanitarian data platform to share datasets related to disasters, health, food security, and development. Enable organizations to quickly publish and discover critical data during emergencies.
Scientific Data Repository
Deploy CKAN for research institutions to manage and share scientific datasets. Support versioning, metadata standards, and integration with research workflows and collaboration tools.
Thematic Data Hub
Build specialized data portals focused on specific domains like climate data, biodiversity, public health, or urban planning. Customize metadata schemas and visualizations for domain-specific needs.
Additional Resources
- Official CKAN Documentation: https://docs.ckan.org
- CKAN GitHub Repository: https://github.com/ckan/ckan
- CKAN Community Chat: https://gitter.im/ckan/chat
- CKAN Extension Development: https://docs.ckan.org/en/latest/extensions/index.html
- API Documentation: https://docs.ckan.org/en/latest/api/index.html
- CKAN Showcase Sites: https://ckan.org/showcase
- CKAN Blog: https://ckan.org/blog
- Stack Overflow CKAN Tag: https://stackoverflow.com/questions/tagged/ckan
Conclusion
CKAN empowers organizations to build comprehensive open data portals that make data accessible, discoverable, and usable for citizens, researchers, and businesses. By deploying CKAN on Klutch.sh, you gain a scalable and maintainable platform for data governance and sharing backed by a proven, production-tested system used by governments and enterprises worldwide.
The combination of CKAN’s powerful data management capabilities with Klutch.sh’s simplified deployment model makes it straightforward to establish an enterprise-grade data portal. Whether you’re launching a government open data initiative, creating an internal enterprise data hub, or building a humanitarian data platform, CKAN provides the tools, API capabilities, and extensibility needed for effective data management and discovery.
Start by deploying CKAN with PostgreSQL and Redis infrastructure, configure persistent storage for reliable data management, and gradually expand your portal with additional datasets and customizations. Leverage CKAN’s comprehensive API for automation, plugin system for extending functionality, and active community for support and best practices.