Deploying InvenioRDM
Introduction
InvenioRDM is a turn-key research data management repository developed at CERN (the European Organization for Nuclear Research). Built on the battle-tested Invenio framework that powers CERN’s Zenodo repository, InvenioRDM provides institutions with a robust platform for managing, preserving, and sharing research data and publications.
Designed to meet the needs of modern research workflows, InvenioRDM supports FAIR data principles (Findable, Accessible, Interoperable, Reusable) out of the box. It provides DOI minting, rich metadata support, versioning, communities, and search: the core features an institution needs to run a production repository.
Key highlights of InvenioRDM:
- FAIR Data Ready: Built-in support for FAIR data principles
- DOI Integration: Mint DataCite DOIs for persistent identifiers
- Rich Metadata: Comprehensive metadata schemas with custom extensions
- Communities: Organize content into curated collections
- Access Control: Fine-grained permissions and restricted access options
- Versioning: Track changes with full version history
- File Management: Support for large files with integrity checking
- OAI-PMH: Harvest metadata for aggregators and search engines
- REST API: Complete API for programmatic access
- Customizable: Extensive theming and extension capabilities
- Enterprise Grade: Proven at scale with Zenodo handling millions of records
- Open Source: MIT licensed with active development by CERN and community
This guide walks through deploying InvenioRDM on Klutch.sh using Docker, setting up your own research repository.
Why Deploy InvenioRDM on Klutch.sh
Deploying InvenioRDM on Klutch.sh provides several advantages for research data management:
Simplified Deployment: Klutch.sh handles the container orchestration for InvenioRDM’s multi-service architecture.
Persistent Storage: Attach persistent volumes for your database, search index, and file storage. Your research data survives deployments.
HTTPS by Default: Klutch.sh provides automatic SSL certificates, essential for institutional repositories.
GitHub Integration: Connect your configuration repository directly from GitHub. Updates trigger automatic redeployments.
Scalable Resources: Allocate CPU, memory, and storage based on repository size.
Custom Domains: Assign your institutional domain for professional repository URLs.
Prerequisites
Before deploying InvenioRDM on Klutch.sh, ensure you have:
- A Klutch.sh account
- A GitHub account with a repository for your InvenioRDM configuration
- Basic familiarity with Docker and containerization concepts
- A PostgreSQL database
- An Elasticsearch/OpenSearch instance
- Redis for caching
- (Optional) DataCite credentials for DOI minting
Understanding InvenioRDM Architecture
InvenioRDM requires multiple services:
Web Application: Flask-based Python application serving the UI and API.
PostgreSQL: Primary database for records, users, and metadata.
Elasticsearch/OpenSearch: Search engine for fast record discovery.
Redis: Cache and message broker for background tasks.
Celery Workers: Background job processing for file processing, indexing, etc.
File Storage: Local filesystem or S3-compatible storage for files.
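Before deploying the web application, it can help to confirm that each backing service accepts TCP connections. The following is a minimal sketch using only the Python standard library; the host names and ports in `SERVICES` are hypothetical placeholders and should be replaced with the connection details of your own apps.

```python
import socket

# Hypothetical endpoints (assumption): replace with your own hosts and ports.
SERVICES = {
    "postgresql": ("db", 5432),
    "search": ("search", 9200),
    "redis": ("redis", 6379),
}


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_services(services: dict) -> dict:
    """Map each service name to True or False depending on TCP reachability."""
    return {name: is_reachable(host, port) for name, (host, port) in services.items()}
```

Calling `check_services(SERVICES)` from inside the application container returns a dictionary such as `{"postgresql": True, ...}`. A successful TCP connection only shows that the port is open; it does not verify credentials or that the service is healthy.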
Preparing Your Repository
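Because InvenioRDM is a multi-service application, this guide builds the instance with invenio-cli, the project’s official command-line tool, rather than assembling the packages by hand.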
Repository Structure
```
inveniordm-deploy/
├── Dockerfile
├── docker-compose.yml
├── invenio.cfg
└── .dockerignore
```

Creating the Dockerfile
Create a Dockerfile for the InvenioRDM application:
```dockerfile
FROM python:3.9-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    libpq-dev \
    git \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Install Node.js for frontend assets
RUN curl -fsSL https://deb.nodesource.com/setup_18.x | bash - \
    && apt-get install -y nodejs

# Create application directory
WORKDIR /opt/invenio

# Install invenio-cli
RUN pip install invenio-cli

# Initialize InvenioRDM instance
RUN invenio-cli init rdm --no-input

# Install Python dependencies
WORKDIR /opt/invenio/my-site
RUN pip install -e .
RUN pip install celery redis psycopg2-binary

# Build frontend assets
RUN invenio-cli assets build

# Set default environment variables; override them at runtime.
# The values are quoted because they contain spaces and are read as Python literals.
ENV INVENIO_APP_ALLOWED_HOSTS="['0.0.0.0', 'localhost', '127.0.0.1']"
ENV INVENIO_SEARCH_HOSTS="['search:9200']"
# INVENIO_SQLALCHEMY_DATABASE_URI has no default and must be supplied at runtime.

# Expose port
EXPOSE 5000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:5000/ping || exit 1

# Run the application
CMD ["invenio", "run", "-h", "0.0.0.0"]
```

InvenioRDM Configuration (invenio.cfg)
Create an invenio.cfg file:
```python
# InvenioRDM Configuration

# Site name and info
SITE_HOSTNAME = "repository.example.com"
SITE_UI_URL = "https://repository.example.com"
SITE_API_URL = "https://repository.example.com/api"

# Secret key - change this!
SECRET_KEY = "CHANGE_ME_TO_A_SECURE_RANDOM_STRING"

# Database
SQLALCHEMY_DATABASE_URI = "postgresql://user:pass@host/inveniordm"

# Search
SEARCH_HOSTS = [{"host": "search-host", "port": 9200}]

# Cache
CACHE_TYPE = "redis"
CACHE_REDIS_URL = "redis://redis:6379/0"

# Celery
CELERY_BROKER_URL = "redis://redis:6379/1"

# Files: "L" selects local storage as the default storage class
FILES_REST_DEFAULT_STORAGE_CLASS = "L"

# DOI Configuration (optional)
DATACITE_ENABLED = False
# DATACITE_USERNAME = "your-username"
# DATACITE_PASSWORD = "your-password"
# DATACITE_PREFIX = "10.xxxxx"

# Mail
MAIL_SUPPRESS_SEND = True
```

Environment Variables Reference
| Variable | Required | Default | Description |
|---|---|---|---|
| INVENIO_APP_ALLOWED_HOSTS | Yes | localhost | Allowed hostnames, written as a Python list literal |
| INVENIO_SQLALCHEMY_DATABASE_URI | Yes | - | PostgreSQL connection string |
| INVENIO_SEARCH_HOSTS | Yes | - | Elasticsearch/OpenSearch hosts |
| INVENIO_CACHE_REDIS_URL | Yes | - | Redis URL for caching |
| INVENIO_CELERY_BROKER_URL | Yes | - | Redis URL for Celery |
| INVENIO_SECRET_KEY | Yes | - | Flask secret key (Invenio only reads environment variables that carry the INVENIO_ prefix) |
| INVENIO_LOGGING_CONSOLE_LEVEL | No | WARNING | Log level |
| INVENIO_MAIL_SUPPRESS_SEND | No | True | Suppress email sending |
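Invenio reads environment variables that start with `INVENIO_`, strips the prefix, and tries to interpret each value as a Python literal, which is why list-valued settings are written like `['search:9200']`. The sketch below is a simplified illustration of that behaviour, not the library's actual code; the function name `load_prefixed_env` is hypothetical.

```python
import ast
import os


def load_prefixed_env(environ=None, prefix: str = "INVENIO_") -> dict:
    """Collect prefixed variables, strip the prefix, and parse Python literals."""
    environ = os.environ if environ is None else environ
    config = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue
        key = name[len(prefix):]
        try:
            # Lists, dicts, booleans, and numbers are written as Python literals.
            config[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a valid literal (a URL, a log level) stays a string.
            config[key] = raw
    return config
```

With this model in mind, `INVENIO_MAIL_SUPPRESS_SEND=True` becomes the boolean `True`, while `INVENIO_CACHE_REDIS_URL=redis://redis:6379/0` remains a plain string.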
Deploying InvenioRDM on Klutch.sh
Due to InvenioRDM’s multi-service requirements, deploy each component separately:
Deploy PostgreSQL
Create a PostgreSQL app on Klutch.sh:
- Create a new app with PostgreSQL image
- Configure persistent volume for data
- Note the connection details
Deploy Elasticsearch/OpenSearch
Create a search service app:
- Use Elasticsearch or OpenSearch image
- Configure persistent volume for indices
- Note the connection details
Deploy Redis
Create a Redis app:
- Use Redis image
- Configure for Celery and caching
- Note the connection details
Push Your Repository to GitHub

```shell
git init
git add Dockerfile invenio.cfg .dockerignore
git commit -m "Initial InvenioRDM deployment configuration"
git remote add origin https://github.com/yourusername/inveniordm-deploy.git
git push -u origin main
```

Create the InvenioRDM App
Navigate to the Klutch.sh dashboard and create a new app for InvenioRDM.
Configure HTTP Traffic
In the deployment settings:
- Select HTTP as the traffic type
- Set the internal port to 5000
Set Environment Variables
Configure all required environment variables:
| Variable | Value |
|---|---|
| INVENIO_SQLALCHEMY_DATABASE_URI | Your PostgreSQL URL |
| INVENIO_SEARCH_HOSTS | Your Elasticsearch/OpenSearch host |
| INVENIO_CACHE_REDIS_URL | Your Redis URL |
| INVENIO_CELERY_BROKER_URL | Your Redis URL |
| INVENIO_SECRET_KEY | A secure random string |
| INVENIO_APP_ALLOWED_HOSTS | Your domain |
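One way to produce the secure random string for the secret key is Python's standard `secrets` module. This is a minimal sketch; the helper name `generate_secret_key` and the 64-byte default are illustrative choices, not requirements stated by InvenioRDM.

```python
import secrets


def generate_secret_key(num_bytes: int = 64) -> str:
    """Return a URL-safe random string built from num_bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)
```

Run `print(generate_secret_key())` once on a trusted machine and paste the result into the environment variable settings. Keep the value stable across redeployments, because changing it invalidates existing sessions.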
Attach Persistent Volumes
Add volumes for file storage:
| Mount Path | Recommended Size | Purpose |
|---|---|---|
| /opt/invenio/var/instance/data | 100+ GB | Uploaded files |
| /opt/invenio/var/instance/static | 5 GB | Static assets |
Deploy and Initialize
Deploy the application, then run initialization commands:
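```shell
invenio db init
invenio db create
invenio index init
# Point the default location at the mounted data volume.
# Use an s3:// URI here instead only if you configured S3-compatible storage.
invenio files location create --default default /opt/invenio/var/instance/data
```

Create Admin User

```shell
# --active enables the account immediately so the user can log in.
invenio users create admin@example.com --password=yourpassword --active
# Assumes the admin role already exists; if not, create it first with: invenio roles create admin
invenio roles add admin@example.com admin
```

Managing Your Repository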
Creating Records
- Log into InvenioRDM as an authenticated user
- Click “New Upload”
- Fill in metadata (title, creators, description)
- Upload files
- Choose access settings (open, restricted, embargoed)
- Publish the record
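The same workflow is available through the REST API: a `POST` to `/api/records` creates a draft, which can then be published with a second request. The sketch below only builds the request so that it can be inspected before sending. `BASE_URL` and `API_TOKEN` are placeholders, the helper names are hypothetical, and the metadata fields follow the InvenioRDM record schema as I understand it; check them against the API reference for your version.

```python
import json
import urllib.request

# Placeholders (assumptions): use your own domain and a personal access token.
BASE_URL = "https://repository.example.com"
API_TOKEN = "your-personal-access-token"


def build_draft_payload(title, family_name, given_name, publication_date):
    """Build a minimal draft record body: open access, one personal creator."""
    return {
        "access": {"record": "public", "files": "public"},
        "files": {"enabled": True},
        "metadata": {
            "title": title,
            "creators": [
                {
                    "person_or_org": {
                        "type": "personal",
                        "family_name": family_name,
                        "given_name": given_name,
                    }
                }
            ],
            "publication_date": publication_date,
            "resource_type": {"id": "dataset"},
        },
    }


def build_create_request(payload, base_url=BASE_URL, token=API_TOKEN):
    """Return an unsent POST request that would create a draft record."""
    return urllib.request.Request(
        url=f"{base_url}/api/records",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen(...)` and read the draft `id` from the JSON response. Files are then uploaded to the draft before it is published.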
Communities
Create curated collections:
- Go to Communities
- Create a new community
- Set review policies
- Invite curators
- Members can submit records for review
DOI Minting
If DataCite is configured:
- Records receive DOIs automatically on publication
- DOIs persist even if records are updated
- Each version receives its own DOI, and a concept DOI always resolves to the latest version
Troubleshooting Common Issues
Search Not Working
Symptoms: Records not appearing in search.
Solutions:
- Verify Elasticsearch is running and accessible
- Rebuild search index: `invenio index reindex`
- Check Elasticsearch logs for errors
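To verify that the search service is running and accessible, you can query its `_cluster/health` endpoint, which both Elasticsearch and OpenSearch expose. The following is a minimal sketch; the function names are hypothetical, and the base URL you pass in is whatever address your search app is reachable at.

```python
import json
import urllib.request


def fetch_cluster_health(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/_cluster/health and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def describe_health(health: dict) -> str:
    """Translate the cluster status colour into a short diagnosis."""
    status = health.get("status")
    if status == "green":
        return "ok: all shards are allocated"
    if status == "yellow":
        return "degraded: some replica shards are unassigned (common on a single node)"
    if status == "red":
        return "failing: some primary shards are unassigned, searches may miss records"
    return "unknown: unexpected response"
```

For example, `describe_health(fetch_cluster_health("http://search:9200"))` run from the application container reports the cluster state, assuming `search:9200` is where your search service listens and no authentication is required.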
Database Errors
Symptoms: The application reports errors related to the database.
Solutions:
- Verify PostgreSQL connection string
- Run migrations: `invenio alembic upgrade`
- Check database accessibility
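Many database errors come down to a malformed connection string. The sketch below checks the parts of a PostgreSQL URI with the standard library before the application ever tries to connect. The function name `check_database_uri` is hypothetical, and the check is purely syntactic: it does not contact the database.

```python
from urllib.parse import urlsplit


def check_database_uri(uri: str) -> list:
    """Return a list of problems found in a PostgreSQL URI (empty if none)."""
    problems = []
    parts = urlsplit(uri)
    # Accept both "postgresql" and driver-qualified forms like "postgresql+psycopg2".
    if parts.scheme.split("+")[0] != "postgresql":
        problems.append(f"unexpected scheme {parts.scheme!r}; expected 'postgresql'")
    if not parts.hostname:
        problems.append("missing host")
    if not parts.username:
        problems.append("missing user name")
    if not parts.path.strip("/"):
        problems.append("missing database name")
    return problems
```

An empty list means the URI is well formed; connection failures beyond that point are more likely caused by credentials, networking, or the database not running.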
File Upload Issues
Symptoms: Cannot upload files.
Solutions:
- Verify file storage is configured
- Check volume permissions
- Review file size limits
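Volume permission and capacity problems can be checked from inside the container. This is a minimal sketch using only the standard library; the function name `check_data_dir` and the 1 GiB free-space threshold are illustrative assumptions.

```python
import os
import shutil
import tempfile


def check_data_dir(path: str, min_free_bytes: int = 1024**3) -> list:
    """Return a list of problems with the upload directory (empty if none)."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    try:
        # Creating and removing a temporary file proves the process can write here.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"cannot write to {path}: {exc}")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free, below the {min_free_bytes} byte threshold")
    return problems
```

Calling `check_data_dir("/opt/invenio/var/instance/data")` as the same user that runs the application shows whether the mounted volume is writable and has room for new uploads.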
Additional Resources
- InvenioRDM Documentation
- Invenio Software Website
- InvenioRDM GitHub Repository
- Zenodo (InvenioRDM in Production)
- Klutch.sh Persistent Volumes
- Klutch.sh Deployments
Conclusion
Deploying InvenioRDM on Klutch.sh enables your institution to run a professional research data repository with the same technology powering CERN’s Zenodo. While the multi-service architecture requires more setup than simpler applications, the result is an enterprise-grade repository supporting FAIR data principles, DOI minting, and comprehensive research data management.