Deploying InvenioRDM

Introduction

InvenioRDM is a turn-key research data management repository developed at CERN (the European Organization for Nuclear Research). Built on the battle-tested Invenio framework that powers CERN’s Zenodo repository, InvenioRDM provides institutions with a robust platform for managing, preserving, and sharing research data and publications.

Designed to meet the needs of modern research workflows, InvenioRDM supports the FAIR data principles (Findable, Accessible, Interoperable, Reusable) out of the box. It provides DOI minting, rich metadata support, versioning, communities, and full-text search, covering the core features an institution needs to run a repository.

Key highlights of InvenioRDM:

FAIR Data Ready: Built-in support for FAIR data principles
DOI Integration: Mint DataCite DOIs for persistent identifiers
Rich Metadata: Comprehensive metadata schemas with custom extensions
Communities: Organize content into curated collections
Access Control: Fine-grained permissions and restricted access options
Versioning: Track changes with full version history
File Management: Support for large files with integrity checking
OAI-PMH: Harvest metadata for aggregators and search engines
REST API: Complete API for programmatic access
Customizable: Extensive theming and extension capabilities
Enterprise Grade: Proven at scale with Zenodo handling millions of records
Open Source: MIT licensed with active development by CERN and community

This guide walks through deploying InvenioRDM on Klutch.sh using Docker, setting up your own research repository.

Why Deploy InvenioRDM on Klutch.sh

Deploying InvenioRDM on Klutch.sh provides several advantages for research data management:

Simplified Deployment: Klutch.sh handles the container orchestration for InvenioRDM’s multi-service architecture.

Persistent Storage: Attach persistent volumes for your database, search index, and file storage. Your research data survives deployments.

HTTPS by Default: Klutch.sh provides automatic SSL certificates, essential for institutional repositories.

GitHub Integration: Connect your configuration repository directly from GitHub. Updates trigger automatic redeployments.

Scalable Resources: Allocate CPU, memory, and storage based on repository size.

Custom Domains: Assign your institutional domain for professional repository URLs.

Prerequisites

Before deploying InvenioRDM on Klutch.sh, ensure you have:

A Klutch.sh account
A GitHub account with a repository for your InvenioRDM configuration
Basic familiarity with Docker and containerization concepts
A PostgreSQL database
An Elasticsearch/OpenSearch instance
A Redis instance for caching and as the Celery message broker
(Optional) DataCite credentials for DOI minting

Understanding InvenioRDM Architecture

InvenioRDM requires multiple services:

Web Application: Flask-based Python application serving the UI and API.

PostgreSQL: Primary database for records, users, and metadata.

Elasticsearch/OpenSearch: Search engine for fast record discovery.

Redis: Cache and message broker for background tasks.

Celery Workers: Run background jobs such as file processing and indexing.

File Storage: Local filesystem or S3-compatible storage for files.
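
Because the web application depends on all of these services, it can help to confirm that each one accepts connections before deploying the app itself. The following is a minimal sketch using only the Python standard library. The hostnames in SERVICES are placeholders (assumptions, not values from this guide), and a successful TCP connection only shows that the port is open, not that credentials are valid.

```python
import socket

# Hypothetical endpoints: replace with the hosts and ports of your own services.
SERVICES = {
    "postgresql": ("db.example.internal", 5432),
    "search": ("search.example.internal", 9200),
    "redis": ("redis.example.internal", 6379),
}


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_services(services: dict) -> dict:
    """Map each service name to True (reachable) or False (unreachable)."""
    return {name: is_reachable(host, port) for name, (host, port) in services.items()}
```

Calling check_services(SERVICES) returns a dictionary such as {"postgresql": True, ...} that can be printed or logged before the first deployment.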

Preparing Your Repository

Because InvenioRDM consists of several cooperating services, scaffolding the instance with the official invenio-cli tool is recommended over assembling it by hand.

Repository Structure

inveniordm-deploy/
├── Dockerfile
├── docker-compose.yml
├── invenio.cfg
└── .dockerignore

Creating the Dockerfile

Create a Dockerfile for the InvenioRDM application:

FROM python:3.9-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    libpq-dev \
    git \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Install Node.js for frontend assets
RUN curl -fsSL https://deb.nodesource.com/setup_18.x | bash - \
    && apt-get install -y nodejs

# Create application directory
WORKDIR /opt/invenio

# Install invenio-cli
RUN pip install invenio-cli

# Initialize InvenioRDM instance
RUN invenio-cli init rdm --no-input

# Install Python dependencies
WORKDIR /opt/invenio/my-site
RUN pip install -e .
RUN pip install celery redis psycopg2-binary

# Build frontend assets
RUN invenio-cli assets build

# Set default environment variables; override them at runtime in the Klutch.sh dashboard.
# The values are quoted so the Python-style lists survive Dockerfile parsing.
ENV INVENIO_APP_ALLOWED_HOSTS="['0.0.0.0', 'localhost', '127.0.0.1']"
ENV INVENIO_SEARCH_HOSTS="['search:9200']"
# INVENIO_SQLALCHEMY_DATABASE_URI has no sensible default; supply it at runtime.

# Expose port
EXPOSE 5000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:5000/ping || exit 1

# Run the application
CMD ["invenio", "run", "-h", "0.0.0.0"]
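
The HEALTHCHECK instruction above probes the /ping endpoint from inside the container. The same probe can be run from outside after a deployment. This is a minimal sketch, assuming the application answers GET /ping with HTTP 200 as the health check implies; the function name ping and the base URL you pass in are illustrative.

```python
import urllib.error
import urllib.request


def ping(base_url: str, timeout: float = 10.0) -> bool:
    """Return True if GET <base_url>/ping answers with HTTP 200."""
    url = base_url.rstrip("/") + "/ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection errors, timeouts, and non-2xx responses all count as unhealthy.
        return False
```

For example, ping("https://repository.example.com") should return True once the app has finished starting.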

InvenioRDM Configuration (invenio.cfg)

Create an invenio.cfg file:

# InvenioRDM Configuration

# Site name and info
SITE_HOSTNAME = "repository.example.com"
SITE_UI_URL = "https://repository.example.com"
SITE_API_URL = "https://repository.example.com/api"

# Secret key - change this!
SECRET_KEY = "CHANGE_ME_TO_A_SECURE_RANDOM_STRING"

# Database
SQLALCHEMY_DATABASE_URI = "postgresql://user:pass@host/inveniordm"

# Search
SEARCH_HOSTS = [{"host": "search-host", "port": 9200}]

# Cache
CACHE_TYPE = "redis"
CACHE_REDIS_URL = "redis://redis:6379/0"

# Celery
CELERY_BROKER_URL = "redis://redis:6379/1"

# Files
# "L" selects the local storage class
FILES_REST_DEFAULT_STORAGE_CLASS = "L"

# DOI Configuration (optional)
DATACITE_ENABLED = False
# DATACITE_USERNAME = "your-username"
# DATACITE_PASSWORD = "your-password"
# DATACITE_PREFIX = "10.xxxxx"

# Mail
MAIL_SUPPRESS_SEND = True
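
The SECRET_KEY placeholder in the configuration above must be replaced before going live. One way to produce a suitable value is Python's standard secrets module. This is a minimal sketch; the helper name generate_secret_key and the 48-byte length are illustrative choices, not InvenioRDM requirements.

```python
import secrets


def generate_secret_key(nbytes: int = 48) -> str:
    """Return a URL-safe random string built from nbytes of OS-level randomness."""
    return secrets.token_urlsafe(nbytes)


# Print one key; copy it into your secret store rather than committing it to Git.
print(generate_secret_key())
```

Generate the key once and keep it stable across deployments, since changing it invalidates existing sessions.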

Environment Variables Reference

Variable	Required	Default	Description
`INVENIO_APP_ALLOWED_HOSTS`	Yes	`['0.0.0.0', 'localhost', '127.0.0.1']`	Allowed hostnames (Python-style list)
`INVENIO_SQLALCHEMY_DATABASE_URI`	Yes	-	PostgreSQL connection string
`INVENIO_SEARCH_HOSTS`	Yes	-	Elasticsearch/OpenSearch hosts
`INVENIO_CACHE_REDIS_URL`	Yes	-	Redis URL for caching
`INVENIO_CELERY_BROKER_URL`	Yes	-	Redis URL for Celery
`INVENIO_SECRET_KEY`	Yes	-	Flask secret key (the `INVENIO_` prefix is required for Invenio to read it from the environment)
`INVENIO_LOGGING_CONSOLE_LEVEL`	No	WARNING	Log level
`INVENIO_MAIL_SUPPRESS_SEND`	No	True	Suppress email sending
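
Invenio reads configuration from environment variables that carry the INVENIO_ prefix: the prefix is stripped, and the value is parsed as a Python literal where possible, which is why list values are written as ['a', 'b']. The sketch below approximates that behavior to show how the values in the table are interpreted. It is an illustration, not Invenio's actual loader, and load_prefixed_env is a hypothetical helper.

```python
import ast


def load_prefixed_env(environ: dict, prefix: str = "INVENIO_") -> dict:
    """Map PREFIX_NAME variables to NAME config keys, parsing Python literals."""
    config = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue  # variables without the prefix are ignored
        key = name[len(prefix):]
        try:
            config[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            config[key] = raw  # not a Python literal: keep the plain string
    return config
```

With this logic, a value such as "True" becomes the boolean True, while a URL such as redis://redis:6379/0 stays a string.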

Deploying InvenioRDM on Klutch.sh

Due to InvenioRDM’s multi-service requirements, deploy each component separately:

Deploy PostgreSQL

Create a PostgreSQL app on Klutch.sh:

Create a new app with PostgreSQL image
Configure persistent volume for data
Note the connection details

Deploy Elasticsearch/OpenSearch

Create a search service app:

Use Elasticsearch or OpenSearch image
Configure persistent volume for indices
Note the connection details

Deploy Redis

Create a Redis app:

Use Redis image
Plan separate database numbers for the cache and the Celery broker (for example /0 and /1)
Note the connection details

Push Your Repository to GitHub

git init
git add Dockerfile invenio.cfg .dockerignore
git commit -m "Initial InvenioRDM deployment configuration"
git remote add origin https://github.com/yourusername/inveniordm-deploy.git
git branch -M main
git push -u origin main

Create the InvenioRDM App

Navigate to the Klutch.sh dashboard and create a new app for InvenioRDM.

Configure HTTP Traffic

In the deployment settings:

Select HTTP as the traffic type
Set the internal port to 5000

Set Environment Variables

Configure all required environment variables:

Variable	Value
`INVENIO_SQLALCHEMY_DATABASE_URI`	Your PostgreSQL URL
`INVENIO_SEARCH_HOSTS`	Your Elasticsearch host
`INVENIO_CACHE_REDIS_URL`	Your Redis URL
`INVENIO_CELERY_BROKER_URL`	Your Redis URL
`INVENIO_SECRET_KEY`	A secure random string
`INVENIO_APP_ALLOWED_HOSTS`	A Python-style list with your domain, e.g. `['repository.example.com']`
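
Database passwords often contain characters such as @ or / that break a connection URL unless they are percent-encoded. The following minimal sketch assembles the PostgreSQL URL from its parts. The helper name build_postgres_uri and the example host are assumptions for illustration.

```python
from urllib.parse import quote_plus


def build_postgres_uri(user: str, password: str, host: str,
                       database: str, port: int = 5432) -> str:
    """Build a PostgreSQL URL, percent-encoding the user name and password."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Example with a password containing reserved characters.
print(build_postgres_uri("invenio", "p@ss/word", "db.example.internal", "inveniordm"))
```

The printed string can be pasted as the value of INVENIO_SQLALCHEMY_DATABASE_URI.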

Attach Persistent Volumes

Add volumes for file storage:

Mount Path	Recommended Size	Purpose
`/opt/invenio/var/instance/data`	100+ GB	Uploaded files
`/opt/invenio/var/instance/static`	5 GB	Static assets

Deploy and Initialize

Deploy the application, then run initialization commands:

invenio db init
invenio db create
invenio index init
invenio files location create --default default /opt/invenio/var/instance/data

Create Admin User

invenio users create admin@example.com --password=yourpassword --active
invenio roles create admin
invenio access allow superuser-access role admin
invenio roles add admin@example.com admin

Managing Your Repository

Creating Records

Log in to InvenioRDM
Click “New Upload”
Fill in metadata (title, creators, description)
Upload files
Choose access settings (open, restricted, embargoed)
Publish the record
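
The same workflow is available through the REST API, where creating a record starts with a POST to /api/records that produces a draft. The sketch below only builds the request. It assumes a personal access token created in the user settings and the default "dataset" resource type; the helper names and all example values are hypothetical.

```python
import json
import urllib.request


def build_draft_payload(title: str, given_name: str, family_name: str,
                        publication_date: str) -> dict:
    """Return a minimal draft body with public access and one personal creator."""
    return {
        "access": {"record": "public", "files": "public"},
        "files": {"enabled": True},
        "metadata": {
            "title": title,
            "publication_date": publication_date,  # ISO date, e.g. "2024-01-31"
            "resource_type": {"id": "dataset"},  # assumes the default vocabulary
            "creators": [
                {
                    "person_or_org": {
                        "type": "personal",
                        "given_name": given_name,
                        "family_name": family_name,
                    }
                }
            ],
        },
    }


def build_create_request(base_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that creates a draft."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/records",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the prepared request to urllib.request.urlopen sends it. Files are then uploaded and the draft is published in separate API calls.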

Communities

Create curated collections:

Go to Communities
Create a new community
Set review policies
Invite curators
Members can submit records for review

DOI Minting

If DataCite is configured:

Records receive DOIs automatically on publication
DOIs persist even if records are updated
Each version receives its own DOI; where concept (parent) DOIs are enabled, the concept DOI always resolves to the latest version

Troubleshooting Common Issues

Search Not Working

Symptoms: Records not appearing in search.

Solutions:

Verify Elasticsearch is running and accessible
Rebuild the search index: invenio rdm-records rebuild-index
Check Elasticsearch logs for errors
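
To verify the first point, both Elasticsearch and OpenSearch expose a _cluster/health endpoint that reports an overall status. This is a minimal sketch, assuming the search service is reachable over plain HTTP without authentication; the function names are illustrative.

```python
import json
import urllib.request


def fetch_cluster_health(search_url: str, timeout: float = 10.0) -> dict:
    """GET <search_url>/_cluster/health and return the decoded JSON body."""
    url = search_url.rstrip("/") + "/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def interpret_health(health: dict) -> str:
    """Turn the cluster status into a short human-readable message."""
    status = health.get("status", "unknown")
    messages = {
        "green": "all primary and replica shards are allocated",
        "yellow": "primaries are allocated but some replicas are not (common on a single node)",
        "red": "at least one primary shard is unallocated; searches may miss records",
    }
    return f"{status}: {messages.get(status, 'unexpected response')}"
```

For example, interpret_health(fetch_cluster_health("http://search:9200")) gives a one-line summary. A red status points at the search service rather than at InvenioRDM.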

Database Errors

Symptoms: Application errors related to database.

Solutions:

Verify PostgreSQL connection string
Run migrations: invenio alembic upgrade
Check database accessibility

File Upload Issues

Symptoms: Cannot upload files.

Solutions:

Verify file storage is configured
Check volume permissions
Review file size limits
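
The first two checks can be scripted and run inside the container. The following minimal sketch reports whether the data directory exists, is writable by the current user, and has free space. The 1 GB threshold and the function name check_storage are arbitrary illustrative choices.

```python
import os
import shutil


def check_storage(path: str, min_free_gb: float = 1.0) -> list:
    """Return a list of problems found with the storage directory (empty if none)."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
        return problems
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.2f} GB free on {path}")
    return problems
```

Running check_storage("/opt/invenio/var/instance/data") against the mounted volume returns an empty list when the volume is usable.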

Conclusion

Deploying InvenioRDM on Klutch.sh enables your institution to run a professional research data repository with the same technology powering CERN’s Zenodo. While the multi-service architecture requires more setup than simpler applications, the result is an enterprise-grade repository supporting FAIR data principles, DOI minting, and comprehensive research data management.