Deploying InvenioRDM

Introduction

InvenioRDM is a turn-key research data management repository developed at CERN (the European Organization for Nuclear Research). Built on the battle-tested Invenio framework that powers CERN’s Zenodo repository, InvenioRDM provides institutions with a robust platform for managing, preserving, and sharing research data and publications.

Designed to meet the needs of modern research workflows, InvenioRDM supports FAIR data principles (Findable, Accessible, Interoperable, Reusable) out of the box. It provides DOI minting, rich metadata support, versioning, communities, and powerful search capabilities: everything an institution needs to run a professional repository.

Key highlights of InvenioRDM:

  • FAIR Data Ready: Built-in support for FAIR data principles
  • DOI Integration: Mint DataCite DOIs for persistent identifiers
  • Rich Metadata: Comprehensive metadata schemas with custom extensions
  • Communities: Organize content into curated collections
  • Access Control: Fine-grained permissions and restricted access options
  • Versioning: Track changes with full version history
  • File Management: Support for large files with integrity checking
  • OAI-PMH: Harvest metadata for aggregators and search engines
  • REST API: Complete API for programmatic access
  • Customizable: Extensive theming and extension capabilities
  • Enterprise Grade: Proven at scale with Zenodo handling millions of records
  • Open Source: MIT licensed with active development by CERN and community

This guide walks through deploying InvenioRDM on Klutch.sh with Docker so that you can run your own research repository.

Why Deploy InvenioRDM on Klutch.sh

Deploying InvenioRDM on Klutch.sh provides several advantages for research data management:

Simplified Deployment: Klutch.sh handles the container orchestration for InvenioRDM’s multi-service architecture.

Persistent Storage: Attach persistent volumes for your database, search index, and file storage. Your research data survives deployments.

HTTPS by Default: Klutch.sh provides automatic SSL certificates, essential for institutional repositories.

GitHub Integration: Connect your configuration repository directly from GitHub. Updates trigger automatic redeployments.

Scalable Resources: Allocate CPU, memory, and storage based on repository size.

Custom Domains: Assign your institutional domain for professional repository URLs.

Prerequisites

Before deploying InvenioRDM on Klutch.sh, ensure you have:

  • A Klutch.sh account
  • A GitHub account with a repository for your InvenioRDM configuration
  • Basic familiarity with Docker and containerization concepts
  • A PostgreSQL database
  • An Elasticsearch/OpenSearch instance
  • Redis for caching and as the Celery message broker
  • (Optional) DataCite credentials for DOI minting

Understanding InvenioRDM Architecture

InvenioRDM requires multiple services:

Web Application: Flask-based Python application serving the UI and API.

PostgreSQL: Primary database for records, users, and metadata.

Elasticsearch/OpenSearch: Search engine for fast record discovery.

Redis: Cache and message broker for background tasks.

Celery Workers: Background job processing for file processing, indexing, etc.

File Storage: Local filesystem or S3-compatible storage for files.

Preparing Your Repository

Because InvenioRDM spans several services, we recommend scaffolding the instance with invenio-cli, the project's official command-line tool, as the Dockerfile below does.

Repository Structure

inveniordm-deploy/
├── Dockerfile
├── docker-compose.yml
├── invenio.cfg
└── .dockerignore

Creating the Dockerfile

Create a Dockerfile for the InvenioRDM application:

FROM python:3.9-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    libpq-dev \
    git \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Install Node.js for frontend assets
RUN curl -fsSL https://deb.nodesource.com/setup_18.x | bash - \
    && apt-get install -y nodejs
# Create application directory
WORKDIR /opt/invenio
# Install invenio-cli
RUN pip install invenio-cli
# Initialize InvenioRDM instance
RUN invenio-cli init rdm --no-input
# Install Python dependencies
WORKDIR /opt/invenio/my-site
RUN pip install -e .
RUN pip install celery redis psycopg2-binary
# Build frontend assets
RUN invenio-cli assets build
# Set default environment variables (override them at runtime in Klutch.sh)
ENV INVENIO_APP_ALLOWED_HOSTS="['0.0.0.0', 'localhost', '127.0.0.1']"
ENV INVENIO_SEARCH_HOSTS="['search:9200']"
# INVENIO_SQLALCHEMY_DATABASE_URI has no sensible default; set it at runtime
# Expose port
EXPOSE 5000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:5000/ping || exit 1
# Run the application
CMD ["invenio", "run", "-h", "0.0.0.0"]

InvenioRDM Configuration (invenio.cfg)

Create an invenio.cfg file:

# InvenioRDM Configuration
# Site name and info
SITE_HOSTNAME = "repository.example.com"
SITE_UI_URL = "https://repository.example.com"
SITE_API_URL = "https://repository.example.com/api"
# Secret key - change this!
SECRET_KEY = "CHANGE_ME_TO_A_SECURE_RANDOM_STRING"
# Database
SQLALCHEMY_DATABASE_URI = "postgresql://user:pass@host/inveniordm"
# Search
SEARCH_HOSTS = [{"host": "search-host", "port": 9200}]
# Cache
CACHE_TYPE = "redis"
CACHE_REDIS_URL = "redis://redis:6379/0"
# Celery
CELERY_BROKER_URL = "redis://redis:6379/1"
# Files ("L" is the local-filesystem storage class)
FILES_REST_DEFAULT_STORAGE_CLASS = "L"
# DOI Configuration (optional)
DATACITE_ENABLED = False
# DATACITE_USERNAME = "your-username"
# DATACITE_PASSWORD = "your-password"
# DATACITE_PREFIX = "10.xxxxx"
# Mail
MAIL_SUPPRESS_SEND = True
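
The SECRET_KEY placeholder in the configuration above has to be replaced with a long random value before the site goes live. One way to produce such a value is sketched below with Python's standard secrets module; the helper name generate_secret_key and the 64-byte length are illustrative choices, not InvenioRDM requirements.

```python
import secrets


def generate_secret_key(num_bytes: int = 64) -> str:
    """Return a URL-safe random string suitable for a Flask SECRET_KEY."""
    # token_urlsafe draws from the operating system's cryptographic random source.
    return secrets.token_urlsafe(num_bytes)


# Print one key so it can be copied into the deployment settings.
print(generate_secret_key())
```

Generate the key once and store it as a deployment secret rather than committing it to the repository; changing it later invalidates existing sessions.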

Environment Variables Reference

| Variable | Required | Default | Description |
|---|---|---|---|
| INVENIO_APP_ALLOWED_HOSTS | Yes | localhost | Allowed hostnames |
| INVENIO_SQLALCHEMY_DATABASE_URI | Yes | - | PostgreSQL connection string |
| INVENIO_SEARCH_HOSTS | Yes | - | Elasticsearch/OpenSearch hosts |
| INVENIO_CACHE_REDIS_URL | Yes | - | Redis URL for caching |
| INVENIO_CELERY_BROKER_URL | Yes | - | Redis URL for Celery |
| INVENIO_SECRET_KEY | Yes | - | Flask secret key (Invenio only reads variables carrying the INVENIO_ prefix) |
| INVENIO_LOGGING_CONSOLE_LEVEL | No | WARNING | Log level |
| INVENIO_MAIL_SUPPRESS_SEND | No | True | Suppress email sending |
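
Invenio copies environment variables that start with INVENIO_ into the application configuration, strips the prefix, and tries to interpret each value as a Python literal. The sketch below imitates that behaviour to show why list-valued settings are written as Python literals; it is an illustration rather than the library's own code, and load_prefixed_config is a hypothetical name.

```python
import ast
from typing import Any, Dict, Mapping


def load_prefixed_config(
    environ: Mapping[str, str], prefix: str = "INVENIO_"
) -> Dict[str, Any]:
    """Map prefixed environment variables to configuration keys."""
    config: Dict[str, Any] = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue  # variables without the prefix are ignored
        key = name[len(prefix):]
        try:
            # "['a', 'b']" becomes a list, "True" becomes a bool, and so on.
            value: Any = ast.literal_eval(raw)
        except (SyntaxError, ValueError):
            # Anything that is not a Python literal (URLs, names) stays a string.
            value = raw
        config[key] = value
    return config
```

Under this scheme a value such as "['repository.example.com']" arrives as a list, while a database URL stays a plain string.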

Deploying InvenioRDM on Klutch.sh

Due to InvenioRDM’s multi-service requirements, deploy each component separately:

    Deploy PostgreSQL

    Create a PostgreSQL app on Klutch.sh:

    1. Create a new app with the PostgreSQL image
    2. Attach a persistent volume for the database files
    3. Note the connection details (host, port, user, password, database name)

    Deploy Elasticsearch/OpenSearch

    Create a search service app:

    1. Use the Elasticsearch or OpenSearch image
    2. Attach a persistent volume for the indices
    3. Note the connection details (host and port)

    Deploy Redis

    Create a Redis app:

    1. Use the Redis image
    2. Plan to use separate database numbers for caching and for the Celery broker
    3. Note the connection details (host and port)
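
The connection details noted for PostgreSQL and Redis end up as URLs in the application's environment. A minimal sketch of assembling them follows; the helper names and the example host names are hypothetical, and credentials are percent-encoded so that characters such as "@" or "/" in a password do not corrupt the URL.

```python
from urllib.parse import quote


def postgres_uri(
    user: str, password: str, host: str, database: str, port: int = 5432
) -> str:
    """Build a PostgreSQL connection string with escaped credentials."""
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"postgresql://{safe_user}:{safe_password}@{host}:{port}/{database}"


def redis_url(host: str, db: int, port: int = 6379) -> str:
    """Build a Redis URL for the given logical database number."""
    return f"redis://{host}:{port}/{db}"


# Example values only; replace them with the details from your own apps.
print(postgres_uri("invenio", "p@ss/word", "postgres.internal", "inveniordm"))
print(redis_url("redis.internal", 0))  # cache
print(redis_url("redis.internal", 1))  # Celery broker
```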

    Push Your Repository to GitHub

    git init
    git add Dockerfile invenio.cfg .dockerignore
    git commit -m "Initial InvenioRDM deployment configuration"
    git branch -M main
    git remote add origin https://github.com/yourusername/inveniordm-deploy.git
    git push -u origin main

    Create the InvenioRDM App

    Navigate to the Klutch.sh dashboard and create a new app for InvenioRDM.

    Configure HTTP Traffic

    In the deployment settings:

    • Select HTTP as the traffic type
    • Set the internal port to 5000

    Set Environment Variables

    Configure all required environment variables:

    | Variable | Value |
    |---|---|
    | INVENIO_SQLALCHEMY_DATABASE_URI | Your PostgreSQL URL |
    | INVENIO_SEARCH_HOSTS | Your Elasticsearch/OpenSearch host |
    | INVENIO_CACHE_REDIS_URL | Your Redis URL |
    | INVENIO_CELERY_BROKER_URL | Your Redis URL |
    | INVENIO_SECRET_KEY | A secure random string |
    | INVENIO_APP_ALLOWED_HOSTS | Your domain |
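
Before the first deployment it can help to confirm that every required variable is actually set. The check below is a minimal sketch; missing_variables is a hypothetical helper, and the secret key is listed as INVENIO_SECRET_KEY on the assumption that Invenio only reads variables carrying the INVENIO_ prefix.

```python
import os
from typing import List, Mapping, Sequence

# Names taken from the table above (secret key with the assumed prefix).
REQUIRED_VARIABLES = (
    "INVENIO_SQLALCHEMY_DATABASE_URI",
    "INVENIO_SEARCH_HOSTS",
    "INVENIO_CACHE_REDIS_URL",
    "INVENIO_CELERY_BROKER_URL",
    "INVENIO_SECRET_KEY",
    "INVENIO_APP_ALLOWED_HOSTS",
)


def missing_variables(
    environ: Mapping[str, str],
    required: Sequence[str] = REQUIRED_VARIABLES,
) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in required if not environ.get(name, "").strip()]


# Report on the current process environment without aborting.
print("Missing variables:", missing_variables(os.environ) or "none")
```

Run inside the container, an empty result means every required variable is present; it says nothing about whether the values are correct.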

    Attach Persistent Volumes

    Add volumes for file storage:

    | Mount Path | Recommended Size | Purpose |
    |---|---|---|
    | /opt/invenio/var/instance/data | 100+ GB | Uploaded files |
    | /opt/invenio/var/instance/static | 5 GB | Static assets |

    Deploy and Initialize

    Deploy the application, then run initialization commands:

    invenio db init
    invenio db create
    invenio index init
    invenio files location create --default default /opt/invenio/var/instance/data

    Create Admin User

    invenio users create admin@example.com --password=yourpassword --active
    invenio roles create admin
    invenio access allow superuser-access role admin
    invenio roles add admin@example.com admin

Managing Your Repository

Creating Records

  1. Log in to InvenioRDM
  2. Click “New Upload”
  3. Fill in metadata (title, creators, description)
  4. Upload files
  5. Choose access settings (open, restricted, embargoed)
  6. Publish the record
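
The same workflow can be scripted through the REST API. The sketch below builds a draft payload and prepares, without sending, a POST request to /api/records. The payload layout follows the InvenioRDM records API as commonly documented, so check it against the API reference for your version; the base URL, token, and metadata values are placeholders, and both function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict


def build_draft_payload(
    title: str,
    family_name: str,
    given_name: str,
    publication_date: str,
    resource_type: str = "dataset",
) -> Dict[str, Any]:
    """Assemble the minimal metadata for a public draft record."""
    return {
        "access": {"record": "public", "files": "public"},
        "files": {"enabled": True},
        "metadata": {
            "title": title,
            "publication_date": publication_date,
            "resource_type": {"id": resource_type},
            "creators": [
                {
                    "person_or_org": {
                        "type": "personal",
                        "family_name": family_name,
                        "given_name": given_name,
                    }
                }
            ],
        },
    }


def build_create_request(
    base_url: str, token: str, payload: Dict[str, Any]
) -> urllib.request.Request:
    """Prepare (but do not send) the request that creates a draft."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/records",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the prepared request to urllib.request.urlopen would submit it; file upload and publishing are separate API calls made against the resulting draft.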

Communities

Create curated collections:

  1. Go to Communities
  2. Create a new community
  3. Set review policies
  4. Invite curators
  5. Members can submit records for review

DOI Minting

If DataCite is configured:

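  1. Records receive DOIs automatically on publication
  2. A record's DOI stays the same when its metadata is edited
  3. Each new version receives its own DOI; in releases that support concept (parent) DOIs, the concept DOI resolves to the latest version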

Troubleshooting Common Issues

Search Not Working

Symptoms: Records not appearing in search.

Solutions:

  • Verify Elasticsearch/OpenSearch is running and reachable from the application
  • Rebuild the record index: invenio rdm-records rebuild-index
  • Check the Elasticsearch/OpenSearch logs for errors
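
A quick way to verify the search service is its cluster health endpoint, /_cluster/health, which Elasticsearch and OpenSearch both expose. The sketch below fetches and interprets that response; the function names are hypothetical, and the fetch assumes an unauthenticated HTTP endpoint, which may not match your setup.

```python
import json
import urllib.request
from typing import Any, Mapping, Tuple


def fetch_cluster_health(base_url: str, timeout: float = 10.0) -> Mapping[str, Any]:
    """Fetch the cluster health document from the search service."""
    url = base_url.rstrip("/") + "/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def interpret_cluster_health(health: Mapping[str, Any]) -> Tuple[bool, str]:
    """Turn the reported status into a usable/unusable verdict."""
    status = health.get("status", "unknown")
    if status == "green":
        return True, "all shards are allocated"
    if status == "yellow":
        # Typical on a single node: primaries are fine, replicas have no home.
        return True, "primary shards are allocated, some replicas are not"
    if status == "red":
        return False, "at least one primary shard is unavailable"
    return False, f"unexpected status: {status}"
```

A yellow status is normal for a single-node deployment and does not by itself explain missing search results.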

Database Errors

Symptoms: Application errors related to database.

Solutions:

  • Verify PostgreSQL connection string
  • Run migrations: invenio alembic upgrade
  • Check database accessibility
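
Many database errors trace back to a malformed connection string. The sketch below inspects a URI for common omissions without connecting to anything; database_uri_problems is a hypothetical helper, and the rules it applies (a postgresql scheme, a host, a user, and a database name) are assumptions that suit this guide's setup.

```python
from typing import List
from urllib.parse import urlsplit


def database_uri_problems(uri: str) -> List[str]:
    """Return human-readable problems found in a PostgreSQL URI."""
    problems: List[str] = []
    parts = urlsplit(uri)
    # Accept driver-qualified schemes such as "postgresql+psycopg2".
    if not parts.scheme.startswith("postgresql"):
        problems.append(f"scheme is '{parts.scheme}', expected 'postgresql'")
    if not parts.hostname:
        problems.append("no host name")
    if not parts.username:
        problems.append("no user name")
    if not parts.path.strip("/"):
        problems.append("no database name")
    return problems


# The placeholder URI from invenio.cfg is well formed, so nothing is reported.
print(database_uri_problems("postgresql://user:pass@host/inveniordm"))
```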

File Upload Issues

Symptoms: Cannot upload files.

Solutions:

  • Verify file storage is configured
  • Check volume permissions
  • Review file size limits
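
Volume permission problems can be confirmed by trying to create a file in the data directory as the user that runs the application. A minimal sketch, with check_writable as a hypothetical helper name:

```python
import os
import tempfile


def check_writable(directory: str) -> bool:
    """Return True if a temporary file can be created in the directory."""
    if not os.path.isdir(directory):
        return False
    try:
        # The temporary file is deleted as soon as the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
    except OSError:
        return False
    return True


# Path taken from the volume table in this guide.
print(check_writable("/opt/invenio/var/instance/data"))
```

A False result inside the container usually means the volume is not mounted at that path or is owned by a different user.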

Conclusion

Deploying InvenioRDM on Klutch.sh enables your institution to run a professional research data repository with the same technology powering CERN’s Zenodo. While the multi-service architecture requires more setup than simpler applications, the result is an enterprise-grade repository supporting FAIR data principles, DOI minting, and comprehensive research data management.