Deploying DSpace
DSpace is the world’s leading open-source repository platform for research outputs, publications, datasets, and learning resources. Originally developed by MIT Libraries and HP Labs, and now stewarded by LYRASIS (which merged with DuraSpace in 2019), DSpace powers digital repositories for thousands of academic institutions, research organizations, government agencies, and cultural heritage institutions worldwide. The platform provides robust tools for ingesting, preserving, indexing, and distributing digital content while ensuring long-term preservation and accessibility.
What makes DSpace particularly powerful is its comprehensive feature set designed for institutional repositories. The system supports complex metadata schemas including Dublin Core, qualified Dublin Core, and custom metadata fields. It handles a wide variety of digital content formats from PDFs and images to datasets and multimedia files. Features like customizable workflows, embargo periods, access controls, OAI-PMH harvesting, DOI minting, and integration with authentication systems make DSpace a complete solution for managing institutional knowledge.
Why Deploy DSpace on Klutch.sh?
Klutch.sh provides an excellent platform for hosting DSpace with several key advantages:
- Simple Docker Deployment: Deploy your Dockerfile and Klutch.sh automatically handles containerization and orchestration
- Persistent Storage: Attach volumes for content storage (assetstore), database, and Solr indexes with guaranteed durability
- Automatic HTTPS: All deployments come with automatic SSL certificates for secure repository access
- Resource Scalability: Scale CPU and memory resources as your repository grows
- Database Integration: Run PostgreSQL alongside DSpace with persistent storage
- Cost-Effective: Pay only for resources used, scale based on repository size
- Zero Server Management: Focus on curating content, not managing infrastructure
Prerequisites
Before deploying DSpace, ensure you have:
- A Klutch.sh account (sign up at klutch.sh)
- Git installed locally
- Basic understanding of institutional repositories and digital preservation
- Familiarity with Docker and container concepts
- PostgreSQL knowledge for database management
- Understanding of metadata standards (Dublin Core, etc.)
- Java and Tomcat knowledge (helpful but not required)
Understanding DSpace’s Architecture
DSpace uses a multi-tier architecture designed for scalability and long-term digital preservation:
Core Components
DSpace Backend (REST API): Built with Spring Boot, the backend provides:
- RESTful API for all repository operations
- Content ingestion and management
- Metadata handling and validation
- Workflow processing
- Authorization and authentication
- OAI-PMH data provider
- Batch import/export functionality
DSpace Frontend (Angular): Modern single-page application offering:
- User-friendly interface for browsing and searching
- Submission workflows for depositing content
- Administrative interfaces
- Customizable themes and branding
- Responsive design for mobile devices
- Internationalization support
PostgreSQL Database: Stores all repository data including:
- Item metadata
- Bitstream metadata and mappings
- Community and collection structure
- User accounts and permissions
- Workflow state
- Handle assignments
- Statistics and usage data
Assetstore: File system storage for digital objects:
- Original uploaded files (bitstreams)
- Generated thumbnails and derivatives
- Preserved formats
- Organized by internal ID for efficient access
- Supports multiple assetstore locations
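The “organized by internal ID” layout can be made concrete with a short sketch. The traditional DSpace assetstore nests each bitstream under three two-character directories taken from the start of its internal ID; the function below assumes that layout (the ID value is made up for illustration):

```shell
#!/bin/sh
# Sketch: map a bitstream's internal ID to its location in a traditional
# DSpace assetstore, which splits the first six characters of the ID into
# three subdirectory levels (layout assumed; ID is hypothetical).
assetstore_path() {
  id="$1"
  printf '%s/%s/%s/%s\n' \
    "$(printf '%s' "$id" | cut -c1-2)" \
    "$(printf '%s' "$id" | cut -c3-4)" \
    "$(printf '%s' "$id" | cut -c5-6)" \
    "$id"
}

assetstore_path "128967431829"
# 12/89/67/128967431829
```

This fan-out keeps any single directory from accumulating millions of files, which matters for filesystem performance as a repository grows.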
Solr Search Index: Apache Solr provides:
- Full-text search across metadata and content
- Faceted browsing by date, author, subject
- Statistics and reporting
- Authority control for names and subjects
- Configurable relevance ranking
Handle Server (Optional): Persistent identifier system:
- Assigns permanent URLs to items
- Ensures long-term accessibility
- Integrates with global Handle system
- Supports custom handle prefixes
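To illustrate how persistent identifiers work, a handle is simply a prefix/suffix pair resolved through the global Handle system. A minimal sketch (the prefix `123456789` is DSpace’s sample default; a registered prefix replaces it in production):

```shell
#!/bin/sh
# Sketch: build the globally resolvable URL for an item from its handle
# prefix and suffix (123456789 is the DSpace sample prefix, not a real one).
handle_url() {
  prefix="$1"; suffix="$2"
  printf 'https://hdl.handle.net/%s/%s\n' "$prefix" "$suffix"
}

handle_url 123456789 42
# https://hdl.handle.net/123456789/42
```

Because the handle, not the repository hostname, is what gets cited, items stay resolvable even if the repository later moves to a different URL.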
Content Model
Communities: Top-level organizational units
- Represent departments, research centers, or topic areas
- Can contain sub-communities
- Define authorization policies
- Support custom logos and descriptions
Collections: Groupings of related items
- Belong to communities
- Define metadata schemas
- Configure submission workflows
- Set access policies and embargo rules
- Support collection-specific branding
Items: Individual repository objects
- Consist of metadata and bitstreams
- Immutable after archiving (versioning supported)
- Assigned unique handles
- Support relationships to other items
- Can be versioned for updates
Bitstreams: Actual files attached to items
- Original submission files
- License agreements
- Generated thumbnails
- Format identification via PRONOM
- Checksum verification for integrity
Workflow System
Submission Workflows: Configurable multi-step process
- Describe item with metadata
- Upload files
- Verify submission
- License agreement
- Optional review steps
- Automated notifications
Review Workflows: Optional curation process
- Reviewers assigned by collection
- Approve, reject, or request changes
- Track submission history
- Email notifications
- Batch processing capabilities
Authentication and Authorization
Authentication Options:
- Local database authentication
- LDAP/Active Directory integration
- Shibboleth/SAML single sign-on
- ORCID authentication
- OAuth providers
- IP-based authentication
Authorization Model:
- Resource policies define access
- Actions: READ, WRITE, ADD, REMOVE, ADMIN
- Inherited from communities/collections
- Item-level overrides supported
- Time-based embargoes
- Group-based permissions
Preservation Features
Format Identification: Automatic format detection using PRONOM registry
Checksum Verification: Regular integrity checks on stored files
Metadata Preservation: Support for Dublin Core and custom schemas
Export Capabilities: AIP, METS, DSpace Intermediary Format
Version Control: Track changes to items over time
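The checksum verification feature boils down to recomputing a stored file’s digest and comparing it with the value recorded at ingest. A minimal sketch of that comparison, assuming `md5sum` is available (DSpace records MD5 checksums by default):

```shell
#!/bin/sh
# Sketch: the core of an integrity check -- recompute a bitstream's MD5
# and compare it against the checksum stored at ingest time.
verify_bitstream() {
  file="$1"; stored="$2"
  current="$(md5sum "$file" | awk '{print $1}')"
  if [ "$current" = "$stored" ]; then
    echo "OK: $file"
  else
    echo "CORRUPT: $file (stored=$stored current=$current)"
    return 1
  fi
}

# Demo with a throwaway file standing in for a bitstream
printf 'hello' > /tmp/bitstream.bin
verify_bitstream /tmp/bitstream.bin "$(md5sum /tmp/bitstream.bin | awk '{print $1}')"
# prints: OK: /tmp/bitstream.bin
```

In production you would rely on DSpace’s built-in checksum checker (`dspace checker`) rather than a hand-rolled script; this just shows what that job is doing on each file.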
Installation and Setup
Step 1: Create the Dockerfile
Create a Dockerfile in your project root:
```dockerfile
FROM dspace/dspace:7.6

# Set environment variables
ENV DSPACE_INSTALL_DIR=/dspace \
    DSPACE_CFG=/dspace/config/local.cfg \
    CATALINA_HOME=/usr/local/tomcat \
    JAVA_OPTS="-Xmx2048m -XX:MaxMetaspaceSize=512m"

# Install additional utilities
USER root
RUN apt-get update && apt-get install -y \
    curl \
    vim \
    wget \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*

# Create necessary directories
RUN mkdir -p /dspace/assetstore \
    /dspace/solr \
    /dspace/upload \
    /dspace/reports \
    /dspace/log

# Set proper permissions
RUN chown -R dspace:dspace /dspace

# Copy custom configuration
COPY local.cfg /dspace/config/local.cfg
COPY dspace.cfg /dspace/config/dspace.cfg

# Switch back to dspace user
USER dspace

# Expose ports
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=120s --retries=3 \
    CMD curl -f http://localhost:8080/server/api || exit 1

# Start DSpace
CMD ["catalina.sh", "run"]
```

Step 2: Create DSpace Configuration
Create local.cfg for environment-specific settings:
```properties
# Database Configuration
db.url = jdbc:postgresql://postgres-host.klutch.sh:8000/dspace
db.username = dspace
db.password = your_secure_password
db.schema = public

# DSpace Installation Directory
dspace.dir = /dspace

# DSpace Server Configuration
dspace.server.url = https://your-app.klutch.sh
dspace.ui.url = https://your-app.klutch.sh
dspace.name = My Institution Repository

# Assetstore Configuration
assetstore.dir = ${dspace.dir}/assetstore
assetstore.incoming = 0

# Solr Configuration
solr.server = http://localhost:8983/solr

# Handle Configuration
handle.canonical.prefix = ${dspace.server.url}/handle/
handle.prefix = 123456789

# Email Configuration
mail.server = smtp.gmail.com
mail.server.port = 587
mail.server.username = your-email@gmail.com
mail.server.password = your-app-password
mail.from.address = noreply@your-institution.edu
mail.feedback.recipient = repository@your-institution.edu
mail.admin = admin@your-institution.edu
mail.server.disabled = false
mail.extraproperties = mail.smtp.auth=true, \
    mail.smtp.starttls.enable=true

# Authentication Configuration
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.PasswordAuthentication

# Authorization Configuration
webui.strengths.show = true

# Upload Settings
upload.max = 536870912
upload.temp.dir = ${dspace.dir}/upload

# Thumbnail Settings
thumbnail.maxwidth = 300
thumbnail.maxheight = 300

# Batch Import/Export
dspace-api.content.export.download.dir = ${dspace.dir}/exports

# Statistics
usage-statistics.dbfile = ${dspace.dir}/log/dspace-stats.db
solr-statistics.server = ${solr.server}/statistics

# Google Analytics (optional)
google.analytics.key =

# Curation System
curate.ui.taskqueue.dir = ${dspace.dir}/ctqueues
plugin.named.org.dspace.curate.CurationTask = \
    org.dspace.ctask.general.ProfileFormats = profileformats, \
    org.dspace.ctask.general.RequiredMetadata = requiredmetadata

# OAI-PMH Configuration
oai.solr.url = ${solr.server}/oai
oai.identifier.prefix = oai:${dspace.server.url}:

# Content Bitstream Store
default.bitstream.store = 0
store.number = 0
store.dir = ${assetstore.dir}

# Media Filter Configuration
filter.plugins = \
    PDF Text Extractor, \
    HTML Text Extractor, \
    Word Text Extractor, \
    JPEG Thumbnail, \
    PDF Thumbnail

# Submission Configuration
submission.lookup.scopus.apikey =
submission.lookup.crossref.email =

# SWORD Configuration (optional)
sword-server.on-behalf-of.enable = true
sword-server.workflowdefault = false

# Logging
log.init.config = ${dspace.dir}/config/log4j2.xml
log.dir = ${dspace.dir}/log

# IIIF Configuration (optional)
iiif.enabled = false

# Enable Identifiers (DOI, Handle, etc.)
identifier.doi.prefix = 10.5072
identifier.doi.namespaceseparator = /
```

Create dspace.cfg for additional configuration:
```properties
# Additional DSpace Configuration

# File Upload Settings
webui.submit.upload.required = true
webui.submit.upload.resume = true

# Search Configuration
search.index.1 = author:dc.contributor.*
search.index.2 = author:dc.creator
search.index.3 = title:dc.title
search.index.4 = keyword:dc.subject
search.index.5 = abstract:dc.description.abstract
search.index.6 = series:dc.relation.ispartofseries
search.index.7 = sponsor:dc.description.sponsorship
search.index.8 = identifier:dc.identifier.*
search.index.9 = language:dc.language.iso

# Browse Indexes
webui.browse.index.1 = dateissued:item:dateissued
webui.browse.index.2 = author:metadata:dc.contributor.*,dc.creator:text
webui.browse.index.3 = title:item:title
webui.browse.index.4 = subject:metadata:dc.subject.*:text

# Item Display
webui.itemdisplay.default = dc.title, dc.title.alternative, dc.contributor.*, \
    dc.subject, dc.date.issued(date), dc.publisher, dc.identifier.citation, \
    dc.relation.ispartofseries, dc.description.abstract, dc.description, \
    dc.identifier.govdoc, dc.identifier.uri(link), dc.identifier.isbn, \
    dc.identifier.issn, dc.identifier.ismn, dc.language.iso(language), \
    dc.type

# Metadata Registry
dublin.core.types = dc

# Workflow Settings
workflow.reviewer.notify = true
workflow.admin.notify = true

# Statistics Configuration
usage-statistics.authorization.admin.usage = true

# Format Support Registry
webui.submit.upload.maxsize = 512000000
```

Step 3: Create Database Initialization Script
Create init-dspace-db.sh:
```bash
#!/bin/bash
set -e

echo "Waiting for PostgreSQL to be ready..."
until PGPASSWORD=$DB_PASSWORD psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -c '\q' 2>/dev/null; do
  echo "PostgreSQL not ready, waiting..."
  sleep 2
done

echo "PostgreSQL is ready!"

# Create DSpace database and user
PGPASSWORD=$DB_ROOT_PASSWORD psql -h "$DB_HOST" -p "$DB_PORT" -U postgres <<EOF
SELECT 'CREATE DATABASE dspace' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'dspace')\gexec
SELECT 'CREATE USER dspace WITH PASSWORD ''$DB_PASSWORD''' WHERE NOT EXISTS (SELECT FROM pg_user WHERE usename = 'dspace')\gexec
GRANT ALL PRIVILEGES ON DATABASE dspace TO dspace;
EOF

echo "Database initialized!"

# Check if schema needs initialization
TABLE_COUNT=$(PGPASSWORD=$DB_PASSWORD psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d dspace -t -c "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema='public';" 2>/dev/null || echo "0")

if [ "$TABLE_COUNT" -eq 0 ]; then
  echo "Initializing DSpace database schema..."
  cd /dspace
  ./bin/dspace database migrate
  echo "Database schema initialized!"

  echo "Creating initial administrator..."
  ./bin/dspace create-administrator -e admin@example.com -f Admin -l User -p admin -c en
  echo "Administrator created (change password immediately!)"
else
  echo "Database already initialized"
fi
```

Step 4: Create Solr Configuration
Create Dockerfile.solr for Solr search:
```dockerfile
FROM solr:8.11

# Set environment variables
ENV SOLR_HOME=/var/solr/data

USER root

# Copy DSpace Solr cores
RUN mkdir -p /opt/solr/server/solr/configsets/dspace/conf
COPY --chown=solr:solr solr-cores/ /opt/solr/server/solr/configsets/dspace/

USER solr

# Create DSpace cores
RUN solr start -force && \
    solr create_core -c search -d /opt/solr/server/solr/configsets/dspace && \
    solr create_core -c statistics -d /opt/solr/server/solr/configsets/dspace && \
    solr create_core -c oai -d /opt/solr/server/solr/configsets/dspace && \
    solr stop

EXPOSE 8983

CMD ["solr-foreground"]
```

Step 5: Create Docker Compose for Local Development
Create docker-compose.yml:
```yaml
version: '3.8'

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_DB: dspace
      POSTGRES_USER: dspace
      POSTGRES_PASSWORD: dspace
      PGDATA: /var/lib/postgresql/data/pgdata
    volumes:
      - postgres-data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    restart: unless-stopped

  solr:
    build:
      context: .
      dockerfile: Dockerfile.solr
    volumes:
      - solr-data:/var/solr/data
    ports:
      - "8983:8983"
    restart: unless-stopped

  dspace:
    build:
      context: .
      dockerfile: Dockerfile
    depends_on:
      - postgres
      - solr
    environment:
      DSPACE_INSTALL_DIR: /dspace
      DB_HOST: postgres
      DB_PORT: 5432
      DB_USER: dspace
      DB_PASSWORD: dspace
      DB_ROOT_PASSWORD: postgres
      SOLR_HOST: solr
    volumes:
      - dspace-assetstore:/dspace/assetstore
      - dspace-logs:/dspace/log
      - dspace-exports:/dspace/exports
    ports:
      - "8080:8080"
    restart: unless-stopped

volumes:
  postgres-data:
  solr-data:
  dspace-assetstore:
  dspace-logs:
  dspace-exports:
```

Step 6: Create Management Scripts
Create maintenance.sh for common tasks:
```bash
#!/bin/bash

DSPACE_DIR="/dspace"

case "$1" in
  reindex)
    echo "Reindexing DSpace..."
    $DSPACE_DIR/bin/dspace index-discovery
    ;;
  filter-media)
    echo "Running media filter..."
    $DSPACE_DIR/bin/dspace filter-media
    ;;
  cleanup)
    echo "Cleaning up old searches and deleted items..."
    $DSPACE_DIR/bin/dspace cleanup
    ;;
  stats)
    echo "Generating statistics..."
    $DSPACE_DIR/bin/dspace stats-util -i
    ;;
  backup)
    BACKUP_DIR="/backups/dspace_$(date +%Y%m%d_%H%M%S)"
    mkdir -p $BACKUP_DIR
    echo "Backing up assetstore..."
    tar -czf $BACKUP_DIR/assetstore.tar.gz $DSPACE_DIR/assetstore
    echo "Backing up database..."
    pg_dump -h $DB_HOST -U $DB_USER dspace > $BACKUP_DIR/dspace.sql
    echo "Backup complete: $BACKUP_DIR"
    ;;
  checksum)
    echo "Verifying checksums..."
    $DSPACE_DIR/bin/dspace checker -l -p
    ;;
  *)
    echo "Usage: $0 {reindex|filter-media|cleanup|stats|backup|checksum}"
    exit 1
    ;;
esac
```

Step 7: Initialize Git Repository
```bash
git init
git add Dockerfile Dockerfile.solr local.cfg dspace.cfg init-dspace-db.sh docker-compose.yml maintenance.sh
git commit -m "Initial DSpace deployment configuration"
```

Step 8: Test Locally
Before deploying to Klutch.sh, test locally:
```bash
# Build and start containers
docker-compose up -d

# Wait for services to start (may take 5-10 minutes)
docker-compose logs -f dspace

# Initialize database
docker-compose exec dspace bash /dspace/init-dspace-db.sh

# Access DSpace at http://localhost:8080
# Default admin: admin@example.com / admin
```

Deploying to Klutch.sh
Step 1: Deploy PostgreSQL Database
First, deploy a PostgreSQL instance:
- Navigate to klutch.sh/app
- Click "New Project"
- Select PostgreSQL or use a custom Dockerfile
- Configure database:
  - Database name: `dspace`
  - Username: `dspace`
  - Password: Create a secure password
  - Root password: Create a secure root password
- Select **TCP** as traffic type
- Set internal port to **5432**
- Add persistent storage with mount path: `/var/lib/postgresql/data` and size: `50GB`
- Note the connection details (hostname like `postgres-app.klutch.sh:8000`)
Step 2: Deploy Solr Search Engine
Deploy Solr for search functionality:
- Create a new project in Klutch.sh
- Push your Solr Dockerfile to a GitHub repository
- Import the repository to Klutch.sh
- Configure Solr:
  - Select **HTTP** as traffic type
  - Set internal port to **8983**
  - Add persistent storage with mount path: `/var/solr/data` and size: `20GB`
- Note the Solr URL (like `https://solr-app.klutch.sh`)
Step 3: Push DSpace Repository to GitHub
Create a new repository and push:
```bash
git remote add origin https://github.com/yourusername/dspace-klutch.git
git branch -M master
git push -u origin master
```

Step 4: Deploy DSpace to Klutch.sh
- Navigate to klutch.sh/app
- Click "New Project" and select "Import from GitHub"
- Authorize Klutch.sh to access your GitHub repositories
- Select your DSpace repository
- Klutch.sh will automatically detect the Dockerfile
Step 5: Configure DSpace Traffic Settings
- In the project settings, select **HTTP** as the traffic type
- Set the internal port to **8080**
- Klutch.sh will automatically provision an HTTPS endpoint
Step 6: Add Persistent Storage for DSpace
DSpace requires persistent storage for content and logs:
- In your project settings, navigate to the "Storage" section
- Add a volume with mount path: `/dspace/assetstore` and size: `100GB` (for digital objects)
- Add a volume with mount path: `/dspace/log` and size: `10GB` (for logs)
- Add a volume with mount path: `/dspace/exports` and size: `20GB` (for batch exports)
Storage recommendations:
- Small repository (< 10,000 items): 50GB assetstore
- Medium repository (10,000-100,000 items): 200GB assetstore
- Large repository (100,000+ items): 500GB+ assetstore
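As a rough cross-check on the tiers above, assetstore needs can be estimated from item count and average file size. The sketch below is a back-of-envelope calculation; the 1.3 overhead multiplier for thumbnails and extracted text is an assumption, not a DSpace figure:

```shell
#!/bin/sh
# Sketch: rough assetstore sizing in GB from item count and average
# bitstream size in MB, with an assumed 1.3x overhead for derivatives
# (thumbnails, extracted text). Integer math only.
estimate_assetstore_gb() {
  items="$1"; avg_mb="$2"
  echo $(( items * avg_mb * 13 / 10 / 1024 ))
}

estimate_assetstore_gb 10000 5
# prints 63  (about 63 GB for 10,000 items averaging 5 MB each)
```

Size volumes with headroom above the estimate, since resizing a production volume is more disruptive than over-provisioning up front.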
Step 7: Configure DSpace Environment Variables
Add the following environment variables in Klutch.sh dashboard:
- `DB_HOST`: Your PostgreSQL hostname (e.g., `postgres-app.klutch.sh`)
- `DB_PORT`: `8000` (external TCP port)
- `DB_USER`: `dspace`
- `DB_PASSWORD`: Your database password
- `DB_ROOT_PASSWORD`: Your root password
- `SOLR_HOST`: Your Solr hostname (e.g., `solr-app.klutch.sh`)
- `DSPACE_INSTALL_DIR`: `/dspace`
- `JAVA_OPTS`: `-Xmx4096m -XX:MaxMetaspaceSize=1024m` (adjust based on resources)
Step 8: Update Configuration Files
Before deploying, update local.cfg with your actual URLs:
```properties
# Update these values
dspace.server.url = https://your-app.klutch.sh
dspace.ui.url = https://your-app.klutch.sh
db.url = jdbc:postgresql://postgres-app.klutch.sh:8000/dspace
solr.server = https://solr-app.klutch.sh/solr
```

Commit and push changes:
```bash
git add local.cfg
git commit -m "Update configuration for Klutch.sh deployment"
git push
```

Step 9: Deploy DSpace
- Review your configuration settings in Klutch.sh
- Click "Deploy" to start the deployment
- Monitor build logs for any errors
- Wait for initialization (first deployment takes 10-15 minutes)
- Once deployed, DSpace will be available at `your-app.klutch.sh`
Step 10: Initialize Database and Create Admin User
After first deployment, initialize the database:
- Access the DSpace container terminal in Klutch.sh
- Run the initialization script:
  ```bash
  bash /dspace/init-dspace-db.sh
  ```
- Create administrator account:
  ```bash
  /dspace/bin/dspace create-administrator -e admin@yourinstitution.edu -f Admin -l User -p your_secure_password -c en
  ```
Getting Started with DSpace
Initial Setup
After deployment, complete the initial configuration:
- Access DSpace: Navigate to `https://your-app.klutch.sh`
- Login as Administrator:
  - Click “Log In” in the top navigation
  - Enter administrator credentials
  - You’ll be redirected to the admin dashboard
- Configure Basic Settings:
  - Navigate to “Administration” → “Edit Configuration”
  - Update repository name and description
  - Set contact information
  - Configure institutional logo
- Set Up Email Notifications:
  - Verify SMTP settings in `local.cfg`
  - Test email delivery
  - Configure notification templates
Creating Community Structure
Organize your repository with communities and collections:
- Navigate to "Administration" → "Communities & Collections"
- Click "Create Top-Level Community"
- Configure community:
  - **Name**: "School of Engineering"
  - **Short Description**: Brief overview
  - **Introductory Text**: Detailed description
  - **Copyright Text**: Rights statement
  - **Logo**: Upload community logo (optional)
- Click "Create" to save community
- Within the community, create sub-communities or collections
Creating Collections
Add collections to organize items:
- Navigate to a community
- Click "Create Collection"
- Configure collection:
  - **Name**: "Department of Computer Science Theses"
  - **Short Description**: Collection overview
  - **Introductory Text**: Detailed information
  - **License**: Default submission license
  - **Provenance**: Provenance statement
- Configure submission settings:
  - Select input forms
  - Define workflow steps
  - Set access policies
- Click "Create" to save collection
Configuring Metadata Schemas
Customize metadata fields for your institution:
- Navigate to "Administration" → "Registries" → "Metadata Registry"
- View existing Dublin Core fields
- Add custom fields:
  - Click "Add Field"
  - Select schema (dc, dcterms, or custom)
  - Enter element name (e.g., "department")
  - Enter qualifier (optional, e.g., "sponsor")
  - Add scope note for guidance
- Save custom field
Common custom fields:
- `dc.contributor.advisor` - Thesis advisor
- `dc.degree.name` - Degree name (e.g., Ph.D.)
- `dc.degree.level` - Degree level (Masters, Doctoral)
- `dc.degree.discipline` - Field of study
- `dc.degree.grantor` - Degree-granting institution
- `dc.identifier.doi` - Digital Object Identifier
- `dc.identifier.orcid` - ORCID iD

Configuring Submission Forms
Customize input forms for metadata collection:
- Edit `/dspace/config/submission-forms.xml`
- Define form fields for each collection type
- Example thesis submission form:
```xml
<form name="thesis">
  <page number="1">
    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>title</dc-element>
      <dc-qualifier></dc-qualifier>
      <required>true</required>
      <label>Title</label>
      <input-type>onebox</input-type>
      <hint>Enter the full title of your thesis</hint>
    </field>

    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>contributor</dc-element>
      <dc-qualifier>author</dc-qualifier>
      <required>true</required>
      <label>Author</label>
      <input-type>name</input-type>
      <hint>Enter the author's name</hint>
    </field>

    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>contributor</dc-element>
      <dc-qualifier>advisor</dc-qualifier>
      <required>true</required>
      <label>Thesis Advisor</label>
      <input-type>name</input-type>
      <hint>Enter advisor's name</hint>
    </field>

    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>date</dc-element>
      <dc-qualifier>issued</dc-qualifier>
      <required>true</required>
      <label>Date of Issue</label>
      <input-type>date</input-type>
      <hint>Enter publication date</hint>
    </field>

    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>subject</dc-element>
      <dc-qualifier></dc-qualifier>
      <required>false</required>
      <label>Keywords</label>
      <input-type>twobox</input-type>
      <hint>Enter keywords, one per line</hint>
      <repeatable>true</repeatable>
    </field>

    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>description</dc-element>
      <dc-qualifier>abstract</dc-qualifier>
      <required>true</required>
      <label>Abstract</label>
      <input-type>textarea</input-type>
      <hint>Enter thesis abstract</hint>
    </field>
  </page>
</form>
```

Submitting Your First Item
Test the submission workflow:
- Navigate to a collection
- Click "Submit to this Collection"
- Follow the submission steps:
  - **Select Type**: Choose item type (Article, Thesis, etc.)
  - **Describe**: Enter metadata
  - **Upload**: Add files (PDF, datasets, etc.)
  - **Verify**: Review submission
  - **License**: Accept distribution license
  - **Complete**: Submit for review or archive
- If workflow is enabled, submission goes to reviewers
- Otherwise, item is archived immediately
Configuring Workflows
Set up review processes for quality control:
- Edit `/dspace/config/workflow.xml`
- Define workflow steps for collections:
```xml
<workflow id="defaultWorkflow">
  <step id="reviewstep">
    <role name="reviewer">
      <description>Reviewers for this step</description>
    </role>
    <actions>
      <action id="approve">
        <description>Approve submission</description>
        <outcomes>
          <outcome id="approved">
            <step>editstep</step>
          </outcome>
        </outcomes>
      </action>
      <action id="reject">
        <description>Reject submission</description>
        <outcomes>
          <outcome id="rejected">
            <step>reject</step>
          </outcome>
        </outcomes>
      </action>
    </actions>
  </step>

  <step id="editstep">
    <role name="editor">
      <description>Editors for this step</description>
    </role>
    <actions>
      <action id="approve">
        <description>Final approval</description>
        <outcomes>
          <outcome id="approved">
            <step>archive</step>
          </outcome>
        </outcomes>
      </action>
    </actions>
  </step>
</workflow>
```

Managing User Groups
Create groups for access control:
- Navigate to "Administration" → "Access Control" → "Groups"
- Click "Create New Group"
- Configure group:
  - **Name**: "Faculty Submitters"
  - **Description**: Group description
- Add members:
  - Search for users
  - Add to group
- Assign permissions to collections
Setting Access Policies
Control who can view and submit content:
- Navigate to collection settings
- Click "Authorization" tab
- Add policies:
  - **Action**: READ (view items)
  - **Group**: Anonymous or specific group
  - **Start Date**: Optional embargo start
  - **End Date**: Optional embargo end
- Add submission policies:
  - **Action**: ADD (submit items)
  - **Group**: Faculty Submitters
Configuring Embargo Periods
Set up temporary access restrictions:
- During submission, select "Access Conditions"
- Choose embargo type:
  - **Open Access**: Immediate public access
  - **Embargo**: Restricted until date
  - **Restricted**: Limited group access
- Set embargo date
- System automatically lifts embargo when date passes
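The automatic lift amounts to a date comparison: an item becomes readable once the current date reaches the lift date stored in its access policy. A minimal sketch of that check (the comparison logic is illustrative; DSpace performs it internally against resource policies):

```shell
#!/bin/sh
# Sketch: an embargo is "lifted" once today's date reaches the lift date.
# Dates are normalized to YYYYMMDD integers so they compare numerically.
embargo_lifted() {
  lift="$(printf '%s' "$1" | tr -d '-')"
  today="$(date +%Y%m%d)"
  [ "$today" -ge "$lift" ]
}

if embargo_lifted "2020-01-01"; then echo "open"; else echo "embargoed"; fi
```

Because the check runs against the stored date at access time, no scheduled job is needed to "open" items on the lift date.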
Production Best Practices
Performance Optimization
- Java Memory Configuration: Adjust heap size based on repository size:
```bash
# For repositories with < 50,000 items
JAVA_OPTS="-Xmx4g -XX:MaxMetaspaceSize=1g"

# For repositories with 50,000-200,000 items
JAVA_OPTS="-Xmx8g -XX:MaxMetaspaceSize=2g"

# For repositories with 200,000+ items
JAVA_OPTS="-Xmx16g -XX:MaxMetaspaceSize=4g"
```

- Database Connection Pooling: Optimize PostgreSQL connections:
```properties
# In local.cfg
db.maxconnections = 30
db.maxwait = 5000
db.maxidle = 10
```

- Solr Memory: Increase Solr heap for large indexes:
```dockerfile
# In Dockerfile.solr
ENV SOLR_JAVA_MEM="-Xms2g -Xmx4g"
```

- Assetstore Organization: Use multiple assetstores for better I/O:
```properties
# In local.cfg
assetstore.dir = ${dspace.dir}/assetstore
assetstore.dir.1 = ${dspace.dir}/assetstore2
assetstore.incoming = 1
```

- Enable Caching: Configure content caching:
```properties
# Enable caching
cache.enabled = true
cache.size = 100
cache.name = org.dspace.content
```

Security Hardening
- Change Default Credentials: Immediately change admin password:
```bash
/dspace/bin/dspace user --modify --email admin@example.com --password new_secure_password
```

- Secure Database Connection: Use SSL for database:
```properties
db.url = jdbc:postgresql://postgres-host:8000/dspace?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory
```

- Configure CORS: Restrict API access:
```properties
# In local.cfg
cors.allowed-origins = https://your-app.klutch.sh
cors.allowed-methods = GET, POST, PUT, DELETE, OPTIONS
cors.allowed-headers = *
```

- Implement Rate Limiting: Prevent API abuse:
```properties
# Limit REST API requests
rest.ratelimit.enabled = true
rest.ratelimit.limit = 100
rest.ratelimit.period = 60
```

- Enable Audit Logging: Track administrative actions:
```properties
audit.enabled = true
audit.log = ${dspace.dir}/log/audit.log
```

Backup Strategy
Implement comprehensive backups:
- Database Backup: Daily PostgreSQL dumps:
```bash
#!/bin/bash
BACKUP_DIR="/backups/dspace_$(date +%Y%m%d_%H%M%S)"
mkdir -p $BACKUP_DIR

# Backup database
pg_dump -h $DB_HOST -p $DB_PORT -U $DB_USER -F c dspace > $BACKUP_DIR/dspace_db.backup

# Compress backup
gzip $BACKUP_DIR/dspace_db.backup

# Keep only last 30 days
find /backups -name "dspace_*.backup.gz" -mtime +30 -delete

echo "Database backup complete: $BACKUP_DIR"
```

- Assetstore Backup: Incremental file backups:
```bash
#!/bin/bash
BACKUP_DIR="/backups/assetstore_$(date +%Y%m%d)"
ASSETSTORE="/dspace/assetstore"

# Incremental backup using rsync
rsync -av --delete $ASSETSTORE/ $BACKUP_DIR/

echo "Assetstore backup complete: $BACKUP_DIR"
```

- Configuration Backup: Version control for configs:
```bash
# Backup configuration files
tar -czf config_backup_$(date +%Y%m%d).tar.gz /dspace/config/
```

- Automated Backup Schedule:
```bash
# Add to crontab
0 2 * * * /usr/local/bin/backup-dspace-db.sh
0 3 * * 0 /usr/local/bin/backup-assetstore.sh
```

Maintenance Tasks
Schedule regular maintenance:
- Reindex Search: Weekly search index rebuild:
```bash
# Rebuild discovery index
/dspace/bin/dspace index-discovery -b

# Or incremental update
/dspace/bin/dspace index-discovery
```

- Media Filter: Generate thumbnails and extract text:
```bash
# Process new items
/dspace/bin/dspace filter-media

# Force reprocess all items
/dspace/bin/dspace filter-media -f
```

- Cleanup Tasks: Remove orphaned data:
```bash
# Clean up deleted items, searches, etc.
/dspace/bin/dspace cleanup
```

- Statistics Processing: Generate usage statistics:
```bash
# Process statistics
/dspace/bin/dspace stats-util -i

# Generate reports
/dspace/bin/dspace stats-util -r
```

- Checksum Verification: Verify bitstream integrity:
```bash
# Check all bitstreams
/dspace/bin/dspace checker -l -p

# Check and report results
/dspace/bin/dspace checker-emailer
```

- Automated Maintenance Cron:
```bash
# Add to crontab
0 1 * * * /dspace/bin/dspace cleanup
0 2 * * 0 /dspace/bin/dspace index-discovery -b
0 3 * * * /dspace/bin/dspace filter-media
0 4 * * 0 /dspace/bin/dspace checker -l
```

Monitoring and Logging
- Enable Detailed Logging:
```xml
<!-- In log4j2.xml -->
<Configuration>
  <Appenders>
    <RollingFile name="DSpaceLog"
        fileName="${dspace.dir}/log/dspace.log"
        filePattern="${dspace.dir}/log/dspace-%d{yyyy-MM-dd}.log">
      <PatternLayout>
        <Pattern>%d{ISO8601} %-5p %c @ %m%n</Pattern>
      </PatternLayout>
      <Policies>
        <TimeBasedTriggeringPolicy />
      </Policies>
      <DefaultRolloverStrategy max="30"/>
    </RollingFile>
  </Appenders>

  <Loggers>
    <Root level="INFO">
      <AppenderRef ref="DSpaceLog"/>
    </Root>
  </Loggers>
</Configuration>
```

- Monitor Database Performance:
```sql
-- Active connections
SELECT count(*) FROM pg_stat_activity WHERE datname = 'dspace';

-- Long-running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE pg_stat_activity.query != '<IDLE>'
  AND now() - pg_stat_activity.query_start > interval '5 minutes';

-- Database size
SELECT pg_size_pretty(pg_database_size('dspace'));
```

- Health Check Script:
```bash
#!/bin/bash
DSPACE_URL="https://your-app.klutch.sh"

# Check DSpace is responding
if ! curl -f -s -o /dev/null "$DSPACE_URL/server/api"; then
  echo "ERROR: DSpace not responding"
  exit 1
fi

# Check Solr
if ! curl -f -s -o /dev/null "http://localhost:8983/solr/search/admin/ping"; then
  echo "ERROR: Solr not responding"
  exit 1
fi

# Check disk space
DISK_USAGE=$(df -h /dspace/assetstore | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 85 ]; then
  echo "WARNING: Disk usage at ${DISK_USAGE}%"
fi

echo "OK: DSpace is healthy"
exit 0
```

Resource Allocation
Recommended resources by repository size:
Small Repository (< 10,000 items):
- DSpace: 4GB RAM, 2 vCPU
- PostgreSQL: 2GB RAM, 2 vCPU
- Solr: 2GB RAM, 1 vCPU
- Storage: 100GB assetstore, 20GB database, 10GB Solr
Medium Repository (10,000-100,000 items):
- DSpace: 8GB RAM, 4 vCPU
- PostgreSQL: 4GB RAM, 2 vCPU
- Solr: 4GB RAM, 2 vCPU
- Storage: 500GB assetstore, 50GB database, 50GB Solr
Large Repository (100,000+ items):
- DSpace: 16GB RAM, 8 vCPU
- PostgreSQL: 8GB RAM, 4 vCPU
- Solr: 8GB RAM, 4 vCPU
- Storage: 2TB+ assetstore, 200GB database, 100GB Solr
Troubleshooting
Submission Upload Fails
Symptoms: File uploads fail during submission
Solutions:
- Check File Size Limit:
```properties
# In local.cfg
upload.max = 536870912  # 512MB in bytes
```

- Verify Disk Space:
```bash
df -h /dspace/assetstore
df -h /dspace/upload
```

- Check Permissions:
```bash
chown -R dspace:dspace /dspace/assetstore
chown -R dspace:dspace /dspace/upload
```

- Increase Tomcat Timeout:
```xml
<!-- In server.xml -->
<Connector port="8080" connectionTimeout="60000" />
```
Search Not Returning Results
Symptoms: Search returns no results or incomplete results
Solutions:
- Rebuild Search Index:
```bash
/dspace/bin/dspace index-discovery -b
```
- Check Solr Status:
```bash
curl http://localhost:8983/solr/search/admin/ping
```
- Verify Solr Configuration:
```properties
# Check the Solr URL in local.cfg
solr.server = http://solr-host:8983/solr
```
- Check Solr Logs:
```bash
tail -f /var/solr/logs/solr.log
```
Database Connection Errors
Symptoms: “Unable to connect to database” errors
Solutions:
- Verify Database is Running:
```bash
psql -h $DB_HOST -p $DB_PORT -U $DB_USER -d dspace -c "SELECT 1"
```
- Check Connection String:
```properties
# In local.cfg
db.url = jdbc:postgresql://postgres-host:8000/dspace
db.username = dspace
db.password = your_password
```
- Test Network Connectivity:
```bash
telnet $DB_HOST 8000
```
- Increase Connection Pool:
```properties
db.maxconnections = 50
db.maxwait = 10000
```
Items Not Displaying Properly
Symptoms: Item pages show errors or missing metadata
Solutions:
- Check Item Permissions:
```bash
/dspace/bin/dspace dsrun org.dspace.administer.ItemPolicies <item-id>
```
- Verify Metadata:
```bash
/dspace/bin/dspace metadata-export -i <item-id>
```
- Clear Cache:
```bash
rm -rf /dspace/var/cache/*
```
- Check Logs:
```bash
tail -f /dspace/log/dspace.log
```
Embargo Not Working
Symptoms: Embargoed items are publicly accessible
Solutions:
- Check Resource Policies:
```bash
/dspace/bin/dspace dsrun org.dspace.embargo.EmbargoManager
```
- Verify Embargo Configuration:
```properties
# In local.cfg
embargo.field.terms = dc.embargo.terms
embargo.field.lift = dc.date.available
```
- Manually Set Embargo:
```bash
/dspace/bin/dspace embargo-setter -i <item-id> -d YYYY-MM-DD
```
OAI-PMH Harvesting Issues
Symptoms: External systems cannot harvest metadata
Solutions:
- Test OAI-PMH Endpoint:
```bash
curl "https://your-app.klutch.sh/server/oai?verb=Identify"
```
- Rebuild OAI Index:
```bash
/dspace/bin/dspace oai import
/dspace/bin/dspace oai clean-cache
```
- Check OAI Configuration:
```properties
oai.solr.url = ${solr.server}/oai
oai.identifier.prefix = oai:your-app.klutch.sh:
```
Advanced Configuration
Custom Themes
Customize the DSpace user interface:
- Create Custom Theme:
```bash
# Copy the base theme
cp -r /dspace/webapps/xmlui/themes/Mirage2 /dspace/webapps/xmlui/themes/CustomTheme
```
- Modify Theme Configuration:
```xml
<!-- In theme.xml -->
<theme name="CustomTheme" path="CustomTheme/">
  <conffile>sitemap.xmap</conffile>
</theme>
```
- Customize Styles:
```css
/* In style.css */
.header {
  background-color: #003366;
  color: white;
}

.logo {
  max-width: 200px;
}

.sidebar {
  background-color: #f5f5f5;
}
```
- Update Logo and Branding:
```bash
# Replace the logo
cp your-logo.png /dspace/webapps/xmlui/themes/CustomTheme/images/logo.png
```
DOI Integration
Enable DOI minting for persistent identifiers:
- Configure DataCite or Crossref:
```properties
# In local.cfg
identifier.doi.prefix = 10.1234
identifier.doi.namespaceseparator = /
identifier.doi.user = your-datacite-username
identifier.doi.password = your-datacite-password
identifier.doi.datacentre = your-datacentre
```
- Enable DOI Plugin:
```properties
plugin.sequence.org.dspace.identifier.IdentifierProvider = \
  org.dspace.identifier.DOIIdentifierProvider
```
- Mint DOIs:
```bash
# Register pending DOIs for items
/dspace/bin/dspace doi-organiser -r
```
ORCID Integration
Connect authors with ORCID profiles:
- Register for ORCID API Access:
  - Get a client ID and secret from ORCID.org
- Configure ORCID:
```properties
# In local.cfg
orcid.application-client-id = your-client-id
orcid.application-client-secret = your-client-secret
orcid.domain-url = https://orcid.org
```
- Enable ORCID Authentication:
```properties
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = \
  org.dspace.authenticate.OAuthAuthentication
```
Batch Import/Export
Import large collections of items:
- Prepare Import Package:
```text
import_package/
├── collections/
└── simple-archive/
    ├── item_001/
    │   ├── contents
    │   ├── dublin_core.xml
    │   └── file1.pdf
    └── item_002/
        ├── contents
        ├── dublin_core.xml
        └── file2.pdf
```
- Create dublin_core.xml:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<dublin_core schema="dc">
  <dcvalue element="title" qualifier="">Sample Article Title</dcvalue>
  <dcvalue element="contributor" qualifier="author">Smith, John</dcvalue>
  <dcvalue element="date" qualifier="issued">2024-01-15</dcvalue>
  <dcvalue element="description" qualifier="abstract">This is the abstract...</dcvalue>
  <dcvalue element="subject">Computer Science</dcvalue>
  <dcvalue element="type">Article</dcvalue>
</dublin_core>
```
- Create contents file:
```text
file1.pdf bundle:ORIGINAL
```
- Import Items:
```bash
/dspace/bin/dspace import -a -e admin@example.com -c 123456789/2 \
  -s /import_package/simple-archive -m mapfile.txt
```
- Export Items:
```bash
# Export a collection
/dspace/bin/dspace export -t COLLECTION -i 123456789/2 -d /exports/collection -n 1
```
REST API Usage
Interact with DSpace programmatically:
- Authenticate:
```bash
# Get a JWT; DSpace returns it in the Authorization response header
# (note: DSpace 7+ may also require a CSRF token via the X-XSRF-TOKEN header)
TOKEN=$(curl -s -i -X POST "https://your-app.klutch.sh/server/api/authn/login" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "user=admin@example.com&password=your_password" \
  | grep -i '^authorization:' | cut -d' ' -f3 | tr -d '\r')
```
- List Communities:
```bash
curl "https://your-app.klutch.sh/server/api/core/communities" \
  -H "Authorization: Bearer $TOKEN"
```
- Get Item Metadata:
```bash
curl "https://your-app.klutch.sh/server/api/core/items/{item-uuid}" \
  -H "Authorization: Bearer $TOKEN"
```
- Create Item via API:
```python
import requests

api_url = "https://your-app.klutch.sh/server/api"
token = "your-jwt-token"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

# Create an item
item_data = {
    "name": "Test Item",
    "metadata": {
        "dc.title": [{"value": "Test Title"}],
        "dc.contributor.author": [{"value": "Smith, John"}],
        "dc.date.issued": [{"value": "2024-01-15"}]
    }
}

response = requests.post(
    f"{api_url}/core/items",
    headers=headers,
    json=item_data
)

print(response.json())
```
Statistics and Reporting
Generate usage statistics:
- Enable Statistics:
```properties
# In local.cfg
usage-statistics.dbfile = ${dspace.dir}/log/dspace-stats.db
solr-statistics.server = ${solr.server}/statistics
```
- Generate Reports:
```bash
# Generate a monthly report
/dspace/bin/dspace stats-util -r -m 2024-01

# Export statistics
/dspace/bin/dspace stats-util -e /exports/stats.csv
```
- Integrate Google Analytics:
```properties
google.analytics.key = UA-XXXXXXXX-X
```
Additional Resources
- Official DSpace Website
- DSpace 7 Documentation
- DSpace GitHub Repository
- DSpace REST API Documentation
- DSpace Community Forum
- Metadata Registry Documentation
- Klutch.sh Documentation
- Persistent Storage Guide
- Networking Configuration
Conclusion
DSpace provides a comprehensive, enterprise-grade solution for institutional repositories and digital asset management. By deploying on Klutch.sh, you benefit from automatic HTTPS, persistent storage, and simple Docker-based deployment while maintaining the robust preservation features and scalability that DSpace offers.
The platform’s proven track record with major universities and research institutions worldwide demonstrates its reliability and maturity. Features like flexible metadata schemas, configurable workflows, OAI-PMH harvesting, DOI integration, and comprehensive API access make DSpace the standard choice for organizations managing scholarly communications and digital collections.
Whether you’re launching a new institutional repository, managing research data, preserving cultural heritage materials, or building a digital library, DSpace scales to meet your needs. The system’s modular architecture allows you to customize every aspect from metadata schemas and submission forms to themes and access policies.
Start with the basic configuration outlined in this guide, then expand functionality through custom metadata fields, workflow customization, API integration, and advanced preservation features as your repository grows. Your digital content remains secure, discoverable, and preserved for the long term, while the open-source nature of DSpace ensures you have complete control over your institutional knowledge base.
Deploy DSpace today and join thousands of institutions worldwide in preserving and sharing digital scholarship for future generations.