
Deploying DSpace

DSpace is the world’s leading open-source repository platform for research outputs, publications, datasets, and learning resources. Originally developed by MIT Libraries and HP Labs, and now maintained by LYRASIS (which merged with DuraSpace in 2019), DSpace powers digital repositories for thousands of academic institutions, research organizations, government agencies, and cultural heritage institutions worldwide. The platform provides robust tools for ingesting, preserving, indexing, and distributing digital content while ensuring long-term preservation and accessibility.

What makes DSpace particularly powerful is its comprehensive feature set designed for institutional repositories. The system supports complex metadata schemas including Dublin Core, qualified Dublin Core, and custom metadata fields. It handles a wide variety of digital content formats from PDFs and images to datasets and multimedia files. Features like customizable workflows, embargo periods, access controls, OAI-PMH harvesting, DOI minting, and integration with authentication systems make DSpace a complete solution for managing institutional knowledge.

Why Deploy DSpace on Klutch.sh?

Klutch.sh provides an excellent platform for hosting DSpace with several key advantages:

  • Simple Docker Deployment: Deploy your Dockerfile and Klutch.sh automatically handles containerization and orchestration
  • Persistent Storage: Attach volumes for content storage (assetstore), database, and Solr indexes with guaranteed durability
  • Automatic HTTPS: All deployments come with automatic SSL certificates for secure repository access
  • Resource Scalability: Scale CPU and memory resources as your repository grows
  • Database Integration: Run PostgreSQL alongside DSpace with persistent storage
  • Cost-Effective: Pay only for resources used, scale based on repository size
  • Zero Server Management: Focus on curating content, not managing infrastructure

Prerequisites

Before deploying DSpace, ensure you have:

  • A Klutch.sh account (sign up at klutch.sh)
  • Git installed locally
  • Basic understanding of institutional repositories and digital preservation
  • Familiarity with Docker and container concepts
  • PostgreSQL knowledge for database management
  • Understanding of metadata standards (Dublin Core, etc.)
  • Java and Tomcat knowledge (helpful but not required)

Understanding DSpace’s Architecture

DSpace uses a multi-tier architecture designed for scalability and long-term digital preservation:

Core Components

DSpace Backend (REST API): Built with Spring Boot, the backend provides:

  • RESTful API for all repository operations
  • Content ingestion and management
  • Metadata handling and validation
  • Workflow processing
  • Authorization and authentication
  • OAI-PMH data provider
  • Batch import/export functionality
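
All of these operations are exposed over HTTP. As a quick sketch (the hostname is a placeholder for your own deployment), the DSpace 7 backend publishes a HAL+JSON index at `/server/api`, with paginated sub-endpoints such as `core/items`:

```shell
# Assumption: replace the hostname with your actual backend URL
DSPACE_API="https://your-app.klutch.sh/server/api"

# Discover the API root (a HAL+JSON index of available endpoints)
curl -sf "$DSPACE_API" || echo "note: request failed (no deployment reachable)"

# List items, five per page (standard Spring Data pagination parameters)
curl -sf "$DSPACE_API/core/items?page=0&size=5" || echo "note: request failed"
```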

DSpace Frontend (Angular): Modern single-page application offering:

  • User-friendly interface for browsing and searching
  • Submission workflows for depositing content
  • Administrative interfaces
  • Customizable themes and branding
  • Responsive design for mobile devices
  • Internationalization support

PostgreSQL Database: Stores all repository data including:

  • Item metadata
  • Bitstream metadata and mappings
  • Community and collection structure
  • User accounts and permissions
  • Workflow state
  • Handle assignments
  • Statistics and usage data

Assetstore: File system storage for digital objects:

  • Original uploaded files (bitstreams)
  • Generated thumbnails and derivatives
  • Preserved formats
  • Organized by internal ID for efficient access
  • Supports multiple assetstore locations

Solr Search Index: Apache Solr provides:

  • Full-text search across metadata and content
  • Faceted browsing by date, author, subject
  • Statistics and reporting
  • Authority control for names and subjects
  • Configurable relevance ranking
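
You can query the Solr cores directly when debugging search behavior. A minimal sketch, assuming Solr is reachable on localhost and using the default `search` core (the facet field name is illustrative and depends on your Discovery schema):

```shell
# Assumption: Solr running locally on the default port
SOLR_URL="http://localhost:8983/solr"

# Count all documents in the search core and facet them by subject
curl -sf "$SOLR_URL/search/select?q=*:*&rows=0&facet=true&facet.field=subject" \
  || echo "note: Solr not reachable from here"
```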

Handle Server (Optional): Persistent identifier system:

  • Assigns permanent URLs to items
  • Ensures long-term accessibility
  • Integrates with global Handle system
  • Supports custom handle prefixes

Content Model

Communities: Top-level organizational units

  • Represent departments, research centers, or topic areas
  • Can contain sub-communities
  • Define authorization policies
  • Support custom logos and descriptions

Collections: Groupings of related items

  • Belong to communities
  • Define metadata schemas
  • Configure submission workflows
  • Set access policies and embargo rules
  • Support collection-specific branding

Items: Individual repository objects

  • Consist of metadata and bitstreams
  • Immutable after archiving (versioning supported)
  • Assigned unique handles
  • Support relationships to other items
  • Can be versioned for updates

Bitstreams: Actual files attached to items

  • Original submission files
  • License agreements
  • Generated thumbnails
  • Format identification via PRONOM
  • Checksum verification for integrity
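
DSpace records an MD5 checksum for each bitstream at ingest and periodically re-verifies it. The same check can be reproduced by hand, which is useful when auditing a restored assetstore:

```shell
# Simulate the ingest-time checksum: hash a sample file and keep the digest
printf 'hello' > /tmp/sample-bitstream
CHECKSUM=$(md5sum /tmp/sample-bitstream | awk '{print $1}')
echo "$CHECKSUM"   # 5d41402abc4b2a76b9719d911017c592
```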

Workflow System

Submission Workflows: Configurable multi-step process

  • Describe item with metadata
  • Upload files
  • Verify submission
  • License agreement
  • Optional review steps
  • Automated notifications

Review Workflows: Optional curation process

  • Reviewers assigned by collection
  • Approve, reject, or request changes
  • Track submission history
  • Email notifications
  • Batch processing capabilities

Authentication and Authorization

Authentication Options:

  • Local database authentication
  • LDAP/Active Directory integration
  • Shibboleth/SAML single sign-on
  • ORCID authentication
  • OAuth providers
  • IP-based authentication

Authorization Model:

  • Resource policies define access
  • Actions: READ, WRITE, ADD, REMOVE, ADMIN
  • Inherited from communities/collections
  • Item-level overrides supported
  • Time-based embargoes
  • Group-based permissions

Preservation Features

Format Identification: Automatic format detection using PRONOM registry

Checksum Verification: Regular integrity checks on stored files

Metadata Preservation: Support for Dublin Core and custom schemas

Export Capabilities: AIP, METS, DSpace Intermediary Format

Version Control: Track changes to items over time
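
The AIP export mentioned above is driven by the `dspace packager` command. A sketch, assuming an illustrative item handle and an admin eperson email; run it inside the DSpace container:

```shell
# Assumption: "123456789/1" is a handle that exists in your repository
HANDLE="123456789/1"

if [ -x /dspace/bin/dspace ]; then
  # Disseminate (-d) the item as an AIP package: metadata plus bitstreams
  /dspace/bin/dspace packager -d -t AIP -e admin@example.com -i "$HANDLE" item-aip.zip
else
  echo "dspace CLI not found; run this inside the DSpace container"
fi
```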

Installation and Setup

Step 1: Create the Dockerfile

Create a Dockerfile in your project root:

FROM dspace/dspace:7.6

# Set environment variables
ENV DSPACE_INSTALL_DIR=/dspace \
    DSPACE_CFG=/dspace/config/local.cfg \
    CATALINA_HOME=/usr/local/tomcat \
    JAVA_OPTS="-Xmx2048m -XX:MaxMetaspaceSize=512m"

# Install additional utilities
USER root
RUN apt-get update && apt-get install -y \
    curl \
    vim \
    wget \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*

# Create necessary directories
RUN mkdir -p /dspace/assetstore \
    /dspace/solr \
    /dspace/upload \
    /dspace/reports \
    /dspace/log

# Set proper permissions
RUN chown -R dspace:dspace /dspace

# Copy custom configuration
COPY local.cfg /dspace/config/local.cfg
COPY dspace.cfg /dspace/config/dspace.cfg

# Switch back to dspace user
USER dspace

# Expose ports
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=120s --retries=3 \
    CMD curl -f http://localhost:8080/server/api || exit 1

# Start DSpace
CMD ["catalina.sh", "run"]

Step 2: Create DSpace Configuration

Create local.cfg for environment-specific settings:

# Database Configuration
db.url = jdbc:postgresql://postgres-host.klutch.sh:8000/dspace
db.username = dspace
db.password = your_secure_password
db.schema = public
# DSpace Installation Directory
dspace.dir = /dspace
# DSpace Server Configuration
dspace.server.url = https://your-app.klutch.sh
dspace.ui.url = https://your-app.klutch.sh
dspace.name = My Institution Repository
# Assetstore Configuration
assetstore.dir = ${dspace.dir}/assetstore
assetstore.incoming = 0
# Solr Configuration
solr.server = http://localhost:8983/solr
# Handle Configuration
handle.canonical.prefix = ${dspace.server.url}/handle/
handle.prefix = 123456789
# Email Configuration
mail.server = smtp.gmail.com
mail.server.port = 587
mail.server.username = your-email@gmail.com
mail.server.password = your-app-password
mail.from.address = noreply@your-institution.edu
mail.feedback.recipient = repository@your-institution.edu
mail.admin = admin@your-institution.edu
mail.server.disabled = false
mail.extraproperties = mail.smtp.auth=true, \
mail.smtp.starttls.enable=true
# Authentication Configuration
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.PasswordAuthentication
# Authorization Configuration
webui.strengths.show = true
# Upload Settings
upload.max = 536870912
upload.temp.dir = ${dspace.dir}/upload
# Thumbnail Settings
thumbnail.maxwidth = 300
thumbnail.maxheight = 300
# Batch Import/Export
dspace-api.content.export.download.dir = ${dspace.dir}/exports
# Statistics
usage-statistics.dbfile = ${dspace.dir}/log/dspace-stats.db
solr-statistics.server = ${solr.server}/statistics
# Google Analytics (optional)
google.analytics.key =
# Curation System
curate.ui.taskqueue.dir = ${dspace.dir}/ctqueues
plugin.named.org.dspace.curate.CurationTask = \
org.dspace.ctask.general.ProfileFormats = profileformats, \
org.dspace.ctask.general.RequiredMetadata = requiredmetadata
# OAI-PMH Configuration
oai.solr.url = ${solr.server}/oai
oai.identifier.prefix = oai:${dspace.server.url}:
# Content Bitstream Store
default.bitstream.store = 0
store.number = 0
store.dir = ${assetstore.dir}
# Media Filter Configuration
filter.plugins = \
PDF Text Extractor, \
HTML Text Extractor, \
Word Text Extractor, \
JPEG Thumbnail, \
PDF Thumbnail
# Submission Configuration
submission.lookup.scopus.apikey =
submission.lookup.crossref.email =
# SWORD Configuration (optional)
sword-server.on-behalf-of.enable = true
sword-server.workflowdefault = false
# Logging
log.init.config = ${dspace.dir}/config/log4j2.xml
log.dir = ${dspace.dir}/log
# Iiif Configuration (optional)
iiif.enabled = false
# Enable Identifiers (DOI, Handle, etc.)
identifier.doi.prefix = 10.5072
identifier.doi.namespaceseparator = /

Create dspace.cfg for additional configuration:

# Additional DSpace Configuration
# File Upload Settings
webui.submit.upload.required = true
webui.submit.upload.resume = true
# Search Configuration
search.index.1 = author:dc.contributor.*
search.index.2 = author:dc.creator
search.index.3 = title:dc.title
search.index.4 = keyword:dc.subject
search.index.5 = abstract:dc.description.abstract
search.index.6 = series:dc.relation.ispartofseries
search.index.7 = sponsor:dc.description.sponsorship
search.index.8 = identifier:dc.identifier.*
search.index.9 = language:dc.language.iso
# Browse Indexes
webui.browse.index.1 = dateissued:item:dateissued
webui.browse.index.2 = author:metadata:dc.contributor.*,dc.creator:text
webui.browse.index.3 = title:item:title
webui.browse.index.4 = subject:metadata:dc.subject.*:text
# Item Display
webui.itemdisplay.default = dc.title, dc.title.alternative, dc.contributor.*, \
dc.subject, dc.date.issued(date), dc.publisher, dc.identifier.citation, \
dc.relation.ispartofseries, dc.description.abstract, dc.description, \
dc.identifier.govdoc, dc.identifier.uri(link), dc.identifier.isbn, \
dc.identifier.issn, dc.identifier.ismn, dc.language.iso(language), \
dc.type
# Metadata Registry
dublin.core.types = dc
# Workflow Settings
workflow.reviewer.notify = true
workflow.admin.notify = true
# Statistics Configuration
usage-statistics.authorization.admin.usage = true
# Format Support Registry
webui.submit.upload.maxsize = 512000000

Step 3: Create Database Initialization Script

Create init-dspace-db.sh:

#!/bin/bash
set -e

echo "Waiting for PostgreSQL to be ready..."
until PGPASSWORD=$DB_PASSWORD psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -c '\q' 2>/dev/null; do
  echo "PostgreSQL not ready, waiting..."
  sleep 2
done
echo "PostgreSQL is ready!"

# Create DSpace database and user
PGPASSWORD=$DB_ROOT_PASSWORD psql -h "$DB_HOST" -p "$DB_PORT" -U postgres <<EOF
SELECT 'CREATE DATABASE dspace' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'dspace')\gexec
SELECT 'CREATE USER dspace WITH PASSWORD ''$DB_PASSWORD''' WHERE NOT EXISTS (SELECT FROM pg_user WHERE usename = 'dspace')\gexec
GRANT ALL PRIVILEGES ON DATABASE dspace TO dspace;
EOF
echo "Database initialized!"

# Check if schema needs initialization
TABLE_COUNT=$(PGPASSWORD=$DB_PASSWORD psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d dspace -t -c "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema='public';" 2>/dev/null || echo "0")
if [ "$TABLE_COUNT" -eq 0 ]; then
  echo "Initializing DSpace database schema..."
  cd /dspace
  ./bin/dspace database migrate
  echo "Database schema initialized!"
  echo "Creating initial administrator..."
  ./bin/dspace create-administrator -e admin@example.com -f Admin -l User -p admin -c en
  echo "Administrator created (change password immediately!)"
else
  echo "Database already initialized"
fi

Step 4: Create Solr Configuration

Create Dockerfile.solr for Solr search:

FROM solr:8.11

# Set environment variables
ENV SOLR_HOME=/var/solr/data

USER root
# Copy DSpace Solr cores
RUN mkdir -p /opt/solr/server/solr/configsets/dspace/conf
COPY --chown=solr:solr solr-cores/ /opt/solr/server/solr/configsets/dspace/

USER solr
# Create DSpace cores
RUN solr start -force && \
    solr create_core -c search -d /opt/solr/server/solr/configsets/dspace && \
    solr create_core -c statistics -d /opt/solr/server/solr/configsets/dspace && \
    solr create_core -c oai -d /opt/solr/server/solr/configsets/dspace && \
    solr stop

EXPOSE 8983
CMD ["solr-foreground"]

Step 5: Create Docker Compose for Local Development

Create docker-compose.yml:

version: '3.8'
services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_DB: dspace
      POSTGRES_USER: dspace
      POSTGRES_PASSWORD: dspace
      PGDATA: /var/lib/postgresql/data/pgdata
    volumes:
      - postgres-data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    restart: unless-stopped
  solr:
    build:
      context: .
      dockerfile: Dockerfile.solr
    volumes:
      - solr-data:/var/solr/data
    ports:
      - "8983:8983"
    restart: unless-stopped
  dspace:
    build:
      context: .
      dockerfile: Dockerfile
    depends_on:
      - postgres
      - solr
    environment:
      DSPACE_INSTALL_DIR: /dspace
      DB_HOST: postgres
      DB_PORT: 5432
      DB_USER: dspace
      DB_PASSWORD: dspace
      DB_ROOT_PASSWORD: postgres
      SOLR_HOST: solr
    volumes:
      - dspace-assetstore:/dspace/assetstore
      - dspace-logs:/dspace/log
      - dspace-exports:/dspace/exports
    ports:
      - "8080:8080"
    restart: unless-stopped
volumes:
  postgres-data:
  solr-data:
  dspace-assetstore:
  dspace-logs:
  dspace-exports:

Step 6: Create Management Scripts

Create maintenance.sh for common tasks:

#!/bin/bash
DSPACE_DIR="/dspace"

case "$1" in
  reindex)
    echo "Reindexing DSpace..."
    $DSPACE_DIR/bin/dspace index-discovery
    ;;
  filter-media)
    echo "Running media filter..."
    $DSPACE_DIR/bin/dspace filter-media
    ;;
  cleanup)
    echo "Cleaning up old searches and deleted items..."
    $DSPACE_DIR/bin/dspace cleanup
    ;;
  stats)
    echo "Generating statistics..."
    $DSPACE_DIR/bin/dspace stats-util -i
    ;;
  backup)
    BACKUP_DIR="/backups/dspace_$(date +%Y%m%d_%H%M%S)"
    mkdir -p $BACKUP_DIR
    echo "Backing up assetstore..."
    tar -czf $BACKUP_DIR/assetstore.tar.gz $DSPACE_DIR/assetstore
    echo "Backing up database..."
    pg_dump -h $DB_HOST -U $DB_USER dspace > $BACKUP_DIR/dspace.sql
    echo "Backup complete: $BACKUP_DIR"
    ;;
  checksum)
    echo "Verifying checksums..."
    $DSPACE_DIR/bin/dspace checker -l -p
    ;;
  *)
    echo "Usage: $0 {reindex|filter-media|cleanup|stats|backup|checksum}"
    exit 1
    ;;
esac

Step 7: Initialize Git Repository

git init
git add Dockerfile Dockerfile.solr local.cfg dspace.cfg init-dspace-db.sh docker-compose.yml maintenance.sh
git commit -m "Initial DSpace deployment configuration"

Step 8: Test Locally

Before deploying to Klutch.sh, test locally:

# Build and start containers
docker-compose up -d
# Wait for services to start (may take 5-10 minutes)
docker-compose logs -f dspace
# Initialize database
docker-compose exec dspace bash /dspace/init-dspace-db.sh
# Access DSpace at http://localhost:8080
# Default admin: admin@example.com / admin

Deploying to Klutch.sh

Step 1: Deploy PostgreSQL Database

First, deploy a PostgreSQL instance:

  1. Navigate to klutch.sh/app
  2. Click "New Project"
  3. Select PostgreSQL or use a custom Dockerfile
  4. Configure database:
    • Database name: `dspace`
    • Username: `dspace`
    • Password: Create a secure password
    • Root password: Create a secure root password
  5. Select **TCP** as traffic type
  6. Set internal port to **5432**
  7. Add persistent storage with mount path: `/var/lib/postgresql/data` and size: `50GB`
  8. Note the connection details (hostname like `postgres-app.klutch.sh:8000`)

Step 2: Deploy Solr Search Engine

Deploy Solr for search functionality:

  1. Create a new project in Klutch.sh
  2. Push your Solr Dockerfile to a GitHub repository
  3. Import the repository to Klutch.sh
  4. Configure Solr:
    • Select **HTTP** as traffic type
    • Set internal port to **8983**
    • Add persistent storage with mount path: `/var/solr/data` and size: `20GB`
  5. Note the Solr URL (like `https://solr-app.klutch.sh`)

Step 3: Push DSpace Repository to GitHub

Create a new repository and push:

git remote add origin https://github.com/yourusername/dspace-klutch.git
git branch -M master
git push -u origin master

Step 4: Deploy DSpace to Klutch.sh

  1. Navigate to klutch.sh/app
  2. Click "New Project" and select "Import from GitHub"
  3. Authorize Klutch.sh to access your GitHub repositories
  4. Select your DSpace repository
  5. Klutch.sh will automatically detect the Dockerfile

Step 5: Configure DSpace Traffic Settings

  1. In the project settings, select **HTTP** as the traffic type
  2. Set the internal port to **8080**
  3. Klutch.sh will automatically provision an HTTPS endpoint

Step 6: Add Persistent Storage for DSpace

DSpace requires persistent storage for content and logs:

  1. In your project settings, navigate to the "Storage" section
  2. Add a volume with mount path: `/dspace/assetstore` and size: `100GB` (for digital objects)
  3. Add a volume with mount path: `/dspace/log` and size: `10GB` (for logs)
  4. Add a volume with mount path: `/dspace/exports` and size: `20GB` (for batch exports)

Storage recommendations:

  • Small repository (< 10,000 items): 50GB assetstore
  • Medium repository (10,000-100,000 items): 200GB assetstore
  • Large repository (100,000+ items): 500GB+ assetstore
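
These tiers can be sanity-checked with back-of-envelope arithmetic; the per-item size and headroom below are assumptions you should replace with figures from your own content profile:

```shell
ITEMS=10000
AVG_ITEM_MB=5        # assumption: average storage per item, files included
HEADROOM_PCT=30      # assumption: thumbnails, derivatives, and growth margin
NEED_GB=$(( ITEMS * AVG_ITEM_MB * (100 + HEADROOM_PCT) / 100 / 1024 ))
echo "${NEED_GB}GB"  # 63GB for a 10,000-item repository under these assumptions
```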

Step 7: Configure DSpace Environment Variables

Add the following environment variables in Klutch.sh dashboard:

  • DB_HOST: Your PostgreSQL hostname (e.g., postgres-app.klutch.sh)
  • DB_PORT: 8000 (external TCP port)
  • DB_USER: dspace
  • DB_PASSWORD: Your database password
  • DB_ROOT_PASSWORD: Your root password
  • SOLR_HOST: Your Solr hostname (e.g., solr-app.klutch.sh)
  • DSPACE_INSTALL_DIR: /dspace
  • JAVA_OPTS: -Xmx4096m -XX:MaxMetaspaceSize=1024m (adjust based on resources)

Step 8: Update Configuration Files

Before deploying, update local.cfg with your actual URLs:

# Update these values
dspace.server.url = https://your-app.klutch.sh
dspace.ui.url = https://your-app.klutch.sh
db.url = jdbc:postgresql://postgres-app.klutch.sh:8000/dspace
solr.server = https://solr-app.klutch.sh/solr

Commit and push changes:

git add local.cfg
git commit -m "Update configuration for Klutch.sh deployment"
git push

Step 9: Deploy DSpace

  1. Review your configuration settings in Klutch.sh
  2. Click "Deploy" to start the deployment
  3. Monitor build logs for any errors
  4. Wait for initialization (first deployment takes 10-15 minutes)
  5. Once deployed, DSpace will be available at `your-app.klutch.sh`

Step 10: Initialize Database and Create Admin User

After first deployment, initialize the database:

  1. Access the DSpace container terminal in Klutch.sh
  2. Run the initialization script: `bash /dspace/init-dspace-db.sh`
  3. Create administrator account: `/dspace/bin/dspace create-administrator -e admin@yourinstitution.edu -f Admin -l User -p your_secure_password -c en`

Getting Started with DSpace

Initial Setup

After deployment, complete the initial configuration:

  1. Access DSpace: Navigate to https://your-app.klutch.sh

  2. Login as Administrator:

    • Click “Log In” in the top navigation
    • Enter administrator credentials
    • You’ll be redirected to the admin dashboard
  3. Configure Basic Settings:

    • Navigate to “Administration” → “Edit Configuration”
    • Update repository name and description
    • Set contact information
    • Configure institutional logo
  4. Set Up Email Notifications:

    • Verify SMTP settings in local.cfg
    • Test email delivery
    • Configure notification templates

Creating Community Structure

Organize your repository with communities and collections:

  1. Navigate to "Administration" → "Communities & Collections"
  2. Click "Create Top-Level Community"
  3. Configure community:
    • **Name**: "School of Engineering"
    • **Short Description**: Brief overview
    • **Introductory Text**: Detailed description
    • **Copyright Text**: Rights statement
    • **Logo**: Upload community logo (optional)
  4. Click "Create" to save community
  5. Within the community, create sub-communities or collections

Creating Collections

Add collections to organize items:

  1. Navigate to a community
  2. Click "Create Collection"
  3. Configure collection:
    • **Name**: "Department of Computer Science Theses"
    • **Short Description**: Collection overview
    • **Introductory Text**: Detailed information
    • **License**: Default submission license
    • **Provenance**: Provenance statement
  4. Configure submission settings:
    • Select input forms
    • Define workflow steps
    • Set access policies
  5. Click "Create" to save collection

Configuring Metadata Schemas

Customize metadata fields for your institution:

  1. Navigate to "Administration" → "Registries" → "Metadata Registry"
  2. View existing Dublin Core fields
  3. Add custom fields:
    • Click "Add Field"
    • Select schema (dc, dcterms, or custom)
    • Enter element name (e.g., "department")
    • Enter qualifier (optional, e.g., "sponsor")
    • Add scope note for guidance
  4. Save custom field

Common custom fields:

dc.contributor.advisor - Thesis advisor
dc.degree.name - Degree name (e.g., Ph.D.)
dc.degree.level - Degree level (Masters, Doctoral)
dc.degree.discipline - Field of study
dc.degree.grantor - Degree-granting institution
dc.identifier.doi - Digital Object Identifier
dc.identifier.orcid - ORCID iD
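
Fields like these can also be registered in bulk rather than one at a time in the UI. A minimal registry fragment, assuming the stock `dc-type` registry format and a hypothetical file name `custom-types.xml`:

```xml
<dspace-dc-types>
  <dc-type>
    <schema>dc</schema>
    <element>degree</element>
    <qualifier>name</qualifier>
    <scope_note>Name of the degree (e.g., Ph.D.)</scope_note>
  </dc-type>
  <dc-type>
    <schema>dc</schema>
    <element>degree</element>
    <qualifier>discipline</qualifier>
    <scope_note>Field of study</scope_note>
  </dc-type>
</dspace-dc-types>
```

In recent versions this can be loaded with `/dspace/bin/dspace registry-loader -metadata custom-types.xml`; check the command reference for your release before relying on it.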

Configuring Submission Forms

Customize input forms for metadata collection:

  1. Edit `/dspace/config/submission-forms.xml`
  2. Define form fields for each collection type
  3. Example thesis submission form:
<form name="thesis">
  <page number="1">
    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>title</dc-element>
      <dc-qualifier></dc-qualifier>
      <required>true</required>
      <label>Title</label>
      <input-type>onebox</input-type>
      <hint>Enter the full title of your thesis</hint>
    </field>
    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>contributor</dc-element>
      <dc-qualifier>author</dc-qualifier>
      <required>true</required>
      <label>Author</label>
      <input-type>name</input-type>
      <hint>Enter the author's name</hint>
    </field>
    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>contributor</dc-element>
      <dc-qualifier>advisor</dc-qualifier>
      <required>true</required>
      <label>Thesis Advisor</label>
      <input-type>name</input-type>
      <hint>Enter advisor's name</hint>
    </field>
    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>date</dc-element>
      <dc-qualifier>issued</dc-qualifier>
      <required>true</required>
      <label>Date of Issue</label>
      <input-type>date</input-type>
      <hint>Enter publication date</hint>
    </field>
    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>subject</dc-element>
      <dc-qualifier></dc-qualifier>
      <required>false</required>
      <label>Keywords</label>
      <input-type>twobox</input-type>
      <hint>Enter keywords, one per line</hint>
      <repeatable>true</repeatable>
    </field>
    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>description</dc-element>
      <dc-qualifier>abstract</dc-qualifier>
      <required>true</required>
      <label>Abstract</label>
      <input-type>textarea</input-type>
      <hint>Enter thesis abstract</hint>
    </field>
  </page>
</form>

Submitting Your First Item

Test the submission workflow:

  1. Navigate to a collection
  2. Click "Submit to this Collection"
  3. Follow the submission steps:
    • **Select Type**: Choose item type (Article, Thesis, etc.)
    • **Describe**: Enter metadata
    • **Upload**: Add files (PDF, datasets, etc.)
    • **Verify**: Review submission
    • **License**: Accept distribution license
    • **Complete**: Submit for review or archive
  4. If workflow is enabled, submission goes to reviewers
  5. Otherwise, item is archived immediately

Configuring Workflows

Set up review processes for quality control:

  1. Edit `/dspace/config/workflow.xml`
  2. Define workflow steps for collections:
<workflow id="defaultWorkflow">
  <step id="reviewstep">
    <role name="reviewer">
      <description>Reviewers for this step</description>
    </role>
    <actions>
      <action id="approve">
        <description>Approve submission</description>
        <outcomes>
          <outcome id="approved">
            <step>editstep</step>
          </outcome>
        </outcomes>
      </action>
      <action id="reject">
        <description>Reject submission</description>
        <outcomes>
          <outcome id="rejected">
            <step>reject</step>
          </outcome>
        </outcomes>
      </action>
    </actions>
  </step>
  <step id="editstep">
    <role name="editor">
      <description>Editors for this step</description>
    </role>
    <actions>
      <action id="approve">
        <description>Final approval</description>
        <outcomes>
          <outcome id="approved">
            <step>archive</step>
          </outcome>
        </outcomes>
      </action>
    </actions>
  </step>
</workflow>

Managing User Groups

Create groups for access control:

  1. Navigate to "Administration" → "Access Control" → "Groups"
  2. Click "Create New Group"
  3. Configure group:
    • **Name**: "Faculty Submitters"
    • **Description**: Group description
  4. Add members:
    • Search for users
    • Add to group
  5. Assign permissions to collections

Setting Access Policies

Control who can view and submit content:

  1. Navigate to collection settings
  2. Click "Authorization" tab
  3. Add policies:
    • **Action**: READ (view items)
    • **Group**: Anonymous or specific group
    • **Start Date**: Optional embargo start
    • **End Date**: Optional embargo end
  4. Add submission policies:
    • **Action**: ADD (submit items)
    • **Group**: Faculty Submitters

Configuring Embargo Periods

Set up temporary access restrictions:

  1. During submission, select "Access Conditions"
  2. Choose embargo type:
    • **Open Access**: Immediate public access
    • **Embargo**: Restricted until date
    • **Restricted**: Limited group access
  3. Set embargo date
  4. System automatically lifts embargo when date passes

Production Best Practices

Performance Optimization

  1. Java Memory Configuration: Adjust heap size based on repository size:
# For repositories with < 50,000 items
JAVA_OPTS="-Xmx4g -XX:MaxMetaspaceSize=1g"
# For repositories with 50,000-200,000 items
JAVA_OPTS="-Xmx8g -XX:MaxMetaspaceSize=2g"
# For repositories with 200,000+ items
JAVA_OPTS="-Xmx16g -XX:MaxMetaspaceSize=4g"
  2. Database Connection Pooling: Optimize PostgreSQL connections:
# In local.cfg
db.maxconnections = 30
db.maxwait = 5000
db.maxidle = 10
  3. Solr Memory: Increase Solr heap for large indexes:
# In Dockerfile.solr
ENV SOLR_JAVA_MEM="-Xms2g -Xmx4g"
  4. Assetstore Organization: Use multiple assetstores for better I/O:
# In local.cfg
assetstore.dir = ${dspace.dir}/assetstore
assetstore.dir.1 = ${dspace.dir}/assetstore2
assetstore.incoming = 1
  5. Enable Caching: Configure content caching:
# Enable caching
cache.enabled = true
cache.size = 100
cache.name = org.dspace.content

Security Hardening

  1. Change Default Credentials: Immediately change admin password:
/dspace/bin/dspace user --modify --email admin@example.com --password new_secure_password
  2. Secure Database Connection: Use SSL for database:
db.url = jdbc:postgresql://postgres-host:8000/dspace?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory
  3. Configure CORS: Restrict API access:
# In local.cfg
cors.allowed-origins = https://your-app.klutch.sh
cors.allowed-methods = GET, POST, PUT, DELETE, OPTIONS
cors.allowed-headers = *
  4. Implement Rate Limiting: Prevent API abuse:
# Limit REST API requests
rest.ratelimit.enabled = true
rest.ratelimit.limit = 100
rest.ratelimit.period = 60
  5. Enable Audit Logging: Track administrative actions:
audit.enabled = true
audit.log = ${dspace.dir}/log/audit.log

Backup Strategy

Implement comprehensive backups:

  1. Database Backup: Daily PostgreSQL dumps:
backup-dspace-db.sh
#!/bin/bash
BACKUP_DIR="/backups/dspace_$(date +%Y%m%d_%H%M%S)"
mkdir -p $BACKUP_DIR
# Backup database
pg_dump -h $DB_HOST -p $DB_PORT -U $DB_USER -F c dspace > $BACKUP_DIR/dspace_db.backup
# Compress backup
gzip $BACKUP_DIR/dspace_db.backup
# Keep only last 30 days
find /backups -name "dspace_*.backup.gz" -mtime +30 -delete
echo "Database backup complete: $BACKUP_DIR"
  2. Assetstore Backup: Incremental file backups:
backup-assetstore.sh
#!/bin/bash
BACKUP_DIR="/backups/assetstore_$(date +%Y%m%d)"
ASSETSTORE="/dspace/assetstore"
# Incremental backup using rsync
rsync -av --delete $ASSETSTORE/ $BACKUP_DIR/
echo "Assetstore backup complete: $BACKUP_DIR"
  3. Configuration Backup: Version control for configs:
# Backup configuration files
tar -czf config_backup_$(date +%Y%m%d).tar.gz /dspace/config/
  4. Automated Backup Schedule:
# Add to crontab
0 2 * * * /usr/local/bin/backup-dspace-db.sh
0 3 * * 0 /usr/local/bin/backup-assetstore.sh
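
A backup is only useful if you can restore it. A restore sketch, assuming the paths produced by the database backup script above and a reachable PostgreSQL instance; rebuild the search index afterwards so Discovery matches the restored data:

```shell
# Locate the newest custom-format dump produced by backup-dspace-db.sh
LATEST=$(ls -t /backups/dspace_*/dspace_db.backup.gz 2>/dev/null | head -n 1)

if [ -n "$LATEST" ]; then
  # --clean --if-exists drops existing objects before recreating them
  gunzip -c "$LATEST" | pg_restore -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" \
    -d dspace --clean --if-exists
  # Re-sync the search index with the restored database
  /dspace/bin/dspace index-discovery -b
else
  echo "no database backup found under /backups"
fi
```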

Maintenance Tasks

Schedule regular maintenance:

  1. Reindex Search: Weekly search index rebuild:
# Rebuild discovery index
/dspace/bin/dspace index-discovery -b
# Or incremental update
/dspace/bin/dspace index-discovery
  2. Media Filter: Generate thumbnails and extract text:
# Process new items
/dspace/bin/dspace filter-media
# Force reprocess all items
/dspace/bin/dspace filter-media -f
  3. Cleanup Tasks: Remove orphaned data:
# Clean up deleted items, searches, etc.
/dspace/bin/dspace cleanup
  4. Statistics Processing: Generate usage statistics:
# Process statistics
/dspace/bin/dspace stats-util -i
# Generate reports
/dspace/bin/dspace stats-util -r
  5. Checksum Verification: Verify bitstream integrity:
# Check all bitstreams
/dspace/bin/dspace checker -l -p
# Check and report results
/dspace/bin/dspace checker-emailer
  6. Automated Maintenance Cron:
# Add to crontab
0 1 * * * /dspace/bin/dspace cleanup
0 2 * * 0 /dspace/bin/dspace index-discovery -b
0 3 * * * /dspace/bin/dspace filter-media
0 4 * * 0 /dspace/bin/dspace checker -l

Monitoring and Logging

  1. Enable Detailed Logging:
<!-- In log4j2.xml -->
<Configuration>
  <Appenders>
    <RollingFile name="DSpaceLog" fileName="${dspace.dir}/log/dspace.log"
                 filePattern="${dspace.dir}/log/dspace-%d{yyyy-MM-dd}.log">
      <PatternLayout>
        <Pattern>%d{ISO8601} %-5p %c @ %m%n</Pattern>
      </PatternLayout>
      <Policies>
        <TimeBasedTriggeringPolicy />
      </Policies>
      <DefaultRolloverStrategy max="30"/>
    </RollingFile>
  </Appenders>
  <Loggers>
    <Root level="INFO">
      <AppenderRef ref="DSpaceLog"/>
    </Root>
  </Loggers>
</Configuration>
  2. Monitor Database Performance:
-- Active connections
SELECT count(*) FROM pg_stat_activity WHERE datname = 'dspace';
-- Long-running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE pg_stat_activity.query != '<IDLE>'
AND now() - pg_stat_activity.query_start > interval '5 minutes';
-- Database size
SELECT pg_size_pretty(pg_database_size('dspace'));
  3. Health Check Script:
health-check.sh
#!/bin/bash
DSPACE_URL="https://your-app.klutch.sh"

# Check DSpace is responding
if ! curl -f -s -o /dev/null "$DSPACE_URL/server/api"; then
  echo "ERROR: DSpace not responding"
  exit 1
fi

# Check Solr
if ! curl -f -s -o /dev/null "http://localhost:8983/solr/search/admin/ping"; then
  echo "ERROR: Solr not responding"
  exit 1
fi

# Check disk space
DISK_USAGE=$(df -h /dspace/assetstore | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 85 ]; then
  echo "WARNING: Disk usage at ${DISK_USAGE}%"
fi

echo "OK: DSpace is healthy"
exit 0
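When scripting around the `pg_database_size` query shown earlier, it is handy to reproduce PostgreSQL's human-readable formatting locally. The helper below approximates `pg_size_pretty` output; it is not a byte-for-byte reimplementation:

```python
def size_pretty(num_bytes: int) -> str:
    """Roughly mimic PostgreSQL's pg_size_pretty (approximate, for scripting only)."""
    units = ["bytes", "kB", "MB", "GB", "TB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            # whole numbers for bytes, one decimal place for larger units
            return f"{value:.0f} {unit}" if unit == "bytes" else f"{value:.1f} {unit}"
        value /= 1024
```

For instance, `size_pretty(2048)` yields `"2.0 kB"`, which is close enough for threshold checks in monitoring scripts.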

Resource Allocation

Recommended resources by repository size:

Small Repository (< 10,000 items):

  • DSpace: 4GB RAM, 2 vCPU
  • PostgreSQL: 2GB RAM, 2 vCPU
  • Solr: 2GB RAM, 1 vCPU
  • Storage: 100GB assetstore, 20GB database, 10GB Solr

Medium Repository (10,000-100,000 items):

  • DSpace: 8GB RAM, 4 vCPU
  • PostgreSQL: 4GB RAM, 2 vCPU
  • Solr: 4GB RAM, 2 vCPU
  • Storage: 500GB assetstore, 50GB database, 50GB Solr

Large Repository (100,000+ items):

  • DSpace: 16GB RAM, 8 vCPU
  • PostgreSQL: 8GB RAM, 4 vCPU
  • Solr: 8GB RAM, 4 vCPU
  • Storage: 2TB+ assetstore, 200GB database, 100GB Solr
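If you script capacity planning, the tiers above reduce to a small lookup keyed by item count. This is an illustrative helper, with the thresholds taken directly from the table:

```python
def recommended_resources(item_count: int) -> dict:
    """Pick a resource tier (RAM in GB) by repository size, per the table above."""
    if item_count < 10_000:       # small repository
        return {"dspace_ram_gb": 4, "postgres_ram_gb": 2, "solr_ram_gb": 2}
    if item_count < 100_000:      # medium repository
        return {"dspace_ram_gb": 8, "postgres_ram_gb": 4, "solr_ram_gb": 4}
    return {"dspace_ram_gb": 16, "postgres_ram_gb": 8, "solr_ram_gb": 8}
```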

Troubleshooting

Submission Upload Fails

Symptoms: File uploads fail during submission

Solutions:

  1. Check File Size Limit:
# In local.cfg
upload.max = 536870912 # 512MB in bytes
  2. Verify Disk Space:
Terminal window
df -h /dspace/assetstore
df -h /dspace/upload
  3. Check Permissions:
Terminal window
chown -R dspace:dspace /dspace/assetstore
chown -R dspace:dspace /dspace/upload
  4. Increase Tomcat Timeout:
<!-- In server.xml -->
<Connector port="8080" connectionTimeout="60000" />
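Because `upload.max` is specified in bytes, the value is easy to get wrong by a factor of 1000 vs 1024. A tiny converter avoids the arithmetic (`512 MB` is the value used in the example above):

```python
def upload_max_bytes(megabytes: int) -> int:
    """Convert an upload limit in MB to the byte value local.cfg expects."""
    return megabytes * 1024 * 1024
```

`upload_max_bytes(512)` produces `536870912`, matching the `upload.max` example.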

Search Not Returning Results

Symptoms: Search returns no results or incomplete results

Solutions:

  1. Rebuild Search Index:
Terminal window
/dspace/bin/dspace index-discovery -b
  2. Check Solr Status:
Terminal window
curl http://localhost:8983/solr/search/admin/ping
  3. Verify Solr Configuration:
Terminal window
# Check Solr URL in local.cfg
solr.server = http://solr-host:8983/solr
  4. Check Solr Logs:
Terminal window
tail -f /var/solr/logs/solr.log

Database Connection Errors

Symptoms: “Unable to connect to database” errors

Solutions:

  1. Verify Database is Running:
Terminal window
psql -h $DB_HOST -p $DB_PORT -U $DB_USER -d dspace -c "SELECT 1"
  2. Check Connection String:
# In local.cfg
db.url = jdbc:postgresql://postgres-host:8000/dspace
db.username = dspace
db.password = your_password
  3. Test Network Connectivity:
Terminal window
telnet $DB_HOST 8000
  4. Increase Connection Pool:
db.maxconnections = 50
db.maxwait = 10000

Items Not Displaying Properly

Symptoms: Item pages show errors or missing metadata

Solutions:

  1. Check Item Permissions:
Terminal window
/dspace/bin/dspace dsrun org.dspace.administer.ItemPolicies <item-id>
  2. Verify Metadata:
Terminal window
/dspace/bin/dspace metadata-export -i <item-id>
  3. Clear Cache:
Terminal window
rm -rf /dspace/var/cache/*
  4. Check Logs:
Terminal window
tail -f /dspace/log/dspace.log

Embargo Not Working

Symptoms: Embargoed items are publicly accessible

Solutions:

  1. Check Resource Policies:
Terminal window
/dspace/bin/dspace dsrun org.dspace.embargo.EmbargoManager
  2. Verify Embargo Configuration:
# In local.cfg
embargo.field.terms = dc.embargo.terms
embargo.field.lift = dc.date.available
  3. Manually Set Embargo:
Terminal window
/dspace/bin/dspace embargo-setter -i <item-id> -d YYYY-MM-DD
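If you compute lift dates outside DSpace, for example when preparing batch imports, month arithmetic needs care at month ends. The helper below is an illustrative policy (add whole months, clamping the day to the target month), not DSpace's own embargo logic:

```python
import calendar
from datetime import date

def lift_date(issued: date, embargo_months: int) -> date:
    """Add whole months to the issue date; clamp the day for shorter target months."""
    month_index = issued.month - 1 + embargo_months
    year = issued.year + month_index // 12
    month = month_index % 12 + 1
    # e.g. Jan 31 + 1 month lands on the last day of February
    day = min(issued.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)
```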

OAI-PMH Harvesting Issues

Symptoms: External systems cannot harvest metadata

Solutions:

  1. Test OAI-PMH Endpoint:
Terminal window
curl "https://your-app.klutch.sh/server/oai?verb=Identify"
  2. Rebuild OAI Index:
Terminal window
/dspace/bin/dspace oai import
/dspace/bin/dspace oai clean-cache
  3. Check OAI Configuration:
oai.solr.url = ${solr.server}/oai
oai.identifier.prefix = oai:your-app.klutch.sh:
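To verify the `Identify` response in a script rather than by eye, the standard library's XML parser is enough. The sample payload below is trimmed to a few of the fields defined by the OAI-PMH 2.0 specification; the repository name is a placeholder:

```python
import xml.etree.ElementTree as ET

# Trimmed example of a verb=Identify response (shape per the OAI-PMH 2.0 spec)
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>https://your-app.klutch.sh/server/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def repository_name(identify_xml: str) -> str:
    """Extract repositoryName from an OAI-PMH Identify response."""
    root = ET.fromstring(identify_xml)
    return root.findtext("oai:Identify/oai:repositoryName", namespaces=NS)
```

A harvest monitor could call this against the live endpoint and alert when parsing fails or the name is missing.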

Advanced Configuration

Custom Themes

Customize the DSpace user interface:

  1. Create Custom Theme:
Terminal window
# Copy base theme
cp -r /dspace/webapps/xmlui/themes/Mirage2 /dspace/webapps/xmlui/themes/CustomTheme
  2. Modify Theme Configuration:
<!-- In theme.xml -->
<theme name="CustomTheme" path="CustomTheme/">
  <conffile>sitemap.xmap</conffile>
</theme>
  3. Customize Styles:
/* In style.css */
.header {
  background-color: #003366;
  color: white;
}

.logo {
  max-width: 200px;
}

.sidebar {
  background-color: #f5f5f5;
}
  4. Update Logo and Branding:
Terminal window
# Replace logo
cp your-logo.png /dspace/webapps/xmlui/themes/CustomTheme/images/logo.png

DOI Integration

Enable DOI minting for persistent identifiers:

  1. Configure DataCite or Crossref:
# In local.cfg
identifier.doi.prefix = 10.1234
identifier.doi.namespaceseparator = /
identifier.doi.user = your-datacite-username
identifier.doi.password = your-datacite-password
identifier.doi.datacentre = your-datacentre
  2. Enable DOI Plugin:
plugin.sequence.org.dspace.identifier.IdentifierProvider = \
org.dspace.identifier.DOIIdentifierProvider
  3. Mint DOIs:
Terminal window
# Mint DOI for item
/dspace/bin/dspace doi-organiser -r

ORCID Integration

Connect authors with ORCID profiles:

  1. Register for ORCID API Access.

  2. Configure ORCID:

# In local.cfg
orcid.application-client-id = your-client-id
orcid.application-client-secret = your-client-secret
orcid.domain-url = https://orcid.org
  3. Enable ORCID Authentication:
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = \
org.dspace.authenticate.OAuthAuthentication

Batch Import/Export

Import large collections of items:

  1. Prepare Import Package:
import_package/
├── collections/
└── simple-archive/
    ├── item_001/
    │   ├── contents
    │   ├── dublin_core.xml
    │   └── file1.pdf
    └── item_002/
        ├── contents
        ├── dublin_core.xml
        └── file2.pdf
  2. Create dublin_core.xml:
<?xml version="1.0" encoding="UTF-8"?>
<dublin_core schema="dc">
  <dcvalue element="title" qualifier="">Sample Article Title</dcvalue>
  <dcvalue element="contributor" qualifier="author">Smith, John</dcvalue>
  <dcvalue element="date" qualifier="issued">2024-01-15</dcvalue>
  <dcvalue element="description" qualifier="abstract">This is the abstract...</dcvalue>
  <dcvalue element="subject">Computer Science</dcvalue>
  <dcvalue element="type">Article</dcvalue>
</dublin_core>
  3. Create contents file:
file1.pdf bundle:ORIGINAL
  4. Import Items:
Terminal window
/dspace/bin/dspace import -a -e admin@example.com -c 123456789/2 -s /import_package/simple-archive -m mapfile.txt
  5. Export Items:
Terminal window
# Export collection
/dspace/bin/dspace export -t COLLECTION -i 123456789/2 -d /exports/collection -n 1
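When preparing many items, writing each `dublin_core.xml` by hand does not scale. A minimal generator, assuming the same (element, qualifier, value) shape used in the example above:

```python
import xml.etree.ElementTree as ET

def build_dublin_core(values):
    """Build a Simple Archive Format dublin_core.xml from (element, qualifier, value) triples."""
    root = ET.Element("dublin_core", schema="dc")
    for element, qualifier, value in values:
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    return ET.tostring(root, encoding="unicode")

# Example: the first two fields from the sample record above
xml_str = build_dublin_core([
    ("title", "", "Sample Article Title"),
    ("contributor", "author", "Smith, John"),
])
```

Write the returned string to `item_NNN/dublin_core.xml` for each item directory in the import package.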

REST API Usage

Interact with DSpace programmatically:

  1. Authenticate:
Terminal window
# Get JWT token: DSpace 7 returns it in the Authorization response header,
# not in the response body (a CSRF token may also be required, depending
# on your configuration)
TOKEN=$(curl -s -i -X POST "https://your-app.klutch.sh/server/api/authn/login" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "user=admin@example.com&password=your_password" \
  | grep -i '^authorization:' | awk '{print $3}' | tr -d '\r')
  2. List Communities:
Terminal window
curl "https://your-app.klutch.sh/server/api/core/communities" \
-H "Authorization: Bearer $TOKEN"
  3. Get Item Metadata:
Terminal window
curl "https://your-app.klutch.sh/server/api/core/items/{item-uuid}" \
-H "Authorization: Bearer $TOKEN"
  4. Create Item via API:
import requests

api_url = "https://your-app.klutch.sh/server/api"
token = "your-jwt-token"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json",
}

# Create item (DSpace 7 requires the owning collection as a query parameter)
item_data = {
    "name": "Test Item",
    "metadata": {
        "dc.title": [{"value": "Test Title"}],
        "dc.contributor.author": [{"value": "Smith, John"}],
        "dc.date.issued": [{"value": "2024-01-15"}],
    },
}

response = requests.post(
    f"{api_url}/core/items",
    params={"owningCollection": "collection-uuid"},  # replace with a real collection UUID
    headers=headers,
    json=item_data,
)
print(response.json())

Statistics and Reporting

Generate usage statistics:

  1. Enable Statistics:
# In local.cfg
usage-statistics.dbfile = ${dspace.dir}/log/dspace-stats.db
solr-statistics.server = ${solr.server}/statistics
  2. Generate Reports:
Terminal window
# Generate monthly report
/dspace/bin/dspace stats-util -r -m 2024-01
# Export statistics
/dspace/bin/dspace stats-util -e /exports/stats.csv
  3. Integrate Google Analytics:
google.analytics.key = UA-XXXXXXXX-X
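As an illustration of post-processing an exported statistics file, the snippet below totals downloads per month from rows of `(handle, month, downloads)`. The column layout here is an assumption for the sake of the example, not a documented DSpace export schema:

```python
from collections import defaultdict

def downloads_per_month(rows):
    """Sum downloads by month from (handle, month, downloads) rows."""
    totals = defaultdict(int)
    for _handle, month, downloads in rows:
        totals[month] += downloads
    return dict(totals)
```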

Conclusion

DSpace provides a comprehensive, enterprise-grade solution for institutional repositories and digital asset management. By deploying on Klutch.sh, you benefit from automatic HTTPS, persistent storage, and simple Docker-based deployment while maintaining the robust preservation features and scalability that DSpace offers.

The platform’s proven track record with major universities and research institutions worldwide demonstrates its reliability and maturity. Features like flexible metadata schemas, configurable workflows, OAI-PMH harvesting, DOI integration, and comprehensive API access make DSpace the standard choice for organizations managing scholarly communications and digital collections.

Whether you’re launching a new institutional repository, managing research data, preserving cultural heritage materials, or building a digital library, DSpace scales to meet your needs. The system’s modular architecture allows you to customize every aspect from metadata schemas and submission forms to themes and access policies.

Start with the basic configuration outlined in this guide, then expand functionality through custom metadata fields, workflow customization, API integration, and advanced preservation features as your repository grows. Your digital content remains secure, discoverable, and preserved for the long term, while the open-source nature of DSpace ensures you have complete control over your institutional knowledge base.

Deploy DSpace today and join thousands of institutions worldwide in preserving and sharing digital scholarship for future generations.