Deploying DSpace
DSpace is the world’s leading open-source repository platform for research outputs, publications, datasets, and learning resources. Originally developed by MIT Libraries and HP Labs, and now stewarded by LYRASIS (which merged with DuraSpace in 2019), DSpace powers digital repositories for thousands of academic institutions, research organizations, government agencies, and cultural heritage institutions worldwide. The platform provides robust tools for ingesting, preserving, indexing, and distributing digital content while ensuring long-term preservation and accessibility.
What makes DSpace particularly powerful is its comprehensive feature set designed for institutional repositories. The system supports complex metadata schemas including Dublin Core, qualified Dublin Core, and custom metadata fields. It handles a wide variety of digital content formats from PDFs and images to datasets and multimedia files. Features like customizable workflows, embargo periods, access controls, OAI-PMH harvesting, DOI minting, and integration with authentication systems make DSpace a complete solution for managing institutional knowledge.
Why Deploy DSpace on Klutch.sh?
Klutch.sh provides an excellent platform for hosting DSpace with several key advantages:
- Simple Docker Deployment: Deploy your Dockerfile and Klutch.sh automatically handles containerization and orchestration
- Persistent Storage: Attach volumes for content storage (assetstore), database, and Solr indexes with guaranteed durability
- Automatic HTTPS: All deployments come with automatic SSL certificates for secure repository access
- Resource Scalability: Scale CPU and memory resources as your repository grows
- Database Integration: Run PostgreSQL alongside DSpace with persistent storage
- Cost-Effective: Pay only for resources used, scale based on repository size
- Zero Server Management: Focus on curating content, not managing infrastructure
Prerequisites
Before deploying DSpace, ensure you have:
- A Klutch.sh account (sign up at klutch.sh)
- Git installed locally
- Basic understanding of institutional repositories and digital preservation
- Familiarity with Docker and container concepts
- PostgreSQL knowledge for database management
- Understanding of metadata standards (Dublin Core, etc.)
- Java and Tomcat knowledge (helpful but not required)
Understanding DSpace’s Architecture
DSpace uses a multi-tier architecture designed for scalability and long-term digital preservation:
Core Components
DSpace Backend (REST API): Built with Spring Boot, the backend provides:
- RESTful API for all repository operations
- Content ingestion and management
- Metadata handling and validation
- Workflow processing
- Authorization and authentication
- OAI-PMH data provider
- Batch import/export functionality
DSpace Frontend (Angular): Modern single-page application offering:
- User-friendly interface for browsing and searching
- Submission workflows for depositing content
- Administrative interfaces
- Customizable themes and branding
- Responsive design for mobile devices
- Internationalization support
PostgreSQL Database: Stores all repository data including:
- Item metadata
- Bitstream metadata and mappings
- Community and collection structure
- User accounts and permissions
- Workflow state
- Handle assignments
- Statistics and usage data
Assetstore: File system storage for digital objects:
- Original uploaded files (bitstreams)
- Generated thumbnails and derivatives
- Preserved formats
- Organized by internal ID for efficient access
- Supports multiple assetstore locations
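The “organized by internal ID” layout can be made concrete with a short sketch. The traditional DSpace assetstore nests each bitstream under three two-character directories taken from the start of its internal ID; the function below assumes that layout (the ID value is made up for illustration):

```shell
#!/bin/sh
# Sketch: map a bitstream's internal ID to its location in a traditional
# DSpace assetstore, which splits the first six characters of the ID into
# three subdirectory levels (layout assumed; ID is hypothetical).
assetstore_path() {
  id="$1"
  printf '%s/%s/%s/%s\n' \
    "$(printf '%s' "$id" | cut -c1-2)" \
    "$(printf '%s' "$id" | cut -c3-4)" \
    "$(printf '%s' "$id" | cut -c5-6)" \
    "$id"
}

assetstore_path "128967431829"
# 12/89/67/128967431829
```

This fan-out keeps any single directory from accumulating millions of files, which matters for filesystem performance as a repository grows.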
Solr Search Index: Apache Solr provides:
- Full-text search across metadata and content
- Faceted browsing by date, author, subject
- Statistics and reporting
- Authority control for names and subjects
- Configurable relevance ranking
Handle Server (Optional): Persistent identifier system:
- Assigns permanent URLs to items
- Ensures long-term accessibility
- Integrates with global Handle system
- Supports custom handle prefixes
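To illustrate how persistent identifiers work, a handle is simply a prefix/suffix pair resolved through the global Handle system. A minimal sketch (the prefix `123456789` is DSpace’s sample default; a registered prefix replaces it in production):

```shell
#!/bin/sh
# Sketch: build the globally resolvable URL for an item from its handle
# prefix and suffix (123456789 is the DSpace sample prefix, not a real one).
handle_url() {
  prefix="$1"; suffix="$2"
  printf 'https://hdl.handle.net/%s/%s\n' "$prefix" "$suffix"
}

handle_url 123456789 42
# https://hdl.handle.net/123456789/42
```

Because the handle, not the repository hostname, is what gets cited, items stay resolvable even if the repository later moves to a different URL.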
Content Model
Communities: Top-level organizational units
- Represent departments, research centers, or topic areas
- Can contain sub-communities
- Define authorization policies
- Support custom logos and descriptions
Collections: Groupings of related items
- Belong to communities
- Define metadata schemas
- Configure submission workflows
- Set access policies and embargo rules
- Support collection-specific branding
Items: Individual repository objects
- Consist of metadata and bitstreams
- Immutable after archiving (versioning supported)
- Assigned unique handles
- Support relationships to other items
- Can be versioned for updates
Bitstreams: Actual files attached to items
- Original submission files
- License agreements
- Generated thumbnails
- Format identification via PRONOM
- Checksum verification for integrity
Workflow System
Submission Workflows: Configurable multi-step process
- Describe item with metadata
- Upload files
- Verify submission
- License agreement
- Optional review steps
- Automated notifications
Review Workflows: Optional curation process
- Reviewers assigned by collection
- Approve, reject, or request changes
- Track submission history
- Email notifications
- Batch processing capabilities
Authentication and Authorization
Authentication Options:
- Local database authentication
- LDAP/Active Directory integration
- Shibboleth/SAML single sign-on
- ORCID authentication
- OAuth providers
- IP-based authentication
Authorization Model:
- Resource policies define access
- Actions: READ, WRITE, ADD, REMOVE, ADMIN
- Inherited from communities/collections
- Item-level overrides supported
- Time-based embargoes
- Group-based permissions
Preservation Features
Format Identification: Automatic format detection using PRONOM registry
Checksum Verification: Regular integrity checks on stored files
Metadata Preservation: Support for Dublin Core and custom schemas
Export Capabilities: AIP, METS, DSpace Intermediary Format
Version Control: Track changes to items over time
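The checksum verification feature boils down to recomputing a stored file’s digest and comparing it with the value recorded at ingest. A minimal sketch of that comparison, assuming `md5sum` is available (DSpace records MD5 checksums by default):

```shell
#!/bin/sh
# Sketch: the core of an integrity check -- recompute a bitstream's MD5
# and compare it against the checksum stored at ingest time.
verify_bitstream() {
  file="$1"; stored="$2"
  current="$(md5sum "$file" | awk '{print $1}')"
  if [ "$current" = "$stored" ]; then
    echo "OK: $file"
  else
    echo "CORRUPT: $file (stored=$stored current=$current)"
    return 1
  fi
}

# Demo with a throwaway file standing in for a bitstream
printf 'hello' > /tmp/bitstream.bin
verify_bitstream /tmp/bitstream.bin "$(md5sum /tmp/bitstream.bin | awk '{print $1}')"
# prints: OK: /tmp/bitstream.bin
```

In production you would rely on DSpace’s built-in checksum checker (`dspace checker`) rather than a hand-rolled script; this just shows what that job is doing on each file.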
Installation and Setup
Step 1: Create the Dockerfile
Create a Dockerfile in your project root:
```dockerfile
FROM dspace/dspace:7.6

# Set environment variables
ENV DSPACE_INSTALL_DIR=/dspace \
    DSPACE_CFG=/dspace/config/local.cfg \
    CATALINA_HOME=/usr/local/tomcat \
    JAVA_OPTS="-Xmx2048m -XX:MaxMetaspaceSize=512m"

# Install additional utilities
USER root
RUN apt-get update && apt-get install -y \
    curl \
    vim \
    wget \
    postgresql-client \
    && rm -rf /var/lib/apt/lists/*

# Create necessary directories
RUN mkdir -p /dspace/assetstore \
    /dspace/solr \
    /dspace/upload \
    /dspace/reports \
    /dspace/log

# Set proper permissions
RUN chown -R dspace:dspace /dspace

# Copy custom configuration
COPY local.cfg /dspace/config/local.cfg
COPY dspace.cfg /dspace/config/dspace.cfg

# Switch back to dspace user
USER dspace

# Expose ports
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=120s --retries=3 \
    CMD curl -f http://localhost:8080/server/api || exit 1

# Start DSpace
CMD ["catalina.sh", "run"]
```

Step 2: Create DSpace Configuration
Create local.cfg for environment-specific settings:
```properties
# Database Configuration
db.url = jdbc:postgresql://postgres-host.klutch.sh:8000/dspace
db.username = dspace
db.password = your_secure_password
db.schema = public

# DSpace Installation Directory
dspace.dir = /dspace

# DSpace Server Configuration
dspace.server.url = https://your-app.klutch.sh
dspace.ui.url = https://your-app.klutch.sh
dspace.name = My Institution Repository

# Assetstore Configuration
assetstore.dir = ${dspace.dir}/assetstore
assetstore.incoming = 0

# Solr Configuration
solr.server = http://localhost:8983/solr

# Handle Configuration
handle.canonical.prefix = ${dspace.server.url}/handle/
handle.prefix = 123456789

# Email Configuration
mail.server = smtp.gmail.com
mail.server.port = 587
mail.server.username = your-email@gmail.com
mail.server.password = your-app-password
mail.from.address = noreply@your-institution.edu
mail.feedback.recipient = repository@your-institution.edu
mail.admin = admin@your-institution.edu
mail.server.disabled = false
mail.extraproperties = mail.smtp.auth=true, \
    mail.smtp.starttls.enable=true

# Authentication Configuration
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.PasswordAuthentication

# Authorization Configuration
webui.strengths.show = true

# Upload Settings
upload.max = 536870912
upload.temp.dir = ${dspace.dir}/upload

# Thumbnail Settings
thumbnail.maxwidth = 300
thumbnail.maxheight = 300

# Batch Import/Export
dspace-api.content.export.download.dir = ${dspace.dir}/exports

# Statistics
usage-statistics.dbfile = ${dspace.dir}/log/dspace-stats.db
solr-statistics.server = ${solr.server}/statistics

# Google Analytics (optional)
google.analytics.key =

# Curation System
curate.ui.taskqueue.dir = ${dspace.dir}/ctqueues
plugin.named.org.dspace.curate.CurationTask = \
    org.dspace.ctask.general.ProfileFormats = profileformats, \
    org.dspace.ctask.general.RequiredMetadata = requiredmetadata

# OAI-PMH Configuration
oai.solr.url = ${solr.server}/oai
oai.identifier.prefix = oai:${dspace.server.url}:

# Content Bitstream Store
default.bitstream.store = 0
store.number = 0
store.dir = ${assetstore.dir}

# Media Filter Configuration
filter.plugins = \
    PDF Text Extractor, \
    HTML Text Extractor, \
    Word Text Extractor, \
    JPEG Thumbnail, \
    PDF Thumbnail

# Submission Configuration
submission.lookup.scopus.apikey =
submission.lookup.crossref.email =

# SWORD Configuration (optional)
sword-server.on-behalf-of.enable = true
sword-server.workflowdefault = false

# Logging
log.init.config = ${dspace.dir}/config/log4j2.xml
log.dir = ${dspace.dir}/log

# IIIF Configuration (optional)
iiif.enabled = false

# Enable Identifiers (DOI, Handle, etc.)
identifier.doi.prefix = 10.5072
identifier.doi.namespaceseparator = /
```

Create dspace.cfg for additional configuration:
```properties
# Additional DSpace Configuration

# File Upload Settings
webui.submit.upload.required = true
webui.submit.upload.resume = true

# Search Configuration
search.index.1 = author:dc.contributor.*
search.index.2 = author:dc.creator
search.index.3 = title:dc.title
search.index.4 = keyword:dc.subject
search.index.5 = abstract:dc.description.abstract
search.index.6 = series:dc.relation.ispartofseries
search.index.7 = sponsor:dc.description.sponsorship
search.index.8 = identifier:dc.identifier.*
search.index.9 = language:dc.language.iso

# Browse Indexes
webui.browse.index.1 = dateissued:item:dateissued
webui.browse.index.2 = author:metadata:dc.contributor.*,dc.creator:text
webui.browse.index.3 = title:item:title
webui.browse.index.4 = subject:metadata:dc.subject.*:text

# Item Display
webui.itemdisplay.default = dc.title, dc.title.alternative, dc.contributor.*, \
    dc.subject, dc.date.issued(date), dc.publisher, dc.identifier.citation, \
    dc.relation.ispartofseries, dc.description.abstract, dc.description, \
    dc.identifier.govdoc, dc.identifier.uri(link), dc.identifier.isbn, \
    dc.identifier.issn, dc.identifier.ismn, dc.language.iso(language), \
    dc.type

# Metadata Registry
dublin.core.types = dc

# Workflow Settings
workflow.reviewer.notify = true
workflow.admin.notify = true

# Statistics Configuration
usage-statistics.authorization.admin.usage = true

# Format Support Registry
webui.submit.upload.maxsize = 512000000
```

Step 3: Create Database Initialization Script
Create init-dspace-db.sh:
```bash
#!/bin/bash
set -e

echo "Waiting for PostgreSQL to be ready..."
until PGPASSWORD=$DB_PASSWORD psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -c '\q' 2>/dev/null; do
  echo "PostgreSQL not ready, waiting..."
  sleep 2
done

echo "PostgreSQL is ready!"

# Create DSpace database and user
PGPASSWORD=$DB_ROOT_PASSWORD psql -h "$DB_HOST" -p "$DB_PORT" -U postgres <<EOF
SELECT 'CREATE DATABASE dspace' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'dspace')\gexec
SELECT 'CREATE USER dspace WITH PASSWORD ''$DB_PASSWORD''' WHERE NOT EXISTS (SELECT FROM pg_user WHERE usename = 'dspace')\gexec
GRANT ALL PRIVILEGES ON DATABASE dspace TO dspace;
EOF

echo "Database initialized!"

# Check if schema needs initialization
TABLE_COUNT=$(PGPASSWORD=$DB_PASSWORD psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d dspace -t -c "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema='public';" 2>/dev/null || echo "0")

if [ "$TABLE_COUNT" -eq 0 ]; then
  echo "Initializing DSpace database schema..."
  cd /dspace
  ./bin/dspace database migrate
  echo "Database schema initialized!"

  echo "Creating initial administrator..."
  ./bin/dspace create-administrator -e admin@example.com -f Admin -l User -p admin -c en
  echo "Administrator created (change password immediately!)"
else
  echo "Database already initialized"
fi
```

Step 4: Create Solr Configuration
Create Dockerfile.solr for Solr search:
```dockerfile
FROM solr:8.11

# Set environment variables
ENV SOLR_HOME=/var/solr/data

USER root

# Copy DSpace Solr cores
RUN mkdir -p /opt/solr/server/solr/configsets/dspace/conf
COPY --chown=solr:solr solr-cores/ /opt/solr/server/solr/configsets/dspace/

USER solr

# Create DSpace cores
RUN solr start -force && \
    solr create_core -c search -d /opt/solr/server/solr/configsets/dspace && \
    solr create_core -c statistics -d /opt/solr/server/solr/configsets/dspace && \
    solr create_core -c oai -d /opt/solr/server/solr/configsets/dspace && \
    solr stop

EXPOSE 8983

CMD ["solr-foreground"]
```

Step 5: Create Docker Compose for Local Development
Create docker-compose.yml:
```yaml
version: '3.8'

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_DB: dspace
      POSTGRES_USER: dspace
      POSTGRES_PASSWORD: dspace
      PGDATA: /var/lib/postgresql/data/pgdata
    volumes:
      - postgres-data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    restart: unless-stopped

  solr:
    build:
      context: .
      dockerfile: Dockerfile.solr
    volumes:
      - solr-data:/var/solr/data
    ports:
      - "8983:8983"
    restart: unless-stopped

  dspace:
    build:
      context: .
      dockerfile: Dockerfile
    depends_on:
      - postgres
      - solr
    environment:
      DSPACE_INSTALL_DIR: /dspace
      DB_HOST: postgres
      DB_PORT: 5432
      DB_USER: dspace
      DB_PASSWORD: dspace
      DB_ROOT_PASSWORD: postgres
      SOLR_HOST: solr
    volumes:
      - dspace-assetstore:/dspace/assetstore
      - dspace-logs:/dspace/log
      - dspace-exports:/dspace/exports
    ports:
      - "8080:8080"
    restart: unless-stopped

volumes:
  postgres-data:
  solr-data:
  dspace-assetstore:
  dspace-logs:
  dspace-exports:
```

Step 6: Create Management Scripts
Create maintenance.sh for common tasks:
```bash
#!/bin/bash

DSPACE_DIR="/dspace"

case "$1" in
  reindex)
    echo "Reindexing DSpace..."
    $DSPACE_DIR/bin/dspace index-discovery
    ;;
  filter-media)
    echo "Running media filter..."
    $DSPACE_DIR/bin/dspace filter-media
    ;;
  cleanup)
    echo "Cleaning up old searches and deleted items..."
    $DSPACE_DIR/bin/dspace cleanup
    ;;
  stats)
    echo "Generating statistics..."
    $DSPACE_DIR/bin/dspace stats-util -i
    ;;
  backup)
    BACKUP_DIR="/backups/dspace_$(date +%Y%m%d_%H%M%S)"
    mkdir -p $BACKUP_DIR
    echo "Backing up assetstore..."
    tar -czf $BACKUP_DIR/assetstore.tar.gz $DSPACE_DIR/assetstore
    echo "Backing up database..."
    pg_dump -h $DB_HOST -U $DB_USER dspace > $BACKUP_DIR/dspace.sql
    echo "Backup complete: $BACKUP_DIR"
    ;;
  checksum)
    echo "Verifying checksums..."
    $DSPACE_DIR/bin/dspace checker -l -p
    ;;
  *)
    echo "Usage: $0 {reindex|filter-media|cleanup|stats|backup|checksum}"
    exit 1
    ;;
esac
```

Step 7: Initialize Git Repository
```bash
git init
git add Dockerfile Dockerfile.solr local.cfg dspace.cfg init-dspace-db.sh docker-compose.yml maintenance.sh
git commit -m "Initial DSpace deployment configuration"
```

Step 8: Test Locally
Before deploying to Klutch.sh, test locally:
```bash
# Build and start containers
docker-compose up -d

# Wait for services to start (may take 5-10 minutes)
docker-compose logs -f dspace

# Initialize database
docker-compose exec dspace bash /dspace/init-dspace-db.sh

# Access DSpace at http://localhost:8080
# Default admin: admin@example.com / admin
```

Deploying to Klutch.sh
Step 1: Deploy PostgreSQL Database
First, deploy a PostgreSQL instance:
- Navigate to klutch.sh/app
- Click "New Project"
- Select PostgreSQL or use a custom Dockerfile
- Configure database:
  - Database name: `dspace`
  - Username: `dspace`
  - Password: Create a secure password
  - Root password: Create a secure root password
- Select **TCP** as traffic type
- Set internal port to **5432**
- Add persistent storage with mount path: `/var/lib/postgresql/data` and size: `50GB`
- Note the connection details (hostname like `postgres-app.klutch.sh:8000`)
Step 2: Deploy Solr Search Engine
Deploy Solr for search functionality:
- Create a new project in Klutch.sh
- Push your Solr Dockerfile to a GitHub repository
- Import the repository to Klutch.sh
- Configure Solr:
  - Select **HTTP** as traffic type
  - Set internal port to **8983**
  - Add persistent storage with mount path: `/var/solr/data` and size: `20GB`
- Note the Solr URL (like `https://solr-app.klutch.sh`)
Step 3: Push DSpace Repository to GitHub
Create a new repository and push:
```bash
git remote add origin https://github.com/yourusername/dspace-klutch.git
git branch -M master
git push -u origin master
```

Step 4: Deploy DSpace to Klutch.sh
- Navigate to klutch.sh/app
- Click "New Project" and select "Import from GitHub"
- Authorize Klutch.sh to access your GitHub repositories
- Select your DSpace repository
- Klutch.sh will automatically detect the Dockerfile
Step 5: Configure DSpace Traffic Settings
- In the project settings, select **HTTP** as the traffic type
- Set the internal port to **8080**
- Klutch.sh will automatically provision an HTTPS endpoint
Step 6: Add Persistent Storage for DSpace
DSpace requires persistent storage for content and logs:
- In your project settings, navigate to the "Storage" section
- Add a volume with mount path: `/dspace/assetstore` and size: `100GB` (for digital objects)
- Add a volume with mount path: `/dspace/log` and size: `10GB` (for logs)
- Add a volume with mount path: `/dspace/exports` and size: `20GB` (for batch exports)
Storage recommendations:
- Small repository (< 10,000 items): 50GB assetstore
- Medium repository (10,000-100,000 items): 200GB assetstore
- Large repository (100,000+ items): 500GB+ assetstore
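As a rough cross-check on the tiers above, assetstore needs can be estimated from item count and average file size. The sketch below is a back-of-envelope calculation; the 1.3 overhead multiplier for thumbnails and extracted text is an assumption, not a DSpace figure:

```shell
#!/bin/sh
# Sketch: rough assetstore sizing in GB from item count and average
# bitstream size in MB, with an assumed 1.3x overhead for derivatives
# (thumbnails, extracted text). Integer math only.
estimate_assetstore_gb() {
  items="$1"; avg_mb="$2"
  echo $(( items * avg_mb * 13 / 10 / 1024 ))
}

estimate_assetstore_gb 10000 5
# prints 63  (about 63 GB for 10,000 items averaging 5 MB each)
```

Size volumes with headroom above the estimate, since resizing a production volume is more disruptive than over-provisioning up front.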
Step 7: Configure DSpace Environment Variables
Add the following environment variables in Klutch.sh dashboard:
- `DB_HOST`: Your PostgreSQL hostname (e.g., `postgres-app.klutch.sh`)
- `DB_PORT`: `8000` (external TCP port)
- `DB_USER`: `dspace`
- `DB_PASSWORD`: Your database password
- `DB_ROOT_PASSWORD`: Your root password
- `SOLR_HOST`: Your Solr hostname (e.g., `solr-app.klutch.sh`)
- `DSPACE_INSTALL_DIR`: `/dspace`
- `JAVA_OPTS`: `-Xmx4096m -XX:MaxMetaspaceSize=1024m` (adjust based on resources)
Step 8: Update Configuration Files
Before deploying, update local.cfg with your actual URLs:
```properties
# Update these values
dspace.server.url = https://your-app.klutch.sh
dspace.ui.url = https://your-app.klutch.sh
db.url = jdbc:postgresql://postgres-app.klutch.sh:8000/dspace
solr.server = https://solr-app.klutch.sh/solr
```

Commit and push changes:
```bash
git add local.cfg
git commit -m "Update configuration for Klutch.sh deployment"
git push
```

Step 9: Deploy DSpace
- Review your configuration settings in Klutch.sh
- Click "Deploy" to start the deployment
- Monitor build logs for any errors
- Wait for initialization (first deployment takes 10-15 minutes)
- Once deployed, DSpace will be available at `your-app.klutch.sh`
Step 10: Initialize Database and Create Admin User
After first deployment, initialize the database:
- Access the DSpace container terminal in Klutch.sh
- Run the initialization script:
  ```bash
  bash /dspace/init-dspace-db.sh
  ```
- Create administrator account:
  ```bash
  /dspace/bin/dspace create-administrator -e admin@yourinstitution.edu -f Admin -l User -p your_secure_password -c en
  ```
Getting Started with DSpace
Initial Setup
After deployment, complete the initial configuration:
- Access DSpace: Navigate to `https://your-app.klutch.sh`
- Login as Administrator:
  - Click “Log In” in the top navigation
  - Enter administrator credentials
  - You’ll be redirected to the admin dashboard
- Configure Basic Settings:
  - Navigate to “Administration” → “Edit Configuration”
  - Update repository name and description
  - Set contact information
  - Configure institutional logo
- Set Up Email Notifications:
  - Verify SMTP settings in `local.cfg`
  - Test email delivery
  - Configure notification templates
Creating Community Structure
Organize your repository with communities and collections:
- Navigate to "Administration" → "Communities & Collections"
- Click "Create Top-Level Community"
- Configure community:
  - **Name**: "School of Engineering"
  - **Short Description**: Brief overview
  - **Introductory Text**: Detailed description
  - **Copyright Text**: Rights statement
  - **Logo**: Upload community logo (optional)
- Click "Create" to save community
- Within the community, create sub-communities or collections
Creating Collections
Add collections to organize items:
- Navigate to a community
- Click "Create Collection"
- Configure collection:
  - **Name**: "Department of Computer Science Theses"
  - **Short Description**: Collection overview
  - **Introductory Text**: Detailed information
  - **License**: Default submission license
  - **Provenance**: Provenance statement
- Configure submission settings:
  - Select input forms
  - Define workflow steps
  - Set access policies
- Click "Create" to save collection
Configuring Metadata Schemas
Customize metadata fields for your institution:
- Navigate to "Administration" → "Registries" → "Metadata Registry"
- View existing Dublin Core fields
- Add custom fields:
  - Click "Add Field"
  - Select schema (dc, dcterms, or custom)
  - Enter element name (e.g., "department")
  - Enter qualifier (optional, e.g., "sponsor")
  - Add scope note for guidance
- Save custom field
Common custom fields:
- `dc.contributor.advisor` - Thesis advisor
- `dc.degree.name` - Degree name (e.g., Ph.D.)
- `dc.degree.level` - Degree level (Masters, Doctoral)
- `dc.degree.discipline` - Field of study
- `dc.degree.grantor` - Degree-granting institution
- `dc.identifier.doi` - Digital Object Identifier
- `dc.identifier.orcid` - ORCID iD

Configuring Submission Forms
Customize input forms for metadata collection:
- Edit `/dspace/config/submission-forms.xml`
- Define form fields for each collection type
- Example thesis submission form:
```xml
<form name="thesis">
  <page number="1">
    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>title</dc-element>
      <dc-qualifier></dc-qualifier>
      <required>true</required>
      <label>Title</label>
      <input-type>onebox</input-type>
      <hint>Enter the full title of your thesis</hint>
    </field>

    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>contributor</dc-element>
      <dc-qualifier>author</dc-qualifier>
      <required>true</required>
      <label>Author</label>
      <input-type>name</input-type>
      <hint>Enter the author's name</hint>
    </field>

    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>contributor</dc-element>
      <dc-qualifier>advisor</dc-qualifier>
      <required>true</required>
      <label>Thesis Advisor</label>
      <input-type>name</input-type>
      <hint>Enter advisor's name</hint>
    </field>

    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>date</dc-element>
      <dc-qualifier>issued</dc-qualifier>
      <required>true</required>
      <label>Date of Issue</label>
      <input-type>date</input-type>
      <hint>Enter publication date</hint>
    </field>

    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>subject</dc-element>
      <dc-qualifier></dc-qualifier>
      <required>false</required>
      <label>Keywords</label>
      <input-type>twobox</input-type>
      <hint>Enter keywords, one per line</hint>
      <repeatable>true</repeatable>
    </field>

    <field>
      <dc-schema>dc</dc-schema>
      <dc-element>description</dc-element>
      <dc-qualifier>abstract</dc-qualifier>
      <required>true</required>
      <label>Abstract</label>
      <input-type>textarea</input-type>
      <hint>Enter thesis abstract</hint>
    </field>
  </page>
</form>
```

Submitting Your First Item
Test the submission workflow:
- Navigate to a collection
- Click "Submit to this Collection"
- Follow the submission steps:
  - **Select Type**: Choose item type (Article, Thesis, etc.)
  - **Describe**: Enter metadata
  - **Upload**: Add files (PDF, datasets, etc.)
  - **Verify**: Review submission
  - **License**: Accept distribution license
  - **Complete**: Submit for review or archive
- If workflow is enabled, submission goes to reviewers
- Otherwise, item is archived immediately
Configuring Workflows
Set up review processes for quality control:
- Edit `/dspace/config/workflow.xml`
- Define workflow steps for collections:
```xml
<workflow id="defaultWorkflow">
  <step id="reviewstep">
    <role name="reviewer">
      <description>Reviewers for this step</description>
    </role>
    <actions>
      <action id="approve">
        <description>Approve submission</description>
        <outcomes>
          <outcome id="approved">
            <step>editstep</step>
          </outcome>
        </outcomes>
      </action>
      <action id="reject">
        <description>Reject submission</description>
        <outcomes>
          <outcome id="rejected">
            <step>reject</step>
          </outcome>
        </outcomes>
      </action>
    </actions>
  </step>

  <step id="editstep">
    <role name="editor">
      <description>Editors for this step</description>
    </role>
    <actions>
      <action id="approve">
        <description>Final approval</description>
        <outcomes>
          <outcome id="approved">
            <step>archive</step>
          </outcome>
        </outcomes>
      </action>
    </actions>
  </step>
</workflow>
```

Managing User Groups
Create groups for access control:
- Navigate to "Administration" → "Access Control" → "Groups"
- Click "Create New Group"
- Configure group:
  - **Name**: "Faculty Submitters"
  - **Description**: Group description
- Add members:
  - Search for users
  - Add to group
- Assign permissions to collections
Setting Access Policies
Control who can view and submit content:
- Navigate to collection settings
- Click "Authorization" tab
- Add policies:
  - **Action**: READ (view items)
  - **Group**: Anonymous or specific group
  - **Start Date**: Optional embargo start
  - **End Date**: Optional embargo end
- Add submission policies:
  - **Action**: ADD (submit items)
  - **Group**: Faculty Submitters
Configuring Embargo Periods
Set up temporary access restrictions:
- During submission, select "Access Conditions"
- Choose embargo type:
  - **Open Access**: Immediate public access
  - **Embargo**: Restricted until date
  - **Restricted**: Limited group access
- Set embargo date
- System automatically lifts embargo when date passes
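The automatic lift amounts to a date comparison: an item becomes readable once the current date reaches the lift date stored in its access policy. A minimal sketch of that check (the comparison logic is illustrative; DSpace performs it internally against resource policies):

```shell
#!/bin/sh
# Sketch: an embargo is "lifted" once today's date reaches the lift date.
# Dates are normalized to YYYYMMDD integers so they compare numerically.
embargo_lifted() {
  lift="$(printf '%s' "$1" | tr -d '-')"
  today="$(date +%Y%m%d)"
  [ "$today" -ge "$lift" ]
}

if embargo_lifted "2020-01-01"; then echo "open"; else echo "embargoed"; fi
```

Because the check runs against the stored date at access time, no scheduled job is needed to "open" items on the lift date.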
Production Best Practices
Performance Optimization
- Java Memory Configuration: Adjust heap size based on repository size:
```bash
# For repositories with < 50,000 items
JAVA_OPTS="-Xmx4g -XX:MaxMetaspaceSize=1g"

# For repositories with 50,000-200,000 items
JAVA_OPTS="-Xmx8g -XX:MaxMetaspaceSize=2g"

# For repositories with 200,000+ items
JAVA_OPTS="-Xmx16g -XX:MaxMetaspaceSize=4g"
```

- Database Connection Pooling: Optimize PostgreSQL connections:
```properties
# In local.cfg
db.maxconnections = 30
db.maxwait = 5000
db.maxidle = 10
```

- Solr Memory: Increase Solr heap for large indexes:
```dockerfile
# In Dockerfile.solr
ENV SOLR_JAVA_MEM="-Xms2g -Xmx4g"
```

- Assetstore Organization: Use multiple assetstores for better I/O:
```properties
# In local.cfg
assetstore.dir = ${dspace.dir}/assetstore
assetstore.dir.1 = ${dspace.dir}/assetstore2
assetstore.incoming = 1
```

- Enable Caching: Configure content caching:
```properties
# Enable caching
cache.enabled = true
cache.size = 100
cache.name = org.dspace.content
```

Security Hardening
- Change Default Credentials: Immediately change admin password:
```bash
/dspace/bin/dspace user --modify --email admin@example.com --password new_secure_password
```

- Secure Database Connection: Use SSL for database:
```properties
db.url = jdbc:postgresql://postgres-host:8000/dspace?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory
```

- Configure CORS: Restrict API access:
```properties
# In local.cfg
cors.allowed-origins = https://your-app.klutch.sh
cors.allowed-methods = GET, POST, PUT, DELETE, OPTIONS
cors.allowed-headers = *
```

- Implement Rate Limiting: Prevent API abuse:
```properties
# Limit REST API requests
rest.ratelimit.enabled = true
rest.ratelimit.limit = 100
rest.ratelimit.period = 60
```

- Enable Audit Logging: Track administrative actions:
```properties
audit.enabled = true
audit.log = ${dspace.dir}/log/audit.log
```

Backup Strategy
Implement comprehensive backups:
- Database Backup: Daily PostgreSQL dumps:
```bash
#!/bin/bash
BACKUP_DIR="/backups/dspace_$(date +%Y%m%d_%H%M%S)"
mkdir -p $BACKUP_DIR

# Backup database
pg_dump -h $DB_HOST -p $DB_PORT -U $DB_USER -F c dspace > $BACKUP_DIR/dspace_db.backup

# Compress backup
gzip $BACKUP_DIR/dspace_db.backup

# Keep only last 30 days
find /backups -name "dspace_*.backup.gz" -mtime +30 -delete

echo "Database backup complete: $BACKUP_DIR"
```

- Assetstore Backup: Incremental file backups:
```bash
#!/bin/bash
BACKUP_DIR="/backups/assetstore_$(date +%Y%m%d)"
ASSETSTORE="/dspace/assetstore"

# Incremental backup using rsync
rsync -av --delete $ASSETSTORE/ $BACKUP_DIR/

echo "Assetstore backup complete: $BACKUP_DIR"
```

- Configuration Backup: Version control for configs:
```bash
# Backup configuration files
tar -czf config_backup_$(date +%Y%m%d).tar.gz /dspace/config/
```

- Automated Backup Schedule:
```bash
# Add to crontab
0 2 * * * /usr/local/bin/backup-dspace-db.sh
0 3 * * 0 /usr/local/bin/backup-assetstore.sh
```

Maintenance Tasks
Schedule regular maintenance:
- Reindex Search: Weekly search index rebuild:
```bash
# Rebuild discovery index
/dspace/bin/dspace index-discovery -b

# Or incremental update
/dspace/bin/dspace index-discovery
```

- Media Filter: Generate thumbnails and extract text:
```bash
# Process new items
/dspace/bin/dspace filter-media

# Force reprocess all items
/dspace/bin/dspace filter-media -f
```

- Cleanup Tasks: Remove orphaned data:
```bash
# Clean up deleted items, searches, etc.
/dspace/bin/dspace cleanup
```

- Statistics Processing: Generate usage statistics:
```bash
# Process statistics
/dspace/bin/dspace stats-util -i

# Generate reports
/dspace/bin/dspace stats-util -r
```

- Checksum Verification: Verify bitstream integrity:
```bash
# Check all bitstreams
/dspace/bin/dspace checker -l -p

# Check and report results
/dspace/bin/dspace checker-emailer
```

- Automated Maintenance Cron:
```bash
# Add to crontab
0 1 * * * /dspace/bin/dspace cleanup
0 2 * * 0 /dspace/bin/dspace index-discovery -b
0 3 * * * /dspace/bin/dspace filter-media
0 4 * * 0 /dspace/bin/dspace checker -l
```

Monitoring and Logging
- Enable Detailed Logging:
```xml
<!-- In log4j2.xml -->
<Configuration>
  <Appenders>
    <RollingFile name="DSpaceLog"
        fileName="${dspace.dir}/log/dspace.log"
        filePattern="${dspace.dir}/log/dspace-%d{yyyy-MM-dd}.log">
      <PatternLayout>
        <Pattern>%d{ISO8601} %-5p %c @ %m%n</Pattern>
      </PatternLayout>
      <Policies>
        <TimeBasedTriggeringPolicy />
      </Policies>
      <DefaultRolloverStrategy max="30"/>
    </RollingFile>
  </Appenders>

  <Loggers>
    <Root level="INFO">
      <AppenderRef ref="DSpaceLog"/>
    </Root>
  </Loggers>
</Configuration>
```

- Monitor Database Performance:
```sql
-- Active connections
SELECT count(*) FROM pg_stat_activity WHERE datname = 'dspace';

-- Long-running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE pg_stat_activity.query != '<IDLE>'
  AND now() - pg_stat_activity.query_start > interval '5 minutes';

-- Database size
SELECT pg_size_pretty(pg_database_size('dspace'));
```

- Health Check Script:
```bash
#!/bin/bash
DSPACE_URL="https://your-app.klutch.sh"

# Check DSpace is responding
if ! curl -f -s -o /dev/null "$DSPACE_URL/server/api"; then
  echo "ERROR: DSpace not responding"
  exit 1
fi

# Check Solr
if ! curl -f -s -o /dev/null "http://localhost:8983/solr/search/admin/ping"; then
  echo "ERROR: Solr not responding"
  exit 1
fi

# Check disk space
DISK_USAGE=$(df -h /dspace/assetstore | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 85 ]; then
  echo "WARNING: Disk usage at ${DISK_USAGE}%"
fi

echo "OK: DSpace is healthy"
exit 0
```

Resource Allocation
Recommended resources by repository size:
Small Repository (< 10,000 items):
- DSpace: 4GB RAM, 2 vCPU
- PostgreSQL: 2GB RAM, 2 vCPU
- Solr: 2GB RAM, 1 vCPU
- Storage: 100GB assetstore, 20GB database, 10GB Solr
Medium Repository (10,000-100,000 items):
- DSpace: 8GB RAM, 4 vCPU
- PostgreSQL: 4GB RAM, 2 vCPU
- Solr: 4GB RAM, 2 vCPU
- Storage: 500GB assetstore, 50GB database, 50GB Solr
Large Repository (100,000+ items):
- DSpace: 16GB RAM, 8 vCPU
- PostgreSQL: 8GB RAM, 4 vCPU
- Solr: 8GB RAM, 4 vCPU
- Storage: 2TB+ assetstore, 200GB database, 100GB Solr
Troubleshooting
Submission Upload Fails
Symptoms: File uploads fail during submission
Solutions:
- Check File Size Limit:
```properties
# In local.cfg
upload.max = 536870912  # 512MB in bytes
```

- Verify Disk Space:
```bash
df -h /dspace/assetstore
df -h /dspace/upload
```

- Check Permissions:
```bash
chown -R dspace:dspace /dspace/assetstore
chown -R dspace:dspace /dspace/upload
```

- Increase Tomcat Timeout:
```xml
<!-- In server.xml -->
<Connector port="8080" connectionTimeout="60000" />
```
Search Not Returning Results
Symptoms: Search returns no results or incomplete results
Solutions:
- Rebuild Search Index:
```bash
/dspace/bin/dspace index-discovery -b
```
- Check Solr Status:
```bash
curl http://localhost:8983/solr/search/admin/ping
```
- Verify Solr Configuration:
```properties
# Check the Solr URL in local.cfg
solr.server = http://solr-host:8983/solr
```
- Check Solr Logs:
```bash
tail -f /var/solr/logs/solr.log
```
Database Connection Errors
Symptoms: “Unable to connect to database” errors
Solutions:
- Verify Database is Running:
```bash
psql -h $DB_HOST -p $DB_PORT -U $DB_USER -d dspace -c "SELECT 1"
```
- Check Connection String:
```properties
# In local.cfg
db.url = jdbc:postgresql://postgres-host:8000/dspace
db.username = dspace
db.password = your_password
```
- Test Network Connectivity:
```bash
telnet $DB_HOST 8000
```
- Increase Connection Pool:
```properties
db.maxconnections = 50
db.maxwait = 10000
```
Items Not Displaying Properly
Symptoms: Item pages show errors or missing metadata
Solutions:
- Check Item Permissions:
```bash
/dspace/bin/dspace dsrun org.dspace.administer.ItemPolicies <item-id>
```
- Verify Metadata:
```bash
/dspace/bin/dspace metadata-export -i <item-id>
```
- Clear Cache:
```bash
rm -rf /dspace/var/cache/*
```
- Check Logs:
```bash
tail -f /dspace/log/dspace.log
```
Embargo Not Working
Symptoms: Embargoed items are publicly accessible
Solutions:
- Check Resource Policies:
```bash
/dspace/bin/dspace dsrun org.dspace.embargo.EmbargoManager
```
- Verify Embargo Configuration:
```properties
# In local.cfg
embargo.field.terms = dc.embargo.terms
embargo.field.lift = dc.date.available
```
- Manually Set Embargo:
```bash
/dspace/bin/dspace embargo-setter -i <item-id> -d YYYY-MM-DD
```
OAI-PMH Harvesting Issues
Symptoms: External systems cannot harvest metadata
Solutions:
- Test OAI-PMH Endpoint:
```bash
curl "https://your-app.klutch.sh/server/oai?verb=Identify"
```
- Rebuild OAI Index:
```bash
/dspace/bin/dspace oai import
/dspace/bin/dspace oai clean-cache
```
- Check OAI Configuration:
```properties
oai.solr.url = ${solr.server}/oai
oai.identifier.prefix = oai:your-app.klutch.sh:
```
Advanced Configuration
Custom Themes
Customize the DSpace user interface:
- Create Custom Theme:
```bash
# Copy the base theme
cp -r /dspace/webapps/xmlui/themes/Mirage2 /dspace/webapps/xmlui/themes/CustomTheme
```
- Modify Theme Configuration:
```xml
<!-- In theme.xml -->
<theme name="CustomTheme" path="CustomTheme/">
  <conffile>sitemap.xmap</conffile>
</theme>
```
- Customize Styles:
```css
/* In style.css */
.header {
  background-color: #003366;
  color: white;
}

.logo {
  max-width: 200px;
}

.sidebar {
  background-color: #f5f5f5;
}
```
- Update Logo and Branding:
```bash
# Replace the logo
cp your-logo.png /dspace/webapps/xmlui/themes/CustomTheme/images/logo.png
```
DOI Integration
Enable DOI minting for persistent identifiers:
- Configure DataCite or Crossref:
```properties
# In local.cfg
identifier.doi.prefix = 10.1234
identifier.doi.namespaceseparator = /
identifier.doi.user = your-datacite-username
identifier.doi.password = your-datacite-password
identifier.doi.datacentre = your-datacentre
```
- Enable DOI Plugin:
```properties
plugin.sequence.org.dspace.identifier.IdentifierProvider = \
  org.dspace.identifier.DOIIdentifierProvider
```
- Mint DOIs:
```bash
# Register pending DOIs for items
/dspace/bin/dspace doi-organiser -r
```
ORCID Integration
Connect authors with ORCID profiles:
- Register for ORCID API Access:
  - Get a client ID and secret from ORCID.org
- Configure ORCID:
```properties
# In local.cfg
orcid.application-client-id = your-client-id
orcid.application-client-secret = your-client-secret
orcid.domain-url = https://orcid.org
```
- Enable ORCID Authentication:
```properties
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = \
  org.dspace.authenticate.OAuthAuthentication
```
Batch Import/Export
Import large collections of items:
- Prepare Import Package:
```text
import_package/
├── collections/
└── simple-archive/
    ├── item_001/
    │   ├── contents
    │   ├── dublin_core.xml
    │   └── file1.pdf
    └── item_002/
        ├── contents
        ├── dublin_core.xml
        └── file2.pdf
```
- Create dublin_core.xml:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<dublin_core schema="dc">
  <dcvalue element="title" qualifier="">Sample Article Title</dcvalue>
  <dcvalue element="contributor" qualifier="author">Smith, John</dcvalue>
  <dcvalue element="date" qualifier="issued">2024-01-15</dcvalue>
  <dcvalue element="description" qualifier="abstract">This is the abstract...</dcvalue>
  <dcvalue element="subject">Computer Science</dcvalue>
  <dcvalue element="type">Article</dcvalue>
</dublin_core>
```
- Create contents file:
```text
file1.pdf bundle:ORIGINAL
```
- Import Items:
```bash
/dspace/bin/dspace import -a -e admin@example.com -c 123456789/2 \
  -s /import_package/simple-archive -m mapfile.txt
```
- Export Items:
```bash
# Export a collection
/dspace/bin/dspace export -t COLLECTION -i 123456789/2 -d /exports/collection -n 1
```
REST API Usage
Interact with DSpace programmatically:
- Authenticate:
```bash
# Get a JWT; DSpace returns it in the Authorization response header
# (note: DSpace 7+ may also require a CSRF token via the X-XSRF-TOKEN header)
TOKEN=$(curl -s -i -X POST "https://your-app.klutch.sh/server/api/authn/login" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "user=admin@example.com&password=your_password" \
  | grep -i '^authorization:' | cut -d' ' -f3 | tr -d '\r')
```
- List Communities:
```bash
curl "https://your-app.klutch.sh/server/api/core/communities" \
  -H "Authorization: Bearer $TOKEN"
```
- Get Item Metadata:
```bash
curl "https://your-app.klutch.sh/server/api/core/items/{item-uuid}" \
  -H "Authorization: Bearer $TOKEN"
```
- Create Item via API:
```python
import requests

api_url = "https://your-app.klutch.sh/server/api"
token = "your-jwt-token"

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

# Create an item
item_data = {
    "name": "Test Item",
    "metadata": {
        "dc.title": [{"value": "Test Title"}],
        "dc.contributor.author": [{"value": "Smith, John"}],
        "dc.date.issued": [{"value": "2024-01-15"}]
    }
}

response = requests.post(
    f"{api_url}/core/items",
    headers=headers,
    json=item_data
)

print(response.json())
```
Statistics and Reporting
Generate usage statistics:
- Enable Statistics:
```properties
# In local.cfg
usage-statistics.dbfile = ${dspace.dir}/log/dspace-stats.db
solr-statistics.server = ${solr.server}/statistics
```
- Generate Reports:
```bash
# Generate a monthly report
/dspace/bin/dspace stats-util -r -m 2024-01

# Export statistics
/dspace/bin/dspace stats-util -e /exports/stats.csv
```
- Integrate Google Analytics:
```properties
google.analytics.key = UA-XXXXXXXX-X
```
Additional Resources
- Official DSpace Website
- DSpace 7 Documentation
- DSpace GitHub Repository
- DSpace REST API Documentation
- DSpace Community Forum
- Metadata Registry Documentation
- Klutch.sh Documentation
- Persistent Storage Guide
- Networking Configuration
Conclusion
DSpace provides a comprehensive, enterprise-grade solution for institutional repositories and digital asset management. By deploying on Klutch.sh, you benefit from automatic HTTPS, persistent storage, and simple Docker-based deployment while maintaining the robust preservation features and scalability that DSpace offers.
The platform’s proven track record with major universities and research institutions worldwide demonstrates its reliability and maturity. Features like flexible metadata schemas, configurable workflows, OAI-PMH harvesting, DOI integration, and comprehensive API access make DSpace the standard choice for organizations managing scholarly communications and digital collections.
Whether you’re launching a new institutional repository, managing research data, preserving cultural heritage materials, or building a digital library, DSpace scales to meet your needs. The system’s modular architecture allows you to customize every aspect from metadata schemas and submission forms to themes and access policies.
Start with the basic configuration outlined in this guide, then expand functionality through custom metadata fields, workflow customization, API integration, and advanced preservation features as your repository grows. Your digital content remains secure, discoverable, and preserved for the long term, while the open-source nature of DSpace ensures you have complete control over your institutional knowledge base.
Deploy DSpace today and join thousands of institutions worldwide in preserving and sharing digital scholarship for future generations.