
Deploying Webarchive

Introduction

Webarchive is a self-hosted web archiving application that allows you to save and organize web pages for offline access. Unlike bookmark managers that only save links, Webarchive downloads and stores complete web pages, ensuring content is preserved even if the original source disappears.

The application provides a clean interface for managing your archived pages, with features like tagging, full-text search, and organized collections.

Key features of Webarchive include:

  • Full Page Archiving: Download complete web pages including images and styles
  • Screenshot Capture: Automatically capture screenshots of archived pages
  • PDF Generation: Create PDF versions of archived content
  • Full-Text Search: Search across all archived content
  • Tags and Collections: Organize archives with tags and collections
  • Reader Mode: Clean, distraction-free reading experience
  • Browser Extension: Archive pages directly from your browser
  • Multiple Archive Formats: Support for various archive formats
  • Public/Private Sharing: Share specific archives publicly
  • Import/Export: Back up and migrate your archives

This guide walks you through deploying Webarchive on Klutch.sh using Docker for personal web archiving.

Prerequisites

Before deploying Webarchive on Klutch.sh, ensure you have:

  • A Klutch.sh account
  • A GitHub account and a repository for your deployment files

Repository Structure

Create a GitHub repository with the following structure:

webarchive-deploy/
├── Dockerfile
└── .dockerignore

Dockerfile

Create a Dockerfile in your repository:

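FROM archivebox/archivebox:latest
# Consider pinning a specific version tag instead of :latest for reproducible builds.
# Web interface port (use this as the internal port in Klutch.sh)
EXPOSE 8000
# The base image provides the entrypoint and default command that start the web server.
# Archived pages, screenshots, and the database live under /data,
# so attach a persistent volume at that path.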

Environment Variables

Variable               Required  Default  Description
ALLOWED_HOSTS          No        *        Allowed hostnames
SEARCH_BACKEND_ENGINE  No        ripgrep  Search engine to use
MEDIA_MAX_SIZE         No        750m     Maximum media file size
TIMEOUT                No        60       Archive timeout in seconds
CHECK_SSL_VALIDITY     No        true     Verify SSL certificates
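To see how these defaults combine, the following Python sketch reads the variables from an environment mapping and converts them to typed values, which could serve as a pre-deploy sanity check. It is not Webarchive's own configuration loader: `load_settings`, `parse_size`, and `ArchiveSettings` are hypothetical names, and treating `ALLOWED_HOSTS` as comma-separated and size suffixes as binary units (1m = 1024 × 1024 bytes) are assumptions.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping

# Assumption: size suffixes use binary multiples (1k = 1024 bytes).
_SIZE_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}
_TRUE_VALUES = {"1", "true", "yes", "on"}


def parse_size(value: str) -> int:
    """Convert a size string such as '750m' or '2g' into bytes."""
    value = value.strip().lower()
    if value and value[-1] in _SIZE_UNITS:
        return int(float(value[:-1]) * _SIZE_UNITS[value[-1]])
    return int(value)  # a bare number is taken as bytes


@dataclass(frozen=True)
class ArchiveSettings:
    allowed_hosts: List[str]
    search_backend_engine: str
    media_max_size_bytes: int
    timeout_seconds: int
    check_ssl_validity: bool


def load_settings(env: Mapping[str, str] = os.environ) -> ArchiveSettings:
    """Read the variables from the table, falling back to the documented defaults."""
    # Assumption: ALLOWED_HOSTS may list several hostnames separated by commas.
    hosts = [h.strip() for h in env.get("ALLOWED_HOSTS", "*").split(",") if h.strip()]
    return ArchiveSettings(
        allowed_hosts=hosts,
        search_backend_engine=env.get("SEARCH_BACKEND_ENGINE", "ripgrep"),
        media_max_size_bytes=parse_size(env.get("MEDIA_MAX_SIZE", "750m")),
        timeout_seconds=int(env.get("TIMEOUT", "60")),
        check_ssl_validity=env.get("CHECK_SSL_VALIDITY", "true").strip().lower() in _TRUE_VALUES,
    )
```

For example, `load_settings({"TIMEOUT": "120"}).timeout_seconds` is 120, while every other field keeps its default from the table.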

Deployment on Klutch.sh

  1. Push your Dockerfile to your GitHub repository.
  2. Log in to Klutch.sh and create a new project.
  3. Create a new app within your project and connect your GitHub repository containing the Dockerfile.
  4. Configure the deployment settings:
     - Select **HTTP** as the traffic type
     - Set the internal port to **8000**
  5. Add environment variables:
     - `ALLOWED_HOSTS`: Your app domain (e.g., `your-app.klutch.sh`)
  6. Attach persistent volumes:
     - Mount path: `/data`
     - Recommended size: 50 GB (or more, depending on your archiving needs)
     - Purpose: Archived pages, screenshots, and database
  7. Click **Deploy** and wait for the build to complete.
  8. Access your Webarchive instance at the provided URL and create your admin account.
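Once the build finishes, you can confirm the instance is reachable before logging in. This minimal Python sketch uses only the standard library; `check_instance` is a hypothetical helper, and `your-app.klutch.sh` is the placeholder domain from step 5.

```python
import urllib.error
import urllib.request


def check_instance(base_url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the deployed instance answers with.

    Hypothetical helper: urllib follows redirects, so a redirect to a
    login page reports the status of the final page.
    """
    if not base_url.startswith(("http://", "https://")):
        base_url = "https://" + base_url  # assume HTTPS when no scheme is given
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still prove the server answered; report the code.
        return err.code
```

Calling `check_instance("your-app.klutch.sh")` should return 200 when the web server is up. A 400 may mean the domain is not covered by `ALLOWED_HOSTS`, so compare the value from step 5 with the URL Klutch.sh assigned. Network failures such as DNS errors are deliberately not caught and surface as exceptions.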

Post-Deployment Configuration

After deployment:

  1. Create an admin account using the web interface
  2. Configure archiving preferences
  3. Install the browser extension if desired
  4. Start archiving web pages

Usage

Archiving Pages

Add pages to archive via:

  • Web interface: Paste URLs directly
  • Browser extension: One-click archiving
  • API: Programmatic archiving
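The request format for programmatic archiving depends on the API your version exposes, so it is not shown here. What does generalize is preparing a clean batch first. This Python sketch (`prepare_urls` is a hypothetical helper) drops blank lines, comments, and non-HTTP(S) entries, lowercases the scheme and host, strips `#fragment`s, and removes duplicates while keeping the original order.

```python
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit


def prepare_urls(raw_lines: Iterable[str]) -> List[str]:
    """Return unique, normalized http(s) URLs in first-seen order."""
    seen = set()
    result = []
    for line in raw_lines:
        url = line.strip()
        if not url or url.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # not something a web archiver can fetch
        # Lowercase scheme and host (plus any port); drop the fragment,
        # which is never sent to the server anyway.
        normalized = urlunsplit(
            (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
        )
        if normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result
```

The result can be written one URL per line and submitted through whichever method you use: the web interface, the browser extension, or the API.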

Organizing Archives

  • Add tags to categorize archives
  • Create collections for related content
  • Use search to find archived content
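As a mental model of how tags and search narrow results together, here is a small Python sketch. It is a conceptual illustration only, not how Webarchive stores or indexes archives (its full-text search uses the engine set by `SEARCH_BACKEND_ENGINE`); `Archive` and `find` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Archive:
    url: str
    title: str
    text: str  # extracted page text, standing in for the search index
    tags: Set[str] = field(default_factory=set)


def find(archives: Iterable[Archive], query: str = "", tags: Iterable[str] = ()) -> List[Archive]:
    """Return archives that carry every requested tag AND contain the query text."""
    wanted = {t.lower() for t in tags}
    q = query.lower()
    return [
        a for a in archives
        if wanted <= {t.lower() for t in a.tags}  # tag filter: all tags must match
        and (q in a.title.lower() or q in a.text.lower())  # case-insensitive substring search
    ]
```

Filtering by tag first and then searching within the result mirrors the workflow above: tags narrow the set, and search finds specific content inside it.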

Troubleshooting

Archiving Fails

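Some sites block automated archiving or load too slowly to finish within the timeout. Try disabling JavaScript, or increase TIMEOUT (default: 60 seconds). If a site fails because of an invalid or self-signed certificate, setting CHECK_SSL_VALIDITY to false skips certificate verification; use this only for sites you trust.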

Storage Running Out

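Monitor disk usage on the /data volume and increase its size as needed. Lowering MEDIA_MAX_SIZE caps how large each downloaded media file can be, and a retention policy (periodically removing archives you no longer need) keeps growth in check.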
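To automate the monitoring, a short Python sketch like the one below can report how full the filesystem holding /data is. It assumes you can run a script with access to that volume (for example, as a scheduled job in the same environment), which may not be possible on every setup; `check_volume` and the 80% threshold are hypothetical choices.

```python
import shutil
from typing import Optional, Tuple


def check_volume(path: str = "/data", warn_at: float = 0.8) -> Tuple[float, Optional[str]]:
    """Return (fraction_used, warning) for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    fraction = usage.used / usage.total
    warning = None
    if fraction >= warn_at:
        free_gib = usage.free / 1024**3
        warning = f"{path} is {fraction:.0%} full ({free_gib:.1f} GiB free); consider resizing the volume"
    return fraction, warning
```

A job could call `check_volume()` periodically and forward any non-None warning to wherever you receive alerts.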

Additional Resources