Deploying Webarchive
Introduction
Webarchive is a self-hosted web archiving application that allows you to save and organize web pages for offline access. Unlike bookmark managers that only save links, Webarchive downloads and stores complete web pages, ensuring content is preserved even if the original source disappears.
The application provides a clean interface for managing your archived pages, with features like tagging, full-text search, and organized collections.
Key features of Webarchive include:
- Full Page Archiving: Download complete web pages including images and styles
- Screenshot Capture: Automatically capture screenshots of archived pages
- PDF Generation: Create PDF versions of archived content
- Full-Text Search: Search across all archived content
- Tags and Collections: Organize archives with tags and collections
- Reader Mode: Clean, distraction-free reading experience
- Browser Extension: Archive pages directly from your browser
- Multiple Archive Formats: Support for various archive formats
- Public/Private Sharing: Share specific archives publicly
- Import/Export: Backup and migrate your archives
This guide walks you through deploying Webarchive on Klutch.sh using Docker for personal web archiving.
Prerequisites
Before deploying Webarchive on Klutch.sh, ensure you have:
- A Klutch.sh account
- A GitHub account with a repository for your configuration
- Basic familiarity with Docker concepts
Repository Structure
Create a GitHub repository with the following structure:

    webarchive-deploy/
    ├── Dockerfile
    └── .dockerignore

Dockerfile
Create a Dockerfile in your repository:

    FROM archivebox/archivebox:latest

    # Web interface port
    EXPOSE 8000

    # The base image handles the entrypoint

Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| `ALLOWED_HOSTS` | No | `*` | Allowed hostnames |
| `SEARCH_BACKEND_ENGINE` | No | `ripgrep` | Search engine to use |
| `MEDIA_MAX_SIZE` | No | `750m` | Maximum media file size |
| `TIMEOUT` | No | `60` | Archive timeout in seconds |
| `CHECK_SSL_VALIDITY` | No | `true` | Verify SSL certificates |
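
Before entering values in the Klutch.sh dashboard, it can help to sanity-check them locally. The following is a minimal sketch, not part of the application: `resolve_settings` is a hypothetical helper that merges your overrides with the defaults from the table and checks the two typed values. It assumes `TIMEOUT` is a positive whole number and that `CHECK_SSL_VALIDITY` is spelled `true` or `false`, as the table shows; the application itself may accept other spellings.

```python
# Defaults copied from the environment variable table in this guide.
DEFAULTS = {
    "ALLOWED_HOSTS": "*",
    "SEARCH_BACKEND_ENGINE": "ripgrep",
    "MEDIA_MAX_SIZE": "750m",
    "TIMEOUT": "60",
    "CHECK_SSL_VALIDITY": "true",
}


def resolve_settings(overrides):
    """Merge overrides with the defaults and check the two typed values.

    Hypothetical helper for illustration only; not part of the application.
    """
    settings = {name: overrides.get(name, default) for name, default in DEFAULTS.items()}

    # TIMEOUT is documented as a number of seconds.
    if not settings["TIMEOUT"].isdigit() or int(settings["TIMEOUT"]) == 0:
        raise ValueError("TIMEOUT must be a positive whole number of seconds")

    # CHECK_SSL_VALIDITY is documented as a true/false switch (assumed spelling).
    if settings["CHECK_SSL_VALIDITY"].lower() not in ("true", "false"):
        raise ValueError("CHECK_SSL_VALIDITY must be 'true' or 'false'")

    return settings


# Example: override only the hostname and keep every other default.
settings = resolve_settings({"ALLOWED_HOSTS": "your-app.klutch.sh"})
print(settings["ALLOWED_HOSTS"], settings["TIMEOUT"])
```

Any variable you leave out of the mapping keeps its documented default, which mirrors how the "Required: No" column behaves.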
Deployment on Klutch.sh
- Push your Dockerfile to your GitHub repository.
- Log in to Klutch.sh and create a new project.
- Create a new app within your project and connect your GitHub repository containing the Dockerfile.
- Configure the deployment settings:
  - Select **HTTP** as the traffic type
  - Set the internal port to **8000**
- Add environment variables:
  - `ALLOWED_HOSTS`: Your app domain (e.g., `your-app.klutch.sh`)
- Attach persistent volumes:
  - Mount path: `/data`
  - Recommended size: 50 GB (or more depending on archiving needs)
  - Purpose: Archived pages, screenshots, and database
- Click **Deploy** and wait for the build to complete.
- Access your Webarchive instance at the provided URL and create your admin account.
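
Once the build finishes, a quick reachability check can confirm that the app is answering on its public URL. This is a rough sketch using only the Python standard library; `instance_is_up` is a hypothetical helper, and the hostname in the usage note is the placeholder from the steps above, not a real address.

```python
import urllib.error
import urllib.request


def instance_is_up(url, timeout=10.0):
    """Return True if a GET request to url gets a non-error HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts, and 4xx/5xx
        # responses (urllib raises HTTPError, a URLError subclass, for those).
        return False
```

For example, `instance_is_up("https://your-app.klutch.sh")` should return `True` after a successful deployment. A `False` result only says the URL did not answer cleanly, so check the build logs before assuming the app itself is broken.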
Post-Deployment Configuration
After deployment:
- Create an admin account using the web interface, if you did not do so on first access
- Configure archiving preferences
- Install the browser extension if desired
- Start archiving web pages
Usage
Archiving Pages
Add pages to archive via:
- Web interface: Paste URLs directly
- Browser extension: One-click archiving
- API: Programmatic archiving
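
For programmatic archiving, it is usually worth cleaning a list of URLs before submitting it. This guide does not describe the API's endpoints, so the sketch below covers only the preparation step: `clean_url_list` is a hypothetical helper that drops blank lines, non-web schemes, and duplicates while keeping the original order.

```python
from urllib.parse import urlsplit


def clean_url_list(lines):
    """Return unique http(s) URLs from lines, keeping first-seen order."""
    seen = set()
    urls = []
    for line in lines:
        candidate = line.strip()
        parts = urlsplit(candidate)
        # Keep only absolute web URLs; skip blanks, notes, and other schemes.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        if candidate not in seen:
            seen.add(candidate)
            urls.append(candidate)
    return urls
```

The returned list can then be pasted into the web interface or passed to whatever client you use for the API.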
Organizing Archives
- Add tags to categorize archives
- Create collections for related content
- Use search to find archived content
Troubleshooting
Archiving Fails
Some sites block automated archiving or load too slowly to finish in time. Try disabling JavaScript, or raise the `TIMEOUT` environment variable above its default of 60 seconds.
Storage Running Out
Monitor disk usage on the `/data` volume and increase the volume size as needed. Consider setting a retention policy that removes archives you no longer need.
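
Disk monitoring can be as simple as a periodic check of how full the volume is. The sketch below assumes you can run a script where the `/data` volume is mounted; `volume_usage` and `needs_more_space` are hypothetical helpers, and the 80% threshold is an arbitrary example rather than a recommendation from the application.

```python
import shutil


def volume_usage(path="/data"):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some virtual filesystems report no capacity; treat them as empty.
        return 0.0
    return usage.used / usage.total


def needs_more_space(used_fraction, threshold=0.8):
    """Return True once usage reaches the threshold (80% is an example value)."""
    return used_fraction >= threshold
```

Calling `needs_more_space(volume_usage("/data"))` from a scheduled job gives an early warning before archiving starts to fail for lack of space.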