Deploying SOSSE
Introduction
SOSSE (Selenium Open Source Search Engine) is a self-hosted web search engine that crawls, indexes, and searches websites while respecting your privacy. Unlike commercial search engines, which may log your queries and use them for advertising, SOSSE keeps both the index and your queries on infrastructure you control.
Built with Python and Django, SOSSE uses Selenium for JavaScript-rendered page crawling, making it capable of indexing modern single-page applications and dynamic websites that traditional crawlers struggle with. The application stores indexed content in PostgreSQL with full-text search capabilities.
Key highlights of SOSSE:
- Privacy-Focused: Your search queries never leave your server, ensuring complete privacy
- JavaScript Rendering: Selenium-based crawler handles modern JavaScript-heavy websites
- Full-Text Search: Powerful PostgreSQL-backed search with relevance ranking
- Configurable Crawling: Control crawl depth, frequency, and domain restrictions
- Web Interface: Clean Django-based interface for searching and administration
- Screenshot Capture: Optionally capture screenshots of indexed pages
- RSS Feed Support: Generate RSS feeds from search results
- Authentication: Built-in user authentication with admin capabilities
- Scheduled Crawling: Automatic recrawling to keep content fresh
- Open Source: Licensed under AGPLv3 with no tracking or telemetry
This guide walks through deploying SOSSE on Klutch.sh using Docker, configuring the search engine, and setting up your first crawl jobs.
Why Deploy SOSSE on Klutch.sh
Deploying SOSSE on Klutch.sh provides several advantages for running your own search engine:
Simplified Deployment: Klutch.sh automatically detects your Dockerfile and builds SOSSE without complex orchestration. Push to GitHub, and your search engine deploys automatically.
Persistent Storage: Attach persistent volumes for your PostgreSQL database, search indices, and crawler data. Your indexed content survives container restarts and redeployments.
HTTPS by Default: Klutch.sh provides automatic SSL certificates, ensuring secure access to your search engine from anywhere.
GitHub Integration: Connect your configuration repository directly from GitHub. Updates trigger automatic redeployments.
Scalable Resources: Allocate CPU and memory based on your indexing needs. Crawling JavaScript-heavy sites benefits from additional resources.
Environment Variable Management: Securely store database credentials and secret keys through Klutch.sh’s environment variable system.
Custom Domains: Assign a custom domain to your SOSSE instance for easy access.
Always-On Availability: Your search engine remains accessible 24/7 without managing your own infrastructure.
Prerequisites
Before deploying SOSSE on Klutch.sh, ensure you have:
- A Klutch.sh account
- A GitHub account with a repository for your SOSSE configuration
- Basic familiarity with Docker and containerization concepts
- A list of websites you want to index (optional but helpful for planning)
- (Optional) A custom domain for your SOSSE instance
Deploying SOSSE on Klutch.sh
Create a GitHub Repository
Create a new GitHub repository for your SOSSE deployment configuration. This repository will contain your Dockerfile and any custom settings.
Create Your Dockerfile
Create a Dockerfile in the root of your repository:
```dockerfile
FROM biolds/sosse:latest

# Set environment variables
ENV SOSSE_SECRET_KEY=${SOSSE_SECRET_KEY}
ENV SOSSE_DB_HOST=${SOSSE_DB_HOST:-localhost}
ENV SOSSE_DB_PORT=${SOSSE_DB_PORT:-5432}
ENV SOSSE_DB_NAME=${SOSSE_DB_NAME:-sosse}
ENV SOSSE_DB_USER=${SOSSE_DB_USER:-sosse}
ENV SOSSE_DB_PASSWORD=${SOSSE_DB_PASSWORD}

# Expose the web interface port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8000/ || exit 1
```
Push Your Repository to GitHub
Commit and push your Dockerfile to your GitHub repository.
Create a New Project on Klutch.sh
Navigate to the Klutch.sh dashboard and create a new project. Give it a descriptive name like “sosse” or “search-engine”.
Create a New App
Within your project, create a new app. Connect your GitHub account if you haven’t already, then select the repository containing your SOSSE Dockerfile.
Configure HTTP Traffic
In the deployment settings:
- Select HTTP as the traffic type
- Set the internal port to 8000 (SOSSE’s default port)
Set Environment Variables
Add the following environment variables:
| Variable | Value |
|---|---|
| `SOSSE_SECRET_KEY` | A long random string for Django security |
| `SOSSE_DB_HOST` | Your PostgreSQL database host |
| `SOSSE_DB_PORT` | `5432` |
| `SOSSE_DB_NAME` | `sosse` |
| `SOSSE_DB_USER` | Your database username |
| `SOSSE_DB_PASSWORD` | Your database password |
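The table asks for a long random string for `SOSSE_SECRET_KEY` without saying how to produce one. One possible approach is sketched below using Python's standard `secrets` module. The helper name `generate_secret_key` and the 64-byte default are illustrative assumptions, not SOSSE requirements.

```python
import secrets


def generate_secret_key(num_bytes: int = 64) -> str:
    """Return a URL-safe random string built from `num_bytes` of OS randomness."""
    # token_urlsafe draws from a cryptographically secure source and
    # Base64-encodes the bytes, so the text is longer than num_bytes.
    return secrets.token_urlsafe(num_bytes)


# Paste the printed value into the SOSSE_SECRET_KEY variable on Klutch.sh.
print(generate_secret_key())
```

Generate the value once and store it only in the environment variable settings. Committing it to the repository would defeat its purpose.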
Attach Persistent Volumes
Add the following volumes:
| Mount Path | Recommended Size | Purpose |
|---|---|---|
| `/var/lib/sosse` | 50 GB | Indexed content, screenshots, and crawler data |
| `/var/log/sosse` | 5 GB | Application logs |
Deploy Your Application
Click Deploy to start the build process. Klutch.sh will build the container, attach volumes, and start SOSSE.
Access SOSSE
Once deployment completes, access your SOSSE instance at https://your-app-name.klutch.sh. Create an admin account and start configuring your crawl jobs.
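The first start can take a while, so it may be convenient to poll the app URL rather than refresh the browser. The sketch below is one way to do that with the standard library. The function names, the retry counts, and the choice to treat any 2xx or 3xx status as "up" are assumptions made for illustration, and `https://your-app-name.klutch.sh` is the placeholder URL from this guide.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for `url`, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, and similar


def wait_until_up(
    url: str,
    attempts: int = 10,
    delay: float = 5.0,
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Poll `url` until it returns a 2xx/3xx status or attempts run out."""
    for attempt in range(attempts):
        if 200 <= fetch(url) < 400:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # pause between attempts, not after the last
    return False
```

Calling `wait_until_up("https://your-app-name.klutch.sh")` with your real app name returns `True` as soon as the web interface answers. The `fetch` parameter exists only so the polling logic can be exercised without network access.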
Initial Configuration
Creating an Admin User
After deployment, create your administrator account through the web interface. Navigate to your SOSSE URL and follow the registration prompts.
Setting Up Your First Crawl
To index a website:
- Log in to the admin interface
- Navigate to Crawl Sources
- Add a new source with the URL you want to index
- Configure crawl depth and frequency
- Start the crawl job
Crawl Configuration Options
| Option | Description |
|---|---|
| URL | Starting URL for the crawl |
| Depth | How many links deep to follow |
| Frequency | How often to recrawl |
| Domain Restriction | Stay within the same domain or follow external links |
| JavaScript | Enable Selenium rendering for JavaScript sites |
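To make the interaction between depth and domain restriction concrete, the following sketch models the options from the table as a small Python data structure. It is not SOSSE's implementation. The class `CrawlOptions`, its field names and defaults, and the helper `should_follow` are all hypothetical and exist only to illustrate how such settings typically combine.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CrawlOptions:
    """Hypothetical container mirroring the options in the table above."""

    start_url: str                   # "URL": where the crawl begins
    max_depth: int = 2               # "Depth": link hops allowed from start_url
    recrawl_hours: int = 24          # "Frequency": hours between recrawls
    same_domain_only: bool = True    # "Domain Restriction"
    render_javascript: bool = False  # "JavaScript": render with Selenium


def should_follow(options: CrawlOptions, link: str, depth: int) -> bool:
    """Decide whether a link found `depth` hops from the start is crawled."""
    if depth > options.max_depth:
        return False  # too far from the starting URL
    if options.same_domain_only:
        # Compare hostnames only; scheme, path, and query are ignored.
        return urlparse(link).hostname == urlparse(options.start_url).hostname
    return True
```

Under these assumptions, a crawl starting at `https://example.com/docs/` with the defaults would follow `https://example.com/docs/page` at depth 1 but skip both an external link and any link more than two hops away.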
Troubleshooting
Crawling Fails
- Verify the target website is accessible
- Check if JavaScript rendering is needed
- Review crawler logs for specific errors
Search Returns No Results
- Confirm crawl jobs have completed successfully
- Check that indexed content matches your search terms
- Verify PostgreSQL full-text search is functioning
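Full-text search cannot work if SOSSE cannot reach PostgreSQL at all, so a basic connectivity check is a reasonable first step. The sketch below reads the `SOSSE_DB_HOST` and `SOSSE_DB_PORT` variables used in this guide and tries to open a TCP connection. The helper names are hypothetical, and the check only shows that the port accepts connections. It does not validate credentials or the search configuration.

```python
import os
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def database_reachable() -> bool:
    """Check the host and port named by SOSSE_DB_HOST and SOSSE_DB_PORT."""
    host = os.environ.get("SOSSE_DB_HOST", "localhost")
    port = int(os.environ.get("SOSSE_DB_PORT", "5432"))
    return can_connect(host, port)
```

If `database_reachable()` returns `False` from inside the container, the problem lies in networking or the environment variables rather than in the search index.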
Performance Issues
- Increase allocated resources for large indices
- Optimize crawl frequency to reduce load
- Consider limiting concurrent crawl threads
Conclusion
Deploying SOSSE on Klutch.sh gives you a powerful, privacy-focused search engine with automatic builds, persistent storage, and secure HTTPS access. With SOSSE, you control what gets indexed and your search queries remain completely private. Whether you’re indexing your own websites, creating a custom search portal, or building a research tool, SOSSE on Klutch.sh provides the foundation for a reliable, self-hosted search solution.