Deploying sist2
Introduction
sist2 (Simple Incremental Search Tool 2) is a fast, cross-platform file system indexer and search engine. Written in C with a focus on performance, sist2 can index millions of documents and provide instant full-text search across your entire file collection.
Unlike traditional file managers, sist2 extracts content from documents, images, videos, and archives, making the actual contents searchable. It supports over 200 file formats including PDFs, Office documents, emails, images with OCR, and compressed archives.
Key highlights of sist2:
- Fast Indexing: Index millions of files efficiently
- Full-Text Search: Search inside document contents, not just filenames
- Format Support: 200+ file formats including PDF, DOCX, images, videos
- OCR Support: Extract text from images and scanned documents
- Archive Handling: Index contents of ZIP, RAR, TAR, and other archives
- Email Indexing: Parse and index email files (EML, MSG)
- Thumbnail Generation: Visual previews for images and documents
- Incremental Updates: Only re-index changed files
- Elasticsearch Backend: Powerful search with advanced queries
- Web Interface: Beautiful, responsive search interface
- Portable Index: Index files can be moved between systems
- Open Source: GPL-3.0 licensed
This guide walks through deploying sist2 on Klutch.sh using Docker, configuring indexing, and setting up the search interface for production use.
Why Deploy sist2 on Klutch.sh
Deploying sist2 on Klutch.sh provides several advantages for file search:
Simplified Deployment: Klutch.sh automatically detects your Dockerfile and builds sist2 without complex orchestration. Push to GitHub, and your search engine deploys automatically.
Persistent Storage: Attach persistent volumes for your files and search index. Your indexed data survives container restarts.
HTTPS by Default: Klutch.sh provides automatic SSL certificates for secure access to your search interface.
GitHub Integration: Connect your configuration repository directly from GitHub. Updates trigger automatic redeployments.
Scalable Resources: Allocate resources based on your index size and search volume.
Custom Domains: Assign a custom domain for convenient access to your search interface.
Always-On Availability: Your search engine runs 24/7 for instant file discovery.
Prerequisites
Before deploying sist2 on Klutch.sh, ensure you have:
- A Klutch.sh account
- A GitHub account with a repository for your sist2 configuration
- Basic familiarity with Docker and search concepts
- An Elasticsearch instance (can be provisioned separately)
- Files you want to index
- (Optional) A custom domain for your search interface
Understanding sist2 Architecture
sist2 operates as a multi-component system:
Indexer: Scans file systems and extracts content, metadata, and thumbnails.
Index File: A portable SQLite database containing the indexed data.
Elasticsearch: Stores the search index for fast full-text queries.
Web Server: Serves the search interface and handles queries.
Content Extractors: Libraries for parsing different file formats (PDF, Office, images, etc.).
Preparing Your Repository
To deploy sist2 on Klutch.sh, create a GitHub repository with your configuration.
Repository Structure

    sist2-deploy/
    ├── Dockerfile
    ├── .dockerignore
    └── README.md

Creating the Dockerfile
Create a Dockerfile in the root of your repository:

    FROM simon987/sist2:latest

    # Create directories
    RUN mkdir -p /data /index /files

    # The application uses command-line arguments for configuration
    # Web interface runs on port 4090 by default
    EXPOSE 4090

Creating the .dockerignore File
Create a .dockerignore file:

    .git
    .github
    *.md
    LICENSE
    .gitignore
    *.log
    .DS_Store

Environment Variables Reference
sist2 primarily uses command-line arguments rather than environment variables:
| Argument | Description |
|---|---|
| --es-url | Elasticsearch URL |
| --bind | Web server bind address |
| --index | Path to index file |
| --auth | Enable authentication |
| --threads | Number of indexing threads |
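Because sist2 takes its settings as command-line arguments, one way to keep deployment settings in platform environment variables is a small wrapper that translates them into flags. The sketch below is only an illustration: the SIST2_* variable names are hypothetical, and it assumes --auth accepts credentials in user:password form, which should be confirmed with sist2 web --help. It prints the resulting command instead of running it.

```shell
#!/bin/sh
# Sketch: translate environment variables into `sist2 web` flags.
# The SIST2_* names are hypothetical; the flags come from the table above.
# Assumption: --auth takes credentials as user:password.

build_web_cmd() {
    index_path="${SIST2_INDEX:-/index/myindex.sist2}"
    es_url="${SIST2_ES_URL:-http://elasticsearch:9200}"
    bind_addr="${SIST2_BIND:-0.0.0.0:4090}"

    cmd="sist2 web $index_path --es-url $es_url --bind $bind_addr"

    # Add --auth only when credentials were actually provided.
    if [ -n "${SIST2_AUTH:-}" ]; then
        cmd="$cmd --auth $SIST2_AUTH"
    fi

    printf '%s\n' "$cmd"
}

# Print the command for review. Values containing spaces would need
# proper quoting before this string is executed.
build_web_cmd
```

Printing first makes it easy to check the assembled flags in the deployment logs before switching the wrapper over to actually launching the web server.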
Deploying sist2 on Klutch.sh
Once your repository is prepared, follow these steps to deploy sist2:
Set Up Elasticsearch
sist2 requires Elasticsearch. Provision an Elasticsearch instance in one of two ways:
- Use a managed Elasticsearch service
- Or deploy Elasticsearch alongside sist2
Push Your Repository to GitHub
Initialize your repository and push to GitHub:

    git init
    git add Dockerfile .dockerignore README.md
    git commit -m "Initial sist2 deployment configuration"
    git remote add origin https://github.com/yourusername/sist2-deploy.git
    git push -u origin main

Create a New Project on Klutch.sh
Navigate to the Klutch.sh dashboard and create a new project. Give it a descriptive name like “sist2” or “file-search”.
Create a New App
Within your project, create a new app. Connect your GitHub account if you haven’t already, then select the repository containing your sist2 Dockerfile.
Configure HTTP Traffic
In the deployment settings:
- Select HTTP as the traffic type
- Set the internal port to 4090
Attach Persistent Volumes
Add volumes for your files and index:
| Mount Path | Recommended Size | Purpose |
|---|---|---|
| /files | 500+ GB | Files to be indexed (adjust based on needs) |
| /index | 50 GB | Index database and thumbnails |
Deploy Your Application
Click Deploy to start the build process. Klutch.sh will:
- Build the container image
- Attach the persistent volumes
- Start the sist2 container
- Provision an HTTPS certificate
Create Initial Index
After deployment, create your first index by scanning files:

    sist2 scan /files --output /index/myindex.sist2

Push to Elasticsearch
Push the index to Elasticsearch:

    sist2 index /index/myindex.sist2 --es-url http://elasticsearch:9200

Start Web Interface
Run the web interface:

    sist2 web /index/myindex.sist2 --es-url http://elasticsearch:9200 --bind 0.0.0.0:4090

Access sist2
Once running, access your search interface at https://your-app-name.klutch.sh.
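The scan, index, and web steps above can be chained so that a failure in one stage stops the rest. The following is a minimal sketch under stated assumptions: FILES_DIR, INDEX_FILE, ES_URL, and DRY_RUN are hypothetical variable names, and the sist2 commands are the ones already shown in this guide. With DRY_RUN=1 (the default here) each command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: run scan -> index -> web in order, stopping at the first failure.
# Variable names are hypothetical; DRY_RUN=1 only prints each command.

FILES_DIR="${FILES_DIR:-/files}"
INDEX_FILE="${INDEX_FILE:-/index/myindex.sist2}"
ES_URL="${ES_URL:-http://elasticsearch:9200}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Show what would be executed, prefixed with "+".
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

pipeline() {
    # && ensures a failed scan never pushes a partial index.
    run sist2 scan "$FILES_DIR" --output "$INDEX_FILE" &&
    run sist2 index "$INDEX_FILE" --es-url "$ES_URL" &&
    run sist2 web "$INDEX_FILE" --es-url "$ES_URL" --bind 0.0.0.0:4090
}

pipeline
```

Setting DRY_RUN=0 would execute the commands for real; the web step stays in the foreground, so it belongs last.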
Indexing Files
Basic Scan
Create an index of your files:

    sist2 scan /files -o /index/documents.sist2

Scan Options
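Configure indexing behavior. OCR flags have changed between sist2 releases: the OCR option takes a Tesseract language code such as eng, not an engine name, and the form shown here is the one used by the 3.x series. Confirm the exact flags with sist2 scan --help.

    sist2 scan /files \
        --output /index/documents.sist2 \
        --threads 4 \
        --content-size 32000000 \
        --thumbnail-size 500 \
        --ocr-lang eng \
        --ocr-images

| Option | Description |
|---|---|
| --threads | Number of parallel threads |
| --content-size | Maximum number of bytes of text to extract per file |
| --thumbnail-size | Thumbnail size in pixels |
| --ocr-lang | Tesseract language used for OCR (for example, eng) |
| --ocr-images | Enable OCR for image files |
| --archive | Archive handling mode (for example, recurse) |
| --incremental | Only index new or changed files |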
Incremental Updates
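Update an existing index in place. In the sist2 3.x series, --incremental is a flag that takes no argument: when the output file already exists, only new or modified files are scanned. Older releases expected the path of the previous index instead, so confirm with sist2 scan --help.

    sist2 scan /files \
        --output /index/documents.sist2 \
        --incremental

Archive Support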
Index inside compressed files:

    sist2 scan /files \
        --output /index/documents.sist2 \
        --archive recurse

Search Interface Features
Full-Text Search
Search document contents with queries like:
- Simple terms: budget report
- Phrases: "quarterly earnings"
- Wildcards: document*.pdf
- Boolean: meeting AND notes
Filters
Narrow results by:
- File type
- Date range
- File size
- Path location
Thumbnails
Visual previews show:
- Document first pages
- Image thumbnails
- Video frames
File Preview
Click files to view:
- Document content
- Image previews
- Metadata details
Production Best Practices
Indexing Strategy
- Scheduled Scans: Run incremental scans regularly
- Off-Peak Hours: Schedule heavy indexing during low-usage times
- Partition Large Collections: Create multiple indexes for very large file sets
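Scheduled scans can overlap when one run takes longer than the interval between runs. One way to guard against that is a small wrapper that skips a run while another is still in progress. This is a sketch only: LOCK_DIR, the install path, and the cron schedule in the comments are assumptions, and the form of --incremental should be checked against sist2 scan --help for your version.

```shell
#!/bin/sh
# Sketch: run a scheduled scan unless a previous one is still running.
# LOCK_DIR and the crontab line below are assumptions for illustration.

LOCK_DIR="${LOCK_DIR:-/tmp/sist2-scan.lock}"

acquire_lock() {
    # mkdir is atomic: it succeeds for exactly one caller.
    mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}

scheduled_scan() {
    if ! acquire_lock; then
        echo "scan already running, skipping" >&2
        return 0
    fi
    # "$@" is the scan command supplied by the caller.
    "$@"
    status=$?
    release_lock
    return "$status"
}

# Example use at the end of this file:
#   scheduled_scan sist2 scan /files --output /index/documents.sist2 --incremental
# Example crontab entry (02:30 nightly), assuming the file is installed as
# /usr/local/bin/sist2-scheduled-scan:
#   30 2 * * * /usr/local/bin/sist2-scheduled-scan
```

If a scan is killed abruptly, the lock directory is left behind and has to be removed by hand before the next run can proceed.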
Performance Optimization
- Elasticsearch Tuning: Configure for your index size
- Thread Count: Match to available CPU cores
- Memory Allocation: Ensure adequate memory for indexing
- SSD Storage: Use fast storage for index files
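To match the thread count to the available CPU cores, the value for --threads can be derived at run time. The sketch below assumes getconf _NPROCESSORS_ONLN is available (it is on most Linux and BSD systems) and leaves one core free for the web server and Elasticsearch client; that reservation is a design choice of this example, not a sist2 recommendation.

```shell
#!/bin/sh
# Sketch: derive a --threads value from the number of online CPU cores.

cpu_count() {
    # Fall back to 1 when getconf is missing or returns a non-number.
    n="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    printf '%s\n' "$n"
}

scan_threads() {
    cores="$(cpu_count)"
    # Keep one core free, but never go below one thread.
    if [ "$cores" -gt 1 ]; then
        echo $((cores - 1))
    else
        echo 1
    fi
}

# Print the scan command with the computed thread count.
echo "sist2 scan /files --output /index/documents.sist2 --threads $(scan_threads)"
```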
Security Recommendations
- Access Control: Enable authentication for sensitive files
- Network Security: Restrict access to Elasticsearch
- File Permissions: Ensure sist2 has read access to files
Backup Strategy
- Index Files: Back up .sist2 index files
- Elasticsearch Snapshots: Regular Elasticsearch backups
- Source Files: Ensure original files are backed up
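Backing up the index files can be as simple as copying them into a dated folder. The following sketch assumes the indexes live in one directory and end in .sist2; the function name and the /backups destination in the usage comment are hypothetical. It should run only while no scan is writing to the index, since copying a file mid-write can produce an inconsistent backup.

```shell
#!/bin/sh
# Sketch: copy every .sist2 index into a folder named after today's date.
# Run it only while no scan is writing to the index.

backup_indexes() {
    index_dir="$1"
    backup_root="$2"
    dest="$backup_root/$(date +%Y-%m-%d)"

    mkdir -p "$dest" || return 1

    for f in "$index_dir"/*.sist2; do
        # When nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        # -R also handles indexes stored as directories; -p keeps timestamps.
        cp -Rp "$f" "$dest/" || return 1
        echo "backed up $(basename "$f")"
    done
}

# Example: backup_indexes /index /backups
```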
Troubleshooting Common Issues
Indexing Failures
Symptoms: Scan fails or skips files.
Solutions:
- Check file permissions
- Verify file format is supported
- Review logs for specific errors
- Ensure sufficient disk space
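Two of the checks above, file permissions and disk space, are easy to script before starting a scan. This is a minimal sketch using only standard tools; the function names and the 1 GiB threshold in the usage comment are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: pre-scan checks for unreadable files and low disk space.

# Print every regular file under $1 that the current user cannot read.
# File names containing newlines are not handled.
list_unreadable() {
    find "$1" -type f | while IFS= read -r path; do
        [ -r "$path" ] || printf '%s\n' "$path"
    done
}

# Print the free space, in kilobytes, on the filesystem holding $1.
free_kb() {
    # df -Pk prints one line per filesystem; column 4 is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example:
#   list_unreadable /files
#   [ "$(free_kb /index)" -gt 1048576 ] || echo "less than 1 GiB free on /index" >&2
```

Running list_unreadable as the same user that runs sist2 matters, because readability depends on who is asking.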
Search Not Returning Results
Symptoms: Searches return empty or incomplete results.
Solutions:
- Verify index was pushed to Elasticsearch
- Check Elasticsearch connectivity
- Ensure web server is using correct index
- Re-index if necessary
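The first two checks above can be done with curl against Elasticsearch's cluster health and count endpoints. The sketch below rests on assumptions: ES_URL matches the examples in this guide, and ES_INDEX defaults to sist2, which is assumed to be the index name sist2 writes to unless configured otherwise. The check_es function is defined but not called here, so the block is safe to source.

```shell
#!/bin/sh
# Sketch: check that Elasticsearch is reachable and holds indexed documents.
# Assumption: the Elasticsearch index is named "sist2" unless overridden.

ES_URL="${ES_URL:-http://elasticsearch:9200}"
ES_INDEX="${ES_INDEX:-sist2}"

# Strip one trailing slash so joined URLs never contain "//".
es_base() {
    printf '%s\n' "${ES_URL%/}"
}

health_url() { printf '%s/_cluster/health\n' "$(es_base)"; }
count_url()  { printf '%s/%s/_count\n' "$(es_base)" "$ES_INDEX"; }

check_es() {
    # -f fails on HTTP errors; -sS hides progress but keeps error messages.
    curl -fsS --max-time 10 "$(health_url)" || {
        echo "Elasticsearch unreachable at $ES_URL" >&2
        return 1
    }
    echo
    curl -fsS --max-time 10 "$(count_url)" || {
        echo "could not read document count for index $ES_INDEX" >&2
        return 1
    }
    echo
}

# Run `check_es` from a shell that can reach Elasticsearch.
```

A document count of zero in the second response suggests the sist2 index step has not run against this Elasticsearch instance.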
Slow Search Performance
Symptoms: Queries take a long time.
Solutions:
- Check Elasticsearch resource allocation
- Optimize Elasticsearch indices
- Consider sharding for large indexes
- Review query complexity
Additional Resources
- sist2 GitHub Repository
- sist2 Wiki
- Supported File Formats
- Klutch.sh Persistent Volumes
- Klutch.sh Deployments
Conclusion
Deploying sist2 on Klutch.sh gives you a powerful, self-hosted file search engine with automatic builds, persistent storage, and secure HTTPS access. The combination of sist2’s fast indexing and Klutch.sh’s deployment simplicity means you can search through millions of files without complex infrastructure.
With full-text search, OCR support, and archive handling, sist2 makes your entire file collection instantly searchable. The web interface provides a modern, intuitive search experience with thumbnails and previews.
Whether you’re managing document archives, searching through photo libraries, or organizing large file collections, sist2 on Klutch.sh provides the reliable foundation for powerful file discovery.