Deploying sist2

Introduction

sist2 (Simple Incremental Search Tool 2) is a fast, cross-platform file system indexer and search engine. Written in C with a focus on performance, sist2 can index millions of documents and provide instant full-text search across your entire file collection.

Unlike traditional file managers, sist2 extracts content from documents, images, videos, and archives, making the actual contents searchable. It supports over 200 file formats including PDFs, Office documents, emails, images with OCR, and compressed archives.

Key highlights of sist2:

  • Fast Indexing: Index millions of files efficiently
  • Full-Text Search: Search inside document contents, not just filenames
  • Format Support: 200+ file formats including PDF, DOCX, images, videos
  • OCR Support: Extract text from images and scanned documents
  • Archive Handling: Index contents of ZIP, RAR, TAR, and other archives
  • Email Indexing: Parse and index email files (EML, MSG)
  • Thumbnail Generation: Visual previews for images and documents
  • Incremental Updates: Only re-index changed files
  • Elasticsearch Backend: Powerful search with advanced queries
  • Web Interface: Beautiful, responsive search interface
  • Portable Index: Index files can be moved between systems
  • Open Source: GPL-3.0 licensed

This guide walks through deploying sist2 on Klutch.sh using Docker, configuring indexing, and setting up the search interface for production use.

Why Deploy sist2 on Klutch.sh

Deploying sist2 on Klutch.sh provides several advantages for file search:

Simplified Deployment: Klutch.sh automatically detects your Dockerfile and builds sist2 without complex orchestration. Push to GitHub, and your search engine deploys automatically.

Persistent Storage: Attach persistent volumes for your files and search index. Your indexed data survives container restarts.

HTTPS by Default: Klutch.sh provides automatic SSL certificates for secure access to your search interface.

GitHub Integration: Connect your configuration repository directly from GitHub. Updates trigger automatic redeployments.

Scalable Resources: Allocate resources based on your index size and search volume.

Custom Domains: Assign a custom domain for convenient access to your search interface.

Always-On Availability: Your search engine runs 24/7 for instant file discovery.

Prerequisites

Before deploying sist2 on Klutch.sh, ensure you have:

  • A Klutch.sh account
  • A GitHub account with a repository for your sist2 configuration
  • Basic familiarity with Docker and search concepts
  • An Elasticsearch instance (can be provisioned separately)
  • Files you want to index
  • (Optional) A custom domain for your search interface

Understanding sist2 Architecture

sist2 operates as a multi-component system:

Indexer: Scans file systems and extracts content, metadata, and thumbnails.

Index File: A portable SQLite database containing the indexed data.

Elasticsearch: Stores the search index for fast full-text queries.

Web Server: Serves the search interface and handles queries.

Content Extractors: Libraries for parsing different file formats (PDF, Office, images, etc.).
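As a rough sketch of how these components line up, the dry-run helper below prints one command per stage, using the same scan, index, and web commands that appear in the deployment steps of this guide. The function name sist2_pipeline is hypothetical and not part of sist2; nothing is executed, the commands are only printed.

```shell
#!/bin/sh
# Dry-run sketch: print the sist2 stages in order without executing them.
# sist2_pipeline is a hypothetical helper, not part of sist2.
sist2_pipeline() {
    files_dir=$1    # directory to scan
    index_file=$2   # portable index file produced by the indexer
    es_url=$3       # Elasticsearch endpoint used for search

    # 1. Indexer: extract content, metadata, and thumbnails into the index file.
    printf 'sist2 scan %s --output %s\n' "$files_dir" "$index_file"
    # 2. Elasticsearch: push the index file contents into the search backend.
    printf 'sist2 index %s --es-url %s\n' "$index_file" "$es_url"
    # 3. Web server: serve the search interface on port 4090.
    printf 'sist2 web %s --es-url %s --bind 0.0.0.0:4090\n' "$index_file" "$es_url"
}

sist2_pipeline /files /index/myindex.sist2 http://elasticsearch:9200
```

Printing instead of running keeps the order of the stages visible: the web server depends on an index that has already been pushed, which in turn depends on a completed scan.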

Preparing Your Repository

To deploy sist2 on Klutch.sh, create a GitHub repository with your configuration.

Repository Structure

sist2-deploy/
├── Dockerfile
├── .dockerignore
└── README.md

Creating the Dockerfile

Create a Dockerfile in the root of your repository:

FROM simon987/sist2:latest
# Create directories
RUN mkdir -p /data /index /files
# The application uses command-line arguments for configuration
# Web interface runs on port 4090 by default
EXPOSE 4090

Creating the .dockerignore File

Create a .dockerignore file:

.git
.github
*.md
LICENSE
.gitignore
*.log
.DS_Store

Environment Variables Reference

sist2 primarily uses command-line arguments rather than environment variables:

Argument     Description
--es-url     Elasticsearch URL
--bind       Web server bind address
--index      Path to index file
--auth       Enable authentication
--threads    Number of indexing threads

Deploying sist2 on Klutch.sh

Once your repository is prepared, follow these steps to deploy sist2:

    Set Up Elasticsearch

    sist2 requires Elasticsearch. Provision an Elasticsearch instance:

    • Use a managed Elasticsearch service
    • Or deploy Elasticsearch alongside sist2

    Push Your Repository to GitHub

    Initialize your repository and push to GitHub:

    git init
    git add Dockerfile .dockerignore README.md
    git commit -m "Initial sist2 deployment configuration"
    git remote add origin https://github.com/yourusername/sist2-deploy.git
    git push -u origin main

    Create a New Project on Klutch.sh

    Navigate to the Klutch.sh dashboard and create a new project. Give it a descriptive name like “sist2” or “file-search”.

    Create a New App

    Within your project, create a new app. Connect your GitHub account if you haven’t already, then select the repository containing your sist2 Dockerfile.

    Configure HTTP Traffic

    In the deployment settings:

    • Select HTTP as the traffic type
    • Set the internal port to 4090

    Attach Persistent Volumes

    Add volumes for your files and index:

    Mount Path    Recommended Size    Purpose
    /files        500+ GB             Files to be indexed (adjust based on needs)
    /index        50 GB               Index database and thumbnails

    Deploy Your Application

    Click Deploy to start the build process. Klutch.sh will:

    • Build the container image
    • Attach the persistent volumes
    • Start the sist2 container
    • Provision an HTTPS certificate

    Create Initial Index

    After deployment, create your first index by scanning files:

    sist2 scan /files --output /index/myindex.sist2

    Push to Elasticsearch

    Push the index to Elasticsearch:

    sist2 index /index/myindex.sist2 --es-url http://elasticsearch:9200

    Start Web Interface

    Run the web interface:

    sist2 web /index/myindex.sist2 --es-url http://elasticsearch:9200 --bind 0.0.0.0:4090

    Access sist2

    Once running, access your search interface at https://your-app-name.klutch.sh.
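The index and web steps both assume Elasticsearch is already accepting connections, which may not be true right after a fresh deployment. A minimal sketch of a wait loop follows. The helper names retry and es_ready are hypothetical, and the default URL is taken from the commands in this guide. The probe uses the standard Elasticsearch _cluster/health endpoint and assumes curl is available in the container.

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempt budget is exhausted.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Only wait when another attempt will follow.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Probe: succeeds once Elasticsearch answers its cluster health endpoint.
# ES_URL is an assumed variable name; adjust it to your instance.
es_ready() {
    curl -fsS "${ES_URL:-http://elasticsearch:9200}/_cluster/health" >/dev/null 2>&1
}

# Example: wait up to 30 x 5 seconds before pushing the index.
#   retry 30 5 es_ready && sist2 index /index/myindex.sist2 --es-url "$ES_URL"
```

Keeping the probe separate from the retry loop means the same loop can guard other dependencies, such as a volume that is mounted late.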

Indexing Files

Basic Scan

Create an index of your files:

sist2 scan /files -o /index/documents.sist2

Scan Options

Configure indexing behavior:

sist2 scan /files \
  --output /index/documents.sist2 \
  --threads 4 \
  --content-size 32000000 \
  --thumbnail-size 500 \
  --ocr-lang eng \
  --ocr-images

Option            Description
--threads         Number of parallel threads
--content-size    Maximum content to extract per file, in bytes
--thumbnail-size  Thumbnail dimension, in pixels
--ocr-lang        Tesseract language to use for OCR
--ocr-images      Enable OCR for image files
--archive         Archive handling mode (for example, recurse)
--incremental     Only index new or changed files

Incremental Updates

Update an existing index:

sist2 scan /files \
--output /index/documents.sist2 \
--incremental /index/documents.sist2

Archive Support

Index inside compressed files:

sist2 scan /files \
--output /index/documents.sist2 \
--archive recurse

Search Interface Features

Full-Text Search

Search document contents with queries like:

  • Simple terms: budget report
  • Phrases: "quarterly earnings"
  • Wildcards: document*.pdf
  • Boolean: meeting AND notes

Filters

Narrow results by:

  • File type
  • Date range
  • File size
  • Path location

Thumbnails

Visual previews show:

  • Document first pages
  • Image thumbnails
  • Video frames

File Preview

Click files to view:

  • Document content
  • Image previews
  • Metadata details

Production Best Practices

Indexing Strategy

  • Scheduled Scans: Run incremental scans regularly
  • Off-Peak Hours: Schedule heavy indexing during low-usage times
  • Partition Large Collections: Create multiple indexes for very large file sets
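One way to run scheduled scans without two runs overlapping is to wrap the scan command in a simple lock. The sketch below is only an illustration. The names LOCK_DIR, acquire_lock, release_lock, and run_scan are hypothetical, and the crontab line assumes a cron daemon is available in your environment, which this guide does not confirm for Klutch.sh.

```shell
#!/bin/sh
# Location of the lock; override LOCK_DIR to move it onto a persistent volume.
LOCK_DIR=${LOCK_DIR:-/tmp/sist2-scan.lock}

# mkdir is atomic, so it doubles as a lock: it fails if the directory exists.
acquire_lock() { mkdir "$LOCK_DIR" 2>/dev/null; }
release_lock() { rmdir "$LOCK_DIR" 2>/dev/null; }

# Run the given command only if no other scan holds the lock.
run_scan() {
    if ! acquire_lock; then
        echo "previous scan still running, skipping" >&2
        return 1
    fi
    status=0
    "$@" || status=$?    # remember the exit code so the lock is always released
    release_lock
    return "$status"
}

# Example, using the incremental scan command from this guide:
#   run_scan sist2 scan /files --output /index/documents.sist2 \
#       --incremental /index/documents.sist2
#
# Example crontab entry (02:00 every night), assuming the script is saved
# as /usr/local/bin/sist2-nightly.sh:
#   0 2 * * * /usr/local/bin/sist2-nightly.sh
```

A skipped run is usually harmless with incremental scans, since the next run picks up whatever changed in the meantime.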

Performance Optimization

  • Elasticsearch Tuning: Configure for your index size
  • Thread Count: Match to available CPU cores
  • Memory Allocation: Ensure adequate memory for indexing
  • SSD Storage: Use fast storage for index files
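To match the thread count to the available CPU cores, the count can be detected at run time instead of hard-coded. This is a minimal sketch. The helper name scan_threads is hypothetical, and it falls back to a single thread when neither nproc nor getconf can report a core count.

```shell
#!/bin/sh
# Report the number of online CPU cores, falling back to 1 if unknown.
scan_threads() {
    cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    echo "$cores"
}

# Print the scan command with the detected thread count (nothing is executed here).
echo "sist2 scan /files --output /index/documents.sist2 --threads $(scan_threads)"
```

If the web interface shares the container with the scan, you may prefer to leave a core free by subtracting one from the result.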

Security Recommendations

  • Access Control: Enable authentication for sensitive files
  • Network Security: Restrict access to Elasticsearch
  • File Permissions: Ensure sist2 has read access to files

Backup Strategy

  1. Index Files: Back up .sist2 index files
  2. Elasticsearch Snapshots: Regular Elasticsearch backups
  3. Source Files: Ensure original files are backed up
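For the first item, a dated copy of the .sist2 files is often enough. The helper below is a sketch under assumptions: the name backup_indexes and the example paths are hypothetical, and the backup destination should live on storage separate from the index volume.

```shell
#!/bin/sh
# Copy every .sist2 index from a source directory into a dated backup folder.
# Prints the folder that was written to.
backup_indexes() {
    src=$1
    dest_root=$2
    dest="$dest_root/$(date +%Y-%m-%d)"
    mkdir -p "$dest"
    for f in "$src"/*.sist2; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        # -R also handles indexes stored as directories; -p keeps timestamps.
        cp -Rp "$f" "$dest/"
    done
    echo "$dest"
}

# Example (paths are placeholders):
#   backup_indexes /index /backups/sist2
```

Run the copy while no scan is writing to the index, so that the backup is not taken mid-update.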

Troubleshooting Common Issues

Indexing Failures

Symptoms: Scan fails or skips files.

Solutions:

  • Check file permissions
  • Verify file format is supported
  • Review logs for specific errors
  • Ensure sufficient disk space
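The disk space check can be scripted so that a scan is not started on a nearly full volume. This is a sketch only. The helper names free_kb and has_space are hypothetical, and the threshold in the example is an arbitrary placeholder, not a sist2 requirement.

```shell
#!/bin/sh
# Free space, in kilobytes, on the filesystem that holds the given path.
# df -P guarantees one line per filesystem; column 4 is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeeds if at least $2 kilobytes are free at path $1.
has_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Example: require roughly 5 GB free on the index volume before scanning.
#   has_space /index 5000000 || echo "not enough free space on /index" >&2
```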

Search Not Returning Results

Symptoms: Searches return empty or incomplete results.

Solutions:

  • Verify index was pushed to Elasticsearch
  • Check Elasticsearch connectivity
  • Ensure web server is using correct index
  • Re-index if necessary
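The first two checks can be done directly against the Elasticsearch REST API. The sketch below assumes curl is available and that ES_URL (a hypothetical variable name) points at the same instance passed to --es-url. The endpoints _cluster/health and _cat/indices are standard Elasticsearch APIs; the name of the index sist2 creates is not stated in this guide, so look for it in the listing rather than assuming one.

```shell
#!/bin/sh
ES_URL=${ES_URL:-http://elasticsearch:9200}

# Fetch an Elasticsearch REST path; -f makes curl exit non-zero on HTTP errors.
es_get() {
    curl -fsS "$ES_URL/$1"
}

# Usage (run from anywhere that can reach Elasticsearch):
#   es_get '_cluster/health?pretty'   # is the cluster reachable and healthy?
#   es_get '_cat/indices?v'           # does an index exist, and does it hold documents?
```

An empty index listing suggests the push step never completed, while a connection error points at networking or the URL rather than at sist2 itself.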

Slow Search Performance

Symptoms: Queries take a long time.

Solutions:

  • Check Elasticsearch resource allocation
  • Optimize Elasticsearch indices
  • Consider sharding for large indexes
  • Review query complexity

Conclusion

Deploying sist2 on Klutch.sh gives you a powerful, self-hosted file search engine with automatic builds, persistent storage, and secure HTTPS access. The combination of sist2’s fast indexing and Klutch.sh’s deployment simplicity means you can search through millions of files without complex infrastructure.

With full-text search, OCR support, and archive handling, sist2 makes your entire file collection instantly searchable. The web interface provides a modern, intuitive search experience with thumbnails and previews.

Whether you’re managing document archives, searching through photo libraries, or organizing large file collections, sist2 on Klutch.sh provides the reliable foundation for powerful file discovery.