Deploying Fess
Introduction
Fess is a powerful, open-source enterprise search server that makes it easy to build and deploy search functionality across websites, file systems, databases, and other data sources. Built on top of Elasticsearch/OpenSearch, Fess provides a user-friendly web interface for crawling, indexing, and searching content without requiring deep technical knowledge of search engines.
Whether you’re building an internal knowledge base, creating a customer-facing search portal, or implementing full-text search for your organization, Fess simplifies the complexity of enterprise search with features like:
- Web Crawling: Automatically discover and index content from websites and web applications
- File System Indexing: Search through documents in local and network file systems (PDF, Word, Excel, PowerPoint, and more)
- Database Integration: Index and search data from relational databases
- User Authentication: Built-in support for LDAP, Active Directory, and SSO
- Search Relevance Tuning: Powerful ranking and relevance configuration options
- Multi-language Support: Built-in support for Japanese, English, Chinese, Korean, and more
- RESTful API: Programmatic access to search functionality
- Role-based Access Control: Secure search results based on user permissions
Deploying Fess on Klutch.sh gives you a production-ready search platform with automatic HTTPS, persistent storage for your search indexes, and easy configuration through environment variables.
What You’ll Learn
- How to deploy Fess with a Dockerfile on Klutch.sh
- Setting up Elasticsearch/OpenSearch as the backend search engine
- Configuring persistent storage for crawled data and search indexes
- Implementing environment variables for production deployment
- Best practices for security, performance, and scaling
Prerequisites
Before you begin, ensure you have:
- A Klutch.sh account
- A GitHub account with a repository for your Fess project
- An Elasticsearch or OpenSearch instance (you can deploy one on Klutch.sh or use a managed service)
- Basic familiarity with Docker and search engine concepts
- (Optional) Familiarity with web crawling and search configuration
Understanding Fess Architecture
Fess consists of several key components:
- Fess Application Server: The main web application that provides the admin interface and search frontend
- Search Engine Backend: Elasticsearch or OpenSearch for storing and querying search indexes
- Crawler: Background job system that crawls and indexes content from various sources
- Storage Layer: Persistent storage for configuration, logs, and temporary files
The application runs on port 8080 by default and communicates with Elasticsearch/OpenSearch to store and retrieve search data.
Step 1: Prepare Your GitHub Repository
1. Create a new GitHub repository for your Fess deployment or use an existing repository.

2. Create a `Dockerfile` in the root of your repository with the following content:

   ```dockerfile
   FROM codelibs/fess:14.14

   # Set the working directory
   WORKDIR /opt/fess

   # Expose the Fess web interface port
   EXPOSE 8080

   # The Fess image includes all necessary configurations.
   # Environment variables will be set through the Klutch.sh dashboard.
   # Data will be persisted to /var/lib/fess and /opt/fess/logs.
   ```

   Note: This uses Fess version 14.14. You can check Docker Hub for the latest version tags.

3. (Optional) Create a `.dockerignore` file to exclude unnecessary files:

   ```
   .git
   .github
   *.md
   README.md
   .env
   .env.local
   docker-compose.yml
   ```

4. Create a `README.md` file with basic information about your deployment:

   ```markdown
   # Fess Enterprise Search Deployment

   This repository contains the configuration for deploying Fess on Klutch.sh.

   ## Environment Variables

   The following environment variables are required:

   - `FESS_DICTIONARY_PATH`: Path to dictionary files
   - `ES_HTTP_URL`: Elasticsearch/OpenSearch HTTP endpoint
   - `FESS_ADMIN_PASSWORD`: Admin user password

   See the Klutch.sh dashboard for the full list of configured variables.
   ```

5. Commit and push your changes to GitHub:

   ```bash
   git add Dockerfile .dockerignore README.md
   git commit -m "Add Dockerfile for Fess deployment on Klutch.sh"
   git push origin main
   ```

Step 2: Deploy Elasticsearch or OpenSearch
Fess requires Elasticsearch or OpenSearch as its backend. You have two options:
Option A: Use a Managed Service
Use a managed Elasticsearch/OpenSearch service like:
- AWS OpenSearch Service
- Elastic Cloud
- Aiven for OpenSearch
Make note of the HTTP endpoint URL, username, and password.
Option B: Deploy on Klutch.sh
You can deploy Elasticsearch or OpenSearch as a separate app on Klutch.sh:
Create a `Dockerfile` for Elasticsearch in a separate repository:

```dockerfile
FROM docker.elastic.co/elasticsearch/elasticsearch:8.11.3

# Disable security for simplified setup (enable it in production)
ENV discovery.type=single-node
ENV xpack.security.enabled=false
ENV ES_JAVA_OPTS="-Xms512m -Xmx512m"

EXPOSE 9200 9300
```

Important considerations for Elasticsearch/OpenSearch:
- Use TCP traffic type in Klutch.sh
- Set internal port to 9200
- Attach a persistent volume to `/usr/share/elasticsearch/data`
- Allocate at least 2GB RAM and 10GB storage
Step 3: Create Your App on Klutch.sh
1. Log in to Klutch.sh and navigate to the dashboard.

2. Create a new project (if you don’t have one already) by clicking “New Project” and providing a project name like “Enterprise Search”.

3. Create a new app within your project by clicking “New App”.

4. Connect your GitHub repository by selecting it from the list of available repositories.

5. Configure the build settings:
   - Klutch.sh will automatically detect the Dockerfile in your repository root
   - The build will use this Dockerfile automatically

6. Set the internal port to `8080` (Fess’s default port). This is the port that traffic will be routed to within the container.

7. Select HTTP traffic for the app’s traffic type since Fess serves a web interface.
Step 4: Configure Persistent Storage
Fess requires persistent storage to retain your search configuration, crawled data, and logs across deployments.
1. In your app settings, navigate to the “Volumes” section.

2. Add the first persistent volume for Fess data:
   - Mount Path: `/var/lib/fess`
   - Size: Start with at least 20 GB (adjust based on the amount of content you’ll be indexing)

3. Add a second persistent volume for logs:
   - Mount Path: `/opt/fess/logs`
   - Size: 5 GB is usually sufficient for logs

4. Save the volume configuration. This ensures all your crawled data, configuration, and logs persist even when the container is restarted or redeployed.

The persistent volumes store:

- `/var/lib/fess`: Fess configuration, crawl schedules, job queues, and temporary files
- `/opt/fess/logs`: Application logs, error logs, and crawl logs
For more details on managing persistent storage, see the Volumes Guide.
Step 5: Configure Environment Variables
Fess requires several environment variables to connect to Elasticsearch/OpenSearch and configure its behavior.
1. In your app settings, navigate to the “Environment Variables” section.

2. Add the following required Elasticsearch/OpenSearch variables:

   ```bash
   # Elasticsearch/OpenSearch connection (required)
   ES_HTTP_URL=http://your-elasticsearch.klutch.sh:8000
   ES_TRANSPORT_URL=your-elasticsearch.klutch.sh:8000

   # If using authentication
   ES_HTTP_USERNAME=elastic
   ES_HTTP_PASSWORD=your-elasticsearch-password
   ```

   Note: If you deployed Elasticsearch on Klutch.sh using TCP traffic, the external port will be 8000. Replace `your-elasticsearch.klutch.sh` with your actual Elasticsearch app URL.

3. Add Fess-specific configuration variables:

   ```bash
   # Admin credentials (required - change these!)
   FESS_ADMIN_PASSWORD=your-strong-admin-password

   # Java heap size (adjust based on your instance size)
   FESS_JAVA_OPTS=-Xms512m -Xmx1g

   # Dictionary path (for language processing)
   FESS_DICTIONARY_PATH=/opt/fess/app/WEB-INF/classes/fess_dict

   # Logging level (optional)
   FESS_LOG_LEVEL=info

   # Session timeout in seconds (optional, default is 3600)
   FESS_SESSION_TIMEOUT=7200

   # Max upload size in bytes (optional, default is 4MB)
   FESS_MAX_UPLOAD_SIZE=10485760
   ```

4. Add optional crawler configuration:

   ```bash
   # Number of crawler threads
   FESS_CRAWLER_THREADS=5

   # Crawler user agent
   FESS_CRAWLER_USER_AGENT=Fess/14.14

   # Maximum depth for web crawling
   FESS_CRAWLER_MAX_DEPTH=3

   # Crawl interval in milliseconds
   FESS_CRAWLER_INTERVAL=1000
   ```

5. Add optional search result configuration:

   ```bash
   # Number of search results per page
   FESS_SEARCH_PAGE_SIZE=20

   # Default search timeout in milliseconds
   FESS_SEARCH_TIMEOUT=30000

   # Enable query suggestions
   FESS_SEARCH_SUGGEST=true
   ```

6. Mark sensitive values as secrets in the Klutch.sh UI (passwords, API keys) to prevent them from appearing in logs.

Important Security Notes:

- Never commit passwords or secrets to your repository
- Always use strong, randomly generated passwords for `FESS_ADMIN_PASSWORD`
- Use Klutch.sh environment variables for all sensitive data
- Consider enabling Elasticsearch/OpenSearch authentication in production
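As a quick helper for the "strong, randomly generated password" point above, here is a small Python sketch that produces a value suitable for `FESS_ADMIN_PASSWORD` using the standard library's `secrets` module (the 24-character length is just a reasonable default, not a Fess requirement):

```python
import secrets
import string

def generate_admin_password(length=24):
    """Generate a random alphanumeric password for FESS_ADMIN_PASSWORD."""
    alphabet = string.ascii_letters + string.digits
    return ''.join(secrets.choice(alphabet) for _ in range(length))

password = generate_admin_password()
print(password)
```

Paste the generated value into the Klutch.sh dashboard as a secret rather than committing it anywhere.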
Step 6: Deploy Your Application
1. Review your configuration to ensure all settings are correct:
   - Dockerfile is detected
   - Internal port is set to `8080`
   - Persistent volumes are mounted to `/var/lib/fess` and `/opt/fess/logs`
   - Environment variables are configured with proper Elasticsearch/OpenSearch connection details
   - Traffic type is set to HTTP

2. Click “Deploy” to start the build and deployment process.

3. Monitor the build logs to ensure the deployment completes successfully. The initial build typically takes 3-5 minutes.

4. Wait for the deployment to complete. Once done, you’ll see your app URL (e.g., `https://example-app.klutch.sh`).
Step 7: Initial Setup and Configuration
1. Access your Fess instance by navigating to your app URL (e.g., `https://example-app.klutch.sh`).

2. Log in to the admin interface:
   - Click on the “Admin” link in the top right
   - Default username: `admin`
   - Password: The value you set for `FESS_ADMIN_PASSWORD`
   - URL: `https://example-app.klutch.sh/admin`

3. Configure your first web crawler:
   - Navigate to Crawler > Web
   - Click “Create New”
   - Enter a name for your crawler (e.g., “My Website”)
   - Enter the URL to crawl (e.g., `https://www.example.com`)
   - Configure crawl settings:
     - Max Depth: How deep to follow links (default: 3)
     - Max Access Count: Maximum pages to crawl (default: 1000)
     - Interval: Delay between requests in milliseconds (default: 1000)
   - Click “Create”

4. Start the crawler:
   - Go to Scheduler > Default Crawler
   - Click “Start Now” to begin crawling immediately
   - Or configure a schedule for automatic crawling

5. Wait for indexing to complete:
   - Monitor crawl progress in System Info > Crawling Info
   - Check for errors in the logs if needed

6. Test your search:
   - Return to the home page (`https://example-app.klutch.sh`)
   - Enter a search query related to the content you crawled
   - Verify that search results are displayed correctly
Getting Started: Sample Usage
Here are common tasks and code examples for working with Fess:
Basic Web Search Interface
Access the search interface at your app URL: `https://example-app.klutch.sh`. Enter queries in the search box to find indexed content.
REST API Search
Fess provides a JSON API for programmatic search access:
```bash
# Basic search query
curl "https://example-app.klutch.sh/json/?q=search+term"

# Search with paging parameters
curl "https://example-app.klutch.sh/json/?q=fess&num=20&start=0"

# Get search suggestions
curl "https://example-app.klutch.sh/suggest/?q=fe"
```

JavaScript Integration
Embed search results in your web application:
```html
<!DOCTYPE html>
<html>
<head>
  <title>Fess Search Integration</title>
</head>
<body>
  <input type="text" id="searchQuery" placeholder="Search...">
  <button onclick="performSearch()">Search</button>
  <div id="results"></div>

  <script>
    function performSearch() {
      const query = document.getElementById('searchQuery').value;
      const apiUrl = `https://example-app.klutch.sh/json/?q=${encodeURIComponent(query)}`;

      fetch(apiUrl)
        .then(response => response.json())
        .then(data => {
          let html = '<h2>Search Results</h2>';
          data.response.result.forEach(item => {
            html += `
              <div class="result">
                <h3><a href="${item.url}" target="_blank">${item.title}</a></h3>
                <p>${item.content_description}</p>
                <small>${item.url}</small>
              </div>
            `;
          });
          document.getElementById('results').innerHTML = html;
        })
        .catch(error => console.error('Search error:', error));
    }
  </script>
</body>
</html>
```

Python API Client
Use the Fess JSON API from Python:
```python
import requests

class FessClient:
    def __init__(self, base_url):
        self.base_url = base_url
        self.api_url = f"{base_url}/json/"

    def search(self, query, num=10, start=0):
        """Perform a search query"""
        params = {'q': query, 'num': num, 'start': start}
        response = requests.get(self.api_url, params=params)
        return response.json()

    def suggest(self, query):
        """Get search suggestions"""
        suggest_url = f"{self.base_url}/suggest/"
        response = requests.get(suggest_url, params={'q': query})
        return response.json()

# Usage
client = FessClient("https://example-app.klutch.sh")
results = client.search("artificial intelligence", num=20)

for item in results['response']['result']:
    print(f"Title: {item['title']}")
    print(f"URL: {item['url']}")
    print(f"Score: {item['score']}")
    print("---")
```

Node.js API Client
```javascript
const axios = require('axios');

class FessClient {
  constructor(baseUrl) {
    this.baseUrl = baseUrl;
    this.apiUrl = `${baseUrl}/json/`;
  }

  async search(query, options = {}) {
    const params = {
      q: query,
      num: options.num || 10,
      start: options.start || 0,
      ...options
    };

    try {
      const response = await axios.get(this.apiUrl, { params });
      return response.data;
    } catch (error) {
      console.error('Search error:', error);
      throw error;
    }
  }

  async suggest(query) {
    const suggestUrl = `${this.baseUrl}/suggest/`;
    try {
      const response = await axios.get(suggestUrl, { params: { q: query } });
      return response.data;
    } catch (error) {
      console.error('Suggestion error:', error);
      throw error;
    }
  }
}

// Usage
const client = new FessClient('https://example-app.klutch.sh');

(async () => {
  const results = await client.search('machine learning', { num: 20 });

  results.response.result.forEach(item => {
    console.log(`Title: ${item.title}`);
    console.log(`URL: ${item.url}`);
    console.log(`Score: ${item.score}`);
    console.log('---');
  });
})();
```

Advanced Configuration
Custom Dockerfile for Additional Features
If you need custom dictionaries, plugins, or configurations:
```dockerfile
FROM codelibs/fess:14.14

# Install additional system packages
USER root
RUN apt-get update && apt-get install -y \
    curl \
    vim \
    && rm -rf /var/lib/apt/lists/*

# Copy custom dictionaries
COPY ./custom_dicts /opt/fess/app/WEB-INF/classes/fess_dict/

# Copy custom configuration
COPY ./fess_config.properties /opt/fess/app/WEB-INF/classes/

# Switch back to the fess user
USER fess

WORKDIR /opt/fess
EXPOSE 8080
```

File System Crawler Setup
To crawl local file systems or network shares:
1. Add a persistent volume for the files to crawl:
   - Mount Path: `/var/fess-files`
   - Size: Based on your file storage needs

2. Configure a file system crawler:
   - Navigate to Crawler > File System
   - Click “Create New”
   - Enter the path: `file:///var/fess-files/`
   - Configure supported file types (PDF, Word, Excel, etc.)
   - Save and start the crawler

3. Upload files to the volume through your deployment workflow or SFTP
Data Store Crawler Setup
To index data from databases:
1. Install JDBC drivers in a custom Dockerfile if needed

2. Configure a data store crawler:
   - Navigate to Crawler > Data Store
   - Click “Create New”
   - Select database type (MySQL, PostgreSQL, etc.)
   - Enter connection details:

     ```
     url=jdbc:postgresql://your-db.klutch.sh:8000/dbname
     username=dbuser
     password=dbpass
     ```

   - Define the SQL query to fetch data:

     ```sql
     SELECT id, title, content, url, updated_at
     FROM articles
     WHERE published = true
     ```

   - Map fields to Fess fields
   - Save and start the crawler
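To make the field-mapping step concrete, here is a hypothetical Python sketch of the transformation the data store crawler performs: each row returned by the SQL query above becomes one search document. The Fess-side field names shown (`url`, `title`, `content`, `last_modified`) are common document fields, but verify them against the field mappings in your Fess version:

```python
def row_to_fess_doc(row):
    """Map one database row (keys match the SQL above) to a Fess document.

    Field names on the right are assumptions based on common Fess document
    fields; confirm them in your instance's admin UI before relying on them.
    """
    return {
        "url": row["url"],
        "title": row["title"],
        "content": row["content"],
        "last_modified": row["updated_at"],
    }

# Illustrative row, as the SELECT above would return it
row = {"id": 1, "title": "Hello", "content": "Body text",
       "url": "https://example.com/articles/1", "updated_at": "2024-01-01"}
print(row_to_fess_doc(row))
```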
Production Best Practices
Security
- Change default admin password: Always use a strong, unique password for the admin account
- Enable Elasticsearch authentication: Configure username/password authentication for Elasticsearch
- Use HTTPS: Klutch.sh provides automatic HTTPS for all apps
- Implement user authentication: Configure LDAP, Active Directory, or SSO for multi-user access
- Restrict admin access: Use role-based permissions to limit administrative functions
- Regular security updates: Keep Fess and Elasticsearch updated to the latest versions
Performance Optimization
1. Allocate sufficient resources:
   - Minimum 1 CPU core and 2GB RAM for Fess
   - At least 2GB RAM for Elasticsearch
   - Scale based on crawl volume and search traffic

2. Configure crawler throttling:
   - Set an appropriate `FESS_CRAWLER_INTERVAL` to avoid overwhelming target sites
   - Limit `FESS_CRAWLER_THREADS` based on your resources

3. Optimize Elasticsearch:
   - Configure proper Java heap size (50% of available RAM, max 32GB)
   - Use SSD storage for persistent volumes
   - Implement index lifecycle management for old data

4. Enable caching:
   - Configure HTTP caching headers for search results
   - Use a CDN for static assets

5. Monitor performance:
   - Track search response times
   - Monitor crawler job duration
   - Watch Elasticsearch cluster health
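The heap-sizing rule above (half of available RAM, capped at 32 GB) is simple enough to express as a helper; this is just the arithmetic behind the guideline, not an official Elasticsearch tool:

```python
def es_heap_gb(total_ram_gb):
    """Heap size per the common rule: half of RAM, never more than 32 GB."""
    return min(total_ram_gb // 2, 32)

print(es_heap_gb(4))    # 2  -> e.g. ES_JAVA_OPTS="-Xms2g -Xmx2g"
print(es_heap_gb(128))  # 32 -> capped, leaving the rest for the OS cache
```

Set `-Xms` and `-Xmx` to the same value so the heap does not resize at runtime.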
Backup Strategy
1. Backup Fess configuration:
   - Export crawler configurations regularly
   - Document environment variables and settings

2. Backup Elasticsearch indexes:
   - Configure Elasticsearch snapshots
   - Store snapshots in remote storage (S3, GCS, etc.)
   - Test restoration procedures

3. Backup logs:
   - Archive crawl logs for audit trails
   - Monitor and rotate log files to prevent disk space issues
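A sketch of the snapshot setup, using Elasticsearch's snapshot API. The repository name, bucket, and index pattern below are placeholders — adjust them to your storage provider and index naming, and note that the `s3` repository type requires the corresponding repository plugin:

```python
import json

# Hypothetical repository settings for a PUT /_snapshot/fess-backups request.
repo_body = {
    "type": "s3",
    "settings": {"bucket": "my-fess-snapshots", "region": "us-east-1"},
}

# Hypothetical snapshot body for PUT /_snapshot/fess-backups/snapshot-1,
# capturing indexes whose names start with "fess".
snapshot_body = {"indices": "fess*", "ignore_unavailable": True}

print(json.dumps(repo_body))
print(json.dumps(snapshot_body))
```

Send these bodies with `curl -X PUT -H "Content-Type: application/json"` against your Elasticsearch endpoint, then verify with `GET /_snapshot/fess-backups/_all`.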
Scaling Considerations
For high-traffic deployments:
1. Vertical scaling:
   - Increase CPU and memory allocation
   - Use larger persistent volumes

2. Elasticsearch scaling:
   - Consider a multi-node Elasticsearch cluster
   - Use index sharding for large datasets
   - Implement read replicas for search performance

3. Crawler optimization:
   - Distribute crawling across multiple time windows
   - Use incremental crawling for large sites
   - Prioritize high-value content
Troubleshooting
Application Won’t Start
Issue: Container starts but Fess doesn’t respond
Solutions:
- Verify internal port is set to `8080`
- Check Elasticsearch connection in environment variables
- Review application logs for startup errors
- Ensure `ES_HTTP_URL` is accessible from the Fess container
- Verify Java heap size settings aren’t exceeding available memory
Cannot Connect to Elasticsearch
Issue: Fess reports Elasticsearch connection errors
Solutions:
- Verify the `ES_HTTP_URL` format is correct (include `http://` or `https://`)
- Check that Elasticsearch is running and accessible
- Confirm port 8000 (or 9200) is correct for your Elasticsearch deployment
- Test Elasticsearch connectivity: `curl http://your-elasticsearch.klutch.sh:8000`
- Verify authentication credentials if using secured Elasticsearch
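The URL-format check above can be done offline with Python's `urllib.parse`; the specific problems flagged here are illustrative, not an exhaustive validation:

```python
from urllib.parse import urlparse

def check_es_url(url):
    """Return a list of likely problems with an ES_HTTP_URL value."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("missing http:// or https:// scheme")
    if not parsed.hostname:
        problems.append("missing hostname")
    if parsed.hostname and parsed.port is None:
        problems.append("no explicit port (expected e.g. 8000 or 9200)")
    return problems

# A bare host:port value is a common mistake; the scheme is required
print(check_es_url("your-elasticsearch.klutch.sh:8000"))
print(check_es_url("http://your-elasticsearch.klutch.sh:8000"))
```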
Crawler Not Indexing Content
Issue: Crawler runs but no content appears in search results
Solutions:
- Check crawler logs in `/opt/fess/logs/fess-crawler.log`
- Verify the target URL is accessible from the Fess container
- Ensure robots.txt allows crawling
- Check crawler configuration for correct URL patterns
- Verify Elasticsearch has sufficient storage space
- Review crawler user agent settings if sites are blocking
Search Results Not Appearing
Issue: Search queries return no results despite successful crawling
Solutions:
- Verify Elasticsearch indexing completed successfully
- Check Elasticsearch cluster health
- Review Fess search configuration
- Ensure proper field mappings in Elasticsearch
- Clear and rebuild search indexes if necessary
- Check for Elasticsearch storage capacity issues
Out of Storage Space
Issue: Cannot crawl more content or persistent volume is full
Solutions:
- Increase persistent volume size in Klutch.sh
- Clean up old or unused crawled data
- Implement index rotation and cleanup policies
- Delete old Elasticsearch indexes
- Archive and compress old log files
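The index rotation and cleanup idea above can be sketched as a retention check; the index names and dates here are hypothetical, and actual deletion would go through Elasticsearch's delete index API:

```python
from datetime import date, timedelta

def indexes_to_delete(index_dates, today, keep_days=90):
    """Pick (name, created) pairs older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    return [name for name, created in index_dates if created < cutoff]

# Illustrative dated indexes, as a date-suffixed naming scheme would produce
indexes = [
    ("crawl-2023-01-01", date(2023, 1, 1)),
    ("crawl-2024-06-01", date(2024, 6, 1)),
]
print(indexes_to_delete(indexes, today=date(2024, 7, 1)))
# → ['crawl-2023-01-01']
```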
Slow Search Performance
Issue: Search queries take too long to return results
Solutions:
- Increase Fess and Elasticsearch resources (CPU/RAM)
- Optimize Elasticsearch index settings
- Reduce the number of fields being searched
- Implement query result caching
- Add more Elasticsearch nodes for larger datasets
- Review and optimize crawler schedules to reduce load
Memory Issues
Issue: Container crashes with out-of-memory errors
Solutions:
- Increase the instance memory allocation
- Adjust `FESS_JAVA_OPTS` heap size settings
- Reduce `FESS_CRAWLER_THREADS` to lower memory usage
- Optimize Elasticsearch memory settings
- Review and reduce concurrent crawler jobs
Monitoring and Maintenance
Health Checks
Monitor these endpoints for system health:
```bash
# Check Fess status
curl https://example-app.klutch.sh/admin/system

# Check Elasticsearch health
curl http://your-elasticsearch.klutch.sh:8000/_cluster/health
```

Log Monitoring
Important log files to monitor:
- Application logs: `/opt/fess/logs/fess.log`
- Crawler logs: `/opt/fess/logs/fess-crawler.log`
- Audit logs: `/opt/fess/logs/audit.log`
- Error logs: `/opt/fess/logs/error.log`
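A minimal sketch for scanning log excerpts for problems; the log line format shown is illustrative, so match the `ERROR`/`WARN` markers to your actual Fess log layout:

```python
def error_lines(log_text):
    """Extract lines that look like errors or warnings from a log excerpt."""
    return [line for line in log_text.splitlines()
            if " ERROR " in line or " WARN " in line]

# Illustrative excerpt in a typical timestamp-level-message layout
sample = (
    "2024-01-01 10:00:00 INFO  Crawler started\n"
    "2024-01-01 10:00:05 ERROR Failed to fetch https://example.com/page\n"
)
print(error_lines(sample))
```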
Regular Maintenance Tasks
1. Weekly:
   - Review crawler logs for errors
   - Check search analytics and popular queries
   - Monitor storage usage

2. Monthly:
   - Update Fess and Elasticsearch to latest patch versions
   - Review and optimize crawler schedules
   - Analyze search performance metrics
   - Archive old logs

3. Quarterly:
   - Review and update crawler configurations
   - Optimize Elasticsearch indexes
   - Test backup and restore procedures
   - Review user access and permissions
Integration Examples
WordPress Plugin Integration
Create a WordPress search plugin that uses Fess:
```php
<?php
function fess_search_results($query) {
    $fess_url = 'https://example-app.klutch.sh/json/';
    // Pass the raw query: http_build_query URL-encodes values itself,
    // so calling urlencode() here as well would double-encode it.
    $params = array(
        'q' => $query,
        'num' => 10
    );

    $url = $fess_url . '?' . http_build_query($params);
    $response = wp_remote_get($url);

    if (is_wp_error($response)) {
        return array('error' => 'Search unavailable');
    }

    $body = wp_remote_retrieve_body($response);
    return json_decode($body, true);
}

// Add shortcode for search results
add_shortcode('fess_search', function($atts) {
    $query = isset($_GET['q']) ? sanitize_text_field($_GET['q']) : '';

    if (empty($query)) {
        return '<p>Please enter a search query.</p>';
    }

    $results = fess_search_results($query);

    $html = '<div class="fess-results">';
    foreach ($results['response']['result'] as $item) {
        $html .= sprintf(
            '<div class="result"><h3><a href="%s">%s</a></h3><p>%s</p></div>',
            esc_url($item['url']),
            esc_html($item['title']),
            esc_html($item['content_description'])
        );
    }
    $html .= '</div>';

    return $html;
});
```

React Search Component
```jsx
import React, { useState } from 'react';

const FessSearch = () => {
  const [query, setQuery] = useState('');
  const [results, setResults] = useState([]);
  const [loading, setLoading] = useState(false);

  const handleSearch = async (e) => {
    e.preventDefault();
    setLoading(true);

    try {
      const response = await fetch(
        `https://example-app.klutch.sh/json/?q=${encodeURIComponent(query)}&num=20`
      );
      const data = await response.json();
      setResults(data.response.result);
    } catch (error) {
      console.error('Search error:', error);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div className="fess-search">
      <form onSubmit={handleSearch}>
        <input
          type="text"
          value={query}
          onChange={(e) => setQuery(e.target.value)}
          placeholder="Search..."
        />
        <button type="submit">Search</button>
      </form>

      {loading && <p>Searching...</p>}

      <div className="results">
        {results.map((item, index) => (
          <div key={index} className="result-item">
            <h3>
              <a href={item.url} target="_blank" rel="noopener noreferrer">
                {item.title}
              </a>
            </h3>
            <p>{item.content_description}</p>
            <small>{item.url}</small>
          </div>
        ))}
      </div>
    </div>
  );
};

export default FessSearch;
```

Migrating from Other Search Platforms
From Apache Solr
If you’re migrating from Solr to Fess:
- Export Solr data using the Solr export API
- Transform data to match Fess’s expected format
- Import into Elasticsearch using bulk API
- Configure Fess crawlers to maintain data freshness
- Update application search endpoints to use Fess JSON API
From Algolia
Migrating from Algolia to Fess:
- Export Algolia indexes using their API
- Map Algolia attributes to Fess fields
- Import data into Elasticsearch
- Configure search relevance to match Algolia behavior
- Update client code to use Fess REST API
Cost Optimization
Resource Allocation
- Start with minimal resources and scale based on usage
- Monitor actual resource utilization before upgrading
- Use crawler schedules during off-peak hours
Elasticsearch Optimization
- Use index lifecycle management to delete old data
- Implement index compression
- Reduce replica count for non-critical indexes
Crawler Efficiency
- Limit crawl depth to essential levels
- Use robots.txt to exclude unnecessary pages
- Implement incremental crawling for large sites
- Schedule crawls during low-traffic periods
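The robots.txt point above can be checked offline with Python's `urllib.robotparser` — the same kind of check Fess's crawler performs before fetching a page. The rules below are illustrative:

```python
from urllib.robotparser import RobotFileParser

# Parse illustrative robots.txt rules directly from lines
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Pages outside the disallowed prefix may be crawled; others may not
print(rp.can_fetch("Fess/14.14", "https://www.example.com/docs/intro"))
print(rp.can_fetch("Fess/14.14", "https://www.example.com/private/report"))
```

Excluding whole sections this way cuts crawl time and index size before tuning depth or schedules.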
Resources
- Fess Official Documentation
- Fess GitHub Repository
- Fess Docker Hub
- Fess API Documentation
- Klutch.sh Quick Start Guide
- Klutch.sh Volumes Guide
- Klutch.sh Builds Guide
- Klutch.sh Deployments Guide
Conclusion
You now have a fully operational Fess enterprise search server running on Klutch.sh with persistent storage, configured Elasticsearch backend, and production-ready settings. Your search platform is ready to:
- Crawl and index websites, file systems, and databases
- Provide powerful full-text search capabilities
- Handle multi-language content and search queries
- Integrate with your applications through REST APIs
- Scale as your content and search traffic grow
With Fess deployed on Klutch.sh, you have a robust, self-hosted search solution that gives you complete control over your search data and functionality. For questions or community support, refer to the Fess GitHub Discussions or the official documentation.