Deploying an Apache Tika App

Introduction

Apache Tika is an open-source toolkit that detects and extracts text and metadata from over a thousand file formats. Deploying Tika with a Dockerfile on Klutch.sh provides reproducible builds, managed secrets, and optional persistent storage for uploads, all configured from klutch.sh/app. This guide covers installation, Dockerfile setup, environment variables, storage, Nixpacks overrides, and sample extraction calls.


Prerequisites

  • Klutch.sh account (sign up)
  • GitHub repository containing your Tika Dockerfile (GitHub is the only supported Git source)
  • Optional: object storage if you offload large files

Architecture and ports

  • Tika server (standalone) serves HTTP on internal port 9998. Choose HTTP traffic and set the internal port to 9998.
  • Persistent storage is optional; you can add a volume for temporary uploads if needed.

Repository layout

tika/
├── Dockerfile # Must be at repo root for auto-detection
└── README.md

Keep secrets out of Git; store them in Klutch.sh environment variables.


Installation (local) and starter commands

Build and run locally:

docker build -t tika-local .
docker run -p 9998:9998 tika-local
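
Once the container is running, a short check from Python can confirm that the server answers. The sketch below assumes the server exposes Tika's `/version` endpoint on the mapped port. The function name `tika_version` and its injectable `opener` parameter are illustrative and not part of any Tika or Klutch.sh API.

```python
import urllib.request


def tika_version(base_url="http://localhost:9998",
                 opener=urllib.request.urlopen,
                 timeout=5.0):
    """Return the version banner reported by a running Tika server.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without a live server.
    """
    # Assumption: GET /version answers with a short plain-text string.
    url = f"{base_url.rstrip('/')}/version"
    with opener(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Calling `tika_version()` with no arguments targets the local container started above. Pass the deployed URL as `base_url` to run the same check against Klutch.sh.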

Dockerfile for Apache Tika (production-ready)

Place this at the repo root; Klutch.sh auto-detects Dockerfiles.

FROM apache/tika:latest
EXPOSE 9998
# No CMD override needed: the base image's entrypoint starts the Tika server on port 9998.

Notes:

  • Pin to a specific tag (e.g., apache/tika:2.9.1.0) for stability.
  • The default entrypoint starts the Tika server on port 9998.
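
To make the pinning advice easy to enforce, a small pre-deploy check can flag base images that float. This is a minimal sketch: the helper name `unpinned_images` is hypothetical, and it only inspects `FROM` lines, so stage aliases in multi-stage builds (for example `FROM builder`) would also be reported.

```python
import re

# Matches "FROM [--platform=...] image" and captures the image reference.
_FROM_RE = re.compile(r"\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)


def unpinned_images(dockerfile_text: str) -> list[str]:
    """Return base images that have no tag or use the ':latest' tag."""
    flagged = []
    for line in dockerfile_text.splitlines():
        match = _FROM_RE.match(line)
        if not match:
            continue
        image = match.group(1)
        if "@" in image:
            # Digest-pinned references (image@sha256:...) are already fixed.
            continue
        # Look for the tag only after the last '/', so a registry port
        # such as localhost:5000/tika is not mistaken for a tag.
        tag = image.rsplit("/", 1)[-1].partition(":")[2]
        if tag in ("", "latest"):
            flagged.append(image)
    return flagged
```

Running it over the Dockerfile contents before pushing returns an empty list once every base image carries an explicit tag or digest.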

Environment variables (Klutch.sh)

The Tika server needs no environment variables by default. If you deploy without the Dockerfile and rely on Nixpacks, set the start command:

  • NIXPACKS_START_CMD=tika-server

Attach persistent volumes

Optional, if you store temporary uploads:

  • /tmp/tika — temporary files.

Ensure the path is writable inside the container.
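
One way to verify writability is to create and remove a throwaway file in the mount path from inside the container. The sketch below assumes a Python interpreter is available wherever the check runs; the helper name `is_writable` is illustrative.

```python
import os
import tempfile


def is_writable(directory: str) -> bool:
    """Return True if a temporary file can be created in `directory`.

    Note: the directory is created first if it does not exist yet.
    """
    try:
        os.makedirs(directory, exist_ok=True)
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        # Covers permission errors, read-only mounts, and invalid paths.
        return False
```

For the volume described above, the call would be `is_writable("/tmp/tika")`.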


Deploy Apache Tika on Klutch.sh (Dockerfile workflow)

  1. Push your repository—with the Dockerfile at the root—to GitHub.
  2. Open klutch.sh/app, create a project, and add an app.
  3. Select HTTP traffic and set the internal port to 9998.
  4. Add any required environment variables (none by default) and attach a volume at /tmp/tika if desired.
  5. Deploy. Your Tika server will be reachable at https://example-app.klutch.sh.

Sample extraction requests

Extract text from a local PDF:

curl -H "Accept: text/plain" -T report.pdf "https://example-app.klutch.sh/tika"

Extract metadata:

curl -H "Accept: application/json" -T report.pdf \
"https://example-app.klutch.sh/meta"

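The same two calls can be made from application code. The following is a minimal sketch using only the Python standard library. It mirrors the curl commands by sending the file with PUT, and the function names are illustrative. Setting `Content-Type: application/octet-stream` is an assumption made so that urllib does not add its own form-encoding default, on the expectation that the server then detects the type from the file contents.

```python
import json
import urllib.request


def build_tika_request(base_url: str, endpoint: str,
                       payload: bytes, accept: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request for a Tika server endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}",
        data=payload,
        method="PUT",
        headers={
            "Accept": accept,
            # Set explicitly so urllib does not add a form-encoding default.
            "Content-Type": "application/octet-stream",
        },
    )


def extract_text(base_url: str, path: str, timeout: float = 60.0) -> str:
    """PUT a local file to /tika and return the response body as text."""
    with open(path, "rb") as fh:
        request = build_tika_request(base_url, "tika", fh.read(), "text/plain")
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def extract_metadata(base_url: str, path: str, timeout: float = 60.0):
    """PUT a local file to /meta and return the parsed JSON response."""
    with open(path, "rb") as fh:
        request = build_tika_request(base_url, "meta", fh.read(), "application/json")
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `extract_text("https://example-app.klutch.sh", "report.pdf")` corresponds to the first curl command, and `extract_metadata` with the same arguments corresponds to the second.
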
Health checks and production tips

  • Add an HTTP readiness probe to / or /tika to confirm the server responds.
  • Pin image versions and test upgrades in staging before production rollout.
  • If handling large files, monitor volume usage on /tmp/tika or offload storage to an object store.
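
If you want to run the readiness check from a script, for example right after a deploy or a staging upgrade, a small polling loop is enough. This is a sketch under stated assumptions: `probe` and `wait_until_ready` are hypothetical helper names, and the attempt count and delay are arbitrary defaults rather than recommended values.

```python
import time
import urllib.request


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to `url` succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError and HTTPError are OSError subclasses, so connection
        # failures, timeouts, and 4xx/5xx responses all land here.
        return False


def wait_until_ready(check, attempts: int = 10, delay: float = 2.0,
                     sleep=time.sleep) -> bool:
    """Call `check()` up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # No pause after the final failed attempt.
    return False
```

A typical call would be `wait_until_ready(lambda: probe("https://example-app.klutch.sh/tika"))`, which returns `False` if the server never responds within the allotted attempts.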

Apache Tika on Klutch.sh delivers reproducible Docker builds, managed configuration, and optional storage—without extra YAML or CI steps. Configure the port, attach storage if needed, and start extracting text and metadata from your documents.