Deploying an Apache Tika App
Introduction
Apache Tika is an open-source toolkit that detects and extracts text and metadata from over a thousand file formats. Deploying Tika with a Dockerfile on Klutch.sh gives you reproducible builds, managed secrets, and optional persistent storage for uploads, all configured from klutch.sh/app. This guide covers local installation, Dockerfile setup, environment variables, storage, Nixpacks overrides, and sample extraction calls.
Prerequisites
- Klutch.sh account (sign up)
- GitHub repository containing your Tika Dockerfile (GitHub is the only supported git source)
- Optional: object storage if you offload large files
Architecture and ports
- The standalone Tika server serves HTTP on internal port 9998. Choose HTTP traffic and set the internal port to 9998.
- Persistent storage is optional; you can add a volume for temporary uploads if needed.
Repository layout
```text
tika/
├── Dockerfile    # Must be at repo root for auto-detection
└── README.md
```

Keep secrets out of Git; store them in Klutch.sh environment variables.
Installation (local) and starter commands
Build and run locally:
```bash
docker build -t tika-local .
docker run -p 9998:9998 tika-local
```

Dockerfile for Apache Tika (production-ready)
Place this at the repo root; Klutch.sh auto-detects Dockerfiles.
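```dockerfile
# Pin a specific release tag instead of "latest" for reproducible builds
FROM apache/tika:2.9.1.0

# The Tika server listens on port 9998 by default
EXPOSE 9998

# No CMD is needed: the base image's entrypoint already starts the Tika server,
# and any CMD values would be passed to the server as extra arguments.
```

Notes: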
- Pin to a specific tag (e.g., apache/tika:2.9.1.0) for stability.
- The default entrypoint starts the Tika server on port 9998.
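To enforce the pinning advice automatically, a small CI check can fail the build when the base image is unpinned. This is a minimal sketch, assuming the Dockerfile's first FROM line is a plain image reference (no --platform flag); the function name check_pinned_tag is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical CI helper: fail when the Dockerfile's base image is unpinned.
# Assumes the first FROM line looks like "FROM image[:tag]" (no --platform flag).
check_pinned_tag() {
  local dockerfile="${1:-Dockerfile}"
  local image tag=""

  # Take the image reference from the first FROM line (keyword is case-insensitive)
  image="$(awk 'toupper($1) == "FROM" { print $2; exit }' "$dockerfile")"
  if [ -z "$image" ]; then
    echo "no FROM line found in $dockerfile" >&2
    return 1
  fi

  # Everything after the last colon is the tag, unless it contains a slash,
  # in which case the colon belonged to a registry port (registry:5000/image)
  case "$image" in
    *:*) tag="${image##*:}" ;;
  esac
  case "$tag" in
    */*) tag="" ;;
  esac

  # A missing tag means Docker pulls :latest, which is also unpinned
  if [ -z "$tag" ] || [ "$tag" = "latest" ]; then
    echo "unpinned base image: $image" >&2
    return 1
  fi
  echo "pinned: $image"
}
```

In CI, source the file and run check_pinned_tag Dockerfile; a FROM apache/tika:latest line makes it print an error and return a non-zero status, which stops the pipeline.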
Environment variables (Klutch.sh)
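The Tika server needs few environment variables. If you deploy without the Dockerfile and rely on Nixpacks, set the start command (this assumes a tika-server launcher is available on the PATH of the Nixpacks build):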
NIXPACKS_START_CMD=tika-server
Attach persistent volumes
Optional, if you store temporary uploads:
- /tmp/tika: temporary files.
Ensure the path is writable inside the container.
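One way to confirm the mount is usable is to create and delete a probe file in it. This is a minimal sketch; the function name check_writable is hypothetical, and it should run as the same user as the Tika process (for example inside the container) so the permission check reflects what the server sees.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm a directory exists and the current user can write to it.
check_writable() {
  local dir="${1:-/tmp/tika}"
  local probe

  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi

  # Creating a real file catches read-only mounts as well as permission problems
  if ! probe="$(mktemp "$dir/.tika-write-test.XXXXXX" 2>/dev/null)"; then
    echo "not writable: $dir" >&2
    return 1
  fi
  rm -f "$probe"
  echo "writable: $dir"
}
```

For a quick check on a running container without copying the script in, docker exec <container> sh -c 'test -w /tmp/tika && echo writable' performs a similar permission test, where <container> is your container's name or ID.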
Deploy Apache Tika on Klutch.sh (Dockerfile workflow)
- Push your repository, with the Dockerfile at the root, to GitHub.
- Open klutch.sh/app, create a project, and add an app.
- Select HTTP traffic and set the internal port to 9998.
- Add any required environment variables (none by default) and attach a volume at /tmp/tika if desired.
- Deploy. Your Tika server will be reachable at https://example-app.klutch.sh.
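After the first deploy, a short end-to-end check can confirm that the server both responds and extracts text. This sketch uploads a small generated text file to the /tika endpoint with curl -T (an HTTP PUT) and looks for a marker string in the response; the function name tika_smoke_test and the marker format are hypothetical, and https://example-app.klutch.sh stands in for your app's URL.

```bash
#!/usr/bin/env bash
# Hypothetical post-deploy check: PUT a tiny text file to /tika and
# verify that the marker string comes back in the extracted text.
tika_smoke_test() {
  local base_url="${1:-https://example-app.klutch.sh}"
  local marker="tika-smoke-test-$$"
  local sample body

  sample="$(mktemp)" || return 1
  printf 'Hello from %s\n' "$marker" > "$sample"

  # -f: treat HTTP 4xx/5xx as failures; --max-time bounds a hung request
  if ! body="$(curl -fsS --max-time 30 -H 'Accept: text/plain' -T "$sample" "$base_url/tika")"; then
    rm -f "$sample"
    echo "request to $base_url/tika failed" >&2
    return 1
  fi
  rm -f "$sample"

  case "$body" in
    *"$marker"*) echo "ok: $base_url extracted the sample text" ;;
    *) echo "unexpected response from $base_url/tika" >&2; return 1 ;;
  esac
}
```

Using plain text keeps the check independent of any sample documents in the repository; swap in a small PDF if you also want to exercise the PDF parser.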
Sample extraction requests
Extract text from a local PDF:
```bash
curl -T report.pdf "https://example-app.klutch.sh/tika"
```

Extract metadata:

```bash
curl -H "Accept: application/json" -T report.pdf \
  "https://example-app.klutch.sh/meta"
```

Health checks and production tips
- Add an HTTP readiness probe to the / or /tika path to confirm the server responds.
- Pin image versions and test upgrades in staging before production rollout.
- If handling large files, monitor volume usage on /tmp/tika or offload storage to an object store.
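For CI pipelines or a local docker run, a polling loop against the same path a readiness probe would use gives a scripted equivalent. This is a minimal sketch: the function name wait_for_tika, the default timeout, and the polling interval are assumptions, and it treats any 2xx response from a GET on the given URL as ready.

```bash
#!/usr/bin/env bash
# Hypothetical readiness wait: poll GET <url> until it returns a 2xx status
# or the timeout (in seconds) expires. Uses bash's built-in SECONDS counter.
wait_for_tika() {
  local url="${1:-http://localhost:9998/tika}"
  local timeout="${2:-60}"
  local interval="${3:-2}"
  local deadline=$((SECONDS + timeout))

  while [ "$SECONDS" -lt "$deadline" ]; do
    # -f makes curl exit non-zero on HTTP errors; the response body is discarded
    if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
      echo "ready: $url"
      return 0
    fi
    sleep "$interval"
  done

  echo "timed out after ${timeout}s waiting for $url" >&2
  return 1
}
```

With the local container started in detached mode (docker run -d -p 9998:9998 tika-local), wait_for_tika http://localhost:9998/tika 60 blocks until the server answers, so later extraction calls do not race the JVM startup.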
Apache Tika on Klutch.sh delivers reproducible Docker builds, managed configuration, and optional storage without extra YAML or CI steps. Configure the port, attach storage if needed, and start extracting text and metadata from your documents.