Deploying an Ollama App
Introduction
Ollama is an open-source local LLM server that runs models through a simple HTTP API. Deploying Ollama with a Dockerfile on Klutch.sh provides reproducible builds, managed secrets, and persistent storage for downloaded models—all configured from klutch.sh/app. This guide covers installation, repository prep, a production-ready Dockerfile, deployment steps, Nixpacks overrides, sample API usage, and production tips.
Prerequisites
- A Klutch.sh account (sign up)
- A GitHub repository containing your Dockerfile (GitHub is the only supported git source)
- Model selection and sizing plan (model weights persist on disk)
- Optional API keys if you proxy or augment requests externally
For onboarding, see the Quick Start.
Architecture and ports
- Ollama exposes an HTTP API on internal port 11434; choose HTTP traffic.
- Persistent storage is required for model weights and caches.
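If Ollama is already running on your workstation, a quick request confirms the port and API shape before Docker enters the picture (a minimal sketch; /api/version and /api/tags are standard Ollama endpoints):

```bash
# Version of the local Ollama server
curl http://localhost:11434/api/version

# Models already downloaded locally
curl http://localhost:11434/api/tags
```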
Repository layout
```
ollama/
├── Dockerfile   # Must be at repo root for auto-detection
└── README.md
```
Keep secrets out of Git; store them in Klutch.sh environment variables.
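If you keep a local .env for testing, a simple .gitignore keeps it (and any locally downloaded weights) out of GitHub. A minimal sketch, written as a shell heredoc; the file patterns are examples:

```bash
# Keep local secrets and model artifacts out of Git.
cat > .gitignore <<'EOF'
.env
*.gguf
.ollama/
EOF
```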
Installation (local) and starter commands
Validate locally before pushing to GitHub:
```bash
docker build -t ollama-local .
docker run -p 11434:11434 ollama-local
```
Dockerfile for Ollama (production-ready)
Place this Dockerfile at the repo root; Klutch.sh auto-detects it (no Docker selection in the UI):
```dockerfile
FROM ollama/ollama:latest
ENV OLLAMA_HOST=0.0.0.0:11434
EXPOSE 11434
CMD ["serve"]
```
Notes:
- Pin the image tag (e.g., ollama/ollama:0.3.x) for stability and update it intentionally.
- To preload models at build time, pull them during the build (e.g., llama3, adjusting the model name); note that ollama pull talks to a running server, so start one inside the same RUN step, as in the sketch below.
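A hedged sketch of a build-time preload (the model name and wait time are examples; also note that a volume mounted at /root/.ollama at runtime will hide models baked into the image unless OLLAMA_MODELS points elsewhere):

```dockerfile
# Pin a specific ollama/ollama tag in practice; latest is shown for brevity.
FROM ollama/ollama:latest
ENV OLLAMA_HOST=0.0.0.0:11434

# "ollama pull" is a client command: start the server in the same RUN step,
# give it a moment to come up, then pull the model into the image layer.
RUN ollama serve & sleep 5 && ollama pull llama3

EXPOSE 11434
CMD ["serve"]
```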
Environment variables (Klutch.sh)
Set these in Klutch.sh before deploying:
- OLLAMA_HOST=0.0.0.0:11434
- Optional: OLLAMA_MODELS=/root/.ollama/models (the default), plus tuning flags for concurrency if needed.
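As a reference, the variables might look like this in the Klutch.sh environment settings (values are illustrative; OLLAMA_NUM_PARALLEL, OLLAMA_MAX_LOADED_MODELS, and OLLAMA_KEEP_ALIVE are standard Ollama tuning variables, but confirm them against the Ollama version you run):

```bash
OLLAMA_HOST=0.0.0.0:11434
OLLAMA_MODELS=/root/.ollama/models
OLLAMA_NUM_PARALLEL=2        # concurrent requests served per loaded model
OLLAMA_MAX_LOADED_MODELS=1   # models kept in memory at once
OLLAMA_KEEP_ALIVE=5m         # how long a model stays loaded after its last request
```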
If you deploy without the Dockerfile and need Nixpacks overrides (not typical for Ollama):
NIXPACKS_START_CMD=ollama serve
Attach persistent volumes
In Klutch.sh storage settings, add mount paths and sizes (no names required):
- /root/.ollama: model weights and cache.
Ensure this path is writable inside the container.
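You can approximate the mounted volume locally and confirm the path is writable before deploying (a sketch using the ollama-local image built earlier; the volume name is arbitrary):

```bash
docker volume create ollama-data

# Override the image entrypoint so a shell check runs instead of the server.
docker run --rm -v ollama-data:/root/.ollama --entrypoint sh ollama-local \
  -c 'touch /root/.ollama/.write-test && echo "/root/.ollama is writable"'
```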
Deploy Ollama on Klutch.sh (Dockerfile workflow)
- Push your repository—with the Dockerfile at the root—to GitHub.
- Open klutch.sh/app, create a project, and add an app.
- Select HTTP traffic and set the internal port to 11434.
- Add the environment variables above (and any model preload choices if you customized the image).
- Attach a persistent volume for /root/.ollama, sized for your model set.
- Deploy. Your Ollama API will be reachable at https://example-app.klutch.sh; attach a custom domain if desired.
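Once the deploy finishes, a quick check against the public URL (the hostname is this guide's example) confirms the server is reachable:

```bash
# The root path responds with "Ollama is running"; /api/tags lists installed models.
curl https://example-app.klutch.sh/
curl https://example-app.klutch.sh/api/tags
```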
Sample API usage
Run a prompt against a model (make sure the model is available first, e.g., preloaded at build time or pulled via the /api/pull endpoint):
```bash
curl -X POST "https://example-app.klutch.sh/api/generate" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3","prompt":"Say hello from Klutch.sh"}'
```
Health checks and production tips
- Add an HTTP probe to / or /api/tags for readiness (a probe sketch follows this list).
- Enforce HTTPS at the edge; forward internally to port 11434.
- Monitor disk usage on /root/.ollama; resize the volume before it fills.
- Pin image and model versions; test upgrades in staging.
- Keep any external API keys in Klutch.sh secrets and rotate regularly.
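A minimal probe and disk check, assuming the example hostname above and shell access to the running container for the disk command:

```bash
# Readiness: /api/tags is cheap, unauthenticated, and fails loudly when the server is down.
curl -fsS https://example-app.klutch.sh/api/tags > /dev/null && echo "ready"

# Disk usage of the model volume (run inside the container).
du -sh /root/.ollama
```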
Ollama on Klutch.sh combines reproducible Docker builds with managed secrets, persistent storage, and flexible HTTP/TCP routing. With the Dockerfile at the repo root, port 11434 configured, and models persisted, you can serve local LLMs without extra YAML or workflow overhead.