Deploying an Ollama App

Introduction

Ollama is an open-source local LLM server that runs models through a simple HTTP API. Deploying Ollama with a Dockerfile on Klutch.sh provides reproducible builds, managed secrets, and persistent storage for downloaded models—all configured from klutch.sh/app. This guide covers installation, repository prep, a production-ready Dockerfile, deployment steps, Nixpacks overrides, sample API usage, and production tips.


Prerequisites

  • A Klutch.sh account (sign up)
  • A GitHub repository containing your Dockerfile (GitHub is the only supported git source)
  • Model selection and sizing plan (model weights persist on disk)
  • Optional API keys if you proxy or augment requests externally

For onboarding, see the Quick Start.


Architecture and ports

  • Ollama exposes an HTTP API on internal port 11434; choose HTTP traffic.
  • Persistent storage is required for model weights and caches.

Repository layout

ollama/
├── Dockerfile # Must be at repo root for auto-detection
└── README.md

Keep secrets out of Git; store them in Klutch.sh environment variables.


Installation (local) and starter commands

Validate locally before pushing to GitHub:

Terminal window
docker build -t ollama-local .
docker run -p 11434:11434 ollama-local
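
With the container running, a quick request to the tags endpoint confirms the API is up; a fresh build returns an empty model list:

Terminal window
curl http://localhost:11434/api/tags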

Dockerfile for Ollama (production-ready)

Place this Dockerfile at the repo root; Klutch.sh auto-detects it (no Docker selection in the UI):

FROM ollama/ollama:latest
ENV OLLAMA_HOST=0.0.0.0:11434
EXPOSE 11434
CMD ["serve"]

Notes:

  • Pin the image tag (e.g., ollama/ollama:0.3.x) for stability; update intentionally.
  • To preload models, pull them during the build; note that ollama pull needs a running server, so start one inside the same RUN step (see the sketch after this list).
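
A plain RUN ollama pull fails at build time because no Ollama server is running in that layer. One common workaround, sketched below with llama3 as an example model (the fixed sleep is crude and the wait may need tuning), starts the server in the background within the same RUN step:

FROM ollama/ollama:latest
ENV OLLAMA_HOST=0.0.0.0:11434
# Start the server in the background, give it a moment to bind, then pull.
# llama3 is an example model name; swap in the model you actually serve.
RUN ollama serve & sleep 5 && ollama pull llama3
EXPOSE 11434
CMD ["serve"]

Keep in mind that a persistent volume mounted at /root/.ollama at runtime can shadow models baked into the image; with a volume attached, pulling on first start is often simpler.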

Environment variables (Klutch.sh)

Set these in Klutch.sh before deploying:

  • OLLAMA_HOST=0.0.0.0:11434
  • Optional: OLLAMA_MODELS=/root/.ollama/models (the default model directory; only set it if you change the mount path).
  • Optional: concurrency tuning via OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS if one instance must serve parallel requests (mirrored in the local run below).
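
To sanity-check these settings before pushing, you can mirror them in a local run of the image built earlier (the tuning values here are illustrative, not recommendations):

Terminal window
docker run -p 11434:11434 \
-e OLLAMA_HOST=0.0.0.0:11434 \
-e OLLAMA_NUM_PARALLEL=2 \
-e OLLAMA_MAX_LOADED_MODELS=1 \
ollama-local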

If you deploy without the Dockerfile and need Nixpacks overrides (not typical for Ollama):

  • NIXPACKS_START_CMD=ollama serve

Attach persistent volumes

In Klutch.sh storage settings, add mount paths and sizes (no names required):

  • /root/.ollama — model weights and cache.

Ensure this path is writable inside the container.
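
To mirror the persistence setup locally, a named Docker volume (the name ollama-models is arbitrary) can stand in for the Klutch.sh volume and confirms the path is writable:

Terminal window
docker volume create ollama-models
docker run -p 11434:11434 -v ollama-models:/root/.ollama ollama-local

Models pulled in one run then survive container restarts, which is the behavior the Klutch.sh volume provides in production.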


Deploy Ollama on Klutch.sh (Dockerfile workflow)

  1. Push your repository—with the Dockerfile at the root—to GitHub.
  2. Open klutch.sh/app, create a project, and add an app.
  3. Select HTTP traffic and set the internal port to 11434.
  4. Add the environment variables above (and any model preload choices if you customized the image).
  5. Attach a persistent volume for /root/.ollama, sized for your model set.
  6. Deploy. Your Ollama API will be reachable at https://example-app.klutch.sh; attach a custom domain if desired, and verify the endpoint with the check below.
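
Once the deploy finishes, the same tags check used locally confirms the hosted endpoint is serving (substitute your app's URL for example-app):

Terminal window
curl https://example-app.klutch.sh/api/tags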

Sample API usage

Run a prompt against a model. Note that /api/generate does not download missing models on demand; if the model is not already on the volume, pull it first (see the pull call after this example):

Terminal window
curl -X POST "https://example-app.klutch.sh/api/generate" \
-H "Content-Type: application/json" \
-d '{"model":"llama3","prompt":"Say hello from Klutch.sh"}'

Health checks and production tips

  • Add an HTTP probe to / or /api/tags for readiness (a probe command sketch follows this list).
  • Enforce HTTPS at the edge; forward internally to port 11434.
  • Monitor disk usage on /root/.ollama; resize before it fills.
  • Pin image and model versions; test upgrades in staging.
  • Keep any external API keys in Klutch.sh secrets and rotate regularly.
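
For the readiness probe, a fail-fast curl against /api/tags exits non-zero until the server responds, which most probe runners can consume directly (a sketch; point it at whatever internal address your health check reaches):

Terminal window
curl -fsS --max-time 5 http://localhost:11434/api/tags > /dev/null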

Ollama on Klutch.sh combines reproducible Docker builds with managed secrets, persistent storage, and flexible HTTP/TCP routing. With the Dockerfile at the repo root, port 11434 configured, and models persisted, you can serve local LLMs without extra YAML or workflow overhead.