Deploying a PyTorch App
Introduction
PyTorch powers production inference for computer vision, NLP, and recommendation workloads. Deploying a PyTorch model server with a Dockerfile on Klutch.sh gives you reproducible builds, managed secrets, and persistent storage for models—all configured from klutch.sh/app. This guide uses a lightweight FastAPI/uvicorn app to serve models over HTTP.
Prerequisites
- A Klutch.sh account (sign up)
- A GitHub repository containing your Dockerfile, model code, and weights (GitHub is the only supported git source)
- (Optional) External object storage if you fetch large models at startup
For onboarding, see the Quick Start.
Architecture and ports
- The sample FastAPI server listens on internal port `8080`. Choose HTTP traffic and set the internal port to `8080`.
- Attach storage if you need to persist downloaded weights or cache files between deployments.
Repository layout
```
pytorch-app/
├── Dockerfile          # Must be at repo root for auto-detection
├── app.py              # FastAPI/uvicorn entrypoint
├── requirements.txt    # Python deps (torch, torchvision, fastapi, uvicorn, pillow)
└── models/             # Optional bundled weights
```

Keep secrets out of Git; store them in Klutch.sh environment variables.
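A minimal `.gitignore` along these lines keeps local secrets and caches out of the repository (illustrative; adjust to your layout):

```text
.env
__pycache__/
*.pyc
```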
Installation (local) and starter commands
Test locally before pushing:
```bash
docker build -t pytorch-local .
docker run -p 8080:8080 \
  -e MODEL_NAME=resnet18 \
  pytorch-local
```
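With the container running, you can hit the sample app's health endpoint to confirm the server is up:

```bash
curl -s http://localhost:8080/health
```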
Dockerfile for PyTorch inference (production-ready)
Place this at the repo root; Klutch.sh auto-detects Dockerfiles.
```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

ENV PORT=8080
EXPOSE 8080

CMD ["bash", "-lc", "uvicorn app:app --host 0.0.0.0 --port ${PORT}"]
```

Notes:
- Switch to a CUDA base image only if GPUs are available (Klutch.sh runs CPU-only unless GPUs are provided).
- Pin torch and model-specific libraries in `requirements.txt` for reproducibility; see the example below.
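A pinned `requirements.txt` might look like this (version numbers are illustrative; pin whichever combination you have tested):

```text
torch==2.3.1
torchvision==0.18.1
fastapi==0.111.0
uvicorn[standard]==0.30.1
pillow==10.3.0
```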
Environment variables (Klutch.sh)
Set these before deploying:
- `PORT=8080`
- `MODEL_NAME=resnet18` (example selector in your code)
- Optional: paths or URLs for model weights, e.g., `MODEL_WEIGHTS_URL`, and auth tokens if pulling from private storage.
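If you fetch weights at startup, the server can read these variables and cache the download on the persistent volume. A minimal sketch, assuming `MODEL_WEIGHTS_URL` plus a hypothetical `WEIGHTS_AUTH_TOKEN` secret (neither is a Klutch.sh built-in):

```python
import os
import pathlib
import urllib.request

WEIGHTS_DIR = pathlib.Path("/app/models")  # must match the attached volume path


def fetch_weights():
    """Download weights once if MODEL_WEIGHTS_URL is set; reuse the cached copy otherwise."""
    url = os.environ.get("MODEL_WEIGHTS_URL")
    if not url:
        return None  # fall back to bundled or torchvision-provided weights
    dest = WEIGHTS_DIR / pathlib.Path(url).name
    if dest.exists():
        return dest  # already cached on the persistent volume
    WEIGHTS_DIR.mkdir(parents=True, exist_ok=True)
    req = urllib.request.Request(url)
    token = os.environ.get("WEIGHTS_AUTH_TOKEN")  # hypothetical token for private storage
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
        out.write(resp.read())
    return dest
```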
If deploying without the Dockerfile and relying on Nixpacks:
```text
NIXPACKS_PYTHON_VERSION=3.11
NIXPACKS_START_CMD=uvicorn app:app --host 0.0.0.0 --port $PORT
```
Attach persistent volumes
If you cache or store models locally, add storage in Klutch.sh (path and size only):
- `/app/models`: cached or bundled weights.
- `/root/.cache/torch`: torch/hub cache (optional).
Ensure the paths align with your code.
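For example, to make sure the torch hub cache lands on the mounted volume, you can point it there explicitly at startup (a small sketch; setting `TORCH_HOME=/root/.cache/torch` in the environment achieves the same thing):

```python
import torch

# Keep downloaded hub checkpoints on the persistent volume so they
# survive redeploys; the default already resolves under ~/.cache/torch.
torch.hub.set_dir("/root/.cache/torch/hub")
```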
Deploy PyTorch on Klutch.sh (Dockerfile workflow)
- Push your repository, with the Dockerfile at the root, to GitHub.
- Open klutch.sh/app, create a project, and add an app.
- Select HTTP traffic and set the internal port to `8080`.
- Add the environment variables above (model selector and any weight URLs/secrets).
- Attach volumes at `/app/models` (and `/root/.cache/torch` if needed) sized for your models.
- Deploy. Your API will be reachable at `https://example-app.klutch.sh`.
Sample FastAPI app (app.py)
```python
from fastapi import FastAPI
from pydantic import BaseModel
import torch
from torchvision import models, transforms
from PIL import Image
import io
import base64

app = FastAPI()

MODEL_NAME = "resnet18"
model = models.__dict__[MODEL_NAME](weights=models.ResNet18_Weights.DEFAULT)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])


class PredictRequest(BaseModel):
    """JSON body for /predict."""
    image_b64: str


@app.get("/health")
def health():
    return {"status": "ok"}


@app.post("/predict")
def predict(req: PredictRequest):
    image_bytes = base64.b64decode(req.image_b64)
    img = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    with torch.no_grad():
        inp = preprocess(img).unsqueeze(0)
        outputs = model(inp)
        pred = outputs.argmax(dim=1).item()
    return {"prediction": int(pred)}
```

The request body is wrapped in a small Pydantic model so FastAPI parses the JSON payload used in the sample requests below (a bare `str` parameter would be read from the query string instead).
Sample requests
Health check:
```bash
curl -I https://example-app.klutch.sh/health
```

Inference (send a base64 image string):
```bash
curl -X POST https://example-app.klutch.sh/predict \
  -H "Content-Type: application/json" \
  -d '{"image_b64":"<BASE64_IMAGE>"}'
```
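To produce the `<BASE64_IMAGE>` value, encode a local file; `input.jpg` is just a placeholder name:

```python
import base64

# Encode an image file as the base64 string expected by /predict.
with open("input.jpg", "rb") as f:
    print(base64.b64encode(f.read()).decode())
```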
Health checks and production tips
- Add readiness/liveness probes to `/health`.
- Pin torch and torchvision versions; test upgrades in staging.
- Store weight URLs and tokens in Klutch.sh secrets; avoid embedding secrets in the image.
- Monitor storage usage on model/cache volumes; resize before they fill.
PyTorch on Klutch.sh delivers reproducible Docker builds, managed secrets, and optional model storage—without extra YAML or CI steps. Configure ports, env vars, and volumes, then ship your inference API.