Deploying Zep
Introduction
Zep is an open-source vector database and memory store designed for LLM applications — it stores embeddings, chat history, and long-term memory for conversational agents. Deploying Zep on Klutch.sh gives you a scalable, secure storage layer for embeddings and a reliable place to persist conversational state.
This guide covers deploying Zep with and without a Dockerfile, persisting data with volumes, securing secrets, and production recommendations. Where relevant, links point to existing Klutch.sh guides (Quick Start, Volumes, Builds) for consistency.
Prerequisites
- A Klutch.sh account (sign up here)
- A GitHub repository for your deployment code (or a small wrapper to start Zep)
- Basic knowledge of Docker and environment variables
- Optionally: object storage credentials (S3-compatible) if you plan to offload backups
1. Prepare your Zep project
Create a small repo that either runs the official Zep server or a wrapper that configures it. Keep secrets out of the repo — use environment variables in Klutch.sh.
Refer to the Quick Start Guide for repository and project setup.
2. Sample non-Docker deployment (Klutch.sh build)
If you want Klutch.sh to build the app from your repo without a Dockerfile:
- Push your repo to GitHub. Include a start script (for example `start.sh`) that launches the Zep server or your entrypoint.
- In Klutch.sh, create a new project and app and connect your repository.
- Set the start command to your launcher (for example `./start.sh` or `zep serve --host 0.0.0.0 --port 8000`, depending on how you run Zep).
- Attach a persistent volume for Zep data (see Volumes Guide).
- Set the app port to `8000` (or the port Zep listens on).
- Click “Create” to deploy.
Notes:
- Configure runtime secrets (database keys, S3 creds) as Klutch.sh environment variables.
- If you plan to scale out, consider using a storage backend or cluster configuration supported by Zep.
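The launcher referenced above can be a small shell script. Here is a minimal `start.sh` sketch — the `zep serve` flags and the `ZEP_*` variable names are assumptions for illustration; check the Zep documentation for the exact CLI your version provides:

```shell
#!/usr/bin/env sh
# Hypothetical start.sh sketch — flag and variable names are assumptions.
set -eu

ZEP_STORE_PATH="${ZEP_STORE_PATH:-./data/zep}"  # point this at your mounted volume
ZEP_PORT="${ZEP_PORT:-8000}"

# Make sure the storage directory exists before the server starts.
mkdir -p "$ZEP_STORE_PATH"

if command -v zep >/dev/null 2>&1; then
  exec zep serve --host 0.0.0.0 --port "$ZEP_PORT"
else
  # Handy for local smoke tests where the binary is not installed.
  echo "zep binary not found on PATH; nothing to start" >&2
fi
```

Defaulting the port and data path via `${VAR:-default}` lets the same script run locally and on Klutch.sh, where the environment variables are set in the UI.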
3. Deploying with a Dockerfile
Using a Dockerfile gives reproducibility and tighter control over dependencies. Example Dockerfile (simple CPU-focused Zep server):
```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential git \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000
CMD ["zep", "serve", "--host", "0.0.0.0", "--port", "8000"]
```
- `requirements.txt` should include `zep` and any connectors (for example `zep[postgres]` or `zep[redis]` if using external backends).
- For production use, pin package versions and use multi-stage builds to reduce image size.
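A multi-stage build keeps compilers and build tooling out of the final image. The sketch below illustrates the pattern for the Dockerfile above; the package set and `CMD` are the same assumptions as before:

```dockerfile
# Hypothetical multi-stage variant — build deps stay in the builder stage.
FROM python:3.10-slim AS builder
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential git \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt ./
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.10-slim
WORKDIR /app
# Copy only the installed packages, not the toolchain.
COPY --from=builder /install /usr/local
COPY . .
EXPOSE 8000
CMD ["zep", "serve", "--host", "0.0.0.0", "--port", "8000"]
```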
4. Persistence and backups
Zep stores vector indexes and metadata — attach a persistent volume to ensure data survives redeploys:
- Create a Klutch.sh persistent volume and mount it to the path Zep uses for storage (e.g., `/data/zep`).
- Use environment variables to configure Zep to write data to the mounted path.
If you prefer object storage backups, configure a scheduled job to snapshot and upload to S3-compatible storage, with credentials stored in Klutch.sh environment variables.
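A scheduled backup job can be as simple as a tar snapshot plus an optional upload. In this sketch the paths, the `BACKUP_BUCKET` variable, and the use of the AWS CLI for S3-compatible storage are all assumptions — adapt them to your setup:

```shell
#!/usr/bin/env sh
# Hypothetical backup sketch — paths and bucket variable are assumptions.
set -eu

DATA_DIR="${ZEP_STORE_PATH:-./data/zep}"
BACKUP_DIR="${BACKUP_DIR:-./backups}"
STAMP=$(date +%Y%m%d-%H%M%S)
ARCHIVE="$BACKUP_DIR/zep-$STAMP.tar.gz"

mkdir -p "$DATA_DIR" "$BACKUP_DIR"

# Snapshot the data directory into a timestamped archive.
tar -czf "$ARCHIVE" -C "$(dirname "$DATA_DIR")" "$(basename "$DATA_DIR")"

# Upload only when an S3-compatible CLI and a target bucket are configured.
if command -v aws >/dev/null 2>&1 && [ -n "${BACKUP_BUCKET:-}" ]; then
  aws s3 cp "$ARCHIVE" "s3://$BACKUP_BUCKET/zep/"
fi

echo "snapshot written: $ARCHIVE"
```

Keep the S3 credentials in Klutch.sh environment variables, never in the script itself.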
5. Environment variables and secrets
- Store API keys, DB connection strings, and S3 credentials in the Klutch.sh UI as environment variables.
- Avoid checking secrets into source control.
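It also helps to validate required secrets at startup so a misconfigured deploy fails immediately rather than mid-request. A sketch, where the variable names are examples and not Zep's actual configuration keys:

```shell
#!/usr/bin/env sh
# Startup sanity check sketch — variable names below are examples only.
required="ZEP_POSTGRES_DSN ZEP_API_SECRET"
missing=""

for name in $required; do
  # Indirect lookup: read the value of the variable whose name is in $name.
  eval "value=\${$name:-}"
  [ -n "$value" ] || missing="$missing $name"
done

if [ -n "$missing" ]; then
  echo "missing required environment variables:$missing" >&2
else
  echo "all required environment variables are set"
fi
```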
6. Scaling and production recommendations
- Monitor CPU, memory, and storage usage; vector DBs can be I/O and memory intensive.
- Use autoscaling or horizontal scaling where your architecture and Zep backend support it.
- Use health checks and readiness probes if your deployment workflow needs them.
- Pin dependency versions in `requirements.txt` and use CI to build and publish images for stable releases.
- Consider using an external vector index backend (if supported) for very large datasets.
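For the health-check bullet above, a probe can be a short script that your platform or monitoring runs periodically. The `/healthz` path and port here are assumptions — substitute whatever health endpoint your Zep version actually exposes:

```shell
#!/usr/bin/env sh
# Liveness probe sketch — endpoint path and port are assumptions.
URL="${HEALTH_URL:-http://localhost:8000/healthz}"

if command -v curl >/dev/null 2>&1 \
    && curl -fsS --max-time 2 "$URL" >/dev/null 2>&1; then
  status="healthy"
else
  status="unreachable"
fi

echo "zep at $URL: $status"
```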
7. Example requirements.txt
```
zep
# optionally: zep[postgres], zep[redis], or other extras as needed
```
Resources
Deploying Zep on Klutch.sh provides a managed, persistent place to store embeddings and memory for LLMs. For the surrounding workflow, see the Klutch.sh Quick Start, Volumes, and Builds guides.