Deploying Zep
Introduction
Zep is an open-source vector database and memory store designed for LLM applications — it stores embeddings, chat history, and long-term memory for conversational agents. Deploying Zep on Klutch.sh gives you a scalable, secure storage layer for embeddings and a reliable place to persist conversational state.
This guide covers deploying Zep with and without a Dockerfile, persisting data with volumes, securing secrets, and production recommendations. Where relevant, links point to existing Klutch.sh guides (Quick Start, Volumes, Builds) for consistency.
Prerequisites
- A Klutch.sh account (sign up here)
- A GitHub repository for your deployment code (or a small wrapper to start Zep)
- Basic knowledge of Docker and environment variables
- Optionally: object storage credentials (S3-compatible) if you plan to offload backups
1. Prepare your Zep project
Create a small repo that either runs the official Zep server or a wrapper that configures it. Keep secrets out of the repo — use environment variables in Klutch.sh.
Refer to the Quick Start Guide for repository and project setup.
2. Sample non-Docker deployment (Klutch.sh build)
If you want Klutch.sh to build the app from your repo without a Dockerfile:
- Push your repo to GitHub. Include a start script (for example `start.sh`) that launches the Zep server or your entrypoint.
- In Klutch.sh, create a new project and app and connect your repository.
- Set the start command to your launcher (for example `./start.sh` or `zep serve --host 0.0.0.0 --port 8000`, depending on how you run Zep).
- Attach a persistent volume for Zep data (see Volumes Guide).
- Set the app port to `8000` (or the port Zep listens on).
- Click “Create” to deploy.
Notes:
- Configure runtime secrets (database keys, S3 creds) as Klutch.sh environment variables.
- If you plan to scale out, consider using a storage backend or cluster configuration supported by Zep.
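The launcher referenced above can be a small shell script. Here is a minimal `start.sh` sketch — the `zep serve` flags and the `ZEP_*` variable names are assumptions for illustration; check the Zep documentation for the exact CLI your version provides:

```shell
#!/usr/bin/env sh
# Hypothetical start.sh sketch — flag and variable names are assumptions.
set -eu

ZEP_STORE_PATH="${ZEP_STORE_PATH:-./data/zep}"  # point this at your mounted volume
ZEP_PORT="${ZEP_PORT:-8000}"

# Make sure the storage directory exists before the server starts.
mkdir -p "$ZEP_STORE_PATH"

if command -v zep >/dev/null 2>&1; then
  exec zep serve --host 0.0.0.0 --port "$ZEP_PORT"
else
  # Handy for local smoke tests where the binary is not installed.
  echo "zep binary not found on PATH; nothing to start" >&2
fi
```

Defaulting the port and data path via `${VAR:-default}` lets the same script run locally and on Klutch.sh, where the environment variables are set in the UI.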
3. Deploying with a Dockerfile
Using a Dockerfile gives reproducibility and tighter control over dependencies. Example Dockerfile (simple CPU-focused Zep server):
```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential git \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000
CMD ["zep", "serve", "--host", "0.0.0.0", "--port", "8000"]
```
- `requirements.txt` should include `zep` and any connectors (for example `zep[postgres]` or `zep[redis]` if using external backends).
- For production use, pin package versions and use multi-stage builds to reduce image size.
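A multi-stage build keeps compilers and build tooling out of the final image. The sketch below illustrates the pattern for the Dockerfile above; the package set and `CMD` are the same assumptions as before:

```dockerfile
# Hypothetical multi-stage variant — build deps stay in the builder stage.
FROM python:3.10-slim AS builder
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential git \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt ./
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.10-slim
WORKDIR /app
# Copy only the installed packages, not the toolchain.
COPY --from=builder /install /usr/local
COPY . .
EXPOSE 8000
CMD ["zep", "serve", "--host", "0.0.0.0", "--port", "8000"]
```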
4. Persistence and backups
Zep stores vector indexes and metadata — attach a persistent volume to ensure data survives redeploys:
- Create a Klutch.sh persistent volume and mount it to the path Zep uses for storage (e.g., `/data/zep`).
- Use environment variables to configure Zep to write data to the mounted path.
If you prefer object storage backups, configure a scheduled job to snapshot and upload to S3-compatible storage, with credentials stored in Klutch.sh environment variables.
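A scheduled backup job can be as simple as a tar snapshot plus an optional upload. In this sketch the paths, the `BACKUP_BUCKET` variable, and the use of the AWS CLI for S3-compatible storage are all assumptions — adapt them to your setup:

```shell
#!/usr/bin/env sh
# Hypothetical backup sketch — paths and bucket variable are assumptions.
set -eu

DATA_DIR="${ZEP_STORE_PATH:-./data/zep}"
BACKUP_DIR="${BACKUP_DIR:-./backups}"
STAMP=$(date +%Y%m%d-%H%M%S)
ARCHIVE="$BACKUP_DIR/zep-$STAMP.tar.gz"

mkdir -p "$DATA_DIR" "$BACKUP_DIR"

# Snapshot the data directory into a timestamped archive.
tar -czf "$ARCHIVE" -C "$(dirname "$DATA_DIR")" "$(basename "$DATA_DIR")"

# Upload only when an S3-compatible CLI and a target bucket are configured.
if command -v aws >/dev/null 2>&1 && [ -n "${BACKUP_BUCKET:-}" ]; then
  aws s3 cp "$ARCHIVE" "s3://$BACKUP_BUCKET/zep/"
fi

echo "snapshot written: $ARCHIVE"
```

Keep the S3 credentials in Klutch.sh environment variables, never in the script itself.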
5. Environment variables and secrets
- Store API keys, DB connection strings, and S3 credentials in the Klutch.sh UI as environment variables.
- Avoid checking secrets into source control.
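It also helps to validate required secrets at startup so a misconfigured deploy fails immediately rather than mid-request. A sketch, where the variable names are examples and not Zep's actual configuration keys:

```shell
#!/usr/bin/env sh
# Startup sanity check sketch — variable names below are examples only.
required="ZEP_POSTGRES_DSN ZEP_API_SECRET"
missing=""

for name in $required; do
  # Indirect lookup: read the value of the variable whose name is in $name.
  eval "value=\${$name:-}"
  [ -n "$value" ] || missing="$missing $name"
done

if [ -n "$missing" ]; then
  echo "missing required environment variables:$missing" >&2
else
  echo "all required environment variables are set"
fi
```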
6. Scaling and production recommendations
- Monitor CPU, memory, and storage usage; vector DBs can be I/O and memory intensive.
- Use autoscaling or horizontal scaling where your architecture and Zep backend support it.
- Use health checks and readiness probes if your deployment workflow needs them.
- Pin dependency versions in `requirements.txt` and use CI to build and publish images for stable releases.
- Consider using an external vector index backend (if supported) for very large datasets.
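For the health-check bullet above, a probe can be a short script that your platform or monitoring runs periodically. The `/healthz` path and port here are assumptions — substitute whatever health endpoint your Zep version actually exposes:

```shell
#!/usr/bin/env sh
# Liveness probe sketch — endpoint path and port are assumptions.
URL="${HEALTH_URL:-http://localhost:8000/healthz}"

if command -v curl >/dev/null 2>&1 \
    && curl -fsS --max-time 2 "$URL" >/dev/null 2>&1; then
  status="healthy"
else
  status="unreachable"
fi

echo "zep at $URL: $status"
```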
7. Example requirements.txt
```
zep
# optionally: zep[postgres], zep[redis], or other extras as needed
```
Resources
Deploying Zep on Klutch.sh provides a managed, persistent place to store embeddings and memory for LLMs. For the surrounding workflow, see the Klutch.sh Quick Start, Volumes, and Builds guides.