
Deploying Jina

Introduction

Jina is a cloud-native neural search framework that lets you build multimodal AI services and pipelines. Whether you’re building semantic search, generative AI applications, or complex ML workflows, Jina provides the infrastructure to deploy and scale your AI services with gRPC, HTTP, and WebSocket communication protocols.

Developed by Jina AI, the framework abstracts away the complexity of building production-ready AI services. You define your logic in Executors, connect them in Flows, and Jina handles the rest, from containerization to scaling across GPU clusters.

Key highlights of Jina:

  • Multimodal Support: Handle text, images, audio, video, and custom data types
  • Neural Search: Build semantic and vector-based search applications
  • gRPC/HTTP/WebSocket: Multiple transport protocols for different use cases
  • Executor Framework: Modular components for AI logic
  • Flow Orchestration: Connect executors into processing pipelines
  • Kubernetes Native: First-class Kubernetes integration
  • Docker Ready: Easy containerization of executors
  • Scalable: From single node to distributed GPU clusters
  • Python Native: Write services in familiar Python
  • Open Source: Apache 2.0 licensed with active development

This guide walks through deploying Jina services on Klutch.sh using Docker, creating executors, and building AI pipelines.

Why Deploy Jina on Klutch.sh

Deploying Jina on Klutch.sh provides several advantages for AI services:

Simplified Deployment: Klutch.sh handles Docker container deployment, making Jina service hosting straightforward.

Persistent Storage: Attach volumes for model weights, indexes, and data persistence.

HTTPS by Default: Klutch.sh provides automatic SSL certificates for secure API endpoints.

GitHub Integration: Connect your Jina project for automated deployments.

Scalable Resources: Allocate CPU and memory (and GPU when available) for inference.

Environment Variable Management: Securely store API keys and configuration.

Custom Domains: Use your domain for the AI service endpoint.

Always-On Availability: Your AI services remain accessible 24/7.

Prerequisites

Before deploying Jina on Klutch.sh, ensure you have:

  • A Klutch.sh account
  • A GitHub account with a repository for your Jina project
  • Basic familiarity with Python and Docker
  • Understanding of machine learning concepts
  • (Optional) Pre-trained models for your use case
  • (Optional) A custom domain for your Jina services

Understanding Jina Architecture

Jina applications consist of several components:

Executor: A Python class containing AI logic (encoding, indexing, searching).

Flow: An orchestration layer connecting multiple executors into a pipeline.

Gateway: Entry point handling client requests via gRPC, HTTP, or WebSocket.

Document: The universal data type representing inputs and outputs.

DocumentArray: A collection of Documents for batch processing.

Preparing Your Repository

Repository Structure

jina-deploy/
├── Dockerfile
├── executor/
│ ├── __init__.py
│ ├── executor.py
│ └── config.yml
├── flow.yml
├── requirements.txt
├── README.md
└── .dockerignore

Creating a Simple Executor

Create executor/executor.py:

from jina import Executor, requests, DocumentArray


class MyEncoder(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Initialize your model here
        self.model = self._load_model()

    def _load_model(self):
        # Load your ML model
        from sentence_transformers import SentenceTransformer
        return SentenceTransformer('all-MiniLM-L6-v2')

    @requests
    def encode(self, docs: DocumentArray, **kwargs):
        """Encode text documents into vectors."""
        texts = docs.texts
        embeddings = self.model.encode(texts)
        docs.embeddings = embeddings
        return docs

Create executor/config.yml:

jtype: MyEncoder
metas:
  name: myencoder
  py_modules:
    - executor.py

Creating the Flow

Create flow.yml:

jtype: Flow
with:
  protocol: http
  port: 8080
  cors: true
executors:
  - name: encoder
    uses: executor/config.yml
    py_modules:
      - executor/executor.py

Creating the Dockerfile

FROM jinaai/jina:3-py310-standard
WORKDIR /app
# Copy requirements
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY executor/ ./executor/
COPY flow.yml .
# Expose the gateway port
EXPOSE 8080
# Start the Flow
CMD ["jina", "flow", "--uses", "flow.yml"]

Creating requirements.txt

jina>=3.0
sentence-transformers>=2.2.0
torch>=2.0.0

Creating the .dockerignore File

.git
.github
*.md
LICENSE
.gitignore
*.log
.DS_Store
.env
.env.local
__pycache__
*.pyc
.pytest_cache

Environment Variables Reference

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| JINA_LOG_LEVEL | No | INFO | Logging verbosity |
| JINA_PORT | No | 8080 | Gateway port |
| JINA_PROTOCOL | No | http | Communication protocol |
| JINA_CORS | No | true | Enable CORS |
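If you would rather drive the Flow configuration from these variables instead of hard-coding values, Jina's YAML supports environment-variable substitution; a sketch, assuming the `${{ ENV.VAR }}` syntax available in Jina 3 releases:

```yaml
jtype: Flow
with:
  protocol: ${{ ENV.JINA_PROTOCOL }}
  port: ${{ ENV.JINA_PORT }}
  cors: ${{ ENV.JINA_CORS }}
```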

Deploying Jina on Klutch.sh

    Push Your Repository to GitHub

    Initialize your repository and push to GitHub:

    git init
    git add Dockerfile executor/ flow.yml requirements.txt .dockerignore README.md
    git commit -m "Initial Jina deployment"
    git remote add origin https://github.com/yourusername/jina-deploy.git
    git push -u origin main

    Create a New Project on Klutch.sh

    Navigate to the Klutch.sh dashboard and create a new project. Give it a descriptive name like “jina” or “ai-service”.

    Create a New App

    Within your project, create a new app:

    1. Connect your GitHub repository
    2. Select the repository containing your Dockerfile
    3. Configure HTTP traffic on port 8080

    Set Environment Variables

    Configure optional environment variables:

    | Variable | Value |
    | --- | --- |
    | JINA_LOG_LEVEL | INFO |

    Attach Persistent Volumes

    Add persistent storage for models and indexes:

    | Mount Path | Recommended Size | Purpose |
    | --- | --- | --- |
    | /app/models | 10+ GB | Model weights |
    | /app/index | 10+ GB | Vector indexes |

    Deploy Your Application

    Click Deploy to start the build process.

    Access Your Service

    Once deployment completes, your Jina service is available at https://your-app-name.klutch.sh.

Using Your Jina Service

Python Client

Use the Jina client to interact with your service:

from jina import Client, Document, DocumentArray

# Connect to your deployed service
client = Client(host='https://your-app-name.klutch.sh')

# Create documents
docs = DocumentArray([
    Document(text='Hello world'),
    Document(text='How are you?'),
    Document(text='Jina is awesome!')
])

# Send request and get embeddings
results = client.post('/', docs)
for doc in results:
    print(f"Text: {doc.text}")
    print(f"Embedding shape: {doc.embedding.shape}")

HTTP API

Use REST API directly:

curl -X POST https://your-app-name.klutch.sh/post \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      {"text": "Hello world"},
      {"text": "How are you?"}
    ]
  }'

WebSocket

For streaming applications:

const ws = new WebSocket('wss://your-app-name.klutch.sh/ws');

ws.onopen = () => {
  ws.send(JSON.stringify({
    data: [{ text: 'Hello world' }]
  }));
};

ws.onmessage = (event) => {
  const result = JSON.parse(event.data);
  console.log('Result:', result);
};

Building a Search Application

Indexing Executor

from jina import Executor, requests, DocumentArray


class Indexer(Executor):
    def __init__(self, workspace: str = './index', **kwargs):
        super().__init__(**kwargs)
        # Store under a private name: Executor already exposes a read-only
        # `workspace` property that must not be shadowed
        self._workspace = workspace
        self._index = DocumentArray()
        self._load_index()

    def _load_index(self):
        try:
            self._index = DocumentArray.load(self._workspace)
        except FileNotFoundError:
            # First run: start with an empty index
            pass

    @requests(on='/index')
    def index(self, docs: DocumentArray, **kwargs):
        self._index.extend(docs)
        self._index.save(self._workspace)
        return docs

    @requests(on='/search')
    def search(self, docs: DocumentArray, **kwargs):
        docs.match(self._index, limit=10)
        return docs

Search Flow

jtype: Flow
with:
  protocol: http
  port: 8080
executors:
  - name: encoder
    uses: encoder/config.yml
  - name: indexer
    uses: indexer/config.yml
    uses_with:
      workspace: /app/index
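Once this Flow is deployed, `/index` and `/search` accept the same JSON body shape as the encode endpoint. A sketch of building those requests with only the standard library (`build_request` is an illustrative helper, not part of Jina; send the result with `urllib.request.urlopen` or any HTTP client):

```python
import json
import urllib.request

def build_request(base_url: str, endpoint: str, texts: list) -> urllib.request.Request:
    """Build a POST request for a Jina HTTP gateway endpoint."""
    body = json.dumps({'data': [{'text': t} for t in texts]}).encode()
    return urllib.request.Request(
        url=f'{base_url}{endpoint}',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )

# Index a few documents, then search against them
index_req = build_request('https://your-app-name.klutch.sh', '/index',
                          ['Jina is a neural search framework'])
search_req = build_request('https://your-app-name.klutch.sh', '/search',
                           ['what is jina?'])
print(index_req.full_url)  # https://your-app-name.klutch.sh/index
```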

Production Best Practices

Security Recommendations

  • API Authentication: Implement token-based auth for production
  • Input Validation: Validate all incoming documents
  • Rate Limiting: Protect against abuse
  • HTTPS Only: Always use HTTPS (handled by Klutch.sh)
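For the token-based auth point above, a minimal sketch using only the standard library (the `check_token` helper and `API_TOKEN` variable are illustrative, not a Jina API; in a real Executor you would read the header from the incoming request's parameters):

```python
import hmac
import os

# Shared secret, e.g. injected via a Klutch.sh environment variable
API_TOKEN = os.environ.get('API_TOKEN', 'change-me')

def check_token(authorization_header: str) -> bool:
    """Constant-time comparison of a 'Bearer <token>' header against the secret."""
    if not authorization_header.startswith('Bearer '):
        return False
    supplied = authorization_header[len('Bearer '):]
    return hmac.compare_digest(supplied, API_TOKEN)

print(check_token('Bearer ' + API_TOKEN))  # True
print(check_token('Bearer wrong-token'))   # False
```

Using `hmac.compare_digest` rather than `==` avoids leaking the secret through timing differences.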

Performance Optimization

  • Batching: Process documents in batches
  • Model Optimization: Use optimized model formats (ONNX, TensorRT)
  • Caching: Cache frequently requested results
  • Resource Allocation: Match resources to model requirements
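The batching point can be sketched without any Jina-specific API; a generic helper that chunks any sequence (recent docarray versions also offer a `DocumentArray.batch()` method, but verify that against your installed version):

```python
from itertools import islice

def batched(iterable, size):
    """Yield successive chunks of at most `size` items from an iterable."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

# Encode in chunks instead of one giant call to keep memory bounded
texts = [f'document {i}' for i in range(10)]
for chunk in batched(texts, 4):
    print(len(chunk))  # prints 4, 4, 2
```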

Monitoring

  • Logging: Configure appropriate log levels
  • Metrics: Enable Prometheus metrics
  • Health Checks: Implement proper health endpoints

Troubleshooting Common Issues

Model Loading Failures

Symptoms: Service won’t start, model errors in logs.

Solutions:

  • Ensure model files are in the container
  • Check disk space for model downloads
  • Verify Python dependencies are installed
  • Review model initialization code

Memory Issues

Symptoms: Out of memory errors.

Solutions:

  • Increase container memory allocation
  • Use smaller batch sizes
  • Consider quantized models
  • Enable model offloading

Slow Response Times

Symptoms: High latency on requests.

Solutions:

  • Optimize model for inference
  • Enable batching for throughput
  • Scale horizontally for capacity
  • Use GPU acceleration when available

Conclusion

Deploying Jina on Klutch.sh enables you to run production-ready AI services with minimal infrastructure management. The combination of Jina’s powerful executor framework and Klutch.sh’s container hosting provides a solid foundation for neural search, generative AI, and multimodal applications.

Whether you’re building semantic search, recommendation systems, or complex ML pipelines, Jina on Klutch.sh delivers the infrastructure needed for scalable, reliable AI services.