Deploying Git Annex

Introduction

Git Annex is a distributed file synchronization system that allows you to manage large files with Git without storing the file contents in Git itself. It extends Git to handle files that are too large or too many for traditional Git workflows, while maintaining the benefits of version control.

Git Annex works by storing file contents in a separate location (the “annex”) while keeping lightweight symlinks in your Git repository. This enables you to track and synchronize large media files, datasets, or any other binary content across multiple machines and storage backends.

Key highlights of Git Annex:

  • Large File Management: Handle files of any size without bloating your Git repository
  • Distributed Storage: Spread files across multiple storage backends (local, cloud, remote)
  • Deduplication: Automatic content-addressable storage prevents duplicate data
  • Flexible Backends: Support for S3, rsync, WebDAV, and many other storage types
  • Version Control: Track changes to files while managing where content is stored
  • Partial Clones: Clone repositories without downloading all file contents
  • Content Verification: Cryptographic checksums ensure data integrity
  • Encryption: Optional encryption for remote storage
  • Open Source: Licensed under AGPL-3.0

This guide walks through deploying a Git Annex server on Klutch.sh using Docker, so that one always-on repository can act as centralized annex storage.

Why Deploy Git Annex on Klutch.sh

Deploying Git Annex on Klutch.sh provides several advantages:

Simplified Deployment: Klutch.sh automatically builds and deploys your Git Annex server. Push to GitHub, and your annex storage deploys automatically.

Persistent Storage: Attach persistent volumes for your annex content. Files survive container restarts and redeployments.

HTTPS by Default: Klutch.sh provides automatic SSL certificates for secure file transfers.

GitHub Integration: Connect your configuration repository directly from GitHub for automatic updates.

Scalable Storage: Allocate storage based on your file management needs.

Always-On Availability: Your annex server remains accessible 24/7 for syncing from anywhere.

Prerequisites

Before deploying Git Annex on Klutch.sh, ensure you have:

  • A Klutch.sh account
  • A GitHub account with a repository for your Git Annex configuration
  • Basic familiarity with Docker and containerization concepts
  • Understanding of Git basics
  • (Optional) SSH keys for authentication

Understanding Git Annex Architecture

Git Annex extends Git with additional functionality:

Git Repository: Standard Git repository tracking file metadata and symlinks.

Annex Storage: Separate storage for actual file contents, organized by content hash.

Location Tracking: Git Annex tracks which repositories contain which file contents.

Special Remotes: Backend storage systems like S3, rsync, or directory-based storage.

Assistant Daemon: Optional daemon for automatic syncing and file watching.
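
The split between Git metadata and annex storage can be illustrated with a small shell sketch. This is a simplified model, not git-annex's implementation: the flat store/ directory and the annex_add helper are hypothetical, and sha256sum from GNU coreutils is assumed to be available. Only the key shape (SHA256-s<size>--<hash>) follows the naming git-annex uses for its SHA256 backend.

```shell
# Simplified model of content-addressed storage with symlinks.
# The store/ layout and annex_add are hypothetical, for illustration only.
work=$(mktemp -d)
cd "$work"
mkdir -p store

# annex_add FILE: move FILE's content into the store under a key derived
# from its size and SHA-256 hash, then leave a symlink in its place.
annex_add() {
    file=$1
    hash=$(sha256sum "$file" | cut -d' ' -f1)
    size=$(wc -c < "$file" | tr -d ' ')
    key="SHA256-s${size}--${hash}"
    if [ -e "store/$key" ]; then
        rm "$file"                 # identical content is already stored once
    else
        mv "$file" "store/$key"
    fi
    ln -s "store/$key" "$file"     # the working tree keeps only a pointer
}

printf 'same bytes\n' > a.bin
printf 'same bytes\n' > b.bin
annex_add a.bin
annex_add b.bin

stored=$(ls store | wc -l | tr -d ' ')
echo "objects stored: $stored"
```

Because both files hash to the same key, the store ends up holding a single object while the working tree keeps two symlinks. That is the deduplication property described above.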

Preparing Your Repository

To deploy Git Annex on Klutch.sh, create a GitHub repository containing your Dockerfile.

Repository Structure

git-annex-deploy/
├── Dockerfile
├── README.md
└── .dockerignore

Creating the Dockerfile

Create a Dockerfile in the root of your repository:

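FROM alpine:latest
# Install git-annex plus the SSH server and rsync used for transfers
RUN apk add --no-cache \
    git \
    git-annex \
    openssh \
    rsync
# Create the annex user; adduser -D leaves the account locked, which sshd
# rejects even for key logins, so replace the lock marker with "*"
RUN adduser -D -h /annex annex \
    && echo 'annex:*' | chpasswd -e
# Create directories with the permissions sshd expects
RUN mkdir -p /annex/repos /annex/.ssh \
    && chmod 700 /annex/.ssh \
    && chown -R annex:annex /annex
# Generate SSH host keys (baked into the image at build time)
RUN ssh-keygen -A
WORKDIR /annex
# Volume for annex data
VOLUME /annex/repos
# Expose SSH port
EXPOSE 22
# Run sshd in the foreground as root; clients log in as the annex user.
# Public keys must be placed in /annex/.ssh/authorized_keys.
CMD ["/usr/sbin/sshd", "-D", "-e"]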

Web-Based Annex Server

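For HTTP-based access, nginx has to pass Git requests to git-http-backend through a CGI wrapper. The nginx.conf copied below is a file you supply in the repository root:

FROM nginx:alpine
# Install git-annex; serving repositories over HTTP also needs git-http-backend
# and a FastCGI wrapper such as fcgiwrap, wired up in your nginx.conf
RUN apk add --no-cache git git-annex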
# Create annex directory
RUN mkdir -p /annex/repos
# Configure nginx for git-http-backend
COPY nginx.conf /etc/nginx/nginx.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

Creating the .dockerignore File

Create a .dockerignore file:

.git
.github
*.md
LICENSE
.gitignore
*.log
.DS_Store
.env
.env.local

Deploying Git Annex on Klutch.sh

Once your repository is prepared, follow these steps to deploy Git Annex:

    Push Your Repository to GitHub

    Initialize your repository and push to GitHub:

    git init
    git add Dockerfile .dockerignore README.md
    git commit -m "Initial Git Annex deployment configuration"
    # Make sure the local branch is named main before pushing
    git branch -M main
    git remote add origin https://github.com/yourusername/git-annex-deploy.git
    git push -u origin main

    Create a New Project on Klutch.sh

    Navigate to the Klutch.sh dashboard and create a new project. Give it a descriptive name like “git-annex” or “annex-storage”.

    Create a New App

    Within your project, create a new app. Connect your GitHub account if you haven’t already, then select the repository containing your Git Annex Dockerfile.

    Configure Traffic

    For TCP-based SSH access:

    • Select TCP as the traffic type
    • External port will be 8000
    • Set the internal port to 22

    For HTTP-based access:

    • Select HTTP as the traffic type
    • Set the internal port to 80

    Attach Persistent Volumes

    Persistent storage is essential for Git Annex. Add the following volumes:

    Mount Path      Recommended Size    Purpose
    /annex/repos    100+ GB             Annex repositories and file content

    Deploy Your Application

    Click Deploy to start the build process. Klutch.sh will:

    • Detect your Dockerfile automatically
    • Build the container image
    • Attach the persistent volumes
    • Start the Git Annex container
    • Configure networking

    Initialize Your First Annex

    After deployment, connect to your server and initialize an annex repository:

    cd /annex/repos
    mkdir myannex
    cd myannex
    git init
    git annex init "server"

Using Git Annex

Basic Concepts

Understanding Git Annex workflow:

Concept           Description
Annex             Storage area for file contents
Symlink           Pointer in Git to annex content
Special Remote    External storage backend
Trust Level       How much Git Annex trusts a repository

Adding Files

Add files to your annex:

# Add a file to the annex
git annex add largefile.zip
# Commit the change
git commit -m "Added largefile.zip"

Getting Files

Retrieve file contents:

# Get a specific file
git annex get largefile.zip
# Get all files
git annex get .

Dropping Files

Remove local copies to save space:

# Check where else the content is stored before dropping it
git annex whereis largefile.zip
# Drop local copy (keeps in other locations)
git annex drop largefile.zip
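
Git Annex refuses a drop unless it can confirm that enough other copies exist, as set by numcopies (1 unless configured otherwise). The rule can be modeled in a few lines of shell. The can_drop function is hypothetical and only mirrors the comparison; the real check contacts the other repositories to verify the copies.

```shell
# can_drop COPIES_ELSEWHERE NUMCOPIES
# Hypothetical model of the drop safety rule: succeed (exit 0) only when
# at least NUMCOPIES copies are known to exist in other repositories.
can_drop() {
    copies_elsewhere=$1
    numcopies=$2
    [ "$copies_elsewhere" -ge "$numcopies" ]
}

# One other copy, numcopies 1: the local copy may go.
if can_drop 1 1; then echo "drop allowed"; else echo "drop refused"; fi
# One other copy, numcopies 2: dropping would leave too few copies.
if can_drop 1 2; then echo "drop allowed"; else echo "drop refused"; fi
```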

Syncing

Synchronize between repositories:

# Sync Git branches and location tracking data (file content is not transferred by default)
git annex sync
# Sync and also transfer file content
git annex sync --content

Setting Up Clients

Cloning the Annex

On client machines:

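# Clone the repository (user, host, and port must match your app's SSH settings)
git clone ssh://annex@your-app-name.klutch.sh:8000/annex/repos/myannex
# Initialize as annex
cd myannex
git annex init "laptop"
# Only needed if git-annex earlier marked origin as unusable (annex-ignore)
git annex enableremote origin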

SSH Configuration

Configure SSH for your annex server:

~/.ssh/config
Host annex
    HostName your-app-name.klutch.sh
    Port 8000
    User annex
    IdentityFile ~/.ssh/id_rsa

Adding Remote

If you created the repository locally instead of cloning it, add the server as a remote (a clone already has origin configured):

git remote add origin annex:/annex/repos/myannex
git annex sync

Special Remotes

Directory Remote

Simple directory-based storage:

git annex initremote backup type=directory directory=/path/to/backup encryption=none

Rsync Remote

Remote storage via rsync:

git annex initremote rsync-backup type=rsync rsyncurl=user@host:/path encryption=none

S3 Remote

Amazon S3 or compatible storage:

# Credentials are read from AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the environment
git annex initremote s3 type=S3 bucket=mybucket encryption=shared

Content Management

Preferred Content

Configure what content to keep where:

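# Server follows the standard rule of its group (the backup group, for example, keeps all content)
git annex wanted here "standard"
# Laptop wants files whose names match *.recent, and keeps whatever it already has
git annex wanted laptop "include=*.recent or present"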
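
To make the expression "include=*.recent or present" concrete, here is a rough shell model of how it evaluates for a single file. The wants function and its yes/no argument are hypothetical stand-ins. git-annex has its own matcher, and this sketch only mirrors the logic: a glob on the file name, or content that is already present locally.

```shell
# wants FILE IS_PRESENT
# Hypothetical model of "include=*.recent or present":
# true if FILE matches the glob, or if its content is already here.
wants() {
    case $1 in
        *.recent) return 0 ;;      # include=*.recent
    esac
    [ "$2" = "yes" ]               # present
}

for entry in "photos/trip.recent no" "notes.txt yes" "notes.txt no"; do
    set -- $entry
    if wants "$1" "$2"; then
        echo "$1 (present=$2): wanted"
    else
        echo "$1 (present=$2): not wanted"
    fi
done
```

The "or present" clause keeps a repository from discarding content it already holds just because the file name no longer matches the glob.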

Required Content

Ensure certain files are always available:

git annex required here "include=important/*"

Groups

Organize repositories into groups:

# Add to group
git annex group here backup
# Define what the backup group wants, then have this repository follow it
git annex groupwanted backup "anything"
git annex wanted here "groupwanted"

Production Best Practices

Security Recommendations

  • SSH Keys: Use SSH keys for authentication
  • Encryption: Enable encryption for remote storage
  • Access Control: Limit who can access the annex
  • Verification: Regularly verify file integrity

Storage Management

  • Monitor Usage: Track storage consumption
  • Deduplication: Git Annex automatically deduplicates
  • Cleanup: Use git annex unused to find orphaned content

Backup Strategy

Protect your data:

  1. Multiple Remotes: Store content in multiple locations
  2. numcopies: Configure minimum number of copies
  3. Verify: Regularly verify content integrity

# Require at least 2 copies
git annex numcopies 2
# Verify all content
git annex fsck
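
For checksum-based keys, verification amounts to recomputing the hash of the stored content and comparing it with the hash recorded in the key. The sketch below illustrates that idea using plain shell tools. The make_key and verify helpers are hypothetical, sha256sum from GNU coreutils is assumed, and the code is not a substitute for git annex fsck.

```shell
# make_key FILE: build a key of the form SHA256-s<size>--<hash>
make_key() {
    hash=$(sha256sum "$1" | cut -d' ' -f1)
    size=$(wc -c < "$1" | tr -d ' ')
    printf 'SHA256-s%s--%s\n' "$size" "$hash"
}

# verify FILE KEY: succeed only if FILE still matches the recorded KEY
verify() {
    [ "$(make_key "$1")" = "$2" ]
}

sample=$(mktemp)
printf 'original content\n' > "$sample"
key=$(make_key "$sample")          # key recorded when the file was added

verify "$sample" "$key" && echo "content matches key"
printf 'bit rot\n' >> "$sample"    # simulate corruption on disk
verify "$sample" "$key" || echo "checksum mismatch"
```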

Troubleshooting Common Issues

Content Not Syncing

Solutions:

  • Check remote connectivity
  • Verify trust levels
  • Review preferred content settings
  • Run git annex sync --content

Missing Content

Solutions:

  • Check git annex whereis
  • Verify remote is accessible
  • Check if content was dropped
  • Review numcopies settings

Broken Symlinks

Solutions:

  • Run git annex fix
  • Verify annex is properly initialized
  • Check filesystem support for symlinks
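
When tracking down missing content, remember that an annexed file whose content is not present locally appears as a symlink with no target. The following sketch lists such dangling symlinks under a directory using standard find. The list_dangling helper and the demo file names are hypothetical.

```shell
# list_dangling DIR: print symlinks under DIR whose targets do not exist.
list_dangling() {
    find "$1" -type l ! -exec test -e {} \; -print
}

demo=$(mktemp -d)
printf 'data\n' > "$demo/object"
ln -s object "$demo/present.bin"     # target exists
ln -s missing "$demo/absent.bin"     # target does not exist

list_dangling "$demo"
```

A path printed here is a candidate for git annex get if the content exists elsewhere, or for git annex fix if the link itself points to the wrong place.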

Conclusion

Deploying Git Annex on Klutch.sh gives you a centralized storage backend for managing large files with Git. The combination of Git Annex’s powerful file management and Klutch.sh’s deployment simplicity means you can focus on your data rather than infrastructure.

With support for deduplication, flexible storage backends, and distributed workflows, Git Annex provides everything you need to manage large file collections. Whether you’re handling media libraries, scientific datasets, or any large binary content, Git Annex on Klutch.sh delivers reliable, version-controlled file storage.