Deploying Git Annex
Introduction
Git Annex is a distributed file synchronization system that allows you to manage large files with Git without storing the file contents in Git itself. It extends Git to handle files that are too large or too many for traditional Git workflows, while maintaining the benefits of version control.
Git Annex works by storing file contents in a separate location (the “annex”) while keeping lightweight symlinks in your Git repository. This enables you to track and synchronize large media files, datasets, or any other binary content across multiple machines and storage backends.
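The mechanism can be illustrated with ordinary shell tools. The sketch below is a conceptual analogy only: the `objects/` directory and the `SHA256-<checksum>` key naming are made up for the illustration and are not Git Annex's actual on-disk layout. It moves a file's content into a store named after its checksum and leaves a symlink in its place.

```shell
# Conceptual sketch: "content in a store, symlink in the working tree".
# The objects/ directory and key format are illustrative assumptions.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/objects"

# Create a stand-in for a large file.
printf 'pretend this is a large binary\n' > "$demo_dir/largefile.bin"

# Name the stored object after the SHA-256 checksum of its content.
key="SHA256-$(sha256sum "$demo_dir/largefile.bin" | cut -d' ' -f1)"

# Move the content into the store and leave a relative symlink behind.
mv "$demo_dir/largefile.bin" "$demo_dir/objects/$key"
ln -s "objects/$key" "$demo_dir/largefile.bin"

# The symlink is tiny; reading through it still yields the original content.
ls -l "$demo_dir/largefile.bin"
cat "$demo_dir/largefile.bin"
```

Because the key is derived from the content, two identical files would map to the same stored object, which is the idea behind content-addressable deduplication.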
Key highlights of Git Annex:
- Large File Management: Handle files of any size without bloating your Git repository
- Distributed Storage: Spread files across multiple storage backends (local, cloud, remote)
- Deduplication: Automatic content-addressable storage prevents duplicate data
- Flexible Backends: Support for S3, rsync, WebDAV, and many other storage types
- Version Control: Track changes to files while managing where content is stored
- Partial Clones: Clone repositories without downloading all file contents
- Content Verification: Cryptographic checksums ensure data integrity
- Encryption: Optional encryption for remote storage
- Open Source: Licensed under AGPL-3.0 or later
This guide walks through deploying a Git Annex assistant server on Klutch.sh using Docker for centralized annex storage.
Why Deploy Git Annex on Klutch.sh
Deploying Git Annex on Klutch.sh provides several advantages:
Simplified Deployment: Klutch.sh automatically builds and deploys your Git Annex server. Push to GitHub, and your annex storage deploys automatically.
Persistent Storage: Attach persistent volumes for your annex content. Files survive container restarts and redeployments.
HTTPS by Default: Klutch.sh provides automatic SSL certificates for secure file transfers.
GitHub Integration: Connect your configuration repository directly from GitHub for automatic updates.
Scalable Storage: Allocate storage based on your file management needs.
Always-On Availability: Your annex server remains accessible 24/7 for syncing from anywhere.
Prerequisites
Before deploying Git Annex on Klutch.sh, ensure you have:
- A Klutch.sh account
- A GitHub account with a repository for your Git Annex configuration
- Basic familiarity with Docker and containerization concepts
- Understanding of Git basics
- (Optional) SSH keys for authentication
Understanding Git Annex Architecture
Git Annex extends Git with additional functionality:
Git Repository: Standard Git repository tracking file metadata and symlinks.
Annex Storage: Separate storage for actual file contents, organized by content hash.
Location Tracking: Git Annex tracks which repositories contain which file contents.
Special Remotes: Backend storage systems like S3, rsync, or directory-based storage.
Assistant Daemon: Optional daemon for automatic syncing and file watching.
Preparing Your Repository
To deploy Git Annex on Klutch.sh, create a GitHub repository containing your Dockerfile.
Repository Structure
git-annex-deploy/
├── Dockerfile
├── README.md
└── .dockerignore
Creating the Dockerfile
Create a Dockerfile in the root of your repository:
FROM alpine:latest

# Install git-annex
RUN apk add --no-cache \
    git \
    git-annex \
    openssh \
    rsync

# Create annex user
RUN adduser -D -h /annex annex

# Create directories
RUN mkdir -p /annex/repos /annex/.ssh \
    && chown -R annex:annex /annex

# Switch to annex user
USER annex
WORKDIR /annex

# Restrict permissions on the SSH directory
RUN chmod 700 /annex/.ssh

# Volume for annex data
VOLUME /annex/repos

# Expose SSH port
EXPOSE 22

# Keep the container running (this placeholder command does not start an SSH server)
CMD ["sh", "-c", "tail -f /dev/null"]
Web-Based Annex Server
For HTTP-based access:
FROM nginx:alpine

# Install git-annex
RUN apk add --no-cache git git-annex

# Create annex directory
RUN mkdir -p /annex/repos

# Configure nginx for git-http-backend
COPY nginx.conf /etc/nginx/nginx.conf

EXPOSE 80

CMD ["nginx", "-g", "daemon off;"]
Creating the .dockerignore File
Create a .dockerignore file:
.git
.github
*.md
README.md
LICENSE
.gitignore
*.log
.DS_Store
.env
.env.local
Deploying Git Annex on Klutch.sh
Once your repository is prepared, follow these steps to deploy Git Annex:
Push Your Repository to GitHub
Initialize your repository and push to GitHub:
git init
git add Dockerfile .dockerignore README.md
git commit -m "Initial Git Annex deployment configuration"
git branch -M main
git remote add origin https://github.com/yourusername/git-annex-deploy.git
git push -u origin main
Create a New Project on Klutch.sh
Navigate to the Klutch.sh dashboard and create a new project. Give it a descriptive name like “git-annex” or “annex-storage”.
Create a New App
Within your project, create a new app. Connect your GitHub account if you haven’t already, then select the repository containing your Git Annex Dockerfile.
Configure Traffic
For TCP-based SSH access:
- Select TCP as the traffic type
- The external port will be 8000
- Set the internal port to 22
For HTTP-based access:
- Select HTTP as the traffic type
- Set the internal port to 80
Attach Persistent Volumes
Persistent storage is essential for Git Annex. Add the following volume:
| Mount Path | Recommended Size | Purpose |
|---|---|---|
| /annex/repos | 100+ GB | Annex repositories and file content |
Deploy Your Application
Click Deploy to start the build process. Klutch.sh will:
- Detect your Dockerfile automatically
- Build the container image
- Attach the persistent volumes
- Start the Git Annex container
- Configure networking
Initialize Your First Annex
After deployment, connect to your server and initialize an annex repository:
cd /annex/repos
mkdir myannex
cd myannex
git init
git annex init "server"
Using Git Annex
Basic Concepts
Understanding Git Annex workflow:
| Concept | Description |
|---|---|
| Annex | Storage area for file contents |
| Symlink | Pointer in Git to annex content |
| Special Remote | External storage backend |
| Trust Level | How much Git Annex trusts a repository |
Adding Files
Add files to your annex:
# Add a file to the annex
git annex add largefile.zip
# Commit the change
git commit -m "Added largefile.zip"
Getting Files
Retrieve file contents:
# Get a specific file
git annex get largefile.zip
# Get all files
git annex get .
Dropping Files
Remove local copies to save space:
# Verify content exists elsewhere first
git annex whereis largefile.zip
# Drop local copy (content stays in other locations)
git annex drop largefile.zip
Syncing
Synchronize between repositories:
# Sync Git branches and annex metadata (file content is not transferred by default)
git annex sync
# Sync and also transfer file content
git annex sync --content
Setting Up Clients
Cloning the Annex
On client machines:
# Clone the repository
git clone ssh://user@your-server:8000/annex/repos/myannex
# Initialize as annex
cd myannex
git annex init "laptop"
# Enable the remote
git annex enableremote origin
SSH Configuration
Configure SSH for your annex server:
Host annex
    HostName your-app-name.klutch.sh
    Port 8000
    User annex
    IdentityFile ~/.ssh/id_rsa
Adding Remote
Add the server as a remote:
git remote add origin annex:/annex/repos/myannex
git annex sync
Special Remotes
Directory Remote
Simple directory-based storage:
git annex initremote backup type=directory directory=/path/to/backup encryption=none
Rsync Remote
Remote storage via rsync:
git annex initremote rsync-backup type=rsync rsyncurl=user@host:/path encryption=none
S3 Remote
Amazon S3 or compatible storage:
git annex initremote s3 type=S3 bucket=mybucket encryption=shared
Content Management
Preferred Content
Configure what content to keep where:
# Server uses the standard preferred content of its group
git annex wanted here "standard"
# Laptop wants files matching *.recent, plus anything it already has
git annex wanted laptop "include=*.recent or present"
Required Content
Ensure certain files are always available:
git annex required here "include=important/*"
Groups
Organize repositories into groups:
# Add to group
git annex group here backup
# Set group-based wanted content
git annex wanted here "groupwanted"
Production Best Practices
Security Recommendations
- SSH Keys: Use SSH keys for authentication
- Encryption: Enable encryption for remote storage
- Access Control: Limit who can access the annex
- Verification: Regularly verify file integrity
Storage Management
- Monitor Usage: Track storage consumption
- Deduplication: Git Annex automatically deduplicates
- Cleanup: Use git annex unused to find orphaned content
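Usage monitoring can start with standard tools. The helper names below (`annex_usage_kb`, `annex_volume_pct`) and the `ANNEX_DIR` variable are hypothetical, and `/annex/repos` is the mount path used in this guide; this is a rough sketch rather than a complete monitoring setup.

```shell
# Print the disk usage of a directory in kilobytes.
annex_usage_kb() {
    du -sk "$1" | cut -f1
}

# Print how full the filesystem holding a directory is, as a whole-number percentage.
annex_volume_pct() {
    df -Pk "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example: report on the annex volume if it exists on this machine.
# ANNEX_DIR is an assumed override; it defaults to the guide's mount path.
annex_dir=${ANNEX_DIR:-/annex/repos}
if [ -d "$annex_dir" ]; then
    echo "Annex usage: $(annex_usage_kb "$annex_dir") KB"
    echo "Volume used: $(annex_volume_pct "$annex_dir")%"
fi
```

When usage grows unexpectedly, running git annex unused inside a repository lists content no longer referenced by any branch, and git annex dropunused can then remove the numbered items it reports.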
Backup Strategy
Protect your data:
- Multiple Remotes: Store content in multiple locations
- numcopies: Configure minimum number of copies
- Verify: Regularly verify content integrity
# Require at least 2 copies
git annex numcopies 2
# Verify all content
git annex fsck
Troubleshooting Common Issues
Content Not Syncing
Solutions:
- Check remote connectivity
- Verify trust levels
- Review preferred content settings
- Run git annex sync --content
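Remote connectivity can be checked with plain Git before looking at annex-specific settings. The function name `check_remote` is hypothetical; it relies only on `git ls-remote`, which fails when the remote cannot be reached or is not a Git repository.

```shell
# Return 0 if the named Git remote answers, non-zero otherwise.
check_remote() {
    remote=${1:-origin}
    if git ls-remote "$remote" >/dev/null 2>&1; then
        echo "remote '$remote' is reachable"
    else
        echo "remote '$remote' is NOT reachable" >&2
        return 1
    fi
}
```

Run `check_remote origin` inside the repository. If it succeeds but content still does not move, the trust levels and preferred content settings are the next place to look.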
Missing Content
Solutions:
- Check git annex whereis
- Verify remote is accessible
- Check if content was dropped
- Review numcopies settings
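In the default symlink mode, a file whose content is absent locally appears as a dangling symlink, so missing content can be listed with standard tools. The function name `list_missing_content` is hypothetical, and the sketch assumes the repository uses symlinks rather than unlocked files.

```shell
# List symlinks under a directory whose targets do not exist,
# skipping Git's own metadata directory.
list_missing_content() {
    dir=${1:-.}
    find "$dir" -path "$dir/.git" -prune -o -type l ! -exec test -e {} \; -print
}
```

Each path it prints can then be passed to git annex get, or checked with git annex whereis to see which repositories hold the content.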
Symlink Issues
Solutions:
- Run git annex fix
- Verify annex is properly initialized
- Check filesystem support for symlinks
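Whether a filesystem supports symlinks can be probed directly. The function name `supports_symlinks` and the probe file name are hypothetical; the sketch simply tries to create a symlink in the target directory and checks the result.

```shell
# Return 0 if symlinks can be created in the given directory, 1 otherwise.
supports_symlinks() {
    dir=${1:-.}
    probe="$dir/.symlink-probe.$$"
    if ln -s probe-target "$probe" 2>/dev/null && [ -L "$probe" ]; then
        rm -f "$probe"
        return 0
    fi
    rm -f "$probe"
    return 1
}
```

For example, `supports_symlinks /annex/repos || echo "no symlink support"` flags a volume where symlink-based annex files cannot work. On such filesystems Git Annex typically detects the limitation during init and works with unlocked files instead.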
Additional Resources
- Official Git Annex Website
- Git Annex Walkthrough
- Special Remotes Documentation
- Klutch.sh Persistent Volumes
- Klutch.sh Deployments
Conclusion
Deploying Git Annex on Klutch.sh gives you a centralized storage backend for managing large files with Git. The combination of Git Annex’s powerful file management and Klutch.sh’s deployment simplicity means you can focus on your data rather than infrastructure.
With support for deduplication, flexible storage backends, and distributed workflows, Git Annex provides everything you need to manage large file collections. Whether you’re handling media libraries, scientific datasets, or any large binary content, Git Annex on Klutch.sh delivers reliable, version-controlled file storage.