How does Borg's deduplication actually work?

Borg splits every file into variable-sized chunks and hashes each chunk. If a chunk's hash already exists in the repository, Borg stores a reference instead of the chunk. The result: the second backup, and every backup after, is essentially a delta, even though the user-facing experience is 'a full backup every night'. For typical server data, the storage required for 60 nightly backups is 1.5x to 2x the size of a single full backup, not 60x.

Is Borg's encryption really safe?

Yes, as long as you initialize the repository with one of the encrypted modes. Borg encrypts every chunk client-side with AES-256-CTR and authenticates it with HMAC-SHA256 (or keyed BLAKE2b in the `-blake2` modes). The repository host only ever sees ciphertext. Two modes are worth knowing: `repokey-blake2` stores the key in the repository (encrypted with your passphrase) for convenience, while `keyfile-blake2` keeps the key entirely off the repository for paranoid setups.

Why use Borgmatic instead of running Borg directly?

Because the third time you write `borg create --stats --compression auto,zstd,11 --exclude-caches /etc /var/www ...`, you'll wish you had a YAML file. Borgmatic is exactly that: one config file describes sources, repositories, retention, and integrity checks. A single `borgmatic` command runs the whole pipeline.

Where should I store the repository?

Off-site, ideally on infrastructure you don't manage day-to-day. Three patterns: a second server you control (cheapest, riskier), an S3-compatible bucket via rclone (cheap, slightly clunky), or a managed Borg host like BorgBase (cleanest, native support). For production servers, BorgBase is what I default to.

How often should I test my restores?

Quarterly at minimum. I've audited backup setups where the team had been running nightly backups for three years, never restored one, and only discovered when something went wrong that the backups had been silently failing for 18 months. A backup you've never restored is a file, not a backup.
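
A restore drill doesn't have to be elaborate. Here is a minimal sketch, assuming borgmatic is installed with a single configured repository; the function name `restore_drill` and the sentinel path `etc/hostname` are placeholders I picked for illustration, not anything borgmatic ships.

```shell
# restore_drill [PATH_IN_ARCHIVE]
# Extract one known file from the latest archive into a scratch directory
# and fail loudly if it does not come back non-empty.
restore_drill() (
    sentinel="${1:-etc/hostname}"   # hypothetical sentinel; pick a file you know exists
    dest=$(mktemp -d) || exit 1
    status=1
    if borgmatic extract --archive latest --path "$sentinel" --destination "$dest" \
        && [ -s "$dest/$sentinel" ]; then
        echo "restore ok: $sentinel"
        status=0
    else
        echo "restore FAILED: $sentinel missing or empty" >&2
    fi
    rm -rf "$dest"
    exit "$status"
)
```

This only proves that one file survives the round trip. A real quarterly test should restore a full site or database dump onto a scratch machine and actually start it.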

Borg Backups: Encrypted, Deduplicated Backups That Don't Break the Storage Budget

The first time I had to restore a client’s website from backup, I learned the difference between “we have backups” and “we have working, recent, restorable backups”. The former is a checkbox; the latter is what saves your weekend when a SQL injection wipes the customer table at 2am.

Borg is the backup tool I run on every managed server I touch. It’s open-source, client-side encryption is built in and chosen when you create the repository, and the deduplication is so effective that nightly backups of multi-terabyte servers don’t fill the storage in a month.

What Borg actually does

Borg is a deduplicating, encrypting, compressing backup tool. From the user’s perspective, the workflow is:

Initialize a repository (a directory or remote server location).
Run borg create to back up specific paths into the repository.
Optionally borg prune to delete old archives on a retention schedule.
Run borg extract or borg mount when you need to restore.
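
As a rough sketch, those four steps map onto Borg 1.x commands like this. The repository path and the function names are placeholders of mine; the retention numbers simply mirror the ones used later in this post.

```shell
# Hypothetical repository location; a local path or an ssh:// URL both work.
REPO="${REPO:-/srv/backups/example-repo}"

# 1. One-time setup: create an encrypted repository.
init_repo() {
    borg init --encryption=repokey-blake2 "$REPO"
}

# 2. Nightly: archive the given paths under a host-and-date archive name.
#    {hostname} and {now:...} are placeholders that Borg itself expands.
create_archive() {
    borg create --stats --compression auto,zstd,11 --exclude-caches \
        "$REPO::{hostname}-{now:%Y-%m-%d}" "$@"
}

# 3. Thin out old archives on a retention schedule.
prune_archives() {
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 "$REPO"
}

# 4. Restore: extract one path from a named archive into the current directory.
restore_path() {
    borg extract "$REPO::$1" "$2"
}
```

Typical use would be `init_repo` once, then `create_archive /etc /var/www` followed by `prune_archives` every night. Paths inside an archive are stored without the leading slash, so a restore looks like `restore_path ARCHIVE etc/nginx`. On Borg 1.2 and later, pruning only marks space as free; `borg compact` is what actually reclaims it.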

What makes Borg interesting is what happens underneath. Every file gets split into chunks. Each chunk gets a hash. If a chunk’s hash already exists in the repository, Borg doesn’t store it again. It stores a reference.
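
To make the idea concrete, here is a toy version of chunk-level deduplication using nothing but coreutils. It is a sketch under loud assumptions: it cuts fixed 4 KiB chunks and uses plain SHA-256, whereas Borg itself cuts variable-sized chunks and manages its own chunk IDs. The function name `unique_chunks` is mine.

```shell
# unique_chunks FILE...
# Split every file into fixed 4 KiB chunks, hash each chunk, and print how
# many distinct hashes there are. Fixed-size chunks and plain SHA-256 are
# simplifications for this sketch, not what Borg does internally.
unique_chunks() (
    tmp=$(mktemp -d) || exit 1
    n=0
    for f in "$@"; do
        n=$((n + 1))
        # Prefix chunk files with the input's index so names never collide.
        split -b 4096 "$f" "$tmp/$n."
    done
    # Identical chunks produce identical hashes, so sort -u collapses them.
    for chunk in "$tmp"/*; do
        sha256sum < "$chunk"
    done | sort -u | wc -l | tr -d ' '
    rm -rf "$tmp"
)
```

Feed it two copies of the same file and the count stays the same as for one copy. That is the whole trick: a repeated chunk costs a reference, not more storage.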

For a server where most data doesn’t change night-to-night (database tables, system files, large binary uploads), this means:

The first backup is full-sized.
Every subsequent backup is essentially a delta.
A repository holding 60 nightly backups of a 200GB server might be 250GB total, not 12TB.
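
The arithmetic behind that last line, as a quick sketch. The 850 MB of genuinely new chunks per night is a figure I made up so the numbers land near the example; your own churn rate will differ.

```shell
# naive_total_mb NIGHTS FULL_MB
# Storage if every night stored a complete copy.
naive_total_mb() {
    echo $(( $1 * $2 ))
}

# dedup_total_mb NIGHTS FULL_MB NEW_MB_PER_NIGHT
# One full copy plus only the new chunks for each later night.
# NEW_MB_PER_NIGHT is an assumption you have to measure on your own data.
dedup_total_mb() {
    echo $(( $2 + ($1 - 1) * $3 ))
}
```

With 60 nights of a 200,000 MB server, `naive_total_mb 60 200000` prints 12000000 (12 TB), while `dedup_total_mb 60 200000 850` prints 250150 (about 250 GB).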

Add compression on top (Borg supports zstd, lz4, lzma, and zlib), and the repository sizes drop further.

Why encryption matters here

Borg encrypts every chunk at the client side, with the key never leaving your server. The repository (where the backups live) only ever sees ciphertext. This is the part that makes Borg fundamentally different from rsync or tarball backups: even if your backup destination is compromised, the attacker has random bytes.

Borg makes you pick an encryption mode when you initialize the repository. The usual choice is repokey-blake2: the key is stored in the repository, encrypted with your passphrase. For paranoid setups, keyfile-blake2 keeps the key entirely off the repository server, so the operator of the backup host can’t decrypt your data even with full access to their own infrastructure.
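
In practice the choice is a single flag at init time. The sketch below assumes Borg 1.x; the function names are placeholders of mine, and restricting the accepted modes to these two is a choice made for this example, not a Borg limitation.

```shell
# init_encrypted_repo REPO [MODE]
# MODE defaults to repokey-blake2; keyfile-blake2 keeps the key on the client.
init_encrypted_repo() {
    repo="$1"
    mode="${2:-repokey-blake2}"
    case "$mode" in
        repokey-blake2|keyfile-blake2) ;;
        *) echo "unsupported mode for this sketch: $mode" >&2; return 2 ;;
    esac
    borg init --encryption="$mode" "$repo"
}

# backup_key REPO OUTFILE
# Export a copy of the key so a lost client (keyfile mode) or a damaged
# repository (repokey mode) does not take the key with it.
backup_key() {
    borg key export "$1" "$2"
}
```

With keyfile-blake2 the key lives only on the client, so an exported copy stored somewhere safe matters even more: lose the key file and the passphrase alone will not get your data back.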

Borgmatic, the wrapper that makes Borg manageable

Borg’s command-line interface is precise but verbose. After the third time you write borg create --stats --compression auto,zstd,11 --exclude-caches ..., you’ll want a wrapper.

Borgmatic is exactly that. One YAML file describes everything:

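# Sectioned layout (location / retention / consistency) as used by older
# borgmatic releases; newer releases expect these options at the top level,
# so check the configuration reference for the version you run.
location: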
  source_directories:
    - /etc
    - /var/www
    - /var/lib/mysql-backups
  repositories:
    - ssh://...@borgbase.com/./repo

retention:
  keep_daily: 7
  keep_weekly: 4
  keep_monthly: 6

consistency:
  checks:
    - repository
    - archives

A single borgmatic command runs the backup, prunes old archives per the retention rules, runs integrity checks, and exits. Wire it to a systemd timer or a cron job and the rest is automatic.
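
If you go the cron route, it's worth wrapping the call so a failure is impossible to miss. A minimal sketch, assuming `borgmatic` is on the PATH and finds its config in the default location; `run_nightly` and the message format are mine.

```shell
# run_nightly
# Run the whole borgmatic pipeline and pass its exit status through,
# printing a clear failure line on stderr for cron mail or log scrapers.
run_nightly() {
    if borgmatic --verbosity 1; then
        echo "borgmatic: backup ok"
    else
        rc=$?
        echo "borgmatic: backup FAILED (exit $rc)" >&2
        return "$rc"
    fi
}
```

The point of returning the original exit code is that whatever calls this (cron, a systemd unit, a monitoring hook) sees the failure too, instead of a wrapper that always exits 0.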

Where to store the repository

Three patterns I’ve used:

A second server you control. Cheapest if you already have spare disk. Risk: if both servers live in the same data center, a single site outage takes both down at once.
An S3-compatible bucket via rclone or similar. Decent off-site, cheap at terabyte scale, but the workflow is slightly clunky because Borg expects a filesystem or an SSH host running Borg, not an object store.
A managed Borg host like BorgBase. Built specifically for Borg and Restic repositories, with native protocol support, monitoring, and per-repository key management. This is the path I default to for production hosts.

BorgBase is run by the same team that builds PikaPods, and the privacy ethos shows in both products. They sponsor open-source backup tools (Vorta, the desktop client) directly.

What Borg doesn’t fix

Borg is a backup tool, not a disaster recovery plan:

It doesn’t decide what to back up. You pick paths. If you forget /etc/letsencrypt, your backup won’t have your TLS certs. Audit your borgmatic config every six months.
It doesn’t replace application-level backups. A borg create of /var/lib/mysql while MySQL is running gives you a torn database. Use mysqldump (or pg_dumpall, or whatever your stack uses) to a SQL file first, then back the SQL file up.
It doesn’t manage your encryption passphrases. If you lose the passphrase, the data is gone. Treat the passphrase like the recovery key it is, and store it somewhere you’ll find in a year (a hardware-backed password manager is the right answer).
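
For the database point, the dump step can be as small as the sketch below. It assumes MySQL credentials come from an option file such as `~/.my.cnf`, that the tables are InnoDB (which is what makes `--single-transaction` a consistent snapshot), and it reuses the `/var/lib/mysql-backups` directory from the borgmatic config earlier. The function name and file names are placeholders.

```shell
# Dump directory matching the source_directories entry; override via DUMP_DIR.
DUMP_DIR="${DUMP_DIR:-/var/lib/mysql-backups}"

# dump_databases
# Write a fresh SQL dump next to the old one and only replace the old one
# once the new dump has finished successfully.
dump_databases() {
    out="$DUMP_DIR/all-databases.sql"
    tmp="$out.partial"
    # --single-transaction takes a consistent InnoDB snapshot without
    # locking the tables for the duration of the dump.
    if mysqldump --single-transaction --all-databases > "$tmp"; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        echo "mysqldump failed; keeping previous dump" >&2
        return 1
    fi
}
```

Run it before the backup so the archive always contains a complete SQL file, never a half-written one.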

Closing the loop

Borg is the boring, reliable backup tool that just works. Encrypted client-side, deduplicating to absurd ratios, with a maintainer team that responds to issues fast. The combination of Borg + Borgmatic + BorgBase is the default backup stack for every managed server I run.

If your current backups are “we hope they’re working”, the Cloud Infrastructure Audit & Hardening engagement always includes a backup audit and a tested recovery plan as part of the deliverables. For more on the open-source tooling I run for clients, the open-source solutions category has the rest.