HACKER Q&A
📣 Xen9

Theory of Backups


Most discussions of backups focus on consumer-level concerns.

Buried underneath the superficial and the common-sense advice, there seems to exist a theory of how to do backups well.

I've found two elements upon which a better theory concerning rotations & other details (e.g. hash-verification scheduling, the number of distinct devices) can be built.

The first is the Tower of Hanoi scheduling scheme, which we will abbreviate TOH.

The second is the Incremental-Differential-Full backups concept, which we will abbreviate IDF.

The best available resource seems to be the Acronis website's illustrated docs: http://acronis-backup-recovery.helpmax.net/en/understanding-acronis-backup-recovery-10/tower-of-hanoi-backup-scheme/. Please ignore, in good faith, that Acronis is a company selling commercial Windows software; you are free to post better links in the comments if you find better information elsewhere.

We end up with a scheme we can call IDF-TOH. In it we have three types of backups:

- Incremental, at L_0 ("Level A" in the linked resource), the most frequent level, capturing only changes made since the last backup.

- Differential, at each intermediate level strictly between L_0 and L_n, capturing the changes made since the last full backup.

- Full, at L_n, the least frequent level, capturing the whole system to be backed up.

So now, what can we do? At least the following directions could be taken in further developing a Theory of Backups:

0 (backup scheduling): The frequencies can be chosen in many ways, and I am not sure which one is optimal. Tower of Hanoi assigns each level L_a, where a belongs to the closed interval 0 ... n, a recurrence period on the order of 2^a sessions. Frame-Stewart may or may not be of any use in this.
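
As a rough illustration (not the exact Acronis scheme; the variable names are hypothetical), one common way to realise that recurrence is the ruler sequence: the level used at session k is the number of trailing zero bits of k, capped at n, so L_0 is hit every other session and higher levels come up exponentially less often. A minimal POSIX-sh sketch:

  #!/bin/sh
  # Sketch: pick the IDF-TOH level for backup session number $1 (1, 2, 3, ...).
  # The level is the number of trailing zero bits of the session counter
  # ("ruler sequence"), capped at MAX_LEVEL, so level a recurs roughly every
  # 2^(a+1) sessions.
  MAX_LEVEL=4        # hypothetical n: L_4 is the Full level here

  session=$1
  level=0
  while [ $((session % 2)) -eq 0 ] && [ "$level" -lt "$MAX_LEVEL" ]; do
    session=$((session / 2))
    level=$((level + 1))
  done

  case $level in
    0)            echo "session $1 -> L_0: incremental" ;;
    "$MAX_LEVEL") echo "session $1 -> L_$level: full" ;;
    *)            echo "session $1 -> L_$level: differential" ;;
  esac

Run over sessions 1..16 this yields the A-B-A-C-A-B-A-D-... pattern shown in the linked resource.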

1 (rotations): IDF-TOH does not address the problem of rotations, i.e. if you make a backup that corrupts your previous data and then repeat the mistake, you get into trouble quickly. It is also noteworthy that certain media may fit certain layers of IDF-TOH & future schemes better than others. For example, adrian_b wrote three days before this post:

"... Of all the optical discs that have been available commercially, those with the longest archival time were the pressed CD-ROM with gold mirrors, where the only degradation mechanism is the depolymerization of the polycarbonate, which could make them fragile, but when kept at reasonable temperatures and humidities that should require many centuries..."

Consequently these would be best suited for the Full backups, while solid-state drives may work for the Incremental ones.

2 (perfecting IDF): The IDF scheme may not be perfect either & can probably be refined further.

3 (hashing): Verifying the backups matters & should be part of a complete scheme.
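
For instance, a minimal verification step, assuming the backups sit as image files under a hypothetical /mnt/backup mount, could be a checksum manifest written at backup time and re-checked on whatever schedule the theory prescribes:

  # write a checksum manifest right after each backup run
  sha256sum /mnt/backup/*.img > /mnt/backup/SHA256SUMS

  # later, verify the media against the manifest
  sha256sum -c /mnt/backup/SHA256SUMS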

---

This may not be valuable for all businesses, but most individuals already using rsync or borg would probably prefer the best available scheme if it reduces the probability of incidental data loss at minimal effort. Translating the best possible scheme into a configuration tool with a humane interface is an undertaking of its own.


  👤 mikewarot Accepted Answer ✓
My practically minded friend has an interesting scheme for backups. On a periodic basis he uses Clonezilla or a similar product to clone a machine's existing drives onto new replacements, then puts the old ones in a safe place.
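
For a rough idea of what such a clone amounts to at the block level (Clonezilla itself is normally driven through its interactive menus), a plain copy onto the replacement drive might look like the following, where /dev/sdX and /dev/sdY stand in for the old and new drives:

  # clone the old drive onto the new replacement, block for block
  # (double-check the device names first: this overwrites /dev/sdY)
  dd if=/dev/sdX of=/dev/sdY bs=4M status=progress conv=fsync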

The other normal backups are usually managed by someone else, he just does the hardware, most of the time.

His backups are tested by experience.


👤 eschneider
Rule 0 for backups: Whatever scheme you use, periodically verify your backups by restoring some files off them. Before you have a problem. You'll thank yourself eventually.
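
With borg, for instance (the repository path, archive name, and file path below are made up), such a spot check can be a one-file restore into a scratch directory followed by a diff against the live copy:

  # extract one file from a chosen archive into a scratch directory
  mkdir -p /tmp/restore-test && cd /tmp/restore-test
  borg extract /path/to/repo::my-archive home/user/Documents/important.odt

  # compare the restored file against the live copy
  diff home/user/Documents/important.odt /home/user/Documents/important.odt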

👤 sandreas
I'm missing the threat of ransomware in your write-up. The best concept does not help if everything is always online and ransomware is able to encrypt your backups.

I personally use the following backup strategy:

- Set up encrypted ZFS storage on the network (e.g. TrueNAS; in my case it is Proxmox)

- Enable zfs-auto-snapshot for 15-minute snapshots with automatic rotation (keep 24 of them, dailies, etc.)

- NEVER (!) type the passwords of users permitted on the ZFS storage on any client that could be affected by ransomware

- Provide a user-authenticated Samba share to store all important data; try to avoid storing data locally

- Sync the ZFS snapshots to an external USB drive every night (I use a Tasmota-flashed Shelly plug and an external USB case to power the devices off when they are not needed)

  # create a current recursive snapshot of the source pool
  # (assuming NEW_POOL_SNAP="$SRC_POOL@$NEW_SNAP_NAME")
  zfs snapshot -r "$NEW_POOL_SNAP"

  # first (full) backup: send the raw, still-encrypted stream recursively
  # and receive it unmounted on the destination pool
  zfs send --raw -R "$SRC_POOL@$NEW_SNAP_NAME" | pv | zfs recv -Fdu "$DST_POOL"

  # incremental backup: send only the snapshots between the last
  # transferred one and the newest one
  zfs send --raw -RI "$BACKUP_FROM_SNAPSHOT" "$BACKUP_UNTIL_SNAPSHOT" | pv | zfs recv -Fdu "$DST_POOL"

- On Windows and macOS, back up the OS to an external drive

- Use restic to keep an additional copy of the local files and folders somewhere else (a minimal restic sketch follows after this list)

- Use a Blu-ray burner to back up the most important stuff as a restic repository or an encrypted archive (very important documents, the best photo collections of your family, the KeePass database, etc.) and store it at another location

- If cloud storage is affordable for the amount of data you have, consider using restic to store your stuff in the cloud

- From time to time, try to restore a specific file from the backup and check that it worked, and occasionally try a full system restore (onto a spare hard disk).
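
A sketch of the restic steps above (the repository path is a placeholder, and password handling is left out):

  # one-time: initialise an encrypted repository on the external or cloud target
  restic -r /mnt/offsite/restic-repo init

  # regular run: back up the important directories, then verify repository integrity
  restic -r /mnt/offsite/restic-repo backup /srv/share/documents /srv/share/photos
  restic -r /mnt/offsite/restic-repo check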

This may sound like overkill, but ransomware is a pretty bad thing these days, even if you think you are not one of its targets.


👤 t_believ-er873
Everything depends on your security and compliance needs.

Regarding backup scheduling: companies sometimes need frequent backups because of their RPOs and RTOs, for example if they operate in a highly regulated industry. If you can tolerate the loss of two hours of data, you need a backup every 2 hours; if you can tolerate losing 8 hours (a working day), why not back up on a daily basis?

Regarding rotations: a lot depends on the backup solution. If it provides immutable backups, the data as a whole won't be corrupted, so the faster someone notices the mistake, the faster they can restore from a copy. IDF helps more with the storage question, i.e. not overloading it (deduplication and compression are also worth mentioning here).


👤 scrapheap
A few things that I find don't get considered when contemplating backup procedures are:

1. How long should you keep backups for? Is the content of your backup covered by privacy laws that require you not to keep copies after a certain period of time? Is there a point where the content of your backup is so old that it's the logical equivalent of not having made a backup in the first place?

2. How much does your backup process cost? If it costs more to back up a system than losing it would cost you, then you've got the backup process wrong (interestingly, this can be affected by economies of scale).

3. What do you need to restore a backup? Does your system require bespoke hardware that might have been lost in whatever disaster you're trying to recover from?


👤 dakiol
If I want to learn about operating systems, I know which books to read. What's the equivalent for backups? It seems I merely rely on blog posts, and that's something I'm not comfortable with. There are some books out there that dedicate perhaps a chapter to backups, but those are usually outdated and do not contain much practical information.

👤 brudgers
YAGNI is among my theories for personal backups. Data is not precious. It is a burden. When something matters, I print it. Or put it on Facebook or Youtube.

…but I never delete, because the more copies of the same thing there are, the more likely it is to survive. If I do in fact need something, the time spent searching is far shorter than a tedious backup procedure.

In addition, if I have to recreate something, version 2 will be better, because I keep getting better at the things I do.

But that is me not you. Good luck.


👤 illuminant
Backups are like prayers, only if you make backups you won't need prayers.