# Rsync and Restic Backup Scripts
> **Thank you for visiting!** If you are viewing this repo on GitHub or GitLab, please note that this is just a mirror. Please visit the [originating repo](https://tacksupport.net/git/capntack/FOSview) for any comments, issues, pull requests, etc. You can sign in with your GitHub or GitLab account via Oauth2.
<br>
> **Disclaimer:** As with anything to do with your data, you should read and understand what this script does before applying it. I am not responsible for any mishaps. This script was written for my own personal use case, and I am sharing it in hopes it will help someone craft their own solution. Even if you only replace the variables and run it as I do, me saying "it works on my machine" should not be sufficient. It wouldn't be for me.
<br>
A script to perform incremental backups using [rsync](https://github.com/WayneD/rsync) and [restic](https://github.com/restic/restic). Why both? Isn't that redundant? Well, I wanted to have some of my backups be more quickly retrievable, like an on-prem "cloud" storage. Rsync fits the bill for that. But I also wanted to back up and compress larger swaths of data for longer term storage. And that's where Restic comes in.
This script assumes you are running Linux, and have at least basic working knowledge of it and bash scripting. It "works on my machine" which is currently running Pop!_OS 22.04 LTS.
<br>
### Installation, Prep, and Configuration
1. Install rsync, restic, and dependencies for the script:
```bash
sudo apt install rsync restic moreutils
```
Moreutils installs the `ts` command, which is used for timestamping the logs.
<br>
2. Ensure restic is up to date, in case the version from your repos is behind:
```bash
restic self-update
```
<br>
3. Initialize the restic "repo" (what restic calls the backup destination):
```bash
restic init --repo /path/to/repo
```
<br>
Create your repo password when prompted. Do not lose this, as you will otherwise be unable to access your backups. I would suggest a password manager. And possibly a physical copy stored in a very safe place.
<br>
4. Verify your repo is at least version 2, in order to support compression:
```bash
restic -r /path/to/repo cat config
```
If it isn't, you may need to revisit step 2 and figure out why your install isn't up to date. Then recreate the repo (you can just delete the faulty repo directory to get rid of it).
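
As a rough sketch of automating that check: `restic cat config` prints a small JSON document, and its `version` field is the repo format version. The helper name `repo_version` and the sample JSON below are illustrative assumptions, not part of this repo's scripts.

```bash
# Hypothetical helper: read the JSON from `restic cat config` on stdin
# and print the number stored in its "version" field.
repo_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Made-up sample of the JSON shape; real output carries your repo's own values.
sample='{
  "version": 2,
  "id": "0123abcd",
  "chunker_polynomial": "0123abcd"
}'

version="$(printf '%s\n' "$sample" | repo_version)"
if [ "$version" -ge 2 ]; then
  echo "repo format v$version: compression supported"
else
  echo "repo format v$version: compression not supported"
fi
```

Against a real repo, the same helper would be fed with `restic -r /path/to/repo cat config | repo_version`.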
<br>
5. Create the directory where rsync will backup to:
```bash
mkdir -p /path/to/dir/to/backup/to
```
<br>
6. cd to the directory where you want to store the script and clone the repo:
```bash
git clone https://tacksupport.net/git/Tack-Support/Rsync-and-Restic-Backup-Scripts.git
```
<br>
7. Run the setup script:
```bash
cd Rsync-and-Restic-Backup-Scripts && sudo chmod +x setup.sh && sudo ./setup.sh
```
> Or, alternatively, read setup.sh and manually perform the steps.
<br>
8. Configure the `rsyncManifest`. The `--include-from` option in the script reads this file so that only what you want is backed up. There are some comments in the manifest, but the gist of it is that the file is read in order. The initial include, `+ */`, includes the `$RSYNC_SOURCE` variable from the script and all directories within it, recursively. The following lines are where you specify the directories and files you explicitly want to back up. The final line, `- *`, excludes everything that wasn't explicitly included prior. This allows you to choose a higher directory, say `$HOME`, but pick and choose what you want within it instead of rsyncing the entire directory. The script also includes the `--prune-empty-dirs` option, which prevents it from syncing all the empty directories along the path to what you actually want at the end of it.
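
To make the ordering concrete, here is a small sketch that writes an example manifest to a scratch directory and, if rsync is installed, previews it with a dry run. The paths in the manifest (`Documents`, `.config/exampleapp/settings.conf`) and the scratch layout are made up for illustration; the manifest shipped with this repo may differ.

```bash
# Scratch locations so the sketch never touches real data.
work="$(mktemp -d)"
manifest="$work/rsyncManifest.example"

# Rules are read top to bottom and the first match wins.
cat > "$manifest" <<'EOF'
+ */
+ /Documents/***
+ /.config/exampleapp/settings.conf
- *
EOF

# A stand-in source tree: two wanted files and one unwanted file.
mkdir -p "$work/src/Documents" "$work/src/.config/exampleapp" \
         "$work/src/Downloads" "$work/dest"
touch "$work/src/Documents/notes.txt" \
      "$work/src/.config/exampleapp/settings.conf" \
      "$work/src/Downloads/big.iso"

# Preview what would be copied without copying anything.
if command -v rsync >/dev/null 2>&1; then
  rsync -a --dry-run --itemize-changes --prune-empty-dirs \
    --include-from="$manifest" "$work/src/" "$work/dest/"
fi
```

In the dry-run listing, `Downloads/` should not appear at all: its only file is excluded by `- *`, and `--prune-empty-dirs` then drops the empty directory.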
<br>
9. Run the first restic backup. This will take a while, depending on how much data you have. 250 GB took me about an hour and a half. Edit, remove, or add to the tags as desired. Tags can be shared between repos in various combinations. They can be used to search for, query, and prune your backups from their various sources. The `--exclude-caches` option will exclude directories containing the `CACHEDIR.TAG` file. That isn't all caches, but it's a happy medium between not excluding any and having to script/search them all out. Pay attention to the lack of trailing slashes.
> Note: if you ever run this command as sudo or root, whether in your terminal or as a cronjob or any other way, you must always run it and other commands against that repo as sudo/root. So make your choice now.
```bash
restic backup --verbose --compression max \
-r /path/to/repo \
--tag $TAG1 --tag $TAG2 \
--exclude-caches \
/path/to/source
```
<br>
10. Verify your backup by first fetching the snapshot ID:
```bash
restic -r /path/to/repo snapshots
```
Then list the files within to verify everything is there:
```bash
restic ls -r /path/to/repo --long $SNAPSHOT_ID
```
Then compare the backup size to the size of the source. This will retrieve the uncompressed size of the snapshot's contents, and it won't perfectly align. But it should give you an idea.
```bash
restic -r /path/to/repo stats $SNAPSHOT_ID
```
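
For the source side of that comparison, one option is to total the file sizes yourself. This is only a sketch: `source_bytes` is a hypothetical helper, it relies on GNU `find`, and it counts regular files only, so expect small differences from what restic reports.

```bash
# Hypothetical helper: sum the size in bytes of every regular file under
# the directory given as $1 (requires GNU find for -printf).
source_bytes() {
  find "$1" -type f -printf '%s\n' |
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Demo on a scratch directory holding two files of known size.
demo="$(mktemp -d)"
head -c 1000 /dev/zero > "$demo/a.bin"
head -c 24 /dev/zero > "$demo/b.bin"
source_bytes "$demo"   # prints 1024
```

On a real source you would run `source_bytes /path/to/source` and compare the result with the total size shown by the stats command.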
And finally, check the integrity of the repo:
```bash
restic -r /path/to/repo check
```
<br>
11. Replace all text in `resticPassword` with just the password.
<br>
12. Replace the `/path/to/restic/password/.file` line in `resticExcludes` with the path to your restic password file. You can also add any other excludes you would like.
<br>
13. Configure `backups.sh` by filling out the variables the comments call out. Pay attention to where leading/trailing slashes are omitted. That is on purpose. I find it's best to use absolute paths, that way if you ever move the script to a different directory, it won't break. A few notes and definitions above and beyond the comments in the script:
a. The script dumps a log of its output into a `backupLogs` directory.
b. If you want to run multiple rsync or restic sources/destinations on the same host, copy the relevant section and increment the variables (e.g. "01" to "02"). Just note that each rsync job really should have a separate source, destination, and manifest, while restic can have multiple sources backing up to the same repo, which also increases the benefit from its deduplication. You can also mix and match tags (though I would advise against using the exact same set of tags on two different sources). And while you can use the same password for each source, maybe don't?
c. By default, rsync will back up incrementally, but not track version history. This script gets around this by putting each new backup into its own dated directory, hardlinking to the inodes of already backed up files, and only copying new files. In this setup, the `--delete` option simply leaves a removed file out of the new backup instead of deleting it at the destination. A `latest` folder is also created, both for the script to check against and for ease of finding the latest backup. This leads us to...
d. The rsync script also allows for days of retention, after which older backup directories are deleted. And, thanks to hardlinking, files that were initially backed up in a deleted directory are not lost if they are hardlinked in any subsequent backup. `$RSYNC_RETENTION_DAYS` variables are calculated thusly: # of days wanted (e.g. 7) + the latest directory (1) + 1. So in this case, to keep 7 days' worth of versioning, you would use a 9 for this variable.
e. The rsync script includes a hacky fix for an issue I ran into rsyncing to an NFS destination. After backing up to the new directory as desired and updating the `latest` hardlink, the timestamps of both would change to the most recent date for the timestamp of 21:20. I have no idea why. And that would mess with the retention if I ran the backup multiple times in a day. As they would all have the same timestamp. So in between updating the `latest` hardlink and running the retention policy, the script runs a `touch` on a `timestamp.fix` file within the `$RSYNC_DEST_PATH`, which fixes the timestamps. If you aren't backing up to an NFS destination, you likely don't need this. And if you know why this is happening, please let me know. Or clone the repo, patch it, and do a pull request so that your fix can be tested and included.
f. Pay attention to the restic tags in the script. When the script runs the forget and prune commands, it will run them against the entire repo. So you want to ensure the tags in that command match the backups you want it to actually affect. I would suggest that once you have run the initial backup in step 9 and have the script ready, you run the script and then repeat the verification steps from step 10. Just to be sure you have it right. And if you have multiple sources going to the same repo, do the same. You can also perform [dry runs](https://restic.readthedocs.io/en/latest/060_forget.html#removing-snapshots-according-to-a-policy) on removal policies (and [on backups](https://restic.readthedocs.io/en/latest/040_backup.html#dry-runs) too, btw) to sanity check yourself before accidentally nuking your repo. See the disclaimer at the start of this README.
g. Regarding the [compression level](https://restic.readthedocs.io/en/latest/047_tuning_backup_parameters.html?highlight=compress#compression) of the restic backup, you can choose `off`, `auto`, or `max`. I ran a super scientific one run each on my backup source and got the following results:
Raw Data: 255 GB to Backup <br>
All levels of compression also deduplicate files <br>
Compression Off: 249.5 GB = 97.8% of the raw size <br>
Compression Auto: 208.4 GB = 81.7% of the raw size <br>
Compression Max: 206.1 GB = 80.8% of the raw size
I did not note the time it took, but I want to say it was about an hour when set to `off`, and about an hour and a half for both `auto` and `max`. I personally leave it at the default of `auto`, as `max` didn't make much of a difference. But you can decide as you like for your own backups.
h. At the end of the script, just prior to the completion of the log file, there is a line that will delete logs older than (by default) 14 days. Feel free to remove this or to edit the retention variable at the top of the script to your liking.
i. If needed, you can debug the script by uncommenting the 21st line in `backups.sh` to print each command to the log as it runs, so you can see which command ran last. Be sure to comment it back out afterwards so your logs aren't bloated.
j. I self-host an [ntfy](https://ntfy.sh) server to receive notifications on my homelab. (Boilerplate can be found [here](https://tacksupport.net/git/Tack-Support/Boilerplates/src/branch/main/docker-compose/ntfy/docker-compose.yml).) The commented out sections from lines 23 to 33 notify me in case the script fails and lines 125 to 131 notify me if it succeeds. Both also attach the log file. Delete, use, or modify to your own use case.
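
The dated-directory approach from item c is commonly built on rsync's `--link-dest` option. The following is a minimal sketch under that assumption, not a copy of `backups.sh`: the `snapshot` function, its three arguments, and the use of a symlink for `latest` are all illustrative choices.

```bash
# Hypothetical helper: back up directory $1 into $2/$3, where $3 is a
# date stamp such as 2024-01-02. $2 should be an absolute path.
snapshot() {
  snap_src="$1"; snap_dest="$2"; snap_stamp="$3"
  mkdir -p "$snap_dest"
  if [ -e "$snap_dest/latest" ]; then
    # Files unchanged since the previous run become hardlinks to the
    # copies under latest/; only new or changed files are transferred.
    rsync -a --delete --link-dest="$snap_dest/latest" \
      "$snap_src/" "$snap_dest/$snap_stamp/"
  else
    # First run: nothing to link against yet, so copy everything.
    rsync -a --delete "$snap_src/" "$snap_dest/$snap_stamp/"
  fi
  # Repoint latest at the snapshot just taken (a symlink in this sketch).
  ln -sfn "$snap_stamp" "$snap_dest/latest"
}

# Example call (placeholder paths):
# snapshot /path/to/source /path/to/dir/to/backup/to "$(date +%Y-%m-%d)"
```

Because unchanged files share an inode across snapshots, deleting an old snapshot directory only frees the data that no newer snapshot still links to.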
<br>
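
To make the retention arithmetic from item d concrete, here is a sketch that assumes pruning works by listing entries newest-first and deleting everything from position `$RSYNC_RETENTION_DAYS` onward (a `tail -n +N` idiom). That mechanism is an assumption for illustration, so check `backups.sh` for what it really does. The directories are throwaway stand-ins created under `mktemp`, and GNU `touch` and `xargs` are assumed.

```bash
days_wanted=7
# days wanted + the latest directory (1) + 1
RSYNC_RETENTION_DAYS=$(( days_wanted + 1 + 1 ))   # 9

dest="$(mktemp -d)"
# Twelve stand-in snapshot directories, aged 12 days down to 1 day.
for age in 12 11 10 9 8 7 6 5 4 3 2 1; do
  mkdir "$dest/snapshot-${age}-days-old"
  touch -d "$age days ago" "$dest/snapshot-${age}-days-old"
done
mkdir "$dest/latest"   # created last, so it sorts as the newest entry

# Newest first; tail -n +9 starts its output at the 9th entry, so entries
# 1 to 8 (latest plus 7 snapshots) survive and the rest are removed.
# Parsing ls is tolerable here only because the names are controlled.
( cd "$dest" && ls -1t | tail -n +"$RSYNC_RETENTION_DAYS" | xargs -r rm -rf -- )

ls -1 "$dest" | wc -l   # 8 entries remain: latest plus 7 snapshots
```

Under this assumption the trailing `+ 1` exists because `tail -n +N` begins at line N, so N - 1 entries are kept.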
14. At this point, you can decide how you will run this script going forward. Whether just as you remember it (not very reliable), or set reminders for it (more reliable), or automate it in some way (best) like a crontab:
```bash
crontab -e
```
> Run as sudo if your restic repo requires it.
Paste something like the following to the end of your crontab:
```bash
PATH=/absolute/path/to/script/dir
0 0 * * * /absolute/path/to/script/dir/backups.sh
```
<br>
### Sources, Inspiration, and Further Reading
- [Rsync's Documentation](https://rsync.samba.org/documentation.html)
- [Restic's Documentation](https://restic.readthedocs.io/en/latest/010_introduction.html)
- [Inspiration for the base rsync script](https://linuxconfig.org/how-to-create-incremental-backups-using-rsync-on-linux)
- [Inspiration for how to format the include-from-file](https://stackoverflow.com/a/32527277)
- [Inspiration for command in the rsync script to delete all but the most recent directories](https://stackoverflow.com/a/4127056)
- [Inspiration for the base restic script](https://codeberg.org/Taffer/restic-scripts)
- [More Inspiration for the base restic script](https://forum.yunohost.org/t/daily-automated-backups-using-restic/16812)
- [Inspiration for code to log to the console and log file](https://unix.stackexchange.com/a/574542)
- [Inspiration for adding timestamps to the logfile](https://stackoverflow.com/a/39239416)
- [Inspiration for adding a timestamp to end of script command](https://www.baeldung.com/linux/prepend-timestamp-command-output)