# Rsync and Restic Backup Scripts
> **Thank you for visiting!** If you are viewing this repo on GitHub or GitLab, please note that this is just a mirror. Please visit the [originating repo](https://tacksupport.net/git/capntack/FOSview) for any comments, issues, pull requests, etc. You can sign in with your GitHub or GitLab account via OAuth2.
<br>
> **Disclaimer:** As with anything to do with your data, you should read and understand what this script does before applying it. I am not responsible for any mishaps. Even if you only replace the variables and run it as I do, me saying "it works on my machine" should not be sufficient. It wouldn't be for me.
<br>
A script to perform incremental backups using [rsync](https://github.com/WayneD/rsync) and [restic](https://github.com/restic/restic). Why both? Isn't that redundant? Well, I wanted some of my backups to be more quickly retrievable, like an on-prem "cloud" storage. Rsync fits the bill for that. But I also wanted to back up and compress larger swaths of data for longer-term storage, and that's where restic comes in.
This script assumes you are running Linux and have at least a basic working knowledge of it and of bash scripting. It "works on my machine", which is currently running Pop!_OS 22.04 LTS.
<br>
### Installation, Prep, and Configuration
1. Install rsync, restic, and dependencies for the script:
```bash
apt install rsync restic moreutils # moreutils installs the `ts` command for timestamping the logs
```
<br>
2. Create the directory where rsync will back up to:
```bash
mkdir /path/to/dir/to/backup/to
```
<br>
3. Copy the rsync manifest template to the script's root directory, rename it as you like, and then fill it out. This lets the `--include-from` option back up only what you want. There are some comments in the template, but the gist of it is that the file is read in order. The initial include, `+ */`, includes the `$RSYNC_SOURCE` directory from the script and all directories within it, recursively. The following lines are where you specify the directories and files you explicitly want to back up. The final line, `- *`, excludes everything that wasn't explicitly included before it. This allows you to choose a higher directory, say `$HOME`, but pick and choose what you want within it instead of rsyncing the whole thing. The script also includes the `--prune-empty-dirs` option, which prevents it from syncing all the empty directories along the path to what you actually want at the end of it.
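As a rough illustration of the manifest layout described in step 3, the sketch below writes a small example manifest and prints it. The `write_manifest` helper and the `Documents` and `Pictures` entries are hypothetical placeholders, not part of the template. Only the opening `+ */` and closing `- *` lines come from the description above.

```bash
#!/usr/bin/env bash

# Hypothetical helper: write an example include-from manifest to the path in $1.
write_manifest() {
    cat > "$1" <<'EOF'
+ */
+ Documents/**
+ Pictures/**
- *
EOF
}

# Write the example to a temporary file and show it.
manifest="$(mktemp)"
write_manifest "$manifest"
cat "$manifest"

# A dry run (-n) lists what would be copied without writing anything, e.g.:
#   rsync -avn --prune-empty-dirs --include-from="$manifest" "$HOME/" /path/to/dir/to/backup/to/
```

Note that the two middle patterns have no leading slash, so they match a `Documents` or `Pictures` directory at any depth under the source. Running the commented dry run first is a cheap way to check that the manifest selects what you expect.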
<br>
4. Ensure restic is up to date, in case the version from your repos is behind:
```bash
restic self-update
```
<br>
5. Initialize the restic "repo" (what restic calls the backup destination):
```bash
restic init --repo /path/to/repo
```
<br>
Create your repo password when prompted. Do not lose this, as you will otherwise be unable to access your backups. I would suggest a password manager. And possibly a physical copy stored in a very safe place.
<br>
6. Verify your repo is at least version 2, in order to support compression:
```bash
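# -p is short for --password-file, so $REPO_PASSWORD should hold the path to a file containing the password (omit -p to be prompted instead)
restic -p $REPO_PASSWORD -r /path/to/repo cat config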
```
If it isn't, you may need to revisit step 4 and figure out why your install isn't up to date. Then recreate the repo (you can just delete the faulty repo directory to get rid of it).
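If you would rather not eyeball the output, a small filter can pull out the version number. This is a minimal sketch: `repo_version` is a hypothetical helper, and it assumes the config is printed as JSON with a numeric `version` field. The sample input below is made up for demonstration.

```bash
#!/usr/bin/env bash

# Hypothetical helper: read repo config JSON on stdin and print the first "version" value.
repo_version() {
    sed -n 's/.*"version":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage against a real repo (not run here):
#   restic -p $REPO_PASSWORD -r /path/to/repo cat config | repo_version

# Demonstration with made-up sample input:
printf '{\n  "version": 2,\n  "id": "example"\n}\n' | repo_version   # prints 2
```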
<br>
7. Run the first restic backup. This will take a while, depending on how much data you have. 250 GB took me about an hour and a half. Edit, remove, or add to the tags as desired. Tags can be shared between repos in various combinations. They can be used to search for, query, and prune your backups from their various sources. The `--exclude-caches` option will exclude directories containing the `CACHEDIR.TAG` file. That isn't all caches, but it's a happy medium between not excluding any and having to script/search them all out. Pay attention to the lack of trailing slashes.
> Note: if you ever run this command with sudo, whether in your terminal or as a cronjob or any other way, you must always run it and other commands against that repo with sudo. So make your choice now.
```bash
restic backup --verbose --compression max \
-p $REPO_PASSWORD \
-r /path/to/repo \
--tag $TAG1 --tag $TAG2 \
--exclude-caches \
/path/to/source
```
<br>
8. Verify your backup by first fetching the snapshot ID:
```bash
restic -p $REPO_PASSWORD -r /path/to/repo snapshots
```
Then list the files within to verify everything is there:
```bash
restic ls -p $REPO_PASSWORD -r /path/to/repo --long $SNAPSHOT_ID
```
Then compare the backup size to the size of the source. This reports the uncompressed size of the snapshot, so it won't perfectly align, but it should give you an idea.
```bash
restic -p $REPO_PASSWORD -r /path/to/repo stats $SNAPSHOT_ID
```
And finally, check the integrity of the repo:
```bash
restic -p $REPO_PASSWORD -r /path/to/repo check
```
<br>
9. Copy the restic password template to the script's root directory, rename it as you like, and replace all text within it with just the password. Then secure the file:
```bash
sudo chmod 600 /path/to/restic/password/.file
```
<br>
10. Copy the restic excludes template to the script's root directory, rename it as you like, and replace the `/path/to/restic/password/.file` line with the path to your restic password file. You can also add any other excludes you would like.
<br>
11. Copy the script template to the script's root directory, rename as you like, and then fill out the variables the comments call out. Pay attention to where leading/trailing slashes are omitted. That is on purpose. I find it's best to use absolute paths, so that if you ever move the script to a different directory, it won't break. A few notes and definitions above and beyond the comments in the script:
a. The script dumps a log of its output into a directory of your choosing (the first variable in the script). There's a directory in the script's root directory for that, but feel free to put the logs wherever you like.
b. The script includes variables and scripts for both a second rsync and a second restic source/destination. You can add more or remove them as you like. Just note that each rsync really should have a separate source, destination, and manifest, while restic can have multiple sources backing up to the same repo, which also increases the benefit from its deduplication. You can also mix and match tags (though I would advise against using the exact same set of tags on two different sources). And while you can use the same password for each source, maybe don't?
c. By default, rsync will back up incrementally, but not track version history. The script gets around this by putting each new backup into its own dated directory, hardlinking to the inodes of already backed up files, and only copying new or changed files. Because each run writes to a fresh directory, the `--delete` option in this case simply means a file removed from the source is left out of the new backup, rather than being deleted at the destination. A "latest" folder is also created, both for the script to check against and for ease of finding the latest backup. This leads us to...
d. The rsync script also allows for days of retention, after which older backup directories are deleted. Thanks to hardlinking, files that were initially backed up in a deleted directory are not lost if they are hardlinked in any subsequent backup. `$RSYNC_RETENTION_DAYS` variables are calculated as follows: number of days wanted (e.g. 7) + the `latest` directory (1) + 1. So in this case, to keep 7 days' worth of versioning, you would use a 9 for this variable.
e. The rsync script includes a hacky fix for an issue I ran into when rsyncing to an NFS destination. After backing up to the new directory as desired and updating the `latest` hardlink, the timestamps of both would change to the most recent date with a time of 21:20. I have no idea why. That would mess with the retention if I ran the backup multiple times in a day, as they would all have the same timestamp. So in between updating the `latest` hardlink and running the retention policy, the script runs a `touch` on a `timestamp.fix` file within the `$RSYNC_DEST_PATH`, which fixes the timestamps. If you aren't backing up to an NFS destination, you likely don't need this. And if you know why this is happening, please let me know, or clone the repo, patch it, and open a pull request so that your fix can be tested and included.
f. Pay attention to the restic tags in the script. When the script runs the forget and prune commands, it runs them against the entire repo, so you want to ensure the tags in that command match the backups you actually want it to affect. I would suggest that, once you have run the initial backup in step 7 and have the script ready, you run the script and then repeat the verification steps from step 8, just to be sure you have it right. If you have multiple sources going to the same repo, do the same for each. You can also perform [dry runs](https://restic.readthedocs.io/en/latest/060_forget.html#removing-snapshots-according-to-a-policy) on removal policies (and [on backups](https://restic.readthedocs.io/en/latest/040_backup.html#dry-runs) too, btw) to sanity check yourself before accidentally nuking your repo. See the disclaimer at the start of this README.
g. Regarding the [compression level](https://restic.readthedocs.io/en/latest/047_tuning_backup_parameters.html?highlight=compress#compression) of the restic backup, you can choose `off`, `auto`, or `max`. I ran a super scientific single run of each on my backup source and got the following results:
Raw data to back up: 255 GB <br>
All levels of compression also deduplicate files <br>
Compression `off`: 249.5 GB = 97.8% of the raw size <br>
Compression `auto`: 208.4 GB = 81.7% of the raw size <br>
Compression `max`: 206.1 GB = 80.8% of the raw size
I did not note the time it took, but I want to say it was about an hour when set to `off`, and about an hour and a half for both `auto` and `max`. I personally leave it at the default of `auto`, as `max` didn't make much of a difference. But you can decide as you like for your own backups.
h. At the end of the script, just prior to the completion of the log file, there is a line that will delete logs older than (by default) 14 days. Feel free to remove this or to edit the retention variable at the top of the script to your liking.
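
To make items c and d more concrete, here is a minimal sketch of the dated-directory approach and the retention arithmetic. It is not the script itself: `incremental_backup` and `retention_value` are hypothetical names, the timestamp format is arbitrary, and `latest` is assumed here to be a symlink to the newest dated directory that `--link-dest` reads from. Absolute paths are assumed throughout, since a relative `--link-dest` is resolved against the destination directory.

```bash
#!/usr/bin/env bash

# Hypothetical helper: value for $RSYNC_RETENTION_DAYS given the days of history wanted.
# Days wanted + 1 for the `latest` directory + 1, as described in item d.
retention_value() {
    echo $(( $1 + 1 + 1 ))
}

# Hypothetical helper: back up $1 into a new dated directory under $2,
# hardlinking unchanged files against the previous run via --link-dest.
# $3 is the include-from manifest. All three should be absolute paths.
incremental_backup() {
    local src="$1" dest_root="$2" manifest="$3"
    local new latest
    new="$dest_root/$(date +%Y-%m-%d_%H%M%S)"
    latest="$dest_root/latest"

    mkdir -p "$new"
    rsync -a --delete --prune-empty-dirs \
        --include-from="$manifest" \
        --link-dest="$latest" \
        "$src/" "$new/" || return 1

    # Repoint `latest` at the run that just finished (assumed to be a symlink).
    ln -sfn "$new" "$latest"
}

retention_value 7   # prints 9
```

On the very first run there is no `latest` yet, so rsync has nothing to link against and copies everything. Later runs only store files that changed.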
<br>
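Following on from item f, a dry run is a cheap way to confirm that a forget policy only touches the snapshots you expect. The sketch below is an example under stated assumptions, not the script's own command: `forget_preview` is a hypothetical wrapper and `--keep-daily 7` is a placeholder policy. It joins the tags with commas because restic's snapshot filter treats `--tag a,b` as "has both tags", while repeating `--tag` matches snapshots with either one.

```bash
#!/usr/bin/env bash

# Hypothetical wrapper: show what a tag-scoped forget policy would remove.
# Usage: forget_preview /path/to/repo /path/to/password/file TAG [TAG...]
forget_preview() {
    local repo="$1" password_file="$2"
    shift 2
    local IFS=,
    local tags="$*"    # join the remaining arguments as "tag1,tag2"

    # --dry-run only reports; nothing is removed from the repo.
    restic -r "$repo" -p "$password_file" forget \
        --tag "$tags" \
        --keep-daily 7 \
        --dry-run
}
```

Once the reported list matches what you expect, the same tag filter and policy can be carried over to the real command in the script.
<br>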
12. Make the script executable:
```bash
sudo chmod u+x /path/to/script.sh
```
<br>
13. At this point, you can decide how you will run this script going forward: whenever you remember to (not very reliable), with reminders set for it (more reliable), or automated in some way (best), such as with a crontab entry:
```bash
crontab -e
```
Paste something like the following to the end of your crontab:
```bash
0 0 * * * cd /path/to/script/dir/ && ./script.sh
```
You can avoid having crontab `cd` into the script's directory if you place the script somewhere in your `PATH`. If you do, I would suggest copying the script you just edited to that `PATH` directory. That way you can fiddle with and test the original without messing with your production script, then replace the prod script once you have any tweaks figured out.
<br>
### Sources, Inspiration, and Further Reading
- [Rsync's Documentation](https://rsync.samba.org/documentation.html)
- [Restic's Documentation](https://restic.readthedocs.io/en/latest/010_introduction.html)
- [Inspiration for the base rsync script](https://linuxconfig.org/how-to-create-incremental-backups-using-rsync-on-linux)
- [Inspiration for how to format the include-from-file](https://stackoverflow.com/a/32527277)
- [Inspiration for command in the rsync script to delete all but the most recent directories](https://stackoverflow.com/a/4127056)
- [Inspiration for the base restic script](https://codeberg.org/Taffer/restic-scripts)
- [More Inspiration for the base restic script](https://forum.yunohost.org/t/daily-automated-backups-using-restic/16812)
- [Inspiration for code to log to the console and log file](https://unix.stackexchange.com/a/574542)
- [Inspiration for adding timestamps to the logfile](https://stackoverflow.com/a/39239416)
- [Inspiration for adding a timestamp to end of script command](https://www.baeldung.com/linux/prepend-timestamp-command-output)