Welcome to snapintime’s documentation!

snapintime is meant to manage the creation and culling of btrfs snapshots, and to send them to a remote location.

At this point in time it creates snapshots, culls local snapshots (starting when they are three days old; see the specification below), and can btrfs send/receive to a remote btrfs subvolume.

Usage

Grab config.json from the GitHub repo (https://github.com/djotaku/Snap-in-Time), edit it, and place it in $HOME/.config/snapintime/ (or /root/.config/snapintime/ if you are going to run it as root).

Creating Local Snapshots

If running from a git clone:

pip install -r requirements.txt
cd snapintime
python create_local_snapshots.py

If running from PyPI, run: python -m snapintime.create_local_snapshots

If you want to run it from cron in a virtual environment, you can adapt the following shell script to your situation:

#!/bin/bash
cd "/home/ermesa/Programming Projects/python/cronpip"
source ./bin/activate
python -m snapintime.create_local_snapshots

Make it executable and have cron run that script as often as you like.

For a more involved script, useful for logging, see Putting it All Together.

Backing Up to Remote Location

This code assumes that you have set up SSH keys so you can SSH to the remote machine without entering a password. It is recommended to run the remote backup code BEFORE the culling code, to increase the chances that the last snapshot on the remote system is still on the local system. (This minimizes the amount of data that has to be transferred to the remote system.)
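
Under the hood, an incremental transfer amounts to piping btrfs send into btrfs receive over SSH. Here is a minimal sketch of that idea in Python; the function name, paths, and address are hypothetical, not snapintime’s actual code:

import subprocess

def incremental_send(parent: str, child: str, remote: str, remote_dir: str):
    # Send only the delta between parent and child; receive it on the remote host.
    command = (f"btrfs send -p {parent} {child} | "
               f"ssh {remote} btrfs receive {remote_dir}")
    return subprocess.run(command, shell=True, capture_output=True, text=True)

# Hypothetical call:
# incremental_send("/home/.snapshot/2021-01-01-1800",
#                  "/home/.snapshot/2021-01-02-1800",
#                  "user@server", "/media/backups")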

pip install -r requirements.txt
cd snapintime
python remote_backup.py

If running from PyPI, run: python -m snapintime.remote_backup

Culling Local Snapshots

The culling follows this specification (a sketch of the three-day rule appears after the list):

  • Three days ago: Leave at most 4 snapshots behind - the ones closest to 0000, 0600, 1200, and 1800. (implemented)

  • Seven days ago: Leave at most 1 snapshot behind - the last one that day. In a perfect situation, it would be the one taken at 1800. (implemented)

  • 90 days ago: Go from that date up another 90 days and leave at most 1 snapshot per week. (implemented)

  • 365 days ago: Go from that date up another 365 days and leave at most 1 snapshot per quarter. (implemented)

(Not going to care about leap years; it will eventually fix itself if this is run regularly.)
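
Here is a minimal sketch of the three-day rule, assuming snapshot names end in HHMM; the helper is illustrative, not snapintime’s actual implementation:

def closest_four(snapshots: list) -> list:
    # Keep the snapshot closest to each of 0000, 0600, 1200 and 1800.
    def minutes(name: str) -> int:
        return int(name[-4:-2]) * 60 + int(name[-2:])
    keep = {min(snapshots, key=lambda s: abs(minutes(s) - target))
            for target in (0, 360, 720, 1080)}
    return sorted(keep)

# closest_four(["day1-0013", "day1-0550", "day1-1145", "day1-1823"])
# keeps all four names, since each is nearest to one target time.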

I recommend running the culling submodule AFTER the remote backup (if you’re doing remote backups). This prevents the removal of the subvolume you’d use for the btrfs send/receive. If your computer is constantly on without interruption, it shouldn’t be an issue as long as you’re doing your remote backups daily. And why wouldn’t you? The smaller the diff between the last backup and this one, the less data you have to send over the network. So it’s more of a precaution in case you turn the computer off for a while on vacation, or it breaks for a while and can’t do the backups.

pip install -r requirements.txt
cd snapintime
python culling.py

If running from PyPI, run: python -m snapintime.culling

Putting it All Together

Here are my crontab entries:

0 * * * * /root/bin/snapshots.sh
@daily /root/bin/remote_snapshots.sh
0 4 * * * /root/bin/snapshot_culling.sh

remote_snapshots.sh:

#!/bin/bash

cd "/home/ermesa/Programming Projects/python/cronpip"
source ./bin/activate
echo "#######################" >> snapintime_remote.log
echo "Starting remote backups" >> snapintime_remote.log
python -m snapintime.remote_backup >> snapintime_remote.log
echo "######################" >> snapintime_remote.log

snapshot_culling.sh:

#!/bin/bash

cd "/home/ermesa/Programming Projects/python/cronpip"
source ./bin/activate
echo "#######################" >> snapintime_culling.log
echo "Starting culling" >> snapintime_culling.log
python -m snapintime.culling >> snapintime_culling.log
echo "######################" >> snapintime_culling.log

snapshots.sh:

#!/bin/bash

cd "/home/ermesa/Programming Projects/python/cronpip"
source ./bin/activate
echo "#######################" >> snapintime.log
echo "Starting snapshots" >> snapintime.log
python -m snapintime.create_local_snapshots >> snapintime.log
echo "######################" >> snapintime.log

config.json

An example of the config.json file:

{   "0":
{ "subvol": "/home",
"backuplocation": "/home/.snapshot",
"remote": "True",
"remote_location": "user@server",
"remote_subvol_dir": "/media/backups"
},
"1":
{ "subvol": "/media/Photos",
"backuplocation": "/media/Photos/.Snapshots"
},
"2":
{ "subvol": "/media/Archive",
"backuplocation": "/media/NotHome/Snapshots/Archive"
}
}
  • For the keys 0, 1, 2, 3, etc.: as of 0.8.1 there is no inherent meaning to the fact that they are numbers. They just need to be distinct alphanumeric strings.

  • subvol: this should be the subvolume you want to create a snapshot of.

  • backuplocation: the subvolume that holds your backup subvolumes.

  • remote: If set to True, an attempt will be made to back up to the remote location. Any other value, or the absence of this field, means it will not try to back up to the remote location (see the sketch after this list).

  • remote_location: The user@server address the backup subvolumes will be sent to.

  • remote_subvol_dir: Just like backuplocation, but on the remote machine.
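
Here is a minimal sketch of how these fields might be consumed, assuming the file has been parsed with json.load (in snapintime itself, parsing is handled by snapintime.utils.config.import_config):

import json

with open("config.json") as config_file:
    config = json.load(config_file)

for key, entry in config.items():
    print(f"snapshot {entry['subvol']} into {entry['backuplocation']}")
    # Only the literal string "True" triggers a remote backup.
    if entry.get("remote") == "True":
        print(f"send to {entry['remote_location']}:{entry['remote_subvol_dir']}")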

Origins of my culling algorithm

I’m basing it on a conversation I had on the btrfs mailing list. Here’s how Duncan, the person who inspired me, explained it to me:

“However, best snapshot management practice does progressive snapshot thinning, so you never have more than a few hundred snapshots to manage at once. Think of it this way. If you realize you deleted something you needed yesterday, you might well remember about when you deleted it and can thus pick the correct snapshot to mount and copy it back from. But if you don’t realize you need it until a year later, say when you’re doing your taxes, how likely are you to remember the specific hour, or even the specific day, you deleted it? A year later, getting a copy from the correct week, or perhaps the correct month, will probably suffice, and even if you DID still have every single hour’s snapshots a year later, how would you ever know which one to pick? So while a day out, hourly snapshots are nice, a year out, they’re just noise.

As a result, a typical automated snapshot thinning script, working with snapshots each hour to begin with, might look like this:

Keep two days of hourly snapshots: 48 hourly snapshots

After two days, delete five of six snapshots, leaving a snapshot every 6 hours, four snapshots a day, for another 5 days: 4*5=20 6-hourly, 20 +48=68 total.

After a week, delete three of the four 6-hour snapshots, leaving daily snapshots, for 12 weeks (plus the week of more frequent snapshots above, 13 weeks total): 7*12=84 daily snaps, 68+84=152 total.

After a quarter (13 weeks), delete six of seven daily snapshots, leaving weekly snapshots, for 3 more quarters plus the one above of more frequent snapshots, totaling a year: 3*13=39 weekly snaps, 152+39=191 total.

After a year, delete 12 of the 13 weekly snapshots, leaving one a quarter. At 191 for the latest year plus one a quarter you can have several years worth of snapshots (well beyond the normal life of the storage media) and still be in the low 200s snapshots total, while keeping them reasonably easy to manage. =:^)”
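
Duncan’s running totals check out; here is the arithmetic as a quick Python sanity check:

hourly = 48            # two days of hourly snapshots
six_hourly = 4 * 5     # four a day for five more days -> 20
daily = 7 * 12         # daily for twelve more weeks -> 84
weekly = 3 * 13        # weekly for three more quarters -> 39
print(hourly + six_hourly)                    # 68
print(hourly + six_hourly + daily)            # 152
print(hourly + six_hourly + daily + weekly)   # 191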

create_local_snapshots

Read in configuration file and create local snapshots.

snapintime.create_local_snapshots.create_snapshot(date_suffix: str, subvol: str, backup_location: str)

Create a btrfs snapshot.

Parameters
  • date_suffix – a datetime object formatted to be the name of the snapshot

  • subvol – The subvolume to be snapshotted

  • backup_location – The folder in which to create the snapshot

snapintime.create_local_snapshots.get_date_time() → str

Return the current time, using the system time zone.

snapintime.create_local_snapshots.iterate_configs(date_time: str, config: dict) → list

Iterate over all the subvolumes in the config file, then call create_snapshot.

Parameters
  • date_time – The date time that will end up as the btrfs snapshot name

  • config – The config file, parsed by import_config.

Returns

A list containing return values from create_snapshot

snapintime.create_local_snapshots.main()
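
Tying the module together, here is a minimal sketch of calling these functions directly (roughly what main() presumably does):

from snapintime.utils.config import import_config
from snapintime import create_local_snapshots

config = import_config()                            # parse config.json
date_time = create_local_snapshots.get_date_time()  # becomes the snapshot name
results = create_local_snapshots.iterate_configs(date_time, config)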

Culling

Thin out the snapshots on disk.

snapintime.culling.btrfs_del(directory: str, subvols: list) → list

Delete subvolumes in a given directory.

Parameters
  • directory – The directory holding the subvolumes.

  • subvols – A list of subvolumes to delete

Returns

A list with the commands run and the results or, if there weren’t any subvolumes to delete, returns a message with that information.

snapintime.culling.cull_last_quarter(config: dict) → list

Cull the btrfs snapshots from the last quarter.

Should leave 1 snapshot per week for 13 weeks.

Parameters

config – The configuration file.

Returns

A list containing the results of running the commands.

snapintime.culling.cull_last_year(config: dict) → list

Cull the btrfs snapshots from the last year.

Should leave 1 snapshot per quarter for 4 quarters.

Parameters

config – The configuration file.

Returns

A list containing the results of running the commands.

snapintime.culling.cull_seven_days_ago(config: dict) → list

Cull the btrfs snapshots from 7 days ago.

Parameters

config – The configuration file.

Returns

A list containing the results of running the commands.

snapintime.culling.cull_three_days_ago(config: dict) → list

Cull the btrfs snapshots from 3 days ago.

Parameters

config – The configuration file.

Returns

A list containing the results of running the commands.

snapintime.culling.daily_cull(dir_to_cull: list) → list

Take a list of snapshots from a directory (already reduced to one day) and cull.

This culling will produce the closest it can to 4 snapshots for that day.

For a perfect set of 24 snapshots, it should keep on disk (i.e., remove from the cull list):

  • day1-0000

  • day1-0600

  • day1-1200

  • day1-1800

Parameters

dir_to_cull – A list containing snapshots. Assumes another function has already reduced this list to a list containing only one day’s worth of snapshots.

Returns

A list containing all the subvolumes to cull.

snapintime.culling.get_subvols_by_date(directory: str, reg_ex) → list

Return a list of subvolumes matching a regular expression.

This is meant to produce the list that will be the input for one of the culling functions.

Parameters
  • directory – The directory we want to grab subvols from.

  • reg_ex – A regular expression to apply to the directory contents.

Returns

A list of subvolumes for culling.

snapintime.culling.main()
snapintime.culling.print_output(list_of_lists: list)
snapintime.culling.quarterly_yearly_cull(dir_to_cull: list) → list

Take a list of snapshots from a directory (already reduced to one week or quarter) and cull.

This culling will return a list with the snapshots to remove for the week or quarter.

For a perfect set of snapshots (where the user has been doing one snapshot per hour and the daily culling), it should keep on disk (i.e., remove from the cull list):

  • day7-1800

Note

May end up combining with weekly cull as they essentially do the same thing.

Parameters

dir_to_cull – A list containing snapshots. Assumes another function has already reduced this list to a list containing only one week or quarter’s worth of snapshots.

Returns

A list containing all the subvolumes to cull.

snapintime.culling.split_dir_hours(subvols: list, reg_ex) → list

Return the subset of subvols matching a regular expression.

Parameters
  • subvols – A list of subvolumes.

  • reg_ex – A re object defining the regular expression to evaluate against.

Returns

A list that has only the items that passed the regular expression.

snapintime.culling.weekly_cull(dir_to_cull: list) → list

Take a list of snapshots from a directory (already reduced to one day) and cull.

This culling will return a list with the snapshots to remove for the given day.

For a perfect set of snapshots (where the user has been doing one snapshot per hour and the daily culling), it should keep on disk (i.e., remove from the cull list):

  • day1-1800

Parameters

dir_to_cull – A list containing snapshots. Assumes another function has already reduced this list to a list containing only one day’s worth of snapshots.

Returns

A list containing all the subvolumes to cull.

remote_backup

Use btrfs send/receive to send snapshot to a remote computer.

snapintime.remote_backup.btrfs_send_receive(local_subvols: list, remote_subvol: str, backup_location: str, remote_location: str, remote_subvol_dir: str)

Run command to send/receive btrfs snapshot.

Parameters
  • local_subvols – A list of the local subvolumes to choose from.

  • remote_subvol – The latest subvolume that is present on both the remote and local systems.

  • backup_location – The folder prefix for the local subvolumes

  • remote_location – This should be a string like user@computer or user@IPaddress

  • remote_subvol_dir – This is the directory we will put the backup into on the remote system.

Returns

A dictionary with the result of the command.

snapintime.remote_backup.get_local_subvols(local_subvol_dir: str) → list

Grab the subvolumes from the local directory.

Parameters

local_subvol_dir – The directory containing the local subvolumes.

Returns

A list of all the local subvolumes

snapintime.remote_backup.get_remote_subvols(remote_location: str, remote_subvol_dir: str) → list

Retrieve the remote subvolumes.

This function assumes the user has set up SSH keys for passwordless login.

Parameters
  • remote_location – This should be a string like user@computer or user@IPaddress

  • remote_subvol_dir – This is the directory we will search to get the latest subvolume.

Returns

A list of the remote subvolumes.

snapintime.remote_backup.iterate_configs(config: dict) → list

Iterate over all the subvolumes in the config file, then call btrfs_send_receive if the value of remote is “True”.

Parameters

config – The config file, parsed by import_config.

Returns

A list containing return values from btrfs_send_receive

snapintime.remote_backup.main()
snapintime.remote_backup.match_subvols(local_subvols: list, remote_subvols: list) → str

Return the latest remote subvol that also exists on the local system.

Parameters
  • local_subvols – A list of the local subvolumes.

  • remote_subvols – A list of the remote subvolumes.

Returns

The latest subvolume present on both systems.
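
A minimal sketch of the matching idea (not the project’s actual code): the latest name present in both lists is the best parent for an incremental send, assuming snapshot names sort chronologically:

def latest_common(local_subvols: list, remote_subvols: list) -> str:
    # Intersect the two lists; the maximum name is the newest shared snapshot.
    common = set(local_subvols) & set(remote_subvols)
    return max(common) if common else ""

# latest_common(["2021-01-01-1800", "2021-01-02-1800"],
#               ["2021-01-01-1800"]) returns "2021-01-01-1800".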

config

Load the config file.

snapintime.utils.config.import_config() → dict

Import config file.

Returns

A dictionary containing configs

Raises

FileNotFoundError
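
A brief usage sketch, handling the documented exception:

from snapintime.utils.config import import_config

try:
    config = import_config()
except FileNotFoundError:
    print("No config.json found; see the Usage section for where to place it.")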

date

Provide date and time operations needed by snapintime.

snapintime.utils.date.many_dates(start_date: datetime.datetime, interval_start: int, interval_end: int) → list

Provide a list of dates within a certain range.

Used by quarterly culling and yearly culling to determine date range to cull.

Parameters
  • start_date – The reference point for the intervals

  • interval_start – How many days ago you want to start getting dates from.

  • interval_end – How many days ago you want to stop getting dates from.

Returns

A list of dates in the given range.

snapintime.utils.date.prior_date(start_date: datetime.datetime, day: int = 0) → datetime.datetime

Provide a prior date offset by the variable given in day.

Unintuitively, positive numbers subtract days.

Parameters
  • start_date – The date from which you want to count back or forward.

  • day – The number of days you want to go back.

Returns

A datetime object day amount of days in the past.
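
For example, assuming the offset behaviour the note above describes:

import datetime
from snapintime.utils.date import prior_date

print(prior_date(datetime.datetime(2021, 1, 10), day=3))
# expected: 2021-01-07 00:00:00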

snapintime.utils.date.quarterly_weeks(start_date: datetime.datetime) → list

Provide a list of 13 weekly date lists.

Parameters

start_date – Date from which to go back a quarter.

Returns

A list of lists containing datetime objects. Each sublist represents a week.

snapintime.utils.date.yearly_quarters(start_date: datetime.datetime) → list

Provide a list of 4 quarterly date lists.

Parameters

start_date – Date from which to go back a year.

Returns

A list of lists containing datetime objects. Each sublist represents a quarter.
