wodny/baskets

Baskets - prune your backups

A tool to select and divide lines of text into groups and baskets (bins) within them according to a regex pattern and some additional rules. These lines of text (samples) are intended to be file paths or other backup archive IDs most of the time.

It may be useful when backups take up too much space and some of the old ones could be deleted.

Backup tools usually include such functionality, but it may be too simple for your needs, or you may just want to prune some plain .tgz backups.

The user defines:

  • the regular expression to divide every sample name into named regex match groups and optionally declare their types,

  • basket group rules that:

    • first classify a sample into one of the basket groups,
    • then put the sample into one of the group's baskets (named according to the specified rule),
    • finally select some samples from every basket.

Syntax

The sample name pattern regex should use (?P<name>...) groups. If the name contains __ (double underscore), it is parsed as name__type. Supported types are int and dt (for datetime).
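
The name__type convention can be illustrated with a minimal sketch. It is not the tool's actual implementation: the helper name parse_sample, the use of re.fullmatch, and parsing dt values with datetime.fromisoformat are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Hypothetical converters for the two supported types (assumption:
# dt values are ISO dates such as 2022-03-14).
CONVERTERS = {"int": int, "dt": datetime.fromisoformat}


def parse_sample(pattern, sample):
    """Match a sample and return {name: value}, converting typed groups."""
    match = re.fullmatch(pattern, sample)
    if match is None:
        return None
    fields = {}
    for group, raw in match.groupdict().items():
        # Split "date__dt" into ("date", "dt"); untyped groups stay strings.
        name, sep, type_name = group.partition("__")
        fields[name] = CONVERTERS[type_name](raw) if sep else raw
    return fields


fields = parse_sample(
    r"(?P<service>\w+)-(?P<date__dt>\d+-\d+-\d+).*",
    "foo-2022-03-14.tar.gz",
)
```

Here fields holds the service name as a string and the date as a datetime object, so later rules can compare and format it.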

A basket group is specified in the following way (with every part being optional): filter:basket_pattern:selection_method.

The filter might be something like date>=2023-01-01.
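
A filter of this shape could be evaluated roughly as follows. This is only a sketch: the set of comparison operators, the helper name make_filter, and the coercion of the literal to the field's type are assumptions, not the tool's documented behaviour.

```python
import operator
import re
from datetime import datetime

# Assumed operator set; the tool may support a different one.
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def make_filter(expression):
    """Turn 'field<op>value' into a predicate over a dict of parsed fields."""
    match = re.fullmatch(r"(\w+)(>=|<=|==|>|<)(.+)", expression)
    if match is None:
        raise ValueError(f"unsupported filter: {expression!r}")
    field, symbol, raw = match.groups()
    compare = OPERATORS[symbol]

    def predicate(fields):
        value = fields[field]
        # Coerce the literal to the type of the field being compared.
        if isinstance(value, datetime):
            literal = datetime.fromisoformat(raw)
        else:
            literal = type(value)(raw)
        return compare(value, literal)

    return predicate


is_recent = make_filter("date>=2023-01-01")
```

A sample whose parsed date is 2023-01-01 or later satisfies is_recent and would be classified into that basket group.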

The basket pattern defines how baskets are named, and it might be something like ${service}-${date__Y}-${date__m}. It uses Python's template string syntax. The template may reference groups defined in the sample name pattern.

Fields referencing groups of datetime type can have a suffix corresponding to one of the strftime format codes, e.g. date__Y for the year.
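
The naming rule can be sketched with string.Template from the standard library. The BasketFields class and basket_name helper are hypothetical names, and resolving a suffix by calling strftime with "%" plus the suffix is an assumption about how the tool interprets it.

```python
from datetime import datetime
from string import Template


class BasketFields(dict):
    """Mapping that resolves 'name__X' as strftime('%X') on datetime fields."""

    def __missing__(self, key):
        # Called only for keys that are not plain field names.
        name, sep, code = key.partition("__")
        value = self.get(name) if sep else None
        if isinstance(value, datetime):
            return value.strftime("%" + code)
        raise KeyError(key)


def basket_name(pattern, fields):
    """Expand a basket pattern using the parsed fields of one sample."""
    return Template(pattern).substitute(BasketFields(fields))


name = basket_name(
    "${service}-${date__Y}-${date__m}",
    {"service": "foo", "date": datetime(2022, 5, 17)},
)
# name == "foo-2022-05"
```

All samples that expand to the same name land in the same basket, so this pattern yields one basket per service per month.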

The selection method specifies how many files to keep, or to delete if inversion is used. It may be something like 3 to keep 3 files from every basket.
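
The effect of a numeric selection method, with and without inversion, can be sketched as below. Which samples within a basket are kept is not stated here, so the sketch assumes the last ones in sorted order; the function name select and its parameters are hypothetical.

```python
from collections import defaultdict


def select(samples, basket_of, keep, invert=False):
    """Group samples into baskets and pick `keep` samples from each one.

    Assumption: the samples kept are the last `keep` in sorted order;
    the tool's real ordering rule may differ.
    """
    baskets = defaultdict(list)
    for sample in samples:
        baskets[basket_of(sample)].append(sample)

    kept = set()
    for members in baskets.values():
        kept.update(sorted(members, reverse=True)[:keep])

    # With inversion, report everything that was not kept instead.
    return sorted(s for s in samples if (s in kept) != invert)


files = [
    "foo-2022-05-01.tar.gz",
    "foo-2022-05-20.tar.gz",
    "foo-2022-06-03.tar.gz",
]
by_month = lambda name: name[:11]  # e.g. "foo-2022-05"
to_keep = select(files, by_month, keep=1)
to_delete = select(files, by_month, keep=1, invert=True)
```

With one file kept per monthly basket, to_delete contains only foo-2022-05-01.tar.gz, which is the kind of list an inverted run would produce.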

Examples

Quick pruning

Let's assume you have a list of files (e.g. generated by the find command-line tool) like the following (see examples/files.txt for the full list):

foo-2022-03-14.tar.gz
bar-2022-04-16.tar.gz
[...]

Then run something like the following to:

  • keep all files dated 2023-01-01 or later (first basket group),
  • for older files (second basket group), keep only one file per month per service (baskets named like foo-2022-05):

./baskets.py \
    -i -o lines \
    -b 'date>=2023-01-01' \
    -b ':${service}-${date__Y}-${date__m}:1' \
    '(?P<service>\w+)-(?P<date__dt>\d+-\d+-\d+).*' \
    < examples/files.txt

This will output a list of files to delete, which you can pass to something like xargs rm.

More examples yet to come...

See also
