PC Backup is a sort of DIY Google Drive/Dropbox solution that synchronizes specified folders from your PC to an S3 bucket at specified intervals via a cronjob. It computes an index of files along with their last modified timestamps, and if there are any changes from the previous state it deletes/uploads files from/to the S3 bucket accordingly.
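In rough terms, the core logic boils down to something like this (a simplified sketch, not the repo's actual code; the function names are illustrative):

```python
from pathlib import Path


def compute_index(directories: list[str]) -> dict[str, float]:
    """Map every file under the given directories to its last modified time."""
    index = {}
    for directory in directories:
        for path in Path(directory).rglob("*"):
            if path.is_file():
                index[str(path)] = path.stat().st_mtime
    return index


def diff_index(old: dict[str, float], new: dict[str, float]) -> tuple[set[str], set[str]]:
    """Return files to upload (new or modified) and files to delete (gone locally)."""
    to_upload = {path for path, mtime in new.items() if old.get(path) != mtime}
    to_delete = set(old) - set(new)
    return to_upload, to_delete
```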
- Python 3.12+
- AWS account
- S3 bucket
- IAM user and policy for programmatic access to the S3 bucket (see the example policy below)
- AWS CLI installed and configured on your machine
- Means for scheduling a cronjob
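For reference, a minimal IAM policy for that user could look roughly like this (an illustrative example, not taken from the repo; adjust the bucket name and actions to your needs):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::your-bucket-name"
        },
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
            "Resource": "arn:aws:s3:::your-bucket-name/*"
        }
    ]
}
```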
Don't forget to enable versioning and to create a lifecycle policy for the noncurrent versions of the objects in your S3 bucket so this setup can be as close to Google Drive or Dropbox as possible.
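You can do both from the console, or with a couple of boto3 calls along these lines (a sketch; the bucket name and the 30-day retention period are just examples):

```python
import boto3

s3 = boto3.client("s3")

# Keep noncurrent (overwritten/deleted) versions around, like Drive/Dropbox history.
s3.put_bucket_versioning(
    Bucket="your-bucket-name",
    VersioningConfiguration={"Status": "Enabled"},
)

# Permanently delete noncurrent versions after 30 days (example retention period).
s3.put_bucket_lifecycle_configuration(
    Bucket="your-bucket-name",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    },
)
```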
Additionally, you'll need:

- `boto3` (AWS SDK for Python)
- `psutil` (cross-platform library for retrieving information on running processes and system utilization in Python)
Clone the repo and `cd` into its folder:

```bash
git clone https://github.com/vlatan/pc-backup.git && cd pc-backup
```
Create a virtual environment `.venv`, activate it, upgrade `pip` and install the dependencies:
```bash
python3 -m venv .venv &&
source .venv/bin/activate &&
pip install pip --upgrade &&
pip install -r requirements.txt
```
Create a `config.json` file and define several variables, in JSON format, specific to your needs:
```json
{
    "DIRECTORIES": [
        "/home/john/music",
        "/home/john/videos",
        "/home/documents"
    ],
    "BUCKET_NAME": "your-bucket-name",
    "STORAGE_CLASS": "STANDARD_IA",
    "PREFIXES": [
        "__",
        "~",
        "."
    ],
    "SUFFIXES": [
        ".log",
        ".out",
        ".crdownload",
        ".tmp",
        ".part",
        ".partial",
        ".torrent",
        "desktop.ini"
    ],
    "MAX_POOL_SIZE": 10
}
```
- `DIRECTORIES` - list of absolute paths of the folders you want to track and upload/sync to the AWS bucket.
- `BUCKET_NAME` - the name of your AWS S3 bucket that you already prepared for this job.
- `STORAGE_CLASS` - the storage class for the AWS S3 objects.
- `PREFIXES` - list of prefixes; files/folders with these prefixes are excluded (e.g. hidden files).
- `SUFFIXES` - list of suffixes; files/folders with these suffixes are excluded (e.g. files with certain extensions).
- `MAX_POOL_SIZE` - the number of files to delete/upload concurrently, as in the sketch after this list. If not set, the script will use the number of cores on your machine as the maximum number of concurrent tasks. Keep in mind that a large number of concurrent tasks may slow down your machine.
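For illustration, the concurrent upload step might look roughly like this (a sketch, not the repo's actual code; the example file list and the way object keys are derived from paths are assumptions):

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor

import boto3

with open("config.json") as fp:
    config = json.load(fp)

s3 = boto3.client("s3")

# Fall back to the number of CPU cores when MAX_POOL_SIZE is not set.
max_workers = config.get("MAX_POOL_SIZE") or os.cpu_count()


def upload(path: str) -> None:
    """Upload one file, keyed by its path, with the configured storage class."""
    s3.upload_file(
        path,
        config["BUCKET_NAME"],
        path.lstrip("/"),  # assumed key scheme: path without the leading slash
        ExtraArgs={"StorageClass": config["STORAGE_CLASS"]},
    )


# In the real script this list would come from the index diff.
files_to_upload = ["/home/john/music/song.mp3"]

with ThreadPoolExecutor(max_workers=max_workers) as executor:
    list(executor.map(upload, files_to_upload))
```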
Schedule a cronjob (run every minute) using either the AWS SDK (`sdk.py`) or the AWS CLI (`cli.py`) variant of the script. Note that the output is appended to `logs/backup.out`, so make sure the `logs` directory exists:
```bash
*/1 * * * * cd /path/to/pc-backup && .venv/bin/python sdk.py >> logs/backup.out 2>&1
*/1 * * * * cd /path/to/pc-backup && .venv/bin/python cli.py >> logs/backup.out 2>&1
```
- Check if the index is consistent with the bucket on every run (needs a call to the bucket); one possible approach is sketched after this list.
- Check if the index has changed during a long-running operation and abandon the operation if needed.
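The first item could, for example, compare the local index against a paginated listing of the bucket (a sketch of one possible approach, with hypothetical names):

```python
import boto3


def bucket_keys(bucket: str) -> set[str]:
    """Collect all object keys in the bucket with a paginated listing."""
    s3 = boto3.client("s3")
    keys = set()
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            keys.add(obj["Key"])
    return keys


# `local_index` stands in for the locally computed index (hypothetical example).
local_index = {"music/song.mp3": 1700000000.0}

remote = bucket_keys("your-bucket-name")
missing_remotely = set(local_index) - remote   # candidates for upload
orphaned_remotely = remote - set(local_index)  # candidates for deletion
```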