This repository contains my personal pipeline and serves two main purposes:
- Learning: It acts as a playground for trying and learning new things. For example, I've used this repository to try different orchestrators such as Airflow, Luigi, and Prefect, which has allowed me to deeply understand the pros and cons of each.
- Automating: This is a real pipeline that runs hourly in production and allows me to automate certain repetitive tasks.
After trying different orchestrators, I have settled on using Prefect as my preferred choice. This is mainly due to its simplicity and the fact that the free tier for personal projects works perfectly for my needs.
With Prefect, you work with Flows (commonly known as DAGs in other orchestrators) and Tasks. The DAG is created programmatically by defining Flows, which can also have subflows, and Tasks.
In my pipeline, there is a main flow called `vtasks`, which calls multiple subflows. Each subflow is composed of multiple tasks. The names of the flows and tasks are hierarchical to simplify monitoring. Here's an overview of the `vtasks` flow:
```
vtasks
├── vtasks.backup
│   ├── vtasks.backup.backup_files
│   ├── vtasks.backup.clean_backups
│   └── vtasks.backup.copy
├── vtasks.expensor
│   ├── vtasks.expensor.read
│   └── vtasks.expensor.report
└── ...
```
And a zoomed-in view of the `vtasks.backup` subflow:
In general, the pipeline is designed to perform the following steps: extracting data from multiple sources, transforming the data, loading it into the cloud, and finally creating interactive plots as `html` files.
- Extract: This step involves integrating with various sources such as APIs, Google Spreadsheets, or app integrations.
- Transform: The transformation step mainly uses `pandas`, due to its simplicity when handling small amounts of data.
- Load: All the data is stored in Dropbox as `parquet` files. More details can be found in the post: reading and writing using Dropbox.
- Report: In this step, static `html` files are created, containing interactive plots built with highcharts. You can read more about this in the post: create static web pages.
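The Transform and Load steps above can be sketched roughly as follows. The column names, the aggregation, and the output path are made up for illustration; the real pipeline's schemas and Dropbox paths are not shown in this document:

```python
# Hedged sketch of the Transform and Load steps with pandas and parquet.
import pandas as pd


def transform(raw: pd.DataFrame) -> pd.DataFrame:
    # Aggregate amounts per month; pandas keeps this simple
    # for small datasets.
    out = raw.copy()
    out["month"] = pd.to_datetime(out["date"]).dt.to_period("M").astype(str)
    return out.groupby("month", as_index=False)["amount"].sum()


def load(df: pd.DataFrame, path: str) -> None:
    # Persist as a parquet file (needs the pyarrow or fastparquet engine);
    # the real pipeline stores files like this in Dropbox.
    df.to_parquet(path, index=False)


raw = pd.DataFrame(
    {"date": ["2023-01-05", "2023-01-20", "2023-02-03"],
     "amount": [10.0, 5.0, 7.5]}
)
summary = transform(raw)
```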
You can find the definition and details of each subflow in:
| Subflow | Description |
|---|---|
| archive | Helps archive files in Dropbox by renaming them and moving them to subfolders based on the year. |
| backups | Creates dated backups of important files, such as a KeePass database. |
| battery | Processes the battery log from my phone. |
| cryptos | Extracts data held in exchanges and retrieves cryptocurrency prices. |
| expensor | Creates reports about my personal finances. |
| gcal | Creates information about how I spend my time using Google Calendar data. |
| indexa | Extracts data from a robo-advisor called indexa_capital. |
| money_lover | Extracts incomes and expenses from the Money Lover app. |
| vbooks | Creates a report of the books I have read and my reading list. |
| vprefect | Exports information about the flow runs of this pipeline, allowing me to keep a history of all runs. |
Finally, here are some examples of the reports I end up creating (from the `expensor` subflow):
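The Report step (static `html` with highcharts) can be sketched as plain string templating: the data is embedded as JSON literals in a page that loads Highcharts from its CDN. The template and function below are a guess at the approach, not the pipeline's actual code:

```python
# Hedged sketch of rendering a static html report with a Highcharts plot.
import json

TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <script src="https://code.highcharts.com/highcharts.js"></script>
</head>
<body>
  <div id="chart"></div>
  <script>
    Highcharts.chart("chart", {{
      title: {{ text: {title} }},
      xAxis: {{ categories: {categories} }},
      series: [{{ name: "Expenses", data: {values} }}]
    }});
  </script>
</body>
</html>
"""


def render_report(title, categories, values):
    # json.dumps produces valid JavaScript literals for the placeholders.
    return TEMPLATE.format(
        title=json.dumps(title),
        categories=json.dumps(categories),
        values=json.dumps(values),
    )


html = render_report("Monthly expenses", ["2023-01", "2023-02"], [15.0, 7.5])
```

Writing the resulting string to a file yields a self-contained page that can be opened in any browser, with no server needed.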
For production, I'm using Heroku (with the Eco plan at $5/month) since it greatly simplifies continuous deployment (it has automatic deploys linked to changes in the main
branch) and maintenance for a small fee. In the past, I used the AWS free tier, but it was harder to maintain.
In terms of scheduling, the pipeline runs hourly and usually takes 6-8 minutes to complete. To avoid wasting resources, I'm using Heroku Scheduler, which lets me trigger the pipeline on a cron-style schedule.
The content of this repository is licensed under the MIT License.