Update README.md
pwambach authored Feb 6, 2024
1 parent 7bdb1d1 commit dea9493
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions pipeline/README.md
@@ -14,6 +14,10 @@ The new (2023) Apache Airflow based pipeline is the successor of the previous Cl

Each dataset has its own DAG (directed acyclic graph) file. This Python file uses the Airflow API to build a graph structure that describes each processing step. Steps can be processed sequentially or in parallel. The DAG files can be found in `/pipeline/dags`. Many common tasks, such as downloading and uploading files from Google Cloud Storage or running GDAL transforms, have been moved into a library file, which can be found at `/pipeline/dags/task_factories.py`.
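
To illustrate how a graph of steps decides what runs sequentially and what can run in parallel, here is a minimal plain-Python sketch. It deliberately does not use the Airflow API or anything from `task_factories.py`. The step names and the `execution_batches` helper are hypothetical and only stand in for the kind of download, transform and upload steps described above.

```python
from graphlib import TopologicalSorter

# Hypothetical step names (not taken from the real DAG files).
# Each key maps to the set of steps that must finish before it can start.
steps = {
    "download": set(),
    "transform_a": {"download"},
    "transform_b": {"download"},
    "upload": {"transform_a", "transform_b"},
}


def execution_batches(graph):
    """Group steps into batches; steps in the same batch have no
    dependencies on each other and could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch as finished
    return batches


batches = execution_batches(steps)
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {', '.join(batch)}")
```

In this sketch the two transform steps end up in the same batch because both depend only on `download`, while `upload` waits for both of them. An Airflow DAG expresses the same kind of dependency information, and the scheduler then decides what can run concurrently.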

Example DAG:
<img width="100%" alt="dag" src="https://github.com/ubilabs/esa-climate-from-space/assets/1611635/93a501e5-cdbf-4784-a987-147e0f32e031">


## DAGs and data flow

For now, the Airflow pipelines start by downloading the (preprocessed) files from our GCS bucket `gs://esa-cfs-cate-data`. Note that the scripts that download these files from the data hub and upload them to GCS are the same as for the old pipeline and can be found in `/data/downloads`. In the future, these download tasks will probably also be implemented as Airflow DAGs.