# [Integration of lakeFS with Airflow via Hooks](https://docs.lakefs.io/hooks/airflow.html)

## Use Case: Versioning DAGs and running pipelines from hooks using a configurable version of the DAGs

## Setup Task: Prerequisites

###### This notebook requires a connection to a lakeFS server.
###### To run lakeFS locally with Docker, see https://docs.lakefs.io/quickstart/run.html.

## Setup Task: Change your lakeFS credentials (Access Key and Secret Key)

In [None]:
lakefsEndPoint = 'http://host.docker.internal:8000'
lakefsAccessKey = 'AKIAIOSFOLKFSSAMPLES'
lakefsSecretKey = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
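If you prefer not to edit the cell above, the same three values could be read from environment variables, with the sample values as fallbacks. This is a minimal sketch: the variable names `LAKEFS_ENDPOINT`, `LAKEFS_ACCESS_KEY` and `LAKEFS_SECRET_KEY` and the helper `load_lakefs_settings` are hypothetical and are not part of this notebook or of lakeFS.

```python
import os


def load_lakefs_settings(env=None):
    """Return (endpoint, access_key, secret_key), preferring environment variables."""
    env = os.environ if env is None else env
    # Hypothetical variable names; the defaults are the sample values from the cell above.
    endpoint = env.get("LAKEFS_ENDPOINT", "http://host.docker.internal:8000")
    access_key = env.get("LAKEFS_ACCESS_KEY", "AKIAIOSFOLKFSSAMPLES")
    secret_key = env.get("LAKEFS_SECRET_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
    return endpoint, access_key, secret_key


lakefsEndPoint, lakefsAccessKey, lakefsSecretKey = load_lakefs_settings()
```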

## Setup Task: Optionally change the lakeFS repo name (use an existing repo or provide a new repo name)

In [None]:
repo = "airflow-dag-versioning-repo"

## Setup Task: Storage Information

In [None]:
storageNamespace = 's3://example/' + repo # e.g. "s3://bucket"

## Setup Task: Run additional [Setup](./airflow/DAG_Versioning/DAGVersioningSetup.ipynb) tasks here

In [None]:
%run ./airflow/DAG_Versioning/DAGVersioningSetup.ipynb

## Setup Task: Go to the [Airflow UI](http://127.0.0.1:8080/home). Log in with username "airflow" and password "airflow".
### You should see the "lakefs_create_dag", "lakefs_delete_dag" and "lakefs_trigger_dag" DAGs.

## You will run the following steps in this notebook (refer to the image below):

##### - Create a repository with the main branch, add a data file to the main branch and commit the changes
##### - Create the transformation DAG on the main branch
##### - Create a new "version" branch. The Pre-Create-Branch hook will trigger a DAG that pulls the DAG code from GitHub and creates the transformation DAG on the "version" branch.
##### - Add a data file to the "version" branch and commit the changes
##### - The Post-Commit hook will trigger the transformation DAG on the "version" branch
##### - Delete or merge the "version" branch
##### - If the "version" branch is deleted, the Pre-Delete-Branch hook will trigger another DAG that deletes the transformation DAG on the "version" branch

![Step 1](./Images/AirflowDAGVersioning/AirflowDAGVersioningFull.png)
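The steps above pair each lakeFS hook event with one Airflow DAG. The lookup table below is only an illustration of that pairing; the real wiring lives in the three actions YAML files uploaded later in this notebook, and `HOOK_EVENT_TO_DAG` and `dag_for_event` are hypothetical names.

```python
# Illustrative mapping of lakeFS hook events to the Airflow DAGs they invoke,
# as described in the steps above. The names below are hypothetical helpers.
HOOK_EVENT_TO_DAG = {
    "pre-create-branch": "lakefs_create_dag",   # creates the transformation DAG for a new branch
    "pre-delete-branch": "lakefs_delete_dag",   # removes the transformation DAG of a deleted branch
    "post-commit": "lakefs_trigger_dag",        # runs the transformation DAG after a commit
}


def dag_for_event(event):
    """Return the DAG id invoked for a hook event, or None if no DAG is configured."""
    return HOOK_EVENT_TO_DAG.get(event)


print(dag_for_event("post-commit"))
```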

## Setup Task: Create Repository - Optional if repository exists

In [None]:
repository = lakefs.Repository(repo).create(storage_namespace=storageNamespace, default_branch=sourceBranch, exist_ok=True)
main = repository.branch(sourceBranch)
print(repository)

## Setup Task: Upload [Pre-Create-Branch Actions](./airflow/DAG_Versioning/actions_pre_create_branch.yaml) file. This action will invoke "lakefs_create_dag" DAG.

#### You can review [lakefs_create_dag](./airflow/DAG_Versioning/lakefs_create_dag_auto.py) program.

In [None]:
lakefs_demo.upload_object(main, local_path, 'actions_pre_create_branch.yaml', actions_folder_on_lakefs)

## Setup Task: Upload [Pre-Delete-Branch Actions](./airflow/DAG_Versioning/actions_pre_delete_branch.yaml) file. This action will invoke "lakefs_delete_dag" DAG.

#### You can review [lakefs_delete_dag](./airflow/DAG_Versioning/lakefs_delete_dag_auto.py) program.

In [None]:
lakefs_demo.upload_object(main, local_path, 'actions_pre_delete_branch.yaml', actions_folder_on_lakefs)

## Setup Task: Upload [Post-Commit Actions](./airflow/DAG_Versioning/actions_post_commit.yaml) file. This action will invoke "lakefs_trigger_dag" DAG.

#### You can review [lakefs_trigger_dag](./airflow/DAG_Versioning/lakefs_trigger_dag_auto.py) program.

In [None]:
lakefs_demo.upload_object(main, local_path, 'actions_post_commit.yaml', actions_folder_on_lakefs)

## Setup Task: Upload data file

In [None]:
lakefs_demo.upload_object(main, '', fileName, data_folder_on_lakefs)

## Setup Task: Commit changes and attach some metadata

In [None]:
ref = main.commit(message='Uploaded actions, DAGs and data files!',
        metadata={'using': 'python_api'})
print(ref.get_commit())

## Setup Task: Create transformation DAG on the main branch

In [None]:
lakefs_create_dag(sourceBranch, dags_folder_on_github, dag_template_filename, dag_name)

## Setup Task: Sync the DAG immediately; otherwise you will have to wait 1-2 minutes for Airflow to pick up the new DAG

In [None]:
from airflow.models import DagBag
dagbag = DagBag(include_examples=False)
dagbag.sync_to_db()
! airflow dags unpause lakefs_versioning_dag.main

## Setup Task: Now you should see "lakefs_versioning_dag.main" DAG in [Airflow UI](http://127.0.0.1:8080/home). Visualize the [transformation DAG](http://127.0.0.1:8080/dags/lakefs_versioning_dag.main/graph) on the "main" branch.
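The DAG names in this notebook follow the pattern `<dag name>.<branch>`, and the Airflow graph links follow `/dags/<dag id>/graph`. A small sketch of that convention, assuming the DAG name is "lakefs_versioning_dag" (as the link above suggests) and that Airflow is served at http://127.0.0.1:8080; `branch_dag_id` and `graph_url` are hypothetical helpers, not functions from the setup notebook.

```python
def branch_dag_id(dag_name, branch):
    """Build the per-branch DAG id, e.g. 'lakefs_versioning_dag.main'."""
    return f"{dag_name}.{branch}"


def graph_url(dag_id, base_url="http://127.0.0.1:8080"):
    """Build the Airflow UI graph link for a DAG id (base URL is an assumption)."""
    return f"{base_url}/dags/{dag_id}/graph"


dag_id = branch_dag_id("lakefs_versioning_dag", "main")
print(dag_id)
print(graph_url(dag_id))
```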

## Create a new version1 branch. lakeFS Hook will trigger [lakefs_create_dag](http://127.0.0.1:8080/dags/lakefs_create_dag/graph) DAG whenever any branch starting with "version" is created. This DAG will dynamically create the transformation DAG on the version branch.

#### It will take around 1 minute to run this task.

In [None]:
newBranch = "version1"

In [None]:
branch = repository.branch(newBranch).create(source_reference=sourceBranch)
print(branch)

## Sync the DAG immediately; otherwise you will have to wait 1-2 minutes for Airflow to pick up the new DAG

In [None]:
from airflow.models import DagBag
dagbag = DagBag(include_examples=False)
dagbag.sync_to_db()

## Now you should see "lakefs_versioning_dag.version1" DAG in [Airflow UI](http://127.0.0.1:8080/home). Visualize the [transformation DAG](http://127.0.0.1:8080/dags/lakefs_versioning_dag.version1/graph) on the "version1" branch.

## Upload new data file

In [None]:
lakefs_demo.upload_object(branch, '', 'lakefs_test_new.csv', data_folder_on_lakefs)

## Commit changes

In [None]:
ref = branch.commit(message='Uploaded new data file!',
        metadata={'using': 'python_api'})
print(ref.get_commit())

## Wait for 5 seconds. [Transformation DAG on version1 branch](http://127.0.0.1:8080/dags/lakefs_versioning_dag.version1/graph) will get triggered after the commit
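Instead of waiting a fixed number of seconds, you could poll until a condition holds. The helper below is a generic sketch: `wait_until` is a hypothetical function, and the predicate you pass in (for example, a check that the DAG run has started) is left to you because this notebook does not define one.

```python
import time


def wait_until(predicate, timeout=30.0, interval=1.0):
    """Call predicate() every `interval` seconds until it returns True or `timeout` elapses.

    Returns True if the predicate succeeded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example with a stand-in predicate that succeeds on the third check.
checks = {"count": 0}


def ready():
    checks["count"] += 1
    return checks["count"] >= 3


print(wait_until(ready, timeout=5, interval=0))
```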

## If you create a branch whose name doesn't start with the "version" prefix, the transformation DAG will not be created automatically

In [None]:
newBranch = "test"

In [None]:
branch = repository.branch(newBranch).create(source_reference=sourceBranch)
print(branch)
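The prefix rule can be pictured as a glob match on the branch name. This sketch assumes the hook filter behaves like the pattern `version*`; the actual filter is defined in actions_pre_create_branch.yaml, and `triggers_create_dag` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Assumed pattern; the real filter lives in actions_pre_create_branch.yaml.
BRANCH_PATTERN = "version*"


def triggers_create_dag(branch_name, pattern=BRANCH_PATTERN):
    """Return True if a branch name matches the (assumed) hook branch pattern."""
    return fnmatchcase(branch_name, pattern)


for name in ("version1", "version2", "test", "main"):
    print(name, triggers_create_dag(name))
```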

## To create the transformation DAG manually on the "test" branch, run the following cell

In [None]:
lakefs_create_dag(newBranch, dags_folder_on_github, dag_template_filename, dag_name)

## Sync the DAG immediately; otherwise you will have to wait 1-2 minutes for Airflow to pick up the new DAG

In [None]:
dagbag = DagBag(include_examples=False)
dagbag.sync_to_db()

## Now you should see "lakefs_versioning_dag.test" DAG in [Airflow UI](http://127.0.0.1:8080/home).

## Create a new version2 branch. lakeFS Hook will trigger [lakefs_create_dag](http://127.0.0.1:8080/dags/lakefs_create_dag/graph) DAG whenever any branch starting with "version" is created. This DAG will dynamically create the transformation DAG on the version2 branch.

#### It will take around 1 minute to run this task.

In [None]:
newBranch = "version2"

In [None]:
branch = repository.branch(newBranch).create(source_reference=sourceBranch)
print(branch)

## Sync the DAG immediately; otherwise you will have to wait 1-2 minutes for Airflow to pick up the new DAG

In [None]:
dagbag = DagBag(include_examples=False)
dagbag.sync_to_db()

## Now you should see "lakefs_versioning_dag.version2" DAG in [Airflow UI](http://127.0.0.1:8080/home).

## Optionally, change the [transformation](./airflow/DAG_Versioning/transformation.py) program, save it and upload it to the "version2" branch, e.g. you can change the partition column from "_c1" to "_c2" in lines 28 and 29. This change will be reflected in the "version2" branch only.

In [None]:
dag_filename = 'transformation.py'
lakefs_demo.upload_object(branch, local_path, dag_filename, dags_folder_on_lakefs)

## Commit changes if you changed [transformation](./airflow/DAG_Versioning/transformation.py) program

In [None]:
ref = branch.commit(message='Changed transformation program in version2!',
        metadata={'using': 'python_api'})
print(ref.get_commit())

## Wait for 5 seconds. [Transformation DAG on version2 branch](http://127.0.0.1:8080/dags/lakefs_versioning_dag.version2/graph) will get triggered after the commit.

## Delete "version2" branch. lakeFS Hook will trigger [lakefs_delete_dag](http://127.0.0.1:8080/dags/lakefs_delete_dag/graph) DAG whenever any branch starting with "version" is deleted. This DAG will auto delete the transformation DAG on the "version2" branch.

In [None]:
branch.delete()

## More Questions?

###### Join the lakeFS Slack group - https://lakefs.io/slack