<img src="https://docs.lakefs.io/assets/logo.svg" alt="lakeFS logo" width=300/>   <img src="https://docs.dagster.io/assets/logos/dagster_logo_primary.svg" alt="Dagster logo" width=300/>

# Integration of lakeFS with Dagster

**Use Case**: Isolating a Dagster job run on its own branch and atomically promoting the results to production

# Config

## lakeFS credentials

If you are not using the provided lakeFS server, enter your own details here.

In [1]:
lakefsEndPoint = 'http://lakefs:8000' # e.g. 'https://username.aws_region_name.lakefscloud.io' 
lakefsAccessKey = 'AKIAIOSFODNN7EXAMPLE'
lakefsSecretKey = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'

## Storage Information

If you are not using the provided MinIO object store, change the storage namespace to a location in the bucket you've configured. The storage namespace is the location in the underlying storage where this repository's data will be stored.

In [2]:
storageNamespace = 's3://example/dagster/' # e.g. "s3://username-lakefs-cloud/"
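As a quick sanity check before creating a repository, the storage namespace can be split into its scheme, bucket, and prefix. The following is a minimal sketch using only the Python standard library; the helper name `split_storage_namespace` is hypothetical and is not part of lakeFS or of this notebook.

```python
from urllib.parse import urlparse


def split_storage_namespace(namespace):
    """Split a namespace such as 's3://bucket/prefix/' into (scheme, bucket, prefix).

    Hypothetical helper for illustration only; lakeFS does not provide it.
    """
    parsed = urlparse(namespace)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"Not a valid storage namespace: {namespace!r}")
    # Strip leading and trailing slashes so the prefix is easy to compare or join.
    return parsed.scheme, parsed.netloc, parsed.path.strip("/")


scheme, bucket, prefix = split_storage_namespace("s3://example/dagster/")
print(scheme, bucket, prefix)
```

A namespace that lacks a scheme or a bucket raises `ValueError`, which surfaces a typo before any call is made to lakeFS.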

# Setup

You shouldn't need to change anything in this section.

## lakeFS repo name

In [3]:
repo_name = "dagster-existing-dag-repo"

## Branch names

In [4]:
sourceBranch = "main"
newBranch = "dagster_demo_existing_dag"

## Import Python packages

In [5]:
from dagster import execute_job, RunConfig
from jobs.Existing_DAG.lakefs_wrapper_dag import lakefs_wrapper_dag, LakeFSOpConfig

## Set environment variables

In [6]:
import os
os.environ["LAKEFS_ENDPOINT"] = lakefsEndPoint
os.environ["LAKEFS_CREDENTIALS_ACCESS_KEY_ID"] = lakefsAccessKey
os.environ["LAKEFS_CREDENTIALS_SECRET_ACCESS_KEY"] = lakefsSecretKey
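Because the jobs presumably read their lakeFS settings from these three environment variables, it can help to confirm that all of them are set before launching anything. The sketch below assumes only the variable names used in the cell above; the helper `missing_lakefs_env` is hypothetical.

```python
import os

# Variable names taken from the cell above.
REQUIRED_VARS = (
    "LAKEFS_ENDPOINT",
    "LAKEFS_CREDENTIALS_ACCESS_KEY_ID",
    "LAKEFS_CREDENTIALS_SECRET_ACCESS_KEY",
)


def missing_lakefs_env(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper; pass a dict to check something other than os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_lakefs_env()
if missing:
    print(f"Missing lakeFS settings: {', '.join(missing)}")
```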

## Set up the lakeFS Python client

_To learn more about lakeFS Python integration visit https://docs.lakefs.io/integrations/python.html_

In [7]:
import lakefs_client
from lakefs_client.models import *
from lakefs_client.client import LakeFSClient

# lakeFS credentials and endpoint
configuration = lakefs_client.Configuration()
configuration.username = lakefsAccessKey
configuration.password = lakefsSecretKey
configuration.host = lakefsEndPoint

lakefs = LakeFSClient(configuration)

#### Verify lakeFS credentials by getting lakeFS version

In [8]:
print("Verifying lakeFS credentials…")
try:
    v = lakefs.config.get_lake_fs_version()
except Exception:
    # Catch Exception rather than using a bare except, so Ctrl-C still works
    print("🛑 failed to get lakeFS version")
else:
    print(f"…✅lakeFS credentials verified\n\nℹ️lakeFS version {v.version}")

Verifying lakeFS credentials…
…✅lakeFS credentials verified

ℹ️lakeFS version 0.101.0


## Create Repository if needed

In [9]:
from lakefs_client.exceptions import NotFoundException

try:
    repo = lakefs.repositories.get_repository(repo_name)
    print(f"Found existing repo {repo.id} using storage namespace {repo.storage_namespace}")
except NotFoundException:
    print(f"Repository {repo_name} does not exist, so going to try and create it now.")
    try:
        repo = lakefs.repositories.create_repository(
            repository_creation=RepositoryCreation(
                name=repo_name,
                storage_namespace=f"{storageNamespace}/{repo_name}"))
        print(f"Created new repo {repo.id} using storage namespace {repo.storage_namespace}")
    except lakefs_client.ApiException as e:
        print(f"Error creating repo {repo_name}. Error is {e}")
        raise  # stop here; os._exit(00) would kill the kernel with a success status
except lakefs_client.ApiException as e:
    print(f"Error getting repo {repo_name}: {e}")
    raise  # stop here; os._exit(00) would kill the kernel with a success status


Repository dagster-existing-dag-repo does not exist, so going to try and create it now.
Created new repo dagster-existing-dag-repo using storage namespace s3://example/dagster//dagster-existing-dag-repo
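The recorded output above shows a doubled slash (`dagster//dagster-existing-dag-repo`) because `storageNamespace` already ends with `/` and another `/` is added when the repository name is appended. lakeFS accepted this path here, but a cleaner namespace can be built by trimming the trailing slash first. A minimal sketch, with a hypothetical helper name:

```python
def repo_storage_namespace(base, repo_name):
    """Join a base storage namespace and a repository name with exactly one slash.

    Hypothetical helper for illustration; not part of the lakeFS client.
    """
    return f"{base.rstrip('/')}/{repo_name}"


print(repo_storage_namespace("s3://example/dagster/", "dagster-existing-dag-repo"))
```

The result is the same whether or not the base namespace ends with a slash.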


----

## Dagster UI

From your *host machine* run the following: 
    
```bash
docker exec -t jupyter-notebook \
    dagster dev -h 0.0.0.0 -f jobs/Existing_DAG/lakefs_tutorial_taskflow_api_etl.py
```

You should see the output: 

```
dagster - INFO - Launching Dagster services...
dagster.daemon - INFO - Instance is configured with the following daemons:[…]
dagster.daemon.SensorDaemon - INFO - Not checking for any runs since no se[…]
dagit - INFO - Serving dagit on http://0.0.0.0:3000 in process 405
```

Open http://localhost:3000 in your web browser to see the Dagster UI.

You can review the [lakeFS Wrapper DAG](./jobs/Existing_DAG/lakefs_wrapper_dag.py) and [Dagster ETL DAG](./jobs/Existing_DAG/lakefs_tutorial_taskflow_api_etl.py) programs.

## Execute lakeFS Wrapper DAG

In [10]:
job_result = lakefs_wrapper_dag.execute_in_process(
    run_config=RunConfig(
        {
            "create_etl_branch": LakeFSOpConfig(repo=repo.id, sourceBranch=sourceBranch, newBranch=newBranch),
            "trigger_existing_dag": LakeFSOpConfig(repo=repo.id, sourceBranch=sourceBranch, newBranch=newBranch),
            "commit_etl_branch": LakeFSOpConfig(repo=repo.id, sourceBranch=sourceBranch, newBranch=newBranch),
            "merge_etl_branch": LakeFSOpConfig(repo=repo.id, sourceBranch=sourceBranch, newBranch=newBranch),
        }
    )
)

2023-06-02 16:51:02 +0000 - dagster - DEBUG - lakefs_wrapper_dag - f2019977-4641-47e4-ae55-08ee6b9a474a - 862 - RUN_START - Started execution of run for "lakefs_wrapper_dag".
2023-06-02 16:51:02 +0000 - dagster - DEBUG - lakefs_wrapper_dag - f2019977-4641-47e4-ae55-08ee6b9a474a - 862 - ENGINE_EVENT - Executing steps in process (pid: 862)
2023-06-02 16:51:02 +0000 - dagster - DEBUG - lakefs_wrapper_dag - f2019977-4641-47e4-ae55-08ee6b9a474a - 862 - RESOURCE_INIT_STARTED - Starting initialization of resources [io_manager].
2023-06-02 16:51:02 +0000 - dagster - DEBUG - lakefs_wrapper_dag - f2019977-4641-47e4-ae55-08ee6b9a474a - 862 - RESOURCE_INIT_SUCCESS - Finished initialization of resources [io_manager].
2023-06-02 16:51:02 +0000 - dagster - DEBUG - lakefs_wrapper_dag - f2019977-4641-47e4-ae55-08ee6b9a474a - 862 - LOGS_CAPTURED - Started capturing logs in process (pid: 862).
2023-06-02 16:51:02 +0000 - dagster - DEBUG - lakefs_wrapper_dag - f2019977-4641-47e4-ae55-08ee6b9a474a - 862 - 

Total order value is: 1236.70
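The run configuration in the cell above passes identical settings to all four ops. One way to cut the repetition is to build the mapping from a list of op names. The sketch below uses a stand-in dataclass, `OpConfigStandIn`, so that it runs without Dagster installed; it is an assumption-laden illustration, and in the notebook the real `LakeFSOpConfig` instance would be passed instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpConfigStandIn:
    """Stand-in for LakeFSOpConfig with the same three fields (illustration only)."""
    repo: str
    sourceBranch: str
    newBranch: str


# Op names as they appear in the run configuration above.
OP_NAMES = (
    "create_etl_branch",
    "trigger_existing_dag",
    "commit_etl_branch",
    "merge_etl_branch",
)


def build_ops_config(config, op_names=OP_NAMES):
    """Map every op name to the same config object (hypothetical helper)."""
    return {name: config for name in op_names}


ops = build_ops_config(
    OpConfigStandIn(
        repo="dagster-existing-dag-repo",
        sourceBranch="main",
        newBranch="dagster_demo_existing_dag",
    )
)
print(sorted(ops))
```

In the notebook, the resulting dictionary would take the place of the literal passed to `RunConfig(...)`.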


## lakeFS UI

Go to http://localhost:8000/repositories/dagster-existing-dag-repo/commits?ref=main to see the commits made to the repository, including those made by Dagster.
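If you changed the repository or branch name, the commits page link can be rebuilt from those values. The sketch below follows the URL pattern shown above and assumes the lakeFS UI is reachable from your host at `http://localhost:8000`, which may differ from the `lakefsEndPoint` used inside the container. The helper `commits_url` is hypothetical.

```python
from urllib.parse import quote, urlencode


def commits_url(ui_base, repo, ref):
    """Build the lakeFS UI commits page URL for a repository and ref.

    Hypothetical helper; the path layout is copied from the link above.
    """
    query = urlencode({"ref": ref})
    return f"{ui_base.rstrip('/')}/repositories/{quote(repo, safe='')}/commits?{query}"


print(commits_url("http://localhost:8000", "dagster-existing-dag-repo", "main"))
```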

## More Questions?

###### Join the lakeFS Slack group - https://lakefs.io/slack