# Dataflow notebook

Your Dataflow **must** be coded in this notebook. The platform provides automations that ensure the Dataflow follows the correct life cycle, which is why we ask you to use this notebook. Don’t panic! You will find some useful guidelines in the sections below.

## Dataflow development

First, we import the EnMa-SDK and the other packages required for Dataflow development.

```python
from enma import Dataflow, assertion, condition, ifelse, logs, metrics, parameters, task
```

Now, we develop the dataflow:

```python
@task(name="a")
def task_a(input_cond):
    # List repetition: builds a list of `input_cond` ones, e.g. 3 * [1] -> [1, 1, 1]
    result = input_cond * [1]
    return result


@condition()
def task_b(stat_choice):
    # Condition task: returns True when the "choice" parameter is 0
    return stat_choice == 0


@task(name="c")
def task_c(data):
    # Define logs and metrics at task level
    logs.add("log_task_c")
    metrics.add("metric_task_c", 3)
    return 3


@task(name="d")
def task_d(data):
    # Define logs and metrics at task level
    logs.add("log_task_d")
    metrics.add("metric_task_d", 3)
    return 4


@assertion(name="a_equal_b")
def assert_equals(a, b):
    assert a == b


with Dataflow("my-dummy-etl") as flow:

    # Define parameters as (key, value) pairs.
    n_elem = parameters.add("n_elem", 10)
    choice = parameters.add("choice", 0)

    # Define a task within the dataflow.
    out_1 = task_a(n_elem)

    # Metrics can be defined at dataflow level (as here) or
    # at task level (as in task_c and task_d).
    metrics.add("metric_flow", out_1)

    # ifelse evaluates the condition defined by task_b:
    # if it returns True, `task_c` is executed; otherwise, `task_d` is executed.
    out_2 = ifelse(
        condition=task_b(choice),
        true_condition=task_c(out_1),
        false_condition=task_d(out_1),
    )

    # Logs can be defined at dataflow level (as here) or
    # at task level (as in task_c and task_d).
    logs.add("log_flow")

    # Assertions let you define functions that check intermediate results;
    # with choice == 0, task_c runs and out_2 == 3, so this check holds.
    assert_equals(out_2, 3)
```
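To make the data path easier to follow, here is a plain-Python trace of what the dataflow computes with the default parameters (`n_elem=10`, `choice=0`). It does not use the EnMa-SDK: the helper names (`run_ifelse`, `trace` and the `*_plain` functions) are hypothetical, and we assume each task behaves like an ordinary function once executed. Branches are passed as zero-argument callables so that, as described for `ifelse`, only the selected branch runs.

```python
# Plain-Python trace of "my-dummy-etl" (no EnMa-SDK involved).
# Hypothetical helpers; assumes tasks behave like ordinary functions when executed.
from typing import Callable, List, TypeVar

T = TypeVar("T")


def task_a_plain(input_cond: int) -> List[int]:
    # List repetition: 10 * [1] -> a list with ten ones
    return input_cond * [1]


def task_b_plain(stat_choice: int) -> bool:
    return stat_choice == 0


def task_c_plain(data: List[int]) -> int:
    return 3


def task_d_plain(data: List[int]) -> int:
    return 4


def run_ifelse(cond: bool, if_true: Callable[[], T], if_false: Callable[[], T]) -> T:
    # Only the selected branch is evaluated
    return if_true() if cond else if_false()


def trace(n_elem: int = 10, choice: int = 0) -> int:
    out_1 = task_a_plain(n_elem)
    out_2 = run_ifelse(
        task_b_plain(choice),
        lambda: task_c_plain(out_1),
        lambda: task_d_plain(out_1),
    )
    return out_2


print(trace())          # 3: choice == 0, so task_c runs
print(trace(choice=1))  # 4: task_d runs instead
```

With any non-zero `choice`, the flow goes through `task_d` and the check in `assert_equals(out_2, 3)` would no longer hold.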

**Dataflow development - Completed**

## Dataflow local execution

```python
flow.pprint()
```

The `flow.pprint()` call above shows the dataflow pipeline diagram, with all the information related to each node.

Once our dataflow is developed, we can run it locally. The output of the run shows the pipeline structure again, the elapsed time for each node, and a table with the metrics generated by each task.

```python
flow.run()  # run() can also take a verbose argument, e.g. flow.run(verbose=0)
```
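The run output described above (elapsed time per node plus a table of metrics per task) can be mimicked with a toy sequential runner. This is only a sketch of the idea under our own assumptions, not how the EnMa-SDK implements `run()`: the `run_nodes` function, the `(name, function)` node tuples and the per-node metrics dictionaries are all hypothetical.

```python
# Toy sequential runner: times each node and collects metrics per task.
# Hypothetical sketch; NOT the EnMa-SDK implementation of flow.run().
import time
from typing import Any, Callable, Dict, List, Tuple

Node = Tuple[str, Callable[[Dict[str, float]], Any]]


def run_nodes(nodes: List[Node]) -> Tuple[Dict[str, float], Dict[str, Dict[str, float]]]:
    elapsed: Dict[str, float] = {}
    metrics: Dict[str, Dict[str, float]] = {}
    for name, fn in nodes:
        node_metrics: Dict[str, float] = {}
        start = time.perf_counter()
        fn(node_metrics)  # each node may record its own metrics
        elapsed[name] = time.perf_counter() - start
        if node_metrics:
            metrics[name] = node_metrics
    return elapsed, metrics


def node_c(m: Dict[str, float]) -> int:
    m["metric_task_c"] = 3  # mirrors metrics.add("metric_task_c", 3)
    return 3


elapsed, metrics = run_nodes([("a", lambda m: [1] * 10), ("c", node_c)])

for name, seconds in elapsed.items():
    print(f"{name}: {seconds:.6f} s")
print("task | metric | value")
for task_name, values in metrics.items():
    for key, value in values.items():
        print(f"{task_name} | {key} | {value}")
```

To keep it short, the sketch does not pass outputs between nodes or handle branching; it only shows how per-node timings and a metrics table can be gathered in one pass.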

## Register dataflow

Now we can register our Dataflow. The following function shows a form that pushes our Dataflow to Bitbucket and registers it in the Engine section of Sandbox.

```python
flow.register()
```

**Register dataflow - Completed**