## Introduction

In this codelab, we will learn how to create tasks with the `task` decorator and combine them into a basic `Dataflow` using EnMa-SDK. We will then learn how to visualize the flow and run it.

## Dependencies

This first example uses the following EnMa-SDK imports:

- **Dataflow:** the context manager used to define a custom `Dataflow`. It takes a parameter that gives the flow a name.
- **task:** the decorator used to define the flow's tasks. Every function decorated with `@task()` can be used inside the `Dataflow`. In this codelab we will use the `name` parameter to give the task an alias.
- **metrics:** a module we can use to record custom metrics during the flow execution.

In [1]:
from enma import Dataflow, metrics, task

In addition, our example uses the standard `random` and `time` modules, so we have to import them too.

In [2]:
import random
import time

## Tasks Definition

Once all the resources are imported, we can start defining the tasks for our Dataflow. The tasks in this example are deliberately simple and only show how tasks are used.

As you can see, we set the optional `name` parameter in the first two tasks, with a value different from the function name. When we execute the Dataflow we will see how this name is shown.

Inside some tasks we use the `metrics` module to define metrics for the Dataflow.

We can create as many `task` functions as our Dataflow requires.

One important thing to know about task functions is that they may or may not return a value, but if they do, **returning multiple values is not allowed**.

In [3]:
@task(name="my-load-data")
def load_data(input_cond):
    # Create example data list.
    result = input_cond * [1]
    
    # Define a metric for the data length.
    metrics.add("data_len", len(result))
    return result

@task(name="custom-transform-data")
def transform_data(data):
    # We do some transformation in data.
    result2 = [x if random.random() > 0.5 else 0 for x in data]
    
    # We simulate some delay during the process.
    time.sleep(5)
    
    return result2

@task()
def get_results(data):
    # This task just creates another metric.
    metrics.add("sum_of_data", sum(data))
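
The single-return-value rule above means that a task producing several results has to bundle them into one object. The following is a minimal sketch in plain Python. It assumes, which the text does not confirm, that a task may return any single Python object such as a dict. `summarize` is a hypothetical function, shown without the `@task()` decorator so that the snippet runs on its own.

```python
def summarize(data):
    # Instead of `return total, count` (two values), bundle both
    # results into one dict so the function returns a single object.
    return {"total": sum(data), "count": len(data)}


summary = summarize([1, 0, 1, 1])
print(summary["total"], summary["count"])
```

A downstream function can then read the fields it needs from that single object.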

## Dataflow Definition

Now that we have the tasks, we can define the Dataflow. `Dataflow` is a context manager, so we use the `with` statement, pass the name of the Dataflow as a parameter, and bind it to an alias (`flow`) so we can use it later.

Inside the `with` statement we define the flow like this:

In [4]:
with Dataflow("my-example-dataflow") as flow:
    # The first task loads the data.
    out_1 = load_data(10)
    
    # The second task is doing a simple transformation.
    out_2 = transform_data(out_1)
    
    # The last part is getting the results.
    get_results(out_2)

## Definition doesn't mean execution

Notice that, even though we introduced a delay as part of the process, the cell executed instantly.

This is because EnMa-SDK handles the definition process inside the context manager and skips the execution of the `task`-decorated functions used inside it.

Internally, EnMa-SDK builds the execution graph and keeps it ready until we run the `Dataflow`.
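
The idea of recording calls during definition and executing them later can be illustrated with a small toy in plain Python. This is only a sketch of the general pattern, not EnMa-SDK's implementation: `ToyFlow`, `Placeholder`, `toy_task` and `slow_double` are all hypothetical names invented for this illustration.

```python
import time

_active_flow = None  # the toy flow currently being defined, if any


class Placeholder:
    """Stands in for the future output of a recorded node."""

    def __init__(self, index):
        self.index = index


class ToyFlow:
    """Hypothetical flow that records task calls instead of running them."""

    def __init__(self, name):
        self.name = name
        self.nodes = []  # (function, args) pairs in definition order

    def __enter__(self):
        global _active_flow
        _active_flow = self
        return self

    def __exit__(self, exc_type, exc, tb):
        global _active_flow
        _active_flow = None
        return False

    def run(self):
        results = {}
        for index, (func, args) in enumerate(self.nodes):
            # Swap each placeholder for the real output of an earlier node.
            real_args = [
                results[a.index] if isinstance(a, Placeholder) else a
                for a in args
            ]
            results[index] = func(*real_args)
        return results


def toy_task(func):
    def wrapper(*args):
        if _active_flow is None:
            return func(*args)  # outside a flow: behave like a normal call
        # Inside a flow: record the call and return a placeholder.
        _active_flow.nodes.append((func, args))
        return Placeholder(len(_active_flow.nodes) - 1)

    return wrapper


@toy_task
def slow_double(x):
    time.sleep(0.2)  # simulated delay
    return 2 * x


start = time.perf_counter()
with ToyFlow("demo") as toy_flow:
    out = slow_double(21)  # recorded, not executed
definition_seconds = time.perf_counter() - start

results = toy_flow.run()  # the delay happens here
```

In this toy, `out` is only a placeholder after the `with` block, and the sleep is paid when `run()` is called, which mirrors the behaviour described above.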

One useful feature that EnMa-SDK offers is the ability to `pprint` the `Dataflow`.

In [5]:
flow.pprint()

In the graphical representation of the `Dataflow` we can see the execution order of the tasks. This can help us when we define more complicated flows.

Each box represents a `node`, which is the internal name for an execution unit. Each node has an **order number** associated with it, and the first line of the box shows its **type**, which is `Task` for all three nodes in this example. Later we will see that there are other types.

The second line of each box is the **name** of the `node`, or `task` in this case. The first two take the value of the `name` parameter we set in the `task` definition. Because that parameter is optional, the third `task` uses the **function name** as its **name**.

The third line shows the status of the `node`. As we said before, the tasks are not executed until we run the `Dataflow`, so the status of all three tasks is `non executed`.
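
The fallback from an optional `name` to the function name is a common decorator pattern. The sketch below shows one way it could be written in plain Python. `named_task` and the `display_name` attribute are hypothetical and say nothing about how EnMa-SDK's `task` decorator is actually implemented.

```python
def named_task(name=None):
    """Hypothetical decorator factory with an optional alias."""

    def decorator(func):
        # Use the alias when given, otherwise fall back to the function name.
        func.display_name = name if name is not None else func.__name__
        return func

    return decorator


@named_task(name="my-load-data")
def load_demo(n):
    return n * [1]


@named_task()
def get_results_demo(data):
    return sum(data)


print(load_demo.display_name)
print(get_results_demo.display_name)
```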

## Dataflow Execution

We can run the `Dataflow` from here by calling its `run` method.

In [6]:
flow.run()

After a while, depending on the value passed to the `sleep` call, we get some output as the result of the `run`.

First we have the `Dataflow` **representation**. It is the same graph we saw when we called `pprint` on the flow, but now all tasks show `executed` in the status row. In fact, if we call `flow.pprint()` again we get this same result, because the `Dataflow` has now been executed and the information is stored internally.

After the flow representation we have a summary of the **performance** of each task. Two points are important here. First, only `task` nodes are measured. Second, the `elapsed` value for a `task` only takes into account the real computation time, which is why the measured time of the second task is smaller than the sleep time.
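
The gap between waiting time and computation time can be reproduced with Python's standard library. How EnMa-SDK measures `elapsed` internally is not stated here, so the snippet below is only an analogy: `time.perf_counter()` includes time spent sleeping, while `time.process_time()` does not.

```python
import time

wall_start = time.perf_counter()  # wall-clock timer, includes sleeping
cpu_start = time.process_time()   # CPU timer, excludes sleeping

time.sleep(0.5)                   # waiting, no computation
total = sum(range(200_000))       # a small amount of real computation

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f"wall-clock: {wall_elapsed:.3f}s, CPU: {cpu_elapsed:.3f}s")
```

The wall-clock figure is at least as long as the sleep, whereas the CPU figure only reflects the short summation.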

The last part of the output covers the **metrics**. Each row of the table shows the node order number, the name of the node, the metric name and the value of the metric. Using the first two columns we can check in the graph representation where in the flow the metric was created.
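
To make the four columns concrete, the sketch below replays the example's logic in plain Python and collects rows of the same shape. It does not use EnMa-SDK and does not reproduce its output format. `add_metric` is a hypothetical helper, and the order numbers (starting at 0) are an assumption made only for illustration.

```python
import random

rows = []  # each row: (node order, node name, metric name, value)


def add_metric(order, node_name, metric_name, value):
    # Hypothetical helper standing in for a metrics registry.
    rows.append((order, node_name, metric_name, value))


random.seed(0)  # only to make this replay repeatable

# Same logic as load_data(10): the metric is the list length.
data = 10 * [1]
add_metric(0, "my-load-data", "data_len", len(data))

# Same logic as transform_data, without the delay; it records no metric.
transformed = [x if random.random() > 0.5 else 0 for x in data]

# Same logic as get_results: the metric is the sum of the data.
add_metric(2, "get_results", "sum_of_data", sum(transformed))

for row in rows:
    print(row)
```

The `data_len` value is always 10 for this input, while `sum_of_data` depends on the random draws and lies between 0 and 10.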

Congratulations! We built our first `Dataflow` using EnMa-SDK, covered its concepts and parts, ran it, and checked the output. Now we are ready to learn more advanced features of EnMa-SDK.