# Pipeline

This notebook executes the [Transform notebook](<./01 - transform.ipynb>) followed by the [Load Elastic notebook](<./02 - load-elastic.ipynb>). It also saves the executed copy of each notebook to the `history` folder.

## Imports

All of my notebooks, even the scheduling ones, start with imports!

In [1]:
from datetime import datetime as dt
import papermill as pm
import os


## Setup

Imports are followed by setup. In this case we'll define the date and time of this execution and make sure that the history output directory exists so Papermill can write to it.

In [3]:
# Take one timestamp so the date and time cannot straddle midnight
now = dt.now()
executionDate = now.strftime("%Y%m%d")
executionTime = now.strftime("%H%M%S")

executionDir = "./history/{}".format(executionDate)

try:
    os.makedirs(executionDir)
    print('Successfully created output directory {}/'.format(executionDir))
except OSError:
    # Raised when the directory already exists or cannot be created
    print('Could not create output directory {}/, maybe it already exists?'.format(executionDir))



Could not create output directory ./history/20231123/, maybe it already exists?
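The setup step can also be written without the `try`/`except`. The following is a minimal sketch, not the notebook's actual code: `make_execution_dir` is a hypothetical helper name, and it relies on the `exist_ok=True` argument of `os.makedirs`, so rerunning on the same day is silently accepted while genuine failures (for example, missing permissions) still raise.

```python
import os
import tempfile
from datetime import datetime as dt


def make_execution_dir(base_dir, now=None):
    """Create <base_dir>/<YYYYMMDD> if needed; return (path, date, time)."""
    now = now or dt.now()  # one timestamp, so date and time always agree
    execution_date = now.strftime("%Y%m%d")
    execution_time = now.strftime("%H%M%S")
    execution_dir = os.path.join(base_dir, execution_date)
    # exist_ok=True turns "already exists" into a no-op;
    # other problems (e.g. permissions) still raise OSError.
    os.makedirs(execution_dir, exist_ok=True)
    return execution_dir, execution_date, execution_time


# Demonstrated in a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    first = make_execution_dir(tmp, dt(2023, 11, 23, 9, 30, 0))
    # Second run on the same day: the directory exists, and no error is raised.
    second = make_execution_dir(tmp, dt(2023, 11, 23, 17, 45, 10))
    created = os.path.isdir(first[0])
```

In the pipeline itself the call would be something like `make_execution_dir("./history")`, with the returned date and time used to build the output file names.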


## Step 1: Run Transform Notebook

This project is a little different. Normally the pipeline would start with an Extract notebook, but here extraction is handled elsewhere. So we start with the Transform notebook, which loads the data from DigitalOcean Spaces, processes it, and saves the processed data back to DO Spaces.

In [6]:
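# Run the notebook from inside ./notebooks/ so its relative paths resolve,
# and write the executed copy into today's history folder
results = pm.execute_notebook(
    './notebooks/01 - transform.ipynb',
    './history/{}/{}-transform-output.ipynb'.format(executionDate, executionTime),
    cwd='./notebooks/'
)

# Note: this reads the metadata of the first cell only, so the duration printed is that cell's run time
print('Notebook successfully executed in {} seconds.'.format(results['cells'][0]['metadata']['papermill']['duration']))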


Executing:   0%|          | 0/20 [00:00<?, ?cell/s]


Notebook successfully executed in 0.00602 seconds.
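The print statement above reads `results['cells'][0]`, so the few milliseconds it reports are the run time of the first cell rather than of the whole notebook. Below is a hedged sketch of how the total could be read instead. It assumes Papermill records a notebook-level `duration` under `metadata.papermill` in the returned notebook (as recent releases appear to do) and falls back to summing the per-cell values. `notebook_duration` is a hypothetical helper, and `sample` is a hand-written stand-in for a real result, not actual output.

```python
def notebook_duration(nb):
    """Return the total run time in seconds from Papermill metadata, or None."""
    # Assumed location of the notebook-level duration.
    duration = nb.get("metadata", {}).get("papermill", {}).get("duration")
    if duration is not None:
        return duration
    # Fallback: add up whatever per-cell durations were recorded.
    known = [
        cell["metadata"]["papermill"]["duration"]
        for cell in nb.get("cells", [])
        if cell.get("metadata", {}).get("papermill", {}).get("duration") is not None
    ]
    return sum(known) if known else None


# Hypothetical stand-in for the object returned by pm.execute_notebook.
sample = {
    "metadata": {"papermill": {"duration": 12.5}},
    "cells": [
        {"metadata": {"papermill": {"duration": 0.25}}},
        {"metadata": {"papermill": {"duration": 1.5}}},
    ],
}
total = notebook_duration(sample)

# Same cells, but without the notebook-level value: the fallback sums them.
cells_only = {"metadata": {}, "cells": sample["cells"]}
summed = notebook_duration(cells_only)
```

Since the notebook object returned by `pm.execute_notebook` behaves like a dictionary, `notebook_duration(results)` should work on it directly.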


## Step 2: Run the Load Elastic Notebook

Now it's time to take the data processed in Step 1 and load it into Elastic.

In [9]:
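# Same pattern as Step 1: run from ./notebooks/ and keep the executed copy in history
results = pm.execute_notebook(
    './notebooks/02 - load-elastic.ipynb',
    './history/{}/{}-load-elastic-output.ipynb'.format(executionDate, executionTime),
    cwd='./notebooks/'
)

# Note: this reads the metadata of the first cell only, so the duration printed is that cell's run time
print('Notebook successfully executed in {} seconds.'.format(results['cells'][0]['metadata']['papermill']['duration']))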


Executing:   0%|          | 0/12 [00:00<?, ?cell/s]


Notebook successfully executed in 0.003196 seconds.
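Both steps follow the same pattern, so they could be driven from a list. The sketch below is one possible arrangement, not how this notebook is actually written: `run_pipeline` and its parameters are hypothetical names, and the `execute` argument exists only so the function can be exercised without Papermill installed. It relies on `pm.execute_notebook` raising an exception when a cell fails, so that a failed Transform step stops the pipeline before Load Elastic runs.

```python
import os


def run_pipeline(steps, history_dir, time_stamp, execute=None):
    """Run notebooks in order and stop at the first failure.

    steps:   list of (input_path, short_name) pairs
    execute: callable taking (input_path, output_path, cwd=...);
             defaults to papermill.execute_notebook
    """
    if execute is None:
        import papermill as pm  # imported lazily, only when really running
        execute = pm.execute_notebook

    outputs = []
    for input_path, name in steps:
        output_path = os.path.join(
            history_dir, "{}-{}-output.ipynb".format(time_stamp, name)
        )
        try:
            # Run each notebook from its own folder, as the cells above do.
            execute(input_path, output_path, cwd=os.path.dirname(input_path))
        except Exception as error:
            print("Step '{}' failed: {}".format(name, error))
            raise  # do not continue with later steps
        outputs.append(output_path)
    return outputs
```

With the variables from the Setup cell, a call might look like `run_pipeline([('./notebooks/01 - transform.ipynb', 'transform'), ('./notebooks/02 - load-elastic.ipynb', 'load-elastic')], executionDir, executionTime)`.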
