In [5]:
from datetime import datetime as dt
import papermill as pm
import os


# Entrypoint

We'll use this notebook as the entry point for a simple pipeline of other notebooks. To do this we use Papermill, which takes an input notebook, executes it, and saves the executed copy to an output location. We save the executed notebooks to a `history` folder so that we keep a record of previous runs.

## Create Output Directory

Papermill won't create directories for you, so the first step is to create an output directory to save files into:

In [6]:
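# Capture the timestamp once so the date and time always describe the same instant
now = dt.now()
executionDate = now.strftime("%Y%m%d")
executionTime = now.strftime("%H%M%S")

# One folder per day; each run is distinguished by its time prefix
executionDir = "./history/{}".format(executionDate)

try:
    os.makedirs(executionDir)
    print('Successfully created output directory {}/'.format(executionDir))
except OSError:
    # Raised when the directory already exists or cannot be created (e.g. permissions)
    print('Could not create output directory {}/, maybe it already exists?'.format(executionDir))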

Could not create output directory ./history/20220820/, maybe it already exists?
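
The "maybe it already exists?" message above appears on every run after the first one of the day. As a minimal sketch of one alternative, the helper below relies on the `exist_ok=True` argument of `os.makedirs` so that an existing directory is not treated as a failure. The name `ensure_execution_dir` and its parameters are hypothetical and only serve this illustration.

```python
import os


def ensure_execution_dir(base_dir, now):
    """Create <base_dir>/<YYYYMMDD> if it is missing and return its path.

    Hypothetical helper: `base_dir` is the history folder and `now` is a
    datetime describing the current run.
    """
    execution_dir = os.path.join(base_dir, now.strftime("%Y%m%d"))
    # exist_ok=True turns a repeated call for the same day into a no-op;
    # other problems (such as missing permissions) still raise OSError.
    os.makedirs(execution_dir, exist_ok=True)
    return execution_dir
```

In this notebook it could be called as `ensure_execution_dir("./history", dt.now())`. A genuine failure would then surface as an exception instead of a printed message.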


## Step 1: Run the Extract notebook

This step runs the `./notebooks/extract.ipynb` notebook and saves a copy of the executed notebook to the `./history` folder, so we keep a record of the run.

In [7]:
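results = pm.execute_notebook(
    './notebooks/extract.ipynb',   # input notebook
    './history/{}/{}-extract-output.ipynb'.format(executionDate, executionTime),  # executed copy
    cwd='./notebooks/'             # run the notebook from its own folder
)

# Note: cells[0] holds the duration recorded for the first cell only, not for the whole notebook.
print('Notebook successfully executed in {} seconds.'.format(results['cells'][0]['metadata']['papermill']['duration']))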

Executing:   0%|          | 0/6 [00:00<?, ?cell/s]

Notebook successfully executed in 0.053995 seconds.
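
The figure printed above is read from `results['cells'][0]`, so it reflects the first cell only. As a rough sketch, the per-cell values could be summed to approximate the time spent across all cells. This assumes every executed cell carries a `metadata.papermill.duration` entry like the one accessed above; cells without it are skipped. The function name `total_cell_duration` is hypothetical.

```python
def total_cell_duration(nb):
    """Sum the per-cell durations recorded in an executed notebook.

    `nb` is the dict-like object returned by the execute call above.
    Cells without a recorded duration are ignored.
    """
    total = 0.0
    for cell in nb.get('cells', []):
        duration = cell.get('metadata', {}).get('papermill', {}).get('duration')
        if duration is not None:
            total += duration
    return total
```

It could be used as `total_cell_duration(results)` in place of the `cells[0]` lookup. The sum excludes any overhead between cells, such as kernel start-up.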


## Step 2: Run the Transform notebook

This step runs the `./notebooks/transform.ipynb` notebook and saves a copy of the executed notebook to the `./history` folder, so we keep a record of the run.

In [8]:
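results = pm.execute_notebook(
    './notebooks/transform.ipynb',  # input notebook
    './history/{}/{}-transform-output.ipynb'.format(executionDate, executionTime),  # executed copy
    cwd='./notebooks/'              # run the notebook from its own folder
)

# Note: cells[0] holds the duration recorded for the first cell only, not for the whole notebook.
print('Notebook successfully executed in {} seconds.'.format(results['cells'][0]['metadata']['papermill']['duration']))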

Executing:   0%|          | 0/13 [00:00<?, ?cell/s]

Notebook successfully executed in 0.210468 seconds.
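
Steps 1 and 2 differ only in the notebook name, so the two cells could be folded into one loop. The sketch below is one possible way to do that, not part of the original pipeline. `run_pipeline` and its parameters are hypothetical, and `execute` is expected to be a callable with the same shape as the `pm.execute_notebook` calls above (input path, output path, `cwd` keyword).

```python
def run_pipeline(steps, execution_date, execution_time, execute):
    """Run each named notebook in order and return the output paths.

    Hypothetical helper: `steps` is a list of notebook names without the
    .ipynb extension, and `execute` is called as
    execute(input_path, output_path, cwd=...).
    """
    outputs = []
    for step in steps:
        input_path = './notebooks/{}.ipynb'.format(step)
        output_path = './history/{}/{}-{}-output.ipynb'.format(
            execution_date, execution_time, step
        )
        # Steps run sequentially; an exception in one step stops the pipeline
        execute(input_path, output_path, cwd='./notebooks/')
        outputs.append(output_path)
    return outputs
```

In this notebook the call would look like `run_pipeline(['extract', 'transform'], executionDate, executionTime, pm.execute_notebook)`. Passing the executor as an argument keeps the helper independent of Papermill, so it can be exercised with a stand-in function.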
