# Workflow Orchestration

Workflow orchestration frameworks are primarily used to monitor and observe the movement of data in production applications. Such frameworks typically include a family of independent features that collectively make modern data pipelines fault-tolerant and robust. These features include:

* scheduling and triggering jobs
* retrying failed work
* dependency and state management
* caching expensive tasks
* resource management
* observability

These features allow us to handle failure events gracefully, including scenarios beyond our control like cloud outages or API failures. Data pipelines that do not explicitly track state become prone to triggering jobs prematurely, re-running already completed work, or failing haphazardly.
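As a rough illustration of explicit state tracking, the sketch below records each task's state in a JSON file and skips any task already marked as completed. The names `load_states` and `run_once`, the state labels, and the file layout are hypothetical choices for this example, not the API of any particular orchestration framework.

```python
import json
from pathlib import Path


def load_states(path):
    """Read the task-state mapping from disk, or start with an empty one."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text())
    return {}


def run_once(name, task, state_file):
    """Run `task` unless a previous run already completed it.

    Returns True if the task ran in this call, False if it was skipped.
    """
    states = load_states(state_file)
    if states.get(name) == "COMPLETED":
        return False  # already done: do not repeat the work

    try:
        task()
        states[name] = "COMPLETED"
    except Exception:
        states[name] = "FAILED"  # record the failure so it is visible later
        raise
    finally:
        # Persist the new state whether the task succeeded or failed
        Path(state_file).write_text(json.dumps(states))
    return True
```

Calling `run_once("extract", extract, "states.json")` twice would run a hypothetical `extract` function only the first time, which is the behavior that prevents duplicated work after a restart.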

The features that workflow orchestration provides are not limited to supporting the scheduled movement of data from a source to a destination. They are also heavily applied in other domains, such as machine learning and parameterized report generation. Workflow orchestration is also becoming simple enough for hobbyists to adopt for personal projects.


#### Negative Engineering
Negative engineering happens when engineers write defensive code to make sure the positive code actually runs. In other words, it is code that anticipates the seemingly endless number of ways a pipeline can fail.
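One common piece of negative engineering is retrying a call that fails for temporary reasons. The sketch below is a minimal, hand-rolled version with exponential backoff. The function name `retry` and its parameters are assumptions made for illustration; orchestration frameworks usually offer this as a built-in setting instead.

```python
import time


def retry(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `task` until it succeeds or `max_attempts` is reached.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists only so the waiting can be swapped out in tests. Note that none of this code does the "positive" work itself; it exists purely to keep that work running.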

#### Why this matters to you
- continually patching legacy pipelines

#### Consequences of pipeline failures
* time spent finding where in the pipeline the failure occurred
* premature job triggers
* data staleness 
* expensive compute rerunning tasks 
* duplicating work


#### Common workflow patterns

- ETL 
- ELT
- ML
- Dashboarding
- DevOps


#### Exercise: Native Python Work Example

I have a pair of shoes I really want to buy, but I have a tight budget. I want to find out when the price drops so that I can buy them. For this example, I will create a Python script that finds the price of the shoes online, compares it to my budget, and prints out whether or not I should buy the shoes.

In [5]:
!pip install requests beautifulsoup4



In [4]:
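import re
import time

import requests
from bs4 import BeautifulSoup

def find_nike_price(url):
    """Scrape the product page at `url` and return the price as a whole number."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    soup = BeautifulSoup(response.text, 'html.parser')
    price_tag = soup.find('div', {"class": "product-price"})
    if price_tag is None:
        raise ValueError("Could not find a product-price element on the page")
    # Remove spaces, then take the first run of digits (e.g. "$160" -> 160)
    price_string = price_tag.text.replace(' ', '')
    match = re.search(r'[0-9]+', price_string)
    if match is None:
        raise ValueError(f"No digits in price text: {price_string!r}")
    return int(match.group(0))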

def compare_price(price, budget):
    """Print a buy / don't-buy recommendation for the given budget."""
    if price <= budget:
        print("Buy the shoes! Good deal!")
    else:
        print("Don't buy the shoes. They're too expensive")

def nike_flow(url, budget):
    price = find_nike_price(url)
    compare_price(price, budget)

if __name__ == "__main__":
    
    url = "https://www.nike.com/t/air-max-270-womens-shoes-Pgb94t/AH6789-601"
    budget = 120

    # time.sleep with infinite loop to put this on a schedule

    # while True:
    #     time.sleep(300)
    #     nike_flow(url, budget)

    nike_flow(url, budget)

Don't buy the shoes. They're too expensive
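The commented-out `while True` loop in the script is the simplest way to put the flow on a schedule, but a single failed request would end it. A slightly more defensive version might look like the sketch below. The function `run_on_schedule` and its parameters are hypothetical names introduced for this example; the `max_runs` and `sleep` arguments are there mainly so the loop can be tried out without waiting.

```python
import time


def run_on_schedule(flow, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call `flow` repeatedly, pausing `interval_seconds` between runs.

    A failed run is reported and counted, but does not stop later runs.
    `max_runs=None` loops forever. Returns the number of failed runs.
    """
    runs = 0
    failures = 0
    while max_runs is None or runs < max_runs:
        try:
            flow()
        except Exception as exc:
            failures += 1
            print(f"Run {runs + 1} failed: {exc!r}")
        runs += 1
        # Only pause if another run is coming
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return failures
```

With the script above, `run_on_schedule(lambda: nike_flow(url, budget), interval_seconds=300)` would check the price every five minutes. Hand-written loops like this are the kind of code that a workflow orchestration framework is meant to replace.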


#### Discussion: What Can You Use Workflow Orchestration For?

For Funsies:
- March Madness brackets
- Notification on shoe prices 
- Turning off your lights (us not being lazy)
- Notifications on crypto 

#### Q&A