# Setup

1. Run `make reinstall_package`
2. Run `cp .env.sample .env`, then fill in `.env` except for the MLflow variables (leave those for the livecode)
3. Run `make reset_all_files`
4. Run `make run_preprocess` to create the 1k BigQuery table

# MLflow livecode

🕵️‍♀️ Start by looking at and explaining the decorator in `registry.py`, then show where it is now used in `main.py`
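
As a rough illustration of what such a decorator does, the sketch below wraps a function so that each call runs inside a tracking run. It is not the code in `registry.py`: `fake_run`, `tracked_run`, and `LOGGED_EVENTS` are hypothetical names, and `fake_run` is only a stand-in for a real tracking context such as `mlflow.start_run()`.

```python
import functools
from contextlib import contextmanager

# Hypothetical in-memory log standing in for an experiment tracker (NOT the MLflow API)
LOGGED_EVENTS = []


@contextmanager
def fake_run(name):
    """Hypothetical stand-in for a tracking run such as `mlflow.start_run()`."""
    LOGGED_EVENTS.append(f"start:{name}")
    try:
        yield
    finally:
        # Always close the run, even if the wrapped function raises
        LOGGED_EVENTS.append(f"end:{name}")


def tracked_run(func):
    """Decorator: execute `func` inside a tracking run."""
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        with fake_run(func.__name__):
            return func(*args, **kwargs)
    return wrapper


@tracked_run
def train(split_ratio=0.2):
    """Placeholder for a real training entry point."""
    return {"split_ratio": split_ratio}


if __name__ == "__main__":
    print(train(split_ratio=0.3))
    print(LOGGED_EVENTS)
```

The point to highlight is that the decorated function's body is unchanged: the tracking logic lives entirely in the wrapper.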


💻 

- Start by filling in the necessary variables in `.env`, then don't forget to run `direnv reload`
- Then run `make run_train` and check out the results in the MLflow UI
- Update `save_model`

```python
if MODEL_TARGET == "mlflow":
    # Log the model as an artifact and register it under MLFLOW_MODEL_NAME
    mlflow.tensorflow.log_model(
        model=model,
        artifact_path="model",
        registered_model_name=MLFLOW_MODEL_NAME
    )
```
Then run `make run_train`, show the model in the MLflow UI, and move it to the Production stage

- Update `load_model`
```python
if MODEL_TARGET == "mlflow":
    mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
    client = MlflowClient()
    # `stage` is assumed to be an argument of `load_model`, e.g. "Production"
    model_versions = client.get_latest_versions(name=MLFLOW_MODEL_NAME, stages=[stage])
    # Raises IndexError if no model version exists in that stage
    model_uri = model_versions[0].source
    model = mlflow.tensorflow.load_model(model_uri=model_uri)
```
- Run a prediction with the loaded model



# Workflow deconstruction

- Create a new file in `interface` called `workflow.py`

- Populate it, either by live coding or by copying the code below; keep this section fast so you can get to Prefect!

```python
from taxifare.interface.main import evaluate, preprocess, train

def preprocess_new_data(min_date: str, max_date: str):
    preprocess(min_date=min_date, max_date=max_date)

def evaluate_production_model(min_date: str, max_date: str):
    eval_mae = evaluate(min_date=min_date, max_date=max_date)
    return eval_mae

def re_train(min_date: str, max_date: str):
    train_mae = train(min_date=min_date, max_date=max_date, split_ratio=0.2)
    return train_mae

def train_flow():
    min_date = "2015-01-01"
    max_date = "2015-02-01"
    preprocess_new_data(min_date, max_date)
    old_mae = evaluate_production_model(min_date, max_date)
    new_mae = re_train(min_date, max_date)

if __name__ == "__main__":
    train_flow()
```

- Then run the script!

- Check out the new model

# Prefect livecode

- Show how to connect to Prefect Cloud with `prefect cloud login`

- Import `task` and `flow` from `prefect`, then decorate the tasks and the flow as appropriate

```python
from taxifare.interface.main import evaluate, preprocess, train
from prefect import task, flow

@task
def preprocess_new_data(min_date: str, max_date: str):
    preprocess(min_date=min_date, max_date=max_date)

@task
def evaluate_production_model(min_date: str, max_date: str):
    eval_mae = evaluate(min_date=min_date, max_date=max_date)
    return eval_mae

@task
def re_train(min_date: str, max_date: str):
    train_mae = train(min_date=min_date, max_date=max_date, split_ratio=0.2)
    return train_mae

@flow
def train_flow():
    min_date = "2015-01-01"
    max_date = "2015-02-01"
    preprocess_new_data(min_date, max_date)
    evaluate_production_model(min_date, max_date)
    re_train(min_date, max_date)

if __name__ == "__main__":
    train_flow()
```

- Then rerun the script and watch it happen live in the Prefect UI

- We can now optimize by submitting the tasks instead, so they can run concurrently

```python
@flow
def train_flow():
    min_date = "2015-01-01"
    max_date = "2015-02-01"
    preprocess_new_data.submit(min_date, max_date)
    evaluate_production_model.submit(min_date, max_date)
    re_train.submit(min_date, max_date)
```

- Rerun and observe in the Prefect UI the problem we now have: preprocessing happens at the same time as retraining

- Finally, make the other tasks wait for the preprocessing!


```python
@flow
def train_flow():
    min_date = "2015-01-01"
    max_date = "2015-02-01"
    preprocessed = preprocess_new_data.submit(min_date, max_date)
    evaluate_production_model.submit(min_date, max_date, wait_for=[preprocessed])
    re_train.submit(min_date, max_date, wait_for=[preprocessed])
```
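
To make the ordering problem concrete outside of Prefect, here is a minimal sketch using only the standard library's `concurrent.futures`. It is an analogy, not how Prefect implements `submit` or `wait_for`: the task bodies are placeholders, `run_unordered` and `run_ordered` are hypothetical names, and the ordered version simply blocks on the preprocessing future, whereas Prefect's `wait_for` declares the dependency and lets the task runner schedule it.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def preprocess_new_data(log):
    """Placeholder task: pretend preprocessing is slow."""
    time.sleep(0.2)
    log.append("preprocess done")


def re_train(log):
    """Placeholder task: record when training starts."""
    log.append("train start")


def run_unordered():
    """Submit both steps at once: training can start before preprocessing ends."""
    log = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        pool.submit(preprocess_new_data, log)
        pool.submit(re_train, log)
    # Leaving the `with` block waits for both submitted calls to finish
    return log


def run_ordered():
    """Wait for the preprocessing future before submitting the training step."""
    log = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        preprocessed = pool.submit(preprocess_new_data, log)
        preprocessed.result()  # block until preprocessing has finished
        pool.submit(re_train, log)
    return log


if __name__ == "__main__":
    print("unordered:", run_unordered())
    print("ordered:  ", run_ordered())
```

In the unordered run, training typically starts while preprocessing is still sleeping; in the ordered run, it always starts afterwards.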