# Train, Test, and Redeploy Our LLM

This notebook demonstrates how to fine-tune a Hugging Face LLM on domain-specific data and build an automated ML pipeline that covers data collection and preparation, training, evaluation, testing, and deployment of an application pipeline with the new LLM.

The [**previous notebook**](./01-serving.ipynb) used the standard `GPT2-Medium` model from Hugging Face, which generated poor answers when asked about MLOps, but that is about to change.

The tutorial has three main steps:
1. [Define the MLRun project and set all MLRun functions](#project-setup)
2. [Run full LLM life-cycle workflow](#full-workflow)
3. [Try the new model with Gradio](#use-gradio)

___
<a id="project-setup"></a>
## 1. Define the MLRun project and set all MLRun functions

Create or load an MLRun project that holds all your functions and configuration (see [**project_setup.py**](./src/project_setup.py)).

The project contains the following files:
* [data_collection.py](./src/data_collection.py) - Collect all text from a given list of URLs.
* [data_preprocess.py](./src/data_preprocess.py) - Preprocess the data and save it as a `pd.DataFrame` dataset artifact.
* [training.py]() - Train and evaluate using the Hugging Face `Trainer` API and DeepSpeed, powered by **MLRun's auto-logging** (`apply_mlrun` function).
* [serving.py](./src/serving.py) - Multiple model servers and serving steps to build the **Serving Graph** from notebook 01.
* [testing.py](./src/testing.py) - Stress test the serving graph.

It also contains a training pipeline (in [**training_workflow.py**](./src/training_workflow.py)).

The training and evaluation function we will use is [hugging_face_classifier_trainer](https://www.mlrun.org/hub/). It is taken from [**MLRun's Functions Hub**](https://docs.mlrun.org/en/stable/runtimes/load-from-hub.html), a collection of ready-to-import functions for a variety of use cases. We import the function during the project setup.

In [1]:
import mlrun
from src.project_setup import create_and_set_project

project = create_and_set_project(
    git_source="git://github.com/yonishelach/learn-docs.git#main",
    name="mlopspedia",
    default_image="yonishelach/mlrun-hf-gpu",
    user_project=True,
)

> 2023-03-06 09:49:51,927 [info] loaded project huggingface-demo from MLRun DB


Names with underscore '_' are about to be deprecated, use dashes '-' instead. Replacing underscores with dashes.
Names with underscore '_' are about to be deprecated, use dashes '-' instead. Replacing underscores with dashes.


___
<a id="full-workflow"></a>
## 2. Run full LLM life-cycle workflow

Run the training pipeline (in [training_workflow.py](./src/training_workflow.py)) by using `project.run(workflow name, ...)`:

    collect_html_to_text_files -> prepare_dataset -> train -> (serving graph) -> model_server_tester
                                                           -> evaluate
                                                          
* `collect_html_to_text_files` (Data Collection) - Collect all text from the given HTML URLs into `.txt` files.
* `prepare_dataset` (Preprocess Data) - Join the `.txt` files and reformat the text into our "Subject - Content" prompt template. Every header (`<h>` tags) becomes the *subject* of a prompt, and the text (`<p>` tags) under it becomes its *content*.
* `train` - Fine-tune the LLM on the data. We run the training on **OpenMPI** and use **DeepSpeed** to distribute the model and data across multiple workers, splitting the work between nodes and GPUs.
* `evaluate` - Evaluate our model using the *Perplexity* metric.
* Deployment - Deploy the same serving graph we saw in notebook 01: `-> preprocess -> llm -> postprocess -> toxicity classifier ->`
* `model_server_tester` (Stress Test) - Send data to our serving endpoint and get a performance report.
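
The header-to-subject, paragraph-to-content mapping used by `prepare_dataset` can be illustrated with a small standalone sketch. This is not the code from `data_preprocess.py`: the class `SubjectContentParser`, the function `html_to_prompts`, and the exact `Subject: ... / Content: ...` layout are all hypothetical names and formatting chosen for illustration, and the sketch parses HTML directly rather than the intermediate `.txt` files.

```python
from html.parser import HTMLParser


class SubjectContentParser(HTMLParser):
    """Pair each header with the paragraph text that follows it (illustrative only)."""

    HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.sections = []        # list of [subject, [paragraph, ...]]
        self._current_tag = None  # header or <p> tag currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Start collecting text only for headers and paragraphs.
        if tag in self.HEADER_TAGS or tag == "p":
            self._current_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Ignore closing tags of inline elements nested inside a header or paragraph.
        if tag != self._current_tag:
            return
        text = " ".join("".join(self._buffer).split())  # normalize whitespace
        if tag in self.HEADER_TAGS:
            self.sections.append([text, []])
        elif text and self.sections:
            # Paragraphs that appear before any header are dropped.
            self.sections[-1][1].append(text)
        self._current_tag = None


def html_to_prompts(html):
    """Return one prompt string per header that has at least one paragraph."""
    parser = SubjectContentParser()
    parser.feed(html)
    parser.close()
    return [
        f"Subject: {subject}\nContent: {' '.join(paragraphs)}"
        for subject, paragraphs in parser.sections
        if paragraphs
    ]
```

Calling `html_to_prompts("<h1>MLOps</h1><p>Machine learning operations.</p>")` yields a single prompt whose subject is the header text and whose content is the paragraph under it.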

In [None]:
workflow_run = project.run(
    name="training_workflow",
    arguments={
        "dataset_name": "Shayanvsf/US_Airline_Sentiment",
        "pretrained_tokenizer": "distilbert-base-uncased",
        "pretrained_model": "distilbert-base-uncased",
        "TRAIN_output_dir": "finetuning-sentiment-model-3000-samples",
        "TRAIN_num_train_epochs": 1,
        "TRAIN_evaluation_strategy": "epoch",
        "CLASS_num_labels": 2
    },
    watch=True,
    dirty=True
)

Here is how the workflow looks in the UI, along with the results of the trainer and the server tester:

<img src="./images/workflow.png" alt="workflow" width="1200"/>

<img src="./images/latancy.png" alt="latency" width="1200"/>
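
The `evaluate` step in the workflow reports perplexity. As a reminder of what that number means, perplexity is the exponential of the average negative log-likelihood the model assigns to each token, so lower is better. The sketch below computes it from a list of per-token log-probabilities; the function name `perplexity` and its input format are assumptions for illustration and are not taken from the pipeline code.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token).

    `token_log_probs` holds the natural-log probability the model assigned
    to each actual token in the evaluated text.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)
```

A model that assigns probability 1/4 to every token has a perplexity of 4, which matches the intuition of choosing uniformly among four options at each step.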

___
<a id="use-gradio"></a>
## 3. Try the new model with Gradio

Once the pipeline completes, you can try the model using the function's `invoke()` method or Gradio. You can get the new function object using the project's `get_function()` method.

In [None]:
serving_function = project.get_function("serving-trained-onnx")

In [None]:
body = "i love flying"
response = serving_function.invoke(path='/predict', body=body)
print(response)

In [None]:
import gradio as gr
import requests

serving_function._get_state()
serving_url = serving_function._resolve_invocation_url("", False)

def sentiment(text):
    # call the serving function with the input text
    resp = requests.post(serving_url, json={"text": text})
    return resp.json()

with gr.Blocks() as demo:
    input_box = [gr.Textbox(label="Text to analyze", placeholder="Please insert text", value="It was a terrible flight")]
    output = [gr.Textbox(label="Sentiment result"), gr.Textbox(label="Sentiment score")]
    greet_btn = gr.Button("Submit")
    greet_btn.click(fn=sentiment, inputs=input_box, outputs=output)

In [None]:
demo.launch(share=True)
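
Gradio assigns a handler's return values to the `outputs` components in order, so a handler wired to two text boxes should return two values. The sketch below shows one way to split a prediction into a label and a score. It assumes the endpoint returns a JSON object with `label` and `score` fields, which is a hypothetical shape; check the actual response of your serving function and adjust the keys accordingly. The helper name `split_prediction` is also made up for this example.

```python
def split_prediction(payload):
    """Turn a prediction dict into (label, score) strings for two text boxes.

    The "label" and "score" keys are assumptions about the response format.
    """
    label = str(payload.get("label", "unknown"))
    score = payload.get("score")
    score_text = f"{float(score):.3f}" if score is not None else "n/a"
    return label, score_text
```

Inside the `sentiment` handler this could be used as `return split_prediction(resp.json())`, so that the first box receives the label and the second receives the score.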

**Done!**