### Overview of MLflow
* The basic function of MLflow is to log parameters (model parameters), metrics, and artifacts
* each logging function uses the `log_` prefix
```Python
    import mlflow

    mlflow.log_param("num_dimensions", 8)
    mlflow.log_param("regularization", 0.1)

    mlflow.log_metric("accuracy", 0.1)
    mlflow.log_metric("accuracy", 0.45)

    mlflow.log_artifact("roc.png")
    mlflow.log_artifact("model.pkl")

```
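
Logging the same key more than once, as with `accuracy` above, adds to that metric's history. A minimal sketch of that pattern, assuming one value per training step. `log_series` and `fake_log_metric` are hypothetical names, and the logger is passed in as an argument so the sketch runs without MLflow installed.

```python
from typing import Callable, Iterable


def log_series(log_fn: Callable[..., None], key: str, values: Iterable[float]) -> int:
    """Log each value under the same key with an increasing step index.

    log_fn is expected to behave like mlflow.log_metric(key, value, step=...).
    Returns the number of values logged.
    """
    count = 0
    for step, value in enumerate(values):
        log_fn(key, value, step=step)
        count += 1
    return count


# Stand-in logger (an assumption for illustration) that only records the calls.
recorded = []


def fake_log_metric(key: str, value: float, step: int = 0) -> None:
    recorded.append((key, value, step))


n_logged = log_series(fake_log_metric, "accuracy", [0.1, 0.45])
print(recorded)
```

With MLflow available, `mlflow.log_metric` could be passed as `log_fn` in place of the stand-in.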

### MLflow projects
* provide a standard format for packaging reusable data science code. Each project is a directory with code or a Git repository, and uses a descriptor file to specify its dependencies and how to run the code. An MLflow project is defined by a simple YAML file called MLproject

```YAML

    name: My Project
    conda_env: conda.yaml
    entry_points:
      main:
        parameters:
          data_file: path
          regularization: {type: float, default: 0.1}
        command: "python train.py -r {regularization} {data_file}"
      validate:
        parameters:
          data_file: path
        command: "python validate.py {data_file}"
```    
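
A rough sketch of how an entry point's parameters fill its command template, modeled on the `main` entry point above. This illustrates the idea only and is not MLflow's own implementation. `render_command` and the file name `data.csv` are hypothetical.

```python
# Entry point definition modeled on the `main` entry of an MLproject file.
entry_point = {
    "parameters": {
        "data_file": "path",
        "regularization": {"type": "float", "default": 0.1},
    },
    "command": "python train.py -r {regularization} {data_file}",
}


def render_command(entry: dict, supplied: dict) -> str:
    """Fill the command template, using declared defaults for missing values."""
    values = {}
    for name, spec in entry["parameters"].items():
        if name in supplied:
            values[name] = supplied[name]
        elif isinstance(spec, dict) and "default" in spec:
            values[name] = spec["default"]
        else:
            raise ValueError(f"missing required parameter: {name}")
    return entry["command"].format(**values)


command = render_command(entry_point, {"data_file": "data.csv"})
print(command)  # python train.py -r 0.1 data.csv
```

A parameter without a default, such as `data_file`, has to be supplied; otherwise the sketch raises `ValueError`.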

### Set up MLflow
* install with `pip install mlflow`
* run `mlflow ui` to start the UI on a local server at http://127.0.0.1:5000
* create a new Python file, test-mlflow.py
```python
from mlflow import log_metric, log_param, log_artifact

if __name__ == "__main__":
    log_param("threshold", 3)
    log_param("verbosity", "DEBUG")
    
    log_metric("timestamp", 1000000)
    log_metric("TTC", 33)
    
    log_artifact("produced-dataset.csv")
```    

* run the Python file, then open the MLflow UI; the run shows
  + parameters
  + metrics
  + artifacts (the produced-dataset.csv file is shown in the artifacts)
  
* create a new experiment from the terminal
  + create a python file, produce-metrics.py
  ```python
    from mlflow import log_metric, log_param, log_artifact
    from random import choice
    
    metric_names = ["cpu", "ram", "disk"]
    percentages = [i for i in range(0, 100)]
    
    for i in range(40):
        log_metric(choice(metric_names), choice(percentages))   
  ```
  + run `mlflow experiments create --experiment-name produce-metrics` to create a new experiment named produce-metrics
  + run the script against that experiment with `MLFLOW_EXPERIMENT_ID=2 python produce-metrics.py` (2 is the ID of the new experiment here)
    + all the metric values are captured and shown in the MLflow UI
  + inspect the stored runs with `tree mlruns`; the `mlruns` directory sits in the mlflow project folder
  + example repo: alfredodeza / mlflow-demo (GitHub)
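
The `mlruns` directory can also be read directly. A minimal sketch, assuming the local file store layout `<experiment_id>/<run_id>/metrics/<metric_name>` with one `timestamp value step` entry per line. `read_metrics` and the IDs in the fake store are made up for illustration.

```python
import tempfile
from pathlib import Path


def read_metrics(mlruns_dir: Path) -> dict:
    """Collect metric values per metric name from every run under mlruns_dir.

    Assumes the layout <experiment_id>/<run_id>/metrics/<metric_name>, where
    each line holds "timestamp value step".
    """
    history = {}
    for metric_file in sorted(mlruns_dir.glob("*/*/metrics/*")):
        for line in metric_file.read_text().splitlines():
            parts = line.split()
            if len(parts) >= 2:
                history.setdefault(metric_file.name, []).append(float(parts[1]))
    return history


# Build a tiny fake store so the sketch runs without MLflow (made-up IDs).
with tempfile.TemporaryDirectory() as tmp:
    metrics_dir = Path(tmp) / "2" / "run_abc" / "metrics"
    metrics_dir.mkdir(parents=True)
    (metrics_dir / "cpu").write_text("1000 42.0 0\n1001 17.0 0\n")
    (metrics_dir / "ram").write_text("1002 88.0 0\n")
    metrics = read_metrics(Path(tmp))

print(metrics)
```

Pointing `read_metrics` at a real `mlruns` folder should give the same values that the UI plots, if the layout assumption holds for the installed MLflow version.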

### MLflow projects
* create a conda virtual environment with `conda create --name exploratory python=3.8`
* activate the environment with `conda activate exploratory`
* generate the `conda_env.yml` file by exporting the conda environment: `conda env export --name exploratory > conda_env.yml`
* you can add pip to the dependencies to install other packages, including pandas and mlflow
```
   - pip
   - pandas==1.5.0
   - mlflow==1.29.0
```
* make sure to update the `conda_env.yml` file by
  + `conda env update --file conda_env.yml --prune`
  
* activate the exploratory conda env by `conda activate exploratory`
* run the project with `mlflow run . -P filename=carriage.csv`
  + `.` means the current directory is the project folder
  + `-P` passes a `name=value` parameter to the project's entry point
  
* run mlflow from github
  + git repo: mlflow / mlflow-example
  + copy the SSH git clone command and run `mlflow run git@github.com...(git clone command)`
  
* MLflow model package
  + the "MODEL" folder contains the following files
    + MLmodel, this is a yaml file
    + conda.yaml, a yaml file to define the conda environment
    + `python_env.yaml`, a yaml file
    + requirements.txt, a txt file
  + MLmodel file
    + ![image.png](attachment:image.png)
  + conda.yaml
    + ![image-2.png](attachment:image-2.png)
  + `python_env.yaml`
    + ![image-3.png](attachment:image-3.png)
  + register and retrieve a model
    + you need to use `log_model` to register a new model
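```python
    import mlflow
    from mlflow import MlflowClient

    client = MlflowClient()
    # create an empty registered model entry (it has no versions yet)
    client.create_registered_model("onnx-t5")

    # retrieve the model; this needs a version that was registered via log_model
    model_name = "onnx-t5"
    model_version = 1

    model = mlflow.pyfunc.load_model(
        model_uri=f"models:/{model_name}/{model_version}"
    )

```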
  + set up the MLflow server, then run
    + `mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root /tmp/ --host 127.0.0.1 --port 5000`
    
  + to serve a model, run
    + `mlflow models serve -m runs:/run_id/model -p 5001`
    + then use curl to get prediction
    `curl -X POST -H "Content-Type: application/json; format=pandas-split" --data '{"columns":["text"], "data":[["Today is a perfect day to practice automation skills"]]}' http://127.0.0.1:5001/invocations`
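
The same request can be built from Python. A minimal sketch using only the standard library, assuming the server accepts the `pandas-split` format used in the curl command; newer MLflow releases may expect a different payload shape. `build_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_request(url: str, columns: list, rows: list) -> urllib.request.Request:
    """Build the POST request that the curl command sends."""
    payload = json.dumps({"columns": columns, "data": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json; format=pandas-split"},
        method="POST",
    )


request = build_request(
    "http://127.0.0.1:5001/invocations",
    columns=["text"],
    rows=[["Today is a perfect day to practice automation skills"]],
)
body = json.loads(request.data)
print(request.get_method(), request.full_url)
print(body)

# Sending it requires the model server to be running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```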
    
    
    
  

```python
# code example of mlflow
import mlflow
from mlflow import MlflowClient

mlflow.set_tracking_uri("http://127.0.0.1:5000")
client = MlflowClient()

# create a new registered model that doesn't exist yet
# this is an empty model with no meaningful content
client.create_registered_model("t5-onnx")

# delete the model
client.delete_registered_model("t5-onnx")

# update a model version if the model was registered by the log_model() command
client.update_model_version(
    name="t5-small-summarizer",
    version=1,
    description="This is the T5 model in an ONNX version 1.6 using Opset 12",
)
```

## Hugging Face

### what is hugging face
* A platform that integrates models, datasets, spaces and docs
* the Hub can be accessed programmatically through its Python client library, `huggingface_hub`
```shell
    pip install huggingface_hub
    # you already have it if you installed transformers or datasets
    
    huggingface-cli login
    # log in using a token from huggingface.co/settings/tokens
    # create a model or dataset repo from the CLI if needed
    
    huggingface-cli repo create repo_name --type {model, dataset, space}
    
``` 

* there are model repos, dataset repos, and Space repos; a Space repo lets you host your app
* use access tokens to authenticate from GitHub when pushing code changes
* log in with `huggingface-cli login` and enter the token
* [mlops_git_template](https://github.com/nogibjj/mlops-template)

### using model hub
* Models in Hugging Face
  + you can filter models by category and language
  + you can also deploy a model to SageMaker or as a REST API, or fetch it with `git clone`
* Datasets
  + you can select datasets and preview them with the dataset viewer
  + the data is split into test, train, and validation sets
  + you can access the datasets from Python code
  
### download models from hugging face
* the transformers pipeline automatically downloads the specified model
  + `summarizer = pipeline("summarization", model="t5-small", tokenizer="t5-small", truncation=True, framework="tf")`
* to use the model for summarization:
```python
with open("mlflow.txt", "r") as _f:
    print(summarizer(_f.read()))
```              
* download model locally
```python
    from huggingface_hub import hf_hub_download
    hf_hub_download(repo_id="t5-small", filename="pytorch_model.bin")
```
  + repo id is the name of the model in hugging face
  + filename is the model file. You can search in the repo of the model and select any of the model files
  + the command will return the path of the model that is cached
  
* you can also use `git clone <model url>` to download models. The url is the web address of the specific model in hugging face Models

### Hugging face models
* there are different ways to use hugging face models during the interactive process, as shown below:
![image.png](attachment:image.png)
  + we can use the hosted model in a Spaces app; a GPU is not provided by default
  + GitHub Codespaces has GPUs available

* download a model by pinning a revision
* upload a model to the web
  + on the model page under Hugging Face / Models, open the Files tab and upload the file
* fine-tune and retrain the model on a GPU and push it back to the model repo
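
Pinning a revision means asking for a file at one specific branch, tag, or commit instead of whatever is latest. A minimal sketch of the idea, assuming the Hub's `resolve/<revision>/<filename>` link pattern. `resolve_url` is a hypothetical helper and `abc123` is a made-up revision.

```python
from urllib.parse import quote


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a download URL for one file at a specific revision."""
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


latest = resolve_url("t5-small", "pytorch_model.bin")
pinned = resolve_url("t5-small", "pytorch_model.bin", revision="abc123")
print(latest)
print(pinned)
```

With `huggingface_hub`, the same pinning is available by passing a `revision` argument to `hf_hub_download`.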

### Hugging face datasets
* update / upload datasets
  + click your profile, and in the pop up menu, click "New Dataset"
  + create the new dataset repo as a git repo
  + you can upload your dataset by clicking "Files" / "Add file"
  + drag and drop file to update
  + then you can load the dataset from Python
  `dataset = datasets.load_dataset("alfredodeza/temparary-dataset")`
    + `load_dataset` accepts the Hub identifier of the dataset you want to load (`user/dataset-name`)
* How to split datasets
  + create a Python class that splits the data and generates subsets of the dataset, starting from the Hugging Face template
  + [example page from hugging face](https://huggingface.co/datasets/alfredodeza/wine-ratings/tree/main)
  + download the dataset with `load_dataset`
  
* use datasets
  + in the Datasets tab, use the filters to find the dataset you need
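
The train / validation / test idea can be sketched in plain Python. This is not the Hugging Face template class, only an illustration of the split itself. The 80/10/10 ratio, the fixed seed, and the name `split_rows` are assumptions.

```python
import random


def split_rows(rows: list, train: float = 0.8, validation: float = 0.1, seed: int = 0) -> dict:
    """Shuffle rows reproducibly and cut them into three named splits."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


rows = [{"id": i} for i in range(10)]
splits = split_rows(rows)
print({name: len(part) for name, part in splits.items()})
```

Every row lands in exactly one split, and the same seed always gives the same assignment.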
  

## Hugging Face and FastAPI

### packaging hugging face using GitHub Container Registry

* [git repo](https://github.com/alfredodeza/huggingface-ghcr)

### containerize hugging face app by github
* write a Dockerfile that includes the code in main.py, installs requirements.txt, and sets the command that runs the app
* test that the container works locally
* push the Docker image to a Docker registry
* automate CI/CD with GitHub Actions
  + in Actions, select the build Docker container workflow
  + copy and paste the template code that pushes Docker images to GitHub Container Registry (ghcr)
  + once the Docker image is pushed, you can find it by clicking the "package" button
  
### Hugging face fine-tuning
* find the documentation notebook in the 'Course' tab of Hugging Face
  + open the "Fine-tuning a pretrained model" link, click "Fine-tuning a model with the Trainer API or Keras", then click the "Open in Colab" tab at the top

### ONNX
* set up the `environment.yml` file as follows:
```yaml
    name: huggingface-onnx
    dependencies:
      - python=3.8
      - pytorch::pytorch
      - pip
      - pip:
        - transformers[onnx]
        - onnxruntime
        - ipywidgets
        - ipykernel
```  
* make sure everything has been installed by
  + `!python -m transformers.onnx --help` in notebook
* [onnx model zoo (onnx / models)](https://github.com/onnx/models)  

### Hugging face Spaces
* we can explore an existing Space, such as stable-diffusion, to see what the app is doing
* we can then explore the files in the Space to see the code

* create new space and select the framework to work with
* you can clone the codebase to use it in other environments
* you can add new files, such as main.py to the repo and once you push the code, the code will be built for deployment
  + very useful for prototyping 
  
* look at the docs in hugging face  

### CD of hugging face space
* [github link](https://github.com/nogibjj/hugging-face)