# Configuration System - Model Management

Model management is a part of MLOps. ML models should be consistent and meet all business requirements at scale. To make this happen, a logical, easy-to-follow policy for model management is essential. Model management is responsible for the development, training, validation, and deployment of ML/DNN models. It covers the following areas:

* **Model training**: covers model training, model compression, model validation, model deployment, and model retraining (triggered when the deployed model's performance drops below a set threshold).
* **Model deployment**: performance/state monitoring, deployment strategies (A/B testing, etc.), and model rollback.
* **Resource scheduling**: supports allocation and task management based on the real-time resource status of heterogeneous platforms during training or inference.
* **Experiments**: covers logging of training metrics, loss, images, text, or any other metadata you might have, as well as code, data, and pipeline versioning.
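The resource-scheduling item above can be illustrated with a minimal sketch. It assumes that each device reports its free memory and that a task goes to the device of the requested kind with the most free memory that still fits. `Device`, `pick_device`, and the memory figures are hypothetical and not part of CM.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Device:
    """Hypothetical snapshot of one device's real-time status."""
    name: str
    kind: str            # e.g. "gpu" or "cpu"
    free_memory_mb: int


def pick_device(devices: Sequence[Device], required_mb: int, kind: str) -> Optional[Device]:
    """Return the device of the requested kind with the most free memory
    that can still hold the task, or None if nothing fits."""
    candidates = [d for d in devices if d.kind == kind and d.free_memory_mb >= required_mb]
    return max(candidates, key=lambda d: d.free_memory_mb, default=None)


devices = [
    Device("gpu0", "gpu", 4000),
    Device("gpu1", "gpu", 12000),
    Device("cpu0", "cpu", 64000),
]
chosen = pick_device(devices, required_mb=8000, kind="gpu")
```

A real scheduler would refresh the device snapshots before every decision and would likely weigh more signals than memory alone.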


# Model Management workflow for deployment



**Model Registry**: model registration, version management, elastic deployment, and multi-model hybrid deployment.  

**Data Versioning**: Version control systems help developers manage changes to source code. Data version control is a set of tools and processes that adapts this approach to data, so that changes to models can be tracked in relation to datasets and vice versa.  
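One way to tie a model version to the data it was trained on is to store a content hash of the dataset next to the model metadata. The following is a minimal sketch of that idea, not the mechanism CM uses; `dataset_fingerprint` and `link_model_to_data` are hypothetical helpers.

```python
import hashlib
from typing import Dict, Iterable


def dataset_fingerprint(records: Iterable[bytes]) -> str:
    """Hash the records in order; any change to the data changes the fingerprint."""
    digest = hashlib.sha256()
    for record in records:
        # Length-prefix each record so that record boundaries affect the hash.
        digest.update(len(record).to_bytes(8, "big"))
        digest.update(record)
    return digest.hexdigest()


def link_model_to_data(model_name: str, model_version: int, records: Iterable[bytes]) -> Dict[str, object]:
    """Build a metadata entry that ties a model version to the exact data it saw."""
    return {
        "model": model_name,
        "version": model_version,
        "dataset_sha256": dataset_fingerprint(records),
    }


v1 = link_model_to_data("wdl", 1, [b"row-a", b"row-b"])
v2 = link_model_to_data("wdl", 2, [b"row-a", b"row-b", b"row-c"])
```

Two entries with the same `dataset_sha256` were trained on identical data, so a change in model behavior between them cannot be attributed to the dataset.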

**Traffic Management**: routing management and load balancing management.  
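Routing for a strategy such as A/B testing is often implemented as a weighted random split between model versions. The sketch below assumes that approach; the version labels and the 90/10 split are made up for illustration.

```python
import random
from typing import Dict


def route_request(weights: Dict[str, float], rng: random.Random) -> str:
    """Pick a model version with probability proportional to its weight."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]


# Hypothetical split: 90% of traffic to version 1, 10% to version 2.
split = {"wdl:1": 0.9, "wdl:2": 0.1}
rng = random.Random(0)  # seeded only to make the sketch reproducible
counts = {version: 0 for version in split}
for _ in range(10_000):
    counts[route_request(split, rng)] += 1
```

After the loop, `counts` holds roughly nine requests for `wdl:1` for every one sent to `wdl:2`.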

**Model Monitoring**: The collected metrics are written and reported to a streaming platform, then monitored through Prometheus and indicator queries. Monitoring tracks the model's inference performance and identifies signs of serving skew, which occurs when data changes cause the deployed model's performance to degrade below the score/accuracy it achieved in the training environment. 
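A serving-skew check can be as simple as comparing the score measured during training with the scores observed in production. The sketch below assumes a single scalar metric and a fixed tolerance; the function name and the numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence


def has_serving_skew(training_score: float, serving_scores: Sequence[float], tolerance: float) -> bool:
    """Flag skew when the mean serving score falls more than `tolerance`
    below the score measured in the training environment."""
    if not serving_scores:
        return False  # nothing observed yet, so nothing to flag
    return training_score - mean(serving_scores) > tolerance


# Hypothetical numbers: a score of 0.80 in training, recent windows from serving.
healthy = has_serving_skew(0.80, [0.79, 0.80, 0.78], tolerance=0.05)
degraded = has_serving_skew(0.80, [0.72, 0.70, 0.71], tolerance=0.05)
```

The same comparison could drive the retraining trigger: once the check returns `True`, the model is a candidate for retraining.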

**Experiment Tracker**: Collects, organizes, and tracks model training/validation information and performance across multiple runs with different configurations (learning rate, epochs, optimizers, loss, batch size, and so on) and datasets (train/val splits and transforms).
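The core of an experiment tracker is a record per run that holds the configuration and the logged metrics. The following in-memory sketch illustrates that structure; `Run` and `ExperimentTracker` are hypothetical names and do not correspond to a CM API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    """One training run: its configuration and the metrics logged per step."""
    config: Dict[str, object]
    metrics: Dict[str, List[float]] = field(default_factory=dict)

    def log(self, name: str, value: float) -> None:
        self.metrics.setdefault(name, []).append(value)


@dataclass
class ExperimentTracker:
    runs: List[Run] = field(default_factory=list)

    def start_run(self, **config: object) -> Run:
        run = Run(config=dict(config))
        self.runs.append(run)
        return run

    def best_run(self, metric: str) -> Run:
        """Return the run whose last logged value of `metric` is lowest."""
        logged = [r for r in self.runs if r.metrics.get(metric)]
        return min(logged, key=lambda r: r.metrics[metric][-1])


tracker = ExperimentTracker()
run_a = tracker.start_run(lr=0.01, batch_size=1024, optimizer="adam")
run_a.log("val_loss", 0.52)
run_a.log("val_loss", 0.47)
run_b = tracker.start_run(lr=0.001, batch_size=1024, optimizer="adam")
run_b.log("val_loss", 0.50)
run_b.log("val_loss", 0.44)
best = tracker.best_run("val_loss")
```

Here `best` is `run_b`, because its final validation loss is the lowest of the two runs.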

# Workflow for deploying a model

* Register the model.
* Define a deployment configuration.
* Define an inference configuration.
* Deploy your machine learning model.
* Update the model.
* Test the resulting web service.



## Register the model

When you register a model, make sure the training side has uploaded the model to the distributed file system (or cloud storage).

The following examples demonstrate how to register a model.

```python
from hps.cm import cm

# Connect to the CM server
cm.initialize(server="localhost", port=9008)

# Set the model download path
model_path = cm.set_inference_("inference", "hdfs://model_repo/modelname", "modelfile")

# Register the model (`workspace` is assumed to be defined by the caller)
model = cm.register(workspace, model_name="wdl", model_path=model_path)
```

## Define a deployment configuration
A deployment configuration specifies the amount of hardware resources and the deployment method (hierarchical or local) that your model service needs in order to run. For example, a deployment configuration lets you specify that your service needs 2 devices, 2 CPU cores, and 1 GPU core, and that you want to enable autoscaling.

The options available for a deployment configuration differ depending on the compute target you choose. In a local deployment, all you can specify is the port on which your web service will be served.

```json
{
    "model": "wdl",
    "sparse_files": ["/model/wdl/1/0_sparse_2000.model", "/model/wdl/1/1_sparse_2000.model"],
    "dense_file": "/model/wdl/1/_dense_2000.model",
    "network_file": "/model/wdl/1/wdl_infer.json",
    "num_of_worker_buffer_in_pool": "4",
    "deployed_device_list": ["1"],
    "max_batch_size": "1024",
    "default_value_for_each_table": ["0.0", "0.0"],
    "hit_rate_threshold": "0.9",
    "gpucacheper": "0.5",
    "gpucache": "true"
}
```
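Because the configuration stores numbers as strings, it can help to validate and convert the file before it is handed to the service. The sketch below shows one way to do that. Which keys are required, and the rule that `hit_rate_threshold` must lie in [0, 1], are assumptions drawn from the example above rather than a documented schema.

```python
import json

# Keys taken from the example above; treating all of them as required is an assumption.
REQUIRED_KEYS = {"model", "sparse_files", "dense_file", "network_file",
                 "deployed_device_list", "max_batch_size"}


def load_deployment_config(text: str) -> dict:
    """Parse a deployment configuration and convert its numeric strings."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    config["max_batch_size"] = int(config["max_batch_size"])
    config["deployed_device_list"] = [int(d) for d in config["deployed_device_list"]]
    if "hit_rate_threshold" in config:
        threshold = float(config["hit_rate_threshold"])
        if not 0.0 <= threshold <= 1.0:  # assumed valid range
            raise ValueError("hit_rate_threshold must be within [0, 1]")
        config["hit_rate_threshold"] = threshold
    return config


example = """{
    "model": "wdl",
    "sparse_files": ["/model/wdl/1/0_sparse_2000.model", "/model/wdl/1/1_sparse_2000.model"],
    "dense_file": "/model/wdl/1/_dense_2000.model",
    "network_file": "/model/wdl/1/wdl_infer.json",
    "deployed_device_list": ["1"],
    "max_batch_size": "1024",
    "hit_rate_threshold": "0.9"
}"""
config = load_deployment_config(example)
```

Failing early on a missing key or an out-of-range value is cheaper than discovering the problem after the service has started loading embedding tables.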

```python
from hps.cm import cm

# Point the model at its deployment configuration file
# (`model` is the object returned by `cm.register`)
model_name = "wdl"
deployment_config = model.deploy_configuration("inference", model_name, config_file="./ps.json")
```

## Define an inference configuration

An inference configuration describes the model network to use when initializing the model service. Configure the resources and the service-related parameters for each model.

```pbtxt
name: "wdl"
backend: "hugectr"
max_batch_size: 64
input [
  {
    name: "DES"
    data_type: TYPE_FP32
    dims: [ -1 ]
  },
  {
    name: "CATCOLUMN"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "ROWINDEX"
    data_type: TYPE_INT32
    dims: [ -1 ]
  }
]
```

```json
"layers": [
    {
      "name": "data",
      "type": "Data",
      "source": "./file_list.txt",
      "eval_source": "./file_list_test.txt",
      "check": "Sum",
      "label": {
        "top": "label",
        "label_dim": 1
      },
      "dense": {
        "top": "dense",
        "dense_dim": 13
      },
      "sparse": [
        {
          "top": "data1",
          "slot_num": 26,
          "is_fixed_length": false,
          "nnz_per_slot": 1
        }
      ]
    },
    {
      "name": "sparse_embedding1",
      "type": "LocalizedSlotSparseEmbeddingHash",
      "bottom": "data1",
      "top": "sparse_embedding1",
      "sparse_embedding_hparam": {
        "embedding_vec_size": 128,
        "combiner": "sum",
        "workspace_size_per_gpu_in_mb": 2547
      }
    },
    {
      "name": "fc1",
      "type": "InnerProduct",
      "bottom": "dense",
      "top": "fc1",
      "fc_param": {
        "num_output": 512
      }
    },
    {
      "name": "relu1",
      "type": "ReLU",
      "bottom": "fc1",
      "top": "relu1"
    },

```
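In the network excerpt above, each layer's `bottom` names a tensor that an earlier layer exposes as `top`. A small check like the following can catch wiring mistakes before the file is deployed. It is a sketch based only on the fields visible in the excerpt, and the rule that every `bottom` must be produced by an earlier layer is an assumption about the format.

```python
from typing import Dict, List


def _as_list(value) -> List[str]:
    """Normalize a tensor name, or a list of names, to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def check_layer_wiring(layers: List[Dict]) -> List[str]:
    """Return an error for every `bottom` that no earlier layer produced."""
    produced = set()
    errors = []
    for layer in layers:
        for name in _as_list(layer.get("bottom")):
            if name not in produced:
                errors.append(f"{layer['name']}: unknown bottom '{name}'")
        if layer.get("type") == "Data":
            # The data layer exposes its outputs through label/dense/sparse entries.
            produced.add(layer["label"]["top"])
            produced.add(layer["dense"]["top"])
            produced.update(s["top"] for s in layer["sparse"])
        else:
            produced.update(_as_list(layer.get("top")))
    return errors


layers = [
    {"name": "data", "type": "Data",
     "label": {"top": "label"}, "dense": {"top": "dense"}, "sparse": [{"top": "data1"}]},
    {"name": "sparse_embedding1", "type": "LocalizedSlotSparseEmbeddingHash",
     "bottom": "data1", "top": "sparse_embedding1"},
    {"name": "fc1", "type": "InnerProduct", "bottom": "dense", "top": "fc1"},
    {"name": "relu1", "type": "ReLU", "bottom": "fc1", "top": "relu1"},
]
errors = check_layer_wiring(layers)
```

For the four layers shown, `errors` is empty; renaming the `bottom` of `relu1` to a tensor that does not exist would produce one entry.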


```python
from hps.cm import cm

# Describe the network and runtime settings used to initialize the model service
inference_config = model.InferenceConfig(
    modelname="wdl",
    model_network="./wdl_network.json",
    running_config="./config.pbtxt",
)
```

## Deploy model
Launch Triton Server and load the model.

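```python
import tritonclient.http as tritonhttpclient  # current import path for the Triton HTTP client

try:
    triton_client = tritonhttpclient.InferenceServerClient(url="localhost:8000", verbose=True)
    print("client created.")
except Exception as e:
    print("channel creation failed: " + str(e))
    raise  # without a client, the load request below cannot run

# Ask Triton to load the model from its model repository.
# Explicit loading requires the server to run with --model-control-mode=explicit.
triton_client.load_model(model_name="wdl")
```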

## Update Model

CM supports model repository extensions such as Triton's Model Repository Extension, which lets you query and control the model repositories that Triton serves.

* **Deploy new models online**: Follow the steps described above. CM loads the dense network weights as part of the HugeCTR model, inserts the embedding tables of the new model into the Hierarchical Inference Parameter Server, and creates the embedding cache based on the model definition in the Independent Parameter Server Configuration. In other words, the parameter server independently provides an initialization mechanism for the embedding tables and embedding cache of new models.

* **Update the deployed model online**: Reset the inference and deployment configuration. CM loads the dense network weights as part of the HugeCTR model, updates the embedding tables in the Inference Hierarchical Parameter Server from the latest model files, and refreshes the embedding cache. In other words, the parameter server independently provides an update mechanism for existing embedding tables.


```python
# Re-apply both configurations so that CM reloads the latest model files
model_name = "wdl"
deployment_config = model.deploy_configuration("inference", model_name, config_file="./ps.json")

inference_config = model.InferenceConfig(
    modelname="wdl",
    model_network="./wdl_network.json",
    running_config="./config.pbtxt",
)
```

## Recycle Model

* **Recycle old models**: Delete the inference and deployment configuration. CM recycles the HugeCTR model network's weights from Triton and releases the corresponding embedding cache from the devices. The embedding tables that belong to the model still remain in the Inference Hierarchical Parameter Server database.

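```python
# Delete the deployed service first, then the registered model
# (`service` is assumed to be the handle of the deployed service)
service.delete()
model.delete()
```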
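The deploy, update, and recycle behavior described in the last two sections can be summarized as a toy state model. The class below is a hypothetical in-memory stand-in, not the parameter server's implementation; it only mirrors which pieces of state each operation touches.

```python
from typing import Dict, Set


class ToyParameterServer:
    """Hypothetical in-memory stand-in for the lifecycle described above."""

    def __init__(self) -> None:
        self.embedding_tables: Dict[str, int] = {}  # model name -> loaded version
        self.embedding_caches: Set[str] = set()     # models with a device cache
        self.dense_weights: Dict[str, int] = {}     # model name -> loaded version

    def deploy(self, model: str, version: int) -> None:
        """New model: load dense weights, insert tables, create the cache."""
        self.dense_weights[model] = version
        self.embedding_tables[model] = version
        self.embedding_caches.add(model)

    def update(self, model: str, version: int) -> None:
        """Deployed model: reload weights, update tables, refresh the cache."""
        if model not in self.embedding_tables:
            raise KeyError(f"{model} has not been deployed")
        self.deploy(model, version)

    def recycle(self, model: str) -> None:
        """Release dense weights and cache; embedding tables stay in the database."""
        self.dense_weights.pop(model, None)
        self.embedding_caches.discard(model)


ps = ToyParameterServer()
ps.deploy("wdl", 1)
ps.update("wdl", 2)
ps.recycle("wdl")
```

After the final call, the toy server still holds the version 2 embedding table for `wdl`, while its dense weights and device cache are gone, which matches the recycle description.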

# Swagger UI

Users can change specific configuration items through a UI to trigger changes to the model service in the inference cluster. These APIs also secure the service through OAuth authentication.

### POST API for setting the inference configuration

![POST API](./post_inference.jpg)
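A client call to such an endpoint might look like the sketch below, which builds (but does not send) a POST request with an OAuth bearer token. The endpoint path, the payload fields, and the use of a bearer token are assumptions for illustration; consult the Swagger UI for the actual contract.

```python
import json
import urllib.request


def build_inference_config_request(base_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying an OAuth bearer token."""
    return urllib.request.Request(
        url=f"{base_url}/inference_config",  # hypothetical endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_inference_config_request(
    "http://localhost:9008",   # host and port reused from the registration example
    "example-token",           # placeholder; obtain a real token from your OAuth provider
    {"model": "wdl", "max_batch_size": 64},
)
# Sending it would be: urllib.request.urlopen(request)
```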

### Configuration System UI

![CM](./CM.jpg)