In [52]:
!python ./train-and-generate-artifacts/train_and_generate_artifacts.py

Done


## Testing Unstructured Models locally

In this example, we will test an unstructured binary model that we plan to take into DataRobot. The same approach also applies to structured models. The files to focus on are `custom.py`, `model-metadata.yaml`, and `run-server.sh`.



## My preferred way to test models is via the DRUM prediction server

This approach allows me to:

1. Test the exact payload I would send to the production deployment.
2. Log, log, and log some more; everything is visible in the codespace terminal.
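As an example of point 2, a helper like the following could be called at the top of the scoring hook in `custom.py` to log what each request contains. This is a minimal sketch: `log_payload_summary` and the logger name are hypothetical, and it assumes that messages logged at INFO level from `custom.py` reach the terminal when DRUM runs with `--logging-level info`.

```python
import logging

# Hypothetical logger name; any name should work inside custom.py.
logger = logging.getLogger("custom_model")


def log_payload_summary(data):
    """Log the row count and header of a CSV payload and return the row count.

    `data` may arrive as str or bytes, so bytes are decoded as UTF-8 first.
    """
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    lines = [line for line in data.splitlines() if line.strip()]
    n_rows = max(len(lines) - 1, 0)  # the first non-empty line is the CSV header
    header = lines[0] if lines else "<empty payload>"
    logger.info("received %d rows; header: %s", n_rows, header)
    return n_rows
```

It counts lines rather than parsing the CSV, which keeps it cheap, but quoted fields containing newlines would be over-counted.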


#### Start the prediction server 

1. Open the codespace terminal.
2. Run `./run-server.sh` in the terminal.
3. Review the startup logs.

__IMPORTANT__: any time you change `custom.py` or its associated files, you need to stop and restart the prediction server. Stop it in the terminal with CTRL+C, then run `./run-server.sh` again.

The `./run-server.sh` script sets several environment variables that mock the reporting of prediction and service data back to DataRobot, using the filesystem (FS) spooler of the DataRobot MLOps client.

The variables it sets are:

```
export MLOPS_SPOOLER_TYPE="FILESYSTEM"
export MLOPS_FILESYSTEM_DIRECTORY="/tmp/ta"
export MLOPS_DEPLOYMENT_ID="dummy_id_1234"
export MLOPS_MODEL_ID="dummy_id_4321"
export DEPLOYMENT_ID="dummy_id_1234"
export MODEL_ID="dummy_id_4321"
```

You will NOT need to worry about these variables when you deploy to DataRobot.
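To confirm that the mocked reporting is actually happening, you can look inside the spool directory after sending a few predictions. The sketch below only lists whatever files the filesystem spooler has written; `list_spooled_files` is a hypothetical helper, and because the spool file names and format are internal to the MLOps library, it makes no assumptions about them.

```python
import os
from pathlib import Path


def list_spooled_files(directory=None):
    """Return the sorted names of files in the MLOps filesystem spool directory.

    Defaults to $MLOPS_FILESYSTEM_DIRECTORY, falling back to /tmp/ta,
    the value run-server.sh exports.
    """
    spool_dir = Path(directory or os.environ.get("MLOPS_FILESYSTEM_DIRECTORY", "/tmp/ta"))
    if not spool_dir.is_dir():
        return []  # nothing spooled yet, or the directory was never created
    return sorted(p.name for p in spool_dir.iterdir() if p.is_file())
```

After posting a few predictions to the local server, `list_spooled_files()` should return a non-empty list if embedded monitoring is working.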

The script then starts a prediction server with:

```
# --code-dir             location of the custom model
# --target-type          always "unstructured" for an unstructured monitoring model (binary, regression, or multiclass)
# --monitor-embedded     test monitoring
# --webserver            necessary, but not used
# --api-token            necessary, but not used
# --address              location of the prediction endpoint
# --logging-level        logging level
# --verbose              more logging
# --runtime-params-file  runtime parameters to use
drum server \
  --code-dir ./custom-model \
  --target-type unstructured \
  --monitor-embedded \
  --webserver https://app.datarobot.com/api/v2 \
  --api-token "$DATAROBOT_API_TOKEN" \
  --address 0.0.0.0:12345 \
  --logging-level info \
  --verbose \
  --runtime-params-file runtime_params.yaml
```
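If you script your tests, it helps to wait until the server is actually accepting requests before posting data, since loading the model can take a moment. Below is a small polling sketch that uses only the standard library; `wait_for_server` is a hypothetical helper, and its default URL assumes the `--address 0.0.0.0:12345` from `run-server.sh` together with the `/info` route queried in the next cell.

```python
import time
import urllib.request


def wait_for_server(url="http://0.0.0.0:12345/info", timeout=60.0, interval=1.0, request_timeout=5.0):
    """Poll `url` until it returns HTTP 200 or `timeout` seconds have passed.

    Returns True if the server answered in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError (e.g. connection refused), HTTPError, and socket timeouts
            # are all OSError subclasses: treat them as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Polling `/info` rather than sending a prediction keeps the readiness check independent of the payload format.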


In [50]:
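import pprint

import requests

try:
    # /info returns the DRUM version, code directory, and parsed model metadata
    pprint.pprint(requests.get("http://0.0.0.0:12345/info").json())
except requests.exceptions.ConnectionError:
    print("Did you run `./run-server.sh` in the terminal?")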


{'codeDir': '/home/notebooks/storage/custom-model',
 'drumServer': 'flask',
 'drumVersion': '1.16.3',
 'language': 'python',
 'modelMetadata': {'inferenceModel': {'negativeClassLabel': '1',
                                      'positiveClassLabel': '0',
                                      'targetName': 'target'},
                   'name': 'my unstructured model',
                   'runtimeParameterDefinitions': [{'credentialType': 'api_token',
                                                    'description': 'API Token',
                                                    'fieldName': 'API_TOKEN',
                                                    'type': 'credential'},
                                                   {'default': 'one',
                                                    'description': 'a dummy '
                                                                   'string '
                                                                   'runtime '
           

## Post data for prediction

My unstructured model assumes that the body of the POST request is a plain string containing the data in CSV format.
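On the `custom.py` side, that means the unstructured hook has to turn the string back into a DataFrame. The sketch below assumes DRUM's `score_unstructured(model, data, query, **kwargs)` hook and a scikit-learn-style model with `predict_proba`; returning a JSON list of `[p0, p1]` pairs matches the output shown further down, but the rest (no column selection, no preprocessing) is a simplifying assumption, not necessarily what my `custom.py` does.

```python
import io
import json

import pandas as pd


def score_unstructured(model, data, query, **kwargs):
    """Sketch of a DRUM unstructured hook that accepts a CSV string.

    DRUM may pass `data` as str or bytes depending on the request's
    Content-Type, so both are handled.
    """
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    features = pd.read_csv(io.StringIO(data))
    # Assumes a scikit-learn-style classifier; any feature selection or
    # preprocessing your model needs would go here.
    probabilities = model.predict_proba(features)
    # One [p_class_0, p_class_1] pair per input row, serialized as JSON.
    return json.dumps([[float(p) for p in row] for row in probabilities])
```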

In [42]:
import pandas as pd 
df = pd.read_csv("./datasets/predict-request-surgical-dataset.csv")
df.head()

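| Unnamed: 0 | id | bmi | Age | asa_status | baseline_cancer | baseline_charlson | baseline_cvd | baseline_dementia | baseline_diabetes | baseline_digestive | ... | ccsMort30Rate | complication_rsi | dow | gender | hour | month | moonphase | mort30 | mortality_rsi | race |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1700 | 38.55 | 45.5 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 0.002101 | 0.05 | 0.0 | 0.0 | 8.05 | 7.0 | 2.0 | 0.0 | -0.3 | 1.0 |
| 1 | 1701 | 37.37 | 52.5 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 0.002959 | -2.79 | 1.0 | 1.0 | 12.78 | 2.0 | 0.0 | 0.0 | -2.61 | 1.0 |
| 2 | 1702 | 43.63 | 47.1 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 0.001962 | -0.43 | 4.0 | 0.0 | 11.95 | 7.0 | 0.0 | 0.0 | 0.01 | 1.0 |
| 3 | 1703 | 26.67 | 76.7 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 0.002959 | -2.29 | 0.0 | 1.0 | 10.72 | 10.0 | 1.0 | 0.0 | -1.96 | 1.0 |
| 4 | 1704 | 28.98 | 90.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 0.002959 | -1.97 | 0.0 | 1.0 | 9.12 | 8.0 | 0.0 | 0.0 | -2.25 | 1.0 |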


In [44]:

response = requests.post("http://0.0.0.0:12345/predictUnstructured", data = df.to_csv(index = False))
response.json()[0:5]

[[0.8041984722160688, 0.19580152778393115],
 [0.8451068693578099, 0.15489313064218999],
 [0.7686471413542375, 0.2313528586457624],
 [0.8313214926898287, 0.16867850731017137],
 [0.928182478531788, 0.07181752146821185]]

Looks good! If it doesn't, check the logs in the terminal.
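Eyeballing works for five rows; for the whole file, a quick structural check can catch problems such as a missing row or probabilities that don't sum to one. This sketch assumes the response is a list of `[p0, p1]` pairs, one per input row, as in the output above; `check_binary_predictions` is a hypothetical helper.

```python
import math


def check_binary_predictions(predictions, n_rows, tol=1e-6):
    """Return a list of problems found in a list of [p0, p1] prediction pairs.

    An empty list means there is one pair per row, every value lies in [0, 1],
    and each pair sums to 1 within `tol`.
    """
    problems = []
    if len(predictions) != n_rows:
        problems.append(f"expected {n_rows} rows, got {len(predictions)}")
    for i, row in enumerate(predictions):
        if len(row) != 2:
            problems.append(f"row {i}: expected 2 values, got {len(row)}")
            continue
        if any(not 0.0 <= p <= 1.0 for p in row):
            problems.append(f"row {i}: probability outside [0, 1]")
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            problems.append(f"row {i}: probabilities sum to {sum(row)}")
    return problems
```

For the local request above, `check_binary_predictions(response.json(), len(df))` should return an empty list.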

## Take it into DataRobot

In [8]:
import datarobot as dr 
client = dr.Client()

In [9]:
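# last execution environment whose name or description contains "scikit"
environment = dr.ExecutionEnvironment.list("scikit").pop()
# keep only DataRobot serverless prediction environments; index 2 is specific to my account,
# so pick the one that applies to you
prediction_environments = [pe for pe in dr.PredictionEnvironment.list() if pe.platform == "datarobotServerless"]
prediction_environment = prediction_environments[2]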
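Indexing with `[2]` works for my account, but an environment's position in the list isn't something to rely on. Selecting by name is sturdier. The sketch below is a hypothetical helper that assumes each `PredictionEnvironment` exposes `name` and `platform` attributes (`platform` is already used in the filter above); the environment name in the usage line is a placeholder.

```python
def pick_prediction_environment(environments, name, platform="datarobotServerless"):
    """Return the one prediction environment with this name on this platform.

    Raises ValueError when there is no match or more than one, instead of
    relying on the environment's position in the list.
    """
    matches = [pe for pe in environments if pe.platform == platform and pe.name == name]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one {platform!r} environment named {name!r}, found {len(matches)}"
        )
    return matches[0]
```

Usage: `prediction_environment = pick_prediction_environment(dr.PredictionEnvironment.list(), "My Serverless Environment")`.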

Next up, add runtime parameters. In my example, `model-metadata.yaml` defines a credential named `API_TOKEN` and a string named `SOME_PARAM`.

When setting the value for `API_TOKEN`, you need the name or ID of a credential stored in DataRobot. In my case, I have a credential named `DR_OPENAI_API_KEY` whose token I want to pass. So I list all credentials, grab the one with that name, and pass its ID as the value; DataRobot works out the rest.

In [18]:
## see model-metadata.yaml for runtime parameter details
credentials = [c for c in dr.Credential.list() if c.name in  ["DR_OPENAI_API_KEY"]]
runtime_parameter_values = [
    dr.models.runtime_parameters.RuntimeParameterValue(field_name = "API_TOKEN", type = "credential", value = credentials[0].credential_id),
    dr.models.runtime_parameters.RuntimeParameterValue(field_name = "SOME_PARAM", type = "string", value = "one"),
    ]
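A typo in `field_name` or `type` is easy to make, so it can be worth checking the values against the definitions DRUM reports. The `/info` response shown earlier lists them under `modelMetadata.runtimeParameterDefinitions`, each with a `fieldName` and a `type`. The sketch below is a hypothetical helper, not part of the DataRobot SDK; it takes those definitions plus a plain `{field_name: (type, value)}` dict that mirrors the `RuntimeParameterValue` list above.

```python
def check_runtime_parameters(definitions, values):
    """Compare runtime parameter values against their definitions.

    `definitions`: list of dicts with "fieldName" and "type" keys, as reported
    under modelMetadata.runtimeParameterDefinitions by DRUM's /info endpoint.
    `values`: dict mapping field name -> (type, value).
    Returns a list of human-readable problems; an empty list means consistent.
    """
    defined = {d["fieldName"]: d["type"] for d in definitions}
    problems = []
    for field, (ptype, value) in values.items():
        if field not in defined:
            problems.append(f"{field}: not defined in model-metadata.yaml")
        elif defined[field] != ptype:
            problems.append(f"{field}: type {ptype!r} does not match definition {defined[field]!r}")
        if value is None:
            problems.append(f"{field}: no value supplied")
    return problems
```

Usage, assuming `info = requests.get("http://0.0.0.0:12345/info").json()` while the local server is running: `check_runtime_parameters(info["modelMetadata"]["runtimeParameterDefinitions"], {"API_TOKEN": ("credential", credentials[0].credential_id), "SOME_PARAM": ("string", "one")})`.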

In [17]:
response = client.post("customModels", 
                       data = {
                        "customModelType": "inference",
                        "isProxyModel": False,
                        "isUnstructuredModelKind": True,
                        "name": 'Testing Unstructured Binary Monitored Deployment',
                        "targetName": "target",
                        "targetType": "Binary",
                        "positiveClassLabel": "1", 
                        "negativeClassLabel": "0",
                        }
                    )
custom_model = dr.CustomInferenceModel.get(response.json()["id"])
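It is worth confirming that the labels in this request agree with the `inferenceModel` section of `model-metadata.yaml`, which DRUM reports under `modelMetadata` in the `/info` response. In the `/info` output shown earlier, `positiveClassLabel` is `'0'` and `negativeClassLabel` is `'1'`, the reverse of this request, so one of the two may need updating. The helper below is a hypothetical sketch that compares the two as plain dicts; it assumes you assign the request body to a variable (called `payload` here) instead of writing it inline.

```python
def label_mismatches(inference_model, payload, keys=("targetName", "positiveClassLabel", "negativeClassLabel")):
    """Return {key: (metadata_value, payload_value)} for every key whose values differ.

    `inference_model` is the inferenceModel dict from model-metadata.yaml (as
    reported by DRUM's /info endpoint); `payload` is the dict sent to customModels.
    Values are compared as strings, since labels may be written as numbers in YAML.
    """
    return {
        key: (inference_model.get(key), payload.get(key))
        for key in keys
        if str(inference_model.get(key)) != str(payload.get(key))
    }
```

An empty dict means the local metadata and the DataRobot model definition agree on the target and class labels.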

In [19]:
custom_model_version = dr.CustomModelVersion.create_clean(custom_model.id, 
                                            base_environment_id = environment.id,
                                            runtime_parameter_values = runtime_parameter_values,
                                            folder_path = "./custom-model",   
                                            )

## if your custom model has a requirements.txt, run the following to build its dependency environment.
## `build_custom_model_environment` and `register_custom_model` are helper functions not shown here.
# import time
# build = build_custom_model_environment(custom_model, custom_model_version)
# while build[2].build_status in ("submitted", "processing"):
#     time.sleep(10)  # pause between status checks instead of polling in a tight loop
#     build[2].refresh()
# if build[2].build_status != "success":
#     print("build finished without success, status:")
#     print(build)
# registered_model_version = register_custom_model(custom_model, custom_model_version)
# print("version registered")
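Polling a build in a tight loop can hammer the API; a gentler pattern is to sleep between checks and give up after a timeout. The sketch below is generic: `wait_for_build` is a hypothetical helper, and it only assumes that the build object can be refreshed and exposes a status string, as `build[2]` does in the commented code.

```python
import time


def wait_for_build(refresh, get_status, pending=("submitted", "processing"), timeout=1800, interval=10):
    """Poll until the status leaves `pending`, sleeping `interval` seconds between checks.

    `refresh` re-fetches the build; `get_status` returns its current status.
    Returns the final status, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    status = get_status()
    while status in pending:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"build still {status!r} after {timeout} seconds")
        time.sleep(interval)
        refresh()
        status = get_status()
    return status
```

Usage: `final_status = wait_for_build(build[2].refresh, lambda: build[2].build_status)`. Passing callables keeps the helper independent of any particular SDK object.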
     

In [20]:
registered_model_version = dr.RegisteredModelVersion.create_for_custom_model_version(
    custom_model_version_id =  custom_model_version.id, 
    name = custom_model.name, 
    registered_model_name=  custom_model.name,
    description = custom_model.name,
)

In [21]:


deployment = dr.Deployment.create_from_registered_model_version(
        registered_model_version.id,
        prediction_environment_id=prediction_environment.id,
        label = registered_model_version.name,
    )
deployment.update_association_id_settings(["ASSOCIATION_ID"], required_in_prediction_requests=False)
deployment.update_drift_tracking_settings(target_drift_enabled=True, feature_drift_enabled=True)




In [47]:
import os

deployment_url = f"https://app.datarobot.com/api/v2/deployments/{deployment.id}/predictionsUnstructured"
headers = {
    # my model expects the request body to be a CSV string
    "Content-Type": "text/plain; charset=UTF-8",
    # "Content-Type": "application/json; charset=UTF-8",
    "Authorization": "Bearer {}".format(os.environ["DATAROBOT_API_TOKEN"]),
}
# score the same first five rows that were scored locally
response = requests.post(deployment_url, headers=headers, data=df.head(5).to_csv(index=False))
response.raise_for_status()
response.json()

[[0.8041984722160688, 0.19580152778393115],
 [0.8451068693578099, 0.15489313064218999],
 [0.7686471413542375, 0.2313528586457624],
 [0.8313214926898287, 0.16867850731017137],
 [0.928182478531788, 0.07181752146821185]]
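The deployed predictions match the local ones for these five rows, which is the payoff of testing with DRUM first. To check this without eyeballing, a tolerance-based comparison like the sketch below works; `predictions_match` is a hypothetical helper, it assumes you saved the local result (for example `local_predictions = response.json()`) before `response` was reassigned, and the default tolerances are arbitrary choices.

```python
import math


def predictions_match(local, deployed, rel_tol=1e-6, abs_tol=1e-9):
    """True if two lists of per-row probability lists agree element by element."""
    if len(local) != len(deployed):
        return False
    return all(
        len(a) == len(b)
        and all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b))
        for a, b in zip(local, deployed)
    )
```

Usage: `predictions_match(local_predictions[:5], response.json())`.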