# PythonR-Python Deployment


Note: This notebook runs on Python 3.8 and uses UbiOps Client Library 3.15.0

In this notebook we will show you the following:

How to create a pipeline that prepares the data in R and predicts house prices using an XGBoost model.

If you run the R script and this notebook in the order described below, the R data exploration deployment (`r-eda-deployment`) and the XGBoost deployment (`python-pred-deployment`) will be deployed to your UbiOps environment. After running, you can check your environment to explore them. You can also go through the individual steps in this notebook to see exactly what we did and how to adapt it to your own use case.

<b>IMPORTANT</b>: because this notebook creates a pipeline that contains both R and Python deployments, you need to run everything in the following order:

1) Run this notebook up to and including step 4

2) Run the R script

3) Run step 5

# 1. Establishing a connection with your UbiOps environment

Add your API token and your project name. We provide a deployment name and deployment version name. Afterwards we initialize the client library. This way we can deploy the XGBoost model to your environment.


In [None]:
API_TOKEN = "<INSERT API_TOKEN WITH PROJECT EDITOR RIGHTS>" # Make sure this is in the format "Token token-code"
PROJECT_NAME = "<INSERT YOUR PROJECT NAME HERE>"


DEPLOYMENT_NAME = "python-pred-deployment"
DEPLOYMENT_VERSION = "v1"

# Import all necessary libraries
import shutil
import os
import ubiops

client = ubiops.ApiClient(ubiops.Configuration(api_key={"Authorization": API_TOKEN},
                                               host="https://api.ubiops.com/v2.1"))
api = ubiops.CoreApi(client)
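The comment next to `API_TOKEN` asks for the format "Token token-code". As a minimal sketch of how that format could be checked before the client is created, the helper below adds the prefix when it is missing and rejects an unreplaced placeholder. The name `normalize_api_token` is hypothetical and is not part of the UbiOps client library.

```python
def normalize_api_token(raw_token):
    """Return the token in the 'Token token-code' format (hypothetical helper)."""
    token = raw_token.strip()
    # The notebook placeholder starts with '<', so treat that as "not filled in yet".
    if not token or token.startswith("<"):
        raise ValueError("API token placeholder has not been replaced")
    # Add the 'Token ' prefix only when it is not already there.
    if not token.startswith("Token "):
        token = f"Token {token}"
    return token


print(normalize_api_token("  abc123  "))
print(normalize_api_token("Token abc123"))
```

The result could then be passed as the `Authorization` value in `ubiops.Configuration`, as in the cell above.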

# 2. Creating the model

This example trains a simple linear regression and an XGBoost regression model on house price data; the data exploration is done in R.

Since this notebook focuses on the deployment side of the ML process, we will not cover the development of the model in depth and will simply use the model trained below.


In [None]:
!pip install scikit-learn==0.24.2
!pip install xgboost==1.3.1
!pip install numpy==1.19.5
!pip install pandas==1.1.5
!pip install joblib==1.0.0
!pip install scipy==1.5.4

In [None]:
import numpy as np
import pandas as pd
import xgboost
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression
from sklearn import tree, linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import explained_variance_score
import joblib

# Read the data into a dataframe 
data = pd.read_csv("kc_house_data.csv")

# Train a simple linear regression model
regr = linear_model.LinearRegression()
new_data = data[["sqft_living","grade", "sqft_above", "sqft_living15","bathrooms","view","sqft_basement","lat","waterfront","yr_built","bedrooms"]]

X = new_data.values
y = data.price.values

# Create train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train the model
regr.fit(X_train, y_train)

# Let's make an XGBoost prediction model
xgb = xgboost.XGBRegressor(n_estimators=100, learning_rate=0.08, gamma=0, subsample=0.75,
                           colsample_bytree=1, max_depth=7)

traindf, testdf = train_test_split(X_train, test_size = 0.2)

# Train the model
xgb.fit(X_train,y_train)

# Make predictions using the xgboost model
predictions = xgb.predict(X_test)


# Check the explained variance of the XGBoost model on our test set
# (the true values go first, the predictions second)
xgboost_score = explained_variance_score(y_test, predictions)

print(f"Score of the xgboost model {xgboost_score}")

# Save model
joblib.dump(xgb, "xgb-deployment/xgb-deployment.joblib")
print("XGBoost model built and saved successfully!")
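The order of the arguments to `explained_variance_score` matters, because the score is normalised by the variance of the true values. The sketch below re-implements the metric with only the standard library, as 1 - Var(y_true - y_pred) / Var(y_true), to show the difference. The function `explained_variance` and the toy numbers are illustrative assumptions and are not taken from the house price data.

```python
from statistics import pvariance


def explained_variance(y_true, y_pred):
    """Explained variance: 1 - Var(y_true - y_pred) / Var(y_true)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / pvariance(y_true)


# Toy values, chosen only to make the asymmetry visible.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [150.0, 200.0, 250.0, 300.0]

print(explained_variance(y_true, y_pred))  # true values first
print(explained_variance(y_pred, y_true))  # arguments swapped
```

With these numbers the two calls give different scores, which is why the true values should be passed first.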


# 3. Creating the XGBoost deployment
Now that we have saved our model, it is time to create a deployment in UbiOps so we can use it in a pipeline later.

In the cell below you can view the deployment.py, which takes data about the houses whose prices we wish to predict.

As you can see, in the initialization step we load the model we created earlier, and in the request method we use it to make a prediction. The input to this deployment is a CSV file, `clean_data`, containing the cleaned house data. The output is a CSV file, `prediction`, containing the predicted house prices.


The file containing the deployment code is required to be called "deployment.py" and should contain the "Deployment"
class and "request" method.

In [None]:
"""
The file containing the deployment code is required to be called 'deployment.py' and should contain the 'Deployment'
class and 'request' method.
"""

import pandas as pd
import numpy as np
import os
from joblib import load

class Deployment:

    def __init__(self, base_directory, context):
        """
        Initialisation method for the deployment. It can for example be used for loading modules that have to be kept in
        memory or setting up connections. Load your external model files (such as pickles or .h5 files) here.

        :param str base_directory: absolute path to the directory where the deployment.py file is located
        :param dict context: a dictionary containing details of the deployment that might be useful in your code.
            It contains the following keys:
                - deployment (str): name of the deployment
                - version (str): name of the version
                - input_type (str): deployment input type, either 'structured' or 'plain'
                - output_type (str): deployment output type, either 'structured' or 'plain'
                - environment (str): the environment in which the deployment is running
                - environment_variables (str): the custom environment variables configured for the deployment.
                    You can also access those as normal environment variables via os.environ
        """

        print("Initialising xgboost model")

        XGBOOST_MODEL = os.path.join(base_directory, "xgb-deployment.joblib")
        self.model = load(XGBOOST_MODEL)

    def request(self, data):
        """
        Method for deployment requests, called separately for each individual request.

        :param dict/str data: request input data. In case of deployments with structured data, a Python dictionary
            with as keys the input fields as defined upon deployment creation via the platform. In case of a deployment
            with plain input, it is a string.
        :return dict/str: request output. In case of deployments with structured output data, a Python dictionary
            with as keys the output fields as defined upon deployment creation via the platform. In case of a deployment
            with plain output, it is a string. In this example, a dictionary with the key: output.
        """
        print('Loading data')
        input_data = pd.read_csv(data['clean_data'])
        
        print("Prediction being made")
        prediction = self.model.predict(input_data.values)
        
        # Writing the prediction to a csv for further use
        print('Writing prediction to csv')
        pd.DataFrame(prediction).to_csv('prediction.csv', header = ['house_prices'], index_label= 'index')
        
        return {
            "prediction": 'prediction.csv'
        }
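The `request` method above takes a file path in and returns a file path out. The sketch below mirrors that shape using only the standard library, so the flow can be tried locally without the trained model. `StubModel`, `LocalDeployment` and the made-up pricing rule are hypothetical stand-ins, not the actual deployment code.

```python
import csv
import os
import tempfile


class StubModel:
    """Hypothetical stand-in for the trained XGBoost model."""

    def predict(self, rows):
        # Made-up rule: the price is 100 times the first feature.
        return [100.0 * row[0] for row in rows]


class LocalDeployment:
    """Mirrors the file-in, file-out shape of the Deployment class."""

    def __init__(self, model):
        self.model = model

    def request(self, data):
        # Read the cleaned features from the CSV file passed as 'clean_data'.
        with open(data["clean_data"], newline="") as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row
            rows = [[float(value) for value in row] for row in reader]

        predictions = self.model.predict(rows)

        # Write the predictions to a CSV file and return its path.
        output_path = os.path.join(os.path.dirname(data["clean_data"]), "prediction.csv")
        with open(output_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["index", "house_prices"])
            writer.writerows(enumerate(predictions))
        return {"prediction": output_path}


# Build a two-row input file in a temporary directory and run one request.
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, "clean_data.csv")
with open(input_path, "w", newline="") as f:
    csv.writer(f).writerows([["sqft_living", "grade"], [1000, 7], [2000, 8]])

result = LocalDeployment(StubModel()).request({"clean_data": input_path})
with open(result["prediction"], newline="") as f:
    output_rows = list(csv.reader(f))
print(output_rows)
```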




# 4. Deploying to UbiOps

In [None]:
# Create the deployment
deployment_template = ubiops.DeploymentCreate(
    name=DEPLOYMENT_NAME,
    description="python-pred-deployment",
    input_type="structured",
    output_type="structured",
    input_fields=[
        ubiops.DeploymentInputFieldCreate(
            name="clean_data",
            data_type="file",
        ),
    ],
    output_fields=[
        ubiops.DeploymentOutputFieldCreate(
            name="prediction",
            data_type="file"
        ),
    ],
    labels={"demo": "python-pred-deployment"}
)

api.deployments_create(
    project_name=PROJECT_NAME,
    data=deployment_template
)

# Create the version
version_template = ubiops.DeploymentVersionCreate(
    version=DEPLOYMENT_VERSION,
    environment="python3-8",
    instance_type='512mb',
    minimum_instances=0,
    maximum_instances=1,
    maximum_idle_time=1800 # = 30 minutes
)

api.deployment_versions_create(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    data=version_template
)

# Zip the deployment package
shutil.make_archive("xgb-deployment", "zip", ".", "xgb-deployment")

# Upload the zipped deployment package
file_upload_result = api.revisions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION,
    file="xgb-deployment.zip"
)

print("Done")
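Before uploading, it can help to confirm that the zip really contains `deployment.py` inside the package directory. The sketch below is one possible way to do that with the standard library. The helper `zip_and_check` is hypothetical, and the temporary directory only stands in for the notebook's working directory.

```python
import os
import shutil
import tempfile
import zipfile


def zip_and_check(root_dir, package_dir, required=("deployment.py",)):
    """Zip root_dir/package_dir and verify the required files are in the archive."""
    archive_base = os.path.join(root_dir, package_dir)
    archive_path = shutil.make_archive(archive_base, "zip", root_dir, package_dir)
    with zipfile.ZipFile(archive_path) as archive:
        names = set(archive.namelist())
    # Entries in a zip archive use forward slashes on every platform.
    missing = [name for name in required if f"{package_dir}/{name}" not in names]
    if missing:
        raise FileNotFoundError(f"missing from deployment package: {missing}")
    return archive_path


# Demo with a throwaway package directory (assumed layout: xgb-deployment/deployment.py).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "xgb-deployment"))
with open(os.path.join(root, "xgb-deployment", "deployment.py"), "w") as f:
    f.write("class Deployment:\n    pass\n")

archive_path = zip_and_check(root, "xgb-deployment")
print(os.path.basename(archive_path))
```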

Please open up your R IDE and run the script. When the R deployment is done and available, continue to step 5.

# 5. Creating the prediction pipeline


In [None]:
PIPELINE_NAME = "pythonr-pipeline"

pipeline_template = ubiops.PipelineCreate(
    name=PIPELINE_NAME,
    description="",
    input_type="structured",
    input_fields=[
        ubiops.DeploymentInputFieldCreate(
            name="input",
            data_type="file",
        )
    ],
    output_type="structured",
    output_fields=[
        ubiops.DeploymentOutputFieldCreate(
            name="prediction",
            data_type="file"
        )
    ],
    labels={"demo": "PythonR-pipeline"}
)

api.pipelines_create(
    project_name=PROJECT_NAME,
    data=pipeline_template
)

PIPELINE_VERSION = DEPLOYMENT_VERSION

print("Done")

In [None]:
ubiops.utils.wait_for_deployment_version(
    client=api.api_client,
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version=DEPLOYMENT_VERSION,
    revision_id=file_upload_result.revision
)

Now that we have created a pipeline, we can create a version that contains both deployments and the connections between them.

In [None]:
pipeline_template = ubiops.PipelineVersionCreate(
    version=PIPELINE_VERSION,
    request_retention_mode='full',
    objects=[
        # python-pred-deployment
        {
            'name': DEPLOYMENT_NAME,
            'reference_name': DEPLOYMENT_NAME,
            'version': DEPLOYMENT_VERSION
        },
        # r-eda-deployment
        {
            'name': "r-eda-deployment",
            'reference_name': "r-eda-deployment",
            'version': DEPLOYMENT_VERSION
        }
    ],
    attachments=[
        # Start -> r-eda-deployment
        {
            'destination_name': "r-eda-deployment",
            'sources': [{
                'source_name': 'pipeline_start',
                'mapping': [{
                    "source_field_name": "input",
                    "destination_field_name": "raw_data"
                }]
            }]
        },
        # r-eda-deployment -> python-pred-deployment
        {
            'destination_name': DEPLOYMENT_NAME,
            'sources': [{
                'source_name': "r-eda-deployment",
                'mapping': [{
                    "source_field_name": "clean_data",
                    "destination_field_name": "clean_data"
                }]
            }]
        },
        # python-pred-deployment -> pipeline end
        {
            'destination_name': 'pipeline_end',
            'sources': [{
                'source_name': DEPLOYMENT_NAME,
                'mapping': [{
                    "source_field_name": "prediction",
                    "destination_field_name": "prediction"
                }]
            }]
        }
    ]
)

api.pipeline_versions_create(project_name=PROJECT_NAME, pipeline_name=PIPELINE_NAME, data=pipeline_template)
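A typo in a `source_name` or `destination_name` only surfaces when the version is created. The sketch below checks the wiring locally first, under the assumption that attachments refer to objects by their `name`, plus the special names `pipeline_start` and `pipeline_end` used above. The helper `check_attachments` is hypothetical, and the field mappings are left out for brevity.

```python
def check_attachments(objects, attachments):
    """Return messages for attachment names that no pipeline object declares."""
    object_names = {obj["name"] for obj in objects}
    valid_sources = object_names | {"pipeline_start"}
    valid_destinations = object_names | {"pipeline_end"}

    problems = []
    for attachment in attachments:
        destination = attachment["destination_name"]
        if destination not in valid_destinations:
            problems.append(f"unknown destination: {destination}")
        for source in attachment["sources"]:
            if source["source_name"] not in valid_sources:
                problems.append(f"unknown source: {source['source_name']}")
    return problems


# Same objects and connections as in the pipeline version above.
objects = [
    {"name": "python-pred-deployment", "reference_name": "python-pred-deployment", "version": "v1"},
    {"name": "r-eda-deployment", "reference_name": "r-eda-deployment", "version": "v1"},
]
attachments = [
    {"destination_name": "r-eda-deployment", "sources": [{"source_name": "pipeline_start"}]},
    {"destination_name": "python-pred-deployment", "sources": [{"source_name": "r-eda-deployment"}]},
    {"destination_name": "pipeline_end", "sources": [{"source_name": "python-pred-deployment"}]},
]
print(check_attachments(objects, attachments))
```

An empty list means every connection points at a declared object or at the pipeline start or end.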

## All done! Let's close the client properly. 

In [None]:
client.close()

## Making a request and exploring further
Now you can go to the Web App and take a look in the user interface at what you have just built. It should look something like this:

![pythonr_overview](pythonr_overview.png)


If you want, you can create a request to the XGBoost deployment using "dummy_data.csv". The dummy data is a small test subset of the original data.

So there we have it! You have created a pipeline in which the data exploration deployment is written in R and the prediction model is written in Python. These two notebooks can be used as an example of how to make a pipeline that combines R and Python. Just change the code in the deployment packages and adjust the input and output fields as you wish, and you should be good to go.

For any questions, feel free to reach out to us via the customer service portal:
https://ubiops.atlassian.net/servicedesk/customer/portals

Or connect with us via our Slack channel:
https://ubiops-community.slack.com/join/shared_invite/zt-np02blts-5xyFK0azBOuhJzdRSYwM_w#/shared-invite/email