A variant of this project is house-price-prediction, in which a machine learning pipeline is built on premises and deployed via a REST API to serve the model's predictions. This project is an alternative that uses AWS services to build the entire pipeline, which involves the following:
- Building a custom model for use in SageMaker.
- Building an inference pipeline for preprocessing and feature engineering in SageMaker.
- Training and deploying the model.
- Invoking model predictions with AWS Lambda.
- Integrating Lambda with AWS API Gateway to serve model predictions to clients via a REST API.
SageMaker hosts many machine learning models that can be used out of the box, but it also offers the flexibility of using a custom model. The custom model built for this project is the same XGBoost model used in house-price-prediction. To use this model in SageMaker, it must comply with SageMaker's architecture (more on this here). This was achieved through the following steps:
- Structuring the project's folder to comply with SageMaker's model architecture.
- Writing a custom train script that trains the model and saves it in a specified location for SageMaker to access.
- Implementing a Flask server to serve the model's predictions.
- Wrapping the model training pipeline in a SageMaker-compatible Docker image.
- Shipping the image to Amazon ECR.
Image repo: 249021303942.dkr.ecr.us-west-2.amazonaws.com/sagemaker-custom-models
To train with this model in SageMaker, all that is needed is to specify the image's location in ECR.
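The custom train script step can be illustrated with a minimal sketch of the file layout a SageMaker training container works with. The `/opt/ml` paths are the standard SageMaker container locations. The channel name `training`, the file name `model.pkl`, and the helper names are assumptions made for illustration, not taken from this project's `train` script.

```python
import json
import pickle
from pathlib import Path


def sagemaker_paths(prefix="/opt/ml", channel="training"):
    """Return the paths a SageMaker training container reads from and writes to.

    The channel name "training" is an assumption; it must match the channel
    name used when the training job is started.
    """
    root = Path(prefix)
    return {
        "hyperparameters": root / "input" / "config" / "hyperparameters.json",
        "training_data": root / "input" / "data" / channel,
        "model_dir": root / "model",
    }


def load_hyperparameters(paths):
    """Read the hyperparameters file; SageMaker passes every value as a string."""
    path = paths["hyperparameters"]
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def save_model(model, paths, filename="model.pkl"):
    """Persist the fitted model where SageMaker collects model artifacts.

    The file name is hypothetical; the serving code must load the same name.
    """
    paths["model_dir"].mkdir(parents=True, exist_ok=True)
    target = paths["model_dir"] / filename
    with open(target, "wb") as fh:
        pickle.dump(model, fh)
    return target
```

Passing a different `prefix` makes it possible to try the script locally against a scratch directory before building the Docker image.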
Building an inference pipeline for preprocessing and feature engineering in SageMaker, training and deploying the model
For data preprocessing and feature engineering there are a lot of options during deployment. One way is to specify the transformations in AWS Lambda. Rather than keeping a bulk of functions and layers in AWS Lambda, an inference pipeline was built for this project, which seemed to be a cleaner approach.
Building the inference pipeline involved the following steps:
- Custom transformer script: Writing a script that subclasses the TransformerMixin and BaseEstimator classes from sklearn.base to build custom transformers, and uses sklearn.pipeline to connect these transformers in a pipeline. The script includes the functions SageMaker requires: input_fn to parse the input data, predict_fn to apply the transformation with the model, output_fn to write and encode the output data, and model_fn to load the model. It also has an entry point that trains and saves the transformer pipeline. To avoid a global variable conflict in sklearn's container on AWS, the transformers were written in a separate module and specified as a dependency file.
- Training the transformer pipeline, setting the script as the entry point.
- Performing a batch transform with the trained transformer pipeline and saving the location of the output file.
- Training the machine learning model: The custom XGBoost model image is loaded using its location in ECR and trained on the transformed data.
- Building the pipeline: The transformer pipeline is connected to the trained XGBoost model with SageMaker's PipelineModel class, which is deployed with the endpoint name: inference-pipeline-ep-2020-05-15-21-52-39
This makes inference easy, as the data is transformed and predictions are made through the same endpoint, inference-pipeline-ep-2020-05-15-21-52-39. Predictions can be made by passing in the raw data.
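The four serving functions named in the custom transformer step can be sketched roughly as follows. This is a simplified, standard-library-only illustration, not the contents of `preprocessor.py`: it assumes CSV in and CSV out, a pickled pipeline saved under the hypothetical name `model.pkl`, and a model object that exposes a `transform` method (as a fitted sklearn pipeline does).

```python
import csv
import io
import os
import pickle


def model_fn(model_dir):
    """Load the fitted transformer pipeline saved by the training entry point."""
    # "model.pkl" is a hypothetical file name; it must match what training saved.
    with open(os.path.join(model_dir, "model.pkl"), "rb") as fh:
        return pickle.load(fh)


def input_fn(input_data, content_type):
    """Parse a CSV request body into a list of rows (each a list of strings)."""
    if content_type != "text/csv":
        raise ValueError(f"Unsupported content type: {content_type}")
    if isinstance(input_data, bytes):
        input_data = input_data.decode("utf-8")
    return [row for row in csv.reader(io.StringIO(input_data)) if row]


def predict_fn(input_data, model):
    """Apply the preprocessing pipeline to the parsed rows."""
    return model.transform(input_data)


def output_fn(prediction, accept):
    """Encode the transformed rows as CSV for the next container in the pipeline."""
    if accept != "text/csv":
        raise ValueError(f"Unsupported accept type: {accept}")
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(prediction)
    return buffer.getvalue()
```

In an inference pipeline the CSV returned by `output_fn` becomes the request body for the next container, here the XGBoost model, which is why both ends of this sketch use the same content type.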
To invoke predictions with the model, the architecture used is:
client >>>>> api-gateway >>>>>> lambda >>>>>> model_endpoint
The Lambda microservice transforms the data to CSV format and gets the prediction from the model endpoint.
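A Lambda handler for this flow might look like the sketch below. It is an illustration under several assumptions, not the deployed function: the request is assumed to arrive through an API Gateway proxy integration (JSON string in `event["body"]`), the column names `LotArea` and `YearBuilt` are placeholders for the real feature order, and the `runtime` parameter is added only so the handler can be exercised without AWS access. `invoke_endpoint` is the boto3 `sagemaker-runtime` call for querying a deployed endpoint.

```python
import csv
import io
import json

# Endpoint name as given in this README.
ENDPOINT_NAME = "inference-pipeline-ep-2020-05-15-21-52-39"


def to_csv_row(record, columns):
    """Serialize one JSON record into a single CSV line in a fixed column order."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(
        [record.get(column, "") for column in columns]
    )
    return buffer.getvalue().rstrip("\n")


def lambda_handler(event, context, runtime=None, columns=("LotArea", "YearBuilt")):
    """Convert the request body to CSV, call the endpoint, return the prediction.

    `columns` is a hypothetical feature order; `runtime` allows injecting a
    stand-in client for local testing.
    """
    if runtime is None:
        import boto3  # imported lazily so the sketch runs without AWS installed

        runtime = boto3.client("sagemaker-runtime")

    record = json.loads(event["body"])  # assumes a proxy-integration event
    payload = to_csv_row(record, columns)

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=payload,
    )
    prediction = response["Body"].read().decode("utf-8")
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```

Because the endpoint hosts the whole inference pipeline, the handler only has to reshape the raw record into CSV; all feature engineering happens behind the endpoint.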
For easy integration with clients, the Lambda function is integrated with AWS API Gateway to serve model predictions via a REST API. API for this project: https://i73xeinese.execute-api.us-west-2.amazonaws.com/beta
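From the client side, a call to the API could be sketched as below using only the Python standard library. The README does not state the resource path under the `beta` stage, the HTTP method, or the request schema, so the POST method, the JSON body, and the helper names here are all assumptions.

```python
import json
import urllib.request

# Stage URL as given in this README; any resource path beneath it is not documented here.
API_URL = "https://i73xeinese.execute-api.us-west-2.amazonaws.com/beta"


def build_request(record, url=API_URL):
    """Build a JSON POST request carrying one raw house record (assumed schema)."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(record, url=API_URL, timeout=10):
    """Send the record to the API and decode the JSON response."""
    with urllib.request.urlopen(build_request(record, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes it easy to inspect the payload before sending anything.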
- /houseprice-prediction-inference-pipeline.ipynb: SageMaker notebook, modelling and inference pipeline.
- /preprocessor.py: Custom transformer script, entry point for building the inference pipeline.
- /transformers.py: Dependency script.
- /sagemaker-custom-model/Dockerfile: Custom SageMaker image.
- /sagemaker-custom-model/xgboost/train: Custom model train script.
- /sagemaker-custom-model/xgboost/predictor.py: Serves predictions.