This example shows how to use SageMaker Clarify to run explainability jobs on a SageMaker-hosted inference pipeline.
Below is the architecture diagram used in the solution:
The notebook performs the following steps:
- Prepare raw training and test data
- Create a SageMaker Processing job that preprocesses the raw training data and also produces a scikit-learn model, which is reused for deployment
- Train an XGBoost model on the processed data using SageMaker's built-in XGBoost container
- Create a SageMaker inference pipeline that chains the scikit-learn and XGBoost models in series
- Perform inference by supplying raw test data
- Set up and run an explainability job powered by SageMaker Clarify
- Use the open-source shap library to create summary and waterfall plots that make feature importance easier to understand
- Run bias analysis jobs
- Clean up
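
To make the idea behind the pipeline and explainability steps concrete, here is a minimal, self-contained sketch in plain Python. It is not the notebook's code and does not call SageMaker or the shap library. The functions `preprocess`, `model`, `pipeline`, and `shapley_values`, the two feature names, and all numbers are hypothetical stand-ins chosen for illustration. The sketch chains a preprocessing stage and a scoring stage in series, then computes exact Shapley values against the raw inputs, which is the reason for explaining the whole pipeline rather than the XGBoost model alone: attributions land on the raw features a user actually supplies.

```python
from itertools import combinations
from math import factorial


def preprocess(raw):
    """Hypothetical stand-in for the scikit-learn stage: rescale raw features."""
    age, income = raw
    return [age / 10.0, income / 1000.0]


def model(features):
    """Hypothetical stand-in for the XGBoost stage: a fixed linear score."""
    return 2.0 * features[0] + 3.0 * features[1]


def pipeline(raw):
    """Two stages in series: raw input -> preprocessing -> prediction."""
    return model(preprocess(raw))


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features, so this
    is only practical for toy examples like this one.
    """
    n = len(instance)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [
                    instance[j] if (j in subset or j == i) else baseline[j]
                    for j in range(n)
                ]
                without_i = [
                    instance[j] if j in subset else baseline[j]
                    for j in range(n)
                ]
                phi += weight * (predict(with_i) - predict(without_i))
        values.append(phi)
    return values


# Hypothetical raw record [age, income] and a baseline record to compare against.
raw_instance = [40.0, 5000.0]
raw_baseline = [30.0, 3000.0]

attributions = shapley_values(pipeline, raw_instance, raw_baseline)
print(dict(zip(["age", "income"], attributions)))
```

The attributions sum to the difference between the pipeline's prediction for the record and its prediction for the baseline, which is the property that waterfall plots visualize.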
The attached notebook can be run in Amazon SageMaker Studio.