# SageMaker End to End Solutions: Fraud Detection for Automobile Claims

<a id='overview-0'></a>

## [Overview](./0-AutoClaimFraudDetection.ipynb)
* **[Notebook 0: Overview, Architecture, and Data Exploration](./0-AutoClaimFraudDetection.ipynb)**
  * **[Business Problem](#business-problem)**
  * **[Technical Solution](#nb0-solution)**
  * **[Solution Components](#nb0-components)**
  * **[Solution Architecture](#nb0-architecture)**
  * **[DataSets and Exploratory Data Analysis](#nb0-data-explore)**
  * **[Exploratory Data Science and Operational ML workflows](#nb0-workflows)**
  * **[The ML Life Cycle: Detailed View](#nb0-ml-lifecycle)**
* [Notebook 1: Data Prep, Process, Store Features](./1-data-prep-e2e.ipynb)
  * Architecture
  * Getting started
  * DataSets
  * SageMaker Feature Store
  * Create train and test datasets
* [Notebook 2: Train, Check Bias, Tune, Record Lineage, and Register a Model](./2-lineage-train-assess-bias-tune-registry-e2e.ipynb)
  * Architecture
  * Train a model using XGBoost
  * Model lineage with artifacts and associations
  * Evaluate the model for bias with Clarify
  * Deposit Model and Lineage in SageMaker Model Registry
* [Notebook 3: Mitigate Bias, Train New Model, Store in Registry](./3-mitigate-bias-train-model2-registry-e2e.ipynb)
  * Architecture
  * Develop a second model
  * Analyze the Second Model for Bias
  * View Results of Clarify Bias Detection Job
  * Configure and Run Clarify Explainability Job
  * Create Model Package for second trained model
* [Notebook 4: Deploy Model, Run Predictions](./4-deploy-run-inference-e2e.ipynb)
  * Architecture
  * Deploy an approved model and Run Inference via Feature Store
  * Create a Predictor
  * Run Predictions from Online Feature Store
* [Notebook 5: Create and Run an End-to-End Pipeline to Deploy the Model](./5-pipeline-e2e.ipynb)
  * Architecture
  * Create an Automated Pipeline
  * Clean up

## Overview, Architecture, and Data Exploration

This overview notebook introduces the business problem of auto insurance fraud, outlines the technical solution and its architecture, explores the datasets, and scopes the machine learning (ML) life cycle.

<a id='business-problem'> </a>

## Business Problem

[overview](#overview-0)

<i>"Auto insurance fraud ranges from misrepresenting facts on insurance applications and inflating insurance claims to staging accidents and submitting claim forms for injuries or damage that never occurred, to false reports of stolen vehicles.
Fraud accounted for between 15 percent and 17 percent of total claims payments for auto insurance bodily injury in 2012, according to an Insurance Research Council (IRC) study. The study estimated that between $\$5.6$ billion and $\$7.7$ billion was fraudulently added to paid claims for auto insurance bodily injury payments in 2012, compared with a range of $\$4.3$ billion to $\$5.8$ billion in 2002."</i> [source: Insurance Information Institute](https://www.iii.org/article/background-on-insurance-fraud)

In this example, we use the *auto insurance domain* to detect claims that are possibly fraudulent.  
More precisely, we address the use case <i>"What is the likelihood that a given auto claim is fraudulent?"</i> and explore the technical solution.
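
Framing the use case as a likelihood means the model returns a score between 0 and 1 for each claim, and a separate decision rule turns that score into an action. The sketch below illustrates only that last step. The function name `flag_claims`, the claim IDs, the scores, and the 0.5 threshold are all hypothetical and chosen for illustration; they are not output from the model built in this series.

```python
def flag_claims(fraud_probabilities, threshold=0.5):
    """Map each claim ID to True when its fraud probability meets the threshold.

    fraud_probabilities: dict of claim ID -> predicted probability in [0, 1].
    threshold: assumed cut-off; a real value would be tuned to business cost.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return {claim_id: p >= threshold for claim_id, p in fraud_probabilities.items()}


# Hypothetical scores for illustration only (not model output).
scores = {"claim-001": 0.08, "claim-002": 0.91, "claim-003": 0.47}
flagged = flag_claims(scores, threshold=0.5)
print(flagged)
```

Lowering the threshold sends more claims to review at the cost of more false alarms; raising it does the opposite.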

As you review the [notebooks](#nb0-notebooks) and the [architectures](#nb0-architecture) presented at each stage of the ML life cycle, you will see how SageMaker services and features can make you more effective as a data scientist, a machine learning engineer, or an MLOps engineer.

We will then do [data exploration](#nb0-data-explore) on the synthetically generated datasets for Customers and Claims.

Then, we will provide an overview of the technical solution by examining the [Solution Components](#nb0-components) and the [Solution Architecture](#nb0-architecture).
We will also motivate the tasks that an ML project requires by examining a [detailed view of the machine learning life cycle](#nb0-ml-lifecycle) and recognizing the [separation between exploratory data science and operationalizing an ML workflow](#nb0-workflows).


### Car Insurance Claims: Data Sets and Problem Domain

The inputs for building our model and workflow are two tables of insurance data: a claims table and a customers table. This data was synthetically generated and is provided to you in its raw state for pre-processing with SageMaker Data Wrangler. However, completing the Data Wrangler step is not required to continue with the rest of this notebook. If you wish, you may use the `claims_preprocessed.csv` and `customers_preprocessed.csv` files in the `data` directory, which are exact copies of what Data Wrangler would output.
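
If you skip the Data Wrangler step, a quick check that the two preprocessed files are present and readable can save time later. The following is a minimal sketch that uses only the Python standard library. It assumes the files have a header row and that the `data` directory is relative to the notebook's working directory; the helper names `summarize_csv` and `summarize_preprocessed` are illustrative and not part of the solution code.

```python
import csv
from pathlib import Path

# File names come from the text above; the directory location is an assumption.
PREPROCESSED_FILES = ("claims_preprocessed.csv", "customers_preprocessed.csv")


def summarize_csv(path):
    """Return (column_names, row_count) for one CSV file with a header row."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])          # first row holds the column names
        row_count = sum(1 for _ in reader)  # remaining rows are data
    return header, row_count


def summarize_preprocessed(data_dir="data"):
    """Summarize each preprocessed file found in data_dir; skip missing files."""
    summaries = {}
    for name in PREPROCESSED_FILES:
        path = Path(data_dir) / name
        if path.exists():
            summaries[name] = summarize_csv(path)
    return summaries


if __name__ == "__main__":
    for name, (columns, rows) in summarize_preprocessed().items():
        print(f"{name}: {rows} rows, {len(columns)} columns")
```

A file that is missing from the result was not found in the directory, which usually means the working directory is not the one that contains `data`.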

<a id ='nb0-solution'> </a>

## Technical Solution
[overview](#overview-0)

In this introduction, you will look at the technical architecture and solution components used to build a solution that predicts fraudulent insurance claims and to deploy it with SageMaker for real-time predictions. While a deployed model is the end product of this notebook series, the purpose of this guide is to walk you through the detailed stages of the [machine learning (ML) life cycle](#nb0-ml-lifecycle) and show you which SageMaker services and features support your activities in each stage.

**Topics**
- [Solution Components](#nb0-components)
- [Solution Architecture](#nb0-architecture)
- [Code Resources](#nb0-code)
- [ML lifecycle details](#nb0-ml-lifecycle)
- [Manual/exploratory and automated workflows](#nb0-workflows) 

<a id ='nb0-components'> </a>

## Solution Components
[overview](#overview-0)
    
The following [SageMaker](https://sagemaker.readthedocs.io/en/stable/v2.html) services are used in this solution:

 1. [SageMaker Data Wrangler](https://aws.amazon.com/sagemaker/data-wrangler/) - [docs](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler.html)
 1. [SageMaker Processing](https://aws.amazon.com/blogs/aws/amazon-sagemaker-processing-fully-managed-data-processing-and-model-evaluation/) - [docs](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_processing.html)
 1. [SageMaker Feature Store](https://aws.amazon.com/sagemaker/feature-store/) - [docs](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_featurestore.html)
 1. [SageMaker Clarify](https://aws.amazon.com/sagemaker/clarify/) - [docs](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-processing-job-run.html)
 1. [SageMaker Training with XGBoost Algorithm and Hyperparameter Optimization](https://sagemaker.readthedocs.io/en/stable/frameworks/xgboost/using_xgboost.html) - [docs](https://sagemaker.readthedocs.io/en/stable/frameworks/xgboost/index.html)
 1. [SageMaker Model Registry](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry.html) - [docs](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-deploy.html#model-registry-deploy-api)
 1. SageMaker Hosted Endpoints - [predictors - docs](https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html)
 1. SageMaker Pipelines - [docs](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/index.html)

![Solution Components](images/solution-components-e2e.png)