The purpose of this end-to-end example is to demonstrate how to prepare, train, and deploy a model that detects fraudulent auto insurance claims.
- Business Problem
- Technical Solution
- Solution Components
- Solution Architecture
- Code Resources
- Exploratory Data Science and Operational ML workflows
- The ML Life Cycle: Detailed View
"Auto insurance fraud ranges from misrepresenting facts on insurance applications and inflating insurance claims to staging accidents and submitting claim forms for injuries or damage that never occurred, to false reports of stolen vehicles. Fraud accounted for between 15 percent and 17 percent of total claims payments for auto insurance bodily injury in 2012, according to an Insurance Research Council (IRC) study. The study estimated that between $5.6 billion and $7.7 billion was fraudulently added to paid claims for auto insurance bodily injury payments in 2012, compared with a range of $4.3 billion to $5.8 billion in 2002. " source: Insurance Information Institute
In this example, we use the auto insurance domain to detect claims that may be fraudulent. More precisely, we address the use case "what is the likelihood that a given auto claim is fraudulent?" and explore the technical solution.
As you review the notebooks and the architectures presented at each stage of the ML lifecycle, you will see how you can leverage SageMaker services and features to enhance your effectiveness as a data scientist, machine learning engineer, or MLOps engineer.
We then perform data exploration on the synthetically generated Customers and Claims datasets. Next, we provide an overview of the technical solution by examining the Solution Components and the Solution Architecture. Motivated by the need to accomplish new tasks in ML, we take a detailed view of the machine learning lifecycle, recognizing the separation between exploratory data science and an operationalized ML workflow.
The inputs for building our model and workflow are two tables of insurance data: a claims table and a customers table. This data was synthetically generated and is provided to you in its raw state for pre-processing with SageMaker Data Wrangler. However, completing the SageMaker Data Wrangler step is not required to continue with the rest of this notebook. If you wish, you may use the `claims_preprocessed.csv` and `customers_preprocessed.csv` files in the `data` directory, as they are exact copies of what SageMaker Data Wrangler would output.
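For example, a minimal sketch of loading those preprocessed copies with pandas (assuming the files sit in a local `data/` directory next to this notebook):

```python
import pandas as pd

# The ./data paths assume the files ship alongside this notebook; swap in
# your own SageMaker Data Wrangler outputs if you ran that step yourself.
claims = pd.read_csv("./data/claims_preprocessed.csv")
customers = pd.read_csv("./data/customers_preprocessed.csv")

print(claims.shape, customers.shape)
```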
In this introduction, you will look at the technical architecture and solution components to build a solution for predicting fraudulent insurance claims and deploy it using SageMaker for real-time predictions. While a deployed model is the end-product of this notebook series, the purpose of this guide is to walk you through all the detailed stages of the machine learning (ML) lifecycle and show you what SageMaker services and features are there to support your activities in each stage.
The following SageMaker Services are used in this solution:
- SageMaker Data Wrangler - docs
- SageMaker Processing - docs
- SageMaker Feature Store - docs
- SageMaker Clarify - docs
- SageMaker Training with the XGBoost Algorithm and Hyperparameter Optimization - docs
- SageMaker Model Registry - docs
- SageMaker Hosted Endpoints - predictors - docs
- SageMaker Pipelines - docs
The overall architecture is shown in the diagram below.
We will go through five stages of the ML lifecycle and explore the solution architecture on SageMaker. Each of the sequential notebooks dives deep into the corresponding ML stage.
Notebook 1: Data Exploration
Notebook 2: Data Preparation, Ingest, Transform, Preprocess, and Store in SageMaker Feature Store
Notebook 3 and Notebook 4: Train, Tune, Check Pre- and Post-Training Bias, Mitigate Bias, Re-train, Deposit the Best Model in the SageMaker Model Registry, and Deploy It
This is the architecture for model deployment.
Pipeline Notebook: End-to-End Pipeline - an MLOps pipeline that runs an end-to-end automated workflow with all the design decisions made during the manual/exploratory steps in the previous notebooks.
Our solution is split into the following stages of the ML Lifecycle, and each stage has its own notebook:
- Notebook 1: Data Exploration: We first explore the data.
- Notebook 2: Data Prep and Store: We prepare a dataset for machine learning using SageMaker Data Wrangler, then create the datasets and deposit them in SageMaker Feature Store.
- Notebook 3: Train, Assess Bias, Establish Lineage, Register Model: We detect possible pre-training and post-training bias, train and tune an XGBoost model using Amazon SageMaker, and record its lineage in the Model Registry so we can later deploy it (a minimal training-and-tuning sketch follows this list).
- Notebook 4: Mitigate Bias, Re-train, Register, Deploy Unbiased Model: We mitigate bias, retrain a less biased model, and store it in the Model Registry. We then deploy the model to an Amazon SageMaker Hosted Endpoint and run real-time inference via the SageMaker Online Feature Store.
- Pipeline Notebook: Create and Run an MLOps Pipeline: We then create a SageMaker Pipeline that ties together everything we have done so far, from the outputs of Data Wrangler through Feature Store, Clarify, and the Model Registry, and finally deployment to a SageMaker Hosted Endpoint.
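As referenced in the Notebook 3 bullet above, here is a minimal sketch of training and tuning the built-in XGBoost algorithm with the SageMaker Python SDK. The bucket, prefixes, and file names are placeholder assumptions, not the notebooks' actual values:

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

session = sagemaker.Session()
region = session.boto_region_name
role = sagemaker.get_execution_role()

# Placeholder S3 locations -- substitute your own train/validation splits.
train_uri = "s3://<your-bucket>/fraud-detect-demo/data/train/train.csv"
validation_uri = "s3://<your-bucket>/fraud-detect-demo/data/validation/validation.csv"

# Built-in XGBoost container, configured for binary classification.
xgb = Estimator(
    image_uri=sagemaker.image_uris.retrieve("xgboost", region, version="1.2-1"),
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://<your-bucket>/fraud-detect-demo/models",
    sagemaker_session=session,
)
xgb.set_hyperparameters(objective="binary:logistic", num_round=100)

# Search a small hyperparameter space, optimizing AUC on the validation channel.
tuner = HyperparameterTuner(
    estimator=xgb,
    objective_metric_name="validation:auc",
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.1, 0.5),
        "max_depth": IntegerParameter(3, 10),
    },
    max_jobs=6,
    max_parallel_jobs=2,
)
tuner.fit(
    {
        "train": TrainingInput(train_uri, content_type="text/csv"),
        "validation": TrainingInput(validation_uri, content_type="text/csv"),
    }
)
```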
Note that there are typically two workflows: a manual exploratory workflow and an automated workflow.
The exploratory, manual data science workflow is where experiments are conducted and various techniques and strategies are tested.
After you have established your data prep, transformations, featurizations, and training algorithms, and have tested various hyperparameters for model tuning, you can move to the automated workflow, where you rely on the MLOps or ML engineering part of your team to streamline the process and make it more repeatable and scalable by putting it into an automated pipeline.
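To make the contrast concrete, here is a minimal, hypothetical SageMaker Pipelines sketch that turns a single training step into a repeatable, parameterized workflow. The pipeline name, parameter, and S3 URI are illustrative; the real Pipeline Notebook chains many more steps (processing, bias checks, registration, deployment):

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.parameters import ParameterString
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

session = sagemaker.Session()
region = session.boto_region_name
role = sagemaker.get_execution_role()

# A pipeline parameter lets the automated workflow re-run on new data
# without code changes; the default URI is a placeholder.
train_data = ParameterString(
    name="TrainDataUri",
    default_value="s3://<your-bucket>/fraud-detect-demo/data/train/train.csv",
)

xgb = Estimator(
    image_uri=sagemaker.image_uris.retrieve("xgboost", region, version="1.2-1"),
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)
xgb.set_hyperparameters(objective="binary:logistic", num_round=100)

train_step = TrainingStep(
    name="TrainFraudModel",
    estimator=xgb,
    inputs={"train": TrainingInput(train_data, content_type="text/csv")},
)

pipeline = Pipeline(
    name="FraudDetectDemoPipeline",  # illustrative name
    parameters=[train_data],
    steps=[train_step],
)
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
execution = pipeline.start()    # kick off an automated run
```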
The Red Boxes and Icons represent comparatively newer concepts and tasks that are now deemed important to include and execute, in a production-oriented (versus research-oriented) and scalable ML lifecycle.
These newer lifecycle tasks and their corresponding, supporting AWS Services and features include:
- Data Wrangling: SageMaker Data Wrangler for cleaning, normalizing, transforming, and encoding data, as well as joining datasets. Data Wrangler outputs generated code that works with SageMaker Processing, SageMaker Pipelines, and SageMaker Feature Store, or a plain Python script with pandas.
- Feature Engineering has always been done, but now with SageMaker Data Wrangler we can do it in a GUI-based tool and generate code for the next phases of the lifecycle.
- Detect Bias: Using SageMaker Clarify, during data prep or training we can detect pre-training and post-training bias, and eventually, at inference time, provide interpretability/explainability of the inferences (e.g., which factors were most influential in producing the prediction). A Clarify sketch follows this list.
- Feature Store (Offline): Once we have done all of our feature engineering, encoding, and transformations, we can standardize the features offline in SageMaker Feature Store, to be used as input features for training models. A Feature Store sketch also follows this list.
- Artifact Lineage: Using Amazon SageMaker's Artifact Lineage features, we can associate all the artifacts (data, models, parameters, etc.) with a trained model to produce metadata that can be stored in a Model Registry.
- Model Registry: SageMaker Model Registry stores the metadata around all the artifacts you have chosen to include in the process of creating your models, along with the models themselves. Later, a human approval step can mark a model as ready for production, which feeds into the next phase: deploy and monitor.
- Inference and the Online Feature Store: For real-time inference, we can leverage the online SageMaker Feature Store we have created to serve our model with new incoming data at single-digit-millisecond latency and high throughput (the Feature Store sketch below includes an online read).
- Pipelines: Once we have experimented and decided on the various options in the lifecycle (which transforms to apply to our features, how to handle imbalance or bias in the data, which algorithms to train with, which hyperparameters give us the best performance metrics, etc.), we can automate the various tasks across the lifecycle using SageMaker Pipelines.
- In this notebook series, we show a pipeline that starts with the outputs of SageMaker Data Wrangler and ends with storing trained models in the Model Registry.
- Typically, you could have one pipeline for data prep, one for training up to the Model Registry (which we show in the code associated with this example), one for inference, and one for re-training that uses SageMaker Model Monitor to detect model drift and data drift and triggers re-training via an AWS Lambda function.
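For the bias-detection bullet above, here is a minimal sketch of running a pre-training bias job with the SageMaker Python SDK's Clarify module. The S3 paths, label column (`fraud`), and facet column (`customer_gender_female`) are illustrative assumptions; the training notebooks define the real ones:

```python
import sagemaker
from sagemaker import clarify

session = sagemaker.Session()
role = sagemaker.get_execution_role()

clarify_processor = clarify.SageMakerClarifyProcessor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

bias_data_config = clarify.DataConfig(
    s3_data_input_path="s3://<your-bucket>/fraud-detect-demo/data/train/train.csv",
    s3_output_path="s3://<your-bucket>/fraud-detect-demo/clarify-bias",
    label="fraud",            # assumed target column in the training CSV
    dataset_type="text/csv",
)
bias_config = clarify.BiasConfig(
    label_values_or_threshold=[0],        # the favorable label value(s)
    facet_name="customer_gender_female",  # sensitive attribute to audit
    facet_values_or_threshold=[1],
)

# Computes pre-training metrics such as Class Imbalance (CI) and
# Difference in Proportions of Labels (DPL).
clarify_processor.run_pre_training_bias(
    data_config=bias_data_config,
    data_bias_config=bias_config,
    methods=["CI", "DPL"],
)
```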
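And for the offline/online Feature Store bullets, a sketch of creating a feature group, ingesting a DataFrame into the offline store, and reading a single record back from the online store. The group name, identifier column (`policy_id`), and S3 URI are again assumptions; Notebook 2 defines the real ones:

```python
import time
import boto3
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
role = sagemaker.get_execution_role()

claims = pd.read_csv("./data/claims_preprocessed.csv")
claims["event_time"] = time.time()  # Feature Store requires an event-time column

# Assumes all feature columns are numeric (or already cast to string).
claims_fg = FeatureGroup(name="claims-feature-group", sagemaker_session=session)
claims_fg.load_feature_definitions(data_frame=claims)  # infer feature types

claims_fg.create(
    s3_uri="s3://<your-bucket>/fraud-detect-demo/feature-store",  # offline store
    record_identifier_name="policy_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,  # also materialize the low-latency online store
)
while claims_fg.describe()["FeatureGroupStatus"] == "Creating":
    time.sleep(5)  # creation is asynchronous; wait before ingesting

claims_fg.ingest(data_frame=claims, max_workers=3, wait=True)

# Real-time reads for inference come from the online store.
featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")
record = featurestore_runtime.get_record(
    FeatureGroupName="claims-feature-group",
    RecordIdentifierValueAsString="123",  # a policy_id value
)
print(record["Record"])
```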