Amazon SageMaker Pipelines

Training and deploying a text classification model using Amazon SageMaker Pipelines

Contents

  1. Background
  2. Prerequisites
  3. Data
  4. Approach
  5. Other Resources

Background

Amazon SageMaker Pipelines makes it easy for data scientists and engineers to build, automate, and scale end-to-end machine learning workflows. Machine learning workflows are complex, requiring iteration and experimentation across each step of the process, such as exploring and preparing data, experimenting with different algorithms, training and tuning models, and deploying models to production. Developing these workflows can take weeks or months of coding, and manually managing workflow dependencies quickly becomes complex. With Amazon SageMaker Pipelines, data science teams get an easy-to-use continuous integration and continuous delivery (CI/CD) service that simplifies the development and management of machine learning workflows at scale.

In this notebook we use SageMaker Pipelines to train and deploy a text classification model to predict e-commerce product ratings based on customers’ product reviews. We’ll use BlazingText, a SageMaker built-in algorithm, to minimize the amount of effort required to train and deploy the model. BlazingText provides highly optimized implementations of Word2vec and text classification algorithms.
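As a preview of what the training step will configure, the sketch below retrieves the BlazingText container image and sets up an estimator in supervised (text classification) mode. The instance type, output path, and hyperparameter values are illustrative assumptions rather than the notebook’s exact settings.

```python
import sagemaker
from sagemaker.estimator import Estimator

# Minimal sketch: pull the BlazingText container and configure an estimator
# for supervised text classification. Instance type, output path, and
# hyperparameters below are placeholders, not the notebook's exact values.
session = sagemaker.Session()
region = session.boto_region_name
role = sagemaker.get_execution_role()

image_uri = sagemaker.image_uris.retrieve(
    framework="blazingtext", region=region, version="latest"
)

bt_estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.c5.2xlarge",
    output_path=f"s3://{session.default_bucket()}/blazingtext/output",
    sagemaker_session=session,
)

# "supervised" mode enables BlazingText's text classification algorithm.
bt_estimator.set_hyperparameters(mode="supervised", epochs=10)
```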

Prerequisites

You will need an AWS account to use this solution. Sign up for an account before you proceed.

You will also need permission to use Amazon SageMaker Studio. AWS permissions are managed through AWS IAM. Administrator users already have the required permissions; if your user account does not, contact your account's AWS administrator.

Data

To train the model, we’ll use a sample of data containing e-commerce reviews and associated product ratings. Our pipeline will start by processing the data for model training and will proceed with model training, evaluation, registration, and deployment. The Women’s E-Commerce Clothing Reviews dataset has been made available under a Creative Commons license, and a copy has been saved in a sample data Amazon S3 bucket. In the first section of the notebook, we’ll walk through how to download the data and get started with building the ML workflow as a SageMaker pipeline.
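As a rough sketch of that download step, the snippet below copies the review data from S3 and previews it with pandas. The bucket name, object key, and column names shown here are assumptions standing in for the notebook’s actual values.

```python
import boto3
import pandas as pd

# Placeholder bucket and key -- the notebook points at the actual
# sample-data location in S3.
s3 = boto3.client("s3")
s3.download_file(
    Bucket="<sample-data-bucket>",
    Key="womens_clothing_ecommerce_reviews.csv",
    Filename="reviews.csv",
)

# Assumed column names ("Review Text", "Rating") from the public dataset.
df = pd.read_csv("reviews.csv")
print(df[["Review Text", "Rating"]].head())
```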

Approach

Our ML workflow will be built from the following SageMaker pipeline steps (a condensed code sketch follows the list):

  • Data processing step - in this step we use a scikit-learn processor to process the training data by cleaning up the review text (e.g., removing punctuation and converting to lower case), rebalancing the dataset, creating review categories, and generating the training, testing, and validation datasets
  • Model training step - in this step we create a SageMaker estimator and specify the model training hyperparameters and the locations of the training and validation data
  • Create model step - in this step we create a SageMaker model from the model artifacts produced by the training step
  • Deploy model step - this step uses a scikit-learn processor to deploy the trained model
  • Register model step - in the final step we submit the trained model to the model registry. We can optionally configure this step to require manual approval before the model is registered.
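Below is a condensed sketch of how these steps might be wired together with the SageMaker Python SDK. It reuses the session, role, image_uri, and bt_estimator names from the estimator sketch above; raw_data_s3_uri, the preprocessing.py script, and the model package group name are illustrative placeholders rather than the notebook’s actual values, and the deploy step is omitted for brevity.

```python
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.inputs import TrainingInput, CreateModelInput
from sagemaker.model import Model
from sagemaker.workflow.steps import ProcessingStep, TrainingStep, CreateModelStep
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.pipeline import Pipeline

# Data processing step: run a preprocessing script with a scikit-learn processor.
sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
)
step_process = ProcessingStep(
    name="ProcessReviewData",
    processor=sklearn_processor,
    inputs=[ProcessingInput(source=raw_data_s3_uri, destination="/opt/ml/processing/input")],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ],
    code="preprocessing.py",  # hypothetical script name
)

# Model training step: feed the processed channels to the BlazingText estimator.
step_train = TrainingStep(
    name="TrainBlazingText",
    estimator=bt_estimator,
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
            content_type="text/plain",
        ),
        "validation": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs["validation"].S3Output.S3Uri,
            content_type="text/plain",
        ),
    },
)

# Create model step: wrap the trained artifacts in a SageMaker Model.
model = Model(
    image_uri=image_uri,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    role=role,
    sagemaker_session=session,
)
step_create_model = CreateModelStep(
    name="CreateTextClassifierModel",
    model=model,
    inputs=CreateModelInput(instance_type="ml.m5.large"),
)

# Register model step: submit the model to a model package group, with
# manual approval required before it can be deployed from the registry.
step_register = RegisterModel(
    name="RegisterTextClassifierModel",
    estimator=bt_estimator,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["application/json"],
    response_types=["application/json"],
    inference_instances=["ml.m5.large"],
    transform_instances=["ml.m5.large"],
    model_package_group_name="ProductRatingModels",  # hypothetical group name
    approval_status="PendingManualApproval",
)

# Assemble and run the pipeline (the deploy step is omitted here for brevity).
pipeline = Pipeline(
    name="ProductRatingPipeline",
    steps=[step_process, step_train, step_create_model, step_register],
)
pipeline.upsert(role_arn=role)
execution = pipeline.start()
```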

Other Resources

For additional SageMaker Pipelines examples, see Orchestrating Jobs with Amazon SageMaker Model Building Pipelines or the related GitHub repo.