Serve a ResNet-50 ONNX model along with PyTorch and TensorRT models on GPU with Amazon SageMaker Multi-Model Endpoints (MME)
In this example, we walk you through how to use the NVIDIA Triton Inference Server on an Amazon SageMaker MME with GPU to deploy ResNet-50 ONNX, TensorRT, and PyTorch models for image classification.
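At a high level, the notebook creates a single SageMaker model in multi-model mode backed by the Triton container, then fronts it with a GPU endpoint. The sketch below illustrates that flow with boto3; it is a minimal illustration under stated assumptions, not the notebook's exact code, and the Triton image URI, S3 prefix, role ARN, and resource names are placeholders you would replace with your own values.

```python
import boto3

sm_client = boto3.client("sagemaker")

# Placeholder values -- assumptions for illustration, replace with your own.
role_arn = "arn:aws:iam::<account-id>:role/<sagemaker-execution-role>"
triton_image_uri = "<account-id>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:<tag>"
# S3 prefix holding one <model-name>.tar.gz per model; each tarball is a Triton
# model repository (e.g. resnet_onnx/config.pbtxt and resnet_onnx/1/model.onnx).
model_data_url = "s3://<bucket>/triton-mme-models/"

# 1. One SageMaker "model" in MultiModel mode fronts every tarball under the prefix.
sm_client.create_model(
    ModelName="triton-resnet-mme",
    ExecutionRoleArn=role_arn,
    PrimaryContainer={
        "Image": triton_image_uri,
        "ModelDataUrl": model_data_url,
        "Mode": "MultiModel",
    },
)

# 2. Endpoint config on a GPU instance (g4dn.xlarge, as used in this example).
sm_client.create_endpoint_config(
    EndpointConfigName="triton-resnet-mme-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "triton-resnet-mme",
            "InstanceType": "ml.g4dn.xlarge",
            "InitialInstanceCount": 1,
        }
    ],
)

# 3. Create the endpoint; individual models are loaded onto the GPU lazily on first invocation.
sm_client.create_endpoint(
    EndpointName="triton-resnet-mme-ep",
    EndpointConfigName="triton-resnet-mme-config",
)
```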
- Launch a SageMaker notebook instance with a `g4dn.xlarge` instance. This example can also be run on a SageMaker Studio notebook, but the steps that follow focus on the notebook instance.
  - For git repositories, select the option `Clone a public git repository to this notebook instance only` and specify the Git repository URL.
- Once JupyterLab is ready, launch the resnet_onnx_pytorch_tensorRT_backend_MME_triton.ipynb notebook with the conda_python3 kernel and run through it to learn how to host multiple CV models on a `g4dn.xlarge` GPU behind an MME endpoint. Notice that, due to the sizes of the models, the first invocation of each model takes seconds, while subsequent invocations take milliseconds (see the invocation sketch after this list). You can also run on a larger GPU instance such as g5.xlarge to see the difference.
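To observe that cold-start effect outside the notebook, you can invoke the endpoint twice against the same `TargetModel` and compare latencies. The sketch below uses Triton's KServe-v2 style JSON request format and hedged, hypothetical names (endpoint name, tarball name, input tensor name); adjust them to match your endpoint and each model's config.pbtxt.

```python
import json
import time

import boto3
import numpy as np

runtime = boto3.client("sagemaker-runtime")

# Hypothetical names -- match these to your endpoint and model repository.
endpoint_name = "triton-resnet-mme-ep"
target_model = "resnet_onnx.tar.gz"

# A random 1x3x224x224 image tensor in Triton's JSON inference request format.
payload = {
    "inputs": [
        {
            "name": "input",  # must match the input name in config.pbtxt
            "shape": [1, 3, 224, 224],
            "datatype": "FP32",
            "data": np.random.rand(1, 3, 224, 224).astype("float32").flatten().tolist(),
        }
    ]
}

def invoke():
    start = time.time()
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        # Content type used for Triton payloads in the SageMaker Triton samples;
        # an assumption here, so verify against the notebook you are running.
        ContentType="application/octet-stream",
        Body=json.dumps(payload),
        TargetModel=target_model,  # which model.tar.gz under the MME prefix to run
    )
    latency = time.time() - start
    result = json.loads(response["Body"].read().decode("utf8"))
    return latency, result

# First call loads the model onto the GPU (seconds); the second hits the cached model (milliseconds).
cold, _ = invoke()
warm, _ = invoke()
print(f"cold start: {cold:.2f}s, warm: {warm * 1000:.0f}ms")
```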
Note: This notebook was tested with the conda_pytorch_p39 kernel on an Amazon SageMaker notebook instance of type g4dn.xlarge. It is a modified version of the original sample notebook here by Vikram Elango.