This is a project-based course on optimizing TensorFlow (TF) models for deployment using TensorRT.
- Instructor: Snehan Kekre
- Certificate: Awarded upon completion
- Duration: under 2 hours
By the end of this course, you will achieve the following objectives:
- Optimize TensorFlow models using TensorRT (TF-TRT).
- Optimize deep learning models at FP32, FP16, and INT8 precision using TF-TRT.
- Analyze how tuning TF-TRT parameters impacts performance and inference throughput.
This course is divided into three parts:
- Course Overview: Introductory reading material.
- Optimize TensorFlow Models for Deployment with TensorRT: A hands-on project.
- Graded Quiz: A final assignment required to successfully complete the course.
This hands-on project guides you in optimizing TensorFlow (TF) models for inference with NVIDIA's TensorRT (TRT).
By the end of this project, you will:
- Optimize TensorFlow models using TensorRT (TF-TRT).
- Work with models at FP32, FP16, and INT8 precision, observing how TF-TRT parameters affect performance and inference throughput.
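The conversion workflow these objectives describe can be sketched as follows. This is a minimal sketch assuming TensorFlow 2.x with TensorRT support; the `convert_saved_model` helper is illustrative (not the course's actual code), and the exact `TrtConversionParams` fields vary between TensorFlow versions:

```python
VALID_PRECISIONS = {"FP32", "FP16", "INT8"}

def convert_saved_model(input_dir, output_dir, precision="FP32",
                        calibration_input_fn=None):
    """Convert a TF SavedModel to a TF-TRT graph at the given precision.

    INT8 additionally needs a calibration_input_fn that yields
    representative input batches for range calibration.
    """
    if precision not in VALID_PRECISIONS:
        raise ValueError(f"precision must be one of {sorted(VALID_PRECISIONS)}")
    # Imported lazily so the helper can be defined without TensorRT installed.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=input_dir,
        conversion_params=trt.TrtConversionParams(precision_mode=precision))
    if precision == "INT8":
        # INT8 quantization calibrates activation ranges from sample data.
        converter.convert(calibration_input_fn=calibration_input_fn)
    else:
        converter.convert()
    converter.save(output_dir)
```

For example, `convert_saved_model("inceptionv3_saved_model", "inceptionv3_tftrt_fp16", precision="FP16")` would write an FP16-optimized SavedModel ready for benchmarking (the directory names here are placeholders).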
To complete this project successfully, you should have:
- Competency in Python programming.
- An understanding of deep learning concepts and inference.
- Experience building deep learning models using TensorFlow and its Keras API.
Task | Description |
---|---|
Task 1 | Introduction and Project Overview |
Task 2 | Set up TensorFlow and TensorRT Runtime |
Task 3 | Load Data and Pre-trained InceptionV3 Model |
Task 4 | Create Batched Input |
Task 5 | Load the TensorFlow SavedModel |
Task 6 | Benchmark Prediction Throughput and Accuracy |
Task 7 | Convert TensorFlow SavedModel to TF-TRT Float32 Graph |
Task 8 | Benchmark TF-TRT Float32 |
Task 9 | Convert to TF-TRT Float16 and Benchmark |
Task 10 | Work with TF-TRT INT8 Models |
Task 11 | Convert to TF-TRT INT8 |
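Tasks 4 and 6 above (creating batched input and benchmarking throughput) can be sketched in plain Python. The function names and the dummy `predict_fn` are illustrative assumptions, not the course's actual notebook code:

```python
import time

def make_batches(samples, batch_size):
    """Group a flat list of samples into fixed-size batches (Task 4)."""
    return [samples[i:i + batch_size]
            for i in range(0, len(samples), batch_size)]

def benchmark_throughput(predict_fn, batches):
    """Time predict_fn over all batches and return images/sec (Task 6)."""
    n_images = sum(len(batch) for batch in batches)
    start = time.perf_counter()
    for batch in batches:
        predict_fn(batch)
    elapsed = time.perf_counter() - start
    return n_images / elapsed
```

With a real model, `predict_fn` would wrap a call to the loaded SavedModel's serving signature; comparing the returned images/sec for the native, FP32, FP16, and INT8 graphs is the core of the benchmarking tasks.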
Description | Notebook | Demo |
---|---|---|
Intro to TensorFlow-TensorRT | | HF/Gradio Space |
- Main Course - Coursera.
- Deep Learning Optimization and Deployment Using TensorFlow and TensorRT - NVIDIA DLI.
- Core concepts:
  - Quantization in Signal Processing:
  - Computer Arithmetic:
    - Core:
    - Data types & conversion:
- Maths & Algebra:
  - Tensor (Maths): https://en.wikipedia.org/wiki/Tensor_(intrinsic_definition)
  - Matrix Multiplication: https://en.wikipedia.org/wiki/Matrix_multiplication
  - ML Tensor: https://en.wikipedia.org/wiki/Tensor_(machine_learning)
- Model Zoo: Edge AI Model Zoo.
- Blogs:
  - [2019 June 13] High-Performance Inference with TensorRT Integration.
  - [2019 June 03] High-Performance Inference with TensorRT Integration - TensorFlow Medium.
  - [2018 April 18] Speed Up TensorFlow Inference on GPUs with TensorRT.
  - [2017 April 08] Reduced Precision (FP16, INT8) Inference on Convolutional Neural Networks with TensorRT and NVIDIA Pascal - Chris Gottbrath, NVIDIA (Advanced Spark and TensorFlow Meetup, 2017-05-06).