When it's time to show your trained models to the world, you'll need to choose one or more deployment options. Fortunately, TensorFlow offers the tools and frameworks you need to deploy your models for a wide range of use cases.
TensorFlow Serving
Machine learning (ML) serving systems need to support model versioning (so that models can be updated with a rollback option) and multiple models (for experimentation such as A/B testing), while ensuring that concurrently served models achieve high throughput and low latency on hardware accelerators (GPUs and TPUs). TensorFlow Serving currently handles tens of millions of inferences per second for more than 1,100 Google projects, including Google’s Cloud ML Prediction.
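To make model versioning concrete, here is a minimal sketch of how a client might address a specific model version through TensorFlow Serving's REST predict endpoint. The text above does not describe the REST API, so treat this as an illustration under stated assumptions: the host `localhost:8501`, the model name `my_model`, and the helper `build_predict_request` are all hypothetical, and the request is only built, not sent.

```python
import json
from typing import Optional
from urllib import request


def build_predict_request(host: str, model_name: str, instances: list,
                          version: Optional[int] = None) -> request.Request:
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        # Pin a specific version, e.g. to compare models or roll back.
        path += f"/versions/{version}"
    url = f"http://{host}{path}:predict"
    # Row format: one entry in "instances" per input example.
    body = json.dumps({"instances": instances}).encode("utf-8")
    return request.Request(url, data=body,
                           headers={"Content-Type": "application/json"},
                           method="POST")


# Hypothetical host, model name, and input values.
req = build_predict_request("localhost:8501", "my_model", [[1.0, 2.0]], version=2)
print(req.full_url)
```

Leaving out `version` addresses the model without a version segment, so the server decides which version answers. Sending the request (for example with `urllib.request.urlopen(req)`) requires a running server with the model loaded.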
TensorFlow Extended (TFX)
When you’re ready to go beyond training a single model, or ready to put your amazing model to work and move it to production, TFX is there to help you build a complete ML pipeline.
With a TFX pipeline you can continuously retrain and update your models, and manage your model versioning and life cycle. TFX gives you the tools to validate and transform new data, monitor model performance, perform A/B testing, serve your trained models, and more. With TFX, your models are ready for production.
TensorFlow Lite
TensorFlow Lite is the official solution for running machine learning models on mobile and embedded devices. It enables on-device machine learning inference with low latency and a small binary size on Android, iOS, and other operating systems. The workflow has three steps: build a new model or retrain an existing one (for example, with transfer learning); convert the TensorFlow model into a compressed FlatBuffer with the TensorFlow Lite Converter; then take the resulting .tflite file and load it onto a mobile or embedded device.
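The conversion step can be sketched in a few lines of Python. This is a minimal example, assuming the trained model has already been exported as a SavedModel directory; the function name `convert_saved_model` and the paths in the usage note are hypothetical, and running it requires TensorFlow to be installed.

```python
from pathlib import Path


def convert_saved_model(saved_model_dir: str, output_path: str) -> int:
    """Convert a SavedModel directory to a .tflite file; return bytes written."""
    # Imported lazily so this module loads even where TensorFlow is absent.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    tflite_model = converter.convert()  # serialized FlatBuffer, as bytes
    return Path(output_path).write_bytes(tflite_model)
```

A call such as `convert_saved_model("exported/my_model", "my_model.tflite")` would produce the file that is then bundled with the mobile or embedded application.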