date	title	id
2020-10-08 08:21:35 -0700	XLA	2020-10-08t15-21-35z

XLA (Accelerated Linear Algebra) is a domain-specific compiler for optimizing deep learning computations, particularly those related to linear algebra, such as matrix multiplications.

XLA was originally designed with TensorFlow's computational graph in mind. Without XLA, TensorFlow's computational graph would be executed by its graph executor like an interpreter. Each operation would be executed individually, by evaluating a pre-compiled GPU kernel implementation.

With XLA, the TensorFlow graph is compiled into a sequence of GPU kernels generated specifically for your model. This allows for the computation to be executed all at once, rather than for each component operation. The fact that the kernels are unique to your model allows for model-specific exploitation for new performance optimizations.

XLA achieves this through the notion of fusion. The component operations are fused into a single GPU kernel launch. Besides reducing the number of necessary GPU kernels, this fusion operation also eliminates the need for storing intermediate values in memory, instead streaming these values directly to their users while keeping them in GPU registers.

This memory optimization is of utmost important since it is typically a scarce resource in hardware accelerators for machine learning.

While XLA was originally designed for TensorFlow, it nowadays can be used also in JAX (also by Google) and in PyTorch.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2020-10-08t15-21-35z.md

2020-10-08t15-21-35z.md

Files

2020-10-08t15-21-35z.md

Latest commit

History

2020-10-08t15-21-35z.md

File metadata and controls