Accelerated Linear Algebra
XLA (Accelerated Linear Algebra) is an open-source compiler for machine learning developed by the OpenXLA project. It improves model performance by optimizing computation graphs at a level below the framework, which makes it particularly useful for large-scale computations and high-performance machine learning models. Key features of XLA include:
- Compilation of Computation Graphs: Compiles computation graphs into efficient machine code.
- Optimization Techniques: Applies operation fusion, memory optimization, and other techniques.
- Hardware Support: Optimizes models for various hardware, including CPUs, GPUs, and NPUs.
- Improved Model Execution Time: Aims to reduce model execution time for both training and inference.
- Seamless Integration: Can be used with existing machine learning code with minimal changes.
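As an illustration of this integration, JAX compiles functions with XLA through its `jit` transformation. A minimal sketch (assuming JAX is installed; `predict` is a hypothetical example function):

```python
# Sketch: compiling a function with XLA via JAX's jit (assumes JAX is installed).
import jax
import jax.numpy as jnp

def predict(w, x):
    # The matmul, add, and tanh here are candidates for XLA operation fusion.
    return jnp.tanh(jnp.dot(x, w) + 1.0)

# jax.jit traces the function and compiles it with XLA on the first call;
# subsequent calls with the same shapes reuse the compiled executable.
fast_predict = jax.jit(predict)

w = jnp.ones((3, 2))
x = jnp.ones((4, 3))
y = fast_predict(w, x)  # shape (4, 2)
```

The existing NumPy-style code needs no restructuring; wrapping the function is the only change, which is the "minimal changes" integration the feature list describes.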
OpenXLA Project
The OpenXLA Project is an open-source machine learning compiler and infrastructure initiative intended to provide a common set of tools for compiling and deploying machine learning models across different frameworks and hardware platforms. It provides a modular compilation stack that can be used by major deep learning frameworks such as JAX, PyTorch, and TensorFlow. The project focuses on supplying shared components for optimization, portability, and execution across CPUs, GPUs, and specialized accelerators. Its design emphasizes interoperability between frameworks and a standardized set of representations for model computation.
Components
The OpenXLA ecosystem includes several core components:
- XLA – A deep learning compiler that optimizes computational graphs for multiple hardware targets.
- PJRT – A runtime interface that allows different back-ends to connect to XLA through a consistent API.
- StableHLO – A high-level operator set intended to serve as a stable, portable representation for ML models across compilers and frameworks.
- Shardy – An MLIR-based system for describing and transforming models that run in distributed or multi-device environments.
- Additional profiling, testing, and integration tools maintained under the OpenXLA organization.
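To make StableHLO concrete: frameworks can emit it as a portable intermediate representation that downstream compilers consume. A sketch of inspecting it from JAX (assuming a recent JAX version, whose lowering API exposes a StableHLO dialect):

```python
# Sketch: lowering a JAX function to StableHLO text (assumes a recent JAX).
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) + 1.0

x = jnp.arange(4.0)
lowered = jax.jit(f).lower(x)  # trace and lower only; nothing is executed
stablehlo_text = str(lowered.compiler_ir(dialect="stablehlo"))
# stablehlo_text is an MLIR module whose operations belong to the
# `stablehlo` dialect (e.g. stablehlo.sine, stablehlo.add).
```

Because the emitted module uses a versioned, stable operator set, the same artifact can be handed to any StableHLO-aware compiler regardless of which framework produced it.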
Users and adopters
Supported target devices
- x86-64
- ARM64
- NVIDIA GPU
- AMD GPU
- Intel GPU
- Apple GPU
- Google TPU
- AWS Trainium, Inferentia
- Cerebras
- Graphcore IPU
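Which of these targets is available at runtime depends on the PJRT backend plugged into the framework. From JAX, the active devices can be enumerated (a sketch, assuming a JAX install with at least a CPU backend):

```python
# Sketch: querying the devices exposed by the active PJRT backend through JAX.
import jax

devices = jax.devices()  # e.g. [CpuDevice(id=0)] on a CPU-only install
for d in devices:
    # platform identifies the backend, e.g. "cpu", "gpu", or "tpu"
    print(d.platform, d.id)
```

Installing a different PJRT plugin (for example, a GPU or TPU backend) changes what this call reports without any change to model code.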
Governance