CuPy
CuPy is an open source library for GPU-accelerated computing with the Python programming language, providing multi-dimensional arrays, sparse matrices, and a variety of numerical algorithms implemented on top of them.
CuPy shares the same API set as NumPy and SciPy, allowing it to serve as a drop-in replacement for running NumPy/SciPy code on GPU. CuPy supports the NVIDIA CUDA GPU platform and, starting with v9.0, the AMD ROCm GPU platform.
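As a minimal sketch of what "drop-in replacement" can look like in practice, the function below is written once against an array module xp and runs on NumPy or CuPy arrays alike; cupy.get_array_module returns whichever module owns its argument. The names softmax and get_xp are illustrative, and the NumPy fallback when CuPy or a GPU is missing is an assumption added so the sketch also runs on CPU-only machines.

```python
import numpy as np

try:
    import cupy as cp
    # Treat CuPy as usable only if at least one GPU is visible.
    if cp.cuda.runtime.getDeviceCount() == 0:
        cp = None
except Exception:  # CuPy not installed, or no usable GPU driver
    cp = None


def get_xp(x):
    """Illustrative helper: return the array module (cupy or numpy) owning x."""
    return cp.get_array_module(x) if cp is not None else np


def softmax(x, axis=-1):
    """Numerically stable softmax that runs wherever x lives (host or GPU)."""
    xp = get_xp(x)
    shifted = x - xp.max(x, axis=axis, keepdims=True)  # avoid overflow in exp
    e = xp.exp(shifted)
    return e / xp.sum(e, axis=axis, keepdims=True)


host_probs = softmax(np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]))
if cp is not None:
    # Same function, same code path, but the work runs on the GPU.
    gpu_probs = softmax(cp.asarray([[1.0, 2.0, 3.0]]))
```

The function body never names numpy or cupy directly, so a caller decides where the computation runs simply by choosing which kind of array to pass in.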
CuPy was initially developed as the backend of the Chainer deep learning framework and was later established as an independent project in 2017.
CuPy is one of the array libraries in the NumPy ecosystem and is widely used for GPU computing in Python, especially in high-performance computing environments such as Summit, Perlmutter, EULER, and ABCI.
CuPy is a NumFOCUS sponsored project.
Features
CuPy implements NumPy/SciPy-compatible APIs, as well as features to write user-defined GPU kernels or access low-level APIs.
NumPy-compatible APIs
The same set of APIs defined in the NumPy package is available under the cupy package.
- Multi-dimensional arrays of boolean, integer, float, and complex data types
- Module-level functions
- Linear algebra functions
- Fast Fourier transform
- Random number generator
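Because the module layout mirrors NumPy (numpy.linalg and cupy.linalg, numpy.fft and cupy.fft), switching a single module alias is often enough to move a computation to the GPU. The sketch below assumes that pattern; solve_and_spectrum is a hypothetical helper, data is copied explicitly between host and device with cupy.asarray and cupy.asnumpy, and the NumPy fallback for machines without a GPU is an added assumption.

```python
import numpy as np

try:
    import cupy as cp
    # Use the GPU only if CuPy is installed and a device is visible.
    USE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing or no usable GPU driver
    cp = None
    USE_GPU = False

xp = cp if USE_GPU else np  # same function names either way


def solve_and_spectrum(a_host, b_host):
    """Hypothetical helper: solve a @ x = b, return (x, |FFT(x)|) on the host."""
    a = xp.asarray(a_host)             # host -> device copy when xp is cupy
    b = xp.asarray(b_host)
    x = xp.linalg.solve(a, b)          # numpy.linalg.solve / cupy.linalg.solve
    spectrum = xp.abs(xp.fft.fft(x))   # numpy.fft.fft / cupy.fft.fft
    if xp is np:
        return x, spectrum
    return cp.asnumpy(x), cp.asnumpy(spectrum)  # device -> host copy


x, spectrum = solve_and_spectrum(np.array([[2.0, 0.0], [0.0, 4.0]]),
                                 np.array([2.0, 8.0]))
```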
SciPy-compatible APIs
- Sparse matrices in CSR, COO, CSC, and DIA formats
- Discrete Fourier transform
- Advanced linear algebra
- Multidimensional image processing
- Sparse linear algebra
- Special functions
- Signal processing
- Statistical functions
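To make the CSR format listed above concrete: a CSR matrix stores its nonzeros row by row in three arrays, data (the values), indices (their column numbers) and indptr (where each row starts in data). scipy.sparse.csr_matrix and cupyx.scipy.sparse.csr_matrix expose arrays under these same attribute names. The NumPy-only matrix-vector product below is a minimal illustration of the layout; csr_matvec is a hypothetical helper and not how CuPy implements sparse products on the GPU.

```python
import numpy as np


def csr_matvec(data, indices, indptr, x):
    """Hypothetical helper: y = A @ x for a CSR matrix A = (data, indices, indptr)."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows, dtype=np.result_type(data, x))
    for row in range(n_rows):
        start, stop = indptr[row], indptr[row + 1]  # this row's slice of data
        # Multiply the row's nonzeros by the matching entries of x.
        y[row] = np.dot(data[start:stop], x[indices[start:stop]])
    return y


# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
indices = np.array([0, 2, 2, 0, 1])
indptr = np.array([0, 2, 3, 5])
y = csr_matvec(data, indices, indptr, np.array([1.0, 2.0, 3.0]))
```

Only the nonzeros and their positions are stored, which is why CSR suits matrices where most entries are zero.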
User-defined GPU kernels
- Kernel templates for element-wise and reduction operations
- Raw kernel
- Just-in-time transpiler
- Kernel fusion
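The kernel templates and kernel fusion listed above can be sketched with CuPy's ElementwiseKernel, ReductionKernel and fuse entry points. The kernel bodies are CUDA C++ snippets that CuPy compiles on first use. The wrapper squared_diff_and_norm is hypothetical, and the NumPy fallback for machines without a GPU is an assumption made so the sketch runs anywhere.

```python
import numpy as np

try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing or no usable GPU driver
    HAVE_GPU = False

if HAVE_GPU:
    # Element-wise template: input params, output params, per-element body, name.
    squared_diff = cp.ElementwiseKernel(
        'float32 x, float32 y', 'float32 z',
        'z = (x - y) * (x - y)', 'squared_diff')

    # Reduction template: map expr, reduce expr, post-map expr, identity, name.
    # `T` is a type placeholder resolved from the argument's dtype.
    l2norm = cp.ReductionKernel(
        'T x', 'T y', 'x * x', 'a + b', 'y = sqrt(a)', '0', 'l2norm')

    # Kernel fusion: the element-wise operations in the body become one kernel.
    @cp.fuse()
    def fused_squared_diff(x, y):
        return (x - y) * (x - y)


def squared_diff_and_norm(x_host, y_host, use_fusion=False):
    """Hypothetical wrapper: return ((x - y)**2, ||x - y||_2) as host values."""
    x = np.asarray(x_host, dtype=np.float32)
    y = np.asarray(y_host, dtype=np.float32)
    if not HAVE_GPU:
        d = x - y  # CPU fallback computing the same quantities with NumPy
        return d * d, float(np.sqrt(np.sum(d * d)))
    xd, yd = cp.asarray(x), cp.asarray(y)
    sq = fused_squared_diff(xd, yd) if use_fusion else squared_diff(xd, yd)
    norm = l2norm(xd - yd, axis=0)
    return sq.get(), float(norm)


sq, norm = squared_diff_and_norm([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```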
Distributed computing
- Distributed communication package, providing collective and peer-to-peer primitives
Low-level CUDA features
- Stream and event
- Memory pool
- Profiler
- Host API binding
- CUDA Python support
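Streams, events and the memory pool can be combined as in the minimal sketch below: work is queued on a non-blocking stream, a pair of events measures its GPU time, and cached device memory is released from CuPy's default memory pool afterwards. scaled_sum_on_stream is a hypothetical helper; returning (result, None) on machines without a GPU is an assumption added for portability.

```python
import numpy as np

try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing or no usable GPU driver
    HAVE_GPU = False


def scaled_sum_on_stream(host_values, scale):
    """Hypothetical helper: sum(values * scale) and elapsed GPU ms (None on CPU)."""
    values = np.asarray(host_values, dtype=np.float64)
    if not HAVE_GPU:
        return float(np.sum(values * scale)), None  # CPU fallback

    stream = cp.cuda.Stream(non_blocking=True)
    start, end = cp.cuda.Event(), cp.cuda.Event()
    with stream:                  # work launched in this block goes to `stream`
        start.record(stream)
        d_values = cp.asarray(values)
        d_total = cp.sum(d_values * scale)
        end.record(stream)
    end.synchronize()             # block the host until `end` has been reached
    elapsed_ms = cp.cuda.get_elapsed_time(start, end)
    result = float(d_total)

    # Freed arrays stay cached in CuPy's memory pool; hand them back explicitly.
    del d_values, d_total
    cp.get_default_memory_pool().free_all_blocks()
    return result, elapsed_ms


total, gpu_ms = scaled_sum_on_stream([1.0, 2.0, 3.0], 2.0)
```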
Interoperability
- DLPack
- CUDA Array Interface
- NEP 13
- NEP 18
- Array API Standard
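NEP 13 (__array_ufunc__) and NEP 18 (__array_function__) let code written purely against the numpy namespace dispatch to CuPy when it is handed CuPy arrays. The sketch below assumes NumPy 1.17 or later, where __array_function__ dispatch is enabled by default; normalize_rows is a hypothetical helper, and the GPU check only runs when a device is present.

```python
import numpy as np

try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing or no usable GPU driver
    HAVE_GPU = False


def normalize_rows(x):
    """Hypothetical helper written only against the numpy namespace.

    For a cupy.ndarray input, np.sqrt is forwarded to CuPy through
    __array_ufunc__ (NEP 13) and np.sum through __array_function__ (NEP 18),
    so the computation stays on the GPU and a cupy.ndarray is returned.
    """
    norms = np.sqrt(np.sum(x * x, axis=1, keepdims=True))
    return x / norms


host_out = normalize_rows(np.array([[3.0, 4.0], [0.0, 2.0]]))
if HAVE_GPU:
    gpu_out = normalize_rows(cp.asarray([[3.0, 4.0]]))  # stays a cupy.ndarray
```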
Examples
Array creation
>>> import cupy as cp
>>> x = cp.array([1, 2, 3])
>>> x
array([1, 2, 3])
>>> y = cp.arange(10)
>>> y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Basic operations
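>>> import cupy as cp
>>> x = cp.arange(12).reshape(3, 4).astype(cp.float32)
>>> x
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]], dtype=float32)
>>> x.sum(axis=1)
array([ 6., 22., 38.], dtype=float32)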
Raw CUDA C/C++ kernel
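>>> import cupy as cp
>>> kern = cp.RawKernel(r'''
... extern "C" __global__ void multiply_elemwise(const float* in1, const float* in2, float* out) {
...     int tid = blockDim.x * blockIdx.x + threadIdx.x;
...     out[tid] = in1[tid] * in2[tid];
... }
... ''', 'multiply_elemwise')
>>> in1 = cp.arange(16, dtype=cp.float32).reshape(4, 4)
>>> in2 = cp.arange(16, dtype=cp.float32).reshape(4, 4)
>>> out = cp.zeros((4, 4), dtype=cp.float32)
>>> kern((4,), (4,), (in1, in2, out))  # grid, block and arguments
>>> out
array([[  0.,   1.,   4.,   9.],
       [ 16.,  25.,  36.,  49.],
       [ 64.,  81., 100., 121.],
       [144., 169., 196., 225.]], dtype=float32)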
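A RawKernel launch takes a grid size, a block size and a tuple of arguments. When the number of elements is not a multiple of the block size, a common CUDA pattern is to pass the length to the kernel, guard the write with a bounds check, and size the grid by ceiling division. The sketch below assumes that pattern; multiply and multiply_elemwise_n are hypothetical names, both inputs are assumed to have the same shape, and a NumPy fallback is added so the sketch also runs without a GPU.

```python
import numpy as np

try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing or no usable GPU driver
    HAVE_GPU = False

if HAVE_GPU:
    # Element-wise product with a length argument and a bounds check.
    multiply_kern = cp.RawKernel(r'''
    extern "C" __global__
    void multiply_elemwise_n(const float* in1, const float* in2, float* out, int n)
    {
        int tid = blockDim.x * blockIdx.x + threadIdx.x;
        if (tid < n) {  // the last block may contain spare threads
            out[tid] = in1[tid] * in2[tid];
        }
    }
    ''', 'multiply_elemwise_n')


def multiply(a_host, b_host, threads_per_block=128):
    """Hypothetical helper: element-wise a * b for same-shaped arrays of any size."""
    a = np.ascontiguousarray(a_host, dtype=np.float32)
    b = np.ascontiguousarray(b_host, dtype=np.float32)
    if not HAVE_GPU:
        return a * b  # CPU fallback
    d_a, d_b = cp.asarray(a), cp.asarray(b)
    d_out = cp.empty_like(d_a)
    n = d_a.size
    # Ceiling division so every element gets a thread (and at least one block).
    blocks = max(1, (n + threads_per_block - 1) // threads_per_block)
    # Scalars need an explicit C type: np.int32 matches the kernel's `int n`.
    multiply_kern((blocks,), (threads_per_block,), (d_a, d_b, d_out, np.int32(n)))
    return d_out.get()


squares = multiply(np.arange(1000), np.arange(1000))  # 1000 is not a multiple of 128
```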
Applications
- spaCy
- XGBoost
- NVIDIA RAPIDS
- scikit-learn
- MONAI
- Chainer