Releases: uxlfoundation/oneCCL
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.3
This ccl_2021.15.3-arc branch adds support for Intel ARC A-Series and B-Series GPUs, along with some bug fixes.
An example of the cmake command for an Intel ARC A-Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCA=1
An example of the cmake command for an Intel ARC B-Series GPU:
cmake .. -DCMAKE_INSTALL_PREFIX=_install -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCOMPUTE_BACKEND=dpcpp -DCCL_ENABLE_ARCB=1
If the system does not have GPU Peer-to-Peer (P2P) support, set the compiler environment flag export IGC_VISAOptions=-activeThreadsOnlyBarrier before compiling, and set the same variable in the shell environment before running the application.
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.2
What's new:
- Bug fix: improved the user experience around setting environment variables.
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15.1
What's new:
- Bug fixes
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.15
What's new:
- Support for the average reduction operation in Allreduce and Reduce-Scatter (see the sketch below)
- Extend Group API to also support collective operations.
- New split_communicator API with updated parameters.
- Performance optimizations for scaleup for Alltoall
Removals:
- split_communicators is deprecated as of 2021.15.0 and will be removed in a future release
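As an illustration of the new average reduction, here is a minimal C++ sketch that averages a device buffer across ranks. It assumes an already-initialized SYCL device communicator comm, a ccl::stream stream, and USM device buffers; the enumerator name ccl::reduction::avg is inferred from the item above and may differ from the actual API.

#include <cstddef>
#include "oneapi/ccl.hpp"

// Minimal sketch, not a complete program: ccl::init(), the KVS exchange, and
// communicator/stream creation are assumed to have happened elsewhere.
// ccl::reduction::avg stands for the average operation referenced above; the
// exact enumerator name is an assumption.
void average_across_ranks(const float* send_buf, float* recv_buf, size_t count,
                          ccl::communicator& comm, ccl::stream& stream) {
    // Every rank receives the element-wise average of all ranks' send buffers.
    ccl::allreduce(send_buf, recv_buf, count, ccl::reduction::avg, comm, stream).wait();
}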
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.14
What's new:
- Optimizations to the key-value store to support scaling up to 3000 nodes
- New APIs for Allgather, Broadcast, and group API calls (see the sketch below)
- Performance optimizations for Allgather, Allreduce, and Reduce-Scatter for both scaleup and scaleout
- Performance Optimizations for CPU single node
- Optimizations to reuse Level Zero events.
- Change of the default mechanism for IPC exchange to pidfd
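As a brief illustration of the fixed-count Allgather mentioned above, the sketch below has every rank contribute count elements and receive count * comm.size() elements. The non-vector ccl::allgather overload and its parameter order are inferred from the release item by analogy with ccl::allgatherv, so treat the exact signature as an assumption.

#include <cstddef>
#include "oneapi/ccl.hpp"

// Sketch only: comm is assumed to be an already-created host communicator and
// recv_buf is assumed to hold count * comm.size() elements. The ccl::allgather
// overload used here is an assumption based on the 2021.14 release item.
void gather_from_all_ranks(const float* send_buf, float* recv_buf, size_t count,
                           ccl::communicator& comm) {
    ccl::allgather(send_buf, recv_buf, count, comm).wait();
}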
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.13.1
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.13
What's new:
- Optimizations to limit the memory consumed by oneCCL
- Optimizations to limit the number of file descriptors that oneCCL keeps open.
- Align the in-place support for the Allgatherv and Reduce-Scatter collectives with the behavior of NCCL (see the sketch below).
- In particular, the Allgatherv collective is in-place when:
- send_buff == recv_buff + rank_offset, where rank_offset = sum(recv_counts[i]) for all i < rank.
- Reduce-Scatter is in-place when recv_buff == send_buff + rank * recv_count.
- When using the environment variable CCL_WORKER_AFFINITY, oneCCL enforces that the length of the list equals the number of workers (for example, with CCL_WORKER_COUNT=2, CCL_WORKER_AFFINITY must list exactly two cores, such as CCL_WORKER_AFFINITY=3,4).
- Bug fixes.
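The in-place conditions above translate directly into buffer arithmetic. The following hedged C++ sketch shows both cases for an already-created host communicator comm; the buffer sizing and the equal-count assumption for Reduce-Scatter are illustrative only.

#include <cstddef>
#include <numeric>
#include <vector>
#include "oneapi/ccl.hpp"

// Sketch only: buf is assumed to hold sum(recv_counts) elements.
void in_place_examples(float* buf, int rank,
                       const std::vector<size_t>& recv_counts,
                       ccl::communicator& comm) {
    // Allgatherv in place: this rank's send buffer is its own slot inside the
    // receive buffer, i.e. recv_buff + sum(recv_counts[i]) for all i < rank.
    size_t rank_offset =
        std::accumulate(recv_counts.begin(), recv_counts.begin() + rank, size_t{0});
    ccl::allgatherv(buf + rank_offset, recv_counts[rank], buf, recv_counts, comm).wait();

    // Reduce-Scatter in place (assuming equal counts per rank): the receive buffer
    // is this rank's slot inside the send buffer, i.e. send_buff + rank * recv_count.
    size_t recv_count = recv_counts[rank];
    ccl::reduce_scatter(buf, buf + rank * recv_count, recv_count,
                        ccl::reduction::sum, comm).wait();
}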
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.12
What's new:
- Performance improvements for scaleup for all message sizes for Allreduce, Allgather, and Reduce-Scatter.
- These optimizations also cover the small message sizes that appear in inference applications.
- Performance improvements for scaleout for Allreduce, Reduce, Allgather, and Reduce-Scatter.
- Optimized memory usage of oneCCL.
- Support for PMIx 4.2.6.
- Bug fixes.
Removals
- oneCCL 2021.12 removes support for PMIx 4.2.2
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.11.2
This update provides bug fixes to maintain driver compatibility for Intel® Data Center GPU Max Series.
Intel(R) oneAPI Collective Communications Library (oneCCL) 2021.11.1
This update addresses stability issues with distributed Training and Inference workloads on Intel® Data Center GPU Max Series.