Pull requests: NVIDIA/TransformerEngine

Pull requests list

fix a sync race error of softmax_lse in CP+THD+P2P
#1624 opened Mar 29, 2025 by xrennvidia
5 of 13 tasks
[PyTorch] Support default process group with FP8 current scaling (label: bug)
#1621 opened Mar 27, 2025 by timmoon10
6 of 13 tasks
[PyTorch] Debug NCCL communication overlapping in linear backward with FP8 data (label: bug)
#1620 opened Mar 27, 2025 by timmoon10
5 of 13 tasks
[PyTorch] fix fuse_wgrad_accumulation in LayerNormMLP backward
#1618 opened Mar 26, 2025 by Marks101
1 of 6 tasks
Tongliu fp8 a2a
#1617 opened Mar 26, 2025 by Autumn1998
13 tasks
[Pytorch] NVIDIA-DL-Framework-Inspect support – part 1 – core
#1614 opened Mar 25, 2025 by pggPL
7 tasks done
[Pytorch] NVIDIA-DL-Framework-Inspect support – part 2 – features
#1613 opened Mar 25, 2025 by pggPL
7 tasks done
[Pytorch] NVIDIA-DL-Framework-Inspect support – part 3 – tests
#1612 opened Mar 25, 2025 by pggPL
7 of 13 tasks
[Pytorch] NVIDIA-DL-Framework-Inspect support – part 4 – documentation
#1611 opened Mar 25, 2025 by pggPL
7 tasks done
Improved performance of mxfp8 cast kernels (labels: 2.2.0, performance)
#1602 opened Mar 22, 2025 by Oleg-Goncharov
6 of 13 tasks
[JAX] Add fast path for causal masking with segment IDs.
#1601 opened Mar 21, 2025 by mgoldfarb-nvidia
8 of 13 tasks
[PyTorch] Tutorial for the ONNX export
#1586 opened Mar 18, 2025 by pggPL
8 of 13 tasks
[JAX] Unbalanced Context Parallelism with THD format
#1565 opened Mar 12, 2025 by zlsh80826
8 of 13 tasks
Draft: split wgrad for GroupedLinear
#1564 opened Mar 12, 2025 by lhb8125 Draft
13 tasks
[CI] Add isort
#1563 opened Mar 12, 2025 by yaox12 Draft
13 tasks
Enable AttnFuncWithCPAndKVP2P to support mla
#1561 opened Mar 12, 2025 by SuperCB
3 of 13 tasks
Blockwise scaling linear quantization recipe
#1559 opened Mar 11, 2025 by kwyss-nvidia
8 of 13 tasks
change softmax_lse correction of CP to FP32
#1546 opened Mar 7, 2025 by xrennvidia
6 of 13 tasks
Subchannel Block quantized GEMM
#1545 opened Mar 6, 2025 by kwyss-nvidia
6 of 12 tasks
Fused Linear and Cross Entropy operations
#1537 opened Mar 5, 2025 by Jianbing-D
[MoE] Enable MXFP8 and Per-Tensor Current Scaling for Grouped Linear
#1525 opened Feb 28, 2025 by yaox12
5 of 17 tasks
Blockwise float8 quantizer and quantized tensor class
#1513 opened Feb 27, 2025 by kwyss-nvidia
23 of 34 tasks
Draft: split wgrad poc
#1510 opened Feb 26, 2025 by lhb8125 Draft
13 tasks
[Pytorch] Dynamo ONNX export support
#1497 opened Feb 19, 2025 by pggPL
8 of 13 tasks