Pull requests: NVIDIA/TransformerEngine

fix a sync race error of softmax_lse in CP+THD+P2P
#1624 opened Mar 29, 2025 by xrennvidia · 5 of 13 tasks

[PyTorch] Support default process group with FP8 current scaling
Label: bug (Something isn't working)
#1621 opened Mar 27, 2025 by timmoon10 · 6 of 13 tasks

[PyTorch] Debug NCCL communication overlapping in linear backward with FP8 data
Label: bug (Something isn't working)
#1620 opened Mar 27, 2025 by timmoon10 · 5 of 13 tasks

[PyTorch] Make breaking change in InferenceParams.init more explicit
Label: 2.2.0
#1619 opened Mar 26, 2025 by cyanguwa · 8 of 13 tasks

[PyTorch] fix fuse_wgrad_accumulation in LayerNormMLP backward
#1618 opened Mar 26, 2025 by Marks101 · 1 of 6 tasks

[Pytorch] NVIDIA-DL-Framework-Inspect support – part 1 – core
#1614 opened Mar 25, 2025 by pggPL · 7 tasks done

[Pytorch] NVIDIA-DL-Framework-Inspect support – part 2 – features
#1613 opened Mar 25, 2025 by pggPL · 7 tasks done

[Pytorch] NVIDIA-DL-Framework-Inspect support – part 3 – tests
#1612 opened Mar 25, 2025 by pggPL · 7 of 13 tasks

[Pytorch] NVIDIA-DL-Framework-Inspect support – part 4 – documentation
#1611 opened Mar 25, 2025 by pggPL · 7 tasks done

Improved performance of mxfp8 cast kernels
Labels: 2.2.0, performance (Performance issues)
#1602 opened Mar 22, 2025 by Oleg-Goncharov · 6 of 13 tasks

[JAX] Add fast path for causal masking with segment IDs.
#1601 opened Mar 21, 2025 by mgoldfarb-nvidia · 8 of 13 tasks

[JAX] Unbalanced Context Parallelism with THD format
#1565 opened Mar 12, 2025 by zlsh80826 · 8 of 13 tasks

Enable AttnFuncWithCPAndKVP2P to support mla
#1561 opened Mar 12, 2025 by SuperCB · 3 of 13 tasks

Blockwise scaling linear quantization recipe
#1559 opened Mar 11, 2025 by kwyss-nvidia · 8 of 13 tasks

change softmax_lse correction of CP to FP32
#1546 opened Mar 7, 2025 by xrennvidia · 6 of 13 tasks

[MoE] Enable MXFP8 and Per-Tensor Current Scaling for Grouped Linear
#1525 opened Feb 28, 2025 by yaox12 · 5 of 17 tasks

Blockwise float8 quantizer and quantized tensor class
#1513 opened Feb 27, 2025 by kwyss-nvidia · 23 of 34 tasks