NVIDIA / TransformerEngine

Pull requests: NVIDIA/TransformerEngine

64 Open 1,109 Closed

fix a sync race error of softmax_lse in CP+THD+P2P

#1624 opened Mar 29, 2025 by xrennvidia

5 of 13 tasks

[PyTorch] Support default process group with FP8 current scaling (label: bug)

#1621 opened Mar 27, 2025 by timmoon10

6 of 13 tasks

[PyTorch] Debug NCCL communication overlapping in linear backward with FP8 data (label: bug)

#1620 opened Mar 27, 2025 by timmoon10

5 of 13 tasks

[PyTorch] Make breaking change in InferenceParams.__init__ more explicit (label: 2.2.0)

#1619 opened Mar 26, 2025 by cyanguwa

8 of 13 tasks

[PyTorch] fix fuse_wgrad_accumulation in LayerNormMLP backward

#1618 opened Mar 26, 2025 by Marks101

1 of 6 tasks

Tongliu fp8 a2a

#1617 opened Mar 26, 2025 by Autumn1998

13 tasks

[Pytorch] NVIDIA-DL-Framework-Inspect support – part 1 – core

#1614 opened Mar 25, 2025 by pggPL

7 tasks done

[Pytorch] NVIDIA-DL-Framework-Inspect support – part 2 – features

#1613 opened Mar 25, 2025 by pggPL

7 tasks done

[Pytorch] NVIDIA-DL-Framework-Inspect support – part 3 – tests

#1612 opened Mar 25, 2025 by pggPL

7 of 13 tasks

[Pytorch] NVIDIA-DL-Framework-Inspect support – part 4 – documentation

#1611 opened Mar 25, 2025 by pggPL

7 tasks done

Improved performance of mxfp8 cast kernels (labels: 2.2.0, performance)

#1602 opened Mar 22, 2025 by Oleg-Goncharov

6 of 13 tasks

[JAX] Add fast path for causal masking with segment IDs.

#1601 opened Mar 21, 2025 by mgoldfarb-nvidia

8 of 13 tasks

[PyTorch] Tutorial for the ONNX export

#1586 opened Mar 18, 2025 by pggPL

8 of 13 tasks

[JAX] Unbalanced Context Parallelism with THD format

#1565 opened Mar 12, 2025 by zlsh80826

8 of 13 tasks

Draft: split wgrad for GroupedLinear

#1564 opened Mar 12, 2025 by lhb8125 • Draft

13 tasks

[CI] Add isort

#1563 opened Mar 12, 2025 by yaox12 • Draft

13 tasks

Enable AttnFuncWithCPAndKVP2P to support mla

#1561 opened Mar 12, 2025 by SuperCB

3 of 13 tasks

Blockwise scaling linear quantization recipe

#1559 opened Mar 11, 2025 by kwyss-nvidia

8 of 13 tasks

change softmax_lse correction of CP to FP32

#1546 opened Mar 7, 2025 by xrennvidia

6 of 13 tasks

Subchannel Block quantized GEMM

#1545 opened Mar 6, 2025 by kwyss-nvidia

6 of 12 tasks

Fused Linear and Cross Entropy operations

#1537 opened Mar 5, 2025 by Jianbing-D

[MoE] Enable MXFP8 and Per-Tensor Current Scaling for Grouped Linear

#1525 opened Feb 28, 2025 by yaox12

5 of 17 tasks

Blockwise float8 quantizer and quantized tensor class

#1513 opened Feb 27, 2025 by kwyss-nvidia

23 of 34 tasks

Draft: split wgrad poc

#1510 opened Feb 26, 2025 by lhb8125 • Draft

13 tasks

[Pytorch] Dynamo ONNX export support

#1497 opened Feb 19, 2025 by pggPL

8 of 13 tasks

Page 1 of 3