Rename torchao.float8.Float8Tensor to torchao.float8.Float8TrainingTensor #2479

jerryzh168 · 2025-07-02T21:42:25Z

Stacked PRs:

Rename torchao.float8.Float8Tensor to torchao.float8.Float8TrainingTensor

Summary:
As the title says, since we are introducing an inference version of Float8Tensor.

Test Plan:
regression tests for float8 training: pytest test/float8

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot · 2025-07-02T21:42:29Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2479

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit b39be5d with merge base 1fd34e4:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Run TorchAO Experimental Tests / test-mps-ops (macos-m1-stable) (gh) (trunk failure)
Process completed with exit code 127.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo · 2025-07-03T19:09:16Z

can we test two more things:

1. run the overall test script for float8 training: ./test/float8/test_everything.sh
2. verify torchtitan training still works: https://github.com/pytorch/torchtitan/blob/main/docs/float8.md

Also, there are some callsites for Float8Tensor in fbcode, so we should make sure this diff is landed in a way where we could fix them in the fbcode diff.

danielvegamyhre · 2025-07-03T19:44:48Z

> can we test two more things:
>
> 1. run the overall test script for float8 training: ./test/float8/test_everything.sh
> 2. verify torchtitan training still works: https://github.com/pytorch/torchtitan/blob/main/docs/float8.md
>
> Also, there are some callsites for Float8Tensor in fbcode, so we should make sure this diff is landed in a way where we could fix them in the fbcode diff.

+1, for torchtitan I would suggest (1) 2D parallel with FSDP+TP for all 3 recipes, and (2) for the tensorwise recipe, testing with fp8 all-gather enabled and precompute of fp8 weight scales enabled

jerryzh168 · 2025-07-03T21:19:14Z

> Also, there are some callsites for Float8Tensor in fbcode, so we should make sure this diff is landed in a way where we could fix them in the fbcode diff.

sg, in this case I can do a forward fix in diff train
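For illustration only, the callsite fix discussed here could be sketched as a whole-identifier rewrite. This is a minimal sketch, not the tooling actually used for this PR: the helper name `rename_float8_tensor` is hypothetical, and it assumes every callsite refers to the training class by the bare identifier `Float8Tensor`. Since an inference `Float8Tensor` is being introduced, any file that already means the inference class would need to be excluded by hand.

```python
import re

OLD_NAME = "Float8Tensor"
NEW_NAME = "Float8TrainingTensor"

# \b anchors the match to a whole identifier, so names that merely contain
# the old name (e.g. a hypothetical MyFloat8TensorWrapper) are left untouched.
_PATTERN = re.compile(rf"\b{OLD_NAME}\b")


def rename_float8_tensor(source: str) -> str:
    """Return `source` with whole-word uses of the old class name renamed.

    Hypothetical helper for illustration; it is not part of torchao.
    """
    return _PATTERN.sub(NEW_NAME, source)
```

The rewrite is idempotent, because `Float8TrainingTensor` does not contain `Float8Tensor` as a whole identifier, so running it twice over the same file is harmless.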

jerryzh168 · 2025-07-03T21:55:52Z

./test/float8/test_everything.sh runs without errors.

for torchtitan I have tested:

CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh --model.converters="float8" --float8.enable_fsdp_float8_all_gather --float8.precompute_float8_dynamic_scale_for_fsdp --float8.force_recompute_fp8_weight_in_bwd --training.compile

and

CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh --model.converters="float8" --float8.recipe_name rowwise --training.compile

CONFIG_FILE="./torchtitan/models/llama3/train_configs/llama3_8b.toml" ./run_train.sh --model.converters="float8" --float8.recipe_name rowwise_with_gw_hp --training.compile

is this enough?
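The three runs above can also be written as a small test matrix, which makes it easier to see which flags differ per recipe. This is only a sketch: the config path and flags are copied from the commands above, while the `tensorwise` label for the first run and the `build_commands` helper are assumptions introduced here for illustration.

```python
import shlex

CONFIG_FILE = "./torchtitan/models/llama3/train_configs/llama3_8b.toml"

# Flags that differ per run, copied from the commands above.
# The "tensorwise" label is an assumption: the first command sets no recipe name.
RECIPE_FLAGS = {
    "tensorwise": [
        "--float8.enable_fsdp_float8_all_gather",
        "--float8.precompute_float8_dynamic_scale_for_fsdp",
        "--float8.force_recompute_fp8_weight_in_bwd",
    ],
    "rowwise": ["--float8.recipe_name", "rowwise"],
    "rowwise_with_gw_hp": ["--float8.recipe_name", "rowwise_with_gw_hp"],
}


def build_commands() -> dict:
    """Build one shell command line per recipe (hypothetical helper)."""
    commands = {}
    for label, flags in RECIPE_FLAGS.items():
        argv = ["./run_train.sh", "--model.converters=float8", *flags, "--training.compile"]
        # CONFIG_FILE is passed as an environment variable prefix, as in the original commands.
        commands[label] = f"CONFIG_FILE={shlex.quote(CONFIG_FILE)} " + shlex.join(argv)
    return commands


if __name__ == "__main__":
    for label, command in build_commands().items():
        print(f"# {label}")
        print(command)
```

The helper only prints the command lines; it does not launch training, so the output can be reviewed before anything is run.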


jerryzh168 force-pushed the jerryzh168/stack/11 branch from aacfea0 to 5fe4319 July 2, 2025 21:42

facebook-github-bot added the CLA Signed label Jul 2, 2025

jerryzh168 changed the base branch from jerryzh168/stack/4 to main July 2, 2025 23:44

jerryzh168 added the topic: not user facing label Jul 2, 2025

jerryzh168 force-pushed the jerryzh168/stack/11 branch from 5fe4319 to f9ba189 July 2, 2025 23:44

jerryzh168 changed the base branch from main to jerryzh168/stack/4 July 2, 2025 23:44

jerryzh168 requested review from vkuzo, danielvegamyhre and drisspg July 2, 2025 23:53

jerryzh168 changed the base branch from jerryzh168/stack/4 to main July 3, 2025 00:09

jerryzh168 force-pushed the jerryzh168/stack/11 branch from f9ba189 to 782b53f July 3, 2025 00:09

jerryzh168 changed the base branch from main to jerryzh168/stack/4 July 3, 2025 00:09

jerryzh168 changed the base branch from jerryzh168/stack/4 to main July 3, 2025 02:18

jerryzh168 force-pushed the jerryzh168/stack/11 branch from 782b53f to ea9092a July 3, 2025 02:18

jerryzh168 changed the base branch from main to jerryzh168/stack/4 July 3, 2025 02:18

jerryzh168 changed the base branch from jerryzh168/stack/4 to main July 3, 2025 02:23

jerryzh168 force-pushed the jerryzh168/stack/11 branch from ea9092a to 1b8c7cc July 3, 2025 02:23

jerryzh168 changed the base branch from main to jerryzh168/stack/4 July 3, 2025 02:23

jerryzh168 changed the base branch from jerryzh168/stack/4 to main July 3, 2025 02:36

jerryzh168 force-pushed the jerryzh168/stack/11 branch from 1b8c7cc to 2a00cd7 July 3, 2025 02:36

jerryzh168 changed the base branch from main to jerryzh168/stack/4 July 3, 2025 02:36

jerryzh168 changed the base branch from jerryzh168/stack/4 to main July 3, 2025 02:37

jerryzh168 changed the base branch from main to jerryzh168/stack/4 July 3, 2025 02:38

jerryzh168 changed the base branch from jerryzh168/stack/4 to main July 3, 2025 02:44

jerryzh168 changed the base branch from main to jerryzh168/stack/4 July 3, 2025 02:44

jerryzh168 changed the base branch from jerryzh168/stack/4 to main July 3, 2025 20:58

jerryzh168 force-pushed the jerryzh168/stack/11 branch from 2a00cd7 to 4d04b4c July 3, 2025 20:58

jerryzh168 changed the base branch from main to jerryzh168/stack/4 July 3, 2025 20:58

jerryzh168 changed the base branch from jerryzh168/stack/4 to main July 3, 2025 21:57

jerryzh168 force-pushed the jerryzh168/stack/11 branch from 4d04b4c to b39be5d July 3, 2025 21:57

jerryzh168 changed the base branch from main to jerryzh168/stack/4 July 3, 2025 21:57
