Issue search results

280 results

pytorch/torchtitan
Linear layer weights are in float32 ?

Bug description: I am seeing Linear layer weights in float32 (wq.weight.dtype is torch.float32) even after setting mixed_precision_param = bfloat16 and mixed_precision_reduce = bfloat16. Is ...

question

githubsgi

Opened
yesterday

#1027

pytorch/torchtitan
Any plan to add Llama 1B and/or 3B models ?

Wondering if there is any plan to add the 1B and/or 3B models to the TorchTitan set of example models? It is probably fairly straightforward to do that, if I am not missing anything. Another toml file ...

githubsgi

Opened
yesterday

#1026

pytorch/torchtitan
Unable to run flex attention and torch.compile

Bug description: CONFIG_FILE=./torchtitan/models/llama/train_configs/debug_model.toml ./run_train.sh --model.use_flex_attn --training.compile I have a long stack trace with TORCHDYNAMO_VERBOSE=1: ...

module: flex attention

lkhphuc

Opened
6 days ago

#1005

pytorch/torchtitan
Optimizer in backward with grad clipping is broken

I've been working on something similar, see https://github.com/PrimeIntellect-ai/CPUOptimizer, but I see a problem in your implementation. If you hide the optimizer step inside the backward pass, the ...

apaz-cli

Opened
9 days ago

#990

pytorch/torchtitan
Is EP (Expert Parallelism) coming ?

Currently TorchTitan supports PP, CP, FSDP, and TP parallelisms. Is there a plan to support Expert Parallelism (EP)? Along the same lines, I see some DeepSeek files in the repo. Is there a plan to support DeepSeek ...

question

githubsgi

Opened
9 days ago

#987

pytorch/torchtitan
Is a PP+FSDP+TP config + toml available for pre-training 405B model ?

I would appreciate it if someone could share a toml file for running PP+FSDP+TP on the 405B model.

githubsgi

Opened
9 days ago

#986

pytorch/torchtitan
[DeepSeek] Potential memory leak, OOMs after a few steps

Bug description: I've been playing around with the experimental DeepSeek code and in-progress PRs and have a simple SFT training loop implemented. The results on the first few steps look pretty promising ...

EugenHotaj

Opened
10 days ago

#979

pytorch/torchtitan
[DeepSeek] Create Titan's TrainSpec

Today run.py is invoked directly. The model is expected to provide a TrainSpec so that it can be invoked by Titan's train.py.

kwen2501

Opened
11 days ago

#975

pytorch/torchtitan
[DeepSeek] Connect to Titan's checkpointing

Today run.py does not do checkpointing during training. We need to hook it up to Titan's DCP.

kwen2501

Opened
11 days ago

#974

pytorch/torchtitan
[DeepSeek] Connect to Titan's data loader

Today run.py uses synthetic input.

kwen2501

Opened
11 days ago

#973

issues Search Results · repo:pytorch/torchtitan language:Python
