issues Search Results · repo:pytorch/torchtitan language:Python

280 results
 (101 ms)

Bug description: I am seeing Linear layer weights in float32 (wq.weight.dtype: torch.float32) even after setting mixed_precision_param = bfloat16 and mixed_precision_reduce = bfloat16. Is ...
question
  • githubsgi
  • 11
  • Opened yesterday
  • #1027

Is there any plan to add the 1B and/or 3B models to the TorchTitan set of example models? It is probably fairly straightforward to do that, if I am not missing anything. Another toml file ...
  • githubsgi
  • 3
  • Opened yesterday
  • #1026

Bug description: CONFIG_FILE=./torchtitan/models/llama/train_configs/debug_model.toml ./run_train.sh --model.use_flex_attn --training.compile I have a long stack trace with TORCHDYNAMO_VERBOSE=1: ...
module: flex attention
  • lkhphuc
  • 1
  • Opened 6 days ago
  • #1005

I've been working on something similar, see https://github.com/PrimeIntellect-ai/CPUOptimizer, but I see a problem in your implementation. If you hide the optimizer step inside the backward pass, the ...
  • apaz-cli
  • 7
  • Opened 9 days ago
  • #990

Currently TorchTitan supports PP, CP, FSDP, and TP parallelisms. Is there a plan to support Expert Parallelism (EP)? Along the same lines, I see some DeepSeek files in the repo. Is there a plan to support DeepSeek ...
question
  • githubsgi
  • 4
  • Opened 9 days ago
  • #987

I would appreciate it if someone could share a toml file for running PP+FSDP+TP on the 405B model.
  • githubsgi
  • 2
  • Opened 9 days ago
  • #986

Bug description: I've been playing around with the experimental DeepSeek code and in-progress PRs and have a simple SFT training loop implemented. The results on the first few steps look pretty promising ...
  • EugenHotaj
  • 1
  • Opened 10 days ago
  • #979

Today run.py is invoked directly. The model is expected to provide a TrainSpec so that it can be invoked by Titan's train.py.
  • kwen2501
  • Opened 11 days ago
  • #975

Today run.py does not do checkpointing during training. We need to hook it to Titan's DCP.
  • kwen2501
  • Opened 11 days ago
  • #974

Today run.py uses synthetic input.
  • kwen2501
  • Opened 11 days ago
  • #973