
issues Search Results · repo:stas00/ml-engineering language:Python

33 results

https://github.com/stas00/ml-engineering/blob/master/network/comms.md S*N/(S*B) + (k − 2)*N*(SB) = N*(S + k − 2)/(S*B), why not S*N/(S*B) + (k − 2)*N / (SB) ?
  • wtyfpc
  • 1
  • Opened 17 days ago
  • #109

In the Maximum Achievable Matmul FLOPS comparison table, the theoretical value for the NVIDIA H100 SXM is 989. However, in the NVIDIA H100 Tensor Core GPU Datasheet, the BFLOAT16 Tensor Core value is 1,979. Is ...
  • Keyu-Yan
  • 2
  • Opened on Feb 24
  • #100

When I ran the following command in ml-engineering/compute/accelerator/benchmarks: HIP_VISIBLE_DEVICES=4 python ./mamf-finder.py --m_range 20480 20500 256 --n 14784 --k 8192 --dtype float8_e4m3fn --output_file=$(date ...
  • tjtanaa
  • 2
  • Opened on Jan 16
  • #91

In reference to a remark on this page: The problem with the advertised theoretical peak FLOPS is that they are very theoretical and can't be achieved in practice even if all the perfect conditions have ...
  • fluidnumerics-joe
  • 1
  • Opened on Oct 25, 2024
  • #77

The MAMF is measured on a single GCD, while the 383 TFLOPs as specified is for 2 GCDs. I'm also unable to achieve 147 TFLOPs on a single GCD, however; I find that it peaks at 127, but I attribute that to ...
  • rlrs
  • 3
  • Opened on Oct 15, 2024
  • #76

MAMF Finder uses MxNxK to mean [M, N] x [N, K] -> [M, K], i.e. the shared dimension along which the inner product is taken is N. https://github.com/stas00/ml-engineering/blob/8deeddd8cb6ec126ff31c5fc4efdcca325476f8b/compute/accelerator/benchmarks/mamf-finder.py#L181 ...
  • GKolling
  • 2
  • Opened on Oct 14, 2024
  • #74

You might want to change the link from https://huggingface.co/stas/ml-engineering-book/resolve/main/Stas%20Bekman%20-%20Machine%20Learning%20Engineering.pdf?download=true to https://huggingface.co/stas/ml-engineering-book/resolve/main/Stas%20Bekman%20-%20Machine%20Learning%20Engineering.pdf ...
  • sytelus
  • 3
  • Opened on Oct 11, 2024
  • #73

Thank you for sharing the MLE Open Book with the community, it's really useful. I tried to follow your guide on getting GPU utilization rates here, only to find out that the dcgm-exporter only works for datacenter ...
  • fortminors
  • 9
  • Opened on Oct 9, 2024
  • #72

@stas00 Wondering if you have any tips / tricks for working with performance profiling tools such as nsys? Or recommendations for systematically optimizing model architecture and single / multi-node training ...
  • jeromeku
  • 2
  • Opened on Sep 30, 2024
  • #71