issues Search Results · repo:stas00/ml-engineering language:Python
33 results
https://github.com/stas00/ml-engineering/blob/master/network/comms.md S*N/(S*B) + (k − 2)*N*(SB) = N*(S + k − 2)/(S*B),
why not S*N/(S*B) + (k − 2)*N / (SB) ?
wtyfpc
- 1
- Opened 17 days ago
- #109
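The algebra behind the question in #109 can be checked directly. This is a sketch only: the symbols S, N, B, and k are taken as they appear in the linked network/comms.md and are not redefined here. If the second term is divided by S*B, as the question proposes, both terms share a denominator and sum to the stated right-hand side:

```latex
\frac{S \cdot N}{S \cdot B} + \frac{(k - 2) \cdot N}{S \cdot B}
  = \frac{N \cdot (S + k - 2)}{S \cdot B}
```

With the second term written as (k − 2)*N*(SB), the two sides do not agree, which suggests the multiplication is a typo for a division.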
In the Maximum Achievable Matmul FLOPS comparison table, the theory value of NVIDIA H100 SXM is 989. However, in the
NVIDIA H100 Tensor Core GPU Datasheet, the value of BFLOAT16 Tensor Core is 1,979. Is ...
Keyu-Yan
- 2
- Opened on Feb 24
- #100
When I ran the following command at ml-engineering/compute/accelerator/benchmarks
HIP_VISIBLE_DEVICES=4 python ./mamf-finder.py --m_range 20480 20500 256 --n 14784 --k 8192 --dtype float8_e4m3fn
--output_file=$(date ...
tjtanaa
- 2
- Opened on Jan 16
- #91
In reference to a remark on this page :
The problem with the advertised theoretical peak FLOPS is that they are very theoretical and can't be achieved in
practice even if all the perfect conditions have ...
fluidnumerics-joe
- 1
- Opened on Oct 25, 2024
- #77
The MAMF is measured on a single GCD, while the 383 TFLOPs as specified is for 2 GCDs. I'm also unable to achieve 147
TFLOPs on a single GCD however, I find that it peaks at 127, but I attribute that to ...
rlrs
- 3
- Opened on Oct 15, 2024
- #76
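The arithmetic implied by #76 can be sketched as follows. It takes only the numbers quoted in the snippet (383 TFLOPs across 2 GCDs, 147 and 127 TFLOPs on one GCD) and assumes the package peak splits evenly between the dies. The function names are hypothetical helpers, not part of mamf-finder.py.

```python
def per_die_peak(package_peak_tflops: float, num_dies: int) -> float:
    """Theoretical peak of a single die, assuming an even split (an assumption)."""
    return package_peak_tflops / num_dies


def efficiency(measured_tflops: float, peak_tflops: float) -> float:
    """Fraction of the theoretical peak that was actually measured."""
    return measured_tflops / peak_tflops


# Numbers quoted in the issue snippet.
peak_per_gcd = per_die_peak(383.0, 2)
for measured in (147.0, 127.0):
    share = efficiency(measured, peak_per_gcd)
    print(f"{measured:.0f} TFLOPs is {share:.1%} of the {peak_per_gcd} TFLOPs per-GCD peak")
```

Comparing a single-GCD measurement against the per-GCD peak, rather than the 2-GCD figure, is the point the issue raises.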
cpollo55
- 5
- Opened on Oct 15, 2024
- #75
MAMF Finder uses MxNxK to mean [M, N] x [N, K] -> [M, K], i.e. the shared dimension along which the inner product is
taken is N.
https://github.com/stas00/ml-engineering/blob/8deeddd8cb6ec126ff31c5fc4efdcca325476f8b/compute/accelerator/benchmarks/mamf-finder.py#L181 ...
GKolling
- 2
- Opened on Oct 14, 2024
- #74
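The naming difference described in #74 can be illustrated with a small shape check. This is a sketch under assumptions: the "BLAS-style" convention ([M, K] x [K, N] with K shared) is the common GEMM naming rather than anything stated in the issue, all function names are hypothetical, and the sizes are illustrative.

```python
def matmul_output_shape(a_shape, b_shape):
    """Return the shape of A @ B, checking that the inner dimensions match."""
    rows, inner_a = a_shape
    inner_b, cols = b_shape
    if inner_a != inner_b:
        raise ValueError(f"inner dimensions differ: {inner_a} vs {inner_b}")
    return (rows, cols)


def shapes_mamf_style(m, n, k):
    # As described in the issue: [M, N] x [N, K], so N is the shared dimension.
    return (m, n), (n, k)


def shapes_blas_style(m, n, k):
    # Common GEMM naming (assumption): [M, K] x [K, N], so K is the shared dimension.
    return (m, k), (k, n)


m, n, k = 20480, 14784, 8192  # illustrative sizes
print(matmul_output_shape(*shapes_mamf_style(m, n, k)))  # (20480, 8192)
print(matmul_output_shape(*shapes_blas_style(m, n, k)))  # (20480, 14784)
print(2 * m * n * k)  # multiply-add FLOP count, identical under both namings
```

The output shape changes with the convention, but the 2*M*N*K FLOP count does not, so the naming mainly matters when comparing shapes across tools.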
You might want to change the link from
https://huggingface.co/stas/ml-engineering-book/resolve/main/Stas%20Bekman%20-%20Machine%20Learning%20Engineering.pdf?download=true
to
https://huggingface.co/stas/ml-engineering-book/resolve/main/Stas%20Bekman%20-%20Machine%20Learning%20Engineering.pdf ...
sytelus
- 3
- Opened on Oct 11, 2024
- #73
Thank you for sharing the MLE Open Book with the community, it's really useful. I tried to follow your guide on getting
GPU utilization rates here to find out that the dcgm-exporter only works for datacenter ...
fortminors
- 9
- Opened on Oct 9, 2024
- #72
@stas00
Wondering if you have any tips / tricks for working with performance profiling tools such as nsys? Or recommendations for
systematically optimizing model architecture and single / multi-node training ...
jeromeku
- 2
- Opened on Sep 30, 2024
- #71
