Insights: NVIDIA/NeMo
Overview
41 Pull requests merged by 21 people
- Ko3n1g/chore/asr only (#13704, merged May 22, 2025)
- ci: Add init-file-checker (#13684, merged May 22, 2025)
- SpeechLM2 collection (#12617, merged May 22, 2025)
- build: Pin transformers (#13675) (#13692, merged May 22, 2025)
- ci: Enable codecov checks (#13497, merged May 21, 2025)
- build: Pin transformers (#13675, merged May 21, 2025)
- Enabling chunked inference for AED models in asr_evaluator (#13674, merged May 21, 2025)
- [automodel] consolidate vllm scripts (#13670, merged May 21, 2025)
- Magpie yaml updates for LT and gradient clipping (#13614, merged May 21, 2025)
- [Audio] TransformerUNet: predictive model tests added (#13648, merged May 21, 2025)
- build: multimodal-only (#13665, merged May 20, 2025)
- Set env variables for eval tests (#13658, merged May 20, 2025)
- Qwen3 (#13554, merged May 20, 2025)
- [automodel] consolidate sft peft scripts (#13634, merged May 20, 2025)
- Update README.md for 25.04 release (#13654, merged May 20, 2025)
- [Llama4] Fix the recipe bug - cherrypick #13649 (#13650, merged May 19, 2025)
- Bump nvidia-modelopt to 0.29.0 (#13599, merged May 19, 2025)
- [Llama4] Fix the missing args in the recipe (#13649, merged May 19, 2025)
- Multi node settings for evaluation nemo-run script (#13568, merged May 19, 2025)
- [Automodel] Fix CP device_mesh issue, use PTL distsampler (#13473) (#13636, merged May 19, 2025)
- Update install to use pip install (#13605, merged May 19, 2025)
- Set L2_NeMo_2_EVAL as optional (#13644, merged May 18, 2025)
- Cherry-pick "Update t5.py" (#13082) to r2.3.0 and bump mcore to f98b1a0 (#13642, merged May 18, 2025)
- Cherry-pick "Update vLLMExporter to use vLLM V1" (#13498) into r2.3.0 (#13631, merged May 18, 2025)
- Cherry pick "[automodel] fix --mbs/gbs dtype and chat-template (13598)" into r2.3.0 (#13613, merged May 18, 2025)
- Add Gemma3 VL model (#13536, merged May 18, 2025)
- Fix image_processor config in Energon path (#13618, merged May 18, 2025)
- Skip eval unit test (#13635, merged May 17, 2025)
- Set ASR test to be optional (#13633, merged May 17, 2025)
- [automodel] dist.abort -> dist.destroy_process_group (#13578, merged May 17, 2025)
- Fix ptl import in notebooks (#13608, merged May 17, 2025)
- Tests for evaluation with NVIDIA Evals Factory (#13627, merged May 17, 2025)
- Intermediate-tensor distillation support (#13069, merged May 17, 2025)
- [automodel] fix log message (#13612, merged May 17, 2025)
- Update vLLMExporter to use vLLM V1 (#13498, merged May 17, 2025)
- Creating a new MCore inferencing path for NeMo Deploy (#13611, merged May 16, 2025)
- Cherry pick "Add CI test for local checkpointing (#13012)" into r2.3.0 (#13472, merged May 16, 2025)
- Cherry pick "[automodel] add find_unused_parameters=True for DDP (13366)" into r2.3.0 (#13601, merged May 16, 2025)
- Cherry-pick "[automodel] fallback FP8 + LCE -> FP8 + CE" (#13349) into r2.3.0 (#13561, merged May 16, 2025)
- Cherry pick "[automodel] deprecate global_batch_size dataset argument (13137)" into r2.3.0 (#13560, merged May 16, 2025)
- [Automodel] Fix CP device_mesh issue, use PTL distsampler (#13473, merged May 16, 2025)
37 Pull requests opened by 26 people
- Chtruong/r2.3.0 cherry picks (#13615, opened May 16, 2025)
- gpu type and #devices CLI args (#13620, opened May 16, 2025)
- Enabled C2C-PCie bridge through NCCL (#13621, opened May 16, 2025)
- Refactor MSC integration in exp manager (#13626, opened May 16, 2025)
- [automodel] deprecate model_accelerator in favor of fp8_autocast (#13632, opened May 17, 2025)
- Chtruong/cherry pick 12 (#13641, opened May 18, 2025)
- Load master weights from checkpoint (#13646, opened May 19, 2025)
- [Evaluation] Add support for simple-evals and tasks that require logprobs (#13647, opened May 19, 2025)
- Longcontext r2.3.0 (#13653, opened May 20, 2025)
- transformers_offline=0 and profile changes to llama3.1 405b (#13655, opened May 20, 2025)
- Gemma3 Fix and Tests (#13661, opened May 20, 2025)
- fix: vpp stage error fix when bumping up mcore version (#13662, opened May 20, 2025)
- Clean up VLM's transposes in different parallel settings (#13663, opened May 20, 2025)
- Changing the tokenizer from Scout to Maverick in the pretrain LLAMA4 … (#13664, opened May 20, 2025)
- [Automodel] Add sdpa_kernel context manager for HFAutoModelForCausalLM (#13668, opened May 21, 2025)
- Remove is-optional marker for L2_NeMo_2_EVAL (#13669, opened May 21, 2025)
- IPL callback and two scripts (#13671, opened May 21, 2025)
- fix: vpp stage refactoring to match mcore (#13673, opened May 21, 2025)
- Mini Code Refactor: Typos, Yaml updates, Code changes (#13677, opened May 21, 2025)
- fix moe_router_pre_softmax for Mixtral (#13678, opened May 21, 2025)
- Fix Qwen3 export + misc (#13679, opened May 21, 2025)
- Add tp-ep-pp-cp-dp order changes (#13680, opened May 21, 2025)
- Add CI/CD to Magpie dev branch (#13682, opened May 21, 2025)
- wip: Test circular install (#13683, opened May 21, 2025)
- Merge origin/2.3.0 to llmb-nemo-2.3.0 (#13685, opened May 21, 2025)
- chore(🤖): Bump `NVIDIA/Megatron-LM` to `a845aa7...` (2025-05-22) (#13687, opened May 22, 2025)
- chore(🤖): Bump `NVIDIA/Megatron-LM` to `5a676b3...` (2025-05-22) (#13688, opened May 22, 2025)
- Fixed incorrect `use_bias` parameter handling in `ConformerEncoder.change_attention_model` (#13689, opened May 22, 2025)
- Tweaking LLama4 Maverick PreTrain file to adapt to the user configs p… (#13690, opened May 22, 2025)
- Add perf recipe script for Nemotron-H-56B (#13691, opened May 22, 2025)
- Recipe default value fix for Llama4 (#13696, opened May 22, 2025)
- Add vLLM Mixtral and TRT-LLM qnemo export tests (plus a couple of bugfixes) (#13697, opened May 22, 2025)
- Adding FP8 Default Configs for LLAMA4 Maverick (#13698, opened May 22, 2025)
- Adds reasoning to the evaluation (#13699, opened May 22, 2025)
- Mollys/grok llmb nemo r2.3.0 (#13700, opened May 22, 2025)
- Address VDR feedback for NeMo FW evaluations (#13701, opened May 22, 2025)
- feat: Expose vLLM deploy API (#13702, opened May 22, 2025)
11 Issues closed by 7 people
- Biggie smalls (#13638, closed May 21, 2025)
- Add an Oracle to the GitHub assistant (#13637, closed May 21, 2025)
- How to perform knowledge distillation for Hyena architecture models using NeMo? (#13496, closed May 20, 2025)
- Discrepancy in custom transcribe pipeline vs. `model.transcribe()` for QuartzNet model (#12800, closed May 20, 2025)
- Excessive memory buildup when merging text datasets in Megatron core (#12993, closed May 20, 2025)
- Discrepancy in Megatron-Core API in 24.02.01 NeMo Container vs GitHub & Error with upgraded package (#13643, closed May 19, 2025)
- [Llama3 Model Distillation] IndexError: pop from empty list (#13557, closed May 18, 2025)
- AttributeError: 'Tensor' object has no attribute 'seek' (#12877, closed May 16, 2025)
- riva quickstart gives "Unsupported model IR version: 10, max supported IR version: 9" (#13204, closed May 16, 2025)
- speech_llm/modular_audio_gpt_train.py is not running while freeze_audio_encoder: False (#12627, closed May 16, 2025)
14 Issues opened by 10 people
- FastPitch Model Training Fails on pitch Shape Mismatch (Despite Valid [B, T] Tensor) (#13695, opened May 22, 2025)
- fsdp2 / ddp (#13694, opened May 22, 2025)
- wandb logger (#13693, opened May 22, 2025)
- Taking a HF dataset without loading python script and train NeMo models (#13681, opened May 21, 2025)
- Question for AdaLN's Impact on Tensor Parallelism Overlap (#13676, opened May 21, 2025)
- FLOPsMeasurementCallback not reporting the FLOPs numbers (#13659, opened May 20, 2025)
- Size mismatch when finetune Canary-180m in Vietnamese (#13657, opened May 20, 2025)
- Key Error when trying to Inference using Fine Tuned llama4 model (#13656, opened May 20, 2025)
- Megatron DDP strategy based MegatronLossReduction (#13630, opened May 17, 2025)
- grpcio wheel does not build with Ubuntu 25.04 (#13625, opened May 16, 2025)
- pathtools doesn't work on Python 3.12 (#13624, opened May 16, 2025)
- Custom MegatronLoss with torch primitives (#13623, opened May 16, 2025)
- Error installing geventhttpclient (#13622, opened May 16, 2025)
- Low GPU utilization with NEST and OOMptimizer (#13619, opened May 16, 2025)
59 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Add Fréchet codec distance metric (#13553), commented on May 22, 2025 • 27 new comments
- first commit (#12477), commented on May 22, 2025 • 17 new comments
- Multi token bleu (#13526), commented on May 19, 2025 • 13 new comments
- BPE char tokenizer (#13594), commented on May 22, 2025 • 8 new comments
- New streaming RNN-T and TDT inference (#9106), commented on May 22, 2025 • 7 new comments
- Support TRTLLM Pytorch backend Deployment (#13177), commented on May 22, 2025 • 5 new comments
- PTQ model support, quant_cfg, and documentation updates (#13519), commented on May 19, 2025 • 3 new comments
- Added packed sequence and sequence parallel to qwen2vl (#13572), commented on May 22, 2025 • 3 new comments
- AIStore with Webdataset (#13604), commented on May 22, 2025 • 3 new comments
- flux 12b fixes (#13459), commented on May 21, 2025 • 2 new comments
- Punctuation Marks in Timestamps (#13353), commented on May 16, 2025 • 2 new comments
- perf scripts updates (#13456), commented on May 21, 2025 • 1 new comment
- feat - GPTSFTChatDataset alignment with OpenAI Messages, compatibility with packed sequences (#13367), commented on May 22, 2025 • 1 new comment
- Add CallbackGroup & Metadata factory function (#13437), commented on May 21, 2025 • 1 new comment
- ci: Bump dependencies (#12819) (#13482), commented on May 17, 2025 • 0 new comments
- initial peft implementation (#13463), commented on May 21, 2025 • 0 new comments
- Incorporate 25.04 NeMo Patches (#13488), commented on May 22, 2025 • 0 new comments
- remove blocks unused to increase coverage (#13511), commented on May 22, 2025 • 0 new comments
- Remove SDXL quantization tutorial (#13453), commented on May 21, 2025 • 0 new comments
- HF export in nemo.export (#13516), commented on May 16, 2025 • 0 new comments
- export_ckpt failed due to AssertionError: dtype mismatch between source and target state dicts (#13455), commented on May 16, 2025 • 0 new comments
- support qwen2.5-vl (#13537), commented on May 22, 2025 • 0 new comments
- Fix eval_beamsearch_ngram_ctc.py for hybrid models (#13538), commented on May 21, 2025 • 0 new comments
- Magpietts 2503 updateshars (#13548), commented on May 22, 2025 • 0 new comments
- Fix bugs in LLaVA NeXT model with padding and CPU initialization options (#13558), commented on May 21, 2025 • 0 new comments
- Fix resume with MegatronPretrainingBatchSampler (#13565), commented on May 16, 2025 • 0 new comments
- Fixed Mllama Energon config (#13574), commented on May 21, 2025 • 0 new comments
- [automodel] move liger kernel patching (#13579), commented on May 21, 2025 • 0 new comments
- Flux FP8 recipe (#13584), commented on May 19, 2025 • 0 new comments
- Add resiliency features to recipes (#13585), commented on May 16, 2025 • 0 new comments
- Temporary Forward compatibility MR for incoming vpp refactor in mcore (#13587), commented on May 19, 2025 • 0 new comments
- Magpietts Multilingual IPA GRPO (#13595), commented on May 22, 2025 • 0 new comments
- Added safe loading of models (#13607), commented on May 21, 2025 • 0 new comments
- Llama3.2 1B Pruning: input_size missing (#13609), commented on May 16, 2025 • 0 new comments
- Tips for Gemma 3 1B pretraining (#13438), commented on May 16, 2025 • 0 new comments
- Bloated pre-requirements (#12188), commented on May 17, 2025 • 0 new comments
- NCCL TImeout (#13562), commented on May 21, 2025 • 0 new comments
- Error running transcribe_speech.py (#13571), commented on May 21, 2025 • 0 new comments
- Speech data explorer doesn't work if run in a web-based development environment (#12527), commented on May 22, 2025 • 0 new comments
- FileNotFoundError during checkpoint saving in nemo_model_checkpoint.py (#13581), commented on May 22, 2025 • 0 new comments
- BUG - ASR - Finetuned hybrid model timestamps (#12799), commented on May 22, 2025 • 0 new comments
- [NeMo2.0] Add MCore FSDP2 support (#11216), commented on May 21, 2025 • 0 new comments
- Add safetensor option when saving and restoring models (#11549), commented on May 20, 2025 • 0 new comments
- Fixed normalization of feature vector and weight vector (#12246), commented on May 22, 2025 • 0 new comments
- Variable global and micro batch sizes for different GPUs (#12640), commented on May 21, 2025 • 0 new comments
- feat: add support for nemo 2.0 checkpointing with multistorageclient (#12746), commented on May 17, 2025 • 0 new comments
- Implement Speculative transform script for GPT models (#12863), commented on May 22, 2025 • 0 new comments
- Transducer with Transformer-Decoder (GPT-like) (#13030), commented on May 22, 2025 • 0 new comments
- Add option distributed_size to MegatronDistributedFusedAdam (#13102), commented on May 20, 2025 • 0 new comments
- [automodel] set use_reentrant=False for gradient ckpting (#13272), commented on May 16, 2025 • 0 new comments
- add CTC batched beam search (#13337), commented on May 20, 2025 • 0 new comments
- Chtruong/debug flux tests (#13387), commented on May 17, 2025 • 0 new comments
- adding inference utility to do nemo2 ckpt loading without PTL and usi… (#13389), commented on May 17, 2025 • 0 new comments
- [automodel] fix grad ckpting high memory usage (#13390), commented on May 18, 2025 • 0 new comments
- add FA3 (#13399), commented on May 17, 2025 • 0 new comments
- Adding fix to absolute embedding (#13414), commented on May 20, 2025 • 0 new comments
- Automodel mvp 0.2 (#13420), commented on May 20, 2025 • 0 new comments
- Fix masking of <pad> tokens in AED inference (#13428), commented on May 21, 2025 • 0 new comments
- Move loop under LLM collection (#13446), commented on May 20, 2025 • 0 new comments