sync after 6915 (#14) #15

zhehuaichen · 2023-09-22T20:28:08Z

Fixed small bug with NoisePerturbationWithNormalization (Fixed small bug with NoisePerturbationWithNormalization NVIDIA/NeMo#7118)
Fix import guard checks (Fix import guard checks NVIDIA/NeMo#7124)
Revert "Fix import guard checks (Fix import guard checks NVIDIA/NeMo#7124)" (Revert "Fix import guard checks" NVIDIA/NeMo#7125)

This reverts commit a46e325.

Fix import guard checks (Fix import guard checks NVIDIA/NeMo#7126)
Fix import guard checks
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Add updated fc ctc and rnnt xxl models (Add updated fc ctc and rnnt xxl models NVIDIA/NeMo#7128) (Add updated fc ctc and rnnt xxl models NVIDIA/NeMo#7130)
[TTS] Create EnCodec training recipe ([TTS] Create EnCodec training recipe NVIDIA/NeMo#6852)
[TTS] Create EnCodec training recipe
[TTS] Update encodec recipe
[TTS] Rename EnCodec to AudioCodec
[TTS] Add EnCodec unit tests
[TTS] Add copyright header to distributed.py

Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (Fix tokenizer file caching where torch.distributed may not be initialized yet NVIDIA/NeMo#7061)
fix default attention size (Fix default context size NVIDIA/NeMo#7141) (Fix default context size NVIDIA/NeMo#7143)
fix evaluator.py for various exceptions by ast (fix evaluator.py for various exceptions by ast NVIDIA/NeMo#7150)
[TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. ([TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. NVIDIA/NeMo#6893)
[TTS] add Chinese TTS recipe based on IPA.
add new pinyin and ipa dictionaries with 36 finals.
add yaml configs for 24-final pinyin and ipa.
add copyright header
add a directory level 24finals to discriminate from 36 finals.
unify configs into a single one and add detailed comments providing supported candidates.
choose 36-final IPA as default phoneme dict

[TTS] Add output audio format to preprocessing ([TTS] Add output audio format to preprocessing NVIDIA/NeMo#6889)
[TTS] Add output audio format to preprocessing
[TTS] Add format validation
[TTS] Fix data tutorial

freeze (freeze base mode on init during peft NVIDIA/NeMo#7152)
make sure any empty segments are removed (NFA bugfix: remove any empty segments NVIDIA/NeMo#7155)
Update RIR generation scripts ([Tools] Update RIR generation scripts NVIDIA/NeMo#6547)

fix: reduce room size if evaluation of params fails
added randomized mic placement
added diffuse noise generation
added an option to specify the format and subtype for saved audio

A quickstart speech enhancement tutorial (A quickstart speech enhancement tutorial NVIDIA/NeMo#6492)

A simple example of training a model for speech enhancement task

NFA subtitle file config - specify colors and vertical alignment (NFA subtitle file config - specify colors and vertical alignment NVIDIA/NeMo#7160)
allow specifying colors of text in ASS subtitle file
specify vertical_alignment instead of marginv in ass_file_config
add documentation of CTMFileConfig and ASSFileConfig to NFA README

Eagerly accumulate embedding grads into fp32 buffer (Fix incorrect embedding grads with distopt BF16 grad reductions NVIDIA/NeMo#6958) (Fix incorrect embedding grads with distopt BF16 grad reductions NVIDIA/NeMo#7153)
TE bug fix (TE bug fix NVIDIA/NeMo#7027) (TE bug fix NVIDIA/NeMo#7036)
[TTS] Remove nested TTS configs ([TTS] Remove nested TTS configs NVIDIA/NeMo#7154)
[TTS] Remove nested TTS configs
[TTS] Modify tutorial to support multiple sampling rates
[TTS] Clarify min_duration unit
[TTS] Default 22.05kHz highfreq to null

Merge release r1.20.0 to main (Merge release r1.20.0 to main NVIDIA/NeMo#7167)
update package info
Add ASR with TTS Tutorial. Fix enhancer usage. (Add ASR with TTS Tutorial. Fix enhancer usage. NVIDIA/NeMo#6955)
Add ASR with TTS Tutorial
Fix enhancer usage
install_bs (fix install_beamsearch_decoders.sh NVIDIA/NeMo#7019)
Fix typo and branch in tutorial (Fix typo and branch in tutorial NVIDIA/NeMo#7048)
fix syntax error introduced in PR-7079 (fix syntax error introduced in PR-7079 NVIDIA/NeMo#7102)
fix syntax error introduced in PR-7079
fixes for pr review

fix links for TN (Fix links in Segmentation tutorial NVIDIA/NeMo#7117)
update branch (Update notebook branch NVIDIA/NeMo#7135)
Fixed main and merging this to r1.20 (Fixed main and merging this to r1.20 NVIDIA/NeMo#7127)
Fixed main and merging this to r1.20
Update vad_utils.py

update branch
fix version
resolve conflict the other way
keep both
revert keep both

Upgrade to pytorch lightning 2.0 (Upgrade to pytorch lightning 2.0 NVIDIA/NeMo#6433)
Upgrade pytorch lightning version in requirements
Initial fixes for PTL2.0
Add further fixes to support lightning 2.0
Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and few occurrences of validation_epoch_end
Replace all occurrences of validation_epoch_end to on_validation_epoch_end
Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively
Change logger=None to logger=False in Trainer object
Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass
Modify trainer.precision check and other small edits
Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer
Add default values for args to fix Attribute Error
Add the following modifications

1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class 2) Replace resume_from_checkpoint with ckpt_path as needed 3) Explicitly add accelerator as 'CPU' in UTs being run on CPU

Remove outputs arg from on_validation_epoch_end, on_test_epoch_end
Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings
Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel
Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py
Revert an extra space that was mistakenly added
Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity
Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity
Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing
Remove outputs arg from on_train_epoch_end
Remove outputs from on_validation_epoch_end in multi_binary_acc.py
Remove output args from on_validation_epoch_end in the docstrings of some ASR files
Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs
Add on_validation_epoch_end and remove outputs args for nlp models
Append output of validation_step to validation_step_outputs in EncDecClassificationModel
Add the following changes

1) Index self.validation_step_outputs and self.test_step_outputs with dataloader_idx wherever needed 2) Initialize self.validation_step_outputs and self.test_step_outputs as empty lists and add support for multi dataloaders if they exist 3) Remove self.pre_configure_ddp from NLPDDPStrategy class as it is removed in PTL 2.0

Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py
TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError
Add if condition check for multiple dataloaders when appending to validation outputs
Separate validation pass to be used with both validation_step and test_step
Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py
Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len
Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0
Modify precision checks to account for 16-mixed and bf16-mixed
Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel
Modify find_unused_parameters=True in g2p_heteronym model

1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py 2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py

Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel
Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml
Add split arg self.test_step_outputs to TextClassificationModel
Add test_step_outputs to dialogue and text classification models
Change condition check for multiple dataloaders:

1) Replace ds_item as list in dialogue_config.yaml 2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step 3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_capitalization_model.py

Add additional condition for multi dataloaders

Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step

Add val step outputs and default val for dataloader_idx

1) Append validation_step output to self.validation_step_outputs in MultiLabelIntentSlotClassificationModel 2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback 3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg

Add val/test_step_outputs to S2SQAModel and GPTQAModel
Edit JenkinsFile for bert_pretraining.py

Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error

Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py
Add ddp_find_unused_parameters_true and remove output args

1) Add ddp_find_unused_parameters_true for trainer.strategy in self_alignment_pretraining.py as it has unused parameters 2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py 3) Comment tests in JenkinsFile that need to be fixed

Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed
Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py
Precision fix and validation/test_step_outputs

1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py 2) Reset ckpt_path for test in enc_dec_nmt.py
3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py 4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Parallelism and add back NMT Training Post-LN

Precision fix and skip few failing tests
Add missing comment lines in JenkinsFile
Comment Jenkins tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py
Minor edit JenkinsFile
Minor edit in jenkins file
Edit in Jenkins file
Comment missed lines in Jenkins file
Fix precision and validation/test outputs

1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py 2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py 3) Add back resume_from_checkpoint in the megatron_t5_config.yaml 4) Comment out certain tests in Jenkins file

Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py
Precision fix and edit precision typo in all files

1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py 2) Fix precision typo in all files

Fix all CI TTS tests and comment few Jenkins tests
Combine xx_epoch_end and on_xx_epoch_end

Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py

Add a missing comment in JenkinsFile
Add try except StopIteration in validation_step for models with dataloader_iter
Remove pyyaml from requirements
Add try except for inference_step in megatron_finetune_model.py
Remove limit_val_batches for mockGPTDataset test
Add new self.validation_step_outputs for MegatronGPTSFTModel
Minor edit Jenkinsfile
Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py

Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when dataloaders are not set up in ModelPT, for example while restoring the model.

Remove resume_from_checkpoint as trainer arg in conf yaml files
Remove resume_from_checkpoint as trainer arg in GPT, T5 configs
Remove resume_from_checkpoint in duplex_tn_config.yaml
Fix typos, unused imports and refactor code to remove redundant funcs
Remove commented code in megatron_nmt_model.py
Fix overridden functions to match parent class functions
Prefetch dataloader_iter to prevent hang for PP>1
Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1
Uncomment tests in JenkinsFile
Add '16' to precision checks and other minor fixes
Clear validation/test_step_outputs with dataloader_idx for multi dataloaders
Minor edits
Modify precision checks to avoid indexing
Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs
Reference checkpoint with trainer.ckpt_path
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Add _prefetch to NLPModel and minor fixes
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Add limit_val_batches in JenkinsFile for NMT

1) Add trainer.limit_val_batches in Megatron NMT Training TP=2 2) Remove unused import in ModelPT
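
The recurring change in the PTL 2.0 commits above (drop the outputs arg from on_validation_epoch_end, append step results to self.validation_step_outputs, index by dataloader_idx for multiple dataloaders, clear at epoch end) can be sketched roughly as below. This is a framework-free illustration, not NeMo code: ToyModel and its num_val_dataloaders argument are hypothetical, and the return value of on_validation_epoch_end exists only so the result can be inspected (the real hook returns nothing).

```python
class ToyModel:
    """Hypothetical stand-in for a LightningModule; it does not import Lightning."""

    def __init__(self, num_val_dataloaders=1):
        self.num_val_dataloaders = num_val_dataloaders
        # One list per dataloader when there are several, otherwise a flat list.
        if num_val_dataloaders > 1:
            self.validation_step_outputs = [[] for _ in range(num_val_dataloaders)]
        else:
            self.validation_step_outputs = []

    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        loss = sum(batch) / len(batch)  # placeholder "loss" for illustration
        # The hook no longer receives the outputs, so the model stores them itself.
        if self.num_val_dataloaders > 1:
            self.validation_step_outputs[dataloader_idx].append(loss)
        else:
            self.validation_step_outputs.append(loss)
        return loss

    def on_validation_epoch_end(self):
        # No `outputs` argument: read the instance attribute instead.
        if self.num_val_dataloaders > 1:
            per_loader = self.validation_step_outputs
        else:
            per_loader = [self.validation_step_outputs]
        means = [sum(outs) / len(outs) for outs in per_loader if outs]
        # Clear stored outputs so they do not accumulate across epochs.
        for outs in per_loader:
            outs.clear()
        return means  # returned only so this sketch can be checked


model = ToyModel()
model.validation_step([1.0, 3.0], batch_idx=0)
model.validation_step([2.0, 4.0], batch_idx=1)
print(model.on_validation_epoch_end())
```

The same shape applies to test_step and self.test_step_outputs.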

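The precision-related commits in the PTL 2.0 upgrade (cast precision to str, accept '16', 16-mixed and bf16-mixed, avoid indexing into the value) suggest a check along these lines. This is a minimal sketch under assumptions: resolve_precision is a hypothetical helper, not a NeMo function, and it returns dtype names as plain strings so the example does not depend on torch.

```python
def resolve_precision(precision):
    """Map a trainer precision setting to a dtype name (hypothetical helper)."""
    # Cast to str first so that both 16 and "16" are handled the same way.
    key = str(precision)
    # Compare whole values instead of indexing into the string.
    if key in ("16", "16-mixed"):
        return "float16"
    if key in ("bf16", "bf16-mixed"):
        return "bfloat16"
    if key in ("32", "32-true"):
        return "float32"
    raise ValueError(f"Unsupported precision: {precision!r}")


for value in (16, "16-mixed", "bf16", "bf16-mixed", 32):
    print(value, "->", resolve_precision(value))
```
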
Include the scripts for preprocessing OAST and unit tests for chat sft datasets (Include the scripts for preprocessing OAST and unit tests for chat sft datasets NVIDIA/NeMo#7112)
scripts for sft
fix style
added special token only for huggingface model
change default name
print out error datapoint content
show error id
annotation script working
try to be compatible with huggingface tokenizer
added examples
added lang
added lang
text to value special case
configure the slider
annotation handles lang
added the unit test for chat sft dataset
used the file in the test dir
fix json error
load local tokenizer
remove mask count check
added HF dataset backend
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

add paths to labeler. (add paths to labeler. NVIDIA/NeMo#7087)
T5 metrics fix (T5 metrics fix NVIDIA/NeMo#7037)
Fix race condition when executing with multi-node where some ranks do not wait for setup (Fix race condition for downloading cache when executing with multi-node NVIDIA/NeMo#7016)
Added bool types to neural_types export (Added bool types to neural_types export NVIDIA/NeMo#7032)
rnnt and char utils (rnnt and char utils NVIDIA/NeMo#6971)
rnnt_ngram_merge
char level bug
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

fix tab text gen (Fix tabular data text generation NVIDIA/NeMo#7022) (Fix tabular data text generation NVIDIA/NeMo#7031)
Fixed kwargs for metric instance init
Fixed kwargs for metric instance init
removed kwargs
Updated config desc
ASR Confidence update and tutorial (ASR Confidence update and tutorial NVIDIA/NeMo#6810)
small fixes and tests
various fixes for the tutorial
tutorial added
fix for a little oops after rebase
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

fix tests
unused import removed
fix review comments
deprecated parameters for greedy configs
move re-assigning to configs
fix comments 2
fix config tests
fix ece test (my env was bugged apparently)
renamings for confidence ensemble
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

fix comments 3
return dropped tutorial
CI flips back and forth, increasing tolerance

install_bs (fix install_beamsearch_decoders.sh NVIDIA/NeMo#7019) (fix install_beamsearch_decoders.sh NVIDIA/NeMo#7028)
fixes for spellmapper (fixes for spellmapper NVIDIA/NeMo#6994) (fixes for spellmapper NVIDIA/NeMo#7000)
added back the retro documents (added back the retro documents. NVIDIA/NeMo#7033)
Remove pyyaml (Remove pyyaml NVIDIA/NeMo#7052) (Remove pyyaml NVIDIA/NeMo#7054)
st standalone model (st standalone model NVIDIA/NeMo#6969)
st standalone model
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

style fix
sacrebleu import fix, unused imports removed
import guard for nlp inside asr transformer bpe model
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

codeql fixes
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

comments answered
import ordering fix
yttm for asr removed
logging added
added inference and translate method
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

remove pos emb from state dict for old models (remove pos emb from state dict for old models NVIDIA/NeMo#7068)
remove pos emb from state dict
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

move to nlp_model
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

update comment
fix nmt test
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

fix nmt test

Fix typo in ASR-TTS tutorial (Fix typo in ASR-TTS tutorial NVIDIA/NeMo#7049)
Fixed tutorial's name (Fixed tutorial's name NVIDIA/NeMo#7047)
Fix documentation for Numba (Fix documentation for Numba NVIDIA/NeMo#7065) (Fix documentation for Numba NVIDIA/NeMo#7077)
Fix documentation for Numba
Update force float32 flag dynamically
Update force float32 flag dynamically
Fix nemo version

Update Frame-VAD doc and fix onnx export (Update Frame-VAD doc and fix onnx export NVIDIA/NeMo#7076)
update fvad doc
fix typo
update fvad example
update
fix onnx export
update test
refactor
update doc
update

memmap worker arg (memmap worker arg NVIDIA/NeMo#7062)
memmap worker arg
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

update
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

update
update

Fix caching bug in causal convolutions for cache-aware ASR models (Fix caching bug in causal convolutions for cache-aware ASR models NVIDIA/NeMo#7034) (Fix caching bug in causal convolutions for cache-aware ASR models NVIDIA/NeMo#7082)
Fast Conformer global token fix (Fast Conformer global token fix NVIDIA/NeMo#7085)
old way
fix
fix
fix
remove extra
clean
clean
clean
fix
fix
fix
fix
fix
fix
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Refined export_config (Refined export_config NVIDIA/NeMo#7053) (Refined export_config NVIDIA/NeMo#7066)
Refined export_config
Rolling back hierarchy change
small Bugfix (small Bugfix NVIDIA/NeMo#7081)
small Bugfix (small Bugfix NVIDIA/NeMo#7079)
fix branch
fix typo
fix link

Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb
Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Added script to extract ASR CTC and RNNT models from ASR hybrid models (Added script to extract ASR CTC and RNNT models from ASR hybrid models NVIDIA/NeMo#7092)
Added script to extract ctc and rnnt models from hybrid models
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Updated hybrid extraction script for review request 1
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Updated hybrid convert script to remove --cuda flag

Adding docs and models for multiple lookahead cache-aware ASR (Adding docs and models for multiple lookahead cache-aware ASR NVIDIA/NeMo#7067) (Adding docs and models for multiple lookahead cache-aware ASR NVIDIA/NeMo#7094)
update TTS readme (update TTS readme NVIDIA/NeMo#7088)
update TTS readme

Fix absolute path in path join call (Fix absolute path in path join call NVIDIA/NeMo#7099)
Disable distopt contiguous param buffer by default (Disable distopt contiguous param buffer by default NVIDIA/NeMo#7095)
microphone demo (NeMo ASR Demo NVIDIA/NeMo#7110)
[Fix] load_state_dict in nlp_model.py ([Fix] load_state_dict in nlp_model.py NVIDIA/NeMo#7086)
Fix load_state_dict in nlp_model.py
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Fix plot function in vad_utils.py (Fix plot function in vad_utils.py NVIDIA/NeMo#7113)

Fix plot function in vad_utils.py

Fixed small bug with NoisePerturbationWithNormalization (Fixed small bug with NoisePerturbationWithNormalization NVIDIA/NeMo#7118)
Fix import guard checks (Fix import guard checks NVIDIA/NeMo#7124)
Revert "Fix import guard checks (Fix import guard checks NVIDIA/NeMo#7124)" (Revert "Fix import guard checks" NVIDIA/NeMo#7125)

This reverts commit a46e325.

Fix import guard checks (Fix import guard checks NVIDIA/NeMo#7126)
Fix import guard checks
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Add updated fc ctc and rnnt xxl models (Add updated fc ctc and rnnt xxl models NVIDIA/NeMo#7128) (Add updated fc ctc and rnnt xxl models NVIDIA/NeMo#7130)
[TTS] Create EnCodec training recipe ([TTS] Create EnCodec training recipe NVIDIA/NeMo#6852)
[TTS] Create EnCodec training recipe
[TTS] Update encodec recipe
[TTS] Rename EnCodec to AudioCodec
[TTS] Add EnCodec unit tests
[TTS] Add copyright header to distributed.py

Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (Fix tokenizer file caching where torch.distributed may not be initialized yet NVIDIA/NeMo#7061)
fix default attention size (Fix default context size NVIDIA/NeMo#7141) (Fix default context size NVIDIA/NeMo#7143)
fix evaluator.py for various exceptions by ast (fix evaluator.py for various exceptions by ast NVIDIA/NeMo#7150)
[TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. ([TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. NVIDIA/NeMo#6893)
[TTS] add Chinese TTS recipe based on IPA.
add new pinyin and ipa dictionaries with 36 finals.
add yaml configs for 24-final pinyin and ipa.
add copyright header
add a directory level 24finals to discriminate from 36 finals.
unify configs into a single one and add detailed comments providing supported candidates.
choose 36-final IPA as default phoneme dict

[TTS] Add output audio format to preprocessing ([TTS] Add output audio format to preprocessing NVIDIA/NeMo#6889)
[TTS] Add output audio format to preprocessing
[TTS] Add format validation
[TTS] Fix data tutorial

freeze (freeze base mode on init during peft NVIDIA/NeMo#7152)
make sure any empty segments are removed (NFA bugfix: remove any empty segments NVIDIA/NeMo#7155)
Update RIR generation scripts ([Tools] Update RIR generation scripts NVIDIA/NeMo#6547)

fix: reduce room size if evaluation of params fails
added randomized mic placement
added diffuse noise generation
added an option to specify the format and subtype for saved audio

A quickstart speech enhancement tutorial (A quickstart speech enhancement tutorial NVIDIA/NeMo#6492)

A simple example of training a model for speech enhancement task

NFA subtitle file config - specify colors and vertical alignment (NFA subtitle file config - specify colors and vertical alignment NVIDIA/NeMo#7160)
allow specifying colors of text in ASS subtitle file
specify vertical_alignment instead of marginv in ass_file_config
add documentation of CTMFileConfig and ASSFileConfig to NFA README

Eagerly accumulate embedding grads into fp32 buffer (Fix incorrect embedding grads with distopt BF16 grad reductions NVIDIA/NeMo#6958) (Fix incorrect embedding grads with distopt BF16 grad reductions NVIDIA/NeMo#7153)
TE bug fix (TE bug fix NVIDIA/NeMo#7027) (TE bug fix NVIDIA/NeMo#7036)
[TTS] Remove nested TTS configs ([TTS] Remove nested TTS configs NVIDIA/NeMo#7154)
[TTS] Remove nested TTS configs
[TTS] Modify tutorial to support multiple sampling rates
[TTS] Clarify min_duration unit
[TTS] Default 22.05kHz highfreq to null

Merge release r1.20.0 to main (Merge release r1.20.0 to main NVIDIA/NeMo#7167)
update package info
Add ASR with TTS Tutorial. Fix enhancer usage. (Add ASR with TTS Tutorial. Fix enhancer usage. NVIDIA/NeMo#6955)
Add ASR with TTS Tutorial
Fix enhancer usage
install_bs (fix install_beamsearch_decoders.sh NVIDIA/NeMo#7019)
Fix typo and branch in tutorial (Fix typo and branch in tutorial NVIDIA/NeMo#7048)
fix syntax error introduced in PR-7079 (fix syntax error introduced in PR-7079 NVIDIA/NeMo#7102)
fix syntax error introduced in PR-7079
fixes for pr review

fix links for TN (Fix links in Segmentation tutorial NVIDIA/NeMo#7117)
update branch (Update notebook branch NVIDIA/NeMo#7135)
Fixed main and merging this to r1.20 (Fixed main and merging this to r1.20 NVIDIA/NeMo#7127)
Fixed main and merging this to r1.20
Update vad_utils.py

update branch
fix version
resolve conflict the other way
keep both
revert keep both

Upgrade to pytorch lightning 2.0 (Upgrade to pytorch lightning 2.0 NVIDIA/NeMo#6433)
Upgrade pytorch lightning version in requirements
Initial fixes for PTL2.0
Add further fixes to support lightning 2.0
Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and few occurances of validation_epoch_end
Replace all occurances of validation_epoch_end to on_validation_epoch_end
Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively
Change logger=None to logger=False in Trainer object
Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass
Modify trainer.precision check and other small edits
Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer
Add default values for args to fix Attribute Error
Add the following modifications

Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class 2) Replace resume_from_checkpoint with ckpt_path as needed 3) Explicitly add accelerator as 'CPU' in UTs being run on CPU

Remove outputs arg from on_validation_epoch_end, on_test_epoch_end
Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings
Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel
Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py
Revert an extra space that was mistakenly added
Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity
Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity
Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing
Remove outputs arg from on_train_epoch_end
Remove outputs from on_validation_epoch_end in multi_binary_acc.py
Remove output args from on_validation_epoch_end in the docstrings of some ASR files
Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs
Add on_validation_epoch_end and remove outputs args for nlp models
Append output of validation_step to validation_step_outputs in EncDecClassificationModel
Add the following changes

Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed 2) Initialize self.validation_step_outputs and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist 3) Remove self.pre_configure_ddp from NLPDDPStrategy class as its removed in PTL 2.0

Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py
TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError
Add if condition check for multiple dataloaders when appending to validation outputs
Separate validation pass to be used with both validation_step and test_step
Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py
Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len
Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0
Modify precision checks to account for 16-mixed and bf16-mixed
Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel
Modify find_unused_parameters=True in g2p_heteronym model

Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py 2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py

Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel
Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml
Add split arg self.test_step_outputs to TextClassificationModel
Add test_step_outputs to dialogue and text classification models
Change condition check for multiple dataloaders:

Replace ds_item as list in dialogue_config.yaml 2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step 3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_cpitalization_model.py

Add additional condition for multi dataloaders

Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step

Add val step outputs and default val for dataloader_idx

Append validation_step outout to self.validation_step_outputs in MultiLabelIntentSlotClassificationMode 2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback 3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg

Add val/test_step_outputs to S2SQAModel and GPTQAModel
Edit JenkinsFile for bert_pretrainig.py

Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error

Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py
Add ddp_find_unused_parameters_true and remove output args

Add ddp_find_unused_parameters_true fro trainer.strategy in self_alignment_pretraining.py as it has unused parameters 2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py 3) Comment tests in JenkinsFile that need to be fixed

Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed
Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py
Precision fix and validation/test_step_outputs

Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py 2) Reset ckpt_path for test in enc_dec_nmt.py
Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py 4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Paralleism and add back NMT Training Post-LN

Precision fix and skip few failing tests
Add missing comment lines in JenkinsFile
Comment jenkin tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py
Minor edit JenkinsFile
Minor edit in jenkins file
Edit in Jenkins file
Comment missed lines in Jenkins file
Fix precision and validation/test outputs

Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py 2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py 3) Add back resume_from_checkpoint in the megatron_t5_config.yaml 4) Comment out certain tests in Jenkins file

Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py
Precision fix and edit precision typo in all files

1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py 2) Fix precision typo in all files
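
The precision entries above concern code that has to accept both the older spellings (`16`, `'16'`, `'bf16'`) and the `16-mixed` / `bf16-mixed` strings. A minimal sketch of such a check follows; `is_half_precision` and `uses_bf16` are hypothetical helper names, not functions from NeMo, and the accepted value sets are an assumption based on the strings named in these commit messages.

```python
def is_half_precision(precision):
    """Return True for fp16/bf16 settings in either the old or the new spelling.

    Hypothetical helper: casting to str first means int 16 and '16' compare
    the same way, and membership in a set avoids indexing into the value.
    """
    return str(precision) in {"16", "bf16", "16-mixed", "bf16-mixed"}


def uses_bf16(precision):
    """Return True only for the bfloat16 variants (hypothetical helper)."""
    return str(precision) in {"bf16", "bf16-mixed"}


for value in (16, "16", "16-mixed", "bf16", "bf16-mixed", 32, "32-true"):
    print(repr(value), is_half_precision(value), uses_bf16(value))
```

Comparing against an explicit set keeps the check valid whether the trainer reports an int or a string.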

Fix all CI TTS tests and comment few Jenkins tests
Combine xx_epoch_end and on_xx_epoch_end

Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py

Add a missing comment in JenkinsFile
Add try except StopIteration in validation_step for models with dataloader_iter
Remove pyyaml from requirements
Add try except for inference_step in megatron_finetune_model.py
Remove limit_val_batches for mockGPTDataset test
Add new self.validation_step_outputs for MegatronGPTSFTModel
Minor edit Jenkinsfile
Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py

Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when dataloaders are not set up in ModelPT, for example while restoring the model.

Remove resume_from_checkpoint as trainer arg in conf yaml files
Remove resume_from_checkpoint as trainer arg in GPT, T5 configs
Remove resume_from_checkpoint in duplex_tn_config.yaml
Fix typos, unused imports and refactor code to remove redundant funcs
Remove commented code in megatron_nmt_model.py
Fix overridden functions to match parent class functions
Prefetch dataloader_iter to prevent hang for PP>1
Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1
Uncomment tests in JenkinsFile
Add '16' to precision checks and other minor fixes
Clear validation/test_step_outputs with dataloader_idx for multi dataloaders
Minor edits
Modify precision checks to avoid indexing
Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs
Reference checkpoint with trainer.ckpt_path
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Add _prefetch to NLPModel and minor fixes
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Add limit_val_batches in JenkinsFile for NMT

1) Add trainer.limit_val_batches in Megatron NMT Training TP=2 2) Remove unused import in ModelPT

Include the scripts for preprocessing OAST and unit tests for chat sft datasets (Include the scripts for preprocessing OAST and unit tests for chat sft datasets NVIDIA/NeMo#7112)
scripts for sft
fix style
added special token only for huggingface model
change default name
print out error datapoint content
show error id
annotation script working
try to be compatible with huggingface tokenizer
added examples
added lang
added lang
text to value special case
configure the slider
annotation handles lang
added the unit test for chat sft dataset
used the file in the test dir
fix json error
load local tokenizer
remove mask count check
added HF dataset backend
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

add paths to labeler. (add paths to labeler. NVIDIA/NeMo#7087)
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Adi Renduchintala <adithyar…

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

Add specific line by line info of high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this
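
This PR syncs upstream changes, so there is no new API to demonstrate. As an illustration only, the sketch below shows the PyTorch Lightning 2.0 pattern that the commit entries describe: step outputs are stored on the model and cleared in the epoch-end hook, instead of being passed to it as an `outputs` argument. `ToyModel` is a hypothetical stand-in written in plain Python so that it runs without Lightning installed; only the hook names `validation_step` and `on_validation_epoch_end` and the attribute name `validation_step_outputs` are taken from the commit messages, and the placeholder loss is made up.

```python
class ToyModel:
    """Hypothetical stand-in for a LightningModule; not NeMo code."""

    def __init__(self, num_val_dataloaders=1):
        self.multi = num_val_dataloaders > 1
        # One flat list for a single dataloader, one list per dataloader otherwise.
        if self.multi:
            self.validation_step_outputs = [[] for _ in range(num_val_dataloaders)]
        else:
            self.validation_step_outputs = []
        self.logged = {}

    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        loss = sum(batch) / len(batch)  # placeholder "loss"
        output = {"val_loss": loss}
        if self.multi:
            self.validation_step_outputs[dataloader_idx].append(output)
        else:
            self.validation_step_outputs.append(output)
        return output

    def on_validation_epoch_end(self):
        # No `outputs` argument: read what validation_step stored.
        groups = self.validation_step_outputs if self.multi else [self.validation_step_outputs]
        for idx, outputs in enumerate(groups):
            if outputs:
                mean = sum(o["val_loss"] for o in outputs) / len(outputs)
                self.logged[f"val_loss_dl{idx}"] = mean
            outputs.clear()  # free memory before the next epoch


model = ToyModel(num_val_dataloaders=2)
model.validation_step([1.0, 3.0], batch_idx=0, dataloader_idx=0)
model.validation_step([2.0, 2.0], batch_idx=1, dataloader_idx=0)
model.validation_step([4.0, 6.0], batch_idx=0, dataloader_idx=1)
model.on_validation_epoch_end()
print(model.logged)
```

Clearing each list at the end of the hook mirrors the "Add self.validation_step_outputs.clear()" entries in the commit log; without it the stored outputs would accumulate across epochs.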

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

Related to # (issue)

* Fixed small bug with NoisePerturbationWithNormalization (#7118) Signed-off-by: Daniel Egert <degert@nvidia.com> * Fix import guard checks (#7124) Signed-off-by: smajumdar <titu1994@gmail.com> * Revert "Fix import guard checks (#7124)" (#7125) This reverts commit a46e3251944642f9102aa16ce2d2f9d3a804ff8a. * Fix import guard checks (#7126) * Fix import guard checks Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add updated fc ctc and rnnt xxl models (#7128) (#7130) * [TTS] Create EnCodec training recipe (#6852) * [TTS] Create EnCodec training recipe Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Update encodec recipe Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Rename EnCodec to AudioCodec Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Add EnCodec unit tests Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Add copyright header to distributed.py Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> * Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (#7061) Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com> Co-authored-by: David <amosalla@asu.edu> * fix default attention size (#7141) (#7143) * fix evaluator.py for various exceptions by ast (#7150) Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> * [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (#6893) * [TTS] add Chinese TTS recipe based on IPA. * add new pinyin and ipa dictionaries with 36 finals. * add yaml configs for 24-final pinyin and ipa. * add copyright header * add a directory level 24finals to discriminate from 36 finals. 
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * unify configs into a single one and add detailed comments providing supported candidates. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * choose 36-final IPA as default phoneme dict Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * [TTS] Add output audio format to preprocessing (#6889) * [TTS] Add output audio format to preprocessing Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Add format validation Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Fix data tutorial Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> * freeze (#7152) Signed-off-by: arendu <adithya.r@gmail.com> * make sure any empty segments are removed (#7155) Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * Update RIR generation scripts (#6547) - fix: reduce room size if evaluation of params fails - added randomized mic placement - added diffuse noise generation - added an option to specify the format and subtype for saved audio Signed-off-by: Ante Jukić <ajukic@nvidia.com> * A quickstart speech enhancement tutorial (#6492) A simple example of training a model for speech enhancement task Signed-off-by: Ante Jukić <ajukic@nvidia.com> * NFA subtitle file config - specify colors and vertical alignment (#7160) * allow specifying colors of text in ASS subtitle file Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * specify vertical_alignment instead of marginv in ass_file_config Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * add documentation of CTMFileConfig and ASSFileConfig to NFA README Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> --------- Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * Eagerly accumulate embedding grads into fp32 buffer (#6958) (#7153) Signed-off-by: Tim 
Moon <tmoon@nvidia.com> Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> * TE bug fix (#7027) (#7036) Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> * [TTS] Remove nested TTS configs (#7154) * [TTS] Remove nested TTS configs Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Modify tutorial to support multiple sampling rates Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Clarify min_duration unit Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Default 22.05kHz highfreq to null Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> * Merge release r1.20.0 to main (#7167) * update package info Signed-off-by: ericharper <complex451@gmail.com> * Add ASR with TTS Tutorial. Fix enhancer usage. (#6955) * Add ASR with TTS Tutorial * Fix enhancer usage Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * install_bs (#7019) Signed-off-by: Nikolay Karpov <karpnv@gmail.com> * Fix typo and branch in tutorial (#7048) Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * fix syntax error introduced in PR-7079 (#7102) * fix syntax error introduced in PR-7079 Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru> * fixes for pr review Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru> --------- Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru> * fix links for TN (#7117) Signed-off-by: Evelina <ebakhturina@nvidia.com> * update branch (#7135) Signed-off-by: ericharper <complex451@gmail.com> * Fixed main and merging this to r1.20 (#7127) * Fixed main and merging this to r1.20 Signed-off-by: Taejin Park <tango4j@gmail.com> * Update vad_utils.py Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> --------- Signed-off-by: Taejin Park <tango4j@gmail.com> Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: He Huang (Steve) 
<105218074+stevehuang52@users.noreply.github.com> * update branch Signed-off-by: ericharper <complex451@gmail.com> * fix version Signed-off-by: ericharper <complex451@gmail.com> * resolve conflict the other way Signed-off-by: ericharper <complex451@gmail.com> * keep both Signed-off-by: ericharper <complex451@gmail.com> * revert keep both Signed-off-by: ericharper <complex451@gmail.com> --------- Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: Nikolay Karpov <karpnv@gmail.com> Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru> Signed-off-by: Evelina <ebakhturina@nvidia.com> Signed-off-by: Taejin Park <tango4j@gmail.com> Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: Vladimir Bataev <vbataev@nvidia.com> Co-authored-by: Nikolay Karpov <karpnv@gmail.com> Co-authored-by: bene-ges <antonova_sasha@list.ru> Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Co-authored-by: Taejin Park <tango4j@gmail.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> * Upgrade to pytorch lightning 2.0 (#6433) * Upgrade pytorch lightning version in requirements Signed-off-by: Abhishree <abhishreetm@gmail.com> * Initial fixes for PTL2.0 Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add further fixes to support lightning 2.0 Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and few occurances of validation_epoch_end Signed-off-by: Abhishree <abhishreetm@gmail.com> * Replace all occurances of validation_epoch_end to on_validation_epoch_end Signed-off-by: Abhishree <abhishreetm@gmail.com> * Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively Signed-off-by: Abhishree <abhishreetm@gmail.com> * Change logger=None to logger=False in Trainer object Signed-off-by: Abhishree <abhishreetm@gmail.com> * 
Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass Signed-off-by: Abhishree <abhishreetm@gmail.com> * Modify trainer.precision check and other small edits Signed-off-by: Abhishree <abhishreetm@gmail.com> * Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add default values for args to fix Attribute Error Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add the following modifications 1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class 2) Replace resume_from_checkpoint with ckpt_path as needed 3) Explicitly add accelerator as 'CPU' in UTs being run on CPU Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove outputs arg from on_validation_epoch_end, on_test_epoch_end Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel Signed-off-by: Abhishree <abhishreetm@gmail.com> * Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Revert an extra space that was mistakenly added Signed-off-by: Abhishree <abhishreetm@gmail.com> * Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity Signed-off-by: Abhishree <abhishreetm@gmail.com> * Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove outputs arg from on_train_epoch_end Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove outputs 
from on_validation_epoch_end in multi_binary_acc.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove output args from on_validation_epoch_end in the docstrings of some ASR files Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add on_validation_epoch_end and remove outputs args for nlp models Signed-off-by: Abhishree <abhishreetm@gmail.com> * Append output of validation_step to validation_step_outputs in EncDecClassificationModel Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add the following changes 1) Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed 2) Initialize self.validation_step_outputs and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist 3) Remove self.pre_configure_ddp from NLPDDPStrategy class as its removed in PTL 2.0 Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add if condition check for multiple dataloaders when appending to validation outputs Signed-off-by: Abhishree <abhishreetm@gmail.com> * Separate validation pass to be used with both validation_step and test_step Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len Signed-off-by: Abhishree <abhishreetm@gmail.com> * Comment Megatron T5 IA3 PP=2 in CI pipeline due to 
dataloader_iter issue with PTL 2.0 Signed-off-by: Abhishree <abhishreetm@gmail.com> * Modify precision checks to account for 16-mixed and bf16-mixed Signed-off-by: Abhishree <abhishreetm@gmail.com> * Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel Signed-off-by: Abhishree <abhishreetm@gmail.com> * Modify find_unused_parameters=True in g2p_heteronym model 1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py 2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add split arg self.test_step_outputs to TextClassificationModel Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add test_step_outputs to dialogue and text classification models Signed-off-by: Abhishree <abhishreetm@gmail.com> * Change condition check for multiple dataloaders: 1) Replace ds_item as list in dialogue_config.yaml 2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step 3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_cpitalization_model.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add additional condition for multi dataloaders Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add val step outputs and default val for dataloader_idx 1) Append validation_step outout to self.validation_step_outputs in 
MultiLabelIntentSlotClassificationMode 2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback 3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add val/test_step_outputs to S2SQAModel and GPTQAModel Signed-off-by: Abhishree <abhishreetm@gmail.com> * Edit JenkinsFile for bert_pretrainig.py Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error Signed-off-by: Abhishree <abhishreetm@gmail.com> * Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add ddp_find_unused_parameters_true and remove output args 1) Add ddp_find_unused_parameters_true fro trainer.strategy in self_alignment_pretraining.py as it has unused parameters 2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py 3) Comment tests in JenkinsFile that need to be fixed Signed-off-by: Abhishree <abhishreetm@gmail.com> * Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed Signed-off-by: Abhishree <abhishreetm@gmail.com> * Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Precision fix and validation/test_step_outputs 1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py 2) Reset ckpt_path for test in enc_dec_nmt.py 3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py 4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Paralleism and add back NMT Training Post-LN Signed-off-by: Abhishree <abhishreetm@gmail.com> * Precision fix and skip few failing tests Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add missing comment lines in JenkinsFile Signed-off-by: Abhishree <abhishreetm@gmail.com> * Comment jenkin 
tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Minor edit JenkinsFile Signed-off-by: Abhishree <abhishreetm@gmail.com> * Minor edit in jenkins file Signed-off-by: Abhishree <abhishreetm@gmail.com> * Edit in Jenkins file Signed-off-by: Abhishree <abhishreetm@gmail.com> * Comment missed lines in Jenkins file Signed-off-by: Abhishree <abhishreetm@gmail.com> * Fix precision and validation/test outputs 1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py 2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py 3) Add back resume_from_checkpoint in the megatron_t5_config.yaml 4) Comment out certain tests in Jenkins file Signed-off-by: Abhishree <abhishreetm@gmail.com> * Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Precision fix and edit precision typo in all files 1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py 2) Fix precision typo in all files Signed-off-by: Abhishree <abhishreetm@gmail.com> * Fix all CI TTS tests and comment few Jenkins tests Signed-off-by: Abhishree <abhishreetm@gmail.com> * Combine xx_epoch_end and on_xx_epoch_end Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add a missing comment in JenkinsFile Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add try except StopIteration in validation_step for models with dataloader_iter Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove pyyaml from requirements Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add try except for inference_step in megatron_finetune_model.py Signed-off-by: Abhishree 
<abhishreetm@gmail.com> * Remove limit_val_batches for mockGPTDataset test Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add new self.validation_step_outputs for MegatronGPTSFTModel Signed-off-by: Abhishree <abhishreetm@gmail.com> * Minor edit Jenkinsfile Signed-off-by: Abhishree <abhishreetm@gmail.com> * Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when datalaoders are not setup in ModelPT for example while restoring the model. Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove resume_from_checkpoint if trainer arg in conf yaml files Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove resume_from_checkpoint as trainer arg in GPT, T5 configs Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove resume_from_checkpoint in duplex_tn_config.yaml Signed-off-by: Abhishree <abhishreetm@gmail.com> * Fix typos, unused imports and refactor code to remove redundant funcs Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove commented code in megatron_nmt_model.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Fix overriden functions to match parent class functions Signed-off-by: Abhishree <abhishreetm@gmail.com> * Prefetch dataloader_iter to prevent hang for PP>1 Signed-off-by: Abhishree <abhishreetm@gmail.com> * Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1 Signed-off-by: Abhishree <abhishreetm@gmail.com> * Uncomment tests in JenkinsFile Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add '16' to precision checks and other minor fixes Signed-off-by: Abhishree <abhishreetm@gmail.com> * Clear validation/test_step_outputs with dataloader_idx for multi dataloaders Signed-off-by: Abhishree <abhishreetm@gmail.com> * Minor edits Signed-off-by: Abhishree <abhishreetm@gmail.com> * Modify precision checks to avoid indexing Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove 
self.validation_step_outputs_sft and add dataloader_idx to clear outputs Signed-off-by: Abhishree <abhishreetm@gmail.com> * Reference checkpoint with trainer.ckpt_path Signed-off-by: Abhishree <abhishreetm@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add _prefetch to NLPModel and minor fixes Signed-off-by: Abhishree <abhishreetm@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add limit_val_batches in JenkinsFile for NMT 1) Add trainer.limit_val_batches in Megatron NMT Training TP=2 2) Remove unused import in ModelPT Signed-off-by: Abhishree <abhishreetm@gmail.com> --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Include the scripts for preprocessing OAST and unit tests for chat sft datasets (#7112) * scripts for sft Signed-off-by: Yi Dong <yidong@nvidia.com> * fix style Signed-off-by: Yi Dong <yidong@nvidia.com> * adde special token only for huggingface model Signed-off-by: Yi Dong <yidong@nvidia.com> * change default name Signed-off-by: Yi Dong <yidong@nvidia.com> * print out error datapoint content Signed-off-by: Yi Dong <yidong@nvidia.com> * show error id Signed-off-by: Yi Dong <yidong@nvidia.com> * annotation script working Signed-off-by: Yi Dong <yidong@nvidia.com> * try to be compatible with huggingface tokenizer Signed-off-by: Yi Dong <yidong@nvidia.com> * added examples Signed-off-by: Yi Dong <yidong@nvidia.com> * added lang Signed-off-by: Yi Dong <yidong@nvidia.com> * added lang Signed-off-by: Yi Dong <yidong@nvidia.com> * text to value special case Signed-off-by: Yi Dong <yidong@nvidia.com> * configure the slider Signed-off-by: Yi Dong <yidong@nvidia.com> * annoatation handles lang Signed-off-by: Yi Dong <yidong@nvidia.com> * added the unit test for chat sft dataset Signed-off-by: Yi Dong <yidong@nvidia.com> * used the 
file in the test dir Signed-off-by: Yi Dong <yidong@nvidia.com> * fix json error Signed-off-by: Yi Dong <yidong@nvidia.com> * load local tokenizer Signed-off-by: Yi Dong <yidong@nvidia.com> * remove mask count check Signed-off-by: Yi Dong <yidong@nvidia.com> * added HF dataset backend Signed-off-by: Yi Dong <yidong@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Yi Dong <yidong@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * add paths to labeler. (#7087) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * T5 metrics fix (#7037) * Fix race condition when executing with multi-node where some ranks does not wait for setup (#7016) Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Added bool types to neural_types export (#7032) Signed-off-by: tbartley94 <tbartley@nvidia.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * rnnt and char utils (#6971) * rnnt_ngram_merge Signed-off-by: Nikolay Karpov <karpnv@gmail.com> * char level bug Signed-off-by: Nikolay Karpov <karpnv@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Nikolay Karpov <karpnv@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * fix tab text gen (#7022) (#7031) Signed-off-by: Yi Dong <yidong@nvidia.com> Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Fixed kwargs for metric instance init Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Fixed kwargs for metric instance init Signed-off-by: jubick1337 <mattyson.so@gmail.com> * removed kwagrs 
Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Updated config desc Signed-off-by: jubick1337 <mattyson.so@gmail.com> * ASR Confidence update and tutorial (#6810) * small fixes and tests Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * various fixes for the tutorial Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * tutorial added Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * for for a little oops after rebasement Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix tests Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * unused import removed Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * fix review comments Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * deprecated parameters for greedy configs Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * move re-assigning to configs Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * fix comments 2 Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * fix config tests Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * fix ece test (my env was bugged apparently) Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * renamings for confidence ensemble Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fox comments 3 Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * return dropped tutorial Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * CI flips back and forth, increasing tolerance Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> --------- Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * install_bs (#7019) (#7028) Signed-off-by: Nikolay Karpov <karpnv@gmail.com> Co-authored-by: Nikolay Karpov 
<karpnv@gmail.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * fixes for spellmapper (#6994) (#7000) Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru> Co-authored-by: bene-ges <antonova_sasha@list.ru> Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * added back the retro documents (#7033) Signed-off-by: Yi Dong <yidong@nvidia.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Remove pyyaml (#7052) (#7054) Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * st standalone model (#6969) * st standalone model Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com> * sacrebleu import fix, unused imports removed Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com> * import guard for nlp inside asr transformer bpe model Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * codeql fixes Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * comments answered Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com> * import ordering fix Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com> * yttm for asr removed Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com> * logging added Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com> * added inference and translate method Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * remove pos emb from state dict for old models (#7068) * remove pos emb from state dict Signed-off-by: Evelina <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move to nlp_model Signed-off-by: Evelina <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update comment Signed-off-by: Evelina <ebakhturina@nvidia.com> * fix nmt test Signed-off-by: Evelina <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix nmt test Signed-off-by: Evelina <ebakhturina@nvidia.com> --------- Signed-off-by: Evelina <ebakhturina@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Fix typo in ASR-TTS tutorial (#7049) Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Fixed tutorial's name (#7047) Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> Co-authored-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Fix documentation for Numba (#7065) (#7077) * Fix documentation for Numba * Update force float32 flag dynamically * Update force float32 flag dynamically * Fix nemo version --------- Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Update Frame-VAD doc and fix onnx export (#7076) * update fvad doc Signed-off-by: stevehuang52 <heh@nvidia.com> * fix typo Signed-off-by: stevehuang52 <heh@nvidia.com> * update fvad example Signed-off-by: stevehuang52 <heh@nvidia.com> * update 
Signed-off-by: stevehuang52 <heh@nvidia.com> * fix onnx export Signed-off-by: stevehuang52 <heh@nvidia.com> * update test Signed-off-by: stevehuang52 <heh@nvidia.com> * refactor Signed-off-by: stevehuang52 <heh@nvidia.com> * update doc Signed-off-by: stevehuang52 <heh@nvidia.com> * update Signed-off-by: stevehuang52 <heh@nvidia.com> --------- Signed-off-by: stevehuang52 <heh@nvidia.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * memmap worker arg (#7062) * memmap worker arg Signed-off-by: arendu <adithya.r@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <adithya.r@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <adithya.r@gmail.com> * update Signed-off-by: arendu <adithya.r@gmail.com> --------- Signed-off-by: arendu <adithya.r@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Fix caching bug in causal convolutions for cache-aware ASR models (#7034) (#7082) Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Fast Conformer global token fix (#7085) * old way Signed-off-by: sam1373 <samuelkriman@gmail.com> * fix Signed-off-by: sam1373 <samuelkriman@gmail.com> * fix Signed-off-by: sam1373 <samuelkriman@gmail.com> * fix Signed-off-by: sam1373 <samuelkriman@gmail.com> * remove extra Signed-off-by: sam1373 <samuelkriman@gmail.com> * clean Signed-off-by: sam1373 <samuelkriman@gmail.com> * clean Signed-off-by: sam1373 <samuelkriman@gmail.com> * clean Signed-off-by: sam1373 <samuelkriman@gmail.com> * fix Signed-off-by: sam1373 <samuelkriman@gmail.com> * fix Signed-off-by: sam1373 <samuelkriman@gmail.com> * fix Signed-off-by: sam1373 
<samuelkriman@gmail.com> * fix Signed-off-by: sam1373 <samuelkriman@gmail.com> * fix Signed-off-by: sam1373 <samuelkriman@gmail.com> * fix Signed-off-by: sam1373 <samuelkriman@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: sam1373 <samuelkriman@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Refined export_config (#7053) (#7066) * Refined export_config * Rolling back hierarchy change --------- Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * small Bugfix (#7081) * small Bugfix (#7079) * fix branch Signed-off-by: fayejf <fayejf07@gmail.com> * fix typo Signed-off-by: fayejf <fayejf07@gmail.com> * fix link Signed-off-by: fayejf <fayejf07@gmail.com> --------- Signed-off-by: fayejf <fayejf07@gmail.com> * Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb Signed-off-by: Somshubra Majumdar <titu1994@gmail.com> * Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb Signed-off-by: Somshubra Majumdar <titu1994@gmail.com> --------- Signed-off-by: fayejf <fayejf07@gmail.com> Signed-off-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Added script to extract ASR CTC and RNNT models from ASR hybrid models (#7092) * Added script to extract ctc and rnnt models from hybrid models Signed-off-by: Daniel Egert <degert@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updated hybrid extraction script for review request 1 Signed-off-by: Daniel Egert <degert@nvidia.com> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Updated hybrid convert script to remove --cuda flag Signed-off-by: Daniel Egert <degert@nvidia.com> --------- Signed-off-by: Daniel Egert <degert@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Adding docs and models for multiple lookahead cache-aware ASR (#7067) (#7094) Signed-off-by: jubick1337 <mattyson.so@gmail.com> * update TTS readme (#7088) * update TTS readme Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Fix absolute path in path join call (#7099) Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Disable distopt contiguous param buffer by default (#7095) Signed-off-by: Tim Moon <tmoon@nvidia.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * microphone demo (#7110) Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com> Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * [Fix] load_state_dict in nlp_model.py (#7086) * Fix load_state_dict in nlp_model.py Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Fix plot function in vad_utils.py (#7113) Fix plot function in vad_utils.py Signed-off-by: He Huang (Steve) 
<105218074+stevehuang52@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Fixed small bug with NoisePerturbationWithNormalization (#7118) Signed-off-by: Daniel Egert <degert@nvidia.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Fix import guard checks (#7124) Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Revert "Fix import guard checks (#7124)" (#7125) This reverts commit a46e3251944642f9102aa16ce2d2f9d3a804ff8a. Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Fix import guard checks (#7126) * Fix import guard checks Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Add updated fc ctc and rnnt xxl models (#7128) (#7130) Signed-off-by: jubick1337 <mattyson.so@gmail.com> * [TTS] Create EnCodec training recipe (#6852) * [TTS] Create EnCodec training recipe Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Update encodec recipe Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Rename EnCodec to AudioCodec Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Add EnCodec unit tests Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Add copyright header to distributed.py Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (#7061) Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com> Co-authored-by: David <amosalla@asu.edu> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * fix default attention size (#7141) (#7143) Signed-off-by: jubick1337 <mattyson.so@gmail.com> * fix evaluator.py 
for various exceptions by ast (#7150) Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (#6893) * [TTS] add Chinese TTS recipe based on IPA. * add new pinyin and ipa dictionaries with 36 finals. * add yaml configs for 24-final pinyin and ipa. * add copyright header * add a directory level 24finals to discriminate from 36 finals. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * unify configs into a single one and add detailed comments providing supported candidates. Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * choose 36-final IPA as default phoneme dict Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> --------- Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * [TTS] Add output audio format to preprocessing (#6889) * [TTS] Add output audio format to preprocessing Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Add format validation Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Fix data tutorial Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * freeze (#7152) Signed-off-by: arendu <adithya.r@gmail.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * make sure any empty segments are removed (#7155) Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Update RIR generation scripts (#6547) - fix: reduce room size if evaluation of params fails - added randomized mic placement - added diffuse noise generation - added an option to specify the format and subtype for saved audio Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * A quickstart speech enhancement 
tutorial (#6492) A simple example of training a model for speech enhancement task Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * NFA subtitle file config - specify colors and vertical alignment (#7160) * allow specifying colors of text in ASS subtitle file Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * specify vertical_alignment instead of marginv in ass_file_config Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * add documentation of CTMFileConfig and ASSFileConfig to NFA README Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> --------- Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Eagerly accumulate embedding grads into fp32 buffer (#6958) (#7153) Signed-off-by: Tim Moon <tmoon@nvidia.com> Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * TE bug fix (#7027) (#7036) Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * [TTS] Remove nested TTS configs (#7154) * [TTS] Remove nested TTS configs Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Modify tutorial to support multiple sampling rates Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Clarify min_duration unit Signed-off-by: Ryan <rlangman@nvidia.com> * [TTS] Default 22.05kHz highfreq to null Signed-off-by: Ryan <rlangman@nvidia.com> --------- Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Merge release r1.20.0 to main (#7167) * update package info Signed-off-by: ericharper <complex451@gmail.com> * Add ASR with TTS Tutorial. Fix enhancer usage. 
(#6955) * Add ASR with TTS Tutorial * Fix enhancer usage Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * install_bs (#7019) Signed-off-by: Nikolay Karpov <karpnv@gmail.com> * Fix typo and branch in tutorial (#7048) Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> * fix syntax error introduced in PR-7079 (#7102) * fix syntax error introduced in PR-7079 Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru> * fixes for pr review Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru> --------- Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru> * fix links for TN (#7117) Signed-off-by: Evelina <ebakhturina@nvidia.com> * update branch (#7135) Signed-off-by: ericharper <complex451@gmail.com> * Fixed main and merging this to r1.20 (#7127) * Fixed main and merging this to r1.20 Signed-off-by: Taejin Park <tango4j@gmail.com> * Update vad_utils.py Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> --------- Signed-off-by: Taejin Park <tango4j@gmail.com> Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> * update branch Signed-off-by: ericharper <complex451@gmail.com> * fix version Signed-off-by: ericharper <complex451@gmail.com> * resolve conflict the other way Signed-off-by: ericharper <complex451@gmail.com> * keep both Signed-off-by: ericharper <complex451@gmail.com> * revert keep both Signed-off-by: ericharper <complex451@gmail.com> --------- Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: Nikolay Karpov <karpnv@gmail.com> Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru> Signed-off-by: Evelina <ebakhturina@nvidia.com> Signed-off-by: Taejin Park <tango4j@gmail.com> Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: Vladimir Bataev <vbataev@nvidia.com> Co-authored-by: Nikolay Karpov 
<karpnv@gmail.com> Co-authored-by: bene-ges <antonova_sasha@list.ru> Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Co-authored-by: Taejin Park <tango4j@gmail.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Upgrade to pytorch lightning 2.0 (#6433) * Upgrade pytorch lightning version in requirements Signed-off-by: Abhishree <abhishreetm@gmail.com> * Initial fixes for PTL2.0 Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add further fixes to support lightning 2.0 Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and few occurances of validation_epoch_end Signed-off-by: Abhishree <abhishreetm@gmail.com> * Replace all occurances of validation_epoch_end to on_validation_epoch_end Signed-off-by: Abhishree <abhishreetm@gmail.com> * Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively Signed-off-by: Abhishree <abhishreetm@gmail.com> * Change logger=None to logger=False in Trainer object Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass Signed-off-by: Abhishree <abhishreetm@gmail.com> * Modify trainer.precision check and other small edits Signed-off-by: Abhishree <abhishreetm@gmail.com> * Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add default values for args to fix Attribute Error Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add the following modifications 1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class 2) Replace resume_from_checkpoint with ckpt_path as needed 3) Explicitly add accelerator as 'CPU' in UTs being run on CPU Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove outputs arg from on_validation_epoch_end, 
on_test_epoch_end Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel Signed-off-by: Abhishree <abhishreetm@gmail.com> * Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Revert an extra space that was mistakenly added Signed-off-by: Abhishree <abhishreetm@gmail.com> * Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity Signed-off-by: Abhishree <abhishreetm@gmail.com> * Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove outputs arg from on_train_epoch_end Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove outputs from on_validation_epoch_end in multi_binary_acc.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove output args from on_validation_epoch_end in the docstrings of some ASR files Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add on_validation_epoch_end and remove outputs args for nlp models Signed-off-by: Abhishree <abhishreetm@gmail.com> * Append output of validation_step to validation_step_outputs in EncDecClassificationModel Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add the following changes 1) Index self.validation_step_outputs and self.test_step.outputs with dataloader_idx wherever needed 2) Initialize self.validation_step_outputs 
and self.test_step.outputs as empty lists and add support for multi dataloaders if they exist 3) Remove self.pre_configure_ddp from NLPDDPStrategy class as its removed in PTL 2.0 Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add if condition check for multiple dataloaders when appending to validation outputs Signed-off-by: Abhishree <abhishreetm@gmail.com> * Separate validation pass to be used with both validation_step and test_step Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len Signed-off-by: Abhishree <abhishreetm@gmail.com> * Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0 Signed-off-by: Abhishree <abhishreetm@gmail.com> * Modify precision checks to account for 16-mixed and bf16-mixed Signed-off-by: Abhishree <abhishreetm@gmail.com> * Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel Signed-off-by: Abhishree <abhishreetm@gmail.com> * Modify find_unused_parameters=True in g2p_heteronym model 1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py 2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add validation/test outputs in 
sgdqa_model and modify dialogue_config.yaml Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add split arg self.test_step_outputs to TextClassificationModel Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add test_step_outputs to dialogue and text classification models Signed-off-by: Abhishree <abhishreetm@gmail.com> * Change condition check for multiple dataloaders: 1) Replace ds_item as list in dialogue_config.yaml 2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step 3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_cpitalization_model.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add additional condition for multi dataloaders Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add val step outputs and default val for dataloader_idx 1) Append validation_step outout to self.validation_step_outputs in MultiLabelIntentSlotClassificationMode 2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback 3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add val/test_step_outputs to S2SQAModel and GPTQAModel Signed-off-by: Abhishree <abhishreetm@gmail.com> * Edit JenkinsFile for bert_pretrainig.py Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error Signed-off-by: Abhishree <abhishreetm@gmail.com> * Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add ddp_find_unused_parameters_true and remove output args 1) Add ddp_find_unused_parameters_true fro trainer.strategy in self_alignment_pretraining.py as it has 
unused parameters 2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py 3) Comment tests in JenkinsFile that need to be fixed Signed-off-by: Abhishree <abhishreetm@gmail.com> * Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed Signed-off-by: Abhishree <abhishreetm@gmail.com> * Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Precision fix and validation/test_step_outputs 1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py 2) Reset ckpt_path for test in enc_dec_nmt.py 3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py 4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Paralleism and add back NMT Training Post-LN Signed-off-by: Abhishree <abhishreetm@gmail.com> * Precision fix and skip few failing tests Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add missing comment lines in JenkinsFile Signed-off-by: Abhishree <abhishreetm@gmail.com> * Comment jenkin tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Minor edit JenkinsFile Signed-off-by: Abhishree <abhishreetm@gmail.com> * Minor edit in jenkins file Signed-off-by: Abhishree <abhishreetm@gmail.com> * Edit in Jenkins file Signed-off-by: Abhishree <abhishreetm@gmail.com> * Comment missed lines in Jenkins file Signed-off-by: Abhishree <abhishreetm@gmail.com> * Fix precision and validation/test outputs 1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py 2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py 3) Add back resume_from_checkpoint in the megatron_t5_config.yaml 4) Comment out certain tests in Jenkins file Signed-off-by: Abhishree <abhishreetm@gmail.com> * 
Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Precision fix and edit precision typo in all files 1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py 2) Fix precision typo in all files Signed-off-by: Abhishree <abhishreetm@gmail.com> * Fix all CI TTS tests and comment few Jenkins tests Signed-off-by: Abhishree <abhishreetm@gmail.com> * Combine xx_epoch_end and on_xx_epoch_end Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add a missing comment in JenkinsFile Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add try except StopIteration in validation_step for models with dataloader_iter Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove pyyaml from requirements Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add try except for inference_step in megatron_finetune_model.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove limit_val_batches for mockGPTDataset test Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add new self.validation_step_outputs for MegatronGPTSFTModel Signed-off-by: Abhishree <abhishreetm@gmail.com> * Minor edit Jenkinsfile Signed-off-by: Abhishree <abhishreetm@gmail.com> * Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when datalaoders are not setup in ModelPT for example while restoring the model. 
Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove resume_from_checkpoint if trainer arg in conf yaml files Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove resume_from_checkpoint as trainer arg in GPT, T5 configs Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove resume_from_checkpoint in duplex_tn_config.yaml Signed-off-by: Abhishree <abhishreetm@gmail.com> * Fix typos, unused imports and refactor code to remove redundant funcs Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove commented code in megatron_nmt_model.py Signed-off-by: Abhishree <abhishreetm@gmail.com> * Fix overriden functions to match parent class functions Signed-off-by: Abhishree <abhishreetm@gmail.com> * Prefetch dataloader_iter to prevent hang for PP>1 Signed-off-by: Abhishree <abhishreetm@gmail.com> * Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1 Signed-off-by: Abhishree <abhishreetm@gmail.com> * Uncomment tests in JenkinsFile Signed-off-by: Abhishree <abhishreetm@gmail.com> * Add '16' to precision checks and other minor fixes Signed-off-by: Abhishree <abhishreetm@gmail.com> * Clear validation/test_step_outputs with dataloader_idx for multi dataloaders Signed-off-by: Abhishree <abhishreetm@gmail.com> * Minor edits Signed-off-by: Abhishree <abhishreetm@gmail.com> * Modify precision checks to avoid indexing Signed-off-by: Abhishree <abhishreetm@gmail.com> * Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs Signed-off-by: Abhishree <abhishreetm@gmail.com> * Reference checkpoint with trainer.ckpt_path Signed-off-by: Abhishree <abhishreetm@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add _prefetch to NLPModel and minor fixes Signed-off-by: Abhishree <abhishreetm@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add limit_val_batches in JenkinsFile for NMT 1) Add 
trainer.limit_val_batches in Megatron NMT Training TP=2 2) Remove unused import in ModelPT Signed-off-by: Abhishree <abhishreetm@gmail.com> --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * Include the scripts for preprocessing OAST and unit tests for chat sft datasets (#7112) * scripts for sft Signed-off-by: Yi Dong <yidong@nvidia.com> * fix style Signed-off-by: Yi Dong <yidong@nvidia.com> * adde special token only for huggingface model Signed-off-by: Yi Dong <yidong@nvidia.com> * change default name Signed-off-by: Yi Dong <yidong@nvidia.com> * print out error datapoint content Signed-off-by: Yi Dong <yidong@nvidia.com> * show error id Signed-off-by: Yi Dong <yidong@nvidia.com> * annotation script working Signed-off-by: Yi Dong <yidong@nvidia.com> * try to be compatible with huggingface tokenizer Signed-off-by: Yi Dong <yidong@nvidia.com> * added examples Signed-off-by: Yi Dong <yidong@nvidia.com> * added lang Signed-off-by: Yi Dong <yidong@nvidia.com> * added lang Signed-off-by: Yi Dong <yidong@nvidia.com> * text to value special case Signed-off-by: Yi Dong <yidong@nvidia.com> * configure the slider Signed-off-by: Yi Dong <yidong@nvidia.com> * annoatation handles lang Signed-off-by: Yi Dong <yidong@nvidia.com> * added the unit test for chat sft dataset Signed-off-by: Yi Dong <yidong@nvidia.com> * used the file in the test dir Signed-off-by: Yi Dong <yidong@nvidia.com> * fix json error Signed-off-by: Yi Dong <yidong@nvidia.com> * load local tokenizer Signed-off-by: Yi Dong <yidong@nvidia.com> * remove mask count check Signed-off-by: Yi Dong <yidong@nvidia.com> * added HF dataset backend Signed-off-by: Yi Dong <yidong@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Yi Dong <yidong@nvidia.com> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * add paths to labeler. (#7087) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> Signed-off-by: tbartley94 <tbartley@nvidia.com> Signed-off-by: Nikolay Karpov <karpnv@gmail.com> Signed-off-by: Yi Dong <yidong@nvidia.com> Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru> Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com> Signed-off-by: Evelina <ebakhturina@nvidia.com> Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Signed-off-by: arendu <adithya.r@gmail.com> Signed-off-by: sam1373 <samuelkriman@gmail.com> Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Signed-off-by: fayejf <fayejf07@gmail.com> Signed-off-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: Daniel Egert <degert@nvidia.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de> Signed-off-by: Tim Moon <tmoon@nvidia.com> Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com> Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Signed-off-by: Ryan <rlangman@nvidia.com> Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com> Signed-off-by: ericharper <complex451@gmail.com> Signed-off-by: Taejin Park <tango4j@gmail.com> Signed-off-by: Abhishree 
<abhishreetm@gmail.com> Co-authored-by: Kim Ngo <6362111+findkim@users.noreply.github.com> Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com> Co-authored-by: Nikolay Karpov <karpnv@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com> Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com> Co-authored-by: bene-ges <antonova_sasha@list.ru> Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com> Co-authored-by: Vladimir Bataev <vbataev@nvidia.com> Co-authored-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: Adi Renduchintala <adithyar…

Signed-off-by: zhehuaichen <139396994+zhehuaichen@users.noreply.github.com>
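
For readers skimming the "Upgrade to pytorch lightning 2.0 (#6433)" entries in the log above: several of them describe the same pattern, namely dropping the `outputs` argument from `on_validation_epoch_end`, appending each `validation_step` result to `self.validation_step_outputs`, keeping one list per dataloader when there are several, and clearing the lists at epoch end. The following is a minimal plain-Python sketch of that pattern, not the NeMo code. `ToyModule`, its `logged` dict, and the averaged "loss" are hypothetical stand-ins, and the class deliberately does not import or subclass anything from PyTorch Lightning.

```python
class ToyModule:
    """Hypothetical stand-in for a LightningModule (illustration only)."""

    def __init__(self, num_val_dataloaders=1):
        self.num_val_dataloaders = num_val_dataloaders
        # Assumption: one flat list for a single dataloader,
        # a list of lists when several dataloaders are used.
        if num_val_dataloaders > 1:
            self.validation_step_outputs = [[] for _ in range(num_val_dataloaders)]
        else:
            self.validation_step_outputs = []
        self.logged = {}  # stands in for a real logging call

    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        loss = sum(batch) / len(batch)  # placeholder for a real loss
        # Store the result on the instance instead of relying on an
        # `outputs` argument being passed to the epoch-end hook.
        if self.num_val_dataloaders > 1:
            self.validation_step_outputs[dataloader_idx].append(loss)
        else:
            self.validation_step_outputs.append(loss)
        return loss

    def on_validation_epoch_end(self):
        # No `outputs` parameter: read what validation_step stored.
        if self.num_val_dataloaders > 1:
            groups = self.validation_step_outputs
        else:
            groups = [self.validation_step_outputs]
        for idx, outs in enumerate(groups):
            if outs:
                self.logged[f"val_loss_dl{idx}"] = sum(outs) / len(outs)
            outs.clear()  # free memory before the next epoch


module = ToyModule()
module.validation_step([1.0, 3.0], batch_idx=0)
module.validation_step([2.0, 4.0], batch_idx=1)
module.on_validation_epoch_end()
print(module.logged)
```

Clearing in place (`outs.clear()`) keeps the same list objects alive across epochs, which is why the multi-dataloader case can index by `dataloader_idx` without re-creating the outer list.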

zhehuaichen and others added 2 commits September 22, 2023 16:24

Merge branch 'speechllm_selene_clean' into speechllm_selene_clean-merge

b554dc3

Signed-off-by: zhehuaichen <139396994+zhehuaichen@users.noreply.github.com>

zhehuaichen marked this pull request as ready for review September 22, 2023 20:42

zhehuaichen merged commit 206af78 into speechllm_selene_clean Sep 22, 2023
