forked from NVIDIA/NeMo
Commit
* backbone
* engineer and analyzer
* offline_by_chunked
* test_ds wip
* temp remove inference
* mandarin yaml
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* augmentor and a few updates
* address alerts and revert unnecessary changes
* Add readme
* rename
* typo fix
* small fix
* add missing header
* rename augmentor_config to augmentor
* rename inference_mode to inference
* move utils.py
* update temp file
* make wer cer clear
* seed_everything
* fix missing rn augmentor_config in rnnt
* fix rnnt transcribe
* add more docstring and style fix
* address codeQL
* reflect comments
* update readme
* clearer

Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Showing 12 changed files with 685 additions and 3 deletions.
@@ -0,0 +1,44 @@
ASR evaluator
--------------------

A tool for thoroughly evaluating the performance of ASR models and related features such as Voice Activity Detection.

Features:
- Evaluate a model in a single step in any of the three modes currently supported by NeMo: offline, chunked, and offline_by_chunked.
- On-the-fly data augmentation (silence, noise, etc.) for ASR robustness evaluation.
- Investigate the model's performance through detailed insertion, deletion, and substitution error rates, per sample and over all samples.
- Evaluate a model's reliability across target groups such as gender and audio length, if the metadata is present.
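
To make the insertion, deletion, and substitution breakdown concrete, here is a minimal sketch of how such counts can be derived from a word-level Levenshtein alignment. It is an illustration only: `word_error_counts` and `error_rates` are hypothetical names, not functions from this commit, and the evaluator's own implementation may align and tie-break differently.

```python
from typing import Dict, List


def word_error_counts(reference: str, hypothesis: str) -> Dict[str, int]:
    """Count word-level substitutions, deletions, and insertions (hypothetical helper)."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    # dp[i][j] = (total_edits, substitutions, deletions, insertions) for ref[:i] vs hyp[:j]
    dp = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # only deletions can turn ref[:i] into an empty hypothesis
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # only insertions can turn an empty reference into hyp[:j]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # matching words cost nothing
                continue
            t, s, d, n = dp[i - 1][j - 1]
            substitution = (t + 1, s + 1, d, n)
            t, s, d, n = dp[i - 1][j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = dp[i][j - 1]
            insertion = (t + 1, s, d, n + 1)
            # Tuples compare on total edits first, so the minimum keeps an optimal alignment.
            dp[i][j] = min(substitution, deletion, insertion)
    _, subs, dels, inss = dp[len(ref)][len(hyp)]
    return {"substitutions": subs, "deletions": dels, "insertions": inss, "ref_words": len(ref)}


def error_rates(counts: Dict[str, int]) -> Dict[str, float]:
    """Normalize each error type by the number of reference words."""
    n = max(counts["ref_words"], 1)  # avoid division by zero for empty references
    rates = {k: counts[k] / n for k in ("substitutions", "deletions", "insertions")}
    rates["wer"] = sum(rates.values())
    return rates
```

When several alignments share the same minimal edit count, the split between error types depends on tie-breaking, so only the total is guaranteed to be unique.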

ASR evaluator contains two main parts:
- **ENGINE**. Conducts ASR inference.
- **ANALYST**. Evaluates model performance based on predictions.

In Analyst, we can evaluate on metadata (such as duration, emotion, etc.) if it is present in the manifest. For example, with the following config, we can calculate WERs for audio in different duration groups, where each group (in seconds) is defined by [[0,2],[2,5],[5,10],[10,20],[20,100000]]. We can also calculate WERs for three groups of emotions, where each group is defined by [['happy','laugh'],['neutral'],['sad']]. Moreover, if we set save_wer_per_class=True, it will calculate WERs for audio in every class present in the data (i.e., the classes listed above plus 'cry', which is present in the data but not in the slot).

```
analyst:
  metadata:
    duration:
      enable: True
      slot: [[0,2],[2,5],[5,10],[10,20],[20,100000]]
      save_wer_per_class: False # whether to save wer for each presented class.
    emotion:
      enable: True
      slot: [['happy','laugh'],['neutral'],['sad']] # we could have 'cry' in data but not in slot we focus on.
      save_wer_per_class: False
```
Check `./conf/eval.yaml` for the supported configuration.
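
The slot grouping described above can be sketched as follows. This is a rough illustration under stated assumptions, not the evaluator's actual code: `slot_label` and `wer_per_group` are hypothetical helpers, numeric intervals are assumed to be half-open `[lo, hi)`, the per-sample `errors` and `ref_words` fields are assumed to be precomputed, and each group's WER is assumed to be total errors divided by total reference words.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def slot_label(value, slot: Sequence[Sequence]) -> str:
    """Return the label of the slot group containing value (hypothetical helper)."""
    for group in slot:
        if isinstance(value, (int, float)):
            lo, hi = group
            if lo <= value < hi:  # assumption: intervals are half-open [lo, hi)
                return f"[{lo}, {hi})"
        elif value in group:
            return "+".join(group)
    return "unassigned"  # e.g. 'cry' when it is in the data but not in any slot group


def wer_per_group(samples: List[Dict], target: str, slot: Sequence[Sequence]) -> Dict[str, float]:
    """Aggregate WER per slot group as total errors over total reference words."""
    errors: Dict[str, int] = defaultdict(int)
    words: Dict[str, int] = defaultdict(int)
    for sample in samples:
        label = slot_label(sample[target], slot)
        errors[label] += sample["errors"]
        words[label] += sample["ref_words"]
    return {label: errors[label] / words[label] for label in words if words[label] > 0}


# Made-up samples for illustration only.
samples = [
    {"duration": 1.5, "emotion": "happy", "errors": 1, "ref_words": 4},
    {"duration": 3.0, "emotion": "laugh", "errors": 0, "ref_words": 6},
    {"duration": 4.0, "emotion": "cry", "errors": 3, "ref_words": 6},
    {"duration": 12.0, "emotion": "sad", "errors": 2, "ref_words": 10},
]
duration_wer = wer_per_group(samples, "duration", [[0, 2], [2, 5], [5, 10], [10, 20], [20, 100000]])
emotion_wer = wer_per_group(samples, "emotion", [["happy", "laugh"], ["neutral"], ["sad"]])
```

Pooling errors and reference words per group weights longer utterances more heavily than averaging per-sample WERs would; which of the two the evaluator reports is not stated in this README.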

If you plan to evaluate or add new tasks such as Punctuation and Capitalization, add them to the engine.

Run
```
python asr_evaluator.py \
    engine.pretrained_name="stt_en_conformer_transducer_large" \
    engine.inference.mode="offline" \
    engine.test_ds.augmentor.noise.manifest_path=<manifest file for noise data>
```
@@ -0,0 +1,92 @@
# Copyright (c) 2022, NVIDIA CORPORATION.  All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json

import git
from omegaconf import OmegaConf
from utils import cal_target_metadata_wer, cal_write_wer, run_asr_inference

from nemo.core.config import hydra_runner
from nemo.utils import logging


"""
This script serves as an evaluator of ASR models
Usage:
    python asr_evaluator.py \
        engine.pretrained_name="stt_en_conformer_transducer_large" \
        engine.inference.mode="offline" \
        engine.test_ds.augmentor.noise.manifest_path=<manifest file for noise data> \
        .....

Check out parameters in ./conf/eval.yaml
"""


@hydra_runner(config_path="conf", config_name="eval.yaml")
def main(cfg):
    report = {}
    logging.info(f'Hydra config: {OmegaConf.to_yaml(cfg)}')

    # Store git hash for reproducibility
    if cfg.env.save_git_hash:
        repo = git.Repo(search_parent_directories=True)
        report['git_hash'] = repo.head.object.hexsha

    ## Engine
    # Could skip next line to use generated manifest

    # To change more parameters for ASR inference, change them in
    # 1) shell script in eval_utils.py in nemo/collections/asr/parts/utils or
    # 2) TranscriptionConfig on top of the executed scripts such as transcribe_speech.py in examples/asr
    cfg.engine = run_asr_inference(cfg=cfg.engine)

    ## Analyst
    cfg, total_res, eval_metric = cal_write_wer(cfg)
    report.update({"res": total_res})

    for target in cfg.analyst.metadata:
        if cfg.analyst.metadata[target].enable:
            occ_avg_wer = cal_target_metadata_wer(
                manifest=cfg.analyst.metric_calculator.output_filename,
                target=target,
                meta_cfg=cfg.analyst.metadata[target],
                eval_metric=eval_metric,
            )
            report[target] = occ_avg_wer

    config_engine = OmegaConf.to_object(cfg.engine)
    report.update(config_engine)

    config_metric_calculator = OmegaConf.to_object(cfg.analyst.metric_calculator)
    report.update(config_metric_calculator)

    pretty = json.dumps(report, indent=4)
    res = "%.3f" % (report["res"][eval_metric] * 100)
    logging.info(pretty)
    logging.info(f"Overall {eval_metric} is {res} %")

    ## Writer
    report_file = "report.json"
    if "report_filename" in cfg.writer and cfg.writer.report_filename:
        report_file = cfg.writer.report_filename

    with open(report_file, "a") as fout:
        json.dump(report, fout)
        fout.write('\n')
        fout.flush()


if __name__ == "__main__":
    main()
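
Because the writer opens the report file in append mode and ends each record with a newline, repeated runs accumulate as one JSON object per line rather than as a single JSON document. A minimal sketch for reading those records back, assuming only that layout; `load_reports` is a hypothetical helper and not part of this commit:

```python
import json
from typing import Dict, List


def load_reports(path: str) -> List[Dict]:
    """Read a report file that holds one JSON object per line (hypothetical helper)."""
    reports: List[Dict] = []
    with open(path, "r") as fin:
        for line in fin:
            line = line.strip()
            if line:  # skip blank lines between records
                reports.append(json.loads(line))
    return reports
```

Calling `json.load` on the whole file would fail once it holds more than one run, which is why the sketch parses line by line.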