[WIP] Pruned-transducer-stateless5-for-WenetSpeech (offline and streaming) (facebookresearch#447)

* pruned-rnnt5-for-wenetspeech
* style check
* style check
* add streaming conformer
* add streaming decode
* changes codes for fast_beam_search and export cpu jit
* add modified-beam-search for streaming decoding
* add modified-beam-search for streaming decoding
* change for streaming_beam_search.py
* add README.md and RESULTS.md
* change for style_check.yml
* do some changes
* do some changes for export.py
* add some decode commands for usage
* add streaming results on README.md
yfyeung · Jul 28, 2022 · f26b62a
1 parent 385645d

Showing 22 changed files with 5,311 additions and 9 deletions.
diff --git a/.github/workflows/style_check.yml b/.github/workflows/style_check.yml
@@ -29,7 +29,7 @@ jobs:
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
-        os: [ubuntu-18.04, macos-10.15]
+        os: [ubuntu-18.04, macos-latest]
         python-version: [3.7, 3.9]
       fail-fast: false
 

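GitHub Actions starts one job per combination of `matrix` values, so after this change the style check still runs four jobs (two operating systems times two Python versions), and `fail-fast: false` keeps the remaining jobs running if one of them fails. A simplified Python model of that expansion follows; it ignores `include`/`exclude` entries, and the helper name `expand_matrix` is hypothetical.

```python
from itertools import product

# Matrix axes from style_check.yml after this commit.
# Python versions are kept as strings for simplicity.
matrix = {
    "os": ["ubuntu-18.04", "macos-latest"],
    "python-version": ["3.7", "3.9"],
}


def expand_matrix(axes):
    """Return one dict per job: the Cartesian product of all axis values.

    Simplified model: GitHub's `include`/`exclude` keys are not handled.
    """
    keys = list(axes)
    return [dict(zip(keys, combo)) for combo in product(*(axes[k] for k in keys))]


jobs = expand_matrix(matrix)
for job in jobs:
    print(job)  # first line: {'os': 'ubuntu-18.04', 'python-version': '3.7'}
```

Replacing `macos-10.15` with `macos-latest` changes which runner image two of the four jobs use, not how many jobs run.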
diff --git a/README.md b/README.md
@@ -250,17 +250,25 @@ We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless mod
 
 ### WenetSpeech
 
-We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][WenetSpeech_pruned_transducer_stateless2].
+We provide two models for this recipe: [Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][WenetSpeech_pruned_transducer_stateless2] and [Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][WenetSpeech_pruned_transducer_stateless5].
 
-#### Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset)
+#### Pruned stateless RNN-T_2: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset, offline ASR)
 
 |                      |  Dev  | Test-Net | Test-Meeting |
 |----------------------|-------|----------|--------------|
 |    greedy search     | 7.80  |  8.75    |  13.49       |
 |   fast beam search   | 7.94  |  8.74    |  13.80       |
 | modified beam search | 7.76  |  8.71    |  13.41       |
 
-We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1EV4e1CHa1GZgEF-bZgizqI9RyFFehIiN?usp=sharing)
+#### Pruned stateless RNN-T_5: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss (trained with L subset)
+**Streaming**:
+|                      |  Dev  | Test-Net | Test-Meeting |
+|----------------------|-------|----------|--------------|
+|    greedy_search     | 8.78  |  10.12   |  16.16       |
+| modified_beam_search | 8.53  |  9.95    |  15.81       |
+|   fast_beam_search   | 9.01  |  10.47   |  16.28       |
+
+We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless2 model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1EV4e1CHa1GZgEF-bZgizqI9RyFFehIiN?usp=sharing)
 
 ### Alimeeting
 
@@ -333,6 +341,7 @@ Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-bad
 [GigaSpeech_pruned_transducer_stateless2]: egs/gigaspeech/ASR/pruned_transducer_stateless2
 [Aidatatang_200zh_pruned_transducer_stateless2]: egs/aidatatang_200zh/ASR/pruned_transducer_stateless2
 [WenetSpeech_pruned_transducer_stateless2]: egs/wenetspeech/ASR/pruned_transducer_stateless2
+[WenetSpeech_pruned_transducer_stateless5]: egs/wenetspeech/ASR/pruned_transducer_stateless5
 [Alimeeting_pruned_transducer_stateless2]: egs/alimeeting/ASR/pruned_transducer_stateless2
 [Aishell4_pruned_transducer_stateless5]: egs/aishell4/ASR/pruned_transducer_stateless5
 [TAL_CSASR_pruned_transducer_stateless5]: egs/tal_csasr/ASR/pruned_transducer_stateless5

diff --git a/egs/wenetspeech/ASR/README.md b/egs/wenetspeech/ASR/README.md
@@ -13,6 +13,7 @@ The following table lists the differences among them.
 |                                       | Encoder             | Decoder            | Comment                     |
 |---------------------------------------|---------------------|--------------------|-----------------------------|
 | `pruned_transducer_stateless2`        | Conformer(modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss  |
+| `pruned_transducer_stateless5`        | Conformer(modified) | Embedding + Conv1d | Using k2 pruned RNN-T loss  |
 
 The decoder in `transducer_stateless` is modified from the paper
 [Rnn-Transducer with Stateless Prediction Network](https://ieeexplore.ieee.org/document/9054419/).

diff --git a/egs/wenetspeech/ASR/RESULTS.md b/egs/wenetspeech/ASR/RESULTS.md
@@ -1,12 +1,84 @@
 ## Results
 
+### WenetSpeech char-based training results (offline and streaming) (Pruned Transducer 5)
+
+#### 2022-07-22
+
+Using the code from this PR https://github.com/k2-fsa/icefall/pull/447.
+
+When training with the L subset, the CERs are
+
+**Offline**:
+|decoding-method| epoch | avg | use-averaged-model | DEV | TEST-NET | TEST-MEETING|
+|-- | -- | -- | -- | -- | -- | --|
+|greedy_search | 4 | 1 | True | 8.22 | 9.03 | 14.54|
+|modified_beam_search | 4 | 1 | True | **8.17** | **9.04** | **14.44**|
+|fast_beam_search | 4 | 1 | True | 8.29 | 9.00 | 14.93|
+
+The offline training command for reproducing is given below:
+```
+export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
+
+./pruned_transducer_stateless5/train.py \
+  --lang-dir data/lang_char \
+  --exp-dir pruned_transducer_stateless5/exp_L_offline \
+  --world-size 8 \
+  --num-epochs 15 \
+  --start-epoch 2 \
+  --max-duration 120 \
+  --valid-interval 3000 \
+  --model-warm-step 3000 \
+  --save-every-n 8000 \
+  --average-period 1000 \
+  --training-subset L
+```
+
+The tensorboard training log can be found at https://tensorboard.dev/experiment/SvnN2jfyTB2Hjqu22Z7ZoQ/#scalars .
+
+
+A pre-trained offline model and decoding logs can be found at <https://huggingface.co/luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless5_offline>
+
+**Streaming**:
+|decoding-method| epoch | avg | use-averaged-model | DEV | TEST-NET | TEST-MEETING|
+|--|--|--|--|--|--|--|
+| greedy_search | 7| 1| True | 8.78 | 10.12 | 16.16 |
+| modified_beam_search | 7| 1| True| **8.53**| **9.95** | **15.81** |
+| fast_beam_search | 7 | 1| True | 9.01 | 10.47 | 16.28 |
+
+The streaming training command for reproducing is given below:
+```
+export CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
+
+./pruned_transducer_stateless5/train.py \
+  --lang-dir data/lang_char \
+  --exp-dir pruned_transducer_stateless5/exp_L_streaming \
+  --world-size 8 \
+  --num-epochs 15 \
+  --start-epoch 1 \
+  --max-duration 140 \
+  --valid-interval 3000 \
+  --model-warm-step 3000 \
+  --save-every-n 8000 \
+  --average-period 1000 \
+  --training-subset L \
+  --dynamic-chunk-training True \
+  --causal-convolution True \
+  --short-chunk-size 25 \
+  --num-left-chunks 4
+```
+
+The tensorboard training log can be found at https://tensorboard.dev/experiment/E2NXPVflSOKWepzJ1a1uDQ/#scalars .
+
+
+A pre-trained streaming model and decoding logs can be found at <https://huggingface.co/luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless5_streaming>
+
 ### WenetSpeech char-based training results (Pruned Transducer 2)
 
 #### 2022-05-19
 
 Using the code from this PR https://github.com/k2-fsa/icefall/pull/349.
 
-When training with the L subset, the WERs are
+When training with the L subset, the CERs are
 
 |                                    |  dev  | test-net | test-meeting | comment                                  |
 |------------------------------------|-------|----------|--------------|------------------------------------------|
@@ -72,7 +144,7 @@ avg=2
         --max-states 8
 ```
 
-When training with the M subset, the WERs are
+When training with the M subset, the CERs are
 
 |                                    |   dev  | test-net  | test-meeting  | comment                                   |
 |------------------------------------|--------|-----------|---------------|-------------------------------------------|
@@ -81,7 +153,7 @@ When training with the M subset, the WERs are
 | fast beam search (set as default)  | 10.18  | 11.10     | 19.32         | --epoch 29, --avg 11, --max-duration 1500 |
 
 
-When training with the S subset, the WERs are
+When training with the S subset, the CERs are
 
 |                                    |  dev   | test-net  | test-meeting  | comment                                   |
 |------------------------------------|--------|-----------|---------------|-------------------------------------------|

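The RESULTS.md diff relabels every table from WERs to CERs: this recipe is char-based (`--lang-dir data/lang_char`), so errors are counted per Chinese character rather than per word. Below is a minimal sketch of corpus-level CER (total edit-distance errors divided by total reference characters), assuming references and hypotheses are plain strings with no extra text normalization; icefall's own scoring code may normalize text differently, and the function names here are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (sub + del + ins)."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Corpus-level CER in percent: total character errors / total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


# One substitution (气 -> 汽) and one deletion (很) over 6 reference characters.
score = cer(["今天天气很好"], ["今天天汽好"])
print(f"{score:.2f}")  # 33.33
```

Summing errors and reference lengths over the whole test set, rather than averaging per-utterance rates, keeps short utterances from dominating the score.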
diff --git a/egs/wenetspeech/ASR/pruned_transducer_stateless2/train.py b/egs/wenetspeech/ASR/pruned_transducer_stateless2/train.py
@@ -348,7 +348,6 @@ def get_params() -> AttributeDict:
                            epochs.
         - log_interval:  Print training loss if batch_idx % log_interval is 0
         - reset_interval: Reset statistics if batch_idx % reset_interval is 0
-        - valid_interval:  Run validation if batch_idx % valid_interval is 0
         - feature_dim: The model input dim. It has to match the one used
                        in computing features.
         - subsampling_factor:  The subsampling factor for the model.
@@ -376,7 +375,6 @@ def get_params() -> AttributeDict:
             "decoder_dim": 512,
             # parameters for joiner
             "joiner_dim": 512,
-            # parameters for Noam
             "env_info": get_env_info(),
         }
     )

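The streaming training command in the RESULTS.md diff passes `--dynamic-chunk-training True`, `--short-chunk-size 25` and `--num-left-chunks 4`. As we understand the WeNet-style dynamic chunk training these flags appear to follow, self-attention is restricted by a chunk mask: each encoder frame sees its own chunk and a limited number of earlier chunks, never future ones, while the chunk size is re-sampled during training (up to `--short-chunk-size`) so that one model works with different amounts of context. The sketch below builds such a mask in plain Python under that assumption; it is not the mask code in `pruned_transducer_stateless5`, and the helper name `chunk_attention_mask` is hypothetical.

```python
def chunk_attention_mask(num_frames, chunk_size, num_left_chunks):
    """Return mask[i][j] == True if encoder frame i may attend to frame j.

    Frame i sees every frame of its own chunk plus up to `num_left_chunks`
    earlier chunks, and nothing from later chunks. A negative
    `num_left_chunks` means unlimited left context.
    """
    mask = []
    for i in range(num_frames):
        chunk = i // chunk_size
        if num_left_chunks < 0:
            start = 0
        else:
            start = max(0, (chunk - num_left_chunks) * chunk_size)
        end = min(num_frames, (chunk + 1) * chunk_size)
        mask.append([start <= j < end for j in range(num_frames)])
    return mask


# Toy sizes: 6 frames, chunks of 2, one chunk of left context.
mask = chunk_attention_mask(num_frames=6, chunk_size=2, num_left_chunks=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

The toy call prints a staircase (`xx....`, `xx....`, `xxxx..`, `xxxx..`, `..xxxx`, `..xxxx`). If a batch used a chunk size of 25 with `--num-left-chunks 4`, a frame would see at most its own chunk plus 4 earlier ones, i.e. up to 125 encoder frames, which bounds the left context a streaming decoder has to cache.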
diff --git a/egs/wenetspeech/ASR/pruned_transducer_stateless5/__init__.py b/egs/wenetspeech/ASR/pruned_transducer_stateless5/__init__.py
diff --git a/egs/wenetspeech/ASR/pruned_transducer_stateless5/asr_datamodule.py b/egs/wenetspeech/ASR/pruned_transducer_stateless5/asr_datamodule.py
@@ -0,0 +1 @@
+../pruned_transducer_stateless2/asr_datamodule.py
diff --git a/egs/wenetspeech/ASR/pruned_transducer_stateless5/beam_search.py b/egs/wenetspeech/ASR/pruned_transducer_stateless5/beam_search.py
@@ -0,0 +1 @@
+../../../librispeech/ASR/pruned_transducer_stateless5/beam_search.py
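`beam_search.py` is symlinked from the LibriSpeech recipe and presumably provides the `greedy_search`, `modified_beam_search` and `fast_beam_search` methods listed in the result tables. To illustrate the simplest of them, here is a toy sketch of transducer greedy search with a stateless decoder that conditions only on the last `context_size` tokens. The `decoder`/`joiner` callables, the default `context_size=2` and the `max_sym_per_frame` limit are assumptions for illustration; the real code works on batched tensors and differs in detail.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def greedy_search(
    encoder_out: Sequence[Vector],
    decoder: Callable[[List[int]], Vector],
    joiner: Callable[[Vector, Vector], Vector],
    blank_id: int = 0,
    context_size: int = 2,
    max_sym_per_frame: int = 1,
) -> List[int]:
    """Toy transducer greedy search for one utterance.

    `decoder` maps the last `context_size` tokens to a decoder output and
    `joiner` maps (encoder frame, decoder output) to logits over the vocabulary.
    """
    # Pad with blanks so the stateless decoder always gets `context_size` tokens.
    hyp = [blank_id] * context_size
    for frame in encoder_out:
        # Emit at most `max_sym_per_frame` non-blank tokens per encoder frame.
        for _ in range(max_sym_per_frame):
            logits = joiner(frame, decoder(hyp[-context_size:]))
            token = max(range(len(logits)), key=logits.__getitem__)
            if token == blank_id:
                break  # blank: move on to the next frame
            hyp.append(token)
    return hyp[context_size:]


# Toy example: vocabulary {0: blank, 1, 2}; the decoder contributes nothing
# and the joiner simply adds its two inputs.
frames = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
result = greedy_search(
    frames,
    decoder=lambda context: [0.0, 0.0, 0.0],
    joiner=lambda enc, dec: [e + d for e, d in zip(enc, dec)],
)
print(result)  # [1, 2]
```

Greedy search keeps a single hypothesis and commits to the arg-max at every step; the beam-search variants in the tables keep several hypotheses per frame instead, which is why `modified_beam_search` scores slightly better in most columns.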