[librispeech] Not converging? Is it possible to release model configs or trained models? #22
Sorry for the late reply. As we stated in the paper, the training process can be a bit unstable in the initial stage, and increasing the weight of the CTC loss (lamb in train.py) usually helps convergence. In my experience, a CTC weight of 0.3 should ensure convergence for librispeech even if the batch size is small (e.g. 64 or 32), though the result may be a little worse. You can adjust the CTC weight manually after the first few training steps for better results, and I am planning to design an automatic way to gradually reduce the CTC weight as training progresses.
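A minimal sketch of what manually decaying the CTC weight might look like; the ctc_weight helper, the schedule values, and the step boundary are illustrative assumptions, not the toolkit's defaults:

```python
# Sketch: linearly decay the CTC weight (lamb) from a large value that
# stabilizes early training down to a small value for later training.
def ctc_weight(step, initial_lamb=0.3, final_lamb=0.01, decay_steps=10000):
    """Return the CTC weight to use at a given global training step."""
    if step >= decay_steps:
        return final_lamb
    frac = step / decay_steps
    return initial_lamb + frac * (final_lamb - initial_lamb)

# Inside the training loop, the combined objective would then be something like:
# loss = crf_loss + ctc_weight(global_step) * ctc_loss
```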
Thank you!
train loss is about 20, cv loss:
params: lr: 0.00001, lamb: 0.01
Hi @FzuGsr, we have updated the librispeech script, removing the pruning of trigram contexts for the denominator graph (line 85), and the training has become more stable. You can try the following configuration to get a desirable result:
Thank you for your reply! I will try later.
Yes, the denominator graph will affect the loss calculation.
Hi @aky15,
The loss is negative and keeps getting smaller and smaller. Looking forward to your reply.
CTC_CRF_LOSS input:
Is that right? I found that gpu_ctc has a blank_label parameter while gpu_den doesn't. The blank_index parameter of ctc_crf_base.gpu_ctc defaults to 0, but the blank index in the token file is 1.
Is this a bug?
In our default models (e.g. BLSTM, LSTM), the length of the neural network output equals the number of input frames. If the lengths are changed in your model (e.g. you use down/up-sampling or other techniques that change the feature lengths; in your example, the length of the neural network output is 131, while the original input has at most 123 frames), you should pass the corresponding lengths of the neural network output to the CTC_CRF loss.
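A minimal sketch of the idea, with an assumed stride-2 down-sampling and ceil-division length rule; use whatever lengths your model actually produces:

```python
import torch

subsample_factor = 2                       # e.g. a stride-2 layer in the encoder (assumption)
input_lengths = torch.tensor([123, 98])    # original feature lengths per utterance

# Length of the network output after down-sampling (ceil division is an
# assumption here; it must match your model's actual behaviour).
output_lengths = (input_lengths + subsample_factor - 1) // subsample_factor

# Pass output_lengths, not input_lengths, to the CTC-CRF loss, e.g.:
# loss = CTC_CRF_LOSS(netout, labels, output_lengths, label_lengths, lamb)
```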
About the blank index, please refer to #11.
Thank you for your reply. |
Yes, I use down-sampling and down-sample the input_lengths accordingly, but the loss becomes -inf.
Is this the right input for the CTC_CRF loss? ctc_crf_base.gpu_den returns -inf.
Looking forward to your reply.
For example, if the labels are "A A B B", the input features should have at least 6 frames: CTC needs one frame per label, plus a blank frame between each pair of identical consecutive labels.
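A small sketch of this minimum-frame rule (the helper name is illustrative):

```python
def min_frames_for_ctc(labels):
    """Minimum number of input frames CTC needs for a label sequence:
    one frame per label plus one blank frame per repeated consecutive label."""
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats

print(min_frames_for_ctc(["A", "A", "B", "B"]))  # 6
```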
Thank you very much for your reply! Looking forward to your reply.
The gpu_ctc is modified from Baidu's warp-ctc (https://github.com/baidu-research/warp-ctc). We changed warp-ctc's input from logits (without log_softmax) to log_softmax. torch.nn.functional.ctc_loss is PyTorch's implementation of CTC; it may use cuDNN's CTC implementation internally (and its input is also log_softmax). In any case, warp-ctc and PyTorch's CTC are different implementations of the same loss, so they should give similar results. Both support computation on CUDA. I have not compared their speed, and I think the CTC computation accounts for only a small proportion of the whole computation (including the neural network computation).
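For reference, a minimal sketch of calling PyTorch's built-in CTC loss on log-softmax outputs; the tensor sizes and the blank index of 0 are assumptions for illustration only:

```python
import torch
import torch.nn.functional as F

T, N, C = 50, 4, 72                        # frames, batch size, output units (illustrative)
logits = torch.randn(T, N, C)
log_probs = F.log_softmax(logits, dim=-1)  # PyTorch's CTC expects log-probabilities

targets = torch.randint(1, C, (N, 10), dtype=torch.long)   # blank assumed to be index 0 here
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0)
```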
Thank you very much for your reply! You are so excellent. |
Hi, I ran the libr/run.sh demo, but the loss is still very large and the model can't converge. Can you help me? Is it possible to release model configs or trained models?
My environment: PyTorch 1.5, CUDA 10.1, Python 3.7
Run:
python3 steps/train.py --lr=0.001 --output_unit=72 --lamb=0.001 --data_path=$dir --batch_size=256
Loss:
Looking forward to your reply. Thank you.