some question of pad_with_border #8

Nickkk1124 · 2018-04-24T06:56:59Z

Hello:
I'm really impressed by your work and have a few questions about how you process the data.

Does pad_with_border mean this?

Many thanks,
Nick

Nickkk1124 · 2018-04-24T08:57:29Z

Sorry, one more question: I would like to use this speech enhancement system as a front end for ASR. How do I do this?

Many thanks,
Nick

qiuqiangkong · 2018-04-24T10:57:24Z

Hi Nick,

The picture you show is correct. pad_with_border simply extends the left and right borders. You can obtain enhanced speech by running this code, and then apply ASR to the output afterwards.

Best wishes,
Qiuqiang
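As a rough illustration of the border padding described above, here is a minimal sketch. It assumes the spectrogram is a NumPy array of shape (n_frames, n_freq) and that padding repeats the first and last frames n_pad times. The name `pad_with_border_sketch` is hypothetical, and the repository's own function may be implemented differently.

```python
import numpy as np


def pad_with_border_sketch(x, n_pad):
    """Repeat the first and last frames n_pad times (illustrative only).

    x is assumed to have shape (n_frames, n_freq).
    """
    # mode="edge" copies the border rows; the frequency axis is not padded.
    return np.pad(x, ((n_pad, n_pad), (0, 0)), mode="edge")


spec = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])
padded = pad_with_border_sketch(spec, n_pad=2)
print(padded.shape)  # (7, 2)
```

With padding of this kind, every original frame can sit at the center of a stack of 2 * n_pad + 1 frames, including the very first and last ones.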


Nickkk1124 · 2018-04-24T13:03:31Z

Hello Qiuqiang,

mat_2d_to_3d converts the features to shape (n_segs, n_concat, n_freq).

The center frame of the first stack of frames is t=1, so shouldn't the center frame of the second stack be t=2?

But as the following figure shows, the center frame of the second stack is t=4. Why is that?

Many thanks,

Nick

yongxuUSTC · 2018-04-24T14:35:50Z

Hi Nick,

Yes, you can use the enhanced features for ASR. However, you should probably retrain your back-end acoustic model on the enhanced features, or train it jointly with the enhancement model.

Good luck.

Best regards,
yong


Nickkk1124 · 2018-04-24T16:18:57Z

Hi Yong,

Thank you for your reply!
I have a few more questions:

1. By "enhanced features for ASR", do you mean the magnitudes of the log power spectrogram?
2. Do you think using the recovered enhanced wav as ASR input is feasible?
3. What would you recommend for applying the enhancement system to environmental noise?

Many thanks,
Nick

qiuqiangkong · 2018-04-25T21:15:09Z

Hi Nick,

Your drawing is correct: the center frames are 1 and 4. The spacing between center frames depends on the hop.

Regarding "enhanced features for ASR": it means either the enhanced spectrogram or the enhanced log power spectrogram.

Regarding using the recovered enhanced wav as ASR input: it is feasible if the dataset is small. However, bear in mind that any speech denoising method will lose some information. Some work did joint enhancement and recognition.

Regarding environmental noise: I think applying the system to environmental noise should be fine, as long as the noise used for training covers most of the environmental noise you expect.

Best wishes,
Qiuqiang
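To make the point about the hop concrete, here is a minimal sketch of stacking frames into segments. It assumes n_concat=3 and 0-based frame indices, which is only a guess that happens to reproduce centers 1 and 4 when hop=3. The name `mat_2d_to_3d_sketch` is hypothetical and the repository's function may differ in details such as how short inputs are handled.

```python
import numpy as np


def mat_2d_to_3d_sketch(x, n_concat, hop):
    """Cut (n_frames, n_freq) features into (n_segs, n_concat, n_freq).

    Illustrative only; assumes len(x) >= n_concat.
    """
    # A new segment starts every `hop` frames.
    starts = range(0, len(x) - n_concat + 1, hop)
    return np.stack([x[s:s + n_concat] for s in starts])


# Each row holds its own frame index so the center frames are easy to read.
frames = np.repeat(np.arange(10)[:, None], 4, axis=1)

for hop in (1, 3):
    segs = mat_2d_to_3d_sketch(frames, n_concat=3, hop=hop)
    centers = segs[:, 3 // 2, 0]
    print(hop, centers.tolist())
# 1 [1, 2, 3, 4, 5, 6, 7, 8]
# 3 [1, 4, 7]
```

Under these assumptions, a hop of 1 moves the center frame by one each time, while a larger hop skips frames between consecutive segments.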


akshayaCap · 2018-07-05T11:12:45Z

Hello Qiuqiang,

This is great work. It would be a great help if you could elaborate on the point below, which you mentioned in the discussion above.

"- method will lose some information. Some work did a joint enhancement and recognition."

I get the point about information loss. Can you please tell me more about joint enhancement and recognition?

Is it two DNN models linked together, or preprocessing followed by ASR?

Thank you.

qiuqiangkong · 2018-07-06T10:54:39Z

Hi Nick,

If speech enhancement and ASR are done separately, ASR performance might be reduced, because speech enhancement sometimes also removes useful information from the speech. However, combining them into a single neural network might help. For example, use speech enhancement as the lower layers of the network and ASR as the higher layers. The loss function can then combine the ASR and speech enhancement losses. This is just my conjecture, and I am not aware whether such work exists.

Best wishes,
Qiuqiang
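The combined loss mentioned above could look roughly like the sketch below. This only illustrates the conjecture; it is not taken from the repository or from any paper. The function name `joint_loss_sketch`, the weight `alpha`, and the choice of mean squared error plus cross-entropy are all assumptions made for the example.

```python
import numpy as np


def joint_loss_sketch(enhanced, clean, logits, labels, alpha=0.5):
    """Weighted sum of an enhancement loss and a recognition loss.

    enhanced, clean: (n_frames, n_freq) features (lower, enhancement part).
    logits: (n_frames, n_classes) scores from the upper, ASR part.
    labels: (n_frames,) integer class targets.
    alpha: hypothetical weight between the two terms.
    """
    # Enhancement term: mean squared error against the clean features.
    mse = np.mean((enhanced - clean) ** 2)

    # Recognition term: cross-entropy via a numerically stable log-softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()

    return alpha * mse + (1.0 - alpha) * ce


rng = np.random.default_rng(0)
clean = rng.normal(size=(5, 8))
enhanced = clean + 0.1 * rng.normal(size=(5, 8))
logits = rng.normal(size=(5, 4))
labels = np.array([0, 1, 2, 3, 0])
print(joint_loss_sketch(enhanced, clean, logits, labels))
```

In a real system both terms would be backpropagated through the shared network, so the enhancement layers are pushed to keep information the recognizer needs.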


yongxuUSTC · 2018-07-06T17:16:28Z

Hi Nick,

Yes, there are joint SE & ASR training papers:
https://www.isca-speech.org/archive/interspeech_2014/i14_0616.html
https://ieeexplore.ieee.org/abstract/document/7178797/

Best regards,
yong


akshayaCap · 2018-07-10T09:26:25Z

Dear Yong,
"
Yes, there are joint SE & ASR training papers:
https://www.isca-speech.org/archive/interspeech_2014/i14_0616.html
https://ieeexplore.ieee.org/abstract/document/7178797/
"
It was an informative read. It would be great if you could post a link to an implementation (source code).

Thank you,
Akshaya
