some question of pad_with_border #8

Open
Nickkk1124 opened this issue Apr 24, 2018 · 10 comments

Comments

@Nickkk1124

Hello,
I'm really impressed by your work and have a few questions about how you process the data.

[image attachment]

Does pad_with_border mean this?

Many thanks,
Nick
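
One common reading of padding a spectrogram "with border" is to repeat the first and last frames, so that every original frame can sit at the center of a context window. The following is a minimal sketch of that idea, assuming features of shape (n_frames, n_freq). The name `pad_border_sketch` and its parameters are hypothetical and are not taken from the repository.

```python
import numpy as np

def pad_border_sketch(x, n_pad):
    """Repeat the first and last frame n_pad times each (hypothetical helper).

    x: array of shape (n_frames, n_freq).
    Returns an array of shape (n_frames + 2 * n_pad, n_freq).
    """
    head = np.repeat(x[:1], n_pad, axis=0)   # n_pad copies of the first frame
    tail = np.repeat(x[-1:], n_pad, axis=0)  # n_pad copies of the last frame
    return np.concatenate([head, x, tail], axis=0)

# Toy input: 4 frames with 2 frequency bins each (assumed sizes).
x = np.arange(8, dtype=float).reshape(4, 2)
padded = pad_border_sketch(x, n_pad=2)
print(padded.shape)
```

With n_pad set to half the context width, the first context window is then centered on the first real frame instead of starting at it.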

@Nickkk1124
Author

Nickkk1124 commented Apr 24, 2018

Sorry, one more question:
If I want to use this speech enhancement system as a front end for ASR, how should I do this?

Many thanks,
Nick

@qiuqiangkong
Collaborator

qiuqiangkong commented Apr 24, 2018 via email

@Nickkk1124
Author

Nickkk1124 commented Apr 24, 2018

Hello Qiuqiang,

Mat_2d_to_3d converts the features to shape (n_segs, n_concat, n_freq).

If the center frame of the first stacked segment is t=1, shouldn't the center frame of the second stacked segment be t=2?

As the following figure shows, however, the center frame of the second stacked segment is t=4. Why is that?

[image attachment]

Many thanks,

Nick
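
One possible explanation for a center frame that jumps from t=1 to t=4 is a segmentation hop larger than one frame. The sketch below stacks frames into shape (n_segs, n_concat, n_freq) with a configurable hop. The function name `segment_frames_sketch` and the values n_concat=3 and hop=3 are assumptions chosen for illustration, not settings confirmed in this thread.

```python
import numpy as np

def segment_frames_sketch(x, n_concat, hop):
    """Stack consecutive frames into overlapping or disjoint segments.

    x: array of shape (n_frames, n_freq); assumes n_frames >= n_concat.
    Returns an array of shape (n_segs, n_concat, n_freq).
    """
    starts = range(0, len(x) - n_concat + 1, hop)
    return np.stack([x[s:s + n_concat] for s in starts])

# Each frame holds its own 0-based index, so the centers are easy to read off.
x = np.arange(9, dtype=float).reshape(9, 1)

segs_hop3 = segment_frames_sketch(x, n_concat=3, hop=3)
centers_hop3 = segs_hop3[:, 3 // 2, 0]   # center frame index of each segment

segs_hop1 = segment_frames_sketch(x, n_concat=3, hop=1)
centers_hop1 = segs_hop1[:, 3 // 2, 0]

print(centers_hop3)
print(centers_hop1)
```

In this toy setup a hop of 3 gives centers 1, 4, 7, while a hop of 1 gives consecutive centers 1, 2, 3 and so on.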

@yongxuUSTC
Owner

yongxuUSTC commented Apr 24, 2018 via email

@Nickkk1124
Author

Hi Yong,

Thank you for your reply!
I have a few more questions:

  1. By the "enhanced features for ASR" you mentioned, do you mean the log power spectrogram?

  2. Do you think it is feasible to use the recovered enhanced wav as the ASR input?

  3. What would you recommend for applying the enhancement system to environmental noise?

Many thanks,
Nick
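
For concreteness on question 1, a log power spectrogram can be computed from a waveform as sketched below. This is only an illustration under assumed settings: the function name `log_power_spectrogram_sketch`, the Hamming window, the 512-sample window, the 256-sample hop and the 16 kHz test tone are not the project's confirmed configuration.

```python
import numpy as np

def log_power_spectrogram_sketch(wav, n_window=512, n_hop=256, eps=1e-10):
    """Return log(|STFT|^2 + eps) with shape (n_frames, n_window // 2 + 1).

    Assumes len(wav) >= n_window; trailing samples that do not fill a
    whole frame are dropped.
    """
    window = np.hamming(n_window)
    n_frames = 1 + (len(wav) - n_window) // n_hop
    frames = np.stack([
        wav[i * n_hop:i * n_hop + n_window] * window
        for i in range(n_frames)
    ])
    spec = np.fft.rfft(frames, axis=1)       # one-sided spectrum per frame
    return np.log(np.abs(spec) ** 2 + eps)   # eps avoids log(0)

# One second of a 440 Hz tone at an assumed 16 kHz sample rate.
sample_rate = 16000
wav = np.sin(2 * np.pi * 440 * np.arange(sample_rate) / sample_rate)
feat = log_power_spectrogram_sketch(wav)
print(feat.shape)
```

Features of this kind could be passed to a recognizer directly, whereas question 2 corresponds to inverting the enhanced spectrogram back to a waveform first.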

@qiuqiangkong
Collaborator

qiuqiangkong commented Apr 25, 2018 via email

@akshayaCap

Hello Qiuqiang,

This is great work. It would be a great help if you could elaborate on the point below, which you mentioned in the discussion above.

"- method will lose some information. Some work did a joint enhancement and recognition."

I understand the point about information loss. Could you say more about joint enhancement and recognition?

Is it two interlinked DNN models, or enhancement as a preprocessing step followed by ASR?

Thank you.

@qiuqiangkong
Collaborator

qiuqiangkong commented Jul 6, 2018 via email

@yongxuUSTC
Owner

yongxuUSTC commented Jul 6, 2018 via email

@akshayaCap

Dear Yong,

"Yes, there are joint SE & ASR training papers:
https://www.isca-speech.org/archive/interspeech_2014/i14_0616.html
https://ieeexplore.ieee.org/abstract/document/7178797/"

That was an informative read. It would be great if you could post a link to an implementation (source code).

Thank you,
Akshaya
