Sound source localization along with speech recognition #1

srinivasanviki · 2017-09-19T12:49:59Z

As part of our project, we are trying to implement sound localization along with speech recognition in ROS using an Xbox Kinect. We have run into a problem: we need to find the position of the person who spoke (handled by the sound localization module) along with what that person said (speech recognition).

Could you please advise us on how to implement this? We have not been able to figure out whether HARK can provide localization data together with speech recognition (who said what, from which direction) in a single pass.

awesomebytes · 2017-09-19T13:32:18Z

I haven't tried the speech recognition code of HARK, so I can't really tell you. If you use some other speech recognition engine you should be able to guess who spoke with a bit of code to keep track of the last sound localization results. Good luck!

srinivasanviki · 2017-09-19T14:46:40Z

But your localization result is a constant stream. How do we keep track of the last localization results on a ROS topic? Please suggest.

awesomebytes · 2017-09-19T14:58:01Z

Keep a buffer of the last few seconds of localization results. When you get a speech recognition result, estimate the duration of the speech from the recognized text, account for the delay of getting the result, and average the loudest localized sounds in that timeframe.

For example, keep all the localization messages from the last 10 s. Estimate how long it takes from when you speak to when the recognition engine gives a result, for example 100 ms. When you get a callback from the recognition engine, say it recognized "hello world", count the syllables: 3 in this case. This paper https://www.google.com.au/url?sa=t&source=web&rct=j&url=http://www.asel.udel.edu/icslp/cdrom/vol4/301/a301.pdf&ved=0ahUKEwiyiezqv7HWAhWLQpQKHVHNBZkQFggdMAA&usg=AFQjCNHjHYikZy-oDmBYdZ04jNqRwjAYVg says the average duration of a syllable is about 150 ms, so you get 3 * 150 = 450 ms. Now go to your buffer, step back 100 ms from the end, and take all the messages from there back to -550 ms. Average the localization, probably using only the messages with a louder volume.

That's how I would try it, from a kinda hacky perspective. Other than that, learn how to use the HARK speech recognizer too.
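
The buffering idea above could be sketched roughly as follows. This is only an illustration in plain Python, not HARK or ROS code: `Localization`, `LocalizationBuffer`, and `estimate_azimuth` are hypothetical names, and the fields `stamp`, `azimuth_deg`, and `power` are assumptions about what you would copy out of each localization message in your subscriber callback. The 10 s horizon, 100 ms latency, and 150 ms per syllable are the example numbers from the comment. The sketch uses a circular mean rather than a plain average so that angles near ±180° do not cancel out, which is a choice made here and not part of the original suggestion.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass
class Localization:
    """One localization sample (hypothetical fields, filled in by your callback)."""
    stamp: float        # time of the sample, in seconds
    azimuth_deg: float  # estimated direction, in degrees
    power: float        # loudness reported for this source


class LocalizationBuffer:
    """Keeps the last `horizon_s` seconds of localization samples."""

    def __init__(self, horizon_s=10.0):
        self.horizon_s = horizon_s
        self._buf = deque()

    def add(self, loc):
        # Call this from the localization topic callback.
        self._buf.append(loc)
        # Drop samples that are older than the horizon.
        while self._buf and loc.stamp - self._buf[0].stamp > self.horizon_s:
            self._buf.popleft()

    def __len__(self):
        return len(self._buf)

    def estimate_azimuth(self, result_time, n_syllables,
                         latency_s=0.1, syllable_s=0.15, keep_loudest=0.5):
        """Average the loudest samples in the window where speech likely happened.

        The window ends `latency_s` before the recognition result arrived and
        lasts `n_syllables * syllable_s` seconds. Returns None if it is empty.
        """
        end = result_time - latency_s
        start = end - n_syllables * syllable_s
        window = [m for m in self._buf if start <= m.stamp <= end]
        if not window:
            return None
        # Keep only the loudest fraction of the samples (at least one).
        window.sort(key=lambda m: m.power, reverse=True)
        kept = window[:max(1, int(len(window) * keep_loudest))]
        # Circular mean of the kept azimuths.
        s = sum(math.sin(math.radians(m.azimuth_deg)) for m in kept)
        c = sum(math.cos(math.radians(m.azimuth_deg)) for m in kept)
        return math.degrees(math.atan2(s, c))
```

In a ROS node you would call `add()` from the localization subscriber and `estimate_azimuth()` from the speech recognition callback, passing the time the result arrived and a syllable count for the recognized text (3 for "hello world"). The latency and syllable duration are guesses that would need tuning for your recognizer.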

srinivasanviki · 2017-09-19T16:59:02Z

Thanks for the suggestion. I have a problem with localization too: when I roslaunch pr2_kinect, I get a continuous stream of localization results on the HarkSource topic even when I am not speaking.
I am also getting incorrect results, such as azimuths of -4 and -9 degrees. The angle never becomes positive, even when I am on the positive x axis.
