WeKws Roadmap 2.0 #121

Open · 3 tasks
robin1001 opened this issue Dec 11, 2022 · 5 comments

robin1001 (Contributor) commented Dec 11, 2022

WeKws is a community-driven project and we love your feedback and proposals on where we should be heading.
Feel free to volunteer if you are interested in trying out some items (they do not have to be on the list).

The following items are in 2.0:

  • Robustness: improve robustness by learning the acoustic features of the keywords rather than other, incidental features.
  • Support for various on-device chips.
  • Exploration of unsupervised or pretrained models.
robin1001 pinned this issue Dec 11, 2022

harryfyodor commented Feb 1, 2023

Hi, Robin. I found that modelscope has open-sourced a model that borrows a lot of code from wekws. They use CTC as the loss function and it seems to work well. I think there are at least two directions worth trying:

  • Pretrain in the manner of ASR, then train the KWS model with limited data.
  • Develop a customizable wake-up word system.

Thank you for open-sourcing wekws anyway! It is wonderful!
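
For readers curious what the CTC direction looks like in practice, here is a minimal, hedged sketch in PyTorch. The toy encoder and all dimensions are illustrative assumptions, not the modelscope or wekws implementation:

```python
import torch
import torch.nn as nn

# Toy encoder for illustration only; real wekws/modelscope models differ.
class TinyEncoder(nn.Module):
    def __init__(self, feat_dim=80, hidden=128, num_tokens=10):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, num_tokens)  # token 0 is the CTC blank

    def forward(self, x):
        h, _ = self.rnn(x)
        return self.proj(h).log_softmax(dim=-1)

encoder = TinyEncoder()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

feats = torch.randn(4, 100, 80)             # (batch, frames, feat_dim)
log_probs = encoder(feats).transpose(0, 1)  # CTCLoss wants (frames, batch, tokens)
targets = torch.randint(1, 10, (4, 3))      # keyword token ids (e.g. phonemes)
input_lengths = torch.full((4,), 100, dtype=torch.long)
target_lengths = torch.full((4,), 3, dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```

At inference time, detection would then watch the token posteriors for the keyword's token sequence rather than a single binary score.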

duj12 (Contributor) commented Jun 3, 2023

> Hi, Robin. I found that modelscope has open-sourced a model that borrows a lot of code from wekws. They use CTC as the loss function and it seems to work well. I think there are at least two directions worth trying:
>
>   • Pretrain in the manner of ASR, then train the KWS model with limited data.
>   • Develop a customizable wake-up word system.
>
> Thank you for open-sourcing wekws anyway! It is wonderful!

Hi, I implemented this in PR #135; of course, I borrowed a lot of code from modelscope, too. I hope this will be a good solution, but there are still some things that need to be done, especially the runtime code.

StuartIanNaylor commented Jul 14, 2023

Hi, Robin. My personal feeling is that the KWS models we have are good, but the way we train them and the datasets we use are not so great.
KWS models are usually simple classifiers, and often the dataset is just a binary split into KW and !KW labels. During training, the accuracy curve is often a sign of overfitting, because the space between the KW class and the catch-all class of everything that is not the KW is huge.
Adding a 'noise' classification of non-spoken audio often increases accuracy, as extra classes give the model something more to train against.
Noise and !KW are non-KW catch-alls, but they are still distant from the chosen KW.
This is where we are likely short on datasets: phonetically similar words should form further classes that hug the KW more tightly, giving the model harder data to train on, a less steep learning curve, and less overfitting. A rough sketch of such a label scheme follows below.
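
As a hedged sketch of that label scheme (the class names and dimensions here are made up for illustration, not taken from wekws), the binary head simply becomes multi-class:

```python
import torch
import torch.nn as nn

# Hypothetical label scheme: instead of a binary KW/!KW split, add classes
# for noise and phonetically similar words so the model has more to train on.
LABELS = ["keyword", "similar_word", "other_speech", "noise"]

classifier = nn.Sequential(
    nn.Linear(40, 64), nn.ReLU(),
    nn.Linear(64, len(LABELS)),
)

feats = torch.randn(8, 40)                     # pooled features per clip
targets = torch.randint(0, len(LABELS), (8,))  # one label per clip
loss = nn.CrossEntropyLoss()(classifier(feats), targets)
loss.backward()
```

The extra classes act as stepping stones between the keyword and everything else, which is the tighter hug described above.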

The simple models we have are not the problem; what we are missing are datasets that have been linguistically analysed to create phonetic classes around a KW.
I have been trying this using Multilingual Spoken Words, which is just extracted from the Mozilla Common Voice project.
Sadly, Common Voice contains huge swathes of bad data, from wrong labels, bad recordings, poor alignment, and non-native speakers, and the metadata is extremely sparse.

If you had the datasets, and also datasets recorded on the device of use, you could make accurate, simple, and lite KWS, so it's a bit of a catch-22. But if you picked a model and a device and gave users an option to opt in, the dataset could be collected, as Big Data did, with accompanying quality metadata covering simple gender, age, and region.

You can also collect data locally: with on-device training, you can bias a larger pretrained model with a smaller on-device model trained on a locally collected dataset.

heibaidaolx123 commented

Hi, @robin1001 and @duj12, any plan for a ctc-kws runtime?

StuartIanNaylor commented Sep 28, 2023

I just noticed Mining Effective Negative Training Samples for Keyword Spotting (github, paper)

I have been wondering about a dataset creator and how to select !KW samples without class imbalance.
I have only had a brief look through the code, but is there a dataset creator, and also one that implements the above?
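
For context, here is a hedged sketch of the general hard-negative-mining idea; it is not the paper's exact method, and `model`, the class indexing, and `keep_ratio` are all assumptions for illustration:

```python
import torch

def mine_hard_negatives(model, neg_feats, keep_ratio=0.1):
    """Keep the non-keyword clips the model most confuses with the keyword."""
    with torch.no_grad():
        scores = model(neg_feats).softmax(dim=-1)[:, 0]  # assume index 0 = KW class
    k = max(1, int(keep_ratio * neg_feats.shape[0]))
    hard_idx = scores.topk(k).indices
    return neg_feats[hard_idx]

model = torch.nn.Linear(40, 2)   # stand-in scorer producing [KW, !KW] logits
neg_pool = torch.randn(100, 40)  # pool of candidate !KW clips
hard = mine_hard_negatives(model, neg_pool)
```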

Just to add, with my own experiments of using 'own voice', I can quickly make a KWS that is very accurate.
For !KW I used phonetic pangrams, which are sentences that contain all forty sounds of English, i.e. they use all the phonemes, or phones, of English (rather than alphabetic characters). Here are a few examples: "That quick beige fox jumped in the air over each thin dog. Look out, I shout, for he's foiled you again, creating chaos."
Being English, I assume there are similar phonetic pangrams for other languages?

https://github.com/StuartIanNaylor/Dataset-builder was just a rough hack to create a word-capture CLI tool to quickly capture 'own voice' KW and !KW, as forced alignment is so prone to error (plus it is my voice). These are augmented with Speex, with random noise added, to give 2k-4k items in each class.
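
The augmentation step can be sketched roughly as a plain additive-noise mix at a target SNR (the Speex-based pipeline mentioned above differs; this is only the general idea):

```python
import torch

def add_noise(speech, noise, snr_db):
    """Mix noise into speech at a target signal-to-noise ratio (dB)."""
    noise = noise[: speech.numel()]
    speech_power = speech.pow(2).mean()
    noise_power = noise.pow(2).mean().clamp_min(1e-10)
    scale = torch.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

clip = torch.randn(16000)   # 1 s of 16 kHz audio, stand-in for a capture
noise = torch.randn(16000)
augmented = add_noise(clip, noise, snr_db=10.0)
```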

It's really easy to make a very accurate 'own voice' KWS, but it is totally useless for anyone else.
This is where I think transfer learning could be really important: an initial model could be shipped, and via on-device training a smaller model built from captures of use could be created to bias the larger model (a rough sketch below).
Via transfer learning, over time and through use, the KWS would gain accuracy for those who commonly use it.
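
A minimal sketch of that on-device adaptation idea, assuming a frozen shipped backbone and a small trainable head; the backbone here is a stand-in, not an actual wekws model:

```python
import torch
import torch.nn as nn

# Freeze the shipped pretrained backbone; train only a small user head.
backbone = nn.Sequential(nn.Linear(80, 128), nn.ReLU(), nn.Linear(128, 128))
for p in backbone.parameters():
    p.requires_grad = False  # shipped weights stay fixed on device

head = nn.Linear(128, 2)  # KW vs !KW, trained on locally captured audio
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

feats = torch.randn(16, 80)          # features from local 'own voice' captures
labels = torch.randint(0, 2, (16,))
loss = nn.CrossEntropyLoss()(head(backbone(feats)), labels)
loss.backward()
optimizer.step()
```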

The small collection of a few phonetic pangrams surprised me with how accurate the results are. I have always had a hunch that, since phones have distinct spectra, larger datasets require the phones and their positions in the timeframe to be balanced, or at least balanced in uniqueness.
