
TFJS - How to create model for custom word(Speech commands model) #1717

Closed
ranjithrengaraj opened this issue Jul 3, 2019 · 11 comments
Labels: type:support (user support questions)

@ranjithrengaraj

To get help from the community, we encourage using Stack Overflow and the tensorflow.js tag.

TensorFlow.js version

Node version: v12.4.0

Browser version

Describe the problem or feature request

I used the audio model given in https://github.com/tensorflow/tfjs-models/tree/master/speech-commands.
I followed the documentation, trained the model with a custom word ("wakeup") alongside the existing dataset, and saved the model:

1. Create_model wakeup up down left right
2. loaded the dataset
3. train 100
4. save_model

model.json and weights.bin were generated, and I then imported the model in JS, but it is not detecting any word.

Please suggest how to train a custom word and how many training epochs are required.

Code to reproduce the bug / link to feature request

@tafsiri
Contributor

tafsiri commented Jul 3, 2019

Could you give a bit more detail and describe how you did the retraining? Did you use the transfer learning API or something different? A code snippet would be great for us to get a better sense of what may be going on. Also, how many samples/examples do you have for each of the words in your vocabulary?

@tafsiri added the type:support (user support questions) label on Jul 3, 2019
@ranjithrengaraj
Author

We downloaded the speech dataset from https://storage.cloud.google.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz, added one more folder called wakeup with 500 samples, and trained on the data. We followed this tutorial:
https://github.com/tensorflow/tfjs-models/tree/master/speech-commands/training

1. create up down left right wakeup
2. load_dataset all /tmp/data (loaded all 5 datasets)
3. train 500
4. save_model /tmp/audio_model

We got the model.json and weights.bin files, and updated metadata.json to {"words": ["up","down","left","right","wakeup"], "frameSize": 232}.

We then loaded model.json, weights.bin, and metadata.json in SpeechCommands.js and called prediction using the code snippet below.

```js
let recognizer;

function predictWord() {
  console.log("predictWord--");
  // Array of words that the recognizer is trained to recognize.
  const words = recognizer.wordLabels();

  recognizer.listen(({scores}) => {
    // Turn scores into a list of (score, word) pairs.
    scores = Array.from(scores).map((s, i) => ({score: s, word: words[i]}));
    console.log("scores--", scores);
    // Find the most probable word.
    scores.sort((s1, s2) => s2.score - s1.score);
    document.querySelector('#console').textContent = scores[0].word;
  }, {probabilityThreshold: 0.75});
}

async function app() {
  recognizer = speechCommands.create('BROWSER_FFT', 'directional4w');
  // recognizer = speechCommands.create('BROWSER_FFT');
  await recognizer.ensureModelLoaded();
  predictWord();
}

app();
```
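As an aside, the score-handling logic in the callback above can be factored into a small pure function, which makes it easy to sanity-check outside the browser (the helper name is mine, not part of the library):

```javascript
// Pair each raw score with its word label and sort descending by score.
// Takes the scores array from listen() and the labels from
// recognizer.wordLabels().
function toScoredWords(scores, words) {
  return Array.from(scores)
      .map((score, i) => ({score, word: words[i]}))
      .sort((a, b) => b.score - a.score);
}

// Example with made-up scores:
const ranked = toScoredWords([0.1, 0.7, 0.2], ['up', 'down', 'left']);
console.log(ranked[0].word);  // → 'down'
```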

We are not able to detect any keywords such as up, down, left, or right. Please let me know if anything is wrong in the training process, or whether we need to train for more steps.

We didn't use the transfer learning API.

@tafsiri
Contributor

tafsiri commented Jul 8, 2019

Thanks for the information @ranjithrengaraj, a few things stand out to me.

1. `speechCommands.create('BROWSER_FFT','directional4w');` will load the existing pretrained recognizer. I don't see how it connects to the metadata and model you created. Could you add a snippet for how you did the following?

   > Updated metadata.json to {"words": ["up","down","left","right","wakeup"], "frameSize": 232}.
   >
   > Loaded model.json, weights.bin, and metadata.json in SpeechCommands.js and called prediction using the code snippet below.

   It is surprising that you don't get the original words being recognized. Were you able to get the base model working without modification?

2. More importantly, the training script you linked to might be for a different model, one that is trainable in Node.js (I do think this is confusing, so I'll try to get that fixed or at least better described/located). The instructions seem incomplete for how to load it and do inference with it. @pyu10055, could you update https://github.com/tensorflow/tfjs-models/tree/master/speech-commands/training with code snippets for how to use the model trained from that script?

Apologies for how confusing this all is.

@ranjithrengaraj
Author

Thanks tafsiri.

> Were you able to get the base model working without modification?

Yes, we are able to detect the keywords without modification.

> speechCommands.create('BROWSER_FFT','directional4w'); will load the existing pretrained recognizer. I don't see how it connects to the metadata and model you created. Could you add a snippet for how you did the following?

We loaded the custom model using loadLayersModel. The same method works for the pretrained model but not for the custom-trained model. (This is from the compiled bundle:)

```js
this.modelURL = 'http://localhost:28440/model.json'
return i.sent(), [4, t.loadLayersModel(this.modelURL)];
```

In a similar way, I loaded metadata.json as well.
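For what it's worth, the speech-commands README (in its transfer-learning serialization section) indicates that `create()` accepts custom model and metadata URLs as its third and fourth arguments, which may be the missing link here. A sketch, reusing the localhost URL from the snippet above (adjust the paths to wherever model.json and metadata.json are actually served):

```javascript
// Sketch: load the custom checkpoint through the speech-commands API
// instead of the pretrained 'directional4w' vocabulary. The URL
// arguments follow the package's transfer-learning docs; this assumes
// `speechCommands` is already imported and a server is hosting the files.
const modelURL = 'http://localhost:28440/model.json';
const metadataURL = 'http://localhost:28440/metadata.json';

async function loadCustomRecognizer() {
  const recognizer = speechCommands.create(
      'BROWSER_FFT',
      null,          // no built-in vocabulary; use the custom one
      modelURL,
      metadataURL);
  await recognizer.ensureModelLoaded();
  console.log(recognizer.wordLabels());  // should list the five custom words
  return recognizer;
}
```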

@nsteins

nsteins commented Jul 24, 2019

I'm also interested in figuring out how to train a model that can later be loaded in the browser. I was able to train and save a model following the README in the training/soft-fft directory, though it appears that functionality is not yet supported by speechCommands. I looked into training/browser-fft, but there appears to be a missing step:

> Run WebAudio FFT on the .dat files generated in step 2 in the browser.

Can anyone point me to the best way to run the WebAudio FFT on the processed files?
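In case it helps anyone stuck on the same step: the BROWSER_FFT spectrograms come from the browser's own FFT, so one plausible approach is to play each clip through an AudioContext and snapshot frames from an AnalyserNode. This is an untested sketch, not the library's actual preprocessing; the PCM format of the .dat files, the 1024-point fftSize, and the 232-bin frame width are all assumptions (the last matches the frameSize in the metadata above):

```javascript
// Untested sketch of the kind of WebAudio FFT the README step seems to
// describe. Assumes the .dat file holds raw float32 mono PCM at the
// context's sample rate.
async function datToSpectrogram(arrayBuffer, audioCtx) {
  const pcm = new Float32Array(arrayBuffer);
  const buffer = audioCtx.createBuffer(1, pcm.length, audioCtx.sampleRate);
  buffer.copyToChannel(pcm, 0);

  const source = audioCtx.createBufferSource();
  source.buffer = buffer;

  const analyser = audioCtx.createAnalyser();
  analyser.fftSize = 1024;             // 512 bins; keep the first 232
  analyser.smoothingTimeConstant = 0;  // no averaging between frames
  source.connect(analyser);

  // Poll one FFT frame roughly every fftSize samples; real code would
  // need more precise scheduling to match the model's frame rate.
  const frames = [];
  const bins = new Float32Array(analyser.frequencyBinCount);
  const timer = setInterval(() => {
    analyser.getFloatFrequencyData(bins);
    frames.push(bins.slice(0, 232));
  }, 1000 * analyser.fftSize / audioCtx.sampleRate);

  source.start();
  await new Promise((resolve) => (source.onended = resolve));
  clearInterval(timer);
  return frames;
}
```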

@rthadur
Contributor

rthadur commented Jul 24, 2019

@caisq gentle ping! Did you get a chance to look at this?

@markusthoemmes

+1 to @nsteins question. Some pointers would be great!

@zappys

zappys commented Apr 5, 2020

I'm also interested in some details regarding WebAudio FFT.

Stuck at step 3:
"Run WebAudio FFT on the .dat files generated in step 2 in the browser. TODO(cais): Provide more details here."

@rthadur
Contributor

rthadur commented Jun 5, 2020

Closing this due to lack of activity; feel free to reopen. Thank you.

@rthadur closed this as completed on Jun 5, 2020
@jcambre

jcambre commented Jul 16, 2020

Like others on this thread, I'm also still unclear about step 3 on this README:

> Run WebAudio FFT on the .dat files generated in step 2 in the browser. TODO(cais): Provide more details here.

Could someone please provide additional details on that step of training? Thanks so much.

@adridelgal

Hello, I would also be interested in further information about step 3 of data preparation:

> Run WebAudio FFT on the .dat files generated in step 2 in the browser. TODO(cais): Provide more details here.

Does anyone have any ideas or pointers on how to do this given the available code? It would be greatly appreciated.
