
TFJS - How to create model for custom word(Speech commands model) #1717

Closed
ranjithrengaraj opened this issue Jul 3, 2019 · 11 comments
Labels: type:support (user support questions)

@ranjithrengaraj

To get help from the community, we encourage using Stack Overflow and the tensorflow.js tag.

TensorFlow.js version

Node version: v12.4.0

Browser version

Describe the problem or feature request

I used the audio model given in https://github.com/tensorflow/tfjs-models/tree/master/speech-commands.
I followed the documentation, trained the model with a custom word ("wakeup") alongside the existing dataset, and saved the model:

1. Create_model wakeup up down left right
2. loaded the dataset
3. train 100
4. save_model

model.json and weights.bin were generated, and I then imported the model in JS, but it is not detecting any word.

Please suggest how to train a custom word and how many training epochs are required.

Code to reproduce the bug / link to feature request

@tafsiri
Contributor

tafsiri commented Jul 3, 2019

Could you give a bit more detail and describe how you did the retraining? Did you use the transfer learning API or something different? A code snippet would be great for us to get a better sense of what may be going on. Also, how many samples/examples do you have for each of the words in your vocabulary?

@tafsiri added the type:support (user support questions) label on Jul 3, 2019
@ranjithrengaraj
Author

We downloaded the speech dataset from https://storage.cloud.google.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz, added one more folder called wakeup with 500 samples, and trained on the data. We followed this tutorial:
https://github.com/tensorflow/tfjs-models/tree/master/speech-commands/training

1. create up down left right wakeup
2. load_dataset all /tmp/data (loaded all 5 datasets)
3. train 500
4. save_model /tmp/audio_model

We got the model.json and weights.bin files, and updated metadata.json to {"words": ["up","down","left","right","wakeup"], "frameSize": 232}.

We then loaded model.json, weights.bin, and metadata.json in SpeechCommands.js and called prediction using the code snippet below.

```js
let recognizer;

function predictWord() {
  console.log("predictWord--");
  // Array of words that the recognizer is trained to recognize.
  const words = recognizer.wordLabels();

  recognizer.listen(({scores}) => {
    // Turn scores into a list of (score, word) pairs.
    scores = Array.from(scores).map((s, i) => ({score: s, word: words[i]}));
    console.log("scores--", scores);
    // Find the most probable word.
    scores.sort((s1, s2) => s2.score - s1.score);
    document.querySelector('#console').textContent = scores[0].word;
  }, {probabilityThreshold: 0.75});
}

async function app() {
  recognizer = speechCommands.create('BROWSER_FFT', 'directional4w');
  // recognizer = speechCommands.create('BROWSER_FFT');
  await recognizer.ensureModelLoaded();
  predictWord();
}

app();
```
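As an aside, the score-handling logic in the callback above can be factored into a small pure function, which makes it easy to sanity-check outside the browser (the helper name is mine, not part of the library):

```javascript
// Pair each raw score with its word label and sort descending by score.
// Takes the scores array from listen() and the labels from
// recognizer.wordLabels().
function toScoredWords(scores, words) {
  return Array.from(scores)
      .map((score, i) => ({score, word: words[i]}))
      .sort((a, b) => b.score - a.score);
}

// Example with made-up scores:
const ranked = toScoredWords([0.1, 0.7, 0.2], ['up', 'down', 'left']);
console.log(ranked[0].word);  // → 'down'
```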

We are not able to detect any keywords such as up, down, left, or right. Please let me know if anything is wrong in the training process, or whether we need to train for more steps.

We didn't use the transfer learning API.

@tafsiri
Contributor

tafsiri commented Jul 8, 2019

Thanks for the information @ranjithrengaraj, a few things stand out to me.

1. `speechCommands.create('BROWSER_FFT','directional4w');` will load the existing pretrained recognizer. I don't see how it connects to the metadata and model you created. Could you add a snippet for how you did the following?

   > Updated metadata.json to {"words": ["up","down","left","right","wakeup"], "frameSize": 232}.
   >
   > Loaded model.json, weights.bin, and metadata.json in SpeechCommands.js and called prediction using the code snippet below.

   It is surprising that you don't get the original words being recognized. Were you able to get the base model working without modification?

2. More importantly, the training script you linked to might be for a different model, one that is trainable in Node.js (I do think this is confusing, so I'll try to get that fixed or at least better described/located). The instructions seem incomplete for how to load it and do inference with it. @pyu10055, could you update https://github.com/tensorflow/tfjs-models/tree/master/speech-commands/training with code snippets for how to use the model trained from that script?

Apologies for how confusing this all is.

@ranjithrengaraj
Author

Thanks tafsiri.

> Were you able to get the base model working without modification?

Yes, we are able to detect the keywords without modification.

> speechCommands.create('BROWSER_FFT','directional4w'); will load the existing pretrained recognizer. I don't see how it connects to the metadata and model you created. Could you add a snippet for how you did the following?

We loaded the custom model using loadLayersModel. The same method works for the pretrained model but not for the custom-trained model. (This is from the compiled bundle:)

```js
this.modelURL = 'http://localhost:28440/model.json'
return i.sent(), [4, t.loadLayersModel(this.modelURL)];
```

In a similar way, I loaded metadata.json as well.
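For what it's worth, the speech-commands README (in its transfer-learning serialization section) indicates that `create()` accepts custom model and metadata URLs as its third and fourth arguments, which may be the missing link here. A sketch, reusing the localhost URL from the snippet above (adjust the paths to wherever model.json and metadata.json are actually served):

```javascript
// Sketch: load the custom checkpoint through the speech-commands API
// instead of the pretrained 'directional4w' vocabulary. The URL
// arguments follow the package's transfer-learning docs; this assumes
// `speechCommands` is already imported and a server is hosting the files.
const modelURL = 'http://localhost:28440/model.json';
const metadataURL = 'http://localhost:28440/metadata.json';

async function loadCustomRecognizer() {
  const recognizer = speechCommands.create(
      'BROWSER_FFT',
      null,          // no built-in vocabulary; use the custom one
      modelURL,
      metadataURL);
  await recognizer.ensureModelLoaded();
  console.log(recognizer.wordLabels());  // should list the five custom words
  return recognizer;
}
```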

@nsteins

nsteins commented Jul 24, 2019

I'm also interested in figuring out how to train a model that can later be loaded in the browser. I was able to train and save a model following the README in the training/soft-fft directory, though it appears that functionality is not yet supported by speechCommands. I looked into training/browser-fft, but there appears to be a missing step:

> Run WebAudio FFT on the .dat files generated in step 2 in the browser.

Can anyone point me to the best way to run the WebAudio FFT on the processed files?
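In case it helps anyone stuck on the same step: the BROWSER_FFT spectrograms come from the browser's own FFT, so one plausible approach is to play each clip through an AudioContext and snapshot frames from an AnalyserNode. This is an untested sketch, not the library's actual preprocessing; the PCM format of the .dat files, the 1024-point fftSize, and the 232-bin frame width are all assumptions (the last matches the frameSize in the metadata above):

```javascript
// Untested sketch of the kind of WebAudio FFT the README step seems to
// describe. Assumes the .dat file holds raw float32 mono PCM at the
// context's sample rate.
async function datToSpectrogram(arrayBuffer, audioCtx) {
  const pcm = new Float32Array(arrayBuffer);
  const buffer = audioCtx.createBuffer(1, pcm.length, audioCtx.sampleRate);
  buffer.copyToChannel(pcm, 0);

  const source = audioCtx.createBufferSource();
  source.buffer = buffer;

  const analyser = audioCtx.createAnalyser();
  analyser.fftSize = 1024;             // 512 bins; keep the first 232
  analyser.smoothingTimeConstant = 0;  // no averaging between frames
  source.connect(analyser);

  // Poll one FFT frame roughly every fftSize samples; real code would
  // need more precise scheduling to match the model's frame rate.
  const frames = [];
  const bins = new Float32Array(analyser.frequencyBinCount);
  const timer = setInterval(() => {
    analyser.getFloatFrequencyData(bins);
    frames.push(bins.slice(0, 232));
  }, 1000 * analyser.fftSize / audioCtx.sampleRate);

  source.start();
  await new Promise((resolve) => (source.onended = resolve));
  clearInterval(timer);
  return frames;
}
```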

@rthadur
Contributor

rthadur commented Jul 24, 2019

@caisq gentle ping! Did you get a chance to look at this?

@markusthoemmes

+1 to @nsteins question. Some pointers would be great!

@zappys

zappys commented Apr 5, 2020

I'm also interested in some details regarding WebAudio FFT.

Stuck at step 3:
"Run WebAudio FFT on the .dat files generated in step 2 in the browser. TODO(cais): Provide more details here."

@rthadur
Contributor

rthadur commented Jun 5, 2020

Closing this due to lack of activity; feel free to reopen. Thank you.

@rthadur closed this as completed on Jun 5, 2020
@jcambre

jcambre commented Jul 16, 2020

Like others on this thread, I'm also still unclear about step 3 on this README:

> Run WebAudio FFT on the .dat files generated in step 2 in the browser. TODO(cais): Provide more details here.

Could someone please provide additional details on that step of training? Thanks so much.

@adridelgal

Hello, I would also be interested in further information about step 3 of data preparation:

> Run WebAudio FFT on the .dat files generated in step 2 in the browser. TODO(cais): Provide more details here.

Does anyone have any ideas or pointers on how to do this given the available code? It would be greatly appreciated.
