Speech-commands model: Add inference code for browser-native FFT #58

caisq · 2018-08-09T16:41:46Z

Add class BrowserFftSpeechCommandRecognizer for streaming and offline recognition (this will be called by the public-facing factory API)
Add class BrowserFftFeatureExtractor for extracting browser-native FFT using AudioContext (this will not be directly public-facing, but is used under the hood by BrowserFftSpeechCommandRecognizer)
Add testing utilities (mainly fakes for the WebAudio API) in browser_test_utils.ts
Add Node.js-based unit tests for the aforementioned classes
Make some minor changes to the interfaces in types.ts.
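
For readers new to the approach, here is a minimal sketch of reading browser-native FFT frames, written so it can run under Node.js against a fake. It is not the code in this PR: `AnalyserLike`, `FakeAnalyser`, and `collectFrames` are hypothetical names. Only `frequencyBinCount` and `getFloatFrequencyData()` mirror the real Web Audio `AnalyserNode`.

```typescript
// Structural subset of the Web Audio AnalyserNode that this sketch relies on.
interface AnalyserLike {
  readonly frequencyBinCount: number;
  getFloatFrequencyData(array: Float32Array): void;
}

// Hypothetical fake (an assumption, not the PR's test utility): fills every
// bin with the index of the current call, so frames are distinguishable.
class FakeAnalyser implements AnalyserLike {
  readonly frequencyBinCount: number;
  private calls = 0;

  constructor(frequencyBinCount: number) {
    this.frequencyBinCount = frequencyBinCount;
  }

  getFloatFrequencyData(array: Float32Array): void {
    array.fill(this.calls);
    this.calls++;
  }
}

// Reads `numFrames` consecutive spectra and packs them into one flat array
// laid out as [frame0 bins..., frame1 bins..., ...].
function collectFrames(analyser: AnalyserLike, numFrames: number): Float32Array {
  const bins = analyser.frequencyBinCount;
  const out = new Float32Array(numFrames * bins);
  const frame = new Float32Array(bins);
  for (let i = 0; i < numFrames; ++i) {
    analyser.getFloatFrequencyData(frame);
    out.set(frame, i * bins);
  }
  return out;
}

const spectrogram = collectFrames(new FakeAnalyser(4), 3);
```

In a browser the frames would be polled at a fixed interval from a live `AnalyserNode` rather than read back to back; the loop above only illustrates the data layout.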

pyu10055

Reviewed 11 of 11 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @nsthorat, @dsmilkov, and @pyu10055)

pyu10055

Reviewable status: complete! 1 of 1 approvals obtained (waiting on @nsthorat, @dsmilkov, and @pyu10055)

caisq · 2018-08-14T02:58:43Z

@nsthorat @dsmilkov Please let me know if you would like to comment on this PR.

caisq · 2018-08-14T20:58:20Z

I will merge this PR soon so I can proceed with the next steps. Feel free to comment and I'll be happy to respond in follow-up PRs. @nsthorat @dsmilkov

dsmilkov

Awesome work! I left a few comments post-submit. Sorry for the delay on this!

Reviewed 5 of 11 files at r1.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @caisq, @nsthorat, and @dsmilkov)

speech-commands/src/browser_fft_extractor.ts, line 47 at r1 (raw file):

   * dimension.
   * The return value is assumed to be whether a flag for whether the
   * refractory period should initiate, e.g., when a word is recognized.

unfamiliar with this word: refractory. Is there a more common alternative?

speech-commands/src/browser_fft_extractor.ts, line 207 at r1 (raw file):

  }

  async stop(): Promise<void> {

Any reason why stop() needs to be async? I see no await or async call inside. Do you foresee other future implementations of FeatureExtractor being async?

speech-commands/src/browser_fft_extractor.ts, line 223 at r1 (raw file):

  }

  getFeatures(): Float32Array[] {

Who else implements FeatureExtractor? Can we change the interface to fit this need, instead of throwing an error from a method that no one implements?

speech-commands/src/browser_fft_extractor.ts, line 270 at r1 (raw file):

 * and suppression time.
 */
export class Tracker {

keep this class private?

speech-commands/src/browser_fft_extractor.ts, line 280 at r1 (raw file):

   *
   * @param period The event-firing period, in number of frames.
   * @param suppressionPeriod The suppression period, in number of frames.

Will you use this suppressionPeriod soon? Let's err on the side of less code/config/logic until we need it.
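
To make the two parameters concrete, here is a minimal sketch of how a period and a suppression period could interact. It assumes one plausible semantics (fire every `period` frames unless a suppression was started within the last `suppressionPeriod` frames); `SketchTracker` and its methods are hypothetical and may not match the Tracker in this PR.

```typescript
// Hypothetical tracker: counts frames, fires every `period` frames, and
// stays silent for `suppressionPeriod` frames after suppress() is called.
class SketchTracker {
  private readonly period: number;
  private readonly suppressionPeriod: number;
  private counter = 0;
  private suppressionOnset: number | null = null;

  constructor(period: number, suppressionPeriod: number) {
    this.period = period;
    this.suppressionPeriod = suppressionPeriod;
  }

  // Advances by one frame; returns true if an event should fire.
  tick(): boolean {
    this.counter++;
    const onPeriod = this.counter % this.period === 0;
    const suppressed = this.suppressionOnset !== null &&
        this.counter - this.suppressionOnset <= this.suppressionPeriod;
    return onPeriod && !suppressed;
  }

  // Starts a suppression window at the current frame.
  suppress(): void {
    this.suppressionOnset = this.counter;
  }
}

// Period of 2 frames, suppression of 3 frames, suppress after frame 2.
const tracker = new SketchTracker(2, 3);
const fired: boolean[] = [];
for (let frame = 1; frame <= 6; ++frame) {
  fired.push(tracker.tick());
  if (frame === 2) {
    tracker.suppress();
  }
}
// fired is [false, true, false, false, false, true]
```

Frame 4 falls on the period but inside the suppression window, so it does not fire. Dropping the suppression parameter would reduce `tick()` to the modulo check alone.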

speech-commands/src/types.ts, line 27 at r1 (raw file):

export type RecognizerCallback = (result: SpeechCommandRecognizerResult) =>
    Promise<void>;

Any reason why the user-provided recognizerCallback needs to return a promise? Let's just use void.
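
A minimal sketch of the void-returning variant follows. `ExampleResult`, `ExampleCallback`, and `emit` are hypothetical names for illustration, and the real SpeechCommandRecognizerResult may have a different shape. The point is that TypeScript still accepts an async function where a void-returning callback is expected, so callers who want async callbacks lose nothing.

```typescript
// Hypothetical result shape, assumed for this sketch only.
interface ExampleResult {
  scores: Float32Array;
}

// Callback type that returns void instead of Promise<void>.
type ExampleCallback = (result: ExampleResult) => void;

const seen: number[] = [];

// A plain synchronous callback.
const syncCallback: ExampleCallback = (result) => {
  seen.push(result.scores.length);
};

// An async function is assignable to a void-returning callback type;
// the promise it returns is simply ignored by the caller.
const asyncCallback: ExampleCallback = async (result) => {
  seen.push(result.scores.length * 10);
};

// Hypothetical caller: invokes the callback without awaiting anything.
function emit(callback: ExampleCallback, result: ExampleResult): void {
  callback(result);
}

emit(syncCallback, {scores: new Float32Array(3)});
emit(asyncCallback, {scores: new Float32Array(2)});
```

One trade-off: because the caller never awaits, a rejection inside an async callback goes unhandled unless the callback catches it itself.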

speech-commands/src/types.ts, line 66 at r1 (raw file):

  // Getter for word labels.
  wordLabels(): string[];

You can make it a getter: get wordLabels() { ... }
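
A minimal sketch of the getter form, using hypothetical names (`ExampleRecognizer`, `ExampleRecognizerImpl`) rather than the interfaces in types.ts: the interface declares a readonly property, and the class satisfies it with a `get` accessor.

```typescript
// The interface exposes the labels as a readonly property, not a method.
interface ExampleRecognizer {
  readonly wordLabels: string[];
}

class ExampleRecognizerImpl implements ExampleRecognizer {
  private readonly labels: string[];

  constructor(labels: string[]) {
    this.labels = labels;
  }

  // A get accessor satisfies the readonly property in the interface.
  get wordLabels(): string[] {
    // Return a copy so callers cannot mutate the internal array.
    return this.labels.slice();
  }
}

const recognizer = new ExampleRecognizerImpl(['yes', 'no']);
const labels = recognizer.wordLabels;  // property access, no call parentheses
```

Call sites change from `recognizer.wordLabels()` to `recognizer.wordLabels`, so this would be a breaking change for any existing callers.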

speech-commands/src/types.ts, line 69 at r1 (raw file):

  // Get the required number of frames.
  params(): RecognizerConfigParams;

same here.

speech-commands/src/types.ts, line 129 at r1 (raw file):

}

export interface RecognizerConfigParams {

just call it RecognizerConfig to be consistent with other config interfaces.

caisq

Reviewable status: complete! 1 of 1 approvals obtained (waiting on @caisq and @nsthorat)

speech-commands/src/browser_fft_extractor.ts, line 47 at r1 (raw file):