Speech-commands model: Add inference code for browser-native FFT #58

Merged
merged 20 commits into tensorflow:master on Aug 15, 2018

Conversation

Collaborator

@caisq caisq commented Aug 9, 2018

  • Add class BrowserFftSpeechCommandRecognizer for streaming and offline recognition (this will be called by the public-facing factory API)
  • Add class BrowserFftFeatureExtractor for extracting browser-native FFT using AudioContext (this will not be directly public-facing, but is used under the hood by BrowserFftSpeechCommandRecognizer)
  • Add testing utilities (mainly fakes for the WebAudio API) in browser_test_utils.ts
  • Add Node.js-based unit tests for the aforementioned classes
  • Make some minor changes to the interfaces in types.ts.
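
As a rough illustration of the browser-native FFT extraction described in the list above, the sketch below polls an analyser for one frame of frequency data at a time and stacks the frames into a flat spectrogram buffer. The names FrequencyAnalyser, collectSpectrogram, and FakeAnalyser are hypothetical and are not the classes added in this PR; the interface only mirrors the two members of the WebAudio AnalyserNode that the sketch relies on (frequencyBinCount and getFloatFrequencyData).

```typescript
// Minimal structural subset of the WebAudio AnalyserNode used in this sketch.
// A real AnalyserNode satisfies this interface.
interface FrequencyAnalyser {
  readonly frequencyBinCount: number;
  getFloatFrequencyData(array: Float32Array): void;
}

// Collects `numFrames` frames of FFT magnitudes into one flat buffer, laid
// out frame by frame. A real extractor would poll on a timer; this sketch
// polls in a loop to stay self-contained.
function collectSpectrogram(
    analyser: FrequencyAnalyser, numFrames: number): Float32Array {
  const frameSize = analyser.frequencyBinCount;
  const spectrogram = new Float32Array(numFrames * frameSize);
  const frame = new Float32Array(frameSize);
  for (let i = 0; i < numFrames; ++i) {
    analyser.getFloatFrequencyData(frame);
    spectrogram.set(frame, i * frameSize);
  }
  return spectrogram;
}

// Hypothetical fake for running outside the browser: each call fills the
// frame with the number of calls made so far (0, 1, 2, ...).
class FakeAnalyser implements FrequencyAnalyser {
  readonly frequencyBinCount: number = 4;
  private calls = 0;

  getFloatFrequencyData(array: Float32Array): void {
    array.fill(this.calls);
    this.calls++;
  }
}
```

Depending on a small structural interface instead of AnalyserNode itself is what lets a fake be substituted in Node.js-based unit tests.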

@caisq caisq requested review from pyu10055 and nsthorat August 9, 2018 16:54
@caisq caisq requested a review from dsmilkov August 9, 2018 16:58
Collaborator

@pyu10055 pyu10055 left a comment

Reviewed 11 of 11 files at r1.
Reviewable status: 0 of 1 approvals obtained (waiting on @nsthorat, @dsmilkov, and @pyu10055)

Collaborator

@pyu10055 pyu10055 left a comment

:lgtm_strong:

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @nsthorat, @dsmilkov, and @pyu10055)

@caisq
Collaborator Author

caisq commented Aug 14, 2018

@nsthorat @dsmilkov Please let me know if you would like to comment on this PR.

@caisq
Collaborator Author

caisq commented Aug 14, 2018

I will merge this PR soon so I can proceed with the next steps. Feel free to comment and I'll be happy to respond in follow-up PRs. @nsthorat @dsmilkov

@caisq caisq merged commit d9af3f9 into tensorflow:master Aug 15, 2018
Contributor

@dsmilkov dsmilkov left a comment

Awesome work! I left a few comments post-submit. Sorry for the delay on this!

Reviewed 5 of 11 files at r1.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @caisq, @nsthorat, and @dsmilkov)


speech-commands/src/browser_fft_extractor.ts, line 47 at r1 (raw file):

   * dimension.
   * The return value is assumed to be whether a flag for whether the
   * refractory period should initiate, e.g., when a word is recognized.

unfamiliar with this word: refractory. Is there a more common alternative?


speech-commands/src/browser_fft_extractor.ts, line 207 at r1 (raw file):

  }

  async stop(): Promise<void> {

Any reason why stop() needs to be async? I see no await/async call inside. Do you foresee other future implementations of FeatureExtractor being async?


speech-commands/src/browser_fft_extractor.ts, line 223 at r1 (raw file):

  }

  getFeatures(): Float32Array[] {

Who else implements FeatureExtractor? Can we change the interface to fit this need, instead of having to throw an error on a method that no one implements?


speech-commands/src/browser_fft_extractor.ts, line 270 at r1 (raw file):

 * and suppression time.
 */
export class Tracker {

keep this class private?


speech-commands/src/browser_fft_extractor.ts, line 280 at r1 (raw file):

   *
   * @param period The event-firing period, in number of frames.
   * @param suppressionPeriod The suppression period, in number of frames.

Will you use this suppressionPeriod soon? Let's err on the side of less code/config/logic until we need it.


speech-commands/src/types.ts, line 27 at r1 (raw file):

export type RecognizerCallback = (result: SpeechCommandRecognizerResult) =>
    Promise<void>;

Any reason why the user-provided recognizerCallback needs to return a promise? Let's just do void.


speech-commands/src/types.ts, line 66 at r1 (raw file):

  // Getter for word labels.
  wordLabels(): string[];

you can mark it as get wordLabels() { ... }


speech-commands/src/types.ts, line 69 at r1 (raw file):

  // Get the required number of frames.
  params(): RecognizerConfigParams;

same here.


speech-commands/src/types.ts, line 129 at r1 (raw file):

}

export interface RecognizerConfigParams {

just call it RecognizerConfig to be consistent with other config interfaces.

Collaborator Author

@caisq caisq left a comment

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @caisq and @nsthorat)


speech-commands/src/browser_fft_extractor.ts, line 47 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

unfamiliar with this word: refractory. Is there a more common alternative?

Changing it to "suppression" in #61


speech-commands/src/browser_fft_extractor.ts, line 207 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

Any reason why stop() needs to be async? I see no await/async call inside. Do you foresee other future implementations of FeatureExtractor being async?

Yes, this is because stop() may call some WebAudio functions that are actually async.
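
To illustrate the point: in the WebAudio API, AudioContext.close() returns a Promise, so a stop() that tears down the context has something to await. The following is a minimal sketch only; ClosableAudioContext, SketchExtractor, and FakeAudioContext are hypothetical names, not the code in this PR.

```typescript
// Structural subset of the WebAudio AudioContext: close() returns a Promise.
interface ClosableAudioContext {
  close(): Promise<void>;
}

class SketchExtractor {
  private audioContext: ClosableAudioContext|null;

  constructor(audioContext: ClosableAudioContext) {
    this.audioContext = audioContext;
  }

  // Async so that callers can wait until the audio context is released.
  async stop(): Promise<void> {
    if (this.audioContext == null) {
      throw new Error('Cannot stop: the extractor is not running.');
    }
    const context = this.audioContext;
    this.audioContext = null;
    await context.close();
  }
}

// Hypothetical fake used to exercise stop() outside the browser.
class FakeAudioContext implements ClosableAudioContext {
  closed = false;

  close(): Promise<void> {
    this.closed = true;
    return Promise.resolve();
  }
}
```

Because stop() is declared async, a second call on an already stopped extractor surfaces as a rejected Promise instead of a synchronous throw, which callers need to handle with await or catch().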


speech-commands/src/browser_fft_extractor.ts, line 223 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

Who else implements FeatureExtractor? Can we change the interface to fit this need, instead of having to throw an error on a method that no one implements?

@pyu10055 may use this in later PRs.


speech-commands/src/browser_fft_extractor.ts, line 270 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

keep this class private?

This is exported for testing. It is not included in index.ts and is hence effectively private from the public API's point of view.


speech-commands/src/browser_fft_extractor.ts, line 280 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

Will you use this suppressionPeriod soon? Let's err on the side of less code/config/logic until we need it.

Yes, see #61.
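
For readers following along, here is one way a tracker with an event-firing period and a suppression period (both in frames) could behave. This is a sketch under assumptions: SketchTracker, tick(), and suppress() are hypothetical names, and the exact firing rule is a guess at the intent of the doc comment quoted above, not the implementation in this PR or in #61.

```typescript
// Fires once every `period` frames, except during the `suppressionPeriod`
// frames that follow a call to suppress().
class SketchTracker {
  private counter = 0;
  private suppressionOnset: number|null = null;

  constructor(
      readonly period: number, readonly suppressionPeriod: number = 0) {
    if (!(period > 0)) {
      throw new Error(`Expected period to be positive, but got ${period}`);
    }
  }

  // Advances by one frame and reports whether an event should fire.
  tick(): boolean {
    this.counter++;
    const onPeriod = this.counter % this.period === 0;
    const suppressed = this.suppressionOnset != null &&
        this.counter - this.suppressionOnset <= this.suppressionPeriod;
    return onPeriod && !suppressed;
  }

  // Starts a suppression window at the current frame, e.g. after a word
  // has been recognized.
  suppress(): void {
    this.suppressionOnset = this.counter;
  }
}
```

With a period of 2 and a suppression period of 3, the tracker fires on frame 2; if suppress() is called at that point, frame 4 is skipped and the next firing happens on frame 6.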


speech-commands/src/types.ts, line 27 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

Any reason why the user-provided recognizerCallback needs to return a promise? Let's just do void.

The callback can potentially be async. A type definition doesn't let me specify "async" here, but specifying a return type of Promise suffices.
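
A small sketch of what this looks like in practice, using hypothetical stand-ins (SketchResult, SketchCallback, notify) for the types in types.ts: an async arrow function satisfies a function type whose return type is Promise<void>, and the caller can then await it.

```typescript
// Hypothetical stand-in for SpeechCommandRecognizerResult.
interface SketchResult {
  scores: Float32Array;
}

// "async" is not part of a function type; the Promise return type is what
// tells callers that the callback can be awaited.
type SketchCallback = (result: SketchResult) => Promise<void>;

const callbackLog: string[] = [];

// An async function implicitly returns a Promise, so it matches the type.
const asyncCallback: SketchCallback = async (result) => {
  await Promise.resolve();  // Placeholder for real asynchronous work.
  callbackLog.push(`got ${result.scores.length} scores`);
};

// The caller waits for the callback to finish before continuing.
async function notify(
    callback: SketchCallback, result: SketchResult): Promise<void> {
  await callback(result);
}
```

A callback type returning void would also accept async functions, but the caller would then have no typed Promise to await.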


speech-commands/src/types.ts, line 66 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

you can mark it as get wordLabels() { ... }

I don't think TypeScript interfaces allow you to use get in them.
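
One possible middle ground, offered as a sketch and not as what this PR does: an interface can declare a readonly property, and an implementing class may satisfy it with a get accessor. SketchRecognizer and SketchRecognizerImpl are hypothetical names.

```typescript
// The interface exposes wordLabels as a read-only property, not a method.
interface SketchRecognizer {
  readonly wordLabels: string[];
}

class SketchRecognizerImpl implements SketchRecognizer {
  private readonly labels: string[];

  constructor(labels: string[]) {
    this.labels = labels;
  }

  // A getter in the class satisfies the readonly property in the interface.
  // Returning a copy keeps callers from mutating the internal array.
  get wordLabels(): string[] {
    return this.labels.slice();
  }
}
```

Callers would then write recognizer.wordLabels instead of recognizer.wordLabels().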


speech-commands/src/types.ts, line 129 at r1 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

just call it RecognizerConfig to be consistent with other config interfaces.

Done in #61

3 participants