
New: add(words) API and some code improvements #34

Merged
4 commits merged into development on May 25, 2017

Conversation

BrunoBerisso
Contributor

This branch has some general code improvements (fixed access levels, preferring guard over if, etc.) and three important changes:

  1. Add a new API to add words to the recognition dictionary at runtime. Be aware that new words can't be added while recognition is in progress; you should add new words before starting a recognition process.
    The API expects an array of String tuples of the form (word: "HELLO", phones: "HH EH L OW"). The first component is the word in plain English. The second is the pronunciation phones as they appear in the cmudict (more here: http://www.speech.cs.cmu.edu/tools/lextool.html). In the future the second component should be calculated automatically.

  2. The decode functions now throw errors where applicable.

  3. There is a new approach to the live decode logic using AVAudioConverter. The idea is to read the data in a format more natural for iOS (Float32, 16000 Hz) and convert it to the Sphinx format (Int16, 16000 Hz). AVAudioConverter is only available from iOS 9.0, so the deployment target needs to change. This should address "Device does not support required sample rate recording" #24 and "ps_add_word" #33.
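A minimal sketch of the Float32 → Int16 sample conversion described in point 3, written in plain Swift to make the idea concrete. In the PR itself, AVAudioConverter performs this conversion (plus any resampling) on real AVAudioPCMBuffers; this helper only illustrates the scaling step and is not part of the library.

```swift
// Sketch only: convert normalized Float32 samples (the format iOS records
// in most naturally) to the Int16 samples PocketSphinx expects.
func floatSamplesToInt16(_ samples: [Float]) -> [Int16] {
    samples.map { sample in
        // Clamp to the valid [-1.0, 1.0] range, then scale to Int16.
        let clamped = max(-1.0, min(1.0, sample))
        return Int16(clamped * Float(Int16.max))
    }
}
```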

Please let everybody know your thoughts about these changes.
Thanks!
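A usage sketch of the add(words) API from point 1. The `decoder` variable and the exact method signature are assumptions based on this description, not the library's confirmed API.

```swift
// Hypothetical usage sketch: names below are assumed from the PR
// description, not the confirmed library API.
let newWords: [(word: String, phones: String)] = [
    (word: "HELLO", phones: "HH EH L OW"),   // phones as in the cmudict
    (word: "WORLD", phones: "W ER L D")
]

// Words must be added before a recognition process starts, never while
// one is in progress.
decoder.add(words: newWords)
// ...now it is safe to start decoding.
```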

Bruno Berisso added 4 commits January 24, 2017 12:13
…nstead of open. The same goes for the functions
- Change some 'if' statements to 'guard', mostly in the tests
- Use STrue | SFalse instead of 1 | 0 to denote true | false where applicable
…gin in live decoding. The idea is to read the data in a format more natural for iOS (Float32, 16000 Hz) and convert it (with AVAudioConverter) to the Sphinx format (Int16, 16000 Hz). AVAudioConverter is only available from iOS 9.0, so the deployment target needs to change.
…Be aware that new words can't be added while recognition is in progress. You should add new words before starting a recognition process.

The API expects an array of String tuples of the form (word: 'HELLO', phones: 'HH EH L OW'). The first component is the word in plain English. The second is the pronunciation phones as they appear in the cmudict (more here: http://www.speech.cs.cmu.edu/tools/lextool.html). In the future the second component should be calculated automatically.
@BrunoBerisso BrunoBerisso merged commit c05fdef into development May 25, 2017