Android: Voice typing: Add setting to allow specifying a glossary #12370
'voiceTyping.prompt': {
value: '',
Suggested change:
'voiceTyping.prompt': {
advanced: true,
value: '',
It may make sense to move this to the advanced settings section by default.
label: () => _('Voice typing prompt'),
description: () => _('A short example of transcribed text. A prompt can help correct voice typing spelling or change the style of transcription. Leave empty to use the default prompt.'),
I feel this field may be difficult to use or understand as it requires knowing how Whisper works internally.
Would it make sense instead to have a "glossary" property? We ask users to input the words they want the model to understand, separated by commas. Then we automatically prefix this with "glossary:" and set that as a prompt?
Later if we find that access to the actual prompt is needed, we could have a second property for this, but even then I don't think that will be needed. For example if users say they'd like the text to be all lowercase, then we add a property "Set output to lowercase" and we provide a custom prompt ourselves.
Basically we should try to focus on the features the users need, and then convert this to a prompt. Because we know Whisper better than the user we can create better prompts based on their preferences.
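The reviewer's approach (collect user-facing preferences, then generate the prompt ourselves) could look roughly like the TypeScript sketch below. The interface, function names, and the lowercase sample sentence are all hypothetical and not taken from Joplin; whether an all-lowercase sample reliably changes Whisper's output style is an assumption.

```typescript
// Hypothetical sketch of the reviewer's proposal: users state what they want,
// and the app builds the Whisper prompt. None of these names exist in Joplin.

interface VoiceTypingPreferences {
	// Raw user input, for example "Scott Joplin, ragtime".
	glossary: string;
	// Hypothetical "Set output to lowercase" option mentioned in the review.
	lowercase: boolean;
}

// Splits a comma-separated glossary into trimmed, non-empty terms.
const parseGlossary = (input: string): string[] =>
	input.split(',').map(term => term.trim()).filter(term => term.length > 0);

const preferencesToPrompt = (prefs: VoiceTypingPreferences): string => {
	const parts: string[] = [];
	const terms = parseGlossary(prefs.glossary);
	if (terms.length > 0) {
		// Prefix the terms with "glossary:", as proposed in the review.
		parts.push(`glossary: ${terms.join(', ')}.`);
	}
	if (prefs.lowercase) {
		// Assumption: an all-lowercase sample nudges Whisper toward lowercase output.
		parts.push('this text is all lowercase.');
	}
	return parts.join(' ');
};
```

Keeping the prompt construction inside the app means new options can be added later without exposing Whisper's prompt format to users.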
Thank you for the feedback!
> Would it make sense instead to have a "glossary" property? We ask users to input the words they want the model to understand, separated by commas. Then we automatically prefix this with "glossary:" and set that as a prompt?
Originally, my concern with a "Glossary" property was translating `glossary:` to all languages supported by Whisper. However, perhaps it would be fine to omit `glossary:` if we don't have a translation for it?
This should be resolved by 6d3e6cc. It currently works by:
- Generating a "Glossary:" prompt based on the `voiceTyping.glossary` setting (if set).
  - If the current locale doesn't have a translation for `Glossary:`, the "Glossary:" prefix is omitted and the `voiceTyping.glossary` setting is used directly as a prompt.
- Concatenating the "Glossary:" prompt with any existing prompt included in the model config.
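The steps listed for 6d3e6cc can be sketched as follows. This is a hypothetical TypeScript illustration, not the code from that commit: the function names, the `lookupLabel` callback (assumed to return `null` when the locale has no translation for "Glossary:"), the space separator, and placing the glossary before the model config prompt are all assumptions.

```typescript
// Hypothetical sketch; names and signatures are illustrative, not Joplin's API.

// Returns the localized "Glossary:" label, or null when the locale lacks a translation.
type GlossaryLabelLookup = (locale: string) => string | null;

const buildGlossaryPrompt = (
	glossary: string,
	locale: string,
	lookupLabel: GlossaryLabelLookup,
): string => {
	const trimmed = glossary.trim();
	// An unset glossary produces no glossary prompt.
	if (!trimmed) return '';
	const label = lookupLabel(locale);
	// Without a translated label, the glossary text is used directly as the prompt.
	return label ? `${label} ${trimmed}` : trimmed;
};

const buildVoiceTypingPrompt = (
	glossary: string,
	locale: string,
	lookupLabel: GlossaryLabelLookup,
	modelConfigPrompt: string,
): string => {
	// Assumed order: glossary prompt first, then the prompt from the model config.
	const parts = [
		buildGlossaryPrompt(glossary, locale, lookupLabel),
		modelConfigPrompt.trim(),
	];
	// Drop empty parts so that no stray separators are produced.
	return parts.filter(part => part.length > 0).join(' ');
};
```

Passing the label lookup in as a callback keeps the fallback logic testable without depending on a particular localization system.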
I'm converting this to a draft until the changes from 6d3e6cc have been manually tested.
While testing this with longer audio segments on a low-end device, I've observed several app crashes (perhaps due to high memory usage?). I suspect that the crashes are related to #12352 and not this pull request.
Marking as ready for review: the issue doesn't seem related to this PR.
readme/dev/spec/voice_typing.md
@@ -10,6 +10,12 @@ By default, Joplin uses Whisper.cpp for voice typing.

Whisper.cpp provides a number of pre-trained models for transcribing speech in different languages. Both [English-only and multilingual models](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages) are available. The multilingual models support a variety of different languages. Joplin uses the smallest of the multilingual models by default.

### Preventing spelling mistakes

Joplin allows specifying a glossary for voice typing using the "Voice typing: Glossary" setting (in the "Note" section of settings). Including uncommon words in the glossary makes voice typing more likely to spell them correctly. For example, providing `Scott Joplin, ragtime.` as the prompt helps voice typing correctly spell "Scott Joplin" and "ragtime".
Suggested change:
Joplin allows specifying a glossary for voice typing using the "Voice typing: Glossary" setting (in the "Note" section of settings). Including uncommon words in the glossary makes voice typing more likely to spell them correctly. For example, providing `Scott Joplin, ragtime.` as the glossary helps voice typing correctly spell "Scott Joplin" and "ragtime".
Suggestion applied in 6fcae46.
public: true,
appTypes: [AppType.Mobile],
label: () => _('Voice typing: Glossary'),
description: () => _('A comma-separated list of words'),
Suggested change:
description: () => _('A comma-separated list of words. May be used for uncommon words, to ensures that voice-typing spells them correctly.'),
Replaced `ensures` with `help`, since this setting does not guarantee that voice typing will spell the glossary words correctly. Edit: With this replacement, the suggestion has been applied in f47306f.
Co-authored-by: Laurent Cozic <laurent22@users.noreply.github.com>
Summary
This pull request adds a setting to allow users to customize the voice typing prompt. Among other things, this allows users to provide spelling and style suggestions for transcriptions.
Testing plan
I've tested this pull request by changing the prompt then starting voice typing and checking the output.
For example, on Android 13, I:
this text is all lowercase. longer prompts. seem to allow more unusual transcription styles. lowercase.
in settings > note.