Android: Voice typing: Add setting to allow specifying a glossary #12370

Conversation

personalizedrefrigerator
Collaborator

Summary

This pull request adds a setting to allow users to customize the voice typing prompt. Among other things, this allows users to provide spelling and style suggestions for transcriptions.

Testing plan

I've tested this pull request by changing the prompt then starting voice typing and checking the output.

For example, on Android 13, I:

  1. Set the prompt to `this text is all lowercase. longer prompts. seem to allow more unusual transcription styles. lowercase.` in Settings > Note.
  2. Read 3-4 paragraphs of text.
  3. Checked that many of the sentences start with lowercase letters.
    • With this prompt, sentences sometimes still start with uppercase letters.

Comment on lines 1822 to 1823
'voiceTyping.prompt': {
value: '',
Collaborator Author

Suggested change
- 'voiceTyping.prompt': {
- value: '',
+ 'voiceTyping.prompt': {
+ advanced: true,
+ value: '',

It may make sense to move this to the advanced settings section by default.

Comment on lines 1827 to 1828
label: () => _('Voice typing prompt'),
description: () => _('A short example of transcribed text. A prompt can help correct voice typing spelling or change the style of transcription. Leave empty to use the default prompt.'),
Owner

I feel this field may be difficult to use or understand as it requires knowing how Whisper works internally.

Would it make sense instead to have a "glossary" property? We ask users to input the words they want the model to understand, separated by commas. Then we automatically prefix this with "glossary:" and set that as a prompt?

Later if we find that access to the actual prompt is needed, we could have a second property for this, but even then I don't think that will be needed. For example if users say they'd like the text to be all lowercase, then we add a property "Set output to lowercase" and we provide a custom prompt ourselves.

Basically we should try to focus on the features the users need, and then convert this to a prompt. Because we know Whisper better than the user we can create better prompts based on their preferences.
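As a rough illustration of the "preferences first, prompt second" idea described above, here is a minimal sketch. The `VoiceTypingPreferences` shape, the `preferencesToPrompt` name, and the lowercase style sentence are all assumptions made for this example; none of them come from the Joplin codebase.

```typescript
// Hypothetical preference shape: these property names are illustrative only.
interface VoiceTypingPreferences {
	// Comma-separated words, as entered by the user.
	glossary: string;
	// Stands in for the hypothetical "Set output to lowercase" option.
	lowercaseOutput: boolean;
}

// Converts user-facing preferences into a single prompt string.
const preferencesToPrompt = (prefs: VoiceTypingPreferences): string => {
	// Normalize the glossary: split on commas, trim, and drop empty entries.
	const words = prefs.glossary
		.split(',')
		.map(word => word.trim())
		.filter(word => word.length > 0);

	const parts: string[] = [];
	if (words.length > 0) {
		parts.push(`glossary: ${words.join(', ')}`);
	}
	if (prefs.lowercaseOutput) {
		// Assumed style hint, loosely based on the prompt used in the testing plan.
		parts.push('this text is all lowercase.');
	}
	return parts.join(' ');
};
```

With `{ glossary: 'Scott Joplin, ragtime', lowercaseOutput: false }`, this returns `glossary: Scott Joplin, ragtime`. Keeping the prompt wording inside one function would let it be tuned later without changing what users see in settings.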

Collaborator Author

@personalizedrefrigerator Jun 6, 2025

Thank you for the feedback!

> Would it make sense instead to have a "glossary" property? We ask users to input the words they want the model to understand, separated by commas. Then we automatically prefix this with "glossary:" and set that as a prompt?

Originally, my concern with a "Glossary" property was translating `glossary:` to all languages supported by Whisper. However, perhaps it would be fine to omit `glossary:` if we don't have a translation for it?

Collaborator Author

This should be resolved by 6d3e6cc. It currently works by:

  • Generating a "Glossary:" prompt based on the voiceTyping.glossary setting (if set).
    • If the current locale doesn't have a translation for "Glossary:", the "Glossary:" prefix is omitted and the voiceTyping.glossary setting is used directly as a prompt.
  • Concatenating the "Glossary:" prompt with any existing prompt included in the model config.
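
The steps above could be sketched roughly as follows. This is not the code from 6d3e6cc: the `buildVoiceTypingPrompt` name, its parameters, the ordering (glossary first), and the single-space separator are assumptions made for illustration.

```typescript
// Hypothetical helper: the name and signature are illustrative, not from the Joplin source.
const buildVoiceTypingPrompt = (
	// Value of the voiceTyping.glossary setting (may be empty).
	glossary: string,
	// Translated "Glossary:" prefix, or null if the current locale has no translation.
	translatedPrefix: string | null,
	// Any existing prompt included in the model config (may be empty).
	modelPrompt: string,
): string => {
	const trimmedGlossary = glossary.trim();
	const trimmedModelPrompt = modelPrompt.trim();
	const parts: string[] = [];

	if (trimmedGlossary) {
		// Without a translated prefix, the glossary is used directly as the prompt.
		parts.push(translatedPrefix ? `${translatedPrefix} ${trimmedGlossary}` : trimmedGlossary);
	}
	if (trimmedModelPrompt) {
		// Assumption: the glossary prompt comes before the model's own prompt.
		parts.push(trimmedModelPrompt);
	}
	return parts.join(' ');
};
```

For example, `buildVoiceTypingPrompt('Scott Joplin, ragtime.', 'Glossary:', '')` returns `Glossary: Scott Joplin, ragtime.`, while passing `null` as the prefix returns `Scott Joplin, ragtime.` unchanged.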

@personalizedrefrigerator
Collaborator Author

I'm converting this to a draft until the changes from 6d3e6cc have been manually tested.

@personalizedrefrigerator marked this pull request as draft June 6, 2025 15:59
@personalizedrefrigerator
Collaborator Author

While testing this with longer audio segments on a low-end device, I've observed several app crashes (perhaps due to high memory usage?). I suspect that the crashes are related to #12352 and not this pull request.

@personalizedrefrigerator
Collaborator Author

Marking as ready for review — the issue doesn't seem related to this PR.

@personalizedrefrigerator marked this pull request as ready for review June 6, 2025 20:07
@@ -10,6 +10,12 @@ By default, Joplin uses Whisper.cpp for voice typing.

Whisper.cpp provides a number of pre-trained models for transcribing speech in different languages. Both [English-only and multilingual models](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages) are available. The multilingual models support a variety of different languages. Joplin uses the smallest of the multilingual models by default.

### Preventing spelling mistakes

Joplin allows specifying a glossary for voice typing using the "Voice typing: Glossary" setting (in the "Note" section of settings). Including uncommon words in the glossary makes voice typing more likely to spell them correctly. For example, providing `Scott Joplin, ragtime.` as the prompt helps voice typing correctly spell "Scott Joplin" and "ragtime".
Owner

Suggested change
- Joplin allows specifying a glossary for voice typing using the "Voice typing: Glossary" setting (in the "Note" section of settings). Including uncommon words in the glossary makes voice typing more likely to spell them correctly. For example, providing `Scott Joplin, ragtime.` as the prompt helps voice typing correctly spell "Scott Joplin" and "ragtime".
+ Joplin allows specifying a glossary for voice typing using the "Voice typing: Glossary" setting (in the "Note" section of settings). Including uncommon words in the glossary makes voice typing more likely to spell them correctly. For example, providing `Scott Joplin, ragtime.` as the glossary helps voice typing correctly spell "Scott Joplin" and "ragtime".

Collaborator Author

Suggestion applied in 6fcae46.

public: true,
appTypes: [AppType.Mobile],
label: () => _('Voice typing: Glossary'),
description: () => _('A comma-separated list of words'),
Owner

Suggested change
- description: () => _('A comma-separated list of words'),
+ description: () => _('A comma-separated list of words. May be used for uncommon words, to ensures that voice-typing spells them correctly.'),

Collaborator Author

@personalizedrefrigerator Jun 9, 2025

Replaced "ensures" with "help", since this setting does not guarantee that voice typing will spell the glossary words correctly. Edit: With this replacement, the suggestion has been applied in f47306f.

@personalizedrefrigerator changed the title from "Android: Voice typing: Add setting to allow customizing the prompt" to "Android: Voice typing: Add setting to allow specifying a glossary" Jun 12, 2025
@laurent22 merged commit 6a5c85d into laurent22:dev Jun 28, 2025
12 checks passed