Lightweight Speech to text desktop app for OSX Using IBM Watson API
This app was an initial prototype to test the quality of IBM STT, and is no longer activly supported I am now working on a more full fledge version at https://github.com/OpenNewsLabs/autoEdit_2 ( http://www.autoedit.io ).
IBM Speech to text API
To use this app you need to get IBM Watson API keys for their speech to text service, by making an account with Bluemix
Usage - Development
If you clone the repo you can start the app with
Usage - User
Or you can get the latest release packaged and ready for use here
This is a Tray Menu app.
Select Media, audio or video you'd like to transcribe.
Notifications show when a transcription as started and when it's finished.
On completion a editable text area shows you the transcription.
By default the transcription is also saved to clipboard.
You can disable
Autosave to clipboard if working on text editing or making use of the system clipboard for some other program to avoid it overwriting something else you might be doing with it.
Setting IBM Watson API keys
First time you start the application you'd be prompt to set the API keys.
Should you need to change those you can use shortcut
cmd + shift + a.
These are saved inside the app as a json file
wttskeys.json at the root of the application.
Which is in the
.gitignore so that it doesn't accidentally gets added to git by mistake, when in development mode.
Overview of project
- Once you select a video, the app converts it into audio and sends it to the IBM Speech to text API.
- When the transcription comes back it's copied to clipboard, unless you un-tick the option in the menu.
- Paste the transcription wherever you want and take it from there.
Convert video to audio
Audio files are saved in
IBM Speech to text API
stt folder contains the module to interact with the IBM Speech to text API.
If you want to dive more into this their documentation on how to interact with the API is pretty good.
transcribe.js requires both modules described above and brings it all together.
Converts audio into video, and then sends it to Watson for transcriptions. Transcriptions are saved onto a text file in
module returns the path to the text file.
transcribe.js in case the interface needs to change at a later stage.
indext.html contains the Implementation of the NWJS app.
Adding Menu Tray to the application.
When a user selects a video it's transcribed, appropriate system notifications for start and end are triggered.
When done unless option is un-ticked transcription is saved to clipboard.
in which case user can click on
Copy transcriptions to cliboard to get the transcriptions.
Build NWJS app
Use deploy script
This creates a build folder inside the repo. The build folder is also in
.gitignore to avoid accidentally pushing it to remote.
To rebuild the app in NWJS refer to the documentation
npm install -g nw-builder
From one level above the application folder (
cd .. from root of repo)
nwbuild -p osx64 ./transcriber
build folder that contains the app
- Write proper test using testing framework.
- Ad some proper form of error handling
- IBM has a size limit of 100mb per audio post request. Double check if there's a use case when converting video to audio it exceeds that size. Rough test with 54gb video to audio with that module ended up 50 to 70 mb. So it would seem ok for now?