A free-to-use captioning, transcription, and real-time translation tool for live streams, presentations, and more.
Demo video: https://www.youtube.com/watch?v=v7172QO8z6c
- Open https://caption.ninja in a supported browser (Chrome or Edge recommended)
- Accept microphone permissions when prompted
- Start speaking - your words will be transcribed automatically
- Access the overlay URL (provided on the page) to display captions in OBS or other streaming software
CAPTION.Ninja leverages your browser's built-in speech recognition capabilities to perform real-time transcription:
- Your browser captures audio from your default microphone (or virtual audio device)
- Browser-based speech recognition converts the audio to text
- The text is sent through a websocket server to any connected overlay pages
- Overlay pages display the text with customizable formatting
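The four steps above can be sketched roughly as follows. This is a minimal illustration, not CAPTION.Ninja's actual source: the function name `relayCaptions` and the JSON message shape (`room`, `text`, `isFinal`) are assumptions made for the example. The recognizer and socket are passed in as arguments so the sketch can also run outside a browser.

```javascript
// Wire a speech recognizer to a socket (sketch only).
// In Chrome or Edge you would pass `new webkitSpeechRecognition()` and
// `new WebSocket(serverUrl)`. The message shape below is hypothetical.
function relayCaptions(recognition, socket, room, lang = "en-US") {
  recognition.lang = lang;
  recognition.continuous = true;      // keep listening across pauses
  recognition.interimResults = true;  // emit partial phrases as they form

  recognition.onresult = (event) => {
    // Only results from resultIndex onward changed since the last event.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      socket.send(JSON.stringify({
        room,
        text: result[0].transcript,  // the recognizer's best alternative
        isFinal: result.isFinal,     // false while the phrase is still changing
      }));
    }
  };

  recognition.start();
}
```

An overlay page would then listen on the same socket and render each `text` it receives for its room.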
The application runs entirely in your browser, so no software installation is required. Speech-to-text processing is handled by the browser's speech recognition service (typically Google's cloud service), while optional translation features use either Mozilla's free translation service or the Google Cloud Translation API.
For best results, use Google Chrome or Microsoft Edge. These browsers provide the most reliable speech recognition services.
Important Note: Firefox does not currently include free speech-to-text capabilities, making it unsuitable for the main transcription page. However, Firefox can still be used for displaying the overlay page.
Some users report Chrome has issues with text truncation, so Edge may provide more consistent results.
- Open CAPTION.Ninja in Chrome/Edge and allow microphone access
- Copy the overlay URL provided on the page
- Add the overlay URL as a Browser Source in OBS Studio, vMix, or similar software
- Customize the appearance using CSS as needed (see customization section below)
For desktop applications that need a captions overlay, use the Electron Capture app: https://github.com/steveseguin/electroncapture
This allows you to pin the captions on top of other applications on your desktop.
CAPTION.Ninja uses your system's default recording device. To capture audio from other sources, use a virtual audio cable, which lets you route audio from any application to CAPTION.Ninja:
- Install a virtual audio cable solution like VB-Audio Cable
- Set the virtual cable as your default recording device in your system sound settings
- Route audio from your desired source (media player, streaming site, etc.) to the virtual cable
- CAPTION.Ninja will now transcribe audio from any application sending to the virtual cable
This technique works for:
- YouTube or Twitch live streams
- Audio from video files
- System sounds
- Audio from other applications like Zoom or Teams
- Game audio
The virtual audio cable acts as a bridge between your audio sources and CAPTION.Ninja, effectively turning any audio into captions.
CAPTION.Ninja offers multiple ways to translate content:
Use https://caption.ninja/translate for real-time translation capabilities:
- Select source and target languages from the dropdown menus
- Browser-based transcription + Mozilla's free translation service
- Optional Google Cloud Translation integration for premium results
- Works with the same overlay system
A more efficient approach for multiple language support:
- Use the standard capture page (index.html) with your preferred input language
- Create multiple overlay pages with different target languages by adding the &translate=XX parameter
- Share these overlay URLs with viewers who need different languages
Example:
Main Capture: https://caption.ninja/?room=abc123&lang=en-US
English Overlay: https://caption.ninja/overlay?room=abc123
Spanish Overlay: https://caption.ninja/overlay?room=abc123&translate=es
French Overlay: https://caption.ninja/overlay?room=abc123&translate=fr
German Overlay: https://caption.ninja/overlay?room=abc123&translate=de
Benefits of this approach:
- Single transcription source with multiple translation outputs
- No need to run multiple browser tabs for different languages
- Lower resource usage on the broadcasting computer
- Viewers select their preferred language by accessing the appropriate URL
- Translation processing happens in the viewer's browser
Note: The translation quality using this method relies on the viewer's browser capabilities and may vary compared to the dedicated translation page.
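If you need many languages, the overlay URLs from the example above can be generated rather than typed by hand. A small sketch, assuming only the room and translate parameters shown in this section; the helper name `overlayUrls` is made up for illustration.

```javascript
// Build one overlay URL per target language for a shared room.
// An empty language code yields the untranslated overlay.
function overlayUrls(room, languages, base = "https://caption.ninja/overlay") {
  return languages.map((code) => {
    const url = new URL(base);
    url.searchParams.set("room", room);
    if (code) url.searchParams.set("translate", code);
    return url.toString();
  });
}

console.log(overlayUrls("abc123", ["", "es", "fr", "de"]).join("\n"));
// https://caption.ninja/overlay?room=abc123
// https://caption.ninja/overlay?room=abc123&translate=es
// https://caption.ninja/overlay?room=abc123&translate=fr
// https://caption.ninja/overlay?room=abc123&translate=de
```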
The default language is &lang=en-US. Change the language by adding a different language code parameter.
Supported language codes: https://cloud.google.com/speech-to-text/docs/languages
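When building capture links programmatically, it can help to normalize the language code first. The sketch below uses the standard `Intl.getCanonicalLocales` function to fix casing (for example `pt-br` becomes `pt-BR`) and to reject malformed tags. The helper name `captureUrl` is hypothetical, and a well-formed tag is not necessarily one the speech service supports, so check the list linked above.

```javascript
// Build a capture-page URL with a normalized language tag.
// Intl.getCanonicalLocales throws a RangeError for malformed tags.
function captureUrl(room, lang = "en-US") {
  const [tag] = Intl.getCanonicalLocales(lang);
  const url = new URL("https://caption.ninja/");
  url.searchParams.set("room", room);
  url.searchParams.set("lang", tag);
  return url.toString();
}

console.log(captureUrl("abc123", "pt-br"));
// https://caption.ninja/?room=abc123&lang=pt-BR
```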
For situations where automatic transcription isn't ideal, use manual text entry: https://caption.ninja/manual.html
This lets you type captions directly, which appear on the same overlay system.
You can customize the CSS in several ways:
- Self-host just the overlay.html file and modify it
- Use OBS Browser Source CSS overrides
- Use the following CSS as a starting point:
.output {
  margin: 0;
  background-color: #0000;
  color: white;
  font-family: Cousine, monospace;
  font-size: 3.2em;
  line-height: 1.1em;
  letter-spacing: 0.0em;
  padding: 0em;
  text-shadow: 0.05em 0.05em 0px rgb(0 0 0);
}
For non-standard fonts, you can use Base64 encoding:
- Find a font, like Atari ST 8x16 System Font
- Convert it to Base64 with a tool like WOFF to Base64 or Transfonter
- Apply the Base64 font to your OBS browser source CSS:
/* Transparent page background so only the captions show */
body {
  background-color: rgba(0, 0, 0, 0);
  margin: 0px auto;
  overflow: hidden;
}
/* Prefer the embedded font, falling back to the defaults */
.output {
  font-family: "Atari ST 8x16 System Font", Cousine, monospace;
}
/* Embedded font; the src value is the Base64-encoded font file */
@font-face {
  font-family: "Atari ST 8x16 System Font";
  font-weight: 100 900;
  font-style: normal;
  src: url(data:application/octet-stream;base64,AAEAAAAOAIAAAwBgRkZUTWXP4NIAAIdkAAAAHEdERUYADwAeAACHRAAAAB5PUy8yY0WLpAAAAWgAAABgY21hcJmJPykAAAPUAAAD7mN2dCAANQP1AAAHxAAAAARnYXNw//8AAwAAhzwAAAAIZ2x5Zpiad3sAAAnMAAB1NGhlYWT70........AAAwBgRkZUTWIKM=);
}
The base64 string will be quite long, which is normal.
Add &label=xxx to the capture page to give the outbound messages a label:
https://caption.ninja/?room=abc123&label=steve
For HTML-enabled labels, add &html to the overlay page:
https://caption.ninja/?room=abc123&label=<b>steve</b>
https://caption.ninja/overlay?room=abc123&html
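Characters such as < and > are not URL-safe, so when generating these links in code it is safer to percent-encode the label. The sketch below does this with the standard `URL` API; the helper name `labeledUrls` is made up, and it assumes the capture page decodes the label in the usual way when reading its query string.

```javascript
// Build a capture URL with an HTML label plus the matching overlay URL.
// searchParams.set percent-encodes characters such as < > and /.
function labeledUrls(room, labelHtml) {
  const capture = new URL("https://caption.ninja/");
  capture.searchParams.set("room", room);
  capture.searchParams.set("label", labelHtml);

  const overlay = new URL("https://caption.ninja/overlay");
  overlay.searchParams.set("room", room);

  // &html is a bare flag with no value, so it is appended as-is.
  return { capture: capture.toString(), overlay: overlay.toString() + "&html" };
}

const links = labeledUrls("abc123", "<b>steve</b>");
console.log(links.capture);
// https://caption.ninja/?room=abc123&label=%3Cb%3Esteve%3C%2Fb%3E
console.log(links.overlay);
// https://caption.ninja/overlay?room=abc123&html
```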
Specify how long messages stay visible with:
&showtime=5000
Time is in milliseconds. Setting it to 0 disables auto-hiding.
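To make the semantics concrete, here is a hypothetical sketch of auto-hide behavior: a caption is rendered, then cleared after `showtime` milliseconds unless a newer caption arrives first, and a value of 0 never clears it. This is an illustration of the parameter's meaning, not the overlay's actual code; `createAutoHide` and `render` are invented names.

```javascript
// Hypothetical auto-hide helper. `render` is any function that displays
// a string; `showtime` is in milliseconds, and 0 disables hiding.
function createAutoHide(render, showtime) {
  let timer = null;
  return function show(text) {
    clearTimeout(timer);  // a newer caption cancels the pending hide
    render(text);
    if (showtime > 0) {
      timer = setTimeout(() => render(""), showtime);
    }
  };
}

// Usage: log captions, clearing each one after 5 seconds.
// const show = createAutoHide((text) => console.log(text), 5000);
// show("Hello there");
```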
To save the transcription:
- Select all text (Ctrl+A)
- Copy the selected text (Ctrl+C)
- Paste into a text editor (Ctrl+V)
Alternatively, use the "Download transcription" button that appears during sessions.
Self-hosting is possible for free:
- Fork this GitHub repository
- Use GitHub Pages to host the website
- Modify the code as needed for custom styling, domain name, etc.
For additional privacy, deploy your own websocket server: https://github.com/steveseguin/websocket_server/
Note: The actual voice-to-text transcriptions typically use Google cloud servers, so full self-hosting of that component isn't possible in most cases. However, some devices (like Pixel smartphones) may do on-device voice-to-text.
The Mozilla-powered translation component can be deployed from https://github.com/mozilla/translate if you want the free translation component.
Free support is available at https://discord.vdo.ninja
Ask @steve for help in the #miscellaneous or #vdo-ninja-support channels.
For email support: steve@seguin.email (support is limited and not guaranteed)
I am not responsible if this app fails to work, for service violations, or for anything else. It is provided as-is without warranty or support. I do not take responsibility for any liability.
You are responsible for your own premium service API keys and fees.
Private data may be made available to Google, Microsoft, and other cloud providers, for the purpose of providing their services. Data is also sent over a hosted websocket channel, which can be publicly listened to by anyone if they know the session/room ID, but this hosted websocket server does not collect said messaging data -- it's just routed.
That said, things change, and problems occur, so you accept any risks of using this service.
Fonts are provided with their own license; Apache 2.0, I believe, but confirm this yourself.
The free translation component is powered by Mozilla Translate; https://github.com/mozilla/translate - MPL 2.0 - Mozilla
As for CAPTION.Ninja itself, in keeping with the spirit of what Mozilla has created, the code contributed here as part of this project is also made available under MPL 2.0.