CAPTION.Ninja

A free-to-use captioning, transcription, and real-time translation tool for live streams, presentations, and more.

Demo video: https://www.youtube.com/watch?v=v7172QO8z6c

Quick Start Guide

Open https://caption.ninja in a supported browser (Chrome or Edge recommended)
Accept microphone permissions when prompted
Start speaking - your words will be transcribed automatically
Access the overlay URL (provided on the page) to display captions in OBS or other streaming software

How It Works

CAPTION.Ninja uses your browser's built-in speech recognition to transcribe speech in real time:

Your browser captures audio from your default microphone (or virtual audio device)
Browser-based speech recognition converts the audio to text
The text is sent through a websocket server to any connected overlay pages
Overlay pages display the text with customizable formatting

The application runs entirely in your browser - no software installation required. Speech-to-text processing is handled by Google's speech recognition services (through the browser), while optional translation features use either Mozilla's free translation service or Google Cloud Translation API.
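
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `buildCaptionMessage` helper, its JSON shape, and the `socket` argument are assumptions made for the example. Only the Web Speech API members (`SpeechRecognition`, `lang`, `continuous`, `interimResults`, `onresult`) and `socket.send` are standard browser APIs.

```javascript
// Hypothetical message shape; the real wire format is not documented here.
function buildCaptionMessage(room, text, isFinal) {
  return JSON.stringify({ room, text, final: isFinal });
}

// Start browser speech recognition and forward each result to `socket`,
// which is assumed to be an already-open WebSocket (or anything with send()).
function startCaptioning(room, socket, lang = "en-US") {
  // Chrome and Edge expose the API under a webkit prefix.
  const Recognition =
    globalThis.SpeechRecognition || globalThis.webkitSpeechRecognition;
  if (!Recognition) {
    throw new Error("Speech recognition is not available in this environment");
  }

  const recognition = new Recognition();
  recognition.lang = lang;
  recognition.continuous = true;     // keep listening across pauses
  recognition.interimResults = true; // also emit partial results

  recognition.onresult = (event) => {
    // Only results from resultIndex onward are new or changed.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      socket.send(buildCaptionMessage(room, result[0].transcript, result.isFinal));
    }
  };

  recognition.start();
  return recognition;
}
```

An overlay page would do the reverse: listen on the same socket and render each incoming text message.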

Browser Compatibility

For best results, use Google Chrome or Microsoft Edge. These browsers provide the most reliable speech recognition services.

Important Note: Firefox does not currently include free speech-to-text capabilities, making it unsuitable for the main transcription page. However, Firefox can still be used for displaying the overlay page.

Some users report that Chrome occasionally truncates text, so Edge may give more consistent results.

Setting Up for Streaming

Basic Setup

Open CAPTION.Ninja in Chrome/Edge and allow microphone access
Copy the overlay URL provided on the page
Add the overlay URL as a Browser Source in OBS Studio, vMix, or similar software
Customize the appearance using CSS as needed (see customization section below)

Using with Electron Capture

For desktop applications that need a caption overlay, use the Electron Capture app: https://github.com/steveseguin/electroncapture

This allows you to pin the captions on top of other applications on your desktop.

Using Non-Microphone Audio Sources

CAPTION.Ninja uses your system's default recording device. To capture audio from other sources, route that audio into a virtual recording device, as described below.

Virtual Audio Cable Method

Using a virtual audio cable allows you to route audio from any application to CAPTION.Ninja:

Install a virtual audio cable such as VB-Audio Cable
Set the virtual cable as your default recording device in your system sound settings
Route audio from your desired source (media player, streaming site, etc.) to the virtual cable
CAPTION.Ninja will now transcribe audio from any application that sends its output to the virtual cable

This technique works for:

YouTube or Twitch live streams
Audio from video files
System sounds
Audio from other applications like Zoom or Teams
Game audio

The virtual audio cable acts as a bridge between your audio sources and CAPTION.Ninja, effectively turning any audio into captions.

Translation Features

CAPTION.Ninja offers multiple ways to translate content:

Method 1: Dedicated Translation Page

Use https://caption.ninja/translate for real-time translation capabilities:

Select source and target languages from the dropdown menus
Combines browser-based transcription with Mozilla's free translation service
Optional Google Cloud Translation integration (paid, using your own API key) for higher-quality results
Works with the same overlay system

Method 2: Multiple Language Outputs from Single Source

A more efficient approach for multiple language support:

Use the standard capture page (index.html) with your preferred input language
Create multiple overlay pages with different target languages by adding the &translate=XX parameter
Share these overlay URLs with viewers who need different languages

Example:

Main Capture: https://caption.ninja/?room=abc123&lang=en-US
English Overlay: https://caption.ninja/overlay?room=abc123
Spanish Overlay: https://caption.ninja/overlay?room=abc123&translate=es
French Overlay: https://caption.ninja/overlay?room=abc123&translate=fr
German Overlay: https://caption.ninja/overlay?room=abc123&translate=de

Benefits of this approach:

Single transcription source with multiple translation outputs
No need to run multiple browser tabs for different languages
Lower resource usage on the broadcasting computer
Viewers select their preferred language by accessing the appropriate URL
Translation processing happens in the viewer's browser

Note: Translation quality with this method depends on the viewer's browser and may differ from the dedicated translation page.
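
When a session needs many languages, the overlay links can be generated rather than typed by hand. The sketch below rebuilds the example URLs above; the function names are made up for illustration, and the `room`, `lang`, and `translate` parameters are the ones shown in the examples.

```javascript
// Base address taken from the examples above; swap in your own host if self-hosting.
const BASE = "https://caption.ninja";

// Capture page URL: one speaker, one input language.
function captureUrl(room, lang = "en-US") {
  const url = new URL("/", BASE);
  url.searchParams.set("room", room);
  url.searchParams.set("lang", lang);
  return url.toString();
}

// Overlay URL; omit `translate` to show the captions untranslated.
function overlayUrl(room, translate) {
  const url = new URL("/overlay", BASE);
  url.searchParams.set("room", room);
  if (translate) url.searchParams.set("translate", translate);
  return url.toString();
}

// One overlay link per target language, keyed by language code.
function overlayLinks(room, languages) {
  return Object.fromEntries(
    languages.map((code) => [code, overlayUrl(room, code)])
  );
}

console.log(captureUrl("abc123"));
console.log(overlayLinks("abc123", ["es", "fr", "de"]));
```

Every generated overlay link shares the same room, so all of them are fed by the single capture page.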

Language Support

The default language is en-US. To change it, add &lang=XX-XX with the desired language code to the capture page URL (for example, &lang=fr-FR).

Supported language codes: https://cloud.google.com/speech-to-text/docs/languages

Manual Text Entry Mode

For situations where automatic transcription isn't ideal, use manual text entry: https://caption.ninja/manual.html

This lets you type captions directly, which appear on the same overlay system.

Customizing Appearance

Changing Font Size and Styling

You can customize the CSS in several ways:

Self-host just the overlay.html file and modify it
Use OBS Browser Source CSS overrides
Use the following CSS as a starting point:

.output {
    margin: 0;
    background-color: #0000;
    color: white;
    font-family: Cousine, monospace;
    font-size: 3.2em;
    line-height: 1.1em;
    letter-spacing: 0.0em;
    padding: 0em;
    text-shadow: 0.05em 0.05em 0px rgb(0 0 0);
}

Using Custom Fonts

For non-standard fonts, you can use Base64 encoding:

Find a font, such as Atari ST 8x16 System Font
Convert it to Base64 with a tool such as WOFF to Base64 or Transfonter
Apply the Base64 font to your OBS browser source CSS:

body {
  background-color: rgba(0, 0, 0, 0);
  margin: 0px auto;
  overflow: hidden;
}
.output {
  font-family: "Atari ST 8x16 System Font", Cousine, monospace;
}
@font-face {
  font-family: "Atari ST 8x16 System Font";
  font-weight: 100 900;
  font-style: normal;
  src: url(data:application/octet-stream;base64,AAEAAAAOAIAAAwBgRkZUTWXP4NIAAIdkAAAAHEdERUYADwAeAACHRAAAAB5PUy8yY0WLpAAAAWgAAABgY21hcJmJPykAAAPUAAAD7mN2dCAANQP1AAAHxAAAAARnYXNw//8AAwAAhzwAAAAIZ2x5Zpiad3sAAAnMAAB1NGhlYWT70........AAAwBgRkZUTWIKM=);
}

The base64 string will be quite long, which is normal.
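
If you would rather not use an online converter, the same rule can be produced locally. The following is a minimal Node.js sketch, not part of CAPTION.Ninja: `fontFaceCss` and its parameters are hypothetical names, and the `application/octet-stream` data-URI prefix simply mirrors the example above.

```javascript
// Build an @font-face rule that embeds the font bytes as a Base64 data URI.
// `fontBytes` can be a Buffer or Uint8Array holding the raw font file.
function fontFaceCss(family, fontBytes) {
  // Buffer is a Node.js global, so this sketch is meant to run in Node.
  const base64 = Buffer.from(fontBytes).toString("base64");
  return [
    "@font-face {",
    `  font-family: "${family}";`,
    `  src: url(data:application/octet-stream;base64,${base64});`,
    "}",
  ].join("\n");
}
```

In Node.js you could pass it the result of `fs.readFileSync` on your font file and paste the printed rule into the OBS browser source CSS.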

Additional Features

Adding Labels

Add &label=xxx to the capture page to give the outbound messages a label:

https://caption.ninja/?room=abc123&label=steve

For HTML-enabled labels, add &html to the overlay page:

https://caption.ninja/?room=abc123&label=<b>steve</b>
https://caption.ninja/overlay?room=abc123&html
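
Characters such as < and > are not safe to type into a URL directly, so it can help to build labeled links programmatically. This sketch assumes the capture page decodes the label with standard URL decoding; the function names are illustrative only.

```javascript
// Capture page URL with a label; URLSearchParams percent-encodes the markup.
function labeledCaptureUrl(room, label) {
  const url = new URL("https://caption.ninja/");
  url.searchParams.set("room", room);
  url.searchParams.set("label", label);
  return url.toString();
}

// Overlay URL with the bare &html flag, written by hand because
// URLSearchParams would serialize a valueless flag as "html=".
function htmlOverlayUrl(room) {
  return `https://caption.ninja/overlay?room=${encodeURIComponent(room)}&html`;
}

console.log(labeledCaptureUrl("abc123", "<b>steve</b>"));
// https://caption.ninja/?room=abc123&label=%3Cb%3Esteve%3C%2Fb%3E
console.log(htmlOverlayUrl("abc123"));
// https://caption.ninja/overlay?room=abc123&html
```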

Caption Display Time

Specify how long messages stay visible with:

&showtime=5000

The time is given in milliseconds. Setting it to 0 disables auto-hiding.
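
Because the value is in milliseconds, a small helper can avoid unit mistakes. This is a sketch under assumptions: `withShowtime` is a hypothetical name, and the usage example attaches the parameter to an overlay URL, which the text above does not state explicitly.

```javascript
// Add or replace &showtime on an existing URL. `seconds` is converted to
// milliseconds; passing 0 yields showtime=0, which disables auto-hiding.
function withShowtime(pageUrl, seconds) {
  if (!Number.isFinite(seconds) || seconds < 0) {
    throw new RangeError("seconds must be a non-negative number");
  }
  const url = new URL(pageUrl);
  url.searchParams.set("showtime", String(Math.round(seconds * 1000)));
  return url.toString();
}

console.log(withShowtime("https://caption.ninja/overlay?room=abc123", 5));
// https://caption.ninja/overlay?room=abc123&showtime=5000
```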

Saving Transcriptions

To save the transcription:

Select all text (Ctrl+A)
Copy the selected text (Ctrl+C)
Paste into a text editor (Ctrl+V)

Alternatively, use the "Download transcription" button that appears during sessions.

Self-Hosting

Self-hosting is possible for free:

Fork this GitHub repository
Use GitHub Pages to host the website
Modify the code as needed for custom styling, domain name, etc.

For additional privacy, deploy your own websocket server: https://github.com/steveseguin/websocket_server/

Note: The actual voice-to-text transcription typically runs on Google's cloud servers, so that component usually cannot be self-hosted. However, some devices (such as Pixel smartphones) may perform voice-to-text on-device.

The Mozilla-powered translation component can be deployed from https://github.com/mozilla/translate if you want the free translation component.

Need Support?

Free support is available at https://discord.vdo.ninja

Ask @steve for help in the #miscellaneous or #vdo-ninja-support channels.

For email support: steve@seguin.email (support is limited and not guaranteed)

Disclaimers

I am not responsible if this app fails to work, for any service violations, or for anything else. It is provided as-is, without warranty or support, and I accept no liability.

You are responsible for your own premium service API keys and fees.

Private data may be made available to Google, Microsoft, and other cloud providers so that they can provide their services. Data is also sent over a hosted websocket channel, which anyone who knows the session/room ID can listen to. The hosted websocket server does not collect this message data; it only routes it.

That said, things change and problems occur, so you accept any risks of using this service.

License

Fonts are provided under their own license. I believe it is Apache 2.0, but confirm this yourself.

The free translation component is powered by Mozilla Translate; https://github.com/mozilla/translate - MPL 2.0 - Mozilla

As for CAPTION.Ninja itself: in keeping with the spirit of what Mozilla has created, the code contributed as part of this project is also made available under MPL 2.0.
