Hermes, the messenger of the gods, now brings you ultra-fast video transcription powered by cutting-edge AI! This Python library and CLI tool harnesses the speed of Groq and the flexibility of multiple providers to convert your videos into text with unprecedented efficiency.
- Blazing Fast: Transcribe a 393-second video in just 1 second with Groq's distil-whisper model!
- Multi-Provider Support: Choose from Groq (default), MLX Whisper, or OpenAI for transcription
- YouTube Support: Easily transcribe YouTube videos by simply passing the URL
- Flexible: Support for various models and output formats
- Python Library & CLI: Use Hermes in your Python projects or directly from the command line
- LLM Processing: Process the transcription with an LLM for further analysis
If you're using Google Colab or a Linux system such as Ubuntu, you first need to install some additional system dependencies:

```bash
apt install libasound2-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg
```

(In Google Colab, prefix the command with `!`; on Ubuntu, you may need `sudo`.)
You can install Hermes directly from GitHub using pip. There are two installation options:
For most users, the standard installation without MLX support is recommended:
```bash
pip install git+https://github.com/unclecode/hermes.git@main
```
This installation includes all core features but excludes MLX-specific functionality.
If you're using a Mac with Apple silicon (where the MPS backend is available) and want to use MLX Whisper for local transcription, install Hermes with MLX support:
```bash
pip install "git+https://github.com/unclecode/hermes.git@main#egg=hermes[mlx]"
```

(The quotes keep shells like zsh from interpreting the `[mlx]` brackets.)
This installation includes all core features plus MLX Whisper support for local transcription.
Note: MLX support is currently only available on Apple silicon Macs (MPS). If you're unsure which version to install, start with the standard installation.
Hermes uses a configuration file to manage its settings. On first run, Hermes will automatically create a `.hermes` folder in your home directory and populate it with a default `config.yml` file.

You can customize Hermes' behavior by editing this file. Here's an example of what the `config.yml` might look like:
```yaml
# LLM (Language Model) settings
llm:
  provider: groq
  model: llama-3.1-8b-instant
  api_key: your_groq_api_key_here

# Transcription settings
transcription:
  provider: groq
  model: distil-whisper-large-v3-en
  api_key: your_groq_api_key_here

# Cache settings
cache:
  enabled: true
  directory: ~/.hermes/cache

# Source type for input (auto-detect by default)
source_type: auto
```
The configuration file is located at `~/.hermes/config.yml`. You can edit this file to change providers, models, API keys, and other settings.

Note: If you don't specify API keys in the config file, Hermes will look for them in your environment variables. For example, it will look for `GROQ_API_KEY` if you're using Groq as a provider.
To override the configuration temporarily, you can also use command-line arguments when running Hermes. These will take precedence over the settings in the config file.
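That precedence (CLI flags over `config.yml`) amounts to a simple dict merge; a minimal sketch, again illustrative rather than Hermes' real code:

```python
def effective_settings(config: dict, cli_args: dict) -> dict:
    """Command-line arguments take precedence over config-file values;
    unset CLI options (None) fall through to the config."""
    merged = dict(config)
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged

config = {"provider": "groq", "model": "distil-whisper-large-v3-en"}
cli_args = {"provider": None, "model": "whisper-large-v3"}
print(effective_settings(config, cli_args))
# → {'provider': 'groq', 'model': 'whisper-large-v3'}
```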
- Basic transcription:

```python
from hermes import transcribe

result = transcribe('path/to/your/video.mp4', provider='groq')
print(result['transcription'])
```

- Transcribe a YouTube video:

```python
result = transcribe('https://www.youtube.com/watch?v=PNulbFECY-I', provider='groq')
print(result['transcription'])
```

- Use a different model:

```python
result = transcribe('path/to/your/video.mp4', provider='groq', model='whisper-large-v3')
print(result['transcription'])
```

- Get JSON output:

```python
result = transcribe('path/to/your/video.mp4', provider='groq', response_format='json')
print(result['transcription'])
```

- Process with LLM:

```python
result = transcribe('path/to/your/video.mp4', provider='groq', llm_prompt="Summarize this transcription in 3 bullet points")
print(result['llm_processed'])
```
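Since `transcribe` is a plain function, batching several videos is just a loop. A sketch, using a stand-in callable in place of `hermes.transcribe` so it runs anywhere (the result shape follows the examples above):

```python
def transcribe_all(paths, transcriber):
    """Map each video path to its transcription text.
    `transcriber` is a callable like hermes.transcribe."""
    return {p: transcriber(p, provider='groq')['transcription'] for p in paths}

# Stand-in for hermes.transcribe, for illustration only:
fake = lambda p, provider: {'transcription': f'text of {p}'}
print(transcribe_all(['a.mp4', 'b.mp4'], fake))
# → {'a.mp4': 'text of a.mp4', 'b.mp4': 'text of b.mp4'}
```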
- Basic usage:

```bash
hermes path/to/your/video.mp4 -p groq
```

- Transcribe a YouTube video:

```bash
hermes https://www.youtube.com/watch?v=PNulbFECY-I -p groq
```

- Use a different model:

```bash
hermes path/to/your/video.mp4 -p groq -m whisper-large-v3
```

- Get JSON output:

```bash
hermes path/to/your/video.mp4 -p groq --response_format json
```

- Process with LLM:

```bash
hermes path/to/your/video.mp4 -p groq --llm_prompt "Summarize this transcription in 3 bullet points"
```
For a 393-second video:
| Provider | Model | Time (seconds) |
|---|---|---|
| Groq | distil-whisper-large-v3-en | 1 |
| Groq | whisper-large-v3 | 2 |
| MLX Whisper | distil-whisper-large-v3 | 11 |
| OpenAI | whisper-1 | 21 |
Test Hermes performance with different providers and models:

```bash
python -m hermes.benchmark path/to/your/video.mp4
```

or

```bash
python -m hermes.benchmark https://www.youtube.com/watch?v=PNulbFECY-I
```
This will generate a performance report for all supported providers and models.
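If you'd rather time a single call than run the full benchmark, a minimal sketch with `time.perf_counter` works for any callable (a placeholder stands in for `transcribe` here):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Placeholder callable standing in for transcribe(...):
result, elapsed = timed(lambda: {"transcription": "ok"})
print(result["transcription"], f"{elapsed:.3f}s")
```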
- Unmatched Speed: Groq's distil-whisper model transcribes 393 seconds of audio in just 1 second!
- Flexibility: Choose the provider that best suits your needs
- Easy Integration: Use as a Python library or CLI tool
- YouTube Support: Transcribe YouTube videos without manual downloads
- Local Option: Use MLX Whisper for fast, local transcription on Mac or MPS systems
- Cloud Power: Leverage Groq's LPU for the fastest cloud-based transcription
Huge shoutout to the @GroqInc team for their incredible distil-whisper model, making ultra-fast transcription a reality!
We're living in amazing times! Whether you need the lightning speed of Groq, the convenience of OpenAI, or the local power of MLX Whisper, Hermes has got you covered. Happy transcribing!