# Hermes: Lightning-Fast Video Transcription Tutorial

## Introduction

Welcome to this tutorial on Hermes, a powerful Python library and CLI tool for lightning-fast video transcription! Developed by [@unclecode](https://twitter.com/unclecode) and powered by cutting-edge AI, Hermes leverages the speed of Groq and the flexibility of multiple providers (Groq, MLX Whisper, and OpenAI) to convert your videos into text.

Before we dive in, head over to the GitHub repo and show your support:

- **Star the repo:** https://github.com/unclecode/hermes
- **Follow me on X:** [@unclecode](https://twitter.com/unclecode)

## Installation

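Let's get Hermes installed! First install the system audio and FFmpeg packages (shown here for a Debian/Ubuntu environment such as Google Colab), then use pip to install Hermes directly from GitHub: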

In [4]:
!apt install libasound2-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libasound2-dev is already the newest version (1.2.6.1-1ubuntu1).
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
Suggested packages:
  portaudio19-doc
The following NEW packages will be installed:
  libportaudio2 libportaudiocpp0 portaudio19-dev
0 upgraded, 3 newly installed, 0 to remove and 45 not upgraded.
Need to get 188 kB of archives.
After this operation, 927 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libportaudio2 amd64 19.6.0-1.1 [65.3 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libportaudiocpp0 amd64 19.6.0-1.1 [16.1 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy/universe amd64 portaudio19-dev amd64 19.6.0-1.1 [106 kB]
Fetched 188 kB in 0s (578 kB/s)
Selecting previously unselected package libportaudio2:amd64.
(Reading database ... 123595 files and directories currently installed.)
Pre

In [1]:
!pip install git+https://github.com/unclecode/hermes.git@main

Collecting git+https://github.com/unclecode/hermes.git@main
  Cloning https://github.com/unclecode/hermes.git (to revision main) to /tmp/pip-req-build-xqallhm5
  Running command git clone --filter=blob:none --quiet https://github.com/unclecode/hermes.git /tmp/pip-req-build-xqallhm5
  Resolved https://github.com/unclecode/hermes.git to commit 1dde137d1f7b0c1eefab8d76353c68e1fe36b31b
  Preparing metadata (setup.py) ... done
Collecting yt-dlp>=2024.8.6 (from hermes==0.1.0)
  Downloading yt_dlp-2024.8.6-py3-none-any.whl.metadata (170 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 170.1/170.1 kB 1.4 MB/s eta 0:00:00
Collecting ffmpeg-python>=0.2.0 (from hermes==0.1.0)
  Downloading ffmpeg_python-0.2.0-py3-none-any.whl.metadata (1.7 kB)
Collecting openai>=1.42.0 (from hermes==0.1.0)
  Downloading openai-1.42.0-py3-none-any.whl.metadata (22 kB)
Collecting groq>=0.9.0 (from hermes==0.1.0)
  Downloading groq-0.9.0-py3-none-any.whl.met

## Python Library

### Basic Transcription

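Here's how to transcribe a local video file using the `transcribe` function with Groq as the provider. The cell reads the Groq API key from Colab's secrets store; outside Colab, set the `GROQ_API_KEY` environment variable yourself: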

In [1]:
import os
from google.colab import userdata
os.environ['GROQ_API_KEY'] = userdata.get('GROQ_API_KEY')
from hermes import transcribe

# Replace with the actual path to your video file
video_file = 'input.mp4'

result = transcribe(video_file, provider='groq')

# Print the transcription
print(result['transcription'])

 Hello, your beautiful people. This is Uncle Code and today I'm going to review quickly the Q1-2 function calling ability. This model is really, to me, is very interesting. First of all, it came up with the really good stuff with 72 billion models and you can take a look, check their blog and especially their instruct model. And also it supports the majority of the language in the world, the regions, the major regions. And that is great because this is a large language model sounds like all of us and it's cool and go and play around with it so what I'm what I'm trying to do is trying to challenge a little bit the function calling like similar the thing that I did for other models like the mistral so first they have this nice library the coin agents that led you to create agentic software and applications and also speed up the work with the large language model what you can just install it quickly and then there are different ways that you can work with the model you can maybe use Olama

**Explanation:**

- We import the `transcribe` function from the `hermes` library.
- We provide the path to our video file.
- We specify `provider='groq'` to use Groq's powerful transcription models.
- The `transcribe` function returns a dictionary containing the transcription and other metadata.
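
Because the result is a plain dictionary, a common next step is to write the transcription to disk. The snippet below is a minimal sketch and not part of Hermes: `save_transcription` is a hypothetical helper, and `sample_result` stands in for a real `transcribe(...)` return value so the code runs without an API key. It relies only on the `'transcription'` key shown above.

```python
import tempfile
from pathlib import Path


def save_transcription(result: dict, out_path: Path) -> Path:
    """Write result['transcription'] to out_path (hypothetical helper, not a Hermes API)."""
    # The sample output above begins with a space, so trim surrounding whitespace.
    text = result["transcription"].strip()
    out_path.write_text(text + "\n", encoding="utf-8")
    return out_path


# Stand-in for: result = transcribe(video_file, provider='groq')
sample_result = {"transcription": " Hello, you beautiful people. This is Uncle Code."}

# Write into a temporary directory so the example does not clutter the working directory.
out_dir = Path(tempfile.mkdtemp())
saved = save_transcription(sample_result, out_dir / "input.txt")
print(saved.read_text(encoding="utf-8"))
```

In a real notebook you would pass the dictionary returned by `transcribe` instead of `sample_result`.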

### Transcribing YouTube Videos

Transcribing YouTube videos is a breeze with Hermes. Simply pass the YouTube URL to the `transcribe` function:

In [2]:
from hermes import transcribe

youtube_url = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'  # Example URL

result = transcribe(youtube_url, provider='groq')
print(result['transcription'])

[youtube] Extracting URL: https://www.youtube.com/watch?v=dQw4w9WgXcQ
[youtube] dQw4w9WgXcQ: Downloading webpage
[youtube] dQw4w9WgXcQ: Downloading ios player API JSON
[youtube] dQw4w9WgXcQ: Downloading web creator player API JSON
[youtube] dQw4w9WgXcQ: Downloading player a87a9450
[youtube] dQw4w9WgXcQ: Downloading m3u8 information
[info] dQw4w9WgXcQ: Downloading 1 format(s): 251
[download] Destination: dQw4w9WgXcQ.webm
[download] 100% of    3.28MiB in 00:00:00 at 10.12MiB/s  
[ExtractAudio] Destination: dQw4w9WgXcQ.mp3
Deleting original file dQw4w9WgXcQ.webm (pass -k to keep)
 We're no strange to love. You know the rules, and so do I. I feel commitments want to thinking us. You wouldn't get this from any other guy I just want to tell you how I'm feeling Gotta make you understand Never gonna give you up Never gonna let you down Never gonna run around and desert you Never gonna make you cry Never gonna say goodbye Never gonna tell the lie And hurt you We've known each other for so long.

**Explanation:**

- Hermes handles the YouTube video download automatically.
- No need to manually download the video!
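
If your own script accepts both local paths and YouTube links, it can be handy to tell them apart before calling `transcribe`, for example to name output files after the video ID. The sketch below uses only the standard library. `youtube_video_id` is a hypothetical helper and does not reflect how Hermes itself detects URLs; it only recognizes the common `watch?v=` and `youtu.be` forms.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(source: str) -> Optional[str]:
    """Return the video ID if source looks like a YouTube URL, else None (hypothetical helper)."""
    parsed = urlparse(source)
    host = (parsed.hostname or "").lower()
    # Standard watch URLs carry the ID in the `v` query parameter.
    if host in {"youtube.com", "www.youtube.com", "m.youtube.com"} and parsed.path == "/watch":
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    # Short links carry the ID in the path.
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    return None


for source in ["https://www.youtube.com/watch?v=PNulbFECY-I", "input.mp4"]:
    kind = "YouTube video" if youtube_video_id(source) else "local file"
    print(f"{source} -> {kind}")
```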

### Using Different Models

Hermes supports various transcription models. You can specify the desired model using the `model` parameter:

In [3]:
from hermes import transcribe

video_file = 'input.mp4'

result = transcribe(video_file, provider='groq', model='whisper-large-v3')
print(result['transcription'])

 Hello, you beautiful people. This is Uncle Code. And today I'm going to review quickly the Q1.2 function calling ability. This model is really, to me, is very interesting. First of all, it came up with a really good stuff with 72 billion models. And you can take a look, check their blog and especially their instruct model. And also it supports the majority of the language in the world, the regions, the major regions, and that is great because this is a large language model. sounds like all of us. And it's cool, and go and play around with it. So what I'm trying to do is trying to challenge a little bit the function calling. Like similar to the thing that I did for other models like the Mistral. So first, they have this nice library, the Cohen agents, that let you to create agent-next softwares and applications and also speed up the work with the large language model. What you can just install it quickly. And then there are different ways that you can work with the model. You can maybe

**Explanation:**

- Here, we use the `whisper-large-v3` model instead of the default `distil-whisper` model.
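
The two models produce slightly different text for the same audio, as the sample outputs show. One way to see where they differ is a word-level diff with Python's standard `difflib` module. This is a rough sketch for eyeballing differences, not an accuracy metric such as word error rate; the two strings are short excerpts of the outputs shown in this tutorial, and both function names are made up for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def word_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] computed over whitespace-separated words."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def changed_words(a: str, b: str) -> List[Tuple[str, str]]:
    """Return (text_in_a, text_in_b) pairs for every span where the transcripts differ."""
    a_words, b_words = a.split(), b.split()
    matcher = SequenceMatcher(None, a_words, b_words)
    return [
        (" ".join(a_words[i1:i2]), " ".join(b_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Excerpts from the default-model and whisper-large-v3 outputs above.
default_text = "Hello, your beautiful people. This is Uncle Code and today I'm going to review"
large_text = "Hello, you beautiful people. This is Uncle Code. And today I'm going to review"

print(f"similarity: {word_similarity(default_text, large_text):.2f}")
for before, after in changed_words(default_text, large_text):
    print(f"{before!r} -> {after!r}")
```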

### JSON Output and LLM Processing

- To get the transcription in JSON format, set `response_format='json'`.
- To further process the transcription with an LLM (e.g., for summarization), use the `llm_prompt` parameter:

In [4]:
from hermes import transcribe

video_file = 'input.mp4'

# Get JSON output
result = transcribe(video_file, provider='groq', response_format='json')
print(result['transcription'])

# Summarize with LLM
result = transcribe(video_file, provider='groq', llm_prompt="Summarize this transcription in 3 bullet points")
print(result['llm_processed'])

{"text":" Hello, your beautiful people. This is Uncle Code and today I'm going to review quickly the Q1-2 function calling ability. This model is really, to me, is very interesting. First of all, it came up with the really good stuff with 72 billion models and you can take a look, check their blog and especially their instruct model. And also it supports the majority of the language in the world, the regions, the major regions. And that is great because this is a large language model sounds like all of us and it's cool and go and play around with it so what I'm what I'm trying to do is trying to challenge a little bit the function calling like similar the thing that I did for other models like the mistral so first they have this nice library the coin agents that led you to create agentic software and applications and also speed up the work with the large language model what you can just install it quickly and then there are different ways that you can work with the model you can maybe 

**Explanation:**

- LLM processing requires an API key for the LLM provider (e.g., Groq). Set it in your `~/.hermes/config.yml` file or as an environment variable.
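
Two small helpers can make this section easier to use in scripts. The sketch assumes, based on the printed output above, that `result['transcription']` is a JSON string with a `text` field when `response_format='json'`. It also accepts an already-parsed mapping in case that assumption is wrong. Both `extract_text` and `require_api_key` are hypothetical names, not Hermes functions.

```python
import json
import os
from typing import Mapping, Union


def extract_text(transcription: Union[str, Mapping]) -> str:
    """Return the `text` field from a JSON-format transcription (string or parsed mapping)."""
    if isinstance(transcription, str):
        transcription = json.loads(transcription)
    return transcription["text"].strip()


def require_api_key(name: str = "GROQ_API_KEY", env: Mapping[str, str] = os.environ) -> str:
    """Fail early with a clear message if the API key environment variable is missing."""
    value = env.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it or add it to your Hermes config.")
    return value


# Stand-in for result['transcription'] from a response_format='json' call.
sample = '{"text": " Hello, you beautiful people."}'
print(extract_text(sample))
```

Calling `require_api_key()` before a long transcription run surfaces a missing key immediately instead of after the audio has been downloaded and converted.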

## Command Line Interface (CLI)

Hermes also provides a convenient CLI for transcribing videos. Here are some examples:

### Basic Usage

In [5]:
!hermes input.mp4 -p groq

 Hello, your beautiful people. This is Uncle Code and today I'm going to review quickly the Q1-2 function calling ability. This model is really, to me, is very interesting. First of all, it came up with the really good stuff with 72 billion models and you can take a look, check their blog and especially their instruct model. And also it supports the majority of the language in the world, the regions, the major regions. And that is great because this is a large language model sounds like all of us and it's cool and go and play around with it so what I'm what I'm trying to do is trying to challenge a little bit the function calling like similar the thing that I did for other models like the mistral so first they have this nice library the coin agents that led you to create agentic software and applications and also speed up the work with the large language model what you can just install it quickly and then there are different ways that you can work with the model you can maybe use Olama

### YouTube Videos

In [7]:
!hermes https://www.youtube.com/watch?v=PNulbFECY-I -p groq

[youtube] Extracting URL: https://www.youtube.com/watch?v=PNulbFECY-I
[youtube] PNulbFECY-I: Downloading webpage
[youtube] PNulbFECY-I: Downloading ios player API JSON
[youtube] PNulbFECY-I: Downloading web creator player API JSON
[youtube] PNulbFECY-I: Downloading m3u8 information
[info] PNulbFECY-I: Downloading 1 format(s): 251
[download] Destination: PNulbFECY-I.webm
[download] 100% of    2.13MiB in 00:00:00 at 6.55MiB/s
[ExtractAudio] Destination: PNulbFECY-I.mp3
Deleting original file PNulbFECY-I.webm (pass -k to keep)
 Much is said about the virtues and pleasures of individuality, of being someone who stands out from the crowd and delights in their own particularity. But let's also admit to how frankly lonely and frightening it can be to find ourselves yet again in a peculiar minority where the differences between us and others strike us as bewildering rather than emboldening. When, for example, everyone seems to want to gossip, but we prefer generosity a

### Different Models

In [8]:
!hermes input.mp4 -p groq -m whisper-large-v3

 Hello, you beautiful people. This is Uncle Code. And today I'm going to review quickly the Q1.2 function calling ability. This model is really, to me, is very interesting. First of all, it came up with a really good stuff with 72 billion models. And you can take a look, check their blog and especially their instruct model. And also it supports the majority of the language in the world, the regions, the major regions, and that is great because this is a large language model. sounds like all of us. And it's cool, and go and play around with it. So what I'm trying to do is trying to challenge a little bit the function calling. Like similar to the thing that I did for other models like the Mistral. So first, they have this nice library, the Cohen agents, that let you to create agent-next softwares and applications and also speed up the work with the large language model. What you can just install it quickly. And then there are different ways that you can work with the model. You can maybe

### JSON Output

In [9]:
!hermes input.mp4 -p groq --response_format json

{"text":" Hello, your beautiful people. This is Uncle Code and today I'm going to review quickly the Q1-2 function calling ability. This model is really, to me, is very interesting. First of all, it came up with the really good stuff with 72 billion models and you can take a look, check their blog and especially their instruct model. And also it supports the majority of the language in the world, the regions, the major regions. And that is great because this is a large language model sounds like all of us and it's cool and go and play around with it so what I'm what I'm trying to do is trying to challenge a little bit the function calling like similar the thing that I did for other models like the mistral so first they have this nice library the coin agents that led you to create agentic software and applications and also speed up the work with the large language model what you can just install it quickly and then there are different ways that you can work with the model you can maybe 

### LLM Processing

In [10]:
!hermes input.mp4 -p groq --llm_prompt "Summarize this transcription in 3 bullet points"

 Hello, your beautiful people. This is Uncle Code and today I'm going to review quickly the Q1-2 function calling ability. This model is really, to me, is very interesting. First of all, it came up with the really good stuff with 72 billion models and you can take a look, check their blog and especially their instruct model. And also it supports the majority of the language in the world, the regions, the major regions. And that is great because this is a large language model sounds like all of us and it's cool and go and play around with it so what I'm what I'm trying to do is trying to challenge a little bit the function calling like similar the thing that I did for other models like the mistral so first they have this nice library the coin agents that led you to create agentic software and applications and also speed up the work with the large language model what you can just install it quickly and then there are different ways that you can work with the model you can maybe use Olama
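
To drive the CLI from a Python script, for instance to process a folder of videos, you can assemble the argument list programmatically. The sketch below uses only the flags that appear in the examples above (`-p`, `-m`, `--response_format`, `--llm_prompt`). `build_hermes_command` is a hypothetical helper, and the snippet only prints the command rather than running it.

```python
import shlex
from typing import List, Optional


def build_hermes_command(
    source: str,
    provider: str = "groq",
    model: Optional[str] = None,
    response_format: Optional[str] = None,
    llm_prompt: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the `hermes` CLI from the options shown in this tutorial."""
    cmd = ["hermes", source, "-p", provider]
    if model:
        cmd += ["-m", model]
    if response_format:
        cmd += ["--response_format", response_format]
    if llm_prompt:
        cmd += ["--llm_prompt", llm_prompt]
    return cmd


cmd = build_hermes_command("input.mp4", model="whisper-large-v3")
# shlex.join quotes arguments safely, which is useful for logging the exact command.
print(shlex.join(cmd))

# To execute it, pass the list to subprocess (requires Hermes and an API key):
# import subprocess
# completed = subprocess.run(cmd, check=True, capture_output=True, text=True)
# print(completed.stdout)
```

Passing a list to `subprocess.run` avoids shell quoting problems with prompts that contain spaces or quotes.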

## Conclusion

That's it! You've learned the basics of using Hermes for lightning-fast video transcription. Explore the different providers, models, and response formats to find what works best for your needs. Happy transcribing!

**Extra Comments:**

- Remember to replace the example video file paths and YouTube URLs with your actual content.
- Hermes has excellent performance, especially with Groq's `distil-whisper` model.
- Check out the `examples` folder in the GitHub repository for more advanced usage.
- Feel free to contribute to the project and report any issues you encounter.
- Don't forget to star the repo and follow [@unclecode](https://twitter.com/unclecode) on X!