# N46Whisper

N46Whisper is a Google Colab notebook application for streamlined video subtitle file generation. The original purpose of the project was to improve the productivity of Nogizaka46 (and Sakamichi groups) subbers, but it can also be used to create subtitles in general. The application can significantly reduce the labour and time costs of sub-groups or individual subbers. Despite its impressive performance, however, the Whisper model, AI translation and the application itself are not without limitations.


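N46Whisper is built on Google Colab. It was originally developed to speed up Japanese video subtitling for Nogizaka46 (and Sakamichi series) fansub groups, but it is equally suitable for subtitling any foreign-language video. The goal of this application is not to produce perfect subtitle files, but to provide a simple, automated platform that saves the time and effort needed to produce finished subtitles. The Whisper model has its own limits on where it applies, and the quality of AI translation is still not fully satisfactory.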

<font size='4'>**Chinese-speaking users are advised to read the [FAQ](https://github.com/Ayanaminn/N46Whisper/blob/main/FAQ.md) before use. If you find this application helpful, please help spread it to more people.**</font>


<font size='4'>**Contact me: [E-mail](mailto:admin@ikedateresa.cc)**</font>


## What's Latest:
Update history

<font size = '3'>**This project is no longer maintained or updated. Thank you all for your help and support.**</font>
</br></br>

2024.4.17:
* Add the option to translate with the Google Gemini API.

2024.1.31:
* The growing number of integrated parameter options was making the workflow cumbersome, which goes against the original intent. A [lite version](https://colab.research.google.com/github/Ayanaminn/N46Whisper/blob/dev/N46WhisperLite.ipynb) has therefore been split off for testing; it keeps only the minimum necessary steps.

2023.12.4:
* Support the faster-whisper based Whisper V3 model.

2023.11.7:
* Enable users to load the latest Whisper V3 model.
* Enable a customizable beam size parameter.

2023.4.30:
* Refine the translation prompt.
* Allow users to set a custom prompt and temperature for translation.
* Display the tokens used and the total cost of the translation task.

2023.4.15:
* Reimplement Whisper on top of faster-whisper to improve efficiency and save resources.
* Enable the VAD filter integrated in faster-whisper to improve transcription accuracy.

**<font size='5'>Run only one of the file selection methods below as needed; there is no need to run them all.</font>**

In [1]:
!pip install -U gdown
import gdown

# Download the shared file with gdown
file_id = "1w7ZYbHv2mRNBOH0r0JkGvAZcrKkEZCMF"
gdown.download(f"https://drive.google.com/uc?id={file_id}", output="downloaded_file", quiet=False)



Downloading...
From: https://drive.google.com/uc?id=1w7ZYbHv2mRNBOH0r0JkGvAZcrKkEZCMF
To: /content/downloaded_file
100%|██████████| 50.0M/50.0M [00:00<00:00, 55.9MB/s]


'downloaded_file'
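
The cell above hard-codes a Drive file ID. When starting from a share link instead, the ID has to be extracted first. The sketch below is one way to do that; `drive_file_id` and `download_url` are hypothetical helpers (not part of gdown or N46Whisper), and the code assumes only the two common link shapes, `/file/d/<id>/...` and `?id=<id>`.

```python
import re
from urllib.parse import urlparse, parse_qs


def drive_file_id(url: str) -> str:
    """Return the file ID embedded in a Google Drive share link.

    Hypothetical helper: handles only the /file/d/<id>/ and ?id=<id> shapes.
    """
    parsed = urlparse(url)
    # Shape 1: https://drive.google.com/file/d/<id>/view?usp=sharing
    match = re.search(r"/file/d/([^/]+)", parsed.path)
    if match:
        return match.group(1)
    # Shape 2: https://drive.google.com/uc?id=<id>
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]
    raise ValueError(f"no Drive file ID found in {url!r}")


def download_url(file_id: str) -> str:
    """Build the direct-download URL form used in the cell above."""
    return f"https://drive.google.com/uc?id={file_id}"
```

The result of `download_url(drive_file_id(link))` can then be passed to `gdown.download` in place of the hard-coded URL.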

**<font size='5'>Click the "Run" icon to the left of each cell below in order; do not skip any step.</font>**
**</br>【IMPORTANT】:** Make sure you select GPU as the hardware accelerator under "Edit" -> "Notebook settings" -> "Hardware accelerator"; otherwise processing will be very slow.
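
Before running the long transcription cell, it can be worth confirming that the runtime really has a GPU attached. The following is a minimal sketch, assuming an NVIDIA GPU exposed through the `nvidia-smi` tool; `gpu_summary` is a hypothetical helper, not something the notebook defines.

```python
import shutil
import subprocess


def gpu_summary() -> str:
    """Return the GPU name reported by nvidia-smi, or an empty string if none."""
    if shutil.which("nvidia-smi") is None:
        return ""  # no NVIDIA tools on PATH: most likely a CPU-only runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else ""


name = gpu_summary()
print(f"GPU detected: {name}" if name else "No GPU detected; switch the runtime to GPU.")
```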

In [2]:
# Install dependencies (comment out if already installed)
!pip install ffmpeg pysubs2 faster-whisper ctranslate2==4.4.0
!wget https://raw.githubusercontent.com/Ayanaminn/N46Whisper/main/srt2ass.py

# System dependencies: swap cuDNN 9 for cuDNN 8 to match the pinned ctranslate2 4.4.0
!apt remove --purge -y libcudnn9 libcudnn9-dev
!apt autoremove -y
!apt update
!apt install -y libcudnn8 libcudnn8-dev

import torch
from faster_whisper import WhisperModel
from IPython.display import clear_output
import os, time, pysubs2, zipfile
from tqdm import tqdm
from google.colab import files
from pathlib import Path
from srt2ass import srt2ass

clear_output()
print('准备识别模型 Ready to run Whisper...')

# Use the file downloaded above
file_path = "downloaded_file"

# Base name and output directory derived from the input file
file_basename = Path(file_path).stem
output_dir = Path(file_path).parent.resolve()

# Load the model
model = WhisperModel('large-v3')
torch.cuda.empty_cache()

print('识别中 Transcribe in progress...')
tic = time.time()

segments, info = model.transcribe(
    audio=file_path,
    language='ja',
    vad_filter=False,
    beam_size=3,
    # Only takes effect when vad_filter=True
    vad_parameters={"min_silence_duration_ms": 200}
)

# Collect the segments; faster-whisper yields them lazily, so this loop drives the transcription
total_duration = round(info.duration, 2)
results = []
with tqdm(total=total_duration, unit=" seconds") as pbar:
    for s in segments:
        segment_dict = {'start': s.start, 'end': s.end, 'text': s.text}
        results.append(segment_dict)
        pbar.update(s.end - s.start)

# Write the SRT file
srt_filename = 'false-3-200' + ".srt"
subs = pysubs2.load_from_whisper(results)
subs.save(srt_filename)

# Download the file
files.download(srt_filename)


toc = time.time()
print(f"✅ 字幕生成完毕！耗时 {toc - tic:.2f} 秒")

准备识别模型 Ready to run Whisper...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


识别中 Transcribe in progress...


 89%|████████▉ | 11152.639999999914/12499.03 [20:38<02:29,  9.00 seconds/s]



✅ 字幕生成完毕！耗时 1277.41 秒
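
The cell above hands the collected `{'start', 'end', 'text'}` dictionaries to pysubs2, which writes the SRT file. To make the target format concrete, here is a dependency-free sketch of the same conversion. It is an illustration, not how pysubs2 is implemented, and `srt_timestamp` and `to_srt` are hypothetical names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render a list of {'start', 'end', 'text'} dicts as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues
    return "\n".join(blocks)


print(to_srt([{"start": 0.0, "end": 1.25, "text": " こんにちは"}]))
```

Each cue is a running index, a `start --> end` line with millisecond precision, and the text.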


In [None]:
#@title **Experimental Features:**

# @markdown **AI Translation:**
# @markdown **</br>**<font size="2"> This feature allows users to translate previously transcribed subtitle text line by line using an AI translation service,
# @markdown **</br>**then generate bilingual subtitle files in the same subtitle style. Read the project documentation to learn more.</font>

# @markdown **</br>**Users who want to translate subtitles locally may want to try [subtitle-translator-electron](https://github.com/gnehs/subtitle-translator-electron)

# @markdown **</br><font size="3">Select subtitle file source (use_transcribed: the transcription from the previous step; upload_new: a newly uploaded file)</br>**
# @markdown <font size="2">SRT and ASS files are supported
sub_source = "upload_new"  # @param ["use_transcribed","upload_new"]

# @markdown **chatGPT:**
# @markdown **</br>**<font size="2"> To translate with chatGPT, fill in your own OpenAI API key, the target language and the output format, then execute this cell.</font>
# @markdown **</br>**<font size="2">【Note】The free API tier is rate limited, so translation can take a long time; consider a paid plan to speed it up.</font>
openai_key = '' # @param {type:"string"}
target_language = 'zh-hans'# @param ["zh-hans","english"]
prompt = "You are a language expert. Your task is to translate the input subtitle text, sentence by sentence, into the user specified target language. Please utilize the context to improve the accuracy and quality of translation. Please be aware that the input text could contain typos and grammar mistakes, utilize the context to correct the translation. Please return only translated content and do not include the origin text. Please do not use any punctuation around the returned text. Please do not translate people's name and leave it as original language." # @param {type:"string"}
temperature = 0.6 #@param {type:"slider", min:0, max:1.0, step:0.1}
# @markdown <font size="4">Default prompt: </br>
# @markdown ```You are a language expert.```</br>
# @markdown ```Your task is to translate the input subtitle text, sentence by sentence, into the user specified target language.```</br>
# @markdown ```Please utilize the context to improve the accuracy and quality of translation.```</br>
# @markdown ```Please be aware that the input text could contain typos and grammar mistakes, utilize the context to correct the translation.```</br>
# @markdown ```Please return only translated content and do not include the origin text.```</br>
# @markdown ```Please do not use any punctuation around the returned text.```</br>
# @markdown ```Please do not translate people's name and leave it as original language.```</br>
output_format = "ass"  # @param ["ass","srt"]

import re
import time
from pathlib import Path
from tqdm import tqdm
from google.colab import files
from IPython.display import clear_output

clear_output()

if sub_source == 'upload_new':
  uploaded = files.upload()
  sub_name = list(uploaded.keys())[0]
  sub_basename = Path(sub_name).stem
elif sub_source == 'use_transcribed':
  # Reuse the SRT file written by the transcription cell above (srt_filename)
  sub_name = srt_filename
  sub_basename = Path(srt_filename).stem

!pip install openai
!pip install pysubs2
from openai import OpenAI
import pysubs2

clear_output()

class ChatGPTAPI():
    def __init__(self, key, language, prompt, temperature):
        self.key = key
        # self.keys = itertools.cycle(key.split(","))
        self.language = language
        self.key_len = len(key.split(","))
        self.prompt = prompt
        self.temperature = temperature


    # def rotate_key(self):
    #     openai.api_key = next(self.keys)

    def _request(self, text):
        # Send one chat completion request and return the raw response
        client = OpenAI(api_key=self.key)
        return client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                # English system prompt to save tokens
                {"role": "system", "content": self.prompt},
                {
                    "role": "user",
                    "content": f"Original text:`{text}`. Target language: {self.language}"
                },
            ],
            temperature=self.temperature
        )

    def translate(self, text):
        try:
            completion = self._request(text)
        except Exception as e:
            # Most likely the API rate limit; wait, then retry once.
            # A paid plan reduces the waiting time.
            sleep_time = int(60 / self.key_len)
            print(e, f"will sleep {sleep_time} seconds")
            time.sleep(sleep_time)
            completion = self._request(text)
        t_text = completion.choices[0].message.content
        total_tokens = completion.usage.total_tokens  # prompt_tokens + completion_tokens
        return t_text, total_tokens


class SubtitleTranslator():
    def __init__(self, sub_src, model, key, language, prompt,temperature):
        self.sub_src = sub_src
        self.translate_model = model(key, language,prompt,temperature)
        self.translations = []
        self.total_tokens = 0

    def calculate_price(self,num_tokens):
        price_per_token = 0.000002  # gpt-3.5-turbo: $0.002 / 1K tokens
        return num_tokens * price_per_token

    def translate_by_line(self):
        sub_trans = pysubs2.load(self.sub_src)
        total_lines = len(sub_trans)
        for line in tqdm(sub_trans,total = total_lines):
            line_trans, tokens_per_task = self.translate_model.translate(line.text)
            line.text += (r'\N'+ line_trans)
            print(line_trans)
            self.translations.append(line_trans)
            self.total_tokens += tokens_per_task

        return sub_trans, self.translations, self.total_tokens


clear_output()

translate_model = ChatGPTAPI

assert translate_model is not None, "unsupported model"
OPENAI_API_KEY = openai_key

if not OPENAI_API_KEY:
    raise Exception(
        "OpenAI API key not provided, please google how to obtain it"
    )
# else:
#     OPENAI_API_KEY = openai_key

t = SubtitleTranslator(
    sub_src=sub_name,
    model= translate_model,
    key = OPENAI_API_KEY,
    language=target_language,
    prompt=prompt,
    temperature=temperature)

translation, _, total_token = t.translate_by_line()
total_price = t.calculate_price(total_token)
#Download ass file

if output_format == 'ass':
  translation.save(sub_basename + '_translation.ass')
  files.download(sub_basename + '_translation.ass')
elif output_format == 'srt':
  translation.save(sub_basename + '_translation.srt')
  files.download(sub_basename + '_translation.srt')



print('双语字幕生成完毕 All done!')
print(f"Total number of tokens used: {total_token}")
print(f"Total price (USD): ${total_price:.4f}")

# @markdown **</br>**<font size='3'>**The experimental features are also meant to help everyone make subtitles more efficiently. This application can only keep improving with feedback from everyday users, so if you have any thoughts, feel free to contact me in any way, open an [issue](https://github.com/Ayanaminn/N46Whisper/issues) or share them in the [discussions](https://github.com/Ayanaminn/N46Whisper/discussions).**
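
The cell above waits a fixed interval and retries a failed request exactly once. A common alternative is to retry several times with exponentially growing delays. The sketch below shows that pattern in isolation; `call_with_retry` and its parameters are hypothetical and are not part of the OpenAI client or of this notebook.

```python
import time


def call_with_retry(fn, attempts=4, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn() until it succeeds or the attempts are exhausted.

    Waits base_delay, 2*base_delay, 4*base_delay, ... (capped at max_delay)
    between tries and re-raises the last error if every try fails.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            sleep(min(base_delay * 2 ** attempt, max_delay))
```

A per-line call could then be wrapped as `call_with_retry(lambda: api.translate(line.text))`, where `api` stands for a `ChatGPTAPI` instance. The `sleep` argument exists only so the waiting can be replaced in tests.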

In [None]:
#@title **Experimental Features:**

# @markdown **Google Gemini AI Translation:**
# @markdown **</br>**<font size="2"> Since Google Gemini provides a free-tier API, users who want free AI translation can execute this cell
# @markdown **</br>**to generate bilingual subtitle files in the same subtitle style. Read the project documentation to learn more.</font>

# @markdown **Attention: running Whisper and Gemini API translation in the same session may cause runtime errors; we recommend disconnecting the runtime and then re-running this cell.**
output_format = "ass"  # @param ["ass","srt"]

# @markdown **Google Gemini:**
# @markdown **</br>**<font size="2"> To translate with Gemini, fill in your own Gemini API key, the target language and the output format, then execute this cell.</font>

google_api_key = '' # @param {type:"string"}
target_language = 'zh-hans'# @param ["zh-hans","english"]
prompt = "You are a language expert. Your task is to translate the input subtitle text, sentence by sentence, into the user specified target language. Please utilize the context to improve the accuracy and quality of translation. Please be aware that the input text could contain typos and grammar mistakes, utilize the context to correct the translation. Please return only translated content and do not include the origin text. Please do not use any punctuation around the returned text. Please do not translate people's name and leave it as original language." # @param {type:"string"}
temperature = 0.6 #@param {type:"slider", min:0, max:1.0, step:0.1}
# @markdown <font size="4">Default prompt: </br>
# @markdown ```You are a language expert.```</br>
# @markdown ```Your task is to translate the input subtitle text, sentence by sentence, into the user specified target language.```</br>
# @markdown ```Please utilize the context to improve the accuracy and quality of translation.```</br>
# @markdown ```Please be aware that the input text could contain typos and grammar mistakes, utilize the context to correct the translation.```</br>
# @markdown ```Please return only translated content and do not include the origin text.```</br>
# @markdown ```Please do not use any punctuation around the returned text.```</br>
# @markdown ```Please do not translate people's name and leave it as original language.```</br>

import re
import time
from pathlib import Path
from tqdm import tqdm
from google.colab import files
from IPython.display import clear_output

uploaded = files.upload()
sub_name = list(uploaded.keys())[0]
sub_basename = Path(sub_name).stem

clear_output()

!pip install -q -U google-generativeai
!pip install pysubs2

import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold
import pysubs2

genai.configure(api_key=google_api_key)

model = genai.GenerativeModel('gemini-pro')

def translate(prompt, language, text, retry_times=0):
    try:
        completion = model.generate_content(
            f"{prompt}\nTarget language: {language}\nText: {text}",
            safety_settings={
                HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
                HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
                HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
                HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
            },
        )
        t_text = completion.text
        print(t_text)
        return t_text
    except Exception as e:
        if re.search(r"^429 POST|Remote end closed connection|Connection reset by peer", str(e), flags=re.I) and retry_times < 6:
            # Rate limit or dropped connection: wait briefly (at most 3 seconds) and retry
            print("The translation interface request is too frequent or disconnected. Will try again. Number of retries: %d" % (retry_times + 1))
            time.sleep(min(retry_times + 1, 3))
            return translate(prompt, language, text, retry_times + 1)
        else:
            # Not a known transient failure: report it and return a placeholder line
            print("Unknown error, please check the error message and open an issue in the repository:", e)
            return ">>>>> UnknownError"


class SubtitleTranslator():
    def __init__(self, sub_src):
        self.sub_src = sub_src
        self.translations = []

    def translate_by_line(self):
        sub_trans = pysubs2.load(self.sub_src)
        total_lines = len(sub_trans)
        for line in tqdm(sub_trans,total = total_lines):
            line_trans = translate(prompt, target_language, line.text)
            line.text += (r'\N'+ line_trans)
            print(line_trans)
            self.translations.append(line_trans)
        return sub_trans, self.translations

clear_output()

if not google_api_key:
    raise Exception(
        "Google Gemini API key not provided, please google how to obtain it"
    )

t = SubtitleTranslator(sub_src=sub_name)

translation, _ = t.translate_by_line()

if output_format == 'ass':
  translation.save(sub_basename + '_translation.ass')
  files.download(sub_basename + '_translation.ass')
elif output_format == 'srt':
  translation.save(sub_basename + '_translation.srt')
  files.download(sub_basename + '_translation.srt')

print('双语字幕生成完毕 All done!')
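
The translation cells build each bilingual line by appending `\N` (the ASS hard line break) and the translated text to the original line. The prompt asks the model not to wrap its answer in punctuation, but a model may still do so. The sketch below shows one possible clean-up step before merging. `clean_translation`, `make_bilingual` and the list of quote pairs are assumptions made for illustration, not part of the notebook.

```python
# Quote pairs that are removed only when they wrap the whole translation
QUOTE_PAIRS = [('"', '"'), ("'", "'"), ("`", "`"), ("“", "”")]


def clean_translation(text: str) -> str:
    """Flatten whitespace and drop one pair of quotes wrapping the whole text."""
    cleaned = " ".join(text.split())
    for left, right in QUOTE_PAIRS:
        if len(cleaned) >= 2 and cleaned.startswith(left) and cleaned.endswith(right):
            return cleaned[len(left):-len(right)].strip()
    return cleaned


def make_bilingual(original: str, translated: str) -> str:
    """Put the translation on a second line using the ASS hard line break."""
    return original + r"\N" + clean_translation(translated)
```

Flattening newlines matters because a raw newline inside the translated text would otherwise end up inside a single subtitle event.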

In [None]:
#@title **Experimental Features:**

# @markdown **Ollama AI Translation:**
# @markdown **</br>**<font size="2"> Since Ollama can run locally deployed LLMs, users who want free AI translation can execute the following cells
# @markdown **</br>**to generate bilingual subtitle files in the same subtitle style. Read the project documentation to learn more.</font>

In [None]:
#@title **Importing Ollama Library**

!sudo apt update

!sudo apt install -y pciutils

!curl https://ollama.ai/install.sh | sh

In [None]:
#@title **Starting Ollama Server**


import threading
import subprocess
import time

def run_ollama_serve():
  subprocess.Popen(["ollama", "serve"])

thread = threading.Thread(target=run_ollama_serve)
thread.start()
time.sleep(5)
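
The cell above sleeps for a fixed five seconds and assumes the server is up afterwards. An alternative is to poll until the server's port accepts connections. This is a minimal sketch, assuming Ollama listens on its default address `127.0.0.1:11434`; `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=11434, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not accepting connections yet: give up at the deadline, else wait and retry
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` in place of `time.sleep(5)` returns as soon as the server is reachable, and a `False` result signals that it never started.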

In [None]:
#@title **Pulling Ollama LLM**

# @markdown **Choose the LLM to deploy here**

model_type = "deepseek-llm"  # @param ["llama3.2:3b", "deepseek-r1:7b", "deepseek-llm"]

!ollama pull $model_type

In [None]:
#@title **Translating The Subtitles**

!pip install -U langchain-ollama

from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM

import re
import time
from pathlib import Path
from tqdm import tqdm
from google.colab import files
from IPython.display import clear_output

uploaded = files.upload()
sub_name = list(uploaded.keys())[0]
sub_basename = Path(sub_name).stem

clear_output()

!pip install pysubs2

import pysubs2

output_format = "ass"  # @param ["ass","srt"]

target_language = 'zh-hans'# @param ["zh-hans","english"]
template = "You are a language expert. Your task is to translate the input subtitle text, sentence by sentence, into {target_language}. Please utilize the context to improve the accuracy and quality of translation. Please be aware that the input text could contain typos and grammar mistakes, utilize the context to correct the translation. Please return only translated content and do not include the origin text. Do not return your thinking process and only return the translated text. Please do not use any punctuation around the returned text. Please do not translate people's name and leave it as original language. Here is the text to translate: {text}" # @param {type:"string"}

# @markdown <font size="4">Default prompt: </br>
# @markdown ```You are a language expert.```</br>
# @markdown ```Your task is to translate the input subtitle text, sentence by sentence, into {target_language}.```</br>
# @markdown ```Please utilize the context to improve the accuracy and quality of translation.```</br>
# @markdown ```Please be aware that the input text could contain typos and grammar mistakes, utilize the context to correct the translation.```</br>
# @markdown ```Please return only translated content and do not include the origin text.```</br>
# @markdown ```Do not return your thinking process and only return the translated text.```</br>
# @markdown ```Please do not use any punctuation around the returned text.```</br>
# @markdown ```Please do not translate people's name and leave it as original language.```</br>
# @markdown ```Here is the text to translate: {text}```</br>

llm = OllamaLLM(model=model_type)
prompt = ChatPromptTemplate.from_template(template)
chain = prompt | llm

def translate(prompt, language, text):
    # `prompt` is unused here; the template is already bound into `chain`
    t_text = chain.invoke({"target_language": language, "text": text})
    print(t_text)
    return t_text

class SubtitleTranslator():
    def __init__(self, sub_src):
        self.sub_src = sub_src
        self.translations = []

    def translate_by_line(self):
        sub_trans = pysubs2.load(self.sub_src)
        total_lines = len(sub_trans)
        for line in tqdm(sub_trans,total = total_lines):
            line_trans = translate(prompt, target_language, line.text)
            line.text += (r'\N'+ line_trans)
            print(line_trans)
            self.translations.append(line_trans)
        return sub_trans, self.translations

clear_output()

t = SubtitleTranslator(sub_src=sub_name)

translation, _ = t.translate_by_line()

if output_format == 'ass':
  translation.save(sub_basename + '_translation.ass')
  files.download(sub_basename + '_translation.ass')
elif output_format == 'srt':
  translation.save(sub_basename + '_translation.srt')
  files.download(sub_basename + '_translation.srt')

print('双语字幕生成完毕 All done!')
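
The prompt above tells the model not to return its thinking process, but reasoning models such as `deepseek-r1` may still emit it. The sketch below assumes the reasoning arrives wrapped in `<think>...</think>` tags and removes it before the text is merged into the subtitle. `strip_reasoning` is a hypothetical helper.

```python
import re

# Non-greedy match across newlines, so each <think>...</think> span is removed separately
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> spans and the surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()
```

One way to apply it would be to wrap the `chain.invoke(...)` result in `translate` with `strip_reasoning(...)`.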