<a href="https://colab.research.google.com/github/yoko-murasame/Cesium-Learning/blob/master/autotranslate.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Videos Transcription and Translation with Faster Whisper and ChatGPT**


[![notebook shield](https://img.shields.io/static/v1?label=&message=Notebook&color=blue&style=for-the-badge&logo=googlecolab&link=https://colab.research.google.com/github/lewangdev/autotranslate/blob/main/autotranslate.ipynb)](https://colab.research.google.com/github/lewangdev/autotranslate/blob/main/autotranslate.ipynb)
[![repository shield](https://img.shields.io/static/v1?label=&message=Repository&color=blue&style=for-the-badge&logo=github&link=https://github.com/lewangdev/autotranslate)](https://github.com/lewangdev/autotranslate)

This Notebook will guide you through the transcription and translation of video using [Faster Whisper](https://github.com/guillaumekln/faster-whisper) and ChatGPT. You'll be able to explore most inference parameters or use the Notebook as-is to store the transcript, translation and video audio in your Google Drive.

In [1]:
#@markdown # **Check GPU type** 🕵️

#@markdown The type of GPU assigned to your Colab session determines how fast the video is transcribed.
#@markdown The higher the number of floating point operations per second (FLOPS), the faster the transcription.
#@markdown But even the least powerful GPU available in Colab is able to run any Whisper model.
#@markdown Make sure you've selected `GPU` as hardware accelerator for the Notebook (Runtime &rarr; Change runtime type &rarr; Hardware accelerator).

#@markdown |  GPU   |  GPU RAM   | FP32 teraFLOPS |     Availability   |
#@markdown |:------:|:----------:|:--------------:|:------------------:|
#@markdown |  T4    |    16 GB   |       8.1      |         Free       |
#@markdown | P100   |    16 GB   |      10.6      |      Colab Pro     |
#@markdown | V100   |    16 GB   |      15.7      |  Colab Pro (Rare)  |

#@markdown ---
#@markdown **Factory reset your Notebook's runtime if you want to get assigned a new GPU.**

!nvidia-smi -L

!nvidia-smi

GPU 0: Tesla T4 (UUID: GPU-b3f38314-d6a0-6e28-24fd-5af8c5906530)
Sun Feb 23 09:16:29 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   37C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+-------
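
If you'd rather read the assigned GPU from Python than compare the output with the table by eye, the line printed by `nvidia-smi -L` can be parsed with a regular expression. This is only a sketch: `parse_gpu_name`, `lookup_tflops` and `FP32_TFLOPS` are hypothetical names, and the throughput figures are copied from the table in the cell above.

```python
import re

# FP32 teraFLOPS copied from the table in the cell above.
FP32_TFLOPS = {"T4": 8.1, "P100": 10.6, "V100": 15.7}

def parse_gpu_name(line):
    """Extract the GPU name from one line of `nvidia-smi -L` output."""
    match = re.match(r"GPU \d+: (.+?) \(UUID: ", line)
    return match.group(1) if match else None

def lookup_tflops(gpu_name):
    """Return the table's FP32 figure for a GPU name, or None if it is not listed."""
    for model, tflops in FP32_TFLOPS.items():
        # Match on name tokens so that "P100-PCIE-16GB" still maps to "P100".
        if any(token.startswith(model) for token in gpu_name.split()):
            return tflops
    return None

line = "GPU 0: Tesla T4 (UUID: GPU-b3f38314-d6a0-6e28-24fd-5af8c5906530)"
name = parse_gpu_name(line)
print(name, lookup_tflops(name))
```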

In [2]:
#@markdown # **Install libraries** 🏗️
#@markdown This cell will take a little while to download several libraries.

#@markdown ---
! pip install faster-whisper==0.10.0
! pip install yt-dlp==2023.11.16
! pip install openai==0.28.1

# Windows libs: https://github.com/Purfview/whisper-standalone-win/releases/download/libs/cuBLAS.and.cuDNN_CUDA11_win_v2.7z
! apt install -y p7zip-full p7zip-rar
! wget https://github.com/Purfview/whisper-standalone-win/releases/download/libs/cuBLAS.and.cuDNN_CUDA11_linux_v2.7z
! 7z x cuBLAS.and.cuDNN_CUDA11_linux_v2.7z -o/usr/lib



Collecting faster-whisper==0.10.0
  Downloading faster_whisper-0.10.0-py3-none-any.whl.metadata (14 kB)
Collecting av==11.* (from faster-whisper==0.10.0)
  Downloading av-11.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.5 kB)
Collecting ctranslate2<5,>=4.0 (from faster-whisper==0.10.0)
  Downloading ctranslate2-4.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Collecting tokenizers<0.16,>=0.13 (from faster-whisper==0.10.0)
  Downloading tokenizers-0.15.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting onnxruntime<2,>=1.14 (from faster-whisper==0.10.0)
  Downloading onnxruntime-1.20.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.5 kB)
Collecting coloredlogs (from onnxruntime<2,>=1.14->faster-whisper==0.10.0)
  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)
Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime<2,>=1.14->faster-whispe

In [3]:
#@markdown # **Import libraries for Python** 🐍

#@markdown This cell will import all libraries for python code.
import sys
import warnings
from faster_whisper import WhisperModel
from pathlib import Path
import yt_dlp
import subprocess
import torch
import shutil
import numpy as np
from IPython.display import display, Markdown, YouTubeVideo

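# Fall back to the CPU when no GPU is attached to the runtime.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')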
print('Using device:', device, file=sys.stderr)

Using device: cuda:0


In [4]:
#@markdown # **Optional:** Save data in Google Drive 💾
#@markdown Enter a Google Drive path and run this cell if you want to store the results inside Google Drive.

# Mount Google Drive so that transcripts and translations can be copied there.
from google.colab import drive
drive_mount_path = Path("/") / "content" / "drive"
drive.mount(str(drive_mount_path))
drive_mount_path /= "My Drive"
#@markdown ---
drive_path = "Colab Notebooks/Videos Transcription and Translation" #@param {type:"string"}
#@markdown ---
#@markdown **Run this cell again if you change your Google Drive path.**

drive_whisper_path = drive_mount_path / Path(drive_path.lstrip("/"))
drive_whisper_path.mkdir(parents=True, exist_ok=True)

Mounted at /content/drive
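
The `drive_path.lstrip("/")` call in the cell above matters because `pathlib` treats an absolute right-hand operand as a fresh root and discards everything to its left. Here is a minimal sketch of the difference, using `PurePosixPath` so it behaves the same on any operating system. The `join_under_drive` helper is a hypothetical name used only for illustration.

```python
from pathlib import PurePosixPath

DRIVE_ROOT = PurePosixPath("/content/drive/My Drive")

def join_under_drive(user_path):
    """Join a user-supplied path below the Drive root, ignoring leading slashes."""
    return DRIVE_ROOT / PurePosixPath(user_path.lstrip("/"))

# Without stripping, the absolute path replaces the Drive root entirely.
print(DRIVE_ROOT / PurePosixPath("/Colab Notebooks"))
# With stripping, the path stays inside the mounted Drive.
print(join_under_drive("/Colab Notebooks"))
```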


In [6]:
#@markdown # **Model selection** 🧠

#@markdown The following pre-trained model sizes are available:

#@markdown |  Size  | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
#@markdown |:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
#@markdown |  tiny  |    39 M    |     `tiny.en`      |       `tiny`       |     ~0.8 GB     |      ~32x      |
#@markdown |  base  |    74 M    |     `base.en`      |       `base`       |     ~1.0 GB     |      ~16x      |
#@markdown | small  |   244 M    |     `small.en`     |      `small`       |     ~1.4 GB     |      ~6x       |
#@markdown | medium |   769 M    |    `medium.en`     |      `medium`      |     ~2.7 GB     |      ~2x       |
#@markdown | large-v1  |   1550 M   |        N/A         |      `large-v1`       |    ~4.3 GB     |       1x       |
#@markdown | large-v2  |   1550 M   |        N/A         |      `large-v2`       |    ~4.3 GB     |       1x       |
#@markdown | large-v3  |   1550 M   |        N/A         |      `large-v3`       |    ~3.6 GB     |       1x       |

#@markdown ---
model_size = 'large-v3' #@param ['tiny', 'tiny.en', 'base', 'base.en', 'small', 'small.en', 'medium', 'medium.en', 'large-v1', 'large-v2', 'large-v3']
device_type = "cuda" #@param ['cuda', 'cpu']
compute_type = "float16" #@param ['float16', 'int8_float16', 'int8']
#@markdown ---
#@markdown **Run this cell again if you change the model.**

model = WhisperModel(model_size, device=device_type, compute_type=compute_type)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


model.bin:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

vocabulary.json:   0%|          | 0.00/1.07M [00:00<?, ?B/s]

preprocessor_config.json:   0%|          | 0.00/340 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/2.39k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.48M [00:00<?, ?B/s]

In [8]:
#@markdown # **Video selection** 📺

#@markdown Choose the input type, enter the video URL or the Google Drive path of the file you want to transcribe, and run the cell.

Type = "Google Drive" #@param ['Video or playlist URL', 'Google Drive']
#@markdown ---
#@markdown #### **Video or playlist URL**
URL = "" #@param {type:"string"}
# store_audio = True #@param {type:"boolean"}
#@markdown ---
#@markdown #### **Google Drive video, audio (mp4, wav), or folder containing video and/or audio files**
video_path = "Colab Notebooks/Videos Transcription and Translation/1.mp4" #@param {type:"string"}
#@markdown ---
#@markdown **Run this cell again if you change the video.**

video_path_local_list = []

if Type == "Video or playlist URL":

    ydl_opts = {
        'format': 'm4a/bestaudio/best',
        'outtmpl': '%(id)s.%(ext)s',
        # ℹ️ See help(yt_dlp.postprocessor) for a list of available Postprocessors and their arguments
        'postprocessors': [{  # Extract audio using ffmpeg
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'wav',
        }]
    }

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        error_code = ydl.download([URL])
        list_video_info = [ydl.extract_info(URL, download=False)]

    for video_info in list_video_info:
        video_path_local_list.append(Path(f"{video_info['id']}.wav"))

elif Type == "Google Drive":
    # video_path_drive = drive_mount_path / Path(video_path.lstrip("/"))
    video_path = drive_mount_path / Path(video_path.lstrip("/"))
    if video_path.is_dir():
        for video_path_drive in video_path.glob("**/*"):
            if video_path_drive.is_file():
                display(Markdown(f"**{str(video_path_drive)} selected for transcription.**"))
            elif video_path_drive.is_dir():
                # glob("**/*") also yields folders; they cannot be copied with shutil.copy.
                display(Markdown("**Subfolders not supported.**"))
                continue
            else:
                display(Markdown(f"**{str(video_path_drive)} does not exist, skipping.**"))
                continue
            video_path_local = Path(".").resolve() / (video_path_drive.name)
            shutil.copy(video_path_drive, video_path_local)
            video_path_local_list.append(video_path_local)
    elif video_path.is_file():
        video_path_local = Path(".").resolve() / (video_path.name)
        shutil.copy(video_path, video_path_local)
        video_path_local_list.append(video_path_local)
        display(Markdown(f"**{str(video_path)} selected for transcription.**"))
    else:
        display(Markdown(f"**{str(video_path)} does not exist.**"))

else:
    raise TypeError("Please select supported input type.")

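for idx, video_path_local in enumerate(video_path_local_list):
    if video_path_local.suffix == ".mp4":
        # Extract a 16 kHz mono WAV track; "-y" overwrites the output left by an earlier run.
        video_path_local = video_path_local.with_suffix(".wav")
        result = subprocess.run(["ffmpeg", "-y", "-i", str(video_path_local.with_suffix(".mp4")), "-vn", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", str(video_path_local)])
        # Keep the list in sync so that it holds the WAV path rather than the MP4 path.
        video_path_local_list[idx] = video_path_local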


**/content/drive/My Drive/Colab Notebooks/Videos Transcription and Translation/1.mp4 selected for transcription.**
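
The conversion at the end of the cell above hands ffmpeg a fixed set of flags: drop the video stream, encode 16-bit PCM, resample to 16 kHz, and downmix to mono. As a rough sketch, the same argument list can be built by a small helper so it's easy to inspect before running. `build_ffmpeg_args` is a hypothetical name, and the example only constructs the command without invoking ffmpeg.

```python
from pathlib import Path

def build_ffmpeg_args(video_path, sample_rate=16000):
    """Return (args, wav_path) for extracting a mono WAV track from a video."""
    video_path = Path(video_path)
    wav_path = video_path.with_suffix(".wav")
    args = [
        "ffmpeg",
        "-i", str(video_path),    # input file
        "-vn",                    # ignore the video stream
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", "1",               # single (mono) channel
        str(wav_path),
    ]
    return args, wav_path

args, wav_path = build_ffmpeg_args("1.mp4")
print(" ".join(args))
```

Passing `args` to `subprocess.run(args, check=True)` would perform the conversion; `check=True` raises an exception if ffmpeg exits with an error.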

In [9]:
#@markdown # **Run the model** 🚀

#@markdown Run this cell to transcribe the video. This can take a while; the duration varies with the length of the video and the size of the model selected above.
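def seconds_to_time_format(s):
    # Convert seconds to an SRT timestamp (HH:MM:SS,mmm).
    # Round to whole milliseconds first so that a value such as 0.9996
    # carries into the seconds field instead of printing ",1000".
    total_ms = int(round(s * 1000))
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    seconds, milliseconds = divmod(total_ms, 1000)

    # Return the formatted string
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{milliseconds:03d}"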


#@markdown ## **Parameters** ⚙️

#@markdown ### **Behavior control**
#@markdown #### Language
language_options = {
    "Auto Detect": "auto",
    "English": "en",
    "中文(Chinese)": "zh",
    "日本語(Japanese)": "ja",
    "Deutsch(German)": "de",
    "Français(French)": "fr"
}

language_option = "Auto Detect" #@param ["Auto Detect", "English", "中文(Chinese)", "日本語(Japanese)", "Deutsch(German)", "Français(French)"] {allow-input: true}
language = language_options.get(language_option, language_option)

#@markdown #### Initial prompt
initial_prompt = "Hello, Let's begin to talk." #@param {type:"string"}
#@markdown ---
#@markdown #### Word-level timestamps
word_level_timestamps = True #@param {type:"boolean"}
#@markdown ---
#@markdown #### VAD filter
vad_filter = False #@param {type:"boolean"}
vad_filter_min_silence_duration_ms = 50 #@param {type:"integer"}
#@markdown ---


segments, info = model.transcribe(str(video_path_local), beam_size=5,
                                  language=None if language == "auto" else language,
                                  initial_prompt=initial_prompt,
                                  word_timestamps=word_level_timestamps,
                                  vad_filter=vad_filter,
                                  vad_parameters=dict(min_silence_duration_ms=vad_filter_min_silence_duration_ms))

language_detected = info.language
display(Markdown(f"Detected language '{info.language}' with probability {info.language_probability}"))

fragments = []

for segment in segments:
  print(f"[{seconds_to_time_format(segment.start)} --> {seconds_to_time_format(segment.end)}] {segment.text}")
  if word_level_timestamps:
    for word in segment.words:
      ts_start = seconds_to_time_format(word.start)
      ts_end = seconds_to_time_format(word.end)
      #print(f"[{ts_start} --> {ts_end}] {word.word}")
      fragments.append(dict(start=word.start,end=word.end,text=word.word))
  else:
    ts_start = seconds_to_time_format(segment.start)
    ts_end = seconds_to_time_format(segment.end)
    #print(f"[{ts_start} --> {ts_end}] {segment.text}")
    fragments.append(dict(start=segment.start,end=segment.end,text=segment.text))


Detected language 'ja' with probability 0.98095703125

[00:00:06,440 --> 00:00:07,240] ご視聴ありがとうございました。
[00:00:30,000 --> 00:00:33,600] 青さ、四月の起草の光の底を、
[00:00:34,300 --> 00:00:38,440] 椿し、萩しり、雪きする。
[00:00:40,100 --> 00:00:42,560] 俺は一人の修羅なのだ。
[00:00:50,700 --> 00:00:53,860] アナー、ただいまー。
[00:00:55,760 --> 00:00:57,160] アナー?
[00:00:57,300 --> 00:00:59,420] お、おかえり、お母さん。
[00:00:59,420 --> 00:01:01,460] すぐおやつの支度するわね。
[00:01:01,580 --> 00:01:03,080] 手を洗ってらっしゃい。
[00:01:03,180 --> 00:01:03,540] うん。
[00:01:12,700 --> 00:01:14,060] 朗読か。
[00:01:36,120 --> 00:01:37,760] またねー。
[00:01:41,660 --> 00:01:42,840] またねー。
[00:01:42,840 --> 00:01:55,360] 私の夢が、
[00:01:55,500 --> 00:01:58,280] 君の声と、
[00:01:58,420 --> 00:01:59,400] 競ってくれる。
[00:01:59,400 --> 00:02:05,260] 取り合って強い力で 私を引っ張り上げる
[00:02:06,300 --> 00:02:11,740] 昨日とまるで違う 私の世界
[00:02:12,680 --> 00:02:14,660] 色づけてくれたのは
[00:02:16,260 --> 00:02:24,320] 見たことない自分に 会いに行こう
[00:02:25,660 --> 00:02:32,160] 君が手を取ってくれたから 今始まるんだ
[00:02:32,160 --> 00:02:37,120] 君も聞いていてよ 私の声を
[00:02:37,120 --> 00:02:40,100] ほらね 私じゃないみたい
[00:
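
The loop in the cell above flattens the transcription result into a list of plain dictionaries, one per word when word-level timestamps are enabled and one per segment otherwise. The sketch below reproduces that shape with hypothetical stand-in objects built from `SimpleNamespace` (no model is run), and `collect_fragments` is an illustrative name. Unlike the cell, it also falls back to the whole segment when a segment carries no words.

```python
from types import SimpleNamespace

def collect_fragments(segments, word_level=True):
    """Flatten segments into dicts with 'start', 'end' and 'text' keys."""
    fragments = []
    for segment in segments:
        if word_level and segment.words:
            # One fragment per word, keeping each word's own timing.
            for word in segment.words:
                fragments.append(dict(start=word.start, end=word.end, text=word.word))
        else:
            # One fragment for the whole segment.
            fragments.append(dict(start=segment.start, end=segment.end, text=segment.text))
    return fragments

# Hypothetical stand-in data shaped like a transcription result.
segment = SimpleNamespace(
    start=0.0, end=1.2, text=" Hello world.",
    words=[
        SimpleNamespace(start=0.0, end=0.5, word=" Hello"),
        SimpleNamespace(start=0.6, end=1.2, word=" world."),
    ],
)

print(collect_fragments([segment]))
print(collect_fragments([segment], word_level=False))
```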

In [10]:
#@title Merge words/segments to sentences

#@markdown Run this cell to merge words/segments to sentences.
#@markdown ## **Parameters** ⚙️

#@markdown ### **Behavior control**
#@markdown #### Maximum gap between two sentences (milliseconds)
max_gap_ms_between_two_sentence = 200 #@param {type:"integer"}

import json

# Merge words/segments to sentences
def merge_fragments(fragments, gap_s):
  # gap_s is the longest pause, in seconds, allowed inside one sentence.
  new_fragments = []
  new_fragment = {}
  length = len(fragments)
  # Languages written with spaces between words need a delimiter when joining.
  delimiter = ' ' if language_detected in ['en', 'de', 'fr'] else ''
  for i, fragment in enumerate(fragments):
    start = fragment['start']
    end = fragment['end']
    text = fragment['text']

    # A long pause closes the sentence in progress before this fragment is added.
    if new_fragment and start - new_fragment['end'] > gap_s:
      new_fragments.append(new_fragment)
      new_fragment = {}

    if not new_fragment:
      new_fragment = dict(start=start, end=end, text="")

    new_fragment['end'] = end
    new_fragment['text'] = f"{new_fragment['text']}{delimiter}{text.lstrip()}"

    # End of a sentence when the fragment ends with terminal punctuation,
    # or when this is the last fragment (so that nothing is dropped).
    if (len(text) > 0 and text[-1] in ['.', '?', '。', '？', '!', '！']) or i == length-1:
      new_fragments.append(new_fragment)
      new_fragment = {}
  return new_fragments


new_fragments = merge_fragments(fragments, max_gap_ms_between_two_sentence/1000.0)

# Save as json file
json_ext_name = ".json"
json_transcript_file_name = video_path_local.stem + json_ext_name
with open(json_transcript_file_name, 'w') as f:
  f.write(json.dumps(new_fragments))
display(Markdown(f"**Transcript JSON file created: {video_path_local.parent / json_transcript_file_name}**"))

# Save as srt
srt_ext_name = ".srt"
srt_transcript_file_name = video_path_local.stem + srt_ext_name
with open(srt_transcript_file_name, 'w', encoding='utf-8') as f:
  for sentence_idx, fragment in enumerate(new_fragments):
    ts_start = seconds_to_time_format(fragment['start'])
    ts_end = seconds_to_time_format(fragment['end'])
    text = fragment['text']
    print(f"[{ts_start} --> {ts_end}] {text}")
    f.write(f"{sentence_idx + 1}\n")
    f.write(f"{ts_start} --> {ts_end}\n")
    f.write(f"{text.strip()}\n\n")

try:
  shutil.copy(video_path_local.parent / srt_transcript_file_name,
            drive_whisper_path / srt_transcript_file_name
  )
  display(Markdown(f"**Transcript SRT file created: {drive_whisper_path / srt_transcript_file_name}**"))
except Exception:  # e.g. Google Drive was not mounted, so drive_whisper_path is undefined
  display(Markdown(f"**Transcript SRT file created: {video_path_local.parent / srt_transcript_file_name}**"))


**Transcript SRT file created: /content/1.json**

[00:00:06,440 --> 00:00:07,240] ご視聴ありがとうございました。
[00:00:30,000 --> 00:00:33,600] 青さ、四月の起草の光の底を、
[00:00:34,300 --> 00:00:38,440] 椿し、萩しり、雪きする。
[00:00:40,100 --> 00:00:42,560] 俺は一人の修羅なのだ。
[00:00:50,700 --> 00:00:53,860] アナー、ただいまー。
[00:00:55,760 --> 00:00:57,160] アナー?
[00:00:57,300 --> 00:00:59,420] お、おかえり、お母さん。
[00:00:59,420 --> 00:01:01,460] すぐおやつの支度するわね。
[00:01:01,580 --> 00:01:03,080] 手を洗ってらっしゃい。
[00:01:03,180 --> 00:01:03,540] うん。
[00:01:12,700 --> 00:01:14,060] 朗読か。
[00:01:36,120 --> 00:01:37,760] またねー。
[00:01:41,660 --> 00:01:42,840] またねー。
[00:01:42,840 --> 00:01:59,400] 私の夢が、君の声と、競ってくれる。
[00:01:59,400 --> 00:02:05,260] 取り合って強い力で私を引っ張り上げる
[00:02:05,900 --> 00:02:09,240] 昨日とまるで違う
[00:02:09,920 --> 00:02:11,740]  私の世界
[00:02:12,320 --> 00:02:14,660] 色づけてくれたのは
[00:02:15,900 --> 00:02:24,320] 見たことない自分に会いに行こう
[00:02:25,660 --> 00:02:46,620] 君が手を取ってくれたから今始まるんだ君も聞いていてよ私の声をほらね私じゃないみたい明日はどんな声でどんな物語を生きようか
[00:03:01,960 --> 00:03:09,740] ある日カモメくんがいつものように空を飛んでいると海の底にキラリと光るものを見つけました
[00:03:10,340

**Transcript SRT file created: /content/drive/My Drive/Colab Notebooks/Videos Transcription and Translation/1.srt**
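
Each entry in the `.srt` file written above is a cue: a 1-based index, a `start --> end` line with `HH:MM:SS,mmm` timestamps, the text, and a blank line. The following is a minimal sketch of that layout as a pure function that returns a string. `srt_timestamp` and `render_srt` are hypothetical helper names, and the sample sentences are taken from the output above.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    # Work in whole milliseconds so rounding can carry into the seconds field.
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def render_srt(fragments):
    """Render fragments (dicts with 'start', 'end', 'text') as SRT text."""
    cues = []
    for index, fragment in enumerate(fragments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(fragment['start'])} --> {srt_timestamp(fragment['end'])}\n"
            f"{fragment['text'].strip()}\n"
        )
    # Joining with a newline leaves one blank line between consecutive cues.
    return "\n".join(cues)

sample = [
    dict(start=6.44, end=7.24, text="ご視聴ありがとうございました。"),
    dict(start=30.0, end=33.6, text="青さ、四月の起草の光の底を、"),
]
print(render_srt(sample))
```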

In [None]:
#@markdown # **Translate**
#@markdown Run this cell to translate subtitles to the language you want.
#@markdown ## **Parameters** ⚙️

#@markdown ### **Behavior control**

#@markdown #### API Type
api_type = "openai" #@param ["azure", "openai"]

#@markdown #### Azure API Config (if you are using `openai`, please leave these fields blank)
api_base = "https://api.x.ai/v1/chat/completions" #@param {type:"string"}
api_version = "2023-05-15" #@param {type:"string"}
deployment_id = "gpt3" #@param {type:"string"}

#@markdown #### API Key and Model Config
api_key = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" #@param {type:"string"}
model_name = "gpt-3.5-turbo-1106" #@param ["gpt-3.5-turbo","gpt-3.5-turbo-1106","gpt-4","gpt-4-1106-preview"] {allow-input: true}
temperature = 0 #@param {type:"number"}
#@markdown ---
#@markdown #### Target Language
target_language = "\u7B80\u4F53\u4E2D\u6587" # @param ["\u7B80\u4F53\u4E2D\u6587", "\u7E41\u9AD4\u4E2D\u6587", "\u65E5\u672C\u8A9E", "English", "German", "French"] {allow-input: true}
#@markdown ---
#@markdown #### Retry and Token Chunks
translate_max_retry_times = 10 #@param {type:"integer"}
count_of_sentence_send_once_limit = 5 #@param {type:"integer"}

# This prompt is from https://twitter.com/dotey/status/1665476562219573249
system_prompt = f"""You are a program responsible for translating subtitles. Your task is to translate the subtitles into {target_language}, maintaining a colloquial tone and style, avoiding long sentences, and ignoring verbal tics such as 'so', 'you know', etc.
The input will be a JSON-formatted string array, which should be translated in accordance with the following steps:
Step1: Join the string array to a sentence, then translate it to {target_language};
Step2: Split the translated sentence to a string array, each item of which should correspond to an item in the original input array.
Step3: Verify if the count of items in the output array equals that of the input array and no item is blank. If it doesn't, go back to Step 2 and try again.

Respond with a JSON-formatted string array:
"""
import openai
import json

openai.api_key = api_key

if api_type == "azure":
  openai.api_type = "azure"
  openai.api_base = api_base
  openai.api_version = api_version
else:
  deployment_id = None


def translate_by_chatgpt(sentences, max_retry_times=10, deployment_id=None, model_name="gpt-3.5-turbo", temperature=0.7):
  system_msg = dict(role="system", content=system_prompt)
  user_msg_content = json.dumps(sentences)
  user_msg = dict(role="user", content=user_msg_content)
  current_retry_times = 0
  sentences_translated = []

  while True:
    try:
      chat_completion = openai.ChatCompletion.create(deployment_id=deployment_id,
                                                     model=model_name,
                                                     messages=[system_msg, user_msg],
                                                     temperature=temperature)
      sentences_translated = json.loads(chat_completion.choices[0].message.content)

      if len(sentences_translated) != len(sentences) and current_retry_times < max_retry_times:
        current_retry_times = current_retry_times + 1
        print(f"==Translate retry {current_retry_times}, reason: translated={len(sentences_translated)}, origin={len(sentences)}")
        continue

      break
    except Exception as e:
      # Covers API errors as well as replies that are not valid JSON.
      if current_retry_times >= max_retry_times:
        break
      current_retry_times = current_retry_times + 1
      print(f"==Translate retry {current_retry_times}, reason: {e}")
      continue
  return sentences_translated

def translate_fragments(fragments, sentence_send_limit=5):
  fragments_translated = []

  # Todo: The count of tokens in sentences must be less than Max Tokens API allowed
  length = len(fragments)
  for n in range(0, length, sentence_send_limit):
    fragments_will_be_translated = fragments[n:n+sentence_send_limit]
    # Pass the temperature from the form; otherwise the function default (0.7) is used.
    sentences_translated = translate_by_chatgpt(list(map(lambda x: x['text'], fragments_will_be_translated)),
                                                max_retry_times=translate_max_retry_times,
                                                deployment_id=deployment_id,
                                                model_name=model_name,
                                                temperature=temperature)

    # zip() stops at the shorter list, so a reply with too many items cannot raise an IndexError.
    for fragment, sentence_translated in zip(fragments_will_be_translated, sentences_translated):
      print(f"{seconds_to_time_format(fragment['start'])} --> {seconds_to_time_format(fragment['end'])}")
      print("Original  : " + fragment['text'].lstrip())
      print(f"Translated: {sentence_translated}")
      print('\n')
      fragment['text_translated'] = sentence_translated

    fragments_translated.extend(fragments_will_be_translated)

  return fragments_translated

fragments_translated = translate_fragments(new_fragments, count_of_sentence_send_once_limit)

# Save translation as json file
json_translated_file_name = f"{video_path_local.stem}-translated.json"
with open(json_translated_file_name, 'w') as f:
  f.write(json.dumps(fragments_translated))
display(Markdown(f"**Translation JSON file created: {video_path_local.parent / json_translated_file_name}**"))

# Save translation as srt file
srt_translated_file_name = f"{video_path_local.stem}-translated.srt"
with open(srt_translated_file_name, 'w') as f:
  for sentence_idx, fragment in enumerate(fragments_translated):
    ts_start = seconds_to_time_format(fragment['start'])
    ts_end = seconds_to_time_format(fragment['end'])
    text = fragment.get('text', '')
    text_translated = fragment.get('text_translated', '')
    f.write(f"{sentence_idx + 1}\n")
    f.write(f"{ts_start} --> {ts_end}\n")
    f.write(f"{text_translated.strip()}\n")
    f.write(f"{text.strip()}\n\n")

try:
  shutil.copy(video_path_local.parent / srt_translated_file_name,
            drive_whisper_path / srt_translated_file_name
  )
  display(Markdown(f"**Translated SRT file created: {drive_whisper_path / srt_translated_file_name}**"))
except Exception:  # e.g. Google Drive was not mounted, so drive_whisper_path is undefined
  display(Markdown(f"**Translated SRT file created: {video_path_local.parent / srt_translated_file_name}**"))
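
The translation cell relies on two ideas: sentences are sent in small batches, and a reply is accepted only if it parses as a JSON array with the same number of items as the input. The sketch below isolates that logic under stated assumptions. `chunked`, `parse_translation` and `translate_with_retry` are hypothetical names, and `fake_model` is a stand-in for the chat API that fails once and then returns upper-cased text, so no network call is made.

```python
import json

def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def parse_translation(raw_reply, expected_count):
    """Parse a model reply; return the list only if it matches the input length."""
    try:
        parsed = json.loads(raw_reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, list) or len(parsed) != expected_count:
        return None
    # Reject blank or non-string items, as the system prompt requires.
    if not all(isinstance(item, str) and item.strip() for item in parsed):
        return None
    return parsed

def translate_with_retry(sentences, call_model, max_retries=3):
    """Call `call_model` until its reply validates or the retries run out."""
    for _ in range(max_retries + 1):
        result = parse_translation(call_model(json.dumps(sentences)), len(sentences))
        if result is not None:
            return result
    return None

# Hypothetical stand-in for the chat API: one bad reply, then upper-cased echoes.
replies = iter(["not json"])

def fake_model(payload):
    bad = next(replies, None)
    if bad is not None:
        return bad
    return json.dumps([s.upper() for s in json.loads(payload)])

for batch in chunked(["a", "b", "c"], 2):
    print(translate_with_retry(batch, fake_model))
```

Returning `None` on failure, rather than a list of the wrong length, lets the caller decide whether to skip the batch or keep the original text.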

