### 1. Settings
Enter the path to the zip file containing your audio files, choose a Whisper model, and set the character name, which is used to name the output JSON file.
The output JSON maps each audio file name to its transcription, keyed by the detected language, in a format similar to "Hanako_LogIn_1.ogg: {quote: {ja: 'おはよう'} }"
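
As a rough illustration of what ends up on disk, the sketch below builds one entry in the shape that the transcription cell in step 5 produces and serializes it the same way. The file name and text are made-up example values, not real output.

```python
import json

# Hypothetical entry mirroring the shape built by the transcription cell:
# file name -> {"quote": {language code: transcribed text}}
example = {
    "Hanako_LogIn_1.ogg": {
        "quote": {"ja": "おはよう"}
    }
}

# ensure_ascii=False keeps Japanese text readable instead of \uXXXX escapes
serialized = json.dumps(example, indent=4, ensure_ascii=False)
print(serialized)
```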

In [2]:
AUDIO_FILES = "/content/drive/MyDrive/modding/ba-transcriptions/voices/JP_Hanako.zip" #@param {type:"string"}
OUTPUT_FOLER = "/content/drive/MyDrive/modding/ba-transcriptions/transcriptions" #@param {type:"string"}
CHAR_NAME = "hanako" #@param {type:"string"}
SHOW_PROGRESS = True #@param {type:"boolean"}
WHISPER_MODEL = "large-v2" #@param ["large-v2", "large", "medium", "small", "base", "tiny"] {type:"string"}

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### 2. Install whisper requirements

In [None]:
!git clone https://huggingface.co/spaces/openai/whisper
%cd whisper
!pip install -r requirements.txt
!pip install gradio

Cloning into 'whisper'...
remote: Enumerating objects: 86, done.
remote: Counting objects: 100% (86/86), done.
remote: Compressing objects: 100% (84/84), done.
remote: Total 86 (delta 49), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (86/86), 15.41 KiB | 463.00 KiB/s, done.
/content/whisper
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.26.1-py3-none-any.whl (6.3 MB)
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.12.1-py3-none-any.whl (190 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)

### 3. Install whisper

In [None]:
! pip install git+https://github.com/openai/whisper.git -q

  Preparing metadata (setup.py) ... done
  Building wheel for openai-whisper (setup.py) ... done


### 4. Unzip audio files

In [None]:
import zipfile
import os

with zipfile.ZipFile(AUDIO_FILES, 'r') as zip_ref:
    zip_ref.extractall("/content")

# get the path of the folder where contents were extracted
# (this assumes the zip contains a top-level folder named after the zip file)
file_name = os.path.basename(AUDIO_FILES)
file_name = os.path.splitext(file_name)[0]
EXTRACTED_FILES_PATH = os.path.join("/content",file_name)
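
The path above is only correct if the archive really contains a top-level folder with the same name as the zip. A minimal sketch for checking that before transcribing is shown below; `top_level_entries` is a hypothetical helper, not part of the original notebook.

```python
import zipfile

def top_level_entries(zip_path):
    """Return the sorted top-level names (files or folders) inside a zip archive."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        # keep only the first path component of every archive member
        return sorted({name.split("/", 1)[0] for name in zf.namelist()})
```

For the settings used here, `top_level_entries(AUDIO_FILES)` would be expected to return a single name, `JP_Hanako`. If it returns the audio files themselves, the archive has no wrapping folder and `EXTRACTED_FILES_PATH` needs to be adjusted.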

### 5. Transcribe

In [None]:
import whisper
model = whisper.load_model(WHISPER_MODEL)

In [None]:
def transcribe(path):
  # load audio and pad/trim it to fit 30 seconds
  audio = whisper.load_audio(path)
  audio = whisper.pad_or_trim(audio)

  # make log-Mel spectrogram and move to the same device as the model
  mel = whisper.log_mel_spectrogram(audio).to(model.device)

  # detect the spoken language and keep the most probable one
  _, probs = model.detect_language(mel)
  lang = max(probs, key=probs.get)

  # decode the audio
  options = whisper.DecodingOptions()
  result = whisper.decode(model, mel, options)

  return result.text, lang

# transcribes all audio files in a folder
# returns them as a dictionary, with file names as the keys
def transcribe_folder(path):
  entries = {}
  for file in os.listdir(path):
    fullpath = os.path.join(path, file)
    if os.path.isfile(fullpath):
      try:
        result, lang = transcribe(fullpath)
      except Exception as e:
        # skip files that cannot be decoded (e.g. non-audio files), but report why
        if SHOW_PROGRESS:
          print(f"Failed to transcribe {file}: {e}")
        continue

      entry = {
          "quote": {
              lang: result
              }
            }
      entries[file] = entry
      if SHOW_PROGRESS:
        print(file + ":")
        print(f"{lang}: {result}")

  return entries
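
Because `transcribe` pads or trims every clip to 30 seconds, anything longer is cut off. For longer recordings, one possible alternative is Whisper's higher-level `model.transcribe`, which processes the whole file in windows and reports the detected language. The sketch below assumes a model loaded with `whisper.load_model`; `transcribe_full` is a hypothetical name and is not used elsewhere in this notebook.

```python
def transcribe_full(model, path):
    """Transcribe a whole audio file, returning (text, language code)."""
    # model.transcribe returns a dict that includes "text" and "language"
    result = model.transcribe(path)
    return result["text"].strip(), result["language"]
```

It returns the same `(text, language)` pair as `transcribe`, so it could be swapped into `transcribe_folder` by calling `transcribe_full(model, fullpath)` instead.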

In [None]:
import sys
import json

# transcribe all audio files using the selected model
print(f"Transcribing with whisper {WHISPER_MODEL}...")
transcriptions = transcribe_folder(EXTRACTED_FILES_PATH)

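# save file (create the output folder first if it does not exist yet)
os.makedirs(OUTPUT_FOLER, exist_ok=True)
output_path = os.path.join(OUTPUT_FOLER, f"{CHAR_NAME}_transcriptions.json")
with open(output_path, "w", encoding="utf-8") as f:
  json.dump(transcriptions, f, indent=4, ensure_ascii=False)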
print("Finished transcribing.")

Transcribing with whisper...
Hanako_Cafe_monolog_2.ogg:
ja: この内装は先生が?素敵ですね
Hanako_Season_Birthday.ogg:
ja: 誕生日プレゼントですか?あ、先生がプレゼントなんて素敵かもしれませんね包みは私が解いてもいいですか?
Hanako_Formation_In_2.ogg:
ja: ふふふ、私のことを求めてくれるなんて。
Hanako_LogIn_1.ogg:
ja: お帰りなさい先生 今日もよろしくお願い致します
Hanako_ExSkill_3.ogg:
ja: では、あ゛っ……!
Hanako_Battle_Move_2.ogg:
ja: まだまだいけますよー
Hanako_Gachaget.ogg:
ja: 裏は花子です これから色々とよろしくお願いしますねはい それはもう色々と
Hanako_Battle_Damage_2.ogg:
ja: いったぁ
Hanako_MemorialLobby_4_2.ogg:
ja: いけないことをしてる?そんな感じがして。
Hanako_MemorialLobby_2_1.ogg:
ja: こうしていると涼しくて とっても気持ちいいんですよ
Hanako_MemorialLobby_3_2.ogg:
ja: ここで一緒に…脱いでしまいません?
Hanako_Battle_Shout_2.ogg:
en: H E Y!
Hanako_Relationship_Up_2.ogg:
ja: まだまだ、いけますよね?先生
Hanako_Lobby_4.ogg:
ja: 先生 何か私にしてほしいことはありますか
Hanako_Tactic_In_1.ogg:
ja: さあ、始めましょうか
Hanako_Tactic_In_2.ogg:
ja: 全部、先生の言う通りにしますようふふ
Hanako_Cafe_monolog_1.ogg:
ja: あら、可愛らしいところですね
Hanako_Battle_Covered_1.ogg:
ja: うふふっ
Hanako_Cafe_Act_2.ogg:
ja: この空間、見ているだけで楽しいですね
Hanako_Tactic_Defeat_2.ogg:
ja: 少し よそ見をしすぎました
Hanako

SystemExit: ignored

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
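
In the output above, `Hanako_Battle_Shout_2.ogg` was detected as `en` rather than `ja`, which suggests that very short clips can be misdetected. A minimal sketch for reviewing such cases is to load the saved JSON and group file names by detected language; `files_by_language` is a hypothetical helper that assumes the file has the structure written by the cell above.

```python
import json
from collections import defaultdict

def files_by_language(json_path):
    """Group transcribed file names by the language code stored under "quote"."""
    with open(json_path, encoding="utf-8") as f:
        transcriptions = json.load(f)

    grouped = defaultdict(list)
    for file_name, entry in transcriptions.items():
        # each entry looks like {"quote": {language code: text}}
        for lang in entry["quote"]:
            grouped[lang].append(file_name)
    return dict(grouped)
```

Calling it on the saved transcription file and looking at every key other than `ja` gives a short list of clips worth checking by ear.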
