Open
Description
Hi, I use this software for long real-time transcription sessions (2-3 hours), and I occasionally hit the error shown below, which halts transcription.
As for when the error occurs, it seems to happen when a � (the Unicode replacement character) is still at the end of the output at "End of decoding loop", though I haven't verified whether this is a necessary and sufficient condition.
As for a fix, Faster-Whisper addressed this error in the following pull request, but the original Whisper has not been fixed.
SYSTRAN/faster-whisper#111
DEBUG <|startoftranscript|><|ja|><|transcribe|><|notimestamps|>もう渡込みちゃうやばいんだよね。もう一回行く。もう一回行く。やったー。おー!われらにそうやった。いつからそうだった?いや、明日ですよ。え、そうやめっちゃ嬉
DEBUG [998] most att frames
DEBUG current tokenstorch.Size([1, 65])
DEBUG attention reaches the end: 998/1020
INFO End of decoding loop
DEBUG new_hypothesis: [1543, 6474, 1231, 9955, 7355, 11429, 41380, 161, 105]
INFO Output: 。え、そうやめっちゃ�
Traceback (most recent call last):
  File "/mnt/c/Users/usr/sample/SimulStreaming/simulstreaming_whisper_server.py", line 6, in <module>
    main_server(simul_asr_factory, add_args=simulwhisper_args)
  File "/mnt/c/Users/usr/sample/SimulStreaming/whisper_streaming/whisper_server.py", line 174, in main_server
    proc.process()
  File "/mnt/c/Users/usr/sample/SimulStreaming/whisper_streaming/whisper_server.py", line 105, in process
    o = self.online_asr_proc.process_iter()
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/c/Users/usr/sample/SimulStreaming/whisper_streaming/vac_online_processor.py", line 101, in process_iter
    ret = self.online.process_iter()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/c/Users/usr/sample/SimulStreaming/simulstreaming_whisper.py", line 220, in process_iter
    tokens = self.hide_incomplete_unicode(tokens)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/c/Users/usr/sample/SimulStreaming/simulstreaming_whisper.py", line 200, in hide_incomplete_unicode
    chars, _ = self.model.tokenizer.split_tokens_on_unicode(tokens)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/c/Users/usr/sample/SimulStreaming/simul_whisper/whisper/tokenizer.py", line 301, in split_tokens_on_unicode
    or decoded_full[unicode_offset + decoded.index(replacement_char)]
       ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: string index out of range
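Here is a minimal sketch of what I think is going on, and of the kind of bounds check that would avoid the crash. It is a simplified model, not the actual code of Whisper or of the Faster-Whisper pull request: `toy_decode`, the `bounds_check` flag and the byte-chunk "tokens" are all hypothetical stand-ins I made up for illustration. The assumption is that the hypothesis ends with the first bytes of a multi-byte character split across two tokens (here the first two of the three UTF-8 bytes of "嬉"), so the full decode ends in a single �, while each trailing token decodes to its own � when examined separately.

```python
from typing import Callable, List, Tuple

REPLACEMENT_CHAR = "\ufffd"


def toy_decode(tokens: List[bytes]) -> str:
    # Hypothetical stand-in for the tokenizer's decode: each "token" is a raw
    # byte chunk, and undecodable bytes become U+FFFD.
    return b"".join(tokens).decode("utf-8", errors="replace")


def split_tokens_on_unicode(
    tokens: List[bytes],
    decode: Callable[[List[bytes]], str] = toy_decode,
    bounds_check: bool = True,
) -> Tuple[List[str], List[List[bytes]]]:
    decoded_full = decode(tokens)
    words: List[str] = []
    word_tokens: List[List[bytes]] = []
    current_tokens: List[bytes] = []
    unicode_offset = 0

    for token in tokens:
        current_tokens.append(token)
        decoded = decode(current_tokens)

        if REPLACEMENT_CHAR not in decoded:
            complete = True
        else:
            index = unicode_offset + decoded.index(REPLACEMENT_CHAR)
            if bounds_check and index >= len(decoded_full):
                # Guard (assumed fix): the offset has run past the full string,
                # so treat the pending tokens as an incomplete character.
                complete = False
            else:
                # Without the guard this lookup can raise IndexError.
                complete = decoded_full[index] == REPLACEMENT_CHAR

        if complete:
            words.append(decoded)
            word_tokens.append(current_tokens)
            current_tokens = []
            unicode_offset += len(decoded)

    # Tokens still pending here (an unfinished character) are dropped.
    return words, word_tokens


# "a" followed by the first two of the three UTF-8 bytes of "嬉" (E5 AC 89).
truncated = [b"a", b"\xe5", b"\xac"]
print(split_tokens_on_unicode(truncated))  # guarded: no exception

try:
    split_tokens_on_unicode(truncated, bounds_check=False)
except IndexError as exc:
    print("unguarded:", exc)
```

In this model the first dangling byte is accepted as a "word" because the full string really does have a � at that position, which advances `unicode_offset` to the end of the full string. The next dangling byte then indexes one past the end. With the guard, that last token is simply left pending and dropped, which seems acceptable for streaming since the character should be completed by a later chunk.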