First, thanks a lot for open-sourcing the software 👍
I'm running into problems with whisper_online_server.py: it does not seem to work and produces "random" content. I tested the same file with whisper_online.py and it works perfectly. Any idea what it could be?
Loading Whisper large-v2 model for en... done. It took 3.65 seconds.
Whisper is not warmed up
whisper-server-INFO: INFO: Listening on('localhost', 43007)
whisper-server-INFO: INFO: Connected to client on ('127.0.0.1', 56506)
...
INCOMPLETE: (4.04, 10.0, ' Well, that was nice of you. Okay, uh, maybe next time. Take care.')
len of buffer now: 10.37
(None, None, '')
b'C\xfeG\xfeK\xfeP\xfeT\xfe'
65536
PROMPT:
CONTEXT: Thank you very much.
transcribing 12.42 seconds from 0.00
whisper-server-INFO: Processing audio with duration 00:12.416
>>>>COMPLETE NOW: (None, None, '')
INCOMPLETE: (4.0600000000000005, 12.38, " Well that is it for this episode of Gamer Gear. I hope you enjoyed it. Thank you all for watching. I'll see you all next time.")
len of buffer now: 12.42
(None, None, '')
b'\xdd\xff\xea\xff\x15\x00\xff\xff\xd6\xff'
65536
PROMPT:
CONTEXT: Thank you very much.
transcribing 14.46 seconds from 0.00
whisper-server-INFO: Processing audio with duration 00:14.464
>>>>COMPLETE NOW: (None, None, '')
INCOMPLETE: (4.36, 13.26, " Well, that was not too hard, was it? No, it was a little bit difficult. It was a long time ago. It's hard to say.")
len of buffer now: 14.46
(None, None, '')
b'<\x07[\x07\x82\x08)\nj\x0c'
65536
PROMPT:
CONTEXT: Thank you very much.
transcribing 16.51 seconds from 0.00
whisper-server-INFO: Processing audio with duration 00:16.512
>>>>COMPLETE NOW: (None, None, '')
INCOMPLETE: (4.079999999999999, 16.48, ' dramatic presentation of a video that we all enjoy watching. If you enjoy it, please subscribe.')
len of buffer now: 16.51
(None, None, '')
b'\xd3\xfd\xda\xfd\xe2\xfd\xef\xfd\x00\xfe'
65536
...
Hi,
Are you sure that the ffmpeg options are correct? For file format conversion, I use ffmpeg -i in.mp3 -acodec pcm_s16le -ac 1 -ar 16000 out.wav, where in.mp3 is the input and out.wav is the output.
Furthermore, I think that using ffmpeg this way flushes all the bytes of the recording at once. If you want to simulate real-time mode, you need to make sure the audio is actually sent in real time. I convert the file to WAV and then use a Python script that outputs X seconds of audio every X seconds.
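The pacing described above could look roughly like the following. This is only a minimal sketch, not the script mentioned in the reply: the function names (split_pcm, read_wav_pcm, stream_realtime) are made up for illustration, the input is assumed to be a 16 kHz mono 16-bit WAV as produced by the ffmpeg command above, and the host and port are taken from the server log in this issue.

```python
import socket
import time
import wave

SAMPLE_RATE = 16000  # Hz, matches "-ar 16000"
SAMPLE_WIDTH = 2     # bytes per sample, matches "pcm_s16le"
CHANNELS = 1         # mono, matches "-ac 1"


def split_pcm(pcm, chunk_seconds, sample_rate=SAMPLE_RATE, sample_width=SAMPLE_WIDTH):
    """Split raw mono PCM bytes into pieces that each hold chunk_seconds of audio."""
    chunk_bytes = round(chunk_seconds * sample_rate) * sample_width
    if chunk_bytes <= 0:
        raise ValueError("chunk_seconds is too small")
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]


def read_wav_pcm(path):
    """Read the raw sample bytes of a WAV file, checking the assumed format."""
    with wave.open(path, "rb") as wav:
        fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        if fmt != (CHANNELS, SAMPLE_WIDTH, SAMPLE_RATE):
            raise ValueError("expected 16 kHz mono 16-bit PCM, got %r" % (fmt,))
        return wav.readframes(wav.getnframes())


def stream_realtime(path, host="localhost", port=43007, chunk_seconds=1.0, sleep=time.sleep):
    """Send chunk_seconds of audio, then wait chunk_seconds, until the file ends."""
    pcm = read_wav_pcm(path)
    with socket.create_connection((host, port)) as sock:
        for chunk in split_pcm(pcm, chunk_seconds):
            sock.sendall(chunk)
            sleep(chunk_seconds)  # pace the stream so it arrives in real time
```

With the server running, calling stream_realtime("out.wav", chunk_seconds=1.0) should then deliver roughly one second of audio per second instead of the whole recording at once.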
Hey Dominik,
Thanks for the reply, and sorry for not getting back to you earlier. I'll try a different way to stream it and let you know what worked in the end :)
Hello,
First, thanks a lot for open-sourcing the software 👍
I'm running into problems with whisper_online_server.py: it does not seem to work and produces "random" content. I tested the same file with whisper_online.py and it works perfectly. Any idea what it could be?
Client side
Command
ffmpeg -i GMT20231102-120819_Recording.m4a -f s16le -acodec pcm_s16le - | nc localhost 43007
Log
Server side
Command
python whisper_online_server.py --min-chunk-size 1
Log