whisper_online_server.py not producing a correct text #28

gu-ma · 2023-11-14T11:09:31Z

Hello,

First, thanks a lot for open-sourcing the software 👍

I'm having some problems with whisper_online_server.py: it does not seem to work and produces seemingly "random" content. I tested the same file with whisper_online.py and it works perfectly. Any idea what it could be?

Client side

Command

ffmpeg -i GMT20231102-120819_Recording.m4a -f s16le -acodec pcm_s16le - | nc localhost 43007

Log

ffmpeg -i GMT20231102-120819_Recording.m4a -f s16le -acodec pcm_s16le - | nc localhost 43007
ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'GMT20231102-120819_Recording.m4a':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    creation_time   : 2023-11-02T12:08:19.000000Z
  Duration: 01:22:02.94, start: 0.000000, bitrate: 127 kb/s
  Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 32000 Hz, mono, fltp, 126 kb/s (default)
    Metadata:
      creation_time   : 2023-11-02T12:08:19.000000Z
      handler_name    : AAC audio
      vendor_id       : [0][0][0][0]
Stream mapping:
  Stream #0:0 -> #0:0 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, s16le, to 'pipe:':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    encoder         : Lavf58.76.100
  Stream #0:0(und): Audio: pcm_s16le, 32000 Hz, mono, s16, 512 kb/s (default)
    Metadata:
      creation_time   : 2023-11-02T12:08:19.000000Z
      handler_name    : AAC audio
      vendor_id       : [0][0][0][0]
      encoder         : Lavc58.134.100 pcm_s16le
0 1660  Thank you very much.
4000 8420  dramatic presentation of a video
8420 12420  for sports.
13260 17020  If you're interested in watching
38740 39180  please feel free to leave a
42540 42560  please feel free to leave a comment or a like, and don't forget to

Server side

Command

python whisper_online_server.py --min-chunk-size 1

Log

Loading Whisper large-v2 model for en... done. It took 3.65 seconds.
Whisper is not warmed up
whisper-server-INFO: INFO: Listening on('localhost', 43007)
whisper-server-INFO: INFO: Connected to client on ('127.0.0.1', 56506)
...
INCOMPLETE: (4.04, 10.0, ' Well, that was nice of you. Okay, uh, maybe next time. Take care.')
len of buffer now: 10.37
(None, None, '')
b'C\xfeG\xfeK\xfeP\xfeT\xfe'
65536
PROMPT:
CONTEXT:  Thank you very much.
transcribing 12.42 seconds from 0.00
whisper-server-INFO: Processing audio with duration 00:12.416
>>>>COMPLETE NOW: (None, None, '')
INCOMPLETE: (4.0600000000000005, 12.38, " Well that is it for this episode of Gamer Gear. I hope you enjoyed it. Thank you all for watching. I'll see you all next time.")
len of buffer now: 12.42
(None, None, '')
b'\xdd\xff\xea\xff\x15\x00\xff\xff\xd6\xff'
65536
PROMPT:
CONTEXT:  Thank you very much.
transcribing 14.46 seconds from 0.00
whisper-server-INFO: Processing audio with duration 00:14.464
>>>>COMPLETE NOW: (None, None, '')
INCOMPLETE: (4.36, 13.26, " Well, that was not too hard, was it? No, it was a little bit difficult. It was a long time ago. It's hard to say.")
len of buffer now: 14.46
(None, None, '')
b'<\x07[\x07\x82\x08)\nj\x0c'
65536
PROMPT:
CONTEXT:  Thank you very much.
transcribing 16.51 seconds from 0.00
whisper-server-INFO: Processing audio with duration 00:16.512
>>>>COMPLETE NOW: (None, None, '')
INCOMPLETE: (4.079999999999999, 16.48, ' dramatic presentation of a video that we all enjoy watching. If you enjoy it, please subscribe.')
len of buffer now: 16.51
(None, None, '')
b'\xd3\xfd\xda\xfd\xe2\xfd\xef\xfd\x00\xfe'
65536
...

Gldkslfmsd · 2023-11-14T12:06:33Z

Hi,
are you sure that the ffmpeg options are correct? For file format conversion, I'm using ffmpeg -i in.mp3 -acodec pcm_s16le -ac 1 -ar 16000 out.wav, in.mp3 is input and out.wav is output.
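
The client log above reports the piped output as 32000 Hz, because the original command has no -ar option, while the options suggested here resample to 16 kHz mono. A minimal sketch of the piped variant with those options added, assuming the server expects 16 kHz mono 16-bit PCM (an assumption inferred from the options quoted above, not confirmed in this thread). The helper names `ffmpeg_pcm_args` and `decode_to_pcm` are hypothetical:

```python
import subprocess


def ffmpeg_pcm_args(src, sample_rate=16000):
    """Build an ffmpeg command that writes raw mono s16le PCM to stdout."""
    return [
        "ffmpeg",
        "-loglevel", "error",      # keep ffmpeg progress lines out of the terminal
        "-i", src,
        "-f", "s16le",             # raw samples, no container
        "-acodec", "pcm_s16le",    # 16-bit little-endian PCM
        "-ac", "1",                # downmix to mono
        "-ar", str(sample_rate),   # resample (16 kHz is an assumed server rate)
        "-",                       # write to stdout
    ]


def decode_to_pcm(src, sample_rate=16000):
    """Run ffmpeg and return the decoded PCM bytes (needs ffmpeg on PATH)."""
    result = subprocess.run(
        ffmpeg_pcm_args(src, sample_rate),
        check=True,
        stdout=subprocess.PIPE,
    )
    return result.stdout
```

Silencing ffmpeg with -loglevel error also stops its progress line from mixing with the transcript that nc prints on the client side.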

Furthermore, I think that using ffmpeg this way flushes all the bytes of the recording at once. If you want to simulate real-time mode, you should make sure the audio is sent in real time. I convert the file to WAV and then use a Python script that outputs X seconds of audio every X seconds.
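
The script mentioned above is not shown in this thread. As a rough sketch of the idea only, the following sends already-decoded PCM to the server in fixed-duration chunks and sleeps between sends. It assumes 16 kHz mono 16-bit samples and the host and port from the logs above; `pcm_chunks` and `stream_realtime` are hypothetical names, not part of the project:

```python
import socket
import time

SAMPLE_RATE = 16000    # assumed server sample rate
BYTES_PER_SAMPLE = 2   # s16le: 16-bit samples, mono


def pcm_chunks(pcm, chunk_seconds, sample_rate=SAMPLE_RATE):
    """Yield consecutive slices of `pcm`, each holding `chunk_seconds` of audio."""
    step = int(chunk_seconds * sample_rate) * BYTES_PER_SAMPLE
    if step <= 0:
        raise ValueError("chunk_seconds is too small for this sample rate")
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]


def stream_realtime(pcm, host="localhost", port=43007,
                    chunk_seconds=1.0, sleep=time.sleep):
    """Send X seconds of audio, wait X seconds, and repeat until done."""
    with socket.create_connection((host, port)) as conn:
        for chunk in pcm_chunks(pcm, chunk_seconds):
            conn.sendall(chunk)
            sleep(chunk_seconds)  # pace the stream at roughly real time
```

Usage would be along the lines of `stream_realtime(open("out.raw", "rb").read())`, where out.raw is a hypothetical file of raw 16 kHz mono s16le samples. The pacing is approximate, since the time spent sending is not subtracted from the sleep.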

gu-ma · 2023-11-24T08:25:56Z

Hey Dominik,
Thanks for the reply, and sorry for not getting back to you earlier. I'll try a different way to stream it and let you know what worked in the end :)

Gldkslfmsd closed this as completed Nov 22, 2023
