Logging fixes #80

regularfry · 2024-04-14T19:35:36Z

I wrote this before I realised #76 existed, so apologies for that, but I've gone a little further:

* Logging is mostly `DEBUG` rather than `INFO` so it's quiet by default
* I've done the remaining bits in `whisper_online_server.py` too
* The log level within `faster-whisper` is forced to `WARNING` so it doesn't make log noise for every processed chunk.
* The transcription output is still just a `print`, which I figured was reasonable given that it's the important information.
* I've added a `--log-level` command-line argument to control how verbose everything is, so if you want to see all the things, pass `--log-level DEBUG` and all will be revealed.
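
For readers who want to picture the wiring, here is a minimal sketch of how a `--log-level` option could be connected to the standard `logging` module. It is not the code from this PR: the helper names `build_parser` and `configure_logging`, the `INFO` default, and the assumption that faster-whisper logs under a logger named `faster_whisper` are all illustrative.

```python
import argparse
import logging

LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def build_parser():
    # Hypothetical helper: only the --log-level flag itself comes from the PR text.
    parser = argparse.ArgumentParser()
    parser.add_argument("--log-level", dest="log_level", choices=LEVELS,
                        default="INFO",  # assumed default, not confirmed by the PR
                        help="Set the log level")
    return parser


def configure_logging(level_name):
    # Apply the requested level to the root logger (force=True replaces any
    # handlers that were installed earlier; requires Python 3.8+).
    logging.basicConfig(level=getattr(logging, level_name), force=True)
    # Keep the (assumed) faster_whisper logger at WARNING so it does not
    # emit a record for every processed chunk.
    logging.getLogger("faster_whisper").setLevel(logging.WARNING)


args = build_parser().parse_args(["--log-level", "DEBUG"])
configure_logging(args.log_level)
```

Restricting the flag with `choices` means a mistyped level is rejected by `argparse` up front instead of failing later inside `getattr`.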

For convenience I've also added a requirements.txt and a separate cuda_requirements.txt, which gives me the right combination of versions today for faster-whisper not to complain about missing .so files. Happy to drop those and make a separate PR if that's too much all in one go, but it was literally the first thing I had to fix before I could do anything else.

Gldkslfmsd · 2024-04-17T13:01:55Z

> I wrote this before I realised #76 existed, so apologies for that, but I've gone a little further:
>
> * Logging is mostly `DEBUG` rather than `INFO` so it's quiet by default
> * I've done the remaining bits in `whisper_online_server.py` too

I will check this and #60 later.

> * The log level within `faster-whisper` is forced to `WARNING` so it doesn't make log noise for every processed chunk.

I'd like to have a command-line option for this, with DEBUG as the default level.

> * The transcription output is still just a `print`, which I figured was reasonable given that it's the important information.

OK.

> * I've added a `--log-level` command-line argument to control how verbose everything is, so if you want to see all the things, pass `--log-level DEBUG` and all will be revealed.

OK.

> For convenience I've also added a requirements.txt and a separate cuda_requirements.txt, which gives me the right combination of versions today for faster-whisper not to complain about missing .so files. Happy to drop those and make a separate PR if that's too much all in one go, but it was literally the first thing I had to fix before I could do anything else.

Thanks, but I don't want them, for the same reasons as in #29.
You can suggest better wording for the installation instructions so that it's easier to avoid the missing .so files, in a new PR.

regularfry · 2024-04-17T13:43:05Z

Happy to do the faster-whisper log level change and argument. What is it that you're using the DEBUG output for?

For the *requirements.txt question, I can remove them from this PR. It looks like the version issues should go away once SYSTRAN/faster-whisper#785 is resolved one way or another, so cuda_requirements.txt is only a temporary need while the faster-whisper README is wrong.

Gldkslfmsd · 2024-04-17T15:08:15Z

> Happy to do the faster-whisper log level change and argument.

+1

> What is it that you're using the DEBUG output for?

I believe that debug output containing the transcribed/translated content makes bug detection and reproduction easier: the output can be wrong without anything crashing.
If anybody wants a production run without verbose logs, it's still possible with the option.

Gldkslfmsd

Thanks! See the line comments.

Also, please check #76 and apply the edits that are better there.

Gldkslfmsd · 2024-04-17T15:12:19Z

whisper_online_server.py

@@ -142,7 +148,7 @@ def format_output_transcript(self,o):
            print("%1.0f %1.0f %s" % (beg,end,o[2]),flush=True,file=sys.stderr)
            return "%1.0f %1.0f %s" % (beg,end,o[2])
        else:
-            print(o,file=sys.stderr,flush=True)
+            # No text, so no output


There should be at least a debug log output here.

regularfry · 2024-04-17

Done in 88a7f3a.
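
As an illustration of the request above (and not the actual contents of 88a7f3a), the "no text" branch could log at DEBUG level instead of staying silent. This sketch simplifies `format_output_transcript` to a free function and assumes `o` is a `(begin, end, text)` tuple whose first element is `None` when nothing was transcribed.

```python
import logging

logger = logging.getLogger(__name__)


def format_output_transcript(o):
    # Assumption: o is (begin_seconds, end_seconds, text), with begin set to
    # None when the chunk produced no text.
    if o[0] is not None:
        beg, end = o[0] * 1000, o[1] * 1000
        return "%1.0f %1.0f %s" % (beg, end, o[2])
    # Leave a trace at DEBUG level rather than dropping the event entirely.
    logger.debug("No text in this segment")
    return None
```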

Gldkslfmsd · 2024-04-17T15:15:40Z

whisper_online_server.py

@@ -44,25 +55,22 @@
 demo_audio_path = "cs-maji-2.16k.wav"


Please merge the current main and update.

regularfry · 2024-04-17T21:27:41Z

> > * The log level within `faster-whisper` is forced to `WARNING` so it doesn't make log noise for every processed chunk.
>
> I'd like to have cmd option for this and the DEBUG level as the default.

What I've done here is move the log level into the shared args and default it to DEBUG across the board. I'm not quite sure that was what you were after, but it does bring back the log info you were looking for.

I've also tweaked the log format so it tells you which logger the output's coming from. Sample output looks like this:

  $ ./run.sh ../whisper.cpp/samples/jfk.wav 
whisper-server-DEBUG:whisper_online: Loading Whisper large-v3 model for en...
whisper-server-DEBUG:urllib3.connectionpool: Starting new HTTPS connection (1): huggingface.co:443
whisper-server-DEBUG:urllib3.connectionpool: https://huggingface.co:443 "GET /api/models/Systran/faster-whisper-large-v3/revision/main HTTP/1.1" 200 2061
whisper-server-DEBUG:whisper_online: done. It took 1.31 seconds.
whisper-server-WARNING:__main__: Whisper is not warmed up. The first chunk processing may take longer.
whisper-server-INFO:__main__: Listening on('0', 2024)
whisper-server-INFO:__main__: Connected to client on ('127.0.0.1', 41130)
...

You can see the relevant `__name__` in between the colons.

Is that what you're looking for?
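
A log line shaped like the sample output can be reproduced with a `logging.Formatter`. The format string below is reverse-engineered from the sample and may not match the one in the PR; the in-memory stream is only there to make the result easy to inspect.

```python
import io
import logging

# Assumed format string, inferred from the sample output; not copied from the PR.
LOG_FORMAT = "whisper-server-%(levelname)s:%(name)s: %(message)s"

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(LOG_FORMAT))

# A named logger, so %(name)s shows where the record came from.
logger = logging.getLogger("whisper_online")
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
logger.propagate = False  # avoid a second copy via the root logger

logger.debug("Loading Whisper large-v3 model for en...")
print(stream.getvalue(), end="")
```

In a module, `logging.getLogger(__name__)` gives each file its own name in that middle field, which is why `__main__` and `urllib3.connectionpool` show up in the sample.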

regularfry · 2024-04-17T21:37:36Z

> For the *requirements.txt question, I can remove them from this PR.

Done at cc11b76.

regularfry · 2024-04-17T21:42:47Z

At a first pass, I think that's it.

> Also, please check #76 and apply the edits that are better there.

Is there anything in particular I'm missing here? @jcassee, anything you don't want left off?

jcassee

@Gldkslfmsd @regularfry I added some comments. However, many are nit-picky, and this PR already makes the project strictly better in my opinion. So feel free to ignore most of it.

jcassee · 2024-04-17T21:58:04Z

whisper_online.py

-        print(f"transcribing {len(self.audio_buffer)/self.SAMPLING_RATE:2.2f} seconds from {self.buffer_time_offset:2.2f}",file=self.logfile)
+        logger.debug(f"PROMPT: {prompt}")
+        logger.debug(f"CONTEXT: {non_prompt}")
+        logger.debug(f"transcribing {len(self.audio_buffer)/self.SAMPLING_RATE:2.2f} seconds from {self.buffer_time_offset:2.2f}")


Nit-pick, and I also took shortcuts here so feel free to ignore, but formatting text in logs is better done using `logger.debug("...%s...", value)` because it avoids interpolation in case the log is not written because of the log level.

Similarly, if you create variables or do any processing just for logging (like in line 381 below) it is good form to wrap with `if logger.isEnabledFor(somelevel):`.

regularfry · 2024-04-18

Yep, agree. I'll tidy those up.
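
The two suggestions can be sketched as follows. The logger name, `expensive_summary`, and the buffer values are made up for the example; only `logger.debug` with %-style arguments and `logger.isEnabledFor` are the standard-library calls being illustrated.

```python
import logging

logger = logging.getLogger("whisper_online_example")  # hypothetical logger name
logger.setLevel(logging.INFO)  # DEBUG records are disabled for this demo

calls = []


def expensive_summary(buffer):
    # Stand-in for any processing done purely for the sake of a log line.
    calls.append(len(buffer))
    return "%d samples" % len(buffer)


SAMPLING_RATE = 16000
audio_buffer = [0.0] * SAMPLING_RATE

# The %-style arguments are only interpolated if the record is actually
# emitted, unlike an f-string, which is always built first.
logger.debug("transcribing %2.2f seconds from %2.2f",
             len(audio_buffer) / SAMPLING_RATE, 0.0)

# Guard work that exists only to feed the log.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("buffer: %s", expensive_summary(audio_buffer))
```

Note that the arguments themselves are still evaluated before the call, so lazy formatting saves the string interpolation but not the cost of computing the values; that is what the `isEnabledFor` guard is for.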

jcassee · 2024-04-17T22:05:54Z

whisper_online.py

@@ -611,14 +622,18 @@ def asr_factory(args, logfile=sys.stderr):
    logfile = sys.stderr

    if args.offline and args.comp_unaware:
-        print("No or one option from --offline and --comp_unaware are available, not both. Exiting.",file=logfile)
+        logger.error("No or one option from --offline and --comp_unaware are available, not both. Exiting.")


I think command-line errors should go to stderr explicitly. If you set the log level too high, the command may confusingly fail without an error message. (I think the file argument in the original was also a mistake.)

regularfry · 2024-04-18

Good catch, yes.
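
A small sketch of the point about usage errors: writing them to stderr directly keeps them visible whatever the log level is. The `check_args` helper is hypothetical; the two flags and the message text are taken from the diff above.

```python
import argparse
import sys


def check_args(args):
    # Report usage errors on stderr directly so they are shown regardless of
    # the configured log level, then signal failure to the caller.
    if args.offline and args.comp_unaware:
        print("No or one option from --offline and --comp_unaware are "
              "available, not both. Exiting.", file=sys.stderr)
        return False
    return True


parser = argparse.ArgumentParser()
parser.add_argument("--offline", action="store_true")
parser.add_argument("--comp_unaware", action="store_true")

args = parser.parse_args(["--offline", "--comp_unaware"])
ok = check_args(args)  # a real script would call sys.exit(1) when this is False
```

An alternative with the same effect is `parser.error(message)`, which prints the usage line and the message to stderr and exits with status 2.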

jcassee · 2024-04-17T22:10:16Z

whisper_online.py

@@ -645,16 +660,16 @@ def output_transcript(o, now=None):
            print("%1.4f %1.0f %1.0f %s" % (now*1000, o[0]*1000,o[1]*1000,o[2]),file=logfile,flush=True)
            print("%1.4f %1.0f %1.0f %s" % (now*1000, o[0]*1000,o[1]*1000,o[2]),flush=True)
        else:
-            print(o,file=logfile,flush=True)
+            # No text, so no output
+            pass


This seems to be a functional change, but it could be a good one. If o[0] is the emission time, the "no text" comment seems incorrect?

regularfry · 2024-04-18

What do we think? Is the emission time on a "no text" segment useful? Now you've said it I'm inclined to bring it back in, because otherwise the consumer has no clue that any silence has been processed.

jcassee · 2024-04-17T22:14:54Z

whisper_online.py

-            print("assertion error",file=logfile)
-            pass
+        except AssertionError as e:
+            log.error(f"assertion error: {repr(e)}")


I have never seen these errors. Would a full backtrace be useful here? (If so, log.exception would be better.)

regularfry · 2024-04-18

If we were making changes here I'd be inclined to go the other way and say that a failed assertion should just crash the process. assert should be telling us that something's coded wrong, so to a certain degree all bets are off. That being said, given that I don't know what the original problem was that caused these exception handlers to be written, logging.exception is a good compromise if that seems overly harsh.
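
For comparison with the `log.error(...)` call in the diff, this is roughly what the `exception` variant looks like. The logger name and the `process` function are placeholders; the point is only that `Logger.exception` logs at ERROR level and appends the current traceback.

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("assertion_example")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.DEBUG)
logger.propagate = False


def process(chunk):
    # Placeholder check standing in for whichever assertion fails in practice.
    assert len(chunk) > 0, "empty chunk"
    return len(chunk)


try:
    process([])
except AssertionError:
    # Must be called from an exception handler; the traceback of the active
    # exception is added to the log record automatically.
    logger.exception("assertion error")

print(stream.getvalue(), end="")
```

Letting the assertion propagate instead, as suggested above, would simply mean removing the `try`/`except`.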

Gldkslfmsd · 2024-04-18T16:15:22Z

Thanks, guys!
Sorry, I got lost in your conversation; it was too long. I made the changes that I liked and that I considered serious, such as deduplicating the code that sets the log levels, with a simple hack.
Let's not overcomplicate it.

jcassee · 2024-04-18T21:20:18Z

@Gldkslfmsd @regularfry Thank you!

regularfry · 2024-04-18T21:30:46Z

Ah fantastic, thank you for that!

regularfry added 7 commits April 14, 2024 14:24

Turn prints into logging.debug calls in whisper_online_server.py

c83746b

Further tidying of print output, so by default there's little on the console

32191b5

Add requirements.txt and a cuda_requirements.txt that's up to date

a6d9716

Add some logging around warmup

72cf84f

Remove 'INFO:' from a few log strings

736e538

Merge branch 'main' into ayo-logging-fixes

6db3f87

Set the log level inside faster-whisper again (lost in merge)

259a346

Gldkslfmsd requested changes Apr 17, 2024

Gldkslfmsd mentioned this pull request Apr 17, 2024

Convert print to logging #76

Closed

regularfry added 4 commits April 17, 2024 20:47

Merge branch 'main' into ayo-logging-fixes

3e02232

Construct an explicit logger rather than using the root logger

ebdde20

Remove requirements.txt files

cc11b76

Default log level to DEBUG, faster-whisper to match

243777b

regularfry mentioned this pull request Apr 17, 2024

tgt_language undefined problem on main #83

Closed

Add a debug log line when no text is detected

88a7f3a

jcassee reviewed Apr 17, 2024

Gldkslfmsd merged commit 88a7f3a into ufal:main Apr 18, 2024
