Logging fixes #80
I will check this and #60 later.
I'd like to have a command-line option for this, with DEBUG as the default level.
ok
Thanks, but I don't want them, for the same reasons as in #29.
Happy to do the For the
+1
I believe that debug output containing the transcribed/translated content makes bugs easier to detect and reproduce. The outputs can be wrong without anything crashing.
Thanks! See my line comments.
Also, please check #76 and adopt any edits that are done better there.
whisper_online_server.py
@@ -142,7 +148,7 @@ def format_output_transcript(self,o):
print("%1.0f %1.0f %s" % (beg,end,o[2]),flush=True,file=sys.stderr)
return "%1.0f %1.0f %s" % (beg,end,o[2])
else:
print(o,file=sys.stderr,flush=True)
# No text, so no output
There should be at least some debug log output here.
Done in 88a7f3a.
whisper_online_server.py
@@ -44,25 +55,22 @@
demo_audio_path = "cs-maji-2.16k.wav"
Please merge the current main and update.
Done.
What I've done here is move the log level into the shared args and default it to DEBUG across the board. I'm not quite sure that's what you were after, but it does bring back the log info you were looking for. I've also tweaked the log format so it tells you which logger the output is coming from. Sample output looks like this:
You can see the relevant Is that what you're looking for?
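A minimal sketch of the setup described above: one shared --log-level option that defaults to DEBUG, and a log format that includes the logger name. The helper names add_shared_args and configure_logging, the list of choices, and the exact format string are assumptions for illustration, not the PR's actual code.

```python
import argparse
import logging

LOG_LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def add_shared_args(parser: argparse.ArgumentParser) -> None:
    """Hypothetical helper: register options shared by every entry point."""
    parser.add_argument(
        "--log-level",
        dest="log_level",
        choices=LOG_LEVELS,
        default="DEBUG",  # DEBUG across the board, as described above
        help="Set the log level (default: %(default)s).",
    )


def configure_logging(level_name: str) -> None:
    """Configure the root logger once at startup."""
    logging.basicConfig(
        # %(name)s shows which logger each line came from.
        format="%(levelname)s\t%(name)s\t%(message)s",
        level=getattr(logging, level_name),
    )
```

Each entry point would call add_shared_args on its own parser and then configure_logging(args.log_level) once, so the CLI and the server stay in step.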
Done at cc11b76.
@Gldkslfmsd @regularfry I added some comments. However, many are nit-picky, and this PR already makes the project strictly better in my opinion. So feel free to ignore most of it.
print(f"transcribing {len(self.audio_buffer)/self.SAMPLING_RATE:2.2f} seconds from {self.buffer_time_offset:2.2f}",file=self.logfile)
logger.debug(f"PROMPT: {prompt}")
logger.debug(f"CONTEXT: {non_prompt}")
logger.debug(f"transcribing {len(self.audio_buffer)/self.SAMPLING_RATE:2.2f} seconds from {self.buffer_time_offset:2.2f}")
Nit-pick (and I took shortcuts here too, so feel free to ignore it): text in log messages is better formatted with logger.debug("...%s...", value), because that skips the string interpolation entirely when the message is not emitted at the current log level.
Similarly, if you create variables or do any processing just for logging (like in line 381 below), it is good form to wrap that in if logger.isEnabledFor(somelevel):.
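To illustrate both suggestions: arguments are passed to logger.debug instead of pre-formatting an f-string, and log-only work is guarded with logger.isEnabledFor. SAMPLING_RATE, log_transcribing, log_words and the (beg, end, text) word tuples are hypothetical stand-ins for the code under review.

```python
import logging

logger = logging.getLogger(__name__)

SAMPLING_RATE = 16000  # assumption: 16 kHz audio


def log_transcribing(audio_buffer, buffer_time_offset):
    # The %-formatting is deferred until a handler actually emits the record.
    logger.debug(
        "transcribing %2.2f seconds from %2.2f",
        len(audio_buffer) / SAMPLING_RATE,
        buffer_time_offset,
    )


def log_words(words):
    # Building the joined string is work done only for logging, so skip it
    # entirely when DEBUG is not enabled for this logger.
    if logger.isEnabledFor(logging.DEBUG):
        text = " ".join(w for _, _, w in words)
        logger.debug("words: %s", text)
```

Note that the arguments themselves (here the division) are still evaluated; only the string formatting is deferred, which is why the isEnabledFor guard matters for anything more expensive.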
Yep, agree. I'll tidy those up.
@@ -611,14 +622,18 @@ def asr_factory(args, logfile=sys.stderr):
logfile = sys.stderr

if args.offline and args.comp_unaware:
print("No or one option from --offline and --comp_unaware are available, not both. Exiting.",file=logfile)
logger.error("No or one option from --offline and --comp_unaware are available, not both. Exiting.")
I think command-line errors should go to stderr explicitly. If you set the log level too high, the command may fail confusingly without an error message. (I think the file argument in the original was also a mistake.)
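One way to follow this advice is to let argparse report the conflict: parser.error() writes the usage and the message to stderr and exits with status 2, so the error is visible whatever log level is configured. The parser below is a hypothetical reduction that defines only the two conflicting flags, both assumed to be store_true options; build_parser and parse_cli_args are illustrative names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reduction: only the two conflicting flags are defined here.
    parser = argparse.ArgumentParser()
    parser.add_argument("--offline", action="store_true")
    parser.add_argument("--comp_unaware", action="store_true")
    return parser


def parse_cli_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.offline and args.comp_unaware:
        # parser.error() prints the usage and this message to stderr and exits
        # with status 2, regardless of the configured log level.
        parser.error("use at most one of --offline and --comp_unaware, not both")
    return args
```

argparse's add_mutually_exclusive_group() would express the same constraint declaratively, with the same stderr-and-exit behaviour.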
Good catch, yes.
@@ -645,16 +660,16 @@ def output_transcript(o, now=None):
print("%1.4f %1.0f %1.0f %s" % (now*1000, o[0]*1000,o[1]*1000,o[2]),file=logfile,flush=True)
print("%1.4f %1.0f %1.0f %s" % (now*1000, o[0]*1000,o[1]*1000,o[2]),flush=True)
else:
print(o,file=logfile,flush=True)
# No text, so no output
pass
This looks like a functional change, but it could be a good one. If o[0] is the emission time, isn't the "no text" comment incorrect?
What do we think? Is the emission time on a "no text" segment useful? Now that you've said it, I'm inclined to bring it back, because otherwise the consumer has no clue that any silence has been processed.
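A sketch of the "bring it back" option, under stated assumptions: o is a (beg, end, text) tuple in seconds whose beg is None when nothing was committed, and now is the emission time in seconds. format_output_line and the timestamp-only line for empty segments are hypothetical, not the project's actual output format.

```python
def format_output_line(o, now):
    """Hypothetical formatter: one output line per processed chunk."""
    beg, end, text = o
    if beg is not None:
        # Same layout as the existing output: emission time, begin, end (ms), text.
        return "%1.4f %1.0f %1.0f %s" % (now * 1000, beg * 1000, end * 1000, text)
    # No text: still report the emission time, so the consumer can tell that
    # audio (possibly silence) has been processed.
    return "%1.4f" % (now * 1000)
```

The trade-off is that consumers parsing the output must then accept lines that carry only a timestamp.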
print("assertion error",file=logfile)
pass
except AssertionError as e:
log.error(f"assertion error: {repr(e)}")
I have never seen these errors. Would a full backtrace be useful here? (If so, log.exception would be better.)
If we were making changes here, I'd be inclined to go the other way and say that a failed assertion should just crash the process: an assert should be telling us that something is coded wrong, so to a certain degree all bets are off. That said, since I don't know what original problem led to these exception handlers being written, logging.exception is a good compromise if crashing seems overly harsh.
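A sketch of the logging.exception compromise: the handler logs at ERROR level with the full traceback attached and then carries on. process_safely is a hypothetical wrapper for illustration, not a function in the project.

```python
import logging

logger = logging.getLogger(__name__)


def process_safely(step, *args):
    """Run step(*args); on a failed assertion, log it with the traceback and continue."""
    try:
        return step(*args)
    except AssertionError:
        # logger.exception logs at ERROR level and appends the current traceback.
        logger.exception("assertion error")
        return None
```

If the crash-early option is preferred instead, a bare raise after the logger.exception call keeps the traceback in the log and still stops the process.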
Thanks, guys!
@Gldkslfmsd @regularfry Thank you!
Ah fantastic, thank you for that!
I wrote this before I realised #76 existed, so apologies for that, but I've gone a little further:
- DEBUG rather than INFO, so it's quiet by default
- whisper_online_server.py too
- faster-whisper is forced to WARNING, so it doesn't make log noise for every processed chunk.
- print, which I figured was reasonable given that it's the important information.
- --log-level command-line argument to control how verbose everything is, so if you want to see all the things, pass --log-level DEBUG and all will be revealed.

For convenience I've also added a requirements.txt and a separate cuda_requirements.txt, which give me the right combination of versions today for faster-whisper not to complain about missing .so files. Happy to drop those and make a separate PR if that's too much all in one go, but it was literally the first thing I had to fix before I could do anything else.
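Forcing a chatty dependency to WARNING while the rest of the application follows --log-level can be done by setting a level on that library's own logger. This sketch assumes faster-whisper logs under the logger name "faster_whisper"; quiet_third_party_loggers is a hypothetical helper.

```python
import logging


def quiet_third_party_loggers(names=("faster_whisper",), level=logging.WARNING):
    """Raise the threshold of noisy library loggers (logger name is an assumption)."""
    for name in names:
        logging.getLogger(name).setLevel(level)
```

Because a logger's own level takes precedence over the root logger's, passing --log-level DEBUG would still not surface faster-whisper's per-chunk INFO messages; drop the override if you ever need them.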