Phone times? #20
Hi, thanks for your comments!

    # find all emitting frames
    for i in range(len(logits)):
        logit = logits[i]
        logit[0] /= blank_factor
        arg_max = np.argmax(logit)
        # this is an emitting frame
        if arg_max != cur_max_arg and arg_max != 0:
            emit_frame_idx.append(i)
            cur_max_arg = arg_max

Basically, the loop variable i is the frame index, which serves as the timestamp indicator: each frame has a duration of 200 ms and shifts by 100 ms per frame. For each recognized phone, there is usually one emitting frame corresponding to it. You can find its frame index and compute the timestamp using the 100 ms shift.
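As a minimal sketch of that last step, the conversion from frame index to time might look like the following. The name `frame_times` and the `frame_shift_s` parameter are illustrative and not part of Allosaurus; the shift is kept as a parameter because the right value depends on how the features are windowed.

```python
def frame_times(emit_frame_idx, frame_shift_s=0.1):
    """Return the approximate start time, in seconds, of each emitting frame.

    frame_shift_s is an assumption (100 ms here); adjust it to match
    the actual feature pipeline.
    """
    return [i * frame_shift_s for i in emit_frame_idx]


# Example with made-up frame indices
times = frame_times([2, 5, 9])
```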
Thanks! Will try this.
Started looking into this by returning emit_frame_idx through lm/decoder.compute() and then app.recognize(). However, I'm getting much higher indexes than I'd expect for a 100 ms shift per frame (they give times much longer than the length of the sample). Is it possible that the frame duration is actually 75 ms with a shift of 30 ms? Looking at pm/feature.mfcc(), the default winstep is 0.01 and winlen is 0.025. Then in pm/mfcc.compute(), there is this block:
Does pm/utils.feature_window() concatenate frames in groups of three?
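To make the question concrete, here is a rough sketch of what grouping frames in threes would do to the frame shift. `stack_frames` is a hypothetical stand-in written for illustration, not the Allosaurus implementation, and the feature dimension of 40 is arbitrary.

```python
import numpy as np


def stack_frames(feature, window_size=3):
    """Concatenate each run of `window_size` consecutive frames into one row.

    Hypothetical helper for illustration only.
    """
    n_frames, n_dims = feature.shape
    usable = (n_frames // window_size) * window_size  # drop the ragged tail
    return feature[:usable].reshape(-1, window_size * n_dims)


winstep = 0.01                  # default step quoted above, in seconds
mfcc = np.zeros((300, 40))      # 300 frames at a 10 ms step, about 3 s of audio
stacked = stack_frames(mfcc)    # 100 stacked frames
effective_shift = winstep * 3   # each stacked frame advances by three steps
```

Under this assumption the 300 input frames become 100 output frames, so each index would correspond to roughly 30 ms rather than 100 ms.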
Hi, yeah, sorry, you are correct.
I decided to return the (approximate) relative position rather than the index, which I assume will be robust to any change in the step size parameters.
Thanks again for your help!
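For anyone following along, the relative-position idea can be sketched roughly like this. The function names, `num_frames`, and `audio_duration_s` are illustrative assumptions and not part of the Allosaurus API.

```python
def relative_positions(emit_frame_idx, num_frames):
    """Express each emitting frame index as a fraction of the utterance."""
    return [i / num_frames for i in emit_frame_idx]


def approx_times(emit_frame_idx, num_frames, audio_duration_s):
    """Scale relative positions by the known audio length, in seconds."""
    return [p * audio_duration_s
            for p in relative_positions(emit_frame_idx, num_frames)]


# Example with made-up values: 100 frames covering a 3-second clip
example_times = approx_times([0, 50], 100, 3.0)
```

Because the position is a fraction of the total frame count, it does not depend on knowing the exact frame shift, only on the audio duration.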
Would it be straightforward to modify Allosaurus to return the approximate times of the recognized phones?
Also, I’m a novice in this area, but for what it’s worth, very impressive tool!