Always chunking because of len #44

Closed
zmlee0514 opened this issue Dec 19, 2023 · 4 comments

Comments

@zmlee0514

I tested whisper_online.py on a 10-minute video, but it always chunks when the buffer reaches 30 s, never at a sentence boundary.
Running log: AI_pin.txt

I found that this is caused by the string comparison in the words_to_sentences function, so I added strip() to it. I am not sure whether this is the right fix; my goal is for Whisper to output one sentence per line.

def words_to_sentences(self, words):
    """Uses self.tokenizer for sentence segmentation of words.
    Returns: [(beg,end,"sentence 1"),...]
    """
    
    cwords = [w for w in words]
    t = " ".join(o[2] for o in cwords)
    s = self.tokenizer.split(t)
    out = []
    while s:
        beg = None
        end = None
        sent = s.pop(0).strip()
        fsent = sent
        while cwords:
            b,e,w = cwords.pop(0)
            if beg is None and sent.startswith(w.strip()):
                beg = b
            elif end is None and sent == w.strip():
                end = e
                out.append((beg,end,fsent))
                break
            sent = sent[len(w):].strip()
    return out

The result log: AI_pin.txt
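
To illustrate why the comparison matters, here is a minimal standalone sketch. It assumes the back-end emits words with a leading space (for example " Welcome"); that is an assumption made for illustration, not something confirmed in this thread, and the word list and timestamps are made up.

```python
# Hypothetical (beg, end, text) word tuples; the leading spaces are an assumption
words = [(0.0, 0.4, " Welcome"), (0.4, 0.6, " to"), (0.6, 1.1, " Humane.")]

# A sentence as it looks after s.pop(0).strip() in words_to_sentences
sent = "Welcome to Humane."

first_word = words[0][2]

# Comparison without strip(): the leading space prevents a match
raw_ok = sent.startswith(first_word)

# Comparison with strip(), as in the snippet above
stripped_ok = sent.startswith(first_word.strip())

print(raw_ok, stripped_ok)
```

If the first word of a sentence never matches, beg is never set, no sentence is emitted, and only the buffer length limit can trigger chunking.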

@Gldkslfmsd
Collaborator

Hi,
can you please specify which back-end you use, faster-whisper or whisper_timestamped? I think it has an impact.

@Gldkslfmsd
Collaborator

OK, thanks for the feedback. #36 is a correct fix, so I'm merging it.

@zmlee0514
Author

It still fails when I run with Whisper large-v3. My command uses the defaults for all arguments except the model:
python whisper_online.py audio.wav --model large-v3

Since faster-whisper only recently added support for the large-v3 model, it may not have been tested yet. However, I found that the cause is an abnormal string such as " . Welcome to Humane. This is the Humane AI pin.". It is split into 3 sentences: [" .", "Welcome to Humane.", "This is the Humane AI pin."]. The first sentence, " .", is only assigned beg and never end. This gap breaks all subsequent processing.
It could be solved simply by special-casing one-word sentences, but that is not elegant. I should study your algorithm first.
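
The one-word-sentence case described above can be sketched as a standalone function. This is only an illustration under stated assumptions, not the project's actual fix: the sentence list is passed in directly instead of calling self.tokenizer, and the words, timestamps, and leading spaces are made up.

```python
def words_to_sentences(words, sentences):
    """Align (beg, end, text) words with pre-split sentences.

    Hypothetical standalone variant: `sentences` stands in for the
    output of self.tokenizer.split(...).
    Returns: [(beg, end, "sentence 1"), ...]
    """
    cwords = list(words)
    out = []
    for fsent in sentences:
        fsent = fsent.strip()
        sent = fsent
        beg = None
        while cwords:
            b, e, w = cwords.pop(0)
            w = w.strip()
            if beg is None and sent.startswith(w):
                beg = b
            # Checked independently of the branch above, so a one-word
            # sentence takes both beg and end from the same word.
            if beg is not None and sent == w:
                out.append((beg, e, fsent))
                break
            sent = sent[len(w):].strip()
    return out


# Made-up example mirroring the string from the comment above
words = [
    (0.0, 0.1, " ."), (0.1, 0.5, " Welcome"), (0.5, 0.7, " to"),
    (0.7, 1.2, " Humane."), (1.2, 1.4, " This"), (1.4, 1.5, " is"),
    (1.5, 1.6, " the"), (1.6, 2.0, " Humane"), (2.0, 2.3, " AI"),
    (2.3, 2.7, " pin."),
]
sentences = [" .", "Welcome to Humane.", "This is the Humane AI pin."]

result = words_to_sentences(words, sentences)
for item in result:
    print(item)
```

With the original if/elif structure, the single word "." would set beg and skip the end check, so the remaining words would be consumed without ever completing a sentence.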

@Gldkslfmsd
Collaborator

OK. Please reopen if you're sure you found a bug.

v3 is an unrelated issue, see #45.
