Problem with speech recognition #151

Open
ebracci opened this issue Mar 30, 2021 · 3 comments

ebracci commented Mar 30, 2021

Hi,

I'm using wit.ai in a simple Python application to transcribe audio (speech to text), and I have run into the following issue:

When I send audio to the speech API for transcription, the transcription seems to stop at the first stretch with no audio instead of continuing to the end.

How can I avoid this?

Thank you

ebracci changed the title from "Problem with speech recongnition" to "Problem with speech recognition" on Mar 30, 2021

ruoyipu commented Mar 31, 2021

Hi @ebracci,

For streaming requests we accept 10 s chunks; non-streaming requests are cut off after 20 s (see https://wit.ai/docs/http/20200513/#post__speech_link).

If you'd like us to take a deeper look, please provide examples of your requests.
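To make the 10 s limit mentioned above concrete, here is a minimal sketch of how a long recording could be divided into windows that each fit in one request. The function name `fixed_windows` and the millisecond units are assumptions for illustration, not part of the Wit.ai API.

```python
def fixed_windows(total_ms, window_ms=10_000):
    """Return (start, end) millisecond pairs covering [0, total_ms).

    Each window is at most window_ms long; the last one may be shorter.
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return [
        (start, min(start + window_ms, total_ms))
        for start in range(0, total_ms, window_ms)
    ]


# A 25-second clip is covered by three windows: 10 s, 10 s and 5 s.
print(fixed_windows(25_000))
```

Each pair can then be used to slice the audio before it is sent, one request per window.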


ebracci commented Apr 1, 2021

Thank you for your answer, and sorry for my bad English. I'll try to explain the problem with an example.

I have to split the 30-minute audio into 10-second segments because of the request timeout, and that's fine.

The issue is this: if a 10-second audio segment contains silence, the request returns the transcription of the segment only up to the silence.

Example:
If the audio lasts 10 seconds and there is a pause at 5 seconds, the request returns the transcript of the first 5 seconds only.

That's my problem. To avoid it I have to remove the silent chunks from the audio, but that becomes complicated with longer audio.

This is the sample code that I am using.

import os

from pydub import AudioSegment
from pydub.silence import split_on_silence
from wit import Wit


def recognize_speech_wit_ai(file):
    client = Wit('MYKEY')
    audio = AudioSegment.from_wav(file)
    file_name = os.path.basename(file)
    offset = 10000

    chunks = split_audio_on_silence(audio)

    # Process each chunk 
    for i, chunk in enumerate(chunks):
        start_time = 0
        # Create a silence chunk that's 0.5 seconds (or 500 ms) long for padding.
        silence_chunk = AudioSegment.silent(duration=500)

        # Add the padding chunk to beginning and end of the entire chunk.
        audio_chunk = silence_chunk + chunk + silence_chunk

        # Normalize the entire chunk.
        normalized_chunk = match_target_amplitude(audio_chunk, -20.0)

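        # Walk the chunk in windows of `offset` milliseconds (10 s each).
        while normalized_chunk.duration_seconds > (start_time / 1000):
            # pydub slices are expressed in milliseconds.
            t1 = start_time
            t2 = start_time + offset
            chunk_path = 'temp/chunk{0}'.format(i) + str(t1) + '.wav'
            new_audio = normalized_chunk[t1:t2]
            new_audio.export(chunk_path, format="wav")
            with open(chunk_path, 'rb') as source:
                # Pass headers by keyword so the call does not depend on
                # the positional argument order of Wit.speech().
                resp = client.speech(source, headers={'Content-Type': 'audio/wav'})
            # Fall back to an empty string if the response has no 'text' field.
            text = resp.get('text', '')
            print('Yay, got Wit.ai response: ' + str(text))
            with open("output/" + file_name + '.txt', "a") as f:
                f.write(text + '\n')
            start_time += offset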


def match_target_amplitude(aChunk, target_dBFS):
    # Normalize the given audio chunk to the target loudness (in dBFS).
    change_in_dBFS = target_dBFS - aChunk.dBFS
    return aChunk.apply_gain(change_in_dBFS)


def split_audio_on_silence(audio):
    chunks = split_on_silence(
        audio,
        min_silence_len=1000,
        silence_thresh=-50
    )

    return chunks


if __name__ == "__main__":
    recognize_speech_wit_ai('audio/output.wav')
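The workaround described above (dropping the silent stretches so that no request contains a pause) can be sketched without touching the audio itself, by working on time ranges. This is only an illustration under assumptions: the helper name `split_speech_ranges` is hypothetical, and the input is assumed to be a list of non-silent `(start, end)` ranges in milliseconds, such as the ones `pydub.silence.detect_nonsilent` returns. Whether this avoids the truncation depends on how Wit.ai detects the end of speech, which this thread does not confirm.

```python
def split_speech_ranges(nonsilent_ranges, max_ms=10_000):
    """Cut each non-silent (start, end) range into pieces of at most max_ms.

    Silence between ranges is simply not included in any piece.
    """
    if max_ms <= 0:
        raise ValueError("max_ms must be positive")
    pieces = []
    for start, end in nonsilent_ranges:
        pos = start
        while pos < end:
            stop = min(pos + max_ms, end)
            pieces.append((pos, stop))
            pos = stop
    return pieces


# Hypothetical input: speech from 0 to 4.8 s, a pause, then speech from 6.2 s to 21.2 s.
ranges = [(0, 4_800), (6_200, 21_200)]
print(split_speech_ranges(ranges))
```

Each resulting pair could be used to slice an `AudioSegment` (for example `audio[start:stop]`) and sent as its own request. Note that cutting a long range at a fixed offset may split a word in two.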


ruoyipu commented Apr 1, 2021

Thank you for the info! Do you also have the wit.ai app ID and a sample wav file you can attach?
