Audio transcription sent to undeclared/test Google Account and not to the provided llm client #1284

Open
@siavashg

Transcribing audio content uses an unspecified Google API key for the transcription instead of the provided llm_client, which is what one would expect.

This is partly addressed by #326, which at least provides an option not to route everything this way.


Instead of relying on the provided llm_client, markitdown processes audio via the SpeechRecognition library (imported as sr):

recognizer = sr.Recognizer()
with sr.AudioFile(audio_source) as source:
    audio = recognizer.record(source)
    transcript = recognizer.recognize_google(audio).strip()
    return "[No speech detected]" if transcript == "" else transcript
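For contrast, here is a minimal sketch of the routing I would have expected. Everything in it is hypothetical and not markitdown's actual API: the `transcribe_audio` signature, the `llm_transcriber` and `legacy_transcriber` callables, and the `allow_legacy` flag are names made up for illustration. The idea is only that audio goes to the caller-supplied backend first, and that the built-in Google path is used only when explicitly opted into.

```python
from typing import BinaryIO, Callable, Optional

# Hypothetical type: any callable that turns an audio stream into text.
Transcriber = Callable[[BinaryIO], str]


def transcribe_audio(
    audio_source: BinaryIO,
    llm_transcriber: Optional[Transcriber] = None,
    legacy_transcriber: Optional[Transcriber] = None,
    allow_legacy: bool = False,
) -> str:
    """Route audio to the caller's backend; never fall back silently."""
    if llm_transcriber is not None:
        # Preferred path: the backend the caller explicitly provided.
        transcript = llm_transcriber(audio_source).strip()
    elif allow_legacy and legacy_transcriber is not None:
        # Opt-in only: e.g. a wrapper around recognizer.recognize_google.
        transcript = legacy_transcriber(audio_source).strip()
    else:
        raise RuntimeError(
            "No transcription backend configured; refusing to send audio "
            "to a default third-party service."
        )
    return "[No speech detected]" if transcript == "" else transcript
```

With this shape, a caller who passes nothing gets an error rather than having their audio uploaded under a key they never chose.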

In SpeechRecognition, recognize_google is mapped to google_legacy:

https://github.com/Uberi/speech_recognition/blob/46e70560f605ed190b3b0c16f198ee34978de585/speech_recognition/__init__.py#L1288

The google_legacy method even comes with this warning (although it does not say where this key comes from or how the data may be used):

The Google Speech Recognition API key is specified by key. If not specified, it uses a generic key that works out of the box. This should generally be used for personal or testing purposes only, as it may be revoked by Google at any time.

https://github.com/Uberi/speech_recognition/blob/46e70560f605ed190b3b0c16f198ee34978de585/speech_recognition/recognizers/google.py#L225-L262
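As a stopgap for anyone calling SpeechRecognition directly, the `key` parameter described in that docstring can be passed explicitly so the generic key is never used. This is a sketch under assumptions: the helper name `transcribe_with_explicit_key` and the environment variable `GOOGLE_SPEECH_API_KEY` are made up for this example, and `recognizer` is expected to be an `sr.Recognizer()` (or anything exposing the same `recognize_google(audio, key=...)` method).

```python
import os


def transcribe_with_explicit_key(recognizer, audio, env_var="GOOGLE_SPEECH_API_KEY"):
    """Call recognize_google with a key we control, or fail loudly.

    `env_var` is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(env_var)
    if not key:
        # Refuse to proceed rather than let the library use its generic key.
        raise RuntimeError(f"{env_var} is not set; not sending audio to Google.")
    return recognizer.recognize_google(audio, key=key).strip()


# Intended usage (requires the SpeechRecognition package and network access):
#   import speech_recognition as sr
#   recognizer = sr.Recognizer()
#   with sr.AudioFile("sample.wav") as source:
#       audio = recognizer.record(source)
#   print(transcribe_with_explicit_key(recognizer, audio))
```

This does not help markitdown users today, since the call site shown above passes no key, but it shows that the library itself allows avoiding the default.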

The unspecified fallback API key is set here:
https://github.com/Uberi/speech_recognition/blob/46e70560f605ed190b3b0c16f198ee34978de585/speech_recognition/recognizers/google.py#L118-L119
