
Setting the input_device_index has no effect #209

Open
TheMancha opened this issue Mar 6, 2025 · 3 comments

@TheMancha

TheMancha commented Mar 6, 2025

I'm running realtimestt_test.py in a Miniconda environment, and the script works great. The problem is that I need it to capture the audio coming from my system, not from my mic.

Kolja told me to set input_device_index in the file, using show_devices.py to find the index of the device I want. It listed the device three times. I tried setting input_device_index to 4, 13 and 22, but none of them worked: no matter what, it keeps capturing audio from my mic.

I know there's a workaround using Stereo Mix, but it would interfere with other parts of my setup. Am I doing something wrong?

This is the device I want the script to capture audio from (show_devices.py lists it once per host API):

Device Index: 4
Name: FxSound Speakers (FxSound Audio
Sample Rate (Default): 44100.0 Hz
Max Input Channels: 0
Max Output Channels: 8
Host API: MME

Device Index: 13
Name: FxSound Speakers (FxSound Audio Enhancer)
Sample Rate (Default): 44100.0 Hz
Max Input Channels: 0
Max Output Channels: 8
Host API: Windows DirectSound

Device Index: 22
Name: FxSound Speakers (FxSound Audio Enhancer)
Sample Rate (Default): 48000.0 Hz
Max Input Channels: 0
Max Output Channels: 2
Host API: Windows WASAPI
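All three entries above report `Max Input Channels: 0`, which means each one is an output (playback) device. RealtimeSTT records through PyAudio, and PyAudio can only open an input stream on a device that has at least one input channel, so a recording stream cannot be opened on any of these indices as listed. A minimal sketch for listing only the devices that can be opened for recording; the helper names `usable_input_devices` and `list_pyaudio_devices` are hypothetical, and the listing part assumes the `pyaudio` package is installed:

```python
"""Sketch: list audio devices that can be opened for recording."""


def usable_input_devices(devices):
    """Keep only devices that expose at least one input channel.

    `devices` is an iterable of dicts shaped like PyAudio's
    get_device_info_by_index() result ('index', 'name', 'maxInputChannels', ...).
    """
    return [d for d in devices if d.get("maxInputChannels", 0) > 0]


def list_pyaudio_devices():
    """Return the info dict for every device PyAudio can see (requires pyaudio)."""
    import pyaudio  # imported lazily so the filter above works without pyaudio

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()  # release PortAudio resources


if __name__ == "__main__":
    # Print index, name and input channel count of every recordable device.
    for info in usable_input_devices(list_pyaudio_devices()):
        print(info["index"], info["name"], int(info["maxInputChannels"]))
```

Applied to the three FxSound entries listed above, the filter returns an empty list.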

And this is the part of realtimestt_test.py that I'm modifying:

# Recorder configuration
recorder_config = {
    'spinner': False,
    'model': 'large-v2', # or large-v2 or deepdml/faster-whisper-large-v3-turbo-ct2 or ...
    'download_root': None, # default download root location. Ex. ~/.cache/huggingface/hub/ in Linux
    'input_device_index': 4,
    'realtime_model_type': 'tiny.en', # or small.en or distil-small.en or ...
    # ... (remaining settings not shown)
}

@TheMancha

If I put the index in quotes, like this:

'input_device_index': "4"

I get this warning:

WARNING:root:Failed to get highest sample rate: 'str' object cannot be interpreted as an integer

It still doesn't capture my system audio; it keeps recording from my mic.

P.S. Sorry if this "error" is a silly one; I'm not a programmer. I just read and go by trial and error until things work. :)

@TheMancha

I just discovered that Stereo Mix actually works perfectly for what I need, so I'll go that way. I'll leave this open in case Kolja or someone else can propose a solution.

@Nenesh

Nenesh commented Mar 7, 2025

Did you set use_microphone = False? If not, I think it prioritizes microphone input.
input_device_index takes an int, so just use input_device_index = 4 (no quotes).
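The type point can be illustrated with a small sketch. The message `'str' object cannot be interpreted as an integer` is the standard Python `TypeError` text for passing a string where an integer is required, for example `range("4")`. The helper `parse_device_index` below is hypothetical (it is not part of RealtimeSTT); it turns a typed-in value such as `"4"` into the int that the `input_device_index` setting expects:

```python
def parse_device_index(value):
    """Return an int device index, or None to use the system default device.

    Hypothetical helper: accepts ints, or strings such as "4" or " 4 "
    (e.g. copied from show_devices.py output).
    """
    if value is None:
        return None
    if isinstance(value, int):
        return value
    text = str(value).strip()
    return int(text) if text else None


recorder_config = {
    'spinner': False,
    'input_device_index': parse_device_index("4"),  # stored as the int 4, not the string "4"
}

try:
    range("4")  # passing a str where an int is required
except TypeError as exc:
    print(exc)  # 'str' object cannot be interpreted as an integer
```

A correctly typed index still has to point at a device that has input channels.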
