A voice-enabled AI personal assistant that leverages the Model Context Protocol (MCP) to integrate multiple tools and services through natural voice interactions.
- Voice Input: Real-time speech-to-text using OpenAI Whisper
- Voice Output: High-quality text-to-speech using ElevenLabs (with pyttsx3 fallback)
- AI-Powered: Conversational AI with memory persistence
- Multiple Model Providers: Works with any LLM provider that supports tool calling (OpenAI, Anthropic, Groq, Llama, etc.)
- Multi-Tool Integration: Seamlessly connects to any MCP servers
- Conversational Memory: Maintains context across interactions
- Extensible: Easy to add new MCP servers and capabilities
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  User Voice  │ --> │  Speech-to-  │ --> │   LLM with   │ --> │   Text-to-   │
│    Input     │     │  Text (STT)  │     │   MCPAgent   │     │ Speech (TTS) │
└──────────────┘     └──────────────┘     └──────────────┘     └──────────────┘
                         Whisper                 │               ElevenLabs
                                                 │
                                         ┌───────▼───────┐
                                         │  MCP Servers  │
                                         ├───────────────┤
                                         │ • Linear      │
                                         │ • Playwright  │
                                         │ • Filesystem  │
                                         └───────────────┘
```
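At runtime these stages run in a loop: capture audio until silence, transcribe it, hand the text to the MCPAgent, then speak the reply. The sketch below is purely illustrative; the `listen`, `transcribe`, and `speak` method names are assumptions, not this project's actual API.

```python
# Illustrative sketch of the voice pipeline; method names are assumed, not the project's API.
async def voice_loop(assistant):
    while True:
        audio = assistant.listen()               # record microphone input until silence
        text = assistant.transcribe(audio)       # speech-to-text via Whisper
        if not text:
            continue
        reply = await assistant.agent.run(text)  # LLM reasoning + MCP tool calls
        assistant.speak(reply)                   # text-to-speech via ElevenLabs (or pyttsx3)
```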
- Python 3.11+
- uv (Python package manager): `pip install uv` or `pipx install uv`
- Node.js (for MCP servers)
- System dependencies:
  - macOS: `brew install portaudio`
  - Ubuntu/Debian: `sudo apt-get install portaudio19-dev`
  - Windows: PyAudio wheel includes PortAudio
```bash
# Clone the repository
git clone https://github.com/yourusername/mcp-voice-assistant.git
cd mcp-voice-assistant

# Create a virtual environment with uv
uv venv

# Activate the virtual environment
# On Linux/macOS:
source .venv/bin/activate
# On Windows:
# .venv\Scripts\activate

# Install in development mode
uv pip install -e .

# Or install directly
uv pip install .
```
Create a `.env` file in your project root (see `.env.example` for a complete template):
```bash
# Required
OPENAI_API_KEY=your-openai-api-key

# Optional but recommended for better voice output
ELEVENLABS_API_KEY=your-elevenlabs-api-key

# Optional - Model Provider Settings
# You can use any model provider that supports tool calling
OPENAI_API_KEY=your-openai-api-key        # For OpenAI models
ANTHROPIC_API_KEY=your-anthropic-api-key  # For Claude models
GROQ_API_KEY=your-groq-api-key            # For Groq models

# Model selection (defaults to gpt-4)
OPENAI_MODEL=gpt-4                        # OpenAI: gpt-4, gpt-4-turbo, gpt-3.5-turbo
# Or use other providers:
# ANTHROPIC_MODEL=claude-3-5-sonnet-20240620  # Anthropic Claude
# GROQ_MODEL=llama3-8b-8192                   # Groq Llama

# Voice Settings
ELEVENLABS_VOICE_ID=ZF6FPAbjXT4488VcRRnw  # Default: Rachel voice

# Optional - Audio Configuration
VOICE_SILENCE_THRESHOLD=500               # Lower = more sensitive
VOICE_SILENCE_DURATION=1.5                # Seconds to wait after speech

# Optional - Assistant Configuration
ASSISTANT_SYSTEM_PROMPT="You are a helpful voice assistant..."  # Customize personality

# Optional - MCP Server Specific
LINEAR_API_KEY=your-linear-api-key        # For Linear integration
```
All environment variables can be overridden via command-line arguments when using the CLI.
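For reference, reading these values in your own scripts follows the standard python-dotenv pattern; this is a generic sketch, not the assistant's internal configuration loader.

```python
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # picks up the .env file in the project root

openai_key = os.environ["OPENAI_API_KEY"]                             # required
elevenlabs_key = os.getenv("ELEVENLABS_API_KEY")                      # optional (pyttsx3 fallback)
silence_threshold = int(os.getenv("VOICE_SILENCE_THRESHOLD", "500"))  # same default as above
silence_duration = float(os.getenv("VOICE_SILENCE_DURATION", "1.5"))
```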
The assistant loads MCP server configurations from `mcp_servers.json` in the project root. By default, it includes:
- playwright: Web automation and browser control
- linear: Task and project management
To add more servers, edit `mcp_servers.json` or copy `mcp_servers.example.json`, which includes additional servers like:
- filesystem, github, gitlab, google-drive, postgres, sqlite, slack, memory, puppeteer, brave-search, fetch
Environment variables in the config (like `${GITHUB_PERSONAL_ACCESS_TOKEN}`) are automatically substituted from your `.env` file.
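The substitution itself can be thought of as a simple `${VAR}` expansion over the loaded JSON. The helper below only illustrates that idea; the function name and approach are assumptions, not this repository's code.

```python
import json
import os
import re

def load_mcp_config(path: str = "mcp_servers.json") -> dict:
    """Read the MCP server config and expand ${VAR} placeholders from the environment."""
    with open(path) as f:
        raw = f.read()
    # Replace ${VAR_NAME} with the value from the environment (empty string if unset).
    expanded = re.sub(r"\$\{(\w+)\}", lambda m: os.getenv(m.group(1), ""), raw)
    return json.loads(expanded)
```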
To override the default configuration programmatically:
```python
config = {
    "mcpServers": {
        "your_server": {
            "command": "npx",
            "args": ["-y", "@your-org/mcp-server"],
            "env": {"YOUR_API_KEY": "${YOUR_API_KEY}"},
        }
    }
}
```
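A dict like this would then be handed to the underlying mcp-use client. The sketch below assumes mcp-use's `MCPClient.from_dict` and `MCPAgent` interfaces; check the mcp-use documentation for the exact signatures.

```python
from langchain_openai import ChatOpenAI
from mcp_use import MCPAgent, MCPClient

client = MCPClient.from_dict(config)  # `config` is the dict shown above
agent = MCPAgent(llm=ChatOpenAI(model="gpt-4"), client=client)
```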
After installation, run the assistant:
```bash
# Using uv
uv run python voice_assistant/agent.py

# Or using python directly
python voice_assistant/agent.py

# Override specific settings via command line
python voice_assistant/agent.py --model gpt-3.5-turbo --silence-threshold 300

# Provide all settings via command line (no .env needed)
python voice_assistant/agent.py \
  --openai-api-key YOUR_KEY \
  --elevenlabs-api-key YOUR_ELEVENLABS_KEY \
  --model gpt-4 \
  --voice-id ZF6FPAbjXT4488VcRRnw \
  --silence-threshold 500 \
  --silence-duration 1.5

# See all available options
python voice_assistant/agent.py --help
```
Note: Command-line arguments take precedence over environment variables.
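That precedence (flag first, then environment variable, then a built-in default) is the usual argparse-with-env-defaults pattern. A minimal sketch, not the project's actual argument parser:

```python
import argparse
import os

parser = argparse.ArgumentParser(description="Voice assistant settings (sketch)")
# Environment variables supply the defaults; explicit flags override them.
parser.add_argument("--model", default=os.getenv("OPENAI_MODEL", "gpt-4"))
parser.add_argument("--silence-threshold", type=int,
                    default=int(os.getenv("VOICE_SILENCE_THRESHOLD", "500")))
args = parser.parse_args()
```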
The voice assistant supports multiple LLM providers through LangChain. Any model with tool calling capabilities can be used:
```python
# Assumed import path, based on the package layout shown above (voice_assistant/agent.py)
from voice_assistant.agent import VoiceAssistant

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_groq import ChatGroq

# Using OpenAI (default)
assistant = VoiceAssistant(
    openai_api_key="your-key",
    model="gpt-4",  # or gpt-4-turbo, gpt-3.5-turbo
)

# Using Anthropic Claude
llm = ChatAnthropic(
    api_key="your-anthropic-key",
    model="claude-3-5-sonnet-20240620",
)
assistant = VoiceAssistant(
    llm=llm,  # Pass a custom LLM instance
    elevenlabs_api_key="your-key",
)

# Using Groq
llm = ChatGroq(
    api_key="your-groq-key",
    model="llama3-8b-8192",
)
assistant = VoiceAssistant(
    llm=llm,
    elevenlabs_api_key="your-key",
)
```
Note: Only models with tool calling capabilities can be used. Check your model provider's documentation for supported models.
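One quick way to probe this with LangChain is to try binding a trivial tool: chat model classes that don't implement tool calling raise `NotImplementedError` from `bind_tools`. This is a generic LangChain check, not something specific to this project.

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def ping() -> str:
    """Trivial tool used only to probe tool-calling support."""
    return "pong"

llm = ChatOpenAI(model="gpt-4")
try:
    llm.bind_tools([ping])  # raises NotImplementedError if the model class lacks tool calling
    print("tool calling supported")
except NotImplementedError:
    print("this model cannot drive MCPAgent")
```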
Pass different parameters when initializing:
```python
assistant = VoiceAssistant(
    openai_api_key="your-key",
    elevenlabs_api_key="your-key",
    elevenlabs_voice_id="different-voice-id",  # Change voice
    silence_threshold=300,                     # More sensitive
    silence_duration=2.0,                      # Wait longer
    model="gpt-3.5-turbo",                     # Faster model
)
```
- **No Audio Input Detected**
  - Check microphone permissions
  - Lower the `silence_threshold` value
  - Verify PyAudio: `python -c "import pyaudio; pyaudio.PyAudio()"`
- **TTS Not Working**
  - Verify API keys are set correctly
  - Check API quotas
  - The system will fall back to pyttsx3 if ElevenLabs fails
- **MCP Server Connection Issues**
  - Ensure Node.js is installed
  - Check internet connection for npx downloads
  - Verify API keys for specific servers
- **High Latency**
  - Use a faster LLM model (e.g., `gpt-3.5-turbo`)
  - Reduce `max_steps` in MCPAgent (see the sketch after this list)
  - Consider using local models
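For the latency tips above, here is a hedged sketch of the kind of tuning involved, assuming mcp-use's `MCPClient.from_config_file` and the `max_steps` parameter on `MCPAgent`:

```python
from langchain_openai import ChatOpenAI
from mcp_use import MCPAgent, MCPClient

client = MCPClient.from_config_file("mcp_servers.json")
# A faster model plus fewer tool-calling iterations per query lowers worst-case latency.
agent = MCPAgent(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    client=client,
    max_steps=5,
)
```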
We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built on top of mcp-use
- Uses OpenAI Whisper for speech recognition
- Voice synthesis powered by ElevenLabs
- MCP servers from the Model Context Protocol ecosystem
- Email: your.email@example.com
- Discord: Join our server
- Issues: GitHub Issues
- Documentation: Full Docs