
MCP Voice Assistant

Built with mcp-use

A voice-enabled AI personal assistant that leverages the Model Context Protocol (MCP) to integrate multiple tools and services through natural voice interactions.

Features

  • 🎤 Voice Input: Real-time speech-to-text using OpenAI Whisper
  • 🔊 Voice Output: High-quality text-to-speech using ElevenLabs (with pyttsx3 fallback)
  • 🤖 AI-Powered: Conversational AI with memory persistence
  • 🌐 Multiple Model Providers: Works with any LLM provider that supports tool calling (OpenAI, Anthropic, Groq, Llama, etc.)
  • 🛠️ Multi-Tool Integration: Seamlessly connects to any MCP server (Linear, Playwright, filesystem, and more)
  • 💾 Conversational Memory: Maintains context across interactions
  • 🎯 Extensible: Easy to add new MCP servers and capabilities

Architecture

┌─────────────┐     ┌──────────────┐     ┌─────────────┐     ┌──────────────┐
│ User Voice  │ --> │ Speech-to-   │ --> │  LLM with   │ --> │ Text-to-     │
│   Input     │     │ Text (STT)   │     │  MCPAgent   │     │ Speech (TTS) │
└─────────────┘     └──────────────┘     └─────────────┘     └──────────────┘
                        Whisper                 │               ElevenLabs
                                                │
                                         ┌──────▼──────┐
                                         │ MCP Servers │
                                         ├─────────────┤
                                         │ • Linear    │
                                         │ • Playwright│
                                         │ • Filesystem│
                                         └─────────────┘
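
Conceptually, the assistant runs a listen-transcribe-reason-speak loop over this pipeline. The sketch below is illustrative only; all four callables are hypothetical placeholders, not the project's actual API:

# Minimal sketch of the pipeline above. The four callables stand in for the
# project's real STT, agent, and TTS objects (names are illustrative only).
async def voice_loop(record_audio, transcribe, agent_run, speak):
    """One conversational turn per iteration."""
    while True:
        audio = record_audio()         # capture mic input until silence is detected
        text = transcribe(audio)       # Whisper speech-to-text
        if not text:
            continue                   # nothing heard; keep listening
        reply = await agent_run(text)  # MCPAgent decides which MCP tools to call
        speak(reply)                   # ElevenLabs text-to-speech (pyttsx3 fallback)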

Installation

Prerequisites

  1. Python 3.11+
  2. uv (Python package manager): pip install uv or pipx install uv
  3. Node.js (for MCP servers)
  4. System dependencies:
    • macOS: brew install portaudio
    • Ubuntu/Debian: sudo apt-get install portaudio19-dev
    • Windows: PyAudio wheel includes PortAudio
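
As a quick sanity check of the prerequisites above, the following snippet (a convenience, not part of the project) verifies the Python version and that the required command-line tools are on your PATH:

# Quick prerequisite check (illustrative; not shipped with the project)
import shutil
import sys

assert sys.version_info >= (3, 11), "Python 3.11+ is required"
for tool in ("uv", "node", "npx"):
    status = "found" if shutil.which(tool) else "MISSING"
    print(f"{tool}: {status}")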

Install from Source

# Clone the repository
git clone https://github.com/mcp-use/mcp-use-voice-assistant.git
cd mcp-use-voice-assistant

# Create a virtual environment with uv
uv venv

# Activate the virtual environment
# On Linux/macOS:
source .venv/bin/activate
# On Windows:
# .venv\Scripts\activate

# Install in development mode
uv pip install -e .

# Or install directly
uv pip install .

Configuration

Environment Variables

Create a .env file in your project root (see .env.example for a complete template):

# Required
OPENAI_API_KEY=your-openai-api-key

# Optional but recommended for better voice output
ELEVENLABS_API_KEY=your-elevenlabs-api-key

# Optional - Model Provider Settings
# You can use any model provider that supports tool calling
# (OPENAI_API_KEY above already covers OpenAI models)
ANTHROPIC_API_KEY=your-anthropic-api-key        # For Claude models
GROQ_API_KEY=your-groq-api-key                  # For Groq models

# Model selection (defaults to gpt-4)
OPENAI_MODEL=gpt-4                              # OpenAI: gpt-4, gpt-4-turbo, gpt-3.5-turbo
# Or use other providers:
# ANTHROPIC_MODEL=claude-3-5-sonnet-20240620   # Anthropic Claude
# GROQ_MODEL=llama3-8b-8192                    # Groq Llama

# Voice Settings
ELEVENLABS_VOICE_ID=ZF6FPAbjXT4488VcRRnw      # Default: Rachel voice

# Optional - Audio Configuration
VOICE_SILENCE_THRESHOLD=500                     # Lower = more sensitive
VOICE_SILENCE_DURATION=1.5                      # Seconds to wait after speech

# Optional - Assistant Configuration
ASSISTANT_SYSTEM_PROMPT="You are a helpful voice assistant..."  # Customize personality

# Optional - MCP Server Specific
LINEAR_API_KEY=your-linear-api-key              # For Linear integration

All environment variables can be overridden via command-line arguments when using the CLI.

MCP Server Configuration

The assistant loads MCP server configurations from mcp_servers.json in the project root. By default, it includes:

  • playwright: Web automation and browser control
  • linear: Task and project management

To add more servers, edit mcp_servers.json or copy mcp_servers.example.json, which includes additional servers such as:

  • filesystem, github, gitlab, google-drive, postgres, sqlite, slack, memory, puppeteer, brave-search, fetch

Environment variables in the config (like ${GITHUB_PERSONAL_ACCESS_TOKEN}) are automatically substituted from your .env file.
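
For illustration, ${VAR} substitution of this kind is typically implemented along these lines (a sketch of the behavior, not necessarily the project's exact code):

# Sketch of ${VAR} substitution against the process environment
import os
import re

def substitute_env(value: str) -> str:
    """Replace each ${NAME} with os.environ['NAME'], or '' if unset."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)

# e.g. substitute_env("${GITHUB_PERSONAL_ACCESS_TOKEN}") -> the token's value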

To override the default configuration programmatically:

config = {
    "mcpServers": {
        "your_server": {
            "command": "npx",
            "args": ["-y", "@your-org/mcp-server"],
            "env": {"YOUR_API_KEY": "${YOUR_API_KEY}"}
        }
    }
}
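
Since the assistant reads mcp_servers.json at startup, one straightforward way to apply such an override is to write the config dict above to that file:

import json

# Persist the override so the assistant picks it up on the next start
with open("mcp_servers.json", "w") as f:
    json.dump(config, f, indent=2)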

Running the Assistant

After installation, run the assistant:

# Using uv
uv run python voice_assistant/agent.py

# Or using python directly
python voice_assistant/agent.py

# Override specific settings via command line
python voice_assistant/agent.py --model gpt-3.5-turbo --silence-threshold 300

# Provide all settings via command line (no .env needed)
python voice_assistant/agent.py \
  --openai-api-key YOUR_KEY \
  --elevenlabs-api-key YOUR_ELEVENLABS_KEY \
  --model gpt-4 \
  --voice-id ZF6FPAbjXT4488VcRRnw \
  --silence-threshold 500 \
  --silence-duration 1.5

# See all available options
python voice_assistant/agent.py --help

Note: Command-line arguments take precedence over environment variables.
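
That precedence follows the common CLI-over-environment pattern; here is a generic sketch (illustrative, not the project's actual argument handling):

import os

def resolve_setting(cli_value, env_name, default=None):
    """A CLI flag wins; otherwise fall back to the environment, then the default."""
    return cli_value if cli_value is not None else os.getenv(env_name, default)

model = resolve_setting(None, "OPENAI_MODEL", "gpt-4")  # no flag given -> env var or "gpt-4"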

Changing Model Provider

The voice assistant supports multiple LLM providers through LangChain. Any model with tool calling capabilities can be used:

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_groq import ChatGroq
from voice_assistant.agent import VoiceAssistant

# Using OpenAI (default)
assistant = VoiceAssistant(
    openai_api_key="your-key",
    model="gpt-4"  # or gpt-4-turbo, gpt-3.5-turbo
)

# Using Anthropic Claude
llm = ChatAnthropic(
    api_key="your-anthropic-key",
    model="claude-3-5-sonnet-20240620"
)
assistant = VoiceAssistant(
    llm=llm,  # Pass custom LLM instance
    elevenlabs_api_key="your-key"
)

# Using Groq
llm = ChatGroq(
    api_key="your-groq-key",
    model="llama3-8b-8192"
)
assistant = VoiceAssistant(
    llm=llm,
    elevenlabs_api_key="your-key"
)

Note: Only models with tool calling capabilities can be used. Check your model provider's documentation for supported models.

Changing Voice Settings

Pass different parameters when initializing:

assistant = VoiceAssistant(
    openai_api_key="your-key",
    elevenlabs_api_key="your-key",
    elevenlabs_voice_id="different-voice-id",  # Change voice
    silence_threshold=300,  # More sensitive
    silence_duration=2.0,   # Wait longer
    model="gpt-3.5-turbo"  # Faster model
)

Troubleshooting

Common Issues

  1. No Audio Input Detected

    • Check microphone permissions
    • Lower the silence_threshold value
    • Verify PyAudio works: python -c "import pyaudio; pyaudio.PyAudio()" (see the device-listing snippet after this list)
  2. TTS Not Working

    • Verify API keys are set correctly
    • Check API quotas
    • System will fall back to pyttsx3 if ElevenLabs fails
  3. MCP Server Connection Issues

    • Ensure Node.js is installed
    • Check internet connection for npx downloads
    • Verify API keys for specific servers
  4. High Latency

    • Use faster LLM model (e.g., gpt-3.5-turbo)
    • Reduce max_steps in MCPAgent
    • Consider using local models
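
For the first issue, this snippet uses PyAudio's standard device-enumeration API to list the input devices it can see, which helps confirm your microphone is visible to the assistant:

# List input-capable audio devices via PyAudio's standard API
import pyaudio

pa = pyaudio.PyAudio()
for i in range(pa.get_device_count()):
    info = pa.get_device_info_by_index(i)
    if info.get("maxInputChannels", 0) > 0:   # keep input-capable devices only
        print(f"{i}: {info['name']}")
pa.terminate()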

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.
