The Thinking Cap Project is a wearable AI assistant that listens for a wake word, processes voice commands, and provides intelligent responses. It integrates local speech-to-text with the powerful capabilities of OpenAI's GPT models, including visual analysis through a connected camera. Responses are displayed on a compact OLED screen, making it a truly hands-free AI companion.
The system operates through the Arduino Uno Q, bridging a Linux MPU with a STM32 MCU. The MPU handles the voice to text and API calls, while the MCU operates the OLED display.
-
Wake Word Detection: The main Python script (
main.py) continuously listens for the wake word "jarvis" using thevoskoffline speech recognition engine. This ensures privacy and low-latency activation. -
Voice Command Capture: Upon detecting the wake word, the device begins recording the user's speech. It transcribes the audio into text in real-time. The recording stops after a period of silence.
-
AI Processing: The transcribed text is sent to the
gpt_prompter.pyscript. This script interacts with the OpenAI API. It uses a tool-calling feature that allows the AI model to decide if it needs to see its surroundings to answer a query. -
Visual Analysis: If the user asks a question about their environment (e.g., "What do you see?"), the AI can trigger the
take_snapshotfunction. The system captures an image from the camera, which is then sent along with the original prompt to the GPT-4V model for analysis. -
Displaying the Response: The final text response from the AI is sent to the Arduino via the
Arduino_RouterBridge. The Arduino sketch then displays the message on the OLED screen, with automatic scrolling for longer responses.
- An Arduino Uno Q
- A 128x64 OLED Display (driven by the SSD1306 chipset)
- A usb webcam/microphone
- Python:
vosk: For offline, on-device wake word detection and speech-to-text.openai: For interacting with GPT-4V.opencv-python: For capturing images from the camera.Arduino_RouterBridge: For communication between the Python script and the Arduino.
- Arduino (C++):
Adafruit_SSD1306&Adafruit_GFX: For controlling the OLED display.Arduino_RouterBridge: To receive messages from the Python backend.
- Connect the OLED display to the I2C pins of your microcontroller.
- Connect the microphone and camera to the host computer running the Python script.
- Upload the Arduino sketch to your microcontroller.
- Open the
sketch/sketch.inofile in the Arduino IDE. - Install the required libraries listed in
sketch/sketch.yaml, including:Adafruit GFX LibraryAdafruit SSD1306Arduino_RouterBridge
- Upload the sketch to your board.
- Clone this repository:
git clone https://github.com/timqn22/TheThinkingCapProjectV2.git cd TheThinkingCapProjectV2/python - Install the required Python packages:
pip install -r requirements.txt
- Create a
.envfile in thepython/directory and add your OpenAI API key:OPENAI_API_KEY='your_openai_api_key_here' - The required Vosk model is already included in the
python/models/directory. - You may need to update the audio device in
python/main.pyfrom"plughw:CARD=B100,DEV=0"to match your system's microphone configuration. You can list your devices usingarecord -l.
- Run the main Python script:
python python/main.py
- The script will initialize, and the console will display
Waiting for 'Hey Jarvis'.... The OLED screen will show "Say 'Hey Jarvis'". - Activate the assistant by saying "jarvis".
- The OLED will update to "Listening...". State your question or command.
- After you finish speaking, the script will process your request and the OLED will show "Asking GPT...".
- The AI's response will then be displayed on the OLED screen.
.
├── python/
│ ├── main.py # Main application entry point, handles wake word and STT
│ ├── gpt_prompter.py # Manages interaction with OpenAI API & camera
│ ├── requirements.txt # Python dependencies
│ └── models/ # Contains the Vosk speech recognition model
│
└── sketch/
├── sketch.ino # Main Arduino sketch for device-side logic
├── oled.ino # Helper functions for OLED display control
└── sketch.yaml # Arduino project configuration and libraries