A real-time, voice-to-voice AI agent system built with Strands Agents, Amazon Nova 2 Sonic, and AWS AgentCore Runtime. Speak into your mic, get a spoken response back — with barge-in support so you can interrupt the agent mid-sentence.
Microphone → CLI Client → WebSocket → FastAPI Backend → Nova 2 Sonic (Strands BidiAgent) → Speaker
The system has three components:
| Component | Description |
|---|---|
backend/ |
FastAPI WebSocket server wrapping a BidiAgent from the Strands SDK |
cli/ |
Python CLI client that streams mic audio and plays back agent responses |
cdk/ |
AWS CDK stack (TypeScript) to deploy the backend to AgentCore Runtime |
- Python >= 3.12
- Node.js >= 18.0.0
- uv (Python package manager)
- AWS credentials configured (
aws configureor environment variables) - PortAudio installed (required by PyAudio)
# macOS
brew install portaudio
# Ubuntu/Debian
sudo apt-get install portaudio19-devcd backend
uv venv
source .venv/bin/activate
uv sync
uv run uvicorn app.main:app --host 0.0.0.0 --port 8080Make sure your AWS credentials are set — the backend calls Bedrock directly.
cd cli
uv venv
source .venv/bin/activate
uv sync
# Connect to local backend
.venv/bin/python app/main.py
# Connect to AgentCore Runtime (deployed)
.venv/bin/python app/main.py --agent_arn 'arn:aws:bedrock-agentcore:ap-northeast-1:...:runtime/...'Once running, speak into your microphone. Press Enter to stop the session.
| Argument | Default | Description |
|---|---|---|
--endpoint |
ws://localhost:8080/ws |
WebSocket endpoint to connect to |
--agent_arn |
None | AgentCore Runtime ARN (overrides --endpoint) |
--debug |
False | Print all received event types for debugging |
The following environment variables can be set on the backend (or in the CDK stack):
| Variable | Default | Description |
|---|---|---|
MODEL_ID |
amazon.nova-2-sonic-v1:0 |
Bedrock model ID |
REGION_NAME |
ap-northeast-1 |
AWS region |
INPUT_SAMPLE_RATE |
16000 |
Mic input sample rate (Hz) |
OUTPUT_SAMPLE_RATE |
16000 |
Audio output sample rate (Hz) |
CHANNELS |
1 |
Audio channels (mono) |
Make sure the sample rates and channels match between the backend and CLI client.
cd backend
# Build for ARM64
docker buildx create --use
docker buildx build --platform linux/arm64 -t backend-agent:arm64 --load .
# Run
docker run --platform linux/arm64 -p 8080:8080 \
-e AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
-e AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
-e AWS_SESSION_TOKEN="$AWS_SESSION_TOKEN" \
-e AWS_REGION="$AWS_REGION" \
backend-agent:arm64cd cdk
npm install
npm run build # compile TypeScript
npx cdk deploy # deploy to AWSAfter deployment, the CDK stack outputs the AgentCoreRuntimeArn. Use that ARN with the CLI --agent_arn flag.
npx cdk diff # compare with deployed stack
npx cdk synth # emit CloudFormation template
npx cdk destroy # tear down the stackThe CDK TypeScript project compiles successfully with npm run build (TypeScript 5.6, aws-cdk-lib 2.222.0).
Node.js >= 18 is required for the CDK stack.
- The CLI captures PCM audio from your microphone in 512-frame chunks
- Each chunk is base64-encoded and sent as a
bidi_audio_inputevent over WebSocket - The FastAPI backend passes these events to a
BidiAgent(Strands SDK) backed by Nova 2 Sonic - The agent streams back
bidi_audio_streamevents (base64 PCM) andbidi_transcript_streamevents (text) - The CLI decodes and plays the audio in real time, and prints transcripts to the console
- Barge-in is supported — speaking while the agent is responding clears the audio queue and interrupts playback
When connecting to AgentCore Runtime, the CLI uses IAM credentials by default (via bedrock-agentcore SDK). If your runtime is configured with OAuth2, update the websockets.connect call in cli/app/main.py accordingly. See AWS docs.
