A real-time interview transcription system that captures audio from browser tabs and provides live transcription using Deepgram's streaming API.
- Frontend: Next.js 15 + React + TypeScript + Tailwind CSS
- Backend: FastAPI + Python with WebSocket proxy to Deepgram
- Audio Capture: Browser Screen Capture API with tab audio
- Transcription: Deepgram Nova-2/Nova-3 streaming API
- ✅ Real-time tab audio capture using getDisplayMedia()
- ✅ Live transcription streaming via WebSocket
- ✅ Interim and final transcript handling
- ✅ Connection resilience with auto-reconnect
- ✅ Multiple Deepgram model options (Nova-2/Nova-3)
- ✅ Configurable transcription settings
- ✅ Error handling for permissions and network issues
- ✅ TypeScript types and reducer pattern
- ✅ Responsive UI with status indicators
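Auto-reconnect behavior like the above is typically driven by an exponential backoff schedule. A minimal sketch (the helper name and constants are illustrative, not taken from the codebase):

```typescript
// Hypothetical backoff helper: the delay doubles on each failed
// reconnect attempt and is capped at maxMs.
function reconnectDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

The cap keeps a long outage from producing multi-minute waits once the backend comes back.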
A comprehensive AI-powered interview platform that streamlines the research interview process with real-time transcription, intelligent question suggestions, and seamless project management.
MoonCorp Interview Platform is a professional-grade tool designed for researchers, UX professionals, and product teams to conduct, transcribe, and analyze interviews efficiently. The platform combines real-time speech-to-text capabilities with AI-generated follow-up questions to enhance interview quality and reduce manual effort.
Main interview interface with live transcription and AI suggestions
Real-time screen capture and transcription in action
Project management and interview organization
- Live Audio Capture: Browser-based screen sharing with tab audio capture
- Streaming Transcription: Real-time speech-to-text using Deepgram Nova-2/Nova-3 models
- Multiple Input Methods: Support for live capture, file upload, and manual entry
- High Accuracy: Advanced punctuation, formatting, and speaker detection
- Smart Follow-ups: Contextual question suggestions based on interviewee responses
- Question Categorization: Probe, deep-dive, quantitative, and clarification question types
- Real-time Generation: Instant AI suggestions as conversations progress
- Customizable Focus: Targeted questioning strategies (probe, deep, quant)
- Project Organization: Group interviews by research projects
- Interview Tracking: Status management and progress monitoring
- Collaboration: Multi-user support with role-based permissions
- Export Options: Comprehensive data export and reporting
- Responsive Design: Optimized for desktop and mobile devices
- Keyboard Shortcuts: Efficient navigation and control
- Live Video Preview: Embedded screen capture preview
- Professional UI: Clean, distraction-free interface
- Framework: Next.js 15 with React 19
- Language: TypeScript
- Styling: Tailwind CSS with shadcn/ui components
- Icons: Lucide React
- Forms: React Hook Form with Zod validation
- State Management: React hooks and context
- API Routes: Next.js API routes
- Authentication: Supabase Auth
- Database: Supabase PostgreSQL
- File Storage: Supabase Storage
- Real-time: WebSocket connections
- Speech-to-Text: Deepgram Nova-2/Nova-3 streaming API
- Backend: FastAPI + Python WebSocket proxy
- Audio Processing: Browser MediaRecorder API
- Screen Capture: getDisplayMedia API
- Language Model: OpenAI GPT for question generation
- Prompt Engineering: Sophisticated interview context analysis
- Real-time Processing: Streaming AI responses
- Package Manager: pnpm
- Linting: ESLint with TypeScript rules
- Testing: Jest and React Testing Library
- Deployment: Vercel (frontend), Railway/Docker (backend)
- Node.js 18+ and pnpm
- Python 3.8+ and pip
- Supabase account and project
- Deepgram API key
- OpenAI API key
```bash
git clone https://github.com/wesng/hack2025.git
cd hack2025

# Frontend dependencies
cd mooncorpapp
pnpm install

# Backend dependencies
cd ../speech-to-text
pip install -r requirements.txt
```

Create `.env.local` in the `mooncorpapp` directory:
```bash
# Supabase Configuration
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_supabase_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key

# Transcription Service
NEXT_PUBLIC_TRANSCRIPTION_WS_URL=ws://localhost:8000/ws
```

Create `.env` in the `speech-to-text` directory:
```bash
# Deepgram Configuration
DEEPGRAM_API_KEY=your_deepgram_api_key

# Server Configuration
HOST=0.0.0.0
PORT=8000
```

- Create a new Supabase project
- Run the database migrations (SQL files in the `/supabase` directory)
- Configure Row Level Security (RLS) policies
Terminal 1 - Frontend:

```bash
cd mooncorpapp
pnpm dev
```

Terminal 2 - Transcription Backend:

```bash
cd speech-to-text
python main.py
```

The application will be available at:

- Frontend: http://localhost:3000
- Transcription API: http://localhost:8000
- Create Account: Sign up using email/password or OAuth
- Create Project: Organize your interviews by research project
- Start Interview: Create a new interview session
- Begin Transcription: Use live capture for real-time transcription
- Click "Start Live Capture" on the interview page
- Select browser tab with interview audio (Zoom, Meet, etc.)
- Check "Share tab audio" in browser permission dialog
- Watch live transcription stream into the response field
- Submit responses and receive AI-generated follow-up questions
- `Cmd/Ctrl + Enter` - Submit turn
- `Cmd/Ctrl + Shift + K` - Toggle suggestions dock
- `Cmd/Ctrl + R` - Regenerate suggestions
- `?` - Show help dialog
- `Escape` - Cancel/close dialogs
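These bindings could be dispatched by a small matcher like the following (a hypothetical helper; the names and the `KeyLike` shape are illustrative, not the app's actual code):

```typescript
// Minimal subset of KeyboardEvent used by the matcher.
interface KeyLike {
  key: string;
  metaKey?: boolean;
  ctrlKey?: boolean;
  shiftKey?: boolean;
}

type Action =
  | "submit-turn"
  | "toggle-dock"
  | "regenerate"
  | "show-help"
  | "close-dialog"
  | null;

// Map a key event to an action; metaKey covers Cmd on macOS, ctrlKey elsewhere.
function matchShortcut(e: KeyLike): Action {
  const mod = Boolean(e.metaKey || e.ctrlKey);
  if (mod && e.shiftKey && e.key.toLowerCase() === "k") return "toggle-dock";
  if (mod && e.key === "Enter") return "submit-turn";
  if (mod && e.key.toLowerCase() === "r") return "regenerate";
  if (e.key === "?") return "show-help";
  if (e.key === "Escape") return "close-dialog";
  return null;
}
```

Checking the Shift-qualified chord first keeps `Cmd/Ctrl + Shift + K` from being shadowed by a plain modifier match.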
The AI analyzes interview responses and generates categorized follow-up questions:
- Probe: Explore deeper into topics
- Deep: Get detailed information
- Quant: Gather quantitative data
- Clarify: Resolve ambiguities
```typescript
// Create interview
POST /api/interviews
Body: { title: string, projectId: string, intervieweeName?: string }

// Update interview
PATCH /api/interviews/[id]
Body: { title?: string, status?: string }

// Get interview
GET /api/interviews/[id]
```

```typescript
// Generate suggestions
POST /api/ask_engine/v2/suggestions
Body: { interview_id: string, transcript: string }
Response: { items: Suggestion[] }

// Save transcript
POST /api/ask_engine/v2/transcript
Body: { interview_id: string, transcript: string }
```

```typescript
// WebSocket transcription
WS /ws
Messages: { audio: ArrayBuffer, config: TranscriptionConfig }

// File upload transcription
POST /api/transcribe
Body: FormData with audio file
```

```typescript
interface Suggestion {
  question: string
  type: 'probe' | 'deep' | 'quant' | 'clarify'
  description: string
}

interface Interview {
  id: string
  title: string
  projectId: string
  status: 'draft' | 'active' | 'completed'
  intervieweeName?: string
  createdAt: string
  updatedAt: string
}
```

```sql
-- Projects table
CREATE TABLE projects (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  title TEXT NOT NULL,
  description TEXT,
  owner UUID REFERENCES auth.users(id),
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

-- Interviews table
CREATE TABLE interviews (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  title TEXT NOT NULL,
  project_id UUID REFERENCES projects(id),
  interviewee_name TEXT,
  status TEXT DEFAULT 'draft',
  transcript TEXT,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

-- Turns table (interview responses)
CREATE TABLE turns (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  interview_id UUID REFERENCES interviews(id),
  text TEXT NOT NULL,
  focus_type TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);
```

See `/supabase/migrations/` for complete schema definitions.
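The `status` column defaults to `'draft'` and the Interview type constrains it to `'draft' | 'active' | 'completed'`. A sketch of a forward-only lifecycle guard (the transition rule itself is an assumption for illustration, not documented behavior):

```typescript
type InterviewStatus = "draft" | "active" | "completed";

// Assumed lifecycle order; each status may only advance one step.
const LIFECYCLE: InterviewStatus[] = ["draft", "active", "completed"];

function canTransition(from: InterviewStatus, to: InterviewStatus): boolean {
  return LIFECYCLE.indexOf(to) === LIFECYCLE.indexOf(from) + 1;
}
```

A guard like this would typically run in the `PATCH /api/interviews/[id]` handler before writing `status`.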
```bash
cd mooncorpapp
pnpm test
pnpm test:coverage
```

- Happy Path: Start transcription → select tab → verify live streaming
- Permission Denied: Deny screen sharing → check error handling
- No Audio: Select tab without audio → verify error message
- Network Issues: Disconnect backend → test reconnection
- Submit various response types and verify appropriate suggestions
- Test different focus types (probe, deep, quant)
- Verify suggestion regeneration and feedback systems
MIT License
Copyright (c) 2025 MoonCorp
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
We welcome contributions! Please see our Contributing Guide for details.
- Email: support@mooncorp.com
- Documentation: docs.mooncorp.com
- Issues: GitHub Issues
- Discord: MoonCorp Community
- Multi-language transcription support
- Advanced analytics and insights
- Integration with popular meeting platforms
- Mobile application
- Advanced AI interview coaching
- Team collaboration features
Built with ❤️ by the MoonCorp team
- Navigate to http://localhost:3000/transcription
- Click "Start Transcription"
- Select a browser tab with audio and enable "Share tab audio"
- Watch live transcription appear in real-time
- Click "Stop Transcription" when done
Important: The Screen Capture API requires HTTPS in production. For local development, modern browsers allow it on localhost.
- Happy Path:
  - Start transcription → browser prompt appears
  - Select tab with audio + "Share tab audio"
  - See interim text (gray/italic) streaming
  - See finalized text (normal) with timestamps
  - Stop transcription → everything stops cleanly
- Permission Denied:
  - Click start → deny screen sharing
  - Should show clear error message
- No Audio Tab:
  - Select tab without audio or don't check "Share tab audio"
  - Should show "No audio track found" error
- Network Issues:
  - Start transcription, then disconnect backend
  - Should show "Reconnecting..." status
  - Restart backend → should auto-reconnect
- Settings:
  - Change model from Nova-2 to Nova-3
  - Adjust endpointing timing
  - Change language settings
- Chrome/Edge: Full support
- Firefox: Supported but may have audio quality differences
- Safari: Limited support for tab audio capture
```
mooncorpapp/
├── app/transcription/page.tsx               # Main transcription page
├── components/transcriber-card.tsx          # Main UI component
├── hooks/use-tab-transcription.ts           # Custom hook with WebSocket logic
├── lib/types/transcription.ts               # TypeScript type definitions
└── __tests__/transcription-reducer.test.ts  # Unit tests
```

```
speech-to-text/
├── main.py            # FastAPI server
├── requirements.txt   # Python dependencies
├── .env               # Environment variables
└── README.md          # Backend documentation
```
- TranscriberCard: Main UI component with controls and transcript display
- useTabTranscription: Custom hook managing WebSocket, MediaRecorder, and state
- Types: Strong TypeScript interfaces for all data structures
- Reducer: Predictable state management for transcription flow
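The interim/final handling driven by that reducer can be sketched in miniature (illustrative only; the real reducer in `use-tab-transcription.ts` manages more state than this):

```typescript
// Illustrative reducer: interim text is replaced on every update,
// final text is appended and the interim buffer is cleared.
interface TranscriptState {
  finalText: string;
  interimText: string;
}

type TranscriptAction =
  | { type: "transcript"; text: string; isFinal: boolean }
  | { type: "reset" };

function transcriptReducer(
  state: TranscriptState,
  action: TranscriptAction
): TranscriptState {
  switch (action.type) {
    case "transcript":
      return action.isFinal
        ? {
            finalText: (state.finalText + " " + action.text).trim(),
            interimText: "",
          }
        : { ...state, interimText: action.text };
    case "reset":
      return { finalText: "", interimText: "" };
  }
}
```

Keeping interim and final text in separate fields is what lets the UI render interim results gray/italic while finalized text stays stable.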
- WebSocket Proxy: Forwards audio chunks to Deepgram, returns transcript events
- Error Handling: Structured error responses with request IDs for debugging
- Token Endpoint: Optional ephemeral token generation for future direct connections
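A structured error response with a request ID might be built like this (a sketch; the `request_id` field name and helper are assumptions, though the other fields mirror the backend's error messages):

```typescript
// Hypothetical error envelope: a stable shape plus a request ID
// so a client-side error can be correlated with backend logs.
function errorEnvelope(errorType: string, details: string, requestId: string) {
  return {
    type: "error" as const,
    error_type: errorType,
    details,
    request_id: requestId,
  };
}
```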
Default configuration (customizable in UI):

```typescript
{
  model: "nova-2",      // or "nova-3" for higher accuracy
  language: "en-US",
  smart_format: true,   // Automatic punctuation/formatting
  punctuate: true,
  paragraphs: true,
  endpointing: 300      // 300ms, 500ms, or 1000ms
}
```

- Container: WebM with Opus codec
- Chunks: 250ms intervals
- No encoding params: Deepgram reads format from container
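One way such a config object might be serialized for Deepgram's streaming endpoint (a sketch: the parameter names follow Deepgram's documented streaming query options, but the proxy's actual URL construction may differ):

```typescript
interface TranscriptionConfig {
  model: string;
  language: string;
  smart_format: boolean;
  punctuate: boolean;
  endpointing: number;
}

// Sketch: build the wss:// URL a backend proxy might open to Deepgram.
function deepgramUrl(cfg: TranscriptionConfig): string {
  const params = new URLSearchParams({
    model: cfg.model,
    language: cfg.language,
    smart_format: String(cfg.smart_format),
    punctuate: String(cfg.punctuate),
    endpointing: String(cfg.endpointing),
  });
  return `wss://api.deepgram.com/v1/listen?${params}`;
}
```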
- "Screen capture permission denied"
  - Enable screen sharing in browser settings
  - Try refreshing the page
- "No audio track found"
  - Ensure you check "Share tab audio" in the browser prompt
  - Make sure the selected tab is actually playing audio
- Transcription not appearing
  - Check browser console for WebSocket errors
  - Verify the backend is running on port 8000
  - Check that the Deepgram API key is valid
- Poor transcription quality
  - Try switching to the Nova-3 model
  - Ensure clear audio in the source tab
  - Check network connection stability
```bash
# Backend logs
cd speech-to-text
python main.py   # Check console for WebSocket events

# Frontend logs
# Open browser dev tools → Console tab
# Look for WebSocket connection messages

# Test reducer logic
# In browser console:
runTranscriptionReducerTests()
```

- ✅ Deepgram API key kept server-side only
- ✅ CORS configured for localhost development
- ✅ No audio/video data stored or logged
- ✅ Ephemeral connections only
- SSL/HTTPS setup for production
- Audio quality controls
- Transcript export (JSON, text, SRT)
- Speaker identification
- Custom vocabulary/keywords
- Audio preprocessing (noise reduction)
Client → Server:

```jsonc
// Start
{
  "type": "start",
  "config": { "model": "nova-2", "language": "en-US" }
}

// Audio (binary WebM/Opus chunks)

// Finish
{ "type": "finish" }
```

Server → Client:

```jsonc
// Transcript
{
  "type": "transcript",
  "data": {
    "channel": { "alternatives": [{ "transcript": "Hello world", "confidence": 0.95 }] },
    "is_final": true
  }
}

// Status
{ "type": "status", "message": "Connected", "status": "connected" }

// Error
{ "type": "error", "error_type": "Connection failed", "details": "..." }
```
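A client-side handler for these server messages might look like the following sketch (field names follow the message shapes above; the helper itself is hypothetical):

```typescript
type ServerMessage =
  | {
      type: "transcript";
      data: {
        channel: { alternatives: { transcript: string; confidence: number }[] };
        is_final: boolean;
      };
    }
  | { type: "status"; message: string; status: string }
  | { type: "error"; error_type: string; details: string };

// Extract the top alternative's text from a raw server frame, or null
// for non-transcript messages.
function transcriptText(raw: string): { text: string; isFinal: boolean } | null {
  const msg = JSON.parse(raw) as ServerMessage;
  if (msg.type !== "transcript") return null;
  const alt = msg.data.channel.alternatives[0];
  return alt ? { text: alt.transcript, isFinal: msg.data.is_final } : null;
}
```

In the app this would typically feed the reducer that separates interim from final text.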