
Share-Tab Live Transcription

A real-time interview transcription system that captures audio from browser tabs and provides live transcription using Deepgram's streaming API.

Architecture

  • Frontend: Next.js 15 + React + TypeScript + Tailwind CSS
  • Backend: FastAPI + Python with WebSocket proxy to Deepgram
  • Audio Capture: Browser Screen Capture API with tab audio
  • Transcription: Deepgram Nova-2/Nova-3 streaming API

Features

  • ✅ Real-time tab audio capture using getDisplayMedia()
  • ✅ Live transcription streaming via WebSocket
  • ✅ Interim and final transcript handling
  • ✅ Connection resilience with auto-reconnect
  • ✅ Multiple Deepgram model options (Nova-2/Nova-3)
  • ✅ Configurable transcription settings
  • ✅ Error handling for permissions and network issues
  • ✅ TypeScript types and reducer pattern
  • ✅ Responsive UI with status indicators

MoonCorp Interview Platform

A comprehensive AI-powered interview platform that streamlines the research interview process with real-time transcription, intelligent question suggestions, and seamless project management.

🎯 Overview

MoonCorp Interview Platform is a professional-grade tool designed for researchers, UX professionals, and product teams to conduct, transcribe, and analyze interviews efficiently. The platform combines real-time speech-to-text capabilities with AI-generated follow-up questions to enhance interview quality and reduce manual effort.

📸 Demo / Screenshots

Interview Page: Main interview interface with live transcription and AI suggestions

Live Capture: Real-time screen capture and transcription in action

Project Dashboard: Project management and interview organization

✨ Features

πŸŽ™οΈ Real-Time Transcription

  • Live Audio Capture: Browser-based screen sharing with tab audio capture
  • Streaming Transcription: Real-time speech-to-text using Deepgram Nova-2/Nova-3 models
  • Multiple Input Methods: Support for live capture, file upload, and manual entry
  • High Accuracy: Advanced punctuation, formatting, and speaker detection

🤖 AI-Powered Interview Assistant

  • Smart Follow-ups: Contextual question suggestions based on interviewee responses
  • Question Categorization: Probe, deep-dive, quantitative, and clarification question types
  • Real-time Generation: Instant AI suggestions as conversations progress
  • Customizable Focus: Targeted questioning strategies (probe, deep, quant)

📊 Professional Interview Management

  • Project Organization: Group interviews by research projects
  • Interview Tracking: Status management and progress monitoring
  • Collaboration: Multi-user support with role-based permissions
  • Export Options: Comprehensive data export and reporting

🎨 Modern User Experience

  • Responsive Design: Optimized for desktop and mobile devices
  • Keyboard Shortcuts: Efficient navigation and control
  • Live Video Preview: Embedded screen capture preview
  • Professional UI: Clean, distraction-free interface

πŸ› οΈ Tech Stack

Frontend

  • Framework: Next.js 15 with React 19
  • Language: TypeScript
  • Styling: Tailwind CSS with shadcn/ui components
  • Icons: Lucide React
  • Forms: React Hook Form with Zod validation
  • State Management: React hooks and context

Backend & APIs

  • API Routes: Next.js API routes
  • Authentication: Supabase Auth
  • Database: Supabase PostgreSQL
  • File Storage: Supabase Storage
  • Real-time: WebSocket connections

Transcription Services

  • Speech-to-Text: Deepgram Nova-2/Nova-3 streaming API
  • Backend: FastAPI + Python WebSocket proxy
  • Audio Processing: Browser MediaRecorder API
  • Screen Capture: getDisplayMedia API

AI & ML

  • Language Model: OpenAI GPT for question generation
  • Prompt Engineering: Sophisticated interview context analysis
  • Real-time Processing: Streaming AI responses

Development & Deployment

  • Package Manager: pnpm
  • Linting: ESLint with TypeScript rules
  • Testing: Jest and React Testing Library
  • Deployment: Vercel (frontend), Railway/Docker (backend)

🚀 Installation Instructions

Prerequisites

  • Node.js 18+ and pnpm
  • Python 3.8+ and pip
  • Supabase account and project
  • Deepgram API key
  • OpenAI API key

1. Clone the Repository

git clone https://github.com/wesng/hack2025.git
cd hack2025

2. Frontend Setup

cd mooncorpapp
pnpm install

3. Backend Setup (Transcription Service)

cd speech-to-text
pip install -r requirements.txt

4. Environment Variables

Create .env.local in the mooncorpapp directory:

# Supabase Configuration
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_supabase_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key

# Transcription Service
NEXT_PUBLIC_TRANSCRIPTION_WS_URL=ws://localhost:8000/ws

Create .env in the speech-to-text directory:

# Deepgram Configuration
DEEPGRAM_API_KEY=your_deepgram_api_key

# Server Configuration
HOST=0.0.0.0
PORT=8000

5. Database Setup

  1. Create a new Supabase project
  2. Run the database migrations (SQL files in /supabase directory)
  3. Configure Row Level Security (RLS) policies

6. Start Development Servers

Terminal 1 - Frontend:

cd mooncorpapp
pnpm dev

Terminal 2 - Transcription Backend:

cd speech-to-text
python main.py

The application will be available at:

  • Frontend: http://localhost:3000
  • Transcription API: http://localhost:8000

📖 Usage

Getting Started

  1. Create Account: Sign up using email/password or OAuth
  2. Create Project: Organize your interviews by research project
  3. Start Interview: Create a new interview session
  4. Begin Transcription: Use live capture for real-time transcription

Live Transcription Workflow

  1. Click "Start Live Capture" on the interview page
  2. Select browser tab with interview audio (Zoom, Meet, etc.)
  3. Check "Share tab audio" in browser permission dialog
  4. Watch live transcription stream into the response field
  5. Submit responses and receive AI-generated follow-up questions
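
The "no audio" failure mode in step 3 comes down to a simple check on the captured stream. A minimal sketch (`hasAudioTrack` is an illustrative helper, not an identifier from the codebase):

```typescript
// Minimal shape of a MediaStreamTrack, enough for this check.
interface TrackLike {
  kind: string;
}

// Returns true when at least one audio track is present. In the app,
// a check like this guards against the user forgetting to tick
// "Share tab audio" in the getDisplayMedia() prompt.
export function hasAudioTrack(tracks: TrackLike[]): boolean {
  return tracks.some((t) => t.kind === "audio");
}

// Browser usage (illustrative only; must run from a user gesture):
// const stream = await navigator.mediaDevices.getDisplayMedia({
//   video: true,
//   audio: true, // tab audio is delivered only if the user opts in
// });
// if (!hasAudioTrack(stream.getTracks())) {
//   throw new Error("No audio track found. Was 'Share tab audio' checked?");
// }
```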

Keyboard Shortcuts

  • Cmd/Ctrl + Enter - Submit turn
  • Cmd/Ctrl + Shift + K - Toggle suggestions dock
  • Cmd/Ctrl + R - Regenerate suggestions
  • ? - Show help dialog
  • Escape - Cancel/close dialogs
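
A dispatcher for these bindings can be sketched as a pure function (`matchShortcut` and the action names are illustrative, not the app's actual identifiers; Cmd and Ctrl are treated as equivalent):

```typescript
// Minimal shape of a KeyboardEvent, enough for matching.
interface KeyComboLike {
  key: string;
  metaKey?: boolean;
  ctrlKey?: boolean;
  shiftKey?: boolean;
}

type ShortcutAction =
  | "submit-turn"
  | "toggle-suggestions"
  | "regenerate"
  | "show-help"
  | "close";

// Maps a keyboard event to one of the documented shortcut actions,
// or null when nothing matches.
export function matchShortcut(e: KeyComboLike): ShortcutAction | null {
  const mod = Boolean(e.metaKey || e.ctrlKey);
  if (mod && e.shiftKey && e.key.toLowerCase() === "k") return "toggle-suggestions";
  if (mod && e.key === "Enter") return "submit-turn";
  if (mod && e.key.toLowerCase() === "r") return "regenerate";
  if (e.key === "?") return "show-help";
  if (e.key === "Escape") return "close";
  return null;
}
```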

Question Generation

The AI analyzes interview responses and generates categorized follow-up questions:

  • Probe: Explore deeper into topics
  • Deep: Get detailed information
  • Quant: Gather quantitative data
  • Clarify: Resolve ambiguities

📡 API Documentation

Core Endpoints

Interview Management

// Create interview
POST /api/interviews
Body: { title: string, projectId: string, intervieweeName?: string }

// Update interview
PATCH /api/interviews/[id]
Body: { title?: string, status?: string }

// Get interview
GET /api/interviews/[id]

AI Question Generation

// Generate suggestions
POST /api/ask_engine/v2/suggestions
Body: { interview_id: string, transcript: string }
Response: { items: Suggestion[] }

// Save transcript
POST /api/ask_engine/v2/transcript
Body: { interview_id: string, transcript: string }

Transcription

// WebSocket transcription
WS /ws
Messages: { audio: ArrayBuffer, config: TranscriptionConfig }

// File upload transcription
POST /api/transcribe
Body: FormData with audio file
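
A hypothetical typed wrapper for the create-interview call might separate request construction from the network call, which keeps the JSON shape unit-testable (`buildCreateInterviewRequest` is an assumed helper, not part of the codebase):

```typescript
interface CreateInterviewBody {
  title: string;
  projectId: string;
  intervieweeName?: string;
}

// Builds the fetch arguments for POST /api/interviews.
export function buildCreateInterviewRequest(body: CreateInterviewBody): {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
} {
  return {
    url: "/api/interviews",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Usage:
// const { url, init } = buildCreateInterviewRequest({ title: "Session 1", projectId: "..." });
// const res = await fetch(url, init);
```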

Response Types

interface Suggestion {
  question: string
  type: 'probe' | 'deep' | 'quant' | 'clarify'
  description: string
}

interface Interview {
  id: string
  title: string
  projectId: string
  status: 'draft' | 'active' | 'completed'
  intervieweeName?: string
  createdAt: string
  updatedAt: string
}
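
Because these payloads cross a network boundary, a runtime guard can complement the static types. A sketch (`isSuggestion` is illustrative, not an existing export):

```typescript
type SuggestionType = "probe" | "deep" | "quant" | "clarify";

interface Suggestion {
  question: string;
  type: SuggestionType;
  description: string;
}

// Runtime validation for Suggestion objects returned by
// /api/ask_engine/v2/suggestions.
export function isSuggestion(value: unknown): value is Suggestion {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.question === "string" &&
    typeof v.description === "string" &&
    ["probe", "deep", "quant", "clarify"].includes(v.type as string)
  );
}
```

In the real app this role is filled by Zod validation (listed in the tech stack above); the hand-rolled guard here just makes the expected shape explicit.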

πŸ—„οΈ Database Schema

Core Tables

-- Projects table
CREATE TABLE projects (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  title TEXT NOT NULL,
  description TEXT,
  owner UUID REFERENCES auth.users(id),
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

-- Interviews table
CREATE TABLE interviews (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  title TEXT NOT NULL,
  project_id UUID REFERENCES projects(id),
  interviewee_name TEXT,
  status TEXT DEFAULT 'draft',
  transcript TEXT,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

-- Turns table (interview responses)
CREATE TABLE turns (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  interview_id UUID REFERENCES interviews(id),
  text TEXT NOT NULL,
  focus_type TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);

See /supabase/migrations/ for complete schema definitions.

🧪 Testing

Run Tests

cd mooncorpapp
pnpm test

Test Coverage

pnpm test:coverage

Manual Testing Guide

Live Transcription Test Cases

  1. Happy Path: Start transcription β†’ select tab β†’ verify live streaming
  2. Permission Denied: Deny screen sharing β†’ check error handling
  3. No Audio: Select tab without audio β†’ verify error message
  4. Network Issues: Disconnect backend β†’ test reconnection

Question Generation

  1. Submit various response types and verify appropriate suggestions
  2. Test different focus types (probe, deep, quant)
  3. Verify suggestion regeneration and feedback systems

📄 License

MIT License

Copyright (c) 2025 MoonCorp

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

πŸ—ΊοΈ Roadmap

  • Multi-language transcription support
  • Advanced analytics and insights
  • Integration with popular meeting platforms
  • Mobile application
  • Advanced AI interview coaching
  • Team collaboration features

Built with ❤️ by the MoonCorp team

Usage

  1. Navigate to http://localhost:3000/transcription
  2. Click "Start Transcription"
  3. Select a browser tab with audio and enable "Share tab audio"
  4. Watch live transcription appear in real-time
  5. Click "Stop Transcription" when done

Important: The Screen Capture API requires HTTPS in production. For local development, modern browsers allow it on localhost.

Manual Testing Guide

Test Cases

  1. Happy Path:

    • Start transcription β†’ browser prompt appears
    • Select tab with audio + "Share tab audio"
    • See interim text (gray/italic) streaming
    • See finalized text (normal) with timestamps
    • Stop transcription β†’ everything stops cleanly
  2. Permission Denied:

    • Click start β†’ deny screen sharing
    • Should show clear error message
  3. No Audio Tab:

    • Select tab without audio or don't check "Share tab audio"
    • Should show "No audio track found" error
  4. Network Issues:

    • Start transcription, then disconnect backend
    • Should show "Reconnecting..." status
    • Restart backend β†’ should auto-reconnect
  5. Settings:

    • Change model from Nova-2 to Nova-3
    • Adjust endpointing timing
    • Change language settings
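
The auto-reconnect behaviour exercised in the network test case is typically driven by a capped exponential backoff; a sketch under that assumption (the hook's actual schedule may differ):

```typescript
// Delay before reconnect attempt N (0-based): baseMs doubles per
// attempt and is capped at maxMs so retries never grow unbounded.
export function reconnectDelayMs(
  attempt: number,
  baseMs = 500,
  maxMs = 10_000,
): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

With these defaults, attempts wait 500 ms, 1 s, 2 s, 4 s, 8 s, then plateau at 10 s until the backend comes back.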

Browser Requirements

  • Chrome/Edge: Full support
  • Firefox: Supported but may have audio quality differences
  • Safari: Limited support for tab audio capture

Project Structure

mooncorpapp/
β”œβ”€β”€ app/transcription/page.tsx          # Main transcription page
β”œβ”€β”€ components/transcriber-card.tsx     # Main UI component
β”œβ”€β”€ hooks/use-tab-transcription.ts      # Custom hook with WebSocket logic
β”œβ”€β”€ lib/types/transcription.ts          # TypeScript type definitions
└── __tests__/transcription-reducer.test.ts  # Unit tests

speech-to-text/
β”œβ”€β”€ main.py                             # FastAPI server
β”œβ”€β”€ requirements.txt                    # Python dependencies
β”œβ”€β”€ .env                               # Environment variables
└── README.md                          # Backend documentation

Key Components

Frontend

  • TranscriberCard: Main UI component with controls and transcript display
  • useTabTranscription: Custom hook managing WebSocket, MediaRecorder, and state
  • Types: Strong TypeScript interfaces for all data structures
  • Reducer: Predictable state management for transcription flow
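
The interim/final handling described above can be sketched as a reducer (a simplified illustration; the real reducer in hooks/use-tab-transcription.ts manages more states, such as connection status):

```typescript
interface TranscriptState {
  finalLines: string[]; // committed transcript segments
  interim: string;      // latest in-flight (gray/italic) text
}

type TranscriptAction =
  | { type: "interim"; text: string }
  | { type: "final"; text: string }
  | { type: "reset" };

// An interim result replaces the in-flight buffer; a final result
// commits its text and clears the buffer.
export function transcriptReducer(
  state: TranscriptState,
  action: TranscriptAction,
): TranscriptState {
  switch (action.type) {
    case "interim":
      return { ...state, interim: action.text };
    case "final":
      return { finalLines: [...state.finalLines, action.text], interim: "" };
    case "reset":
      return { finalLines: [], interim: "" };
  }
}
```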

Backend

  • WebSocket Proxy: Forwards audio chunks to Deepgram, returns transcript events
  • Error Handling: Structured error responses with request IDs for debugging
  • Token Endpoint: Optional ephemeral token generation for future direct connections

Configuration

Deepgram Settings

Default configuration (customizable in UI):

{
  model: "nova-2",        // or "nova-3" for higher accuracy
  language: "en-US",
  smart_format: true,     // Automatic punctuation/formatting
  punctuate: true,
  paragraphs: true,
  endpointing: 300        // 300ms, 500ms, or 1000ms
}
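
One plausible way the backend proxy could serialize these settings into the query string for Deepgram's /v1/listen streaming endpoint (a sketch, not the proxy's actual code; boolean and numeric values are passed as strings):

```typescript
interface DeepgramConfig {
  model: string;
  language: string;
  smart_format: boolean;
  punctuate: boolean;
  paragraphs: boolean;
  endpointing: number;
}

// Builds the Deepgram streaming WebSocket URL from the settings object.
export function deepgramQuery(cfg: DeepgramConfig): string {
  const params = new URLSearchParams(
    Object.entries(cfg).map(([k, v]) => [k, String(v)]),
  );
  return `wss://api.deepgram.com/v1/listen?${params.toString()}`;
}
```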

Audio Format

  • Container: WebM with Opus codec
  • Chunks: 250ms intervals
  • No encoding params: Deepgram reads format from container

Troubleshooting

Common Issues

  1. "Screen capture permission denied"

    • Enable screen sharing in browser settings
    • Try refreshing the page
  2. "No audio track found"

    • Ensure you check "Share tab audio" in the browser prompt
    • Make sure the selected tab is actually playing audio
  3. Transcription not appearing

    • Check browser console for WebSocket errors
    • Verify backend is running on port 8000
    • Check Deepgram API key is valid
  4. Poor transcription quality

    • Try switching to Nova-3 model
    • Ensure clear audio in the source tab
    • Check network connection stability

Development

# Backend logs
cd speech-to-text
python main.py  # Check console for WebSocket events

# Frontend logs
# Open browser dev tools β†’ Console tab
# Look for WebSocket connection messages

# Test reducer logic
# In browser console:
runTranscriptionReducerTests()

Security Notes

  • βœ… Deepgram API key kept server-side only
  • βœ… CORS configured for localhost development
  • βœ… No audio/video data stored or logged
  • βœ… Ephemeral connections only

Future Enhancements

  • SSL/HTTPS setup for production
  • Audio quality controls
  • Transcript export (JSON, text, SRT)
  • Speaker identification
  • Custom vocabulary/keywords
  • Audio preprocessing (noise reduction)

API Reference

WebSocket Messages

Client β†’ Server:

// Start
{
  "type": "start",
  "config": { "model": "nova-2", "language": "en-US" }
}

// Audio (binary WebM/Opus chunks)

// Finish
{ "type": "finish" }

Server β†’ Client:

// Transcript
{
  "type": "transcript",
  "data": {
    "channel": { "alternatives": [{ "transcript": "Hello world", "confidence": 0.95 }] },
    "is_final": true
  }
}

// Status
{ "type": "status", "message": "Connected", "status": "connected" }

// Error
{ "type": "error", "error_type": "Connection failed", "details": "..." }

About

2025 hackathon
