
Share-Tab Live Transcription

A real-time interview transcription system that captures audio from browser tabs and provides live transcription using Deepgram's streaming API.

Architecture

  • Frontend: Next.js 15 + React + TypeScript + Tailwind CSS
  • Backend: FastAPI + Python with WebSocket proxy to Deepgram
  • Audio Capture: Browser Screen Capture API with tab audio
  • Transcription: Deepgram Nova-2/Nova-3 streaming API

Features

  • ✅ Real-time tab audio capture using getDisplayMedia()
  • ✅ Live transcription streaming via WebSocket
  • ✅ Interim and final transcript handling
  • ✅ Connection resilience with auto-reconnect
  • ✅ Multiple Deepgram model options (Nova-2/Nova-3)
  • ✅ Configurable transcription settings
  • ✅ Error handling for permissions and network issues
  • ✅ TypeScript types and reducer pattern
  • ✅ Responsive UI with status indicators

MoonCorp Interview Platform

A comprehensive AI-powered interview platform that streamlines the research interview process with real-time transcription, intelligent question suggestions, and seamless project management.

🎯 Overview

MoonCorp Interview Platform is a professional-grade tool designed for researchers, UX professionals, and product teams to conduct, transcribe, and analyze interviews efficiently. The platform combines real-time speech-to-text capabilities with AI-generated follow-up questions to enhance interview quality and reduce manual effort.

📸 Demo / Screenshots

Interview Page: Main interview interface with live transcription and AI suggestions

Live Capture: Real-time screen capture and transcription in action

Project Dashboard: Project management and interview organization

✨ Features

πŸŽ™οΈ Real-Time Transcription

  • Live Audio Capture: Browser-based screen sharing with tab audio capture
  • Streaming Transcription: Real-time speech-to-text using Deepgram Nova-2/Nova-3 models
  • Multiple Input Methods: Support for live capture, file upload, and manual entry
  • High Accuracy: Advanced punctuation, formatting, and speaker detection

🤖 AI-Powered Interview Assistant

  • Smart Follow-ups: Contextual question suggestions based on interviewee responses
  • Question Categorization: Probe, deep-dive, quantitative, and clarification question types
  • Real-time Generation: Instant AI suggestions as conversations progress
  • Customizable Focus: Targeted questioning strategies (probe, deep, quant)

📊 Professional Interview Management

  • Project Organization: Group interviews by research projects
  • Interview Tracking: Status management and progress monitoring
  • Collaboration: Multi-user support with role-based permissions
  • Export Options: Comprehensive data export and reporting

🎨 Modern User Experience

  • Responsive Design: Optimized for desktop and mobile devices
  • Keyboard Shortcuts: Efficient navigation and control
  • Live Video Preview: Embedded screen capture preview
  • Professional UI: Clean, distraction-free interface

πŸ› οΈ Tech Stack

Frontend

  • Framework: Next.js 15 with React 19
  • Language: TypeScript
  • Styling: Tailwind CSS with shadcn/ui components
  • Icons: Lucide React
  • Forms: React Hook Form with Zod validation
  • State Management: React hooks and context

Backend & APIs

  • API Routes: Next.js API routes
  • Authentication: Supabase Auth
  • Database: Supabase PostgreSQL
  • File Storage: Supabase Storage
  • Real-time: WebSocket connections

Transcription Services

  • Speech-to-Text: Deepgram Nova-2/Nova-3 streaming API
  • Backend: FastAPI + Python WebSocket proxy
  • Audio Processing: Browser MediaRecorder API
  • Screen Capture: getDisplayMedia API

AI & ML

  • Language Model: OpenAI GPT for question generation
  • Prompt Engineering: Sophisticated interview context analysis
  • Real-time Processing: Streaming AI responses

Development & Deployment

  • Package Manager: pnpm
  • Linting: ESLint with TypeScript rules
  • Testing: Jest and React Testing Library
  • Deployment: Vercel (frontend), Railway/Docker (backend)

🚀 Installation Instructions

Prerequisites

  • Node.js 18+ and pnpm
  • Python 3.8+ and pip
  • Supabase account and project
  • Deepgram API key
  • OpenAI API key

1. Clone the Repository

git clone https://github.com/wesng/hack2025.git
cd hack2025

2. Frontend Setup

cd mooncorpapp
pnpm install

3. Backend Setup (Transcription Service)

cd speech-to-text
pip install -r requirements.txt

4. Environment Variables

Create .env.local in the mooncorpapp directory:

# Supabase Configuration
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_supabase_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key

# Transcription Service
NEXT_PUBLIC_TRANSCRIPTION_WS_URL=ws://localhost:8000/ws

Create .env in the speech-to-text directory:

# Deepgram Configuration
DEEPGRAM_API_KEY=your_deepgram_api_key

# Server Configuration
HOST=0.0.0.0
PORT=8000

5. Database Setup

  1. Create a new Supabase project
  2. Run the database migrations (SQL files in /supabase directory)
  3. Configure Row Level Security (RLS) policies

6. Start Development Servers

Terminal 1 - Frontend:

cd mooncorpapp
pnpm dev

Terminal 2 - Transcription Backend:

cd speech-to-text
python main.py

The application will be available at:

  • Frontend: http://localhost:3000
  • Transcription API: http://localhost:8000

📖 Usage

Getting Started

  1. Create Account: Sign up using email/password or OAuth
  2. Create Project: Organize your interviews by research project
  3. Start Interview: Create a new interview session
  4. Begin Transcription: Use live capture for real-time transcription

Live Transcription Workflow

  1. Click "Start Live Capture" on the interview page
  2. Select browser tab with interview audio (Zoom, Meet, etc.)
  3. Check "Share tab audio" in browser permission dialog
  4. Watch live transcription stream into the response field
  5. Submit responses and receive AI-generated follow-up questions
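
The "no audio" failure mode in step 3 comes down to a simple check on the captured stream. A minimal sketch (`hasAudioTrack` is an illustrative helper, not an identifier from the codebase):

```typescript
// Minimal shape of a MediaStreamTrack, enough for this check.
interface TrackLike {
  kind: string;
}

// Returns true when at least one audio track is present. In the app,
// a check like this guards against the user forgetting to tick
// "Share tab audio" in the getDisplayMedia() prompt.
export function hasAudioTrack(tracks: TrackLike[]): boolean {
  return tracks.some((t) => t.kind === "audio");
}

// Browser usage (illustrative only; must run from a user gesture):
// const stream = await navigator.mediaDevices.getDisplayMedia({
//   video: true,
//   audio: true, // tab audio is delivered only if the user opts in
// });
// if (!hasAudioTrack(stream.getTracks())) {
//   throw new Error("No audio track found. Was 'Share tab audio' checked?");
// }
```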

Keyboard Shortcuts

  • Cmd/Ctrl + Enter - Submit turn
  • Cmd/Ctrl + Shift + K - Toggle suggestions dock
  • Cmd/Ctrl + R - Regenerate suggestions
  • ? - Show help dialog
  • Escape - Cancel/close dialogs
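
A dispatcher for these bindings can be sketched as a pure function (`matchShortcut` and the action names are illustrative, not the app's actual identifiers; Cmd and Ctrl are treated as equivalent):

```typescript
// Minimal shape of a KeyboardEvent, enough for matching.
interface KeyComboLike {
  key: string;
  metaKey?: boolean;
  ctrlKey?: boolean;
  shiftKey?: boolean;
}

type ShortcutAction =
  | "submit-turn"
  | "toggle-suggestions"
  | "regenerate"
  | "show-help"
  | "close";

// Maps a keyboard event to one of the documented shortcut actions,
// or null when nothing matches.
export function matchShortcut(e: KeyComboLike): ShortcutAction | null {
  const mod = Boolean(e.metaKey || e.ctrlKey);
  if (mod && e.shiftKey && e.key.toLowerCase() === "k") return "toggle-suggestions";
  if (mod && e.key === "Enter") return "submit-turn";
  if (mod && e.key.toLowerCase() === "r") return "regenerate";
  if (e.key === "?") return "show-help";
  if (e.key === "Escape") return "close";
  return null;
}
```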

Question Generation

The AI analyzes interview responses and generates categorized follow-up questions:

  • Probe: Explore deeper into topics
  • Deep: Get detailed information
  • Quant: Gather quantitative data
  • Clarify: Resolve ambiguities

📡 API Documentation

Core Endpoints

Interview Management

// Create interview
POST /api/interviews
Body: { title: string, projectId: string, intervieweeName?: string }

// Update interview
PATCH /api/interviews/[id]
Body: { title?: string, status?: string }

// Get interview
GET /api/interviews/[id]

AI Question Generation

// Generate suggestions
POST /api/ask_engine/v2/suggestions
Body: { interview_id: string, transcript: string }
Response: { items: Suggestion[] }

// Save transcript
POST /api/ask_engine/v2/transcript
Body: { interview_id: string, transcript: string }

Transcription

// WebSocket transcription
WS /ws
Messages: { audio: ArrayBuffer, config: TranscriptionConfig }

// File upload transcription
POST /api/transcribe
Body: FormData with audio file
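
A hypothetical typed wrapper for the create-interview call might separate request construction from the network call, which keeps the JSON shape unit-testable (`buildCreateInterviewRequest` is an assumed helper, not part of the codebase):

```typescript
interface CreateInterviewBody {
  title: string;
  projectId: string;
  intervieweeName?: string;
}

// Builds the fetch arguments for POST /api/interviews.
export function buildCreateInterviewRequest(body: CreateInterviewBody): {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
} {
  return {
    url: "/api/interviews",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Usage:
// const { url, init } = buildCreateInterviewRequest({ title: "Session 1", projectId: "..." });
// const res = await fetch(url, init);
```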

Response Types

interface Suggestion {
  question: string
  type: 'probe' | 'deep' | 'quant' | 'clarify'
  description: string
}

interface Interview {
  id: string
  title: string
  projectId: string
  status: 'draft' | 'active' | 'completed'
  intervieweeName?: string
  createdAt: string
  updatedAt: string
}
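
Because these payloads cross a network boundary, a runtime guard can complement the static types. A sketch (`isSuggestion` is illustrative, not an existing export):

```typescript
type SuggestionType = "probe" | "deep" | "quant" | "clarify";

interface Suggestion {
  question: string;
  type: SuggestionType;
  description: string;
}

// Runtime validation for Suggestion objects returned by
// /api/ask_engine/v2/suggestions.
export function isSuggestion(value: unknown): value is Suggestion {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.question === "string" &&
    typeof v.description === "string" &&
    ["probe", "deep", "quant", "clarify"].includes(v.type as string)
  );
}
```

In the real app this role is filled by Zod validation (listed in the tech stack above); the hand-rolled guard here just makes the expected shape explicit.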

πŸ—„οΈ Database Schema

Core Tables

-- Projects table
CREATE TABLE projects (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  title TEXT NOT NULL,
  description TEXT,
  owner UUID REFERENCES auth.users(id),
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

-- Interviews table
CREATE TABLE interviews (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  title TEXT NOT NULL,
  project_id UUID REFERENCES projects(id),
  interviewee_name TEXT,
  status TEXT DEFAULT 'draft',
  transcript TEXT,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

-- Turns table (interview responses)
CREATE TABLE turns (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  interview_id UUID REFERENCES interviews(id),
  text TEXT NOT NULL,
  focus_type TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);

See /supabase/migrations/ for complete schema definitions.

🧪 Testing

Run Tests

cd mooncorpapp
pnpm test

Test Coverage

pnpm test:coverage

Manual Testing Guide

Live Transcription Test Cases

  1. Happy Path: Start transcription β†’ select tab β†’ verify live streaming
  2. Permission Denied: Deny screen sharing β†’ check error handling
  3. No Audio: Select tab without audio β†’ verify error message
  4. Network Issues: Disconnect backend β†’ test reconnection

Question Generation

  1. Submit various response types and verify appropriate suggestions
  2. Test different focus types (probe, deep, quant)
  3. Verify suggestion regeneration and feedback systems

📄 License

MIT License

Copyright (c) 2025 MoonCorp

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

πŸ—ΊοΈ Roadmap

  • Multi-language transcription support
  • Advanced analytics and insights
  • Integration with popular meeting platforms
  • Mobile application
  • Advanced AI interview coaching
  • Team collaboration features

Built with ❤️ by the MoonCorp team

Usage

  1. Navigate to http://localhost:3000/transcription
  2. Click "Start Transcription"
  3. Select a browser tab with audio and enable "Share tab audio"
  4. Watch live transcription appear in real-time
  5. Click "Stop Transcription" when done

Important: The Screen Capture API requires HTTPS in production. For local development, modern browsers allow it on localhost.

Manual Testing Guide

Test Cases

  1. Happy Path:

    • Start transcription β†’ browser prompt appears
    • Select tab with audio + "Share tab audio"
    • See interim text (gray/italic) streaming
    • See finalized text (normal) with timestamps
    • Stop transcription β†’ everything stops cleanly
  2. Permission Denied:

    • Click start β†’ deny screen sharing
    • Should show clear error message
  3. No Audio Tab:

    • Select tab without audio or don't check "Share tab audio"
    • Should show "No audio track found" error
  4. Network Issues:

    • Start transcription, then disconnect backend
    • Should show "Reconnecting..." status
    • Restart backend β†’ should auto-reconnect
  5. Settings:

    • Change model from Nova-2 to Nova-3
    • Adjust endpointing timing
    • Change language settings
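
The auto-reconnect behaviour exercised in the network test case is typically driven by a capped exponential backoff; a sketch under that assumption (the hook's actual schedule may differ):

```typescript
// Delay before reconnect attempt N (0-based): baseMs doubles per
// attempt and is capped at maxMs so retries never grow unbounded.
export function reconnectDelayMs(
  attempt: number,
  baseMs = 500,
  maxMs = 10_000,
): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

With these defaults, attempts wait 500 ms, 1 s, 2 s, 4 s, 8 s, then plateau at 10 s until the backend comes back.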

Browser Requirements

  • Chrome/Edge: Full support
  • Firefox: Supported but may have audio quality differences
  • Safari: Limited support for tab audio capture

Project Structure

mooncorpapp/
β”œβ”€β”€ app/transcription/page.tsx          # Main transcription page
β”œβ”€β”€ components/transcriber-card.tsx     # Main UI component
β”œβ”€β”€ hooks/use-tab-transcription.ts      # Custom hook with WebSocket logic
β”œβ”€β”€ lib/types/transcription.ts          # TypeScript type definitions
└── __tests__/transcription-reducer.test.ts  # Unit tests

speech-to-text/
β”œβ”€β”€ main.py                             # FastAPI server
β”œβ”€β”€ requirements.txt                    # Python dependencies
β”œβ”€β”€ .env                               # Environment variables
└── README.md                          # Backend documentation

Key Components

Frontend

  • TranscriberCard: Main UI component with controls and transcript display
  • useTabTranscription: Custom hook managing WebSocket, MediaRecorder, and state
  • Types: Strong TypeScript interfaces for all data structures
  • Reducer: Predictable state management for transcription flow
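
The interim/final handling described above can be sketched as a reducer (a simplified illustration; the real reducer in hooks/use-tab-transcription.ts manages more states, such as connection status):

```typescript
interface TranscriptState {
  finalLines: string[]; // committed transcript segments
  interim: string;      // latest in-flight (gray/italic) text
}

type TranscriptAction =
  | { type: "interim"; text: string }
  | { type: "final"; text: string }
  | { type: "reset" };

// An interim result replaces the in-flight buffer; a final result
// commits its text and clears the buffer.
export function transcriptReducer(
  state: TranscriptState,
  action: TranscriptAction,
): TranscriptState {
  switch (action.type) {
    case "interim":
      return { ...state, interim: action.text };
    case "final":
      return { finalLines: [...state.finalLines, action.text], interim: "" };
    case "reset":
      return { finalLines: [], interim: "" };
  }
}
```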

Backend

  • WebSocket Proxy: Forwards audio chunks to Deepgram, returns transcript events
  • Error Handling: Structured error responses with request IDs for debugging
  • Token Endpoint: Optional ephemeral token generation for future direct connections

Configuration

Deepgram Settings

Default configuration (customizable in UI):

{
  model: "nova-2",        // or "nova-3" for higher accuracy
  language: "en-US",
  smart_format: true,     // Automatic punctuation/formatting
  punctuate: true,
  paragraphs: true,
  endpointing: 300        // 300ms, 500ms, or 1000ms
}
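
One plausible way the backend proxy could serialize these settings into the query string for Deepgram's /v1/listen streaming endpoint (a sketch, not the proxy's actual code; boolean and numeric values are passed as strings):

```typescript
interface DeepgramConfig {
  model: string;
  language: string;
  smart_format: boolean;
  punctuate: boolean;
  paragraphs: boolean;
  endpointing: number;
}

// Builds the Deepgram streaming WebSocket URL from the settings object.
export function deepgramQuery(cfg: DeepgramConfig): string {
  const params = new URLSearchParams(
    Object.entries(cfg).map(([k, v]) => [k, String(v)]),
  );
  return `wss://api.deepgram.com/v1/listen?${params.toString()}`;
}
```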

Audio Format

  • Container: WebM with Opus codec
  • Chunks: 250ms intervals
  • No encoding params: Deepgram reads format from container

Troubleshooting

Common Issues

  1. "Screen capture permission denied"

    • Enable screen sharing in browser settings
    • Try refreshing the page
  2. "No audio track found"

    • Ensure you check "Share tab audio" in the browser prompt
    • Make sure the selected tab is actually playing audio
  3. Transcription not appearing

    • Check browser console for WebSocket errors
    • Verify backend is running on port 8000
    • Check Deepgram API key is valid
  4. Poor transcription quality

    • Try switching to Nova-3 model
    • Ensure clear audio in the source tab
    • Check network connection stability

Development

# Backend logs
cd speech-to-text
python main.py  # Check console for WebSocket events

# Frontend logs
# Open browser dev tools β†’ Console tab
# Look for WebSocket connection messages

# Test reducer logic
# In browser console:
runTranscriptionReducerTests()

Security Notes

  • βœ… Deepgram API key kept server-side only
  • βœ… CORS configured for localhost development
  • βœ… No audio/video data stored or logged
  • βœ… Ephemeral connections only

Future Enhancements

  • SSL/HTTPS setup for production
  • Audio quality controls
  • Transcript export (JSON, text, SRT)
  • Speaker identification
  • Custom vocabulary/keywords
  • Audio preprocessing (noise reduction)

API Reference

WebSocket Messages

Client β†’ Server:

// Start
{
  "type": "start",
  "config": { "model": "nova-2", "language": "en-US" }
}

// Audio (binary WebM/Opus chunks)

// Finish
{ "type": "finish" }

Server β†’ Client:

// Transcript
{
  "type": "transcript",
  "data": {
    "channel": { "alternatives": [{ "transcript": "Hello world", "confidence": 0.95 }] },
    "is_final": true
  }
}

// Status
{ "type": "status", "message": "Connected", "status": "connected" }

// Error
{ "type": "error", "error_type": "Connection failed", "details": "..." }

About

2025 hackathon
