Twilio Synthetic Call Data Generator

Generate realistic AI-powered customer service phone conversations for testing Voice Intelligence, analytics pipelines, and contact center workflows.

📌 This is a GitHub Template - Click the "Use this template" button above to create your own repository with this code!


A production-grade system for generating realistic synthetic call data using Twilio Programmable Voice and Segment CDP. Features random customer-agent pairing for realistic scenarios (including challenging interactions), AI-powered conversations with OpenAI, Voice Intelligence transcription, and ML-based customer profiling with churn risk and propensity scores.

Architecture: Built with production-grade patterns including comprehensive test coverage for core TwiML functions, retry logic with exponential backoff, circuit breakers, and webhook signature validation.
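As an illustration of the retry pattern named above, here is a minimal sketch of retry with exponential backoff. It is not the repository's implementation: the function names `backoffDelay` and `withRetry`, the default delays, and the attempt count are all assumptions chosen for the example.

```javascript
// Hypothetical helper: delay before retry number `attempt` (0-based).
// Doubles each time (baseMs * 2^attempt) and is capped at maxMs.
function backoffDelay(attempt, baseMs = 250, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical helper: calls `fn` until it resolves or maxAttempts is used up.
// `sleep` is injectable so tests can skip real waiting.
async function withRetry(fn, { maxAttempts = 4, baseMs = 250, maxMs = 8000, sleep } = {}) {
  const wait = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // Only wait if another attempt will follow.
      if (attempt < maxAttempts - 1) {
        await wait(backoffDelay(attempt, baseMs, maxMs));
      }
    }
  }
  throw lastError;
}
```

A caller would wrap any flaky API request, for example `withRetry(() => someApiCall())`. A circuit breaker would sit one level above this and stop calling `fn` entirely after repeated failures.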

🎯 What It Does

Generates realistic synthetic call data for testing, development, and analytics:

  1. Random Pairing - Creates realistic scenarios including challenging interactions (frustrated customers with inexperienced agents)
  2. AI Conversations - OpenAI-powered realistic agent-customer conversations with Voice Intelligence transcription
  3. Customer Profiling - Creates and updates Segment CDP profiles with ML scores
  4. ML Analytics - Calculates churn risk, propensity to buy, and satisfaction scores
  5. Complete Pipeline - End-to-end automation from pairing to analytics

🚀 Features

  • ✅ One-Command Deployment - Pre-deployment checks + deploy + post-deployment validation
  • ✅ Production Testing - Smoke tests against real Twilio and Segment APIs
  • ✅ Comprehensive Test Coverage - 634 tests across unit, integration, and E2E
  • ✅ Realistic Pairing - Random customer-agent matching creates diverse, realistic scenarios
  • ✅ Segment CDP Integration - Automatic profile creation and ML score updates
  • ✅ Twilio Serverless - Conference webhooks and AI conversation orchestration

🛠 Tech Stack

  • Backend: Node.js 18+, Twilio Serverless Functions
  • AI: OpenAI gpt-4o-mini, Twilio Voice Intelligence
  • Data: Segment CDP, Twilio Sync
  • Testing: Jest (634 tests), Newman (Postman)
  • CI/CD: GitHub Actions
  • Code Quality: ESLint, Prettier

💰 Cost Estimation

Per 100 synthetic calls (assuming 2-minute average conversation, 5-minute maximum):

| Service | Usage | Cost |
|---|---|---|
| Twilio Voice | 200 minutes @ $0.013/min | ~$2.60 |
| Twilio Voice Intelligence | 200 minutes @ $0.02/min | ~$4.00 |
| OpenAI gpt-4o-mini | ~1M tokens @ $0.15/1M input, $0.60/1M output | ~$0.30 |
| Twilio Sync | Included in usage | Free tier |
| Segment CDP | Up to 10K MTUs/month | Free tier |
| Total | | ~$6.90 per 100 calls |

Budget Planning:

  • MAX_DAILY_CALLS=1000 (default) = ~$69.00/day maximum
  • MAX_DAILY_CALLS=100 = ~$6.90/day for testing
  • Adjust MAX_DAILY_CALLS in .env to control spending
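To budget for other call volumes, the per-minute rates from the table can be turned into a small calculator. This is a sketch under stated assumptions: `estimateCost` is a hypothetical helper, not part of the repository, and the OpenAI figure is treated as a flat per-call cost derived from the ~$0.30 per 100 two-minute calls listed above.

```javascript
// Rates taken from the cost table; openAiPerCall is an assumption
// (~$0.30 per 100 two-minute calls, treated as flat per call).
const RATES = {
  voicePerMinute: 0.013,
  voiceIntelligencePerMinute: 0.02,
  openAiPerCall: 0.003,
};

// Round to whole cents to hide floating-point noise.
const toCents = (usd) => Math.round(usd * 100) / 100;

// Hypothetical helper: estimated USD cost for `calls` calls of `avgMinutes` each.
function estimateCost(calls, avgMinutes = 2) {
  const minutes = calls * avgMinutes;
  const voice = minutes * RATES.voicePerMinute;
  const voiceIntelligence = minutes * RATES.voiceIntelligencePerMinute;
  const openAi = calls * RATES.openAiPerCall;
  return {
    voice: toCents(voice),
    voiceIntelligence: toCents(voiceIntelligence),
    openAi: toCents(openAi),
    total: toCents(voice + voiceIntelligence + openAi),
  };
}

console.log(estimateCost(100).total); // 6.9
```

Actual charges depend on current Twilio and OpenAI pricing and on real conversation lengths, so treat the result as a rough planning number.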

Cost Controls Built-in:

  • ✅ Auto-termination at 5 minutes - Prevents runaway conversation costs
  • ✅ Rate limiting - MAX_DAILY_CALLS prevents accidental overspending
  • ✅ Efficient model - gpt-4o-mini is optimized for cost and performance
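A daily cap such as MAX_DAILY_CALLS can be enforced with a counter that resets when the date changes. The sketch below shows one way to do that; `DailyCallLimiter` is a hypothetical class written for illustration, and the repository may enforce the limit differently (for example by persisting the count in Twilio Sync).

```javascript
// Hypothetical in-memory limiter; the count is lost on process restart.
class DailyCallLimiter {
  constructor(maxDailyCalls, now = () => new Date()) {
    this.maxDailyCalls = maxDailyCalls;
    this.now = now; // injectable clock for testing
    this.day = null;
    this.count = 0;
  }

  // Returns true and counts the call if today's budget allows it.
  tryAcquire() {
    const today = this.now().toISOString().slice(0, 10); // UTC date, YYYY-MM-DD
    if (today !== this.day) {
      this.day = today;
      this.count = 0;
    }
    if (this.count >= this.maxDailyCalls) return false;
    this.count += 1;
    return true;
  }
}

const limiter = new DailyCallLimiter(Number(process.env.MAX_DAILY_CALLS || 1000));
```

Calling `limiter.tryAcquire()` before placing each call, and skipping the call when it returns false, keeps daily spend bounded.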

Cost-Saving Tips:

  • Use shorter conversations for testing (conferences auto-terminate at 5 minutes)
  • Start with MAX_DAILY_CALLS=10 during development
  • Monitor OpenAI usage at platform.openai.com/usage
  • Use Twilio's free trial credits for initial testing

🎯 Using This Template

This repository is a GitHub Template. Create your own synthetic call data generator in 3 steps:

1. Create Your Repository

  1. Click "Use this template" at the top of this page
  2. Name your repository (e.g., my-call-data-generator)
  3. Choose visibility (public or private)
  4. Click "Create repository from template"

2. Clone and Setup

# Clone YOUR new repository (not this template!)
git clone https://github.com/YOUR-USERNAME/YOUR-REPO-NAME.git
cd YOUR-REPO-NAME

# Create TwiML App
twilio api:core:applications:create --friendly-name "Synthetic Call Generator"
# Copy the SID for .env

# Install dependencies
npm install

# Configure environment
cp .env.example .env
# Edit .env with credentials (see Quick Start section for details)

# Update phone numbers in assets/customers.json with YOUR Twilio numbers

3. Deploy and Test

# Deploy to Twilio
npm run deploy

# Generate your first test call
curl -X POST "https://YOUR-DOMAIN.twil.io/create-conference" \
  -u "$TWILIO_ACCOUNT_SID:$TWILIO_AUTH_TOKEN" \
  -d "agentPhoneNumber=YOUR_PHONE_NUMBER"

That's it! Check Twilio Console → Voice Intelligence → Transcripts to see your synthetic conversation.
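The same request can be issued from Node.js instead of curl. The sketch below only builds the request so it can be inspected before sending; `buildCreateConferenceRequest` is a hypothetical helper, and the endpoint path, Basic auth scheme, and `agentPhoneNumber` field are taken from the curl command above.

```javascript
// Hypothetical helper mirroring the curl command: POST with HTTP Basic auth
// and a form-encoded body.
function buildCreateConferenceRequest({ domain, accountSid, authToken, agentPhoneNumber }) {
  const credentials = Buffer.from(`${accountSid}:${authToken}`).toString('base64');
  return {
    url: `https://${domain}/create-conference`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Basic ${credentials}`,
        'Content-Type': 'application/x-www-form-urlencoded',
      },
      // URLSearchParams percent-encodes the leading "+" of an E.164 number.
      body: new URLSearchParams({ agentPhoneNumber }).toString(),
    },
  };
}

// Usage (Node.js 18+ ships a global fetch):
// const { url, options } = buildCreateConferenceRequest({
//   domain: 'YOUR-DOMAIN.twil.io',
//   accountSid: process.env.TWILIO_ACCOUNT_SID,
//   authToken: process.env.TWILIO_AUTH_TOKEN,
//   agentPhoneNumber: 'YOUR_PHONE_NUMBER',
// });
// const response = await fetch(url, options);
```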

Customization

Update Personas: Edit assets/customers.json and assets/agents.json to match your business:

  • Customer pain points and technical proficiency
  • Agent characteristics and competence levels
  • Introduction scripts and conversation patterns

Adjust Call Behavior:

  • Duration: Modify timeLimit in functions/utils/add-participant.js (default: 5 minutes)
  • AI Model: Change OpenAI model in functions/respond.js (default: gpt-4o-mini)
  • Speech Recognition: Adjust speechModel in functions/transcribe.js (default: experimental_conversations)

Add Integrations:


⚡ Quick Start (5 Minutes)

1. Create TwiML App

Before installing, create a TwiML Application using the Twilio CLI:

# Create TwiML App (requires Twilio CLI)
twilio api:core:applications:create --friendly-name "Synthetic Call Generator"
# Copy the SID (starts with AP...) for the next step

2. Install & Configure

# Install dependencies
npm install

# Create .env file
cp .env.example .env

Edit .env with your credentials:

# Get from https://console.twilio.com
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_AUTH_TOKEN=your_auth_token_here
OPENAI_API_KEY=sk-...

# From step 1 above
TWIML_APP_SID=APxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Required: Skip webhook validation for TwiML App calls
SKIP_WEBHOOK_VALIDATION=true

# Your Twilio phone numbers (find at console.twilio.com → Phone Numbers)
AGENT_PHONE_NUMBER=+1234567890
CUSTOMER_PHONE_NUMBER=+1234567890

# Sync Service SID (see docs/sync-setup-guide.md)
SYNC_SERVICE_SID=ISxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Optional: Segment CDP integration
SEGMENT_WRITE_KEY=your_segment_write_key_here
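Before running the pre-deployment checks, it can help to catch placeholder values left over from .env.example. The following is a minimal sketch of such a check, not the repository's own validation script: `validateEnv` is a hypothetical helper, and the patterns assume that Twilio SIDs are a two-letter prefix followed by 32 hexadecimal characters and that phone numbers are in E.164 format.

```javascript
// Assumed formats: Twilio SIDs are a 2-letter prefix + 32 hex characters;
// phone numbers are E.164 ("+" followed by up to 15 digits).
const REQUIRED_PATTERNS = {
  TWILIO_ACCOUNT_SID: /^AC[0-9a-fA-F]{32}$/,
  TWILIO_AUTH_TOKEN: /^.+$/,
  OPENAI_API_KEY: /^sk-.+$/,
  TWIML_APP_SID: /^AP[0-9a-fA-F]{32}$/,
  AGENT_PHONE_NUMBER: /^\+[1-9]\d{1,14}$/,
  CUSTOMER_PHONE_NUMBER: /^\+[1-9]\d{1,14}$/,
  SYNC_SERVICE_SID: /^IS[0-9a-fA-F]{32}$/,
};

// Hypothetical helper: returns a list of human-readable problems (empty if OK).
function validateEnv(env) {
  const problems = [];
  for (const [name, pattern] of Object.entries(REQUIRED_PATTERNS)) {
    const value = env[name];
    if (value === undefined || value === '') {
      problems.push(`${name} is missing`);
    } else if (!pattern.test(value)) {
      problems.push(`${name} does not look valid`);
    }
  }
  return problems;
}

// Usage: console.log(validateEnv(process.env));
```

Because the placeholder SIDs in the example consist of the letter x, they fail the hexadecimal check and are reported as unedited.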

3. Update Phone Numbers

The persona files contain example phone numbers. Replace them with YOUR Twilio numbers:

# Update persona phone numbers with YOUR Twilio numbers from the prerequisites
# Edit assets/customers.json - replace the PhoneNumber fields
# Agents automatically use AGENT_PHONE_NUMBER from .env
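If there are many personas, the replacement can be scripted. The sketch below walks any parsed JSON structure and rewrites every `PhoneNumber` field; it assumes nothing about the layout of assets/customers.json beyond that field name, and `replacePhoneNumbers` and `roundRobin` are hypothetical helpers.

```javascript
// Hypothetical helper: returns a copy of `node` with every "PhoneNumber"
// value replaced by the next value from `nextNumber()`.
function replacePhoneNumbers(node, nextNumber) {
  if (Array.isArray(node)) {
    return node.map((item) => replacePhoneNumbers(item, nextNumber));
  }
  if (node && typeof node === 'object') {
    return Object.fromEntries(
      Object.entries(node).map(([key, value]) =>
        key === 'PhoneNumber'
          ? [key, nextNumber()]
          : [key, replacePhoneNumbers(value, nextNumber)]
      )
    );
  }
  return node; // strings, numbers, booleans, null
}

// Hypothetical helper: cycles through your Twilio numbers.
function roundRobin(numbers) {
  let index = 0;
  return () => numbers[index++ % numbers.length];
}

// Usage (writes the file in place, so commit or back up first):
// const fs = require('fs');
// const path = 'assets/customers.json';
// const data = JSON.parse(fs.readFileSync(path, 'utf8'));
// const updated = replacePhoneNumbers(data, roundRobin(['+15005550001', '+15005550002']));
// fs.writeFileSync(path, JSON.stringify(updated, null, 2) + '\n');
```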

4. Validate Setup

# Run pre-deployment checks (validates env, tests, APIs)
npm run pre-deploy

Expected: ✓ ALL CHECKS PASSED (7/7) (some tests may skip if optional services not configured)

5. Deploy to Twilio

# Deploy with automatic validation
npm run deploy

This runs:

  1. Pre-deployment checks
  2. Twilio serverless deployment
  3. Post-deployment validation

6. Generate Synthetic Calls

# Create your first synthetic conference
node src/main.js

What happens:

  • Pairs a customer with an agent (random for realistic scenarios)
  • Creates Segment CDP profiles
  • Generates Twilio conference with AI conversation
  • Updates profiles with ML scores (churn risk, propensity, satisfaction)

Pairing Strategies (configurable):

  • random (default) - Random pairing for diverse scenarios (frustrated customer + inexperienced agent = sparks fly! 🔥)
  • frustrated - Match difficult customers with experienced agents
  • patient - Patient customers with any agent
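The three strategies can be pictured as filters applied before a random pick. The sketch below is illustrative only: the `temperament` and `experience` fields and the `pair` function are hypothetical and may not match the persona schema or the code in src/pairing/.

```javascript
// Pick one element using an injectable random source (0 <= rng() < 1).
function pickRandom(items, rng) {
  return items[Math.floor(rng() * items.length)];
}

// Hypothetical pairing function. Assumes customers carry a `temperament`
// field and agents an `experience` field; both names are illustrative.
function pair(customers, agents, strategy = 'random', rng = Math.random) {
  let customerPool = customers;
  let agentPool = agents;

  if (strategy === 'frustrated') {
    // Difficult customers go to experienced agents.
    customerPool = customers.filter((c) => c.temperament === 'frustrated');
    agentPool = agents.filter((a) => a.experience === 'experienced');
  } else if (strategy === 'patient') {
    // Patient customers can be matched with any agent.
    customerPool = customers.filter((c) => c.temperament === 'patient');
  } else if (strategy !== 'random') {
    throw new Error(`Unknown pairing strategy "${strategy}"`);
  }

  if (customerPool.length === 0 || agentPool.length === 0) {
    throw new Error(`No candidates for strategy "${strategy}"`);
  }
  return { customer: pickRandom(customerPool, rng), agent: pickRandom(agentPool, rng) };
}
```

Passing a fixed `rng` makes pairings reproducible, which is useful when a test needs a specific customer-agent combination.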

📖 For detailed instructions, see docs/quick-start.md

Note: If calls fail with 'busy' status, ensure you've updated assets/customers.json with YOUR Twilio phone numbers (step 3).


🛠 Development Tools

Deployment Automation

# Pre-deployment validation (env, tests, credentials, data files)
npm run pre-deploy

# Safe deployment with all checks
npm run deploy:safe

# Post-deployment validation
npm run post-deploy

# Smoke test (validates real APIs without deploying)
npm run smoke-test

Testing

# Run all tests (634 tests, 26 suites)
npm test

# Watch mode
npm run test:watch

# Coverage report
npm run test:coverage

# E2E tests only
npm run test:e2e

Development

# Start local Twilio serverless development server
npm run dev

# Validate customer and agent data
node scripts/validate-customers.js
node scripts/validate-agents.js

4. Start Development

# Start Twilio Functions locally
npm run dev

# In another terminal, run tests in watch mode
npm run test:watch

# Create GitHub issues from your todos
npm run create-issue from-todos

📖 Detailed Setup

If you prefer manual setup or encounter issues:

Prerequisites

Required:

Optional:

  • Segment Write Key for CDP integration (setup guide)
  • Voice Intelligence SID for advanced transcription (Twilio Console)
  • Python 3.8+ with uv package manager (for Python development)

Manual Installation Steps

  1. Install Node.js dependencies:

    npm install

  2. Install Python dependencies (if using Python):

    uv sync --group test --group dev

  3. Install global tools:

    npm install -g twilio-cli newman
    twilio plugins:install @twilio-labs/plugin-serverless

  4. Authenticate with Twilio:

    twilio login

  5. Set up environment variables:

    cp .env.example .env
    # Edit .env with your credentials

🔄 Development Workflow

Core Commands

# Development
npm run dev                # Start local Twilio Functions server
npm run build              # Run linting, tests, and formatting checks

# Testing
npm test                   # Run all Jest tests
npm run test:watch         # Run tests in watch mode
npm run test:coverage      # Generate coverage report
npm run test:api           # Run Newman API tests
uv run pytest              # Run Python tests (if applicable)

# Code Quality
npm run lint               # Check code quality
npm run lint:fix           # Fix linting issues automatically
npm run format             # Format code with Prettier
npm run format:check       # Check if code is formatted

# Deployment
npm run twilio:deploy      # Deploy to Twilio production
npm run twilio:deploy:dev  # Deploy to development environment

🧪 Testing Strategy

Test-Driven Development (TDD)

We practice strict TDD with comprehensive coverage:

  1. Write failing test (Red)
  2. Write minimal code to pass (Green)
  3. Refactor while keeping tests green

Test Types & Coverage

  • Unit Tests: Individual function testing (Jest)
  • Integration Tests: Component interactions and regression prevention
  • E2E Tests: Full pipeline validation with real Twilio APIs
  • API Tests: End-to-end validation (Newman)
  • Coverage Target: >80% for all test types

Regression Prevention Tests

Critical regression tests protect against production issues:

OpenAI API Compatibility (tests/integration/openai-api-parameters.test.js)

Fast static code analysis (~200ms) that validates:

  • ✅ Using max_completion_tokens (not deprecated max_tokens)
  • ✅ No unsupported temperature parameter for gpt-5-nano
  • ✅ Prevents 400 BadRequest errors from OpenAI

npm test tests/integration/openai-api-parameters.test.js
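A static check of this kind reads source files as text and looks for parameter names, so it needs no network access. The sketch below shows the idea with plain regular expressions; `findOpenAiParameterIssues` is a hypothetical helper and is not copied from the repository's test file.

```javascript
// Hypothetical static check: scans source text for OpenAI request
// parameters that the checklist above flags.
function findOpenAiParameterIssues(source) {
  const issues = [];
  // "max_completion_tokens" does not contain the substring "max_tokens",
  // so the first pattern only fires on the deprecated name.
  if (/\bmax_tokens\b/.test(source)) {
    issues.push('uses deprecated max_tokens');
  }
  if (!/\bmax_completion_tokens\b/.test(source)) {
    issues.push('missing max_completion_tokens');
  }
  if (/\btemperature\s*:/.test(source)) {
    issues.push('sets temperature');
  }
  return issues;
}

// Usage inside a Jest test (file path taken from the Customization section):
// const fs = require('fs');
// const source = fs.readFileSync('functions/respond.js', 'utf8');
// expect(findOpenAiParameterIssues(source)).toEqual([]);
```

A text scan can be fooled by comments or strings that mention the parameter names, which is an acceptable trade-off for a check that runs in milliseconds.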

Transcript Content Validation (tests/e2e/transcript-content-validation.test.js)

Full E2E test (~6-7 minutes) that validates:

  • ✅ Transcripts contain real AI conversations (not error messages)
  • ✅ Multi-speaker dialogue (agent + customer)
  • ✅ Contextual customer responses (not generic errors)
  • ✅ Agent introductions are captured correctly

npm test tests/e2e/transcript-content-validation.test.js
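The content checks listed above can be expressed as simple predicates over transcript sentences. The sketch below assumes each sentence has been reduced to a `{ speaker, text }` object; that shape, the error phrases, and the `validateTranscript` name are illustrative assumptions rather than the repository's actual test logic.

```javascript
// Phrases that suggest a spoken error message rather than real dialogue.
// These patterns are illustrative assumptions, not an exhaustive list.
const ERROR_PATTERNS = [
  /application error/i,
  /an error (has )?occurred/i,
  /technical difficulties/i,
];

// Hypothetical validator: returns a list of problems (empty if the
// transcript looks like a genuine two-party conversation).
function validateTranscript(sentences) {
  const problems = [];
  const speakers = new Set(sentences.map((s) => s.speaker));
  if (speakers.size < 2) {
    problems.push('expected at least two speakers');
  }
  const flagged = sentences.filter((s) => ERROR_PATTERNS.some((p) => p.test(s.text)));
  if (flagged.length > 0) {
    problems.push(`${flagged.length} sentence(s) look like error messages`);
  }
  return problems;
}
```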

Running Tests

# All tests
npm test

# Watch mode for development
npm run test:watch

# Coverage report
npm run test:coverage

# Fast regression tests only (recommended for CI)
npm test tests/integration/

# API tests (Newman/Postman)
npm run test:api
newman run postman/collection.json -e postman/environment.json

🚀 CI/CD Pipeline

GitHub Actions Workflow

The CI/CD pipeline (.github/workflows/test.yml) automatically:

  1. Test Node.js - Runs Jest tests with coverage
  2. Code Quality - ESLint and Prettier validation
  3. API Testing - Validates endpoints with Newman

Required GitHub Secrets

Set these in your repository settings → Secrets:

TWILIO_ACCOUNT_SID    # Your Twilio Account SID
TWILIO_AUTH_TOKEN     # Your Twilio Auth Token

The GITHUB_TOKEN is automatically provided by GitHub Actions.

Deployment Environments

  • Development: npm run twilio:deploy:dev
  • Production: npm run twilio:deploy:prod

📁 Project Structure

twilio-synthetic-call-data-generator/
├── .github/
│   ├── workflows/test.yml      # CI/CD pipeline
│   ├── ISSUE_TEMPLATE/         # Bug/feature templates
│   └── PULL_REQUEST_TEMPLATE.md
├── functions/                  # Twilio Serverless Functions
│   ├── voice-handler.js        # Conference participant routing
│   ├── transcribe.js           # Speech-to-text capture
│   ├── respond.js              # OpenAI response generation
│   ├── conference-status-webhook.js
│   ├── transcription-webhook.js
│   └── utils/                  # Shared utilities
├── src/                        # Core application
│   ├── main.js                 # Entry point
│   ├── personas/               # Customer/agent loaders
│   ├── pairing/                # Pairing strategies
│   ├── orchestration/          # Conference creation
│   └── segment/                # CDP integration
├── scripts/                    # Deployment & validation
│   ├── pre-deployment-check.js
│   ├── post-deployment-validation.js
│   └── smoke-test.js
├── tests/                      # 634 tests (unit/integration/e2e)
├── docs/                       # Documentation
├── postman/                    # API test collections
├── customers.json              # Customer personas
├── package.json                # Dependencies & scripts
└── README.md                   # This file

🎯 Example Use Cases

Generate Test Data for Analytics Pipeline

# Generate 100 synthetic calls with random pairing
node scripts/generate-bulk-calls.js --count 100 --cps 1

# Results: Recordings, transcripts, Voice Intelligence insights
# → Feeds into Segment CDP → Data warehouse → BI tools

Train ML Models on Customer Service Data

# Generate diverse scenarios (frustrated + inexperienced agent, etc.)
npm run start  # Creates random pairings

# Extract Voice Intelligence operator results
# → Sentiment analysis, PII detection, call classification
# → Use for supervised ML training data

Test Voice Application Changes

# Deploy new TwiML function
npm run deploy

# Validate with E2E tests
npm run smoke-test

# Generate synthetic calls to test behavior
node src/main.js

🀝 Contributing

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/amazing-feature
  3. Follow TDD: Write tests first, then implementation
  4. Run checks: npm run build (linting + tests + formatting)
  5. Commit changes: Use conventional commits
  6. Push and create Pull Request

Development Standards

  • Tests Required: Comprehensive test coverage for all code
  • TDD Approach: Red → Green → Refactor
  • Code Quality: Must pass ESLint + Prettier
  • Documentation: Update relevant docs
  • No Secrets: Never commit credentials

📄 License

MIT License - see LICENSE file for details.


Ready to build? Start with git clone and npm run setup - you'll be ready to party! 🚀
