# 📊 Data Designer: Marketing Analytics Demo

[Click here](https://raw.githubusercontent.com/NVIDIA-NeMo/DataDesigner/refs/heads/main/docs/notebooks/4-marketing-analytics-demo.ipynb) to download this notebook to your computer.

#### 📚 What you'll learn

This notebook demonstrates how to use Data Designer for **marketing analytics** use cases. Built with marketing analytics classes in mind, it shows how to generate realistic synthetic data for:

- **Customer demographics and segmentation**
- **Product catalogs with pricing**
- **Email marketing campaign performance**
- **Social media engagement metrics**
- **Customer journeys and conversions**

#### 🎯 Learning Objectives

By the end of this notebook, you'll understand how to:

1. Generate synthetic customer profiles with realistic demographics
2. Create product catalogs with category-based pricing and discounts
3. Simulate email marketing campaign performance data
4. Generate social media engagement metrics
5. Build customer journey and conversion funnel data

#### 💼 Real-World Applications

This synthetic data can be used for:

- Teaching marketing analytics concepts without privacy concerns
- Testing marketing dashboards and analytics tools
- Training machine learning models for customer segmentation
- Simulating A/B testing scenarios
- Practicing SQL and data analysis queries
- Building customer lifetime value (CLV) models

### 📦 Import the essentials

- The `essentials` module provides quick access to the most commonly used objects.

In [None]:
from data_designer.essentials import (
    CategorySamplerParams,
    DataDesigner,
    DataDesignerConfigBuilder,
    DatetimeSamplerParams,
    ExpressionColumnConfig,
    GaussianSamplerParams,
    InferenceParameters,
    LLMStructuredColumnConfig,
    LLMTextColumnConfig,
    ModelConfig,
    PersonFromFakerSamplerParams,
    SamplerColumnConfig,
    SamplerType,
    SubcategorySamplerParams,
    UniformSamplerParams,
)
import pandas as pd

### ⚙️ Initialize the Data Designer interface

- `DataDesigner` is the main object that manages the data generation process.
- When initialized without arguments, the [default model providers](https://nvidia-nemo.github.io/DataDesigner/models/default-model-settings/) are used.

In [None]:
data_designer = DataDesigner()

### 🎛️ Define model configurations

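We'll use NVIDIA Nemotron Nano 9B v2, a small and fast model, to generate the free-text fields in these examples (customer segments, product names and descriptions, email subject lines, and social media captions). The `/no_think` system prompt turns off the model's reasoning mode so that responses stay short.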

In [None]:
MODEL_PROVIDER = "nvidia"
MODEL_ID = "nvidia/nvidia-nemotron-nano-9b-v2"
MODEL_ALIAS = "marketing-text-model"
SYSTEM_PROMPT = "/no_think"

model_configs = [
    ModelConfig(
        alias=MODEL_ALIAS,
        provider=MODEL_PROVIDER,
        model=MODEL_ID,
        system_prompt=SYSTEM_PROMPT,
        inference_parameters=InferenceParameters(
            temperature=0.8,
            top_p=0.95,
            max_tokens=500,
        ),
    )
]

## 📊 Example 1: Customer Demographics and Segmentation

Generate a customer database with demographic information that can be used for market segmentation analysis.
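The `age_group` expression in the cell below bins ages with nested conditionals. When you recompute or audit those bins in pandas during analysis, `pd.cut` expresses the same cut points more readably. A minimal sketch on a toy series of ages (not generated data):

```python
import pandas as pd

# Toy ages standing in for the generated customer['age'] values.
ages = pd.Series([18, 25, 26, 35, 36, 50, 51, 72], name="age")

# Same cut points as the age_group expression: <26, <36, <51, everything else.
# right=False makes each bin include its left edge and exclude its right edge.
age_group = pd.cut(
    ages,
    bins=[0, 26, 36, 51, float("inf")],
    labels=["18-25", "26-35", "36-50", "51+"],
    right=False,
)
print(pd.DataFrame({"age": ages, "age_group": age_group}))
```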

In [None]:
# Create a configuration builder for customer demographics
customer_config = DataDesignerConfigBuilder(model_configs=model_configs)

# Customer ID
customer_config.add_column(
    SamplerColumnConfig(
        name="customer_id",
        sampler_type=SamplerType.UUID,
    )
)

# Customer personal information
customer_config.add_column(
    SamplerColumnConfig(
        name="customer",
        sampler_type=SamplerType.PERSON_FROM_FAKER,
        params=PersonFromFakerSamplerParams(
            include_age=True,
            include_gender=True,
            include_email=True,
        ),
    )
)

# Age group for segmentation
customer_config.add_column(
    ExpressionColumnConfig(
        name="age_group",
        expression="'18-25' if customer['age'] < 26 else ('26-35' if customer['age'] < 36 else ('36-50' if customer['age'] < 51 else '51+'))",
    )
)

# Location (US states)
customer_config.add_column(
    SamplerColumnConfig(
        name="state",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["CA", "NY", "TX", "FL", "IL", "PA", "OH", "GA", "NC", "MI"],
            weights=[0.15, 0.12, 0.11, 0.10, 0.08, 0.08, 0.07, 0.06, 0.06, 0.05],
        ),
    )
)

# Income bracket
customer_config.add_column(
    SamplerColumnConfig(
        name="income_bracket",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["<$30k", "$30k-$50k", "$50k-$75k", "$75k-$100k", "$100k+"],
            weights=[0.20, 0.25, 0.25, 0.15, 0.15],
        ),
    )
)

# Customer segment based on demographics
customer_config.add_column(
    LLMTextColumnConfig(
        name="customer_segment",
        model_alias=MODEL_ALIAS,
        prompt="""Based on these customer demographics, assign them to ONE marketing segment (just the segment name, no explanation):
- Age Group: {{ age_group }}
- Income: {{ income_bracket }}
- Gender: {{ customer.gender }}

Choose from: Budget Conscious, Premium Seeker, Young Professional, Family Oriented, or Luxury Enthusiast""",
    )
)

# Customer lifetime value (CLV) estimate (sampled uniformly, independent of segment and income)
customer_config.add_column(
    SamplerColumnConfig(
        name="estimated_clv",
        sampler_type=SamplerType.UNIFORM,
        params=UniformSamplerParams(min_value=100, max_value=5000),
    )
)

# Registration date
customer_config.add_column(
    SamplerColumnConfig(
        name="registration_date",
        sampler_type=SamplerType.DATETIME,
        params=DatetimeSamplerParams(
            start="2022-01-01",
            end="2024-12-01",
            date_format="%Y-%m-%d",
        ),
    )
)

# Preview customer demographics
print("\n=== Customer Demographics Dataset ===")
customer_preview = data_designer.preview(config_builder=customer_config, num_samples=3)
customer_preview.display_sample_record()

## 🛍️ Example 2: Product Catalog with Pricing

Generate a realistic product catalog with categories, pricing, and descriptions.
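The catalog below derives `final_price` from `base_price` and `discount_pct`. The arithmetic is easy to check by hand with a plain Python function that applies the same formula (the function name is illustrative, not part of Data Designer):

```python
def final_price(base_price: float, discount_pct: float) -> float:
    """Apply a percentage discount and round to cents, mirroring the final_price expression."""
    return round(base_price * (1 - discount_pct / 100), 2)


# Worked examples: a 15% discount on $120.00, and no discount on $49.99.
print(final_price(120.00, 15))  # 102.0
print(final_price(49.99, 0))    # 49.99
```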

In [None]:
product_config = DataDesignerConfigBuilder(model_configs=model_configs)

# Product ID
product_config.add_column(
    SamplerColumnConfig(
        name="product_id",
        sampler_type=SamplerType.UUID,
    )
)

# Product category
product_config.add_column(
    SamplerColumnConfig(
        name="category",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["Electronics", "Clothing", "Home & Kitchen", "Sports & Outdoors", "Books", "Beauty"],
            weights=[0.25, 0.20, 0.20, 0.15, 0.10, 0.10],
        ),
    )
)

# Generate product name and description
product_config.add_column(
    LLMStructuredColumnConfig(
        name="product_info",
        model_alias=MODEL_ALIAS,
        prompt="""Generate a product for the {{ category }} category. Include:
- name: A catchy product name (3-6 words)
- description: A brief marketing description (1-2 sentences)""",
        schema={
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "description": {"type": "string"},
            },
            "required": ["name", "description"],
        },
    )
)

# Base price (varies by category)
product_config.add_column(
    SamplerColumnConfig(
        name="base_price",
        sampler_type=SamplerType.SUBCATEGORY,
        params=SubcategorySamplerParams(
            category_column="category",
            subcategory_distributions={
                "Electronics": {"sampler_type": SamplerType.UNIFORM, "params": {"min_value": 50, "max_value": 1500}},
                "Clothing": {"sampler_type": SamplerType.UNIFORM, "params": {"min_value": 20, "max_value": 200}},
                "Home & Kitchen": {"sampler_type": SamplerType.UNIFORM, "params": {"min_value": 15, "max_value": 300}},
                "Sports & Outdoors": {"sampler_type": SamplerType.UNIFORM, "params": {"min_value": 25, "max_value": 400}},
                "Books": {"sampler_type": SamplerType.UNIFORM, "params": {"min_value": 10, "max_value": 50}},
                "Beauty": {"sampler_type": SamplerType.UNIFORM, "params": {"min_value": 15, "max_value": 150}},
            },
        ),
    )
)

# Current discount percentage
product_config.add_column(
    SamplerColumnConfig(
        name="discount_pct",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=[0, 10, 15, 20, 25, 30],
            weights=[0.40, 0.25, 0.15, 0.10, 0.07, 0.03],
        ),
    )
)

# Calculate final price
product_config.add_column(
    ExpressionColumnConfig(
        name="final_price",
        expression="round(base_price * (1 - discount_pct / 100), 2)",
    )
)

# Stock status
product_config.add_column(
    SamplerColumnConfig(
        name="in_stock",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=[True, False],
            weights=[0.85, 0.15],
        ),
    )
)

# Average rating
product_config.add_column(
    SamplerColumnConfig(
        name="avg_rating",
        sampler_type=SamplerType.GAUSSIAN,
        params=GaussianSamplerParams(
            mean=4.0,
            std_dev=0.6,
            min_value=1.0,
            max_value=5.0,
            round_to=1,
        ),
    )
)

# Preview product catalog
print("\n=== Product Catalog Dataset ===")
product_preview = data_designer.preview(config_builder=product_config, num_samples=3)
product_preview.display_sample_record()

## 📧 Example 3: Email Marketing Campaign Performance

Generate data for analyzing email marketing campaigns.
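The email configuration below builds each rate from the one before it: the click rate is the open rate times a click-to-open fraction, and the conversion rate is the click rate times a click-to-conversion fraction. Both therefore end up as shares of emails sent, which is why every absolute count multiplies by `emails_sent`. A small sketch with fixed fractions in place of the random `uniform(...)` draws (the function and argument names are illustrative):

```python
def email_funnel(emails_sent: int, open_rate: float, click_to_open: float, click_to_convert: float) -> dict:
    """Chain email rates the same way the email configuration does.

    click_rate and conversion_rate are shares of emails *sent*, because each
    is the previous rate multiplied by a fraction between 0 and 1.
    """
    click_rate = open_rate * click_to_open
    conversion_rate = click_rate * click_to_convert
    return {
        "emails_opened": int(emails_sent * open_rate),
        "clicks": int(emails_sent * click_rate),
        "conversions": int(emails_sent * conversion_rate),
    }


counts = email_funnel(emails_sent=20_000, open_rate=0.22, click_to_open=0.25, click_to_convert=0.10)
print(counts)
```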

In [None]:
email_config = DataDesignerConfigBuilder(model_configs=model_configs)

# Campaign ID
email_config.add_column(
    SamplerColumnConfig(
        name="campaign_id",
        sampler_type=SamplerType.UUID,
    )
)

# Campaign type
email_config.add_column(
    SamplerColumnConfig(
        name="campaign_type",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["Newsletter", "Promotional", "Abandoned Cart", "Product Launch", "Welcome Series"],
            weights=[0.30, 0.25, 0.20, 0.15, 0.10],
        ),
    )
)

# Campaign subject line
email_config.add_column(
    LLMTextColumnConfig(
        name="subject_line",
        model_alias=MODEL_ALIAS,
        prompt="""Generate a compelling email subject line for a {{ campaign_type }} email campaign. Keep it under 60 characters and make it attention-grabbing.""",
    )
)

# Send date
email_config.add_column(
    SamplerColumnConfig(
        name="send_date",
        sampler_type=SamplerType.DATETIME,
        params=DatetimeSamplerParams(
            start="2024-01-01",
            end="2024-12-01",
            date_format="%Y-%m-%d",
        ),
    )
)

# Emails sent
email_config.add_column(
    SamplerColumnConfig(
        name="emails_sent",
        sampler_type=SamplerType.UNIFORM,
        params=UniformSamplerParams(min_value=1000, max_value=50000),
    )
)

# Open rate (varies by campaign type)
email_config.add_column(
    SamplerColumnConfig(
        name="open_rate",
        sampler_type=SamplerType.SUBCATEGORY,
        params=SubcategorySamplerParams(
            category_column="campaign_type",
            subcategory_distributions={
                "Newsletter": {"sampler_type": SamplerType.GAUSSIAN, "params": {"mean": 0.22, "std_dev": 0.05, "min_value": 0.10, "max_value": 0.40, "round_to": 3}},
                "Promotional": {"sampler_type": SamplerType.GAUSSIAN, "params": {"mean": 0.18, "std_dev": 0.04, "min_value": 0.08, "max_value": 0.35, "round_to": 3}},
                "Abandoned Cart": {"sampler_type": SamplerType.GAUSSIAN, "params": {"mean": 0.35, "std_dev": 0.06, "min_value": 0.20, "max_value": 0.50, "round_to": 3}},
                "Product Launch": {"sampler_type": SamplerType.GAUSSIAN, "params": {"mean": 0.25, "std_dev": 0.05, "min_value": 0.12, "max_value": 0.42, "round_to": 3}},
                "Welcome Series": {"sampler_type": SamplerType.GAUSSIAN, "params": {"mean": 0.45, "std_dev": 0.07, "min_value": 0.25, "max_value": 0.65, "round_to": 3}},
            },
        ),
    )
)

# Click rate: share of emails sent that were clicked (open rate x a 15-35% click-to-open fraction)
email_config.add_column(
    ExpressionColumnConfig(
        name="click_rate",
        expression="round(open_rate * uniform(0.15, 0.35), 3)",
    )
)

# Conversion rate: share of emails sent that converted (click rate x an 8-25% click-to-conversion fraction)
email_config.add_column(
    ExpressionColumnConfig(
        name="conversion_rate",
        expression="round(click_rate * uniform(0.08, 0.25), 3)",
    )
)

# Calculate absolute numbers
email_config.add_column(
    ExpressionColumnConfig(
        name="emails_opened",
        expression="int(emails_sent * open_rate)",
    )
)

email_config.add_column(
    ExpressionColumnConfig(
        name="clicks",
        expression="int(emails_sent * click_rate)",
    )
)

email_config.add_column(
    ExpressionColumnConfig(
        name="conversions",
        expression="int(emails_sent * conversion_rate)",
    )
)

# Revenue: conversions x a random average order value between $50 and $200
email_config.add_column(
    ExpressionColumnConfig(
        name="revenue",
        expression="round(conversions * uniform(50, 200), 2)",
    )
)

# Preview email campaign data
print("\n=== Email Marketing Campaign Dataset ===")
email_preview = data_designer.preview(config_builder=email_config, num_samples=3)
email_preview.display_sample_record()

## 📱 Example 4: Social Media Engagement Metrics

Generate social media post performance data for different platforms.
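In the cell below, `likes`, `comments`, and `shares` are each drawn as an independent fraction of `impressions * engagement_rate`. Those fractions sum to anywhere between 0.85 and 1.20, so the interactions you observe will not reproduce the sampled `engagement_rate` exactly. During analysis it is often clearer to recompute an observed rate from the counts. A sketch on toy rows with the same column names:

```python
import pandas as pd

# Toy rows with the same column names as the generated social media table.
posts = pd.DataFrame({
    "platform": ["Instagram", "TikTok", "LinkedIn"],
    "impressions": [4000, 20000, 1500],
    "likes": [110, 560, 40],
    "comments": [12, 60, 5],
    "shares": [18, 90, 6],
})

# Observed engagement rate: total interactions per impression.
posts["observed_engagement_rate"] = (
    posts[["likes", "comments", "shares"]].sum(axis=1) / posts["impressions"]
).round(4)
print(posts[["platform", "observed_engagement_rate"]])
```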

In [None]:
social_config = DataDesignerConfigBuilder(model_configs=model_configs)

# Post ID
social_config.add_column(
    SamplerColumnConfig(
        name="post_id",
        sampler_type=SamplerType.UUID,
    )
)

# Platform
social_config.add_column(
    SamplerColumnConfig(
        name="platform",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["Instagram", "Facebook", "Twitter", "LinkedIn", "TikTok"],
            weights=[0.30, 0.25, 0.20, 0.15, 0.10],
        ),
    )
)

# Content type
social_config.add_column(
    SamplerColumnConfig(
        name="content_type",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["Image", "Video", "Carousel", "Story", "Reel"],
            weights=[0.30, 0.25, 0.20, 0.15, 0.10],
        ),
    )
)

# Post caption
social_config.add_column(
    LLMTextColumnConfig(
        name="caption",
        model_alias=MODEL_ALIAS,
        prompt="""Generate a social media caption for {{ platform }} using {{ content_type }} format. Keep it engaging and under 100 characters. Include relevant hashtags.""",
    )
)

# Post date/time
social_config.add_column(
    SamplerColumnConfig(
        name="posted_at",
        sampler_type=SamplerType.DATETIME,
        params=DatetimeSamplerParams(
            start="2024-01-01 00:00:00",
            end="2024-12-01 23:59:59",
            date_format="%Y-%m-%d %H:%M:%S",
        ),
    )
)

# Impressions (varies by platform)
social_config.add_column(
    SamplerColumnConfig(
        name="impressions",
        sampler_type=SamplerType.SUBCATEGORY,
        params=SubcategorySamplerParams(
            category_column="platform",
            subcategory_distributions={
                "Instagram": {"sampler_type": SamplerType.UNIFORM, "params": {"min_value": 500, "max_value": 10000}},
                "Facebook": {"sampler_type": SamplerType.UNIFORM, "params": {"min_value": 300, "max_value": 8000}},
                "Twitter": {"sampler_type": SamplerType.UNIFORM, "params": {"min_value": 200, "max_value": 15000}},
                "LinkedIn": {"sampler_type": SamplerType.UNIFORM, "params": {"min_value": 100, "max_value": 5000}},
                "TikTok": {"sampler_type": SamplerType.UNIFORM, "params": {"min_value": 1000, "max_value": 50000}},
            },
        ),
    )
)

# Engagement rate
social_config.add_column(
    SamplerColumnConfig(
        name="engagement_rate",
        sampler_type=SamplerType.GAUSSIAN,
        params=GaussianSamplerParams(
            mean=0.035,
            std_dev=0.015,
            min_value=0.005,
            max_value=0.15,
            round_to=4,
        ),
    )
)

# Calculate engagement metrics
social_config.add_column(
    ExpressionColumnConfig(
        name="likes",
        expression="int(impressions * engagement_rate * uniform(0.70, 0.85))",
    )
)

social_config.add_column(
    ExpressionColumnConfig(
        name="comments",
        expression="int(impressions * engagement_rate * uniform(0.05, 0.15))",
    )
)

social_config.add_column(
    ExpressionColumnConfig(
        name="shares",
        expression="int(impressions * engagement_rate * uniform(0.10, 0.20))",
    )
)

# Reach (unique users)
social_config.add_column(
    ExpressionColumnConfig(
        name="reach",
        expression="int(impressions * uniform(0.60, 0.85))",
    )
)

# Preview social media data
print("\n=== Social Media Engagement Dataset ===")
social_preview = data_designer.preview(config_builder=social_config, num_samples=3)
social_preview.display_sample_record()

## 🎯 Example 5: Customer Journey and Conversion Funnel

Generate data to analyze customer journeys from awareness to purchase.
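The journey data records the furthest `funnel_stage` each session reached. Because the stage names are plain strings, pandas sorts them alphabetically ("Add to Cart" before "Homepage"), which scrambles funnel order in tables and charts. Converting the column to an ordered categorical keeps the stages in funnel order. A sketch on toy values:

```python
import pandas as pd

# Funnel stages in the order a visitor moves through them (same labels as the config below).
FUNNEL_ORDER = ["Homepage", "Product Page", "Add to Cart", "Checkout", "Purchase"]

# Toy stand-in for the funnel_stage column of the generated journey table.
stages = pd.Series(["Checkout", "Homepage", "Purchase", "Homepage", "Add to Cart", "Product Page"])

# An ordered categorical sorts and counts by funnel position instead of alphabetically.
stages = stages.astype(pd.CategoricalDtype(categories=FUNNEL_ORDER, ordered=True))
print(stages.value_counts(sort=False))
```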

In [None]:
journey_config = DataDesignerConfigBuilder(model_configs=model_configs)

# Session ID
journey_config.add_column(
    SamplerColumnConfig(
        name="session_id",
        sampler_type=SamplerType.UUID,
    )
)

# Customer ID (a fresh UUID; it does not match IDs in the customer dataset unless linked after generation)
journey_config.add_column(
    SamplerColumnConfig(
        name="customer_id",
        sampler_type=SamplerType.UUID,
    )
)

# Traffic source
journey_config.add_column(
    SamplerColumnConfig(
        name="traffic_source",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["Organic Search", "Paid Search", "Social Media", "Email", "Direct", "Referral"],
            weights=[0.30, 0.25, 0.20, 0.12, 0.08, 0.05],
        ),
    )
)

# Device type
journey_config.add_column(
    SamplerColumnConfig(
        name="device",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["Mobile", "Desktop", "Tablet"],
            weights=[0.60, 0.30, 0.10],
        ),
    )
)

# Session start time
journey_config.add_column(
    SamplerColumnConfig(
        name="session_start",
        sampler_type=SamplerType.DATETIME,
        params=DatetimeSamplerParams(
            start="2024-01-01 00:00:00",
            end="2024-12-01 23:59:59",
            date_format="%Y-%m-%d %H:%M:%S",
        ),
    )
)

# Pages viewed
journey_config.add_column(
    SamplerColumnConfig(
        name="pages_viewed",
        sampler_type=SamplerType.GAUSSIAN,
        params=GaussianSamplerParams(
            mean=4.5,
            std_dev=2.0,
            min_value=1,
            max_value=20,
            round_to=0,
        ),
    )
)

# Time on site (minutes)
journey_config.add_column(
    SamplerColumnConfig(
        name="time_on_site",
        sampler_type=SamplerType.GAUSSIAN,
        params=GaussianSamplerParams(
            mean=6.5,
            std_dev=3.5,
            min_value=0.5,
            max_value=30,
            round_to=1,
        ),
    )
)

# Funnel stage reached
journey_config.add_column(
    SamplerColumnConfig(
        name="funnel_stage",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["Homepage", "Product Page", "Add to Cart", "Checkout", "Purchase"],
            weights=[0.35, 0.30, 0.18, 0.10, 0.07],
        ),
    )
)

# Conversion indicator
journey_config.add_column(
    ExpressionColumnConfig(
        name="converted",
        expression="funnel_stage == 'Purchase'",
    )
)

# Cart value (if applicable)
journey_config.add_column(
    ExpressionColumnConfig(
        name="cart_value",
        expression="round(uniform(20, 500), 2) if funnel_stage in ['Add to Cart', 'Checkout', 'Purchase'] else 0",
    )
)

# Purchase value (if converted)
journey_config.add_column(
    ExpressionColumnConfig(
        name="purchase_value",
        expression="cart_value if converted else 0",
    )
)

# Preview customer journey data
print("\n=== Customer Journey Dataset ===")
journey_preview = data_designer.preview(config_builder=journey_config, num_samples=3)
journey_preview.display_sample_record()

## 📊 Generate Full Datasets

Now that we've previewed the data, let's generate full datasets for analysis.

In [None]:
# Generate datasets (adjust the NUM_* constants as needed)
NUM_CUSTOMERS = 100
NUM_PRODUCTS = 50
NUM_EMAIL_CAMPAIGNS = 20
NUM_SOCIAL_POSTS = 30
NUM_SESSIONS = 200

print("Generating customer demographics...")
customers_result = data_designer.design(config_builder=customer_config, num_records=NUM_CUSTOMERS)
customers_df = customers_result.dataframe

print("Generating product catalog...")
products_result = data_designer.design(config_builder=product_config, num_records=NUM_PRODUCTS)
products_df = products_result.dataframe

print("Generating email campaigns...")
email_result = data_designer.design(config_builder=email_config, num_records=NUM_EMAIL_CAMPAIGNS)
email_df = email_result.dataframe

print("Generating social media posts...")
social_result = data_designer.design(config_builder=social_config, num_records=NUM_SOCIAL_POSTS)
social_df = social_result.dataframe

print("Generating customer journeys...")
journey_result = data_designer.design(config_builder=journey_config, num_records=NUM_SESSIONS)
journey_df = journey_result.dataframe

print("\n✅ All datasets generated successfully!")

## 📈 Quick Analysis Examples

Let's demonstrate some basic analytics that students can perform on this data.

In [None]:
# Customer segmentation analysis
print("=== Customer Segmentation ===")
print(customers_df['customer_segment'].value_counts())
print("\nAverage CLV by segment:")
print(customers_df.groupby('customer_segment')['estimated_clv'].mean().round(2))

In [None]:
# Product pricing analysis
print("\n=== Product Pricing Analysis ===")
print("Average final price by category:")
print(products_df.groupby('category')['final_price'].mean().round(2))
print("\nAverage discount by category:")
print(products_df.groupby('category')['discount_pct'].mean().round(1))

In [None]:
# Email campaign performance
print("\n=== Email Campaign Performance ===")
print("Average metrics by campaign type:")
campaign_metrics = email_df.groupby('campaign_type')[['open_rate', 'click_rate', 'conversion_rate']].mean()
print(campaign_metrics.round(3))
print("\nTotal revenue by campaign type:")
print(email_df.groupby('campaign_type')['revenue'].sum().round(2))

In [None]:
# Social media engagement
print("\n=== Social Media Engagement ===")
print("Average engagement by platform:")
social_metrics = social_df.groupby('platform')[['impressions', 'likes', 'comments', 'shares']].mean()
print(social_metrics.round(0))
print("\nEngagement rate by platform:")
print(social_df.groupby('platform')['engagement_rate'].mean().round(4))

In [None]:
# Conversion funnel analysis
print("\n=== Conversion Funnel Analysis ===")
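print("Sessions by funnel stage:")
# Reindex into funnel order; sort_index() would sort the stage names alphabetically
funnel_order = ["Homepage", "Product Page", "Add to Cart", "Checkout", "Purchase"]
funnel_counts = journey_df['funnel_stage'].value_counts().reindex(funnel_order, fill_value=0)
print(funnel_counts)
print(f"\nConversion rate: {journey_df['converted'].mean():.2%}")
print(f"Average purchase value: ${journey_df[journey_df['converted']]['purchase_value'].mean():.2f}")
print("\nConversion rate by traffic source:")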
print(journey_df.groupby('traffic_source')['converted'].mean().round(3))

## 💾 Export Data for Further Analysis

Save the generated datasets for use in other tools or assignments.
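CSV files store dates as plain text, so columns such as `registration_date`, `send_date`, `posted_at`, and `session_start` come back as strings when the files are read again. Passing `parse_dates` to `pd.read_csv` restores a datetime dtype. A sketch using a temporary directory and toy rows:

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for journey_df with one datetime-like column.
journey_sample = pd.DataFrame({
    "session_start": ["2024-03-01 09:15:00", "2024-03-02 18:40:00"],
    "converted": [False, True],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "customer_journeys.csv")
    journey_sample.to_csv(path, index=False)
    # CSV stores timestamps as text; parse_dates restores a datetime dtype on load.
    reloaded = pd.read_csv(path, parse_dates=["session_start"])

print(reloaded.dtypes)
```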

In [None]:
# Export to CSV files
customers_df.to_csv('customers.csv', index=False)
products_df.to_csv('products.csv', index=False)
email_df.to_csv('email_campaigns.csv', index=False)
social_df.to_csv('social_media.csv', index=False)
journey_df.to_csv('customer_journeys.csv', index=False)

print("✅ Datasets exported to CSV files:")
print("  - customers.csv")
print("  - products.csv")
print("  - email_campaigns.csv")
print("  - social_media.csv")
print("  - customer_journeys.csv")

## 🎓 Teaching Applications

### Assignments and Exercises

Here are some ideas for using this data in a marketing analytics class:

#### **1. Customer Segmentation Analysis**
- Use clustering algorithms (K-means, hierarchical) on customer demographics
- Analyze CLV patterns across different segments
- Create customer personas based on demographic and behavioral data
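A pivot table is a quick way to look at CLV across two segmentation dimensions at once. Keep in mind that `estimated_clv` is sampled uniformly and independently of the other columns in this notebook, so differences between segments in the generated data are sampling noise; that makes it a useful null baseline, or a reason to make CLV depend on segment before assigning this exercise. A sketch on toy rows:

```python
import pandas as pd

# Toy rows with the same column names as the generated customers table.
customers = pd.DataFrame({
    "customer_segment": ["Budget Conscious", "Premium Seeker", "Budget Conscious", "Premium Seeker"],
    "income_bracket": ["<$30k", "$100k+", "$30k-$50k", "$100k+"],
    "estimated_clv": [400.0, 3200.0, 650.0, 2800.0],
})

# Mean CLV for every segment/income combination; empty combinations become NaN.
clv_table = customers.pivot_table(
    index="customer_segment",
    columns="income_bracket",
    values="estimated_clv",
    aggfunc="mean",
)
print(clv_table.round(2))
```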

#### **2. Campaign Performance Optimization**
- Compare email campaign types to identify best practices
- A/B test analysis: Which subject lines perform best?
- Calculate ROI for different campaign types
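The generated email data has revenue but no cost, so ROI needs a cost assumption. The sketch below uses a hypothetical flat `COST_PER_EMAIL` (an illustrative value, not from this notebook) and computes ROI as (revenue - cost) / cost on toy rows:

```python
import pandas as pd

# Toy rows with the same column names as the generated email campaign table.
campaigns = pd.DataFrame({
    "campaign_type": ["Newsletter", "Promotional", "Abandoned Cart"],
    "emails_sent": [20000, 15000, 5000],
    "revenue": [4200.0, 6100.0, 5300.0],
})

# Hypothetical cost assumption: the generated data has no cost column.
COST_PER_EMAIL = 0.02  # dollars per email sent (illustrative value)

campaigns["cost"] = campaigns["emails_sent"] * COST_PER_EMAIL
# ROI = (revenue - cost) / cost
campaigns["roi"] = ((campaigns["revenue"] - campaigns["cost"]) / campaigns["cost"]).round(2)
print(campaigns.groupby("campaign_type")["roi"].mean())
```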

#### **3. Social Media Analytics**
- Analyze engagement patterns across platforms
- Identify optimal posting times and content types
- Calculate cost per engagement metrics
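To look for good posting times, parse `posted_at` into a datetime and group by hour of day. In this notebook `engagement_rate` is sampled independently of `posted_at`, so the generated data will show only noise across hours; the exercise becomes more interesting if you make engagement depend on time. A sketch on toy rows:

```python
import pandas as pd

# Toy rows with the same column names as the generated social media table.
post_times = pd.DataFrame({
    "posted_at": ["2024-05-01 09:10:00", "2024-05-02 09:45:00", "2024-05-03 18:30:00", "2024-05-04 21:05:00"],
    "content_type": ["Image", "Reel", "Video", "Image"],
    "engagement_rate": [0.030, 0.050, 0.042, 0.018],
})

# Parse timestamps, then bucket posts by hour of day.
post_times["posted_at"] = pd.to_datetime(post_times["posted_at"])
post_times["hour"] = post_times["posted_at"].dt.hour

by_hour = post_times.groupby("hour")["engagement_rate"].mean().round(4)
print(by_hour.sort_values(ascending=False))
```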

#### **4. Conversion Funnel Optimization**
- Identify drop-off points in the customer journey
- Compare conversion rates by traffic source
- Calculate the value of improving each funnel stage
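Because `funnel_stage` records the furthest stage each session reached, the number of sessions that reached a given stage is the count at that stage plus every later stage. Dividing consecutive counts gives the step-to-step conversion, and the smallest ratio marks the biggest drop-off. A sketch on a toy series whose counts match the configured stage weights (35/30/18/10/7 per 100 sessions):

```python
import pandas as pd

FUNNEL_ORDER = ["Homepage", "Product Page", "Add to Cart", "Checkout", "Purchase"]

# Toy stand-in for journey_df["funnel_stage"]: the furthest stage each session reached.
furthest = pd.Series(
    ["Homepage"] * 35 + ["Product Page"] * 30 + ["Add to Cart"] * 18
    + ["Checkout"] * 10 + ["Purchase"] * 7
)

# A session that reached stage k also passed every earlier stage, so the number
# reaching stage k is the count of sessions whose furthest stage is k or later.
position = furthest.map({stage: i for i, stage in enumerate(FUNNEL_ORDER)})
reached = pd.Series(
    [int((position >= i).sum()) for i in range(len(FUNNEL_ORDER))],
    index=FUNNEL_ORDER,
)

# Step conversion: share of sessions at one stage that made it to the next.
step_conversion = (reached / reached.shift(1)).round(3)
print(pd.DataFrame({"reached": reached, "step_conversion": step_conversion}))
```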

#### **5. Product Analytics**
- Analyze price elasticity by category
- Identify optimal discount strategies
- Product portfolio analysis

#### **6. Multi-Channel Attribution**
- Trace customer touchpoints across channels
- Build attribution models
- Calculate channel contribution to conversions
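Attribution needs several sessions per customer. In the generated journey data each session gets a fresh `customer_id`, so the IDs would first need to be linked to the customer table. Once customers repeat, first-touch, last-touch, and linear attribution differ only in how a customer's revenue is split across their sessions. A sketch on a hand-made touchpoint table (all values are toy data):

```python
import pandas as pd

# Hand-made touchpoints: customers appear in several sessions (this assumes the
# journey IDs have been linked to real customers, which the generated data does not do).
touchpoints = pd.DataFrame({
    "customer_id": ["c1", "c1", "c1", "c2", "c2"],
    "session_start": pd.to_datetime(
        ["2024-03-01", "2024-03-04", "2024-03-09", "2024-04-02", "2024-04-05"]
    ),
    "traffic_source": ["Social Media", "Email", "Paid Search", "Organic Search", "Email"],
    "purchase_value": [0.0, 0.0, 120.0, 0.0, 80.0],
})

touchpoints = touchpoints.sort_values(["customer_id", "session_start"])
grouped = touchpoints.groupby("customer_id")["purchase_value"]
# Each customer's total revenue and number of touchpoints, repeated on every row.
touchpoints["customer_revenue"] = grouped.transform("sum")
touchpoints["n_touches"] = grouped.transform("count")

first_touch = touchpoints.groupby("customer_id").head(1)
last_touch = touchpoints.groupby("customer_id").tail(1)
linear_credit = touchpoints["customer_revenue"] / touchpoints["n_touches"]

attribution = pd.DataFrame({
    # All revenue credited to the customer's earliest session.
    "first_touch": first_touch.groupby("traffic_source")["customer_revenue"].sum(),
    # All revenue credited to the customer's latest session.
    "last_touch": last_touch.groupby("traffic_source")["customer_revenue"].sum(),
    # Revenue split evenly across every session.
    "linear": linear_credit.groupby(touchpoints["traffic_source"]).sum(),
}).fillna(0.0)
print(attribution)
```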

### SQL Practice Queries

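After loading the exported CSV files into a database (for example SQLite) as tables named after the files, students can practice SQL queries such as: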

```sql
-- Top performing email campaigns
SELECT campaign_type,
       AVG(open_rate)  AS avg_open_rate,
       AVG(click_rate) AS avg_click_rate,
       SUM(revenue)    AS total_revenue
FROM email_campaigns
GROUP BY campaign_type
ORDER BY total_revenue DESC;

-- Customer lifetime value by segment
SELECT customer_segment,
       AVG(estimated_clv) AS avg_clv,
       COUNT(*)           AS num_customers
FROM customers
GROUP BY customer_segment;

-- Conversion funnel analysis
SELECT funnel_stage,
       COUNT(*) AS sessions,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS percentage
FROM customer_journeys
GROUP BY funnel_stage;
```
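To run queries like these without setting up a database server, one option is Python's built-in `sqlite3` module together with `DataFrame.to_sql` and `pd.read_sql_query`. A minimal sketch, using a toy DataFrame named `email_toy` (a hypothetical name, so it does not overwrite `email_df`) loaded as the `email_campaigns` table:

```python
import sqlite3

import pandas as pd

# Toy stand-in for email_df; in the notebook you could use the real DataFrame
# or pd.read_csv("email_campaigns.csv").
email_toy = pd.DataFrame({
    "campaign_type": ["Newsletter", "Promotional", "Newsletter"],
    "open_rate": [0.21, 0.17, 0.25],
    "click_rate": [0.05, 0.04, 0.06],
    "revenue": [1500.0, 2400.0, 1900.0],
})

# An in-memory SQLite database with a table named like the exported CSV file.
conn = sqlite3.connect(":memory:")
email_toy.to_sql("email_campaigns", conn, index=False)

top_campaigns = pd.read_sql_query(
    """
    SELECT campaign_type, AVG(open_rate) AS avg_open_rate, SUM(revenue) AS total_revenue
    FROM email_campaigns
    GROUP BY campaign_type
    ORDER BY total_revenue DESC
    """,
    conn,
)
conn.close()
print(top_campaigns)
```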

### Visualization Projects

Create dashboards showing:
- Customer demographic distributions
- Campaign performance trends over time
- Social media engagement heatmaps
- Conversion funnel visualizations
- Geographic distribution of customers
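An engagement heatmap is essentially a pivot table: one categorical dimension per axis and an aggregated metric in each cell. The sketch below prepares platform-by-weekday mean engagement rates with pandas only, on toy rows; drawing the grid with a plotting library of your choice is left out. In the generated data, `engagement_rate` is sampled independently of `posted_at`, so expect no real weekday pattern unless you add one.

```python
import pandas as pd

# Toy rows with the same column names as the generated social media table.
posts_by_day = pd.DataFrame({
    "platform": ["Instagram", "Instagram", "TikTok", "LinkedIn", "TikTok"],
    "posted_at": pd.to_datetime([
        "2024-06-03 10:00", "2024-06-08 19:00", "2024-06-03 21:00",
        "2024-06-04 08:30", "2024-06-08 12:00",
    ]),
    "engagement_rate": [0.031, 0.044, 0.052, 0.022, 0.061],
})

posts_by_day["weekday"] = posts_by_day["posted_at"].dt.day_name()
# Rows = platform, columns = weekday, cells = mean engagement rate: the grid a heatmap draws.
heatmap_data = posts_by_day.pivot_table(
    index="platform", columns="weekday", values="engagement_rate", aggfunc="mean"
)
print(heatmap_data)
```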

### Machine Learning Projects

- **Churn prediction**: Predict which customers are likely to stop engaging
- **Purchase prediction**: Which customers are likely to convert?
- **Recommendation systems**: Product recommendations based on customer segments
- **Campaign optimization**: Predict which campaign types work for which segments
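For purchase prediction on the journey data, a first step is turning the categorical columns into numeric features. Note that `funnel_stage`, `cart_value`, and `purchase_value` determine `converted` directly in this notebook's configuration, so they must be left out of the features to avoid target leakage. The remaining columns are sampled independently of conversion, so a model trained on them should perform near chance, which makes a useful sanity check. A sketch of the feature preparation with `pd.get_dummies` on toy rows (model fitting is not shown):

```python
import pandas as pd

# Toy rows with the same column names as the generated journey table.
journeys = pd.DataFrame({
    "traffic_source": ["Email", "Paid Search", "Direct", "Email"],
    "device": ["Mobile", "Desktop", "Mobile", "Tablet"],
    "pages_viewed": [3, 8, 2, 6],
    "time_on_site": [4.2, 11.5, 1.3, 7.8],
    "converted": [False, True, False, True],
})

# Leakage guard: funnel_stage, cart_value and purchase_value are excluded on purpose.
feature_cols = ["traffic_source", "device", "pages_viewed", "time_on_site"]

# One-hot encode the categorical inputs; numeric columns pass through unchanged.
X = pd.get_dummies(journeys[feature_cols], columns=["traffic_source", "device"])
y = journeys["converted"].astype(int)
print(X.columns.tolist())
```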

## 🔄 Customization

To modify this demo for your specific needs:

1. **Adjust distributions**: Change the `weights` in `CategorySamplerParams` to match your target market
2. **Add more columns**: Include additional metrics like bounce rate, cart abandonment reasons, etc.
3. **Change timeframes**: Modify date ranges to match your course schedule
4. **Scale up/down**: Adjust `NUM_CUSTOMERS`, `NUM_PRODUCTS`, `NUM_EMAIL_CAMPAIGNS`, `NUM_SOCIAL_POSTS`, and `NUM_SESSIONS` to generate more or fewer records
5. **Add validators**: Use ValidationColumnConfig to ensure data quality
6. **Connect datasets**: Reuse IDs across tables (for example, assign existing `customer_id` values to journey sessions) to create relational datasets
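The five datasets in this notebook are generated independently, so `journey_df['customer_id']` never matches `customers_df['customer_id']`. One way to connect them, done in pandas after generation rather than through a Data Designer feature, is to overwrite the session-level IDs with IDs sampled from the customer table and then join. A sketch on toy tables (the `_toy` names are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy stand-ins for customers_df and journey_df.
customers_toy = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3"],
    "customer_segment": ["Budget Conscious", "Premium Seeker", "Family Oriented"],
})
journeys_toy = pd.DataFrame({
    "session_id": ["s1", "s2", "s3", "s4", "s5"],
    "traffic_source": ["Email", "Direct", "Paid Search", "Email", "Referral"],
})

# Replace the independent session-level IDs with IDs drawn from the customer table.
# Sampling with replacement lets one customer own several sessions.
rng = np.random.default_rng(seed=42)
journeys_toy["customer_id"] = rng.choice(
    customers_toy["customer_id"].to_numpy(), size=len(journeys_toy), replace=True
)

# validate="many_to_one" raises if a customer_id appears twice in the customer table.
linked = journeys_toy.merge(customers_toy, on="customer_id", how="left", validate="many_to_one")
print(linked)
```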

## 📚 Next Steps

To learn more about Data Designer:

- Explore [validators](https://nvidia-nemo.github.io/DataDesigner/concepts/validators/) to ensure data quality
- Learn about [seed datasets](../3-seeding-with-a-dataset/) to base generation on real data
- Check out [person sampling](https://nvidia-nemo.github.io/DataDesigner/concepts/person_sampling/) for more demographic options
- Review the [full documentation](https://nvidia-nemo.github.io/DataDesigner/)

Happy teaching! 🎓