# 📊 Udemy Courses: Exploratory Data Analysis

**Author:** Tharun Ponnam  
**GitHub:** [@tharun-ship-it](https://github.com/tharun-ship-it)  
**Email:** tharunponnam007@gmail.com  
**Dataset:** [Kaggle - Udemy Courses](https://www.kaggle.com/andrewmvd/udemy-courses)

---

## Abstract

This notebook presents an **exploratory data analysis** of Udemy's online course catalog, covering **3,682 courses** across four subjects published between 2011 and 2017. The analysis follows a complete data science pipeline, from data cleaning and feature engineering through statistical analysis and visualization, to uncover insights about pricing strategies, subscriber engagement patterns, and temporal trends in the online education market.

### Key Features:

- **Large-Scale Analysis:** Processing of 3,682 courses with 11.9M+ total subscribers
- **Multi-Dimensional Exploration:** Subject distribution, pricing dynamics, temporal patterns
- **Feature Engineering:** Engagement metrics, revenue estimation, temporal feature extraction
- **Statistical Insights:** Correlation analysis, distribution studies, comparative analysis
- **Publication-Ready Visualizations:** Professional figures including heatmaps, distributions, and trend analysis

---

### 📋 Table of Contents

1. [Environment Setup](#1-environment-setup)
2. [Data Loading & Exploration](#2-data-loading--exploration)
3. [Data Cleaning & Preprocessing](#3-data-cleaning--preprocessing)
4. [Feature Engineering](#4-feature-engineering)
5. [Descriptive Statistics](#5-descriptive-statistics)
6. [Univariate Analysis](#6-univariate-analysis)
7. [Bivariate Analysis](#7-bivariate-analysis)
8. [Temporal Analysis](#8-temporal-analysis)
9. [Advanced Insights](#9-advanced-insights)
10. [Conclusions & Recommendations](#10-conclusions--recommendations)

## 1. Environment Setup

In [None]:
# Install dependencies (uncomment if running in Colab)
# !pip install pandas numpy matplotlib seaborn -q

In [None]:
# Core libraries
import numpy as np
import pandas as pd
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Configuration
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:.2f}'.format)

# Random seed for reproducibility
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

print("✅ Environment setup complete!")
print(f"   NumPy version: {np.__version__}")
print(f"   Pandas version: {pd.__version__}")
print(f"   Analysis timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

In [None]:
# Configure visualization style
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['figure.dpi'] = 150

# Custom color palette for subjects
SUBJECT_COLORS = {
    'Web Development': '#3498db',
    'Business Finance': '#2ecc71',
    'Graphic Design': '#e74c3c',
    'Musical Instruments': '#9b59b6'
}

# Color palette for paid vs free
PAYMENT_COLORS = {
    True: '#27ae60',
    False: '#e74c3c'
}

print("✅ Visualization configuration complete!")

## 2. Data Loading & Exploration

In [None]:
# For Google Colab: Download dataset
# Uncomment and run if using Colab

# import os
# !pip install kaggle -q
# !mkdir -p ~/.kaggle
# # Upload your kaggle.json API key first
# !kaggle datasets download -d andrewmvd/udemy-courses --unzip
# print("✅ Dataset downloaded successfully!")

In [None]:
def load_udemy_data(filepath):
    """
    Load Udemy courses dataset with initial preprocessing.
    
    Parameters:
    -----------
    filepath : str
        Path to the CSV file
        
    Returns:
    --------
    pd.DataFrame
        Loaded DataFrame with basic info displayed
    """
    print("📥 Loading dataset...")
    
    try:
        df = pd.read_csv(filepath, encoding='utf-8')
    except UnicodeDecodeError:
        df = pd.read_csv(filepath, encoding='latin-1')
    
    print(f"✅ Loaded {len(df):,} courses with {len(df.columns)} features")
    print(f"   Memory usage: {df.memory_usage(deep=True).sum() / 1024**2:.2f} MB")
    
    return df

In [None]:
# Load the dataset
# Adjust path based on your environment
DATA_PATH = '../data/udemy_courses.csv'

# Alternative paths for Colab
# DATA_PATH = 'udemy_courses.csv'
# DATA_PATH = '/content/udemy_courses.csv'

df = load_udemy_data(DATA_PATH)
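
Later cells assume that specific columns exist in the loaded file. As a minimal sketch, a quick schema check right after loading can surface a mismatched CSV early. The helper name `find_missing_columns` and the `REQUIRED_COLUMNS` list are illustrative assumptions; the column names themselves are the ones this notebook references.

```python
import pandas as pd

# Columns referenced by later cells in this notebook.
REQUIRED_COLUMNS = [
    'course_id', 'course_title', 'is_paid', 'price', 'num_subscribers',
    'num_reviews', 'num_lectures', 'level', 'content_duration',
    'published_timestamp', 'subject',
]

def find_missing_columns(dataframe, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the DataFrame."""
    return [col for col in required if col not in dataframe.columns]

# Tiny stand-in frame (not the real dataset) that lacks most columns.
sample = pd.DataFrame({'course_id': [1], 'price': [20]})
missing = find_missing_columns(sample)
print(f"Missing columns: {missing}")
```

On the real data, `find_missing_columns(df)` would be expected to return an empty list if the file matches the layout the rest of the notebook relies on.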

In [None]:
# Display first few records
print("\n📋 First 5 Records:")
df.head()

In [None]:
# Data types and structure
print("\n📊 Dataset Structure:")
print("="*60)
df.info()

In [None]:
# Column overview
print("\n📝 Available Features:")
print("="*60)
for i, col in enumerate(df.columns, 1):
    dtype = df[col].dtype
    non_null = df[col].notna().sum()
    print(f"  {i:2d}. {col:<25} | Type: {str(dtype):<10} | Non-null: {non_null:,}")

## 3. Data Cleaning & Preprocessing

### 3.1 Missing Value Analysis

In [None]:
def analyze_missing_values(dataframe):
    """
    Generate a comprehensive missing value report.
    """
    missing = dataframe.isnull().sum()
    missing_pct = (missing / len(dataframe)) * 100
    
    report = pd.DataFrame({
        'Missing Count': missing,
        'Missing %': missing_pct,
        'Data Type': dataframe.dtypes
    })
    
    report = report[report['Missing Count'] > 0].sort_values('Missing %', ascending=False)
    
    return report if len(report) > 0 else "✅ No missing values detected!"

print("\n🔍 Missing Value Analysis:")
print("="*60)
analyze_missing_values(df)

In [None]:
# Visualize missing values pattern
fig, ax = plt.subplots(figsize=(12, 4))

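# Booleans are cast to 0/1 so that, with fixed limits, missing cells map to the dark-red end of 'YlOrRd'
sns.heatmap(df.isnull().T.astype(int), vmin=0, vmax=1, cbar=True, cmap='YlOrRd', yticklabels=df.columns, ax=ax)

ax.set_title('Missing Value Pattern (Dark Red = Missing)', fontsize=14, fontweight='bold')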
ax.set_xlabel('Sample Index')

plt.tight_layout()
plt.show()

print("✅ Missing value visualization complete!")

### 3.2 Duplicate Detection

In [None]:
# Check for duplicate records
duplicates = df.duplicated().sum()
duplicate_ids = df['course_id'].duplicated().sum()

print("\n🔄 Duplicate Analysis:")
print("="*60)
print(f"   Complete duplicate rows: {duplicates:,}")
print(f"   Duplicate course IDs: {duplicate_ids:,}")

if duplicates > 0:
    print(f"\n   ⚠️ Removing {duplicates} duplicate rows...")
    df = df.drop_duplicates()
    print(f"   ✅ Dataset now has {len(df):,} unique records")
else:
    print("   ✅ No duplicates found!")
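
The cell above removes only fully identical rows, so rows that share a `course_id` but differ elsewhere would remain. The following is a sketch, on made-up data, of how such rows could be inspected and resolved. The policy of keeping the row with the most subscribers is an assumption for illustration, not something the analysis prescribes.

```python
import pandas as pd

# Made-up rows: course 101 appears twice with different details.
sample = pd.DataFrame({
    'course_id': [101, 102, 101, 103],
    'course_title': ['A', 'B', 'A (updated)', 'C'],
    'num_subscribers': [10, 20, 15, 30],
})

# keep=False flags every member of a duplicated group, not just the repeats.
dup_mask = sample['course_id'].duplicated(keep=False)
dup_rows = sample[dup_mask].sort_values('course_id')
print(dup_rows)

# Assumed policy: for each course_id, keep the row with the most subscribers.
deduped = (sample.sort_values('num_subscribers', ascending=False)
                 .drop_duplicates(subset='course_id', keep='first')
                 .sort_values('course_id')
                 .reset_index(drop=True))
print(deduped)
```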

### 3.3 Data Type Conversion

In [None]:
# Convert timestamp to datetime
df['published_timestamp'] = pd.to_datetime(df['published_timestamp'])

# Convert boolean payment status
df['is_paid'] = df['is_paid'].astype(bool)

print("\n🔧 Data Type Conversions:")
print("="*60)
print("   ✅ 'published_timestamp' → datetime64")
print("   ✅ 'is_paid' → boolean")
print(f"\n   Date range: {df['published_timestamp'].min().strftime('%Y-%m-%d')} to {df['published_timestamp'].max().strftime('%Y-%m-%d')}")
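
One caveat about the boolean conversion above: `astype(bool)` is only safe when the column already holds real booleans. If `is_paid` were read as text (an assumption about a differently formatted file, not something observed here), every non-empty string would become `True`. A minimal sketch of an explicit mapping, using made-up values:

```python
import pandas as pd

# Made-up text values standing in for a column that was read as strings.
raw = pd.Series(['True', 'False', 'TRUE', 'false'], dtype=object)

# astype(bool) treats every non-empty string as True, including 'False'.
naive = raw.astype(bool)

# Explicit mapping after normalising case; unrecognised values become NaN.
mapping = {'true': True, 'false': False}
parsed = raw.str.strip().str.lower().map(mapping)

print(naive.tolist())
print(parsed.tolist())
```

Checking `df['is_paid'].dtype` before converting shows which case applies.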

## 4. Feature Engineering

In [None]:
# Extract temporal features
df['published_year'] = df['published_timestamp'].dt.year
df['published_month'] = df['published_timestamp'].dt.month
df['published_day_of_week'] = df['published_timestamp'].dt.dayofweek
df['published_quarter'] = df['published_timestamp'].dt.quarter

day_mapping = {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday', 
               4: 'Friday', 5: 'Saturday', 6: 'Sunday'}
df['day_name'] = df['published_day_of_week'].map(day_mapping)

print("\n📅 Temporal Features Created:")
print("="*60)
print("   ✅ published_year, published_month, published_quarter")
print("   ✅ published_day_of_week, day_name")

In [None]:
# Create engagement metrics
df['reviews_per_subscriber'] = np.where(df['num_subscribers'] > 0,
    df['num_reviews'] / df['num_subscribers'], 0)

df['lectures_per_hour'] = np.where(df['content_duration'] > 0,
    df['num_lectures'] / df['content_duration'], 0)

df['estimated_revenue'] = np.where(df['is_paid'],
    df['price'] * df['num_subscribers'], 0)

df['engagement_score'] = df['reviews_per_subscriber'] * 100

print("\n📊 Engagement Metrics Created:")
print("="*60)
print("   ✅ reviews_per_subscriber, lectures_per_hour")
print("   ✅ estimated_revenue, engagement_score")
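
The ratio features above use `np.where`, which still evaluates the division for every row before selecting a result. An alternative sketch masks the denominator first, so zero denominators never enter the division. The helper name `safe_ratio` is hypothetical and the data is made up.

```python
import pandas as pd

sample = pd.DataFrame({
    'num_reviews': [5, 0, 12],
    'num_subscribers': [100, 0, 48],
})

def safe_ratio(numerator, denominator):
    """Divide two Series, returning 0 where the denominator is not positive."""
    masked = denominator.where(denominator > 0)  # non-positive values become NaN
    return numerator.div(masked).fillna(0.0)

sample['reviews_per_subscriber'] = safe_ratio(
    sample['num_reviews'], sample['num_subscribers']
)
print(sample)
```

Both approaches should give the same values for these features; the masked version simply avoids computing a quotient that is then discarded.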

## 5. Descriptive Statistics

In [None]:
# Numerical features summary
numerical_cols = ['price', 'num_subscribers', 'num_reviews', 'num_lectures', 'content_duration']

print("\n📈 Numerical Features Summary:")
print("="*80)
df[numerical_cols].describe().round(2)

In [None]:
# Categorical features summary
print("\n📊 Categorical Features Summary:")
print("="*60)

print("\n🎯 Subject Distribution:")
for subject, count in df['subject'].value_counts().items():
    print(f"   • {subject}: {count:,} courses ({count/len(df)*100:.1f}%)")

print("\n🎚️ Level Distribution:")
for level, count in df['level'].value_counts().items():
    print(f"   • {level}: {count:,} courses ({count/len(df)*100:.1f}%)")

print("\n💰 Payment Status:")
print(f"   • Paid: {df['is_paid'].sum():,} ({df['is_paid'].mean()*100:.1f}%)")
print(f"   • Free: {(~df['is_paid']).sum():,} ({(~df['is_paid']).mean()*100:.1f}%)")

## 6. Univariate Analysis

In [None]:
# Distribution of key numerical features
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

features_to_plot = ['price', 'num_subscribers', 'num_reviews', 
                    'num_lectures', 'content_duration', 'engagement_score']

for idx, col in enumerate(features_to_plot):
    ax = axes[idx]
    data = df[col].dropna()
    
    if col in ['num_subscribers', 'num_reviews']:
        data = data[data > 0]
        ax.hist(data, bins=50, alpha=0.7, color='#3498db', edgecolor='white')
        ax.set_xscale('log')
    else:
        ax.hist(data, bins=50, alpha=0.7, color='#3498db', edgecolor='white')
    
    ax.axvline(data.mean(), color='red', linestyle='--', linewidth=2, label=f'Mean: {data.mean():,.1f}')
    ax.axvline(data.median(), color='green', linestyle='-', linewidth=2, label=f'Median: {data.median():,.1f}')
    
    ax.set_title(f'Distribution of {col}', fontsize=12, fontweight='bold')
    ax.legend(fontsize=8)

plt.tight_layout()
plt.show()

print("✅ Numerical distribution visualization complete!")
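
The histograms above drop zero counts before switching to a log axis. Where the zeros should stay visible, one option is to transform with `np.log1p`, which computes log(1 + x) and maps 0 to 0. This is a sketch on made-up counts rather than a change to the plots above.

```python
import numpy as np
import pandas as pd

# Made-up counts including zeros, as subscriber or review columns may have.
counts = pd.Series([0, 0, 9, 99, 999])

# log1p keeps every row; a plain log would yield -inf for the zeros.
log_counts = np.log1p(counts)
print(log_counts.round(3).tolist())
```

The transformed values could then be plotted on a linear axis with the zero-count courses included.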

In [None]:
# Subject distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

subject_counts = df['subject'].value_counts()
colors = [SUBJECT_COLORS.get(s, '#95a5a6') for s in subject_counts.index]

# Bar chart
bars = axes[0].bar(subject_counts.index, subject_counts.values, color=colors, edgecolor='white', linewidth=2)
for bar, val in zip(bars, subject_counts.values):
    axes[0].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 20,
                 f'{val:,}', ha='center', fontweight='bold')
axes[0].set_title('Course Count by Subject', fontsize=14, fontweight='bold')
axes[0].tick_params(axis='x', rotation=15)

# Pie chart
axes[1].pie(subject_counts.values, labels=subject_counts.index, autopct='%1.1f%%',
            colors=colors, explode=[0.02]*len(subject_counts), shadow=True)
axes[1].set_title('Subject Distribution', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

## 7. Bivariate Analysis

In [None]:
# Correlation matrix
correlation_cols = ['price', 'num_subscribers', 'num_reviews', 'num_lectures', 
                    'content_duration', 'engagement_score']
correlation_matrix = df[correlation_cols].corr()

fig, ax = plt.subplots(figsize=(10, 8))
mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))

sns.heatmap(correlation_matrix, mask=mask, annot=True, fmt='.2f', cmap='RdBu_r',
            center=0, square=True, linewidths=0.5, cbar_kws={'shrink': 0.8})

ax.set_title('Feature Correlation Matrix', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n📊 Key Correlations:")
print(f"   • Subscribers vs Reviews: {correlation_matrix.loc['num_subscribers', 'num_reviews']:.3f}")
print(f"   • Price vs Subscribers: {correlation_matrix.loc['price', 'num_subscribers']:.3f}")

In [None]:
# Subscribers by subject
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

total_subs = df.groupby('subject')['num_subscribers'].sum().sort_values(ascending=True)
colors = [SUBJECT_COLORS.get(s, '#95a5a6') for s in total_subs.index]

bars = axes[0].barh(total_subs.index, total_subs.values, color=colors, edgecolor='white')
for bar, val in zip(bars, total_subs.values):
    axes[0].text(val + 50000, bar.get_y() + bar.get_height()/2, f'{val/1e6:.1f}M', va='center', fontweight='bold')
axes[0].set_title('Total Subscribers by Subject', fontsize=14, fontweight='bold')

avg_subs = df.groupby('subject')['num_subscribers'].mean().sort_values(ascending=True)
colors = [SUBJECT_COLORS.get(s, '#95a5a6') for s in avg_subs.index]
bars = axes[1].barh(avg_subs.index, avg_subs.values, color=colors, edgecolor='white')
for bar, val in zip(bars, avg_subs.values):
    axes[1].text(val + 100, bar.get_y() + bar.get_height()/2, f'{val:,.0f}', va='center', fontweight='bold')
axes[1].set_title('Average Subscribers per Course', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

In [None]:
# Price analysis
paid_courses = df[df['is_paid'] == True].copy()

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Price by subject
sns.boxplot(data=paid_courses, x='subject', y='price', palette='Set2', ax=axes[0,0])
axes[0,0].set_title('Price Distribution by Subject', fontweight='bold')
axes[0,0].tick_params(axis='x', rotation=15)

# Price vs Subscribers
axes[0,1].scatter(paid_courses['price'], paid_courses['num_subscribers'], alpha=0.5, s=30)
axes[0,1].set_xlabel('Price (USD)')
axes[0,1].set_ylabel('Subscribers')
axes[0,1].set_title('Price vs Subscribers', fontweight='bold')
axes[0,1].set_yscale('log')

# Price histogram
axes[1,0].hist(paid_courses['price'], bins=50, alpha=0.7, color='#27ae60', edgecolor='white')
axes[1,0].axvline(paid_courses['price'].median(), color='blue', linestyle='-', linewidth=2, 
                  label=f"Median: ${paid_courses['price'].median():.0f}")
axes[1,0].set_title('Price Distribution', fontweight='bold')
axes[1,0].legend()

# Price by level
sns.violinplot(data=paid_courses, x='level', y='price', palette='viridis', ax=axes[1,1])
axes[1,1].set_title('Price by Level', fontweight='bold')
axes[1,1].tick_params(axis='x', rotation=15)

plt.tight_layout()
plt.show()

In [None]:
# Free vs Paid comparison
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Pie chart
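payment_counts = df['is_paid'].value_counts()
# Derive labels and colors from the index so they stay aligned with the counts
payment_labels = ['Paid' if paid else 'Free' for paid in payment_counts.index]
axes[0].pie(payment_counts.values, labels=payment_labels, autopct='%1.1f%%',
            colors=[PAYMENT_COLORS[bool(paid)] for paid in payment_counts.index],
            explode=[0.02] * len(payment_counts), shadow=True)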
axes[0].set_title('Paid vs Free Distribution', fontweight='bold')

# Subscriber comparison
subs_by_payment = df.groupby('is_paid')['num_subscribers'].mean()
axes[1].bar(['Free', 'Paid'], subs_by_payment.values, color=['#e74c3c', '#27ae60'], edgecolor='white')
axes[1].set_title('Average Subscribers', fontweight='bold')
axes[1].set_ylabel('Average Subscribers')

# Review comparison
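df['payment_type'] = df['is_paid'].map({True: 'Paid', False: 'Free'})
# Fix the category order so the palette matches the labels (red = Free, green = Paid)
sns.boxplot(data=df, x='payment_type', y='num_reviews', order=['Free', 'Paid'],
            palette=['#e74c3c', '#27ae60'], ax=axes[2])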
axes[2].set_yscale('log')
axes[2].set_title('Review Distribution', fontweight='bold')

plt.tight_layout()
plt.show()
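
The bar chart above compares mean subscribers, and a mean can be dominated by a handful of very large courses. As a sketch on made-up numbers, reporting the median and group size alongside the mean gives a more robust comparison:

```python
import pandas as pd

# Made-up data: one paid course with an unusually large audience.
sample = pd.DataFrame({
    'is_paid': [True, True, True, False, False, False],
    'num_subscribers': [100, 200, 30000, 500, 800, 1000],
})

summary = sample.groupby('is_paid')['num_subscribers'].agg(['mean', 'median', 'count'])
summary.index = summary.index.map({True: 'Paid', False: 'Free'})
print(summary)
```

In this toy example the paid group has the higher mean but the lower median, which is the kind of disagreement worth checking for on the real data.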

## 8. Temporal Analysis

In [None]:
# Yearly publication trend
yearly_stats = df.groupby('published_year').agg({'course_id': 'count', 'num_subscribers': 'sum'})
yearly_stats.columns = ['courses', 'total_subscribers']

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Line plot
axes[0].plot(yearly_stats.index, yearly_stats['courses'], marker='o', linewidth=2, markersize=8, color='#3498db')
axes[0].fill_between(yearly_stats.index, yearly_stats['courses'], alpha=0.3, color='#3498db')
axes[0].set_title('Courses Published per Year', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Year')
axes[0].set_ylabel('Number of Courses')

for x, y in zip(yearly_stats.index, yearly_stats['courses']):
    axes[0].annotate(f'{y:,}', (x, y), textcoords='offset points', xytext=(0, 10), ha='center', fontweight='bold')

# Stacked area by subject
subject_yearly = df.groupby(['published_year', 'subject']).size().unstack(fill_value=0)
subject_yearly.plot(kind='area', stacked=True, ax=axes[1], alpha=0.7,
                    color=[SUBJECT_COLORS.get(s, '#95a5a6') for s in subject_yearly.columns])
axes[1].set_title('Publications by Subject Over Time', fontsize=14, fontweight='bold')
axes[1].legend(title='Subject', loc='upper left')

plt.tight_layout()
plt.show()

print("\n📅 Yearly Growth:")
for year in yearly_stats.index:
    print(f"   {year}: {yearly_stats.loc[year, 'courses']:,} courses")
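
The loop above prints raw counts per year. To express growth as a rate, `Series.pct_change` gives the year-over-year change. The counts below are illustrative placeholders, not the dataset's actual figures.

```python
import pandas as pd

# Illustrative counts only, not values taken from the dataset.
courses_per_year = pd.Series(
    [10, 40, 100, 150], index=[2011, 2012, 2013, 2014], name='courses'
)

# Fractional change relative to the previous year, expressed in percent.
yoy_growth = courses_per_year.pct_change() * 100

for year, pct in yoy_growth.dropna().items():
    print(f"{year}: {pct:+.1f}% vs previous year")
```

On the real data this could be applied to `yearly_stats['courses']`. If the final year in the file is only partially covered, its growth figure would understate the true rate.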

In [None]:
# Monthly and weekly patterns
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Monthly
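# Reindex so all 12 months are present even if a month has no publications
monthly_counts = df.groupby('published_month').size().reindex(range(1, 13), fill_value=0)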
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
axes[0].bar(range(1, 13), monthly_counts.values, color=sns.color_palette('coolwarm', 12), edgecolor='white')
axes[0].set_xticks(range(1, 13))
axes[0].set_xticklabels(month_names)
axes[0].set_title('Publications by Month', fontsize=14, fontweight='bold')

# Weekly
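# Reindex so all 7 weekdays are present even if a day has no publications
daily_counts = df.groupby('published_day_of_week').size().reindex(range(7), fill_value=0)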
day_names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
axes[1].bar(range(7), daily_counts.values, color=sns.color_palette('viridis', 7), edgecolor='white')
axes[1].set_xticks(range(7))
axes[1].set_xticklabels(day_names)
axes[1].set_title('Publications by Day of Week', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

## 9. Advanced Insights

In [None]:
# Top 10 courses
print("\n🏆 Top 10 Most Popular Courses:")
print("="*80)

top_courses = df.nlargest(10, 'num_subscribers')[['course_title', 'subject', 'num_subscribers', 'price', 'is_paid']].copy()
top_courses['num_subscribers'] = top_courses['num_subscribers'].apply(lambda x: f"{x:,}")
top_courses['price'] = top_courses.apply(lambda x: f"${x['price']:.0f}" if x['is_paid'] else "Free", axis=1)
top_courses = top_courses.drop('is_paid', axis=1)
top_courses

In [None]:
# Visualize top courses
fig, ax = plt.subplots(figsize=(12, 6))

top_10 = df.nlargest(10, 'num_subscribers')
colors = [SUBJECT_COLORS.get(s, '#95a5a6') for s in top_10['subject']]
titles = [t[:40] + '...' if len(t) > 40 else t for t in top_10['course_title']]

bars = ax.barh(range(len(top_10)), top_10['num_subscribers'].values, color=colors, edgecolor='white')
ax.set_yticks(range(len(top_10)))
ax.set_yticklabels(titles)
ax.invert_yaxis()
ax.set_xlabel('Number of Subscribers')
ax.set_title('Top 10 Most Popular Courses', fontsize=14, fontweight='bold')

for i, val in enumerate(top_10['num_subscribers'].values):
    ax.text(val + 5000, i, f'{val:,}', va='center', fontweight='bold')

legend_handles = [plt.Rectangle((0,0), 1, 1, color=c) for c in SUBJECT_COLORS.values()]
ax.legend(legend_handles, SUBJECT_COLORS.keys(), title='Subject', loc='lower right')

plt.tight_layout()
plt.show()

In [None]:
# Executive summary
print("\n" + "="*70)
print("📋 EXECUTIVE SUMMARY: Udemy Courses Analysis")
print("="*70)

print(f"""
📊 Dataset Overview
   • Total Courses: {len(df):,}
   • Date Range: {df['published_timestamp'].min().strftime('%Y-%m-%d')} to {df['published_timestamp'].max().strftime('%Y-%m-%d')}
   • Subjects: {df['subject'].nunique()}

👥 Subscribers
   • Total: {df['num_subscribers'].sum():,}
   • Average per Course: {df['num_subscribers'].mean():,.0f}
   • Top Subject: {df.groupby('subject')['num_subscribers'].sum().idxmax()}

💰 Pricing
   • Paid: {df['is_paid'].sum():,} ({df['is_paid'].mean()*100:.1f}%)
   • Free: {(~df['is_paid']).sum():,} ({(~df['is_paid']).mean()*100:.1f}%)
   • Median Price: ${paid_courses['price'].median():.2f}

🔑 Key Findings
   1. Price shows weak correlation (Pearson r ≈ {correlation_matrix.loc['price', 'num_subscribers']:.2f}) with subscribers
   2. Web Development dominates subscriber engagement
   3. Course publication accelerated significantly post-2013
   4. Free courses represent {(~df['is_paid']).mean()*100:.1f}% of catalog
""")

print("="*70)
print("✅ Summary Complete!")

## 10. Conclusions & Recommendations

### 🎯 Key Findings

**1. Market Composition**
- Udemy hosts a diverse catalog spanning Web Development, Business Finance, Graphic Design, and Musical Instruments
- Web Development commands the highest subscriber engagement

**2. Pricing Dynamics**
- Course price correlates only weakly with subscriber count (Pearson r ≈ 0.05)
- This suggests that factors other than price, such as perceived quality, may matter more in purchase decisions, although the dataset cannot confirm this directly
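
The correlation reported here comes from `DataFrame.corr()`, whose default is the Pearson coefficient, a measure of linear association. With heavily skewed counts, a rank-based (Spearman) coefficient is a useful cross-check, and it can be obtained by correlating ranks. The sketch below uses synthetic data, chosen so the two measures visibly differ; it says nothing about the actual dataset.

```python
import pandas as pd

# Synthetic data: y doubles with each step of x (monotonic, but not linear).
sample = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6, 7, 8]})
sample['y'] = 2 ** sample['x']

# Pearson measures linear association on the raw values.
pearson_r = sample['x'].corr(sample['y'])

# Spearman is Pearson applied to the ranks of each variable.
spearman_rho = sample['x'].rank().corr(sample['y'].rank())

print(f"Pearson r:    {pearson_r:.3f}")
print(f"Spearman rho: {spearman_rho:.3f}")
```

The same pattern, `df['price'].rank().corr(df['num_subscribers'].rank())`, could be run on the course data to see whether the weak relationship holds under a rank-based measure.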

**3. Free vs. Paid**
- Free courses (~8%) demonstrate unique engagement patterns
- May serve as effective lead generation for instructors

**4. Temporal Evolution**
- Course publication accelerated significantly post-2013
- This may coincide with Udemy's mobile expansion and funding rounds, though the dataset itself does not establish a cause

---

### 💡 Strategic Recommendations

**For Course Creators:**
- Focus on Web Development and Business Finance for maximum reach
- Optimize descriptions rather than competing on price

**For Platform Operators:**
- Invest in recommendation algorithms
- Consider free-tier strategies for user acquisition

**For Learners:**
- Evaluate courses on review-to-subscriber ratios
- Consider content structure over price
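
The review-to-subscriber ratio mentioned above can be unstable for courses with very few subscribers, where a handful of reviews yields an inflated ratio. A minimal sketch on made-up courses follows. The `MIN_SUBSCRIBERS` cutoff is a hypothetical threshold chosen for illustration.

```python
import pandas as pd

# Made-up courses; Course B has too few subscribers for a stable ratio.
sample = pd.DataFrame({
    'course_title': ['Course A', 'Course B', 'Course C'],
    'num_subscribers': [5000, 12, 800],
    'num_reviews': [250, 6, 16],
})

MIN_SUBSCRIBERS = 100  # hypothetical cutoff, tune to taste

eligible = sample[sample['num_subscribers'] >= MIN_SUBSCRIBERS].copy()
eligible['reviews_per_subscriber'] = (
    eligible['num_reviews'] / eligible['num_subscribers']
)
ranked = eligible.sort_values('reviews_per_subscriber', ascending=False)
print(ranked[['course_title', 'reviews_per_subscriber']])
```

Without the cutoff, Course B would rank first with a ratio of 0.5 based on only 12 subscribers.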

---

### 🔮 Future Work

- [ ] Sentiment analysis of reviews
- [ ] Predictive modeling for subscribers
- [ ] NLP analysis of titles and descriptions
- [ ] Time series forecasting

In [None]:
# Save processed data
df.to_csv('../data/udemy_courses_analyzed.csv', index=False)
print("💾 Analyzed data saved!")
print(f"\n✅ Analysis completed at {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

---

## 📚 References

1. **Dataset**: Udemy Courses - [Kaggle](https://www.kaggle.com/andrewmvd/udemy-courses)
2. **Pandas**: McKinney, W. (2010). Data Structures for Statistical Computing in Python.
3. **Seaborn**: Waskom, M. (2021). seaborn: statistical data visualization.
4. **Matplotlib**: Hunter, J. D. (2007). Matplotlib: A 2D Graphics Environment.