4. YouTube Trending Video Analytics


Phase 1: Data Collection & Cleaning



Tools:
Python (Pandas, NumPy)

Jupyter Notebook / VS Code

Tasks:
Load CSV datasets (e.g., from Kaggle: US, Canada, UK, etc.)

Normalize schemas (column names, data types, missing values)

Convert dates to datetime format

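Deduplicate rows on video_id and trending_date (keep one row per video per trending day, so trending duration can still be computed in Phase 4)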

Join datasets with category metadata (JSON provided)

Sample Code Snippet:
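A minimal sketch of the tasks above, assuming the Kaggle-style layout in which trending_date is stored as yy.dd.mm and publish_time as an ISO 8601 string. The inline CSV, the `load_region` helper, and the small `categories` table are illustrative stand-ins, not the project's actual files.

```python
import io
import pandas as pd

# Tiny inline stand-in for one regional CSV; a real run would pass a file path.
SAMPLE_CSV = io.StringIO(
    "Video_ID,Trending_Date,Title,Category_ID,Publish_Time,Views\n"
    "abc,17.14.11,First video,10,2017-11-13T17:13:01.000Z,100\n"
    "abc,17.14.11,First video,10,2017-11-13T17:13:01.000Z,100\n"
    "abc,17.15.11,First video,10,2017-11-13T17:13:01.000Z,250\n"
    "xyz,17.14.11,Second video,24,2017-11-12T09:00:00.000Z,\n"
)


def load_region(source, region):
    """Load one regional CSV and normalize its schema (hypothetical helper)."""
    df = pd.read_csv(source)
    # Normalize column names: trimmed, lowercase, snake_case.
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
    # Assumed formats: trending_date as yy.dd.mm, publish_time as ISO 8601.
    df["trending_date"] = pd.to_datetime(df["trending_date"], format="%y.%d.%m")
    df["publish_time"] = pd.to_datetime(df["publish_time"], utc=True)
    # Treat missing view counts as 0 and store them as integers.
    df["views"] = df["views"].fillna(0).astype("int64")
    df["region"] = region
    # Drop repeated rows for the same video on the same trending day, but keep
    # one row per day so trending duration can still be computed later.
    deduped = df.drop_duplicates(subset=["video_id", "trending_date"])
    return deduped.reset_index(drop=True)


# Category lookup, assumed to have been extracted from the JSON metadata.
categories = pd.DataFrame(
    {"category_id": [10, 24], "category": ["Music", "Entertainment"]}
)

us = load_region(SAMPLE_CSV, "US").merge(categories, on="category_id", how="left")
print(us[["video_id", "trending_date", "views", "region", "category"]])
```

Calling `load_region` once per country and stacking the results with `pd.concat` would give a single combined table for later phases.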

Phase 2: Sentiment Analysis on Titles & Tags

Tools:
Python (NLTK, TextBlob or VADER)

Tasks:
Preprocess text (lowercase, remove stopwords, punctuation)

Apply sentiment scoring

Classify into Positive / Neutral / Negative

Sample Code Snippet:
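A minimal sketch of the three tasks above. The stopword list, the 0.05 cutoff, and the `toy_scorer` stand-in are assumptions chosen so the example runs without extra packages; in the actual pipeline the scorer would come from TextBlob or VADER.

```python
import string

# Small illustrative stopword list; NLTK's English stopword corpus would be
# the fuller choice in practice.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "for", "on"}


def preprocess(text):
    """Lowercase, strip punctuation, and drop stopwords."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(word for word in cleaned.split() if word not in STOPWORDS)


def label_sentiment(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to a label; the cutoff is an assumption."""
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"


def score_titles(titles, scorer):
    """Apply any polarity scorer (a callable str -> float) to each title."""
    return [label_sentiment(scorer(preprocess(title))) for title in titles]


# Stand-in scorer so the sketch runs on its own. With TextBlob installed,
# the scorer could be: lambda t: TextBlob(t).sentiment.polarity
def toy_scorer(text):
    words = text.split()
    return (words.count("best") - words.count("worst")) / max(len(words), 1)


labels = score_titles(
    ["The BEST day ever!", "Worst. Trailer. Ever.", "Official video"],
    toy_scorer,
)
print(labels)
```

Keeping the scorer pluggable makes it easy to compare TextBlob and VADER on the same titles. VADER treats punctuation and capitalization as intensity cues, so lighter preprocessing may suit it better.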

Phase 3: SQL Analysis for Category Insights

Tools:
PostgreSQL / SQLite / BigQuery

Tasks:
Import cleaned data to SQL database

Use SQL to get average views per category

Rank categories by popularity

Sample SQL Query:
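A minimal sketch of the average-views query, run through Python's built-in sqlite3 module so it can be tried without a database server. The `videos` table name, its columns, and the sample rows are assumptions for illustration; the same SELECT should carry over to PostgreSQL or BigQuery with little change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE videos (video_id TEXT, region TEXT, category TEXT, views INTEGER)"
)
# Illustrative rows only; the cleaned dataset would be imported here instead.
conn.executemany(
    "INSERT INTO videos VALUES (?, ?, ?, ?)",
    [
        ("a1", "US", "Music", 300),
        ("a2", "US", "Music", 100),
        ("b1", "US", "Sports", 150),
        ("c1", "US", "Comedy", 50),
    ],
)

# Average views per category, most popular first.
QUERY = """
SELECT category,
       AVG(views) AS avg_views,
       COUNT(*)   AS video_count
FROM videos
GROUP BY category
ORDER BY avg_views DESC;
"""

rows = conn.execute(QUERY).fetchall()
for category, avg_views, video_count in rows:
    print(f"{category}: {avg_views:.0f} average views across {video_count} videos")
```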

Phase 4: Time-Series & Region Comparison

Tools:
Python (Matplotlib, Seaborn), Tableau

Tasks:
Calculate how many days each video trended (group by video_id and count distinct trending dates)

Create line plots for view trends over time

Compare category popularity across countries (barplots or heatmaps)
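The first and third tasks can be sketched with pandas as follows. The sample frame and its column names are assumptions that mirror the cleaned dataset; the numbers are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "video_id": ["a", "a", "a", "b", "b", "c"],
    "region": ["US", "US", "US", "US", "US", "UK"],
    "category": ["Music", "Music", "Music", "Sports", "Sports", "Music"],
    "trending_date": pd.to_datetime([
        "2017-11-14", "2017-11-15", "2017-11-16",
        "2017-11-14", "2017-11-15", "2017-11-14",
    ]),
    "views": [100, 250, 400, 80, 120, 60],
})

# Days trended = number of distinct trending dates per video within a region.
days_trending = (
    df.groupby(["region", "video_id"])["trending_date"]
    .nunique()
    .rename("days_trending")
    .reset_index()
)

# Average views per category and region, shaped for a heatmap or grouped bars.
category_by_region = df.pivot_table(
    index="category", columns="region", values="views", aggfunc="mean"
)

print(days_trending)
print(category_by_region)
```

A table shaped like `category_by_region` can be passed to `seaborn.heatmap` directly. Grouping by region as well as video_id matters because the same video can trend in several countries.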

Sample Visuals:
Line Graph: Views over time per region

Bar Chart: Top categories by average views

Stacked Bar: Sentiment distribution by region

Phase 5: Tableau Dashboard
Deliverables:
Dashboard with:

Most popular genres

Sentiment breakdown (title & tags)

Country-wise comparisons

Suggested Views:
Treemap of top categories

Sentiment pie chart

Multi-line chart: views over time by region



Final Report: Data Storytelling
Structure:
Introduction – Problem, objectives, dataset overview

Methodology – Cleaning, analysis tools, sentiment method

Key Insights – Most popular genres, regional patterns, title/tag emotions

Visual Storytelling – Embed Tableau screenshots or links

Conclusion & Recommendations

Project Report: YouTube Trending Video Analytics

Objective:
To uncover meaningful patterns and regional trends in YouTube trending videos by analyzing datasets from multiple countries. The analysis covers genre popularity, sentiment in titles and tags, and how long videos stay on the trending list, using a combination of Python, SQL, and Tableau.

Tools Used
Python (Pandas, Matplotlib, Seaborn, TextBlob)

SQL (SQLite/PostgreSQL)

Tableau (for dashboards and visual storytelling)

Dataset Overview
Source: YouTube Trending Video Datasets (Kaggle)

Countries Analyzed: US, Canada, UK, India

Key Columns: video_id, title, tags, category_id, views, likes, dislikes, publish_time, trending_date



Step 1: Data Cleaning & Standardization
Loaded datasets from four regions

Standardized column names across countries

Parsed dates and removed duplicate rows (same video_id and trending_date)

Mapped category_id to readable names via JSON metadata

Created a combined dataset for cross-country comparison

Example Transformation:
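A minimal sketch of one such transformation: a raw record gains a readable category name and an ISO-formatted trending date. The JSON layout (an "items" list with the name under snippet.title) and the yy.dd.mm date format are assumptions about the Kaggle files; `load_category_map` is a hypothetical helper.

```python
import json
from datetime import datetime

# Assumed layout of the category metadata file: an "items" list whose entries
# carry the id as a string and the readable name under snippet.title.
CATEGORY_JSON = """
{"items": [
  {"id": "10", "snippet": {"title": "Music"}},
  {"id": "24", "snippet": {"title": "Entertainment"}}
]}
"""


def load_category_map(raw_json):
    """Return {category_id (int): category name} from the metadata JSON."""
    items = json.loads(raw_json)["items"]
    return {int(item["id"]): item["snippet"]["title"] for item in items}


category_map = load_category_map(CATEGORY_JSON)

# One raw row before cleaning.
record = {"video_id": "abc", "category_id": 10, "trending_date": "17.14.11"}

# The same row after mapping the category and parsing the date.
transformed = {
    **record,
    "category": category_map.get(record["category_id"], "Unknown"),
    "trending_date": datetime.strptime(record["trending_date"], "%y.%d.%m")
    .date()
    .isoformat(),
}
print(transformed)
```

With a DataFrame, the same mapping could be applied in one step via `df["category_id"].map(category_map)`.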

Step 2: Sentiment Analysis (Titles & Tags)
Applied TextBlob to assess polarity of titles and tags

Classified sentiments into Positive, Neutral, Negative

Visualized the sentiment distribution by region

Sentiment Distribution (Titles):

| Region | Positive | Neutral | Negative |
| ------ | -------- | ------- | -------- |
| US     | 52%      | 34%     | 14%      |
| UK     | 48%      | 39%     | 13%      |
| India  | 55%      | 31%     | 14%      |
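A distribution table of this shape can be derived from the labeled data with a row-normalized crosstab. The sketch below uses a handful of made-up rows, so its percentages are not the report's figures; the column names are assumptions.

```python
import pandas as pd

# Illustrative labels only; the real input is one row per trending video title.
labeled = pd.DataFrame({
    "region": ["US", "US", "US", "US", "UK", "UK"],
    "sentiment_label": [
        "Positive", "Positive", "Neutral", "Negative", "Positive", "Neutral",
    ],
})

# Row-normalized crosstab: each region's labels sum to 100%.
distribution = (
    pd.crosstab(labeled["region"], labeled["sentiment_label"], normalize="index")
    .mul(100)
    .round(1)
    .reindex(columns=["Positive", "Neutral", "Negative"], fill_value=0.0)
)
print(distribution)
```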


Step 3: SQL Category Ranking
Using SQL queries, we ranked categories by average views to determine the most popular content types in each country.

Sample SQL Query:
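A minimal sketch of a per-region ranking query, executed with Python's built-in sqlite3 module. The `videos` table, its columns, and the sample rows are assumptions for illustration, and the window function requires SQLite 3.25 or newer; PostgreSQL accepts the same syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (region TEXT, category TEXT, views INTEGER)")
# Illustrative rows only, not the project's data.
conn.executemany(
    "INSERT INTO videos VALUES (?, ?, ?)",
    [
        ("US", "Entertainment", 500), ("US", "Music", 400),
        ("US", "Sports", 300), ("US", "Comedy", 100),
        ("UK", "Music", 450), ("UK", "Comedy", 350),
    ],
)

# Average views per category within each region, then keep the top 3 per region.
QUERY = """
WITH category_avg AS (
    SELECT region, category, AVG(views) AS avg_views
    FROM videos
    GROUP BY region, category
),
ranked AS (
    SELECT region, category, avg_views,
           RANK() OVER (PARTITION BY region ORDER BY avg_views DESC) AS rank_in_region
    FROM category_avg
)
SELECT region, rank_in_region, category
FROM ranked
WHERE rank_in_region <= 3
ORDER BY region, rank_in_region;
"""

top3 = conn.execute(QUERY).fetchall()
for region, rank_in_region, category in top3:
    print(region, rank_in_region, category)
```

RANK() assigns tied categories the same position, so a region can return more than three rows when averages tie; ROW_NUMBER() would force exactly three.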

Top 3 Genres by Region:

| Region | #1            | #2              | #3            |
| ------ | ------------- | --------------- | ------------- |
| US     | Entertainment | Music           | Sports        |
| UK     | Music         | Comedy          | Entertainment |
| India  | Music         | News & Politics | Entertainment |


Step 4: Time-Series Analysis

Calculated how long videos stayed in trending (duration in days)

Created time-series plots to visualize view patterns and peak days

Insights:

Music videos tend to trend longer (5–7 days on average)

News content spikes quickly but trends for fewer days

Trending durations are shorter in India compared to the US

Step 5: Tableau Dashboards
Key Visuals Included:

Region-wise Genre Popularity (Bar Chart)

Sentiment Breakdown by Region (Pie Charts)

Views Over Time for Top Categories (Line Graph)

Trending Duration Heatmap by Category

Link to Interactive Dashboard (if applicable): [Tableau Public or Local File]

Conclusion
Music and Entertainment dominate across all regions, but News & Politics is more prominent in India

Positive sentiment is most common in video titles

Trending duration varies significantly across categories and countries

Region-specific strategies can help content creators optimize visibility (e.g., upload times, content type)

Recommendations
Content creators should analyze sentiment in titles for better engagement

Optimize publish timing based on regional trending patterns

Focus on entertainment or music if aiming for wider reach

Step 5 in Detail: Building the Tableau Dashboards
Goal:
Deliver a compelling dashboard that highlights:

Genre popularity by region

Sentiment distribution

Trends in views over time

Duration of trending by category

1. Region-wise Genre Popularity (Bar Chart)
Purpose:
Show which content categories (e.g. Music, Entertainment, News) are most popular in different countries based on average views.

How to Create in Tableau:

Rows: Category

Columns: AVG(Views)

Color: Region

Filters: Optional (e.g., date, region selector)

2. Sentiment Breakdown by Region (Pie Charts)
Purpose:
Visualize the emotional tone of trending video titles/tags across countries.

How to Create:

Use sentiment-labeled field (Positive, Neutral, Negative)

Create separate pie charts for each region using Region filter or small multiples (dashboard tiles)

Measure: COUNT(video_id) or % of total

3. Views Over Time for Top Categories (Line Graph)
Purpose:
Analyze view trends over time across top-performing categories.

How to Create:

Columns: Trending Date

Rows: SUM(Views)

Color: Category (filtered to Top 3–5)

Filter: Region (optional)

Add Tooltip with video title/ID for interactivity

4. Trending Duration Heatmap by Category
Purpose:
Show how long different categories tend to remain on the trending list.

How to Create:

Rows: Category

Columns: Region

Color: AVG(Days Trending)

Measure: Custom-calculated field from Python/SQL (e.g., days a video ID appears per region)

Final Dashboard Layout (Suggested)
Organize visuals into one scrollable dashboard or split by tabs:

Tab 1: "Overview" – Bar chart + pie charts

Tab 2: "Trends" – Line graph for views over time

Tab 3: "Engagement" – Heatmap for trending duration

Data Fields to Prepare Before Tableau Import:

| Field Name        | Type    | Notes                       |
| ----------------- | ------- | --------------------------- |
| `video_id`        | String  | Unique identifier           |
| `region`          | String  | Country code (e.g., US, IN) |
| `category`        | String  | Mapped from `category_id`   |
| `views`           | Integer | Total views                 |
| `trending_date`   | Date    | Needed for time-series      |
| `days_trending`   | Integer | Calculated before Tableau   |
| `sentiment_label` | String  | Positive, Neutral, Negative |
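Before importing into Tableau, it can help to check that the export contains exactly these fields with the expected types. The sketch below is one way to do that with pandas; `prepare_for_tableau` is a hypothetical helper and the sample row is made up.

```python
import pandas as pd

REQUIRED_FIELDS = [
    "video_id", "region", "category", "views",
    "trending_date", "days_trending", "sentiment_label",
]


def prepare_for_tableau(df):
    """Select the required fields and coerce them to the types in the table."""
    missing = [col for col in REQUIRED_FIELDS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing fields: {missing}")
    out = df[REQUIRED_FIELDS].copy()
    out["views"] = out["views"].astype("int64")
    out["days_trending"] = out["days_trending"].astype("int64")
    # Keep only the calendar date so Tableau reads the field as a Date.
    out["trending_date"] = pd.to_datetime(out["trending_date"]).dt.date
    return out


sample = pd.DataFrame({
    "video_id": ["abc"], "region": ["US"], "category": ["Music"],
    "views": [250.0], "trending_date": ["2017-11-15"],
    "days_trending": [2], "sentiment_label": ["Positive"],
    "extra_column": ["dropped"],
})

export = prepare_for_tableau(sample)
csv_text = export.to_csv(index=False)
print(csv_text)
```

Passing a file path to `to_csv` instead would write the file that Tableau connects to.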
