 ## Project - YouTube Trending Video Statistics Modeling 

<img src="https://www.techdotmatrix.com/wp-content/uploads/2016/12/YouTube-new-logo.jpg" 
 style="height: 170px;" align="left"/>


### 1) Problem Statement 

##### Using data collected daily on the 200 listed trending YouTube videos in five countries (U.S., U.K., Canada, Germany, and France), this project aims to identify:
  - what audience sentiment toward trending YouTube videos looks like over time in different countries
  - which factors affect the popularity of a YouTube video
  - whether videos can be clustered based on their comments and statistics
  - whether a recommendation engine can be built to predict which videos an audience will like

### 2) Hypotheses

  - The sentiment expressed in trending video comments is neutral in all countries but changes over time. 
  - Number of video likes is positively associated with number of video views. 
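
As a rough first look at the second hypothesis, the sketch below computes the Pearson correlation between views and likes, on both the raw and the log scale. It uses only the five rows displayed by `us.head()` in the Data Exploration section, so it illustrates the mechanics and is not evidence for or against the hypothesis. The helper name `pearson_r` is introduced here for illustration.

```python
import numpy as np

# Views and likes copied from the five rows shown by us.head() in section 4.
views = np.array([748374, 2418783, 3191434, 343168, 2095731], dtype=float)
likes = np.array([57527, 97185, 146033, 10172, 132235], dtype=float)


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# Count data such as views and likes is usually heavily right-skewed,
# so the log-scale correlation is often the more informative of the two.
r_raw = pearson_r(views, likes)
r_log = pearson_r(np.log1p(views), np.log1p(likes))

print(f"Pearson r (raw): {r_raw:.3f}")
print(f"Pearson r (log): {r_log:.3f}")
```

On the full data the same call could be applied to `us['views']` and `us['likes']`; a positive coefficient would be consistent with the hypothesis but would not establish causation.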

### 3) Potential Methods

  - Sentiment Analysis
  - Statistical Exploratory Analysis 
  - Classification Modeling 

### 4) Data Exploration

In [4]:
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
%matplotlib inline

In [10]:
path = '/Users/celia/Desktop/DS2017/FinalProject/Data-YouTube/youtube-new/'

In [12]:
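# Load one CSV per country; the two-letter prefix in each filename is the country code.
us = pd.read_csv(path + 'USvideos.csv')
gb = pd.read_csv(path + 'GBvideos.csv')
ca = pd.read_csv(path + 'CAvideos.csv')
de = pd.read_csv(path + 'DEvideos.csv')
fr = pd.read_csv(path + 'FRvideos.csv')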
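
Because the project compares countries, it may be convenient to stack the five frames into one table with a column identifying the source country. The sketch below is one possible way to do that. The function name `combine_countries`, the `country` column, and the small stand-in frames (including the placeholder video ID) are assumptions made for illustration and are not part of the original notebook.

```python
import pandas as pd


def combine_countries(frames):
    """Stack per-country DataFrames, tagging each row with its country code.

    `frames` maps a country code (e.g. "US") to that country's DataFrame.
    """
    tagged = [df.assign(country=code) for code, df in frames.items()]
    return pd.concat(tagged, ignore_index=True)


# Tiny stand-in frames so the example runs without the CSV files.
# The GB row is a made-up placeholder, not real data.
demo_frames = {
    "US": pd.DataFrame({"video_id": ["2kyS6SvSYSE", "1ZAPwfrtAFY"],
                        "views": [748374, 2418783]}),
    "GB": pd.DataFrame({"video_id": ["placeholder_id"], "views": [1000]}),
}

videos = combine_countries(demo_frames)
print(videos)
```

With the real data, the call would look like `combine_countries({"US": us, "GB": gb, "CA": ca, "DE": de, "FR": fr})`.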

In [14]:
us.shape

(5800, 16)

In [15]:
us.columns

Index(['video_id', 'trending_date', 'title', 'channel_title', 'category_id',
       'publish_time', 'tags', 'views', 'likes', 'dislikes', 'comment_count',
       'thumbnail_link', 'comments_disabled', 'ratings_disabled',
       'video_error_or_removed', 'description'],
      dtype='object')

In [16]:
us.head()

| | video_id | trending_date | title | channel_title | category_id | publish_time | tags | views | likes | dislikes | comment_count | thumbnail_link | comments_disabled | ratings_disabled | video_error_or_removed | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2kyS6SvSYSE | 17.14.11 | WE WANT TO TALK ABOUT OUR MARRIAGE | CaseyNeistat | 22 | 2017-11-13T17:13:01.000Z | SHANtell martin | 748374 | 57527 | 2966 | 15954 | https://i.ytimg.com/vi/2kyS6SvSYSE/default.jpg | False | False | False | SHANTELL'S CHANNEL - https://www.youtube.com/s... |
| 1 | 1ZAPwfrtAFY | 17.14.11 | The Trump Presidency: Last Week Tonight with J... | LastWeekTonight | 24 | 2017-11-13T07:30:00.000Z | last week tonight trump presidency\|"last week ... | 2418783 | 97185 | 6146 | 12703 | https://i.ytimg.com/vi/1ZAPwfrtAFY/default.jpg | False | False | False | One year after the presidential election, John... |
| 2 | 5qpjK5DgCt4 | 17.14.11 | Racist Superman \| Rudy Mancuso, King Bach & Le... | Rudy Mancuso | 23 | 2017-11-12T19:05:24.000Z | racist superman\|"rudy"\|"mancuso"\|"king"\|"bach"... | 3191434 | 146033 | 5339 | 8181 | https://i.ytimg.com/vi/5qpjK5DgCt4/default.jpg | False | False | False | WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► http... |
| 3 | puqaWrEC7tY | 17.14.11 | Nickelback Lyrics: Real or Fake? | Good Mythical Morning | 24 | 2017-11-13T11:00:04.000Z | rhett and link\|"gmm"\|"good mythical morning"\|"... | 343168 | 10172 | 666 | 2146 | https://i.ytimg.com/vi/puqaWrEC7tY/default.jpg | False | False | False | Today we find out if Link is a Nickelback amat... |
| 4 | d380meD0W0M | 17.14.11 | I Dare You: GOING BALD!? | nigahiga | 24 | 2017-11-12T18:01:41.000Z | ryan\|"higa"\|"higatv"\|"nigahiga"\|"i dare you"\|"... | 2095731 | 132235 | 1989 | 17518 | https://i.ytimg.com/vi/d380meD0W0M/default.jpg | False | False | False | I know it's been a while since we did this sho... |


In [19]:
us.dtypes

video_id                  object
trending_date             object
title                     object
channel_title             object
category_id                int64
publish_time              object
tags                      object
views                      int64
likes                      int64
dislikes                   int64
comment_count              int64
thumbnail_link            object
comments_disabled           bool
ratings_disabled            bool
video_error_or_removed      bool
description               object
dtype: object
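
Both `trending_date` and `publish_time` load as `object` (plain strings), so they need converting before any time-based analysis. The sketch below shows one way to parse them, using values copied from the `us.head()` output. It assumes `trending_date` is laid out as year.day.month, which is what the sample value `17.14.11` implies, since 14 cannot be a month. The `days_to_trend` column is a hypothetical name added for illustration.

```python
import pandas as pd

# Two rows' worth of date strings copied from the us.head() output.
sample = pd.DataFrame({
    "trending_date": ["17.14.11", "17.14.11"],
    "publish_time": ["2017-11-13T17:13:01.000Z", "2017-11-12T19:05:24.000Z"],
})

parsed = sample.assign(
    # Assumed layout: two-digit year, then day, then month.
    trending_date=pd.to_datetime(sample["trending_date"], format="%y.%d.%m"),
    # ISO 8601 timestamps with a trailing "Z" are in UTC.
    publish_time=pd.to_datetime(sample["publish_time"], utc=True),
)

# Whole days between the publication date and the trending snapshot.
publish_day = parsed["publish_time"].dt.tz_localize(None).dt.normalize()
parsed["days_to_trend"] = (parsed["trending_date"] - publish_day).dt.days

print(parsed.dtypes)
print(parsed)
```

Applied to the full frame, the same two `pd.to_datetime` calls would replace the string columns in `us` with proper datetime columns.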

#### Data Dictionary


Variable | Description | Type
---| ---| ---
video_id| --- | categorical
trending_date | the date the video was collected, formatted YY.DD.MM (e.g. 17.14.11 is 14 November 2017) | continuous 
title | --- | string
channel_title | --- | string
category_id | JSON file look-up needed | categorical
publish_time | the time at which the video was published on YouTube | continuous
tags | tags separated by the pipe character (`\|`); `[none]` means the video has no tags | categorical
views | --- | integer
likes | --- | integer
dislikes | --- | integer
comment_count | --- | integer
thumbnail_link | --- | string
comments_disabled | --- | bool
ratings_disabled | --- | bool
video_error_or_removed | --- | bool
description | --- | string
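
Two entries in the dictionary need preprocessing before they are usable: `tags` is a single pipe-delimited string, and `category_id` has to be resolved through a JSON look-up. The sketch below shows one possible approach to both. The JSON layout (an `items` list whose entries carry `id` and `snippet.title`) and the two category titles are assumptions for illustration; the actual look-up file should be checked before relying on them. The helper `split_tags` and the new column names are also hypothetical, and the second tag string is shortened from the truncated `us.head()` output.

```python
import json

import pandas as pd


def split_tags(raw):
    """Turn a pipe-delimited tag string into a list; '[none]' means no tags."""
    if raw == "[none]":
        return []
    # Tags after the first are wrapped in double quotes in the raw data.
    return [tag.strip('"') for tag in raw.split("|")]


# ASSUMED structure and titles for the category look-up file (illustrative only).
category_json = json.loads("""
{"items": [
    {"id": "22", "snippet": {"title": "People & Blogs"}},
    {"id": "24", "snippet": {"title": "Entertainment"}}
]}
""")
id_to_name = {int(item["id"]): item["snippet"]["title"]
              for item in category_json["items"]}

sample = pd.DataFrame({
    "category_id": [22, 24, 24],
    "tags": ["SHANtell martin", 'ryan|"higa"|"higatv"', "[none]"],
})
sample["tag_list"] = sample["tags"].apply(split_tags)
sample["category"] = sample["category_id"].map(id_to_name)

print(sample[["category", "tag_list"]])
```

Any `category_id` missing from the look-up maps to `NaN`, which makes unmatched IDs easy to spot.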