### **1. Importing Libraries**

In [1]:
import pandas as pd
import numpy as np

### **2. Importing Data**

In [2]:
df = pd.read_json("datasets/CLEANED_videos_count_data.json")

In [3]:
df = df.T

In [4]:
display(df)

Unnamed: 0,id,title,kind,viewCount,likeCount,commentCount,date,time,artist
0,YudHcBIxlYw,JISOO - ‘꽃(FLOWER)’ M\/V,Music Video,390657394,10117824,1340919,2023-09-19,17:01:02.392672,JISOO
1,POe9SOEKotk,BLACKPINK - ‘Shut Down’ M\/V,Music Video,522252772,10824498,2239017,2023-09-19,17:01:02.392707,BLACKPINK
2,gQlMMD8auMs,BLACKPINK - ‘Pink Venom’ M\/V,Music Video,731994719,16069895,3569773,2023-09-19,17:01:02.392717,BLACKPINK
3,awkkyBH2zEo,LISA - 'LALISA' M\/V,Music Video,653808793,17313679,2815724,2023-09-19,17:01:02.392725,LISA
4,K9_VFxzCuQ0,ROSÉ - 'Gone' M\/V,Music Video,267505076,7376149,1240229,2023-09-19,17:01:02.392734,ROSÉ
...,...,...,...,...,...,...,...,...,...
74685,EKHdMwRaU60,BLACKPINK​ - '붐바야(BOOMBAYAH)' 0828 SBS Inkigayo,Performance,61585167,832100,23060,2025-08-11,02:01:23.405731,BLACKPINK
74686,metZ_f8aqC0,BLACKPINK​ - '휘파람(WHISTLE)' 0821 SBS Inkigayo,Performance,45451628,925239,27048,2025-08-11,02:01:23.405739,BLACKPINK
74687,RGmL76BBGZk,BLACKPINK​ - '붐바야(BOOMBAYAH)' 0821 SBS Inkigay...,Performance,17764522,555087,15245,2025-08-11,02:01:23.405743,BLACKPINK
74688,vAqAp1tJnkc,BLACKPINK - '휘파람’(WHISTLE) 0814 SBS Inkigayo,Performance,49396180,873371,29515,2025-08-11,02:01:23.405745,BLACKPINK


### **3. Data Information**

#### **Data dimensions**: 


In [5]:
df.shape

(74690, 9)

#### **Details for each row:**

The information of one video, including id, title, artist, views, likes,...

In [6]:
df.head(1)

Unnamed: 0,id,title,kind,viewCount,likeCount,commentCount,date,time,artist
0,YudHcBIxlYw,JISOO - ‘꽃(FLOWER)’ M\/V,Music Video,390657394,10117824,1340919,2023-09-19,17:01:02.392672,JISOO


#### **Details for each column:**

| Column       | Description                                                                                     |
|--------------|-------------------------------------------------------------------------------------------------|
| id           | The unique identifier for the video on YouTube.                                                 |
| title        | The title of the video.                                                                         |
| kind         | The type or category of the video content.                                                      |
| viewCount    | The total number of views the video has received on YouTube.                                     |
| likeCount    | The total number of likes the video has received on YouTube.                                     |
| commentCount | The total number of comments posted on the video on YouTube.                                     |
| date         | The date when the data was collected or the statistics were obtained.                            |
| time         | The time when the data was collected or the statistics were obtained (if applicable).            |
| artist       | The artist or creator associated with the video content.                                         |


#### **Data types of each column:**

In [7]:
df.dtypes

id              object
title           object
kind            object
viewCount       object
likeCount       object
commentCount    object
date            object
time            object
artist          object
dtype: object

Convert the `viewCount`, `likeCount`, `commentCount` to int:

In [8]:
df['viewCount'] = df['viewCount'].astype(np.int64)
df['likeCount'] = df['likeCount'].astype(np.int64)
df['commentCount'] = df['commentCount'].astype(np.int64)

Convert the `date` column to datetime type:

In [9]:
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

In [10]:
#df['time'] = pd.to_datetime(df['time'], format='%H:%M:%S.%f').dt.time

#### **Number of missing values in each column:**

In [11]:
df.isnull().sum()

id              0
title           0
kind            0
viewCount       0
likeCount       0
commentCount    0
date            0
time            0
artist          0
dtype: int64

#### **Investigating missing data days**

In [12]:
all_dates = pd.date_range(start=df['date'].min(), end=df['date'].max(), freq='D')
missing_dates = all_dates[~all_dates.isin(df['date'])]
print("Missing dates:")
if len(missing_dates) > 0:
    print(missing_dates)

Missing dates:
DatetimeIndex(['2024-06-17', '2024-06-18', '2024-06-19', '2024-06-20',
               '2024-06-21', '2024-06-22', '2024-06-23', '2024-06-24',
               '2024-06-25', '2024-06-26', '2024-06-27', '2024-06-28',
               '2024-06-29', '2024-06-30', '2024-07-01', '2024-07-02',
               '2024-07-03', '2024-07-04', '2024-07-05', '2024-07-06',
               '2024-07-07', '2024-07-08', '2024-07-09', '2024-07-10',
               '2024-07-11', '2024-07-12', '2024-07-13', '2024-07-14',
               '2024-07-15', '2024-07-16', '2024-07-17', '2024-07-18',
               '2024-07-19', '2024-07-20', '2024-07-21', '2024-07-22',
               '2024-07-23', '2024-07-24', '2024-07-25', '2024-07-26',
               '2024-07-27', '2024-07-28', '2024-07-29', '2024-07-30',
               '2024-07-31', '2024-08-01', '2024-08-02', '2024-08-03',
               '2024-08-04', '2024-08-05', '2024-08-06', '2024-08-07',
               '2024-08-08', '2024-08-09', '2024-08-10', '2024

#### **Date of First Data Collection? Date of Last Data Collection?**

In [13]:
print("Date of First Data Collection:", df['date'].dt.date.min())
print("Date of Last Data Collection:", df['date'].dt.date.max())

Date of First Data Collection: 2023-09-19
Date of Last Data Collection: 2025-08-11


#### **The number of distinct videos in the dataset:**

In [14]:
df.id.nunique()

124

#### **Number of videos for each video types:**

In [15]:
kind_video_counts = df.drop_duplicates(subset=['kind', 'id']).groupby('kind').size().reset_index(name='count').sort_values(by='count',ascending=False)
kind_video_counts['percentage'] = round(kind_video_counts['count'] / kind_video_counts['count'].sum() * 100,2)
display(kind_video_counts)

Unnamed: 0,kind,count,percentage
2,Performance,88,69.84
0,Dance Practice Video,20,15.87
1,Music Video,18,14.29


#### **Number of videos for each artist:**

In [16]:
artist_video_counts = df.drop_duplicates(subset=['artist', 'id']).groupby('artist').size().reset_index(name='count').sort_values(by='count',ascending=False)
artist_video_counts['percentage'] = round(artist_video_counts['count'] / artist_video_counts['count'].sum() * 100,2)
display(artist_video_counts)

Unnamed: 0,artist,count,percentage
0,BLACKPINK,97,78.23
1,JENNIE,11,8.87
4,ROSÉ,8,6.45
3,LISA,6,4.84
2,JISOO,2,1.61


In [17]:
artist_video_counts_2 = df.drop_duplicates(subset=['artist','id','kind']).groupby(['artist','kind']).size().reset_index(name='count').sort_values(by=['count'],ascending=False).sort_values(by=['artist'])
display(artist_video_counts_2)

Unnamed: 0,artist,kind,count
2,BLACKPINK,Performance,72
1,BLACKPINK,Music Video,13
0,BLACKPINK,Dance Practice Video,12
5,JENNIE,Performance,6
3,JENNIE,Dance Practice Video,4
4,JENNIE,Music Video,1
6,JISOO,Dance Practice Video,1
7,JISOO,Music Video,1
8,JISOO,Performance,1
11,LISA,Performance,3


#### **Most Viewed Video by Each Video types:**

In [18]:
most_viewed_videos = df.loc[df.groupby('kind')['viewCount'].idxmax()].reset_index()
most_viewed_videos['link'] = f'https://www.youtube.com/watch?v=' + most_viewed_videos['id']
print("Most Viewed Video by Each Video types:")
display(most_viewed_videos[['kind', 'title', 'viewCount','link']])

Most Viewed Video by Each Video types:


Unnamed: 0,kind,title,viewCount,link
0,Dance Practice Video,BLACKPINK - 'How You Like That' DANCE PERFORMA...,1885077047,https://www.youtube.com/watch?v=32si5cfrCNc
1,Music Video,BLACKPINK - ‘뚜두뚜두 (DDU-DU DDU-DU)’ M\/V,2326677998,https://www.youtube.com/watch?v=IHNzOHi8sJs
2,Performance,LISA - 'MONEY' EXCLUSIVE PERFORMANCE VIDEO,1143648763,https://www.youtube.com/watch?v=dNCWe_6HAM8


#### **Most Viewed Video by Each Artist**

In [19]:
most_viewed_videos = df.loc[df.groupby('artist')['viewCount'].idxmax()].reset_index()
most_viewed_videos['link'] = f'https://www.youtube.com/watch?v=' + most_viewed_videos['id']
print("Most Viewed Video by Each Artist:")
display(most_viewed_videos[['artist', 'title', 'viewCount','link']])

Most Viewed Video by Each Artist:


Unnamed: 0,artist,title,viewCount,link
0,BLACKPINK,BLACKPINK - ‘뚜두뚜두 (DDU-DU DDU-DU)’ M\/V,2326677998,https://www.youtube.com/watch?v=IHNzOHi8sJs
1,JENNIE,JENNIE - 'SOLO' M\/V,1061413953,https://www.youtube.com/watch?v=b73BI9eUkjM
2,JISOO,JISOO - ‘꽃(FLOWER)’ M\/V,610381561,https://www.youtube.com/watch?v=YudHcBIxlYw
3,LISA,LISA - 'MONEY' EXCLUSIVE PERFORMANCE VIDEO,1143648763,https://www.youtube.com/watch?v=dNCWe_6HAM8
4,ROSÉ,ROSÉ - 'On The Ground' M\/V,384003957,https://www.youtube.com/watch?v=CKZvWhCqx1s


#### **Top 10 Videos with the Highest Number of Views**

In [20]:
latest_view_per_video = df.drop_duplicates(subset='id', keep='last')
top_10_viewed_videos = latest_view_per_video.nlargest(10, 'viewCount').reset_index()
top_10_viewed_videos['link'] = f'https://www.youtube.com/watch?v=' + top_10_viewed_videos['id']
print("Top 10 Videos with the Highest Number of Views:")
display(top_10_viewed_videos[['title', 'kind','artist','viewCount','link']])

Top 10 Videos with the Highest Number of Views:


Unnamed: 0,title,kind,artist,viewCount,link
0,BLACKPINK - ‘뚜두뚜두 (DDU-DU DDU-DU)’ M\/V,Music Video,BLACKPINK,2326677998,https://www.youtube.com/watch?v=IHNzOHi8sJs
1,BLACKPINK - 'Kill This Love' M\/V,Music Video,BLACKPINK,2129468713,https://www.youtube.com/watch?v=2S24-y0Ij3Y
2,BLACKPINK - 'How You Like That' DANCE PERFORMA...,Dance Practice Video,BLACKPINK,1885077047,https://www.youtube.com/watch?v=32si5cfrCNc
3,BLACKPINK - '붐바야 (BOOMBAYAH)' M\/V,Music Video,BLACKPINK,1809014806,https://www.youtube.com/watch?v=bwmSjveL3Lc
4,BLACKPINK - '마지막처럼 (AS IF IT'S YOUR LAST)' M\/V,Music Video,BLACKPINK,1466741243,https://www.youtube.com/watch?v=Amq-qlqbjYA
5,BLACKPINK - 'How You Like That' M\/V,Music Video,BLACKPINK,1341376066,https://www.youtube.com/watch?v=ioNng23DkIM
6,LISA - 'MONEY' EXCLUSIVE PERFORMANCE VIDEO,Performance,LISA,1143648763,https://www.youtube.com/watch?v=dNCWe_6HAM8
7,JENNIE - 'SOLO' M\/V,Music Video,JENNIE,1061413953,https://www.youtube.com/watch?v=b73BI9eUkjM
8,BLACKPINK - ‘Pink Venom’ M\/V,Music Video,BLACKPINK,1003241790,https://www.youtube.com/watch?v=gQlMMD8auMs
9,BLACKPINK - 'Ice Cream (with Selena Gomez)' M\/V,Music Video,BLACKPINK,972986403,https://www.youtube.com/watch?v=vRXZj0DzXIA


#### **Top 10 Videos with the Highest Number of Likes**

In [21]:
latest_like_per_video = df.drop_duplicates(subset='id', keep='last')
top_10_liked_videos = latest_like_per_video.nlargest(10, 'likeCount').reset_index()
top_10_liked_videos['link'] = f'https://www.youtube.com/watch?v=' + top_10_viewed_videos['id']
print("Top 10 Videos with the Highest Number of Likes:")
display(top_10_liked_videos[['title', 'kind','artist','likeCount','link']])

Top 10 Videos with the Highest Number of Likes:


Unnamed: 0,title,kind,artist,likeCount,link
0,BLACKPINK - 'Kill This Love' M\/V,Music Video,BLACKPINK,26260032,https://www.youtube.com/watch?v=IHNzOHi8sJs
1,BLACKPINK - 'How You Like That' M\/V,Music Video,BLACKPINK,25170280,https://www.youtube.com/watch?v=2S24-y0Ij3Y
2,BLACKPINK - ‘뚜두뚜두 (DDU-DU DDU-DU)’ M\/V,Music Video,BLACKPINK,24492632,https://www.youtube.com/watch?v=32si5cfrCNc
3,BLACKPINK - 'Ice Cream (with Selena Gomez)' M\/V,Music Video,BLACKPINK,20395478,https://www.youtube.com/watch?v=bwmSjveL3Lc
4,BLACKPINK - 'How You Like That' DANCE PERFORMA...,Dance Practice Video,BLACKPINK,19538655,https://www.youtube.com/watch?v=Amq-qlqbjYA
5,LISA - 'LALISA' M\/V,Music Video,LISA,17671402,https://www.youtube.com/watch?v=ioNng23DkIM
6,BLACKPINK - ‘Pink Venom’ M\/V,Music Video,BLACKPINK,17237715,https://www.youtube.com/watch?v=dNCWe_6HAM8
7,BLACKPINK - '붐바야 (BOOMBAYAH)' M\/V,Music Video,BLACKPINK,17115534,https://www.youtube.com/watch?v=b73BI9eUkjM
8,BLACKPINK - 'Lovesick Girls' M\/V,Music Video,BLACKPINK,16007009,https://www.youtube.com/watch?v=gQlMMD8auMs
9,LISA - 'MONEY' EXCLUSIVE PERFORMANCE VIDEO,Performance,LISA,15202118,https://www.youtube.com/watch?v=vRXZj0DzXIA


#### **Top 10 Videos with the Highest Number of Comments**

In [22]:
latest_cmt_per_video = df.drop_duplicates(subset='id', keep='last')
top_10_cmt_videos = latest_cmt_per_video.nlargest(10, 'commentCount').reset_index()
top_10_cmt_videos['link'] = f'https://www.youtube.com/watch?v=' + top_10_viewed_videos['id']
print("Top 10 Videos with the Highest Number of Comments:")
display(top_10_cmt_videos[['title', 'kind','artist','commentCount','link']])

Top 10 Videos with the Highest Number of Comments:


Unnamed: 0,title,kind,artist,commentCount,link
0,BLACKPINK - 'How You Like That' M\/V,Music Video,BLACKPINK,5012255,https://www.youtube.com/watch?v=IHNzOHi8sJs
1,BLACKPINK - ‘Pink Venom’ M\/V,Music Video,BLACKPINK,3486480,https://www.youtube.com/watch?v=2S24-y0Ij3Y
2,JENNIE - 'SOLO' M\/V,Music Video,JENNIE,3347562,https://www.youtube.com/watch?v=32si5cfrCNc
3,BLACKPINK - ‘뚜두뚜두 (DDU-DU DDU-DU)’ M\/V,Music Video,BLACKPINK,3262939,https://www.youtube.com/watch?v=bwmSjveL3Lc
4,BLACKPINK - 'Ice Cream (with Selena Gomez)' M\/V,Music Video,BLACKPINK,3011511,https://www.youtube.com/watch?v=Amq-qlqbjYA
5,LISA - 'LALISA' M\/V,Music Video,LISA,2785944,https://www.youtube.com/watch?v=ioNng23DkIM
6,ROSÉ - 'On The Ground' M\/V,Music Video,ROSÉ,2573427,https://www.youtube.com/watch?v=dNCWe_6HAM8
7,BLACKPINK - 'Kill This Love' M\/V,Music Video,BLACKPINK,2502614,https://www.youtube.com/watch?v=b73BI9eUkjM
8,BLACKPINK - ‘Shut Down’ M\/V,Music Video,BLACKPINK,2092525,https://www.youtube.com/watch?v=gQlMMD8auMs
9,BLACKPINK - 'Lovesick Girls' M\/V,Music Video,BLACKPINK,2024528,https://www.youtube.com/watch?v=vRXZj0DzXIA


#### **Views Growth in 24 Hours:**

In [23]:
latest_date_data = df[df['date'] == df['date'].max()]
previous_date_data = df[df['date'] == (df['date'].max() - pd.Timedelta(days=1))]
merged_data = pd.merge(latest_date_data, previous_date_data, on=['title'], suffixes=('_latest', '_previous'), how='inner')
merged_data['viewCount_increase'] = merged_data['viewCount_latest'] - merged_data['viewCount_previous']

top_10_increase_videos = merged_data.nlargest(10, 'viewCount_increase').reset_index()
top_10_increase_videos['link'] = f'https://www.youtube.com/watch?v=' + top_10_increase_videos['id_latest']
print("Top 10 Videos with the Highest Increase in Views from the Previous Day:")
display(top_10_increase_videos[['title','date_latest','viewCount_increase','viewCount_latest','link']])

Top 10 Videos with the Highest Increase in Views from the Previous Day:


Unnamed: 0,title,date_latest,viewCount_increase,viewCount_latest,link
0,BLACKPINK - ‘뛰어(JUMP)’ M\/V,2025-08-11,2204407,140855809,https://www.youtube.com/watch?v=CgCVZdcKcqY
1,BLACKPINK - 'How You Like That' DANCE PERFORMA...,2025-08-11,469131,1885077047,https://www.youtube.com/watch?v=32si5cfrCNc
2,BLACKPINK - 'Kill This Love' M\/V,2025-08-11,388658,2129468713,https://www.youtube.com/watch?v=2S24-y0Ij3Y
3,BLACKPINK - ‘Pink Venom’ M\/V,2025-08-11,382966,1003241790,https://www.youtube.com/watch?v=gQlMMD8auMs
4,BLACKPINK - '뛰어(JUMP)' Live at WORLD TOUR [DEA...,2025-08-11,375693,8810996,https://www.youtube.com/watch?v=P169hsXjYQs
5,BLACKPINK - '붐바야 (BOOMBAYAH)' M\/V,2025-08-11,278885,1809014806,https://www.youtube.com/watch?v=bwmSjveL3Lc
6,LISA - 'MONEY' EXCLUSIVE PERFORMANCE VIDEO,2025-08-11,277321,1143648763,https://www.youtube.com/watch?v=dNCWe_6HAM8
7,BLACKPINK - ‘뚜두뚜두 (DDU-DU DDU-DU)’ M\/V,2025-08-11,271405,2326677998,https://www.youtube.com/watch?v=IHNzOHi8sJs
8,BLACKPINK - ‘Shut Down’ M\/V,2025-08-11,248267,750768395,https://www.youtube.com/watch?v=POe9SOEKotk
9,BLACKPINK - '마지막처럼 (AS IF IT'S YOUR LAST)' M\/V,2025-08-11,204259,1466741243,https://www.youtube.com/watch?v=Amq-qlqbjYA
