## THE OFFICE (US)

- Import the required libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px

- Reading the data

In [3]:
office_df = pd.read_csv("the_office_series.csv")

In [4]:
office_df.head(5)

Unnamed: 0.1,Unnamed: 0,Season,EpisodeTitle,About,Ratings,Votes,Viewership,Duration,Date,GuestStars,Director,Writers
0,0,1,Pilot,The premiere episode introduces the boss and s...,7.5,4936,11.2,23,24 March 2005,,Ken Kwapis,Ricky Gervais |Stephen Merchant and Greg Daniels
1,1,1,Diversity Day,Michael's off color remark puts a sensitivity ...,8.3,4801,6.0,23,29 March 2005,,Ken Kwapis,B. J. Novak
2,2,1,Health Care,Michael leaves Dwight in charge of picking the...,7.8,4024,5.8,22,5 April 2005,,Ken Whittingham,Paul Lieberstein
3,3,1,The Alliance,"Just for a laugh, Jim agrees to an alliance wi...",8.1,3915,5.4,23,12 April 2005,,Bryan Gordon,Michael Schur
4,4,1,Basketball,Michael and his staff challenge the warehouse ...,8.4,4294,5.0,23,19 April 2005,,Greg Daniels,Greg Daniels
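
The `Date` values above are day-month-year strings such as `24 March 2005`, so pandas reads them as plain text. A minimal sketch of converting them to real timestamps follows; the three-row `sample` frame is a hypothetical stand-in for `office_df`, and the format string assumes every row follows the pattern seen in these first rows.

```python
import pandas as pd

# Hypothetical stand-in for office_df, using dates copied from the rows above
sample = pd.DataFrame({"Date": ["24 March 2005", "29 March 2005", "5 April 2005"]})

# %d = day of month, %B = full month name, %Y = four-digit year
sample["Date"] = pd.to_datetime(sample["Date"], format="%d %B %Y")

# With a datetime column, parts of the date become directly accessible
print(sample["Date"].dt.year.tolist())
print(sample["Date"].dt.month_name().tolist())
```

Once converted, the column could serve as a time axis in the later plots instead of the row index.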


- Check for null values

In [6]:
office_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 188 entries, 0 to 187
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Unnamed: 0    188 non-null    int64  
 1   Season        188 non-null    int64  
 2   EpisodeTitle  188 non-null    object 
 3   About         188 non-null    object 
 4   Ratings       188 non-null    float64
 5   Votes         188 non-null    int64  
 6   Viewership    188 non-null    float64
 7   Duration      188 non-null    int64  
 8   Date          188 non-null    object 
 9   GuestStars    29 non-null     object 
 10  Director      188 non-null    object 
 11  Writers       188 non-null    object 
dtypes: float64(2), int64(4), object(6)
memory usage: 13.3+ KB


*Only the GuestStars column has missing values: just 29 of its 188 entries are non-null.*
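
The non-null counts reported by `info()` can also be turned into per-column missing counts directly. The sketch below uses a hypothetical three-row `sample` frame (guest-star values copied from rows shown in this notebook) rather than the full file; on the full data, the same call should report 188 - 29 = 159 missing `GuestStars` entries.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for office_df: two episodes without guest stars, one with
sample = pd.DataFrame({
    "EpisodeTitle": ["Pilot", "Livin' the Dream", "A.A.R.M."],
    "GuestStars": [np.nan, "Michael Imperioli", np.nan],
})

# isna() marks each missing cell as True; sum() counts them per column
missing = sample.isna().sum()
print(missing)
```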

- Summary statistics for the numerical columns

In [7]:
office_df.describe()

Unnamed: 0.1,Unnamed: 0,Season,Ratings,Votes,Viewership,Duration
count,188.0,188.0,188.0,188.0,188.0,188.0
mean,93.5,5.468085,8.237234,2838.228723,7.24633,27.053191
std,54.415071,2.386245,0.58993,1063.16529,2.066012,6.937254
min,0.0,1.0,6.6,1832.0,3.25,19.0
25%,46.75,3.0,7.8,2187.75,5.99,22.0
50%,93.5,6.0,8.2,2614.0,7.535,23.0
75%,140.25,7.25,8.6,3144.25,8.425,30.0
max,187.0,9.0,9.8,10515.0,22.91,60.0
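
The `Unnamed: 0` column is just a copy of the row index, so its mean and quartiles carry no information about the episodes. A minimal sketch of excluding it before summarising is shown here; the `sample` frame is a hypothetical stand-in built from the first five rows displayed earlier.

```python
import pandas as pd

# Hypothetical stand-in for office_df (values from the first five rows)
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4],
    "Ratings": [7.5, 8.3, 7.8, 8.1, 8.4],
    "Votes": [4936, 4801, 4024, 3915, 4294],
})

# Drop the index copy, then describe only the meaningful numeric columns
summary = sample.drop(columns="Unnamed: 0").describe()
print(summary.loc[["mean", "50%"]])
```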


- Group by season and count the episodes in each season

In [11]:
season = office_df.groupby("Season")[["Season"]].count()
season

Season,Season (count)
1,6
2,22
3,23
4,14
5,26
6,26
7,24
8,24
9,23
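
Counting the grouping column against itself works, but it yields a column that shares its name with the index. A possible alternative is `groupby(...).size()`, sketched below on a hypothetical `sample` frame; the name `Episodes` is an illustrative choice, not something the notebook defines.

```python
import pandas as pd

# Hypothetical stand-in for office_df with only the Season column
sample = pd.DataFrame({"Season": [1, 1, 1, 2, 2, 9]})

# size() counts rows per group; rename() gives the result a descriptive name
episodes_per_season = sample.groupby("Season").size().rename("Episodes")
print(episodes_per_season)
```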


- Histogram of Ratings

In [12]:
px.histogram(office_df['Ratings'])

- Scatter plot for Votes

In [13]:
px.scatter(office_df['Votes'])

In [15]:
# Analyse the ratings: highest and lowest episode rating

max_rating = office_df['Ratings'].max()
min_rating = office_df['Ratings'].min()

print("The maximum rating that an episode got is:", max_rating)
print("The minimum rating that an episode got is:", min_rating)

The maximum rating that an episode got is: 9.8
The minimum rating that an episode got is: 6.6
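
The extremes alone do not say which episodes earned them. One way to list the top-rated episodes is `nlargest`, sketched here on a hypothetical `sample` frame built from five rows that appear elsewhere in this notebook; it is not a ranking of the full dataset.

```python
import pandas as pd

# Hypothetical stand-in for office_df (titles and ratings from rows shown in the notebook)
sample = pd.DataFrame({
    "EpisodeTitle": ["Pilot", "Stress Relief", "Stairmageddon", "A.A.R.M.", "Finale"],
    "Ratings": [7.5, 9.7, 8.0, 9.5, 9.8],
})

# nlargest returns the n rows with the highest values in the given column
top = sample.nlargest(3, "Ratings")[["EpisodeTitle", "Ratings"]]
print(top)
```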


In [16]:
# Min-max normalize the ratings so the lowest (6.6) maps to 0 and the highest (9.8) maps to 1

normal_ratings = (office_df['Ratings'] - min_rating)/(max_rating-min_rating)

office_df['norm_rating'] = normal_ratings

office_df.head(5)

Unnamed: 0.1,Unnamed: 0,Season,EpisodeTitle,About,Ratings,Votes,Viewership,Duration,Date,GuestStars,Director,Writers,norm_rating
0,0,1,Pilot,The premiere episode introduces the boss and s...,7.5,4936,11.2,23,24 March 2005,,Ken Kwapis,Ricky Gervais |Stephen Merchant and Greg Daniels,0.28125
1,1,1,Diversity Day,Michael's off color remark puts a sensitivity ...,8.3,4801,6.0,23,29 March 2005,,Ken Kwapis,B. J. Novak,0.53125
2,2,1,Health Care,Michael leaves Dwight in charge of picking the...,7.8,4024,5.8,22,5 April 2005,,Ken Whittingham,Paul Lieberstein,0.375
3,3,1,The Alliance,"Just for a laugh, Jim agrees to an alliance wi...",8.1,3915,5.4,23,12 April 2005,,Bryan Gordon,Michael Schur,0.46875
4,4,1,Basketball,Michael and his staff challenge the warehouse ...,8.4,4294,5.0,23,19 April 2005,,Greg Daniels,Greg Daniels,0.5625
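
The `norm_rating` values follow the min-max formula `(rating - min) / (max - min)`. As a quick sanity check, the sketch below reproduces the Pilot's value by hand: (7.5 - 6.6) / (9.8 - 6.6) = 0.9 / 3.2 = 0.28125. The helper name `min_max` is illustrative and not part of the notebook.

```python
import math

def min_max(value, lo, hi):
    """Rescale value linearly so that lo maps to 0 and hi maps to 1."""
    return (value - lo) / (hi - lo)

# Minimum and maximum ratings as printed earlier in the notebook
lo, hi = 6.6, 9.8

pilot = min_max(7.5, lo, hi)    # Pilot has a rating of 7.5
finale = min_max(9.8, lo, hi)   # Finale has the maximum rating

print(pilot, finale)
```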


In [17]:
# Map each normalized rating to a colour band:
# [0, 0.25) red, [0.25, 0.5) orange, [0.5, 0.75) yellow, [0.75, 1] green
conditions = [
    office_df['norm_rating'] < 0.25,
    office_df['norm_rating'] < 0.5,
    office_df['norm_rating'] < 0.75,
]
choices = ['red', 'orange', 'yellow']

# np.select takes the first matching condition per row; unmatched rows get 'green'
office_df['colors'] = np.select(conditions, choices, default='green')
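
The same thresholds can also be expressed as bins with `pd.cut`. The sketch below is one possible formulation, not the notebook's own method; the `norm` series is a hypothetical sample of normalized ratings, and `right=False` makes each bin include its lower edge so the bands match the `<` comparisons used above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of normalized ratings
norm = pd.Series([0.0, 0.28125, 0.4375, 0.53125, 0.90625, 1.0])

# Open-ended outer edges ensure 0.0 and 1.0 both fall inside a bin
bins = [-np.inf, 0.25, 0.5, 0.75, np.inf]
labels = ["red", "orange", "yellow", "green"]

# right=False gives intervals of the form [lower, upper)
colors = pd.cut(norm, bins=bins, labels=labels, right=False).astype(str)
print(colors.tolist())
```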

In [18]:
office_df.tail(5)

Unnamed: 0.1,Unnamed: 0,Season,EpisodeTitle,About,Ratings,Votes,Viewership,Duration,Date,GuestStars,Director,Writers,norm_rating,colors
183,183,9,Stairmageddon,Dwight shoots Stanley with a bull tranquilizer...,8.0,1985,3.83,22,11 April 2013,,Matt Sohn,Dan Sterling,0.4375,orange
184,184,9,Paper Airplane,The employees hold a paper airplane competitio...,8.0,2007,3.25,22,25 April 2013,,Jesse Peretz,Halsted Sullivan | Warren Lieberstein,0.4375,orange
185,185,9,Livin' the Dream,Dwight becomes regional manager after Andy qui...,9.0,2831,3.51,42,2 May 2013,Michael Imperioli,Jeffrey Blitz,Niki Schwartz-Wright,0.75,yellow
186,186,9,A.A.R.M.,Dwight prepares for a marriage proposal and hi...,9.5,3914,4.56,43,9 May 2013,,David Rogers,Brent Forrester,0.90625,green
187,187,9,Finale,"One year later, Dunder Mifflin employees past ...",9.8,10515,5.69,51,16 May 2013,"Joan Cusack, Ed Begley Jr, Rachel Harris, Nanc...",Ken Kwapis,Greg Daniels,1.0,green
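
Row 185 displays a `norm_rating` of 0.75 yet is labelled yellow, the band for values below 0.75. This suggests the stored value sits marginally under 0.75 because 6.6 and 9.8 have no exact binary floating-point representation. The sketch below reproduces the arithmetic in plain Python; rounding before comparing is one possible remedy, offered as an assumption about the intended banding rather than the author's choice.

```python
import math

# Minimum and maximum ratings as printed earlier in the notebook
lo, hi = 6.6, 9.8

# Normalized rating for an episode rated 9.0 (as in row 185)
norm = (9.0 - lo) / (hi - lo)

print(norm < 0.75)                # the comparison used for the yellow band
print(math.isclose(norm, 0.75))   # equal to 0.75 up to rounding error

# Rounding to a few decimals before thresholding avoids the boundary effect
print(round(norm, 6) >= 0.75)
```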


In [19]:
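# Viewership by episode index; color_discrete_map='identity' draws each point
# in the colour named in the 'colors' column instead of Plotly's default palette
px.scatter(office_df, x='Unnamed: 0', y='Viewership', color='colors', color_discrete_map='identity')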

In [20]:
office_df.loc[77]

Unnamed: 0                                                     77
Season                                                          5
EpisodeTitle                                        Stress Relief
About           Dwight's too-realistic fire alarm gives Stanle...
Ratings                                                       9.7
Votes                                                        8170
Viewership                                                  22.91
Duration                                                       60
Date                                              1 February 2009
GuestStars              Cloris Leachman, Jack Black, Jessica Alba
Director                                            Jeffrey Blitz
Writers                                          Paul Lieberstein
norm_rating                                               0.96875
colors                                                      green
Name: 77, dtype: object
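
Row 77 has a viewership of 22.91, which matches the maximum in the `describe()` output, so it appears to have been looked up as the outlier in the scatter plot. Instead of hard-coding the label, the row could be located with `idxmax`, as in this sketch; the `sample` frame is a hypothetical stand-in using three rows from this notebook.

```python
import pandas as pd

# Hypothetical stand-in for office_df, keeping the original row labels
sample = pd.DataFrame(
    {
        "EpisodeTitle": ["Pilot", "Stress Relief", "Finale"],
        "Viewership": [11.2, 22.91, 5.69],
    },
    index=[0, 77, 187],
)

# idxmax returns the index label of the row with the largest value
top_label = sample["Viewership"].idxmax()
print(sample.loc[top_label])
```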