# Grammys Project


## Data Dictionary
This project uses two data files: `grammys_live_web_analytics.csv` and `ra_live_web_analytics.csv`.

Both files contain the following columns:

- **date** - The date the data was confirmed. It is in `yyyy-mm-dd` format.
- **visitors** - The number of users who went on the website on that day.
- **pageviews** - The number of pages that all users viewed on the website.
- **sessions** - The total number of sessions on the website. A session is a group of user interactions with the website that take place within a given time frame. For example, a single session can contain multiple page views, events, and social interactions.
- **bounced_sessions** - The total number of bounced sessions on the website. A bounced session is one where a visitor comes to the website and leaves without interacting with any pages or links.
- **avg_session_duration_secs** - The average session duration, in seconds, across all users who came to the website that day.
- **awards_week** - A binary flag indicating whether the date falls within the marketing campaigns run before and after the Grammy Awards ceremony. This is the big marketing push to get as many people as possible watching the event.
- **awards_night** - A binary flag indicating whether the date is the night the Grammy Awards ceremony was held.

# Part I - Exploratory Data Analysis


In [None]:
# Import libraries
import pandas as pd
import numpy as np
import plotly.express as px

In [None]:
# this formats numbers to two decimal places when shown in pandas
pd.set_option('display.float_format', lambda x: '%.2f' % x)

In [None]:
# Read in dataframes
full_df = pd.read_csv('datasets/grammy_live_web_analytics.csv')
rec_academy = pd.read_csv('datasets/ra_live_web_analytics.csv')

In [None]:
# preview full_df dataframe
full_df.sample(5)

Unnamed: 0,date,visitors,pageviews,sessions,bounced_sessions,avg_session_duration_secs,awards_week,awards_night
2288,2023-04-08,17143,31697,18081,9419,73,0,0
2,2017-01-03,11425,27062,12215,7569,92,0,0
44,2017-02-14,213939,526079,229644,149522,102,1,0
1951,2022-05-06,17715,39910,19867,10997,80,0,0
697,2018-11-29,14385,22312,15066,8439,72,0,0


In [None]:
# preview rec_academy dataframe
rec_academy.sample(5)

Unnamed: 0,date,visitors,pageviews,sessions,bounced_sessions,avg_session_duration_secs,awards_week,awards_night
281,2022-11-09,1233,4585,1479,36,93,0,0
101,2022-05-13,1100,2713,1246,814,92,0,0
165,2022-07-16,682,1541,816,580,131,0,0
246,2022-10-05,1601,6688,2013,57,126,0,0
48,2022-03-21,1508,3417,1682,1080,84,0,0


In [None]:
# Plot a line chart of the visitors on the site.
px.line(full_df, x = 'date', y = 'visitors')

In [None]:
full_df.groupby('awards_night').agg({'visitors':'mean'})

Unnamed: 0_level_0,visitors
awards_night,Unnamed: 1_level_1
0,32388.28
1,1389590.23


In [None]:
# Split the data to separate the full_df into two new dataframes.
# One for before the switch of the websites and one for after
combined_site = full_df[full_df['date'] < "2022-02-01"]
grammys = full_df[full_df['date'] >= "2022-02-01"]

In [None]:
# .copy() makes each slice an independent dataframe, which avoids pandas'
# SettingWithCopyWarning when new columns are added later
combined_site = combined_site.copy()
grammys = grammys.copy()
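
The split above compares `date` as strings. That works here because `yyyy-mm-dd` strings sort in chronological order, but it relies on every value following that format. A minimal sketch of the same split on parsed timestamps, using a few rows copied from the `full_df` preview; the names `toy`, `split_date`, `before`, and `after` are hypothetical and not part of the notebook:

```python
import pandas as pd

# Toy rows copied from the full_df preview above (date and visitors only).
toy = pd.DataFrame({
    "date": ["2017-01-03", "2018-11-29", "2022-05-06", "2023-04-08"],
    "visitors": [11425, 14385, 17715, 17143],
})

# Parse the ISO-formatted strings into real timestamps.
toy["date"] = pd.to_datetime(toy["date"], format="%Y-%m-%d")

# Hypothetical name for the cutoff date used in the split above.
split_date = pd.Timestamp("2022-02-01")

# .copy() returns independent frames, so adding columns later is safe.
before = toy[toy["date"] < split_date].copy()
after = toy[toy["date"] >= split_date].copy()

print(len(before), len(after))
```

With a datetime column, a malformed date raises an error at parse time instead of silently landing on the wrong side of the split.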

In [None]:
# print the shape of the combined_site dataframe
combined_site.shape

(1857, 8)

# Part II - KPIs

In [None]:
# create the list of dataframes
frames = [combined_site,rec_academy,grammys]

In [None]:
# create the `pages_per_session` column for all 3 dataframes.
for frame in frames:
    frame['pages_per_session'] = frame['pageviews'] / frame['sessions']

In [None]:
# calculate and print the mean of the `pages_per_session` column for all 3 dataframes.
mean1 = combined_site['pages_per_session'].mean()
mean2 = rec_academy['pages_per_session'].mean()
mean3 = grammys['pages_per_session'].mean()
print(f'Combined: {mean1:0.2f}, Recording Academy: {mean2:0.2f}, Grammys: {mean3:0.2f}')

Combined: 1.59, Recording Academy: 2.78, Grammys: 2.14
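
The figures above are means of the daily `pages_per_session` values, so a quiet day counts as much as awards night. Dividing total pageviews by total sessions weights every session equally instead, which is how `bounce_rate` is computed later in this notebook. A small sketch of the difference, using a hypothetical two-day frame (the numbers are invented for illustration only):

```python
import pandas as pd

# Hypothetical two-day example: one quiet day and one very busy day.
toy = pd.DataFrame({
    "pageviews": [300, 10000],
    "sessions": [100, 8000],
})

# Mean of the daily ratios: every day counts equally.
daily_ratio = toy["pageviews"] / toy["sessions"]
mean_of_daily = daily_ratio.mean()

# Ratio of the totals: every session counts equally.
ratio_of_totals = toy["pageviews"].sum() / toy["sessions"].sum()

print(f"mean of daily ratios: {mean_of_daily:0.3f}")
print(f"ratio of totals:      {ratio_of_totals:0.3f}")
```

Neither number is wrong; they answer different questions, so it helps to state which one is being reported when comparing sites.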


In [None]:
# combined_site graph
px.line(combined_site,x ='date',y ='pages_per_session')

In [None]:
# grammys graph
px.line(grammys,x ='date',y ='pages_per_session')

In [None]:
# rec_academy graph
px.line(rec_academy,x ='date',y ='pages_per_session')

In [None]:
# Function to calculate bounce rate
def bounce_rate(dataframe):
    '''
    Calculates the bounce rate for visitors on the website.
    input: dataframe with bounced_sessions and sessions columns
    output: numeric value from bounce rate
    '''
    # WRITE YOUR CODE BELOW
    # Remember, the input for the function is called `dataframe`
    # So all calculations should reference that variable.
    # Sum over the whole dataframe; no loop is needed because .sum() is vectorized.
    sum_bounced = dataframe['bounced_sessions'].sum()
    sum_sessions = dataframe['sessions'].sum()
    return 100 * sum_bounced / sum_sessions


In [None]:
# Calculate the Bounce Rate for each site
test1 = bounce_rate(combined_site)
test2 = bounce_rate(grammys)
test3 = bounce_rate(rec_academy)

print(f'The bounce rate of the combined website is: {test1:0.2f}')
print(f'The bounce rate of the grammys website is: {test2:0.2f}')
print(f'The bounce rate of the recording academy website is: {test3:0.2f}')

The bounce rate of the combined website is: 41.58
The bounce rate of the grammys website is: 40.16
The bounce rate of the recording academy website is: 33.67
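
If the goal is to apply the same KPI function to every site in one pass, one option is to pair each dataframe with a label in a dictionary. This is only a sketch: `named_frames`, `site_a`, and `site_b` are hypothetical names, and the toy numbers are invented rather than taken from the datasets.

```python
import pandas as pd

def bounce_rate(dataframe):
    """Return bounced sessions as a percentage of all sessions."""
    return 100 * dataframe["bounced_sessions"].sum() / dataframe["sessions"].sum()

# Hypothetical stand-ins for the real dataframes.
site_a = pd.DataFrame({"sessions": [100, 300], "bounced_sessions": [40, 120]})
site_b = pd.DataFrame({"sessions": [200, 200], "bounced_sessions": [50, 70]})

# Pair each frame with a label so the loop can report which site it belongs to.
named_frames = {"site A": site_a, "site B": site_b}

rates = {name: bounce_rate(frame) for name, frame in named_frames.items()}
for name, rate in rates.items():
    print(f"The bounce rate of {name} is: {rate:0.2f}")
```

A plain list such as `frames` loses the site names, which is why a dictionary is used in this sketch.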


In [None]:
# Calculate the mean of avg_session_duration_secs for each site.
combined_mean = combined_site['avg_session_duration_secs'].mean()
grammys_mean = grammys['avg_session_duration_secs'].mean()
rec_academy_mean = rec_academy['avg_session_duration_secs'].mean()
print(f'The mean of average session duration in seconds for the combined website is: {combined_mean:0.2f}')
print(f'The mean of average session duration in seconds for the grammys website is: {grammys_mean:0.2f}')
print(f'The mean of average session duration in seconds for the recording academy website is: {rec_academy_mean:0.2f}')

The mean of average session duration in seconds for the combined website is: 102.85
The mean of average session duration in seconds for the grammys website is: 82.99
The mean of average session duration in seconds for the recording academy website is: 128.50


# Part III - Demographics



In [None]:
# read in the files
age_grammys = pd.read_csv('datasets/grammys_age_demographics.csv')
age_tra = pd.read_csv('datasets/tra_age_demographics.csv')

In [None]:
# preview the age_grammys file. the age_tra will look very similar.
display(age_grammys.sample(5))
display(age_tra.sample(5))

Unnamed: 0,age_group,pct_visitors
1,25-34,24.13
4,55-64,9.82
0,18-24,27.37
3,45-54,13.57
5,65+,6.39


Unnamed: 0,age_group,pct_visitors
3,45-54,13.82
1,25-34,26.16
0,18-24,27.12
5,65+,5.12
2,35-44,19.55


In [None]:
# create the website column
age_grammys['website'] = 'Grammys'
age_tra['website'] = 'Recording Academy'

In [None]:
display(age_grammys.sample(5))
display(age_tra.sample(5))

Unnamed: 0,age_group,pct_visitors,website
4,55-64,9.82,Grammys
5,65+,6.39,Grammys
1,25-34,24.13,Grammys
0,18-24,27.37,Grammys
3,45-54,13.57,Grammys


Unnamed: 0,age_group,pct_visitors,website
2,35-44,19.55,Recording Academy
3,45-54,13.82,Recording Academy
1,25-34,26.16,Recording Academy
4,55-64,8.24,Recording Academy
0,18-24,27.12,Recording Academy


In [None]:
# use pd.concat to join the two datasets
age_df = pd.concat([age_grammys,age_tra])
age_df.shape

(12, 3)

In [None]:
# Create bar chart
px.bar(age_df,x ='age_group', y ='pct_visitors', color = 'website', barmode='group')


*The chart indicates that there is no meaningful relationship between a visitor's age and which website they visit: the age distributions of the two sites are nearly identical. The largest gap is in the 25-34 age group, which makes up about 26% of Recording Academy visitors and about 24% of Grammys visitors.*
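
The gap between the two sites can also be checked numerically rather than read off the chart. The sketch below rebuilds a small version of `age_df` from the values shown in the previews above and pivots it so each website becomes a column. The 35-44 group is left out because the Grammys value for it does not appear in the sampled output; `wide` and `abs_diff` are hypothetical names.

```python
import pandas as pd

# Values copied from the previews above. The 35-44 group is omitted because
# the Grammys figure for it is not visible in the sampled output.
age_df = pd.DataFrame({
    "age_group": ["18-24", "25-34", "45-54", "55-64", "65+"] * 2,
    "pct_visitors": [27.37, 24.13, 13.57, 9.82, 6.39,
                     27.12, 26.16, 13.82, 8.24, 5.12],
    "website": ["Grammys"] * 5 + ["Recording Academy"] * 5,
})

# One row per age group, one column per website.
wide = age_df.pivot(index="age_group", columns="website", values="pct_visitors")

# Absolute difference in percentage points between the two sites.
wide["abs_diff"] = (wide["Recording Academy"] - wide["Grammys"]).abs()

largest_gap = wide["abs_diff"].idxmax()
print(wide.sort_values("abs_diff", ascending=False))
print("Largest gap:", largest_gap)
```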


*After analyzing the Grammys website, the Recording Academy website, and the combined website, I recommend that the websites remain separate. The bounce rate and the average pages viewed per session both indicate a successful split. The average session duration is the only metric where one of the individual websites did not outperform the combined website. When the websites are separate, it is also much easier to analyze in which areas each website is succeeding or falling behind.
The combined website had the highest bounce rate at 41.58%, followed by the Grammys website at 40.16% and the Recording Academy website at 33.67%, so both separate websites had lower bounce rates than their combined counterpart. Another significant metric was the average number of pages viewed per session: 1.59 for the combined website, 2.14 for the Grammys website, and 2.78 for the Recording Academy website. In the graphs of pages per session, the combined website had lower peaks, and lower values outside of its peaks, than the separate websites. Considering the graphs and the averages, the split positively impacted pages per session. The average session duration was the only metric on which an individual website performed worse than the combined website. The Grammys website had an average session duration of about 83 seconds, the Recording Academy website about 128 seconds, and the combined website about 103 seconds. The Grammys website did not keep visitors engaged as well as the Recording Academy website did. The good news is that, now that the websites are separated, we can experiment with strategies to increase the average session duration on the Grammys website.
If the websites remain separated, it will be easier to conduct analysis and run experiments aimed at increasing the average session duration and pages per session and at reducing the bounce rate. I support the separation of these two websites.*

Ray and Harvey are both interested to see how the Grammys.com website compares to that of their main music award competitor, The American Music Awards (AMA). The dashboard below shows aggregated information about the performance of The AMA website for the months of April, May, and June of 2023.

Your task is to determine how the Grammys website is performing relative to The AMA website. In particular, you will be looking at the device distribution and total visits over the same time span and leveraging information about Visit Duration, Bounce Rate, and Pages / Visit from your work in the core of this project.

![](figs/TheAMAs.png)

In [None]:
# Load in the data
desktop_users = pd.read_csv('datasets/desktop_users.csv')
mobile_users = pd.read_csv('datasets/mobile_users.csv')


In [None]:
# preview the desktop_users file
desktop_users.sample(5)

Unnamed: 0,date,segment,visitors
106,2022-05-18,Desktop Traffic,6910
65,2022-04-07,Desktop Traffic,31264
238,2022-09-27,Desktop Traffic,5476
286,2022-11-14,Desktop Traffic,10143
265,2022-10-24,Desktop Traffic,6128


In [None]:
# preview mobile_users file
mobile_users.sample(5)

Unnamed: 0,date,segment,visitors
462,2023-05-09,Mobile Traffic,13429
335,2023-01-02,Mobile Traffic,15984
355,2023-01-22,Mobile Traffic,29387
414,2023-03-22,Mobile Traffic,9544
391,2023-02-27,Mobile Traffic,10326


In [None]:
# change name of the visitors column to indicate which category it comes from
mobile_users = mobile_users.rename(columns = {'visitors': 'mobile_visitors'})
desktop_users = desktop_users.rename(columns = {'visitors': 'desktop_visitors'})

In [None]:
# drop the segment column from each dataframe
mobile_users = mobile_users.drop(columns=['segment'])
desktop_users = desktop_users.drop(columns=['segment'])

In [None]:
mobile_users.head()

Unnamed: 0,date,mobile_visitors
0,2022-02-01,23494
1,2022-02-02,20234
2,2022-02-03,22816
3,2022-02-04,18592
4,2022-02-05,13298


In [None]:
desktop_users.head()

Unnamed: 0,date,desktop_visitors
0,2022-02-01,10195
1,2022-02-02,10560
2,2022-02-03,9935
3,2022-02-04,8501
4,2022-02-05,5424


In [None]:
# join the two dataframes and preview the dataframe
segment_df = pd.concat([mobile_users,desktop_users])
segment_df = segment_df.fillna(0)

In [None]:
# create total_visitors column
segment_df['total_visitors'] = segment_df['desktop_visitors'] + segment_df['mobile_visitors']


In [None]:
segment_df

Unnamed: 0,date,mobile_visitors,desktop_visitors,total_visitors
0,2022-02-01,23494.00,0.00,23494.00
1,2022-02-02,20234.00,0.00,20234.00
2,2022-02-03,22816.00,0.00,22816.00
3,2022-02-04,18592.00,0.00,18592.00
4,2022-02-05,13298.00,0.00,13298.00
...,...,...,...,...
510,2023-06-26,0.00,4302.00,4302.00
511,2023-06-27,0.00,5528.00,5528.00
512,2023-06-28,0.00,4928.00,4928.00
513,2023-06-29,0.00,5554.00,5554.00
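
The output above shows the effect of `pd.concat` here: the two frames are stacked, so each date appears twice and `total_visitors` holds only one device's count per row. Sums over a date range are unaffected, but per-day totals are not meaningful. A minimal sketch of an alternative that lines the frames up by date, using the first rows from the `.head()` outputs above; `mobile_toy`, `desktop_toy`, and `merged_df` are hypothetical names:

```python
import pandas as pd

# First rows of each frame, copied from the .head() outputs above.
mobile_toy = pd.DataFrame({
    "date": ["2022-02-01", "2022-02-02", "2022-02-03"],
    "mobile_visitors": [23494, 20234, 22816],
})
desktop_toy = pd.DataFrame({
    "date": ["2022-02-01", "2022-02-02", "2022-02-03"],
    "desktop_visitors": [10195, 10560, 9935],
})

# An outer merge keeps dates that appear in only one frame;
# fillna(0) turns those gaps into zero visitors.
merged_df = mobile_toy.merge(desktop_toy, on="date", how="outer").fillna(0)

# Each row now holds both device counts for the same day.
merged_df["total_visitors"] = merged_df["mobile_visitors"] + merged_df["desktop_visitors"]

print(merged_df)
```

With one row per date, `total_visitors` can be plotted or compared day by day.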


In [None]:
# filter and calculate the percentage share
segment_df[segment_df['date']>='2023-04-01'].sum()




date                2023-04-012023-04-022023-04-032023-04-042023-0...
mobile_visitors                                             973700.00
desktop_visitors                                            454782.00
total_visitors                                             1428482.00
dtype: object
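
The `date` row in the output above is a side effect of summing a string column: pandas concatenates the strings. Passing `numeric_only=True` to `sum` skips that column, and the percentage share then follows from the totals. The sketch below uses a one-row stand-in holding the totals reported above; `totals_df`, `totals`, and `share` are hypothetical names.

```python
import pandas as pd

# One-row stand-in holding the totals reported in the output above.
totals_df = pd.DataFrame({
    "date": ["2023-04-01"],
    "mobile_visitors": [973700],
    "desktop_visitors": [454782],
})
totals_df["total_visitors"] = totals_df["mobile_visitors"] + totals_df["desktop_visitors"]

# numeric_only=True skips the string date column instead of concatenating it.
totals = totals_df[totals_df["date"] >= "2023-04-01"].sum(numeric_only=True)

# Share of each device as a percentage of all visitors.
share = 100 * totals[["mobile_visitors", "desktop_visitors"]] / totals["total_visitors"]

print(share.round(2))
```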


*In the timeframe in question, about 68% of visitors were on mobile and about 32% were on desktop, out of 1,428,482 total visitors.*


*Compared to the AMA website, the Grammys website performs better on bounce rate, average session duration, and total visits, and it has a better distribution between mobile and desktop users. If my analysis is correct, the only metric on which the AMA website outperformed the Grammys website is the average pages viewed per visit: the Grammys website averaged 2.14, while the AMA website averaged 2.74.*