# Song genres over time
**Author:** Vivian Li
<br> **Achievement:** Identified the dominant genres since 2000 and how their popularity and presence have changed over time. Notably, country music's popularity has changed little over time, whereas pop songs are on the rise.

# Loading Data

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import nbimporter
from collections import Counter

copy = pd.read_csv('/Users/vivianli/Documents/Data Science/evolution-of-music/exploration/data/billboards_data_w_artist_data.csv')
copy2 = pd.read_csv('/Users/vivianli/Documents/Data Science/evolution-of-music/exploration/data/billboards_data_w_artist_data_2.csv')

In [None]:
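# keep only the pre-2015 rows from the first file, then add the second file
copy = copy[copy['date']<'2015-01-01'].reset_index(drop=True)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
copy = pd.concat([copy, copy2], ignore_index=True)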
copy

In [None]:
copy2

In [None]:
# create year column using date
copy['year'] = copy['date'].astype(str).str[:4]
copy = copy[['year','artist_genres']]
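# fill in null values so every entry is a string
copy = copy.fillna('n/a')

# clean up strings; get rid of whitespace in genres
# (assigning to the rows yielded by iterrows() is not guaranteed to write back
# to the dataframe, so use the vectorized string method instead)
copy['artist_genres'] = copy['artist_genres'].str.replace(" ", "", regex=False)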

# group by years to get all the genres for the songs that year
copy = copy.groupby(['year'])['artist_genres'].apply(','.join).reset_index()

In [None]:
full_list = []

# count how often each genre appears in each year (one dict per year)
for idx,row in copy.iterrows():
    year_genres = Counter(row['artist_genres'].split(','))
    full_list.append(dict(year_genres))

# one row per year, one column per genre; a genre absent in a year is NaN
new_dataframe = pd.DataFrame(full_list)

In [None]:
new_dataframe.set_index([pd.Index(copy['year'])],inplace=True)

In [None]:
new_dataframe.drop(columns=['n/a'],inplace=True)
new_dataframe
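
The counting cells above can also be written without explicit loops. The following is a minimal sketch on invented toy rows, not the notebook's actual data: it splits each genre string on commas, expands the result to one row per (year, genre) pair, and cross-tabulates the pairs into the same year-by-genre count table. The names `toy`, `genres`, and `toy_counts` are made up for this illustration.

```python
import pandas as pd

# Toy stand-in for the chart data; these rows are invented for illustration.
toy = pd.DataFrame({
    "year": ["2000", "2000", "2001", "2001"],
    "artist_genres": ["pop, dance pop", "pop", "country, pop", None],
})

# One row per (year, genre) pair: split on commas, then explode the lists.
genres = toy.assign(
    artist_genres=toy["artist_genres"].str.split(",")
).explode("artist_genres", ignore_index=True)

# Strip whitespace around each genre and drop songs with no genre at all.
genres["artist_genres"] = genres["artist_genres"].str.strip()
genres = genres.dropna(subset=["artist_genres"])

# Cross-tabulate into a year x genre count table (absent genres become 0).
toy_counts = pd.crosstab(genres["year"], genres["artist_genres"])
print(toy_counts)
```

Note that a song tagged with several genres contributes one count to each of them, which is also how the `Counter`-based cells behave.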

Since there are too many genres for a good analysis, keep only the top 10 genres of all time. The top 10 genres are determined by how many times each genre occurs in the charts.

In [None]:
# total occurrences of each genre across all years, largest first
# (DataFrame.append was removed in pandas 2.0, so sum the columns directly)
sums = new_dataframe.sum().sort_values(ascending=False).head(10)
top_10_genres_all_time = list(sums.index)
new_dataframe = new_dataframe[top_10_genres_all_time]

Stacked bar graph of counts of songs per genre over time

In [None]:
new_dataframe.plot(kind='bar', stacked=True)
plt.title("Genres vs Time")
plt.xlabel("Year")
plt.ylabel("Count of Songs")

Get each genre's share of the yearly total (among the top 10 genres)

In [None]:
# divide each row by its total so every year sums to 1
res = new_dataframe.div(new_dataframe.sum(axis=1), axis=0)
res.plot(kind='bar', stacked=True)
plt.title("Genre Share vs Time")
plt.xlabel("Year")
plt.ylabel("Share of Songs")
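
To see why the normalized chart can tell a different story from the raw counts, here is a small sketch with invented numbers (the `counts` table below is hypothetical). Dividing each row by its row sum turns counts into shares, so a genre whose count doubles can still hold a constant share if the yearly total doubles too.

```python
import pandas as pd

# Hypothetical counts: every genre doubles from 2000 to 2001.
counts = pd.DataFrame(
    {"pop": [30, 60], "country": [10, 20]},
    index=["2000", "2001"],
)

# Row-wise normalization: each year's shares sum to 1.
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares)
```

In this toy case both years show the same shares even though the counts doubled, which is the distinction between the two charts.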

Notes from the above chart:
- dancepop, country, contemporary country, and country road seem fairly consistent over the years
- r&b, pop rock, and urban contemporary have been steadily decreasing in popularity
- post teenpop and pop are on the rise
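
These notes are read off the chart by eye. One possible way to check them numerically is to fit a straight line to each genre's yearly share and compare the slopes. This is only a sketch under assumptions: `genre_trends` is a hypothetical helper, not part of the notebook, the toy shares are invented, and a linear fit is a rough summary that ignores any non-linear pattern.

```python
import numpy as np
import pandas as pd

def genre_trends(shares: pd.DataFrame) -> pd.Series:
    """Least-squares slope of each genre's share per year (hypothetical helper)."""
    # The index holds years as strings such as "2000"; convert them to integers.
    years = shares.index.astype(int).to_numpy()
    slopes = {
        genre: np.polyfit(years, shares[genre].fillna(0).to_numpy(dtype=float), 1)[0]
        for genre in shares.columns
    }
    # Most strongly declining genres first, rising genres last.
    return pd.Series(slopes).sort_values()

# Invented shares for illustration only.
toy_shares = pd.DataFrame(
    {"pop": [0.2, 0.3, 0.4], "r&b": [0.5, 0.4, 0.3], "country": [0.3, 0.3, 0.3]},
    index=["2000", "2001", "2002"],
)
trends = genre_trends(toy_shares)
print(trends)
```

Applied to the notebook's `res` table, a slope near zero would correspond to a "fairly consistent" genre, a negative slope to a declining one, and a positive slope to a rising one.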