# DATA 602 Assignment 7: Using Pandas to explore and understand a music dataset
Author: Kevin Kirby
Date: 10/11/2024

## Assignment Overview
The assignment states: "In this homework assignment, you will explore and analyze a public dataset of your choosing. Since this assignment is “open-ended” in nature, you are free to expand upon the requirements below. However, you must meet the minimum requirements as indicated in each section. 

* You must use Pandas as the **primary tool** to process your data.

* The preferred method for this analysis is in a .ipynb file. Feel free to use whichever platform of your choosing.  
 * https://www.youtube.com/watch?v=inN8seMm7UI (Getting started with Colab).

* Your data should need some "work", or be considered "dirty".  You must show your skills in data cleaning/wrangling."


## Introduction

I have chosen [Music Dataset: Lyrics and Metadata from 1950 to 2019](https://data.mendeley.com/datasets/3t9vbwxgr5/3). The description provided by the dataset authors states:
"This dataset provides a list of lyrics from 1950 to 2019 describing music metadata as sadness, danceability, loudness, acousticness, etc. We also provide some informations as lyrics which can be used to natural language processing. 

The audio data was scraped using Echo Nest® API integrated engine with spotipy Python’s package. The spotipy API permits the user to search for specific genres, artists, songs, release date, etc. To obtain the lyrics we used the Lyrics Genius® API as baseURL for requesting data based on the song title and artist name."

I found this data after searching around the [Google Dataset Search](https://datasetsearch.research.google.com/), which was on the list of example places I could look for data that met the criteria for this lab. 

I chose it because I'm familiar with the metrics being used, having worked with similar ones on other datasets. This lets me assess the quality of the data from both a structural and an accuracy perspective. In addition, I wanted to see what connection, if any, lyrics have to the way a song gets classified by Spotify's audio analysis.

## Data Exploration

While pandas is the only library I need for most of this work, I also import ```builtins``` for my summary stats function, which I'll explain down below. Calling ```pd.set_option('display.max_columns', None)``` lets me see all the columns without the ones in the middle being cut out.

Things I generally look for in EDA:
* Data that "makes sense"

In [1]:
import pandas as pd
pd.set_option('display.max_columns', None)
import builtins

In [2]:
tcc_ceds_music_df = pd.read_csv('https://storage.googleapis.com/data_science_masters_files/2024_fall/data_602_advanced_python/week_eight/tcc_ceds_music.csv')

### Statistics and other relevant information

```df_deets()``` is a function I made that provides a lot of basic and common EDA stats at once. Its first parameter takes the variable name of one dataframe as a string, or multiple dataframes as a list of strings. Its second parameter takes a list of the numeric column names. In return, it produces:

* The first and last five rows of the dataframe, with total rows and columns printed right below it
* Summary statistics table that provides the following for each column:
  * Data type
  * Number of duplicates
  * Number of nulls
  * Number of distinct values
* Total number of duplicate rows in the dataframe, where duplicate means every single value in every column is the same for two or more rows
* All the column names in the dataframe in an easy to copy-and-paste format. I often want chunks of column names in a comma separated list
* Summary statistics for every column with a numeric value, including:
  * Mean
  * Median
  * Standard deviation
  * Minimum and maximum values
  * Quartile values, with the max also serving as the 100% quartile



In [3]:
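def df_deets(tian_snacks, metrics):
    """Print EDA summaries for one or more dataframes.

    tian_snacks: a dataframe, the name of a global dataframe as a string, or a list of either.
                 String names are looked up with globals(), so they must exist in this notebook.
    metrics: list of numeric column names to include in the describe() table.
    """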
    def print_info(df, df_name):
        print(f"{df_name}:")
        if hasattr(builtins, "display"):
            display(df)
        else:
            print(df)
        print("\n")
        
        summary_df = pd.DataFrame({
            "Data types": df.dtypes,
            "Duplicates": [df.duplicated(subset=[col]).sum() for col in df.columns],
            "NAs": df.isna().sum().values,
            "Distinct values": df.nunique().values
        }).set_index(df.columns)
        
        print(f"{df_name} summary:")
        if hasattr(builtins, "display"):
            display(summary_df)
        else:
            print(summary_df)
        print("\n")
        print(df.columns)
        print("\n")
        total_duplicates = df.duplicated().sum()
        print(f"Total number of duplicate rows in {df_name}: {total_duplicates}\n")
        
        summary_stats = df[metrics].describe().T
        summary_stats['median'] = df[metrics].median()

        print(f"{df_name} metrics summary:")
        if hasattr(builtins, "display"):
            display(summary_stats)
        else:
            print(summary_stats)
        print("\n")

    if isinstance(tian_snacks, list): 
        for item in tian_snacks:
            if isinstance(item, str):
                df = globals()[item] 
                print_info(df, item)
            else:
                print_info(item, "dataframe")
    else: 
        if isinstance(tian_snacks, str):
            df = globals()[tian_snacks]  
            print_info(df, tian_snacks)
        else:
            print_info(tian_snacks, "dataframe")


#### Summary stats output

In the first cell I declare the numeric fields as a list that can be passed into ```df_deets()``` as the second argument. I then call ```df_deets()``` with the dataframe name as the first parameter and the numeric fields variable as the second. If you pass in the dataframe itself rather than its name in quotes, the function still works, but each table is labeled with the generic name "dataframe" instead of the variable name. For one dataframe, it doesn't really matter. If you pass in a bunch of dataframes because you're working on a big project, having the actual dataframe name on top of each table makes the output easier to scan.
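One possible way to keep the labels without looking names up in `globals()` is to pass a dict that maps each label to its dataframe. The sketch below is not how `df_deets()` works; `label_frames` and the toy dataframes are hypothetical names made up for illustration.

```python
import pandas as pd

def label_frames(frames):
    """Normalize input into a list of (label, DataFrame) pairs.

    Hypothetical helper, not part of df_deets(). Accepts a single
    DataFrame, a list of DataFrames, or a dict of label -> DataFrame.
    """
    if isinstance(frames, pd.DataFrame):
        return [("dataframe", frames)]
    if isinstance(frames, dict):
        return list(frames.items())
    # A plain list carries no names, so fall back to positional labels
    return [(f"dataframe_{i}", df) for i, df in enumerate(frames)]

# Toy data standing in for real project dataframes
toy_a = pd.DataFrame({"energy": [0.1, 0.9]})
toy_b = pd.DataFrame({"valence": [0.5]})

labeled = label_frames({"toy_a": toy_a, "toy_b": toy_b})
for name, df in labeled:
    print(f"{name}: {df.shape[0]} rows, {df.shape[1]} columns")
```

The trade-off is a slightly more verbose call in exchange for a function that also works when it is imported from another module, where `globals()` would not see the notebook's variables.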


In [4]:
tcc_numeric_fields = ['len', 'dating', 'violence', 'world/life', 'night/time', 
                          'shake the audience', 'family/gospel', 'romantic', 
                          'communication', 'obscene', 'music', 'movement/places', 
                          'light/visual perceptions', 'family/spiritual', 'like/girls', 
                          'sadness', 'feelings', 'danceability', 'loudness', 
                          'acousticness', 'instrumentalness', 'valence', 'energy', 
                          'age']

df_deets('tcc_ceds_music_df', tcc_numeric_fields)


tcc_ceds_music_df:


Unnamed: 0.1,Unnamed: 0,artist_name,track_name,release_date,genre,lyrics,len,dating,violence,world/life,night/time,shake the audience,family/gospel,romantic,communication,obscene,music,movement/places,light/visual perceptions,family/spiritual,like/girls,sadness,feelings,danceability,loudness,acousticness,instrumentalness,valence,energy,topic,age
0,0,mukesh,mohabbat bhi jhoothi,1950,pop,hold time feel break feel untrue convince spea...,95,0.000598,0.063746,0.000598,0.000598,0.000598,0.048857,0.017104,0.263751,0.000598,0.039288,0.000598,0.000598,0.000598,0.000598,0.380299,0.117175,0.357739,0.454119,0.997992,0.901822,0.339448,0.137110,sadness,1.000000
1,4,frankie laine,i believe,1950,pop,believe drop rain fall grow believe darkest ni...,51,0.035537,0.096777,0.443435,0.001284,0.001284,0.027007,0.001284,0.001284,0.001284,0.118034,0.001284,0.212681,0.051124,0.001284,0.001284,0.001284,0.331745,0.647540,0.954819,0.000002,0.325021,0.263240,world/life,1.000000
2,6,johnnie ray,cry,1950,pop,sweetheart send letter goodbye secret feel bet...,24,0.002770,0.002770,0.002770,0.002770,0.002770,0.002770,0.158564,0.250668,0.002770,0.323794,0.002770,0.002770,0.002770,0.002770,0.002770,0.225422,0.456298,0.585288,0.840361,0.000000,0.351814,0.139112,music,1.000000
3,10,pérez prado,patricia,1950,pop,kiss lips want stroll charm mambo chacha merin...,54,0.048249,0.001548,0.001548,0.001548,0.021500,0.001548,0.411536,0.001548,0.001548,0.001548,0.129250,0.001548,0.001548,0.081132,0.225889,0.001548,0.686992,0.744404,0.083935,0.199393,0.775350,0.743736,romantic,1.000000
4,12,giorgos papadopoulos,apopse eida oneiro,1950,pop,till darling till matter know till dream live ...,48,0.001350,0.001350,0.417772,0.001350,0.001350,0.001350,0.463430,0.001350,0.001350,0.001350,0.001350,0.001350,0.029755,0.001350,0.068800,0.001350,0.291671,0.646489,0.975904,0.000246,0.597073,0.394375,romantic,1.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28367,82447,mack 10,10 million ways,2019,hip hop,cause fuck leave scar tick tock clock come kno...,78,0.001350,0.001350,0.001350,0.001350,0.001350,0.001350,0.001350,0.001350,0.391651,0.001350,0.435089,0.001350,0.001350,0.001350,0.065664,0.001350,0.889527,0.759711,0.062549,0.000000,0.751649,0.695686,obscene,0.014286
28368,82448,m.o.p.,ante up (robbin hoodz theory),2019,hip hop,minks things chain ring braclets yap fame come...,67,0.001284,0.001284,0.035338,0.001284,0.001284,0.001284,0.066324,0.203889,0.318910,0.058152,0.134955,0.001284,0.001284,0.040811,0.001284,0.001284,0.662082,0.789580,0.004607,0.000002,0.922712,0.797791,obscene,0.014286
28369,82449,nine,whutcha want?,2019,hip hop,get ban get ban stick crack relax plan attack ...,77,0.001504,0.154302,0.168988,0.001504,0.039755,0.001504,0.035401,0.001504,0.356685,0.001504,0.068684,0.001504,0.001504,0.001504,0.001504,0.001504,0.663165,0.726970,0.104417,0.000001,0.838211,0.767761,obscene,0.014286
28370,82450,will smith,switch,2019,hip hop,check check yeah yeah hear thing call switch g...,67,0.001196,0.001196,0.001196,0.001196,0.048359,0.001196,0.001196,0.001196,0.492434,0.103614,0.001196,0.202659,0.001196,0.070867,0.001196,0.001196,0.883028,0.786888,0.007027,0.000503,0.508450,0.885882,obscene,0.014286




tcc_ceds_music_df summary:


Unnamed: 0,Data types,Duplicates,NAs,Distinct values
Unnamed: 0,int64,0,0,28372
artist_name,object,22946,0,5426
track_name,object,4683,0,23689
release_date,int64,28302,0,70
genre,object,28365,0,7
lyrics,object,0,0,28372
len,int64,28173,0,199
dating,float64,454,0,27918
violence,float64,183,0,28189
world/life,float64,177,0,28195




Index(['Unnamed: 0', 'artist_name', 'track_name', 'release_date', 'genre',
       'lyrics', 'len', 'dating', 'violence', 'world/life', 'night/time',
       'shake the audience', 'family/gospel', 'romantic', 'communication',
       'obscene', 'music', 'movement/places', 'light/visual perceptions',
       'family/spiritual', 'like/girls', 'sadness', 'feelings', 'danceability',
       'loudness', 'acousticness', 'instrumentalness', 'valence', 'energy',
       'topic', 'age'],
      dtype='object')


Total number of duplicate rows in tcc_ceds_music_df: 0

tcc_ceds_music_df metrics summary:


Unnamed: 0,count,mean,std,min,25%,50%,75%,max,median
len,28372.0,73.028444,41.829831,1.0,42.0,63.0,93.0,199.0,63.0
dating,28372.0,0.021112,0.05237,0.0002907822,0.000923,0.001462,0.004049,0.647706,0.001462
violence,28372.0,0.118396,0.178684,0.000284495,0.00112,0.002506,0.192608,0.981781,0.002506
world/life,28372.0,0.120973,0.1722,0.0002907822,0.00117,0.006579,0.197793,0.962105,0.006579
night/time,28372.0,0.057387,0.111923,0.0002891845,0.001032,0.001949,0.065842,0.973684,0.001949
shake the audience,28372.0,0.017422,0.04067,0.000284495,0.000993,0.001595,0.010002,0.497463,0.001595
family/gospel,28372.0,0.017045,0.041966,0.0002891845,0.000923,0.001504,0.004785,0.545303,0.001504
romantic,28372.0,0.048681,0.106095,0.000284495,0.000975,0.001754,0.042301,0.940789,0.001754
communication,28372.0,0.07668,0.109538,0.0002907822,0.001144,0.002632,0.132136,0.645829,0.002632
obscene,28372.0,0.097168,0.181303,0.0002891845,0.001053,0.001815,0.088765,0.992298,0.001815






I'm dropping the rows where ```topic == 'obscene'``` from the data because I want a family-friendly dataset.



In [5]:
tcc_ceds_music_df = tcc_ceds_music_df[tcc_ceds_music_df['topic'] != 'obscene']
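A quick way to sanity-check a filter like this is to count the rows before and after. The sketch below uses a small made-up dataframe (`toy_df` and its values are assumptions, not rows from the music dataset) to show the same boolean-mask pattern.

```python
import pandas as pd

# Toy stand-in for the music dataframe
toy_df = pd.DataFrame({
    "track_name": ["a", "b", "c", "d"],
    "topic": ["sadness", "obscene", "music", "obscene"],
})

# True for every row we want to keep
keep_mask = toy_df["topic"] != "obscene"
filtered_df = toy_df[keep_mask].copy()

rows_dropped = len(toy_df) - len(filtered_df)
print(f"Dropped {rows_dropped} of {len(toy_df)} rows")
print(filtered_df["topic"].value_counts())
```

Note that the filter keeps the original index labels, which is why the later outputs in this notebook show gaps in the row numbers.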


## Data Wrangling
Create a subset of your original data and answer the following questions:


In [6]:
lame_pop_music_df = tcc_ceds_music_df[tcc_ceds_music_df['genre'] == 'pop'].copy()

### 1. Modify multiple column names

In [7]:
lame_pop_music_df.rename(columns={'artist_name': 'artist', 'family/spiritual': 'spiritual'}, inplace=True)
lame_pop_music_df.columns


Index(['Unnamed: 0', 'artist', 'track_name', 'release_date', 'genre', 'lyrics',
       'len', 'dating', 'violence', 'world/life', 'night/time',
       'shake the audience', 'family/gospel', 'romantic', 'communication',
       'obscene', 'music', 'movement/places', 'light/visual perceptions',
       'spiritual', 'like/girls', 'sadness', 'feelings', 'danceability',
       'loudness', 'acousticness', 'instrumentalness', 'valence', 'energy',
       'topic', 'age'],
      dtype='object')

### 2. Look at the structure of your data – are any variables improperly coded? Such as strings or characters? Convert to correct structure if needed.

I acknowledge that ```release_date``` came in as an int64, but I kept it as is: it only holds a year, and it's easier to work with years in a numeric format and then do a last-mile conversion to a datetime object later. I don't see any missing or null values that need to be taken care of.
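For reference, the last-mile conversion could look something like the sketch below. It assumes the column holds four-digit years as integers; `toy_df` and the new `release_dt` column are made-up names for illustration.

```python
import pandas as pd

# Toy frame with integer years, mirroring the release_date column
toy_df = pd.DataFrame({"release_date": [1950, 1980, 2019]})

# Cast to string first so the "%Y" format is parsed as a year,
# which lands each value on January 1 of that year
toy_df["release_dt"] = pd.to_datetime(
    toy_df["release_date"].astype(str), format="%Y"
)

print(toy_df.dtypes)
print(toy_df)
```

Keeping the integer column alongside the datetime one means comparisons like `release_date >= 1980` stay simple.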

In [8]:
lame_pop_numeric_fields = ['len', 'dating', 'violence', 'world/life', 'night/time', 
                          'shake the audience', 'romantic', 
                          'communication', 'obscene', 'music', 'movement/places', 
                          'light/visual perceptions', 'spiritual', 'like/girls', 
                          'sadness', 'feelings', 'danceability', 'loudness', 
                          'acousticness', 'instrumentalness', 'valence', 'energy', 
                          'age']
df_deets('lame_pop_music_df', lame_pop_numeric_fields)


lame_pop_music_df:


Unnamed: 0.1,Unnamed: 0,artist,track_name,release_date,genre,lyrics,len,dating,violence,world/life,night/time,shake the audience,family/gospel,romantic,communication,obscene,music,movement/places,light/visual perceptions,spiritual,like/girls,sadness,feelings,danceability,loudness,acousticness,instrumentalness,valence,energy,topic,age
0,0,mukesh,mohabbat bhi jhoothi,1950,pop,hold time feel break feel untrue convince spea...,95,0.000598,0.063746,0.000598,0.000598,0.000598,0.048857,0.017104,0.263751,0.000598,0.039288,0.000598,0.000598,0.000598,0.000598,0.380299,0.117175,0.357739,0.454119,0.997992,0.901822,0.339448,0.137110,sadness,1.000000
1,4,frankie laine,i believe,1950,pop,believe drop rain fall grow believe darkest ni...,51,0.035537,0.096777,0.443435,0.001284,0.001284,0.027007,0.001284,0.001284,0.001284,0.118034,0.001284,0.212681,0.051124,0.001284,0.001284,0.001284,0.331745,0.647540,0.954819,0.000002,0.325021,0.263240,world/life,1.000000
2,6,johnnie ray,cry,1950,pop,sweetheart send letter goodbye secret feel bet...,24,0.002770,0.002770,0.002770,0.002770,0.002770,0.002770,0.158564,0.250668,0.002770,0.323794,0.002770,0.002770,0.002770,0.002770,0.002770,0.225422,0.456298,0.585288,0.840361,0.000000,0.351814,0.139112,music,1.000000
3,10,pérez prado,patricia,1950,pop,kiss lips want stroll charm mambo chacha merin...,54,0.048249,0.001548,0.001548,0.001548,0.021500,0.001548,0.411536,0.001548,0.001548,0.001548,0.129250,0.001548,0.001548,0.081132,0.225889,0.001548,0.686992,0.744404,0.083935,0.199393,0.775350,0.743736,romantic,1.000000
4,12,giorgos papadopoulos,apopse eida oneiro,1950,pop,till darling till matter know till dream live ...,48,0.001350,0.001350,0.417772,0.001350,0.001350,0.001350,0.463430,0.001350,0.001350,0.001350,0.001350,0.001350,0.029755,0.001350,0.068800,0.001350,0.291671,0.646489,0.975904,0.000246,0.597073,0.394375,romantic,1.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7034,20248,ed sheeran,way to break my heart (feat. skrillex),2019,pop,sign gemini eye fair hair light call time nigh...,76,0.000693,0.000693,0.087991,0.039190,0.000693,0.000693,0.083572,0.116645,0.051078,0.000693,0.000693,0.048766,0.000693,0.061970,0.396793,0.071409,0.719484,0.752583,0.110441,0.000003,0.418796,0.731723,sadness,0.014286
7037,20269,florida georgia line,simple,2019,pop,finger plus rocket science time hard true road...,104,0.000605,0.000605,0.582178,0.000605,0.056798,0.050381,0.000605,0.000605,0.057264,0.000605,0.000605,0.000605,0.012187,0.233327,0.000605,0.000605,0.325246,0.818911,0.079417,0.000000,0.866035,0.882879,world/life,0.014286
7039,20281,jonas brothers,i believe,2019,pop,night lifetime yeah know fine cause stay good ...,117,0.059642,0.000450,0.522350,0.000450,0.105584,0.000450,0.069017,0.092730,0.000450,0.000450,0.000450,0.061453,0.000450,0.070258,0.000450,0.014017,0.744395,0.754711,0.086043,0.000012,0.336356,0.774768,world/life,0.014286
7040,20286,ellie goulding,sixteen,2019,pop,remember move say small house change save give...,90,0.000835,0.000835,0.387109,0.080026,0.000835,0.000835,0.000835,0.000835,0.000835,0.000835,0.199035,0.056671,0.000835,0.176745,0.000835,0.089554,0.657749,0.774607,0.269076,0.000000,0.533182,0.801796,world/life,0.014286




lame_pop_music_df summary:


Unnamed: 0,Data types,Duplicates,NAs,Distinct values
Unnamed: 0,int64,0,0,5822
artist,object,4291,0,1531
track_name,object,452,0,5370
release_date,int64,5752,0,70
genre,object,5821,0,1
lyrics,object,0,0,5822
len,int64,5629,0,193
dating,float64,30,0,5792
violence,float64,14,0,5808
world/life,float64,16,0,5806




Index(['Unnamed: 0', 'artist', 'track_name', 'release_date', 'genre', 'lyrics',
       'len', 'dating', 'violence', 'world/life', 'night/time',
       'shake the audience', 'family/gospel', 'romantic', 'communication',
       'obscene', 'music', 'movement/places', 'light/visual perceptions',
       'spiritual', 'like/girls', 'sadness', 'feelings', 'danceability',
       'loudness', 'acousticness', 'instrumentalness', 'valence', 'energy',
       'topic', 'age'],
      dtype='object')


Total number of duplicate rows in lame_pop_music_df: 0

lame_pop_music_df metrics summary:


Unnamed: 0,count,mean,std,min,25%,50%,75%,max,median
len,5822.0,68.776709,34.586049,2.0,43.0,63.0,88.0,199.0,63.0
dating,5822.0,0.022294,0.054932,0.0002990431,0.000907,0.001422,0.003759,0.574397,0.001422
violence,5822.0,0.118422,0.179468,0.0002940312,0.001074,0.002193,0.205441,0.968689,0.002193
world/life,5822.0,0.136211,0.181812,0.0003007519,0.00117,0.028017,0.286265,0.930577,0.028017
night/time,5822.0,0.062397,0.120618,0.0002891845,0.001012,0.00188,0.070967,0.950139,0.00188
shake the audience,5822.0,0.016903,0.040528,0.0003059976,0.00094,0.001462,0.004049,0.442431,0.001462
romantic,5822.0,0.059248,0.11756,0.0002940312,0.001012,0.002024,0.058455,0.837529,0.002024
communication,5822.0,0.087934,0.117752,0.0002940312,0.001196,0.003759,0.151888,0.615834,0.003759
obscene,5822.0,0.020229,0.052615,0.0002891845,0.000863,0.001316,0.002506,0.470206,0.001316
music,5822.0,0.061629,0.127276,0.0002891845,0.00094,0.001698,0.053382,0.874892,0.001698






### 3. Fix missing and invalid values in data

This data is pretty clean. I'm going to drop the Unnamed: 0 column because it's just a row number and not a useful piece of data.

    

In [9]:
lame_pop_music_df.drop(columns=['Unnamed: 0'], inplace=True)
lame_pop_music_df.columns


Index(['artist', 'track_name', 'release_date', 'genre', 'lyrics', 'len',
       'dating', 'violence', 'world/life', 'night/time', 'shake the audience',
       'family/gospel', 'romantic', 'communication', 'obscene', 'music',
       'movement/places', 'light/visual perceptions', 'spiritual',
       'like/girls', 'sadness', 'feelings', 'danceability', 'loudness',
       'acousticness', 'instrumentalness', 'valence', 'energy', 'topic',
       'age'],
      dtype='object')
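Had the data been dirtier, a check along these lines could flag missing and out-of-range values before deciding what to fix. This is a sketch on made-up data, and it assumes the score columns are meant to fall between 0 and 1, which the summary stats above suggest but the dataset description does not state.

```python
import pandas as pd

# Toy data: 1.40 is out of range and None is missing
toy_df = pd.DataFrame({
    "track_name": ["a", "b", "c"],
    "energy": [0.25, 1.40, None],
    "valence": [0.10, 0.50, 0.90],
})
score_cols = ["energy", "valence"]

# Count nulls and out-of-range values per column
null_counts = toy_df[score_cols].isna().sum()
out_of_range = ((toy_df[score_cols] < 0) | (toy_df[score_cols] > 1)).sum()
print(null_counts)
print(out_of_range)

# Blank out invalid scores, then drop rows with any missing score
in_range = (toy_df[score_cols] >= 0) & (toy_df[score_cols] <= 1)
cleaned_df = toy_df.copy()
cleaned_df[score_cols] = toy_df[score_cols].where(in_range)
cleaned_df = cleaned_df.dropna(subset=score_cols)
print(cleaned_df)
```

Dropping rows is only one option. Depending on the analysis, filling with a median or leaving the nulls in place may be a better fit.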

### 4. Create new columns based on existing columns or calculations.

I created a new column that combines the genre and topic columns into one. 


In [10]:
# Combine genre and topic into a single label, e.g. "pop sadness"
lame_pop_music_df['genre_topic'] = lame_pop_music_df['genre'] + ' ' + lame_pop_music_df['topic']

# Move genre, topic, and genre_topic to the front, keeping the other columns in their original order
lead_cols = ['genre', 'topic', 'genre_topic']
lame_pop_music_df = lame_pop_music_df[lead_cols + [col for col in lame_pop_music_df.columns if col not in lead_cols]]

lame_pop_music_df.head()

Unnamed: 0,genre,topic,genre_topic,artist,track_name,release_date,lyrics,len,dating,violence,world/life,night/time,shake the audience,family/gospel,romantic,communication,obscene,music,movement/places,light/visual perceptions,spiritual,like/girls,sadness,feelings,danceability,loudness,acousticness,instrumentalness,valence,energy,age
0,pop,sadness,pop sadness,mukesh,mohabbat bhi jhoothi,1950,hold time feel break feel untrue convince spea...,95,0.000598,0.063746,0.000598,0.000598,0.000598,0.048857,0.017104,0.263751,0.000598,0.039288,0.000598,0.000598,0.000598,0.000598,0.380299,0.117175,0.357739,0.454119,0.997992,0.901822,0.339448,0.13711,1.0
1,pop,world/life,pop world/life,frankie laine,i believe,1950,believe drop rain fall grow believe darkest ni...,51,0.035537,0.096777,0.443435,0.001284,0.001284,0.027007,0.001284,0.001284,0.001284,0.118034,0.001284,0.212681,0.051124,0.001284,0.001284,0.001284,0.331745,0.64754,0.954819,2e-06,0.325021,0.26324,1.0
2,pop,music,pop music,johnnie ray,cry,1950,sweetheart send letter goodbye secret feel bet...,24,0.00277,0.00277,0.00277,0.00277,0.00277,0.00277,0.158564,0.250668,0.00277,0.323794,0.00277,0.00277,0.00277,0.00277,0.00277,0.225422,0.456298,0.585288,0.840361,0.0,0.351814,0.139112,1.0
3,pop,romantic,pop romantic,pérez prado,patricia,1950,kiss lips want stroll charm mambo chacha merin...,54,0.048249,0.001548,0.001548,0.001548,0.0215,0.001548,0.411536,0.001548,0.001548,0.001548,0.12925,0.001548,0.001548,0.081132,0.225889,0.001548,0.686992,0.744404,0.083935,0.199393,0.77535,0.743736,1.0
4,pop,romantic,pop romantic,giorgos papadopoulos,apopse eida oneiro,1950,till darling till matter know till dream live ...,48,0.00135,0.00135,0.417772,0.00135,0.00135,0.00135,0.46343,0.00135,0.00135,0.00135,0.00135,0.00135,0.029755,0.00135,0.0688,0.00135,0.291671,0.646489,0.975904,0.000246,0.597073,0.394375,1.0
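If the only goal is to place a new column right next to an existing one, `DataFrame.insert()` can create and position it in a single call. A minimal sketch on made-up data (`toy_df` is an assumption, not the real dataframe):

```python
import pandas as pd

toy_df = pd.DataFrame({
    "genre": ["pop", "pop"],
    "artist": ["x", "y"],
    "topic": ["sadness", "music"],
})

# Build the combined label, then insert it directly after 'genre'
combined = toy_df["genre"] + " " + toy_df["topic"]
toy_df.insert(toy_df.columns.get_loc("genre") + 1, "genre_topic", combined)

print(toy_df)
```

`insert()` modifies the dataframe in place and raises an error if the column name already exists, which guards against accidentally overwriting a column.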


### 5. Drop column(s) from your dataset

In [11]:
lame_pop_music_df.drop(columns=['age', 'movement/places', 'sadness'], inplace=True)
lame_pop_music_df.columns


Index(['genre', 'topic', 'genre_topic', 'artist', 'track_name', 'release_date',
       'lyrics', 'len', 'dating', 'violence', 'world/life', 'night/time',
       'shake the audience', 'family/gospel', 'romantic', 'communication',
       'obscene', 'music', 'light/visual perceptions', 'spiritual',
       'like/girls', 'feelings', 'danceability', 'loudness', 'acousticness',
       'instrumentalness', 'valence', 'energy'],
      dtype='object')

### 6. Drop a row(s) from your dataset.

I dropped Frankie Laine from the dataset because... I saw the name and decided it should be removed. Honestly, I don't listen to music from 1950, so this was an easy decision.


In [12]:
lame_pop_music_df = lame_pop_music_df[lame_pop_music_df['artist'] != 'frankie laine']
print(lame_pop_music_df['artist'].value_counts().get('frankie laine', 0))

0


### 7. Sort your data based on multiple variables

I'm sorting based on release date and then energy because I thought it would be interesting to see the most recent songs at the top, with energy being used as a proxy for dance music. A high energy pop song is something you should want to move your feet to.  


In [13]:
lame_pop_music_df.sort_values(by=['release_date', 'energy'], ascending=[False, False], inplace=True)
lame_pop_music_df.head()


Unnamed: 0,genre,topic,genre_topic,artist,track_name,release_date,lyrics,len,dating,violence,world/life,night/time,shake the audience,family/gospel,romantic,communication,obscene,music,light/visual perceptions,spiritual,like/girls,feelings,danceability,loudness,acousticness,instrumentalness,valence,energy
6963,pop,night/time,pop night/time,runaway june,buy my own drinks,2019,yeah try unfall apart think neon light real go...,131,0.000458,0.000458,0.000458,0.319597,0.124393,0.000458,0.011207,0.000458,0.228421,0.000458,0.05277,0.000458,0.000458,0.000458,0.676162,0.801246,0.068272,0.0,0.967024,0.900898
7037,pop,world/life,pop world/life,florida georgia line,simple,2019,finger plus rocket science time hard true road...,104,0.000605,0.000605,0.582178,0.000605,0.056798,0.050381,0.000605,0.000605,0.057264,0.000605,0.000605,0.012187,0.233327,0.000605,0.325246,0.818911,0.079417,0.0,0.866035,0.882879
6990,pop,sadness,pop sadness,jonas brothers,don't throw it away,2019,picture frame pack things help week get like c...,143,0.017162,0.00039,0.00039,0.163454,0.024244,0.048897,0.00039,0.136235,0.138088,0.00039,0.00039,0.00039,0.00039,0.00039,0.7249,0.783914,0.024597,0.0,0.730008,0.858854
6991,pop,night/time,pop night/time,jonas brothers,strangers,2019,come look excuse away beautiful like drive clo...,145,0.000548,0.118777,0.091951,0.347684,0.024154,0.000548,0.041862,0.106291,0.000548,0.000548,0.000548,0.000548,0.160229,0.045894,0.576519,0.788785,0.031324,0.0,0.566158,0.842838
6964,pop,sadness,pop sadness,dean lewis,stay awake,2019,trace finger right leave drink walk sudden bri...,121,0.00047,0.00047,0.00047,0.023337,0.00047,0.00047,0.039207,0.00047,0.00047,0.00047,0.00047,0.00047,0.00047,0.102677,0.716235,0.731585,0.338353,3.5e-05,0.575433,0.829825


The fact that 2019 makes me think "wow, that was a long time ago" is a sign that I'm aging. This is fine. I'm fine. Totally fine.
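The `ascending` list lines up with the `by` list, so each sort key can have its own direction. The sketch below, on made-up data, sorts newest year first but lowest energy first within each year, just to show the two directions being mixed.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "track_name": ["a", "b", "c", "d"],
    "release_date": [2018, 2019, 2019, 2018],
    "energy": [0.9, 0.4, 0.8, 0.2],
})

# release_date descending, then energy ascending within each year
sorted_df = toy_df.sort_values(
    by=["release_date", "energy"], ascending=[False, True]
)
print(sorted_df)
```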

### 8. Filter your data based on some condition

In [14]:
lame_pop_music_df = lame_pop_music_df[lame_pop_music_df['release_date'] >= 1980]
print(lame_pop_music_df['release_date'].min())


1980


### 9. Convert all the string values to upper or lower cases in one column

I'm going to convert the ```genre_topic``` column to uppercase because I think it stands out more. In general, uppercase words in data should be used sparingly and only when necessary. This is for the output, not the code that produced it.


In [15]:
lame_pop_music_df['genre_topic'] = lame_pop_music_df['genre_topic'].str.upper()
lame_pop_music_df.head()


Unnamed: 0,genre,topic,genre_topic,artist,track_name,release_date,lyrics,len,dating,violence,world/life,night/time,shake the audience,family/gospel,romantic,communication,obscene,music,light/visual perceptions,spiritual,like/girls,feelings,danceability,loudness,acousticness,instrumentalness,valence,energy
6963,pop,night/time,POP NIGHT/TIME,runaway june,buy my own drinks,2019,yeah try unfall apart think neon light real go...,131,0.000458,0.000458,0.000458,0.319597,0.124393,0.000458,0.011207,0.000458,0.228421,0.000458,0.05277,0.000458,0.000458,0.000458,0.676162,0.801246,0.068272,0.0,0.967024,0.900898
7037,pop,world/life,POP WORLD/LIFE,florida georgia line,simple,2019,finger plus rocket science time hard true road...,104,0.000605,0.000605,0.582178,0.000605,0.056798,0.050381,0.000605,0.000605,0.057264,0.000605,0.000605,0.012187,0.233327,0.000605,0.325246,0.818911,0.079417,0.0,0.866035,0.882879
6990,pop,sadness,POP SADNESS,jonas brothers,don't throw it away,2019,picture frame pack things help week get like c...,143,0.017162,0.00039,0.00039,0.163454,0.024244,0.048897,0.00039,0.136235,0.138088,0.00039,0.00039,0.00039,0.00039,0.00039,0.7249,0.783914,0.024597,0.0,0.730008,0.858854
6991,pop,night/time,POP NIGHT/TIME,jonas brothers,strangers,2019,come look excuse away beautiful like drive clo...,145,0.000548,0.118777,0.091951,0.347684,0.024154,0.000548,0.041862,0.106291,0.000548,0.000548,0.000548,0.000548,0.160229,0.045894,0.576519,0.788785,0.031324,0.0,0.566158,0.842838
6964,pop,sadness,POP SADNESS,dean lewis,stay awake,2019,trace finger right leave drink walk sudden bri...,121,0.00047,0.00047,0.00047,0.023337,0.00047,0.00047,0.039207,0.00047,0.00047,0.00047,0.00047,0.00047,0.00047,0.102677,0.716235,0.731585,0.338353,3.5e-05,0.575433,0.829825


### 10. Check whether numeric values are present in a given column of your dataframe.


In [16]:
print("track_name has a number?")
print(lame_pop_music_df['track_name'].str.contains(r'\d').value_counts())

pop_em_df = lame_pop_music_df[lame_pop_music_df['track_name'].str.contains(r'\d')]

print("\n pop_em_df sample:")
pop_em_df


track_name has a number?
track_name
False    3454
True       80
Name: count, dtype: int64

 pop_em_df sample:


Unnamed: 0,genre,topic,genre_topic,artist,track_name,release_date,lyrics,len,dating,violence,world/life,night/time,shake the audience,family/gospel,romantic,communication,obscene,music,light/visual perceptions,spiritual,like/girls,feelings,danceability,loudness,acousticness,instrumentalness,valence,energy
7007,pop,sadness,POP SADNESS,dreamville,"1993 (with j. cole, jid, cozz & earthgang feat...",2019,yeah yeah yeah yeah yeah hmmmm ummmm people me...,83,0.206718,0.000721,0.000721,0.000721,0.071051,0.000721,0.000721,0.000721,0.000721,0.000721,0.000721,0.048984,0.000721,0.000721,0.501787,0.777863,0.531124,0.000000,0.689819,0.725717
6978,pop,sadness,POP SADNESS,bazzi,fallin (feat. 6lack),2019,hop pray feel pain save fear linger fall quick...,122,0.000578,0.000578,0.055896,0.000578,0.123159,0.000578,0.000578,0.000578,0.205449,0.000578,0.000578,0.000578,0.108767,0.033390,0.782303,0.760480,0.176706,0.000000,0.187964,0.594582
6973,pop,sadness,POP SADNESS,blackbear,1 sided love,2019,busy talk hear sayin high realize smile fakin ...,82,0.001053,0.001053,0.168397,0.001053,0.001053,0.001053,0.001053,0.266476,0.217142,0.001053,0.001053,0.001053,0.001053,0.001053,0.565688,0.713535,0.236947,0.000000,0.267312,0.420402
6952,pop,world/life,POP WORLD/LIFE,clairo,4ever,2018,things aren simple call wonder change look thi...,81,0.000892,0.000892,0.315586,0.000892,0.000892,0.018566,0.068716,0.185256,0.000892,0.060392,0.000892,0.000892,0.000892,0.080100,0.745478,0.719970,0.049196,0.005587,0.577494,0.585573
6714,pop,sadness,POP SADNESS,the band camino,2 / 14,2016,second guess word think fine cross line green ...,76,0.000752,0.057572,0.000752,0.291736,0.000752,0.000752,0.018992,0.153927,0.000752,0.000752,0.000752,0.000752,0.000752,0.000752,0.590599,0.723765,0.378513,0.000000,0.543487,0.708700
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2767,pop,world/life,POP WORLD/LIFE,r.e.m.,1000000,1982,seclude marker stone deadlier smarter smarter ...,66,0.001462,0.001462,0.835259,0.001462,0.001462,0.001462,0.001462,0.001462,0.001462,0.001462,0.001462,0.001462,0.001462,0.001462,0.513701,0.733456,0.012650,0.731781,0.970115,0.897895
2843,pop,violence,POP VIOLENCE,descendents,m-16,1982,shoot shoot shoot gonna live american dream ar...,26,0.002288,0.401734,0.233553,0.002288,0.002288,0.002288,0.050593,0.002288,0.235045,0.002288,0.002288,0.002288,0.002288,0.002288,0.416224,0.652540,0.003302,0.000000,0.692910,0.847843
2831,pop,world/life,POP WORLD/LIFE,modern english,"i melt with you (7"" mix)",1982,move forward breath make second best world cra...,41,0.001698,0.113897,0.407104,0.001698,0.001698,0.001698,0.001698,0.264021,0.001698,0.001698,0.001698,0.001698,0.001698,0.001698,0.438969,0.601338,0.000389,0.000086,0.580585,0.641630
2706,pop,world/life,POP WORLD/LIFE,electric light orchestra,21st century man,1981,pocket suitcase hand century city rise land ce...,50,0.001504,0.001504,0.442806,0.001504,0.001504,0.001504,0.073898,0.001504,0.001504,0.001504,0.072856,0.001504,0.001504,0.001504,0.389148,0.603236,0.619478,0.000101,0.186933,0.412394
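The `track_name` column here has no nulls, so the mask above is all booleans. On a column that does have missing values, `str.contains()` returns a missing value for those rows, and the result cannot be used directly as a filter. A sketch of the more defensive form, using a made-up series:

```python
import pandas as pd

# Toy titles, including one missing value
toy_titles = pd.Series(["4ever", "cry", None, "2 / 14"], name="track_name")

# na=False treats missing titles as "no digit" so the mask stays boolean
has_digit = toy_titles.str.contains(r"\d", regex=True, na=False)

print(has_digit.value_counts())
print(toy_titles[has_digit])
```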


### 11. Group your dataset by one column, and get the mean, min, and max values by group

Use the following functions:    
  * Groupby()
  * agg() or .apply()

I thought about this result for a bit and decided it has value. Yes, there are a lot of columns, but I wouldn't try to make a decision straight from the raw table. I'm sticking to Pandas as requested for this assignment, but this is where I would do some basic charting to scan for patterns.

In [17]:
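# Numeric columns to aggregate. Note that 'energy' is listed twice (second and
# last), so the energy columns appear twice in the output; the repeated set
# carries a ".1" suffix.
cooler_pop_numnums = ['len', 'energy', 'dating', 'violence', 'world/life',
                      'night/time', 'shake the audience', 'family/gospel', 'romantic',
                      'communication', 'obscene', 'music', 'light/visual perceptions',
                      'spiritual', 'like/girls', 'feelings', 'danceability', 'loudness',
                      'acousticness', 'instrumentalness', 'valence', 'energy']

# Aggregate per release year, then flatten the (column, statistic) MultiIndex
# into single names such as 'len_mean'.
yearly_spiral_df = lame_pop_music_df.groupby('release_date')[cooler_pop_numnums].agg(['mean', 'min', 'max'])
yearly_spiral_df.columns = ['_'.join(col).strip() for col in yearly_spiral_df.columns.values]
yearly_spiral_df = yearly_spiral_df.reset_index()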

yearly_spiral_df



Unnamed: 0,release_date,len_mean,len_min,len_max,energy_mean,energy_min,energy_max,dating_mean,dating_min,dating_max,violence_mean,violence_min,violence_max,world/life_mean,world/life_min,world/life_max,night/time_mean,night/time_min,night/time_max,shake the audience_mean,shake the audience_min,shake the audience_max,family/gospel_mean,family/gospel_min,family/gospel_max,romantic_mean,romantic_min,romantic_max,communication_mean,communication_min,communication_max,obscene_mean,obscene_min,obscene_max,music_mean,music_min,music_max,light/visual perceptions_mean,light/visual perceptions_min,light/visual perceptions_max,spiritual_mean,spiritual_min,spiritual_max,like/girls_mean,like/girls_min,like/girls_max,feelings_mean,feelings_min,feelings_max,danceability_mean,danceability_min,danceability_max,loudness_mean,loudness_min,loudness_max,acousticness_mean,acousticness_min,acousticness_max,instrumentalness_mean,instrumentalness_min,instrumentalness_max,valence_mean,valence_min,valence_max,energy_mean.1,energy_min.1,energy_max.1
0,1980,67.393162,7,184,0.650751,0.143116,0.969969,0.019008,0.000399,0.358063,0.170548,0.000313,0.714956,0.1406,0.000399,0.780246,0.046285,0.000313,0.573442,0.010622,0.000313,0.097365,0.013149,0.000313,0.226018,0.047931,0.000349,0.738064,0.079478,0.000313,0.569281,0.021991,0.000349,0.340204,0.060799,0.000313,0.603821,0.050935,0.000349,0.469392,0.018241,0.000349,0.269903,0.027108,0.000313,0.322785,0.037235,0.000399,0.551319,0.532539,0.163869,0.83429,0.674735,0.444556,0.858601,0.185142,8.734949e-05,0.931727,0.088644,0.0,0.926113,0.615864,0.064716,0.982481,0.650751,0.143116,0.969969
1,1981,65.598291,13,169,0.666477,0.114086,0.958958,0.031956,0.000442,0.574397,0.119624,0.000474,0.675439,0.143945,0.000511,0.705217,0.084318,0.000474,0.552287,0.01263,0.000442,0.210117,0.014741,0.000442,0.378796,0.030381,0.000483,0.362016,0.089383,0.000474,0.509651,0.016133,0.000442,0.335235,0.039463,0.000474,0.529686,0.046843,0.000511,0.388896,0.022703,0.000474,0.277149,0.025939,0.000442,0.299747,0.03931,0.000474,0.77482,0.559356,0.092386,0.894942,0.667447,0.380099,0.820655,0.198931,3.142573e-05,0.826305,0.129235,0.0,0.936235,0.618492,0.056884,0.97939,0.666477,0.114086,0.958958
2,1982,69.982759,12,164,0.662782,0.023994,0.985986,0.01045,0.000411,0.28706,0.193438,0.000411,0.968689,0.149771,0.000411,0.835259,0.036192,0.000411,0.345956,0.009,0.000411,0.23184,0.008402,0.000411,0.119107,0.045461,0.000411,0.461149,0.067197,0.000543,0.502103,0.021046,0.000474,0.352179,0.065656,0.000474,0.866285,0.065032,0.000474,0.482801,0.02918,0.000411,0.390687,0.026512,0.000411,0.334343,0.026461,0.000411,0.363891,0.503089,0.040832,0.887361,0.655383,0.358947,0.84455,0.149942,3.714863e-07,0.966867,0.083153,0.0,0.951417,0.602314,0.033285,0.970115,0.662782,0.023994,0.985986
3,1983,75.288136,13,172,0.698078,0.16614,0.986987,0.007977,0.000371,0.112825,0.163961,0.000371,0.814271,0.120192,0.000408,0.930577,0.068292,0.000371,0.581466,0.021626,0.000371,0.364863,0.013573,0.00039,0.219016,0.047698,0.00039,0.5766,0.10537,0.000483,0.535173,0.01985,0.000371,0.25889,0.057414,0.000371,0.637348,0.045116,0.000408,0.554203,0.021963,0.000371,0.475817,0.025312,0.000371,0.280444,0.030111,0.000408,0.424322,0.552122,0.066392,0.835373,0.674176,0.439787,0.85337,0.17355,1.044178e-05,0.929719,0.081973,0.0,0.889676,0.643546,0.024526,0.976298,0.698078,0.16614,0.986987
4,1984,65.37,16,159,0.651381,0.200175,0.987988,0.013304,0.000344,0.265704,0.154422,0.000344,0.845722,0.119694,0.000381,0.78444,0.082951,0.000344,0.894737,0.011899,0.000381,0.193114,0.007623,0.000344,0.207165,0.036735,0.000381,0.441255,0.077515,0.000344,0.466368,0.021325,0.000344,0.303896,0.050672,0.000381,0.686236,0.050442,0.000344,0.479517,0.023774,0.000344,0.205942,0.020111,0.000344,0.307253,0.036843,0.000344,0.52712,0.548067,0.14654,0.894942,0.63462,0.422737,0.799067,0.203426,3.082332e-06,0.986948,0.079428,0.0,0.865385,0.651481,0.050598,0.986603,0.651381,0.200175,0.987988
5,1985,76.009434,20,193,0.668216,0.045816,0.978978,0.019872,0.000399,0.381177,0.155215,0.000346,0.722531,0.158362,0.000399,0.625533,0.060558,0.000399,0.679177,0.015631,0.000346,0.235376,0.013142,0.000399,0.16616,0.034042,0.000399,0.479249,0.094091,0.000346,0.552635,0.013129,0.000399,0.283505,0.045053,0.000346,0.500689,0.036329,0.000466,0.369236,0.025606,0.000346,0.257849,0.019077,0.000346,0.352256,0.023662,0.000346,0.196841,0.549421,0.021878,0.84837,0.639675,0.376715,0.834628,0.184751,0.0001516066,0.919679,0.070894,0.0,0.882591,0.578988,0.065025,0.984542,0.668216,0.045816,0.978978
6,1986,71.214286,11,144,0.633929,0.120093,0.98098,0.020891,0.000414,0.363682,0.119371,0.000414,0.672515,0.143474,0.000458,0.651686,0.085118,0.000474,0.950139,0.007623,0.000414,0.130922,0.012111,0.000414,0.204815,0.035742,0.000458,0.561425,0.098672,0.000414,0.533645,0.0127,0.000414,0.25511,0.054025,0.000414,0.659137,0.07332,0.000474,0.576323,0.021579,0.000414,0.223874,0.031785,0.000414,0.281206,0.027714,0.000458,0.392403,0.525924,0.11838,0.854868,0.637315,0.371125,0.85778,0.262804,4.447796e-05,0.98996,0.091317,0.0,0.984818,0.634464,0.016694,0.97939,0.633929,0.120093,0.98098
7,1987,68.843137,15,191,0.648068,0.160134,0.996997,0.02691,0.00045,0.352474,0.179398,0.000402,0.905263,0.132069,0.000402,0.714488,0.06019,0.00045,0.505749,0.016888,0.000402,0.239674,0.005875,0.000402,0.098795,0.064758,0.00045,0.482348,0.083905,0.00045,0.483782,0.010828,0.00045,0.279854,0.049632,0.000402,0.609175,0.055043,0.000466,0.526809,0.025272,0.000424,0.259399,0.032134,0.000402,0.216695,0.032959,0.000402,0.39022,0.539949,0.065309,0.859201,0.63382,0.395764,0.81945,0.219577,2.489962e-06,0.833333,0.055839,0.0,0.767206,0.583929,0.128195,0.968054,0.648068,0.160134,0.996997
8,1988,64.101852,3,147,0.65135,0.060631,0.98999,0.034826,0.000462,0.350877,0.123389,0.000381,0.686379,0.138344,0.000458,0.863478,0.064021,0.000381,0.617217,0.017729,0.000381,0.303729,0.007773,0.000381,0.130058,0.062413,0.000381,0.597792,0.099841,0.000516,0.467578,0.00952,0.000384,0.136178,0.034744,0.000381,0.542908,0.041762,0.000384,0.306607,0.014776,0.000381,0.224491,0.04025,0.000384,0.336047,0.035159,0.000381,0.580893,0.541864,0.040723,0.885194,0.649271,0.345255,0.829628,0.232728,1.455825e-05,0.929719,0.078776,0.0,0.936235,0.586448,0.059563,0.972176,0.65135,0.060631,0.98999
9,1989,70.104348,20,192,0.606468,0.045516,0.993994,0.023154,0.000393,0.254054,0.114386,0.000466,0.85725,0.185833,0.000501,0.739143,0.064279,0.000501,0.515312,0.016884,0.000393,0.406157,0.015568,0.000411,0.325735,0.044687,0.000393,0.412307,0.083146,0.000458,0.428994,0.017915,0.000393,0.296845,0.062932,0.000411,0.621063,0.043716,0.000411,0.301805,0.031982,0.000411,0.301683,0.027846,0.000393,0.347784,0.038904,0.000393,0.820967,0.51387,0.108632,0.865699,0.6466,0.426813,0.821809,0.232308,2.560244e-05,0.944779,0.077284,0.0,0.888664,0.541176,0.017725,0.982481,0.606468,0.045516,0.993994
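
A table this wide is hard to scan by eye, so one Pandas-only option is to narrow it to the per-year means before looking for trends. The sketch below is an illustration rather than part of the original analysis: `toy_df`, `numeric_cols`, and the values in it are made up, and it also shows one way to drop a repeated name from the column list so a column is not aggregated twice.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset (values are made up).
toy_df = pd.DataFrame({
    "release_date": [1980, 1980, 1981, 1981],
    "len": [50, 70, 40, 100],
    "energy": [0.2, 0.6, 0.5, 0.9],
})

# 'energy' is repeated on purpose to mirror a column list with a duplicate.
numeric_cols = ["len", "energy", "energy"]

# dict.fromkeys keeps the first occurrence of each name and preserves order.
unique_cols = list(dict.fromkeys(numeric_cols))

# Same pattern as above: aggregate per year, then flatten the column names.
yearly = toy_df.groupby("release_date")[unique_cols].agg(["mean", "min", "max"])
yearly.columns = ["_".join(col) for col in yearly.columns]
yearly = yearly.reset_index()

# Keep only the columns ending in "_mean", indexed by year.
yearly_means = yearly.set_index("release_date").filter(regex="_mean$")
print(yearly_means)
```

Swapping the regex for `"_max$"` or `"^energy_"` gives other narrow views of the same aggregated frame.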


### 12. Group your dataset by two columns and then sort the aggregated results within the groups. 



In [18]:
victory_twirl_df = lame_pop_music_df.groupby(['topic', 'genre_topic'])[cooler_pop_numnums].agg(['mean', 'min', 'max'])

victory_twirl_df.columns = ['_'.join(col).strip() for col in victory_twirl_df.columns.values]
victory_twirl_df = victory_twirl_df.reset_index()

victory_twirl_df = victory_twirl_df.sort_values(by=['topic', 'len_mean'], ascending=True)

victory_twirl_df 

Unnamed: 0,topic,genre_topic,len_mean,len_min,len_max,energy_mean,energy_min,energy_max,dating_mean,dating_min,dating_max,violence_mean,violence_min,violence_max,world/life_mean,world/life_min,world/life_max,night/time_mean,night/time_min,night/time_max,shake the audience_mean,shake the audience_min,shake the audience_max,family/gospel_mean,family/gospel_min,family/gospel_max,romantic_mean,romantic_min,romantic_max,communication_mean,communication_min,communication_max,obscene_mean,obscene_min,obscene_max,music_mean,music_min,music_max,light/visual perceptions_mean,light/visual perceptions_min,light/visual perceptions_max,spiritual_mean,spiritual_min,spiritual_max,like/girls_mean,like/girls_min,like/girls_max,feelings_mean,feelings_min,feelings_max,danceability_mean,danceability_min,danceability_max,loudness_mean,loudness_min,loudness_max,acousticness_mean,acousticness_min,acousticness_max,instrumentalness_mean,instrumentalness_min,instrumentalness_max,valence_mean,valence_min,valence_max,energy_mean.1,energy_min.1,energy_max.1
0,feelings,POP FEELINGS,81.358025,3,188,0.710319,0.114086,0.958958,0.025656,0.000371,0.350877,0.040666,0.000346,0.340322,0.047378,0.000346,0.32383,0.023143,0.000349,0.258706,0.036027,0.000349,0.364863,0.014697,0.000346,0.190348,0.013658,0.000349,0.148461,0.115028,0.000349,0.538209,0.031493,0.000349,0.296845,0.024173,0.000349,0.254195,0.034011,0.000349,0.335342,0.016572,0.000349,0.287218,0.028809,0.000349,0.169269,0.405221,0.301679,0.820967,0.560206,0.096718,0.958843,0.719197,0.380099,0.841935,0.155214,1.837351e-05,0.829317,0.07904,0.0,0.865385,0.576613,0.050598,0.967024,0.710319,0.114086,0.958958
1,music,POP MUSIC,75.567839,14,181,0.61037,0.088961,0.990991,0.023412,0.000349,0.363682,0.049533,0.000411,0.291524,0.045291,0.000337,0.347773,0.033913,0.000337,0.227396,0.015716,0.000373,0.390242,0.012524,0.000337,0.228092,0.015979,0.000337,0.273818,0.0807,0.000337,0.459134,0.019588,0.000337,0.352179,0.411801,0.300014,0.866285,0.05195,0.000349,0.539168,0.021431,0.000349,0.234286,0.037318,0.000337,0.362993,0.016069,0.000373,0.190723,0.524706,0.066392,0.86245,0.674571,0.381689,0.863703,0.277308,2.489962e-06,0.987952,0.065404,0.0,0.869433,0.519992,0.024526,0.971146,0.61037,0.088961,0.990991
2,night/time,POP NIGHT/TIME,81.228873,4,198,0.662152,0.118091,0.990991,0.024456,0.00031,0.26272,0.045862,0.000294,0.389699,0.049422,0.000337,0.369464,0.405061,0.300328,0.950139,0.024578,0.00031,0.442431,0.011799,0.000294,0.235729,0.022593,0.000294,0.233893,0.089422,0.000294,0.608394,0.02949,0.000337,0.333542,0.021154,0.00031,0.316618,0.04087,0.000294,0.52454,0.01404,0.000294,0.424171,0.033035,0.00031,0.394947,0.026872,0.000321,0.38115,0.573556,0.110798,0.902524,0.705039,0.371125,0.861626,0.204282,6.184745e-06,0.981928,0.044353,0.0,0.968623,0.549812,0.017725,0.97939,0.662152,0.118091,0.990991
3,romantic,POP ROMANTIC,76.0,13,191,0.591878,0.023994,0.967967,0.027847,0.000344,0.369389,0.041017,0.000344,0.297692,0.054173,0.000365,0.398334,0.040932,0.000344,0.335617,0.020718,0.000365,0.319563,0.011852,0.000337,0.31323,0.391899,0.30008,0.738064,0.089975,0.000344,0.465132,0.014093,0.000337,0.249337,0.021821,0.000337,0.367452,0.044423,0.000337,0.434192,0.012547,0.000337,0.404459,0.035339,0.000337,0.356488,0.026792,0.000337,0.151993,0.548115,0.082638,0.980505,0.671411,0.345255,0.835218,0.269838,9.839367e-06,0.986948,0.045906,0.0,0.926113,0.528335,0.065643,0.972176,0.591878,0.023994,0.967967
4,sadness,POP SADNESS,75.148253,7,188,0.646524,0.045516,0.993994,0.02301,0.000299,0.537417,0.046386,0.000299,0.376178,0.053703,0.000306,0.438291,0.035958,0.000299,0.45361,0.01841,0.000306,0.384801,0.010818,0.000299,0.386603,0.028608,0.000306,0.393039,0.104543,0.000306,0.580994,0.019789,0.000329,0.376147,0.023849,0.000299,0.396378,0.043647,0.000299,0.642117,0.016639,0.000306,0.475817,0.024901,0.000299,0.348193,0.024811,0.000299,0.339994,0.52192,0.071808,0.938265,0.706198,0.295618,0.877343,0.219939,3.714863e-07,0.987952,0.042898,0.0,0.972672,0.496407,0.012366,0.984542,0.646524,0.045516,0.993994
5,violence,POP VIOLENCE,77.433333,9,198,0.706797,0.026196,0.996997,0.013833,0.00039,0.334033,0.428217,0.300085,0.968689,0.044692,0.000301,0.384206,0.034016,0.000317,0.399544,0.015692,0.000368,0.425251,0.011483,0.000301,0.325735,0.025611,0.000317,0.310346,0.085071,0.000381,0.533645,0.023723,0.000368,0.377782,0.024381,0.000301,0.360455,0.050708,0.000379,0.59855,0.02468,0.000301,0.390687,0.031603,0.000301,0.368867,0.023225,0.000379,0.313817,0.4937,0.021878,0.899274,0.713996,0.204241,0.895393,0.173095,2.148597e-06,0.996988,0.07305,0.0,0.984818,0.494918,0.011748,0.982481,0.706797,0.026196,0.996997
6,world/life,POP WORLD/LIFE,72.924113,6,197,0.6628,0.027497,0.997998,0.017429,0.000349,0.574397,0.046665,0.000304,0.401109,0.42457,0.300297,0.930577,0.032154,0.000289,0.30828,0.01817,0.000313,0.406157,0.011101,0.000289,0.43584,0.022215,0.000349,0.372749,0.088715,0.000313,0.563764,0.022668,0.000289,0.4422,0.022634,0.000289,0.322419,0.048713,0.000289,0.593332,0.021233,0.000289,0.328865,0.032267,0.000289,0.356037,0.028235,0.000289,0.303161,0.525476,0.040832,0.909022,0.705758,0.39938,0.884496,0.211323,1.455825e-06,0.965863,0.055351,0.0,0.942308,0.514933,0.020301,0.986603,0.6628,0.027497,0.997998
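
In the output above each topic appears with exactly one genre_topic, so the sort inside each group has nothing to reorder. The sketch below uses made-up data with two genres per topic to show what the within-group sort does when a group has several rows. The names `toy_df`, `grouped`, and `ranked` and all values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data: two genres, so each topic has more than one group row.
toy_df = pd.DataFrame({
    "topic": ["sadness", "sadness", "sadness", "music", "music", "music"],
    "genre": ["pop", "rock", "rock", "pop", "pop", "rock"],
    "len": [80, 60, 120, 70, 90, 40],
})

# Group by two columns and aggregate with named aggregation.
grouped = (
    toy_df.groupby(["topic", "genre"], as_index=False)
    .agg(len_mean=("len", "mean"), len_max=("len", "max"))
)

# Sort by the outer key first, then by the aggregated value within each topic
# (longest average lyrics first).
ranked = grouped.sort_values(["topic", "len_mean"], ascending=[True, False])
ranked = ranked.reset_index(drop=True)
print(ranked)
```

Passing a list to `ascending` lets the outer key stay alphabetical while the inner metric is ranked from high to low.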


### You are free (and should) to add on to these questions. Please clearly indicate in your assignment your answers to these questions.

The next few questions are ones I added on top of the original 12.

#### Which genre-topic combinations have the highest average energy?


In [21]:
pres_fit_df = lame_pop_music_df.groupby('genre_topic')['energy'].mean().reset_index()
pres_fit_df = pres_fit_df.sort_values(by='energy', ascending=False)
pres_fit_df


Unnamed: 0,genre_topic,energy
0,POP FEELINGS,0.710319
5,POP VIOLENCE,0.706797
6,POP WORLD/LIFE,0.6628
2,POP NIGHT/TIME,0.662152
4,POP SADNESS,0.646524
1,POP MUSIC,0.61037
3,POP ROMANTIC,0.591878


#### What are the top 5 topics with the highest average danceability?

In [22]:
lemons_jazz_df = lame_pop_music_df.groupby('topic')['danceability'].mean().reset_index()
lemons_jazz_df = lemons_jazz_df.sort_values(by='danceability', ascending=False).head(5)
lemons_jazz_df

Unnamed: 0,topic,danceability
2,night/time,0.573556
0,feelings,0.560206
3,romantic,0.548115
6,world/life,0.525476
1,music,0.524706
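
An alternative to sorting and then calling `head()` is `Series.nlargest`, which does both steps in one call. This is a minimal sketch on made-up data; `toy_df` and its values are hypothetical and are not drawn from the music dataset.

```python
import pandas as pd

# Hypothetical toy data (values are made up for illustration).
toy_df = pd.DataFrame({
    "topic": ["a", "a", "b", "b", "c", "c"],
    "danceability": [0.2, 0.4, 0.9, 0.7, 0.5, 0.5],
})

# Mean per topic, then keep the two largest means in descending order.
top_two = (
    toy_df.groupby("topic")["danceability"]
    .mean()
    .nlargest(2)
    .reset_index()
)
print(top_two)
```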


#### Which topic and genre_topic combinations have the highest average valence?

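Valence is a measure of musical positivity. It's pretty funny to see that pop music has distributed itself so evenly across the positive-to-negative spectrum: every topic spans nearly the full 0 to 1 range, and the medians all fall between roughly 0.48 and 0.57. POP FEELINGS has the highest mean valence (0.577), while POP VIOLENCE has the lowest (0.495).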


In [28]:
vortexted_df = lame_pop_music_df.groupby(['genre', 'topic', 'genre_topic'])['valence'].agg([
    ('min', 'min'),
    ('25%', lambda x: x.quantile(0.25)),
    ('50%', 'median'),
    ('75%', lambda x: x.quantile(0.75)),
    ('max', 'max'),
    ('mean', 'mean')
]).reset_index()

vortexted_df

Unnamed: 0,genre,topic,genre_topic,min,25%,50%,75%,max,mean
0,pop,feelings,POP FEELINGS,0.050598,0.361088,0.567189,0.841303,0.967024,0.576613
1,pop,music,POP MUSIC,0.024526,0.309048,0.52803,0.732585,0.971146,0.519992
2,pop,night/time,POP NIGHT/TIME,0.017725,0.351041,0.565643,0.762984,0.97939,0.549812
3,pop,romantic,POP ROMANTIC,0.065643,0.312139,0.536789,0.744435,0.972176,0.528335
4,pop,sadness,POP SADNESS,0.012366,0.300289,0.478566,0.686727,0.984542,0.496407
5,pop,violence,POP VIOLENCE,0.011748,0.294363,0.478566,0.683378,0.982481,0.494918
6,pop,world/life,POP WORLD/LIFE,0.020301,0.318838,0.506389,0.71249,0.986603,0.514933
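
The same summary statistics can also be produced with `describe()`, which returns the count, mean, standard deviation, minimum, quartiles, and maximum per group. The sketch below is one possible shortcut, not the method used above. It runs on made-up data (`toy_df` is hypothetical) and sorts by the mean so the highest-valence group comes first.

```python
import pandas as pd

# Hypothetical toy data (values are made up for illustration).
toy_df = pd.DataFrame({
    "topic": ["x"] * 4 + ["y"] * 4,
    "valence": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
})

# describe() computes count, mean, std, min, 25%, 50%, 75%, and max per group.
summary = toy_df.groupby("topic")["valence"].describe()

# Keep the columns used in the table above and rank groups by mean valence.
summary = summary[["min", "25%", "50%", "75%", "max", "mean"]]
summary = summary.sort_values("mean", ascending=False).reset_index()
print(summary)
```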
