In [1]:
pip install numpy pandas

Note: you may need to restart the kernel to use updated packages.


# Spotify Top 50 Tracks Analysis

### Introduction
This project analyzes Spotify's top tracks dataset to gain insight into music trends.
The dataset describes each track with features such as energy, danceability, key, and more.

In [2]:
import numpy as np
import pandas as pd

In [3]:
file_path = "/Users/vytautas/Downloads/spotifytoptracks.csv"

In [4]:
df = pd.read_csv(file_path)

In [5]:
df.head()

Unnamed: 0.1,Unnamed: 0,artist,album,track_name,track_id,energy,danceability,key,loudness,acousticness,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,genre
0,0,The Weeknd,After Hours,Blinding Lights,0VjIjW4GlUZAMYd2vXMi3b,0.73,0.514,1,-5.934,0.00146,0.0598,9.5e-05,0.0897,0.334,171.005,200040,R&B/Soul
1,1,Tones And I,Dance Monkey,Dance Monkey,1rgnBhdG2JDFTbYkYRZAku,0.593,0.825,6,-6.401,0.688,0.0988,0.000161,0.17,0.54,98.078,209755,Alternative/Indie
2,2,Roddy Ricch,Please Excuse Me For Being Antisocial,The Box,0nbXyq5TXYPCO7pr3N8S4I,0.586,0.896,10,-6.687,0.104,0.0559,0.0,0.79,0.642,116.971,196653,Hip-Hop/Rap
3,3,SAINt JHN,Roses (Imanbek Remix),Roses - Imanbek Remix,2Wo6QQD1KMDWeFkkjLqwx5,0.721,0.785,8,-5.457,0.0149,0.0506,0.00432,0.285,0.894,121.962,176219,Dance/Electronic
4,4,Dua Lipa,Future Nostalgia,Don't Start Now,3PfIrDoz19wz7qK7tYeu62,0.793,0.793,11,-4.521,0.0123,0.083,0.0,0.0951,0.679,123.95,183290,Nu-disco


### Dropping Unnecessary Column

1. **Column removal:** The `drop` method removes the column named 'Unnamed: 0' from the DataFrame `df`.
2. **Purpose:** The column appears to repeat the row numbers (0, 1, 2, ...), so it carries no information about the tracks.
3. **Result:** After this operation, `df` no longer contains the 'Unnamed: 0' column.
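
An unnamed leading column like this one usually appears when a DataFrame was saved together with its index. As a minimal sketch, assuming that is what happened here, the column can be consumed as the index at load time with `index_col=0` instead of being dropped afterwards. The inline CSV is a hypothetical two-row stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV: the first, unnamed column is a saved index.
csv_text = ",artist,track_name\n0,The Weeknd,Blinding Lights\n1,Tones And I,Dance Monkey\n"

# Default read: the unnamed column is loaded as a regular column called "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# With index_col=0 the same column becomes the index, so no drop is needed.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(clean.columns))
```

Either approach leaves the same track columns in place; reading with `index_col=0` simply avoids creating the extra column in the first place.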

In [6]:
df = df.drop(columns=["Unnamed: 0"])

In [7]:
df.head()

Unnamed: 0,artist,album,track_name,track_id,energy,danceability,key,loudness,acousticness,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,genre
0,The Weeknd,After Hours,Blinding Lights,0VjIjW4GlUZAMYd2vXMi3b,0.73,0.514,1,-5.934,0.00146,0.0598,9.5e-05,0.0897,0.334,171.005,200040,R&B/Soul
1,Tones And I,Dance Monkey,Dance Monkey,1rgnBhdG2JDFTbYkYRZAku,0.593,0.825,6,-6.401,0.688,0.0988,0.000161,0.17,0.54,98.078,209755,Alternative/Indie
2,Roddy Ricch,Please Excuse Me For Being Antisocial,The Box,0nbXyq5TXYPCO7pr3N8S4I,0.586,0.896,10,-6.687,0.104,0.0559,0.0,0.79,0.642,116.971,196653,Hip-Hop/Rap
3,SAINt JHN,Roses (Imanbek Remix),Roses - Imanbek Remix,2Wo6QQD1KMDWeFkkjLqwx5,0.721,0.785,8,-5.457,0.0149,0.0506,0.00432,0.285,0.894,121.962,176219,Dance/Electronic
4,Dua Lipa,Future Nostalgia,Don't Start Now,3PfIrDoz19wz7qK7tYeu62,0.793,0.793,11,-4.521,0.0123,0.083,0.0,0.0951,0.679,123.95,183290,Nu-disco


## Identifying Missing Values in Spotify Top 50 Tracks

This step checks the Spotify Top 50 Tracks dataset for missing values.

`df.isnull().sum()` counts the missing values in each column. If any column has a nonzero count, the code prints those columns with their counts; otherwise it reports that nothing is missing.

The result shows how much data is missing and guides the choice of strategy for handling it.
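
To show what the check reports when something is actually missing, here is a small sketch on hypothetical toy data (the values and the gap in `tempo` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in "tempo".
toy = pd.DataFrame({"energy": [0.73, 0.59, 0.72], "tempo": [171.0, np.nan, 121.9]})

# Count missing values per column, then keep only the columns that have any.
missing_per_column = toy.isnull().sum()
columns_with_missing = missing_per_column[missing_per_column > 0]
print(columns_with_missing)
```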


In [10]:
missing_values = df.isnull().sum()

if missing_values.sum() == 0:
    print("No missing values found in any column.")
else:
    print("Columns with missing values: ")
    print(missing_values[missing_values > 0])

No missing values found in any column.


## Duplicate Rows and Columns Detection

### Duplicate Rows

Use the duplicated() method to create a boolean Series indicating whether each row in the DataFrame is a duplicate.

Check if there are any True values in the Boolean Series.

If duplicate rows are found, print a message and display the duplicated rows.

If no duplicate rows are found, print a message indicating that there are no duplicates.

### Duplicate Columns

Transpose the DataFrame.

Use the duplicated() method to create a boolean Series indicating whether each column in the transposed DataFrame is a duplicate.

Check if there are any True values in the Boolean Series.

If duplicate columns are found, print a message and display the duplicated columns.

If no duplicate columns are found, print a message indicating that there are no duplicates.
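
The steps above can be sketched on a hypothetical toy DataFrame that is built to contain one repeated row and one repeated column:

```python
import pandas as pd

# Hypothetical toy data: row 2 repeats row 0, and column "b" repeats column "a".
toy = pd.DataFrame({"a": [1, 2, 1], "b": [1, 2, 1], "c": [7, 8, 7]})

# duplicated() marks every repeat after the first occurrence.
duplicate_rows = toy.duplicated()
print(toy[duplicate_rows])

# Transposing turns columns into rows, so the same method finds repeated columns.
duplicate_columns = toy.T.duplicated()
duplicate_column_names = list(toy.columns[duplicate_columns.to_numpy()])
print(duplicate_column_names)
```

Note that `duplicated()` keeps the first occurrence unmarked by default, so only the later copies are reported.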

In [11]:
duplicate_rows = df.duplicated()
if duplicate_rows.any():
    print("Duplicate rows found.")
    print(df[duplicate_rows])
else:
    print("No duplicate rows.")

duplicate_columns = df.T.duplicated()
if duplicate_columns.any():
    print("Duplicate columns found.")
    print(df.T[duplicate_columns].T)
else:
    print("No duplicate columns.")

No duplicate rows.
No duplicate columns.


### Checking outliers with describe() and z-scores

The code below flags a value as an outlier when its z-score, the number of standard deviations it lies from its column mean, exceeds 3 in absolute value. Only a handful of values are flagged, so I left them in the dataset.
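
As a minimal sketch of the z-score rule on a single column, the example below uses hypothetical loudness-like values (not taken from the dataset) in which one value sits far from the rest:

```python
import pandas as pd

# Hypothetical loudness-like values: 19 typical tracks and one much quieter track.
values = pd.Series([-6.0] * 19 + [-14.0], name="loudness")

# z = (x - mean) / std; pandas' std() uses the sample standard deviation (ddof=1).
z_scores = (values - values.mean()) / values.std()

# Flag values more than 3 standard deviations from the mean.
threshold = 3
outliers = values[z_scores.abs() > threshold]
print(outliers)
```

With very few observations no value can reach a z-score of 3, so the rule is only meaningful on columns with enough rows.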

In [13]:
df.describe()

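threshold = 3

# Z-scores are only defined for numeric columns, so compute them on those alone.
numeric_df = df.select_dtypes(include="number")
z_scores = (numeric_df - numeric_df.mean()) / numeric_df.std()

# Text columns are added back as all-False, so the loop below reports no outliers for them.
outliers_df = (z_scores.abs() > threshold).reindex(
    columns=df.columns, fill_value=False
)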


for col in df.columns:
    column_outliers = df.loc[outliers_df[col], col]
    
    if not column_outliers.empty:
        print(f"\nOutliers in {col}:")
        print(column_outliers)
    else:
        print(f"\nNo outliers in {col}")


No outliers in artist

No outliers in album

No outliers in track_name

No outliers in track_id

No outliers in energy

No outliers in danceability

No outliers in key

Outliers in loudness:
24   -14.454
Name: loudness, dtype: float64

No outliers in acousticness

Outliers in speechiness:
19    0.487
Name: speechiness, dtype: float64

Outliers in instrumentalness:
24    0.657
Name: instrumentalness, dtype: float64

Outliers in liveness:
2     0.790
41    0.792
Name: liveness, dtype: float64

No outliers in valence

No outliers in tempo

Outliers in duration_ms:
49    312820
Name: duration_ms, dtype: int64

No outliers in genre


## Dataset Overview

### Number of Observations

The following code snippet calculates and displays the number of observations (rows) in the cleaned Spotify Top 50 Tracks dataset.

In [13]:
num_observations = df.shape[0]
print(
    f"Number of observations in Spotify Top 50 Tracks dataset: {num_observations} observations"
)

Number of observations in Spotify Top 50 Tracks dataset: 50 observations


### Number of Features

The following code snippet calculates and displays the number of features in the cleaned Spotify Top 50 Tracks dataset.

In [14]:
num_features = df.shape[1]
print(f"Number of features in Spotify Top 50 Tracks dataset: {num_features} features")

Number of features in Spotify Top 50 Tracks dataset: 16 features


## Dataset Exploration

### Identification of Categorical Features

In this section, the analysis is focused on identifying categorical features in the Spotify Top 50 Tracks dataset. The following code was used to extract and display the categorical columns:

In [15]:
categorical_columns = df.select_dtypes(include=["category", "object"]).columns

df_categorical = df[categorical_columns]

print("Categorical features:")
for feature in df_categorical:
    print(f" - {feature}")

Categorical features:
 - artist
 - album
 - track_name
 - track_id
 - genre


## Numerical Features in Spotify Top 50 Tracks

The numerical features, such as energy, danceability, key, and loudness, quantify the musical profile of each track.

They are the basis for statistical summaries, correlation analysis, and outlier detection.

In [16]:
numeric_columns = df.select_dtypes(include=["number"]).columns

df_numeric = df[numeric_columns]

print("Numerical features:")
for feature in df_numeric:
    print(f" - {feature}")

Numerical features:
 - energy
 - danceability
 - key
 - loudness
 - acousticness
 - speechiness
 - instrumentalness
 - liveness
 - valence
 - tempo
 - duration_ms


## Popular Artists in Spotify Top 50 Tracks

This exploration counts the tracks per artist in the Spotify Top 50 Tracks dataset to identify artists with more than one track in the list.

Seven artists appear more than once: three have three tracks each and four have two tracks each.

Most artists therefore appear only once, while a few account for a disproportionate share of the list.

In [17]:
artist_track_counts = df["artist"].value_counts()
artists_with_multiple_tracks = artist_track_counts[artist_track_counts > 1]

if not artists_with_multiple_tracks.empty:
    print("Artists with more than one popular track:")
    for artist, track_count in artists_with_multiple_tracks.items():
        print(f" - {artist}: {track_count} tracks")
else:
    print("No artists with more than one popular track.")

Artists with more than one popular track:
 - Travis Scott: 3 tracks
 - Billie Eilish: 3 tracks
 - Dua Lipa: 3 tracks
 - Post Malone: 2 tracks
 - Justin Bieber: 2 tracks
 - Harry Styles: 2 tracks
 - Lewis Capaldi: 2 tracks


## Most Popular Artists in Spotify Top 50 Tracks

This exploration identifies the artists with the most tracks in the Spotify Top 50 Tracks dataset.

Three artists tie for first place with three tracks each: Travis Scott, Billie Eilish, and Dua Lipa.

By this measure, they are the most strongly represented artists in the dataset.

In [18]:
max_track_count = artists_with_multiple_tracks.max()
most_popular_artists = artists_with_multiple_tracks[
    artists_with_multiple_tracks == max_track_count
].index

if not most_popular_artists.empty:
    print("The most popular artist(s) with the highest amount of popular tracks:")
    for artist in most_popular_artists:
        print(f" - {artist}")
else:
    print("No artists with more than one popular track.")

The most popular artist(s) with the highest amount of popular tracks:
 - Travis Scott
 - Billie Eilish
 - Dua Lipa


## Artist Diversity in Spotify Top 50 Tracks

This exploration looks at the diversity of artists in the Spotify Top 50 Tracks dataset. 

It finds that there are a total of 40 unique artists in the Top 50 Spotify list. 

This provides insights into the overall artist diversity in the dataset and helps us understand the breadth of artists represented in the dataset.

In [19]:
total_artists = df["artist"].nunique()
print(f"Total number of artists in Top 50 Spotify list: {total_artists}")

Total number of artists in Top 50 Spotify list: 40


## Album Popularity in Spotify Top 50 Tracks

This exploration counts the tracks per album in the Spotify Top 50 Tracks dataset.

Four albums contribute more than one track: Future Nostalgia with three, and Changes, Hollywood's Bleeding, and Fine Line with two each.

This shows how tracks are distributed across albums and can guide future analysis.

In [20]:
album_track_count = df["album"].value_counts()
albums_with_multiple_tracks = album_track_count[album_track_count > 1]

if not albums_with_multiple_tracks.empty:
    print("Albums with more than one popular track:")
    for album, track_count in albums_with_multiple_tracks.items():
        print(f" - {album}: {track_count} tracks")
else:
    print("No albums with more than one popular track.")

Albums with more than one popular track:
 - Future Nostalgia: 3 tracks
 - Changes: 2 tracks
 - Hollywood's Bleeding: 2 tracks
 - Fine Line: 2 tracks


## Album Diversity in Spotify Top 50 Tracks

This exploration looks at the diversity of albums in the Spotify Top 50 Tracks dataset. 

It finds that there are 45 unique albums in the dataset. 

This provides insights into the range of albums represented in the dataset and is a fundamental metric for assessing album diversity.

In [21]:
total_albums = df["album"].nunique()
print(f"Total number of albums in Top 50 Spotify list: {total_albums}")

Total number of albums in Top 50 Spotify list: 45


## Highly Danceable Tracks in Spotify Top 50 Tracks

This exploration lists the tracks with danceability scores above 0.7 in the Spotify Top 50 Tracks dataset.

The 0.7 cut-off serves as a working definition of "highly danceable" for this analysis.

The result can be used to pick tracks for a dance playlist or to study the other characteristics of danceable music.

In [22]:
danceability_tracks_above = df[df["danceability"] > 0.7]
print("Tracks with danceability score above 0.7: \n")
print(danceability_tracks_above[["track_name", "danceability"]])

Tracks with danceability score above 0.7: 

                                       track_name  danceability
1                                    Dance Monkey         0.825
2                                         The Box         0.896
3                           Roses - Imanbek Remix         0.785
4                                 Don't Start Now         0.793
5                    ROCKSTAR (feat. Roddy Ricch)         0.746
7                death bed (coffee for your head)         0.726
8                                         Falling         0.784
10                                           Tusa         0.803
13                                Blueberry Faygo         0.774
14                       Intentions (feat. Quavo)         0.806
15                                   Toosie Slide         0.830
17                                         Say So         0.787
18                                       Memories         0.764
19                     Life Is Good (feat. Drake)         0.

## Less Danceable Tracks in Spotify Top 50 Tracks

This exploration lists the tracks with danceability scores below 0.4 in the Spotify Top 50 Tracks dataset.

Only one track falls below this cut-off: lovely (with Khalid), with a score of 0.351.

The result can be used to identify less danceable tracks for playlists or to study the characteristics of less danceable music.

In [23]:
danceability_tracks_below = df[df["danceability"] < 0.4]
print("Tracks with danceability score below 0.4: \n")
print(danceability_tracks_below[["track_name", "danceability"]])

Tracks with danceability score below 0.4: 

              track_name  danceability
44  lovely (with Khalid)         0.351


## Loud Tracks in Spotify Top 50 Tracks

This exploration lists the tracks with loudness values above -5 in the Spotify Top 50 Tracks dataset.

Loudness values in this dataset are negative, so values closer to 0 indicate louder tracks.

The result can be used to identify loud tracks for playlists or to study the characteristics of loud music.

In [24]:
loudness_tracks_above = df[df["loudness"] > -5]
print("Tracks with loudness score above -5: \n")
print(loudness_tracks_above[["track_name", "loudness"]])

Tracks with loudness score above -5: 

                                       track_name  loudness
4                                 Don't Start Now    -4.521
6                                Watermelon Sugar    -4.209
10                                           Tusa    -3.280
12                                        Circles    -3.497
16                                  Before You Go    -4.858
17                                         Say So    -4.577
21                                      Adore You    -3.675
23                         Mood (feat. iann dior)    -3.558
31                                 Break My Heart    -3.434
32                                       Dynamite    -4.410
33               Supalonely (feat. Gus Dapperton)    -4.746
35                Rain On Me (with Ariana Grande)    -3.764
37  Sunflower - Spider-Man: Into the Spider-Verse    -4.368
38                                          Hawái    -3.454
39                                        Ride It    -4.258
4

## Quiet Tracks in Spotify Top 50 Tracks

This exploration lists the tracks with loudness values below -8 in the Spotify Top 50 Tracks dataset.

Nine tracks fall below this cut-off. The quietest is everything i wanted, at -14.454.

The result can be used to identify quiet tracks for playlists or to study the characteristics of quiet music.

In [25]:
loudness_tracks_below = df[df["loudness"] < -8]
print("Tracks with loudness score below -8: \n")
print(loudness_tracks_below[["track_name", "loudness"]])

Tracks with loudness score below -8: 

                                        track_name  loudness
7                 death bed (coffee for your head)    -8.765
8                                          Falling    -8.756
15                                    Toosie Slide    -8.820
20                Savage Love (Laxed - Siren Beat)    -8.520
24                             everything i wanted   -14.454
26                                         bad guy   -10.965
36                             HIGHEST IN THE ROOM    -8.764
44                            lovely (with Khalid)   -10.109
47  If the World Was Ending - feat. Julia Michaels   -10.086


## Longest Track in Spotify Top 50 Tracks

This exploration looks at the track with the longest duration in the Spotify Top 50 Tracks dataset. 

It finds that the longest track is SICKO MODE, which is 312820 ms long.

This exploration helps us find potential insights into user preferences for longer tracks.

In [30]:
max_duration = df["duration_ms"].max()
longest_track = df[df["duration_ms"] == max_duration]

print("Tracks with the longest duration: ")
print(longest_track[["track_name", "duration_ms"]])

Tracks with the longest duration: 
    track_name  duration_ms
49  SICKO MODE       312820


## Shortest Track in Spotify Top 50 Tracks

This exploration looks at the track with the shortest duration in the Spotify Top 50 Tracks dataset. 

It finds that the track is Mood (feat. iann dior), which is 140526 ms long. 

This exploration helps us find potential insights into user preferences for shorter tracks.

In [31]:
min_duration = df["duration_ms"].min()
shortest_track = df[df["duration_ms"] == min_duration]
print("Tracks with the shortest duration: \n")
print(shortest_track[["track_name", "duration_ms"]])

Tracks with the shortest duration: 

                track_name  duration_ms
23  Mood (feat. iann dior)       140526
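
Millisecond values are hard to read at a glance. As a small sketch, the hypothetical helper `format_duration` below converts the two durations reported above into minutes and seconds:

```python
# Durations in milliseconds, copied from the outputs above.
durations_ms = {"SICKO MODE": 312820, "Mood (feat. iann dior)": 140526}


def format_duration(ms: int) -> str:
    """Convert milliseconds to an m:ss string, discarding the fractional second."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes}:{seconds:02d}"


for track, ms in durations_ms.items():
    print(f"{track}: {format_duration(ms)}")
```

So the longest track runs a little over five minutes and the shortest a little over two.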


## Most Popular Genre in Spotify Top 50 Tracks

This exploration looks at the distribution of genres in the Spotify Top 50 Tracks dataset. 

It finds that the most popular genre is Pop, with 14 occurrences. This suggests that Pop is the dominant genre in the dataset, and it may be a good focus area for further analysis.

In [32]:
genre_counts = df["genre"].value_counts()
most_popular_genre = genre_counts.idxmax()
occurrences = genre_counts.loc[most_popular_genre]

print(
    f"The most popular genre and its occurrences:\nGenre: {most_popular_genre} \nOccurrences: {occurrences}"
)

The most popular genre and its occurrences:
Genre: Pop 
Occurrences: 14


## Rare Genres in Spotify Top 50 Tracks

This exploration looks at genres in the Spotify Top 50 Tracks dataset that are represented by a single song. 

It finds that the following genres are represented by only one song:

 - Dreampop/Hip-Hop/R&B
 - Hip-Hop/Trap
 - Nu-disco
 - Dance-pop/Disco
 - R&B/Hip-Hop alternative
 - Pop rap
 - Pop/Soft Rock
 - Disco-pop
 - Alternative/reggaeton/experimental
 - Chamber pop

This suggests that these genres are relatively rare in the dataset, and it may be interesting to learn more about them. 

For example, one could explore the artists and songs that represent these genres, and try to understand why they are less common.

In [33]:
genre_counts = df["genre"].value_counts()
genres_with_one_song = genre_counts[genre_counts == 1]

print(f"Genres with only one song in the Spotify Top 50: \n\n{genres_with_one_song}")

Genres with only one song in the Spotify Top 50: 

Chamber pop                           1
Pop rap                               1
Disco-pop                             1
Dreampop/Hip-Hop/R&B                  1
Nu-disco                              1
Pop/Soft Rock                         1
R&B/Hip-Hop alternative               1
Hip-Hop/Trap                          1
Alternative/reggaeton/experimental    1
Dance-pop/Disco                       1
Name: genre, dtype: int64


## Unique Genres in Spotify Top 50 Tracks

This exploration looks at the diversity of genres in the Spotify Top 50 Tracks dataset to identify the unique musical styles represented. 

It finds that there are 16 unique genres in the dataset:

 - R&B/Soul
 - Alternative/Indie
 - Hip-Hop/Rap
 - Dance/Electronic
 - Nu-disco
 - Pop
 - R&B/Hip-Hop alternative
 - Pop/Soft Rock
 - Pop rap
 -  Electro-pop
 - Hip-Hop/Trap
 - Dance-pop/Disco
 - Disco-pop
 - Dreampop/Hip-Hop/R&B
 - Alternative/reggaeton/experimental
 - Chamber pop
 
This shows a wide range of musical styles in the Spotify Top 50 Tracks dataset, although several of the labels are close variants of one another (for example, Pop, Pop rap, and Disco-pop).
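
One entry in the list, ` Electro-pop`, is printed with an extra leading space, which suggests that the raw label carries stray whitespace. A minimal sketch of how such labels could be normalized before counting, using hypothetical labels rather than the real column:

```python
import pandas as pd

# Hypothetical labels; " Electro-pop" has a leading space, as in the list above.
genres = pd.Series(["Pop", " Electro-pop", "Electro-pop", "Pop"], name="genre")

# Without cleaning, the padded and unpadded labels count as different genres.
count_before = genres.nunique()
print(count_before)

# str.strip() removes leading and trailing whitespace from every label.
cleaned = genres.str.strip()
count_after = cleaned.nunique()
print(count_after)
```

If the real column contains no unpadded duplicate, stripping changes only how the label is displayed, not the genre count.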

In [27]:
unique_genres = df["genre"].unique()

print("Unique genres represented in the Spotify Top50 list:")
for genre in unique_genres:
    print(f" - {genre}")
print(f"\nTotal number of unique genres: {len(unique_genres)}")

16


## Strongly Positively Correlated Features in Spotify Top 50 Tracks

This exploration looks at the correlation matrix of the Spotify Top 50 Tracks dataset to identify pairs of features that exhibit a strong positive correlation. 

It finds one pair of features with a correlation coefficient greater than 0.7: energy and loudness, at about 0.79. The pair is listed twice in the output because the correlation matrix is symmetric.

This suggests that the two features are closely related: louder tracks tend to have higher energy.

This information helps clarify the relationships between features, although correlation alone does not show that changing one feature would cause a change in the other.

In [35]:
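# numeric_only=True skips the text columns, which recent pandas versions no longer drop silently.
correlation_matrix = df.corr(numeric_only=True)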
correlation_pairs = correlation_matrix.unstack().sort_values()

strong_positive_correlation_pairs = correlation_pairs[
    (correlation_pairs > 0.7) & (correlation_pairs < 1.0)
]

if not strong_positive_correlation_pairs.empty:
    print("Strongly positively correlated features:")
    print(strong_positive_correlation_pairs)
else:
    print("There are no strongly positively correlated pairs")

Strongly positively correlated features:
loudness  energy      0.79164
energy    loudness    0.79164
dtype: float64
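
Each pair is listed twice above because the correlation matrix is symmetric. A sketch of one way to list every pair only once is to keep just the cells above the diagonal; the toy data here is hypothetical and built so that `loudness` tracks `energy` closely:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data; "loudness" is built to track "energy" closely.
toy = pd.DataFrame(
    {
        "energy": [0.2, 0.4, 0.6, 0.8, 1.0],
        "loudness": [-9.0, -7.5, -6.0, -4.0, -3.0],
        "tempo": [120.0, 95.0, 140.0, 100.0, 118.0],
    }
)
corr = toy.corr()

# Keep only the cells above the diagonal, so each pair appears exactly once
# and the trivial self-correlations of 1.0 are excluded.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
unique_pairs = corr.where(upper).stack().dropna()

strong_pairs = unique_pairs[unique_pairs > 0.7]
print(strong_pairs)
```

Masking the diagonal this way also removes the need for the `< 1.0` condition used to filter out self-correlations.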


## Strongly Negatively Correlated Features in Spotify Top 50 Tracks

This exploration looks at the correlation matrix of the Spotify Top 50 Tracks dataset to identify pairs of features that exhibit a strong negative correlation. 

It finds no pairs with a correlation coefficient below -0.7.

If there were any, it would suggest that the features are inversely related, so that an increase in one tends to accompany a decrease in the other.

This information helps clarify the relationships between features, although correlation alone does not establish that one feature drives another.

In [36]:
correlation_pairs = correlation_matrix.unstack().sort_values()

strong_negative_correlation_pairs = correlation_pairs[
    (correlation_pairs < -0.7) & (correlation_pairs > -1.0)
]

if not strong_negative_correlation_pairs.empty:
    print("Strongly negatively correlated features:")
    print(strong_negative_correlation_pairs)
else:
    print("There are no strongly negatively correlated pairs")

There are no strongly negatively correlated pairs


## Uncorrelated Features in Spotify Top 50 Tracks

This analysis looks at the correlation matrix of the Spotify Top 50 Tracks dataset to identify pairs of features with a correlation of exactly zero. It finds none.

If there were any, it would suggest that the features are not linearly related, so that changes in one are not accompanied by systematic changes in the other.
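
A correlation computed from real data is rarely exactly zero, so a check against a small tolerance is often more informative. The sketch below uses hypothetical toy data and an assumed cut-off of 0.05; neither comes from the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" and "b" are constructed to have zero sample correlation.
toy = pd.DataFrame(
    {
        "a": [-1.0, 0.0, 1.0, 0.0],
        "b": [0.0, 1.0, 0.0, -1.0],
        "c": [-2.0, 0.5, 2.0, -0.5],
    }
)
corr = toy.corr()

# Look at each pair once by keeping only the cells above the diagonal.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()

# Treat correlations within the tolerance of zero as "uncorrelated".
tolerance = 0.05  # assumed cut-off, chosen for illustration
weak_pairs = pairs[pairs.abs() < tolerance]
print(weak_pairs)
```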

In [37]:
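# numeric_only=True skips the text columns, which recent pandas versions no longer drop silently.
correlation_matrix = df.corr(numeric_only=True)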

no_correlation_pairs = correlation_matrix.unstack().sort_values().abs() == 0

no_correlation_features = no_correlation_pairs[no_correlation_pairs].index

if not no_correlation_features.empty:
    print("Features with no correlation:")
    print(no_correlation_features)
else:
    print("There are no features that do not correlate.")

There are no features that do not correlate.


## Comparing the danceability of different genres of music

On average, hip-hop/rap and dance/electronic songs are more danceable than pop and alternative/indie songs.

However, pop and alternative/indie songs have a wider range of danceability scores, meaning that there are some songs in these genres that are very danceable and others that are less danceable.

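This information can help us understand the musical diversity present in each genre, although several genres contain only one or two tracks, so their statistics should be read with caution.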

## Breakdown

- **Mean danceability:** Hip-hop/rap and dance/electronic songs have a higher mean danceability than pop and alternative/indie songs, so on average songs in these two genres are more danceable.

- **Range of danceability scores:** Pop and alternative/indie songs span a wider range of danceability scores than hip-hop/rap and dance/electronic songs, so there is a greater variety of scores within these two genres.

- **Musical diversity:** The different danceability characteristics of different genres provide insights into the musical diversity present in each genre.
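
The mean and range used in this comparison can be computed in one table. This is a sketch on hypothetical tracks, with illustrative genre labels and scores, rather than the statistics of the real dataset:

```python
import pandas as pd

# Hypothetical tracks; the genre labels and scores are illustrative only.
toy = pd.DataFrame(
    {
        "genre": ["Pop", "Pop", "Pop", "Hip-Hop/Rap", "Hip-Hop/Rap"],
        "danceability": [0.50, 0.70, 0.90, 0.80, 0.90],
    }
)

# One row per genre with the sample size, mean, and extremes.
summary = toy.groupby("genre")["danceability"].agg(["count", "mean", "min", "max"])

# The range (max - min) is a simple measure of spread within a genre.
summary["range"] = summary["max"] - summary["min"]
print(summary.sort_values("mean", ascending=False))
```

Keeping `count` in the table makes it easy to see which genres have too few tracks for the mean or range to be meaningful.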

In [38]:
# Group by 'genre' and calculate danceability statistics for each genre
genre_danceability_stats = df.groupby("genre")["danceability"].describe()

# Print the results
for genre, stats in genre_danceability_stats.iterrows():
    print(f"{genre} Genre Danceability Stats:")
    print(stats)
    print()

 Electro-pop Genre Danceability Stats:
count    2.000000
mean     0.789500
std      0.125158
min      0.701000
25%      0.745250
50%      0.789500
75%      0.833750
max      0.878000
Name:  Electro-pop, dtype: float64

Alternative/Indie Genre Danceability Stats:
count    4.000000
mean     0.661750
std      0.211107
min      0.459000
25%      0.490500
50%      0.663000
75%      0.834250
max      0.862000
Name: Alternative/Indie, dtype: float64

Alternative/reggaeton/experimental Genre Danceability Stats:
count    1.000
mean     0.607
std        NaN
min      0.607
25%      0.607
50%      0.607
75%      0.607
max      0.607
Name: Alternative/reggaeton/experimental, dtype: float64

Chamber pop Genre Danceability Stats:
count    1.000
mean     0.351
std        NaN
min      0.351
25%      0.351
50%      0.351
75%      0.351
max      0.351
Name: Chamber pop, dtype: float64

Dance-pop/Disco Genre Danceability Stats:
count    1.00
mean     0.73
std       NaN
min      0.73
25%      0.73
50%     

## Comparing the loudness of different genres of music

For simplicity, let's assume that loudness is measured here in decibels (dB).

Of the four genres compared here, electro-pop is the quietest on average, followed by hip-hop/rap. Alternative/indie and dance/electronic are the loudest, with nearly identical means.

- Electro-pop songs have a mean loudness of -8.898 decibels (dB).

- Alternative/indie songs have a mean loudness of -5.421 dB.

- Dance/electronic songs have a mean loudness of -5.338 dB.

- Hip-hop/rap songs have a mean loudness of -6.918 dB.

Electro-pop and hip-hop/rap songs have a wider range of loudness levels than alternative/indie and dance/electronic songs.

This means that there are some electro-pop and hip-hop/rap songs that are much louder or much quieter than the average song in those genres.

The variability in loudness levels across genres contributes to the diversity of music preferences.

Some people prefer louder music, while others prefer quieter music. The different loudness levels of different genres give listeners a variety to choose from.

In [11]:
genre_loudness_stats = df.groupby("genre")["loudness"].describe()

for genre, stats in genre_loudness_stats.iterrows():
    print(f"{genre} Genre Loudness Stats:")
    print(stats)
    print()

 Electro-pop Genre Loudness Stats:
count     2.000000
mean     -8.898500
std       2.922472
min     -10.965000
25%      -9.931750
50%      -8.898500
75%      -7.865250
max      -6.832000
Name:  Electro-pop, dtype: float64

Alternative/Indie Genre Loudness Stats:
count    4.000000
mean    -5.421000
std      0.774502
min     -6.401000
25%     -5.859500
50%     -5.268500
75%     -4.830000
max     -4.746000
Name: Alternative/Indie, dtype: float64

Alternative/reggaeton/experimental Genre Loudness Stats:
count    1.000
mean    -4.074
std        NaN
min     -4.074
25%     -4.074
50%     -4.074
75%     -4.074
max     -4.074
Name: Alternative/reggaeton/experimental, dtype: float64

Chamber pop Genre Loudness Stats:
count     1.000
mean    -10.109
std         NaN
min     -10.109
25%     -10.109
50%     -10.109
75%     -10.109
max     -10.109
Name: Chamber pop, dtype: float64

Dance-pop/Disco Genre Loudness Stats:
count    1.000
mean    -3.434
std        NaN
min     -3.434
25%     -3.434
50%    

## Comparing the acousticness of different genres of music

This section compares the acousticness of songs in the Spotify Top 50 Tracks dataset across genres.

Alternative/indie songs are generally more acoustic than dance/electronic and electro-pop songs.

- Alternative/indie songs have a mean acousticness of 0.5835.

- Electro-pop songs have a mean acousticness of 0.2555.

- Dance/electronic songs have a mean acousticness of 0.09944.

However, there is a variety of acousticness levels within each genre.

This means that there are some alternative/indie songs that are very acoustic, while others are less acoustic. The same is true for electro-pop and dance/electronic songs.

The variability in acousticness levels across genres contributes to the diversity of music styles and preferences.

Some people prefer more acoustic music, while others prefer less acoustic music. The different acousticness levels of different genres give listeners a variety to choose from.

In [42]:
genre_acousticness_stats = df.groupby("genre")["acousticness"].describe()

for genre, stats in genre_acousticness_stats.iterrows():
    print(f"{genre} Genre Acousticness Stats:")
    print(stats)
    print()

 Electro-pop Genre Acousticness Stats:
count    2.00000
mean     0.25550
std      0.10253
min      0.18300
25%      0.21925
50%      0.25550
75%      0.29175
max      0.32800
Name:  Electro-pop, dtype: float64

Alternative/Indie Genre Acousticness Stats:
count    4.000000
mean     0.583500
std      0.204086
min      0.291000
25%      0.525750
50%      0.646000
75%      0.703750
max      0.751000
Name: Alternative/Indie, dtype: float64

Alternative/reggaeton/experimental Genre Acousticness Stats:
count    1.0000
mean     0.0103
std         NaN
min      0.0103
25%      0.0103
50%      0.0103
75%      0.0103
max      0.0103
Name: Alternative/reggaeton/experimental, dtype: float64

Chamber pop Genre Acousticness Stats:
count    1.000
mean     0.934
std        NaN
min      0.934
25%      0.934
50%      0.934
75%      0.934
max      0.934
Name: Chamber pop, dtype: float64

Dance-pop/Disco Genre Acousticness Stats:
count    1.000
mean     0.167
std        NaN
min      0.167
25%      0.167
50%