# Spotify Top 50 Tracks Analysis

## Introduction

This project conducts a comprehensive analysis of Spotify's Top 50 Tracks dataset to uncover insights into current music trends. Through detailed data exploration and statistical analysis, we aim to understand what characterizes successful music in the streaming era.

### Project Objectives
- Analyze the musical characteristics of top-performing tracks
- Identify patterns in genre distribution and artist representation
- Examine relationships between audio features
- Understand what makes songs popular on Spotify

### Analysis Framework
The project will proceed through the following stages:

1. **Data Preparation**
   - Loading and cleaning the dataset
   - Handling missing values and outliers
   - Identifying categorical and numerical features

2. **Artist and Album Analysis**
   - Examining artist representation in top tracks
   - Analyzing album popularity
   - Measuring artist and album diversity

3. **Musical Features Analysis**
   - Investigating track characteristics:
     - Danceability
     - Energy
     - Loudness
     - Acousticness
     - Duration
   - Comparing features across different genres

4. **Genre Analysis**
   - Identifying dominant genres
   - Analyzing genre distribution
   - Examining genre-specific characteristics

5. **Statistical Analysis**
   - Correlation analysis between features
   - Cross-genre comparisons
   - Outlier detection and impact

6. **Visualization**
   - Creating visual representations of key findings
   - Plotting feature distributions
   - Comparing characteristics across genres

### Expected Insights
This analysis will provide valuable insights for:
- Music industry professionals
- Artists and producers
- Music streaming platforms
- Music marketing strategists

By examining these aspects, we aim to understand the characteristics that contribute to a track's success on Spotify and identify trends in popular music consumption.

---

Let's begin with our analysis by importing the necessary libraries and preparing our dataset.

In [1]:
%load_ext autoreload
%autoreload 2

In [10]:
import numpy as np
import pandas as pd

from spotify_top_50_2020_analysis.utils import load_spotify_data, detect_outliers, print_outliers

We will load the dataset into a pandas DataFrame for analysis. Ensure that the CSV file is in the correct directory or provide the full path to the file.

In [3]:
df = load_spotify_data()

Successfully loaded data from: /Users/vytautasbunevicius/spotify-top-50-2020-analysis/data/spotifytoptracks.csv


Let's display the first few rows to get an idea of the data.

In [4]:
df.head()

Unnamed: 0.1,Unnamed: 0,artist,album,track_name,track_id,energy,danceability,key,loudness,acousticness,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,genre
0,0,The Weeknd,After Hours,Blinding Lights,0VjIjW4GlUZAMYd2vXMi3b,0.73,0.514,1,-5.934,0.00146,0.0598,9.5e-05,0.0897,0.334,171.005,200040,R&B/Soul
1,1,Tones And I,Dance Monkey,Dance Monkey,1rgnBhdG2JDFTbYkYRZAku,0.593,0.825,6,-6.401,0.688,0.0988,0.000161,0.17,0.54,98.078,209755,Alternative/Indie
2,2,Roddy Ricch,Please Excuse Me For Being Antisocial,The Box,0nbXyq5TXYPCO7pr3N8S4I,0.586,0.896,10,-6.687,0.104,0.0559,0.0,0.79,0.642,116.971,196653,Hip-Hop/Rap
3,3,SAINt JHN,Roses (Imanbek Remix),Roses - Imanbek Remix,2Wo6QQD1KMDWeFkkjLqwx5,0.721,0.785,8,-5.457,0.0149,0.0506,0.00432,0.285,0.894,121.962,176219,Dance/Electronic
4,4,Dua Lipa,Future Nostalgia,Don't Start Now,3PfIrDoz19wz7qK7tYeu62,0.793,0.793,11,-4.521,0.0123,0.083,0.0,0.0951,0.679,123.95,183290,Nu-disco


Upon inspecting the DataFrame, we notice an `'Unnamed: 0'` column, which seems to be an unnecessary index column resulting from saving the DataFrame to CSV.

We will remove this column.

In [5]:
df = df.drop(columns=["Unnamed: 0"])

In [6]:
df.head()

Unnamed: 0,artist,album,track_name,track_id,energy,danceability,key,loudness,acousticness,speechiness,instrumentalness,liveness,valence,tempo,duration_ms,genre
0,The Weeknd,After Hours,Blinding Lights,0VjIjW4GlUZAMYd2vXMi3b,0.73,0.514,1,-5.934,0.00146,0.0598,9.5e-05,0.0897,0.334,171.005,200040,R&B/Soul
1,Tones And I,Dance Monkey,Dance Monkey,1rgnBhdG2JDFTbYkYRZAku,0.593,0.825,6,-6.401,0.688,0.0988,0.000161,0.17,0.54,98.078,209755,Alternative/Indie
2,Roddy Ricch,Please Excuse Me For Being Antisocial,The Box,0nbXyq5TXYPCO7pr3N8S4I,0.586,0.896,10,-6.687,0.104,0.0559,0.0,0.79,0.642,116.971,196653,Hip-Hop/Rap
3,SAINt JHN,Roses (Imanbek Remix),Roses - Imanbek Remix,2Wo6QQD1KMDWeFkkjLqwx5,0.721,0.785,8,-5.457,0.0149,0.0506,0.00432,0.285,0.894,121.962,176219,Dance/Electronic
4,Dua Lipa,Future Nostalgia,Don't Start Now,3PfIrDoz19wz7qK7tYeu62,0.793,0.793,11,-4.521,0.0123,0.083,0.0,0.0951,0.679,123.95,183290,Nu-disco
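An alternative to dropping the column after loading is to treat the first CSV column as the index at read time. The sketch below uses a small in-memory CSV as a stand-in for the real file. `load_spotify_data` is the project's own helper and its internals are not shown here, so this only illustrates what a plain `pd.read_csv` call could look like under that assumption.

```python
import io

import pandas as pd

# Tiny stand-in for spotifytoptracks.csv (values copied from the first two rows above).
csv_text = """,artist,track_name,energy
0,The Weeknd,Blinding Lights,0.73
1,Tones And I,Dance Monkey,0.593
"""

# index_col=0 makes the first (unnamed) CSV column the index,
# so no "Unnamed: 0" column is created in the first place.
sample_df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(sample_df.columns.tolist())
```

Either approach gives the same columns; reading with `index_col=0` simply avoids the extra clean-up step.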


Next, we need to check if there are any missing values in the dataset, which could affect our analysis.

In [7]:
missing_values = df.isnull().sum()

if missing_values.sum() == 0:
    print("No missing values found in any column.")
else:
    print("Columns with missing values: ")
    print(missing_values[missing_values > 0])

No missing values found in any column.


Since there are no missing values, the next step is to check for duplicates:

1. **Rows**: Use `duplicated()` to find any duplicate rows. Display them if found; otherwise, confirm there are none.

2. **Columns**: Transpose the DataFrame and use `duplicated()` to identify any duplicate columns. Display them if found; otherwise, confirm there are none.

In [8]:
duplicate_rows = df.duplicated()
if duplicate_rows.any():
    print("Duplicate rows found.")
    print(df[duplicate_rows])
else:
    print("No duplicate rows.")

duplicate_columns = df.T.duplicated()
if duplicate_columns.any():
    print("Duplicate columns found.")
    print(df.T[duplicate_columns].T)
else:
    print("No duplicate columns.")

No duplicate rows.
No duplicate columns.


Since no duplicate rows or columns were found, we next detect outliers with the interquartile range (IQR) method, using a threshold of 1.5 × IQR. Because only a handful of values are flagged in most features, we leave them in the dataset and take no further action.

In [11]:
outliers = detect_outliers(df, method='iqr', threshold=1.5)
print_outliers(outliers)


No outliers in energy

Outliers in danceability:
16    0.459
44    0.351
47    0.464
Name: danceability, dtype: float64

No outliers in key

Outliers in loudness:
24   -14.454
Name: loudness, dtype: float64

Outliers in acousticness:
1     0.688
7     0.731
9     0.751
18    0.837
24    0.902
44    0.934
47    0.866
Name: acousticness, dtype: float64

Outliers in speechiness:
19    0.487
26    0.375
27    0.375
29    0.342
38    0.389
43    0.379
Name: speechiness, dtype: float64

Outliers in instrumentalness:
0     0.000095
1     0.000161
3     0.004320
10    0.000134
12    0.002440
24    0.657000
26    0.130000
33    0.000209
34    0.001880
39    0.000064
41    0.001090
48    0.000658
Name: instrumentalness, dtype: float64

Outliers in liveness:
2     0.790
7     0.696
41    0.792
Name: liveness, dtype: float64

No outliers in valence

No outliers in tempo

Outliers in duration_ms:
43    295177
49    312820
Name: duration_ms, dtype: int64

No outliers in artist

No outliers in album

Outliers were found in danceability, loudness, acousticness, speechiness, instrumentalness, liveness, and duration, with the affected rows listed for each feature.

No outliers were detected in energy, key, valence, or tempo, and none were reported for the categorical columns shown in the output (artist and album).
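The `detect_outliers` helper lives in the project's `utils` module and its code is not shown here. As a rough illustration of what an IQR check with a threshold of 1.5 usually does, here is a minimal sketch; the function name `iqr_outliers` and the sample values are hypothetical and not taken from the project.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, threshold: float = 1.5) -> pd.Series:
    """Return the values lying more than threshold * IQR outside the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - threshold * iqr
    upper = q3 + threshold * iqr
    return series[(series < lower) | (series > upper)]


# Illustrative values only, not the real loudness column.
loudness_sample = pd.Series([-5.9, -6.4, -6.7, -5.5, -4.5, -14.5])
flagged = iqr_outliers(loudness_sample)
print(flagged)
```

With a feature where many values are exactly 0, the IQR can shrink toward zero, so even tiny non-zero values fall outside the fences. That would be consistent with the long instrumentalness list above, assuming the helper works along these lines.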

With the outlier check complete, the next step is to examine the shape of the dataset to determine the number of observations and columns.

In [13]:
df.shape

(50, 16)

The dataset contains 50 observations (tracks) and 16 features.

The next step is to identify the categorical features. The code below selects and lists the categorical columns.

In [14]:
categorical_columns = df.select_dtypes(include=["category", "object"]).columns

df_categorical = df[categorical_columns]

print("Categorical features:")
for feature in df_categorical:
    print(f" - {feature}")

Categorical features:
 - artist
 - album
 - track_name
 - track_id
 - genre


Identified categorical features include artist, album, track_name, track_id, and genre.

Next, we will examine the numerical features in the dataset.

In [16]:
numeric_columns = df.select_dtypes(include=["number"]).columns

df_numeric = df[numeric_columns]

print("Numerical features:")
for feature in df_numeric:
    print(f" - {feature}")

Numerical features:
 - energy
 - danceability
 - key
 - loudness
 - acousticness
 - speechiness
 - instrumentalness
 - liveness
 - valence
 - tempo
 - duration_ms


The numerical features are energy, danceability, key, loudness, acousticness, speechiness, instrumentalness, liveness, valence, tempo, and duration_ms.

Next, we explore the distribution of artists to identify those with multiple popular tracks.

In [17]:
artist_track_counts = df["artist"].value_counts()
artists_with_multiple_tracks = artist_track_counts[artist_track_counts > 1]

if not artists_with_multiple_tracks.empty:
    print("Artists with more than one popular track:")
    for artist, track_count in artists_with_multiple_tracks.items():
        print(f" - {artist}: {track_count} tracks")
else:
    print("No artists with more than one popular track.")

Artists with more than one popular track:
 - Travis Scott: 3 tracks
 - Billie Eilish: 3 tracks
 - Dua Lipa: 3 tracks
 - Post Malone: 2 tracks
 - Justin Bieber: 2 tracks
 - Harry Styles: 2 tracks
 - Lewis Capaldi: 2 tracks


**Artists with multiple popular tracks**:
- Travis Scott: 3 tracks
- Billie Eilish: 3 tracks
- Dua Lipa: 3 tracks
- Post Malone: 2 tracks
- Justin Bieber: 2 tracks
- Harry Styles: 2 tracks
- Lewis Capaldi: 2 tracks

Seven artists have more than one track in the Top 50, together accounting for 17 of the 50 entries. A small group of artists therefore holds a sizeable share of the chart.

Next, we identify the artists with the highest track count.

In [18]:
max_track_count = artists_with_multiple_tracks.max()
most_popular_artists = artists_with_multiple_tracks[
    artists_with_multiple_tracks == max_track_count
].index

if not most_popular_artists.empty:
    print("The most popular artist(s) with the highest amount of popular tracks:")
    for artist in most_popular_artists:
        print(f" - {artist}")
else:
    print("No artists with more than one popular track.")

The most popular artist(s) with the highest amount of popular tracks:
 - Travis Scott
 - Billie Eilish
 - Dua Lipa


Leading the charts with multiple hit tracks are powerhouse artists **Travis Scott**, **Billie Eilish**, and **Dua Lipa**.

Their consistent presence in Spotify's `Top 50 Tracks` demonstrates their widespread appeal and ability to create viral hits that resonate with global audiences.

To gauge how concentrated the chart is, let's count the distinct artists represented in the `Spotify Top 50 Tracks` dataset.

In [19]:
# nunique() counts distinct artists, so the variable is named accordingly.
total_artists = df["artist"].nunique()
print(f"Total number of artists in Top 50 Spotify list: {total_artists}")

Total number of artists in Top 50 Spotify list: 40


Our analysis reveals **40 unique artists** in the `Top 50 Spotify` list, indicating a healthy balance between repeat hitmakers and individual success stories. 

This distribution highlights significant artist diversity, suggesting that while some musicians place multiple tracks in the top charts, there's still substantial room for varied voices and styles.

Next, we look at album popularity.

In [15]:
album_track_count = df["album"].value_counts()
albums_with_multiple_tracks = album_track_count[album_track_count > 1]

if not albums_with_multiple_tracks.empty:
    print("Albums with more than one popular track:")
    for album, track_count in albums_with_multiple_tracks.items():
        print(f" - {album}: {track_count} tracks")
else:
    print("No albums with more than one popular track.")

Albums with more than one popular track:
 - Future Nostalgia: 3 tracks
 - Hollywood's Bleeding: 2 tracks
 - Fine Line: 2 tracks
 - Changes: 2 tracks


Several albums have achieved multiple entries in the `Top 50`, demonstrating their exceptional commercial and cultural impact:

* **Future Nostalgia** by Dua Lipa: 3 tracks
* **Hollywood's Bleeding** by Post Malone: 2 tracks
* **Fine Line** by Harry Styles: 2 tracks
* **Changes** by Justin Bieber: 2 tracks

Let's delve deeper into the album diversity across Spotify's `Top 50 Tracks` to understand how different releases compete for listener attention and chart success.

In [21]:
total_albums = df["album"].nunique()
print(f"Total number of albums in Top 50 Spotify list: {total_albums}")

Total number of albums in Top 50 Spotify list: 45


Our analysis reveals **45 unique albums** in the `Spotify Top 50`, so most entries come from different releases. Only a handful of albums place more than one track on the chart.

Let's explore the tracks with high danceability scores (above `0.7`) to understand which songs are most likely to get listeners moving and their prevalence in the current Top 50.

In [22]:
danceability_tracks_above = df[df["danceability"] > 0.7]
print("Tracks with danceability score above 0.7: \n")
print(danceability_tracks_above[["track_name", "danceability"]])

Tracks with danceability score above 0.7: 

                                       track_name  danceability
1                                    Dance Monkey         0.825
2                                         The Box         0.896
3                           Roses - Imanbek Remix         0.785
4                                 Don't Start Now         0.793
5                    ROCKSTAR (feat. Roddy Ricch)         0.746
7                death bed (coffee for your head)         0.726
8                                         Falling         0.784
10                                           Tusa         0.803
13                                Blueberry Faygo         0.774
14                       Intentions (feat. Quavo)         0.806
15                                   Toosie Slide         0.830
17                                         Say So         0.787
18                                       Memories         0.764
19                     Life Is Good (feat. Drake)         0.

Out of the `Top 50` tracks, **30 songs** have a danceability score above `0.7`, indicating a strong preference for danceable music among listeners. Here are some notable highlights:

**Highest Danceability Scores**

* **WAP** (feat. Megan Thee Stallion): `0.935` - Highest score
* **The Box**: `0.896` - Second highest
* **Ride It**: `0.880` - Third highest
* **Sunday Best**: `0.878`
* **Supalonely**: `0.862`

Many chart-topping hits like **Dance Monkey** (`0.825`), **Toosie Slide** (`0.830`), and **SICKO MODE** (`0.834`) also feature strong danceability scores, suggesting that rhythmic, dance-friendly tracks perform well on Spotify's charts.
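The count and the ranking quoted above can be derived directly rather than read off the printed table. The sketch below shows one way to do that on a small illustrative frame; the four rows reuse scores that appear in this notebook, and the names `tracks`, `n_above`, and `top_two` are introduced here only for the example.

```python
import pandas as pd

# Illustrative subset; scores copied from the outputs and highlights in this notebook.
tracks = pd.DataFrame({
    "track_name": ["Dance Monkey", "The Box", "Toosie Slide", "lovely (with Khalid)"],
    "danceability": [0.825, 0.896, 0.830, 0.351],
})

# A boolean mask sums to the number of True entries, i.e. tracks above the threshold.
n_above = int((tracks["danceability"] > 0.7).sum())

# nlargest sorts by the given column in descending order and keeps the top rows.
top_two = tracks.nlargest(2, "danceability")

print(n_above)
print(top_two)
```

Applied to the full `df`, the same two calls would give the number of tracks above `0.7` and the top-ranked tracks without relying on a possibly truncated printout.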

Let's examine the tracks with lower danceability scores to understand the full spectrum of musical characteristics in the Top 50.

In [23]:
danceability_tracks_below = df[df["danceability"] < 0.4]
print("Tracks with danceability score below 0.4: \n")
print(danceability_tracks_below[["track_name", "danceability"]])

Tracks with danceability score below 0.4: 

              track_name  danceability
44  lovely (with Khalid)         0.351


In striking contrast to the many highly danceable tracks, only **one song** in the `Top 50` has a danceability score below `0.4`:

* **lovely** by Billie Eilish with Khalid: `0.351`

This outlier demonstrates that while most chart-topping tracks favor danceable rhythms, there's still room for more subdued, emotionally-driven songs to achieve mainstream success.

Let's explore tracks with loudness scores above `-5 dB` to understand how audio dynamics influence popularity in the Spotify Top 50.

In [24]:
loudness_tracks_above = df[df["loudness"] > -5]
print("Tracks with loudness score above -5: \n")
print(loudness_tracks_above[["track_name", "loudness"]])

Tracks with loudness score above -5: 

                                       track_name  loudness
4                                 Don't Start Now    -4.521
6                                Watermelon Sugar    -4.209
10                                           Tusa    -3.280
12                                        Circles    -3.497
16                                  Before You Go    -4.858
17                                         Say So    -4.577
21                                      Adore You    -3.675
23                         Mood (feat. iann dior)    -3.558
31                                 Break My Heart    -3.434
32                                       Dynamite    -4.410
33               Supalonely (feat. Gus Dapperton)    -4.746
35                Rain On Me (with Ariana Grande)    -3.764
37  Sunflower - Spider-Man: Into the Spider-Verse    -4.368
38                                          Hawái    -3.454
39                                        Ride It    -4.258
4

Of the `Top 50` tracks, **19 songs** exceed `-5 dB` in loudness, showcasing a trend toward powerful, energetic production. Here are some notable highlights:

**Loudest Tracks**

* **Tusa**: `-3.280 dB`
* **goosebumps**: `-3.370 dB`
* **Break My Heart**: `-3.434 dB`
* **Hawái**: `-3.454 dB`
* **Circles**: `-3.497 dB`

Pop hits like **Don't Start Now** (`-4.521 dB`) and **Watermelon Sugar** (`-4.209 dB`) also feature prominent loudness levels, suggesting that robust production values contribute to mainstream appeal.

Let's examine songs with loudness scores below `-8 dB` to understand how quieter tracks perform in the Spotify Top 50.

In [25]:
loudness_tracks_below = df[df["loudness"] < -8]
print("Tracks with loudness score below -8: \n")
print(loudness_tracks_below[["track_name", "loudness"]])

Tracks with loudness score below -8: 

                                        track_name  loudness
7                 death bed (coffee for your head)    -8.765
8                                          Falling    -8.756
15                                    Toosie Slide    -8.820
20                Savage Love (Laxed - Siren Beat)    -8.520
24                             everything i wanted   -14.454
26                                         bad guy   -10.965
36                             HIGHEST IN THE ROOM    -8.764
44                            lovely (with Khalid)   -10.109
47  If the World Was Ending - feat. Julia Michaels   -10.086


Among the `Top 50` tracks, **9 songs** feature loudness levels below `-8 dB`, showing that quieter productions can still achieve significant popularity. Here are the notable patterns:

**Quietest Tracks Breakdown:**

* **everything i wanted** by Billie Eilish: `-14.454 dB` (Most subdued)
* **bad guy** by Billie Eilish: `-10.965 dB`
* **lovely** by Billie Eilish with Khalid: `-10.109 dB`
* **If the World Was Ending**: `-10.086 dB`

Interestingly, **Billie Eilish** dominates the quieter end of the spectrum with three tracks, suggesting her signature style successfully challenges the trend toward louder productions.

Let's examine which song holds the record for the longest runtime in the Spotify Top 50, offering insights into optimal song length for streaming success.

In [30]:
max_duration = df["duration_ms"].max()
longest_track = df[df["duration_ms"] == max_duration]

print("Tracks with the longest duration: ")
print(longest_track[["track_name", "duration_ms"]])

Tracks with the longest duration: 
    track_name  duration_ms
49  SICKO MODE       312820


**SICKO MODE** by Travis Scott stands alone at the top with a duration of **5:13** (`312,820 ms`), distinguishing itself as the longest track in the `Top 50`.

This hit demonstrates that while streaming platforms typically favor shorter songs, an engaging extended track can still achieve significant popularity.

Let's examine which song holds the record for the shortest runtime in the Spotify Top 50, providing insights into the lower bounds of song duration in today's streaming landscape.

In [31]:
min_duration = df["duration_ms"].min()
shortest_track = df[df["duration_ms"] == min_duration]
print("Tracks with the shortest duration: \n")
print(shortest_track[["track_name", "duration_ms"]])

Tracks with the shortest duration: 

                track_name  duration_ms
23  Mood (feat. iann dior)       140526


**Duration Comparison**

* **Longest**: **SICKO MODE** - `5:13` (`312,820 ms`)
* **Shortest**: **Mood** feat. iann dior - `2:21` (`140,526 ms`)

This nearly three-minute difference between the longest and shortest tracks demonstrates the flexible range of song lengths that can achieve streaming success. While **SICKO MODE** proves longer, complex tracks can thrive, **Mood**'s shorter runtime aligns with the current trend toward concise, streaming-optimized songs.
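The minute-and-second figures above come from converting `duration_ms` and rounding to the nearest second. A small helper makes that arithmetic explicit; `format_duration` is a hypothetical name used only for this sketch.

```python
def format_duration(duration_ms: int) -> str:
    """Format milliseconds as m:ss, rounding to the nearest second."""
    total_seconds = round(duration_ms / 1000)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


longest = format_duration(312820)
shortest = format_duration(140526)
difference = format_duration(312820 - 140526)
print(longest, shortest, difference)
```

The difference between the two tracks works out to 2:52, which is the "nearly three-minute" gap mentioned above.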

Let's explore the most prevalent music genres in the Spotify Top 50 to understand current listening preferences and industry trends.

In [32]:
genre_counts = df["genre"].value_counts()
most_popular_genre = genre_counts.idxmax()
occurrences = genre_counts.loc[most_popular_genre]

print(
    f"The most popular genre and its occurrences:\nGenre: {most_popular_genre} \nOccurrences: {occurrences}"
)

The most popular genre and its occurrences:
Genre: Pop 
Occurrences: 14


**Dominant Genre**

**Pop** leads the `Top 50` with **14 tracks**, representing `28%` of the chart. This dominance reflects pop music's continued ability to capture mainstream attention and cross-cultural appeal in the streaming era.

**Genre Distribution**

Pop's strong presence suggests that while streaming has democratized music distribution, traditionally popular genres still maintain significant influence over listening habits.
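The `28%` share follows from 14 tracks out of 50. It can also be computed for every genre at once with `value_counts(normalize=True)`. The sketch below uses a hypothetical label series with the same Pop count as reported above; every non-Pop label is lumped into a placeholder `"Other"`.

```python
import pandas as pd

# Hypothetical labels: 14 Pop entries out of 50, matching the count reported above.
genres = pd.Series(["Pop"] * 14 + ["Other"] * 36, name="genre")

# normalize=True returns fractions of the total instead of raw counts.
genre_share = genres.value_counts(normalize=True)
pop_share = genre_share["Pop"]
print(f"Pop share: {pop_share:.0%}")
```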

Let's explore the unique or underrepresented genres in the Top 50, highlighting the musical diversity that exists even at the highest levels of streaming success.

In [33]:
genre_counts = df["genre"].value_counts()
genres_with_one_song = genre_counts[genre_counts == 1]

print(f"Genres with only one song in the Spotify Top 50: \n\n{genres_with_one_song}")

Genres with only one song in the Spotify Top 50: 

Chamber pop                           1
Pop rap                               1
Disco-pop                             1
Dreampop/Hip-Hop/R&B                  1
Nu-disco                              1
Pop/Soft Rock                         1
R&B/Hip-Hop alternative               1
Hip-Hop/Trap                          1
Alternative/reggaeton/experimental    1
Dance-pop/Disco                       1
Name: genre, dtype: int64


**Genre Diversity in Spotify Top 50**

**Single-Entry Genres**
While **Pop** dominates with `14 tracks`, several distinctive genres appear with just one track each:
* Chamber pop
* Pop rap
* Disco-pop
* Dreampop/Hip-Hop/R&B
* Nu-disco
* Pop/Soft Rock
* R&B/Hip-Hop alternative
* Hip-Hop/Trap
* Alternative/reggaeton/experimental
* Dance-pop/Disco

**Total Genre Diversity**
The `Top 50` features **16 unique genres**, demonstrating significant musical diversity:
* Mainstream categories:
  * R&B/Soul
  * Alternative/Indie
  * Hip-Hop/Rap
  * Dance/Electronic
  * Pop
* Fusion genres:
  * Electro-pop
  * Dance-pop/Disco
  * Dreampop/Hip-Hop/R&B
* Niche styles:
  * Chamber pop
  * Nu-disco
  * Alternative/reggaeton/experimental

This rich variety suggests that while pop maintains its stronghold, Spotify's Top 50 embraces diverse musical expressions, including genre-blending and experimental styles.

Let's analyze the total number of unique genres represented in the dataset to better understand the overall musical diversity.

In [27]:
unique_genres = df["genre"].unique()

print("Unique genres represented in the Spotify Top50 list:")
for genre in unique_genres:
    print(f" - {genre}")
print(f"\nTotal number of unique genres: {len(unique_genres)}")

16


The analysis confirms a total of **16 unique genres**, highlighting the rich musical diversity within Spotify's most popular tracks.

Next let's examine which audio features show strong positive correlations, helping us understand how different musical elements interact in popular tracks.

In [35]:
correlation_matrix = df_numeric.corr()  # numeric columns only; recent pandas versions raise on text columns
correlation_pairs = correlation_matrix.unstack().sort_values()

strong_positive_correlation_pairs = correlation_pairs[
    (correlation_pairs > 0.7) & (correlation_pairs < 1.0)
]

if not strong_positive_correlation_pairs.empty:
    print("Strongly positively correlated features:")
    print(strong_positive_correlation_pairs)
else:
    print("There are no strongly positively correlated pairs")

Strongly positively correlated features:
loudness  energy      0.79164
energy    loudness    0.79164
dtype: float64


A strong positive correlation exists between:

* **Loudness** and **Energy**: `0.792`

This significant correlation suggests that louder tracks in the `Top 50` tend to have higher energy levels, which aligns with music production principles where dynamic, energetic songs often feature stronger volume levels. The correlation of `0.792` indicates that these features typically increase or decrease together.
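Because a correlation matrix is symmetric, unstacking it lists every pair twice (loudness/energy and energy/loudness in the output above). One way to report each pair once is to keep only the cells above the diagonal. The sketch below demonstrates this on synthetic data; the frame `demo` and its columns are made up for illustration and are not the Spotify dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration; not the Spotify dataset.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.1, size=200),  # built to track "a" closely
    "c": rng.normal(size=200),                 # independent noise
})

corr = demo.corr()

# Keep only the cells strictly above the diagonal so each pair appears once
# and self-correlations (always 1.0) are excluded.
upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper_mask).stack().dropna()

strong_pairs = pairs[pairs > 0.7]
print(strong_pairs)
```

Masking the diagonal also removes the need for the `< 1.0` filter, which would otherwise drop any genuinely perfect correlation between two different features.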

Further, let's examine which audio features display strong negative correlations, revealing which musical elements tend to move in opposite directions.

In [36]:
correlation_pairs = correlation_matrix.unstack().sort_values()

strong_negative_correlation_pairs = correlation_pairs[
    (correlation_pairs < -0.7) & (correlation_pairs > -1.0)
]

if not strong_negative_correlation_pairs.empty:
    print("Strong negatively correlated features:")
    print(strong_negative_correlation_pairs)
else:
    print("There are no strongly negatively correlated pairs")

There are no strongly negatively correlated pairs


Interestingly, our analysis reveals **no strong negative correlations** between audio features in the `Top 50` tracks. This suggests that most audio characteristics either work in harmony or operate independently rather than in opposition to each other.

Next, let's examine which audio features show little to no correlation, helping us understand which musical elements operate independently in successful tracks.

In [37]:
correlation_matrix = df_numeric.corr()  # numeric columns only

# Flag only the pairs whose correlation coefficient is exactly zero.
no_correlation_pairs = correlation_matrix.unstack().sort_values().abs() == 0

no_correlation_features = no_correlation_pairs[no_correlation_pairs].index

if not no_correlation_features.empty:
    print("Features with no correlation:")
    print(no_correlation_features)
else:
    print("There are no features that do not correlate.")

There are no features that do not correlate.


No pair of features has a correlation coefficient of exactly zero. This check is strict, though: it does not show that every pair is meaningfully related, since weakly correlated pairs with coefficients close to zero also pass it.
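A coefficient of exactly zero is rare in real data, so a tolerance is often more informative than an equality test. The sketch below flags pairs whose absolute correlation falls under a cutoff. The data are hand-made for illustration, and the `0.1` cutoff is an arbitrary assumption, not a value used in this analysis.

```python
import numpy as np
import pandas as pd

# Hand-made values for illustration; not the Spotify dataset.
demo = pd.DataFrame({
    "x": [-2.0, -1.0, 0.0, 1.0, 2.0],
    "y": [-4.1, -1.9, 0.2, 2.1, 3.8],   # roughly 2 * x
    "z": [1.0, -1.0, 0.0, -1.0, 1.0],   # unrelated to x by construction
})

corr = demo.corr()

# Keep each pair once by masking the diagonal and the lower triangle.
upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper_mask).stack().dropna()

# 0.1 is an arbitrary cutoff chosen for this sketch.
weak_threshold = 0.1
weak_pairs = pairs[pairs.abs() < weak_threshold]
print(weak_pairs)
```

Here the `y`/`z` pair has a small but non-zero coefficient, so an `== 0` test would miss it while the tolerance-based check reports it as weakly correlated.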

Moving forward, let's explore how danceability scores vary across different genres, providing insights into which musical styles are most likely to get listeners moving.

In [38]:
genre_danceability_stats = df.groupby("genre")["danceability"].describe()

for genre, stats in genre_danceability_stats.iterrows():
    print(f"{genre} Genre Danceability Stats:")
    print(stats)
    print()

 Electro-pop Genre Danceability Stats:
count    2.000000
mean     0.789500
std      0.125158
min      0.701000
25%      0.745250
50%      0.789500
75%      0.833750
max      0.878000
Name:  Electro-pop, dtype: float64

Alternative/Indie Genre Danceability Stats:
count    4.000000
mean     0.661750
std      0.211107
min      0.459000
25%      0.490500
50%      0.663000
75%      0.834250
max      0.862000
Name: Alternative/Indie, dtype: float64

Alternative/reggaeton/experimental Genre Danceability Stats:
count    1.000
mean     0.607
std        NaN
min      0.607
25%      0.607
50%      0.607
75%      0.607
max      0.607
Name: Alternative/reggaeton/experimental, dtype: float64

Chamber pop Genre Danceability Stats:
count    1.000
mean     0.351
std        NaN
min      0.351
25%      0.351
50%      0.351
75%      0.351
max      0.351
Name: Chamber pop, dtype: float64

Dance-pop/Disco Genre Danceability Stats:
count    1.00
mean     0.73
std       NaN
min      0.73
25%      0.73
50%     

**Highest Danceability**
* **Hip-Hop/Trap**: `0.935` (highest; single track)
* **Nu-disco**: `0.793` (single track)
* **Electro-pop**: Average of `0.790` (range: `0.701` to `0.878`)
* **R&B/Hip-Hop alternative**: `0.784` (single track)
* **Hip-Hop/Rap**: Average of `0.766` (range: `0.598` to `0.896`)
* **Dance/Electronic**: Average of `0.755` (range: `0.647` to `0.880`)

**Moderate Danceability**
* **Dance-pop/Disco**: `0.730` (single track)
* **Pop**: Average of `0.678` (range: `0.464` to `0.806`)
* **Alternative/Indie**: Average of `0.662` (range: `0.459` to `0.862`)

**Lower Danceability**
* **Chamber pop**: `0.351` (lowest; single track)

**Key Insights**
* Hip-Hop genres consistently show high danceability scores
* Electronic-influenced genres maintain strong dance appeal
* Pop shows more variation in danceability
* Genres represented by a single track report that track's score rather than an average
* Chamber pop stands as a clear outlier with the lowest score
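The per-genre `describe()` printout is long and easy to truncate. A compact alternative is a single table with the count, mean, minimum, and maximum per genre, sorted by mean. The sketch below runs on a few illustrative rows whose values are copied from the outputs above; the frame name `tracks` is introduced only for this example.

```python
import pandas as pd

# Illustrative rows; danceability values are taken from the outputs above.
tracks = pd.DataFrame({
    "genre": [
        "Alternative/Indie", "Alternative/Indie", "Alternative/Indie",
        "Nu-disco", "Chamber pop",
    ],
    "danceability": [0.825, 0.459, 0.862, 0.793, 0.351],
})

# One row per genre; "count" shows how many tracks each statistic is based on.
summary = (
    tracks.groupby("genre")["danceability"]
    .agg(["count", "mean", "min", "max"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

Keeping the `count` column next to the mean makes single-track genres easy to spot, so their scores are not read as genre-wide averages.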

Next, let's examine how loudness levels vary across different genres, revealing production trends in popular music.

In [11]:
genre_loudness_stats = df.groupby("genre")["loudness"].describe()

for genre, stats in genre_loudness_stats.iterrows():
    print(f"{genre} Genre Loudness Stats:")
    print(stats)
    print()

 Electro-pop Genre Loudness Stats:
count     2.000000
mean     -8.898500
std       2.922472
min     -10.965000
25%      -9.931750
50%      -8.898500
75%      -7.865250
max      -6.832000
Name:  Electro-pop, dtype: float64

Alternative/Indie Genre Loudness Stats:
count    4.000000
mean    -5.421000
std      0.774502
min     -6.401000
25%     -5.859500
50%     -5.268500
75%     -4.830000
max     -4.746000
Name: Alternative/Indie, dtype: float64

Alternative/reggaeton/experimental Genre Loudness Stats:
count    1.000
mean    -4.074
std        NaN
min     -4.074
25%     -4.074
50%     -4.074
75%     -4.074
max     -4.074
Name: Alternative/reggaeton/experimental, dtype: float64

Chamber pop Genre Loudness Stats:
count     1.000
mean    -10.109
std         NaN
min     -10.109
25%     -10.109
50%     -10.109
75%     -10.109
max     -10.109
Name: Chamber pop, dtype: float64

Dance-pop/Disco Genre Loudness Stats:
count    1.000
mean    -3.434
std        NaN
min     -3.434
25%     -3.434
50%    

**Genre Loudness Analysis**

**Loudest Genres** (closer to 0 dB)
* **Dance-pop/Disco**: `-3.434 dB`
* **Pop/Soft Rock**: `-3.497 dB`
* **Pop rap**: `-3.558 dB`
* **Pop**: Range from `-14.454 dB` to `-3.280 dB` (widest range)

**Moderate Loudness**
* **R&B/Soul**: Average of `-5.256 dB`
* **Dance/Electronic**: Average of `-5.338 dB`
* **Alternative/Indie**: Average of `-5.421 dB`
* **Hip-Hop/Rap**: Average of `-6.918 dB`

**Quietest Genres**
* **Chamber pop**: `-10.109 dB`
* **Electro-pop**: Average of `-8.899 dB`
* **R&B/Hip-Hop alternative**: `-8.756 dB`

**Key Insights**
* Dance and pop-oriented genres tend to be louder
* More introspective genres like Chamber pop are notably quieter
* Pop shows the widest spread in loudness levels
* Electronic-influenced genres show surprising variation in loudness

Lastly, let's examine how acoustic qualities vary across genres, revealing the balance between electronic and organic sound elements.

In [42]:
genre_acousticness_stats = df.groupby("genre")["acousticness"].describe()

for genre, stats in genre_acousticness_stats.iterrows():
    print(f"{genre} Genre Acousticness Stats:")
    print(stats)
    print()

 Electro-pop Genre Acousticness Stats:
count    2.00000
mean     0.25550
std      0.10253
min      0.18300
25%      0.21925
50%      0.25550
75%      0.29175
max      0.32800
Name:  Electro-pop, dtype: float64

Alternative/Indie Genre Acousticness Stats:
count    4.000000
mean     0.583500
std      0.204086
min      0.291000
25%      0.525750
50%      0.646000
75%      0.703750
max      0.751000
Name: Alternative/Indie, dtype: float64

Alternative/reggaeton/experimental Genre Acousticness Stats:
count    1.0000
mean     0.0103
std         NaN
min      0.0103
25%      0.0103
50%      0.0103
75%      0.0103
max      0.0103
Name: Alternative/reggaeton/experimental, dtype: float64

Chamber pop Genre Acousticness Stats:
count    1.000
mean     0.934
std        NaN
min      0.934
25%      0.934
50%      0.934
75%      0.934
max      0.934
Name: Chamber pop, dtype: float64

Dance-pop/Disco Genre Acousticness Stats:
count    1.000
mean     0.167
std        NaN
min      0.167
25%      0.167
50%

**Most Acoustic** (closer to 1.0)
* **Chamber pop**: `0.934` (highest)
* **Alternative/Indie**: Average of `0.584` (range: `0.291` to `0.751`)
* **Dreampop/Hip-Hop/R&B**: `0.533`
* **Pop**: Average of `0.324` (wide range: `0.021` to `0.902`)

**Moderately Acoustic**
* **Electro-pop**: Average of `0.256`
* **Pop rap**: `0.221`
* **Pop/Soft Rock**: `0.192`
* **Hip-Hop/Rap**: Average of `0.189` (range: `0.005` to `0.731`)

**Least Acoustic** (closer to 0.0)
* **Alternative/reggaeton/experimental**: `0.010`
* **Disco-pop**: `0.011`
* **Nu-disco**: `0.012`
* **Hip-Hop/Trap**: `0.019`
* **Dance/Electronic**: Average of `0.099`

**Key Insights**
* Chamber pop and Alternative/Indie carry the most acoustic content
* Electronic and dance genres show consistently low acousticness
* Pop shows remarkable versatility, spanning from highly electronic to mostly acoustic
* Hybrid genres (like Dreampop/Hip-Hop/R&B) balance acoustic and electronic elements