# Analysis of Polish Radio Stations


---

## 1. Project Overview

This report presents an analysis of music tracks played across 13 Polish radio stations between January 1, 2024 and June 16, 2025. The analysis covers track and artist popularity, library uniqueness, the speed at which premieres appear, and the frequency of airplay during specific times of the day.

Airplay data comes from the odsluchane.eu portal. Track release dates were obtained from the MusicBrainz database via the "musicbrainzngs" Python library.

**Key figures:**
- Records analyzed: 1 835 671
- Unique artists: 52 877
- Unique song titles: 86 816

**Key questions:**
- Who are the most played artists and songs?
- Which tracks are unique to specific stations?
- How quickly do new releases appear across different radio stations?
- What are the patterns of music airplay at different times of day?
- Which radio stations are most likely to premiere songs that later become major hits?
- How do new tracks travel from their premiere to wider popularity across the radio landscape?

---


## 2. Data Loading & Environment Setup

The analysis was performed in Python using the following libraries:


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from glob import glob
import plotly.express as px
import plotly.graph_objs as go
from datetime import datetime
import os
import re

In [None]:
# Main dataset load
# Path to the folder with Excel files (os.path.join keeps it portable across operating systems)
folder_path = os.path.join("data", "output_fast")


In [8]:
# Get all .xlsx files in the folder
all_files = glob(os.path.join(folder_path, "*.xlsx"))

print(f"Found {len(all_files)} Excel files.")



Found 533 Excel files.


In [9]:
# Read and concatenate all files into a single DataFrame
df_all = pd.concat([pd.read_excel(file) for file in all_files], ignore_index=True)

print(f"Combined DataFrame shape: {df_all.shape}")
df_all.head()

Combined DataFrame shape: (1838972, 6)


Unnamed: 0,Godzina,Nazwa utworu,Radio,Data_odtworzenia,Godzina_od,Godzina_do
0,00:00,U2 - New Year's Day,Antyradio,2024-01-01,0,2
1,00:05,Jet - Are You Gonna Be My Girl,Antyradio,2024-01-01,0,2
2,00:09,Green Day - Basket Case,Antyradio,2024-01-01,0,2
3,00:12,Metallica - Whiskey In The Jar,Antyradio,2024-01-01,0,2
4,00:17,T.love - Warszawa,Antyradio,2024-01-01,0,2


## 3. Data Cleaning & Preparation

In [10]:
# Rename columns to English
df_all = df_all.rename(columns={
    'Godzina': 'time',
    'Nazwa utworu': 'track',
    'Radio': 'station',
    'Data_odtworzenia': 'date',
    'Godzina_od': 'hour_from',
    'Godzina_do': 'hour_to'
})

In [11]:
df_all.head()

Unnamed: 0,time,track,station,date,hour_from,hour_to
0,00:00,U2 - New Year's Day,Antyradio,2024-01-01,0,2
1,00:05,Jet - Are You Gonna Be My Girl,Antyradio,2024-01-01,0,2
2,00:09,Green Day - Basket Case,Antyradio,2024-01-01,0,2
3,00:12,Metallica - Whiskey In The Jar,Antyradio,2024-01-01,0,2
4,00:17,T.love - Warszawa,Antyradio,2024-01-01,0,2


In [12]:
# Check for missing values
missing_summary = df_all.isnull().sum()
print("Missing values per column:\n", missing_summary)

Missing values per column:
 time         0
track        0
station      0
date         0
hour_from    0
hour_to      0
dtype: int64


In [14]:
# Check for duplicates
duplicates = df_all.duplicated()
num_duplicates = duplicates.sum()
print(f"Number of duplicate rows: {num_duplicates}")

Number of duplicate rows: 1196
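The 1,196 duplicate rows are counted here but are not dropped in the steps that follow. A minimal sketch of how they could be inspected before deciding what to do with them, on a small hypothetical frame with the raw column layout: `duplicated()` flags only the repeated copies, while `keep=False` flags every copy so the originals can be seen next to their repeats.

```python
import pandas as pd

# Hypothetical sample with the raw column layout (not real data)
df = pd.DataFrame({
    "time": ["00:00", "00:00", "00:05"],
    "track": ["U2 - New Year's Day", "U2 - New Year's Day", "Jet - Are You Gonna Be My Girl"],
    "station": ["Antyradio", "Antyradio", "Antyradio"],
    "date": ["2024-01-01", "2024-01-01", "2024-01-01"],
})

# duplicated() flags only the later copies of a repeated row
n_repeats = int(df.duplicated().sum())

# keep=False flags every copy, which is handy for eyeballing them side by side
all_copies = df[df.duplicated(keep=False)]

# drop_duplicates() keeps the first occurrence of each repeated row
deduped = df.drop_duplicates(ignore_index=True)

print(n_repeats, len(all_copies), len(deduped))  # 1 2 2
```

Whether such rows are scraping artifacts or genuine repeated entries cannot be told from the count alone, so looking at them first seems safer than dropping them blindly.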


In [104]:
# Show column data types
print(df_all.dtypes)

time         object
track        object
station      object
date         object
hour_from     int64
hour_to       int64
dtype: object


In [None]:
# Convert date column to datetime
df_all['date'] = pd.to_datetime(df_all['date'], errors='coerce')

# Convert time column to time (format HH:MM)
df_all['time'] = pd.to_datetime(df_all['time'], format='%H:%M', errors='coerce').dt.time

print(df_all.dtypes)
df_all.head()


time                 object
track                object
station              object
date         datetime64[ns]
hour_from             int64
hour_to               int64
dtype: object


Unnamed: 0,time,track,station,date,hour_from,hour_to
0,00:00:00,U2 - New Year's Day,Antyradio,2024-01-01,0,2
1,00:05:00,Jet - Are You Gonna Be My Girl,Antyradio,2024-01-01,0,2
2,00:09:00,Green Day - Basket Case,Antyradio,2024-01-01,0,2
3,00:12:00,Metallica - Whiskey In The Jar,Antyradio,2024-01-01,0,2
4,00:17:00,T.love - Warszawa,Antyradio,2024-01-01,0,2
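After this conversion `time` holds Python `datetime.time` objects (hence the `object` dtype), so the hour can later be read from each value's `.hour` attribute instead of reparsing strings. A small sketch on hypothetical values:

```python
import pandas as pd

times = pd.Series(["00:00", "09:41", "23:54"])

# Parse HH:MM strings; unparseable values would become NaT because of errors='coerce'
parsed = pd.to_datetime(times, format="%H:%M", errors="coerce")

# .dt.time yields datetime.time objects (object dtype), as in df_all['time']
as_time = parsed.dt.time

# The hour can come from the parsed datetimes or directly from the time objects
hours_from_dt = parsed.dt.hour
hours_from_time = as_time.map(lambda t: t.hour)

print(hours_from_dt.tolist())  # [0, 9, 23]
```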


In [106]:
# Split 'track' column into 'artist' and 'title'
df_all[['artist', 'title']] = df_all['track'].str.extract(r'^(.*?)\s*-\s*(.*)$')
df_all

Unnamed: 0,time,track,station,date,hour_from,hour_to,artist,title
0,00:00:00,U2 - New Year's Day,Antyradio,2024-01-01,0,2,U2,New Year's Day
1,00:05:00,Jet - Are You Gonna Be My Girl,Antyradio,2024-01-01,0,2,Jet,Are You Gonna Be My Girl
2,00:09:00,Green Day - Basket Case,Antyradio,2024-01-01,0,2,Green Day,Basket Case
3,00:12:00,Metallica - Whiskey In The Jar,Antyradio,2024-01-01,0,2,Metallica,Whiskey In The Jar
4,00:17:00,T.love - Warszawa,Antyradio,2024-01-01,0,2,T.love,Warszawa
...,...,...,...,...,...,...,...,...
1838967,23:42:00,Bon Jovi - Always,Złote Przeboje,2025-06-16,22,0,Bon Jovi,Always
1838968,23:47:00,K.a.s.a. & Los Amigos - Maczo,Złote Przeboje,2025-06-16,22,0,K.a.s.a. & Los Amigos,Maczo
1838969,23:50:00,Modern Talking - Geronimo's Cadillac,Złote Przeboje,2025-06-16,22,0,Modern Talking,Geronimo's Cadillac
1838970,23:54:00,Budka Suflera - Ratujmy Co Sie Da,Złote Przeboje,2025-06-16,22,0,Budka Suflera,Ratujmy Co Sie Da


In [107]:
# Check the top artists
print(df_all['artist'].value_counts().head(20))

# Note: 5,128 records have an empty artist name (the blank entry in the output below)

artist
Dua Lipa             8770
Sanah                8373
Daria Zawiałow       7381
Dawid Podsiadło      7255
Oskar Cyms           6944
Sylwia Grzeszczak    6869
Lady Gaga            6316
Lady Pank            5789
The Kolors           5726
                     5128
Teddy Swims          5041
Ed Sheeran           4895
Ava Max              4606
Margaret             4522
Maanam               4520
Rihanna              4428
Queen                4379
Perfect              4334
Wilki                4163
Katy Perry           4117
Name: count, dtype: int64


In [108]:
# Preview some rows with missing artist
df_all[df_all['artist'].str.strip() == ''][['track']].head(10)

Unnamed: 0,track
52687,- C-Bool - Magic Symphon
592493,- Sofya - Back To Black
748468,---__--___ More Eaze Proxy.exe & Seth Graham -...
748469,---__--___ Koeosaeme More Eaze & Seth Graham -...
1029719,- ̟̞̝̜̙̘̗̖҉̵̴̨̧̢̡̼̻̺ - 01 ʅ͡͡͡͡͡͡͡͡͡͡͡(̸̢̛̼̞̭͋...
1322843,"- _String Quartet No 7 In D, D.94"
1322844,"- 999D588A_String Quartet No 7 In F, Op 59 No ..."
1322845,"- _Clarinet Quintet In B Minor, Op 115"
1322846,"- _Symphony In C Minor, _Symphonie Funebre_"
1322847,"- _Der Pilgrim, D.794"


In [None]:
# Checking a row with a strange value in the track column (index 1029719)
df_all.loc[1029719]


time                                                  22:31:00
track        - ̟̞̝̜̙̘̗̖҉̵̴̨̧̢̡̼̻̺ - 01 ʅ͡͡͡͡͡͡͡͡͡͡͡(̸̢̛̼̞̭͋...
station                                              Chillizet
date                                       2024-10-29 00:00:00
hour_from                                                   22
hour_to                                                      0
artist                                                        
title        ̟̞̝̜̙̘̗̖҉̵̴̨̧̢̡̼̻̺ - 01 ʅ͡͡͡͡͡͡͡͡͡͡͡(̸̢̛̼̞̭͋Ι)...
Name: 1029719, dtype: object

In [110]:
# Removing row (index 1029719)
df_all = df_all.drop(index=1029719)

In [111]:
# Cleaning prefixes in track names
def clean_prefix(name):
    """
    Removes any prefix in the track name up to the first letter or digit,
    including Polish diacritics.
    """
    if pd.isna(name):
        return name
    match = re.search(r'[A-Za-z0-9ĄąĆćĘęŁłŃńÓóŚśŹźŻż]', name)
    return name[match.start():] if match else name

# Clean prefixes in the 'track' column
df_all['track'] = df_all['track'].apply(clean_prefix)
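A quick check of `clean_prefix` on prefixes like the ones previewed above (the function is copied so the snippet runs on its own; the example strings are hypothetical). It also shows why the lone `-` row survives this step, and one caveat: the character class only covers ASCII and Polish letters, so a leading accented letter such as "É" is stripped too. The `clean_prefix_unicode` variant is an assumption, not the notebook's code; it stops at the first character for which `str.isalnum()` is true, which covers any Unicode letter or digit.

```python
import re
import pandas as pd

def clean_prefix(name):
    """Copy of the notebook's function: drop everything before the first ASCII/Polish letter or digit."""
    if pd.isna(name):
        return name
    match = re.search(r'[A-Za-zA-Z0-9ĄąĆćĘęŁłŃńÓóŚśŹźŻż]', name)
    return name[match.start():] if match else name

def clean_prefix_unicode(name):
    """Hypothetical variant: drop everything before the first Unicode letter or digit."""
    if pd.isna(name):
        return name
    for i, ch in enumerate(name):
        if ch.isalnum():
            return name[i:]
    return name

examples = [
    "- C-Bool - Magic Symphon",
    "---__--___ Example Artist - Example Title",
    "- _Der Pilgrim, D.794",
    "-",  # no letter or digit at all: returned unchanged
]
cleaned = [clean_prefix(s) for s in examples]
print(cleaned)

# Caveat: a leading non-Polish accented letter is treated as part of the prefix
print(clean_prefix("Édith Piaf - La Vie En Rose"))          # dith Piaf - La Vie En Rose
print(clean_prefix_unicode("Édith Piaf - La Vie En Rose"))  # Édith Piaf - La Vie En Rose
```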

In [112]:
# Checking again rows with missing artist
df_all[df_all['artist'].str.strip() == ''][['track']].head(10)

Unnamed: 0,track
52687,C-Bool - Magic Symphon
592493,Sofya - Back To Black
748468,More Eaze Proxy.exe & Seth Graham - Rock Botto...
748469,Koeosaeme More Eaze & Seth Graham - To Suppres...
1322843,"String Quartet No 7 In D, D.94"
1322844,"999D588A_String Quartet No 7 In F, Op 59 No 1 ..."
1322845,"Clarinet Quintet In B Minor, Op 115"
1322846,"Symphony In C Minor, _Symphonie Funebre_"
1322847,"Der Pilgrim, D.794"
1322848,4 Sea Interludes From _Peter Grimes_ Op 33A


In [113]:
# Drop the artist, title, hour_from and hour_to columns (artist and title are re-extracted below)
df_all = df_all.drop(columns=['artist', 'title', 'hour_from', 'hour_to'])
df_all

Unnamed: 0,time,track,station,date
0,00:00:00,U2 - New Year's Day,Antyradio,2024-01-01
1,00:05:00,Jet - Are You Gonna Be My Girl,Antyradio,2024-01-01
2,00:09:00,Green Day - Basket Case,Antyradio,2024-01-01
3,00:12:00,Metallica - Whiskey In The Jar,Antyradio,2024-01-01
4,00:17:00,T.love - Warszawa,Antyradio,2024-01-01
...,...,...,...,...
1838967,23:42:00,Bon Jovi - Always,Złote Przeboje,2025-06-16
1838968,23:47:00,K.a.s.a. & Los Amigos - Maczo,Złote Przeboje,2025-06-16
1838969,23:50:00,Modern Talking - Geronimo's Cadillac,Złote Przeboje,2025-06-16
1838970,23:54:00,Budka Suflera - Ratujmy Co Sie Da,Złote Przeboje,2025-06-16


In [114]:
# Re-extract artist and title columns from the cleaned track
df_all[['artist', 'title']] = df_all['track'].str.extract(r'^(.*?)\s*-\s*(.*)$')
df_all

Unnamed: 0,time,track,station,date,artist,title
0,00:00:00,U2 - New Year's Day,Antyradio,2024-01-01,U2,New Year's Day
1,00:05:00,Jet - Are You Gonna Be My Girl,Antyradio,2024-01-01,Jet,Are You Gonna Be My Girl
2,00:09:00,Green Day - Basket Case,Antyradio,2024-01-01,Green Day,Basket Case
3,00:12:00,Metallica - Whiskey In The Jar,Antyradio,2024-01-01,Metallica,Whiskey In The Jar
4,00:17:00,T.love - Warszawa,Antyradio,2024-01-01,T.love,Warszawa
...,...,...,...,...,...,...
1838967,23:42:00,Bon Jovi - Always,Złote Przeboje,2025-06-16,Bon Jovi,Always
1838968,23:47:00,K.a.s.a. & Los Amigos - Maczo,Złote Przeboje,2025-06-16,K.a.s.a. & Los Amigos,Maczo
1838969,23:50:00,Modern Talking - Geronimo's Cadillac,Złote Przeboje,2025-06-16,Modern Talking,Geronimo's Cadillac
1838970,23:54:00,Budka Suflera - Ratujmy Co Sie Da,Złote Przeboje,2025-06-16,Budka Suflera,Ratujmy Co Sie Da


In [115]:
# Preview rows with missing artist
df_all[df_all['artist'].str.strip() == ''][['track']].head(10)

Unnamed: 0,track
1334078,-


In [116]:
# Checking row index 1334078
df_all.loc[1334078]

time                  22:08:00
track                        -
station                Jedynka
date       2025-01-26 00:00:00
artist                        
title                         
Name: 1334078, dtype: object

In [117]:
# Removing row (index 1334078)
df_all = df_all.drop(index=1334078)

In [118]:
# Sample rows where the song title contains a dash
sample_with_dash = df_all[df_all['title'].str.contains("-", na=False)][['track', 'artist', 'title']].sample(50)
sample_with_dash

Unnamed: 0,track,artist,title
1695651,Ignacy Jan Paderewski - Koncert Fortepianowy A...,Ignacy Jan Paderewski,Koncert Fortepianowy A-Moll (3)
1111937,Jo Yeong-Wook - Sympathy For Lady Vegeance,Jo Yeong,Wook - Sympathy For Lady Vegeance
84963,Jiri Antonin Benda - 8 Symfonia D-Dur (1),Jiri Antonin Benda,8 Symfonia D-Dur (1)
1833960,Clean Bandit / Sean Paul / Anne-Marie - Rockabye,Clean Bandit / Sean Paul / Anne,Marie - Rockabye
1727374,Clean Bandit & Sean Paul & Anne-Marie - Rockabye,Clean Bandit & Sean Paul & Anne,Marie - Rockabye
229915,Vangelis - Blade Runner - Love Theme,Vangelis,Blade Runner - Love Theme
476181,Suk Praga Op 26 Bbc Symphony Orchestra Belohla...,Suk Praga Op 26 Bbc Symphony Orchestra Belohlavek,Praga Op. 26 Cz 2 - Allegro - Piu Mosso
11349,H-Blockx - Move,H,Blockx - Move
943696,Muzio Clementi - Symfonia Nr 3 G-Dur (4),Muzio Clementi,Symfonia Nr 3 G-Dur (4)
523537,Franz Schubert - Oktet F-Dur (3),Franz Schubert,Oktet F-Dur (3)
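The sample above shows why the first-dash split is fragile: in `^(.*?)\s*-\s*(.*)$` the spaces around the dash are optional, so a hyphen inside an artist name is treated as the separator. A minimal reproduction on tracks from the sample, next to a stricter pattern (an assumption, not used in the notebook) that only splits on a dash with whitespace on both sides:

```python
import pandas as pd

tracks = pd.Series([
    "H-Blockx - Move",
    "Clean Bandit & Sean Paul & Anne-Marie - Rockabye",
    "Franz Schubert - Oktet F-Dur (3)",
])

# Pattern used in the notebook: whitespace around the dash is optional
loose = tracks.str.extract(r'^(.*?)\s*-\s*(.*)$')

# Stricter alternative: require whitespace on both sides of the dash
strict = tracks.str.extract(r'^(.*?)\s+-\s+(.*)$')

print(loose)
print(strict)
```

The stricter pattern fixes unspaced hyphens, but it still cannot handle names written with a spaced dash such as "Jay - Z", which is one reason a whitelist of known artists is still useful.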


In [119]:
# Changing columns order
desired_order = ['date', 'time', 'station', 'track', 'artist', 'title']
df_all = df_all[desired_order]

df_all.head()


Unnamed: 0,date,time,station,track,artist,title
0,2024-01-01,00:00:00,Antyradio,U2 - New Year's Day,U2,New Year's Day
1,2024-01-01,00:05:00,Antyradio,Jet - Are You Gonna Be My Girl,Jet,Are You Gonna Be My Girl
2,2024-01-01,00:09:00,Antyradio,Green Day - Basket Case,Green Day,Basket Case
3,2024-01-01,00:12:00,Antyradio,Metallica - Whiskey In The Jar,Metallica,Whiskey In The Jar
4,2024-01-01,00:17:00,Antyradio,T.love - Warszawa,T.love,Warszawa


In [120]:
# Sample rows where the song title contains a dash
sample_with_dash = df_all[df_all['title'].str.contains("-", na=False)][['track', 'artist', 'title']].sample(50)
sample_with_dash


Unnamed: 0,track,artist,title
1108901,James Horner - Becoming One Of The People - Be...,James Horner,Becoming One Of The People - Becoming One With...
422990,Disturbed - The Sound Of Silence - Cyril Remix,Disturbed,The Sound Of Silence - Cyril Remix
1020255,Pitbull & Ne-Yo & Afrojack - Give Me Everything,Pitbull & Ne,Yo & Afrojack - Give Me Everything
255966,Sophie Ellis-Bextor / Pnau - Murder On The Dan...,Sophie Ellis,Bextor / Pnau - Murder On The Dancefloor (Pnau...
884585,Strachy Na Lachy - Btw - Mamy Tylko Siebie,Strachy Na Lachy,Btw - Mamy Tylko Siebie
1040895,Francois Couperin - Vii Koncert G-Moll (4),Francois Couperin,Vii Koncert G-Moll (4)
1288626,Adam Czerwiński / Jan Sebastian Bach - “Aria” ...,Adam Czerwiński / Jan Sebastian Bach,“Aria” - From Goldberg Variations Bwv 988
9045,Antonio Vivaldi - Cztery Pory Roku - Wiosna (3),Antonio Vivaldi,Cztery Pory Roku - Wiosna (3)
702355,"Higdon Soliloquy Walters, Rose, Zehngut, Zakan...","Higdon Soliloquy Walters, Rose, Zehngut, Zakan...",06 - Jennifer Higdon Soliloquy
1235499,Franz Joseph Haydn - Symfonia Nr 94 G-Dur (4),Franz Joseph Haydn,Symfonia Nr 94 G-Dur (4)


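> **Improved title and artist extraction:**  
> A custom function first looks for the last spaced dash (` - `) in the track. If the part before it contains a name from a whitelist of artists that legitimately include a dash (e.g. "Jay-Z", "Anne-Marie", "H-Blockx"), everything before that last dash is kept as the artist and the rest becomes the title.  
> Otherwise, the track is split on the first dash as before. Tracks with no spaced dash at all get an empty artist and are removed in a later step.  
> This reduces incorrect splits of hyphenated artist names; names missing from the whitelist are still split at their first hyphen.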


In [145]:
# Whitelist of artist names that legitimately contain a dash (matched case-insensitively)
artist_exceptions = [
    "Jay-Z", "C-Bool", "Anne-Marie", "Seong-Jin Cho", "Sophie Ellis-Bextor", "In - Grid",
    "Camille Saint-Saens", "Del-M", "Jean-Francois Dandrieu", "(G)I-Dle",
    "Heitor Villa-Lobos", "Czarno-Czarni", "T-Pain", "In-Grid", "Jo Yeong-Wook", "Di-Rect",
    "Remady & Manu - L & J - Son", "Jon Cutler Feat.E-Man", "N-Joi", "K-Lone", "The B-52's",
    "Harry Gregson-Williams", "Georg-Philipp Telemann", "A-Ha", "Ne-Yo", "Remady / Manu-L",
    "Wu - Tang Clan", "Marc-Antoine Charpentier", "Melcer-Szczawiński", "Blink-182",
    "Lin-Manuel Miranda", "Paul Leonard-Morgan", "Dick Dale & His Del-Tones",
    "Alan Walker / Jvke / (G)I-Dle / Yuqi", "Billen Ted, Kah-Lo", "Jean-Baptiste Lully",
    "Sheku Kanneh-Mason", "Vikingur Olafsson / Jean-Philippe Rameau", "Sofi Tukker & Kah-Lo",
    "H-Blockx", "Sheku Kanneh-Mason / Frank Bridge", "B-Qll", "Wzgórze Ya-Pa 3",
    "Remady & Manu-L", "O - Zone", "Anne-Sophie Versnaeyen", "Eagle-Eye Cherry", "K-Maro",
    "D-Bomb", "Anne-Sophie Mutter", "Riton & Kah-Lo", "Alex C & Y-Ass", "Jan-Rapowanie & Szpaku",
    "One-T & Cool-T", "Jan-Rapowanie", "Nikołaj Rimski-Korsakow", "Antek-Smykiewicz",
    "Olivia Newton-John & John Travolta", "Beyonce & Jay - Z", "Jay - Z",
]
# Lowercase once instead of on every row
artist_exceptions_lower = [exc.lower() for exc in artist_exceptions]

def extract_artist_title(track):
    """
    Split an 'Artist - Title' string into [artist, title].

    - No spaced dash (' - ') in the track: empty artist, whole string as title.
    - The part before the LAST ' - ' contains a whitelisted name: split on that last ' - '.
    - Otherwise: split on the first dash, with or without surrounding spaces.
    """
    if pd.isna(track):
        return pd.Series(['', track])
    last_dash = track.rfind(' - ')
    if last_dash == -1:
        return pd.Series(['', track])
    artist_part = track[:last_dash]
    title_part = track[last_dash + 3:]
    # Whitelisted artist present: keep everything before the last ' - ' as the artist
    artist_lower = artist_part.lower()
    if any(exc in artist_lower for exc in artist_exceptions_lower):
        return pd.Series([artist_part, title_part])
    # Default: split on the first dash
    match = re.match(r'^(.*?)\s*-\s*(.*)$', track)
    if match:
        return pd.Series([match.group(1), match.group(2)])
    return pd.Series(['', track])
df_all[['artist_custom', 'title_custom']] = df_all['track'].apply(extract_artist_title)
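To see what `extract_artist_title` does on typical inputs, here is a self-contained copy of its splitting rules with a hypothetical three-entry whitelist (the notebook's real list is much longer). It returns a plain tuple instead of a `pd.Series`, but the rules are the same: no spaced dash gives an empty artist, a whitelisted name before the last ` - ` keeps everything up to that dash as the artist, and anything else is split on the first dash.

```python
import re

# Hypothetical reduced whitelist, already lowercased for case-insensitive matching
EXCEPTIONS = ["h-blockx", "ne-yo", "jay - z"]

def split_track(track):
    """Return (artist, title) using the same rules as extract_artist_title."""
    last_dash = track.rfind(" - ")
    if last_dash == -1:
        return ("", track)
    artist_part, title_part = track[:last_dash], track[last_dash + 3:]
    # Whitelisted artist: split on the last spaced dash
    if any(exc in artist_part.lower() for exc in EXCEPTIONS):
        return (artist_part, title_part)
    # Default: split on the first dash (spaces optional)
    match = re.match(r"^(.*?)\s*-\s*(.*)$", track)
    return (match.group(1), match.group(2)) if match else ("", track)

for t in ["H-Blockx - Move",
          "Pitbull & Ne-Yo & Afrojack - Give Me Everything",
          "Vangelis - Blade Runner - Love Theme",
          "Jay - Z - Empire State Of Mind",
          "Untitled-Track"]:
    print(split_track(t))
```

One consequence of splitting on the last dash for whitelisted artists: if the title itself contains ` - ` (e.g. a hypothetical "H-Blockx - Move - Live"), its first part ends up in the artist column.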


In [146]:
# Checking again sample rows where the song title contains a dash (using the improved columns)
sample_with_dash_custom = df_all[df_all['title_custom'].str.contains("-", na=False)][['track', 'artist_custom', 'title_custom']].sample(50, random_state=42)
display(sample_with_dash_custom)




Unnamed: 0,track,artist_custom,title_custom
1108136,Kaczmarek Where Is Mr Barrie? - Kaczmarek - Wh...,Kaczmarek Where Is Mr Barrie?,Kaczmarek - Where Is Mr Barrie?
712052,Chopin Sonata H-Moll Op. 58 Nobauer - Sonata H...,Chopin Sonata H,Moll Op. 58 Nobauer - Sonata H-Moll Op. 58 Cz Iv
512716,Bach Koncert Skrzypcowy A-Moll Bwv 1041 Cz. I....,Bach Koncert Skrzypcowy A,"Moll Bwv 1041 Cz. I. Mutter, Mutter's Virtuosi..."
1373254,Clementi Symfonia Nr 1 C-Dur Cz Iii Menuet Moz...,Clementi Symfonia Nr 1 C,Dur Cz Iii Menuet Mozarteumorchester Salzburg ...
300669,Disturbed - The Sound Of Silence - Cyril Remix,Disturbed,The Sound Of Silence - Cyril Remix
1515866,Daniel Pemberton - Spider-Woman (Gwen Stacy),Daniel Pemberton,Spider-Woman (Gwen Stacy)
808948,Franz Schubert - Impromptu Ges-Dur Op.90 Nr 3,Franz Schubert,Impromptu Ges-Dur Op.90 Nr 3
267689,Elsner Vesperae D-Dur Op 89 Zespol Muzyki Dawn...,Elsner Vesperae D,Dur Op 89 Zespol Muzyki Dawnej Kapela Jasnogor...
1710785,"Crosby, Stills & Nash - Suite- Judy Blue Eyes ...","Crosby, Stills & Nash","Suite- Judy Blue Eyes (Live At Fillmore East, ..."
482772,De Staat - De Staat - Get It Together (Vintici...,De Staat,De Staat - Get It Together (Vinticious Version)


In [147]:
# Find the 100 most common song titles (title_custom) that contain a dash and export an example track for each
top_titles_with_dash = (
    df_all[df_all['title_custom'].str.contains("-", na=False)]
    .groupby('title_custom')
    .size()
    .sort_values(ascending=False)
    .head(100)
)

output_path = "titles_with_dash.txt"
with open(output_path, "w", encoding="utf-8") as f:
    f.write("Most frequent song titles containing a dash:\n\n")
    for title in top_titles_with_dash.index:
        sample_row = df_all[df_all['title_custom'] == title].iloc[0]
        f.write(f"Title: {title}\n")
        f.write(f"Track: {sample_row['track']}\n")
        f.write(f"Extracted Artist: {sample_row['artist_custom']}\n")
        f.write('-'*60 + '\n')

print(f"Exported to {output_path}")


Exported to titles_with_dash.txt


In [148]:
# Display records where artist is empty or just spaces
blank_artist = df_all['artist_custom'].isna() | (df_all['artist_custom'].str.strip() == '')

blank_artist_records = df_all[blank_artist]
blank_artist_records

Unnamed: 0,date,time,station,track,artist,title,artist_custom,title_custom


In [None]:
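# Save blank-artist records to a file for manual review
blank_artist_records[['track', 'artist_custom', 'title_custom']].to_csv('blank_artist_records.csv', index=False)
print("Saved blank artist records to blank_artist_records.csv")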


Saved blank artist records to blank_artist_records.csv


In [142]:
# Removing rows where artist or title is blank (NaN or only whitespace)
df_all = df_all[
    df_all['artist_custom'].notna() & 
    (df_all['artist_custom'].str.strip() != '') &
    df_all['title_custom'].notna() & 
    (df_all['title_custom'].str.strip() != '')
]



In [150]:
# Drop old columns
df_all = df_all.drop(columns=['artist', 'title'])

In [151]:
# Rename 'artist_custom' and 'title_custom' to 'artist' and 'title'
df_all = df_all.rename(columns={'artist_custom': 'artist', 'title_custom': 'title'})

In [None]:
# Check data frame
df_all.head()

Unnamed: 0,date,time,station,track,artist,title
0,2024-01-01,00:00:00,Antyradio,U2 - New Year's Day,U2,New Year's Day
1,2024-01-01,00:05:00,Antyradio,Jet - Are You Gonna Be My Girl,Jet,Are You Gonna Be My Girl
2,2024-01-01,00:09:00,Antyradio,Green Day - Basket Case,Green Day,Basket Case
3,2024-01-01,00:12:00,Antyradio,Metallica - Whiskey In The Jar,Metallica,Whiskey In The Jar
4,2024-01-01,00:17:00,Antyradio,T.love - Warszawa,T.love,Warszawa


## 4. Data Analysis


In [155]:
# Number of unique radio stations
print(f"Number of unique radio stations: {df_all['station'].nunique():,}")

Number of unique radio stations: 13


In [157]:
# Alphabetical list of radio stations
sorted(df_all['station'].unique())

['Antyradio',
 'Chillizet',
 'Czwórka',
 'Dwójka',
 'Eska',
 'Jedynka',
 'RMF Classic',
 'RMF FM',
 'RMF MAXXX',
 'Trójka',
 'VOX FM',
 'ZET',
 'Złote Przeboje']

In [162]:
df_all['station'].value_counts()

station
Eska              198862
RMF MAXXX         198006
VOX FM            179647
RMF Classic       169277
Złote Przeboje    161937
ZET               156065
Antyradio         155279
RMF FM            154736
Chillizet         153501
Czwórka           133584
Trójka             78842
Jedynka            60199
Dwójka             35736
Name: count, dtype: int64

In [327]:
## Data statistics
print(f"Number of records: {len(df_all):,}")
print(f"Number of unique artists: {df_all['artist'].nunique():,}")
print(f"Number of unique titles: {df_all['title'].nunique():,}")
print(f"Date range: {df_all['date'].min()} → {df_all['date'].max()}")

Number of records: 1,835,671
Number of unique artists: 52,877
Number of unique titles: 86,816
Date range: 2024-01-01 00:00:00 → 2025-06-16 00:00:00
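`nunique()` on `title` counts distinct title strings, so different songs that share a title count once. Counting distinct `(artist, title)` pairs gives the number of distinct songs instead. A sketch on a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({
    "artist": ["Dua Lipa", "Eminem", "Dua Lipa", "Hozier"],
    "title":  ["Houdini", "Houdini", "Houdini", "Too Sweet"],
})

# Distinct title strings: "Houdini" is counted once even though two artists use it
unique_titles = df["title"].nunique()

# Distinct songs: distinct (artist, title) pairs
unique_songs = len(df[["artist", "title"]].drop_duplicates())

print(unique_titles, unique_songs)  # 2 3
```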


In [183]:
## Top 5 most popular artists
df_all['artist'].value_counts().head()

artist
Dua Lipa           8770
Sanah              8373
Daria Zawiałow     7381
Dawid Podsiadło    7255
Oskar Cyms         6944
Name: count, dtype: int64

In [182]:
## Top 5 most played tracks (full "Artist - Title" string)
df_all['track'].value_counts().head()

track
Artemas - I Like The Way You Kiss Me      3344
Dua Lipa - Houdini                        3032
The Kolors - Un Ragazzo Una Ragazza       2847
Chappell Roan - Good Luck, Babe!          2666
Daria Zawiałow - Złamane Serce Jest Ok    2571
Name: count, dtype: int64

In [181]:
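## Top 5 most frequent song titles (title only, regardless of artist)
df_all['title'].value_counts().head()

## The ranking differs from the track ranking above, e.g. "Houdini" has 5,512 plays as a title
## but only 3,032 as "Dua Lipa - Houdini": several different songs can share the same title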

title
Houdini                       5512
I Like The Way You Kiss Me    3344
Róż                           3153
I Like It                     2902
Un Ragazzo Una Ragazza        2847
Name: count, dtype: int64

In [180]:
# Top 10 (artist and title)
df_all.groupby(['artist', 'title']).size().sort_values(ascending=False).head(10)

# Confirmed: the title "Houdini" refers to two different songs, by Dua Lipa (3,032 plays) and Eminem (2,462 plays)


artist             title                     
Artemas            I Like The Way You Kiss Me    3344
Dua Lipa           Houdini                       3032
The Kolors         Un Ragazzo Una Ragazza        2847
Chappell Roan      Good Luck, Babe!              2666
Daria Zawiałow     Złamane Serce Jest Ok         2571
Sylwia Grzeszczak  Och I Ach                     2532
Kaeyra             Sour                          2524
Eminem             Houdini                       2462
Hozier             Too Sweet                     2352
The Kolors         Italodisco                    2286
dtype: int64
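To find every title that is shared by more than one artist, not just "Houdini", one option is to count distinct artists per title. A sketch on hypothetical rows:

```python
import pandas as pd

df = pd.DataFrame({
    "artist": ["Dua Lipa", "Eminem", "Dua Lipa", "Hozier", "Eminem"],
    "title":  ["Houdini", "Houdini", "Houdini", "Too Sweet", "Houdini"],
})

# Number of distinct artists per title; titles with more than one are ambiguous
artists_per_title = df.groupby("title")["artist"].nunique()
shared_titles = artists_per_title[artists_per_title > 1].sort_values(ascending=False)

print(shared_titles)
```

On the full data, the length of such a list gives a sense of how much a title-only ranking could be distorted, which supports keying rankings on `(artist, title)`.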

In [184]:
top20_tracks = df_all['track'].value_counts().head(20)
print(top20_tracks)


track
Artemas - I Like The Way You Kiss Me        3344
Dua Lipa - Houdini                          3032
The Kolors - Un Ragazzo Una Ragazza         2847
Chappell Roan - Good Luck, Babe!            2666
Daria Zawiałow - Złamane Serce Jest Ok      2571
Sylwia Grzeszczak - Och I Ach               2532
Kaeyra - Sour                               2524
Eminem - Houdini                            2462
Hozier - Too Sweet                          2352
The Kolors - Italodisco                     2286
Dua Lipa - Training Season                  2205
Mark Ambor - Belong Together                2181
Damiano David - Born With A Broken Heart    2147
Sabrina Carpenter - Espresso                2141
Teddy Swims - Bad Dreams                    2033
Lady Gaga - Abracadabra                     2017
Wiktor Dyduła - Tam Słońce Gdzie My         2005
Dawid Podsiadło - Pięknie Płyniesz          1986
Sanah - Było, Minęło                        1952
Cyril - Stumblin" In                        1919
Name: count, dtype: int64

## 5. Visualizations

In [185]:
# Top 10 most played songs (artist + title)
top10 = (
    df_all.groupby(['artist', 'title'])
    .size()
    .sort_values(ascending=False)
    .head(10)
    .reset_index(name='Number of plays')
)

px.bar(
    top10,
    x='Number of plays',
    y=top10['artist'] + ' - ' + top10['title'] + ' ',
    orientation='h',
    title='Top 10 most played songs',
    height=800
)


In [None]:
# Top 20 most frequently played tracks

# Create the output folder
output_folder = "report_outputs"
os.makedirs(output_folder, exist_ok=True)

# Prepare data for top tracks
top_tracks = (
    df_all.groupby(['artist', 'title'])
    .size()
    .sort_values(ascending=False)
    .head(20)
    .reset_index(name='Play count')
)

# Bar plot
fig = px.bar(
    top_tracks,
    x='Play count',
    y=top_tracks['artist'] + ' - ' + top_tracks['title'],
    orientation='h',
    title='Top 20 most frequently played tracks',
    height=800
)

# Save plot as HTML file
fig.write_html(os.path.join(output_folder, "top20_tracks.html"))


In [None]:
# Top 20 most frequently played artists

# Top 20 artists
top_artists_series = df_all['artist'].value_counts().head(20)
top_artists = top_artists_series.reset_index()
top_artists.columns = ['artist', 'Play count']

# Plot
fig = px.bar(
    top_artists,
    x='Play count',
    y='artist',
    orientation='h',
    title='Top 20 most frequently played artists',
    height=700
)

# Save plot as HTML file
fig.write_html(os.path.join(output_folder, "top20_artists.html"))


In [None]:
# Monthly plays for top 5 tracks per station - scatter plot

# Output folders
main_folder = "report_outputs"
subfolder = os.path.join(main_folder, "monthly_per_station")
os.makedirs(subfolder, exist_ok=True)

# Add a month column
df_all['month'] = (
    pd.to_datetime(df_all['date'])
    .dt.to_period('M')
    .dt.to_timestamp()
)

# Find the top 5 tracks (artist + title) for each station
top5 = (
    df_all
    .groupby(['station', 'artist', 'title'])
    .size()
    .reset_index(name='count')
    .sort_values(['station', 'count'], ascending=[True, False])
    .groupby('station')
    .head(5)
)

# Filter the original data to only top 5 tracks per station
df_top5 = df_all.merge(
    top5[['station', 'artist', 'title']],
    on=['station', 'artist', 'title'],
    how='inner'
)

# Generate separate HTML reports for each radio station
for station, grp in df_top5.groupby('station'):
    # Prepare monthly data for the top 5 tracks
    monthly = (
        grp
        .groupby(['month', 'artist', 'title'])
        .size()
        .reset_index(name='plays')
    )
    monthly['track_label'] = monthly['artist'] + ' - ' + monthly['title']

    # Create a scatter plot for monthly plays
    fig = px.scatter(
        monthly,
        x='month',
        y='track_label',
        size='plays',
        title=f'Monthly plays for top 5 tracks - {station}',
        height=600
    )
    fig.update_yaxes(title_text='')

    # Save to HTML file
    filename = f"top5_monthly_{station.replace(' ', '_')}.html"
    fig.write_html(os.path.join(subfolder, filename))
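The `top5` step above uses a common pandas idiom for top-N per group: sort by group and by descending count, then keep the first N rows of each group with `groupby(...).head(n)`. A toy sketch with N = 2 and hypothetical stations and counts:

```python
import pandas as pd

counts = pd.DataFrame({
    "station": ["Station A", "Station A", "Station A", "Station B", "Station B", "Station B"],
    "track":   ["Song 1", "Song 2", "Song 3", "Song 1", "Song 4", "Song 5"],
    "count":   [50, 70, 10, 30, 90, 60],
})

# Sort within each station by descending count, then keep the first 2 rows per station
top2 = (
    counts
    .sort_values(["station", "count"], ascending=[True, False])
    .groupby("station")
    .head(2)
)
print(top2)
```

`head(n)` keeps the sorted order and the original row labels, which is why the result can be merged back onto the full data as in `df_top5`.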


In [200]:
# Average number of plays per station and hour

# Extract the hour from the time column
df_all['hour'] = pd.to_datetime(df_all['time'], format='%H:%M:%S', errors='coerce').dt.hour

# Count plays per station, hour, and date
plays_per_day = (
    df_all
    .groupby(['station', 'hour', 'date'])
    .size()
    .reset_index(name='plays')
)

# Calculate the average number of plays per station and hour
avg_plays = (
    plays_per_day
    .groupby(['station', 'hour'])['plays']
    .mean()
    .reset_index(name='avg_plays')
)

avg_plays
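Note that `plays_per_day` only contains (station, hour, date) combinations with at least one play, so the mean above is taken over the days on which a station played something in that hour. For hours that are music-free on some days, this averages over fewer days and gives a higher value. If zero-play days should count (an assumption about the intended metric, not what the notebook computes), one option is to reindex over the full station × hour × date grid before averaging. A sketch on hypothetical counts:

```python
import pandas as pd

plays_per_day = pd.DataFrame({
    "station": ["Station A", "Station A", "Station A"],
    "hour":    [22, 22, 23],
    "date":    pd.to_datetime(["2025-01-01", "2025-01-03", "2025-01-01"]),
    "plays":   [4, 2, 6],
})

# Observed-only mean (as in the notebook): days without plays are simply missing
observed_mean = plays_per_day.groupby(["station", "hour"])["plays"].mean()

# Every day in the period; on the real data e.g. pd.date_range(min_date, max_date, freq="D")
dates = pd.to_datetime(["2025-01-01", "2025-01-02", "2025-01-03"])
full_index = pd.MultiIndex.from_product(
    [plays_per_day["station"].unique(), range(24), dates],
    names=["station", "hour", "date"],
)

# Missing station/hour/date combinations become 0 plays
filled = (
    plays_per_day.set_index(["station", "hour", "date"])["plays"]
    .reindex(full_index, fill_value=0)
)
zero_filled_mean = filled.groupby(level=["station", "hour"]).mean()

print(observed_mean.loc[("Station A", 22)])     # 3.0
print(zero_filled_mean.loc[("Station A", 22)])  # 2.0
```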


Unnamed: 0,station,hour,avg_plays
0,Antyradio,0,15.028384
1,Antyradio,1,14.227571
2,Antyradio,2,15.480151
3,Antyradio,3,15.859023
4,Antyradio,4,15.854991
...,...,...,...
307,Złote Przeboje,19,12.118421
308,Złote Przeboje,20,9.654135
309,Złote Przeboje,21,10.500000
310,Złote Przeboje,22,14.939509


In [197]:
plays_per_day

Unnamed: 0,station,hour,date,plays
0,Antyradio,0,2024-01-01,12
1,Antyradio,0,2024-01-02,12
2,Antyradio,0,2024-01-03,12
3,Antyradio,0,2024-01-04,12
4,Antyradio,0,2024-01-05,13
...,...,...,...,...
154594,Złote Przeboje,23,2025-06-12,15
154595,Złote Przeboje,23,2025-06-13,16
154596,Złote Przeboje,23,2025-06-14,16
154597,Złote Przeboje,23,2025-06-15,15


In [198]:
plays_per_day_sorted = plays_per_day.sort_values(by='plays', ascending=False)
plays_per_day_sorted


Unnamed: 0,station,hour,date,plays
46868,Eska,2,2024-10-27,38
118192,VOX FM,2,2024-10-27,38
1214,Antyradio,2,2024-10-27,33
130528,ZET,2,2024-10-27,33
143188,Złote Przeboje,2,2024-10-27,32
...,...,...,...,...
31655,Czwórka,13,2024-08-01,1
31691,Czwórka,13,2024-09-06,1
31773,Czwórka,13,2024-11-29,1
31580,Czwórka,13,2024-05-15,1
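The highest hourly counts all fall in hour 2 on 2024-10-27. That is the night daylight saving time ended in Poland: clocks went back from 03:00 CEST to 02:00 CET, so local hour 2 occurred twice, which most likely explains why these stations logged roughly two hours' worth of songs in it. A quick check with pandas time-zone localization, where `ambiguous=True` picks the daylight-saving reading of an ambiguous wall-clock time:

```python
import pandas as pd

local = pd.Timestamp("2024-10-27 02:30")

# The same wall-clock time exists twice in Europe/Warsaw on this date
first = local.tz_localize("Europe/Warsaw", ambiguous=True)    # CEST, UTC+2
second = local.tz_localize("Europe/Warsaw", ambiguous=False)  # CET, UTC+1

print(first.tz_convert("UTC"))   # 2024-10-27 00:30:00+00:00
print(second.tz_convert("UTC"))  # 2024-10-27 01:30:00+00:00
print(second - first)            # 0 days 01:00:00
```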


In [201]:
avg_plays_sorted = avg_plays.sort_values(by='avg_plays', ascending=False)
avg_plays_sorted

Unnamed: 0,station,hour,avg_plays
101,Eska,5,22.552830
197,RMF MAXXX,5,20.778612
119,Eska,23,20.537736
100,Eska,4,19.983051
118,Eska,22,19.664151
...,...,...,...
85,Dwójka,13,2.384880
91,Dwójka,19,2.038328
124,Jedynka,4,1.640000
142,Jedynka,22,1.630695


In [202]:
# Station-hours with an average of at most 5 plays (hours with little music)
no_music = avg_plays[avg_plays['avg_plays'] <= 5]
no_music.to_html("report_outputs/hours_no_music_avg.html", index=False)
no_music


Unnamed: 0,station,hour,avg_plays
72,Dwójka,0,4.755656
73,Dwójka,1,4.079295
74,Dwójka,2,2.667883
75,Dwójka,3,3.544503
76,Dwójka,4,2.733564
77,Dwójka,5,4.455385
83,Dwójka,11,4.572383
84,Dwójka,12,3.746556
85,Dwójka,13,2.38488
86,Dwójka,14,3.539359


In [204]:
# For each station, select the 5 hours with the highest average number of plays
most_musical = (
    avg_plays
    .sort_values(['station', 'avg_plays'], ascending=[True, False])
    .groupby('station')
    .head(5)
)

most_musical



Unnamed: 0,station,hour,avg_plays
5,Antyradio,5,16.157895
3,Antyradio,3,15.859023
4,Antyradio,4,15.854991
2,Antyradio,2,15.480151
0,Antyradio,0,15.028384
...,...,...,...
292,Złote Przeboje,4,16.045113
291,Złote Przeboje,3,15.919325
290,Złote Przeboje,2,15.706215
293,Złote Przeboje,5,15.341463


In [206]:
# Save as HTML table
most_musical.to_html("report_outputs/hours_most_musical_avg.html", index=False)

In [None]:
# Average number of plays per Station and Hour

# Pivot: station × hour with avg_plays as values
pivot_avg = avg_plays.pivot(index='hour', columns='station', values='avg_plays')

# Plot heatmap
fig = px.imshow(
    pivot_avg.T,
    x=pivot_avg.index,
    y=pivot_avg.columns,
    labels={'x': 'hour', 'y': 'station', 'color': 'average number of plays'},
    title='Average number of plays per Station and Hour',
    aspect='auto'
)

# Save to HTML
fig.write_html(os.path.join(output_folder, "heatmap_avg_plays_per_station.html"))


In [209]:
# Average number of plays per station and hour - alternative colour scale
# Pivot: station × hour with avg_plays as values
pivot_avg = avg_plays.pivot(index='hour', columns='station', values='avg_plays')

# Single-hue colour scale (blues)
fig = px.imshow(
    pivot_avg.T,
    x=pivot_avg.index,
    y=pivot_avg.columns,
    labels={'x': 'hour', 'y': 'station', 'color': 'average number of plays'},
    title='Average number of plays per Station and Hour',
    aspect='auto',
    color_continuous_scale='Blues'
)

# Save to HTML
fig.write_html(os.path.join(output_folder, "heatmap_avg_plays_per_station_color.html"))


In [None]:
# Song premiere analysis

# 1. Prepare a datetime column by combining date and time
df_all['datetime'] = pd.to_datetime(
    df_all['date'].dt.strftime('%Y-%m-%d') + ' ' + df_all['time'].astype(str),
    format='%Y-%m-%d %H:%M:%S'
)

# 2. Find the first play of each song (artist + title) within the observed period.
#    This marks its first appearance in the data, not necessarily its release.
first_plays = (
    df_all
    .sort_values('datetime')
    .groupby(['artist', 'title'], as_index=False)
    .first()[['artist', 'title', 'station', 'datetime']]
)

# 3. Filter premieres that happened in 2025
premieres_2025 = first_plays[first_plays['datetime'].dt.year == 2025]

# 4. Count the number of premieres per station
premieres_per_station = (
    premieres_2025['station']
    .value_counts()
    .reset_index(name='number_of_premieres')
    .rename(columns={'index': 'station'})
)

premieres_per_station
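`groupby(...).first()` returns the first non-null value of each column separately, so if any column contained missing values, the resulting "premiere" row could combine data from different plays. Below is a minimal sketch of an alternative that keeps each earliest row intact: sort chronologically, then drop duplicates on the track key. The `plays` DataFrame is a hypothetical toy example that only mirrors the column names used above.

```python
import pandas as pd

# Hypothetical toy plays with the same column names as df_all
plays = pd.DataFrame({
    'artist':   ['A', 'A', 'B', 'B'],
    'title':    ['Song 1', 'Song 1', 'Song 2', 'Song 2'],
    'station':  ['Eska', 'Trójka', None, 'ZET'],
    'datetime': pd.to_datetime(['2025-01-02 10:00', '2025-01-01 08:00',
                                '2025-02-01 09:00', '2025-02-03 12:00']),
})

# Keeps the whole earliest row per (artist, title), missing values included
first_rows = (
    plays
    .sort_values('datetime')
    .drop_duplicates(subset=['artist', 'title'], keep='first')
    .reset_index(drop=True)
)

# For comparison: first() skips the missing station and takes 'ZET' from a later play
mixed = (
    plays
    .sort_values('datetime')
    .groupby(['artist', 'title'], as_index=False)
    .first()
)
print(first_rows)
print(mixed)
```

For song B in the toy data, `mixed` pairs the earliest timestamp with a station from a later play. `first_rows` keeps the earliest play exactly as recorded.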


In [211]:
# Display premieres only for a specific station, e.g. Eska
eska_premieres = premieres_2025[premieres_2025['station'] == 'Eska']

# Print a table with artist, title, and datetime
print(eska_premieres[['artist', 'title', 'datetime']].to_string(index=False))


                                                                                      artist                               title            datetime
                                                              Adam Levine & Gym Class Heroes                       Stereo Hearts 2025-06-16 03:09:00
                                                                              Adma & Filipek                          Easy Peasy 2025-03-17 05:27:00
                                                                   Akon & Kardinal Offishall                           Dangerous 2025-06-12 22:41:00
                                                                             Alan Walker & K                        391 - Ignite 2025-06-16 02:59:00
                                                                   Alan Walker & Sofia Reyes                       Guaro Con Ron 2025-06-12 14:28:00
                                                                        Alan Walker & Sorana              

In [213]:
# Save to HTML file 
premieres_per_station.to_html(
    "report_outputs/premieres_per_station_2025.html",
    index=False
)

In [215]:
# Save the cleaned dataset as df_music for further analysis
df_all.to_csv("report_outputs/df_music.csv", index=False)




In [216]:
# Load the music data and premieres
df_music = pd.read_csv(
    "report_outputs/df_music.csv",
    parse_dates=['date']
)

df_prem = pd.read_csv(
    "report_outputs/premieres_complete_2025.csv",
    parse_dates=['release_date']
)

In [218]:
# Create a full datetime for each play by combining date and time
df_music['play_dt'] = pd.to_datetime(df_music['date'].astype(str) + ' ' + df_music['time'].astype(str))

In [220]:
display(df_music)
display(df_prem)

Unnamed: 0,date,time,station,track,artist,title,month,hour,datetime,play_dt
0,2024-01-01,00:00:00,Antyradio,U2 - New Year's Day,U2,New Year's Day,2024-01-01,0,2024-01-01 00:00:00,2024-01-01 00:00:00
1,2024-01-01,00:05:00,Antyradio,Jet - Are You Gonna Be My Girl,Jet,Are You Gonna Be My Girl,2024-01-01,0,2024-01-01 00:05:00,2024-01-01 00:05:00
2,2024-01-01,00:09:00,Antyradio,Green Day - Basket Case,Green Day,Basket Case,2024-01-01,0,2024-01-01 00:09:00,2024-01-01 00:09:00
3,2024-01-01,00:12:00,Antyradio,Metallica - Whiskey In The Jar,Metallica,Whiskey In The Jar,2024-01-01,0,2024-01-01 00:12:00,2024-01-01 00:12:00
4,2024-01-01,00:17:00,Antyradio,T.love - Warszawa,T.love,Warszawa,2024-01-01,0,2024-01-01 00:17:00,2024-01-01 00:17:00
...,...,...,...,...,...,...,...,...,...,...
1835666,2025-06-16,23:42:00,Złote Przeboje,Bon Jovi - Always,Bon Jovi,Always,2025-06-01,23,2025-06-16 23:42:00,2025-06-16 23:42:00
1835667,2025-06-16,23:47:00,Złote Przeboje,K.a.s.a. & Los Amigos - Maczo,K.a.s.a. & Los Amigos,Maczo,2025-06-01,23,2025-06-16 23:47:00,2025-06-16 23:47:00
1835668,2025-06-16,23:50:00,Złote Przeboje,Modern Talking - Geronimo's Cadillac,Modern Talking,Geronimo's Cadillac,2025-06-01,23,2025-06-16 23:50:00,2025-06-16 23:50:00
1835669,2025-06-16,23:54:00,Złote Przeboje,Budka Suflera - Ratujmy Co Sie Da,Budka Suflera,Ratujmy Co Sie Da,2025-06-01,23,2025-06-16 23:54:00,2025-06-16 23:54:00


Unnamed: 0,artists,title,release_date
0,Rockie Fresh,Driving 88,2025-02-19
1,Zoë Blade,Fallout Zone,2025-06-21
2,Harry Last,Eternal,2025-07-22
3,Intermezzo,Iron Giant,2025-04-20
4,Thundermother,Dirty & Divine,2025-02-07
...,...,...,...
52733,Roderick Newport,Pulling Us Away,2025-07-11
52734,Black Helium,The Animals Are Coming,2025-06-20
52735,Alejandro Zandes,Silhouette,2025-05-30
52736,akaJazy,Gaia & Physis Recordings,2025-07-04


In [221]:
# Merge music data with premiere data
df_merged = pd.merge(
    df_music,
    df_prem,
    left_on=['artist', 'title'],
    right_on=['artists', 'title'],
    how='inner'
)


In [222]:
df_merged

Unnamed: 0,date,time,station,track,artist,title,month,hour,datetime,play_dt,artists,release_date
0,2024-01-03,22:47:00,Złote Przeboje,Tina Turner - Private Dancer,Tina Turner,Private Dancer,2024-01-01,22,2024-01-03 22:47:00,2024-01-03 22:47:00,Tina Turner,2025-03-21
1,2024-01-03,22:47:00,Złote Przeboje,Tina Turner - Private Dancer,Tina Turner,Private Dancer,2024-01-01,22,2024-01-03 22:47:00,2024-01-03 22:47:00,Tina Turner,2025-03-21
2,2024-01-03,22:47:00,Złote Przeboje,Tina Turner - Private Dancer,Tina Turner,Private Dancer,2024-01-01,22,2024-01-03 22:47:00,2024-01-03 22:47:00,Tina Turner,2025-03-21
3,2024-01-07,16:48:00,Antyradio,Black Sabbath - Paranoid,Black Sabbath,Paranoid,2024-01-01,16,2024-01-07 16:48:00,2024-01-07 16:48:00,Black Sabbath,2025-03-14
4,2024-01-08,15:41:00,Złote Przeboje,Tina Turner - Private Dancer,Tina Turner,Private Dancer,2024-01-01,15,2024-01-08 15:41:00,2024-01-08 15:41:00,Tina Turner,2025-03-21
...,...,...,...,...,...,...,...,...,...,...,...,...
12495,2025-06-16,19:30:00,ZET,Alex Warren - Ordinary,Alex Warren,Ordinary,2025-06-01,19,2025-06-16 19:30:00,2025-06-16 19:30:00,Alex Warren,2025-02-07
12496,2025-06-16,19:40:00,ZET,Ed Sheeran - Azizam,Ed Sheeran,Azizam,2025-06-01,19,2025-06-16 19:40:00,2025-06-16 19:40:00,Ed Sheeran,2025-04-04
12497,2025-06-16,19:40:00,ZET,Ed Sheeran - Azizam,Ed Sheeran,Azizam,2025-06-01,19,2025-06-16 19:40:00,2025-06-16 19:40:00,Ed Sheeran,2025-04-04
12498,2025-06-16,23:36:00,ZET,Ed Sheeran - Azizam,Ed Sheeran,Azizam,2025-06-01,23,2025-06-16 23:36:00,2025-06-16 23:36:00,Ed Sheeran,2025-04-04


In [224]:
# Find the first play of each premiere per station
first_plays = (
    df_merged
    .groupby(
        ['station', 'artists', 'title', 'release_date'],
        as_index=False
    )
    .agg(first_play=('play_dt', 'min'))
)

first_plays

Unnamed: 0,station,artists,title,release_date,first_play
0,Antyradio,Alice Cooper,Black Mamba,2025-04-23,2025-05-03 18:49:00
1,Antyradio,Alice Cooper,Wild Ones,2025-06-04,2025-06-03 21:38:00
2,Antyradio,Avantasia,The Witch,2025-02-13,2025-02-19 21:35:00
3,Antyradio,Avatar,Captain Goat,2025-05-28,2025-05-31 19:54:00
4,Antyradio,Biffy Clyro,A Little Love,2025-06-11,2025-06-14 18:10:00
...,...,...,...,...,...
420,ZET,Sabrina Carpenter,Taste,2025-01-10,2024-09-25 19:31:00
421,ZET,Sam Feldt,Time After Time,2025-05-23,2025-06-04 19:49:00
422,ZET,Tina Turner,Private Dancer,2025-03-21,2025-04-26 16:36:00
423,ZET,Tom Grennan,Shadowboxing,2025-01-29,2025-02-05 19:53:00


In [225]:
# Keep only records where the first play did not happen before the release date
valid_plays = first_plays[
    first_plays['first_play'] >= first_plays['release_date']
].copy()

# Info: how many records were discarded
print(f"Discarded {len(first_plays) - len(valid_plays)} records with negative delay (play before release).")


Discarded 59 records with negative delay (play before release).


In [None]:
# Data type check
print(valid_plays.dtypes[['first_play','release_date']])


first_play      datetime64[ns]
release_date            object
dtype: object


In [228]:
# Convert release_date from object to datetime
valid_plays['release_date'] = pd.to_datetime(
    valid_plays['release_date']
)


In [229]:
# Check for unparsed (NaT) release_date values
print("Number of non-parsable release_date entries:", valid_plays['release_date'].isna().sum())


Number of non-parsable release_date entries: 1


In [230]:
# Show records where release_date could not be parsed
invalid_release_dates = valid_plays[valid_plays['release_date'].isna()][['artists', 'title', 'release_date']]
print(invalid_release_dates)


    artists     title release_date
370   Swans  Birthing          NaT


In [231]:
# Remove rows where release_date could not be parsed (is NaT)
valid_plays = valid_plays.dropna(subset=['release_date'])


In [None]:
# Show records where release_date could not be parsed - check again
invalid_release_dates = valid_plays[valid_plays['release_date'].isna()][['artists', 'title', 'release_date']]
print(invalid_release_dates)

Empty DataFrame
Columns: [artists, title, release_date]
Index: []
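The dtype check above shows that `release_date` was still `object` when the `first_play >= release_date` filter ran. A more defensive order is to parse the dates first with `errors='coerce'` and then drop unparsable dates and plays before release in one step. Below is a minimal sketch on a hypothetical toy frame (`fp`) shaped like `first_plays`.

```python
import pandas as pd

# Hypothetical toy version of first_plays, with release_date still stored as strings
fp = pd.DataFrame({
    'station': ['Antyradio', 'ZET', 'Trójka'],
    'release_date': ['2025-04-23', '2025-01-10', 'not a date'],
    'first_play': pd.to_datetime(['2025-05-03 18:49', '2024-09-25 19:31', '2025-03-01 12:00']),
})

# Parse first: unparsable values become NaT instead of raising or staying as text
fp['release_date'] = pd.to_datetime(fp['release_date'], errors='coerce')

# Keep rows with a valid release date and a first play on or after it
mask = fp['release_date'].notna() & (fp['first_play'] >= fp['release_date'])
valid = fp[mask].copy()
print(f"Kept {len(valid)} of {len(fp)} rows")
```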


In [234]:
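# Calculate latency: the delay between release date and first play
# release_date has no time of day (it is parsed as midnight), so latency includes the hours elapsed on the release day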
valid_plays['latency'] = valid_plays['first_play'] - valid_plays['release_date']
valid_plays


Unnamed: 0,station,artists,title,release_date,first_play,latency
0,Antyradio,Alice Cooper,Black Mamba,2025-04-23,2025-05-03 18:49:00,10 days 18:49:00
2,Antyradio,Avantasia,The Witch,2025-02-13,2025-02-19 21:35:00,6 days 21:35:00
3,Antyradio,Avatar,Captain Goat,2025-05-28,2025-05-31 19:54:00,3 days 19:54:00
4,Antyradio,Biffy Clyro,A Little Love,2025-06-11,2025-06-14 18:10:00,3 days 18:10:00
5,Antyradio,Billy Idol,Still Dancing,2025-02-26,2025-03-04 12:51:00,6 days 12:51:00
...,...,...,...,...,...,...
418,ZET,Ed Sheeran,Azizam,2025-04-04,2025-04-04 19:30:00,0 days 19:30:00
419,ZET,Ed Sheeran,Old Phone,2025-05-01,2025-05-05 19:32:00,4 days 19:32:00
421,ZET,Sam Feldt,Time After Time,2025-05-23,2025-06-04 19:49:00,12 days 19:49:00
422,ZET,Tina Turner,Private Dancer,2025-03-21,2025-04-26 16:36:00,36 days 16:36:00


In [236]:
# Average delay (latency) between release and first play for each radio station
radio_latency = (
    valid_plays
    .groupby('station')['latency']
    .mean()
    .reset_index()
    .sort_values('latency')
)
radio_latency

Unnamed: 0,station,latency
9,VOX FM,7 days 17:49:30
4,Eska,8 days 21:58:24
7,RMF MAXXX,11 days 08:30:27.692307692
0,Antyradio,15 days 00:30:10.588235294
5,Jedynka,15 days 01:59:21
6,RMF FM,15 days 10:08:46.666666666
8,Trójka,18 days 02:52:18.832116788
10,ZET,20 days 19:18:10.909090909
2,Czwórka,20 days 21:49:41.428571428
1,Chillizet,22 days 10:09:19.459459459


In [237]:
# Add a column with latency in hours
radio_latency['latency_hours'] = (
    radio_latency['latency']
    .dt.total_seconds()
    .div(3600)
    .round(2)
)

# Save the latency table to HTML
radio_latency.to_html(
    os.path.join(output_folder, "radio_latency_table.html"),
    index=False
)


In [238]:
# Create a bar chart for average latency per station
fig = px.bar(
    radio_latency,
    x='latency_hours',
    y='station',
    orientation='h',
    title='Average premiere delay per Station (hours)',
    labels={'latency_hours': 'Delay (h)', 'station': 'Radio Station'},
    height=600
)

# Save the chart as an HTML file
fig.write_html(
    os.path.join(output_folder, "radio_latency_chart.html")
)


In [None]:
# Average premiere delay per Station (hours)

# Color dictionary for radio stations (keys must match the values in the 'station' column)
station_colors = {
    'Antyradio': '#e74c3c',
    'Eska': '#3498db',
    'RMF FM': '#f1c40f',
    'ZET': '#9b59b6',
    # Stations missing from this dict fall back to Plotly's default color sequence
}

# Create a bar chart with station-specific colors
fig = px.bar(
    radio_latency,
    x='latency_hours',
    y='station',
    color='station',
    color_discrete_map=station_colors,
    orientation='h',
    title='Average premiere delay per Station (hours)',
    labels={'latency_hours': 'Delay (h)', 'station': 'Radio Station'},
    height=600
)

# Save the chart as an HTML file
fig.write_html(
    os.path.join(output_folder, "radio_latency_chart_new.html")
)


In [240]:
# Top 10 fastest premieres on Radio

# Sort valid_plays by latency and select the top 10 fastest premieres
fastest = valid_plays.sort_values('latency').head(10).copy()

# Add latency in hours
fastest['latency_hours'] = fastest['latency'].dt.total_seconds().div(3600).round(2)

# Save the table to HTML
fastest[['artists', 'title', 'station', 'release_date', 'first_play', 'latency_hours']].to_html(
    os.path.join(output_folder, "fastest_premieres.html"),
    index=False
)

# Create a horizontal bar chart
fig = px.bar(
    fastest,
    x='latency_hours',
    y=fastest['artists'] + ' - ' + fastest['title'],
    orientation='h',
    title='Top 10 fastest premieres on Radio',
    labels={'latency_hours': 'Delay (h)', 'y': 'Track'},
    height=600
)

# Save the chart as HTML
fig.write_html(os.path.join(output_folder, "fastest_premieres_chart.html"))


In [245]:
# Get the 20 tracks with the shortest latency between release and first radio play
fastest = valid_plays.sort_values('latency').head(20).copy()
display(fastest)

# Fastest 50
fastest50 = valid_plays.sort_values('latency').head(50).copy()
display(fastest50)



Unnamed: 0,station,artists,title,release_date,first_play,latency
182,Eska,Ed Sheeran,Azizam,2025-04-04,2025-04-04 07:09:00,0 days 07:09:00
319,Trójka,Lola Young,One Thing,2025-05-16,2025-05-16 07:15:00,0 days 07:15:00
274,Trójka,Chappell Roan,The Giver,2025-03-14,2025-03-14 10:07:00,0 days 10:07:00
299,Trójka,Greentea Peng,Stones Throw,2025-01-31,2025-01-31 10:19:00,0 days 10:19:00
337,Trójka,Olivia Dean,Nice To Each Other,2025-05-30,2025-05-30 10:20:00,0 days 10:20:00
186,Eska,Lola Young,One Thing,2025-05-16,2025-05-16 10:32:00,0 days 10:32:00
328,Trójka,Mei Semones,Animaru,2025-05-02,2025-05-02 10:37:00,0 days 10:37:00
377,Trójka,The Black Keys,Babygirl,2025-03-21,2025-03-21 10:44:00,0 days 10:44:00
404,Trójka,Yung Lean,Forever Yung,2025-02-21,2025-02-21 10:48:00,0 days 10:48:00
296,Trójka,Gigi Perez,Chemistry,2025-02-28,2025-02-28 11:08:00,0 days 11:08:00


Unnamed: 0,station,artists,title,release_date,first_play,latency
182,Eska,Ed Sheeran,Azizam,2025-04-04,2025-04-04 07:09:00,0 days 07:09:00
319,Trójka,Lola Young,One Thing,2025-05-16,2025-05-16 07:15:00,0 days 07:15:00
274,Trójka,Chappell Roan,The Giver,2025-03-14,2025-03-14 10:07:00,0 days 10:07:00
299,Trójka,Greentea Peng,Stones Throw,2025-01-31,2025-01-31 10:19:00,0 days 10:19:00
337,Trójka,Olivia Dean,Nice To Each Other,2025-05-30,2025-05-30 10:20:00,0 days 10:20:00
186,Eska,Lola Young,One Thing,2025-05-16,2025-05-16 10:32:00,0 days 10:32:00
328,Trójka,Mei Semones,Animaru,2025-05-02,2025-05-02 10:37:00,0 days 10:37:00
377,Trójka,The Black Keys,Babygirl,2025-03-21,2025-03-21 10:44:00,0 days 10:44:00
404,Trójka,Yung Lean,Forever Yung,2025-02-21,2025-02-21 10:48:00,0 days 10:48:00
296,Trójka,Gigi Perez,Chemistry,2025-02-28,2025-02-28 11:08:00,0 days 11:08:00


In [247]:
# Convert the latency to hours, rounded to two decimal places
fastest['latency_hours'] = fastest['latency'].dt.total_seconds().div(3600).round(2)
display(fastest)


Unnamed: 0,station,artists,title,release_date,first_play,latency,latency_hours
182,Eska,Ed Sheeran,Azizam,2025-04-04,2025-04-04 07:09:00,0 days 07:09:00,7.15
319,Trójka,Lola Young,One Thing,2025-05-16,2025-05-16 07:15:00,0 days 07:15:00,7.25
274,Trójka,Chappell Roan,The Giver,2025-03-14,2025-03-14 10:07:00,0 days 10:07:00,10.12
299,Trójka,Greentea Peng,Stones Throw,2025-01-31,2025-01-31 10:19:00,0 days 10:19:00,10.32
337,Trójka,Olivia Dean,Nice To Each Other,2025-05-30,2025-05-30 10:20:00,0 days 10:20:00,10.33
186,Eska,Lola Young,One Thing,2025-05-16,2025-05-16 10:32:00,0 days 10:32:00,10.53
328,Trójka,Mei Semones,Animaru,2025-05-02,2025-05-02 10:37:00,0 days 10:37:00,10.62
377,Trójka,The Black Keys,Babygirl,2025-03-21,2025-03-21 10:44:00,0 days 10:44:00,10.73
404,Trójka,Yung Lean,Forever Yung,2025-02-21,2025-02-21 10:48:00,0 days 10:48:00,10.8
296,Trójka,Gigi Perez,Chemistry,2025-02-28,2025-02-28 11:08:00,0 days 11:08:00,11.13


In [248]:
# Save the table of the 20 fastest premieres to an HTML file
fastest[['artists', 'title', 'station', 'release_date', 'first_play', 'latency_hours']].to_html(
    os.path.join(output_folder, "fastest_20_premieres.html"),
    index=False
)


In [None]:
# Create a horizontal bar chart of the 20 fastest premieres, colored by station
fig = px.bar(
    fastest,
    x='latency_hours',
    y=fastest['artists'] + ' – ' + fastest['title'],
    color='station',
    orientation='h',
    title='Top 20 fastest premieres on Radio',
    labels={'latency_hours': 'Delay (h)', 'y': 'Track'},
    height=800
)
fig.update_layout(yaxis={'categoryorder': 'total ascending'})

# Save the chart to HTML
fig.write_html(os.path.join(output_folder, "fastest_20_premieres_chart.html"))


In [250]:
# Top 20 fastest premieres by radio station

fastest['latency_hours'] = fastest['latency'].dt.total_seconds().div(3600).round(2)
fastest['track_label'] = fastest['artists'] + ' – ' + fastest['title']

fig = px.bar(
    fastest,
    x='latency_hours',
    y='track_label',
    color='station',
    facet_col='station',
    facet_col_wrap=2,         # number of panels per row
    orientation='h',
    title='Top 20 fastest premieres by radio station',
    labels={'latency_hours': 'Delay (h)', 'track_label': 'Track'}
)
# Remove the legend, since color and facet are the same
fig.update_layout(showlegend=False)
# Facet y-axes are shared by default; unmatch them so each panel lists only its own tracks
fig.update_yaxes(matches=None, showticklabels=True)

# Save to HTML
fig.write_html(os.path.join(output_folder, "fastest_20_faceted.html"))


In [251]:
# Comparison of release date vs. first radio play

fastest_plot = fastest.copy()
fastest_plot['release_dt'] = pd.to_datetime(fastest_plot['release_date'])
fastest_plot['first_play_dt'] = fastest_plot['first_play']

fig = px.scatter(
    fastest_plot,
    x='release_dt',
    y='first_play_dt',
    color='station',
    symbol='station',
    hover_data=['artists', 'title'],
    title='Comparison of release date vs. first radio play'
)

fig.write_html(os.path.join(output_folder, "fastest_scatter.html"))


In [252]:
# Per-station view of the top 20 fastest premieres overall
# (each chart shows only that station's entries among the overall top 20)

# Subfolder for the per-station charts
output_subfolder = os.path.join(output_folder, "fastest_by_station")
os.makedirs(output_subfolder, exist_ok=True)

for station, grp in fastest.groupby('station'):
    fig = px.bar(
        grp,
        x='latency_hours',
        y=grp['artists'] + ' - ' + grp['title'],
        orientation='h',
        title=f'Fastest premieres (overall top 20) - {station}',
        labels={'latency_hours': 'Delay (h)', 'y': 'Track'},
        height=600
    )
    fname = f"fastest_20_{station.replace(' ', '_')}.html"
    fig.write_html(os.path.join(output_subfolder, fname))
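Grouping `fastest` by station only splits the 20 fastest premieres overall, so a station with few entries in that list gets a short chart. If a per-station ranking is needed instead, one approach is to sort once and take the first N rows of each group. Below is a minimal sketch on a hypothetical toy frame (`vp`) with the relevant `valid_plays` columns.

```python
import pandas as pd

# Hypothetical toy version of valid_plays
vp = pd.DataFrame({
    'station': ['Trójka', 'Trójka', 'Trójka', 'Eska', 'Eska'],
    'title':   ['A', 'B', 'C', 'D', 'E'],
    'latency': pd.to_timedelta([10, 7, 30, 5, 50], unit='h'),
})

N = 2  # per-station cutoff; the notebook's charts use 20

# Sort by latency, then keep the N fastest rows within each station
top_per_station = (
    vp.sort_values('latency')
      .groupby('station')
      .head(N)
)
print(top_per_station)
```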


In [254]:
# For the top 20 fastest premieres, add a label column combining artist, title, and station
fastest['label'] = (
    fastest['artists'] + ' - ' +
    fastest['title'] + ' (' + fastest['station'] + ')'
)

fastest


Unnamed: 0,station,artists,title,release_date,first_play,latency,latency_hours,track_label,label
182,Eska,Ed Sheeran,Azizam,2025-04-04,2025-04-04 07:09:00,0 days 07:09:00,7.15,Ed Sheeran – Azizam,Ed Sheeran - Azizam (Eska)
319,Trójka,Lola Young,One Thing,2025-05-16,2025-05-16 07:15:00,0 days 07:15:00,7.25,Lola Young – One Thing,Lola Young - One Thing (Trójka)
274,Trójka,Chappell Roan,The Giver,2025-03-14,2025-03-14 10:07:00,0 days 10:07:00,10.12,Chappell Roan – The Giver,Chappell Roan - The Giver (Trójka)
299,Trójka,Greentea Peng,Stones Throw,2025-01-31,2025-01-31 10:19:00,0 days 10:19:00,10.32,Greentea Peng – Stones Throw,Greentea Peng - Stones Throw (Trójka)
337,Trójka,Olivia Dean,Nice To Each Other,2025-05-30,2025-05-30 10:20:00,0 days 10:20:00,10.33,Olivia Dean – Nice To Each Other,Olivia Dean - Nice To Each Other (Trójka)
186,Eska,Lola Young,One Thing,2025-05-16,2025-05-16 10:32:00,0 days 10:32:00,10.53,Lola Young – One Thing,Lola Young - One Thing (Eska)
328,Trójka,Mei Semones,Animaru,2025-05-02,2025-05-02 10:37:00,0 days 10:37:00,10.62,Mei Semones – Animaru,Mei Semones - Animaru (Trójka)
377,Trójka,The Black Keys,Babygirl,2025-03-21,2025-03-21 10:44:00,0 days 10:44:00,10.73,The Black Keys – Babygirl,The Black Keys - Babygirl (Trójka)
404,Trójka,Yung Lean,Forever Yung,2025-02-21,2025-02-21 10:48:00,0 days 10:48:00,10.8,Yung Lean – Forever Yung,Yung Lean - Forever Yung (Trójka)
296,Trójka,Gigi Perez,Chemistry,2025-02-28,2025-02-28 11:08:00,0 days 11:08:00,11.13,Gigi Perez – Chemistry,Gigi Perez - Chemistry (Trójka)


In [255]:
# List of all unique radio stations
unique_stations = df_music['station'].unique()

unique_stations


array(['Antyradio', 'Chillizet', 'Czwórka', 'Dwójka', 'Eska', 'Jedynka',
       'RMF Classic', 'RMF FM', 'RMF MAXXX', 'Trójka', 'VOX FM', 'ZET',
       'Złote Przeboje'], dtype=object)

In [None]:
# Set colors for each radio station
station_colors = {
    'Jedynka': '#F48F20',         # orange from Jedynka's logo  
    'Dwójka': '#00994C',          # green - Dwójka  
    'Trójka': '#E0007D',          # magenta - Trójka  
    'Czwórka': '#FFD100',         # yellow - Czwórka  
    'ZET': '#C8102E',             # red - ZET  
    'Złote Przeboje': '#DEAB4A',  # gold - Złote Przeboje  
    'Antyradio': '#CE2029',       # red-black - Antyradio  
    'RMF MAXXX': '#E600CC',       # pink - RMF Maxxx  
    'Chillizet': '#CC00FF',       # purple-pink - Chilli Zet  
    'RMF FM': '#FFD700',          # yellow - RMF  
    'VOX FM': '#002663',          # dark blue - Vox FM  
    'Eska': '#FF6600',            # orange - Eska  
    'RMF Classic': '#7F1F30'      # burgundy - RMF Classic  
}


In [257]:
# Create a label directly from artist, title, and station
fastest['label'] = fastest['artists'] + ' - ' + fastest['title'] + ' (' + fastest['station'] + ')'

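# Sort labels alphabetically by "artist - title", ignoring the trailing " (station)"
# rsplit on the last '(' keeps titles that themselves contain parentheses intact
labels_order = sorted(
    fastest['label'].unique(),
    key=lambda lab: lab.rsplit('(', 1)[0].strip().lower()
)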


In [258]:
# Bar chart - the top 20 tracks with the shortest delay from release to first radio play, with colors mapped to stations and labels sorted alphabetically by track

# fastest: df with columns latency_hours, label and station
# labels_order: list of labels sorted alphabetically by track
# station_colors: dictionary mapping station names to colors

fig = px.bar(
    fastest,
    x='latency_hours',
    y='label',
    color='station',
    color_discrete_map=station_colors,
    orientation='h',
    category_orders={'label': labels_order},  # sort labels alphabetically
    title='Top 20 fastest premieres on Radio',
    labels={'latency_hours': 'Delay since release (h)', 'label': 'Track (Station)'},
    height=800
)

fig.write_html(os.path.join(output_folder, "fastest_20_colored_sorted.html"))


In [260]:
# Add latency in hours
fastest50['latency_hours'] = (
    fastest50['latency']
    .dt.total_seconds()
    .div(3600)
    .round(2)
)

# Create a label combining artist, title, and station in parentheses
fastest50['label'] = (
    fastest50['artists'] + ' - ' +
    fastest50['title'] + ' (' + fastest50['station'] + ')'
)

# Sort labels alphabetically by "artist - title", ignoring the trailing " (station)"
labels_order = sorted(
    fastest50['label'].unique(),
    key=lambda lab: lab.rsplit('(',1)[0].strip().lower()
)


In [261]:
# Bar chart - the 50 tracks with the shortest delay from release to first radio play, with station-based colors and alphabetically sorted labels


fig = px.bar(
    fastest50,
    x='latency_hours',
    y='label',
    color='station',
    color_discrete_map=station_colors,
    orientation='h',
    category_orders={'label': labels_order},
    title='Top 50 fastest premieres on radio',
    labels={'latency_hours': 'Delay (h)', 'label': 'Track (Station)'},
    height=1200
)

fig.write_html(os.path.join(output_folder, "fastest50_premieres_chart.html"))


In [263]:
# Top 50 fastest premieres on Radio
from collections import Counter

# data: fastest50, labels_order, station_colors
fig = px.bar(
    fastest50,
    x='latency_hours',
    y='label',
    color='station',
    color_discrete_map=station_colors,
    orientation='h',
    category_orders={'label': labels_order},
    title='Top 50 fastest premieres on Radio',
    labels={'latency_hours': 'Delay (h)', 'label': 'Track (Station)'},
    height=1200
)

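# Helper: map a label back to its "artist - title" track key.
# Looking it up in the DataFrame avoids splitting on '-' or '(',
# which can also occur inside artist names and titles.
label_to_track = dict(zip(fastest50['label'], fastest50['artists'] + ' - ' + fastest50['title']))

def extract_title(label):
    return label_to_track[label]

# Find tracks that appear more than once (the same song premiered quickly on several stations)
titles = [extract_title(lab) for lab in labels_order]
dupe_titles = [t for t, c in Counter(titles).items() if c > 1]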

# Colors for highlighting duplicate titles
background_colors = [
    "rgba(50,200,50,0.15)",   # green
    "rgba(50,50,200,0.15)",   # blue
    "rgba(200,200,50,0.15)",  # yellow
    "rgba(200,50,200,0.15)",  # violet
    "rgba(200,50,50,0.15)",   # red
    "rgba(50,200,200,0.15)",  # turquoise
]

# Highlight rows with duplicate titles by drawing a colored rectangle behind them
for i, t in enumerate(dupe_titles):
    idxs = [j for j, lab in enumerate(labels_order) if extract_title(lab) == t]
    if not idxs:
        continue
    y0 = labels_order[min(idxs)]
    # If not the last on the list, y1 is the next label; if last, y1 = y0 (no shade)
    if max(idxs) + 1 < len(labels_order):
        y1 = labels_order[max(idxs) + 1]
    else:
        y1 = y0
    fig.add_shape(
        type="rect",
        xref="paper", yref="y",
        x0=0, x1=1, y0=y0, y1=y1,
        fillcolor=background_colors[i % len(background_colors)],
        line=dict(width=0),
        layer="below"
    )

fig.write_html(os.path.join(output_folder, "fastest50_premieres_chart_highlighted.html"))
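The shaded bands mark tracks that appear more than once among the fastest premieres, meaning the same song reached several stations quickly. The same information can be read straight from the table. Below is a minimal sketch on toy rows modeled on the tables above; `f50` is a hypothetical stand-in for `fastest50`.

```python
import pandas as pd

# Hypothetical toy stand-in for fastest50
f50 = pd.DataFrame({
    'artists': ['Lola Young', 'Lola Young', 'Ed Sheeran', 'Chappell Roan'],
    'title':   ['One Thing', 'One Thing', 'Azizam', 'The Giver'],
    'station': ['Trójka', 'Eska', 'Eska', 'Trójka'],
})

# Count distinct stations per track and list them; keep tracks on more than one station
multi = (
    f50.groupby(['artists', 'title'])['station']
       .agg(n_stations='nunique', stations=lambda s: ', '.join(sorted(s)))
       .reset_index()
       .query('n_stations > 1')
)
print(multi)
```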


### 2025 Top Hits

In [265]:
df_prem

Unnamed: 0,artists,title,release_date
0,Rockie Fresh,Driving 88,2025-02-19
1,Zoë Blade,Fallout Zone,2025-06-21
2,Harry Last,Eternal,2025-07-22
3,Intermezzo,Iron Giant,2025-04-20
4,Thundermother,Dirty & Divine,2025-02-07
...,...,...,...
52733,Roderick Newport,Pulling Us Away,2025-07-11
52734,Black Helium,The Animals Are Coming,2025-06-20
52735,Alejandro Zandes,Silhouette,2025-05-30
52736,akaJazy,Gaia & Physis Recordings,2025-07-04


In [267]:
# Songs with premieres in 2025

# Convert release_date to datetime; unparsable values become NaT
df_prem['release_date'] = pd.to_datetime(
    df_prem['release_date'],
    errors='coerce'
)

df_prem2025 = df_prem[df_prem['release_date'].dt.year == 2025].copy()
df_prem2025


Unnamed: 0,artists,title,release_date
0,Rockie Fresh,Driving 88,2025-02-19
1,Zoë Blade,Fallout Zone,2025-06-21
2,Harry Last,Eternal,2025-07-22
3,Intermezzo,Iron Giant,2025-04-20
4,Thundermother,Dirty & Divine,2025-02-07
...,...,...,...
52733,Roderick Newport,Pulling Us Away,2025-07-11
52734,Black Helium,The Animals Are Coming,2025-06-20
52735,Alejandro Zandes,Silhouette,2025-05-30
52736,akaJazy,Gaia & Physis Recordings,2025-07-04
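`df_prem2025` still includes releases dated after the last day of plays. The play data ends on 2025-06-16, while, for example, "Eternal" by Harry Last is dated 2025-07-22. Tracks like that cannot have any play on or after release within the data. Below is a minimal sketch of restricting releases to the observation window, assuming the end date is taken from the play data. `prem` and `last_play_day` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy version of df_prem2025
prem = pd.DataFrame({
    'artists': ['Rockie Fresh', 'Harry Last', 'Thundermother'],
    'title': ['Driving 88', 'Eternal', 'Dirty & Divine'],
    'release_date': pd.to_datetime(['2025-02-19', '2025-07-22', '2025-02-07']),
})

# Assumption: in the notebook this could be df_music['date'].max()
last_play_day = pd.Timestamp('2025-06-16')

# Keep only 2025 releases up to and including the last observed play day
in_window = prem[prem['release_date'].between(pd.Timestamp('2025-01-01'), last_play_day)]
print(in_window)
```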


In [268]:
# Total number of plays per song
total_plays = (
    df_music
    .groupby(['artist', 'title'])
    .size()
    .reset_index(name='total_plays')
)


In [270]:
# Sort songs by total number of plays, descending
total_plays_sorted_dsc = total_plays.sort_values(
    by='total_plays',
    ascending=False
)
total_plays_sorted_dsc.head(10)

Unnamed: 0,artist,title,total_plays
5714,Artemas,I Like The Way You Kiss Me,3344
25473,Dua Lipa,Houdini,3032
93510,The Kolors,Un Ragazzo Una Ragazza,2847
16354,Chappell Roan,"Good Luck, Babe!",2666
21450,Daria Zawiałow,Złamane Serce Jest Ok,2571
89375,Sylwia Grzeszczak,Och I Ach,2532
45611,Kaeyra,Sour,2524
27938,Eminem,Houdini,2462
38342,Hozier,Too Sweet,2352
93503,The Kolors,Italodisco,2286
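Counting by `artist` and `title` together matters here. "Houdini" appears twice in the top 10, once by Dua Lipa and once by Eminem. The following small illustration, using hypothetical toy counts rather than the real totals, shows how grouping on the title alone would merge two different songs:

```python
import pandas as pd

# Hypothetical toy plays (not the real counts)
plays = pd.DataFrame({
    'artist': ['Dua Lipa', 'Dua Lipa', 'Eminem', 'Hozier'],
    'title':  ['Houdini', 'Houdini', 'Houdini', 'Too Sweet'],
})

# Keyed on the track (artist + title): two separate "Houdini" rows
by_track = plays.groupby(['artist', 'title']).size().reset_index(name='total_plays')

# Keyed on title only: both songs collapse into one "Houdini" row
by_title = plays.groupby('title').size().reset_index(name='total_plays')

print(by_track)
print(by_title)
```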


In [272]:
# Top hits from 2025
hits2025 = (
    total_plays
    .merge(
        df_prem2025[['artists', 'title']],
        left_on=['artist', 'title'],
        right_on=['artists', 'title'],
        how='inner'
    )
    .sort_values('total_plays', ascending=False)
    .head(50)
    .reset_index(drop=True)
)
hits2025

Unnamed: 0,artist,title,total_plays,artists
0,Ed Sheeran,Azizam,1288,Ed Sheeran
1,Ed Sheeran,Azizam,1288,Ed Sheeran
2,Alex Warren,Ordinary,1163,Alex Warren
3,Sabrina Carpenter,Taste,948,Sabrina Carpenter
4,Sabrina Carpenter,Taste,948,Sabrina Carpenter
5,Doechii,Anxiety,752,Doechii
6,Benson Boone,Mystical Magical,208,Benson Boone
7,Youth Lagoon,Speed Freak,201,Youth Lagoon
8,Vella,All My Love,155,Vella
9,Vella,All My Love,155,Vella
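The output above repeats rows (Ed Sheeran "Azizam", Sabrina Carpenter "Taste", Vella "All My Love"). This suggests that `df_prem2025` holds more than one release entry for the same artist and title, so the `head(50)` cutoff covers fewer than 50 distinct tracks. Below is a minimal sketch that de-duplicates the keys before the merge and passes `validate='many_to_one'` so that pandas raises if duplicate keys slip through. `total` and `prem` are hypothetical toy frames modeled on the output above.

```python
import pandas as pd

# Hypothetical toy frames modeled on total_plays and df_prem2025
total = pd.DataFrame({
    'artist': ['Ed Sheeran', 'Alex Warren', 'Doechii'],
    'title': ['Azizam', 'Ordinary', 'Anxiety'],
    'total_plays': [1288, 1163, 752],
})
prem = pd.DataFrame({
    'artists': ['Ed Sheeran', 'Ed Sheeran', 'Alex Warren', 'Doechii'],
    'title': ['Azizam', 'Azizam', 'Ordinary', 'Anxiety'],
})

# One row per (artists, title) on the right-hand side
prem_keys = prem[['artists', 'title']].drop_duplicates()

hits = (
    total.merge(prem_keys, left_on=['artist', 'title'], right_on=['artists', 'title'],
                how='inner', validate='many_to_one')   # raises MergeError if right keys repeat
         .drop(columns='artists')
         .sort_values('total_plays', ascending=False)
         .head(50)
         .reset_index(drop=True)
)
print(hits)
```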


In [273]:
# Create full play datetime from date and time
df_music['play_dt'] = pd.to_datetime(
    df_music['date'].astype(str) + ' ' + df_music['time'].astype(str),
    errors='coerce'
)


In [275]:
# Filter music plays to only 2025 hits
df_hits = df_music.merge(
    hits2025[['artist', 'title']],
    on=['artist', 'title'],
    how='inner'
)
df_hits

Unnamed: 0,date,time,station,track,artist,title,month,hour,datetime,play_dt
0,2024-01-03,22:47:00,Złote Przeboje,Tina Turner - Private Dancer,Tina Turner,Private Dancer,2024-01-01,22,2024-01-03 22:47:00,2024-01-03 22:47:00
1,2024-01-03,22:47:00,Złote Przeboje,Tina Turner - Private Dancer,Tina Turner,Private Dancer,2024-01-01,22,2024-01-03 22:47:00,2024-01-03 22:47:00
2,2024-01-03,22:47:00,Złote Przeboje,Tina Turner - Private Dancer,Tina Turner,Private Dancer,2024-01-01,22,2024-01-03 22:47:00,2024-01-03 22:47:00
3,2024-01-07,16:48:00,Antyradio,Black Sabbath - Paranoid,Black Sabbath,Paranoid,2024-01-01,16,2024-01-07 16:48:00,2024-01-07 16:48:00
4,2024-01-08,15:41:00,Złote Przeboje,Tina Turner - Private Dancer,Tina Turner,Private Dancer,2024-01-01,15,2024-01-08 15:41:00,2024-01-08 15:41:00
...,...,...,...,...,...,...,...,...,...,...
10325,2025-06-16,19:30:00,ZET,Alex Warren - Ordinary,Alex Warren,Ordinary,2025-06-01,19,2025-06-16 19:30:00,2025-06-16 19:30:00
10326,2025-06-16,19:40:00,ZET,Ed Sheeran - Azizam,Ed Sheeran,Azizam,2025-06-01,19,2025-06-16 19:40:00,2025-06-16 19:40:00
10327,2025-06-16,19:40:00,ZET,Ed Sheeran - Azizam,Ed Sheeran,Azizam,2025-06-01,19,2025-06-16 19:40:00,2025-06-16 19:40:00
10328,2025-06-16,23:36:00,ZET,Ed Sheeran - Azizam,Ed Sheeran,Azizam,2025-06-01,23,2025-06-16 23:36:00,2025-06-16 23:36:00


In [277]:
# Find the first play of each 2025 hit per station
first_by_station = (
    df_hits
    .groupby(['artist', 'title', 'station'], as_index=False)
    .agg(first_play=('play_dt', 'min'))
)
first_by_station


Unnamed: 0,artist,title,station,first_play
0,Abor & Tynna,Baller,Eska,2025-05-23 18:20:00
1,Abor & Tynna,Baller,RMF MAXXX,2025-05-22 19:33:00
2,Alex Warren,Ordinary,Czwórka,2025-05-12 08:24:00
3,Alex Warren,Ordinary,Eska,2025-04-04 06:58:00
4,Alex Warren,Ordinary,Jedynka,2025-03-24 14:51:00
...,...,...,...,...
76,Vella,All My Love,Jedynka,2025-02-10 18:49:00
77,Vella,All My Love,RMF MAXXX,2025-01-25 19:05:00
78,Youth Lagoon,Speed Freak,Chillizet,2025-02-06 13:10:00
79,Youth Lagoon,Speed Freak,Czwórka,2025-01-26 08:28:00


In [278]:
# Find the very first station and time each 2025 hit was played
first_overall = (
    first_by_station
    .sort_values('first_play')
    .groupby(['artist', 'title'], as_index=False)
    .first()
    .rename(columns={'station': 'premiere_station', 'first_play': 'premiere_time'})
)

first_overall


Unnamed: 0,artist,title,premiere_station,premiere_time
0,Abor & Tynna,Baller,RMF MAXXX,2025-05-22 19:33:00
1,Alex Warren,Ordinary,Jedynka,2025-03-24 14:51:00
2,Benson Boone,Mystical Magical,Eska,2025-04-25 08:09:00
3,Billy Idol,Still Dancing,Trójka,2025-03-03 14:18:00
4,Black Sabbath,Paranoid,Antyradio,2024-01-07 16:48:00
5,Britney Spears,Lucky,VOX FM,2024-01-10 12:27:00
6,Bryan Adams,Make Up Your Mind,Trójka,2025-03-13 04:44:00
7,Chappell Roan,The Giver,Trójka,2025-03-14 10:07:00
8,Dina Summer,Girls Gang,Czwórka,2024-10-03 19:33:00
9,Doechii,Anxiety,Eska,2025-03-04 19:14:00
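`first_by_station` also helps answer how a track travels after its premiere. Subtracting each track's earliest play from every station's first play gives the lag before each further station picked it up. Below is a minimal sketch using `groupby(...).transform('min')` on toy rows modeled on the "Ordinary" and "Baller" entries above. `fbs` is a hypothetical stand-in for `first_by_station`.

```python
import pandas as pd

# Hypothetical toy stand-in for first_by_station
fbs = pd.DataFrame({
    'artist': ['Alex Warren'] * 3 + ['Abor & Tynna'] * 2,
    'title': ['Ordinary'] * 3 + ['Baller'] * 2,
    'station': ['Czwórka', 'Eska', 'Jedynka', 'Eska', 'RMF MAXXX'],
    'first_play': pd.to_datetime(['2025-05-12 08:24', '2025-04-04 06:58', '2025-03-24 14:51',
                                  '2025-05-23 18:20', '2025-05-22 19:33']),
})

# Earliest play of each track across all stations (its premiere), broadcast to every row
fbs['premiere_time'] = fbs.groupby(['artist', 'title'])['first_play'].transform('min')

# Days between the premiere and this station's first play
fbs['days_after_premiere'] = (fbs['first_play'] - fbs['premiere_time']).dt.total_seconds() / 86400

print(fbs.sort_values(['title', 'days_after_premiere']))
```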


In [279]:
# Count the number of 2025 hits premiered by each station
premiere_counts = (
    first_overall['premiere_station']
    .value_counts()
    .rename_axis('station')
    .reset_index(name='hit_count')
)

premiere_counts


Unnamed: 0,station,hit_count
0,Trójka,6
1,Eska,4
2,Antyradio,4
3,RMF MAXXX,2
4,Czwórka,2
5,Jedynka,1
6,VOX FM,1
7,RMF FM,1
8,Chillizet,1
9,Złote Przeboje,1


In [280]:

# Number of 2025 Hits premiered by each station

# Save the premiere counts table to HTML
premiere_counts.to_html(os.path.join(output_folder, "hit_premieres_2025_per_station.html"), index=False)

# Create a horizontal bar chart of 2025 hit premieres per station
fig = px.bar(
    premiere_counts,
    x='hit_count',
    y='station',
    orientation='h',
    title='Number of 2025 Hits premiered by each station',
    labels={'hit_count': 'Number of Hits', 'station': 'Radio Station'},
    color='station',
    color_discrete_map=station_colors
)

# Save the chart as an HTML file
fig.write_html(os.path.join(output_folder, "hit_premieres_2025_chart.html"))


In [281]:
# Premiere station per 2025 hit, counting only plays on or after the official release date

# Merge first_by_station with release_date from df_prem2025
df_hits_full = first_by_station.merge(
    df_prem2025[['artists', 'title', 'release_date']],
    left_on=['artist', 'title'],
    right_on=['artists', 'title'],
    how='left'
)

# Convert release_date to datetime
df_hits_full['release_dt'] = pd.to_datetime(
    df_hits_full['release_date'],
    errors='coerce'
)

# Filter only plays after the official release date
df_hits_valid = df_hits_full[
    df_hits_full['first_play'] >= df_hits_full['release_dt']
].copy()

# Find the premiere station (first play after release)
first_overall = (
    df_hits_valid
    .sort_values('first_play')
    .groupby(['artist', 'title'], as_index=False)
    .first()
    .rename(columns={
        'station': 'premiere_station',
        'first_play': 'premiere_time'
    })
)

# Merge with total_plays and prepare for sunburst
station_hits = first_overall.merge(
    total_plays,
    on=['artist', 'title'],
    how='left'
)
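The release-date filter above drops two kinds of rows: plays that predate the recorded release (e.g. an older song with a 2025 re-release date) and rows whose `release_date` could not be parsed. With `errors='coerce'` the latter become `NaT`, and any comparison with `NaT` is `False`. A small sketch with hypothetical titles, assuming release dates are stored as `YYYY-MM-DD` strings (partial dates such as `2025-05` may also end up as `NaT`, depending on the pandas version and format inference):

```python
import pandas as pd

# Hypothetical first plays merged with release dates (as in df_hits_full)
demo = pd.DataFrame({
    "title": ["New single", "Re-release", "Unknown date"],
    "first_play": pd.to_datetime(["2025-03-07 10:16", "2024-01-07 16:48", "2025-02-01 09:00"]),
    "release_date": ["2025-03-05", "2025-02-14", "n/a"],
})

# Unparseable strings become NaT instead of raising an error
demo["release_dt"] = pd.to_datetime(demo["release_date"], errors="coerce")

# Pre-release plays and NaT comparisons both evaluate to False, so both rows are dropped
valid = demo[demo["first_play"] >= demo["release_dt"]]
print(valid["title"].tolist())
```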


In [282]:
# Merges the first overall premieres with total play counts for each hit

station_hits = first_overall.merge(
    total_plays,
    on=['artist', 'title'],
    how='left'
)

station_hits

Unnamed: 0,artist,title,premiere_station,premiere_time,artists,release_date,release_dt,total_plays
0,Abor & Tynna,Baller,RMF MAXXX,2025-05-22 19:33:00,Abor & Tynna,2025-05-15,2025-05-15,140
1,Alex Warren,Ordinary,Jedynka,2025-03-24 14:51:00,Alex Warren,2025-02-07,2025-02-07,1163
2,Benson Boone,Mystical Magical,Eska,2025-04-25 08:09:00,Benson Boone,2025-04-24,2025-04-24,208
3,Billy Idol,Still Dancing,Trójka,2025-03-03 14:18:00,Billy Idol,2025-02-26,2025-02-26,117
4,Bryan Adams,Make Up Your Mind,Trójka,2025-03-13 04:44:00,Bryan Adams,2025-03-06,2025-03-06,98
5,Chappell Roan,The Giver,Trójka,2025-03-14 10:07:00,Chappell Roan,2025-03-14,2025-03-14,77
6,Doechii,Anxiety,Trójka,2025-03-07 10:16:00,Doechii,2025-03-05,2025-03-05,752
7,Ed Sheeran,Azizam,Eska,2025-04-04 07:09:00,Ed Sheeran,2025-04-04,2025-04-04,1288
8,Ed Sheeran,Old Phone,RMF FM,2025-05-03 09:45:00,Ed Sheeran,2025-05-01,2025-05-01,64
9,Ghost,Satanized,Antyradio,2025-03-05 17:13:00,Ghost,2025-03-05,2025-03-05,106


In [283]:
# Sunburst chart

fig = px.sunburst(
    station_hits,
    path=['premiere_station', 'artist', 'title'],
    values='total_plays',
    color='premiere_station',
    color_discrete_map=station_colors,
    title='2025 Hits by Premiere Station and Popularity'
)

# Save the sunburst chart to HTML
fig.write_html(os.path.join(output_folder, "hit_sunburst_per_station.html"))
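With a `path`, Plotly Express sizes each inner segment of the sunburst as the sum of `values` over the leaves beneath it. The same totals can be reproduced with `groupby`, which is a quick way to sanity-check the chart. The three rows below reuse play counts from the `station_hits` output; the reduced table itself is only illustrative.

```python
import pandas as pd

# Leaf rows: one per hit, sized by total plays (subset of station_hits)
station_hits_demo = pd.DataFrame({
    "premiere_station": ["Trójka", "Trójka", "Eska"],
    "artist": ["Doechii", "Billy Idol", "Ed Sheeran"],
    "title": ["Anxiety", "Still Dancing", "Azizam"],
    "total_plays": [752, 117, 1288],
})

# Inner ring: premiere station; second ring: artist within the station
ring1 = station_hits_demo.groupby("premiere_station")["total_plays"].sum()
ring2 = station_hits_demo.groupby(["premiere_station", "artist"])["total_plays"].sum()
print(ring1)
print(ring2)
```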


In [285]:
# 2025 Hits by premiere station and popularity
fig.update_layout(
    title='2025 Hits by premiere station and popularity',
    margin=dict(t=100, l=25, r=25, b=150),  # room below the chart for the annotation
    annotations=[
        dict(
            text=(
                "Hierarchy of segments:<br>"
                "- First level: station that played the hit first<br>"
                "- Second level: artist<br>"
                "- Third level: track title<br><br>"
                "The segment size represents the total number of plays<br>"
                "(across all stations)."
            ),
            showarrow=False,
            x=0.5, y=-0.1,
            yanchor="top",  # hang the text below the plotting area
            xref="paper", yref="paper",
            align="center",
            font=dict(size=12)
        )
    ]
)

# Save the chart as HTML
fig.write_html(os.path.join(output_folder, "hit_sunburst_per_station_with_annotation.html"))


In [287]:
# Count the number of plays of each hit per station
plays_station = (
    df_hits
    .groupby(['artist', 'title', 'station'])
    .size()
    .reset_index(name='plays_station')
)
plays_station

Unnamed: 0,artist,title,station,plays_station
0,Abor & Tynna,Baller,Eska,80
1,Abor & Tynna,Baller,RMF MAXXX,60
2,Alex Warren,Ordinary,Czwórka,1
3,Alex Warren,Ordinary,Eska,417
4,Alex Warren,Ordinary,Jedynka,4
...,...,...,...,...
76,Vella,All My Love,Jedynka,6
77,Vella,All My Love,RMF MAXXX,4
78,Youth Lagoon,Speed Freak,Chillizet,143
79,Youth Lagoon,Speed Freak,Czwórka,55


In [288]:
# Attach per-station play counts (plays_station) to each hit's premiere station and time (first_overall)
sunburst_df = first_overall.merge(
    plays_station,
    on=['artist', 'title'],
    how='left'
)


In [289]:
# Sunburst chart - 2025 Hits: Premiere station → Artist → Title → All stations playing the song


fig = px.sunburst(
    sunburst_df,
    path=['premiere_station', 'artist', 'title', 'station'],
    values='plays_station',
    color='premiere_station',
    color_discrete_map=station_colors,
    title='2025 Hits: Premiere station → Artist → Title → All stations playing the song'
)

# Save to HTML
fig.write_html(os.path.join(output_folder, "hit_sunburst_full.html"))


In [290]:
# Number of plays of each hit per station (recomputed; the premiere station is still included here)
plays_station = df_hits.groupby(['artist', 'title', 'station']).size()

# Move indexes to columns
plays_station = (
    plays_station
    .rename_axis(['artist', 'title', 'station'])
    .reset_index()
)

# Name the column with values
plays_station.columns = ['artist', 'title', 'station', 'plays_station']
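The three steps above (`rename_axis`, `reset_index`, overwriting `columns`) produce the same table as naming the count column directly in `reset_index`. A minimal check on hypothetical plays:

```python
import pandas as pd

# Hypothetical plays of two hits
df_hits_demo = pd.DataFrame({
    "artist": ["Vella", "Vella", "Vella", "Youth Lagoon"],
    "title": ["All My Love", "All My Love", "All My Love", "Speed Freak"],
    "station": ["Jedynka", "Jedynka", "RMF MAXXX", "Chillizet"],
})

# Verbose form: default column name 0, then overwrite all column names
a = df_hits_demo.groupby(["artist", "title", "station"]).size().reset_index()
a.columns = ["artist", "title", "station", "plays_station"]

# Compact form: name the count column directly
b = df_hits_demo.groupby(["artist", "title", "station"]).size().reset_index(name="plays_station")
print(b)
```

The `rename_axis` call is a no-op here because `groupby` already names the index levels after the grouping columns.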


In [293]:
# Hits without a premiere station (e.g. release date missing, unparseable, or later than every recorded play, so the release-date filter removed them)

missing = station_hits_full[station_hits_full['premiere_station'].isna()]
print(missing[['artist', 'title', 'station', 'plays_station']].drop_duplicates())


               artist       title    station  plays_station
11      Black Sabbath    Paranoid  Antyradio             56
12      Black Sabbath    Paranoid    Jedynka              3
13      Black Sabbath    Paranoid     Trójka             11
14     Britney Spears       Lucky     VOX FM             80
21        Dina Summer  Girls Gang    Czwórka             78
44  Sabrina Carpenter       Taste       Eska            852
45  Sabrina Carpenter       Taste     RMF FM            198
46  Sabrina Carpenter       Taste  RMF MAXXX            704
47  Sabrina Carpenter       Taste     Trójka             36
48  Sabrina Carpenter       Taste        ZET            106
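Another way to list hits that never received a valid premiere station is an anti-join: a left merge with `indicator=True`, keeping the rows marked `left_only`. A sketch with a hypothetical, reduced set of hits:

```python
import pandas as pd

# Hypothetical hit list and the subset that received a valid premiere station
hits_demo = pd.DataFrame({
    "artist": ["Doechii", "Black Sabbath", "Sabrina Carpenter"],
    "title": ["Anxiety", "Paranoid", "Taste"],
})
premiered_demo = pd.DataFrame({
    "artist": ["Doechii"],
    "title": ["Anxiety"],
    "premiere_station": ["Trójka"],
})

# Left join with an indicator column; 'left_only' rows have no premiere
check = hits_demo.merge(premiered_demo, on=["artist", "title"], how="left", indicator=True)
no_premiere = check.loc[check["_merge"] == "left_only", ["artist", "title"]]
print(no_premiere)
```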


In [294]:
# Remove rows with missing premiere station information
station_hits_full = station_hits_full.dropna(subset=['premiere_station'])

# Remove rows with missing artist, title, or station information
station_hits_full = station_hits_full.dropna(subset=['artist', 'title', 'station'])


In [295]:
# 2025 hits: premiere station and further airplay in the network (excluding the premiere station on the last level)
fig = px.sunburst(
    station_hits_full,
    path=['premiere_station', 'artist', 'title', 'station'],
    values='plays_station',
    color='premiere_station',
    color_discrete_map=station_colors,
    title='2025 Hits: Premiere station and further airplay in the network'
)

# Save the sunburst chart as an HTML file
fig.write_html(os.path.join(output_folder, "hit_sunburst_without_premiere_on_level4.html"))


In [296]:
# Get the top 50 hits released in 2025 by total number of plays
hits2025 = (
    total_plays
    .merge(
        df_prem2025[['artists', 'title']],
        left_on=['artist', 'title'],
        right_on=['artists', 'title'],
        how='inner'
    )
    .sort_values('total_plays', ascending=False)
    .head(50)
    .reset_index(drop=True)
)


In [297]:
# Filter the music plays to include only the top 2025 hits
df_hits = df_music.merge(
    hits2025[['artist', 'title']],
    on=['artist', 'title'],
    how='inner'
)


In [298]:
# Find the first play of each 2025 hit per station
first_by_station = (
    df_hits
    .groupby(['artist', 'title', 'station'], as_index=False)
    .agg(first_play=('play_dt', 'min'))
)


In [299]:
# Merge with df_prem2025 to get release_date for each hit
fb = first_by_station.merge(
    df_prem2025[['artists', 'title', 'release_date']],
    left_on=['artist', 'title'],
    right_on=['artists', 'title'],
    how='left'
)

# Convert release_date to datetime (unparseable dates become NaT) and keep only first plays on or after release
fb['release_dt'] = pd.to_datetime(fb['release_date'], errors='coerce')
fb = fb[fb['first_play'] >= fb['release_dt']]

# Find the premiere station and premiere time for each hit
first_overall = (
    fb
    .sort_values('first_play')
    .groupby(['artist', 'title'], as_index=False)
    .first()
    .rename(columns={'station': 'premiere_station', 'first_play': 'premiere_time'})
)


In [300]:
# Count number of plays for each hit per station
plays_station = (
    df_hits
    .groupby(['artist', 'title', 'station'])
    .size()
    .rename_axis(['artist', 'title', 'station'])
    .reset_index(name='plays_station')
)

# Exclude the premiere station
plays_excl = plays_station.merge(
    first_overall[['artist', 'title', 'premiere_station']],
    on=['artist', 'title'],
    how='left'
)
plays_excl = plays_excl[plays_excl['station'] != plays_excl['premiere_station']]
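The `!=` filter above keeps rows whose `premiere_station` is missing, because `NaN` is never equal to a station name. That is why a later cell still has to drop rows with no premiere station. A sketch on hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical per-station plays merged with premiere stations
plays_demo = pd.DataFrame({
    "title": ["Anxiety", "Anxiety", "Paranoid"],
    "station": ["Trójka", "Eska", "Antyradio"],
    "premiere_station": ["Trójka", "Trójka", np.nan],
})

# NaN compares unequal to everything, so the 'Paranoid' row survives the filter
kept = plays_demo[plays_demo["station"] != plays_demo["premiere_station"]]

# Dropping missing premieres first makes the exclusion explicit
kept_clean = plays_demo.dropna(subset=["premiere_station"])
kept_clean = kept_clean[kept_clean["station"] != kept_clean["premiere_station"]]
print(kept_clean)
```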


In [305]:
# Merge with total plays for each hit
station_hits_full = plays_excl.merge(
    total_plays,
    on=['artist', 'title'],
    how='left'
)

station_hits_clean = station_hits_full.dropna(
    subset=['premiere_station', 'artist', 'title', 'station']
)

# Four-level sunburst (premiere station → artist → title → station), showing two rings at a time
fig = px.sunburst(
    station_hits_clean,
    path=['premiere_station', 'artist', 'title', 'station'],
    values='plays_station',
    color='premiere_station',
    color_discrete_map=station_colors,
    title='2025 Hits: Premiere Station and Further Airplay'
)

fig.update_traces(maxdepth=2)  # Limit the sunburst to 2 levels

fig.write_html(os.path.join(output_folder, "hit_sunburst_full_clean.html"))


In [306]:
# Top 5 artists for each station

# Group by station and artist, count plays
artist_station = (
    df_music
    .groupby(['station', 'artist'])
    .size()
    .reset_index(name='plays')
)

# Select the top 5 artists for each station
top5 = (
    artist_station
    .sort_values(['station', 'plays'], ascending=[True, False])
    .groupby('station')
    .head(5)
    .reset_index(drop=True)
)
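`sort_values(...).groupby(...).head(5)` cuts each station's list at exactly five rows, so when two artists tie at the cut-off, one is dropped without warning. If ties should be kept, one option is a per-station rank with `method="min"`. A sketch with hypothetical artists and counts, using a top 2 to keep the table small:

```python
import pandas as pd

# Hypothetical play counts per station and artist
artist_station_demo = pd.DataFrame({
    "station": ["Antyradio"] * 4 + ["Eska"] * 3,
    "artist": ["Artist A", "Artist B", "Artist C", "Artist D", "Artist E", "Artist F", "Artist G"],
    "plays": [10, 8, 8, 5, 7, 7, 1],
})
N = 2

# Notebook pattern: sort, then take the first N rows of each station
top_head = (
    artist_station_demo
    .sort_values(["station", "plays"], ascending=[True, False])
    .groupby("station")
    .head(N)
)

# Alternative: keep every artist ranked within N, so ties at the cut-off survive
rank = artist_station_demo.groupby("station")["plays"].rank(method="min", ascending=False)
top_ties = artist_station_demo[rank <= N]
print(top_ties)
```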


In [307]:
# Top 5 Artists in each radio station

fig = px.bar(
    top5,
    x='plays',
    y='artist',
    color='station',
    color_discrete_map=station_colors,
    orientation='h',
    facet_row='station',
    height=320 * top5['station'].nunique(),
    title='Top 5 Artists in each radio station',
    labels={'plays': 'Number of Plays', 'artist': 'Artist'}
)

fig.update_yaxes(categoryorder='total ascending')
fig.update_layout(
    showlegend=False,
    margin=dict(l=120, t=70)
)

fig.add_annotation(
    text="Each panel shows the 5 most frequently played artists for a given radio station.<br>Panel colors correspond to the station colors used in the report.",
    xref="paper", yref="paper",
    x=0.5, y=1.10, showarrow=False,
    font=dict(size=13),
    align="center"
)

# Save the chart as HTML
fig.write_html(os.path.join(output_folder, "top5_artists_per_radio.html"))


In [308]:
# Count how many times each artist was played in each station
artist_station = (
    df_music
    .groupby(['station', 'artist'])
    .size()
    .reset_index(name='plays')
)

# Select the top 5 most popular artists for each radio station
top5_per_radio = (
    artist_station
    .sort_values(['station', 'plays'], ascending=[True, False])
    .groupby('station')
    .head(5)
    .reset_index(drop=True)
)

print(top5_per_radio)


           station                 artist  plays
0        Antyradio                    Hey   1722
1        Antyradio                   Kult   1705
2        Antyradio                Perfect   1491
3        Antyradio  Red Hot Chili Peppers   1452
4        Antyradio              Lady Pank   1408
..             ...                    ...    ...
60  Złote Przeboje              Lady Pank   2096
61  Złote Przeboje                 Maanam   1837
62  Złote Przeboje     Elektryczne Gitary   1770
63  Złote Przeboje                   Bajm   1564
64  Złote Przeboje                  Kombi   1559

[65 rows x 3 columns]


In [309]:
# Count in how many different stations each artist appears in the top 5
artist_count = top5_per_radio['artist'].value_counts()
unique_artists = artist_count[artist_count == 1].index.tolist()

# Add a column indicating whether the artist is unique to a single station's top 5
top5_per_radio['unique'] = top5_per_radio['artist'].isin(unique_artists)
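`value_counts() == 1` identifies artists that appear in only one station's top 5, which works because an artist can appear at most once in each station's list. An equivalent that does not rely on that assumption counts distinct stations per artist with `transform("nunique")`. A sketch on a hypothetical top list:

```python
import pandas as pd

# Hypothetical top lists for two stations
top5_demo = pd.DataFrame({
    "station": ["Antyradio", "Antyradio", "Złote Przeboje", "Złote Przeboje"],
    "artist": ["Kult", "Lady Pank", "Lady Pank", "Maanam"],
})

# Distinct stations whose top list contains each artist, broadcast back to every row
n_stations = top5_demo.groupby("artist")["station"].transform("nunique")
top5_demo["unique"] = n_stations == 1
print(top5_demo)
```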


In [310]:
# Top 5 artists in each radio station (unique highlighted in bold)

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

# Make artist name bold in HTML if unique to a single station's top 5
def bold_if_unique(row):
    return f"<b>{row['artist']}</b>" if row['unique'] else row['artist']

top5_per_radio['artist_fmt'] = top5_per_radio.apply(bold_if_unique, axis=1)

# Display order of artist labels per station (not passed to the chart below, which orders bars by total plays)
artist_order = (
    top5_per_radio
    .sort_values(['station', 'plays'], ascending=[True, False])
    .groupby('station')['artist_fmt']
    .apply(list)
    .explode()
    .unique()
)

# Create a faceted bar chart with bolded unique artists
fig = px.bar(
    top5_per_radio,
    x='plays',
    y='artist_fmt',
    color='station',
    color_discrete_map=station_colors,
    orientation='h',
    facet_row='station',
    height=320 * top5_per_radio['station'].nunique(),
    title='Top 5 artists in each radio station (unique highlighted in bold)',
    labels={'plays': 'Number of Plays', 'artist_fmt': 'Artist'}
)

fig.update_yaxes(categoryorder='total ascending')

# Print each (possibly bold) artist name on its bar
fig.update_traces(texttemplate="%{y}", textposition="auto", textfont_size=14)

# Add detailed hover text for clarity
fig.update_traces(
    hovertemplate='<b>Artist:</b> %{y}<br>Number of plays: %{x}<extra></extra>'
)

# Add annotation explaining the bold formatting
fig.add_annotation(
    text=(
        "Each panel shows the 5 most played artists for a given radio station.<br>"
        "<b>Bold name</b> – artist appears only in this station's top 5."
    ),
    xref="paper", yref="paper",
    x=0.5, y=1.11, showarrow=False,
    font=dict(size=13),
    align="center"
)

# Save the chart as HTML
fig.write_html(os.path.join(output_folder, "top5_artists_per_radio_highlight.html"))


In [311]:
# Create output folders for top 5 artists and tracks per station
os.makedirs(os.path.join(output_folder, "top5_artists_per_radio"), exist_ok=True)
os.makedirs(os.path.join(output_folder, "top5_tracks_per_radio"), exist_ok=True)


In [312]:
# Group by station and artist (number of plays)
artist_station = (
    df_music
    .groupby(['station', 'artist'])
    .size()
    .reset_index(name='plays')
)

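# Group by station and track title (number of plays); tracks are keyed by title only, so songs by different artists that share a title are counted together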
track_station = (
    df_music
    .groupby(['station', 'title'])
    .size()
    .reset_index(name='plays')
)

# Top 5 artists for each station
top5_artists = (
    artist_station
    .sort_values(['station', 'plays'], ascending=[True, False])
    .groupby('station')
    .head(5)
    .reset_index(drop=True)
)

# Top 5 tracks for each station
top5_tracks = (
    track_station
    .sort_values(['station', 'plays'], ascending=[True, False])
    .groupby('station')
    .head(5)
    .reset_index(drop=True)
)
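The `track_station` grouping above keys tracks by title alone, so different songs with the same title merge into one entry (a later cell regroups by title and artist). A minimal illustration with hypothetical artists:

```python
import pandas as pd

# Two different songs that share a title, played on the same station
df_music_demo = pd.DataFrame({
    "station": ["Eska"] * 3,
    "artist": ["Artist A", "Artist B", "Artist B"],
    "title": ["Tonight", "Tonight", "Tonight"],
})

by_title = df_music_demo.groupby(["station", "title"]).size()            # merges both songs
by_track = df_music_demo.groupby(["station", "artist", "title"]).size()  # keeps them apart
print(by_title)
print(by_track)
```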


In [313]:
# Unique artists: artists who appear in the top 5 for only one station
unique_artists = top5_artists['artist'].value_counts()
unique_artists = unique_artists[unique_artists == 1].index.tolist()

# Unique tracks: tracks that appear in the top 5 for only one station
unique_tracks = top5_tracks['title'].value_counts()
unique_tracks = unique_tracks[unique_tracks == 1].index.tolist()

# Mark unique artists and tracks in the top 5 lists
top5_artists['unique'] = top5_artists['artist'].isin(unique_artists)
top5_tracks['unique'] = top5_tracks['title'].isin(unique_tracks)


In [314]:
# Top 5 artists - one file per radio station
for station in top5_artists['station'].unique():
    df_station = top5_artists[top5_artists['station'] == station].sort_values('plays', ascending=True)
    # Bold the name if unique
    df_station['artist_fmt'] = df_station.apply(
        lambda row: f"<b>{row['artist']}</b>" if row['unique'] else row['artist'],
        axis=1
    )
    fig = px.bar(
        df_station,
        x='plays',
        y='artist_fmt',
        orientation='h',
        color_discrete_sequence=[station_colors.get(station, "#666666")],
        title=f'Top 5 Artists in {station}',
        labels={'plays': 'Number of Plays', 'artist_fmt': 'Artist'}
    )
    fig.update_yaxes(categoryorder='total ascending')
    fig.update_layout(showlegend=False)
    fig.write_html(os.path.join(output_folder, "top5_artists_per_radio", f"top5_artists_{station}.html"))

# Top 5 tracks - one file per radio station
for station in top5_tracks['station'].unique():
    df_station = top5_tracks[top5_tracks['station'] == station].sort_values('plays', ascending=True)
    # Bold the title if unique
    df_station['title_fmt'] = df_station.apply(
        lambda row: f"<b>{row['title']}</b>" if row['unique'] else row['title'],
        axis=1
    )
    fig = px.bar(
        df_station,
        x='plays',
        y='title_fmt',
        orientation='h',
        color_discrete_sequence=[station_colors.get(station, "#666666")],
        title=f'Top 5 Tracks in {station}',
        labels={'plays': 'Number of Plays', 'title_fmt': 'Track Title'}
    )
    fig.update_yaxes(categoryorder='total ascending')
    fig.update_layout(showlegend=False)
    fig.write_html(os.path.join(output_folder, "top5_tracks_per_radio", f"top5_tracks_{station}.html"))


In [315]:
top5_tracks

Unnamed: 0,station,title,plays,unique
0,Antyradio,Twoje Oczy Lubia Mnie,273,True
1,Antyradio,Nabijam Lufę (Feat. Hiob),251,True
2,Antyradio,Favourite,244,True
3,Antyradio,Rock N'roller,243,True
4,Antyradio,Chłopcy,240,True
...,...,...,...,...
60,Złote Przeboje,Maria,382,True
61,Złote Przeboje,Tonight,328,True
62,Złote Przeboje,99 Luftballons,323,True
63,Złote Przeboje,Zakrecona,315,True


In [317]:
# Group by station, title, and artist; count plays
track_station_artist = (
    df_music
    .groupby(['station', 'title', 'artist'])
    .size()
    .reset_index(name='plays')
)

# Select the top 5 tracks (by play count) for each station
top5_tracks = (
    track_station_artist
    .sort_values(['station', 'plays'], ascending=[True, False])
    .groupby('station')
    .head(5)
    .reset_index(drop=True)
)


In [318]:
import os

# Create the output folder for unique track reports
os.makedirs(os.path.join(output_folder, "unique"), exist_ok=True)

# Create a track identifier as (artist – title)
df_music['track_id'] = df_music['artist'] + " - " + df_music['title']

# Library size per radio (number of unique tracks)
library_size = (
    df_music.groupby('station')['track_id'].nunique().reset_index(name='Number of Tracks')
)

# Number of tracks unique to each radio station
# - Count in how many different stations each track was played
track_radio_count = (
    df_music.groupby('track_id')['station'].nunique().reset_index(name='station_count')
)

# - Keep only tracks played in exactly one station
unique_tracks = track_radio_count[track_radio_count['station_count'] == 1]['track_id']

# - Count the number of such unique tracks per station
unique_per_station = (
    df_music[df_music['track_id'].isin(unique_tracks)]
    .groupby('station')['track_id'].nunique()
    .reset_index(name='Unique Tracks')
)
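Library size and the number of exclusive tracks can also be computed in a single `groupby`. The sketch below marks a play as exclusive when its track was heard on exactly one station, blanks out non-exclusive track IDs with `where`, and relies on `nunique` ignoring `NaN`. The data and track IDs are hypothetical. Unlike the filter-first approach above, a station with no exclusive tracks appears with 0 instead of being omitted.

```python
import pandas as pd

# Hypothetical plays: track_id is "artist - title" as in the notebook
df_music_demo = pd.DataFrame({
    "station": ["Dwójka", "Dwójka", "Dwójka", "Eska", "Eska"],
    "track_id": [
        "Artist A - Song 1", "Artist A - Song 1", "Artist B - Song 2",
        "Artist B - Song 2", "Artist C - Song 3",
    ],
})

# Number of stations that played each track, aligned to every play
stations_per_track = df_music_demo.groupby("track_id")["station"].transform("nunique")

summary = (
    df_music_demo
    # Keep the track_id only where the track is exclusive; NaN elsewhere
    .assign(exclusive_id=df_music_demo["track_id"].where(stations_per_track == 1))
    .groupby("station")
    .agg(library_size=("track_id", "nunique"), unique_tracks=("exclusive_id", "nunique"))
)
print(summary)
```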


In [320]:
# Size of each station's track library and the number of tracks unique to each station

# Library size per station
fig1 = px.bar(
    library_size.sort_values('Number of Tracks', ascending=False),
    x='station', y='Number of Tracks',
    color='station',
    color_discrete_map=station_colors,
    title='Library Size: Number of Unique Tracks per Station',
    text='Number of Tracks'
)
fig1.update_layout(showlegend=False)
fig1.write_html(os.path.join(output_folder, "unique", "library_size_per_station.html"))

# Number of unique tracks per station
fig2 = px.bar(
    unique_per_station.sort_values('Unique Tracks', ascending=False),
    x='station', y='Unique Tracks',
    color='station',
    color_discrete_map=station_colors,
    title='Number of Unique Tracks in Each Station\'s Library<br><span style="font-size:13px">(track = artist + title, uniqueness: not played on any other station)</span>',
    text='Unique Tracks'
)
fig2.update_layout(showlegend=False)
fig2.write_html(os.path.join(output_folder, "unique", "unique_tracks_per_station.html"))

display(unique_per_station)


Unnamed: 0,station,Unique Tracks
0,Antyradio,4426
1,Chillizet,5779
2,Czwórka,16125
3,Dwójka,26405
4,Eska,915
5,Jedynka,14073
6,RMF Classic,3241
7,RMF FM,322
8,RMF MAXXX,876
9,Trójka,18241


In [321]:
# Export tables of tracks that are unique to each station (not played anywhere else)

# Create the folder for unique track tables per station
os.makedirs(os.path.join(output_folder, "unique", "tracks_per_station"), exist_ok=True)

# For each station, export a table of unique tracks (artist + title), i.e. tracks played only in this station
for station in df_music['station'].unique():
    df_uni = (
        df_music[
            (df_music['station'] == station) &
            (df_music['track_id'].isin(unique_tracks))
        ][['artist', 'title']]
        .drop_duplicates()
        .sort_values(['artist', 'title'])
    )
    df_uni.to_html(
        os.path.join(output_folder, "unique", "tracks_per_station", f"unique_tracks_{station}.html"),
        index=False,
        encoding="utf-8"
    )


In [322]:
# Create a track ID (artist – title)
df_music['track_id'] = df_music['artist'] + " - " + df_music['title']

# Count the number of plays for each unique track in each station
unique_plays = (
    df_music[df_music['track_id'].isin(unique_tracks)]
    .groupby(['station', 'artist', 'title', 'track_id'])
    .size()
    .reset_index(name='plays')
)
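Joining artist and title with `" - "` works as a key only if neither field contains that separator. Such collisions are probably rare in this data, but they are possible. Grouping on the two columns directly avoids the issue. A contrived example with hypothetical names:

```python
import pandas as pd

# Two different tracks whose concatenated keys collide
df_demo = pd.DataFrame({
    "artist": ["A - B", "A"],
    "title": ["C", "B - C"],
})

# Concatenated key: both rows become "A - B - C"
df_demo["track_id"] = df_demo["artist"] + " - " + df_demo["title"]
print(df_demo["track_id"].nunique())  # 1

# Grouping on the two columns keeps the tracks apart
print(df_demo.groupby(["artist", "title"]).ngroups)  # 2
```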


In [323]:
# For each station, select the top 5 unique tracks by play count
top5_unique_per_station = (
    unique_plays
    .sort_values(['station', 'plays'], ascending=[True, False])
    .groupby('station')
    .head(5)
    .reset_index(drop=True)
)


In [324]:
# Top 5 most played unique tracks in each radio station

# Create output folder for top 5 unique tracks per station
os.makedirs(os.path.join(output_folder, "unique", "top5_unique_per_station"), exist_ok=True)

# For each station, plot the top 5 unique tracks (artist – title)
for station in top5_unique_per_station['station'].unique():
    df_station = top5_unique_per_station[top5_unique_per_station['station'] == station].sort_values('plays', ascending=True)
    # Y axis: Artist – Title
    df_station['track_full'] = df_station['artist'] + " - " + df_station['title']
    fig = px.bar(
        df_station,
        x='plays',
        y='track_full',
        orientation='h',
        color_discrete_sequence=[station_colors.get(station, "#666666")],
        title=f"Top 5 most played unique tracks in {station}",
        labels={'plays': 'Number of Plays', 'track_full': 'Track (Artist – Title)'}
    )
    fig.update_yaxes(categoryorder='total ascending')
    fig.update_layout(showlegend=False)
    fig.add_annotation(
        text="Top 5 tracks played exclusively in this radio station.<br>Sorted by number of plays.",
        xref="paper", yref="paper",
        x=0.5, y=1.12, showarrow=False,
        font=dict(size=13),
        align="center"
    )
    fig.write_html(os.path.join(output_folder, "unique", "top5_unique_per_station", f"top5_unique_{station}.html"))
