# Spotify Playlists Visualizations

Here, we'll download, prepare, and visualize the songs of some playlists as point clouds.

Here are some playlists we can use:

* **[Top 500 of the Last Decade](https://open.spotify.com/playlist/7xm0JS2hGoHn7Svr9hsQkw)**: A collection of 508 tracks representing the most popular songs from the last decade. By alex_matier.

* **[Rolling Stone Magazine - 500 Greatest Songs of All Time (2021)](https://open.spotify.com/playlist/7EAqBCOVkDZcbccjxZmgjp)**: A playlist featuring 584 songs from Rolling Stone's 2021 updated list of the 500 greatest songs of all time. By Henrik B. Hansen.

* **[Spotify Top 500 Most Streamed Songs of All Time](https://open.spotify.com/playlist/0JiVp7Z0pYKI8diUV6HJyQ)**: A compilation of the 500 most-streamed songs of all time on Spotify. By Spotify.

* **[500 Greatest Songs of All Time](https://open.spotify.com/playlist/6G9mBCSozMx0sOSXhSzZRY)**: 500 tracks showcasing some of the greatest songs in music history. By one-media.

* **[500+ Million Streams [Top 50 ordered by Streams]](https://open.spotify.com/playlist/7A0BB1t8whMe5CELdkOGC4)**: A massive playlist featuring 1,754 songs with over 500 million streams, ordered by popularity. By Various.

Here's a dict with that information and more:

In [1]:
spotify_playlists = {
    'top_500_last_decade': {
        'url': 'https://open.spotify.com/playlist/7xm0JS2hGoHn7Svr9hsQkw',
        'description': 'A collection of 508 tracks representing the most popular songs from the last decade.',
        'author': 'alex_matier',
        'title': 'Top 500 of the Last Decade',
        'table_url': 'https://raw.githubusercontent.com/thorwhalen/sung_content/main/parquet/top_500_last_decade.parquet',
    },
    'rolling_stone_500_greatest_songs': {
        'url': 'https://open.spotify.com/playlist/7EAqBCOVkDZcbccjxZmgjp',
        'description': "A playlist featuring 584 songs from Rolling Stone's 2021 updated list of the 500 greatest songs of all time.",
        'author': 'Henrik B. Hansen',
        'title': 'Rolling Stone Magazine - 500 Greatest Songs of All Time (2021)',
        'table_url': 'https://raw.githubusercontent.com/thorwhalen/sung_content/main/parquet/rolling_stone_500_greatest_songs.parquet',
    },
    'spotify_top_500_streamed': {
        'url': 'https://open.spotify.com/playlist/0JiVp7Z0pYKI8diUV6HJyQ',
        'description': 'A compilation of the 500 most-streamed songs of all time on Spotify.',
        'author': 'Spotify',
        'title': 'Spotify Top 500 Most Streamed Songs of All Time',
        'table_url': 'https://raw.githubusercontent.com/thorwhalen/sung_content/main/parquet/spotify_top_500_streamed.parquet',
    },
    'greatest_500_songs': {
        'url': 'https://open.spotify.com/playlist/6G9mBCSozMx0sOSXhSzZRY',
        'description': '500 tracks showcasing some of the greatest songs in music history.',
        'author': 'one-media',
        'title': '500 Greatest Songs of All Time',
        'table_url': 'https://raw.githubusercontent.com/thorwhalen/sung_content/main/parquet/greatest_500_songs.parquet',
    },
    'over_500_million_streams': {
        'url': 'https://open.spotify.com/playlist/7A0BB1t8whMe5CELdkOGC4',
        'description': 'A massive playlist featuring 1,754 songs with over 500 million streams, ordered by popularity.',
        'author': 'Various',
        'title': '500+ Million Streams [Top 50 ordered by Streams]',
        'table_url': 'https://raw.githubusercontent.com/thorwhalen/sung_content/main/parquet/over_500_million_streams.parquet',
    },
}
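Every `table_url` above follows the same pattern: the raw GitHub URL of `parquet/<key>.parquet` in the `thorwhalen/sung_content` repo (main branch). Here is a minimal sketch that rebuilds these URLs from the dict keys, which can help when checking the dict or adding a playlist. `raw_table_url` and `check_table_urls` are hypothetical helpers, not part of `sung`:

```python
# Hypothetical helpers: rebuild the raw-GitHub URL pattern used by the
# 'table_url' entries of spotify_playlists (repo thorwhalen/sung_content, branch main).
RAW_ROOT = 'https://raw.githubusercontent.com/thorwhalen/sung_content/main'


def raw_table_url(playlist_key: str) -> str:
    """Return the raw URL of the parquet table stored for `playlist_key`."""
    return f'{RAW_ROOT}/parquet/{playlist_key}.parquet'


def check_table_urls(playlists: dict) -> list:
    """Return the keys whose 'table_url' does not match the expected pattern."""
    return [
        key for key, info in playlists.items()
        if info.get('table_url') != raw_table_url(key)
    ]


# Usage, with a one-entry excerpt of spotify_playlists:
excerpt = {
    'greatest_500_songs': {
        'table_url': 'https://raw.githubusercontent.com/thorwhalen/sung_content/main/parquet/greatest_500_songs.parquet',
    },
}
print(check_table_urls(excerpt))  # → []
```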

# Setup

The following code was tested with Python 3.10. There's no guarantee that it will work as-is with earlier versions.

You'll need the following third-party packages too:

Required:

```
pip install -U pandas tabled imbed dol
```

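The code below also imports from `sung` and `cosmograph_widget`, and `pd.read_parquet` needs a parquet engine such as `pyarrow`, so make sure those are installed too.

Optional:
* `graze`, if you want to cache the bytes downloaded from the table URLs (needed when `cache_tables = True`, the default below; otherwise `requests` is used)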
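Caching here means: the first time a URL is requested, its bytes are downloaded and written to disk; later requests read the local copy unless it's older than `max_age` seconds (`None` meaning the copy never expires). The sketch below illustrates that idea with the standard library only. It is *not* how `graze` is implemented: `cached_bytes` and its hash-based file naming are hypothetical, and `fetch` stands for whatever function actually downloads (e.g. `lambda url: requests.get(url).content`).

```python
import hashlib
import os
import tempfile
import time
from typing import Callable, Optional


def cached_bytes(
    url: str,
    fetch: Callable[[str], bytes],
    rootdir: str,
    max_age: Optional[float] = None,
) -> bytes:
    """Return the bytes of `url`, using a local copy when it's fresh enough.

    max_age=None means a cached copy never expires.
    """
    # Hypothetical naming scheme: one file per URL, named by the URL's SHA-256 hash
    filepath = os.path.join(rootdir, hashlib.sha256(url.encode()).hexdigest())
    if os.path.isfile(filepath):
        age = time.time() - os.path.getmtime(filepath)
        if max_age is None or age <= max_age:
            with open(filepath, 'rb') as f:
                return f.read()
    # Not cached (or too old): download and (re)write the local copy
    content = fetch(url)
    with open(filepath, 'wb') as f:
        f.write(content)
    return content


# Usage with a fake fetcher that records its calls, so no network is needed
calls = []

def fake_fetch(url):
    calls.append(url)
    return b'some parquet bytes'

with tempfile.TemporaryDirectory() as tmpdir:
    cached_bytes('https://example.com/t.parquet', fake_fetch, tmpdir)
    cached_bytes('https://example.com/t.parquet', fake_fetch, tmpdir)
    print(len(calls))  # → 1 (the second call read the local copy)
```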

## Parameters

In [2]:
rootdir = '/Users/thorwhalen/Dropbox/_odata/figiri/spotify_playlists'  # change this to an existing directory of your own

cache_tables = True

## Imports etc.

In [3]:
import os
import io 
from functools import partial

import pandas as pd
from imbed import fullpath_factory
from tabled import DfFiles, get_table as _get_table
from dol import Pipe

assert os.path.isdir(rootdir), f"rootdir: {rootdir} is not a directory"
fullpath_of = fullpath_factory(rootdir)

local_tables_dir = fullpath_of('tables')
os.makedirs(local_tables_dir, exist_ok=True)

local_tables = DfFiles(local_tables_dir)

# Determining how to get content from remote urls
if cache_tables:
    from graze import graze  # pip install graze

    content_of_url = partial(
        graze, 
        max_age=None,  # in seconds (if your cached data is older than this, it will be re-downloaded)
        rootdir=rootdir
    )

else:
    
    import requests

    content_of_url = lambda url: requests.get(url).content


parquet_url_to_table = Pipe(content_of_url, io.BytesIO, pd.read_parquet)
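`parquet_url_to_table` chains three steps: fetch the bytes, wrap them in a file-like `io.BytesIO`, and parse that with `pd.read_parquet`. Assuming `dol.Pipe` applies its functions left to right (as its use here implies), it behaves like the stdlib sketch below. `pipe` is a hypothetical stand-in for `Pipe`, and `json.load` replaces `pd.read_parquet` so the example runs without pandas or a network connection:

```python
import io
import json
from functools import reduce


def pipe(*funcs):
    """Compose funcs left to right: pipe(f, g, h)(x) == h(g(f(x)))."""
    def piped(x):
        return reduce(lambda acc, func: func(acc), funcs, x)
    return piped


# Stand-in for content_of_url: maps a "url" to bytes without any network access
fake_remote = {'https://example.com/t.json': b'{"name": "Heart Attack", "popularity": 78}'}
fake_content_of_url = fake_remote.__getitem__

# Same shape as Pipe(content_of_url, io.BytesIO, pd.read_parquet)
url_to_record = pipe(fake_content_of_url, io.BytesIO, json.load)

print(url_to_record('https://example.com/t.json'))
# → {'name': 'Heart Attack', 'popularity': 78}
```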



In [4]:
# Get the playlist tables from GitHub

def _playlist_data():
    for playlist_name, playlist in spotify_playlists.items():
        table = parquet_url_to_table(playlist['table_url'])
        yield playlist_name, table

playlist_data = dict(_playlist_data())

In [5]:
list(playlist_data)

['top_500_last_decade',
 'rolling_stone_500_greatest_songs',
 'spotify_top_500_streamed',
 'greatest_500_songs',
 'over_500_million_streams']

In [6]:
df = playlist_data['top_500_last_decade']
print(f"{df.shape=}")
df.head(3)

df.shape=(508, 40)


id,name,popularity,explicit,album_name,album_release_date,album_release_year,url,artist_list,first_artist,id,...,energy,instrumentalness,liveness,loudness,speechiness,valence,tempo,key,mode,time_signature
1V6gIisPpYqgFeWbMLI0bA,Heart Attack,78,False,Demi,2013-01-01,2013,https://open.spotify.com/track/1V6gIisPpYqgFeW...,[Demi Lovato],Demi Lovato,1V6gIisPpYqgFeWbMLI0bA,...,0.785,0.0,0.239,-4.802,0.104,0.502,173.968,8,1,4
4esOae7i4rqTbAu9o5Pxco,Girl on Fire,74,False,Girl on Fire (Remixes) - EP,2012-11-18,2012,https://open.spotify.com/track/4esOae7i4rqTbAu...,[Alicia Keys],Alicia Keys,4esOae7i4rqTbAu9o5Pxco,...,0.706,0.00106,0.105,-5.762,0.0607,0.144,92.513,9,1,4
2uwnP6tZVVmTovzX5ELooy,Power Trip (feat. Miguel),77,True,Born Sinner (Deluxe Version),2013-06-18,2013,https://open.spotify.com/track/2uwnP6tZVVmTovz...,"[J. Cole, Miguel]",J. Cole,2uwnP6tZVVmTovzX5ELooy,...,0.608,0.000198,0.426,-7.054,0.216,0.475,99.992,1,1,4


In [7]:
from sung.util import (
    spotify_features_fields,
    spotify_audio_features_fields,
)

spotify_audio_features_fields_names = list(spotify_audio_features_fields)
spotify_features_fields_names = list(spotify_features_fields)

# See what "feature" fields we have (and their explanation)
spotify_features_fields



{'duration_ms': "The track's duration in milliseconds.",
 'popularity': 'The popularity of the track, with values ranging from 0 to 100. Higher values indicate greater popularity.',
 'explicit': 'A boolean indicating whether the track contains explicit content.',
 'album_release_year': 'The year in which the album was released.',
 'acousticness': 'A confidence measure from 0.0 to 1.0 indicating the likelihood that the track is acoustic. Higher values denote a higher probability. Range: 0.0 to 1.0.',
 'danceability': 'Reflects how suitable a track is for dancing, based on tempo, rhythm stability, beat strength, and overall regularity. Higher values indicate greater danceability. Range: 0.0 to 1.0.',
 'energy': 'Measures the intensity and activity of a track. Energetic tracks feel fast, loud, and noisy. Higher values represent more energy. Range: 0.0 to 1.0.',
 'instrumentalness': 'Predicts whether a track contains no vocals. Higher values suggest a greater likelihood of the track being 

## Compute (and save) planar embeddings of the feature fields

In [None]:
recompute_planar_embeddings = False

if recompute_planar_embeddings:
    from imbed import planar_embeddings, transpose_iterable
    from tabled import extension_based_wrap
    from dol import Files

    WriteDfFiles = extension_based_wrap(Files)
    write_df_files = WriteDfFiles(local_tables_dir)

    n = len(playlist_data)

    for i, (name, data) in enumerate(playlist_data.items(), 1):
        print(f"Processing: {name} ({i}/{n})")
        X = data[spotify_features_fields_names].values
        data['x'], data['y'] = transpose_iterable(planar_embeddings(X).values())
        write_df_files[name + '.parquet'] = data
        del X  # to free memory
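In the loop above, `planar_embeddings` maps each row of `X` (one song's feature vector) to a 2D point, and `transpose_iterable` turns those `(x, y)` pairs into an `x` column and a `y` column. The code doesn't show which projection `imbed` uses, so the sketch below swaps in plain PCA via NumPy as an illustration, not as `imbed`'s method. It also standardizes each feature first: the feature fields mix scales (`duration_ms` is in the hundreds of thousands, while most audio features lie between 0.0 and 1.0), and without rescaling a variance-based projection like PCA would be dominated by `duration_ms`. `pca_planar_embeddings` is a hypothetical name, and the input matrix holds toy values for illustration only.

```python
import numpy as np


def pca_planar_embeddings(X) -> dict:
    """Map each row of X to an (x, y) point via PCA on standardized features.

    Illustrative stand-in only: returns {row_index: (x, y)}, mimicking the
    mapping-of-points shape that the notebook reads with `.values()`.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column so features with large units don't dominate
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns become all zeros instead of dividing by 0
    Z = (X - X.mean(axis=0)) / std
    # Principal axes are the right singular vectors of the centered data
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    coords = Z @ vt[:2].T  # project onto the first two principal axes
    return {i: (float(x), float(y)) for i, (x, y) in enumerate(coords)}


# Toy feature matrix (columns: duration_ms, energy, valence)
X_toy = np.array([
    [210840, 0.785, 0.502],
    [224000, 0.706, 0.144],
    [241000, 0.608, 0.475],
    [195000, 0.900, 0.858],
])
points = pca_planar_embeddings(X_toy)
xs, ys = zip(*points.values())  # transpose the (x, y) pairs into two sequences
print(len(xs), len(ys))  # → 4 4
```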

# Visualize the embedded playlists

In [10]:
list(local_tables)

['rolling_stone_500_greatest_songs.parquet',
 'top_500_last_decade.parquet',
 'greatest_500_songs.parquet',
 'over_500_million_streams.parquet',
 'spotify_top_500_streamed.parquet']

In [11]:
df = local_tables['top_500_last_decade.parquet']
df.head()

id,name,popularity,explicit,album_name,album_release_date,album_release_year,url,artist_list,first_artist,id,...,liveness,loudness,speechiness,valence,tempo,key,mode,time_signature,x,y
1V6gIisPpYqgFeWbMLI0bA,Heart Attack,78,False,Demi,2013-01-01,2013,https://open.spotify.com/track/1V6gIisPpYqgFeW...,[Demi Lovato],Demi Lovato,1V6gIisPpYqgFeWbMLI0bA,...,0.239,-4.802,0.104,0.502,173.968,8,1,4,8.775248,13.712613
4esOae7i4rqTbAu9o5Pxco,Girl on Fire,74,False,Girl on Fire (Remixes) - EP,2012-11-18,2012,https://open.spotify.com/track/4esOae7i4rqTbAu...,[Alicia Keys],Alicia Keys,4esOae7i4rqTbAu9o5Pxco,...,0.105,-5.762,0.0607,0.144,92.513,9,1,4,8.223083,12.477551
2uwnP6tZVVmTovzX5ELooy,Power Trip (feat. Miguel),77,True,Born Sinner (Deluxe Version),2013-06-18,2013,https://open.spotify.com/track/2uwnP6tZVVmTovz...,"[J. Cole, Miguel]",J. Cole,2uwnP6tZVVmTovzX5ELooy,...,0.426,-7.054,0.216,0.475,99.992,1,1,4,5.053463,12.713016
6BtmXhTJMM9sBTHeYYASGz,It's Time,64,False,Night Visions,2012-09-04,2012,https://open.spotify.com/track/6BtmXhTJMM9sBTH...,[Imagine Dragons],Imagine Dragons,6BtmXhTJMM9sBTHeYYASGz,...,0.145,-4.748,0.0372,0.858,105.009,2,1,4,8.920827,12.491635
6PUIzlqotEmPuBfjbwYWOB,Summertime Sadness (Lana Del Rey Vs. Cedric Ge...,72,False,Summertime Sadness (Lana Del Rey Vs. Cedric Ge...,2013-02-01,2013,https://open.spotify.com/track/6PUIzlqotEmPuBf...,"[Lana Del Rey, Cedric Gervais]",Lana Del Rey,6PUIzlqotEmPuBfjbwYWOB,...,0.13,-5.791,0.0558,0.11,126.052,1,0,4,7.993422,14.892555


In [12]:
from cosmograph_widget import Cosmograph
import pandas as pd
from IPython.display import display

In [14]:
df.iloc[0]

name                                                       Heart Attack
popularity                                                           78
explicit                                                          False
album_name                                                         Demi
album_release_date                                           2013-01-01
album_release_year                                                 2013
url                   https://open.spotify.com/track/1V6gIisPpYqgFeW...
artist_list                                               [Demi Lovato]
first_artist                                                Demi Lovato
id                                               1V6gIisPpYqgFeWbMLI0bA
duration_ms                                                      210840
is_local                                                          False
album_total_tracks                                                   13
album_images          [{'height': 640, 'url': 'https://i.scdn.co

In [15]:
d = df.copy()
d['index'] = range(len(d))  # TODO: Get rid of this. Should not be necessary
d['explicit'] = d['explicit'].astype(int)  # bool -> 0/1, to use it as a color value

cosmo = Cosmograph(
    points=d,
    point_id='id',
    point_index='index',
    point_x='x',
    point_y='y',
    point_size='popularity',
    point_label='name',
    point_size_scale=0.003,
    point_color='explicit',  # or 'mode'
    # No links are used here. With a links table, you'd also pass:
    # links=links,
    # link_source='source',
    # link_source_index='sourceidx',
    # link_target='target',
    # link_target_index='targetidx',
    # simulation_decay=20,
)

cosmo

Cosmograph(background_color=None, default_link_color=None, default_point_color=None, focused_point_ring_color=…

In [17]:
cosmo_2 = Cosmograph(
    points=d,
    point_id='id',
    point_index='index',
    point_x='x',
    point_y='y',
    point_size='popularity',
    point_label='name',
    point_size_scale=0.003,
    point_color='mode',  # Spotify's mode: 1 = major, 0 = minor
    # links and link_* parameters omitted (no links table here)
)

cosmo_2

Cosmograph(background_color=None, default_link_color=None, default_point_color=None, focused_point_ring_color=…

In [22]:
cosmo_3 = Cosmograph(
    points=d,
    point_id='id',
    point_index='index',
    point_x='x',
    point_y='y',
    point_size='duration_ms',
    point_label='name',
    point_size_scale=0.000001,  # durations are ~1e5 ms, hence a much smaller scale
    point_color='popularity',
    # links and link_* parameters omitted (no links table here)
)

cosmo_3

Cosmograph(background_color=None, default_link_color=None, default_point_color=None, focused_point_ring_color=…
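The three plots use very different `point_size_scale` values because the size columns live on very different scales: `popularity` ranges from 0 to 100, while `duration_ms` for a typical song is in the hundreds of thousands (210,840 for "Heart Attack"). Assuming Cosmograph's rendered size grows roughly with `value × point_size_scale` (an assumption; the exact mapping is Cosmograph's business), the chosen scales put both columns in a comparable range. `scaled_size` is a hypothetical helper:

```python
# Assumption: rendered size ~ column value * point_size_scale
def scaled_size(value: float, point_size_scale: float) -> float:
    return value * point_size_scale


# Values taken from the "Heart Attack" row shown above
popularity, duration_ms = 78, 210_840

print(round(scaled_size(popularity, 0.003), 4))   # → 0.234  (popularity-sized plots)
print(round(scaled_size(duration_ms, 1e-6), 4))   # → 0.2108 (duration-sized plot)
print(round(scaled_size(duration_ms, 0.003), 1))  # → 632.5  (if the popularity scale were reused)
```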

In [3]:
import pandas as pd

df = pd.read_parquet('/Users/thorwhalen/Dropbox/_odata/figiri/spotify_playlists/tables/top_500_last_decade.parquet')

df['index'] = range(len(df))  # 0..n-1 position of each point, used as Cosmograph's point_index
df.head(3)

id,name,popularity,explicit,album_name,album_release_date,album_release_year,url,artist_list,first_artist,id,...,loudness,speechiness,valence,tempo,key,mode,time_signature,x,y,index
1V6gIisPpYqgFeWbMLI0bA,Heart Attack,78,False,Demi,2013-01-01,2013,https://open.spotify.com/track/1V6gIisPpYqgFeW...,[Demi Lovato],Demi Lovato,1V6gIisPpYqgFeWbMLI0bA,...,-4.802,0.104,0.502,173.968,8,1,4,8.291117,13.710884,0
4esOae7i4rqTbAu9o5Pxco,Girl on Fire,74,False,Girl on Fire (Remixes) - EP,2012-11-18,2012,https://open.spotify.com/track/4esOae7i4rqTbAu...,[Alicia Keys],Alicia Keys,4esOae7i4rqTbAu9o5Pxco,...,-5.762,0.0607,0.144,92.513,9,1,4,8.808458,12.522784,1
2uwnP6tZVVmTovzX5ELooy,Power Trip (feat. Miguel),77,True,Born Sinner (Deluxe Version),2013-06-18,2013,https://open.spotify.com/track/2uwnP6tZVVmTovz...,"[J. Cole, Miguel]",J. Cole,2uwnP6tZVVmTovzX5ELooy,...,-7.054,0.216,0.475,99.992,1,1,4,5.788551,12.303987,2


In [11]:
from cosmograph_widget import Cosmograph

cosmo = Cosmograph(
   points=df,
   # links=pd.DataFrame(),
   point_id='id',
   point_index='index',
   point_x='x',
   point_y='y',
   point_size='popularity',
   point_size_scale=0.005,
   # point_color='color',
   # link_source='source',
   # link_source_index='sourceidx',
   # link_target='target',
   # link_target_index='targetidx',
   # simulation_decay=20,
)
cosmo

Cosmograph(background_color=None, default_link_color=None, default_point_color=None, focused_point_ring_color=…

In [None]:
import tabled.wrappers

# get the list of the names of all the objects defined in tabled.wrappers

tabled_wrappers_names = [name for name in dir(tabled.wrappers) if not name.startswith('_')]
tabled_wrappers_names




[]

In [1]:
from dol import Files, wrap_kvs
from tabled import extension_based_encoding, extension_based_decoding, extension_based_wrap

rootdir = '/Users/thorwhalen/tmp/test'

MyDfFiles = extension_based_wrap(Files)

s = MyDfFiles(rootdir)
list(s)

['df.parquet']

In [2]:
from tabled import DfFiles

ss = DfFiles(rootdir)
ss['df.parquet']

,a,b
0,1,who
1,2,you
2,3,are

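`extension_based_wrap(Files)` (like `DfFiles`) gives a file store whose values are decoded on read, and encoded on write, according to each key's file extension, which is why `'df.parquet'` comes back as a DataFrame. Below is a minimal stdlib sketch of that dispatch idea, with `.json` and `.txt` codecs instead of parquet so it runs without pandas. `ExtensionStore` and its codec table are hypothetical, not `tabled`'s implementation:

```python
import json
import os
import tempfile
from collections.abc import MutableMapping

# Hypothetical codec table: extension -> (encoder: obj -> bytes, decoder: bytes -> obj)
CODECS = {
    'json': (lambda obj: json.dumps(obj).encode(), json.loads),
    'txt': (lambda s: s.encode(), lambda b: b.decode()),
}


def _codec_for(key: str):
    ext = os.path.splitext(key)[1].lstrip('.')
    if ext not in CODECS:
        raise KeyError(f"No codec for extension {ext!r} (key: {key!r})")
    return CODECS[ext]


class ExtensionStore(MutableMapping):
    """Files in `rootdir`, (de)serialized according to each key's extension."""

    def __init__(self, rootdir: str):
        self.rootdir = rootdir

    def _path(self, key):
        return os.path.join(self.rootdir, key)

    def __getitem__(self, key):
        _, decode = _codec_for(key)
        try:
            with open(self._path(key), 'rb') as f:
                return decode(f.read())
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        encode, _ = _codec_for(key)
        with open(self._path(key), 'wb') as f:
            f.write(encode(value))

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None

    def __iter__(self):
        return iter(sorted(os.listdir(self.rootdir)))

    def __len__(self):
        return len(os.listdir(self.rootdir))


with tempfile.TemporaryDirectory() as tmpdir:
    store = ExtensionStore(tmpdir)
    store['abduction.json'] = {'abductee_name': 'Zork', 'location': 'Jupiter'}
    store['note.txt'] = 'who you are'
    print(list(store))              # → ['abduction.json', 'note.txt']
    print(store['abduction.json'])  # → {'abductee_name': 'Zork', 'location': 'Jupiter'}
```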

In [2]:
import pandas as pd

# Superhero Laundry Day
superhero_laundry_day = pd.DataFrame({
    "name": ["Superman", "Batman", "Spider-Man"],
    "power_level": [1000, 700, 500],
    "has_clean_cape": [True, False, True],
    "favorite_detergent": ["Kryptonite-Free", "Bat-OxyBoost", "StickyClean"]
})

# Alien Abduction Log
alien_abduction_log = pd.DataFrame({
    "abductee_name": ["Bob", "Alice", "Zork"],
    "location": ["Kansas City", "Roswell", "Jupiter"],
    "duration_minutes": [15, 120, 30],
    "was_returned": [True, False, True]
})

# from tabled import DfFiles

# rootdir = '/Users/thorwhalen/tmp/test'

# df_files = DfFiles(rootdir)

# df_files['superhero_laundry_day.parquet'] = superhero_laundry_day
# df_files['superhero_laundry_day.parquet']
from tabled import extension_based_wrap
from dol import Files

DfFiles = extension_based_wrap(Files)

rootdir = '/Users/thorwhalen/Dropbox/_odata/figiri/spotify_playlists'

df_files = DfFiles(rootdir)

df_files['superhero_laundry_day.parquet'] = superhero_laundry_day
df_files['superhero_laundry_day.parquet']

,name,power_level,has_clean_cape,favorite_detergent
0,Superman,1000,True,Kryptonite-Free
1,Batman,700,False,Bat-OxyBoost
2,Spider-Man,500,True,StickyClean


In [None]:
import pandas as pd

# Superhero Laundry Day
superhero_laundry_day = pd.DataFrame({
    "name": ["Superman", "Batman", "Spider-Man"],
    "power_level": [1000, 700, 500],
    "has_clean_cape": [True, False, True],
    "favorite_detergent": ["Kryptonite-Free", "Bat-OxyBoost", "StickyClean"]
})

# Alien Abduction Log
alien_abduction_log = pd.DataFrame({
    "abductee_name": ["Bob", "Alice", "Zork"],
    "location": ["Kansas City", "Roswell", "Jupiter"],
    "duration_minutes": [15, 120, 30],
    "was_returned": [True, False, True]
})

# from tabled import DfFiles

# rootdir = '/Users/thorwhalen/tmp/test'

# df_files = DfFiles(rootdir)

# df_files['superhero_laundry_day.parquet'] = superhero_laundry_day
# df_files['superhero_laundry_day.parquet']
from tabled import extension_based_wrap
from dol import Files

DfFiles = extension_based_wrap(Files)
df_files = DfFiles(rootdir)

df_files['superhero_laundry_day.parquet'] = superhero_laundry_day
df_files['superhero_laundry_day.parquet']

In [1]:
import pandas as pd

# Superhero Laundry Day
superhero_laundry_day = pd.DataFrame({
    "name": ["Superman", "Batman", "Spider-Man"],
    "power_level": [1000, 700, 500],
    "has_clean_cape": [True, False, True],
    "favorite_detergent": ["Kryptonite-Free", "Bat-OxyBoost", "StickyClean"]
})

# Alien Abduction Log
alien_abduction_log = pd.DataFrame({
    "abductee_name": ["Bob", "Alice", "Zork"],
    "location": ["Kansas City", "Roswell", "Jupiter"],
    "duration_minutes": [15, 120, 30],
    "was_returned": [True, False, True]
})

from tabled import DfFiles

rootdir = '/Users/thorwhalen/Dropbox/_odata/figiri/spotify_playlists'

df_files = DfFiles(rootdir)

df_files['superhero_laundry_day.parquet'] = superhero_laundry_day
df_files['superhero_laundry_day.parquet']

,name,power_level,has_clean_cape,favorite_detergent
0,Superman,1000,True,Kryptonite-Free
1,Batman,700,False,Bat-OxyBoost
2,Spider-Man,500,True,StickyClean


In [None]:
import tabled

In [5]:
from dol import ValueCodecs

ValueCodecs.bytesio

functools.partial(<function _codec_wrap at 0x12fc7b130>, <class 'dol.trans.ValueCodec'>, <class '_io.BytesIO'>, operator.methodcaller('read'))
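The repr shows that `ValueCodecs.bytesio` is built from two functions, `io.BytesIO` and `operator.methodcaller('read')`, which undo each other on bytes. This is the same bytes-to-file-like conversion that `parquet_url_to_table` performs before `pd.read_parquet`. The round trip can be checked directly with the standard library (this exercises only those two functions, not `dol`'s wrapping machinery):

```python
import io
from operator import methodcaller

# The two functions that appear in the repr of ValueCodecs.bytesio
to_bytesio = io.BytesIO          # bytes -> file-like object
read_all = methodcaller('read')  # file-like object -> bytes (read_all(obj) == obj.read())

data = b'parquet bytes would go here'
roundtrip = read_all(to_bytesio(data))
print(roundtrip == data)  # → True
```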

# Appendix: How we made the playlist tables (with the `sung` package)

In [33]:
spotify_playlists = {
    "top_500_last_decade": {
        "url": "https://open.spotify.com/playlist/7xm0JS2hGoHn7Svr9hsQkw",
        "description": "A collection of 508 tracks representing the most popular songs from the last decade.",
        "author": "alex_matier",
        "title": "Top 500 of the Last Decade",
    },
    "rolling_stone_500_greatest_songs": {
        "url": "https://open.spotify.com/playlist/7EAqBCOVkDZcbccjxZmgjp",
        "description": "A playlist featuring 584 songs from Rolling Stone's 2021 updated list of the 500 greatest songs of all time.",
        "author": "Henrik B. Hansen",
        "title": "Rolling Stone Magazine - 500 Greatest Songs of All Time (2021)"
    },
    "spotify_top_500_streamed": {
        "url": "https://open.spotify.com/playlist/0JiVp7Z0pYKI8diUV6HJyQ",
        "description": "A compilation of the 500 most-streamed songs of all time on Spotify.",
        "author": "Spotify",
        "title": "Spotify Top 500 Most Streamed Songs of All Time"
    },
    "greatest_500_songs": {
        "url": "https://open.spotify.com/playlist/6G9mBCSozMx0sOSXhSzZRY",
        "description": "500 tracks showcasing some of the greatest songs in music history.",
        "author": "one-media",
        "title": "500 Greatest Songs of All Time"
    },
    "over_500_million_streams": {
        "url": "https://open.spotify.com/playlist/7A0BB1t8whMe5CELdkOGC4",
        "description": "A massive playlist featuring 1,754 songs with over 500 million streams, ordered by popularity.",
        "author": "Various",
        "title": "500+ Million Streams [Top 50 ordered by Streams]"
    }
}

In [None]:
'https://github.com/thorwhalen/sung_content/raw/refs/heads/main/parquet/greatest_500_songs.parquet'

In [7]:
from imbed import fullpath_factory

save_to_rootdir = '/Users/thorwhalen/Dropbox/py/proj/t/sung_content'
save_fullpath_of = fullpath_factory(save_to_rootdir)

In [None]:
from sung import Playlist

for name, info in spotify_playlists.items():
    playlist = Playlist(info["url"])
    df = playlist.data
    df.to_parquet(save_fullpath_of(f'parquet/{name}.parquet'))


In [34]:
# We then saved these parquet files to https://github.com/thorwhalen/sung_content/, 
# and here we add the raw github url to the info of each playlist, so they can be accessed directly
from sung.tools import raw_github_url

for name, info in spotify_playlists.items():
    info['table_url'] = raw_github_url(f'parquet/{name}.parquet')
    
from pprint import pprint

spotify_playlists

{'top_500_last_decade': {'url': 'https://open.spotify.com/playlist/7xm0JS2hGoHn7Svr9hsQkw',
  'description': 'A collection of 508 tracks representing the most popular songs from the last decade.',
  'author': 'alex_matier',
  'title': 'Top 500 of the Last Decade',
  'table_url': 'https://raw.githubusercontent.com/thorwhalen/sung_content/main/parquet/top_500_last_decade.parquet'},
 'rolling_stone_500_greatest_songs': {'url': 'https://open.spotify.com/playlist/7EAqBCOVkDZcbccjxZmgjp',
  'description': "A playlist featuring 584 songs from Rolling Stone's 2021 updated list of the 500 greatest songs of all time.",
  'author': 'Henrik B. Hansen',
  'title': 'Rolling Stone Magazine - 500 Greatest Songs of All Time (2021)',
  'table_url': 'https://raw.githubusercontent.com/thorwhalen/sung_content/main/parquet/rolling_stone_500_greatest_songs.parquet'},
 'spotify_top_500_streamed': {'url': 'https://open.spotify.com/playlist/0JiVp7Z0pYKI8diUV6HJyQ',
  'description': 'A compilation of the 500 mos

In [None]:
from graze import graze

graze  # display the function and its signature


<function graze.base.graze(url: str, rootdir: str = '/Users/thorwhalen/graze', source=<graze.base.Internet object at 0x10d950a60>, *, key_ingress: Optional[Callable] = None, max_age: Union[int, float, NoneType] = None, return_filepaths: bool = False)>

In [None]:
"""Code to access thorwhalen/sung_content data with ease

Note on requirements:
Minimum:   pip install graze
Optionally (for get_table function): pip install tabled

"""

org, repo, branch = 'thorwhalen/sung_content/main'.split('/')
DFLT_CONTENT_URL = (f'https://raw.githubusercontent.com/{org}/{repo}/{branch}' + '/{}').format  # maps a key (path within the repo) to the URL of its raw content


def get_content_bytes(key, max_age=None, *, cache_locally=True, content_url=DFLT_CONTENT_URL):
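    """Get bytes of content from `thorwhalen/sung_content`, automatically caching locally.

    If `cache_locally` is a string, it's used as the cache directory; if it's falsy,
    the content is downloaded with `requests` every time, without caching.

    ```
    # add max_age=1e-6 if you want to update the data with the remote data
    b = get_content_bytes('parquet/greatest_500_songs.parquet', max_age=None)
    ```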
    """
    url = content_url(key)

    if cache_locally:
        from graze import graze
        import os

        if isinstance(cache_locally, str):
            rootdir = cache_locally
            assert os.path.isdir(rootdir), f"cache_locally: {rootdir} is not a directory"
            return graze(url, rootdir, max_age=max_age)
        return graze(url, max_age=max_age)
    else:
        import requests

        return requests.get(url).content


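def get_table(key, max_age=None, *, content_url=DFLT_CONTENT_URL, **extra_decoder_kwargs):
    """Get the table stored under `key` (e.g. 'parquet/greatest_500_songs.parquet'),
    decoded according to its file extension."""
    import os
    from tabled import get_table as _get_table

    bytes_ = get_content_bytes(key, max_age=max_age, content_url=content_url)
    # Use the file name's extension only (a dot in a directory name doesn't count)
    ext = os.path.splitext(key)[1].lstrip('.') or None
    return _get_table(bytes_, ext=ext, **extra_decoder_kwargs)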


import sung

In [2]:
df = get_table('parquet/greatest_500_songs.parquet')
df

id,name,popularity,explicit,album_name,album_release_date,album_release_year,url,artist_list,first_artist,id,...,energy,instrumentalness,liveness,loudness,speechiness,valence,tempo,key,mode,time_signature
3AhXZa8sUQht0UEdBJgpGc,Like a Rolling Stone,69,False,Highway 61 Revisited,1965-08-30,1965,https://open.spotify.com/track/3AhXZa8sUQht0UE...,[Bob Dylan],Bob Dylan,3AhXZa8sUQht0UEdBJgpGc,...,0.721,0.000000,0.1890,-6.839,0.0321,0.557,95.263,0,1,4
57J2znxukXsXzS3XPuZ1TG,(I Can't Get No) Satisfaction - Mono Version /...,0,False,The Rolling Stones Singles Collection: The Lon...,1989-08-15,1989,https://open.spotify.com/track/57J2znxukXsXzS3...,[The Rolling Stones],The Rolling Stones,57J2znxukXsXzS3XPuZ1TG,...,0.882,0.049600,0.1190,-6.763,0.0348,0.921,136.299,2,1,4
0dCj4s4VGxVziN43DHkUUW,What's Going On,0,False,Love Marvin,2010-01-01,2010,https://open.spotify.com/track/0dCj4s4VGxVziN4...,[Marvin Gaye],Marvin Gaye,0dCj4s4VGxVziN43DHkUUW,...,0.684,0.000008,0.3160,-10.802,0.0559,0.800,102.086,1,0,4
7s25THrKz86DM225dOYwnr,Respect,73,False,I Never Loved a Man the Way I Love You,1967-03-10,1967,https://open.spotify.com/track/7s25THrKz86DM22...,[Aretha Franklin],Aretha Franklin,7s25THrKz86DM225dOYwnr,...,0.558,0.000022,0.0546,-5.226,0.0410,0.965,114.950,0,1,4
7pKfPomDEeI4TPT6EOYjn9,Imagine - Remastered 2010,76,False,Imagine,1971-09-09,1971,https://open.spotify.com/track/7pKfPomDEeI4TPT...,[John Lennon],John Lennon,7pKfPomDEeI4TPT6EOYjn9,...,0.257,0.183000,0.0935,-12.358,0.0252,0.169,75.752,0,1,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2czBvzOv3TvnyoW7Ozo7fP,The Rising,52,False,The Rising,2002-07-30,2002,https://open.spotify.com/track/2czBvzOv3TvnyoW...,[Bruce Springsteen],Bruce Springsteen,2czBvzOv3TvnyoW7Ozo7fP,...,0.761,0.000000,0.0895,-6.381,0.0324,0.326,110.186,10,1,4
771L440a1283WimX8EzkpG,Miss You,0,False,GRRR! (Deluxe Version),2012-01-01,2012,https://open.spotify.com/track/771L440a1283Wim...,[The Rolling Stones],The Rolling Stones,771L440a1283WimX8EzkpG,...,0.558,0.022200,0.0605,-7.427,0.0306,0.961,109.988,9,0,4
5XkVQZWvsVwedk0Jv54SVH,Buddy Holly,0,False,Weezer (Deluxe Edition),1994-05-10,1994,https://open.spotify.com/track/5XkVQZWvsVwedk0...,[Weezer],Weezer,5XkVQZWvsVwedk0Jv54SVH,...,0.932,0.000020,0.1050,-4.110,0.0436,0.755,121.004,8,1,4
66gcFTSvJmIkh2hVSJFYmh,Shop Around,0,False,Hi... We're the Miracles,1961-01-01,1961,https://open.spotify.com/track/66gcFTSvJmIkh2h...,[Smokey Robinson & The Miracles],Smokey Robinson & The Miracles,66gcFTSvJmIkh2hVSJFYmh,...,0.701,0.000000,0.7010,-6.501,0.0300,0.916,131.705,7,1,4


In [7]:
from tabled import get_table

b = get_content_bytes('parquet/greatest_500_songs.parquet')  # raw bytes of the parquet file
t = get_table(b, ext='parquet')
t

id,name,popularity,explicit,album_name,album_release_date,album_release_year,url,artist_list,first_artist,id,...,energy,instrumentalness,liveness,loudness,speechiness,valence,tempo,key,mode,time_signature
3AhXZa8sUQht0UEdBJgpGc,Like a Rolling Stone,69,False,Highway 61 Revisited,1965-08-30,1965,https://open.spotify.com/track/3AhXZa8sUQht0UE...,[Bob Dylan],Bob Dylan,3AhXZa8sUQht0UEdBJgpGc,...,0.721,0.000000,0.1890,-6.839,0.0321,0.557,95.263,0,1,4
57J2znxukXsXzS3XPuZ1TG,(I Can't Get No) Satisfaction - Mono Version /...,0,False,The Rolling Stones Singles Collection: The Lon...,1989-08-15,1989,https://open.spotify.com/track/57J2znxukXsXzS3...,[The Rolling Stones],The Rolling Stones,57J2znxukXsXzS3XPuZ1TG,...,0.882,0.049600,0.1190,-6.763,0.0348,0.921,136.299,2,1,4
0dCj4s4VGxVziN43DHkUUW,What's Going On,0,False,Love Marvin,2010-01-01,2010,https://open.spotify.com/track/0dCj4s4VGxVziN4...,[Marvin Gaye],Marvin Gaye,0dCj4s4VGxVziN43DHkUUW,...,0.684,0.000008,0.3160,-10.802,0.0559,0.800,102.086,1,0,4
7s25THrKz86DM225dOYwnr,Respect,73,False,I Never Loved a Man the Way I Love You,1967-03-10,1967,https://open.spotify.com/track/7s25THrKz86DM22...,[Aretha Franklin],Aretha Franklin,7s25THrKz86DM225dOYwnr,...,0.558,0.000022,0.0546,-5.226,0.0410,0.965,114.950,0,1,4
7pKfPomDEeI4TPT6EOYjn9,Imagine - Remastered 2010,76,False,Imagine,1971-09-09,1971,https://open.spotify.com/track/7pKfPomDEeI4TPT...,[John Lennon],John Lennon,7pKfPomDEeI4TPT6EOYjn9,...,0.257,0.183000,0.0935,-12.358,0.0252,0.169,75.752,0,1,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2czBvzOv3TvnyoW7Ozo7fP,The Rising,52,False,The Rising,2002-07-30,2002,https://open.spotify.com/track/2czBvzOv3TvnyoW...,[Bruce Springsteen],Bruce Springsteen,2czBvzOv3TvnyoW7Ozo7fP,...,0.761,0.000000,0.0895,-6.381,0.0324,0.326,110.186,10,1,4
771L440a1283WimX8EzkpG,Miss You,0,False,GRRR! (Deluxe Version),2012-01-01,2012,https://open.spotify.com/track/771L440a1283Wim...,[The Rolling Stones],The Rolling Stones,771L440a1283WimX8EzkpG,...,0.558,0.022200,0.0605,-7.427,0.0306,0.961,109.988,9,0,4
5XkVQZWvsVwedk0Jv54SVH,Buddy Holly,0,False,Weezer (Deluxe Edition),1994-05-10,1994,https://open.spotify.com/track/5XkVQZWvsVwedk0...,[Weezer],Weezer,5XkVQZWvsVwedk0Jv54SVH,...,0.932,0.000020,0.1050,-4.110,0.0436,0.755,121.004,8,1,4
66gcFTSvJmIkh2hVSJFYmh,Shop Around,0,False,Hi... We're the Miracles,1961-01-01,1961,https://open.spotify.com/track/66gcFTSvJmIkh2h...,[Smokey Robinson & The Miracles],Smokey Robinson & The Miracles,66gcFTSvJmIkh2hVSJFYmh,...,0.701,0.000000,0.7010,-6.501,0.0300,0.916,131.705,7,1,4


In [None]:
t = get_table('parquet/greatest_500_songs.parquet')
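The cell above reads from a relative path. That path appears to mirror the `parquet/` folder of the repository that the `table_url` entries in `spotify_playlists` point to, so it only works from a local copy of that folder. The sketch below builds the remote URL instead. It assumes every table follows the `.../main/parquet/<key>.parquet` pattern of the `top_500_last_decade` entry. `raw_table_url` is a hypothetical helper, and it does not check whether a file exists for a given key.

```python
from urllib.parse import quote

# Assumed base, taken from the `table_url` pattern of 'top_500_last_decade' in `spotify_playlists`
RAW_BASE = "https://raw.githubusercontent.com/thorwhalen/sung_content/main/parquet/"


def raw_table_url(key: str, base: str = RAW_BASE) -> str:
    """Hypothetical helper: map a playlist key to the raw URL of its parquet table."""
    return f"{base}{quote(key)}.parquet"


print(raw_table_url("top_500_last_decade"))
```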


In [22]:
t

id,name,popularity,explicit,album_name,album_release_date,album_release_year,url,artist_list,first_artist,id,...,energy,instrumentalness,liveness,loudness,speechiness,valence,tempo,key,mode,time_signature
3AhXZa8sUQht0UEdBJgpGc,Like a Rolling Stone,69,False,Highway 61 Revisited,1965-08-30,1965,https://open.spotify.com/track/3AhXZa8sUQht0UE...,[Bob Dylan],Bob Dylan,3AhXZa8sUQht0UEdBJgpGc,...,0.721,0.000000,0.1890,-6.839,0.0321,0.557,95.263,0,1,4
57J2znxukXsXzS3XPuZ1TG,(I Can't Get No) Satisfaction - Mono Version /...,0,False,The Rolling Stones Singles Collection: The Lon...,1989-08-15,1989,https://open.spotify.com/track/57J2znxukXsXzS3...,[The Rolling Stones],The Rolling Stones,57J2znxukXsXzS3XPuZ1TG,...,0.882,0.049600,0.1190,-6.763,0.0348,0.921,136.299,2,1,4
0dCj4s4VGxVziN43DHkUUW,What's Going On,0,False,Love Marvin,2010-01-01,2010,https://open.spotify.com/track/0dCj4s4VGxVziN4...,[Marvin Gaye],Marvin Gaye,0dCj4s4VGxVziN43DHkUUW,...,0.684,0.000008,0.3160,-10.802,0.0559,0.800,102.086,1,0,4
7s25THrKz86DM225dOYwnr,Respect,73,False,I Never Loved a Man the Way I Love You,1967-03-10,1967,https://open.spotify.com/track/7s25THrKz86DM22...,[Aretha Franklin],Aretha Franklin,7s25THrKz86DM225dOYwnr,...,0.558,0.000022,0.0546,-5.226,0.0410,0.965,114.950,0,1,4
7pKfPomDEeI4TPT6EOYjn9,Imagine - Remastered 2010,76,False,Imagine,1971-09-09,1971,https://open.spotify.com/track/7pKfPomDEeI4TPT...,[John Lennon],John Lennon,7pKfPomDEeI4TPT6EOYjn9,...,0.257,0.183000,0.0935,-12.358,0.0252,0.169,75.752,0,1,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2czBvzOv3TvnyoW7Ozo7fP,The Rising,52,False,The Rising,2002-07-30,2002,https://open.spotify.com/track/2czBvzOv3TvnyoW...,[Bruce Springsteen],Bruce Springsteen,2czBvzOv3TvnyoW7Ozo7fP,...,0.761,0.000000,0.0895,-6.381,0.0324,0.326,110.186,10,1,4
771L440a1283WimX8EzkpG,Miss You,0,False,GRRR! (Deluxe Version),2012-01-01,2012,https://open.spotify.com/track/771L440a1283Wim...,[The Rolling Stones],The Rolling Stones,771L440a1283WimX8EzkpG,...,0.558,0.022200,0.0605,-7.427,0.0306,0.961,109.988,9,0,4
5XkVQZWvsVwedk0Jv54SVH,Buddy Holly,0,False,Weezer (Deluxe Edition),1994-05-10,1994,https://open.spotify.com/track/5XkVQZWvsVwedk0...,[Weezer],Weezer,5XkVQZWvsVwedk0Jv54SVH,...,0.932,0.000020,0.1050,-4.110,0.0436,0.755,121.004,8,1,4
66gcFTSvJmIkh2hVSJFYmh,Shop Around,0,False,Hi... We're the Miracles,1961-01-01,1961,https://open.spotify.com/track/66gcFTSvJmIkh2h...,[Smokey Robinson & The Miracles],Smokey Robinson & The Miracles,66gcFTSvJmIkh2hVSJFYmh,...,0.701,0.000000,0.7010,-6.501,0.0300,0.916,131.705,7,1,4


ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 3
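This is a CSV tokenizer error. The parser inferred one field per line from the first lines, then found three on line 4. It typically means the input was not the CSV the parser expected, for instance an HTML page served instead of the raw file. One defensive option is to sniff the format before choosing a reader. The sketch relies on Parquet's 4-byte `PAR1` magic number, which appears at both the start and the end of a Parquet file. `sniff_table_format` is a hypothetical helper, not part of `tabled`.

```python
def sniff_table_format(data: bytes) -> str:
    """Hypothetical helper: guess a table's serialization from its raw bytes."""
    # Parquet files begin and end with the 4-byte magic number b"PAR1".
    if len(data) >= 8 and data[:4] == b"PAR1" and data[-4:] == b"PAR1":
        return "parquet"
    head = data[:512].lstrip().lower()
    # An HTML page (e.g. a web view instead of the raw file) is not a table at all.
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    # Fallback: treat anything else as delimited text.
    return "csv"


print(sniff_table_format(b"PAR1" + b"\x00" * 16 + b"PAR1"))
print(sniff_table_format(b"<!DOCTYPE html>\n<html><body></body></html>"))
print(sniff_table_format(b"id,name\n1,Respect\n"))
```

Getting `"html"` back is a strong hint that the URL points to a web page rather than to the raw file.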
