### References I used to get data:

Code:
https://github.com/sam-brady/spotify-podcasts/blob/master/Spotify%20Podcasts.ipynb
https://medium.com/web-mining-is688-spring-2021/preliminary-data-analysis-on-spotify-data-using-api-a84bb0aae00c

Documentation:
https://spotipy.readthedocs.io/en/2.19.0/#spotipy.client.Spotify.audio_features
https://developer.spotify.com/documentation/web-api/reference/#/operations/get-a-shows-episodes
https://developer.spotify.com/console/get-search-item/
https://developer.spotify.com/community/news/2020/03/20/introducing-podcasts-api/

In [98]:
import requests
import pickle

In [5]:
# pip install spotipy --upgrade

Collecting spotipy
  Using cached spotipy-2.19.0-py3-none-any.whl (27 kB)
Installing collected packages: spotipy
Successfully installed spotipy-2.19.0
Note: you may need to restart the kernel to use updated packages.


In [6]:
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

In [7]:
import pandas as pd
import numpy as np


In [None]:
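# Set temporary environment variables in the shell before starting Jupyter;
# SpotifyClientCredentials() reads both when called with no arguments.
#export SPOTIPY_CLIENT_ID='...'
#export SPOTIPY_CLIENT_SECRET='...'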

In [10]:
auth_manager = SpotifyClientCredentials()
sp = spotipy.Spotify(auth_manager=auth_manager)
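
Because `SpotifyClientCredentials()` is called with no arguments, it depends on the two environment variables above being set. A minimal sketch of a pre-flight check follows; the helper name `missing_credentials` is made up for illustration and is not part of spotipy.

```python
import os

# The two variables that spotipy's client-credentials flow reads from the environment.
REQUIRED_VARS = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Set these before creating the client:", ", ".join(missing))
```

Running this before the authentication cell gives a readable message instead of an authentication error later on.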

### Getting list of Spotify Show IDs for podcasts related to "mental health"

In [23]:
show_ids = []
show_names = []
show_descriptions = []

limit = 50
offset = 0
q = "mental health"  # plain text: the HTTP layer URL-encodes the query, so "%20" would be encoded twice

In [24]:
# Offsets 0 to 300 in steps of 50: seven pages, up to 350 shows.
while offset <= 300:
    results = sp.search(q, limit, offset, type='show', market='US')['shows']['items']
    for show in results:
        show_ids.append(show['id'])
        show_names.append(show['name'])
        show_descriptions.append(show['description'])
    offset += limit
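
The loop above keeps every item from every page. The sketch below is a variant that also skips empty entries and keeps each show ID only once. It is written against a hypothetical `search_page(offset, limit)` callable so it runs without network access. Whether this query actually returns duplicates or null entries is an assumption; the notebook does not show it either way.

```python
def collect_shows(search_page, max_offset=300, page_size=50):
    """Gather unique shows from successive result pages.

    `search_page(offset, limit)` is a hypothetical callable that returns the
    list of show dicts for one page (for example, a thin wrapper around sp.search).
    """
    seen = {}
    for offset in range(0, max_offset + 1, page_size):
        items = search_page(offset, page_size)
        if not items:
            break  # an empty page means there is nothing further to fetch
        for show in items:
            if show is None:
                continue  # skip null entries defensively
            seen.setdefault(show["id"], show)  # keep the first occurrence of each ID
    return list(seen.values())


# Stand-in data so the sketch runs on its own.
fake_pages = {
    0: [{"id": "a", "name": "A"}, {"id": "b", "name": "B"}],
    50: [{"id": "b", "name": "B"}, None, {"id": "c", "name": "C"}],
}
shows = collect_shows(lambda offset, limit: fake_pages.get(offset, []))
print([s["id"] for s in shows])  # ['a', 'b', 'c']
```

With the real client, the callable could be `lambda offset, limit: sp.search(q, limit, offset, type='show', market='US')['shows']['items']`.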

In [25]:
mh_podcasts = pd.DataFrame()

mh_podcasts['Podcast_Name'] = show_names
mh_podcasts['Podcast_ShowID'] = show_ids
mh_podcasts['Podcast_Description'] = show_descriptions

In [37]:
mh_podcasts.Podcast_Description[100]

'Gen-Z hosts Bennett Scheer and Lorraine Affigne are on a mission to make mental health mainstream. They talk about things that anyone who’s struggled can relate to. They also welcome friends and other guests for unique conversations about mental health. From the pressures of living in the world of social media, to coping with the challenges of living in 2022, each host and every guest has a fresh perspective. No matter what you’re going through, Mental Health on the Mic is here for you. Be sure to check out our website: https://www.mentalhealthonthemic.com and follow us on Instagram and TikTok @mentalhealthonthemic for more content!!'

In [34]:
mh_podcasts['Podcast_Name']

0      (2020) Mental Health Explained | Created By Yo...
1      Being African American in 2021 and dealing wit...
2                                  Aubrey Marcus Podcast
3             Unfazed and Unbothered with Tasia and Camo
4                                       Barbell Shrugged
                             ...                        
345               Happy and Healthy Mind with Dr. Rozina
346                                          NAH Podcast
347                                   Healthcare Insight
348              Mental Health Education in High Schools
349                     Prabhat Ranjan Sarkar Discourses
Name: Podcast_Name, Length: 350, dtype: object

In [100]:
mh_podcasts.to_csv('just_podcasts.csv')

In [101]:
mh_podcasts.to_pickle('just_podcasts.pkl')

### Getting episodes and their descriptions for each show in show_list

In [57]:
show_id_list = []  # show ID repeated once per episode, so every row keeps its show
ep_ids = []
ep_dates = []
ep_names = []
ep_desc = []

In [66]:
for show in show_ids:
    limit = 50
    offset = 0

    # Page through the show's episodes 50 at a time, stopping after
    # offset 950 (at most 1,000 episodes per show).
    while offset <= 950:
        response = sp.show_episodes(show, limit, offset, market='US')

        if 'error' in response:
            print(response, 'for show id:', show)
            break

        for ep in response['items']:
            show_id_list.append(show)
            ep_ids.append(ep['id'])
            ep_names.append(ep['name'])
            ep_dates.append(ep['release_date'])
            ep_desc.append(ep['description'])

        offset += limit
        if offset >= response['total']:
            # No further pages for this show.
            break
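
The paging logic can also be pulled into a function that returns one record per episode, which keeps the five parallel lists from drifting out of step. This is a sketch, not the notebook's method: `fetch_show_episodes` and `FakeClient` are hypothetical names, and the fake client stands in for `sp` so the example runs offline.

```python
def fetch_show_episodes(client, show_id, page_size=50, max_episodes=1000, market="US"):
    """Return a list of episode records for one show, paging until `total` is reached."""
    episodes = []
    offset = 0
    while offset < max_episodes:
        page = client.show_episodes(show_id, limit=page_size, offset=offset, market=market)
        for ep in page["items"]:
            if ep is None:
                continue  # assumption: skip null entries rather than fail on them
            episodes.append({
                "Show_id": show_id,
                "Ep_id": ep["id"],
                "Ep_name": ep["name"],
                "Ep_date": ep["release_date"],
                "Ep_desc": ep["description"],
            })
        offset += page_size
        if offset >= page["total"]:
            break  # every episode the API reports has been requested
    return episodes


class FakeClient:
    """Offline stand-in that mimics the shape of a show_episodes response."""

    def __init__(self, total):
        self.total = total
        self.calls = 0

    def show_episodes(self, show_id, limit=50, offset=0, market=None):
        self.calls += 1
        stop = min(offset + limit, self.total)
        items = [
            {"id": f"ep{i}", "name": f"Episode {i}",
             "release_date": "2021-01-01", "description": ""}
            for i in range(offset, stop)
        ]
        return {"items": items, "total": self.total}


client = FakeClient(total=120)
records = fetch_show_episodes(client, "show1")
print(len(records), client.calls)  # 120 3
```

A list of records like this can be passed straight to `pd.DataFrame(records)` to get the same columns as `all_eps`.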
            

In [67]:
len(ep_ids)

20745

In [68]:
len(ep_names)

20745

In [69]:
len(ep_dates)

20745

In [70]:
len(show_id_list)

20745

In [71]:
all_eps = pd.DataFrame()

all_eps['Show_id'] = show_id_list
all_eps['Ep_id'] = ep_ids
all_eps['Ep_name'] = ep_names
all_eps['Ep_date'] = ep_dates
all_eps['Ep_desc'] = ep_desc

In [73]:
ep_counts = all_eps['Show_id'].value_counts()

In [74]:
few_eps = list(ep_counts[ep_counts < 10].index)

In [76]:
len(few_eps)
# 111 podcasts have fewer than 10 episodes

111
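
The same count can be cross-checked without pandas, directly from the per-episode list of show IDs. This is a small sketch with a hypothetical helper name. Note that a show which returned no episodes at all never appears in `show_id_list`, so neither this check nor `value_counts()` includes it.

```python
from collections import Counter


def shows_with_few_episodes(show_ids_per_episode, minimum=10):
    """Return the show IDs that occur fewer than `minimum` times in the list."""
    counts = Counter(show_ids_per_episode)
    return [show for show, n in counts.items() if n < minimum]


# Stand-in data: one entry per episode, as in show_id_list.
example = ["a"] * 12 + ["b"] * 3 + ["c"] * 10
print(shows_with_few_episodes(example))  # ['b']
```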

#### Replace show_id in dataframe with show name

In [85]:
# Build a dictionary mapping each show ID to a list of its show name(s)

show_dict = mh_podcasts.groupby('Podcast_ShowID')['Podcast_Name'].apply(list).to_dict()

In [86]:
# Map each episode's show ID to its list of show names
all_eps['Show_id'] = all_eps['Show_id'].map(show_dict)

In [88]:
#renaming Show_id column to Podcast_Name
all_eps.rename(columns = {'Show_id': 'Podcast_Name'}, inplace=True)

In [94]:
# Each mapped value is a list; keep only its first element (the show name)
all_eps['Podcast_Name'] = all_eps['Podcast_Name'].str[0]
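
The cells above build a dictionary of lists and then take the first element of each list. The sketch below shows the same "first name per show ID" lookup built in one step on plain Python lists. `build_show_lookup` is a hypothetical helper, and the sample IDs and names are made up.

```python
def build_show_lookup(ids, names):
    """Map each show ID to the first name seen for it."""
    lookup = {}
    for show_id, name in zip(ids, names):
        lookup.setdefault(show_id, name)  # later duplicates do not overwrite
    return lookup


# Stand-in data: the third entry repeats an ID to show the duplicate handling.
ids = ["id1", "id2", "id1"]
names = ["Show One", "Show Two", "Show One (duplicate)"]
lookup = build_show_lookup(ids, names)

episode_show_ids = ["id1", "id1", "id2", "missing"]
print([lookup.get(s) for s in episode_show_ids])
# ['Show One', 'Show One', 'Show Two', None]
```

A dictionary of plain strings like this can be passed to `Series.map` directly, in which case no bracket-stripping step is needed afterwards.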

In [95]:
all_eps

| | Podcast_Name | Ep_id | Ep_name | Ep_date | Ep_desc |
|---|---|---|---|---|---|
| 0 | (2020) Mental Health Explained \| Created By Yo... | 10JraOKEu4gb2dKQEwjhmm | Depression and Tics During Quarantine | 2020-12-16 | This episode helps explain the effects of quar... |
| 1 | Being African American in 2021 and dealing wit... | 4Vs1ajXhg5t53zHNDpM3wu | Chipping away at the mental health stigma | 2021-10-11 | The Black community has made enormous contribu... |
| 2 | Being African American in 2021 and dealing wit... | 6jFW6wq6Pafs0OLAlHVNRh | Being black in America in 2021 | 2021-10-08 | With love for seven addressing mental health i... |
| 3 | Being African American in 2021 and dealing wit... | 4F5RugIvvmb8uI5fDqPmhz | Surviving a Narcissistic breakup : The Fear an... | 2020-12-12 | Moving on and healing from an narcissistic -... |
| 4 | Being African American in 2021 and dealing wit... | 4eEe5dXg47re6BjpeyZdPx | Love and mental health 2020 | 2020-12-09 | Love - relationship, mental health and parenti... |
| ... | ... | ... | ... | ... | ... |
| 20740 | Prabhat Ranjan Sarkar Discourses | 25in5tuCJCdjRhuB1AcV3P | Be Free From All Complexes | 2021-04-21 | By PR Sarkar founder of Ananda MargaDiscourse ... |
| 20741 | Prabhat Ranjan Sarkar Discourses | 0QUC2IV4S5jlzixvlFsiGZ | Bad Habits Which Should Be Given Up | 2021-04-05 | By PR Sarkar founder of Ananda MargaDiscourse ... |
| 20742 | Prabhat Ranjan Sarkar Discourses | 0MbPRjRYeYAsViwMaYuLKx | Ananda Marga A Revolution | 2021-04-02 | By PR Sarkar founder of Ananda MargaPublished ... |
| 20743 | Prabhat Ranjan Sarkar Discourses | 65Rh9qdEJA4N8Ug2jvJOMK | An Exemplary Life | 2021-03-29 | Discourse given by Prabhat Ranjan Sarkar on:Ja... |


In [96]:
#save df to csv
all_eps.to_csv('mh_podcasts.csv')

In [97]:
#save df to pickle
all_eps.to_pickle('mh_podcasts.pkl')

#### Other (testing a single-episode lookup):
    

In [11]:
def get_ep_details(ep_id, market='US'):
    """Return [name, description, release_date] for a single episode ID."""
    info = sp.episode(ep_id, market)

    name = info['name']
    desc = info['description']
    date = info['release_date']

    return [name, desc, date]

In [12]:
get_ep_details('0q1cmniEqKWbvrpvDHAxr7')


['SDS 367: Building Data Pipelines for COVID-19 Modeling',
 'Samuel Hinton joins us again for an important and timely discussion on data pipelines and the work he’s doing to aid research on COVID-19 with the COVID-19 Critical Care Consortium. We also talk about his new online courses and his continued research into dark matter.In this episode you will learn:• Sam’s current work and COVID-19 Critical Care Consortium [4:22]• The COVID data science pipeline and workflow [12:50]• Sam’s second online course [36:22]• Bayesian inference [43:06]• Sam at DSGO Virtual [53:30]• Sam’s work on dark matter [1:01:25]• What is Sam reading right now? [1:09:14]Additional materials: www.superdatascience.com/367',
 '2020-05-20']
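
A variant of the lookup above that takes the client as a parameter and returns a dictionary, so the fields are labelled and the function can be tried without network access. `episode_details` and `StubClient` are hypothetical names introduced for this sketch, and the stub's data is made up.

```python
def episode_details(client, ep_id, market="US"):
    """Return the name, description and release date of one episode as a dict."""
    info = client.episode(ep_id, market=market)
    return {
        "name": info["name"],
        "description": info["description"],
        "release_date": info["release_date"],
    }


class StubClient:
    """Offline stand-in returning a fixed, made-up episode payload."""

    def episode(self, episode_id, market=None):
        return {
            "id": episode_id,
            "name": "Example episode",
            "description": "Example description",
            "release_date": "2020-01-01",
            "duration_ms": 1000,  # extra field, ignored by episode_details
        }


details = episode_details(StubClient(), "example-id")
print(details["name"], details["release_date"])
```

With the real client, the call would be `episode_details(sp, '0q1cmniEqKWbvrpvDHAxr7')`.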