# Podcast as Alarm Clock?

I listen to NPR's Up First every morning to catch up on current events. It's thorough, but concise. I think it'd be nice to wake up to the NPR team discussing what's going on in the world.

Surprisingly, this doesn't seem to be a feature readily available on iOS, so I turned to scripting it with Python.

## First Steps

I wasn't really sure where to begin, as most of my Python knowledge and skills lie in using pandas and grabbing data from SQL databases. This seemed like it would require some external libraries, the NPR API, and some web scraping.

### RSS Feeds

I recalled using RSS feeds a long time ago to track websites I liked, but hadn't used them in a while. So I googled "NPR Up First RSS feed" and found [this](https://www.npr.org/rss/podcast.php?id=510318).

    https://www.npr.org/rss/podcast.php?id=510318
    
It seems most NPR podcasts are distinguished by a 6-digit ID. Here are some others:

    Planet Money: https://www.npr.org/rss/podcast.php?id=510289
    
    Invisibilia: https://www.npr.org/rss/podcast.php?id=510307
    
There's a place to start!

Through some trial and error, I was able to get a list of 24 podcasts that use this 6-digit ID.
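
Since every feed URL differs only in that ID, a small helper can keep the pattern in one place. This is a minimal sketch; `feed_url` and `NPR_FEED_TEMPLATE` are hypothetical names, not part of feedparser or any NPR API.

```python
# Hypothetical helper: build an NPR podcast RSS URL from its numeric ID.
NPR_FEED_TEMPLATE = 'https://www.npr.org/rss/podcast.php?id={}'


def feed_url(podcast_id):
    """Return the RSS feed URL for the given NPR podcast ID."""
    return NPR_FEED_TEMPLATE.format(podcast_id)


print(feed_url(510318))  # Up First
print(feed_url(510289))  # Planet Money
```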

In [1]:
# let's import an RSS feed package
import feedparser

In [2]:
# we'll start by parsing Up First and see what we get
feed = feedparser.parse('https://www.npr.org/rss/podcast.php?id=510318')
print(feed)

{'feed': {'title': 'Up First', 'title_detail': {'type': 'text/plain', 'language': None, 'base': 'https://www.npr.org/rss/podcast.php?id=510318', 'value': 'Up First'}, 'links': [{'rel': 'alternate', 'type': 'text/html', 'href': 'http://www.npr.org/programs/morning-edition/'}], 'link': 'http://www.npr.org/programs/morning-edition/', 'subtitle': "NPR's Up First is the news you need to start your day. The biggest stories and ideas — from politics to pop culture — in 10 minutes. Hosted by Rachel Martin, David Greene and Steve Inskeep, with reporting and analysis from NPR News. Available weekdays by 6 a.m. ET. Subscribe and listen, then support your local NPR station at donate.npr.org.", 'subtitle_detail': {'type': 'text/html', 'language': None, 'base': 'https://www.npr.org/rss/podcast.php?id=510318', 'value': "NPR's Up First is the news you need to start your day. The biggest stories and ideas — from politics to pop culture — in 10 minutes. Hosted by Rachel Martin, David Greene and Steve In

### RSS Feed Dictionary

So, there's a lot to unpack here...

The feedparser.parse function returns this massive dictionary, so I'll start by isolating its keys to determine which ones hold the data we want: the release date, show description, and mp3 URL.

We'll also use the json module to pretty print this jumbled mess of data...

In [3]:
feed.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'href', 'status', 'encoding', 'version', 'namespaces'])
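
Before digging into 'feed' and 'entries', it can be worth confirming that the parse produced something usable. feedparser sets 'bozo' to a truthy value when the feed was not well-formed, though such feeds can still yield usable entries. A minimal sketch, where `summarize_parse` is a hypothetical helper that works on any dict shaped like the parse result:

```python
def summarize_parse(parsed):
    """Return a small summary of a feedparser-style result dict."""
    return {
        'well_formed': not parsed.get('bozo'),
        'status': parsed.get('status'),  # HTTP status, present when fetched over HTTP
        'entry_count': len(parsed.get('entries', [])),
    }


# Stand-in data shaped like the keys printed above (not real feed output).
sample = {'bozo': 0, 'status': 200, 'entries': [{}, {}]}
print(summarize_parse(sample))
```

With the real result, this would be called as summarize_parse(feed).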

In [4]:
import json

In [5]:
print(json.dumps(feed['entries'], sort_keys=True, indent=2))

[
  {
    "author": "NPR",
    "author_detail": {
      "name": "NPR"
    },
    "authors": [
      {
        "name": "NPR"
      }
    ],
    "content": [
      {
        "base": "https://www.npr.org/rss/podcast.php?id=510318",
        "language": null,
        "type": "text/plain",
        "value": "What have police learned about Stephen Paddock's cache of weapons and preparations for the mass shooting he carried out in Las Vegas? And how might that massacre shape the debate around gun control?"
      },
      {
        "base": "https://www.npr.org/rss/podcast.php?id=510318",
        "language": null,
        "type": "text/html",
        "value": "What have police learned about Stephen Paddock's cache of weapons and preparations for the mass shooting he carried out in Las Vegas? And how might that massacre shape the debate around gun control?"
      }
    ],
    "guidislink": true,
    "id": "https://www.npr.org/rss/05e760c0-76d4-49ab-907a-0f949c0f45ff",
    "image": {
      "href": 

In [6]:
print(json.dumps(feed['feed'], sort_keys=True, indent=2))

{
  "author": "NPR (podcasts@npr.org)",
  "author_detail": {
    "email": "podcasts@npr.org",
    "name": "NPR"
  },
  "authors": [
    {
      "email": "podcasts@npr.org",
      "name": "NPR"
    }
  ],
  "generator": "NPR API RSS Generator 0.94",
  "generator_detail": {
    "name": "NPR API RSS Generator 0.94"
  },
  "image": {
    "href": "https://media.npr.org/assets/img/2017/03/21/upfirst_sq-ffcb53c89446b62b66fefb97b9356ad49b31bc5d.png?s=200",
    "link": "http://www.npr.org/programs/morning-edition/",
    "links": [
      {
        "href": "http://www.npr.org/programs/morning-edition/",
        "rel": "alternate",
        "type": "text/html"
      }
    ],
    "title": "Up First",
    "title_detail": {
      "base": "https://www.npr.org/rss/podcast.php?id=510318",
      "language": null,
      "type": "text/plain",
      "value": "Up First"
    }
  },
  "itunes_block": 0,
  "language": "en-us",
  "link": "http://www.npr.org/programs/morning-edition/",
  "links": [
    {
      "hr

## Data Uncovered

So, the 'entries' and 'feed' keys provide all the data we need. The 'feed' value is a dictionary of show-level metadata, while 'entries' is a list of dictionaries, one per episode. Within them we find the podcast name, release date, show description, and mp3 URL.

Using list indexing, we grab the first element, [0], of 'entries', since this feed lists the most recent episode first.

In [7]:
# podcast name
print(feed['feed']['title'])

Up First


In [8]:
# release date (the feed-level 'updated' timestamp)
print(feed['feed']['updated'])

Wed, 04 Oct 2017 05:45:00 -0400


In [9]:
# show description
print(feed['entries'][0]['content'][0]['value'])

What have police learned about Stephen Paddock's cache of weapons and preparations for the mass shooting he carried out in Las Vegas? And how might that massacre shape the debate around gun control?


In [10]:
# mp3 URL
print(feed['entries'][0]['links'][0]['href'])

https://play.podtrac.com/npr-510318/npr.mc.tritondigital.com/NPR_510318/media/anon.npr-mp3/npr/upfirst/2017/10/20171004_upfirst_100417upfirst.mp3?orgId=1&d=762&p=510318&story=555523076&t=podcast&e=555523076&ft=pod&f=510318
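
Taking links[0] works for this feed, but it relies on the audio link being listed first. A slightly more defensive sketch looks for a link whose type starts with 'audio/'. `find_audio_url` is a hypothetical helper, and the sample entry below is stand-in data shaped like a feedparser entry, not actual feed output.

```python
def find_audio_url(entry):
    """Return the href of the first audio link in an entry, or None."""
    for link in entry.get('links', []):
        # Treat a missing type as an empty string so startswith() is safe.
        if (link.get('type') or '').startswith('audio/'):
            return link.get('href')
    return None


# Stand-in entry with placeholder URLs (an assumption for illustration).
sample_entry = {
    'links': [
        {'rel': 'alternate', 'type': 'text/html', 'href': 'https://example.com/episode'},
        {'rel': 'enclosure', 'type': 'audio/mpeg', 'href': 'https://example.com/episode.mp3'},
    ]
}
print(find_audio_url(sample_entry))
```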


## Downloading Data

Now we can use the urllib package to retrieve the mp3 file, and we can store the text values as variables to keep records and/or use as filenames.

In [11]:
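# import the request submodule explicitly; a bare `import urllib` does not guarantee urllib.request is loaded
import urllib.request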

In [12]:
podcast_url = feed['entries'][0]['links'][0]['href']

In [13]:
urllib.request.urlretrieve(podcast_url, 'upfirst_10_04_2017.mp3')

('upfirst_10_04_2017.mp3', <http.client.HTTPMessage at 0x7f64973f3780>)
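
The filename above is typed by hand. Since the show name and release date are already available as strings, the filename could be derived from them instead. A minimal sketch using only the standard library; `build_filename` is a hypothetical helper, and it assumes the date arrives in the same RFC 2822 style as the string printed earlier.

```python
import re
from email.utils import parsedate_to_datetime


def build_filename(show_title, release_date):
    """Build a name like 'upfirst_10_04_2017.mp3' from a title and an RFC 2822 date."""
    # Lowercase the title and drop everything except letters and digits.
    slug = re.sub(r'[^a-z0-9]+', '', show_title.lower())
    released = parsedate_to_datetime(release_date)
    return '{}_{}.mp3'.format(slug, released.strftime('%m_%d_%Y'))


print(build_filename('Up First', 'Wed, 04 Oct 2017 05:45:00 -0400'))
# -> upfirst_10_04_2017.mp3
```

With the values from the cells above, this could be called as build_filename(feed['feed']['title'], feed['feed']['updated']).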

## Other RSS NPR Podcasts

Earlier, I mentioned that I was able to get a list of 24 podcast IDs. Let's see how that works.

I noticed that the podcasts on the [NPR directory site](http://www.npr.org/podcasts/) displayed a '5103##' ID, so we'll let Python do the work and find some IDs and podcasts.

In [14]:
# create an empty dict to store IDs and podcast names
npr_podcasts = {}

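for i in range(510300, 510400):
    try:
        # use a separate name so the Up First `feed` parsed earlier isn't overwritten
        candidate = feedparser.parse('https://www.npr.org/rss/podcast.php?id={}'.format(i))
        # store the podcast name as the key, as it's easier to search
        npr_podcasts[candidate.feed.title] = i
    except AttributeError:
        # IDs with no podcast behind them have no feed title; skip them
        pass
print(npr_podcasts)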

{'Cabinet of Wonders': 510300, 'Crosscurrents': 510301, 'Barbershop from Tell Me More': 510302, 'How To Do Everything': 510303, 'Song Travels Express': 510304, 'Alt.Latino': 510305, 'Tiny Desk Concerts - Audio': 510306, 'Invisibilia': 510307, 'Hidden Brain': 510308, 'Bullseye with Jesse Thorn': 510309, 'NPR Politics Podcast': 510310, 'Embedded': 510311, 'Code Switch': 510312, 'How I Built This with Guy Raz': 510313, 'The Big Listen': 510314, 'Radio Ambulante': 510315, '1A': 510316, "It's Been a Minute with Sam Sanders": 510317, 'Up First': 510318, 'NPR News Now': 510320, 'Wow in the World': 510321, 'Live from the Poundstone Institute': 510322, "What's Good with Stretch & Bobbito": 510323, 'Rough Translation': 510324}
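
Because the podcast names are the keys, looking one up by a partial name is a short dictionary comprehension. A minimal sketch; `find_podcasts` is a hypothetical helper, and the sample dict is a handful of entries copied from the output above.

```python
def find_podcasts(podcasts, term):
    """Return {name: id} for every podcast whose name contains term (case-insensitive)."""
    term = term.lower()
    return {name: pid for name, pid in podcasts.items() if term in name.lower()}


# A few entries copied from the printed dictionary.
sample = {
    'Invisibilia': 510307,
    'Hidden Brain': 510308,
    'NPR Politics Podcast': 510310,
    'NPR News Now': 510320,
}
print(find_podcasts(sample, 'npr'))
```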


## Next Steps: Playing mp3 as Alarm

So, I've figured out how to find and extract the details I want from the NPR site. Now, to figure out how to make the mp3 play on my computer...

Setting up the alarm clock will be easy with cron. 
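
As a starting point for that next step, here is a rough sketch of how playback might be launched. It assumes a command-line player is available: `afplay` ships with macOS, while `mpg123` is a common choice on Linux but has to be installed separately. `player_command` and `play` are hypothetical helpers, and none of this has been run as part of the steps above.

```python
import platform
import subprocess


def player_command(mp3_path, system=None):
    """Return the command list for playing an mp3 on the given OS."""
    system = system or platform.system()
    if system == 'Darwin':
        return ['afplay', mp3_path]  # built into macOS
    if system == 'Linux':
        return ['mpg123', mp3_path]  # assumption: mpg123 is installed
    raise NotImplementedError('no player configured for {}'.format(system))


def play(mp3_path):
    """Block until the chosen player finishes playing the file."""
    subprocess.run(player_command(mp3_path), check=True)


print(player_command('upfirst_10_04_2017.mp3', system='Darwin'))
```

A crontab entry along these lines would then run a script at 6:30 on weekdays; the script path is a placeholder:

    30 6 * * 1-5 python3 /path/to/alarm.py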