# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [3]:
feed = feedparser.parse(url)

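feedparser hides the underlying XML. To give a rough sense of what it normalizes for you, the sketch below parses a tiny, hypothetical RSS 2.0 document with Python's standard-library `xml.etree.ElementTree`. It handles only a plain `<channel>`/`<item>` layout, with no Atom support, namespaces, or date parsing. Treat it as a teaching aid, not a replacement for feedparser. The names `SAMPLE_RSS` and `parse_rss` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 sample; real feeds are larger and often use namespaces
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Radar</title>
    <link>https://example.com/</link>
    <item>
      <title>Four short links: 1 November 2019</title>
      <author>editor@example.com</author>
    </item>
    <item>
      <title>MLIR: Accelerating AI</title>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Split an RSS 2.0 string into channel-level fields and a list of item dicts."""
    root = ET.fromstring(xml_text)
    channel = root.find('channel')
    # Channel-level metadata, roughly what feedparser exposes as feed.feed
    feed_info = {'title': channel.findtext('title'), 'link': channel.findtext('link')}
    # One dict per <item>, mapping child tag -> text, roughly like feed.entries
    entries = [{child.tag: child.text for child in item} for item in channel.findall('item')]
    return feed_info, entries


feed_info, entries = parse_rss(SAMPLE_RSS)
print(feed_info['title'], len(entries))  # Example Radar 2
print(sorted(entries[0]))                # ['author', 'title']
```

The second item has no `<author>`, so its dict has fewer keys. feedparser behaves similarly and only creates keys for elements that are actually present.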
### 2. Obtain a list of components (keys) that are available for this feed.

In [6]:
feed.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [11]:
feed.feed.keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [None]:
print('Title = ', feed.feed.title)
print('Subtitle = ', feed.feed.subtitle)
# The feed-level keys in step 3 include no 'author', so fall back to a placeholder
print('Author = ', feed.feed.get('author', 'N/A'))
print('Link = ', feed.feed.link)

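The feed-level keys listed in step 3 contain no `author`, so attribute access such as `feed.feed.author` would raise an error. feedparser's `FeedParserDict` is a `dict` subclass, so `.get()` with a default works on it. The sketch below uses a plain dict as a stand-in for `feed.feed` (its values are placeholders, not the real feed's). `describe_feed` is a hypothetical helper that fills any missing field with a default.

```python
def describe_feed(feed_meta, fields=('title', 'subtitle', 'author', 'link'), default='N/A'):
    """Return 'Field = value' lines, using `default` for keys the feed lacks.

    describe_feed is a hypothetical helper; feed_meta can be any dict-like
    object, such as feedparser's FeedParserDict.
    """
    return [f"{name.capitalize()} = {feed_meta.get(name, default)}" for name in fields]


# Placeholder stand-in for feed.feed: note there is no 'author' key
feed_meta = {
    'title': 'Example Radar',
    'subtitle': 'Placeholder subtitle',
    'link': 'https://example.com/',
}

for line in describe_feed(feed_meta):
    print(line)
# Title = Example Radar
# Subtitle = Placeholder subtitle
# Author = N/A
# Link = https://example.com/
```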
### 5. Count the number of entries that are contained in this RSS feed.

In [14]:
len(feed.entries)

18

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [22]:
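feed.entries[0].keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'comments', 'published', 'published_parsed', 'authors', 'author', 'author_detail', 'tags', 'id', 'guidislink', 'summary', 'summary_detail', 'content', 'wfw_commentrss', 'slash_comments', 'feedburner_origlink'])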

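Step 6 inspects only the first entry. Entries in the same feed are not guaranteed to share identical keys, because an optional element may be missing from some items. The minimal sketch below uses small hypothetical entry dicts to separate keys present in every entry from keys present in only some. `key_coverage` is an illustrative name.

```python
def key_coverage(entries):
    """Return (keys in every entry, keys missing from at least one entry)."""
    key_sets = [set(entry) for entry in entries]
    if not key_sets:
        return set(), set()
    common = set.intersection(*key_sets)
    partial = set.union(*key_sets) - common
    return common, partial


# Hypothetical entries: only the first one carries a 'comments' key
sample_entries = [
    {'title': 'A', 'link': 'https://example.com/a', 'comments': 'https://example.com/a#c'},
    {'title': 'B', 'link': 'https://example.com/b'},
]

common, partial = key_coverage(sample_entries)
print(sorted(common))   # ['link', 'title']
print(sorted(partial))  # ['comments']
```

The same call works on `feed.entries` directly, since each entry is a dict subclass.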
### 7. Extract a list of entry titles.

In [27]:
titles = [entry['title'] for entry in feed.entries]
titles

['It’s important to cultivate your organization’s collective genius',
 'Four short links: 5 November 2019',
 'Four short links: 4 November 2019',
 'Quantum computing’s potential is still far off, but quantum supremacy shows we’re on the right track',
 'Four short links: 1 November 2019',
 'Highlights from TensorFlow World in Santa Clara, California 2019',
 'Sticker recommendations and AI-driven innovations on the Hike messaging platform',
 '“Human error”: How can we help people build models that do what they expect',
 'Personalization of Spotify Home and TensorFlow',
 'TensorFlow.js: Bringing machine learning to JavaScript',
 'TFX: An end-to-end ML platform for everyone',
 'MLIR: Accelerating AI',
 'TensorFlow Hub: The platform to share and discover pretrained models for TensorFlow',
 'TensorFlow Lite: ML for mobile and IoT devices',
 'Four short links: 31 October 2019',
 'Accelerating ML at Twitter',
 'The latest from TensorFlow',
 'TensorFlow World 2019 opening keynote']

### 8. Calculate the percentage of "Four short links" entry titles.

In [31]:
n = len([t for t in titles if "Four short links" in t])

# Divide by the number of entries (18), not by the number of keys in one entry
round(n / len(titles) * 100, 2)

22.22

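The percentage is the number of matching titles divided by the total number of entries (18 here): 4 / 18 × 100 ≈ 22.22. A small helper, `percent_matching` (a hypothetical name), makes that denominator explicit and guards against an empty list. The sample titles come from the list in step 7.

```python
def percent_matching(titles, phrase):
    """Percentage of titles containing `phrase`, rounded to two decimals."""
    if not titles:
        return 0.0
    hits = sum(phrase in title for title in titles)
    return round(hits / len(titles) * 100, 2)


sample_titles = [
    'Four short links: 5 November 2019',
    'MLIR: Accelerating AI',
    'Four short links: 4 November 2019',
    'Accelerating ML at Twitter',
]
print(percent_matching(sample_titles, 'Four short links'))  # 50.0
```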
### 9. Create a Pandas data frame from the feed's entries.

In [33]:
import pandas as pd

In [105]:
ds = pd.DataFrame(feed.entries)
ds.head()

|  | title | title_detail | links | link | comments | published | published_parsed | authors | author | author_detail | tags | id | guidislink | summary | summary_detail | content | wfw_commentrss | slash_comments | feedburner_origlink |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | It’s important to cultivate your organization’... | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/its-important-to... | Tue, 05 Nov 2019 05:05:36 +0000 | (2019, 11, 5, 5, 5, 36, 1, 309, 0) | [{'name': 'Jenn Webb'}] | Jenn Webb | {'name': 'Jenn Webb'} | [{'term': 'Future of the Firm', 'scheme': None... | https://www.oreilly.com/radar/?p=10231 | False | In this interview from O&#8217;Reilly Foo Camp... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/its-important-to... | 0 | https://www.oreilly.com/radar/its-important-to... |
| 1 | Four short links: 5 November 2019 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/four-short-links... | Tue, 05 Nov 2019 05:01:13 +0000 | (2019, 11, 5, 5, 1, 13, 1, 309, 0) | [{'name': 'Nat Torkington'}] | Nat Torkington | {'name': 'Nat Torkington'} | [{'term': 'Four Short Links', 'scheme': None, ... | https://www.oreilly.com/radar/?p=10644 | False | &#8220;Nearly All&#8221; Counter-Strike Microt... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/four-short-links... | 0 | https://www.oreilly.com/radar/four-short-links... |
| 2 | Four short links: 4 November 2019 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/four-short-links... | Mon, 04 Nov 2019 05:01:01 +0000 | (2019, 11, 4, 5, 1, 1, 0, 308, 0) | [{'name': 'Nat Torkington'}] | Nat Torkington | {'name': 'Nat Torkington'} | [{'term': 'Four Short Links', 'scheme': None, ... | https://www.oreilly.com/radar/?p=10612 | False | Beyond Bots and Trolls: Understanding Disinfor... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/four-short-links... | 0 | https://www.oreilly.com/radar/four-short-links... |
| 3 | Quantum computing’s potential is still far off... | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/quantum-computin... | Fri, 01 Nov 2019 04:05:34 +0000 | (2019, 11, 1, 4, 5, 34, 4, 305, 0) | [{'name': 'Mike Loukides'}] | Mike Loukides | {'name': 'Mike Loukides'} | [{'term': 'Innovation & Disruption', 'scheme':... | https://www.oreilly.com/radar/?p=10154 | False | One of the most exciting topics we’ve been fol... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/quantum-computin... | 0 | https://www.oreilly.com/radar/quantum-computin... |
| 4 | Four short links: 1 November 2019 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | http://feedproxy.google.com/~r/oreilly/radar/a... | https://www.oreilly.com/radar/four-short-links... | Fri, 01 Nov 2019 04:01:53 +0000 | (2019, 11, 1, 4, 1, 53, 4, 305, 0) | [{'name': 'Nat Torkington'}] | Nat Torkington | {'name': 'Nat Torkington'} | [{'term': 'Four Short Links', 'scheme': None, ... | https://www.oreilly.com/radar/?p=10586 | False | Vortimo &#8212; software that organizes inform... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/four-short-links... | 0 | https://www.oreilly.com/radar/four-short-links... |

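The `published_parsed` column holds `time.struct_time` values (shown as 9-tuples above), which feedparser documents as being in UTC. Converting them to timezone-aware `datetime` objects makes sorting and filtering by date straightforward. The minimal sketch below uses row 0's tuple from the table above. `to_datetime` is a hypothetical helper name, and it assumes the input is UTC.

```python
import time
from datetime import datetime, timezone


def to_datetime(parsed):
    """Convert a feedparser-style struct_time (assumed UTC) into an aware datetime."""
    # Only the first six fields (year..second) are needed; the rest are weekday/yearday/DST
    return datetime(*parsed[:6], tzinfo=timezone.utc)


# Value copied from the published_parsed column of row 0 above
published = time.struct_time((2019, 11, 5, 5, 5, 36, 1, 309, 0))
dt = to_datetime(published)
print(dt.isoformat())  # 2019-11-05T05:05:36+00:00
```

In the data frame, the same conversion could be applied column-wise with `ds['published_parsed'].apply(to_datetime)`.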

### 10. Count the number of entries per author and sort them in descending order.

In [108]:
# Count titles per author, then sort from most to fewest entries
author_counts = ds.groupby('author')['title'].count()
author_counts.sort_values(ascending=False)

author
Nat Torkington                              4
Tony Jebara                                 1
Theodore Summe                              1
Sandeep Gupta and Joseph Paul Cohen         1
Mike Loukides                               1
Mike Liang                                  1
Megan Kacholia                              1
Mac Slocum                                  1
Konstantinos Katsiapis and Anusha Ramesh    1
Jenn Webb                                   1
Jeff Dean                                   1
Jared Duke and Sarah Sirajuddin             1
Chris Lattner and Tatiana Shpeisman         1
Anna Roth                                   1
Ankur Narang                                1
Name: title, dtype: int64

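Grouping on the raw `author` string treats a byline such as "Jared Duke and Sarah Sirajuddin" as a single author. If each person should be credited separately, one option is to split bylines on " and " before counting. That splitting rule is an assumption that happens to fit the bylines above, but it would mis-split bylines that use commas or other separators. Here is a plain-Python sketch with `collections.Counter`, where `count_authors` is a hypothetical helper:

```python
from collections import Counter


def count_authors(bylines, split_coauthors=False):
    """Count entries per author, optionally crediting each co-author separately."""
    counts = Counter()
    for byline in bylines:
        # Assumption: co-authors are joined by ' and ', as in the bylines above
        names = byline.split(' and ') if split_coauthors else [byline]
        counts.update(name.strip() for name in names)
    return counts.most_common()  # sorted by count, descending


bylines = ['Nat Torkington', 'Mike Loukides', 'Nat Torkington',
           'Jared Duke and Sarah Sirajuddin']
print(count_authors(bylines))
# [('Nat Torkington', 2), ('Mike Loukides', 1), ('Jared Duke and Sarah Sirajuddin', 1)]
print(count_authors(bylines, split_coauthors=True))
# [('Nat Torkington', 2), ('Mike Loukides', 1), ('Jared Duke', 1), ('Sarah Sirajuddin', 1)]
```

Without splitting, pandas offers the same count in one call with `ds['author'].value_counts()`, which sorts in descending order by default.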
### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [136]:
# Work on a copy so the original data frame stays unchanged
dc = ds[['author', 'title']].copy()
# str.len() counts the characters in each title
dc['title length'] = dc['title'].str.len()
dc.sort_values('title length', ascending=False)


|  | author | title | title length |
|---|---|---|---|
| 3 | Mike Loukides | Quantum computing’s potential is still far off... | 100 |
| 12 | Mike Liang | TensorFlow Hub: The platform to share and disc... | 83 |
| 6 | Ankur Narang | Sticker recommendations and AI-driven innovati... | 80 |
| 7 | Anna Roth | “Human error”: How can we help people build mo... | 75 |
| 0 | Jenn Webb | It’s important to cultivate your organization’... | 65 |
| 5 | Mac Slocum | Highlights from TensorFlow World in Santa Clar... | 64 |
| 9 | Sandeep Gupta and Joseph Paul Cohen | TensorFlow.js: Bringing machine learning to Ja... | 54 |
| 8 | Tony Jebara | Personalization of Spotify Home and TensorFlow | 46 |
| 13 | Jared Duke and Sarah Sirajuddin | TensorFlow Lite: ML for mobile and IoT devices | 46 |
| 10 | Konstantinos Katsiapis and Anusha Ramesh | TFX: An end-to-end ML platform for everyone | 43 |

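Outside pandas, the same ranking is a `sorted` call keyed on `len`. Python's `len` (like pandas' `str.len()`) counts Unicode code points, so the curly apostrophe in "Quantum computing’s" counts as one character. The sketch below ranks three titles from step 7. `rank_by_length` is an illustrative name.

```python
def rank_by_length(titles):
    """Return (title, length) pairs, longest first; ties keep their original order."""
    return sorted(((t, len(t)) for t in titles), key=lambda pair: pair[1], reverse=True)


sample = [
    'MLIR: Accelerating AI',
    'Personalization of Spotify Home and TensorFlow',
    'TensorFlow Lite: ML for mobile and IoT devices',
]
for title, length in rank_by_length(sample):
    print(length, title)
# 46 Personalization of Spotify Home and TensorFlow
# 46 TensorFlow Lite: ML for mobile and IoT devices
# 21 MLIR: Accelerating AI
```

Python's sort is stable even with `reverse=True`, so the two 46-character titles keep their input order. pandas' `sort_values` defaults to quicksort, which is not guaranteed stable. Passing `kind='mergesort'` keeps tie order predictable when sorting on a single column.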

### 12. Create a list of entry titles whose summary includes the phrase "machine learning".

In [141]:
# Boolean mask: True where the summary contains the phrase (case-sensitive)
mask = ds['summary'].str.contains('machine learning')
# Keep only the matching titles and return them as a plain list
ml_titles = ds.loc[mask, 'title'].tolist()
ml_titles

['Highlights from TensorFlow World in Santa Clara, California 2019']

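The `str.contains` filter is case-sensitive, so a summary that begins a sentence with "Machine learning" would not match. Summaries are also HTML, and the data frame in step 9 shows entities such as `&#8217;` in them. pandas handles the first issue with `ds['summary'].str.contains('machine learning', case=False)`. The plain-Python sketch below handles both issues: it unescapes entities with `html.unescape` and then compares case-insensitively. The sample entries and the `titles_mentioning` helper are hypothetical.

```python
import html


def titles_mentioning(entries, phrase):
    """Titles of entries whose summary mentions `phrase`, ignoring case and HTML entities."""
    needle = phrase.casefold()
    return [
        entry['title']
        for entry in entries
        # Unescape entities like &#8217; first, then compare case-insensitively
        if needle in html.unescape(entry.get('summary', '')).casefold()
    ]


sample_entries = [
    {'title': 'A', 'summary': 'Machine learning on phones'},
    {'title': 'B', 'summary': 'Notes from O&#8217;Reilly Foo Camp'},
    {'title': 'C', 'summary': 'Scaling <b>machine learning</b> at work'},
]
print(titles_mentioning(sample_entries, 'machine learning'))  # ['A', 'C']
print(titles_mentioning(sample_entries, 'O’Reilly'))          # ['B']
```

HTML tags are left in place here. A tag that splits the phrase itself (for example `machine <i>learning</i>`) would still prevent a match.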