# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [3]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

feed = feedparser.parse(url)
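Before moving on, it can help to confirm that the parse produced something usable. The sketch below is a hypothetical helper, `feed_looks_usable`, which is not part of feedparser. It only reads the `bozo`, `status`, and `entries` keys that the parsed result exposes, and it accepts any mapping, so it is demonstrated here with stand-in dicts rather than a live download. A feed flagged as `bozo` can often still be read, so treat this as a deliberately strict check.

```python
def feed_looks_usable(parsed):
    """Return True when a feedparser-style result seems safe to work with.

    `parsed` is any mapping with the keys feedparser returns
    ('bozo', 'status', 'entries'). This helper is hypothetical and
    not part of feedparser itself.
    """
    # feedparser sets 'bozo' to a truthy value when the feed is not well-formed
    if parsed.get('bozo'):
        return False
    # 'status' is only present when the feed was fetched over HTTP
    status = parsed.get('status', 200)
    if status >= 400:
        return False
    # An empty feed leaves nothing for the later exercises to analyze
    return len(parsed.get('entries', [])) > 0


# Stand-in dicts that mimic the shape of a parsed feed (assumed values)
good = {'bozo': False, 'status': 200, 'entries': [{'title': 'Example entry'}]}
broken = {'bozo': True, 'status': 200, 'entries': []}

print(feed_looks_usable(good))
print(feed_looks_usable(broken))
```

With the real object, the call would simply be `feed_looks_usable(feed)`.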

### 2. Obtain a list of components (keys) that are available for this feed.

In [4]:
feed.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [5]:
feed.feed.keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [53]:
title = feed.feed.title
subtitle = feed.feed.subtitle
author = feed.feed['generator']  # no 'author' key exists at feed level; 'generator' is used as a stand-in
link = feed.href

relevant_facts = [title, subtitle, author, link]
relevant_facts

['Radar',
 'Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology',
 'https://wordpress.org/?v=5.3.3',
 'http://feeds.feedburner.com/oreilly/radar/atom']
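The feed-level keys listed in exercise 3 contain no `author`, which is why the cell above falls back to `generator`. One way to make that kind of gap visible is to read each field with `.get`, so a missing key yields `None` instead of raising. The helper name `describe_feed` and the sample dict are assumptions for illustration; the dict copies only values already printed above.

```python
def describe_feed(feed_meta, fields=('title', 'subtitle', 'author')):
    """Collect selected feed-level fields, using None for any that are missing."""
    return {field: feed_meta.get(field) for field in fields}


# Stand-in for feed.feed, built from values printed above (no 'author' key)
sample_meta = {
    'title': 'Radar',
    'subtitle': ('Now, next, and beyond: Tracking need-to-know trends '
                 'at the intersection of business and technology'),
    'generator': 'https://wordpress.org/?v=5.3.3',
}

summary = describe_feed(sample_meta)
for field, value in summary.items():
    print(field, '->', value)
```

Here `summary['author']` comes back as `None`, which signals that a fallback such as `generator` is a choice made by the analyst rather than something the feed provides.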

### 5. Count the number of entries that are contained in this RSS feed.

In [51]:
print('There are {} entries contained in this RSS feed.'.format(len(feed.entries)))

There are 60 entries contained in this RSS feed.


### 6. Obtain a list of components (keys) available for an entry.

*Hint: Index into the list of entries first, then request the keys.*

In [57]:
feed.entries[0].keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'comments', 'published', 'published_parsed', 'authors', 'author', 'author_detail', 'tags', 'id', 'guidislink', 'summary', 'summary_detail', 'content', 'wfw_commentrss', 'slash_comments', 'feedburner_origlink'])

### 7. Extract a list of entry titles.

In [63]:
entry_titles = [entry.title for entry in feed.entries]
entry_titles

['Four short links: 15 May 2020',
 'Practical Skills for The AI Product Manager',
 'Four short links: 14 May 2020',
 'Four short links: 13 May 2020',
 'Four short links: 12 May 2020',
 'When models are everywhere',
 'Four short links: 11 May 2020',
 'Four short links: 8 May 2020',
 'Radar trends to watch: May 2020',
 'Four short links: 7 May 2020',
 'Four short links: 6 May 2020',
 'Four short links: 5 May 2020',
 'On COBOL',
 'Four short links: 4 May 2020',
 'Four short links: 1 May 2020',
 'Four short links: 30 April 2020',
 'Four short links: 29 April 2020',
 'Four short links: 28 April 2020',
 'Four short links: 27 April 2020',
 'Four short links: 24 April 2020',
 'Four short links: 23 April 2020',
 'How data privacy leader Apple found itself in a data ethics catastrophe',
 'Four short links: 22 April 2020',
 'Four short links: 21 April 2020',
 'Four short links: 20 April 2020',
 'Four short links: 17 April 2020',
 'Four short links: 16 April 2020',
 'Four short links: 15 April 202

### 8. Calculate the percentage of "Four short links" entry titles.

In [70]:
# Match the full "Four short links" prefix rather than just the word "Four"
four_short_links = [entry.title for entry in feed.entries if entry.title.startswith('Four short links')]
percentage_of_four_short_links = len(four_short_links) / len(feed.entries)
print('The percentage of "Four short links" entry titles is {:.2%}'.format(percentage_of_four_short_links))

The percentage of "Four short links" entry titles is 71.67%
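With 60 entries in the feed, the printed 71.67% corresponds to 43 matching titles, since 43 / 60 ≈ 0.7167. The sketch below reproduces that arithmetic with a hypothetical helper, `share_with_prefix`, and synthetic titles (not the real feed), so the calculation can be checked without downloading anything.

```python
def share_with_prefix(titles, prefix):
    """Return the fraction of titles that start with `prefix` (0.0 for an empty list)."""
    if not titles:
        return 0.0
    matches = sum(1 for title in titles if title.startswith(prefix))
    return matches / len(titles)


# Synthetic titles (not the real feed): 43 matching and 17 non-matching, 60 in total
sample_titles = ['Four short links: day {}'.format(n) for n in range(43)]
sample_titles += ['Other article {}'.format(n) for n in range(17)]

share = share_with_prefix(sample_titles, 'Four short links')
print('{:.2%}'.format(share))  # prints 71.67%
```

Guarding against an empty list avoids a `ZeroDivisionError` if the feed ever returns no entries.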


### 9. Create a Pandas data frame from the feed's entries.

In [71]:
import pandas as pd

In [72]:
feed_dataframe = pd.DataFrame(feed.entries)
feed_dataframe.head()

Unnamed: 0,title,title_detail,links,link,comments,published,published_parsed,authors,author,author_detail,tags,id,guidislink,summary,summary_detail,content,wfw_commentrss,slash_comments,feedburner_origlink
0,Four short links: 15 May 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Fri, 15 May 2020 11:22:50 +0000","(2020, 5, 15, 11, 22, 50, 4, 136, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=12789,False,Favourite Developer-Efficiency Tips &#8212; Be...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
1,Practical Skills for The AI Product Manager,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/practical-skills...,"Thu, 14 May 2020 12:40:45 +0000","(2020, 5, 14, 12, 40, 45, 3, 135, 0)","[{'name': 'Justin Norman, Peter Skomoroch and ...","Justin Norman, Peter Skomoroch and Mike Loukides","{'name': 'Justin Norman, Peter Skomoroch and M...","[{'term': 'AI & ML', 'scheme': None, 'label': ...",https://www.oreilly.com/radar/?p=12786,False,"In our previous article, What You Need to Know...","{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/practical-skills...,0,https://www.oreilly.com/radar/practical-skills...
2,Four short links: 14 May 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Thu, 14 May 2020 11:28:00 +0000","(2020, 5, 14, 11, 28, 0, 3, 135, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=12783,False,Malware Toolkit Targetting Airgapped Networks ...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
3,Four short links: 13 May 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Wed, 13 May 2020 10:41:37 +0000","(2020, 5, 13, 10, 41, 37, 2, 134, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=12779,False,"The Confessions of Marcus Hutchins, the Hacker...","{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
4,Four short links: 12 May 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Tue, 12 May 2020 11:22:23 +0000","(2020, 5, 12, 11, 22, 23, 1, 133, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=12776,False,flecs &#8212; a Fast and Lightweight ECS (Enti...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...


### 10. Count the number of entries per author and sort them in descending order.

In [73]:
authors = feed_dataframe.groupby('author', as_index=False).agg({'title': 'count'})
authors.columns = ['author', 'entries']
authors.sort_values('entries', ascending=False)

| | author | entries |
|---|---|---|
| 7 | Nat Torkington | 43 |
| 4 | Jenn Webb | 4 |
| 6 | Mike Loukides | 4 |
| 10 | Roger Magoulas and Steve Swoyer | 2 |
| 0 | Cynthia Owens | 1 |
| 1 | Daniel Wu and Mike Loukides | 1 |
| 2 | Hugo Bowne-Anderson | 1 |
| 3 | Hugo Bowne-Anderson and Mike Loukides | 1 |
| 5 | Justin Norman, Peter Skomoroch and Mike Loukides | 1 |
| 8 | Peter Skomoroch and Mike Loukides | 1 |
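As a cross-check that does not depend on pandas, the same kind of tally can be sketched with `collections.Counter`. The author list below is a small made-up sample, not the feed's data. As in the table above, a co-authored entry would be counted under its combined author string rather than split per person.

```python
from collections import Counter

# Made-up author values, shaped like the 'author' field of each entry
sample_authors = [
    'Nat Torkington',
    'Mike Loukides',
    'Nat Torkington',
    'Jenn Webb',
    'Nat Torkington',
    'Mike Loukides',
]

# most_common() returns (author, count) pairs sorted by count, highest first
entries_per_author = Counter(sample_authors).most_common()
for author, count in entries_per_author:
    print(author, count)
```

Within pandas, `feed_dataframe['author'].value_counts()` should give a comparable descending count in a single call.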


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [74]:
feed_dataframe['title_length'] = feed_dataframe['title'].apply(len)
feed_dataframe[['title', 'author', 'title_length']].sort_values('title_length', ascending = False).head()

| | title | author | title_length |
|---|---|---|---|
| 47 | Great leaders inspire innovation and creativit... | Jenn Webb | 76 |
| 21 | How data privacy leader Apple found itself in ... | Daniel Wu and Mike Loukides | 71 |
| 48 | Strong leaders forge an intersection of knowle... | Jenn Webb | 64 |
| 54 | It’s an unprecedented crisis: 8 things to do r... | Cynthia Owens | 54 |
| 41 | What you need to know about product management... | Peter Skomoroch and Mike Loukides | 53 |


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [97]:
mls = feed_dataframe[feed_dataframe['summary'].str.contains('machine learning')]
mls[['title', 'author', 'summary']]

| | title | author | summary |
|---|---|---|---|
| 1 | Practical Skills for The AI Product Manager | Justin Norman, Peter Skomoroch and Mike Loukides | In our previous article, What You Need to Know... |
| 5 | When models are everywhere | Hugo Bowne-Anderson and Mike Loukides | You probably interact with fifty to a hundred ... |
| 7 | Four short links: 8 May 2020 | Nat Torkington | Mathematics for Machine Learning &#8212; We wr... |
| 41 | What you need to know about product management... | Peter Skomoroch and Mike Loukides | If you’re already a software product manager (... |
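The exercise asks for a list of titles, while the cell above returns a data frame. In addition, `str.contains` is case-sensitive by default, so a summary that only says "Machine Learning" would be missed. The sketch below shows a case-insensitive variant that returns a plain list. The helper `titles_mentioning` and the sample entries are invented for illustration and only mimic the shape of the items in `feed.entries`.

```python
import html


def titles_mentioning(entries, phrase):
    """Return titles of entries whose summary contains `phrase`, ignoring case."""
    needle = phrase.lower()
    titles = []
    for entry in entries:
        # Summaries may contain HTML entities such as &#8212;, so decode them first
        summary = html.unescape(entry.get('summary', ''))
        if needle in summary.lower():
            titles.append(entry['title'])
    return titles


# Invented entries shaped like the items in feed.entries
sample_entries = [
    {'title': 'Entry A', 'summary': 'Mathematics for Machine Learning &#8212; a book'},
    {'title': 'Entry B', 'summary': 'Notes on COBOL'},
    {'title': 'Entry C', 'summary': 'Applying machine learning in products'},
    {'title': 'Entry D'},  # no summary at all
]

print(titles_mentioning(sample_entries, 'machine learning'))  # ['Entry A', 'Entry C']
```

In pandas, a similar effect should be available by passing `case=False` to `str.contains` and calling `.tolist()` on the resulting `title` column.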
