# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [3]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [4]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [6]:
urlparsed = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [8]:
print(list(urlparsed.keys()))

['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces']


### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [9]:
print(list(urlparsed.feed.keys()))

['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname']


### 4. Extract and print the feed title, subtitle, author, and link.

In [11]:
print(urlparsed.feed.title)

Radar


In [12]:
print(urlparsed.feed.subtitle)

Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology


In [14]:
print(urlparsed.feed.link)

https://www.oreilly.com/radar
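
The key list from step 3 contains no `author` entry, so asking for `urlparsed.feed.author` would likely raise an error for this feed. Since the parsed feed behaves like a dictionary, `.get` with a fallback is one way to handle an optional field. The sketch below uses a plain dict (`feed_info`, a hypothetical stand-in for `urlparsed.feed`) so that it runs without network access.

```python
# Stand-in for urlparsed.feed: a plain dict with a subset of the keys listed in step 3.
feed_info = {
    "title": "Radar",
    "link": "https://www.oreilly.com/radar",
}

# .get returns the fallback instead of raising when the key is absent.
author = feed_info.get("author", "(no author listed)")
print(author)
```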


### 5. Count the number of entries that are contained in this RSS feed.

In [15]:
print(len(urlparsed.entries))

60


### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [19]:
print(list(urlparsed.entries[5].keys()))

['title', 'title_detail', 'links', 'link', 'comments', 'published', 'published_parsed', 'authors', 'author', 'author_detail', 'tags', 'id', 'guidislink', 'summary', 'summary_detail', 'content', 'wfw_commentrss', 'slash_comments', 'feedburner_origlink']


### 7. Extract a list of entry titles.

In [25]:
parsedtitles = [entry.title for entry in urlparsed.entries]
parsedtitles

['Four short links: 14 February 2020',
 'Four short links: 13 February 2020',
 'The state of data quality in 2020',
 'Four short links: 12 February 2020',
 'Four short links: 11 February 2020',
 'Four short links: 10 February 2020',
 'Four short links: 7 February 2020',
 'Radar trends to watch: February 2020',
 'Four short links: 6 February 2020',
 'Four short links: 5 February 2020',
 'Four short links: 4 February 2020',
 'AI meets operations',
 'Four short links: 3 February 2020',
 'Four short links: 31 January 2020',
 'Four short links: 30 January 2020',
 'Four short links: 29 January 2020',
 'Four short links: 28 January 2020',
 'Four short links: 27 January 2020',
 'Four short links: 24 January 2020',
 'Four short links: 23 January 2020',
 'Four short links: 22 January 2020',
 'Four short links: 21 January 2020',
 'Four short links: 20 January 2020',
 'Four short links: 17 January 2020',
 'Four short links: 16 January 2020',
 'Reinforcement learning for the real world',
 'Four sho

### 8. Calculate the percentage of "Four short links" entry titles.

In [26]:
len(parsedtitles)

60

In [33]:
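fourshortlinks = 0  # start the counter before the loop
for title in parsedtitles:
    if 'Four short links' in title:
        fourshortlinks += 1  # `+=` increments; `=+ 1` would set the counter to 1 each time
percentage = (fourshortlinks / len(parsedtitles)) * 100
percentage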
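
The same percentage can be computed without an explicit counter, because `True` counts as 1 when summed. The sketch below uses four titles copied from the output in step 7; `sample_titles`, `matches`, and `sample_percentage` are illustrative names rather than part of the lab.

```python
# Four titles copied from the step 7 output.
sample_titles = [
    "Four short links: 14 February 2020",
    "Four short links: 13 February 2020",
    "The state of data quality in 2020",
    "Four short links: 12 February 2020",
]

# Each membership test yields True or False; sum() counts the True values.
matches = sum("Four short links" in title for title in sample_titles)
sample_percentage = matches / len(sample_titles) * 100
print(f"{sample_percentage:.1f}%")  # 75.0%
```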

### 9. Create a Pandas data frame from the feed's entries.

In [34]:
import pandas as pd

In [35]:
df = pd.DataFrame(urlparsed.entries)
df.head()

Unnamed: 0,title,title_detail,links,link,comments,published,published_parsed,authors,author,author_detail,tags,id,guidislink,summary,summary_detail,content,wfw_commentrss,slash_comments,feedburner_origlink
0,Four short links: 14 February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Fri, 14 Feb 2020 05:01:00 +0000","(2020, 2, 14, 5, 1, 0, 4, 45, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11959,False,ABD &#8212; Course materials for Advanced Bina...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
1,Four short links: 13 February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Thu, 13 Feb 2020 05:01:00 +0000","(2020, 2, 13, 5, 1, 0, 3, 44, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11952,False,Ofcom To Regulate UK Internet &#8212; The regu...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
2,The state of data quality in 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/the-state-of-dat...,"Wed, 12 Feb 2020 06:00:00 +0000","(2020, 2, 12, 6, 0, 0, 2, 43, 0)",[{'name': 'Roger Magoulas and Steve Swoyer'}],Roger Magoulas and Steve Swoyer,{'name': 'Roger Magoulas and Steve Swoyer'},"[{'term': 'AI & ML', 'scheme': None, 'label': ...",https://www.oreilly.com/radar/?p=11549,False,We suspected that data quality was a topic bri...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/the-state-of-dat...,0,https://www.oreilly.com/radar/the-state-of-dat...
3,Four short links: 12 February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Wed, 12 Feb 2020 05:01:00 +0000","(2020, 2, 12, 5, 1, 0, 2, 43, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11936,False,Drafting an Engineering Strategy (Mathias Meye...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
4,Four short links: 11 February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Tue, 11 Feb 2020 05:01:00 +0000","(2020, 2, 11, 5, 1, 0, 1, 42, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11773,False,The Fate of Empires &#8212; 1977 text summariz...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
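
The `published_parsed` column holds nine-field time tuples, which are awkward to sort or filter directly. One possible approach is to convert them to `datetime` objects, as sketched below with two tuples copied from the table above. It assumes the tuples are expressed in UTC; `published_at` is an illustrative name.

```python
from datetime import datetime, timezone

# Two published_parsed values copied from the table above.
published_parsed = [
    (2020, 2, 14, 5, 1, 0, 4, 45, 0),
    (2020, 2, 13, 5, 1, 0, 3, 44, 0),
]

# The first six fields are year, month, day, hour, minute, second.
# Assumption: the tuples are in UTC, so the timezone is attached explicitly.
published_at = [datetime(*t[:6], tzinfo=timezone.utc) for t in published_parsed]
print(published_at[0].isoformat())  # 2020-02-14T05:01:00+00:00
```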


### 10. Count the number of entries per author and sort them in descending order.

In [41]:
df['author'].count()

60

In [47]:
entriesxauthor = df.groupby('author')['author'].count()
desc_sort = entriesxauthor.sort_values(ascending = False)
desc_sort

author
Nat Torkington                     47
Mike Loukides                       4
Jenn Webb                           2
                                    2
Zan McQuade and Amanda Quinn        1
Roger Magoulas and Steve Swoyer     1
Roger Magoulas                      1
Patrick Hall and Andrew Burt        1
Alison McCauley                     1
Name: author, dtype: int64
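
The unnamed row with a count of 2 presumably corresponds to entries whose author is an empty string. A possible alternative to `groupby` is `value_counts`, which counts and sorts in one call, after relabeling the blanks. The author list below is a small hypothetical sample, not the real feed data.

```python
import pandas as pd

# Hypothetical author column; the empty strings mimic entries without an author.
authors = pd.Series(
    ["Nat Torkington", "Nat Torkington", "", "Mike Loukides", "Nat Torkington", ""],
    name="author",
)

# Label the blanks, then count; value_counts sorts from most to least frequent.
author_counts = authors.replace("", "(unknown)").value_counts()
print(author_counts)
```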

### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [48]:
entry_length = [len(x) for x in df['title']]
entry_length

[34,
 34,
 33,
 34,
 34,
 34,
 33,
 36,
 33,
 33,
 33,
 19,
 33,
 33,
 33,
 33,
 33,
 33,
 33,
 33,
 33,
 33,
 33,
 33,
 33,
 41,
 33,
 33,
 46,
 33,
 33,
 35,
 32,
 39,
 32,
 34,
 32,
 22,
 32,
 32,
 32,
 30,
 32,
 34,
 34,
 34,
 34,
 34,
 34,
 34,
 34,
 34,
 45,
 34,
 34,
 63,
 34,
 34,
 59,
 34]

In [50]:
df['entry_length'] = entry_length
df

Unnamed: 0,title,title_detail,links,link,comments,published,published_parsed,authors,author,author_detail,tags,id,guidislink,summary,summary_detail,content,wfw_commentrss,slash_comments,feedburner_origlink,entry_length
0,Four short links: 14 February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Fri, 14 Feb 2020 05:01:00 +0000","(2020, 2, 14, 5, 1, 0, 4, 45, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11959,False,ABD &#8212; Course materials for Advanced Bina...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...,34
1,Four short links: 13 February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Thu, 13 Feb 2020 05:01:00 +0000","(2020, 2, 13, 5, 1, 0, 3, 44, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11952,False,Ofcom To Regulate UK Internet &#8212; The regu...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...,34
2,The state of data quality in 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/the-state-of-dat...,"Wed, 12 Feb 2020 06:00:00 +0000","(2020, 2, 12, 6, 0, 0, 2, 43, 0)",[{'name': 'Roger Magoulas and Steve Swoyer'}],Roger Magoulas and Steve Swoyer,{'name': 'Roger Magoulas and Steve Swoyer'},"[{'term': 'AI & ML', 'scheme': None, 'label': ...",https://www.oreilly.com/radar/?p=11549,False,We suspected that data quality was a topic bri...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/the-state-of-dat...,0,https://www.oreilly.com/radar/the-state-of-dat...,33
3,Four short links: 12 February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Wed, 12 Feb 2020 05:01:00 +0000","(2020, 2, 12, 5, 1, 0, 2, 43, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11936,False,Drafting an Engineering Strategy (Mathias Meye...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...,34
4,Four short links: 11 February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Tue, 11 Feb 2020 05:01:00 +0000","(2020, 2, 11, 5, 1, 0, 1, 42, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11773,False,The Fate of Empires &#8212; 1977 text summariz...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...,34
5,Four short links: 10 February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Mon, 10 Feb 2020 05:01:00 +0000","(2020, 2, 10, 5, 1, 0, 0, 41, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11745,False,The Digital Dictators: How Technology Strength...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...,34
6,Four short links: 7 February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Fri, 07 Feb 2020 05:01:00 +0000","(2020, 2, 7, 5, 1, 0, 4, 38, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11710,False,31 Days of API Security Tips &#8212; Mobile Ce...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...,33
7,Radar trends to watch: February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/radar-trends-to-...,"Thu, 06 Feb 2020 11:00:00 +0000","(2020, 2, 6, 11, 0, 0, 3, 37, 0)",[{'name': 'Mike Loukides'}],Mike Loukides,{'name': 'Mike Loukides'},"[{'term': 'Radar Trends', 'scheme': None, 'lab...",https://www.oreilly.com/radar/?p=11621,False,Automation and infrastructure trends IBM has r...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/radar-trends-to-...,0,https://www.oreilly.com/radar/radar-trends-to-...,36
8,Four short links: 6 February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Thu, 06 Feb 2020 05:01:00 +0000","(2020, 2, 6, 5, 1, 0, 3, 37, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11682,False,Assembler &#8212; Google&#8217;s Jigsaw group ...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...,33
9,Four short links: 5 February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Wed, 05 Feb 2020 05:01:00 +0000","(2020, 2, 5, 5, 1, 0, 2, 36, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11644,False,Discord Switching from Go to Rust &#8212; memo...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...,33


In [51]:
sort_entry = df[['title','author','entry_length']].sort_values(by='entry_length',ascending = False)
sort_entry

| | title | author | entry_length |
|---|---|---|---|
| 55 | 5 industries that demonstrate how blockchains ... | Alison McCauley | 63 |
| 58 | Why you should care about debugging machine le... | Patrick Hall and Andrew Burt | 59 |
| 28 | Where programming languages are headed in 2020 | Zan McQuade and Amanda Quinn | 46 |
| 52 | AI is computer science disguised as hard work | Jenn Webb | 45 |
| 25 | Reinforcement learning for the real world | Jenn Webb | 41 |
| 33 | 9 additional books for the Next Economy | | 39 |
| 7 | Radar trends to watch: February 2020 | Mike Loukides | 36 |
| 31 | Radar trends to watch: January 2020 | Mike Loukides | 35 |
| 46 | Four short links: 26 December 2019 | Nat Torkington | 34 |
| 1 | Four short links: 13 February 2020 | Nat Torkington | 34 |
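
The title lengths can also be computed with the vectorized `.str.len()` accessor instead of a list comprehension. The sketch below uses three titles taken from the feed output; `titles_df`, `title_length`, and `longest_first` are illustrative names.

```python
import pandas as pd

# Three titles taken from the feed output above.
titles_df = pd.DataFrame({"title": [
    "Four short links: 14 February 2020",
    "Radar trends to watch: February 2020",
    "AI meets operations",
]})

# .str.len() computes the character count of each title in one call.
titles_df["title_length"] = titles_df["title"].str.len()
longest_first = titles_df.sort_values(by="title_length", ascending=False)
print(longest_first)
```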


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [52]:
df['summary']

0     ABD &#8212; Course materials for Advanced Bina...
1     Ofcom To Regulate UK Internet &#8212; The regu...
2     We suspected that data quality was a topic bri...
3     Drafting an Engineering Strategy (Mathias Meye...
4     The Fate of Empires &#8212; 1977 text summariz...
5     The Digital Dictators: How Technology Strength...
6     31 Days of API Security Tips &#8212; Mobile Ce...
7     Automation and infrastructure trends IBM has r...
8     Assembler &#8212; Google&#8217;s Jigsaw group ...
9     Discord Switching from Go to Rust &#8212; memo...
10    The Missing Semester of Your MIT Education &#8...
11    One of the biggest challenges operations group...
12    Standing on the Shoulders of Giants (Ben Evans...
13    Thunderbird on the Move (ZDNet) &#8212; the ne...
14    Towards a Human-like Open-Domain Chatbot &#821...
15    Reverb &#8212; speculative debugging for web a...
16    TinyML Book &#8212; machine learning for embed...
17    The Developer Coefficient (Stripe) &#8212;

In [55]:
df.loc[df['summary'].str.contains('machine learning'), 'title'].tolist()

['Four short links: 13 February 2020',
 'Four short links: 28 January 2020',
 'Four short links: 13 January 2020',
 'Why you should care about debugging machine learning models']
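
By default, `str.contains` is case-sensitive, so a summary that says "Machine Learning" would be skipped, and a missing summary produces a missing value in the mask. The sketch below shows one way to relax both with the `case` and `na` parameters. The titles and summaries are hypothetical sample data, not taken from the feed.

```python
import pandas as pd

# Hypothetical titles and summaries, including a capitalized match and a missing value.
sample = pd.DataFrame({
    "title": ["Post A", "Post B", "Post C", "Post D"],
    "summary": [
        "Notes on machine learning for embedded devices",
        "Machine Learning in production",
        "A history of empires",
        None,
    ],
})

# case=False ignores capitalization; na=False treats missing summaries as non-matches.
mask = sample["summary"].str.contains("machine learning", case=False, na=False)
ml_titles = sample.loc[mask, "title"].tolist()
print(ml_titles)  # ['Post A', 'Post B']
```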