# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [8]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [14]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [15]:
rss_parsed = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [19]:
keys = list(rss_parsed.keys())
keys

['feed',
 'entries',
 'bozo',
 'headers',
 'etag',
 'updated',
 'updated_parsed',
 'href',
 'status',
 'encoding',
 'version',
 'namespaces']

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [23]:
keys_parsed = list(rss_parsed["feed"].keys())
keys_parsed

['title',
 'title_detail',
 'links',
 'link',
 'subtitle',
 'subtitle_detail',
 'updated',
 'updated_parsed',
 'language',
 'sy_updateperiod',
 'sy_updatefrequency',
 'generator_detail',
 'generator',
 'feedburner_info',
 'geo_lat',
 'geo_long',
 'feedburner_emailserviceid',
 'feedburner_feedburnerhostname']

### 4. Extract and print the feed title, subtitle, author, and link.

In [29]:
title = rss_parsed["feed"]["title"]
subtitle = rss_parsed["feed"]["subtitle"]
link = rss_parsed["feed"]["link"]
# This feed exposes no feed-level 'author' key (see the list in step 3), so no author is printed

print(title)
print(subtitle)
print(link)

Radar
Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology
https://www.oreilly.com/radar
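
The key list in step 3 contains no `author`, so indexing `rss_parsed["feed"]["author"]` would fail for this feed. One way to handle optional fields is `dict.get` with a default, which the dictionary-like objects returned by feedparser also support. The sketch below uses a plain dictionary, `sample_feed`, as a stand-in for `rss_parsed["feed"]`.

```python
# Stand-in for rss_parsed["feed"]; values copied from the output above
sample_feed = {
    "title": "Radar",
    "link": "https://www.oreilly.com/radar",
}

# .get returns the default instead of raising KeyError when the key is absent
for field in ("title", "author", "link"):
    print(f"{field}: {sample_feed.get(field, 'not provided')}")
```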


### 5. Count the number of entries that are contained in this RSS feed.

In [31]:
len(rss_parsed["entries"])

60
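
Calling `len` on the parsed result itself counts its top-level keys (the twelve listed in step 2), while the number of posts is the length of the `entries` list. A small sketch with a hand-built dictionary, `sample_parsed`, standing in for the parsed feed:

```python
# Hypothetical stand-in: three top-level keys, two entries
sample_parsed = {
    "feed": {"title": "Sample"},
    "entries": [{"title": "First"}, {"title": "Second"}],
    "bozo": False,
}

print(len(sample_parsed))             # number of top-level keys
print(len(sample_parsed["entries"]))  # number of entries
```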

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [33]:
keys_entries = [rss for rss in rss_parsed["entries"][0]]
keys_entries

['title',
 'title_detail',
 'links',
 'link',
 'comments',
 'published',
 'published_parsed',
 'authors',
 'author',
 'author_detail',
 'tags',
 'id',
 'guidislink',
 'summary',
 'summary_detail',
 'content',
 'wfw_commentrss',
 'slash_comments',
 'feedburner_origlink']

### 7. Extract a list of entry titles.

In [36]:
title_entries = [rss["title"] for rss in rss_parsed["entries"]]
title_entries

['Four short links: 14 February 2020',
 'Four short links: 13 February 2020',
 'The state of data quality in 2020',
 'Four short links: 12 February 2020',
 'Four short links: 11 February 2020',
 'Four short links: 10 February 2020',
 'Four short links: 7 February 2020',
 'Radar trends to watch: February 2020',
 'Four short links: 6 February 2020',
 'Four short links: 5 February 2020',
 'Four short links: 4 February 2020',
 'AI meets operations',
 'Four short links: 3 February 2020',
 'Four short links: 31 January 2020',
 'Four short links: 30 January 2020',
 'Four short links: 29 January 2020',
 'Four short links: 28 January 2020',
 'Four short links: 27 January 2020',
 'Four short links: 24 January 2020',
 'Four short links: 23 January 2020',
 'Four short links: 22 January 2020',
 'Four short links: 21 January 2020',
 'Four short links: 20 January 2020',
 'Four short links: 17 January 2020',
 'Four short links: 16 January 2020',
 'Reinforcement learning for the real world',
 'Four sho

### 8. Calculate the percentage of "Four short links" entry titles.

In [57]:
title_entries = [rss["title"] for rss in rss_parsed["entries"]]

four_short_entries = []
for title in title_entries:
    if title.startswith("Four short links"):
        four_short_entries.append(title)

print(f"Percentage of 'Four short links' is: {(len(four_short_entries) * 100) / len(title_entries)}")

Percentage of 'Four short links' is: 78.33333333333333
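
The same ratio can be computed without building an intermediate list, by summing a boolean test over the titles. This is a sketch on a made-up list, `sample_titles`, not on the live feed:

```python
sample_titles = [
    "Four short links: 14 February 2020",
    "The state of data quality in 2020",
    "Four short links: 12 February 2020",
    "AI meets operations",
]

# True counts as 1, so the sum is the number of matching titles
matches = sum(title.startswith("Four short links") for title in sample_titles)
percentage = 100 * matches / len(sample_titles)

print(f"{percentage:.1f}%")  # 2 of 4 titles match, so 50.0%
```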


### 9. Create a Pandas data frame from the feed's entries.

In [47]:
import pandas as pd

In [50]:
entries = [rss for rss in rss_parsed["entries"]]
feed_entries_df = pd.DataFrame(entries)
feed_entries_df.head()

,title,title_detail,links,link,comments,published,published_parsed,authors,author,author_detail,tags,id,guidislink,summary,summary_detail,content,wfw_commentrss,slash_comments,feedburner_origlink
0,Four short links: 14 February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Fri, 14 Feb 2020 05:01:00 +0000","(2020, 2, 14, 5, 1, 0, 4, 45, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11959,False,ABD &#8212; Course materials for Advanced Bina...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
1,Four short links: 13 February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Thu, 13 Feb 2020 05:01:00 +0000","(2020, 2, 13, 5, 1, 0, 3, 44, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11952,False,Ofcom To Regulate UK Internet &#8212; The regu...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
2,The state of data quality in 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/the-state-of-dat...,"Wed, 12 Feb 2020 06:00:00 +0000","(2020, 2, 12, 6, 0, 0, 2, 43, 0)",[{'name': 'Roger Magoulas and Steve Swoyer'}],Roger Magoulas and Steve Swoyer,{'name': 'Roger Magoulas and Steve Swoyer'},"[{'term': 'AI & ML', 'scheme': None, 'label': ...",https://www.oreilly.com/radar/?p=11549,False,We suspected that data quality was a topic bri...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/the-state-of-dat...,0,https://www.oreilly.com/radar/the-state-of-dat...
3,Four short links: 12 February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Wed, 12 Feb 2020 05:01:00 +0000","(2020, 2, 12, 5, 1, 0, 2, 43, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11936,False,Drafting an Engineering Strategy (Mathias Meye...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...
4,Four short links: 11 February 2020,"{'type': 'text/plain', 'language': None, 'base...","[{'rel': 'alternate', 'type': 'text/html', 'hr...",http://feedproxy.google.com/~r/oreilly/radar/a...,https://www.oreilly.com/radar/four-short-links...,"Tue, 11 Feb 2020 05:01:00 +0000","(2020, 2, 11, 5, 1, 0, 1, 42, 0)",[{'name': 'Nat Torkington'}],Nat Torkington,{'name': 'Nat Torkington'},"[{'term': 'Four Short Links', 'scheme': None, ...",https://www.oreilly.com/radar/?p=11773,False,The Fate of Empires &#8212; 1977 text summariz...,"{'type': 'text/html', 'language': None, 'base'...","[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/radar/four-short-links...,0,https://www.oreilly.com/radar/four-short-links...


### 10. Count the number of entries per author and sort them in descending order.

In [80]:
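df_by_author = feed_entries_df.groupby(['author'], sort=False).size()
# Sort by count, highest first; kind="stable" makes the order of tied authors deterministic
df_by_author = df_by_author.sort_values(ascending=False, kind="stable")
df_by_author.head(10)

author
Nat Torkington                     47
Mike Loukides                       4
Jenn Webb                           2
                                    2
Roger Magoulas and Steve Swoyer     1
Zan McQuade and Amanda Quinn        1
Roger Magoulas                      1
Alison McCauley                     1
Patrick Hall and Andrew Burt        1
dtype: int64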
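
For a quick tally outside pandas, `collections.Counter` from the standard library gives the same kind of count, and `most_common()` returns it sorted from highest to lowest. The author list below is invented for the example.

```python
from collections import Counter

# Hypothetical author column; in the lab this would come from the feed entries
sample_authors = [
    "Nat Torkington", "Mike Loukides", "Nat Torkington",
    "Jenn Webb", "Nat Torkington", "Mike Loukides",
]

author_counts = Counter(sample_authors)

# most_common() yields (author, count) pairs in descending order of count
for author, count in author_counts.most_common():
    print(f"{author}: {count}")
```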

### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [81]:
# Take an explicit copy so that adding a column does not trigger SettingWithCopyWarning
new_df = feed_entries_df[["title", "author"]].copy()
new_df["title length"] = new_df["title"].str.len()

# Longest titles first
new_df.sort_values("title length", ascending=False)


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [87]:
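# Keep the entries whose summary contains "machine learning" (case-sensitive match)
ml_entries = [entry for entry in rss_parsed["entries"] if "machine learning" in entry["summary"]]

# The titles the exercise asks for
ml_titles = [entry["title"] for entry in ml_entries]

# Display the matching summaries to confirm the phrase is present
ml_summary_entries = [entry["summary"] for entry in ml_entries]
ml_summary_entries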

['Ofcom To Regulate UK Internet &#8212; The regulator will play a key role in enforcing a statutory duty of care to protect users from harmful and illegal terrorist and child abuse content. Turing &#8212; Julia library for fast machine learning. The Effects of Prize Structures on Innovative Performance &#8212; We find that a winner-takes-all compensation [&#8230;]',
 'TinyML Book &#8212; machine learning for embedded systems, an O&#8217;Reilly book by Pete Warden and Daniel Sityunake. Useful Probability for Systems Programmers &#8212; interesting findings like: If you have 1N chance of success, then you’re more likely than not to have succeeded after N tries, but the probability is only about two thirds. Cost of [&#8230;]',
 'Simulated Customer &#8212; The site will randomly generate one of 40 different [sales] objections, and give you 20 seconds to answer it. From Shallow to Deep Interactions Between Knowledge Representation, Reasoning, and Machine Learning &#8212; This paper proposes