# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [3]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [7]:
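# This notebook parses the r/tech feed rather than the `url` defined above,
# so every output below describes r/tech.
reddit = feedparser.parse('https://www.reddit.com/r/tech.rss')
print(reddit['feed'])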

{'tags': [{'term': 'tech', 'scheme': None, 'label': 'r/tech'}], 'updated': '2020-05-20T04:52:30+00:00', 'updated_parsed': time.struct_time(tm_year=2020, tm_mon=5, tm_mday=20, tm_hour=4, tm_min=52, tm_sec=30, tm_wday=2, tm_yday=141, tm_isdst=0), 'icon': 'https://www.redditstatic.com/icon.png/', 'id': 'https://www.reddit.com/r/tech.rss', 'guidislink': True, 'link': 'https://www.reddit.com/r/tech', 'links': [{'rel': 'self', 'href': 'https://www.reddit.com/r/tech.rss', 'type': 'application/atom+xml'}, {'rel': 'alternate', 'href': 'https://www.reddit.com/r/tech', 'type': 'text/html'}], 'logo': 'https://f.thumbs.redditmedia.com/kI7eGVG6kaObGTdM.png', 'subtitle': 'The goal of /r/tech is to provide a space dedicated to the intelligent discussion of innovations and changes to technology in our ever changing world. We focus on high quality news articles about technology and informative and thought provoking self posts.', 'subtitle_detail': {'type': 'text/plain', 'language': None, 'base': 'https:

### 2. Obtain a list of components (keys) that are available for this feed.

In [9]:
reddit.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [10]:
reddit['feed'].keys()

dict_keys(['tags', 'updated', 'updated_parsed', 'icon', 'id', 'guidislink', 'link', 'links', 'logo', 'subtitle', 'subtitle_detail', 'title', 'title_detail'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [18]:
print(reddit.feed.title)
print(reddit.feed.subtitle)
# The feed itself has no 'author' key (see the keys listed in exercise 3),
# so fall back to the author of the first entry.
print(reddit.entries[0].author)
print(reddit.feed.link)

/r/tech: Technological innovations and changes.
The goal of /r/tech is to provide a space dedicated to the intelligent discussion of innovations and changes to technology in our ever changing world. We focus on high quality news articles about technology and informative and thought provoking self posts.
/u/OriginalHoneyBadger
https://www.reddit.com/r/tech
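
Not every feed provides every field; this one, for example, lists no `author` among its feed-level keys. A minimal sketch of reading optional fields without raising an error follows. The `describe_feed` helper and the `<not provided>` placeholder are hypothetical, and a plain dict stands in for `reddit.feed` so the sketch runs without a network call.

```python
# Stand-in for reddit.feed; the values are copied from the output above.
feed = {
    "title": "/r/tech: Technological innovations and changes.",
    "link": "https://www.reddit.com/r/tech",
}


def describe_feed(feed, fields=("title", "subtitle", "author", "link")):
    """Return the requested fields, marking absent ones instead of raising."""
    return {field: feed.get(field, "<not provided>") for field in fields}


for field, value in describe_feed(feed).items():
    print(f"{field}: {value}")
```

The same `.get()` call should work on the objects feedparser returns, since they behave like dictionaries.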


### 5. Count the number of entries that are contained in this RSS feed.

In [22]:
import pandas as pd

# Each entry becomes one row, so the row count equals the number of entries.
df = pd.DataFrame(reddit.entries)
len(df)

26

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [31]:
entry_keys = reddit.entries[0].keys()
print(entry_keys)

dict_keys(['authors', 'author_detail', 'href', 'author', 'tags', 'content', 'summary', 'id', 'guidislink', 'link', 'links', 'updated', 'updated_parsed', 'title', 'title_detail'])


### 7. Extract a list of entry titles.

In [32]:
titles = [entry.title for entry in reddit.entries]
print(titles)

["/r/Tech now has it's own Discord server!", 'The farms growing beneath our cities', 'Dogs Obey Commands Given by Social Robots', 'Dyson’s scrapped electric car: founder reveals what could have been', 'Digital Overload: Average Adult Will Spend 34 Years Of Their Life Staring At Screens', 'Microsoft to adapt its cloud software for healthcare industry', '3 Questions: The rapidly unfolding future of smart fabrics', 'Maybe it’s time to retire the idea of “going viral”', 'App implementation based on your submissions', 'weird noise in headphones', 'Free Email Signature Generator by Exclaimer', 'Self-disinfecting mask that works with Face ID in development', 'Corkscrew light promises higher optical-communication data rates', 'Uber closes 45 offices and lays off another 3,000 employees', 'The role of cloud tech in saving the endangered Tasmanian devil', "IBM's new open-source tool helps developers make their apps more accessible", 'Why new U.S. rules on selling chips to Huawei could be a ‘big 

### 8. Calculate the percentage of "Four short links" entry titles.

In [38]:
# Share of entry titles that contain the phrase "Four short links".
# The phrase is a recurring series title in the O'Reilly Radar feed stored in `url`;
# the feed parsed in this notebook is r/tech, so the share may well be 0 here.
four_short_links = df['title'].str.contains('Four short links', case=False, regex=False)
percentage = four_short_links.mean() * 100
print(f'{percentage:.1f}% of entry titles contain "Four short links"')
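
The percentage asked for in this exercise is the number of matching titles divided by the total number of titles. A pandas-free sketch of that arithmetic follows; the `percentage_with_phrase` helper and the sample titles are hypothetical and not taken from the feed.

```python
def percentage_with_phrase(titles, phrase):
    """Return the percentage of titles containing phrase (case-insensitive)."""
    if not titles:
        return 0.0
    needle = phrase.lower()
    matches = sum(needle in title.lower() for title in titles)
    return 100 * matches / len(titles)


# Hypothetical titles, made up for illustration.
sample_titles = [
    "Four short links: 19 May 2020",
    "The farms growing beneath our cities",
    "Four short links: 18 May 2020",
    "Dogs Obey Commands Given by Social Robots",
]
print(percentage_with_phrase(sample_titles, "Four short links"))  # 50.0
```

In the notebook, the same function could be called with the `titles` list built in exercise 7.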


### 9. Create a Pandas data frame from the feed's entries.

In [None]:
import pandas as pd

df = pd.DataFrame(reddit.entries)

In [34]:
df.head()

Unnamed: 0,authors,author_detail,href,author,tags,content,summary,id,guidislink,link,links,updated,updated_parsed,title,title_detail
0,"[{'name': '/u/OriginalHoneyBadger', 'href': 'h...","{'name': '/u/OriginalHoneyBadger', 'href': 'ht...",https://www.reddit.com/user/OriginalHoneyBadger,/u/OriginalHoneyBadger,"[{'term': 'tech', 'scheme': None, 'label': 'r/...","[{'type': 'text/html', 'language': None, 'base...","<!-- SC_OFF --><div class=""md""><p>Hey guys!</p...",https://www.reddit.com/r/t3_7dx2ew,True,https://www.reddit.com/r/tech/comments/7dx2ew/...,[{'href': 'https://www.reddit.com/r/tech/comme...,2017-11-19T00:37:30+00:00,"(2017, 11, 19, 0, 37, 30, 6, 323, 0)",/r/Tech now has it's own Discord server!,"{'type': 'text/plain', 'language': None, 'base..."
1,"[{'name': '/u/urbanrenaissance', 'href': 'http...","{'name': '/u/urbanrenaissance', 'href': 'https...",https://www.reddit.com/user/urbanrenaissance,/u/urbanrenaissance,"[{'term': 'tech', 'scheme': None, 'label': 'r/...","[{'type': 'text/html', 'language': None, 'base...","&#32; submitted by &#32; <a href=""https://www....",https://www.reddit.com/r/t3_gn0wub,True,https://www.reddit.com/r/tech/comments/gn0wub/...,[{'href': 'https://www.reddit.com/r/tech/comme...,2020-05-20T00:26:54+00:00,"(2020, 5, 20, 0, 26, 54, 2, 141, 0)",The farms growing beneath our cities,"{'type': 'text/plain', 'language': None, 'base..."
2,"[{'name': '/u/bittubruh', 'href': 'https://www...","{'name': '/u/bittubruh', 'href': 'https://www....",https://www.reddit.com/user/bittubruh,/u/bittubruh,"[{'term': 'tech', 'scheme': None, 'label': 'r/...","[{'type': 'text/html', 'language': None, 'base...","&#32; submitted by &#32; <a href=""https://www....",https://www.reddit.com/r/t3_gmj2cz,True,https://www.reddit.com/r/tech/comments/gmj2cz/...,[{'href': 'https://www.reddit.com/r/tech/comme...,2020-05-19T06:18:22+00:00,"(2020, 5, 19, 6, 18, 22, 1, 140, 0)",Dogs Obey Commands Given by Social Robots,"{'type': 'text/plain', 'language': None, 'base..."
3,"[{'name': '/u/ryeshoes', 'href': 'https://www....","{'name': '/u/ryeshoes', 'href': 'https://www.r...",https://www.reddit.com/user/ryeshoes,/u/ryeshoes,"[{'term': 'tech', 'scheme': None, 'label': 'r/...","[{'type': 'text/html', 'language': None, 'base...","&#32; submitted by &#32; <a href=""https://www....",https://www.reddit.com/r/t3_gmejb0,True,https://www.reddit.com/r/tech/comments/gmejb0/...,[{'href': 'https://www.reddit.com/r/tech/comme...,2020-05-19T01:04:40+00:00,"(2020, 5, 19, 1, 4, 40, 1, 140, 0)",Dyson’s scrapped electric car: founder reveals...,"{'type': 'text/plain', 'language': None, 'base..."
4,"[{'name': '/u/djwired', 'href': 'https://www.r...","{'name': '/u/djwired', 'href': 'https://www.re...",https://www.reddit.com/user/djwired,/u/djwired,"[{'term': 'tech', 'scheme': None, 'label': 'r/...","[{'type': 'text/html', 'language': None, 'base...","&#32; submitted by &#32; <a href=""https://www....",https://www.reddit.com/r/t3_gm0o0q,True,https://www.reddit.com/r/tech/comments/gm0o0q/...,[{'href': 'https://www.reddit.com/r/tech/comme...,2020-05-18T12:37:44+00:00,"(2020, 5, 18, 12, 37, 44, 0, 139, 0)",Digital Overload: Average Adult Will Spend 34 ...,"{'type': 'text/plain', 'language': None, 'base..."


### 10. Count the number of entries per author and sort them in descending order.

In [41]:
authors = df.groupby('author', as_index=False).agg({'title':'count'})
authors.columns = ['author', 'entries']
authors.sort_values('entries', ascending=False)

Unnamed: 0,author,entries
13,/u/jsamwrites,4
11,/u/eugeneching,2
9,/u/eberkut,2
0,/u/CrazyGobler,1
19,/u/trevor25,1
18,/u/surfinThruLyfe,1
17,/u/snooshoe,1
16,/u/ryeshoes,1
15,/u/nasirbobby,1
14,/u/moist_pringles69,1
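
The same per-author tally can be sketched without pandas using `collections.Counter` from the standard library. The `entries_per_author` helper and the sample list are hypothetical; the author names are borrowed from the output above, but the counts are made up.

```python
from collections import Counter


def entries_per_author(authors):
    """Return (author, count) pairs, most frequent author first."""
    return Counter(authors).most_common()


# Hypothetical list of entry authors.
sample_authors = [
    "/u/jsamwrites",
    "/u/eberkut",
    "/u/jsamwrites",
    "/u/ryeshoes",
    "/u/eberkut",
    "/u/jsamwrites",
]
for author, count in entries_per_author(sample_authors):
    print(author, count)
```

In the notebook, the input could be built with `[entry.author for entry in reddit.entries]`.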


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [42]:
df['title_length'] = df['title'].apply(len)
df[['title', 'author', 'title_length']].sort_values('title_length', ascending=False).head()

Unnamed: 0,title,author,title_length
20,Everything OK with Microsoft? Windows giant ad...,/u/jsamwrites,115
23,The biggest change in how you use your iPhone ...,/u/eugeneching,105
21,The System That Actually Worked: How the inter...,/u/eberkut,100
16,Why new U.S. rules on selling chips to Huawei ...,/u/snooshoe,94
17,"Tesla's next factory is going to be in Austin,...",/u/MichaelTen,85
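
As a cross-check on the data frame approach, the ranking by title length can be sketched in plain Python. The `longest_titles` helper and the sample entries are hypothetical; each entry is assumed to be a dict with `title` and `author` keys, as feedparser entries are.

```python
def longest_titles(entries, top_n=5):
    """Return (title, author, title_length) tuples, longest title first."""
    rows = [(e["title"], e["author"], len(e["title"])) for e in entries]
    return sorted(rows, key=lambda row: row[2], reverse=True)[:top_n]


# Hypothetical entries; titles and authors are borrowed from the outputs
# above, but the pairings are illustrative only.
sample_entries = [
    {"title": "weird noise in headphones", "author": "/u/trevor25"},
    {"title": "Dogs Obey Commands Given by Social Robots", "author": "/u/bittubruh"},
    {"title": "The farms growing beneath our cities", "author": "/u/urbanrenaissance"},
]
for title, author, length in longest_titles(sample_entries):
    print(length, author, title)
```

Note that `len` counts characters (Unicode code points), which matches what `df['title'].apply(len)` computes.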


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."
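
Entry summaries in this feed are HTML fragments (they contain tags and entities such as `&#32;`), so a phrase search may be more reliable on the visible text and without regard to case. A minimal sketch follows, assuming a simple regular expression is good enough to strip tags; the `summary_mentions` helper and the sample summaries are hypothetical.

```python
import html
import re

# Crude tag matcher; adequate for a sketch, not a full HTML parser.
TAG_PATTERN = re.compile(r"<[^>]+>")


def summary_mentions(summary_html, phrase):
    """Check whether phrase occurs in the visible text of an HTML summary."""
    text = TAG_PATTERN.sub(" ", summary_html)  # drop tags and comments
    text = html.unescape(text)                 # decode entities such as &#32;
    text = " ".join(text.split())              # collapse runs of whitespace
    return phrase.lower() in text.lower()


# Hypothetical summaries, made up for illustration.
with_phrase = '<!-- SC_OFF --><div class="md"><p>New work on <b>Machine</b>&#32;Learning for chips</p></div>'
without_phrase = '&#32; submitted by &#32; <a href="https://example.com/user">someone</a>'

print(summary_mentions(with_phrase, "machine learning"))
print(summary_mentions(without_phrase, "machine learning"))
```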

In [61]:
# Boolean mask of entries whose summary contains the phrase (case-sensitive match).
ml_mask = df['summary'].str.contains('machine learning', regex=False)
# Keep only the titles of the matching entries, as a plain list.
ml_titles = df.loc[ml_mask, 'title'].tolist()
ml_titles

[]