# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [3]:
oreilly = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [4]:
oreilly.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])
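
The `bozo` key in the list above is feedparser's flag for a feed that was not well-formed. A minimal sketch of a sanity check before working with the entries follows. The helper name `describe_parse_result` is hypothetical, and it assumes only dictionary-style `.get` access, which the object returned by `feedparser.parse` supports. Plain dictionaries stand in for a parsed feed so the sketch runs without a network connection.

```python
def describe_parse_result(parsed):
    """Return a one-line summary of a feedparser-style result.

    `parsed` is assumed to be dict-like, for example the object returned
    by feedparser.parse. This helper is illustrative, not part of feedparser.
    """
    if parsed.get('bozo'):
        # bozo is truthy when the parser ran into a problem with the feed
        return 'feed may be malformed: ' + repr(parsed.get('bozo_exception'))
    return 'parsed ' + str(len(parsed.get('entries', []))) + ' entries'


# Stand-in dictionaries (hypothetical data) instead of a live feed
print(describe_parse_result({'bozo': 0, 'entries': [{}, {}]}))
print(describe_parse_result({'bozo': 1, 'bozo_exception': ValueError('bad xml')}))
```

With the feed from this lab, the call would be `describe_parse_result(oreilly)`.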

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [5]:
oreilly.feed.keys()

dict_keys(['title', 'title_detail', 'id', 'guidislink', 'link', 'updated', 'updated_parsed', 'subtitle', 'subtitle_detail', 'links', 'authors', 'author_detail', 'author', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [6]:
print(oreilly.feed.title)
print(oreilly.feed.subtitle)
print(oreilly.feed.author)
print(oreilly.feed.link)

    

All - O'Reilly Media
All of our Ideas and Learning material from all of our topics.
O'Reilly Media
https://www.oreilly.com


### 5. Count the number of entries that are contained in this RSS feed.

In [7]:
len(oreilly.entries)

60

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [8]:
oreilly.entries[0].keys()

dict_keys(['title', 'title_detail', 'updated', 'updated_parsed', 'id', 'guidislink', 'link', 'content', 'summary', 'links', 'authors', 'author_detail', 'author', 'feedburner_origlink'])

### 7. Extract a list of entry titles.

In [9]:
titles = [entry.title for entry in oreilly.entries]
titles

['Four short links: 12 July 2019',
 'Four short links: 11 July 2019',
 'Four short links: 10 July 2019',
 'Four short links: 9 July 2019',
 'The circle of fairness',
 "Highlights from the O'Reilly Artificial Intelligence Conference in Beijing 2019",
 'The future of hiring and the talent market with AI',
 'Designing computer hardware for artificial intelligence',
 'The future of machine learning is tiny',
 'AI and retail',
 'Data orchestration for AI, big data, and cloud',
 'AI and systems at RISELab',
 'Toward learned algorithms, data structures, and systems',
 'Top AI breakthroughs you need to know',
 'Four short links: 8 July 2019',
 'Four short links: 5 July 2019',
 'Four short links: 4 July 2019',
 'Tools for machine learning development',
 'New live online training courses',
 'Four short links: 3 July 2019',
 'Four short links: 2 July 2019',
 'Four short links: 1 July 2019',
 'RISELab’s AutoPandas hints at automation tech that will change the nature of software development',
 'Fou

### 8. Calculate the percentage of "Four short links" entry titles.

In [10]:
c = 0
for t in titles:
    if 'four short links' in t.lower():
        c += 1
print('percent: ' + str(c / len(titles) * 100))

percent: 38.333333333333336
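
The same calculation can be wrapped in a small reusable function, sketched below. The name `percent_matching` and the sample titles are hypothetical. The function adds a guard for an empty title list, which the loop above would hit as a division by zero.

```python
def percent_matching(titles, phrase):
    """Percentage of titles that contain `phrase`, ignoring case."""
    if not titles:
        return 0.0  # avoid dividing by zero when the feed has no entries
    # Each True counts as 1 when summed
    hits = sum(phrase.lower() in title.lower() for title in titles)
    return hits / len(titles) * 100


# Hypothetical sample; in the lab this would be the `titles` list
sample_titles = [
    'Four short links: 12 July 2019',
    'The circle of fairness',
    'AI and retail',
    'Four short links: 8 July 2019',
]
print(percent_matching(sample_titles, 'four short links'))  # 50.0
```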


### 9. Create a Pandas data frame from the feed's entries.

In [11]:
import pandas as pd

In [12]:
df = pd.DataFrame(oreilly.entries)
df.head()

Unnamed: 0,author,author_detail,authors,content,feedburner_origlink,guidislink,id,link,links,summary,title,title_detail,updated,updated_parsed
0,Nat Torkington,{'name': 'Nat Torkington'},[{'name': 'Nat Torkington'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/four-short-links...,True,"tag:www.oreilly.com,2019-07-12:/ideas/four-sho...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,"<p><em>Hosting Hate, Releasing, Government Inn...",Four short links: 12 July 2019,"{'type': 'text/plain', 'language': None, 'base...",2019-07-12T10:50:00Z,"(2019, 7, 12, 10, 50, 0, 4, 193, 0)"
1,Nat Torkington,{'name': 'Nat Torkington'},[{'name': 'Nat Torkington'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/four-short-links...,True,"tag:www.oreilly.com,2019-07-11:/ideas/four-sho...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,"<p><em>Museum Copyright, Twitter Apprenticeshi...",Four short links: 11 July 2019,"{'type': 'text/plain', 'language': None, 'base...",2019-07-11T12:10:00Z,"(2019, 7, 11, 12, 10, 0, 3, 192, 0)"
2,Nat Torkington,{'name': 'Nat Torkington'},[{'name': 'Nat Torkington'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/four-short-links...,True,"tag:www.oreilly.com,2019-07-10:/ideas/four-sho...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,"<p><em>Optimizations and Security, 512 Byte Pa...",Four short links: 10 July 2019,"{'type': 'text/plain', 'language': None, 'base...",2019-07-10T12:45:00Z,"(2019, 7, 10, 12, 45, 0, 2, 191, 0)"
3,Nat Torkington,{'name': 'Nat Torkington'},[{'name': 'Nat Torkington'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/four-short-links...,True,"tag:www.oreilly.com,2019-07-09:/ideas/four-sho...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,"<p><em>Future of Work, GRANDstack, Hilarious L...",Four short links: 9 July 2019,"{'type': 'text/plain', 'language': None, 'base...",2019-07-09T11:40:00Z,"(2019, 7, 9, 11, 40, 0, 1, 190, 0)"
4,Mike Loukides,{'name': 'Mike Loukides'},[{'name': 'Mike Loukides'}],"[{'type': 'text/html', 'language': None, 'base...",https://www.oreilly.com/ideas/the-circle-of-fa...,True,"tag:www.oreilly.com,2019-07-09:/ideas/the-circ...",http://feedproxy.google.com/~r/oreilly/radar/a...,[{'href': 'http://feedproxy.google.com/~r/orei...,<p><img src='https://d3ucjech6zwjp8.cloudfront...,The circle of fairness,"{'type': 'text/plain', 'language': None, 'base...",2019-07-09T11:00:00Z,"(2019, 7, 9, 11, 0, 0, 1, 190, 0)"
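
The `updated_parsed` column holds time tuples such as `(2019, 7, 12, 10, 50, 0, 4, 193, 0)`, which are awkward to sort or filter on. The sketch below converts one tuple into a timezone-aware `datetime`, assuming the tuple is expressed in UTC (the `Z` suffix in the `updated` column suggests it is). The helper name `to_utc_datetime` is hypothetical.

```python
import calendar
from datetime import datetime, timezone


def to_utc_datetime(time_tuple):
    """Convert a 9-item time tuple (assumed to be UTC) to an aware datetime."""
    # calendar.timegm interprets the tuple as UTC and returns epoch seconds
    seconds = calendar.timegm(time_tuple)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Tuple copied from the first row of the frame above
first_updated = to_utc_datetime((2019, 7, 12, 10, 50, 0, 4, 193, 0))
print(first_updated.isoformat())  # 2019-07-12T10:50:00+00:00
```

Applied to the frame, this could look like `df['updated_parsed'].apply(to_utc_datetime)`.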


### 10. Count the number of entries per author and sort them in descending order.

In [13]:
auth = df.groupby('author', as_index=False).agg({'content': 'count'})
auth.columns = ['author', 'entries']
auth.sort_values('entries', ascending=False)

|    | author | entries |
|---:|:---|---:|
| 25 | Nat Torkington | 23 |
| 3 | Ben Lorica | 4 |
| 13 | Jenn Webb | 3 |
| 24 | Mikio Braun | 1 |
| 19 | Maria Zhang | 1 |
| 20 | Michael Carducci | 1 |
| 21 | Michael Feathers | 1 |
| 22 | Michael James | 1 |
| 23 | Mike Loukides | 1 |
| 0 | Abigail Hing Wen | 1 |
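
As a cross-check that does not depend on pandas, the per-author counts can also be computed from the entries with `collections.Counter`. This is a sketch under the assumption that each entry is dict-like. The function name `entries_per_author` and the sample entries are hypothetical.

```python
from collections import Counter


def entries_per_author(entries):
    """Count entries per author, largest count first."""
    # .get guards against entries that carry no 'author' key
    counts = Counter(entry.get('author', 'unknown') for entry in entries)
    return counts.most_common()


# Hypothetical sample; in the lab this would be oreilly.entries
sample_entries = [
    {'author': 'Nat Torkington'},
    {'author': 'Ben Lorica'},
    {'author': 'Nat Torkington'},
    {'title': 'entry without an author'},
]
print(entries_per_author(sample_entries))
# [('Nat Torkington', 2), ('Ben Lorica', 1), ('unknown', 1)]
```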


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [14]:
# Add the column to df directly; assigning to a df[[...]] slice triggers SettingWithCopyWarning
df['len'] = df['title'].str.len()

df[['author', 'title', 'len']].sort_values('len', ascending=False)

|    | author | title | len |
|---:|:---|:---|---:|
| 22 | Ben Lorica | RISELab’s AutoPandas hints at automation tech ... | 97 |
| 45 | Everett Harper | Infrastructure first: Because solving complex ... | 81 |
| 5 | Jenn Webb | Highlights from the O'Reilly Artificial Intell... | 79 |
| 47 | Jenn Webb | Highlights from the O'Reilly Software Architec... | 78 |
| 32 | Ben Lorica | Enabling end-to-end machine learning pipelines... | 73 |
| 27 | Ben Lorica | AI and machine learning will require retrainin... | 72 |
| 53 | Lena Hall | Channel into the universe of eventually perfec... | 67 |
| 56 | Mac Slocum | Highlights from the O'Reilly Velocity Conferen... | 65 |
| 54 | Chen Goldberg | Scaling teams with technology (or is it the ot... | 62 |
| 38 | Adam Tornhill | Prioritizing technical debt as if time and mon... | 57 |


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [43]:
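# Literal, case-sensitive match; na=False treats missing summaries as non-matches
mask = df['summary'].str.contains('machine learning', regex=False, na=False)
d2 = df.loc[mask]

print(d2['title'])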

5     <p><em>Serverless Microservice Patterns, Organ...
8     <p><em>Serverless Microservice Patterns, Organ...
17    <p><em>Serverless Microservice Patterns, Organ...
18    <p><em>Serverless Microservice Patterns, Organ...
22    <p><em>Serverless Microservice Patterns, Organ...
27    <p><em>Serverless Microservice Patterns, Organ...
32    <p><em>Serverless Microservice Patterns, Organ...
34    <p><em>Serverless Microservice Patterns, Organ...
36    <p><em>Serverless Microservice Patterns, Organ...
Name: title, dtype: object
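
The summaries in this feed are HTML fragments, so a phrase can be split by tags or differ in capitalization, and the exercise asks for a list rather than a Series. The sketch below shows one way to handle both, using only the standard library. The helper `summary_mentions` and the sample entries are hypothetical, and the regular expression is a crude tag stripper, not a full HTML parser.

```python
import re
from html import unescape


def summary_mentions(summary_html, phrase):
    """True if `phrase` appears in the visible text of an HTML summary."""
    text = re.sub(r'<[^>]+>', ' ', summary_html)  # crude tag removal
    text = ' '.join(unescape(text).split())       # decode entities, squeeze whitespace
    return phrase.lower() in text.lower()


# Hypothetical sample; in the lab this would be oreilly.entries
sample_entries = [
    {'title': 'Tools for ML', 'summary': '<p>Tools for <em>Machine</em> learning teams</p>'},
    {'title': 'AI and retail', 'summary': '<p>Retail trends</p>'},
]
ml_titles = [entry['title'] for entry in sample_entries
             if summary_mentions(entry['summary'], 'machine learning')]
print(ml_titles)  # ['Tools for ML']
```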
