# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [3]:
feedburner = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [18]:
feedburner.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [19]:
feedburner.feed.keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [34]:
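# Print whichever requested fields the feed provides, in the feed's own key order.
# 'author' is not among feedburner.feed's keys (see step 3), so only three values print.
for key in feedburner.feed.keys():
    if key in ['title', 'subtitle', 'author', 'link']:
        print(feedburner.feed[key])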

Radar
https://www.oreilly.com/radar
Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology

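Looping over `feedburner.feed.keys()` prints only the fields that happen to exist, in the feed's own order, so a missing field such as `author` is skipped without any notice. A sketch that asks for each field by name and falls back to a placeholder avoids this. It relies on feedparser's `FeedParserDict` being a `dict` subclass, so `.get()` works on it. The `feed` dict below is a hypothetical stand-in for `feedburner.feed`, and the `'N/A'` default is an arbitrary choice.

```python
def describe_feed(feed, fields=('title', 'subtitle', 'author', 'link'), default='N/A'):
    """Return the requested fields in a fixed order, using a default for missing ones."""
    return {field: feed.get(field, default) for field in fields}

# Hypothetical stand-in for feedburner.feed; note that there is no 'author' key
feed = {
    'title': 'Radar',
    'link': 'https://www.oreilly.com/radar',
    'subtitle': ('Now, next, and beyond: Tracking need-to-know trends '
                 'at the intersection of business and technology'),
}

for field, value in describe_feed(feed).items():
    print(f'{field}: {value}')
```

Every requested field now appears in a predictable order, and the missing author is reported as `N/A` instead of disappearing.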

### 5. Count the number of entries that are contained in this RSS feed.

In [39]:
len(feedburner.entries)

60

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [40]:
feedburner.entries[0].keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'comments', 'published', 'published_parsed', 'authors', 'author', 'author_detail', 'tags', 'id', 'guidislink', 'summary', 'summary_detail', 'content', 'wfw_commentrss', 'slash_comments', 'feedburner_origlink'])
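The keys of `feedburner.entries[0]` describe only the first entry. Other entries may lack optional fields such as `author` or `tags`. A short sketch that compares key sets across all entries, using plain dicts as hypothetical stand-ins for parsed entries:

```python
def key_coverage(entries):
    """Return (keys present in every entry, keys present in at least one entry)."""
    key_sets = [set(entry.keys()) for entry in entries]
    if not key_sets:
        return set(), set()
    return set.intersection(*key_sets), set.union(*key_sets)

# Hypothetical entries: the second one has no 'author' field
entries = [
    {'title': 'A', 'link': 'https://example.com/a', 'author': 'Nat Torkington'},
    {'title': 'B', 'link': 'https://example.com/b'},
]

common, any_entry = key_coverage(entries)
print(sorted(common))               # ['link', 'title']
print(sorted(any_entry - common))   # ['author']
```

Keys in the second set but not the first are the ones worth accessing with `.get()` rather than attribute or bracket access.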

### 7. Extract a list of entry titles.

In [43]:
# Collect the title of every entry
lst = [entry.title for entry in feedburner.entries]

### 8. Calculate the percentage of "Four short links" entry titles.

In [48]:
# Count titles containing "Four short" (the "Four short links" series) and report a percentage
c = sum('Four short' in title for title in lst)
print(f'{c / len(lst) * 100:.2f}%')

51.67%
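The cell above matches the substring `'Four short'` anywhere in a title. A slightly stricter sketch, assuming these posts' titles *begin* with "Four short links", matches the prefix instead and guards against an empty list. The first and third sample titles are hypothetical; the other two come from this feed.

```python
def percent_with_prefix(titles, prefix):
    """Percentage of titles that start with prefix; 0.0 for an empty list."""
    if not titles:
        return 0.0
    matches = sum(title.startswith(prefix) for title in titles)
    return 100 * matches / len(titles)

titles = [
    'Four short links: 12 February 2021',
    'Radar trends to watch: February 2021',
    'Four short links: 11 February 2021',
    'Seven Legal Questions for Data Scientists',
]
print(f"{percent_with_prefix(titles, 'Four short links'):.2f}%")  # 50.00%
```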


### 9. Create a Pandas data frame from the feed's entries.

In [51]:
import pandas as pd

In [58]:
df = pd.DataFrame(feedburner.entries)
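Some entry fields are nested. In feedparser output, `tags` is a list of dictionaries, each with a `term` key, so the resulting column holds Python lists rather than scalars. A sketch, using hypothetical entry dicts, that flattens the tag terms into one row per tag with `Series.explode()` and counts them:

```python
import pandas as pd

# Hypothetical entries shaped like feedparser output (tags: list of dicts with 'term')
entries = [
    {'title': 'A', 'tags': [{'term': 'AI & ML'}, {'term': 'Radar Column'}]},
    {'title': 'B', 'tags': [{'term': 'AI & ML'}]},
    {'title': 'C'},  # no tags at all, so this row gets NaN in the 'tags' column
]
df = pd.DataFrame(entries)

# Turn each list of tag dicts into a list of terms; missing tags become an empty list
df['tag_terms'] = df['tags'].apply(
    lambda tags: [tag['term'] for tag in tags] if isinstance(tags, list) else []
)

# explode() yields one row per term; empty lists become NaN, which value_counts() drops
tag_counts = df['tag_terms'].explode().value_counts()
print(tag_counts)
```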

### 10. Count the number of entries per author and sort them in descending order.

In [63]:
# value_counts() sorts by count in descending order by default
df['author'].value_counts()

Nat Torkington                                    32
Mike Loukides                                     16
                                                   4
Q Ethan McCallum and Mike Loukides                 1
Justin Norman and Mike Loukides                    1
Kevlin Henney                                      1
Matthew Rocklin and Hugo Bowne-Anderson            1
Q Ethan McCallum, Chris Butler and Shane Glynn     1
Patrick Hall and Ayoub Ouederni                    1
Tim O’Reilly                                       1
Alex Castrounis                                    1
Name: author, dtype: int64
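`value_counts()` treats each co-author string (for example `Q Ethan McCallum and Mike Loukides`) as a distinct author, and the blank row counts entries without a named author. A sketch that credits each co-author individually, assuming names are separated by commas or the word "and" (a heuristic that would wrongly split a name containing " and "). The `authors` Series is hypothetical, built from name patterns seen above.

```python
import re

import pandas as pd

# Hypothetical author column mirroring the shapes seen above, including a blank author
authors = pd.Series([
    'Mike Loukides',
    'Q Ethan McCallum and Mike Loukides',
    'Q Ethan McCallum, Chris Butler and Shane Glynn',
    '',
])

def split_authors(value):
    """Split 'A, B and C' into ['A', 'B', 'C']; blank or missing values give []."""
    if not isinstance(value, str) or not value.strip():
        return []
    return [name.strip() for name in re.split(r',\s*|\s+and\s+', value) if name.strip()]

# One row per individual author; empty lists explode to NaN and are dropped
per_author = authors.apply(split_authors).explode().dropna().value_counts()
print(per_author)
```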

### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry, sorted by title length in descending order (longest title at the top).

In [77]:
df['Number of characters'] = df['title'].str.len()

df[['title','author','Number of characters']].sort_values('Number of characters',ascending=False)

| | title | author | Number of characters |
|---|---|---|---|
| 58 | Why Best-of-Breed is a Better Choice than All-... | Matthew Rocklin and Hugo Bowne-Anderson | 79 |
| 8 | Where Programming, Ops, AI, and the Cloud are ... | Mike Loukides | 60 |
| 5 | 5 infrastructure and operations trends to watc... | | 55 |
| 14 | O’Reilly’s top 20 live online training courses... | | 54 |
| 4 | 5 things on our data and AI radar for 2021 | | 42 |
| 9 | Seven Legal Questions for Data Scientists | Patrick Hall and Ayoub Ouederni | 41 |
| 0 | The End of Silicon Valley as We Know It? | Tim O’Reilly | 40 |
| 36 | AI Product Management After Deployment | Justin Norman and Mike Loukides | 38 |
| 52 | Radar trends to watch: September 2020 | Mike Loukides | 37 |
| 7 | Radar trends to watch: February 2021 | Mike Loukides | 36 |
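When only the longest few titles are needed, `DataFrame.nlargest()` selects them without sorting the whole frame. A sketch with a hypothetical three-row frame built from titles in this feed; the `title_length` column name is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'title': [
        'Seven Legal Questions for Data Scientists',
        'The End of Silicon Valley as We Know It?',
        '5 things on our data and AI radar for 2021',
    ],
    'author': ['Patrick Hall and Ayoub Ouederni', 'Tim O’Reilly', None],
})

# Number of characters in each title
df['title_length'] = df['title'].str.len()

# The two longest titles, longest first
longest = df.nlargest(2, 'title_length')[['title', 'author', 'title_length']]
print(longest)
```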


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [103]:
# Boolean mask: True where the summary contains the exact (case-sensitive) phrase
df.loc[df['summary'].str.contains('machine learning', regex=False, na=False), 'title']

9    Seven Legal Questions for Data Scientists
Name: title, dtype: object
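The filter above is case-sensitive, so a summary that says "Machine Learning" would be missed, and it returns a Series rather than the list the exercise asks for. A sketch using `Series.str.contains()` with `case=False` and `regex=False` (plain substring match), `na=False` so missing summaries count as non-matches, and `.tolist()` at the end. The rows are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'title': ['Post A', 'Post B', 'Post C'],
    'summary': ['Notes on Machine Learning in production', 'Cloud costs', None],
})

# Case-insensitive substring match; None/NaN summaries are treated as False
mask = df['summary'].str.contains('machine learning', case=False, regex=False, na=False)
ml_titles = df.loc[mask, 'title'].tolist()
print(ml_titles)  # ['Post A']
```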