# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [3]:
feedburner = feedparser.parse(url)
feedburner

{'bozo': 0,
 'encoding': 'UTF-8',
 'entries': [{'author': 'Nat Torkington',
   'author_detail': {'name': 'Nat Torkington'},
   'authors': [{'name': 'Nat Torkington'}],
   'content': [{'base': 'http://feeds.feedburner.com/oreilly/radar/atom',
     'language': None,
     'type': 'text/html',
     'value': '<p><em>Personal Information, Research Data, Massive Lamba Scale, and The Moral Character of Cryptographic Work</em></p><ol>\n<li>\n<a href="https://github.com/microsoft/presidio">Presidio</a> -- recognizers for personally identifiable information, assembled into a pipeline that helps you scrub <i>sensitive text such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, and financial data</i>.</li>\n<li>\n<a href="http://ma-graph.org/">Microsoft\'s Academic Knowledge Graph</a> -- <i>a large RDF data set with over eight billion triples with information about scientific publications and related entities, such as authors, institutions, journa

### 2. Obtain a list of components (keys) that are available for this feed.

In [4]:
feedburner.keys()

dict_keys(['feed', 'entries', 'bozo', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [5]:
feedburner.feed.keys()

dict_keys(['title', 'title_detail', 'id', 'guidislink', 'link', 'updated', 'updated_parsed', 'subtitle', 'subtitle_detail', 'links', 'authors', 'author_detail', 'author', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [9]:
print(feedburner.feed.title)
print(feedburner.feed.subtitle)
print(feedburner.feed.authors)
print(feedburner.feed.link)

All - O'Reilly Media
All of our Ideas and Learning material from all of our topics.
[{'name': "O'Reilly Media"}]
https://www.oreilly.com
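
Not every feed supplies all four fields, and attribute access on a missing key raises an error. The following is a minimal sketch of a more defensive lookup, assuming a dict-like feed object (feedparser's result objects behave like dictionaries). The names `feed_summary` and `sample_feed` are hypothetical and exist only for illustration. Note that `author` (listed among the keys in step 3) holds the plain name, whereas `authors` is a list of dictionaries.

```python
def feed_summary(feed):
    """Collect the title, subtitle, author, and link of a dict-like feed.

    Missing keys map to None instead of raising an exception.
    """
    fields = ('title', 'subtitle', 'author', 'link')
    return {field: feed.get(field) for field in fields}


# Hypothetical stand-in for feedburner.feed, with no subtitle on purpose.
sample_feed = {
    'title': 'Example Feed',
    'author': 'Example Author',
    'authors': [{'name': 'Example Author'}],
    'link': 'https://example.com',
}

summary = feed_summary(sample_feed)
for field, value in summary.items():
    print(f'{field}: {value}')
```

With the parsed feed from step 1, the equivalent call would be `feed_summary(feedburner.feed)`.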


### 5. Count the number of entries that are contained in this RSS feed.

In [18]:
print(len(feedburner['entries']))

60


### 6. Obtain a list of components (keys) available for an entry.

*Hint: Index into the list of entries first, then request the keys of a single entry.*

In [21]:
feedburner.entries[0].keys()

dict_keys(['title', 'title_detail', 'updated', 'updated_parsed', 'id', 'guidislink', 'link', 'content', 'summary', 'links', 'authors', 'author_detail', 'author', 'feedburner_origlink'])

### 7. Extract a list of entry titles.

In [26]:
feedburner.entries[0].title

'Four short links: 27 August 2019'

In [32]:
# Iterate over the entries directly instead of hard-coding the entry count.
entries_titles = [entry.title for entry in feedburner.entries]
print(entries_titles)

['Four short links: 27 August 2019', 'Four short links: 26 August 2019', 'How organizations are sharpening their skills to better understand and use AI', 'Four short links: 23 August 2019', 'Four short links: 22 August 2019', 'Four short links: 21 August 2019', 'Four short links: 20 August 2019', 'Four short links: 19 August 2019', 'Antitrust regulators are using the wrong tools to break up Big Tech', 'Labeling, transforming, and structuring training data sets for machine learning', 'Four short links: 15 August 2019', 'Four short links: 14 August 2019', 'Four short links: 13 August 2019', 'Four short links: 12 August 2019', 'Blockchain solutions in enterprise', 'Four short links: 9 August 2019', 'Got speech? These guidelines will help you get started building voice applications', 'Four short links: 8 August 2019', 'New live online training courses', 'Four short links: 7 August 2019', 'Four short links: 6 August 2019', 'Four short links: 5 August 2019', 'Four short links: 2 August 2019'

### 8. Calculate the percentage of "Four short links" entry titles.

In [49]:
# Keep only the titles that start with the recurring series name.
four_short_links = [title for title in entries_titles if title.startswith('Four short links')]

percentage = round(len(four_short_links) / len(entries_titles) * 100, 2)

print(f'The percentage of "Four short links" entry titles is {percentage}%.')

The percentage of "Four short links" entry titles is 58.33%.


### 9. Create a Pandas data frame from the feed's entries.

In [50]:
import pandas as pd

In [51]:
df = pd.DataFrame(feedburner['entries'])

### 10. Count the number of entries per author and sort them in descending order.

In [59]:
authors = df.groupby('author', as_index=False).agg({'title':'count'})
authors

| | author | title |
|---|---|---|
| 0 | Adam Jacob | 1 |
| 1 | Adrian Cockcroft | 1 |
| 2 | Alison McCauley | 1 |
| 3 | Andy Oram | 1 |
| 4 | Arun Gupta | 1 |
| 5 | Ben Lorica | 5 |
| 6 | Ben Lorica, Harish Doddi, David Talby | 1 |
| 7 | Ben Lorica, Yishay Carmiel | 1 |
| 8 | Kay Williams | 1 |
| 9 | Mac Slocum | 1 |
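
The exercise also asks for descending order, which the cell above does not apply: its output is ordered alphabetically by author. A minimal sketch of the missing sorting step follows, using a small hypothetical frame (`sample_df`) in place of `df` so that it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for df; only the columns used here are included.
sample_df = pd.DataFrame({
    'author': ['Ben Lorica', 'Mac Slocum', 'Ben Lorica', 'Andy Oram', 'Ben Lorica'],
    'title': ['Post A', 'Post B', 'Post C', 'Post D', 'Post E'],
})

# Count titles per author, then sort so the most prolific author comes first.
author_counts = (
    sample_df.groupby('author', as_index=False)
    .agg({'title': 'count'})
    .sort_values(by='title', ascending=False)
    .reset_index(drop=True)
)
print(author_counts)
```

Applied to the lab's data, the same chain on `df` should place the author with the most entries in the first row.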


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [62]:
authors['title length'] = df['title'].str.len()

In [67]:
authors = authors.sort_values(by=['title length'], ascending=False)
authors.head()

| | author | title | title length |
|---|---|---|---|
| 16 | Tiffani Bell | 1 | 82 |
| 9 | Mac Slocum | 1 | 79 |
| 2 | Alison McCauley | 1 | 77 |
| 8 | Kay Williams | 1 | 67 |
| 14 | Pete Skomoroch | 1 | 34 |
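
Note that the cells above attach the lengths to the grouped `authors` frame rather than to `df`. pandas aligns the assignment on the index label, so each length lands next to an unrelated author: the 82 in row 16, for example, is the length of the title at position 16 in the list from step 7 ('Got speech? These guidelines will help you get started building voice applications'). A minimal sketch of what the exercise appears to ask for, computing the length on the frame that holds the titles, is shown here with a hypothetical `sample_df` and a hypothetical column name `title_length`.

```python
import pandas as pd

# Hypothetical stand-in for df; only the columns used here are included.
sample_df = pd.DataFrame({
    'title': ['Short', 'A much longer entry title', 'Medium title'],
    'author': ['Author A', 'Author B', 'Author C'],
})

# Compute the length on the same frame that holds the titles,
# so every length stays on the row of its own title.
sample_df['title_length'] = sample_df['title'].str.len()

# Keep the three requested columns, longest title first.
longest_first = (
    sample_df[['title', 'author', 'title_length']]
    .sort_values(by='title_length', ascending=False)
)
print(longest_first)
```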


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [82]:
# str.contains treats the pattern as a regular expression and is case-sensitive by default.
new_df = df[df['summary'].str.contains('machine learning')]
list(new_df["title"])

['How organizations are sharpening their skills to better understand and use AI',
 'Labeling, transforming, and structuring training data sets for machine learning',
 'Four short links: 15 August 2019',
 'Got speech? These guidelines will help you get started building voice applications',
 'New live online training courses',
 'Four short links: 5 August 2019',
 'Learning from adversaries',
 'One simple graphic: Researchers love PyTorch and TensorFlow',
 'Acquiring and sharing high-quality data',
 "Highlights from the O'Reilly Open Source Software Conference in Portland 2019",
 'Managing machine learning in the enterprise: Lessons from banking and health care']
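
The filter above is case-sensitive, so a summary that only says "Machine Learning" would be skipped, and it would fail on entries with a missing summary. The following is a hedged sketch of a more tolerant variant, assuming that a case-insensitive literal match is acceptable for this exercise; `sample_df` is a hypothetical stand-in for `df`.

```python
import pandas as pd

# Hypothetical stand-in for df, including a missing summary.
sample_df = pd.DataFrame({
    'title': ['Post A', 'Post B', 'Post C', 'Post D'],
    'summary': [
        'An intro to Machine Learning',
        None,
        'Notes on cooking',
        'Why machine learning matters',
    ],
})

# case=False ignores capitalization, na=False treats missing summaries
# as non-matches, and regex=False matches the phrase literally.
mask = sample_df['summary'].str.contains(
    'machine learning', case=False, na=False, regex=False
)
ml_titles = sample_df.loc[mask, 'title'].tolist()
print(ml_titles)
```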