# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [1]:
import feedparser

### 1. Use feedparser to parse the following RSS feed URL.

In [2]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [3]:
reddit = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [4]:
reddit.keys()

dict_keys(['bozo', 'entries', 'feed', 'headers', 'etag', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [5]:
reddit.feed.keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator', 'feedburner_info', 'geo_lat', 'geo_long', 'feedburner_emailserviceid', 'feedburner_feedburnerhostname'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [11]:
print(reddit.feed.title)
print(reddit.feed.subtitle)
print(reddit.feed.link)
print(reddit.entries[0].author)  # the feed has no feed-level 'author' key, so use the first entry's author

Radar
Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology
https://www.oreilly.com/radar
Q McCallum
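
The feed keys listed earlier do not include `author`, which is why the author is read from an entry. A small helper can make that fallback explicit. This is only a sketch: `first_available_author` is a hypothetical name, and plain dictionaries stand in for the parsed feed (feedparser results support the same `.get()` and `[]` access).

```python
def first_available_author(parsed, default="unknown"):
    """Return the feed-level author, else the first entry author found, else a default."""
    feed = parsed.get("feed", {})
    if feed.get("author"):
        return feed["author"]
    for entry in parsed.get("entries", []):
        if entry.get("author"):
            return entry["author"]
    return default


# Plain dicts used as a stand-in for a parsed feed (assumption for illustration).
sample = {"feed": {"title": "Radar"}, "entries": [{"author": "Q McCallum"}]}
print(first_available_author(sample))
```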


### 5. Count the number of entries that are contained in this RSS feed.

In [12]:
len(reddit.entries)

15

### 6. Obtain a list of components (keys) available for an entry.

*Hint: `entries` is a list, so index into it (for example, `entries[0]`) before requesting the keys.*

In [13]:
reddit.entries[0].keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'comments', 'published', 'published_parsed', 'authors', 'author', 'author_detail', 'tags', 'id', 'guidislink', 'summary', 'summary_detail', 'content', 'wfw_commentrss', 'slash_comments'])

### 7. Extract a list of entry titles.

In [14]:
[x.title for x in reddit.entries]

['Building a Better Middleman',
 'Quantum Computing without the Hype',
 'Radar trends to watch: May 2022',
 'Building a Better Middleman',
 'The General Purpose Pendulum',
 'Radar trends to watch: April 2022',
 'AI Adoption in the Enterprise 2022',
 'D-Day in Kyiv',
 'The Future of Security',
 'Identity problems get bigger in the metaverse',
 'Recommendations for all of us',
 'Epstein Barr and the Cause of Cause',
 'Radar trends to watch: March 2022',
 'Intelligence and Comprehension',
 'The Human Web']
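
The list above contains "Building a Better Middleman" twice. A quick way to spot repeated titles is to count them. The sketch below uses a shortened, hand-copied list (`titles`) rather than the live feed:

```python
from collections import Counter

# Shortened sample of the titles shown above (copied by hand for illustration).
titles = [
    "Building a Better Middleman",
    "Quantum Computing without the Hype",
    "Building a Better Middleman",
]

# Keep only the titles that occur more than once.
duplicates = [title for title, n in Counter(titles).items() if n > 1]
print(duplicates)
```

With the real feed, the same idea would be `Counter(x.title for x in reddit.entries)`.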

### 8. Calculate the percentage of entry titles that contain "Four short links".

In [22]:
count = 0
total = len(reddit.entries)

for x in reddit.entries:
    if 'Four short links' in x.title:
        count += 1

percentage = (count/total)*100
percentage

0.0
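
The same calculation can be wrapped in a function that ignores case and avoids dividing by zero when a feed has no entries. This is a sketch only; `title_percentage` is a hypothetical helper name, and the case-insensitive match is an added assumption, not part of the exercise.

```python
def title_percentage(titles, phrase):
    """Percentage of titles containing phrase, compared case-insensitively."""
    if not titles:
        return 0.0  # an empty feed would otherwise raise ZeroDivisionError
    needle = phrase.lower()
    matches = sum(needle in title.lower() for title in titles)
    return 100 * matches / len(titles)


sample_titles = ["Four short links: 1", "Other", "four short links: 2", "Another"]
print(title_percentage(sample_titles, "Four short links"))
```

For the lab feed this would be called as `title_percentage([x.title for x in reddit.entries], 'Four short links')`.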

### 9. Create a Pandas data frame from the feed's entries.

In [15]:
import pandas as pd

In [17]:
df = pd.DataFrame(reddit.entries)

In [49]:
df.head()

| | title | title_detail | links | link | comments | published | published_parsed | authors | author | author_detail | tags | id | guidislink | summary | summary_detail | content | wfw_commentrss | slash_comments | title_length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Building a Better Middleman | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/building-a-bette... | https://www.oreilly.com/radar/building-a-bette... | Tue, 17 May 2022 10:58:32 +0000 | (2022, 5, 17, 10, 58, 32, 1, 137, 0) | [{'name': 'Q McCallum'}] | Q McCallum | {'name': 'Q McCallum'} | [{'term': 'Operations', 'scheme': None, 'label... | https://www.oreilly.com/radar/?p=14497 | False | In the previous article, I explored the role o... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/building-a-bette... | 0 | 27 |
| 1 | Quantum Computing without the Hype | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/quantum-computin... | https://www.oreilly.com/radar/quantum-computin... | Tue, 10 May 2022 11:45:05 +0000 | (2022, 5, 10, 11, 45, 5, 1, 130, 0) | [{'name': 'Mike Loukides'}] | Mike Loukides | {'name': 'Mike Loukides'} | [{'term': 'Big Data Tools and Pipelines', 'sch... | https://www.oreilly.com/radar/?p=14492 | False | Several weeks ago, I had a great conversation ... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/quantum-computin... | 0 | 34 |
| 2 | Radar trends to watch: May 2022 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/radar-trends-to-... | https://www.oreilly.com/radar/radar-trends-to-... | Tue, 03 May 2022 11:19:02 +0000 | (2022, 5, 3, 11, 19, 2, 1, 123, 0) | [{'name': 'Mike Loukides'}] | Mike Loukides | {'name': 'Mike Loukides'} | [{'term': 'Radar Trends', 'scheme': None, 'lab... | https://www.oreilly.com/radar/?p=14482 | False | April was the month for large language models.... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/radar-trends-to-... | 0 | 31 |
| 3 | Building a Better Middleman | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/building-a-bette... | https://www.oreilly.com/radar/building-a-bette... | Tue, 19 Apr 2022 12:22:21 +0000 | (2022, 4, 19, 12, 22, 21, 1, 109, 0) | [{'name': 'Q McCallum'}] | Q McCallum | {'name': 'Q McCallum'} | [{'term': 'Operations', 'scheme': None, 'label... | https://www.oreilly.com/radar/?p=14442 | False | What comes to mind when you hear the term &#82... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/building-a-bette... | 0 | 27 |
| 4 | The General Purpose Pendulum | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/the-general-purp... | https://www.oreilly.com/radar/the-general-purp... | Tue, 12 Apr 2022 11:59:19 +0000 | (2022, 4, 12, 11, 59, 19, 1, 102, 0) | [{'name': 'Mike Loukides'}] | Mike Loukides | {'name': 'Mike Loukides'} | [{'term': 'Hardware', 'scheme': None, 'label':... | https://www.oreilly.com/radar/?p=14436 | False | Pendulums do what they do: they swing one way,... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/the-general-purp... | 0 | 28 |
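
The `published` column holds date strings in the RFC 822 style used by RSS, which sort alphabetically rather than chronologically. One way to get real datetime objects is the standard library's `email.utils.parsedate_to_datetime`. A minimal sketch, using two values copied from the output above:

```python
from email.utils import parsedate_to_datetime

# Two "published" values copied from the data frame output above.
published = [
    "Tue, 10 May 2022 11:45:05 +0000",
    "Tue, 17 May 2022 10:58:32 +0000",
]

# Convert each string into a timezone-aware datetime.
parsed_dates = [parsedate_to_datetime(value) for value in published]
newest = max(parsed_dates)
print(newest.isoformat())
```

On the data frame this could be applied as `df['published'].map(parsedate_to_datetime)`, assuming every entry has a well-formed `published` value.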


### 10. Count the number of entries per author and sort them in descending order.

In [24]:
authors = df.groupby('author').agg({'title':'count'}).sort_values('title', ascending=False)
authors

| author | title |
|---|---|
| Mike Loukides | 9 |
| Chris Butler | 2 |
| Q McCallum | 2 |
| Christina Morillo | 1 |
| Jeffrey Carr | 1 |
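
A shorter way to get the same kind of ranking is `Series.value_counts()`, which counts and sorts in descending order in one step. The sketch below uses a tiny made-up data frame (`sample`) instead of the feed data:

```python
import pandas as pd

# Small made-up sample standing in for the feed data frame.
sample = pd.DataFrame(
    {
        "author": ["Mike Loukides", "Q McCallum", "Mike Loukides"],
        "title": ["a", "b", "c"],
    }
)

# value_counts() returns counts sorted from most to least frequent.
counts = sample["author"].value_counts()
print(counts)
```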


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [25]:
df['title_length'] = df['title'].apply(len)
df[['title', 'author', 'title_length']].sort_values('title_length', ascending=False).head()

| | title | author | title_length |
|---|---|---|---|
| 9 | Identity problems get bigger in the metaverse | Chris Butler | 45 |
| 11 | Epstein Barr and the Cause of Cause | Mike Loukides | 35 |
| 1 | Quantum Computing without the Hype | Mike Loukides | 34 |
| 6 | AI Adoption in the Enterprise 2022 | Mike Loukides | 34 |
| 5 | Radar trends to watch: April 2022 | Mike Loukides | 33 |
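
`apply(len)` raises a `TypeError` if a title is missing, because `len` cannot be applied to a missing value. The vectorized `.str.len()` accessor returns `NaN` for such rows instead. A sketch with a made-up sample that includes one missing title:

```python
import pandas as pd

# Made-up sample with one missing title to show how .str.len() behaves.
sample = pd.DataFrame(
    {
        "title": ["The Human Web", None, "The Future of Security"],
        "author": ["Mike Loukides", "Unknown", "Christina Morillo"],
    }
)

# .str.len() yields NaN for the missing title rather than raising.
sample["title_length"] = sample["title"].str.len()

# sort_values places NaN rows last by default.
ranked = sample.sort_values("title_length", ascending=False)
print(ranked)
```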


### 12. Create a list of entry titles whose summary includes the phrase "machine learning".

In [69]:
df.summary

0     In the previous article, I explored the role o...
1     Several weeks ago, I had a great conversation ...
2     April was the month for large language models....
3     What comes to mind when you hear the term &#82...
4     Pendulums do what they do: they swing one way,...
5     March was a busy month, especially for develop...
6     In December 2021 and January 2022, we asked re...
7     My experience working with Ukraine’s Offensive...
8     The future of cybersecurity is being shaped by...
9     If the hype surrounding the metaverse results ...
10    If you live in a household with a communal dev...
11    One of the most intriguing news stories of the...
12    February was a short month, but it wasn’t shor...
13    I haven’t written much about AI recently. But ...
14    A few days ago, I recommended that Tim O&#8217...
Name: summary, dtype: object

In [103]:
import re

machine_titles=[]
    
for x in df.index:
    if re.findall(r'machine learning', df.loc[x].summary):
        machine_titles.append(df.loc[x].title)
        
machine_titles

[]

In [106]:
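# I don't know why it doesn't work with "machine learning"
# (possibly no summary contains that exact lowercase phrase; the search is case-sensitive),
# so this cell tries the word "what" as a check.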

import re

machine_titles=[]
    
for x in df.index:
    if re.findall(r'what', df.loc[x].summary):
        machine_titles.append(df.loc[x].title)
        
machine_titles

['Quantum Computing without the Hype',
 'The General Purpose Pendulum',
 'AI Adoption in the Enterprise 2022']
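
One possible reason the "machine learning" search returns nothing is that `re.findall` is case-sensitive, so "Machine Learning" would not match. The summaries also contain HTML entities such as `&#8217;`. The sketch below shows a case-insensitive search with pandas on a small made-up frame. The sample rows are invented for illustration and do not come from the feed.

```python
import html

import pandas as pd

# Made-up rows; the real data frame comes from the feed entries.
sample = pd.DataFrame(
    {
        "title": ["A", "B", "C"],
        "summary": [
            "Machine Learning isn&#8217;t magic",
            "Nothing relevant here",
            None,
        ],
    }
)

# Replace missing summaries with "" and decode HTML entities.
text = sample["summary"].fillna("").map(html.unescape)

# Literal (non-regex), case-insensitive substring match.
mask = text.str.contains("machine learning", case=False, regex=False)
ml_titles = sample.loc[mask, "title"].tolist()
print(ml_titles)
```

If the phrase still does not appear, it may simply be absent from the short `summary` text. The `content` column holds the full article body and could be searched in the same way.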