# Working with RSS Feeds Lab

Complete the following set of exercises to solidify your knowledge of parsing RSS feeds and extracting information from them.

In [6]:
!pip install feedparser

Collecting feedparser
  Downloading feedparser-6.0.10-py3-none-any.whl (81 kB)
Collecting sgmllib3k
  Downloading sgmllib3k-1.0.0.tar.gz (5.8 kB)
Building wheels for collected packages: sgmllib3k
  Building wheel for sgmllib3k (setup.py) ... done
  Created wheel for sgmllib3k: filename=sgmllib3k-1.0.0-py3-none-any.whl size=6066 sha256=7d9c39436f3c9349c6351b7d4cd1353750adc021267bdd1e8b2a8a1c4520c672
  Stored in directory: /Users/hal/Library/Caches/pip/wheels/65/7a/a7/78c287f64e401255dff4c13fdbc672fed5efbfd21c530114e1
Successfully built sgmllib3k
Installing collected packages: sgmllib3k, feedparser
Successfully installed feedparser-6.0.10 sgmllib3k-1.0.0

In [7]:
import feedparser


### 1. Use feedparser to parse the following RSS feed URL.

In [8]:
url = 'http://feeds.feedburner.com/oreilly/radar/atom'

In [11]:
feeds = feedparser.parse(url)

### 2. Obtain a list of components (keys) that are available for this feed.

In [14]:
feeds.keys()

dict_keys(['bozo', 'entries', 'feed', 'headers', 'updated', 'updated_parsed', 'href', 'status', 'encoding', 'version', 'namespaces'])

### 3. Obtain a list of components (keys) that are available for the *feed* component of this RSS feed.

In [17]:
feeds.feed.keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'subtitle', 'subtitle_detail', 'updated', 'updated_parsed', 'language', 'sy_updateperiod', 'sy_updatefrequency', 'generator_detail', 'generator'])

### 4. Extract and print the feed title, subtitle, author, and link.

In [35]:
print(feeds.feed.title)
print(feeds.feed.subtitle)
print(feeds.feed.generator_detail)  # this feed exposes no 'author' key (see the keys above), so generator details are printed instead
print(feeds.feed.link)

Radar
Now, next, and beyond: Tracking need-to-know trends at the intersection of business and technology
{'name': 'https://wordpress.org/?v=5.3.14'}
https://www.oreilly.com/radar
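
The `feed` keys listed earlier do not include `author`, so attribute access such as `feeds.feed.author` would raise an error for this feed. Below is a minimal sketch of a defensive lookup. It uses a plain dictionary (`feed_meta`, a hypothetical stand-in filled with values from the output above) rather than the parsed feed itself; `feedparser` result objects are dictionary-like, so `.get()` should behave the same way on them. The helper name `describe` and the placeholder text are assumptions made for illustration.

```python
# Stand-in for feeds.feed, copied from the output shown above.
# There is deliberately no 'author' key, mirroring this feed.
feed_meta = {
    "title": "Radar",
    "subtitle": "Now, next, and beyond: Tracking need-to-know trends "
                "at the intersection of business and technology",
    "link": "https://www.oreilly.com/radar",
}


def describe(meta, fields=("title", "subtitle", "author", "link")):
    """Return {field: value}, substituting a placeholder for missing keys."""
    return {field: meta.get(field, "(not provided)") for field in fields}


summary = describe(feed_meta)
for field, value in summary.items():
    print(f"{field}: {value}")
```

With the real feed, the same idea would be `feeds.feed.get('author', '(not provided)')`.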


### 5. Count the number of entries that are contained in this RSS feed.

In [26]:
len(feeds.entries)

15

### 6. Obtain a list of components (keys) available for an entry.

*Hint: Remember to index first before requesting the keys*

In [27]:
feeds.entries[0].keys()

dict_keys(['title', 'title_detail', 'links', 'link', 'comments', 'published', 'published_parsed', 'authors', 'author', 'author_detail', 'tags', 'id', 'guidislink', 'summary', 'summary_detail', 'content', 'wfw_commentrss', 'slash_comments'])

### 7. Extract a list of entry titles.

In [30]:
feeds.entries[0].title

'Automating the Automators: Shift Change in the Robot Factory'

In [32]:
title = [entry.title for entry in feeds.entries]
title

['Automating the Automators: Shift Change in the Robot Factory',
 'Digesting 2022',
 'Radar Trends to Watch: January 2023',
 'What Does Copyright Say about Generative Models?',
 'Radar Trends to Watch: December 2022',
 'AI’s ‘SolarWinds Moment’ Will Occur; It’s Just a Matter of When',
 'Technical Health Isn’t Optional',
 'Healthy Data',
 'Formal Informal Languages',
 'Radar Trends to Watch: November 2022',
 'What We Learned Auditing Sophisticated AI for Bias',
 'The Collaborative Metaverse',
 'What Is Hyperautomation?',
 'Radar Trends to Watch: October 2022',
 'The Problem with Intelligence']

In [33]:
len(title)

15

### 8. Calculate the percentage of "Four short links" entry titles.

In [40]:
feeds.entries[0].link

'https://www.oreilly.com/radar/automating-the-automators-shift-change-in-the-robot-factory/'

In [42]:
feeds.entries[0].links

[{'rel': 'alternate',
  'type': 'text/html',
  'href': 'https://www.oreilly.com/radar/automating-the-automators-shift-change-in-the-robot-factory/'}]

In [44]:
links = [entry.link for entry in feeds.entries]
links

['https://www.oreilly.com/radar/automating-the-automators-shift-change-in-the-robot-factory/',
 'https://www.oreilly.com/radar/digesting-2022/',
 'https://www.oreilly.com/radar/radar-trends-to-watch-january-2023/',
 'https://www.oreilly.com/radar/what-does-copyright-say-about-generative-models/',
 'https://www.oreilly.com/radar/radar-trends-to-watch-december-2022/',
 'https://www.oreilly.com/radar/ais-solarwinds-moment-will-occur-its-just-a-matter-of-when/',
 'https://www.oreilly.com/radar/technical-health-isnt-optional/',
 'https://www.oreilly.com/radar/healthy-data/',
 'https://www.oreilly.com/radar/formal-informal-languages/',
 'https://www.oreilly.com/radar/radar-trends-to-watch-november-2022/',
 'https://www.oreilly.com/radar/what-we-learned-auditing-sophisticated-ai-for-bias/',
 'https://www.oreilly.com/radar/the-collaborative-metaverse/',
 'https://www.oreilly.com/radar/what-is-hyperautomation/',
 'https://www.oreilly.com/radar/radar-trends-to-watch-october-2022/',
 'https://www

In [45]:
links2 = [entry.links for entry in feeds.entries]
links2

[[{'rel': 'alternate',
   'type': 'text/html',
   'href': 'https://www.oreilly.com/radar/automating-the-automators-shift-change-in-the-robot-factory/'}],
 [{'rel': 'alternate',
   'type': 'text/html',
   'href': 'https://www.oreilly.com/radar/digesting-2022/'}],
 [{'rel': 'alternate',
   'type': 'text/html',
   'href': 'https://www.oreilly.com/radar/radar-trends-to-watch-january-2023/'}],
 [{'rel': 'alternate',
   'type': 'text/html',
   'href': 'https://www.oreilly.com/radar/what-does-copyright-say-about-generative-models/'}],
 [{'rel': 'alternate',
   'type': 'text/html',
   'href': 'https://www.oreilly.com/radar/radar-trends-to-watch-december-2022/'}],
 [{'rel': 'alternate',
   'type': 'text/html',
   'href': 'https://www.oreilly.com/radar/ais-solarwinds-moment-will-occur-its-just-a-matter-of-when/'}],
 [{'rel': 'alternate',
   'type': 'text/html',
   'href': 'https://www.oreilly.com/radar/technical-health-isnt-optional/'}],
 [{'rel': 'alternate',
   'type': 'text/html',
   'href': 
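
The cells above list entry links but do not compute the percentage the exercise asks for. Here is a minimal sketch, assuming a case-insensitive substring match on the titles is what is intended. `percentage_matching` and `sample_titles` are hypothetical names, and the sample is a subset of the titles printed in exercise 7.

```python
def percentage_matching(titles, phrase):
    """Percentage of titles that contain `phrase`, ignoring case."""
    if not titles:
        return 0.0  # avoid dividing by zero on an empty feed
    needle = phrase.lower()
    hits = sum(needle in t.lower() for t in titles)
    return 100 * hits / len(titles)


# Subset of the titles shown in exercise 7 (illustrative only).
sample_titles = [
    "Automating the Automators: Shift Change in the Robot Factory",
    "Digesting 2022",
    "Radar Trends to Watch: January 2023",
    "Radar Trends to Watch: December 2022",
]

four_short = percentage_matching(sample_titles, "Four short links")
radar_trends = percentage_matching(sample_titles, "Radar Trends")
print(four_short, radar_trends)
```

In the notebook, this could be called as `percentage_matching(title, 'Four short links')`. None of the 15 titles listed in exercise 7 contains that phrase, so the result would be 0.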

### 9. Create a Pandas data frame from the feed's entries.

In [36]:
import pandas as pd

In [47]:
df = pd.DataFrame(feeds.entries)
df.head()

| | title | title_detail | links | link | comments | published | published_parsed | authors | author | author_detail | tags | id | guidislink | summary | summary_detail | content | wfw_commentrss | slash_comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Automating the Automators: Shift Change in the... | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/automating-the-a... | https://www.oreilly.com/radar/automating-the-a... | Tue, 17 Jan 2023 11:33:31 +0000 | (2023, 1, 17, 11, 33, 31, 1, 17, 0) | [{'name': 'Q McCallum'}] | Q McCallum | {'name': 'Q McCallum'} | [{'term': 'AI & ML', 'scheme': None, 'label': ... | https://www.oreilly.com/radar/?p=14841 | False | What would you say is the job of a software de... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/automating-the-a... | 0 |
| 1 | Digesting 2022 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/digesting-2022/ | https://www.oreilly.com/radar/digesting-2022/#... | Tue, 10 Jan 2023 13:37:13 +0000 | (2023, 1, 10, 13, 37, 13, 1, 10, 0) | [{'name': 'Mike Loukides'}] | Mike Loukides | {'name': 'Mike Loukides'} | [{'term': 'AI & ML', 'scheme': None, 'label': ... | https://www.oreilly.com/radar/?p=14837 | False | Although I don’t subscribe to the idea that hi... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/digesting-2022/f... | 0 |
| 2 | Radar Trends to Watch: January 2023 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/radar-trends-to-... | https://www.oreilly.com/radar/radar-trends-to-... | Wed, 04 Jan 2023 11:53:08 +0000 | (2023, 1, 4, 11, 53, 8, 2, 4, 0) | [{'name': 'Mike Loukides'}] | Mike Loukides | {'name': 'Mike Loukides'} | [{'term': 'Radar Trends', 'scheme': None, 'lab... | https://www.oreilly.com/radar/?p=14826 | False | Perhaps unsurprisingly, December was a slow mo... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/radar-trends-to-... | 0 |
| 3 | What Does Copyright Say about Generative Models? | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/what-does-copyri... | https://www.oreilly.com/radar/what-does-copyri... | Tue, 13 Dec 2022 12:22:38 +0000 | (2022, 12, 13, 12, 22, 38, 1, 347, 0) | [{'name': 'Mike Loukides'}] | Mike Loukides | {'name': 'Mike Loukides'} | [{'term': 'Artificial Intelligence', 'scheme':... | https://www.oreilly.com/radar/?p=14806 | False | The current generation of flashy AI applicatio... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/what-does-copyri... | 0 |
| 4 | Radar Trends to Watch: December 2022 | {'type': 'text/plain', 'language': None, 'base... | [{'rel': 'alternate', 'type': 'text/html', 'hr... | https://www.oreilly.com/radar/radar-trends-to-... | https://www.oreilly.com/radar/radar-trends-to-... | Tue, 06 Dec 2022 12:21:48 +0000 | (2022, 12, 6, 12, 21, 48, 1, 340, 0) | [{'name': 'Mike Loukides'}] | Mike Loukides | {'name': 'Mike Loukides'} | [{'term': 'Radar Trends', 'scheme': None, 'lab... | https://www.oreilly.com/radar/?p=14799 | False | This month’s news has been overshadowed by the... | {'type': 'text/html', 'language': None, 'base'... | [{'type': 'text/html', 'language': None, 'base... | https://www.oreilly.com/radar/radar-trends-to-... | 0 |


In [48]:
df.shape

(15, 18)

### 10. Count the number of entries per author and sort them in descending order.

In [53]:
authors = df.groupby('author', as_index=False).agg({'title':'count'})
authors.columns = ['author', 'entries']
authors.sort_values('entries', ascending=False)

| | author | entries |
|---|---|---|
| 1 | Mike Loukides | 12 |
| 0 | Mike Barlow | 1 |
| 2 | Patrick Hall | 1 |
| 3 | Q McCallum | 1 |
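
As a cross-check that does not depend on pandas, the same tally can be sketched with `collections.Counter` from the standard library. The `authors` list below is an assumption: it is reconstructed to match the counts in the table above (12, 1, 1, 1), and the positions of some entries are inferred rather than read from the feed.

```python
from collections import Counter

# Reconstructed author list (15 entries) consistent with the table above.
authors = (
    ["Q McCallum"] + ["Mike Loukides"] * 4
    + ["Mike Barlow"] + ["Mike Loukides"] * 4
    + ["Patrick Hall"] + ["Mike Loukides"] * 4
)

counts = Counter(authors)

# Sort by count descending, then by name so ties are ordered predictably.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
for name, n in ranked:
    print(f"{name}: {n}")
```

With the parsed feed, the input could instead be built as `[entry.author for entry in feeds.entries]`.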


### 11. Add a new column to the data frame that contains the length (number of characters) of each entry title. Return a data frame that contains the title, author, and title length of each entry in descending order (longest title length at the top).

In [56]:
df['len'] = df.title.str.len()
df['len']

0     60
1     14
2     35
3     48
4     36
5     63
6     31
7     12
8     25
9     36
10    50
11    27
12    24
13    35
14    29
Name: len, dtype: int64

In [58]:
df[['title', 'author', 'len']].sort_values('len', ascending=False)

| | title | author | len |
|---|---|---|---|
| 5 | AI’s ‘SolarWinds Moment’ Will Occur; It’s Just... | Mike Barlow | 63 |
| 0 | Automating the Automators: Shift Change in the... | Q McCallum | 60 |
| 10 | What We Learned Auditing Sophisticated AI for ... | Patrick Hall | 50 |
| 3 | What Does Copyright Say about Generative Models? | Mike Loukides | 48 |
| 4 | Radar Trends to Watch: December 2022 | Mike Loukides | 36 |
| 9 | Radar Trends to Watch: November 2022 | Mike Loukides | 36 |
| 2 | Radar Trends to Watch: January 2023 | Mike Loukides | 35 |
| 13 | Radar Trends to Watch: October 2022 | Mike Loukides | 35 |
| 6 | Technical Health Isn’t Optional | Mike Loukides | 31 |
| 14 | The Problem with Intelligence | Mike Loukides | 29 |
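
The same ordering can be sketched in plain Python, which may help when checking the data frame result by hand. The `sample` list is a small, illustrative subset of (title, author) pairs taken from the outputs above, and the variable names are hypothetical.

```python
# (title, author) pairs taken from the outputs above; illustrative subset.
sample = [
    ("Digesting 2022", "Mike Loukides"),
    ("Automating the Automators: Shift Change in the Robot Factory", "Q McCallum"),
    ("Radar Trends to Watch: January 2023", "Mike Loukides"),
]

# Attach the character count of each title, then sort longest first.
rows = [(title, author, len(title)) for title, author in sample]
rows.sort(key=lambda row: row[2], reverse=True)

for title, author, length in rows:
    print(length, author, title)
```

Python's sort is stable, so titles of equal length keep their original relative order.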


### 12. Create a list of entry titles whose summary includes the phrase "machine learning."

In [73]:
df['flag'] = df.summary.apply(lambda x: 'Yes' if 'machine learning' in x.lower() else 'No')

In [74]:
df['flag']

0      No
1      No
2      No
3      No
4      No
5      No
6      No
7      No
8      No
9      No
10     No
11     No
12     No
13    Yes
14     No
Name: flag, dtype: object

In [75]:
list(df[df['flag']=='Yes']['title'])

['Radar Trends to Watch: October 2022']
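
The helper column is not strictly needed: the filter can also be written as a single list comprehension. Below is a minimal sketch on plain dictionaries. The `entries` data and the function name `titles_mentioning` are hypothetical, since the real summaries are truncated in the output above.

```python
# Hypothetical entries; real feed entries expose 'title' and 'summary' too.
entries = [
    {"title": "Entry A", "summary": "An overview of Machine Learning tooling."},
    {"title": "Entry B", "summary": "Notes on feed formats."},
    {"title": "Entry C", "summary": "Why machine learning projects stall."},
]


def titles_mentioning(entries, phrase):
    """Titles of entries whose summary contains `phrase`, ignoring case."""
    needle = phrase.lower()
    return [
        e["title"]
        for e in entries
        if needle in e.get("summary", "").lower()  # tolerate a missing summary
    ]


matches = titles_mentioning(entries, "machine learning")
print(matches)
```

Lower-casing both sides keeps the match case-insensitive, as in the `flag` cell above.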