In [2]:
# Python 2 & 3 Compatibility
from __future__ import print_function, division

Task:  
----------
As journalism students, we have been assigned to identify the main themes and topics discussed on Twitter in November 2018 in relation to the United Nations Millennium Development Goals (MDGs). The task involves the following:
1. Parse structured HTML (Twitter search results on MDGs).
2. Extract the most retweeted tweets, including each tweeter's name, the tweet text, pictures, links, and other details.
3. Save everything in a CSV file for further analysis (which we will work on in the next workshop).

#### Steps:
1. Go to [Twitter advanced search](https://twitter.com/search-advanced), search for the term "Millennium Development Goals", and confine the search to November 2018.
2. Click on the 'Latest' tab.
3. Flip through the pages until no new results appear.
4. From the browser menu, open the 'View' menu, click on 'Developer Tools', and then open the 'Elements' tab.
5. Go to the second line from the top, which starts with `<html`, right-click it, and choose Copy | Copy outerHTML.
6. Create a new plain text file (using any text editor) and then paste the clipboard data.
7. Save the file locally as data/twitter_results.html.
8. Follow the rest of this exercise to extract the data.

#### A note about copyright
Since this exercise is educational and non-commercial, we consider scraping these Twitter pages to fall under fair use. In our case, we download a single page to our own hard drive and do the scraping offline. For more guidance about scraping copyrighted content, see: https://www.eff.org/document/fair-use-presentation-one-pager




### Beautiful Soup to the rescue!
* BeautifulSoup (bs4) is a library that allows you to extract data from HTML with ease.
* It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.
* It handles text encodings automatically (always UTF-8 out).

See the Beautiful Soup [Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/).
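
Before turning to the Twitter page, the idea can be tried on a tiny document. The snippet below is a minimal sketch: the HTML string and the name `demo_soup` are made up for illustration and are not part of the exercise data.

```python
from bs4 import BeautifulSoup

# A tiny hand-written snippet (hypothetical, not taken from the Twitter page)
html = "<html><body><p class='greeting'>Hello, <b>world</b>!</p></body></html>"

# Build the parse tree with Python's built-in HTML parser
demo_soup = BeautifulSoup(html, "html.parser")

# Navigate the tree with attribute access
print(demo_soup.p["class"])      # the class attribute comes back as a list
print(demo_soup.p.b.get_text())  # text inside the nested <b> tag
print(demo_soup.p.get_text())    # all text below <p>, nested tags included
```

Note that `class` is a multi-valued attribute, so Beautiful Soup returns it as a list even when there is only one class.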

In [6]:
from bs4 import BeautifulSoup

We now use the open() function to open the Twitter results file [twitter_results.html](../data/twitter_results.html) as the file object content_file, and read the file's content into a string variable named 'content', as shown below:

In [7]:
with open('data/twitter_results.html', 'r') as content_file:
    content = content_file.read()

In [8]:
soup = BeautifulSoup(content, 'html.parser')

#### Great! Now we have a soup object, which means we can use all the soup magic to parse the HTML it contains

## Find elements from page
Beautiful Soup defines a lot of methods for searching the parse tree, but they’re all very similar. The two most popular methods are find() and find_all().  
* soup.find( )
* soup.find_all( )  -- the most popular API, hence there is a shortcut: calling the object directly, e.g. soup('a')

The two APIs accept almost exactly the same arguments; they differ in what they return.  
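
As a rough illustration of the difference in return values, the sketch below runs both methods on a small made-up list of links (the HTML and variable names are assumptions for this example only).

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with two links
html = "<ul><li><a href='/one'>One</a></li><li><a href='/two'>Two</a></li></ul>"
demo_soup = BeautifulSoup(html, "html.parser")

first_link = demo_soup.find("a")      # a single Tag (the first match)
all_links = demo_soup.find_all("a")   # a list-like ResultSet of all matches
shortcut_links = demo_soup("a")       # calling the object is shorthand for find_all()

print(first_link["href"])
print([a["href"] for a in all_links])
print([a["href"] for a in shortcut_links])

# When nothing matches, find() returns None and find_all() returns an empty list
print(demo_soup.find("table"))
print(demo_soup.find_all("table"))
```

Because `find()` can return `None`, chaining further lookups onto its result fails on a miss, which is worth keeping in mind when looping over many tweets.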

Let's try out some common variations of `soup.find()`

## `soup.find()`

#### soup.find() returns the first matched tag it finds. It searches the entire tree. It only returns the first match and stops searching.

In [9]:
# You can find the first element that has an id attribute, whatever its value (by using id=True)
soup.find(id=True)

<style id="react-native-stylesheet"></style>

In [10]:
# You can search for a type of tag by using the tag as a string
# (like 'body','div','p','a') as an argument.

print(soup.find('a'))

# Equivalently:
print(soup.a)

<a class="u-hiddenVisually focusable" href="#timeline">Skip to content</a>
<a class="u-hiddenVisually focusable" href="#timeline">Skip to content</a>


In [11]:
# Let's try again: to extract the page title
soup.find('title')

<title>"Millennium Development Goals" since:2018-11-01 until:2018-11-29 - Twitter Search</title>

In [12]:
# The above is a 'tag' object. To get the text content below that tag, use .get_text() as below:
print (soup.find('title').get_text())

"Millennium Development Goals" since:2018-11-01 until:2018-11-29 - Twitter Search


In [13]:
# retrieve the url from an anchor tag
soup.find('a')['href']

'#timeline'

In [14]:
# You can match the first element with an attribute like an id or class, e.g., an element with a 'GalleryTweet' class
soup.find(class_="GalleryTweet")

<div class="GalleryTweet"></div>

## Partial matches and regular expressions
#### *Regular expressions* are special sequences of characters that describe a pattern, which can then be used to match or find specific text.
#### To use regular expressions, you need to import the 're' Python module and use `re.compile('regular expression goes here')`

In [15]:
import re

In [18]:
# For example, if you are interested in identifying all the elements whose class contains the word 'Tweet', use:
soup.find_all(class_=re.compile('Tweet'))

[<div class="ProfileTweet-action ProfileTweet-action--more js-more-ProfileTweet-actions">
 <div class="dropdown">
 <button aria-haspopup="true" class="ProfileTweet-actionButton u-textUserColorHover dropdown-toggle js-dropdown-toggle" type="button">
 <div class="IconContainer js-tooltip" title="More">
 <span class="Icon Icon--caretDownLight Icon--small"></span>
 <span class="u-hiddenVisually">More</span>
 </div>
 </button>
 <div class="dropdown-menu is-autoCentered">
 <div class="dropdown-caret">
 <div class="caret-outer"></div>
 <div class="caret-inner"></div>
 </div>
 <ul>
 <li class="copy-link-to-tweet js-actionCopyLinkToTweet">
 <button class="dropdown-link" type="button">Copy link to Tweet</button>
 </li>
 <li class="embed-link js-actionEmbedTweet" data-nav="embed_tweet">
 <button class="dropdown-link" type="button">Embed Tweet</button>
 </li>
 <li class="mute-user-item"><button class="dropdown-link" type="button">Mute <span class="username u-dir u-textTruncate" dir="ltr">@<b>GSTIC

##### Check the full documentation about regular expressions at https://docs.python.org/2/library/re.html
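
The following sketch shows how a compiled pattern behaves as a `class_` filter. The three `div` elements are invented for the example; only the class names echo those seen on the Twitter page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical snippet with three differently classed elements
html = """
<div class="ProfileTweet-action">reply</div>
<div class="GalleryTweet">gallery</div>
<div class="dropdown">menu</div>
"""
demo_soup = BeautifulSoup(html, "html.parser")

# A pattern is searched anywhere inside each class name,
# so no leading or trailing '.*' is needed
tweet_pattern = re.compile("Tweet")
matches = demo_soup.find_all(class_=tweet_pattern)
print([tag["class"] for tag in matches])

# Anchoring with ^ restricts the match to class names that start with the text
starts_with_gallery = demo_soup.find_all(class_=re.compile("^Gallery"))
print([tag.get_text() for tag in starts_with_gallery])
```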

## `soup.find_all()`

##### `soup.find_all()` works just like `soup.find()`, but returns a list of all matches.

In [13]:
soup.find_all('title')

[<title>"Millennium Development Goals" since:2018-11-01 until:2018-11-29 - Twitter Search</title>]

In [14]:
# You can search all links with a particular href pattern, let's find links that refer to any United Nations account
import re
soup.find_all('a', href=re.compile(r'UN'))

[<a class="account-group js-recommend-link js-user-profile-link user-thumb" data-user-id="121486767" href="/UNpartnerships" rel="noopener">
 <img alt="" class="avatar js-action-profile-avatar " src="https://pbs.twimg.com/profile_images/744879716/UNOP_Twitter_bigger.png"/>
 <span class="account-group-inner" data-user-id="121486767">
 <strong class="fullname">UNOP</strong><span class="UserBadges"></span><span class="UserNameBreak"> </span><span class="username u-dir u-textTruncate" dir="ltr">@<b>UNpartnerships</b></span>
 </span>
 </a>,
 <a class="twitter-atreply pretty-link js-nav" data-mentioned-user-id="17463923" dir="ltr" href="/UNFCCC"><s>@</s><b>UNFCCC</b></a>,
 <a class="twitter-atreply pretty-link js-nav" data-mentioned-user-id="20646711" dir="ltr" href="/UNESCO"><s>@</s><b>UNESCO</b></a>,
 <a class="twitter-atreply pretty-link js-nav" data-mentioned-user-id="38146999" dir="ltr" href="/UNEnvironment"><s>@</s><b>UNEnvironment</b></a>,
 <a class="twitter-atreply pretty-link js-nav"

In [15]:
# How many text strings on the page contain the word "technology"?
len(soup.find_all(string=re.compile('technology')))

2

In [16]:
soup.find_all(string=re.compile('technology'))

[', and what is the role of science, technology and innovation in reaching these goals. ',
 '2018/11/17/in-poor-countries-technology-can-make-big-improvements-to-education?frsc=dg%7Ce']
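
Note that `find_all(string=...)` counts text strings that contain the word, not individual occurrences. The sketch below, on a made-up paragraph, contrasts the two counts; searching the output of `get_text()` is one possible way to count occurrences, not the only one.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical text: one string holds the word several times
html = "<p>Technology helps. New technology and old technology.</p><p>No match here.</p>"
demo_soup = BeautifulSoup(html, "html.parser")

word = re.compile("technology")

# Number of text strings that contain the word at least once
matching_strings = demo_soup.find_all(string=word)
print(len(matching_strings))

# Number of individual (case-sensitive) occurrences in the page text
occurrences = word.findall(demo_soup.get_text())
print(len(occurrences))

# Case-insensitive count, which also picks up the capitalised "Technology"
any_case = re.findall("technology", demo_soup.get_text(), flags=re.IGNORECASE)
print(len(any_case))
```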

In [17]:
# The page layout suggests that the stream of tweets lives in the element with the class 'stream'.
# Hence, we can select only that part and ignore everything else on the page by creating a new variable called tw_stream:
tw_stream=soup.find(class_='stream')


Furthermore, looking at the HTML source of our data, we can see that each tweet in the stream is encapsulated within a div element with the 'content' class, as shown below:

<div class="content">
      <div class="stream-item-header">
          <a class="account-group js-account-group js-action-profile js-user-profile-link js-nav" href="/GSTICseries" data-user-id="875312126893711361">
      <img class="avatar js-action-profile-avatar" src="https://pbs.twimg.com/profile_images/875623570436575233/tndS3CqJ_bigger.jpg" alt="">
    <span class="FullNameGroup">
      <strong class="fullname show-popup-with-id u-textTruncate " data-aria-label-part="">G-STIC 2018 <span class="Emoji Emoji--forLinks" style="background-image:url('https://abs.twimg.com/emoji/v2/72x72/1f4c6.png')" title="Tear-off calendar" aria-label="Emoji: Tear-off calendar">&nbsp;</span><span class="visuallyhidden" aria-hidden="true">📆</span>28-29-30 Nov<span class="Emoji Emoji--forLinks" style="background-image:url('https://abs.twimg.com/emoji/v2/72x72/1f4cd.png')" title="Round pushpin" aria-label="Emoji: Round pushpin">&nbsp;</span><span class="visuallyhidden" aria-hidden="true">📍</span>Brussels <span class="Emoji Emoji--forLinks" style="background-image:url('https://abs.twimg.com/emoji/v2/72x72/1f39f.png')" title="Admission tickets" aria-label="Emoji: Admission tickets">&nbsp;</span><span class="visuallyhidden" aria-hidden="true">🎟️</span>gstic.org</strong><span>‏</span><span class="UserBadges"></span><span class="UserNameBreak">&nbsp;</span></span><span class="username u-dir u-textTruncate" dir="ltr" data-aria-label-part="">@<b>GSTICseries</b></span></a>
        <small class="time">
  <a href="/GSTICseries/status/1067786677383544833" class="tweet-timestamp js-permalink js-nav js-tooltip" title="1:25 PM - 28 Nov 2018" data-conversation-id="1067786677383544833"><span class="_timestamp js-short-timestamp " data-aria-label-part="last" data-time="1543415157" data-time-ms="1543415157000" data-long-form="true">Nov 28</span></a>
</small>
          <div class="ProfileTweet-action ProfileTweet-action--more js-more-ProfileTweet-actions">
    <div class="dropdown">
  <button class="ProfileTweet-actionButton u-textUserColorHover dropdown-toggle js-dropdown-toggle" type="button" aria-haspopup="true">
      <div class="IconContainer js-tooltip" title="More">
        <span class="Icon Icon--caretDownLight Icon--small"></span>
        <span class="u-hiddenVisually">More</span>
      </div>
  </button>
  <div class="dropdown-menu is-autoCentered">
  <div class="dropdown-caret">
    <div class="caret-outer"></div>
    <div class="caret-inner"></div>
  </div>
  <ul>
      <li class="copy-link-to-tweet js-actionCopyLinkToTweet">
        <button type="button" class="dropdown-link">Copy link to Tweet</button>
      </li>
      <li class="embed-link js-actionEmbedTweet" data-nav="embed_tweet">
        <button type="button" class="dropdown-link">Embed Tweet</button>
      </li>
          <li class="mute-user-item"><button type="button" class="dropdown-link">Mute <span class="username u-dir u-textTruncate" dir="ltr">@<b>GSTICseries</b></span></button></li>
    <li class="unmute-user-item"><button type="button" class="dropdown-link">Unmute <span class="username u-dir u-textTruncate" dir="ltr">@<b>GSTICseries</b></span></button></li>
        <li class="block-link js-actionBlock" data-nav="block">
          <button type="button" class="dropdown-link">Block <span class="username u-dir u-textTruncate" dir="ltr">@<b>GSTICseries</b></span></button>
        </li>
        <li class="unblock-link js-actionUnblock" data-nav="unblock">
          <button type="button" class="dropdown-link">Unblock <span class="username u-dir u-textTruncate" dir="ltr">@<b>GSTICseries</b></span></button>
        </li>
      <li class="report-link js-actionReport" data-nav="report">
        <button type="button" class="dropdown-link">
            Report Tweet
        </button>
      </li>
      <li class="dropdown-divider"></li>
      <li class="js-actionMomentMakerAddTweetToOtherMoment MomentMakerAddTweetToOtherMoment">
        <button type="button" class="dropdown-link">Add to other Moment</button>
      </li>
      <li class="js-actionMomentMakerCreateMoment">
        <button type="button" class="dropdown-link">Add to new Moment</button>
      </li>
  </ul>
</div>
</div>
  </div>
      </div>
        <div class="js-tweet-text-container">
  <p class="TweetTextSize  js-tweet-text tweet-text" lang="en" data-aria-label-part="0">All fueled up? Afternoon session (Waste)<a href="/hashtag/Water?src=hash" data-query-source="hashtag_click" class="twitter-hashtag pretty-link js-nav" dir="ltr"><s>#</s><b>Water</b></a> as a resource. Special focus on SDG 6. Purpose: review progress of water coming towards <a href="/hashtag/SDGs?src=hash" data-query-source="hashtag_click" class="twitter-hashtag pretty-link js-nav" dir="ltr"><s>#</s><b>SDGs</b></a>, reflecting on <strong>millennium development goals</strong>, and what is the role of science, technology and innovation in reaching these goals. <a href="/hashtag/GSTIC?src=hash" data-query-source="hashtag_click" class="twitter-hashtag pretty-link js-nav" dir="ltr"><s>#</s><b>GSTIC</b></a><a href="https://t.co/vXeKzTUBsG" class="twitter-timeline-link u-hidden" data-pre-embedded="true" dir="ltr">pic.twitter.com/vXeKzTUBsG</a></p>
</div>
            <div class="AdaptiveMediaOuterContainer">
    <div class="AdaptiveMedia is-square">
      <div class="AdaptiveMedia-container">
          <div class="AdaptiveMedia-singlePhoto" style="padding-top: calc(0.75 * 100% - 0.5px);">
    <div class="AdaptiveMedia-photoContainer js-adaptive-photo " data-image-url="https://pbs.twimg.com/media/DtGKKozX4AAk5C2.jpg" data-element-context="platform_photo_card" style="background-color:rgba(38,31,31,1.0);" data-dominant-color="[38,31,31]">
  <img data-aria-label-part="" src="https://pbs.twimg.com/media/DtGKKozX4AAk5C2.jpg" alt="" style="width: 100%; top: -0px;">
</div>
</div>
      </div>
    </div>
  </div>
      <div class="stream-item-footer">
      <div class="ProfileTweet-actionCountList u-hiddenVisually">
    <span class="ProfileTweet-action--reply u-hiddenVisually">
      <span class="ProfileTweet-actionCount" aria-hidden="true" data-tweet-stat-count="0">
        <span class="ProfileTweet-actionCountForAria" id="profile-tweet-action-reply-count-aria-1067786677383544833">0 replies</span>
      </span>
    </span>
    <span class="ProfileTweet-action--retweet u-hiddenVisually">
      <span class="ProfileTweet-actionCount" data-tweet-stat-count="1">
        <span class="ProfileTweet-actionCountForAria" id="profile-tweet-action-retweet-count-aria-1067786677383544833" data-aria-label-part="">1 retweet</span>
      </span>
    </span>
    <span class="ProfileTweet-action--favorite u-hiddenVisually">
      <span class="ProfileTweet-actionCount" data-tweet-stat-count="2">
        <span class="ProfileTweet-actionCountForAria" id="profile-tweet-action-favorite-count-aria-1067786677383544833" data-aria-label-part="">2 likes</span>
      </span>
    </span>
  </div>
  <div class="ProfileTweet-actionList js-actions" role="group" aria-label="Tweet actions">
    <div class="ProfileTweet-action ProfileTweet-action--reply">
  <button class="ProfileTweet-actionButton js-actionButton js-actionReply" data-modal="ProfileTweet-reply" type="button" aria-describedby="profile-tweet-action-reply-count-aria-1067786677383544833">
    <div class="IconContainer js-tooltip" title="Reply">
      <span class="Icon Icon--medium Icon--reply"></span>
      <span class="u-hiddenVisually">Reply</span>
    </div>
      <span class="ProfileTweet-actionCount ProfileTweet-actionCount--isZero ">
        <span class="ProfileTweet-actionCountForPresentation" aria-hidden="true"></span>
      </span>
  </button>
</div>
    <div class="ProfileTweet-action ProfileTweet-action--retweet js-toggleState js-toggleRt">
  <button class="ProfileTweet-actionButton  js-actionButton js-actionRetweet" data-modal="ProfileTweet-retweet" type="button" aria-describedby="profile-tweet-action-retweet-count-aria-1067786677383544833">
    <div class="IconContainer js-tooltip" title="Retweet">
      <span class="Icon Icon--medium Icon--retweet"></span>
      <span class="u-hiddenVisually">Retweet</span>
    </div>
      <span class="ProfileTweet-actionCount">
    <span class="ProfileTweet-actionCountForPresentation" aria-hidden="true">1</span>
  </span>
  </button><button class="ProfileTweet-actionButtonUndo js-actionButton js-actionRetweet" data-modal="ProfileTweet-retweet" type="button">
    <div class="IconContainer js-tooltip" title="Undo retweet">
      <span class="Icon Icon--medium Icon--retweet"></span>
      <span class="u-hiddenVisually">Retweeted</span>
    </div>
      <span class="ProfileTweet-actionCount">
    <span class="ProfileTweet-actionCountForPresentation" aria-hidden="true">1</span>
  </span>
  </button>
</div>
    <div class="ProfileTweet-action ProfileTweet-action--favorite js-toggleState">
  <button class="ProfileTweet-actionButton js-actionButton js-actionFavorite" type="button" aria-describedby="profile-tweet-action-favorite-count-aria-1067786677383544833">
    <div class="IconContainer js-tooltip" title="Like">
      <span role="presentation" class="Icon Icon--heart Icon--medium"></span>
      <div class="HeartAnimation"></div>
      <span class="u-hiddenVisually">Like</span>
    </div>
      <span class="ProfileTweet-actionCount">
    <span class="ProfileTweet-actionCountForPresentation" aria-hidden="true">2</span>
  </span>
  </button><button class="ProfileTweet-actionButtonUndo ProfileTweet-action--unfavorite u-linkClean js-actionButton js-actionFavorite" type="button">
    <div class="IconContainer js-tooltip" title="Undo like">
      <span role="presentation" class="Icon Icon--heart Icon--medium"></span>
      <div class="HeartAnimation"></div>
      <span class="u-hiddenVisually">Liked</span>
    </div>
      <span class="ProfileTweet-actionCount">
    <span class="ProfileTweet-actionCountForPresentation" aria-hidden="true">2</span>
  </span>
  </button>
</div>
      <div class="ProfileTweet-action ProfileTweet-action--dm">
    <button class="ProfileTweet-actionButton u-textUserColorHover js-actionButton js-actionShareViaDM" type="button" data-nav="share_tweet_dm">
      <div class="IconContainer js-tooltip" title="Direct message">
        <span class="Icon Icon--medium Icon--dm"></span>
        <span class="u-hiddenVisually">Direct message</span>
      </div>
    </button>
  </div>
  </div>
</div>
    </div>

In [20]:
# We can now extract all tweet blocks from the stream and put them in one list for further processing
tweet_blocks=tw_stream.find_all(class_='content')

# Let us see how many blocks we have
total_tweets=len(tweet_blocks)
print ("The stream has "+str(total_tweets)+" tweets!")

The stream has 135 tweets!


In [21]:
# Let us have a closer look at the first tweet block:
    
tweet_blocks[0]

<div class="content">
<div class="stream-item-header">
<a class="account-group js-account-group js-action-profile js-user-profile-link js-nav" data-user-id="875312126893711361" href="/GSTICseries">
<img alt="" class="avatar js-action-profile-avatar" src="https://pbs.twimg.com/profile_images/875623570436575233/tndS3CqJ_bigger.jpg"/>
<span class="FullNameGroup">
<strong class="fullname show-popup-with-id u-textTruncate " data-aria-label-part="">G-STIC 2018 <span aria-label="Emoji: Tear-off calendar" class="Emoji Emoji--forLinks" style="background-image:url('https://abs.twimg.com/emoji/v2/72x72/1f4c6.png')" title="Tear-off calendar"> </span><span aria-hidden="true" class="visuallyhidden">📆</span>28-29-30 Nov<span aria-label="Emoji: Round pushpin" class="Emoji Emoji--forLinks" style="background-image:url('https://abs.twimg.com/emoji/v2/72x72/1f4cd.png')" title="Round pushpin"> </span><span aria-hidden="true" class="visuallyhidden">📍</span>Brussels <span aria-label="Emoji: Admission tickets

## Chaining syntax: 
You can chain `.find()` commands together. This will be helpful to identify fields within fields.
For example, to get the exact time when a tweet was published, we can first find the element with the **time** class and then read the **'title'** value of the *&lt;a&gt;* element inside it, as shown below:

In [22]:
# You can find the publishing date of the first tweet

(tweet_blocks[0].find(class_='time')  # find the time classed element
    .find('a')['title'])              # fetch the title value from the link element <a>


'1:25 PM - 28 Nov 2018'

In [23]:
# You can also use a shorthand: calling a tag object is the same as calling .find_all() on it,
# so we take the first match with [0]

tweet_blocks[0](class_='time')[0].a['title']              # fetch the title value from the link element <a>


'1:25 PM - 28 Nov 2018'
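
Chained lookups raise an error as soon as one step finds nothing, because `find()` returns `None` in that case. The sketch below shows one possible guard; the helper name `get_time_title` and the two sample blocks are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical blocks: the second one has no timestamp element
html = """
<div class="content"><small class="time"><a title="1:25 PM - 28 Nov 2018">Nov 28</a></small></div>
<div class="content"><p>No timestamp in this block</p></div>
"""
demo_soup = BeautifulSoup(html, "html.parser")

def get_time_title(block):
    """Return the title of the link inside the 'time' element, or None if absent."""
    time_tag = block.find(class_="time")
    if time_tag is None:
        return None
    link = time_tag.find("a")
    if link is None:
        return None
    return link.get("title")  # .get() returns None when the attribute is missing

titles = [get_time_title(block) for block in demo_soup.find_all(class_="content")]
print(titles)
```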

So let us now break down and dissect the HTML code and identify which of the elements provide which information. 

To best do this, you could use the Inspect feature of your browser as shown in the below animation.

We first highlight the element we want to get. In the below example, we start with the avatar to find the class name of the element that holds the avatar image URL. We then look at the field representing the number of retweets to get the class name representing that value as well.

<video controls src="data/inspect.mp4" />

**By following the above approach, we were able to detect the class names for the below variables per tweet:**
    
* Username: "username u-dir u-textTruncate"->b
* Avatar URL: "avatar js-action-profile-avatar" ['src']
* Tweet publishing date: "time"->a ['title']
* Tweet text: "js-tweet-text-container"
* Number of replies: "ProfileTweet-actionButton js-actionButton js-actionReply"->"ProfileTweet-actionCountForPresentation"
* Number of retweets: "ProfileTweet-actionButton  js-actionButton js-actionRetweet"->"ProfileTweet-actionCountForPresentation"
* Number of favorites: "ProfileTweet-actionButton js-actionButton js-actionFavorite"->"ProfileTweet-actionCountForPresentation"


In [31]:
# Let us test the above on the first tweet in the stream

#tweet_blocks[0](class_='avatar js-action-profile-avatar')[0]['src']

print ("Username: "+tweet_blocks[0](class_='username u-dir u-textTruncate')[0].b.get_text())
print ("Avatar link: "+tweet_blocks[0](class_='avatar js-action-profile-avatar')[0]['src'])
print ("Time: "+tweet_blocks[0](class_='time')[0].a['title'])
# get_text() already returns a Unicode string in Python 3, so non-ASCII characters (e.g., ä,å,é) need no .encode('utf-8');
# wrapping the encoded bytes in str() would print a b'...' literal instead of the text
print ("Tweet text: "+tweet_blocks[0](class_='js-tweet-text-container')[0].p.get_text())
print ("Replies: "+tweet_blocks[0](class_='ProfileTweet-actionButton js-actionButton js-actionReply')[0](class_='ProfileTweet-actionCountForPresentation')[0].get_text())
print ("Retweets: "+tweet_blocks[0](class_='ProfileTweet-actionButton js-actionButton js-actionRetweet')[0](class_='ProfileTweet-actionCountForPresentation')[0].get_text())
print ("Favorites: "+tweet_blocks[0](class_='ProfileTweet-actionButton js-actionButton js-actionFavorite')[0](class_='ProfileTweet-actionCountForPresentation')[0].get_text())


Username: GSTICseries
Avatar link: https://pbs.twimg.com/profile_images/875623570436575233/tndS3CqJ_bigger.jpg
Time: 1:25 PM - 28 Nov 2018
Tweet text: All fueled up? Afternoon session (Waste)#Water as a resource. Special focus on SDG 6. Purpose: review progress of water coming towards #SDGs, reflecting on millennium development goals, and what is the role of science, technology and innovation in reaching these goals. #GSTICpic.twitter.com/vXeKzTUBsG
Replies: 
Retweets: 1
Favorites: 2
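
The time value is a plain string, which sorts and filters poorly. As a minimal sketch, it can be converted into a `datetime` object with the standard library. The format string below is inferred from the single example value and assumes an English locale for the month abbreviation; the time zone of the value is not stated on the page, so the result is left naive.

```python
from datetime import datetime

raw_time = "1:25 PM - 28 Nov 2018"

# %I = 12-hour clock, %p = AM/PM, %d = day, %b = abbreviated month name, %Y = year
parsed = datetime.strptime(raw_time, "%I:%M %p - %d %b %Y")

print(parsed.isoformat())
```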


### Great, now we know how to extract the details of a single tweet. Let's run it for all tweets and save the results in a list

In [30]:
tweet_data=[]
#We want the data to be saved in a CSV file, so we initialize the first row with names of the variables

tweet_data.append(['tweeter_id','avatar_url','tw_time','tw_text','tw_replies','tw_retweets','tw_favorites'])
for tweet in tweet_blocks:
    tweeter_id=tweet(class_='username u-dir u-textTruncate')[0].b.get_text()
    avatar_url=tweet(class_='avatar js-action-profile-avatar')[0]['src']
    tw_time=tweet(class_='time')[0].a['title']
    #get_text() already returns a Unicode string in Python 3, so non-ASCII characters (e.g., ä,å,é) are kept as-is;
    #calling str() on encoded bytes would store a b'...' literal instead of the text
    tw_text=tweet(class_='js-tweet-text-container')[0].p.get_text()
    tw_replies=tweet(class_='ProfileTweet-actionButton js-actionButton js-actionReply')[0](class_='ProfileTweet-actionCountForPresentation')[0].get_text()
    if (tw_replies==''):
        tw_replies='0'
    tw_retweets=tweet(class_='ProfileTweet-actionButton js-actionButton js-actionRetweet')[0](class_='ProfileTweet-actionCountForPresentation')[0].get_text()
    if (tw_retweets==''):
        tw_retweets='0'
    tw_favorites=tweet(class_='ProfileTweet-actionButton js-actionButton js-actionFavorite')[0](class_='ProfileTweet-actionCountForPresentation')[0].get_text()
    if (tw_favorites==''):
        tw_favorites='0'
    
    tweet_data.append([tweeter_id,avatar_url,tw_time,tw_text,tw_replies,tw_retweets,tw_favorites])
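
The three count lookups in the loop follow the same pattern, so they could be folded into one helper. The sketch below is one possible refactoring, not the notebook's own method: `get_count` is a hypothetical name, the sample block is hand-written, and matching on the single class `js-actionReply` (instead of the full class string) is an assumption that relies on `find()` returning the first matching button.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down tweet block: empty reply count, one retweet, no like button
html = """
<div class="content">
  <button class="ProfileTweet-actionButton js-actionButton js-actionReply">
    <span class="ProfileTweet-actionCountForPresentation"></span>
  </button>
  <button class="ProfileTweet-actionButton js-actionButton js-actionRetweet">
    <span class="ProfileTweet-actionCountForPresentation">1</span>
  </button>
</div>
"""
demo_block = BeautifulSoup(html, "html.parser").find(class_="content")

def get_count(block, action_class):
    """Return the displayed count for one action button as an int (0 if empty or missing)."""
    button = block.find(class_=action_class)
    if button is None:
        return 0
    counter = button.find(class_="ProfileTweet-actionCountForPresentation")
    text = counter.get_text().strip() if counter is not None else ""
    # int() raises ValueError on unexpected text instead of silently dropping it
    return int(text) if text else 0

replies = get_count(demo_block, "js-actionReply")
retweets = get_count(demo_block, "js-actionRetweet")
favorites = get_count(demo_block, "js-actionFavorite")
print(replies, retweets, favorites)
```

Returning integers rather than strings also makes it easier to sort tweets by retweet count later on.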


#### Save your data to a CSV

In [26]:
#Take the data in tweet_data list and save it into a CSV file using the python csv library

import csv

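# In Python 3, newline="" lets the csv module control line endings, and encoding="utf-8" keeps non-ASCII characters intact
with open("data/tweet_data.csv", "w", newline="", encoding="utf-8") as f: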
    writer = csv.writer(f)
    writer.writerows(tweet_data)
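
Tweet text often contains commas, quotes, and accented characters, which the csv module escapes for us. The sketch below checks this with an in-memory buffer instead of a file; the sample row is invented for illustration.

```python
import csv
import io

# Hypothetical rows in the same shape as tweet_data (header first)
rows = [
    ["tweeter_id", "tw_text", "tw_retweets"],
    ["example_user", 'Text with a comma, a "quote" and an accent: é', "3"],
]

# Write to an in-memory text buffer; newline="" mirrors the advice for real files
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)

# Read the data back and confirm nothing was lost or split incorrectly
buffer.seek(0)
read_back = list(csv.reader(buffer))
print(read_back == rows)
```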

**Display your data in HTML format for easy viewing (optional)**

In [27]:
# Check your CSV file to verify all is in order

import pandas as pd

# Read the csv file in
df = pd.read_csv('data/tweet_data.csv')

# Save to file
df.to_html('data/twitter_data.htm')

# Assign to string
htmTable = df.to_html()

# display and HTML are part of the public IPython.display module
from IPython.display import display, HTML

display(HTML(htmTable))

| | tweeter_id | avatar_url | tw_time | tw_text | tw_replies | tw_retweets | tw_favorites |
|---|---|---|---|---|---|---|---|
| 0 | GSTICseries | https://pbs.twimg.com/profile_images/875623570... | 1:25 PM - 28 Nov 2018 | All fueled up? Afternoon session (Waste)#Water... | 0 | 1 | 2 |
| 1 | 11ionArt | https://pbs.twimg.com/profile_images/106785384... | 11:20 AM - 28 Nov 2018 | DID YOU KNOW? 5 Ongoing Trends that Will Resha... | 0 | 1 | 0 |
| 2 | G2H2_Geneva | https://pbs.twimg.com/profile_images/778233965... | 6:08 AM - 28 Nov 2018 | Global health disruptors: Millennium developme... | 1 | 2 | 6 |
| 3 | AnnaRHaskins | https://pbs.twimg.com/profile_images/941757829... | 1:35 AM - 28 Nov 2018 | In class today I referenced the Millennium Dev... | 3 | 2 | 92 |
| 4 | PensiveTM | https://pbs.twimg.com/profile_images/104786562... | 6:08 PM - 27 Nov 2018 | The true extent of global poverty and hunger: ... | 0 | 1 | 1 |
| 5 | drcamarin | https://pbs.twimg.com/profile_images/102185186... | 3:20 PM - 27 Nov 2018 | @agaviriau Global health disruptors: Millenniu... | 0 | 0 | 1 |
| 6 | AlanBenstock | https://pbs.twimg.com/profile_images/698441463... | 2:09 PM - 27 Nov 2018 | I have just read an article on global developm... | 0 | 0 | 0 |
| 7 | globalpeacef1 | https://abs.twimg.com/sticky/default_profile_i... | 10:54 AM - 27 Nov 2018 | Advocate and contribute to the achievement of ... | 0 | 0 | 0 |
| 8 | egumboslav | https://pbs.twimg.com/profile_images/100480067... | 10:33 AM - 27 Nov 2018 | I once had a study guide for Transport Economi... | 0 | 1 | 1 |
| 9 | MalariaPapers | https://pbs.twimg.com/profile_images/633328476... | 10:01 AM - 27 Nov 2018 | Cause-specific child mortality performance and... | 0 | 0 | 0 |
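
Since the task asks for the most retweeted tweets, the saved data still needs to be ranked. The sketch below shows one way to do that with pandas. It uses a small hand-built DataFrame with four of the rows shown above rather than the CSV file, and the tie-break on favorites is an assumption, not a requirement of the assignment.

```python
import pandas as pd

# Four rows copied from the table above (text and avatar columns omitted for brevity)
sample = pd.DataFrame({
    "tweeter_id": ["GSTICseries", "G2H2_Geneva", "AnnaRHaskins", "drcamarin"],
    "tw_retweets": [1, 2, 2, 0],
    "tw_favorites": [2, 6, 92, 1],
})

# Rank by retweets, breaking ties by favorites, and keep the top three
top = sample.sort_values(["tw_retweets", "tw_favorites"], ascending=False).head(3)

print(top["tweeter_id"].tolist())
```

The same `sort_values` call should work on the DataFrame returned by `pd.read_csv`, provided the count columns were read as numbers.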


### Great! We have now fetched and saved some twitter data on the MDGs into a CSV file. 

# Exercise:

### **Task:** 


- Scrape all the tweets for a search term of your choice that returns more than one hundred tweets, and save them in a CSV file. Justify why you chose that particular search term and state which questions you wish to answer.

- Enter your code below and show the instructor when ready.