## End of Module 2

#### Module 4

### 0. Setup (Mini-Tutorial)

**What is HTTP?**

HTTP (or Hypertext Transfer Protocol) is a protocol which allows the fetching of resources, such as HTML or JSON documents. It is the foundation of any data exchange on the Web and it is a client-server protocol, which means requests are initiated by the recipient, usually the Web browser. A complete document is reconstructed from the different sub-documents fetched, for instance text, layout description, images, videos, scripts, and more.

Source: [An Overview of HTTP (mozilla.org)](https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview)

**What is an Application Programming Interface (API)?**

An application program interface (API) is a set of routines, protocols, and tools for building software applications. Basically, an API specifies how software components should interact. Additionally, APIs are used when programming graphical user interface (GUI) components.

Source: [Webopedia](https://www.webopedia.com/TERM/A/API.html)



#### Python Requests Module

Requests is a simple HTTP library for Python. The full documentation is found [here](https://requests.readthedocs.io/en/master/), but for this test, we don't need to go through the whole thing.

To use it, simply import it.


In [6]:
# run this cell
import requests

For this setup, we use a simple API which generates 10 random jokes returned in JSON format.

In [7]:
# run this cell to check if you can call APIs

# First, determine the API URL
api_url = "https://official-joke-api.appspot.com/random_ten"

# Next, invoke the URL via requests.get(url)
data = requests.get(api_url)

# Let's inspect what requests.get() returns
print(data)
print(type(data))

<Response [200]>
<class 'requests.models.Response'>


The call to requests.get(url) returns a **requests.models.Response** object. What you can see from above is the HTTP Response, and if everything is normal, the status code should be **200**.

In [8]:
type(data)

requests.models.Response

HTTP normally returns text which Python can then process as a string. The text body may be found in the Response's attribute **text**.

In [9]:
# run this cell

print(data.text)

[{"id":325,"type":"general","setup":"Why couldn't the lifeguard save the hippie?","punchline":"He was too far out, man."},{"id":169,"type":"general","setup":"What did the beaver say to the tree?","punchline":"It's been nice gnawing you."},{"id":42,"type":"general","setup":"If you see a robbery at an Apple Store...","punchline":"Does that make you an iWitness?"},{"id":283,"type":"general","setup":"What's the best thing about elevator jokes?","punchline":"They work on so many levels."},{"id":296,"type":"general","setup":"When is a door not a door?","punchline":"When it's ajar."},{"id":294,"type":"general","setup":"When do doctors get angry?","punchline":"When they run out of patients."},{"id":141,"type":"general","setup":"How many hipsters does it take to change a lightbulb?","punchline":"Oh, it's a really obscure number. You've probably never heard of it."},{"id":380,"type":"programming","setup":"There are 10 kinds of people in this world.","punchline":"Those who understand binary, thos

You should see a JSON Array of Jokes. A JSON Array looks like a Python List, but at this point, Python still treats it as a string. To process it in Python, you need to convert it to the equivalent advanced data type via the **json** module. Let's do that next and define the variable **jokes** to contain the data converted from JSON.

In [10]:
# run this cell

import json

jokes = json.loads(data.text)

print("Data:")
print(jokes)
print("Type:")
print(type(jokes))


Data:
[{'id': 325, 'type': 'general', 'setup': "Why couldn't the lifeguard save the hippie?", 'punchline': 'He was too far out, man.'}, {'id': 169, 'type': 'general', 'setup': 'What did the beaver say to the tree?', 'punchline': "It's been nice gnawing you."}, {'id': 42, 'type': 'general', 'setup': 'If you see a robbery at an Apple Store...', 'punchline': 'Does that make you an iWitness?'}, {'id': 283, 'type': 'general', 'setup': "What's the best thing about elevator jokes?", 'punchline': 'They work on so many levels.'}, {'id': 296, 'type': 'general', 'setup': 'When is a door not a door?', 'punchline': "When it's ajar."}, {'id': 294, 'type': 'general', 'setup': 'When do doctors get angry?', 'punchline': 'When they run out of patients.'}, {'id': 141, 'type': 'general', 'setup': 'How many hipsters does it take to change a lightbulb?', 'punchline': "Oh, it's a really obscure number. You've probably never heard of it."}, {'id': 380, 'type': 'programming', 'setup': 'There are 10 kinds of pe

You can see from above that `jokes` is a list of dictionaries. Each dictionary will have various keys. Run the next cell to do a simple list comprehension operation on jokes, displaying the setup and punchline.

In [11]:
[j["setup"] + " " + j["punchline"] for j in jokes]

["Why couldn't the lifeguard save the hippie? He was too far out, man.",
 "What did the beaver say to the tree? It's been nice gnawing you.",
 'If you see a robbery at an Apple Store... Does that make you an iWitness?',
 "What's the best thing about elevator jokes? They work on so many levels.",
 "When is a door not a door? When it's ajar.",
 'When do doctors get angry? When they run out of patients.',
 "How many hipsters does it take to change a lightbulb? Oh, it's a really obscure number. You've probably never heard of it.",
 "There are 10 kinds of people in this world. Those who understand binary, those who don't, and those who weren't expecting a base 3 joke.",
 'Do you know where you can get chicken broth in bulk? The stock market.',
 "What's the best time to go to the dentist? Tooth hurty."]

### 1 Financial Data API


**(50 Points)**

Stock Market Data

IEX Cloud is a platform that makes financial data and services accessible to everyone.

We shall be using their **sandbox** (playground) API environment for this test. If you wish to play around with real data, feel free to open an account with them at http://iexcloud.io. You will be issued an API token which you will then need to use in your API calls. 

For now, we are provided with a Sandbox Token.


In [5]:
# execute this cell
TOKEN="Tsk_b1e3203fb628428fb3f967dbd3dc2b0b"
finance_url="https://sandbox.iexapis.com/stable/tops?token={TOKEN}&symbols={TICKER}"



#### 1.1.

Define a function `formatted_url` that accepts a stock symbol and returns the formatted URL call.

Hint: use the string method `.format(...)` to be able to use the string template above work.

In [19]:
# define a function formatted_url that accepts a stock symbol and returns the formatted URL call

symbol = input("Enter stock symbol: ")

def formatted_url(symbol):
    # write function code below
    url = "https://sandbox.iexapis.com/stable/tops?token=" + TOKEN + "&symbols=" + symbol
    return url
    
# dump string below
print(formatted_url(symbol))


Enter stock symbol: AAPL
https://sandbox.iexapis.com/stable/tops?token=Tsk_b1e3203fb628428fb3f967dbd3dc2b0b&symbols=AAPL


#### 1.2.

Test the API call with your URL.

Sample output using Apple (AAPL) provided below. You may want to try other stock symbols like Amazon (AMZN), Google (GOOG), or Netflix (NFLX).

**NOTE:** The prices below may be different from what you see depending on the time you made the request.

In [20]:
# write code here

data2 = requests.get(formatted_url(symbol))


# dump text attribute of your response object (here, we assume your variable name is data2)
data2.text

'[{"symbol":"AAPL","sector":"nterycniloccgtelohoe","securityType":"cs","bidPrice":0,"bidSize":0,"askPrice":0,"askSize":0,"lastUpdated":1618773112987,"lastSalePrice":386.35,"lastSaleSize":4,"lastSaleTime":1622347556236,"volume":156654}]'

### 1.2 Get a list of stocks

Get stock prices of the following:
* Apple (AAPL)
* Amazon (AMZN)
* Google (GOOG)
* Netflix (NFLX)

The symbols are already in the list variable `portfolio`.


In [28]:
# execute this code
portfolio = ["AAPL","AMZN","GOOG","NFLX"]

The URL format is like so:

https://sandbox.iexapis.com/stable/stock/market/batch?symbols=aapl,fb&types=quotetoken=Tsk_b1e3203fb628428fb3f967dbd3dc2b0b



In [29]:
# execute this code
TOKEN="Tsk_b1e3203fb628428fb3f967dbd3dc2b0b"
market_url="https://sandbox.iexapis.com/stable/stock/market/batch?symbols={SYMBOLS}&types=quote&token={TOKEN}"


#### 1.2.

Define a function `market_formatted_url` that accepts a **list of stock symbols** (similar to `portfolio`) and returns the formatted URL call.

Hint: use the string method `.format(...)` to be able to use the string template above work.  
Hint: craft a string with comma-separated symbols based on the portfolio list.  
Hint: research on `str.join(list)` to generate a comma-separated string.  

In [30]:
# write code below

def market_formatted_url(symbols):
    # write the rest of the function code below
    separator = ","
    symbols = separator.join(portfolio)
    return "https://sandbox.iexapis.com/stable/stock/market/batch?symbols=" + symbols + "&types=quote&token=" + TOKEN
    
    
# dump string below
print(market_formatted_url(portfolio))

https://sandbox.iexapis.com/stable/stock/market/batch?symbols=AAPL,AMZN,GOOG,NFLX&types=quote&token=Tsk_b1e3203fb628428fb3f967dbd3dc2b0b


#### 1.3.

Retrieve the API data using `portfolio`.


In [32]:
# write code to call the API here
# Hint: use your new function market_formatted_url. 
# Hint: pass the portfolio list already defined for you.

market_data = requests.get(market_formatted_url(portfolio))


# Dump the text attribute of the response object. Sample output provided below (assuming you name your response variable market_data)
market_data.text

'{"AAPL":{"quote":{"symbol":"AAPL","companyName":"Apple, Inc.","primaryExchange":"SADNAQ","calculationPrice":"close","open":388.72,"openTime":1643504561437,"openSource":"ifcilafo","close":390.25,"closeTime":1599008892802,"closeSource":"flcifoai","high":381.929,"highTime":1631748763085,"highSource":"de1peeacumylti rdi5en  ","low":390.57,"lowTime":1636745725815,"lowSource":"e  rdpeidma1 uin5ylecte","latestPrice":387.21,"latestSource":"Close","latestTime":"July 28, 2020","latestUpdate":1636198401125,"latestVolume":26215479,"iexRealtimePrice":378.06,"iexRealtimeSize":4,"iexLastUpdated":1618358146992,"delayedPrice":386.8,"delayedPriceTime":1618429810930,"oddLotDelayedPrice":382.96,"oddLotDelayedPriceTime":1668258877093,"extendedPrice":375.23,"extendedChange":1.78,"extendedChangePercent":0.00489,"extendedPriceTime":1608945779663,"previousClose":387.44,"previousVolume":31455511,"change":-6.28,"changePercent":-0.01647,"volume":26074471,"iexMarketPercent":0.006127920814101106,"iexVolume":153361

#### 1.4.

Load the JSON string into a variable named `market_quotes`.

In [36]:
import json

# write code here
# Hint: what attribute of the response object contains the JSON string?

market_quotes = json.loads(market_data.text)

# dump market_quotes (interactive mode). Sample output below
market_quotes


{'AAPL': {'quote': {'symbol': 'AAPL',
   'companyName': 'Apple, Inc.',
   'primaryExchange': 'SADNAQ',
   'calculationPrice': 'close',
   'open': 388.72,
   'openTime': 1643504561437,
   'openSource': 'ifcilafo',
   'close': 390.25,
   'closeTime': 1599008892802,
   'closeSource': 'flcifoai',
   'high': 381.929,
   'highTime': 1631748763085,
   'highSource': 'de1peeacumylti rdi5en  ',
   'low': 390.57,
   'lowTime': 1636745725815,
   'lowSource': 'e  rdpeidma1 uin5ylecte',
   'latestPrice': 387.21,
   'latestSource': 'Close',
   'latestTime': 'July 28, 2020',
   'latestUpdate': 1636198401125,
   'latestVolume': 26215479,
   'iexRealtimePrice': 378.06,
   'iexRealtimeSize': 4,
   'iexLastUpdated': 1618358146992,
   'delayedPrice': 386.8,
   'delayedPriceTime': 1618429810930,
   'oddLotDelayedPrice': 382.96,
   'oddLotDelayedPriceTime': 1668258877093,
   'extendedPrice': 375.23,
   'extendedChange': 1.78,
   'extendedChangePercent': 0.00489,
   'extendedPriceTime': 1608945779663,
   'pre

#### 1.5.

Output the Last Price (`latestPrice`) per Symbol like so:

In [63]:
# Write code here. Sample output provided below but may vary depending on the time you made the request.
# No need to define a function for this step.

for r in market_quotes:
    print(r + "\t" + str(market_quotes[r]['quote']['latestPrice']))

AAPL	387.21
AMZN	3010.26
GOOG	1542.94
NFLX	491.4


Hints:
- use a tab ("\t") between the symbol and the price


### 2 Social Listenening

**(50 points)**

Social listening is the monitoring of a brand, a personality, a cause, or an idea for feedback, direct mentions of entities involved, sentiments, and discussions regarding related keywords (or hashtags), topics, competitors, haters, or industries, followed by an analysis of this data to gain further insights and actionable next steps.

Before we go to the questions, here are a few things not yet taught in class that you will use in your solutions.

Sorting lists using `sort()`

Say you have the following list:

In [64]:
# execute this cell

demolist = ['D','A','F','B','E','C']

Use `sort()` to put elements in order in the **same** list.

In [65]:
# execute this cell 

demolist.sort()
print(demolist)

['A', 'B', 'C', 'D', 'E', 'F']


But supposing you have a list consisting of elements with advanced data types, say, tuples...

In [66]:
demolist2 = [('A',59),('B',100),('C',20),('D',88),('E',25),('F',38)]

... and you wish to sort this list by the number in the second element of each tuple in descending order.

You use `sort()` with two parameters:
* `key` indicates the the function that returns the value that will serve as basis for sorting
* `reverse` takes on a boolean value (default is False) to specify whether the list will be in ascending or descending order

To sort demolist2 by the number indicated as the second element of each tuple, and in descending order, execute the following cell.

In [67]:
# execute this cell

demolist2.sort(key=lambda x: x[1], reverse=True)
print(demolist2)


[('B', 100), ('D', 88), ('A', 59), ('F', 38), ('E', 25), ('C', 20)]


Note that I used lambda above but nothing is stopping you from defining a full-blown function like so:

In [20]:
demolist2 = [('A',59),('B',100),('C',20),('D',88),('E',25),('F',38)]

def sort_tuple(x):
    return x[1]

# note that you only pass the function reference (without the parentheses and parameters)
demolist2.sort(key=sort_tuple, reverse=True)
print(demolist2)

[('B', 100), ('D', 88), ('A', 59), ('F', 38), ('E', 25), ('C', 20)]


### 2.1

Consider a Twitter search for 100 tweets (which is the limit of Twitter) using the keyword "SONA". The resulting JSON dump from the Twitter API call has been pre-saved in the file named `tweets.json`. Let's load the file and convert the JSON string into a list of dictionaries of tweets.

In [74]:
# execute this cell to load the file
import json

tweets_file = 'tweets.json'

with open("tweets.json","r") as json_file:
    tweets = json.loads(json_file.read())

# Dump contents of tweets to make sure things are okay.
print(tweets)

[{'created_at': 'Tue Jul 28 03:25:35 +0000 2020', 'id': 1287952316029521920, 'id_str': '1287952316029521920', 'full_text': 'RT @watchmejayjay: Ay di ka nakanood ng SONA? Wag ka mag-alala, ito yung summary ng mga plano ni tatay laban sa COVID 19: A thread', 'truncated': False, 'display_text_range': [0, 130], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'watchmejayjay', 'name': 'Kuya Sir #JUNKTERRORLAWNOW', 'id': 2148463897, 'id_str': '2148463897', 'indices': [3, 17]}], 'urls': []}, 'metadata': {'iso_language_code': 'tl', 'result_type': 'recent'}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 1334271631, 'id_str': '1334271631', 'name': 'kai', 'screen_name': '_kyladel', 'location': 'Alaska', 'description': 'yessir i listen to hayley k

You won't really need to understand everything in the tweet object, but in case you want to know more, here is the documentation from the [Twitter Developer Website](https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object).

#### 2.1.1.

Print out the length of the number of tweets. Expected output is shown below.

In [76]:
# write your code here

print(len(tweets))


100


#### 2.1.2.

Define a list variable named `tweet_texts` of tweet texts contained in `tweets`  (which is the value of the `full_text` key).

Do a printout (using **script mode**) of the contents of `tweet_texts`. You will have to loop through each element of the list.

Sample output is provided below.

In [81]:
# write code here.

tweet_texts = []

for r in tweets:
    tweet_texts.append(r['full_text'])

# Dump tweet_texts (script mode)
print(tweet_texts)

['RT @watchmejayjay: Ay di ka nakanood ng SONA? Wag ka mag-alala, ito yung summary ng mga plano ni tatay laban sa COVID 19: A thread', 'RT @TOWER_Namba: 【#ジェジュン】\n\n\\カバーアルバムを引っさげたツアー映像🎥/\n\nツアー映像『J-JUN LIVE 2019～Love Covers～』本日入荷しました🙌\n2日間のみのプレミアムライヴを映像とCDでお楽しみいただける豪華盤!!!\n\nCDがあれば…', '@kapilkolotya @kharaa_sona_ Bhai jo le liya hai mene 3saal pehle usko me Boycott kr duga. But\n\nMujy Made in india IPHONE dedo.', 'RT @TOWER_Namba: 【#ジェジュン】\n\n\\ジェジュンの優しく、深い歌声がしみわたる..../\n\nカバーアルバム『Love Covers Ⅱ』本日入荷しました🙌\n原曲の良さをひきたてるだけでなく、ジェジュンらしさも全開のアルバムです♪\n\n🎁タワレコ特典はアナザージャ…', 'RT @JaDineTrash: Her own time. \nHer own thoughts. \nHer own style. \n\nThank you, Nadine. The battles you fight have always been for a bigger…', 'RT @Absolutalbert: VP isn’t invited to SONA? \n\nDesignated Survivor.', 'RT @cnnphilippines: THREAD: Duterte pushed for the reimposition of capital punishment, specifying the method of lethal injection, for heino…', 'RT @quarkhenares: WHY. WHY DO THEY EVEN GET FILMMAKERS TO SHOOT 

#### 2.1.3.

Remove duplicate tweet text lines and save the resulting (cleaned-up) list in the same variable `tweet_texts`.

Hint: using sets may save you a bit of headache.

Sample output is provided below for your reference.

In [121]:
# write code here

def remove_duplicates(tweet_texts):
    tweet_texts = set(tweet_texts)
    return tweet_texts

remove_duplicates(tweet_texts)

# dump, assuming variable name is tweet_texts
print(tweet_texts)

# note that after "de-duplication" (removing of duplicates), we should only have 90 tweet text lines left.
print(len(tweet_texts))

{'Nakalimang SONA na ako, di ka pa rin INSURED.\nPRES: DUTERTE\nSONA 2020 https://t.co/lAiuKKFKXJ', '@Chaiharvest OPEN YOUR PURSE SHOW ME YOUR SONA', 'Duterte asks Congress to pass 21 bills during the remainder of his presidency. Only 5 were specifically designed to respond to the COVID-19 crisis, reports @maracepeda \n\nhttps://t.co/6F8GGFl7GU', 'RT @JaDineTrash: Her own time. \nHer own thoughts. \nHer own style. \n\nThank you, Nadine. The battles you fight have always been for a bigger…', 'Nanga @ThalaAjith_FC  tag sona tha tweets poduvom . \n\nNanga @ThalaAjith_FC  tag sona tweet poda matom. \n\nEpudiye soltu suthunga nalaiku danush &amp; sl fans la namba record ya beat panitu povanga ukanthu avungaluku pidicha fc page ku pooja podunga 😊\n#valimai #ThalaAjith #Ajithkumar', "RT @cnnphilippines: Globe and PLDT shares are down at the first hour of trading today. It's the first time for market players to react to P…", 'RT @preenph: Nadine Lustre #SONAgKAISA protest art? Yes, please!\n\n

### 2.2.

Here we build our wordcount statistics.

#### 2.2.1.

Let's store our count stats in a dictionary variable named `words_dict`.

For each line, let's remove special characters contained in the following string:

`"&$@[].,'#()-\"!?’_"`

Print the resulting dictionary. Sample output is provided below.

In [145]:
words_dict = dict()
remove_chars = "&$@[].,'#()-\"!?’_"

list_chars = []
wordlist = []

# Write code below
# use as many lines as needed but within this same cell only.
# -----------------------------------------------------------

for r in tweet_texts:
    words += r.replace(remove_chars,"")
    words = words.upper()

wordlist = words.split()

for words in wordlist:
    words_dict[words] = words_dict.get(words,0) + 1

# -----------------------------------------------------------
# Dump dictionary contents here        
print(words_dict)

{'HTTPS://T.CO/ULILMRYGV4NAKALIMANG': 1, 'SONA': 40, 'NA': 19, 'AKO,': 1, 'DI': 4, 'KA': 4, 'PA': 4, 'RIN': 4, 'INSURED.': 1, 'PRES:': 1, 'DUTERTE': 8, '2020': 5, 'HTTPS://T.CO/LAIUKKFKXJ@CHAIHARVEST': 1, 'OPEN': 1, 'YOUR': 2, 'PURSE': 1, 'SHOW': 1, 'ME': 3, 'SONADUTERTE': 1, 'ASKS': 1, 'CONGRESS': 2, 'TO': 34, 'PASS': 1, '21': 2, 'BILLS': 2, 'DURING': 3, 'THE': 54, 'REMAINDER': 1, 'OF': 30, 'HIS': 5, 'PRESIDENCY.': 1, 'ONLY': 2, '5': 2, 'WERE': 2, 'SPECIFICALLY': 1, 'DESIGNED': 1, 'RESPOND': 1, 'COVID-19': 5, 'CRISIS,': 1, 'REPORTS': 1, '@MARACEPEDA': 2, 'HTTPS://T.CO/6F8GGFL7GURT': 1, '@JADINETRASH:': 1, 'HER': 3, 'OWN': 3, 'TIME.': 1, 'THOUGHTS.': 1, 'STYLE.': 1, 'THANK': 1, 'YOU,': 2, 'NADINE.': 1, 'BATTLES': 1, 'YOU': 2, 'FIGHT': 1, 'HAVE': 2, 'ALWAYS': 1, 'BEEN': 1, 'FOR': 10, 'A': 6, 'BIGGER…NANGA': 1, '@THALAAJITH_FC': 2, 'TAG': 3, 'THA': 3, 'TWEETS': 1, 'PODUVOM': 1, '.': 2, 'NANGA': 1, 'TWEET': 2, 'PODA': 1, 'MATOM.': 1, 'EPUDIYE': 1, 'SOLTU': 1, 'SUTHUNGA': 1, 'NALAIKU': 1, 

#### 2.2.2.

Now, we would like to have a list of words and counts sorted by count in descending order.

Define a new list variable named `words_list` containing **tuples**, where each tuple is as follows:

*(word, count)*

Dump the contents of `words_list`(in script mode). 

Sample output is provided below.

In [154]:
# write code here
words_list = []

for x,y in words_dict.items():
    words_list.append((x,y))    

# dump word_list here, assuming variable name is words_list
print(words_list)


[('HTTPS://T.CO/ULILMRYGV4NAKALIMANG', 1), ('SONA', 40), ('NA', 19), ('AKO,', 1), ('DI', 4), ('KA', 4), ('PA', 4), ('RIN', 4), ('INSURED.', 1), ('PRES:', 1), ('DUTERTE', 8), ('2020', 5), ('HTTPS://T.CO/LAIUKKFKXJ@CHAIHARVEST', 1), ('OPEN', 1), ('YOUR', 2), ('PURSE', 1), ('SHOW', 1), ('ME', 3), ('SONADUTERTE', 1), ('ASKS', 1), ('CONGRESS', 2), ('TO', 34), ('PASS', 1), ('21', 2), ('BILLS', 2), ('DURING', 3), ('THE', 54), ('REMAINDER', 1), ('OF', 30), ('HIS', 5), ('PRESIDENCY.', 1), ('ONLY', 2), ('5', 2), ('WERE', 2), ('SPECIFICALLY', 1), ('DESIGNED', 1), ('RESPOND', 1), ('COVID-19', 5), ('CRISIS,', 1), ('REPORTS', 1), ('@MARACEPEDA', 2), ('HTTPS://T.CO/6F8GGFL7GURT', 1), ('@JADINETRASH:', 1), ('HER', 3), ('OWN', 3), ('TIME.', 1), ('THOUGHTS.', 1), ('STYLE.', 1), ('THANK', 1), ('YOU,', 2), ('NADINE.', 1), ('BATTLES', 1), ('YOU', 2), ('FIGHT', 1), ('HAVE', 2), ('ALWAYS', 1), ('BEEN', 1), ('FOR', 10), ('A', 6), ('BIGGER…NANGA', 1), ('@THALAAJITH_FC', 2), ('TAG', 3), ('THA', 3), ('TWEETS', 1

#### 2.2.3

Sort the list by count (which is the second element of each tuple) in **descending** (or reverse) order.

Dump the contents of the newly-sorted list `words_list`. 

Sample output shown below.

Hint: this can be done using `lambda` but you can use a regular function definition. Make sure you go through the mini-tutorial at the start of Problem Set 2.

In [155]:
# write code here

def words_list_sort(x):
    return x[1]

words_list.sort(key = words_list_sort,reverse = True)

# dump contents

print(words_list)

[('THE', 54), ('SONA', 40), ('TO', 34), ('SA', 34), ('OF', 30), ('NA', 19), ('ANG', 19), ('NG', 14), ('AND', 13), ('5TH', 13), ('MGA', 12), ('IN', 11), ('FOR', 10), ('DUTERTE', 8), ('MY', 8), ('WITH', 8), ('NI', 7), ('THAT', 7), ('WE', 7), ('A', 6), ('&AMP;', 6), ('2020', 5), ('HIS', 5), ('COVID-19', 5), ('THIS', 5), ('-PRESIDENT', 5), ("#DUTERTE'S", 5), ('#SONART', 5), ('MAY', 5), ('ON', 5), ('UP', 5), ('IS', 5), ('DI', 4), ('KA', 4), ('PA', 4), ('RIN', 4), ('YA', 4), ('#SONA2020', 4), ('I', 4), ('WAY', 4), ('THEY', 4), ('RODRIGO', 4), ('ROA', 4), ('YUNG', 4), ("DUTERTE'S", 4), ('SONA,', 4), ('SI', 4), ('STATE', 4), ('STUDENT', 4), ('KO', 4), ('ME', 3), ('DURING', 3), ('HER', 3), ('OWN', 3), ('TAG', 3), ('THA', 3), ('GLOBE', 3), ('AT', 3), ('FIRST', 3), ('#SONAGKAISA', 3), ('ONE', 3), ('DAPAT', 3), ('NAMAN', 3), ('PRES.', 3), ('BEFORE', 3), ('LANG', 3), ('SILA', 3), ('IF', 3), ('SONA.', 3), ('@RAPPLERDOTCOM:', 3), ('PRESS', 3), ('FROM', 3), ('PRESIDENT', 3), ('LIKE', 3), ('WOULD', 3),

#### 2.2.4.

Print out the top 5 words (based on count).

Take note of the formatting below (i.e. one line per print output).

Sample output shown below, with the index shown as the leftmost element (the integer starting with 1).

Hint: No need to create a special index variable and manually increment. You may use one of the many `for` loop constructs for automatic index variable generation. You don't have to, but it's easier.

In [164]:
# write code here

i = 1
for x in words_list[0:5]:
    print(i,x)
    i = i + 1


1 ('THE', 54)
2 ('SONA', 40)
3 ('TO', 34)
4 ('SA', 34)
5 ('OF', 30)


#### 2.2.5.

Write a new **csv** file `wordcount.csv` with format like so:

```
word,count
SONA,66
THE,56
RT,54
TO,37
SA,35
...
```

Hint: You may use plain old string concatenation for writing to file, but feel free to experiment with other options.

In [204]:
import csv

pairs = []

# write code here
    
with open("wordcount.csv","w") as wordfile:
    writer = csv.writer(wordfile)
    
    field_names = ['word','count']
    
    writer.writerow(field_names)
    
    for x in words_list[0:5]:
        pair = x
        pairs.append(pair)
        
    writer.writerows(pairs)

Perform a `diff` or `fc` operation between your output file and the file named `wordcount-test.csv` which you can download from Canvas. Make sure there are no differences.

**IMPORTANT:** Please make sure that you commit `wordcount.csv` together with your Jupyter Notebook in your GitHub repository.
