### <font color="brown">JSON, Set, Timing Execution, Random</font>

---

#### <font color="brown">Working with JSON - Continued</font>

In [60]:
import json

**Storing JSON to file**

In [66]:
# dump to file
json4 = '{"quiz_scores" : [{"name": "Anika", "scores":[38,40,36,40,32]}, {"name": "Amir", "scores":[36,38,40,30,34]}]}'
dict4 = json.loads(json4)
print(dict4)
with open("quiz_scores.json", "w") as qsfile:
    json.dump(dict4, qsfile)

{'quiz_scores': [{'name': 'Anika', 'scores': [38, 40, 36, 40, 32]}, {'name': 'Amir', 'scores': [36, 38, 40, 30, 34]}]}


In [64]:
# load from file
with open("quiz_scores.json") as qsfile:
    qs_scores = json.load(qsfile)

In [65]:
print(qs_scores)

{'quiz_scores': [{'name': 'Anika', 'scores': [38, 40, 36, 40, 32]}, {'name': 'Amir', 'scores': [36, 38, 40, 30, 34]}]}
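
The dump/load pair above can be checked as a round trip. The sketch below is a minimal illustration, not part of the original notebook: it writes to a temporary directory (an assumption, so no file is left behind) and passes `indent=2`, an optional `json.dump` argument that pretty-prints the file.

```python
import json
import os
import tempfile

quiz = {"quiz_scores": [{"name": "Anika", "scores": [38, 40, 36, 40, 32]}]}

# Write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "quiz_scores.json")
    with open(path, "w") as qsfile:
        json.dump(quiz, qsfile, indent=2)  # indent=2 pretty-prints the output
    with open(path) as qsfile:
        restored = json.load(qsfile)

# The loaded object compares equal to what was dumped.
print(restored == quiz)
```
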


**JSON with just a string (no dictionary)**

In [57]:
# string must be double-quoted
jsonstr = json.loads('"JSON - JavaScript Object Notation"')
jsonstr

'JSON - JavaScript Object Notation'

**JSON with just an array**

In [59]:
jsonarr = json.loads('[1,2,2,4]')
print(jsonarr)
print(len(jsonarr))

[1, 2, 2, 4]
4


**JSON with just a number**

In [60]:
jsonint = json.loads('25')
print(type(jsonint))
jsonreal = json.loads('25.3')
print(type(jsonreal))

<class 'int'>
<class 'float'>


In [61]:
json.loads('12.x')

JSONDecodeError: Extra data: line 1 column 3 (char 2)

In [62]:
json.loads('"12.x"')

'12.x'
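
Since invalid input raises `json.JSONDecodeError`, code that reads untrusted text may want to catch it. The helper below, `try_parse`, is a hypothetical name used only for this sketch; it returns a success flag together with the value, because `None` alone would be ambiguous (it is also the result of parsing `null`).

```python
import json

def try_parse(text):
    """Return (True, value) for valid JSON, or (False, None) otherwise."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as err:
        # err.msg and err.pos describe what went wrong and where
        print(f"invalid JSON: {err.msg} at char {err.pos}")
        return False, None

print(try_parse('12.x'))     # not valid JSON
print(try_parse('"12.x"'))   # a valid JSON string
```
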

**JSON with just a boolean**

In [63]:
jsonbool = json.loads('true')
print(jsonbool)
print(type(jsonbool))

True
<class 'bool'>


**JSON with a null**

In [64]:
jsonnull = json.loads('null')
print(jsonnull)
print(type(jsonnull))

None
<class 'NoneType'>
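
The cells above can be summarized in one loop. This is a small sketch that parses one sample of each kind of JSON value and records the Python type that `json.loads` returns; the sample strings are chosen here for illustration.

```python
import json

# One sample per kind of JSON value: object, array, string, numbers, boolean, null
samples = ['{"a": 1}', '[1, 2]', '"text"', '25', '25.3', 'true', 'null']

type_names = {text: type(json.loads(text)).__name__ for text in samples}
for text, name in type_names.items():
    print(f"{text:10} -> {name}")
```
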


---

#### Exercise: ad hoc storage to JSON

Suppose scores were in a file *qs_scores.txt*, like this:
    
Anika Sorenson|38,40,36,40,32<br>
Amir Sharif|36,38,40,30,34

We want to store this in JSON form so that it is standardized.

In [65]:
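# first make an input text file, qs_scores.txt, with the two lines shown above
qs_dict = {}
with open('qs_scores.txt') as infile:   # with closes the file when done
    for line in infile:
        flds = line.split('|')           # name | comma-separated scores
        scores = flds[1].split(',')
        qs_scores = [int(qs) for qs in scores]
        qs_dict[flds[0].strip()] = qs_scores
print(qs_dict)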

{'Anika Sorenson': [38, 40, 36, 40, 32], 'Amir Sharif': [36, 38, 40, 30, 34]}


In [66]:
with open('qs_scores.json','w') as qsfile:
    json.dump(qs_dict, qsfile)

# double-click the output file, will open in json interpretation mode
# right-click -> open with editor, can see plain text
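
The whole exercise can also be run end to end without preparing a file by hand. The sketch below is one possible arrangement, under the assumption that a temporary directory is acceptable for both files; `parse_scores` is a hypothetical helper name, and `str.partition` is swapped in for `split('|')`.

```python
import json
import os
import tempfile

sample = "Anika Sorenson|38,40,36,40,32\nAmir Sharif|36,38,40,30,34\n"

def parse_scores(lines):
    """Map each 'name|s1,s2,...' line to name -> list of int scores."""
    result = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, _, raw_scores = line.partition("|")
        result[name.strip()] = [int(s) for s in raw_scores.split(",")]
    return result

with tempfile.TemporaryDirectory() as tmpdir:
    txt_path = os.path.join(tmpdir, "qs_scores.txt")
    json_path = os.path.join(tmpdir, "qs_scores.json")
    with open(txt_path, "w") as f:
        f.write(sample)                 # create the input text file
    with open(txt_path) as f:
        qs_dict = parse_scores(f)       # text -> dictionary
    with open(json_path, "w") as f:
        json.dump(qs_dict, f)           # dictionary -> JSON file
    with open(json_path) as f:
        reloaded = json.load(f)         # read it back to check

print(reloaded)
```
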

---

#### Getting JSON data from a Web page

In [71]:
import requests

#### Example of reading public JSON dataset

Nobel Prizes - http://api.nobelprize.org/v1/prize.json

In [72]:
nobel_url = 'http://api.nobelprize.org/v1/prize.json'
resp = requests.get(nobel_url)
nobels = json.loads(resp.text)

In [73]:
len(nobels['prizes'])

658

In [72]:
print(nobels['prizes'][0])

{'year': '2021', 'category': 'chemistry', 'laureates': [{'id': '1002', 'firstname': 'Benjamin', 'surname': 'List', 'motivation': '"for the development of asymmetric organocatalysis"', 'share': '2'}, {'id': '1003', 'firstname': 'David', 'surname': 'MacMillan', 'motivation': '"for the development of asymmetric organocatalysis"', 'share': '2'}]}


In [73]:
# all prizes awarded in the year 2021
nobels_2021 = [prize for prize in nobels['prizes'] if prize['year'] == '2021']
print(nobels_2021)

[{'year': '2021', 'category': 'chemistry', 'laureates': [{'id': '1002', 'firstname': 'Benjamin', 'surname': 'List', 'motivation': '"for the development of asymmetric organocatalysis"', 'share': '2'}, {'id': '1003', 'firstname': 'David', 'surname': 'MacMillan', 'motivation': '"for the development of asymmetric organocatalysis"', 'share': '2'}]}, {'year': '2021', 'category': 'economics', 'laureates': [{'id': '1007', 'firstname': 'David', 'surname': 'Card', 'motivation': '"for his empirical contributions  to labour economics"', 'share': '2'}, {'id': '1008', 'firstname': 'Joshua', 'surname': 'Angrist', 'motivation': '"for their methodological contributions to the analysis  of causal relationships"', 'share': '4'}, {'id': '1009', 'firstname': 'Guido', 'surname': 'Imbens', 'motivation': '"for their methodological contributions to the analysis  of causal relationships"', 'share': '4'}]}, {'year': '2021', 'category': 'literature', 'laureates': [{'id': '1004', 'firstname': 'Abdulrazak', 'surnam

**This is TMI and a bit hard to read; we want to print it in a user-friendly format.**

**We want:**<br>
    Chemistry: name1, name2 ...<br>
    Economics: name1, name2 ...


In [74]:
for prize in nobels_2021:
    print(prize['category'].capitalize() + ': ',end='')
    winners = [winner['firstname']+' '+winner['surname'] for winner in prize['laureates']]
    print(', '.join(winners))

Chemistry: Benjamin List, David MacMillan
Economics: David Card, Joshua Angrist, Guido Imbens
Literature: Abdulrazak Gurnah
Peace: Maria Ressa, Dmitry Muratov
Physics: Syukuro Manabe, Klaus Hasselmann, Giorgio Parisi
Medicine: David Julius, Ardem Patapoutian


In [76]:
# let's take a look at prizes for 2020
nobels_2020 = [prize for prize in nobels['prizes'] if prize['year'] == '2020']
for prize in nobels_2020:
    print(prize['category'].capitalize() + ': ',end='')
    winners = [winner['firstname']+' '+winner['surname'] for winner in prize['laureates']]
    print(', '.join(winners))

Chemistry: Emmanuelle Charpentier, Jennifer A. Doudna
Economics: Paul Milgrom, Robert Wilson
Literature: Louise Glück
Peace: 

KeyError: 'surname'

In [77]:
# surname missing in Peace prize, could be missing in other years as well
# use dict get method with default return of empty string if key not found
for prize in nobels_2020:
    print(prize['category'].capitalize() + ': ',end='')
    winners = [winner['firstname']+' '+winner.get('surname','') for winner in prize['laureates']]
    print(', '.join(winners))

Chemistry: Emmanuelle Charpentier, Jennifer A. Doudna
Economics: Paul Milgrom, Robert Wilson
Literature: Louise Glück
Peace: World Food Programme 
Physics: Roger Penrose, Reinhard Genzel, Andrea Ghez
Medicine: Harvey Alter, Michael Houghton, Charles Rice
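
The same formatting can be wrapped in small functions that tolerate missing keys. This is a sketch, not the notebook's own code: `laureate_name` and `format_prize` are hypothetical helpers, and `sample_prizes` is a trimmed stand-in for two records of the downloaded data so that the example runs without a network call. Joining only the non-empty name parts also avoids the trailing space after an organization name.

```python
def laureate_name(laureate):
    """Join firstname and surname, skipping whichever is missing."""
    parts = [laureate.get("firstname", ""), laureate.get("surname", "")]
    return " ".join(part for part in parts if part)

def format_prize(prize):
    """Return 'Category: name1, name2, ...' for one prize record."""
    # .get with a default guards against a record with no laureates list
    names = [laureate_name(w) for w in prize.get("laureates", [])]
    return f"{prize['category'].capitalize()}: {', '.join(names)}"

# Trimmed stand-in records (only the keys used here)
sample_prizes = [
    {"year": "2020", "category": "peace",
     "laureates": [{"firstname": "World Food Programme"}]},
    {"year": "2020", "category": "literature",
     "laureates": [{"firstname": "Louise", "surname": "Glück"}]},
]

for prize in sample_prizes:
    print(format_prize(prize))
```
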


---

#### Basic JSON structure: https://www.json.org/json-en.html
As the description at the top of that page says, JSON is built on two structures. In the corresponding Python terminology, these are dictionaries (key-value pairs) and lists (arrays).
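
To see the two structures from the Python side, the sketch below serializes a small nested record with `json.dumps` and parses it back. The record is made up for illustration. Note that a tuple is written as a JSON array, so it comes back as a list.

```python
import json

# A dictionary holding a list and a tuple (illustrative data)
record = {"name": "Anika", "scores": [38, 40], "range": (32, 40)}

text = json.dumps(record)   # dict -> JSON object, list and tuple -> JSON array
print(text)

back = json.loads(text)
print(type(back["range"]))  # the tuple does not survive the round trip
```
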

---

#### <font color="brown">Set</font>

In [9]:
set1 = {2,2,3}
print(set1)

{2, 3}


In [10]:
set2 = {2,"two","three"}
print(set2)

{2, 'two', 'three'}


In [11]:
set2[0]
# sets are unordered, so can't be indexed

TypeError: 'set' object is not subscriptable

**Building sets out of iterables (strings, lists, tuples)**

In [15]:
set1 = set("apple")
print(set1)
set2 = set("pie")
print(set2)
set3 = set([1,3,4,2,1])
print(set3)
set4 = set(('x','y','x','z'))
print(set4)

{'l', 'p', 'a', 'e'}
{'e', 'p', 'i'}
{1, 2, 3, 4}
{'x', 'y', 'z'}


**Adding to set**

In [16]:
myset = set()
myset.add('a')
myset.add('p')
myset.add('p')
print(myset)

{'p', 'a'}


**Set Operations**

In [17]:
print(set("apple") | set("pie"))  # union
print(set("apple") & set("pie"))  # intersection
print(set("apple") - set("pie"))  # difference
print(set("pie") - set("apple"))
print(set("pie") <= set("apple")) # subset
print(set("pe") <= set("apple"))
print(set("pe") < set("apple"))   # proper subset
print(set("peal") <= set("apple"))
print(set("peal") < set("apple"))
print('i' in set("pie"))

{'l', 'p', 'e', 'a', 'i'}
{'e', 'p'}
{'l', 'a'}
{'i'}
False
True
True
True
False
True
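
Each operator above also has a method form. The sketch below shows a few of them; unlike the operators, the methods accept any iterable as their argument, so a plain string works. Results are passed through `sorted` only to make the printed order predictable.

```python
apple, pie = set("apple"), set("pie")

print(apple.union(pie) == apple | pie)            # same as |
print(apple.intersection("pie") == apple & pie)   # methods take any iterable
print(apple.difference(pie) == apple - pie)       # same as -
print(set("pe").issubset(apple))                  # same as <=

# symmetric difference: elements in exactly one of the two sets (operator ^)
print(sorted(apple.symmetric_difference(pie)))
```
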


**Example of set usage: find unique words in document**

In [31]:
def getword(token):
    token = token.strip(',.')
    if not token.isalpha():
        return None
    if len(token) < 4:
        return None
    return token.lower()

In [32]:
from collections import Counter

unique_words = set()
word_count = Counter()
for line in open('metamorphosis.txt'):
    tokens = line.split()  # default separator is whitespace
    for token in tokens:
        word = getword(token)
        if word:
            unique_words.add(word)
            word_count.update([word])

In [33]:
print(len(word_count))
print(len(unique_words))

64
64


In [34]:
print(unique_words)

{'right', 'tried', 'quite', 'that', 'drops', 'unable', 'onto', 'felt', 'into', 'only', 'stopped', 'legs', 'dull', 'made', 'there', 'thought', 'never', 'before', 'gregor', 'nonsense', 'pain', 'about', 'himself', 'hundred', 'sleeping', 'have', 'because', 'little', 'must', 'when', 'something', 'then', 'where', 'present', 'window', 'state', 'look', 'used', 'rolled', 'back', 'shut', 'floundering', 'began', 'this', 'threw', 'hitting', 'forget', 'which', 'weather', 'longer', 'position', 'could', 'pane', 'mild', 'always', 'times', 'eyes', 'rain', 'feel', 'hard', 'however', 'heard', 'turned', 'sleep'}


In [35]:
print(sorted(unique_words))

['about', 'always', 'back', 'because', 'before', 'began', 'could', 'drops', 'dull', 'eyes', 'feel', 'felt', 'floundering', 'forget', 'gregor', 'hard', 'have', 'heard', 'himself', 'hitting', 'however', 'hundred', 'into', 'legs', 'little', 'longer', 'look', 'made', 'mild', 'must', 'never', 'nonsense', 'only', 'onto', 'pain', 'pane', 'position', 'present', 'quite', 'rain', 'right', 'rolled', 'shut', 'sleep', 'sleeping', 'something', 'state', 'stopped', 'that', 'then', 'there', 'this', 'thought', 'threw', 'times', 'tried', 'turned', 'unable', 'used', 'weather', 'when', 'where', 'which', 'window']
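
The two lengths printed earlier match because the keys of a `Counter` are already unique. The sketch below illustrates this on a short made-up sentence (not taken from the document) and uses `most_common` to list the most frequent words.

```python
from collections import Counter

# Made-up sample text; words shorter than 4 letters are dropped, as in getword
text = "the rain fell and the rain stopped and then the rain fell again"
words = [w for w in text.split() if len(w) >= 4]

word_count = Counter(words)
unique_words = set(words)

print(len(word_count) == len(unique_words))  # Counter keys are the unique words
print(word_count.most_common(2))             # the two most frequent words
```
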


---

#### <font color="brown">Timing Execution</font>

In [36]:
# set membership check is supposed to be very fast!
long_list = [x for x in range(100000) if x % 3 == 0]
len(long_list)

33334

In [37]:
# do this for membership, converting list into set:
print(89232 in set(long_list))
# instead of directly doing membership on list
print(89232 in long_list)

True
True


**Exactly how much faster is it to do membership in set, as opposed to membership in list?**

In [39]:
import timeit

**Version 1 - Timing combined List/Set construction and membership test**

In [45]:
timeit.timeit('89232 in [x for x in range(100000) if x % 3 == 0]',number=10)

0.07280360699951416

In [46]:
timeit.timeit('89232 in set([x for x in range(100000) if x % 3 == 0])',number=10)

0.07695339200290618

**Version 2 - Timing only membership, after one-time setup of list/set**

In [47]:
timeit.timeit('89232 in lst',setup='lst=[x for x in range(100000) if x % 3 == 0]',number=100)

0.03619240900297882

In [49]:
t1=timeit.timeit('89232 in lst',setup='lst=[x for x in range(100000) if x % 3 == 0]',number=100)
print(t1)

0.03600813099910738


In [50]:
t2=timeit.timeit('89232 in lstset',setup='lstset = set([x for x in range(100000) if x % 3 == 0])',number=100)
print(t2)

6.581998604815453e-06


In [52]:
print(t1/t2)

5470.6986678428475


**Version 3 - Do setup outside, and pass global namespace in to timeit**

In [53]:
alst = [x for x in range(100000) if x % 3 == 0]
timeit.timeit('89232 in alst',number=100)

NameError: name 'alst' is not defined

**This does not work because timeit runs the statement in its own namespace, not the caller's, so alst is unknown.**

In [54]:
alst = [x for x in range(100000) if x % 3 == 0]
timeit.timeit('89232 in alst',globals=globals(),number=100)

0.03596254899457563

In [55]:
aset = set([x for x in range(100000) if x % 3 == 0])
timeit.timeit('89232 in aset',globals=globals(),number=100)

7.201000698842108e-06
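
Another option, sketched here, is to pass a callable instead of a string; the callable carries its own reference to the list or set, so no `globals` argument is needed. `timeit.repeat` runs the measurement several times, and taking the minimum is a common way to reduce noise. The variable names are chosen for this example, and the lambda call adds a small fixed overhead to both measurements.

```python
import timeit

multiples = [x for x in range(100000) if x % 3 == 0]
multiples_set = set(multiples)

# 5 rounds of 100 membership tests each; keep the fastest round
t_list = min(timeit.repeat(lambda: 89232 in multiples, number=100, repeat=5))
t_set = min(timeit.repeat(lambda: 89232 in multiples_set, number=100, repeat=5))

print(f"list: {t_list:.6f}s  set: {t_set:.6f}s  ratio: {t_list / t_set:.0f}x")
```

The exact numbers depend on the machine, so only the ratio is worth comparing across runs.
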

---

#### <font color="brown">Random</font>
- bring up Python reference, and search for random
- comes up with random module
- go over choice and randint, and use them in the following

In [68]:
import random

In [69]:
articles = ["the", "a"]
subjects = ["man", "woman", "boy", "girl", "scientist", "loser", "poser"]
verbs = ["jumped", "sang", "ran", "cried", "laughed", "played", "programmed"]
adverbs = ["loudly", "badly", "heavily", "softly", "madly", "sadly"]

In [70]:
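line = 1
while line < 7:
    sentence = ""   # avoid the name str, which would shadow the built-in type
    sentence += random.choice(articles)
    sentence += " " + random.choice(subjects)
    sentence += " " + random.choice(verbs)
    adv = random.randint(0,1)   # 0 or 1: coin flip for adding an adverb
    if adv:
        sentence += " " + random.choice(adverbs)
    line += 1
# print is outside the loop, so only the last of the six sentences is shown;
# indent it under the while to print every sentence
print(sentence)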

the boy laughed heavily
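
A variant that prints all six sentences is sketched below. It is one possible restructuring, not the notebook's code: `make_sentence` is a hypothetical helper, and a `random.Random` instance with a fixed seed (42, an arbitrary choice) is used so that repeated runs produce the same sentences.

```python
import random

articles = ["the", "a"]
subjects = ["man", "woman", "boy", "girl", "scientist", "loser", "poser"]
verbs = ["jumped", "sang", "ran", "cried", "laughed", "played", "programmed"]
adverbs = ["loudly", "badly", "heavily", "softly", "madly", "sadly"]

def make_sentence(rng):
    """Build 'article subject verb', plus an adverb about half the time."""
    words = [rng.choice(articles), rng.choice(subjects), rng.choice(verbs)]
    if rng.randint(0, 1):   # randint includes both endpoints: 0 or 1
        words.append(rng.choice(adverbs))
    return " ".join(words)

rng = random.Random(42)     # fixed seed so runs are repeatable
sentences = [make_sentence(rng) for _ in range(6)]
for sentence in sentences:
    print(sentence)
```
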
