> **IMPORTANT:** Every week, you will be solving exercises in a Jupyter notebook that looks like this one. Because you are cloning a GitHub repository that only I can push to, you should **NOT EDIT** any of the files you pull from GitHub. Instead, either make a new notebook and write your solutions in there, or make a copy of this notebook and **save it somewhere else** on your computer (not inside the `caobd_f19` folder that you cloned) and write your answers in there. **If you don't follow this advice, your solutions may be overwritten and lost**.

# Week 1: Coding with data in Python

We start out with the basics. The exercises today cover:

* Writing Python code and Markdown in Jupyter notebooks
* Introductory Python
* Getting some data from Reddit

**Feedback:** I'm always trying to improve. If you find errors or have concerns you can voice them safely and anonymously at https://ulfaslak.com/vent. You can also send me an email at ulfaslak@gmail.com or talk to me in class. I care about everything you have to say.

## Exercises

### Part 1: Know thy notebook

This document is what we call a *Jupyter notebook*. We will be using these extensively throughout the course so **READ THIS CLOSELY**. There are two basic things you need to know about Jupyter notebooks:

1. A notebook is nothing but a list of cells. A cell can either be a **code cell** or a **Markdown cell**. Code cells are for writing executable code, and Markdown cells (like this one) are for explaining things in text and making your notebook more readable. A typical workflow that you will soon get used to is something like: solving a problem with some code in a *code cell* and explaining your reasoning or the results you obtained in a *Markdown cell*. You can toggle cell type when you are in *command mode* by pressing <kbd>y</kbd> for code and <kbd>m</kbd> for Markdown. **Try to do that**. Change this Markdown cell to a code cell, and change it back again. What happens if you execute (<kbd>shift</kbd>+<kbd>enter</kbd>) this cell as a code cell, compared to when it is a Markdown cell?

2. The notebook has two *modes*: **edit mode** and **command mode**. You enter command mode by pressing <kbd>esc</kbd> or clicking outside a cell, and edit mode by clicking a cell and pressing <kbd>enter</kbd> or double-clicking a cell. When you're in edit mode, the outline of the current cell turns green (not with `jupyter lab`, though; there the bar is always blue) and whatever you type on your keyboard goes into that cell, whether it is a code or Markdown cell. [Here](http://maxmelnick.com/2016/04/19/python-beginner-tips-and-tricks.html)'s a nice rundown of the different commands you can use. **Beware of <kbd>x</kbd> and <kbd>d</kbd>**. Read the full list of hotkeys by pressing <kbd>h</kbd> in command mode to figure out why.

>*Heads up:* Because we'll be using Jupyter notebooks so much in this course, I strongly recommend investing 5 minutes more than you would normally, playing around with cell types, modes and hotkeys. It will save you heaps of time down the road.

When you run a code cell by pressing <kbd>shift</kbd> + <kbd>enter</kbd>, the code gets evaluated by the Python interpreter installed on your computer. If the last line of the cell is an expression, its value is shown below the cell unless you store it in a variable. In general, you will use code cells for doing analysis and working with data.

*Markdown* is a simple markup language for formatting text (similar to *HTML* or $\LaTeX$, which you may know). You will typically use it for writing explanations about how you solve the exercises and the results you get, and styling your notebook with sections and subsections. It can do **bold**, *italics* and $\LaTeX$ formatting (for equations), and much much more. You can read about the Markdown language [here](http://daringfireball.net/projects/markdown/).

Below is your first exercise. The exercises are numbered by the convention `[session]`.`[section]`.`[problem]`.`[subproblem]`. For example, exercise 4.2.3.1 is in week 4, section 2, problem 3, and subproblem 1.

>**Ex. 1.0.1**: In the Markdown cell below, write a short text that shows that you can:
>* Create sections
>* Write words in bold and italics
>* Write an equation in LaTeX formatting
>* Create bullet lists
>* Create [hyperlinks](https://en.wikipedia.org/wiki/Hyperlink)

>*Hint: Remember to execute the cell (<kbd>shift</kbd>+<kbd>enter</kbd>) so the Markdown gets rendered.*

[Answer to Ex. 1.0.1]
# section title

**bold**

*italics*

$$f(x) = x + 3$$

- a
- b
- c

[hyperlinks](https://something.com)

### Part 2: Essential Python (DSFS Chapter 2)

These exercises take you through some very basic Python functionality. Use them to calibrate your expectations: If you find them hard, you must spend some more time getting up to speed (see the [preparation goals](https://canvas.disabroad.org/courses/3430/pages/sessions) for today's session).

>**Ex. 1.1.1**: Create a list `a` that contains the numbers from $1$ to $1110$ (including $1$ and $1110$), incremented by one, using the `range` function.

In [6]:
# [Answer to Ex. 1.1.1]
a = list(range(1, 1111))  # range excludes the stop value, so 1111 is needed to include 1110
a

[1, 2, 3, ..., 184, 185, ...

>**Ex. 1.1.2**: Show that you understand [slicing](http://stackoverflow.com/questions/509211/explain-pythons-slice-notation) in Python by extracting a list `b` with the numbers from $760$ to $769$ (including both) from the list created above.

In [12]:
# [Answer to Ex. 1.1.2]
b = a[759:769]
b

[760, 761, 762, 763, 764, 765, 766, 767, 768, 769]
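
The slice above works because indices start at zero and the stop index is excluded. A small sketch of the same rules on a shorter list (the name `nums` is made up for this illustration and is not part of the exercise):

```python
# Illustrative list; `nums` is a made-up name for this sketch.
nums = list(range(1, 11))   # the numbers 1 through 10

first_three = nums[:3]      # indices 0, 1, 2
middle = nums[3:6]          # indices 3, 4, 5 (the stop index 6 is excluded)
last_two = nums[-2:]        # negative indices count from the end
every_other = nums[::2]     # a step of 2 skips every second element

print(first_three, middle, last_two, every_other)
```

The general form is `sequence[start:stop:step]`, and any of the three parts can be left out.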

>**Ex. 1.1.3**: Define a function that takes as input a number $x$ and returns the number multiplied by the sum of itself and three: $f(x) = x(x+3)$.

In [13]:
# [Answer to Ex. 1.1.3]
def times_plus_three(x):
    return x * (x+3)

times_plus_three(5)

40

>**Ex. 1.1.4**: Apply this function to every element of the list `b` using a `for` loop and append the results to a new list `c`. Print `c`.

In [14]:
# [Answer to Ex. 1.1.4]
c = []
for x in b:
    c.append(times_plus_three(x))
    
c

[579880,
 581404,
 582930,
 584458,
 585988,
 587520,
 589054,
 590590,
 592128,
 593668]

>**Ex. 1.1.5**: Do the exact same thing using a *list comprehension*.

In [15]:
# [Answer to Ex. 1.1.5]
c = [times_plus_three(x) for x in b]
c

[579880,
 581404,
 582930,
 584458,
 585988,
 587520,
 589054,
 590590,
 592128,
 593668]

>**Ex. 1.1.6**: Write the numbers in `c` to a text file with one number per line.

In [20]:
# [Answer to Ex. 1.1.6]
with open('test_print', 'w') as file:
    for entry in c:
        file.write(str(entry) + '\n')
# no explicit close() is needed: the with block closes the file on exit
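
One way to check that the file was written as intended is to read it back. The sketch below is a minimal illustration, assuming a temporary file path and a shortened list called `numbers` (both made up here, not part of the exercise):

```python
import os
import tempfile

numbers = [579880, 581404, 582930]  # first few values of `c`, for illustration

# Write to a temporary directory so the sketch leaves nothing in your course folder.
path = os.path.join(tempfile.mkdtemp(), "numbers.txt")

with open(path, "w") as f:
    for n in numbers:
        f.write(f"{n}\n")
# The file is closed automatically when the with block ends.

with open(path) as f:
    read_back = [int(line) for line in f]  # int() ignores the trailing newline

print(read_back)
```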

>**Ex. 1.1.7**: Show that you understand how strings work in Python. You should:
>
>1. Add a comment above each line of code that explains it.
>2. Find all the lines where **a string** is put into a string. How many are there?
>3. Explain the difference between `%d`, `%s` and `%r`.
>
>[Source](https://learnpythonthehardway.org/book/ex6.html)

In [24]:
# This is an example of a comment

# %d is replaced by the integer 10
x = "There are %d types of people." % 10

# assigns the string "binary" to the variable binary
binary = "binary"

# assigns the string "don't"; the apostrophe is literal because the string is enclosed in double quotes
do_not = "don't"

# each %s is replaced, in order, by the string variables in the tuple
y = "Those who know %s and those who %s." % (binary, do_not)

# prints "There are 10 types of people."
print(x)

# prints "Those who know binary and those who don't."
print(y)

# %r inserts repr(x), so the string is shown with its quotes
# prints "I said: 'There are 10 types of people.'."
print("I said: %r." % x)
# %s inserts str(y), the plain text without added quotes
# prints "I also said: 'Those who know binary and those who don't.'."
print("I also said: '%s'." % y)

# boolean
hilarious = False
# joke_evaluation is a format string with a %r placeholder that is filled in later
joke_evaluation = "Isn't that joke so funny?! %r"

# fills the placeholder with repr(hilarious); prints "Isn't that joke so funny?! False"
print(joke_evaluation % hilarious)

# two plain strings
w = "This is the left side of..."
e = "a string with a right side."

# string concatenation
print(w + e)

There are 10 types of people.
Those who know binary and those who don't.
I said: 'There are 10 types of people.'.
I also said: 'Those who know binary and those who don't.'.
Isn't that joke so funny?! False
This is the left side of...a string with a right side.


[Answer to Ex. 1.1.7.2]

[Answer to Ex. 1.1.7.3]
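
The difference between the three format codes can be checked directly. A minimal sketch (the variable names are made up for illustration): `%s` uses `str()`, `%r` uses `repr()`, and `%d` formats a number as an integer.

```python
text = "don't"

as_s = "%s" % text   # str(text): the plain characters
as_r = "%r" % text   # repr(text): the quotes become part of the result
as_d = "%d" % 3.9    # formatted as an integer; the fractional part is dropped

print(as_s)
print(as_r)
print(as_d)
```

Passing a string to `%d` raises a `TypeError`, which is a quick way to see that `%d` only accepts numbers.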

>**Ex. 1.1.8**: Why does `5 // 2 == 2` in Python 3.7? How is division different between Python 2 and 3?

In [26]:
# [Answer to Ex. 1.1.8]
5 // 2 # integer (floor) division
5 / 2 # floating point division

2.5
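
In Python 2, `/` between two integers performed floor division (so `5 / 2` gave `2`) unless `from __future__ import division` was used. In Python 3, `/` is always true division and `//` is floor division. A short sketch of the Python 3 behaviour:

```python
true_div = 5 / 2        # true division always returns a float
floor_div = 5 // 2      # floor division rounds down to the nearest integer
negative = -5 // 2      # rounding is towards negative infinity, not towards zero
float_floor = 5.0 // 2  # floor division on a float still returns a float
both = divmod(5, 2)     # quotient and remainder in one call

print(true_div, floor_div, negative, float_floor, both)
```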

>**Ex. 1.1.9**: What is the point of using `try` and `except`? Write some code that shows how to use these.

In [143]:
# [Answer to Ex. 1.1.9]
try:
    print('foo' / 0)  # dividing a string by a number raises a TypeError
except TypeError:
    print('oopsies')

oopsies
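
The point of `try` and `except` is to let the program recover from an error instead of crashing. A minimal sketch, assuming a hypothetical helper called `safe_divide` (not part of the exercise), that handles two different exception types separately:

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None if the division is not possible."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        print("cannot divide by zero")
    except TypeError as err:
        print("wrong type:", err)
    return None


print(safe_divide(6, 3))
print(safe_divide(1, 0))
print(safe_divide("foo", 2))
```

Catching specific exception types, rather than using a bare `except:`, keeps unrelated bugs from being silently swallowed.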


>**Ex. 1.1.10**: `dict`s and `defaultdict`s.
>1. What is a `defaultdict`? How would you say it is different from a normal Python `dict`?
>2. Write some code that takes a list of tuples:
>
>        l = [("a", 1), ("b", 3), ("a", None), ("c", False), ("b", True), ("a", None)]
>
>     and produces a `defaultdict` object
>
>        defaultdict(<class 'list'>, {'a': [1, None, None], 'c': [False], 'b': [3, True]})
>
>*Hint: you can import `defaultdict` from `collections`*

In [34]:
# [Answer to Ex. 1.1.10]
from collections import defaultdict

l = [("a", 1), ("b", 3), ("a", None), ("c", False), ("b", True), ("a", None)]

ddict = defaultdict(list)
for k,v in l:
    ddict[k].append(v)

ddict

defaultdict(list, {'a': [1, None, None], 'b': [3, True], 'c': [False]})
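
To see how a `defaultdict` differs from a normal `dict`, compare what happens when a missing key is accessed. A small sketch (the names `plain` and `auto` are made up for illustration):

```python
from collections import defaultdict

plain = {}
try:
    plain["a"].append(1)   # key "a" does not exist yet, so this raises KeyError
except KeyError:
    plain["a"] = [1]       # with a normal dict the entry must be created by hand

auto = defaultdict(list)
auto["a"].append(1)        # the missing key is created with list() automatically

print(plain, dict(auto))
```

Note that merely reading a missing key from a `defaultdict` also creates it, which can be surprising if you only meant to look something up.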

>**Ex. 1.1.11**: Take a list `a = list("justreadtheinstructions")` and
>1. count the number of times each element occurs using `Counter`,
>2. report the two most common elements.
>
>*Hint: you can import `Counter` from `collections`*

In [51]:
# [Answer to Ex. 1.1.11]
from collections import Counter


a = list("justreadtheinstructions")
count = Counter(a)
two_most_common = count.most_common(2)

for pair in two_most_common:
    print(pair[0])

t
s


>**Ex. 1.1.12**: Take another list `b = list("ofcourseistillloveyou")` and
>1. get the `set` of characters that exist in both `a` and `b` (intersection),
>2. get the `set` of characters that exist in either `a` or `b` (union), and
>3. compute the [Jaccard similarity](https://en.wikipedia.org/wiki/Jaccard_index) between the distinct elements in `a` and `b`.
>
>*Hint: use the `set` function to get a `set`-type object of distinct elements from a list*

In [63]:
# [Answer to Ex. 1.1.12]
b = list("ofcourseistillloveyou")

intersection = set(a).intersection(set(b))
print('intersection: ', intersection)

union = set(a).union(set(b))
print('union: ', union)

jaccard_similarity = len(intersection) / len(union)
print('jaccard similarity: ', jaccard_similarity)

intersection:  {'r', 'i', 'o', 's', 't', 'c', 'e', 'u'}
union:  {'i', 'f', 'j', 's', 'n', 'c', 't', 'u', 'r', 'h', 'o', 'd', 'y', 'a', 'v', 'e', 'l'}
jaccard similarity:  0.47058823529411764
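
The same computation can be wrapped in a reusable function. This is a sketch, not a reference implementation: the name `jaccard` is made up, and returning `1.0` for two empty inputs is a convention chosen here for illustration.

```python
def jaccard(x, y):
    """Jaccard similarity between the distinct elements of two iterables."""
    set_x, set_y = set(x), set(y)
    union = set_x | set_y
    if not union:
        # Both inputs are empty; 1.0 is an assumed convention for this sketch.
        return 1.0
    return len(set_x & set_y) / len(union)


# "abc" and "bcd" share 2 of 4 distinct characters.
print(jaccard("abc", "bcd"))
```

The operators `&` and `|` are shorthand for `.intersection()` and `.union()` on sets.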


### Part 3: A little bit of real data

>**Ex. 1.2.1**: Learn about JSON by reading the **[Wikipedia page](https://en.wikipedia.org/wiki/JSON)**. Then answer the following questions in the cell below.
>
>1. What do the letters stand for?
>2. What is `json`?
>3. Why is `json` superior to `xml`? (... or why not?)

[Answers to Ex. 1.2.1.1-3]

1. JavaScript Object Notation
2. `json` is a text-based file format for transmitting data objects made up of key-value pairs and arrays.
3. `json` can be considered superior to `xml` because XML attributes can hold only a single value and can appear at most once on each element. XML also separates data from metadata, whereas `json` does not. Furthermore, `json` uses key->value mappings, whereas XML addresses data through 'nodes' with unique IDs.
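
To make the key-value and array structure concrete, here is a small sketch of converting a Python object to JSON text and back with the standard `json` module. The `record` dictionary is made up for illustration.

```python
import json

# Made-up example data; JSON supports objects, arrays, strings, numbers, booleans and null.
record = {"name": "Missy", "awesome": True, "toys": ["ball", "string"], "owner": None}

as_text = json.dumps(record)   # Python object -> JSON string
back = json.loads(as_text)     # JSON string -> Python object

print(as_text)
print(back == record)
```

Notice that Python's `True` and `None` are written as `true` and `null` in the JSON text.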

>**Ex. 1.2.2**: Working with JSON files
>1. Use [`requests`](https://www.google.dk/search?q=python+requests+get+json&gws_rd=cr&ei=M5OdWaewD8Ti6AS54J24Bg), or another Python module, to store **[this data](https://www.reddit.com/r/gameofthrones/.json)** in a new variable `data`.
>2. What is the [type](https://stackoverflow.com/questions/2225038/determine-the-type-of-an-object) of `data`?

In [89]:
# [Answer to Ex. 1.2.2.1]
import requests

response = requests.get('https://www.reddit.com/r/gameofthrones/.json')

In [102]:
response_json = response.json()
type(response_json)

dict

In [91]:
# [Answer to Ex. 1.2.2.2]
# this is the type of the raw HTTP response; the parsed JSON (response_json) is a dict
print('data object type: ', type(response))

data object type:  <class 'requests.models.Response'>


>**Ex. 1.2.3**: Let's try to inspect the data you retrieved. 
>
>1. Use the `json` module to print your data variable as a string with `indent=4`.
>2. The data is a dictionary, a type of Python object that stores data as key-value pairs. Print the keys.
>
>*Hint: 1. Use the `json` function `dumps`. 2. Call `.keys()` on the variable.*

In [142]:
# [Answer to Ex. 1.2.3.1]
import json

json_obj = json.dumps(response_json, indent=4)
print(json_obj)

{
    "kind": "Listing",
    "data": {
        "modhash": "",
        "dist": 25,
        "children": [
            {
                "kind": "t3",
                "data": {
                    "approved_at_utc": null,
                    "subreddit": "gameofthrones",
                    "selftext": "This hurts. My grandfather, who rode roller coasters with me and waterskiied in his 80\u2019s, just passed away 2 weeks away from turning 100. He was in the British Navy in WWII. When we were kids he would read to my brother and me from Harry Potter and lord of the rings, (which always sound better read with a British accent), so when my brother and I got my mom (his daughter) in to game of thrones years later, we were not surprised that pop (our name for him), had started watching the show as well. \n\nMy brother, mother, pop, and I loved discussing the show together so much that they all started reading or listening to the A song of Ice and Fire series as well!\nMy pop finished the audio

In [104]:
# [Answer to Ex. 1.2.3.2]
response_json.keys()

dict_keys(['kind', 'data'])

>**Ex. 1.2.4**: The URL reveals that the data is from reddit/r/gameofthrones, but can you recover that information from the data? Give your answer by 'keying' into the JSON data using square brackets.

>*Hint: 'Keying' is a word I just made up. By it, I mean the following. Consider a JSON object such as:*
>
>        my_json_obj = {
>            'cats': {
>                'awesome': ['Missy'],
>                'useless': ['Kim', 'Frank', 'Sandy']
>            },
>            'dogs': {
>                'awesome': ['Finn', 'Dolores', 'Fido', 'Casper'],
>                'useless': []
>            }
>        }
>
>*I can get the list of useless cats by keying into `my_json_obj` like so:*
>
>        >>> my_json_obj['cats']['useless']
>        Out [ ]: ['Kim', 'Frank', 'Sandy']
>
>*`my_json_obj['cats']` returns the dictionary `{'awesome': ['Missy'], 'useless': ['Kim', 'Frank', 'Sandy']}` and getting '`useless`' from that eventually gives us `['Kim', 'Frank', 'Sandy']`. If any of those list items were a list or a dictionary themselves, we could have kept keying deeper into the structure.*

In [119]:
# [Answer to Ex. 1.2.4]
print(response_json['data']['children'][0]['data']['subreddit_name_prefixed'])

r/gameofthrones
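
Keying with square brackets raises an error as soon as one key or index is missing. If that is a concern, a small helper can follow a path of keys and fall back to a default. The function `dig` below is hypothetical (made up for this sketch, not part of any library), and the example object is a shortened version of the one in the hint above.

```python
def dig(obj, *keys, default=None):
    """Follow a path of dict keys and list indices; return `default` if a step is missing."""
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


my_json_obj = {"cats": {"awesome": ["Missy"], "useless": ["Kim", "Frank", "Sandy"]}}

first_useless_cat = dig(my_json_obj, "cats", "useless", 0)
missing = dig(my_json_obj, "birds", "awesome")

print(first_useless_cat, missing)
```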


>**Ex. 1.2.5**: Write two `for` loops (or list comprehensions for extra street cred) that:
>1. Count the number of spoilers.
>2. Print only the headlines that aren't spoilers.

In [141]:
# [Answer to Ex. 1.2.5.1]
posts = response_json['data']['children']
count = 0
for post in posts:
    if post['data']['spoiler']:  # increment for each spoiler
        count += 1
    else:  # print headline for non-spoilers
        print(post['data']['title'])

print('There are %d spoilers' % count)

[No Spoilers] My grandfather, who has read all of the books and seen every episode of GoT except for the finale, just passed away
[No Spoilers] Hodor and Summer in the woods of Winterfell❤️
[No spoilers] Work from Attila Bátori(Hungary)
[No Spoilers] Game of Thrones theme played on one Guitar!
[No Spoilers] Well played Marvel... Found in Instagram.
[No spoilers] MY FANMADE SEASON 7 EXTENDED CREDITS
[NO SPOILERS] Hello! I would like to show you one of my oil paintings, this one is from April (I really wanted to paint something when there was the last season). If anyone wants to see process of the painting link below
[NO SPOILERS] Found Lady Stark in Rogue Legacy
[No Spoilers] Kit Harington to Cast In Marvel’s Eternals
[NO SPOILERS] Don’t know why Vladimir Furdik wasn’t Emmy nominated. Sometimes the hardest acting is the acting with no words. And he played the Night King beautifully.
[No spoilers] Got to take a picture with Bran the broken at Comic Con Panama
[NO SPOILERS] I gave Light o

In [58]:
# [Answer to Ex. 1.2.5.2]
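
For the list comprehension variant, one possible sketch is shown below. It assumes each post has the same `['data']['spoiler']` and `['data']['title']` structure used above. The `posts` list here is a made-up stand-in with invented titles so the example runs without a network connection; in the notebook it would be `response_json['data']['children']`.

```python
# Made-up stand-in for response_json['data']['children']; the titles are invented.
posts = [
    {"data": {"title": "[No Spoilers] Fan art", "spoiler": False}},
    {"data": {"title": "[Spoilers] Finale discussion", "spoiler": True}},
    {"data": {"title": "[No Spoilers] Theme on guitar", "spoiler": False}},
]

# 1. Count the spoilers: True counts as 1 and False as 0 when summed.
n_spoilers = sum(post["data"]["spoiler"] for post in posts)

# 2. Collect only the headlines that are not spoilers.
safe_titles = [post["data"]["title"] for post in posts if not post["data"]["spoiler"]]

print("There are %d spoilers" % n_spoilers)
for title in safe_titles:
    print(title)
```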