# DNDS6013 Scientific Python: 9th Class
## Central European University, Winter 2019/2020

Instructor: Márton Pósfai, TA: Luis Natera Orozco

Emails: posfaim@ceu.edu, natera_luis@phd.ceu.edu



Welcome to our first online class! You are expected to go through this notebook and work on the exercises on your own. Solutions are provided in the usual way; however, try to figure them out yourself before you look at them.

If you have any questions or need help, you can contact us through:
* skype meeting, follow this [link](http://posfaim.web.elte.hu/skype.html) to join a group call
* the usual [slack channel](http://sp2020winter.slack.com)
* email: posfaim@ceu.edu, natera_luis@phd.ceu.edu

At the end of the notebook there is an exercise that requires you to generate and save a figure. Upload this figure to Moodle to show that you completed the class.


## Today's question
**How did the exchange rate between USD and HUF change recently?**

To answer this using automated Python code, we will learn about
* The JSON file format
* Using the API of an online service
* Plotting with dates


## A structured data format: JSON

The full name of the format is [JavaScript Object Notation](http://en.wikipedia.org/wiki/JSON). It is basically a text file or a string that encodes data in a logical hierarchical structure. Look at the following example; it should look familiar to you!

<pre>
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 27,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    },
    {
      "type": "mobile",
      "number": "123 456-7890"
    }
  ],
  "children": [],
  "spouse": null
}
</pre>

If you thought this looks like a combination of nested dictionaries and lists in Python, you were right!

This data format has become **very popular recently** because it closely resembles not only a Python dict but also a **JavaScript object**. This means a web browser can trivially parse a JSON string. Other formats, like XML, require real parser code.

#### JSON in Python

Let's see how to use JSON data in Python, then we'll return to the exchange data.

In [220]:
import json #module that can parse and create JSON strings

D = {'name' : 'Alice'}
json_str = json.dumps(D) # dumps -> creates a JSON string

print(json_str)
print(type(json_str))


{"name": "Alice"}
<class 'str'>


Notice that in the dictionary definition we used single quotes `'`, but the JSON string contains double quotes `"`. In Python, we can use both `'` and `"` interchangeably; in JSON, however, only double quotes are accepted.
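
To see the quoting rule in action, here is a minimal sketch. The two sample strings are made up for illustration: `json.loads` accepts the double-quoted version and raises `json.JSONDecodeError` for the single-quoted one.

```python
import json

valid = '{"name": "Alice"}'    # double quotes inside: valid JSON
invalid = "{'name': 'Alice'}"  # single quotes inside: fine as a Python dict literal, but not JSON

print(json.loads(valid))

try:
    json.loads(invalid)
    parsed_ok = True
except json.JSONDecodeError as err:
    # the parser complains as soon as it meets the first single quote
    parsed_ok = False
    print("Not valid JSON:", err)
```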

JSON can represent more than dicts, like a list of dicts:

In [221]:
# note the single- vs. double-quotes...
string = '[ {"name":"Bob","age":28}, {"name":"Alice","age":23} ]'
print(type(string))
D = json.loads(string) # create an appropriate combination of lists and dicts
print(D)
print(type(D))
print(D[0]["name"], D[0]["age"]) # we can access elements in the usual way 
print(type(D[0]))

<class 'str'>
[{'name': 'Bob', 'age': 28}, {'name': 'Alice', 'age': 23}]
<class 'list'>
Bob 28
<class 'dict'>
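
JSON and Python do not spell every value the same way. The sketch below takes a few fields from the first JSON example in this notebook and shows which Python objects `json.loads` produces for `true`, `null`, numbers and empty lists; the variable names are only illustrative.

```python
import json

text = '{"isAlive": true, "age": 27, "children": [], "spouse": null}'
person = json.loads(text)

# print each value together with the Python type it was converted to
for key, value in person.items():
    print(key, repr(value), type(value).__name__)

# converting back restores the JSON spellings (true, null)
round_trip = json.dumps(person)
print(round_trip)
```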


### Exercise -- JSON

* Take the first JSON example in this notebook and copy it to a text file.
* Read it at once (not line by line) as a string, and then convert it to a dictionary `d` by using the module `json`.
* Print out all phone numbers.

<details><summary><u>Hint.</u></summary>
<p>

To read the entire contents of a file `f` into a string use `f.read()`.

</p>
</details>

<details><summary><u>Solution.</u></summary>
<p>


```python
with open("json_example.txt","r",encoding="utf-8") as f:
    text = f.read()
d = json.loads(text)

for num in d["phoneNumbers"]:
    print(num["number"])
```

    
</p>
</details>
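
As a side note, the `json` module can also parse directly from an open file with `json.load` (without the trailing "s"), which skips the intermediate string. The sketch below writes a tiny stand-in file first so that it runs on its own; the file name and its shortened contents are assumptions for illustration.

```python
import json
import os
import tempfile

# write a small JSON file first so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "json_example_small.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"phoneNumbers": [{"type": "home", "number": "212 555-1234"}]}')

# json.load reads and parses the file object in one step
with open(path, "r", encoding="utf-8") as f:
    d = json.load(f)

numbers = [entry["number"] for entry in d["phoneNumbers"]]
print(numbers)
```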

## Exchange rates through accessing a web API

As a data-collection example, suppose we want to find out how currencies compare to one another over time. In other words, let's plot a time series of [exchange rates](http://en.wikipedia.org/wiki/Exchange_rate).

There's a nice, free website called https://openexchangerates.org. They provide a nice API to get exchange rate data. Let's use this.

**BTW**, API means "Application Programming Interface" and is a set of functions and procedures that allow code to access the features or data of an operating system, application, or other service.

To use their API, we need to register with them and get an **APP ID**. This lets them track how often you call their website and block you if you make too many requests (this is known as rate limiting).


### Exercise -- APP ID

Register to [Open Exchange Rates](https://openexchangerates.org/signup/free) and copy-paste your own APP ID below. 

In [240]:
app_id = "your brand new id goes here"

<details><summary><u>Solution.</u></summary>
<p>

Everyone has a different ID! The APP ID is a string of 32 characters.
    
</p>
</details>

Now let's download something and see what we get. Their [docs](https://openexchangerates.org/documentation) help us see how to build a URL.

In [222]:
# build a url from pieces:
base_url = "http://openexchangerates.org/api"
id_str   = "app_id="+app_id
#stitch it together
URL = base_url+"/historical/2011-10-18.json?"+id_str # this format is specified at the end of the doc page

In [224]:
#let's check it
URL

'http://openexchangerates.org/api/historical/2011-10-18.json?app_id=9627f16ce27f4040aa163e93a4db525d'
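
Since we will need this URL for many different dates later, it can help to wrap the string building in a small function. This is only a sketch: `build_historical_url` is a hypothetical helper name, and `"YOUR_APP_ID"` is a placeholder, not a working ID. It uses `urllib.parse.urlencode` from the standard library to build the query part.

```python
from urllib.parse import urlencode

def build_historical_url(date_str, app_id, base_url="http://openexchangerates.org/api"):
    """Return the URL for the historical rates of one day (date_str as YYYY-MM-DD)."""
    query = urlencode({"app_id": app_id})  # escapes special characters, if there are any
    return f"{base_url}/historical/{date_str}.json?{query}"

example_url = build_historical_url("2011-10-18", "YOUR_APP_ID")
print(example_url)
```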

OK, let's download the text of that "page" and see what we get.

In [225]:
import urllib.request
result = urllib.request.urlopen(URL)
text = result.read()

In [226]:
print(type(text))
# now print the beginning and end of the text:
print(text[:1500])
print('\n')
print(text[-300:])

<class 'bytes'>
b'{\n  "disclaimer": "Usage subject to terms: https://openexchangerates.org/terms",\n  "license": "https://openexchangerates.org/license",\n  "timestamp": 1318953600,\n  "base": "USD",\n  "rates": {\n    "AED": 3.67285,\n    "AFN": 48.325965,\n    "ALL": 102.607855,\n    "AMD": 376.327731,\n    "ANG": 1.77665,\n    "AOA": 94.851761,\n    "ARS": 4.215038,\n    "AUD": 0.979142,\n    "AWG": 1.79025,\n    "AZN": 0.786155,\n    "BAM": 1.429934,\n    "BBD": 2,\n    "BDT": 75.987773,\n    "BGN": 1.430108,\n    "BHD": 0.37653,\n    "BIF": 1231.30548,\n    "BMD": 1,\n    "BND": 1.272581,\n    "BOB": 7.013496,\n    "BRL": 1.767354,\n    "BSD": 1,\n    "BTN": 49.334603,\n    "BWP": 7.340381,\n    "BYR": 4395.431805,\n    "BZD": 1.99315,\n    "CAD": 1.018634,\n    "CDF": 915.22783,\n    "CHF": 0.900405,\n    "CLF": 0.021176,\n    "CLP": 510.174179,\n    "CNY": 6.3813,\n    "COP": 1894.791035,\n    "CRC": 510.707928,\n    "CVE": 80.624452,\n    "CZK": 18.206884,\n    "DJF": 177.721,

**Great!** This means we can take the text from that website and run it through json.loads and we have a nice accessible python dict:

In [227]:
data = json.loads(str(text,"utf-8")) # This comes from our API, remember?
print(type(data))
print(list(data.keys()))

<class 'dict'>
['disclaimer', 'license', 'timestamp', 'base', 'rates']
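
A small note on the `bytes` object: decoding it explicitly, as above, always works, but since Python 3.6 `json.loads` also accepts `bytes` directly. The sketch below uses a shortened, hand-typed stand-in for the API response (with two of the rates shown earlier) instead of a real download.

```python
import json

# shortened stand-in for the bytes returned by result.read()
raw = b'{\n  "base": "USD",\n  "rates": {"AED": 3.67285, "BMD": 1}\n}'

decoded_first = json.loads(raw.decode("utf-8"))  # explicit decoding, same effect as str(raw, "utf-8")
direct = json.loads(raw)                         # bytes are accepted directly in Python 3.6+

print(decoded_first == direct)
print(direct["rates"]["AED"])
```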


Sweet. Now we see there's a timestamp key. What does it give us?

In [228]:
print(data["timestamp"])

1318953600


Is that a UNIX timestamp? Yup! If you don't remember what this is, take the notebook of class 6, and look at the epoch paragraph.

This format is so common that `datetime` has a builtin method to deal with it, `fromtimestamp`:

In [42]:
import datetime
t = datetime.datetime.fromtimestamp(data["timestamp"])

print(t)
print(type(t)) # this is datetime, not a timedelta. Do you remember the difference?

2011-10-18 18:00:00
<class 'datetime.datetime'>
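
Note that `fromtimestamp` without further arguments converts to the local time zone of the computer running the code, so the hour you see may differ from the one printed above. If you prefer a result that does not depend on your machine, you can ask for UTC explicitly. A minimal sketch, with the timestamp value copied from the response above:

```python
import datetime

ts = 1318953600  # the "timestamp" value from the API response

# passing tz gives a timezone-aware datetime in UTC instead of local time
t_utc = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)

print(t_utc)
print(t_utc.strftime("%Y-%m-%d"))
```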


Our original URL had `historical/2011-10-18.json` in it, so that timestamp makes perfect sense.
There are also the `base` and `rates` keys. Those are the actual exchange rate data:

In [13]:
print(data["base"])
print(type(data["rates"]))
print(list(data["rates"].keys())[:5]) # print first five keys

USD
<class 'dict'>
['AED', 'AFN', 'ALL', 'AMD', 'ANG']


`base` tells us what currency the exchange rate is relative to. `rates` is another dict, keyed by three-letter currency name.

In [43]:
print(data["rates"]["USD"])

1


Makes sense, the conversion for USD should always be 1 since the base was USD. Let's check if the exchange rate with the Hungarian forint is present, and let's have a look at it:

### Exercise

Write a function that takes a currency code and the exchange rate dictionary as input and
* prints out the exchange rate if the currency is included in the dictionary
* prints out a message if not included

<details><summary><u>Hint.</u></summary>
<p>
    
To check whether dictionary `D` contains `key`, use `if key in D`.
    
</p>
</details>

<details><summary><u>Solution.</u></summary>
<p>


```python
def exrate(currency, data):
    if currency in data["rates"]:
        print('The exchange rate of',currency, "is", data["rates"][currency])
    else:
        print(currency, 'is not included')

exrate('HUF', data)
exrate('Imaginary dollars',data)
```

    
</p>
</details>

### Exercise

How many different currencies are there in the json files? How can you check it quickly?

<details><summary><u>Hint.</u></summary>
<p>
    
To get the number of items in a dictionary `D` use `len(D)`.
    
</p>
</details>


<details><summary><u>Solution.</u></summary>
<p>


```python
print(len(data['rates']))
```

    
</p>
</details>
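
Because every rate is quoted against the same base currency, the rate between any two other currencies can be obtained by dividing their USD-based rates. The sketch below illustrates this with three values copied from the 2011-10-18 response shown earlier; `cross_rate` is a hypothetical helper name.

```python
# a few USD-based rates copied from the 2011-10-18 response
rates = {"USD": 1, "CAD": 1.018634, "CHF": 0.900405}

def cross_rate(rates, source, target):
    """How many units of `target` one unit of `source` buys, via the common base."""
    return rates[target] / rates[source]

chf_per_cad = cross_rate(rates, "CAD", "CHF")
print("1 CAD =", round(chf_per_cad, 4), "CHF")
```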

Now, having a `data` dict like this may seem a little verbose compared to a table or CSV file. A CSV file for exchange rates makes a lot of sense, but many data do not fit into a nice regular form like that. Sending JSON "over the wire" and using dictionary keys makes it easy for us to keep track of which number corresponds to which unit of measurement.

## Putting it all together

We want to plot the USD-HUF exchange rate for the last 60 days, and we now have all the tools to do this. Let's break down the task into smaller pieces:
1. Construct a list containing the appropriate dates using `datetime`.
2. Download and save the data into a dictionary using `json`.
3. Create plot using `matplotlib`.

### Part 1 -- Dates

To download the exchange rates of a specific day, we have to construct a URL containing the date:
```
http://openexchangerates.org/api/historical/2011-10-18.json?app_id=9627f16ce27f4040aa163e93a4db525d
```
For this we can use the `datetime` module.

### Recap: datetime

In [229]:
import datetime

DT = datetime.datetime(1985, 10, 26, 12, 0, 0)
print("formatted date:", DT.strftime("%Y-%m-%d"))

Dnow = datetime.datetime.now()
print("today:", Dnow.strftime("%Y-%m-%d"))

td = datetime.timedelta(days=1)
Dyester = Dnow-td
print("yesterday:", Dyester.strftime("%Y-%m-%d"))


formatted date: 1985-10-26
today: 2020-03-12
yesterday: 2020-03-11


### Exercise -- Date list

Create a list called `dt_list` that contains datetime objects of the last 60 days including today.

<details><summary><u>Hint.</u></summary>
<p>
    
To get the `datetime` object for `k` days ago, subtract `datetime.timedelta(days=k)` from today.
    
</p>
</details>

<details><summary><u>Solution.</u></summary>
<p>


```python
# days=60-t runs from 59 down to 0, so the list goes from 59 days ago up to today
dt_list = [Dnow - datetime.timedelta(days=60-t) for t in range(1, 61)]
print(dt_list[-1])
```

    
</p>
</details>

### Exercise -- Downloading the data

Create a for loop over these dates. In each iteration, download the JSON exchange data and append it to a list of dictionaries ``daily_data``.

**Important**: It should take only a few seconds. If you are not sure your code works, test only with a few days, as you can reach the rate limit. 

<details><summary><u>Hint.</u></summary>
<p>
    
When constructing the urls use the `D.strftime("%Y-%m-%d")` method to create the correct date format.
</p>
</details>

<details><summary><u>Solution.</u></summary>
<p>


```python
import json
import urllib.request
url_begin = "http://openexchangerates.org/api/historical/"
url_end = ".json?app_id="+app_id
daily_data = []
for D in dt_list:
    URL = url_begin + D.strftime("%Y-%m-%d") + url_end
    result = urllib.request.urlopen(URL)
    text = result.read()
    data = json.loads(str(text,"utf-8"))
    daily_data.append(data)
```

    
</p>
</details>
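
Since the number of requests is limited, it can be worth saving each day's data to disk, so that re-running the notebook does not download the same day twice. The following is only a sketch of one way to do this: `load_or_fetch`, the `rates_cache` folder and `fake_fetch` are all hypothetical names, and the demo uses a stand-in function instead of a real download so that it runs without an APP ID.

```python
import json
import os
import tempfile

def load_or_fetch(date_str, fetch, cache_dir="rates_cache"):
    """Return the parsed data for date_str, calling fetch(date_str) only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, date_str + ".json")
    if os.path.exists(path):
        # already downloaded earlier: read it back from disk
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    data = fetch(date_str)  # in real use: a function wrapping urllib.request.urlopen
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data

# demo with a stand-in for the real download
calls = []
def fake_fetch(date_str):
    calls.append(date_str)
    return {"base": "USD", "rates": {"USD": 1}}

demo_dir = tempfile.mkdtemp()
first = load_or_fetch("2020-03-11", fake_fetch, cache_dir=demo_dir)
second = load_or_fetch("2020-03-11", fake_fetch, cache_dir=demo_dir)  # served from the cache
print(len(calls))
```

In the download loop, `fetch` would be a function that builds the URL for the given date string and returns the parsed dictionary.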

I assume you have your `daily_data`. Now we can look at some data, for example the USD-EUR exchange rate yesterday was:

In [69]:
daily_data[-2]['rates']['EUR']

0.902694

### Exercise -- list of rates

Create a list called `xrate_list` that contains the USD-HUF exchange rates for each day.

<details><summary><u>Solution.</u></summary>
<p>


```python
xrate_list = [data['rates']['HUF'] for data in daily_data]
```

    
</p>
</details>

### Exercise -- plotting

Plot the exchange rate versus the date. Check the documentation of the `plt.plot_date()` function. Try to make the plot look appealing.

<details><summary><u>Hint.</u></summary>
<p>
    
For example, check out the use of `plt.MaxNLocator()` to have fewer ticks on the x axis.
</p>
</details>

<details><summary><u>Solution.</u></summary>
<p>


```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8,4))

plt.plot_date(dt_list, xrate_list, "o-")

#label the axes
plt.xlabel("Date",        fontsize=14)
plt.ylabel("HUF per USD", fontsize=14)

#to make it look nicer, we set the ticks on the x axis so that the dates do not overlap
plt.tick_params(labelsize=12)
xloc = plt.MaxNLocator(4)
ax.xaxis.set_major_locator(xloc)

plt.show()
```

    
</p>
</details>

## Final Exercise

1. Plot one of the following figures:
    * Find the two currencies whose exchange rates increased the most and the least relative to their price 60 days ago. Plot the relative price of these two currencies over time, i.e., `relative_price = price / price_60_days_ago`.
    * Create a function that, given a list and a window size, returns the [moving average](https://en.wikipedia.org/wiki/Moving_average) with that window. Apply it to the HUF-USD exchange rate with window size 5 and plot it together with the raw data.
    * Download the exchange rates for the first day of each month of 2019 and plot your favorite currency.
    * Come up with an interesting plot yourself.
2. Set the title of the plot to be your name and save the figure as a pdf.
3. **Upload the pdf to Moodle!**

<details><summary><u>Hint.</u></summary>
<p>

Use `plt.savefig("myfigure.pdf")` to save the figure.
</p>
</details>

### Additional exercise

Are you done and would like some more practice? Write a function that prints out the number of people currently in space.
<details><summary><u>Hint.</u></summary>
<p>
    
Check out this data source: http://api.open-notify.org/astros.json
</p>
</details>

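The raw response from this data source looks something like this (the names and the number change over time):

b'{"people": [{"craft": "ISS", "name": "Andrew Morgan"}, {"craft": "ISS", "name": "Oleg Skripochka"}, {"craft": "ISS", "name": "Jessica Meir"}], "message": "success", "number": 3}'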
