# Problem Set 4: Third-party libraries

## Topics covered:

1. NumPy
2. Pandas
3. Matplotlib

## Exercise 1: Have a look at the documentation

Take some time and familiarize yourself with the online documentation for NumPy, Pandas and Matplotlib. These are very important tools for working with data using Python. The goal here is not to read all the documentation, but to get a brief overview of what you can do with these tools.

NumPy: https://numpy.org/doc/1.23/user/absolute_beginners.html

Pandas: https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html

Matplotlib: https://matplotlib.org/stable/tutorials/index.html

## Exercise 2: NumPy basics

1. Create a one-dimensional array of the numbers from 0 to 9

2. Reshape the array so it has two rows and five columns

3. Create a new array where you raise all the elements in the array to the power of 2

4. From the array you created in 3., make a slice that only consists of the last row of the array

In [None]:
# Solution 1:

import numpy as np

array_1D = np.arange(10)
print(array_1D)

In [None]:
# Solution 2:

array_2D = array_1D.reshape(2, 5)
print(array_2D)

# can use -1 instead of 5 to automatically get the number of columns that matches two rows
array_2D_alt = array_1D.reshape(2, -1)
print(array_2D_alt)

In [None]:
# Solution 3:

arrayD_pow2 = array_2D**2
arrayD_pow2

In [None]:
# Solution 4:

arrayD_pow2[1]
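
As a small aside to Solutions 2 and 4, the sketch below shows that `-1` in `reshape` lets NumPy infer the missing dimension, and that the last row can also be selected with a negative index, which avoids hard-coding the row number. The names `squared` and `last_row` are chosen here for illustration only.

```python
import numpy as np

array_1D = np.arange(10)            # the numbers 0, 1, ..., 9
array_2D = array_1D.reshape(2, -1)  # -1 lets NumPy infer the 5 columns
squared = array_2D ** 2             # element-wise power

last_row = squared[-1]              # negative index: counts from the end
print(array_2D.shape)
print(last_row)
```

Using `-1` as the row index keeps working if the array is later reshaped to a different number of rows.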

## Exercise 3: Array attributes and slicing

Define the following 2-dimensional NumPy array:

$A = \left[\begin{array}{ccc}
1 & 2 & 3\\
4 & 5 & 6\\
7 & 8 & 9
\end{array}\right]$

1. Find the shape of the array $A$
2. Find the size (the number of elements) of the array $A$
3. Compute the transpose of the array $A$
4. By slicing $A$, create two new arrays: one that contains the first two rows and the first two columns, and one that contains the last two rows and the last two columns of $A$. That is:
$$A_1 = \left[\begin{matrix}
 1 & 2 \\
 4 & 5
\end{matrix}\right]\;\; \text{and} \;\; A_2 = \left[\begin{matrix}
 5 & 6 \\
 8 & 9
\end{matrix}\right]$$
5. Using $A_1$ and $A_2$ compute the element-wise sum, the element-wise product and the matrix product between them

In [None]:
# Solution 1 and 2

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(f'The shape of A is: {A.shape}')
print(f'The size of A is: {A.size}')

In [None]:
# Solution 3

print('Transpose:')
print(A.T)

In [None]:
# Solution 4

A_1 = A[0:2, 0:2]
A_2 = A[1:3, 1:3]

print(A_1)
print(A_2)

In [None]:
# Solution 5

print('Element-wise sum:')
print(A_1 + A_2)

print('Element-wise product:')
print(A_1 * A_2)

print('Matrix product:')
print(A_1 @ A_2)
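
One thing worth knowing about Solution 4 is that basic slicing in NumPy returns a view on the original data rather than a copy. The following sketch (the names `A_1_view` and `A_1_copy` are illustrative) checks this with `np.shares_memory` and shows that `.copy()` gives an independent array.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

A_1_view = A[0:2, 0:2]          # basic slice: a view on A
A_1_copy = A[0:2, 0:2].copy()   # explicit copy: independent data

print(np.shares_memory(A, A_1_view))  # the view shares memory with A
print(np.shares_memory(A, A_1_copy))  # the copy does not

A_1_view[0, 0] = 100            # writing through the view also changes A
print(A[0, 0])
print(A_1_copy[0, 0])           # the copy keeps the old value
```

If you plan to modify a slice without touching the original array, taking a copy first is the safer choice.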

## Exercise 4: Solve a system of linear equations

Solve the following system of linear equations using NumPy:

$$ 3x - y + 14z = 7 $$

$$ 2x + 2y + 3z = 0 $$

$$ x - 12y - 18z = 33 $$

1. Create a NumPy array that represents the coefficients of the linear equation system and call it $A$

2. Create a NumPy array that represents the right-hand side of the equation system and call it $b$

3. Solve the equation system using the function `np.linalg.solve()`

In [None]:
# Solution 1

import numpy as np

A = np.array([[3, -1, 14],
              [2, 2, 3],
              [1, -12, -18]])

In [None]:
# Solution 2

b = np.array([[7],
              [0],
              [33]])

In [None]:
# Solution 3

np.linalg.solve(A, b)
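
A quick way to check a solution like the one above is to plug it back into the system. The sketch below assumes a one-dimensional right-hand side (a column vector works as well) and uses `np.allclose` to compare $Ax$ with $b$ up to floating-point rounding.

```python
import numpy as np

A = np.array([[3, -1, 14],
              [2, 2, 3],
              [1, -12, -18]])
b = np.array([7, 0, 33])   # 1-D right-hand side

x = np.linalg.solve(A, b)  # x holds the values of x, y and z

# A @ x should reproduce b up to floating-point rounding
print(x)
print(np.allclose(A @ x, b))
```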

## Exercise 4: Pandas basics

1. Create the following DataFrame and call it `inflation_data` (inflation numbers from https://data.oecd.org/):

|    | Country | Inflation_rate  |Region         |
|--- |---      |---              |---            |
| 0  | Canada  | 7.01            | North America |
| 1  | France  | 5.91            | Europe        |
| 2  | Germany | 7.90            | Europe        |
| 3  | Italy   | 8.37            | Europe        |
| 4  | Japan   | 3.00            | Asia          |
| 5  | UK      | 8.60            | Europe        |
| 6  | US      | 8.26            | North America |

2. Create a new DataFrame, called `inflation_data_sorted`, by sorting the values in `inflation_data` by the inflation rate (from high to low). Change the index so the new DataFrame has an increasing index (the country with the highest inflation rate should have the index 0, the next highest inflation rate should have the index 1, and so on).

3. Create a new DataFrame from `inflation_data`, called `inflation_data_europe`, that only contains the rows with data for the European countries.

4. Print out the average inflation rate in the European countries.

Hint: you can create a new DataFrame with data grouped according to the values in a specific column. Try to run the following code:

`inflation_data_regions = inflation_data.groupby('Region').mean(numeric_only=True)`

5. Play around with the method used for aggregating the data outlined in the hint above. What happens if you use `.sum()` or `.median()` instead of `.mean()`?

In [None]:
# Solution 1

import pandas as pd

inflation_data = pd.DataFrame([['Canada', 7.01, 'North America'],
                               ['France', 5.91, 'Europe'],
                               ['Germany', 7.90, 'Europe'],
                               ['Italy', 8.37, 'Europe'],
                               ['Japan', 3.00, 'Asia'],
                               ['UK', 8.60, 'Europe'],
                               ['US', 8.26, 'North America']],
                              columns=['Country', 'Inflation_rate', 'Region'])
inflation_data

In [None]:
# Solution 2:

inflation_data_sorted = inflation_data.sort_values(by='Inflation_rate', ascending=False)
inflation_data_sorted.index = range(len(inflation_data_sorted))
inflation_data_sorted
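
An alternative to assigning a new `range` to the index is the `reset_index` method. The sketch below rebuilds the same data from a dictionary (an illustrative choice, not the construction used in Solution 1) and chains sorting and re-indexing in one expression.

```python
import pandas as pd

inflation_data = pd.DataFrame({
    'Country': ['Canada', 'France', 'Germany', 'Italy', 'Japan', 'UK', 'US'],
    'Inflation_rate': [7.01, 5.91, 7.90, 8.37, 3.00, 8.60, 8.26],
    'Region': ['North America', 'Europe', 'Europe', 'Europe',
               'Asia', 'Europe', 'North America'],
})

inflation_data_sorted = (
    inflation_data
    .sort_values(by='Inflation_rate', ascending=False)
    .reset_index(drop=True)   # drop=True discards the old index
)
print(inflation_data_sorted)
```

Without `drop=True`, the old index would be kept as an extra column called `index`.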

In [None]:
# Solution 3:

inflation_data_europe = inflation_data[inflation_data.Region == 'Europe']
inflation_data_europe

In [None]:
# Solution 4:

# With the groupby method we can aggregate the data by a specific category (in this case by Region)

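# numeric_only=True leaves out the 'Country' column, which contains strings and cannot be averaged
inflation_data_regions = inflation_data.groupby('Region').mean(numeric_only=True)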
inflation_data_regions
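
To print only the European average, as the exercise asks, one possible approach is sketched below. It shows two equivalent routes: filtering the rows first, or grouping by region and looking up the `'Europe'` entry. The names `europe_mean` and `region_means` are illustrative.

```python
import pandas as pd

inflation_data = pd.DataFrame({
    'Country': ['Canada', 'France', 'Germany', 'Italy', 'Japan', 'UK', 'US'],
    'Inflation_rate': [7.01, 5.91, 7.90, 8.37, 3.00, 8.60, 8.26],
    'Region': ['North America', 'Europe', 'Europe', 'Europe',
               'Asia', 'Europe', 'North America'],
})

# Option 1: filter the European rows, then average one column
europe = inflation_data[inflation_data['Region'] == 'Europe']
europe_mean = europe['Inflation_rate'].mean()

# Option 2: group by region, then look up the 'Europe' entry
region_means = inflation_data.groupby('Region')['Inflation_rate'].mean()

print(f'Average inflation rate in Europe: {europe_mean:.2f}')
print(f"Same value via groupby: {region_means['Europe']:.2f}")
```

Selecting the `'Inflation_rate'` column before aggregating avoids any issue with the non-numeric `'Country'` column.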

In [None]:
# Solution 5:

# When we use the sum instead of the mean, all values that belong to a group are summed together.
# The argument numeric_only=True leaves out the 'Country' column,
# because it does not make sense to aggregate values of type str.

inflation_data.groupby('Region').sum(numeric_only=True)

In [None]:
# The median is another measure of central tendency, similar to the mean

inflation_data.groupby('Region').median(numeric_only=True)

## Exercise 5: Working with dates

1. Create a Pandas DataFrame that contains the same information as the table below.

The first (0th) observation gives the number of coffees sold between 7:00 and 8:00. The second observation gives the number of coffees sold between 8:00 and 9:00, and so on.

|    | Coffees_sold   | Time                |
|--- |---            |---                  | 
| 0  | 125           | 2022-09-20 07:00:00 | 
| 1  | 163           | 2022-09-20 08:00:00 |
| 2  | 148           | 2022-09-20 09:00:00 |
| 3  | 88            | 2022-09-20 10:00:00 |
| 4  | 80            | 2022-09-20 11:00:00 |
| 5  | 120           | 2022-09-20 12:00:00 |
| 6  | 132           | 2022-09-20 13:00:00 |
| 7  | 130           | 2022-09-20 14:00:00 |
| 8  | 50            | 2022-09-20 15:00:00 |
| 9  | 41            | 2022-09-20 16:00:00 |
| 10 | 15            | 2022-09-20 17:00:00 |

2. Create a slice of the DataFrame that only contains the values for coffees sold in the morning (before 10:00). What is the total number of coffees sold during these morning hours?

3. Change the index so it uses the values in the 'Time' column and delete the 'Time' column.

Hint: You can delete a column with the following syntax: `del dataframe_name['column_name']`

4. Plot the coffees sold as a line plot 

5. Plot the cumulative amount of coffees sold throughout the day

In [None]:
# Solution 1:

# Note that I build the DataFrame from a dictionary, so that each column gets its own data type
# (integers for 'Coffees_sold' and datetimes for 'Time').

coffee_sales = pd.DataFrame({'Coffees_sold': [125, 163, 148, 88, 80, 120, 132, 130, 50, 41, 15],
                             'Time': pd.date_range('2022-09-20 07:00:00', periods=11, freq='h')})
coffee_sales

In [None]:
# Solution 2:


coffee_sales_morning = coffee_sales[coffee_sales.Time < pd.Timestamp('2022-09-20 10:00:00')]
print(f'The total number of coffees sold in the morning (between 7 and 10) is: {coffee_sales_morning.Coffees_sold.sum()}')
coffee_sales_morning
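
Another way to select the morning hours is to compare the hour of each timestamp instead of the full timestamp. The sketch below assumes that the `'Time'` column has a datetime data type (here the DataFrame is built from a dictionary for that reason) and uses the `.dt.hour` accessor. The names `is_morning` and `morning_total` are illustrative.

```python
import pandas as pd

coffee_sales = pd.DataFrame({
    'Coffees_sold': [125, 163, 148, 88, 80, 120, 132, 130, 50, 41, 15],
    'Time': pd.date_range('2022-09-20 07:00:00', periods=11, freq='h'),
})

# .dt.hour extracts the hour of each timestamp as an integer
is_morning = coffee_sales['Time'].dt.hour < 10

# Use the boolean mask to pick rows, then sum one column
morning_total = coffee_sales.loc[is_morning, 'Coffees_sold'].sum()
print(morning_total)
```

This version does not depend on the date, so it would also work for data that spans several days.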

In [None]:
# Solution 3:

coffee_sales.index = coffee_sales.Time
del coffee_sales['Time']
coffee_sales

In [None]:
# Solution 4:

import matplotlib.pyplot as plt

plt.figure(figsize=(10,6))
plt.plot(coffee_sales, lw=3, linestyle='--', color='navy')
plt.grid()
plt.show()

In [None]:
# Solution 5:

plt.figure(figsize=(10,6))
plt.plot(coffee_sales.cumsum(), lw=3, linestyle='--', color='navy')
plt.grid()
plt.show()

## Exercise 6: Plot some simple functions

1. Plot the following functions in the same figure. Use $x$-values from 0 to 5 with a step of 0.1.

a. $y=2x$

b. $y=x^2$

c. $y=2^x$

Hint: create an $x$-variable (that is a NumPy array), and use this array to calculate the $y$-values. The $x$-values can be created as follows: `x = np.arange(0, 5, 0.1)`

2. Plot the same functions for a larger set of $x$-values, e.g., 0 to 15

3. Now the function $y=2^x$ is dominating the figure. Plot the function $y=2^x$ on a separate y-axis (the right axis)
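
A note on the hint above: `np.arange` excludes the stop value, so `np.arange(0, 5, 0.1)` ends just below 5. If the end point should be included, `np.linspace` is one alternative. The sketch below compares the two; the variable names are illustrative.

```python
import numpy as np

x_arange = np.arange(0, 5, 0.1)      # the stop value 5 is excluded
x_linspace = np.linspace(0, 5, 51)   # 51 points, both end points included

print(len(x_arange), x_arange[-1])
print(len(x_linspace), x_linspace[-1])
```

For plotting, either choice works; the difference only matters when the last point has to be exactly the end of the interval.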

In [None]:
# Solution 1

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 5, 0.1)

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(x, 2*x, label='$y=2x$', lw=2)
ax.plot(x, x**2, label='$y=x^2$', lw=2)
ax.plot(x, 2**x, label='$y=2^x$', lw=2)
ax.set_xlabel('$x$-values')
ax.set_ylabel('$y$-values')
ax.set_title("Simple functions")
ax.legend()
plt.show()

In [None]:
# Solution 2

x = np.arange(0, 15, 0.1)

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(x, 2*x, label='$y=2x$', lw=3)
ax.plot(x, x**2, label='$y=x^2$', lw=3)
ax.plot(x, 2**x, label='$y=2^x$', lw=3)
ax.set_xlabel('$x$-values')
ax.set_ylabel('$y$-values')
ax.set_title("Simple functions")
ax.legend()
plt.show()

In [None]:
# Solution 3

x = np.arange(0, 15, 0.1)

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(x, 2*x, label='$y=2x$', lw=3)
ax.plot(x, x**2, label='$y=x^2$', lw=3)
ax.set_xlabel('$x$-values')
ax.set_ylabel('$y$-values')
ax.set_title("Simple functions")
ax.legend()
ax2 = ax.twinx() 
ax2.plot(x, 2**x, label='$y=2^x$ (right axis)', lw=3, color='green')
ax2.legend(loc='upper center')
plt.show()
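
A second y-axis is one way to handle a function that dominates the figure. Another option, sketched below as an alternative and not as part of the exercise, is to keep a single axis and switch it to a logarithmic scale with `set_yscale('log')`. The first $x$-value is skipped because $2x$ and $x^2$ are zero there, and zero cannot be shown on a log axis.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 15, 0.1)
x_pos = x[1:]   # skip x = 0, where 2x and x^2 are zero

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(x_pos, 2*x_pos, label='$y=2x$', lw=3)
ax.plot(x_pos, x_pos**2, label='$y=x^2$', lw=3)
ax.plot(x_pos, 2**x_pos, label='$y=2^x$', lw=3)
ax.set_yscale('log')   # logarithmic y-axis instead of a twin axis
ax.set_xlabel('$x$-values')
ax.set_ylabel('$y$-values (log scale)')
ax.set_title("Simple functions")
ax.legend()
plt.show()
```

On a logarithmic axis the exponential function $y=2^x$ appears as a straight line, which makes the different growth rates easy to compare.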

## Exercise 7: Create a scatter-plot 

1. Create a scatter plot with two data sets and use a different marker for the two data sets, so you can see which values belong to which data set in the figure.

#### Dataset 1
`x1 = [79, 63, 26, 37, 85, 100, 67, 34, 18, 21]`

`y1 = [21, 46, 3, 35, 67, 95, 53, 72, 58, 10]`
 
#### Dataset 2
`x2 = [26, 29, 48, 65, 6, 5, 34, 66, 72, 40]`

`y2 = [26, 34, 90, 33, 38, 23, 56, 2, 47, 15]`

2. Make the data points for data set 1 into triangles and the data points for data set 2 into circles.

3. Give the data points for data set 1 a green color and the data points for data set 2 an orange color.

4. Make the data points bigger in the figure by using the parameter `s`

In [None]:
# Solution 7

import matplotlib.pyplot as plt

# Dataset 1
x1 = [79, 63, 26, 37, 85, 100, 67, 34, 18, 21]
y1 = [21, 46, 3, 35, 67, 95, 53, 72, 58, 10]
 
# Dataset 2
x2 = [26, 29, 48, 65, 6, 5, 34, 66, 72, 40]
y2 = [26, 34, 90, 33, 38, 23, 56, 2, 47, 15]

plt.scatter(x1, y1,
            c="green",
            linewidths=2,
            marker="^",
            s=200)

plt.scatter(x2, y2,
            c="orange",
            linewidths=2,
            marker="o",
            s=200)
 
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()

## Exercise 8: Create a bar-plot

Create a bar-plot showing weekly fruit consumption by type of fruit. 

### Data (weekly fruit consumption):

- apples: 4
- bananas: 8
- pears: 3
- oranges: 5

In [None]:
# Solution 8


import matplotlib.pyplot as plt

fruits = ['apple', 'banana', 'pear', 'orange']
counts = [4, 8, 3, 5]
bar_colors = ['red', 'blue', 'red', 'orange']

plt.title('Fruit consumption by type of fruit')
plt.bar(fruits, counts, color=bar_colors)
plt.show()

## Exercise 9: Supply and demand

In economics we often assume that the quantity demanded of a good falls when the price increases. At the same time, the quantity supplied rises when the price increases, because it becomes more profitable to sell the good. Here is an example of a mathematical model that describes this behaviour:

$(1)\;\;q_D = 40 - 2p$

$(2)\;\;q_S = 6p$ 

a. Plot equations (1) and (2) in the same figure. The $q$'s should be on the $y$-axis and $p$ on the $x$-axis. Where do the two lines cross?

Hint: Plot the two functions from 0 to 10, e.g., `p = np.arange(0, 10, 0.1)`

b. Equations (1) and (2) form a system of linear equations if we set $q_D=q_S=q$. Rewrite the two equations in the form $Ax=b$, where
$$x=\left[\begin{matrix}
 p \\
 q
\end{matrix}\right]$$
and solve this system using NumPy.

In [None]:
# Solution 9a:

# Let's create the p-values we will use when calculating the functions

p = np.arange(0, 10, 0.1)

# Then we can calculate the two functions:

qD = 40 - 2*p
qS = 6*p

# And we can plot them in the same figure:

plt.plot(p, qD, 
         lw=3, label='Demand curve')
plt.plot(p, qS, 
         lw=3, label='Supply curve')
plt.grid()
plt.legend()

print('The two lines cross when qD=qS=30 and p=5')

In [None]:
# Solution 9b:

# we can write the equations as follows 
# 2p + q = 40
# -6p + q = 0

# And then we have: 

A = np.array([[2, 1], 
              [-6, 1]])

b = np.array([[40], 
              [0]])

# And the solution to this system is:

np.linalg.solve(A, b)
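
As a sanity check on Solution 9b, the computed equilibrium can be plugged back into both equations. The sketch below uses a one-dimensional right-hand side so that the result unpacks into two numbers; the names `p_eq` and `q_eq` are illustrative.

```python
import numpy as np

# 2p + q = 40   (demand, rearranged)
# -6p + q = 0   (supply, rearranged)
A = np.array([[2, 1],
              [-6, 1]])
b = np.array([40, 0])

p_eq, q_eq = np.linalg.solve(A, b)
print(p_eq, q_eq)

# Both curves should give the same quantity at the equilibrium price
print(np.isclose(40 - 2*p_eq, q_eq))
print(np.isclose(6*p_eq, q_eq))
```

The result should agree with the crossing point read off the figure in Solution 9a.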