# Tutorial 2 - Python libraries NumPy and Matplotlib 


*written and revised by Jozsef Arato, Mengfan Zhang, Dominik Pegler*  
Computational Cognition Course, University of Vienna  
https://github.com/univiemops/tewa1-computational-cognition

---

## This week's lab:

We will briefly introduce the Python libraries NumPy and Matplotlib this week. Comprehensive documentation can be found at https://numpy.org/doc/stable/user/index.html and https://matplotlib.org/stable/users/index.html.

NumPy is one of the most important libraries for scientific computing in Python. Foundational libraries such as Matplotlib and Pandas, as well as machine learning libraries such as TensorFlow and scikit-learn, which we will discuss in future weeks, often use NumPy arrays as input. In this tutorial, we will cover array creation, indexing, slicing, mathematical operations, and commonly used functions of NumPy. 

Matplotlib is a popular 2D plotting library for Python. It provides a flexible and powerful interface for creating a wide variety of plots, charts, and visualizations. In this tutorial, we will show you how to create simple plots, customize plots, and work with multiple subplots.

**Learning goals:** \
After finishing this tutorial, you should be able to:
*   import libraries and functions  
*   perform basic array operations and calculations, and use common NumPy functions  
*   explain the structure of a plot and create custom plots using Matplotlib  

**Estimated time to complete:** 2 hours (depends on your previous knowledge) \
**Deadline:** Next Monday, 23:59

## 1. Import library 

Last week, you already used several built-in Python functions such as `print()`, `len()`, and `type()`. However, there are only a small number of built-in functions, and what they can do is limited. If we want to do more complicated things, we can make use of Python libraries.

To get access to functionality of the non-built-in libraries, you need to first explicitly import them by:\
`import library` \
and usually abbreviate the library name by two or three letters: \
`import library as abbreviation` 

Then the functions can be called by: \
`library.function` or `abbreviation.function` 

If you don't need the whole library, but just the specific functions, you can import by: \
`from library import function1, function2` \
or you can customize the import \
`from library import someFunction as newName`
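
The import styles above can be sketched with a real library. The example below uses the standard-library module `math` purely as an illustration (it is not otherwise used in this tutorial), and the new name `root` is an arbitrary choice:

```python
import math                    # import the whole library
import math as m               # import the library under an abbreviation
from math import sqrt, floor   # import only specific functions
from math import sqrt as root  # import a function under a custom name

print(math.sqrt(16))  # library.function
print(m.sqrt(16))     # abbreviation.function
print(sqrt(16))       # function imported directly
print(root(16))       # the same function under its new name
print(floor(2.7))     # second function from the "from ... import" line
```

All four `sqrt` calls reach the same function; only the name you type differs.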

## 2. NumPy 

NumPy provides functions to create and work with arrays, a data format that makes numerical computation easier. NumPy arrays can have any number of dimensions. A one-dimensional array is similar to a simple list, a two-dimensional array is similar to a matrix, and arrays with more than two dimensions are also commonly used. For example, fMRI data can typically be represented as 4D arrays (3 dimensions represent x,y,z axis of the brain volume + 1 dimension of time points).

Now, let's import the library and start from creating an array. 

In [None]:
import numpy as np

# it just makes our lives easier if we all use the same abbreviation

To see how a function works, just type it after a ? mark:

In [None]:
?np.zeros 

There are many arbitrary conventions in how Python and NumPy functions work. You can sometimes figure these out by trial and error, but it is often easier to use the built-in help by typing ? in front of the function name and running the cell. You can also always Google it.

### 2.1 Creation of numpy array 

In the cell below, we create a 1D NumPy array of five zeros using the function `np.zeros`.

In [None]:
my_zeros = np.zeros(5)
print(my_zeros)

Try to make a 1D array of 200 ones called "ar_of_ones" using `np.ones` in the cell below.

In [None]:
# YOUR CODE HERE

<div class="alert alert-info">The cell below tests if your code is correct or not. Just run the cell, and it will give you an error message or a compliment. We insert a test cell after every cell that requires you to write code. </div>

In [None]:
# Test cell
try:
    # exactly 200 elements, and every element equals 1
    assert len(ar_of_ones) == 200 and np.all(np.asarray(ar_of_ones) == 1)
except AssertionError as msg:
    print("'ar_of_ones' is not correctly defined.")
    raise (msg)
else:
    print("Good!")

This allows us to easily create an array filled with a single value. If we want an array of length 200 containing only the value 55, we can multiply an array of ones by 55 (or add 54 to it).

In [None]:
ar_of_fiftyfives = ar_of_ones * 55
ar_of_fiftyfives

Alternatively, we can use `np.full`.

In [None]:
alt_to_previous = np.full(200, 55)
alt_to_previous

Another way to create a NumPy array is to first define a list and then convert it with the `np.array` function. You can also combine these two steps and pass the list of numbers to `np.array` directly.

In [None]:
my_list = [
    10,
    -43,
    45,
    2,
    56,
    67,
    76,
    12,
    -1,
    2,
    55,
    2345,
    4,
    345,
    4,
    5,
    67,
    7,
    545,
    3,
    5,
    3564,
    24,
]
my_array = np.array(my_list)

# alternatively:
my_array = np.array(
    [
        10,
        -43,
        45,
        2,
        56,
        67,
        76,
        12,
        -1,
        2,
        55,
        2345,
        4,
        345,
        4,
        5,
        67,
        7,
        545,
        3,
        5,
        3564,
        24,
    ]
)

print(my_array)

A simple list is converted to a 1D array, but a nested list (a list within a list) is converted to an array with two or more dimensions, depending on how the list is nested:

In [None]:
my_2D_list = [[1, 2, 3], [4, 5, 6]]
my_2D_array = np.array(my_2D_list)

my_3D_list = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
    [[13, 14, 15], [16, 17, 18]],
]
my_3D_array = np.array(my_3D_list)

print("A 2D array:", "\n", my_2D_array, "\n")
print("A 3D array:", "\n", my_3D_array)

Apart from numbers, a numeric NumPy array can represent missing values using `np.nan` (the older aliases `np.NaN` and `np.NAN` were removed in NumPy 2.0):

In [None]:
array_missing = np.array([10, 21, 33, 4, np.nan, 7])
print("Array with a missing value", array_missing, "\n")

array_without_missing = array_missing[~np.isnan(array_missing)]  # ~ is bitwise NOT
print("Array without a missing value", array_without_missing)
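
Removing missing values by hand is not the only option. As a minimal sketch, assuming you want summary statistics that simply skip missing entries, NumPy also offers NaN-aware functions such as `np.nanmean` (not used elsewhere in this tutorial). The values in `scores` are made up:

```python
import numpy as np

scores = np.array([10, 21, 33, 4, np.nan, 7])  # made-up values

plain_mean = np.mean(scores)     # a single missing value makes the result nan
nan_mean = np.nanmean(scores)    # ignores the missing value

print("dtype:", scores.dtype)    # an array containing nan is stored as floats
print("np.mean:", plain_mean)
print("np.nanmean:", nan_mean)
```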

As you might expect, creating nested lists and converting them to NumPy arrays quickly becomes cumbersome as the number of dimensions or elements increases. Fortunately, NumPy provides various functions to efficiently initialize high-dimensional arrays, such as `np.zeros`, `np.ones`, and `np.empty`. Once you have initialized an array with zeros or ones of your desired dimensionality, you can fill in the array using loops or other operations.

In [None]:
my_dimension = (
    3,
    2,
    5,
)  # use () for defining dimensionality, here I want a 3D array with shape of 3*2*5
my_3d_zeros = np.zeros(my_dimension)

my_3d_ones = np.ones((2, 4, 3))

print("A 3d array with zeros:", "\n", my_3d_zeros, "\n")
print("A 3d array with ones:", "\n", my_3d_ones)

You may wonder why we don't operate on the list directly. Why do we need NumPy arrays at all? Recall from last week that a Python list can hold elements of different data types. A NumPy array, in contrast, contains only a single data type. This makes the array more memory-efficient than a list, and Python doesn't need to check the data type of each element when performing an operation. Also, NumPy supports broadcasting and vectorized operations (we will come back to them later), which apply an operation to the entire array without looping over individual elements. As a result, operations on NumPy arrays can be much faster, which matters a lot when you work with large data or complex operations. See an example below:

In [None]:
# -o makes %timeit return its result so that it can be stored in a variable
time_list = %timeit -o -n 10000 [x + 1 for x in range(1000)] # Python-level loop over a list
time_nparray = %timeit -o -n 10000 np.arange(1000) + 1 # vectorized operation using numpy array
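
The "single data type" point can be seen directly. This is a small sketch with made-up values; the variable names are only for illustration:

```python
import numpy as np

mixed_list = [1, 2.5, 3]  # a list keeps each element's own type
list_types = [type(v).__name__ for v in mixed_list]
print(list_types)

mixed_array = np.array(mixed_list)  # NumPy picks one type for all elements
print(mixed_array.dtype)
print(mixed_array)
```

Here the integers are converted to floats so that every element of the array shares the same data type.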

### 2.2 Indexing and slicing

Indexing and slicing a 1D numpy array is very similar to indexing and slicing a simple Python list as we discussed last week. 

In the cell below, try to extract the fifth element of the variable "my_array", and store it in "five_ele". Also store the second to the ninth elements of "my_array" in the new variable "two_to_nine_ele".

In [None]:
my_array = np.array(
    [
        10,
        -43,
        45,
        2,
        56,
        67,
        76,
        12,
        -1,
        2,
        55,
        2345,
        4,
        345,
        4,
        5,
        67,
        7,
        545,
        3,
        5,
        3564,
        24,
    ]
)

# YOUR CODE HERE

In [None]:
# Test cell
try:
    assert "five_ele" in dir() and "two_to_nine_ele" in dir()
except AssertionError as msg:
    print("Variable 'five_ele' or 'two_to_nine_ele' is not defined.")
    raise (msg)

try:
    assert five_ele == 56
except AssertionError as msg:
    print("The value of 'five_ele' is wrong. Remember Python indexing is zero based!")
    raise (msg)
else:
    print("Great!")

try:
    assert (two_to_nine_ele == np.array([-43, 45, 2, 56, 67, 76, 12, -1])).all()
except AssertionError as msg:
    print(
        "The values of 'two_to_nine_ele' are wrong. Remember Python slicing is 'begin inclusive, end exclusive'!"
    )
    raise (msg)
else:
    print("Great!")

To index a multidimensional array, index each dimension the same way you would index a 1D array, and separate the per-dimension indices with commas. Look at the examples below and make sure that you understand the code and its output:

In [None]:
# 2D array
some_2D_array = np.arange(25, 45).reshape((4, 5))
print(
    "Full 2D array:",
    "\n",
    some_2D_array,
    "\n",
)

print("Index the first row")
print(some_2D_array[0, :], "\n")

print("Index the second column")
print(some_2D_array[:, 1], "\n")

print("Index every second value of the third row (here, the odd values)")
print(some_2D_array[2, 0::2], "\n")

print("Index the value in the lower right corner and set it to 1")
some_2D_array[-1, -1] = 1
print(some_2D_array, "\n")

In [None]:
# 3D array
some_3D_array = np.arange(40).reshape((2, 5, 4))
print("Full 3D array:", "\n", some_3D_array, "\n")

print(
    "dim1: index the second element, dim2: index the third element, dim3: index the third element"
)
print(some_3D_array[1, 2, 2], "\n")

print("dim1: the first element, dim2: 1-4 elements, dim3: 2-3 elements")
print(some_3D_array[0, 0:4, 1:3], "\n")

print("Comparison operators >, <, ==, != can also be used for indexing")
print(some_3D_array[some_3D_array > 5])

An important point from the examples above is that `:` selects all the elements along a dimension. Also keep in mind that, in practice, each dimension of a multidimensional array usually represents one attribute of your data (participant, trial, condition, time, etc.).

Another common way of indexing is called Boolean masking. Basically, you create a Boolean array (a *mask*) that records, for each element of your array, whether a condition is true, and then you use the mask to index the array so that only the values that meet the condition are selected. See the example below:

In [None]:
three_times = some_2D_array % 3 == 0
print("The boolean_array")
print(three_times, "\n")

print("Result of indexing 'some_2D_array' with the boolean array")
print(some_2D_array[three_times])

Boolean masking is very useful with brain data. Suppose we have a 3D array of the whole brain, but we only need the amygdala area for further analysis. We can create a boolean array where only locations within the amygdala area have "True" values, and apply this array to the whole brain. 
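
As a toy sketch of this idea (not real brain data), the example below builds a small 3D "volume" and a hypothetical region mask. The array sizes, values, and the location of the region are all made up for illustration:

```python
import numpy as np

# Toy "brain": a 4 x 4 x 4 volume with made-up signal values
brain = np.arange(64).reshape((4, 4, 4))

# Hypothetical region mask: True only inside a 2 x 2 x 2 block
region_mask = np.zeros(brain.shape, dtype=bool)
region_mask[1:3, 1:3, 1:3] = True

# Indexing with the mask returns a 1D array of the values inside the region
region_values = brain[region_mask]
print("Number of voxels in the region:", region_values.size)
print("Mean signal in the region:", region_values.mean())
```

Note that the mask must have the same shape as the array it is applied to.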

Now try it yourself! Suppose I ran an experiment with 4 blocks of 30 trials each. The reaction time data of 20 subjects were recorded over 5 days, and I stored the data in a 4D array with the shape 20 * 4 * 30 * 5 (subjects * blocks * trials * days):
1. Select all the data from subject 2 on the third day, and store it as "sub2_day3".
2. For "sub2_day3", select all the data where the reaction time is larger than 500 ms, and save the result as "sub2_day3_ex500".
3. Select the data from all the trials in the last block for subject 10 on all five days, and save it as "sub10_blolast_5day".



In [None]:
np.random.seed(10)  # ignore this for now
exp_data = np.random.randint(150, 600, size=(20, 4, 30, 5))

# YOUR CODE HERE

In [None]:
# Test cell
tests = np.load("Answers/week2.npy", allow_pickle=True)

try:
    assert (
        "sub2_day3" in dir()
        and "sub2_day3_ex500" in dir()
        and "sub10_blolast_5day" in dir()
    )
except AssertionError as msg:
    print("Could not find all the variables. Did you name them correctly?")
    raise (msg)

try:
    assert (sub2_day3 == tests[0]).all()
except AssertionError as msg:
    print("Your 'sub2_day3' is not the array I expect...")
    raise (msg)
else:
    print("Good!")

try:
    assert (sub2_day3_ex500 == tests[1]).all()
except AssertionError as msg:
    print("Your 'sub2_day3_ex500' is not the array I expect...")
    raise (msg)
else:
    print("Great!")

try:
    assert (sub10_blolast_5day == tests[2]).all()
except AssertionError as msg:
    print("Your 'sub10_blolast_5day' is not the array I expect...")
    raise (msg)
else:
    print("Amazing!")

### 2.3 Numpy functions, methods, and attributes 

#### Function vs. method
NumPy has both convenient functions and methods to summarize array information (as we mentioned last week, methods are just functions applied to the object). Take a look at the example below:

In [None]:
func_std = np.std(some_2D_array)
print("Standard deviation calculated by numpy function:", func_std)

meth_std = some_2D_array.std()
print("Standard deviation calculated by array method:", meth_std)

print("Is func_std identical to meth_std? Answer:", func_std == meth_std)

You can also perform the operation along the specified dimension using the `axis` argument. The value should be set to the axis to be collapsed.

In [None]:
print("Full 2D array:", "\n", some_2D_array, "\n")

some_2D_mean = some_2D_array.mean(axis=1)  # same as np.mean(some_2D_array, axis = 1)
print("Mean of each row:", "\n", some_2D_mean)
print("Each row mean is calculated across the columns, so the axis is set to 1", "\n")

some_2D_max = some_2D_array.max(axis=0)  # same as np.max(some_2D_array, axis = 0)
print("Maximum of each column:", "\n", some_2D_max, "\n")

some_2D_sum = some_2D_array.sum()  # same as np.sum()
print("Sum of the array:", "\n", some_2D_sum)
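
One way to check which axis is collapsed is to look at the shape of the result. The sketch below uses a made-up 3D array called `data`:

```python
import numpy as np

data = np.arange(24).reshape((2, 3, 4))  # made-up array with shape (2, 3, 4)

sum_axis0 = data.sum(axis=0)  # collapses the first dimension
sum_axis1 = data.sum(axis=1)  # collapses the second dimension
sum_axis2 = data.sum(axis=2)  # collapses the third dimension
sum_all = data.sum()          # no axis given: everything is collapsed

print(sum_axis0.shape)
print(sum_axis1.shape)
print(sum_axis2.shape)
print(sum_all)
```

The dimension named in `axis` disappears from the shape, and the remaining dimensions keep their order.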

Tip: `np.sum()` is also often used to count how many elements satisfy a condition, because each True counts as 1:

In [None]:
print("Number of elements larger than 33 in some_2D_array: ")
print(np.sum(some_2D_array > 33))

Can you calculate the range (maximum - minimum) along the third dimension of "some_3D_array" and store it as "some_3D_range" in the cell below?

In [None]:
# YOUR CODE HERE

In [None]:
# Test cell
try:
    assert (some_3D_range == 3).all()
except AssertionError as msg:
    print("Did you index the dimension correctly?")
    raise (msg)
else:
    print("Great!")

#### Method vs. attribute

By now, you probably have a sense of whether something is a method or a function. If an object appears inside the brackets after a name, it is a function, called as `function(object, arguments)`. If an object is followed by a dot and a name with brackets, it is a method, called as `object.method(arguments)` (the arguments are optional). The main difference is that a method is bound to an object, whereas a function is not.

Sometimes you will also see an object followed by a name *without brackets*. This is an **attribute**, accessed with `object.attribute`. Like a method, an attribute belongs to an object, and different types of objects in Python have their own attributes. The main difference between an attribute and a function/method is that an attribute represents a **characteristic or property** of the object: it doesn't do anything to the object but only **tells you something** about it. In contrast, a function/method **does something** or **performs a task** on the object. For example, the *object.sum()* method calculates the sum of the object, whereas the *object.shape* attribute tells you the shape of the object, which is a property of the object itself.

Check out some common NumPy attributes below:

In [None]:
print("The size of the array in each dimension:")
print(some_3D_array.shape, "\n")

print("Number of elements:")
print(some_3D_array.size, "\n")

print("Number of dimensions:")
print(some_3D_array.ndim, "\n")

print("Data type of elements:")
print(some_3D_array.dtype)

#### Common NumPy functions

Here are some NumPy functions that you'll use a lot in this course. Make sure you understand the code below. You can also Google them or ask ChatGPT for more detailed usage.


`np.arange(start, stop, step)` gives evenly spaced values within a given interval. The start and the step values are optional.

In [None]:
print("Numbers 0 to 6:")
print(np.arange(7), "\n")  # start inclusive, stop exclusive

print("Numbers 5 to 11:")
print(np.arange(5, 12), "\n")

print("Numbers from 200 up to (but not including) 300 in steps of 10:")
print(np.arange(200, 300, 10))

Check out the `np.linspace()` function. How is it different from `np.arange()`? \
Use linspace to create an array called "ls_array" that goes from 0 to 100 (inclusive) in intervals of 10.

In [None]:
# YOUR CODE HERE

In [None]:
# Test cell
try:
    assert (ls_array == np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])).all()
except AssertionError as msg:
    print("Did you create the array with correct intervals?")
    raise (msg)
else:
    print("Good!")

We already used the `reshape()` method in previous cells. Like the function `np.reshape()`, it allows us to change the shape of an array without changing its data.

In [None]:
my_array = np.array(
    [
        10,
        -43,
        45,
        2,
        56,
        67,
        76,
        12,
        -1,
        2,
        55,
        1,
        2345,
        4,
        345,
        4,
        5,
        67,
        7,
        545,
        3,
        5,
        3564,
        24,
    ]
)
my_array_reshape = np.reshape(my_array, (6, 4))  # same as my_array.reshape((6, 4))
print(my_array_reshape)

`np.sort()` returns a sorted copy of the array.

In [None]:
print("Sort array along the first dimension:")
print(np.sort(my_array_reshape, axis=0), "\n")

print("The array is flattened before sorting:")
print(np.sort(my_array_reshape, axis=None))
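
Because `np.sort()` returns a copy, the original array keeps its order. The sketch below contrasts this with the array method `.sort()`, which sorts in place. The array `values` is made up for illustration:

```python
import numpy as np

values = np.array([3, 1, 2])

sorted_copy = np.sort(values)  # function: returns a new, sorted array
unchanged = values.copy()      # snapshot showing the original is untouched
print("Original after np.sort:", unchanged)
print("Sorted copy:", sorted_copy)

result = values.sort()         # method: sorts in place and returns None
print("Original after .sort():", values)
print("Return value of .sort():", result)
```

This is one of the cases where the function and the method of the same name do not behave identically.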

### 2.4 Array Mathematics 

Thanks to **vectorized operations**, we can do very fast and efficient math with numpy arrays. Vectorized operations refer to performing operations on entire arrays at once, rather than iterating over individual elements. Let's look at some examples below:

In [None]:
np.random.seed(25)  # ignore this for now
x = np.random.randint(1, 9, 6).reshape(2, 3)
print("The original array:", "\n", x, "\n")

x = x + 1  # a single number is called a scalar, can also be written as x += 1
print("Plus 1:", "\n", x, "\n")

x *= 2  # same as x = x * 2
print("Multiply by 2:", "\n", x, "\n")

x **= 3  # same as x = x ** 3
print("Power 3:", "\n", x)

The same principle applies to calculations between two arrays, where operations are performed between the elements at corresponding locations in each array. See examples below:

In [None]:
np.random.seed(25)  # ignore this for now
x = np.random.randint(1, 9, 6).reshape(2, 3)
y = np.random.randint(1, 10, 6).reshape(2, 3)
print("The original arrays", "\n", x, "\n", "\n", y, "\n")

print("Addition")
print(x + y)
print(np.add(x, y), "\n")

print("Subtraction")
print(x - y)
print(np.subtract(x, y), "\n")

print("Elementwise Multiplication")
print(x * y)
print(np.multiply(x, y), "\n")

print("Elementwise Division")
print(x / y)
print(np.divide(x, y))

Above, we've shown you how to do elementwise operations between two arrays with the same shape. What if the two arrays have different shapes? NumPy has **broadcasting** for this purpose. \
Broadcasting implicitly reuses the smaller array as many times as needed to match the shape of the larger one, so that elementwise operations work even if the arrays have different shapes. For example, to add the smaller array [1 2 3] to the "y" array, I can write:

In [None]:
z = np.array([1, 2, 3])
print("Array y:", "\n", y, "\n")
print("Array z:", "\n", z, "\n")
print("y+z:", "\n", y + z)

This is equivalent to adding the array [[1 2 3], [1 2 3]] to "y", or to adding "z" to each row of "y" with an explicit loop:

In [None]:
# Add array zz, which has the same shape as y
zz = np.array([[1, 2, 3], [1, 2, 3]])
print("Array y:", "\n", y, "\n")
print("Array zz:", "\n", zz, "\n")
print("y+zz:", "\n", y + zz, "\n")

# Add the vector z to each row of the y with an explicit loop
y_plus_z = np.zeros(y.shape)
for i in range(2):
    y_plus_z[i, :] = y[i, :] + z

print("y_plus_z:", "\n", y_plus_z)

As you can see, broadcasting works exactly the same as making multiple copies of the smaller array, but without explicitly doing so. It makes your code more concise, readable, and much faster than looping, which is very important when working with large amounts of data. 

Before using broadcasting, you should check whether the arrays are broadcastable. NumPy compares the shapes of the two arrays dimension by dimension, starting from the rightmost dimension and moving left. Two dimensions are compatible when **they are equal, or one of them is 1**. The two arrays do not need to have the same number of dimensions (a missing dimension is treated as 1), but every pair of compared dimensions must be compatible for the arrays to be broadcastable.

For example, the "y" array above has two dimensions with the shape (2, 3), and the "z" array has one dimension with the shape (3, ). The rightmost dimensions of these two arrays are 3 and 3, which are compatible because they are equal. "z" has no second dimension to compare with the 2 in "y", so the two arrays are broadcastable.
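
The compatibility rule can be written out as a small function. This is only a sketch: `are_broadcastable` is a hypothetical helper written for this illustration, not a NumPy function, and the shapes are made up:

```python
from itertools import zip_longest

import numpy as np


def are_broadcastable(shape_a, shape_b):
    """Hypothetical helper: apply the compatibility rule from right to left."""
    # Reverse both shapes so that we start at the rightmost dimension.
    # A missing dimension in the shorter shape is treated as 1.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    return all(a == b or a == 1 or b == 1 for a, b in pairs)


check_ok = are_broadcastable((4, 1, 3), (2, 3))  # 3 vs 3, 1 vs 2, 4 vs (missing)
check_bad = are_broadcastable((2, 3), (2,))      # 3 vs 2 is not compatible
print(check_ok, check_bad)

# Cross-check the compatible pair against NumPy itself
result = np.ones((4, 1, 3)) + np.ones((2, 3))
print(result.shape)
```

In the compatible case, each dimension of the result takes the larger of the two compared sizes.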

For each pair of arrays below, decide whether the two arrays are broadcastable, and write down your answer as True or False in the corresponding variable:
1. A.shape = (8, 1, 6, 1); B.shape = (7, 1, 5)
2. C.shape = (10, 10, 3); D.shape = (3, )
3. E.shape = (3, 1); F.shape = (8, 6, 3)
4. G.shape = (5,1); H.shape = (1, 6)

In [None]:
A_B = # True or False?
C_D = # True or False?
E_F = # True or False?
G_H = # True or False?

In [None]:
# Test cell
tests = np.load("Answers/week2.npy", allow_pickle=True)
answers = [A_B, C_D, E_F, G_H]

try:
    assert answers == tests[3]
except AssertionError as msg:
    print("Please compare the dimensions of above arrays again:(")
    raise (msg)
else:
    print("Good ^_^")

Now let's do something more practical with what you've learned today.

One of the most popular methods for preparing data in machine learning and neural networks is data **standardization**. It rescales the values of each numerical column in the dataset to a standard scale with a mean of 0 and a standard deviation of 1. You can standardize each column using the equation: data_standardized = (data - mean(data)) / std(data). Broadcasting helps to standardize data efficiently. Write a function called "standardizer" that:
1. Accepts only a 2D array as input. If the input is not 2D, it prints the message "Your array is not 2D!" and returns None.
2. Standardizes each column using the above equation.
3. Uses vectorized operations and broadcasting instead of loops.

In [None]:
# YOUR CODE HERE

In [None]:
# Test cell
test_3D = np.random.randint(10, 50, 40).reshape((2, 4, 5))
out_3D = standardizer(test_3D)

try:
    assert out_3D is None
except AssertionError as msg:
    print("Your function does not reject input that is not a 2D array.")
    raise (msg)

In [None]:
# Test cell
test_2D = np.random.randint(10, 50, 60).reshape((10, 6))
out_2D = standardizer(test_2D)

try:
    np.testing.assert_almost_equal(out_2D.mean(axis=0), np.zeros(test_2D.shape[1]))
except AssertionError as msg:
    print("The mean of the column is not 0")
    raise (msg)

try:
    np.testing.assert_almost_equal(out_2D.std(axis=0), np.ones(test_2D.shape[1]))
except AssertionError as msg:
    print("The standard deviation of the column is not 1")
    raise (msg)
else:
    print("Well Done!")

## 3. Data visualization with Matplotlib

Data visualization is very important for communicating science. It also helps to see what the code does, to avoid coding errors, and to visualize different steps of an analysis with a plot. The standard data visualization library in Python is Matplotlib, which we usually import as below:


In [None]:
from matplotlib import pyplot as plt

Matplotlib uses a procedural style for plotting, where you typically create a plot and then add elements to it. For example, you can first create a figure, and then create a plot area within a figure, and then plot some lines within a plot area, and you can decorate the plot with labels, different colors, line styles, and so on. Now, let's begin with a simple **line plot**.

The `plot` function is used here. It is one of the most important functions of *pyplot* for plotting "y" versus "x" as lines and/or markers. 

In [None]:
# Generate some variables
x = np.linspace(-10, 10)
y = x**2 + 2 * x + 1

# Plot a line
plt.plot(x, y)  # plot x and y using default line style and color
plt.show()  # display figures

You can customize the line with a format string of the form "[marker][line][color]" as shown below. You can find all available format strings here: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html

In [None]:
plt.plot(
    x, y, "+--r"
)  # same as plt.plot(x, y, marker = "+", linestyle = "--", color = 'red' )
plt.show()

Now, let's add the axis labels and a title to the plot:

In [None]:
plt.plot(x, y, "+--r")
plt.xlabel("x axis")
plt.ylabel("y axis")
plt.title(
    "My first plot", fontsize=15, color="pink"
)  # text styles can also be changed with arguments
plt.show()

In addition to a line plot, we can draw a **scatter plot** using the `scatter()` function. See an example below:

In [None]:
animals = np.array(
    ["Dog", "Cat", "Elephant", "Horse", "Rat", "Mouse"]
)  # It's also possible to create a plot using categorical variables
life_exp = np.array([12, 16, 70, 25, 2, 0.1])  # years

plt.scatter(animals, life_exp, color="g")
plt.xlabel("Animal Species", fontsize=15)
plt.ylabel("Life expectancy (years)", fontsize=15)
plt.show()

We can also get some **histograms** with the `hist()` function as below:

In [None]:
beta_dist = np.random.beta(
    2, 10, size=1000
)  # generate random numbers with a beta distribution
normal_dist = np.random.normal(
    0.5, 0.2, size=1000
)  # generate random numbers with a normal distribution

# Plot two distributions
plt.hist(
    beta_dist, color="steelblue", alpha=0.5
)  # alpha argument is used for changing transparency
plt.hist(
    normal_dist, color="salmon", alpha=0.5
)  # you can draw multiple plots on a single figure
plt.ylim((0, 300))  # xlim/ylim is used for setting limits of the axes
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.legend(["beta", "normal"])  # place a legend on the axes
plt.title("Distributions", fontsize=15)
plt.show()
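
So far every plot used a single plot area. As a minimal sketch of working with multiple subplots, the example below uses `plt.subplots`, which returns the figure and an array of plot areas (axes). Note that it calls methods on each axes object (`set_title`, `set_xlabel`) rather than the `plt.title` and `plt.xlabel` functions used above. The variable names and the plotted curves are made up for illustration:

```python
import numpy as np
from matplotlib import pyplot as plt

x_vals = np.linspace(-3, 3, 50)

# One figure containing a 1 x 2 grid of plot areas (axes)
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

axes[0].plot(x_vals, x_vals**2, "b-")   # left subplot: solid blue line
axes[0].set_title("Quadratic")
axes[0].set_xlabel("x")

axes[1].plot(x_vals, x_vals**3, "r--")  # right subplot: dashed red line
axes[1].set_title("Cubic")
axes[1].set_xlabel("x")

fig.tight_layout()  # reduce overlap between the subplots
plt.show()
```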

All right, it's time to try it yourself! Create a plot by following these steps:
1. Create an array "x" with 100 values from 0 to 10.
2. Create the variable "y1", where y1 = sin(x), and the variable "y2", where y2 = cos(x). (Hint: see numpy math functions)
3. Plot "y1" versus "x" with a dashed green line, "y2" versus "x" with a dotted orange line, and add a legend to differentiate them.
4. Add the x and y labels and a title. You can also customize your graph as you like. 

You can compare your plot to the image we provide as a reference to make sure you are doing everything correctly.

In [None]:
# Run the cell and compare your plot with the plot shown below
image = plt.imread("Answers/sin_cos_plot.png")
plt.imshow(image)
plt.axis("off")
plt.show()

Very well done! We've covered the basics of NumPy and Matplotlib. Don't worry if you cannot remember everything. You'll see and write more code using these two libraries in the coming weeks, and you'll get better with more practice. 

## Exercise 1

We import the iris dataset below. It is a dataset that contains the sepal length, sepal width, petal length, and petal width of 150 flowers as columns. Can you perform the following calculations?
1. Compute the mean, median, and standard deviation of each of these four columns.
2. Normalize each column so that its values range exactly between 0 and 1.

In [None]:
from sklearn import datasets

iris = datasets.load_iris()["data"]