Introduction to Artificial Intelligence - Lab Session 1
--
At the end of this session, you will be able to:
- Use the Jupyter Notebook environment to run code and to insert text and math equations.
- Perform basic matrix manipulations using Numpy.

Part 1 - Intro to Jupyter Notebook
--
Here, we will only cover the basics. 

Jupyter Notebook is based on the .ipynb format (IPython Notebook) and is essentially a way to do rapid prototyping and demonstrations with scientific Python. The basic idea is to define *cells*.
Cells can be of several types, including Python code or rich text (using [markdown formatting](https://www.markdownguide.org/basic-syntax/)).

When a code cell is evaluated (i.e. its Python code is executed), the output of this evaluation shows up right below the cell.

When a text cell is evaluated, the text will be formatted. 
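
As an illustration, a text cell can mix Markdown with LaTeX-style math written between dollar signs. The snippet below is only a sketch of what the source of such a cell could look like; the formula itself is an arbitrary example and is not used elsewhere in this lab.

```latex
The sample mean of $n$ values is

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
```

Evaluating the cell with Shift+Enter should replace this source with the rendered text and equation.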

You can now do the "User Interface Tour" from the Help menu. 

When working with Jupyter Notebook, you will essentially switch between two modes:
- The Edit mode, in which you edit the content of the cells.
- The Command mode, in which you act on whole cells, for example to change their type.

When in Command mode, you can select cells. If you select a single cell, you can edit it by simply pressing Enter or by double-clicking on it.

For example, try editing THIS CELL and change its content. 

Now, edit the cell below, change the code, and when you're done, press Shift+Enter to evaluate the code.

In [1]:
### CELL TO BE EDITED

a = 32
b = 2 * a
print("%d + %d" % (a, b))

32 + 64


In [2]:
### CELL TO BE COMPLETED 


In [3]:
### CELL TO BE COMPLETED 


Note that in Jupyter Notebook, evaluating a cell that contains a function name followed by a "?" sign displays the help of that function.

Example:

In [1]:
import os

os.listdir?

Signature: os.listdir(path=None)
Docstring:
Return a list containing the names of the files in the directory.

path can be specified as either str, bytes, or a path-like object.  If path is bytes,
  the filenames returned will also be bytes; in all other circumstances
  the filenames returned will be str.
If path is None, uses the path='.'.
On some platforms, path may also be specified as an open file descriptor;\
  the file descriptor must refer to a directory.
  If this functionality is unavailable, using it raises NotImplementedError.

The list is in arbitrary order.  It does not include the special
entries '.' and '..' even if they are present in the directory.
Type:      builtin_function_or_method

You can also display the source code of a function using the syntax "??". If the name is not defined, as for `A` below, a "not found" message is shown instead.

In [2]:
A??

Object `A` not found.
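
The "?" and "??" syntax only works inside Jupyter/IPython. As a rough plain-Python counterpart, the standard `inspect` module can fetch the same information. The sketch below uses `json.dumps` purely as an example of a function written in Python; it is not part of this lab.

```python
import inspect
import json

# Roughly what "json.dumps?" shows: the cleaned-up docstring.
doc = inspect.getdoc(json.dumps)
print(doc.splitlines()[0])

# Roughly what "json.dumps??" shows: the Python source of the function.
source = inspect.getsource(json.dumps)
print(source.splitlines()[0])
```

For functions implemented in C, such as `os.listdir`, there is no Python source to display, so only the docstring is available.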


In [6]:
### CELL TO BE COMPLETED 


Part 2 - Introduction to Numpy
--

A code cell can contain any Python code, including imports. Let's start by importing the Numpy package.

In [3]:
import numpy as np

Numpy can be used to generate pseudo-random values from various distributions. In particular, a very useful distribution is the standard normal (zero mean and unit variance). Let's generate two vectors sampled from the normal distribution, using a length parameter that we'll be able to change if needed. 

In [4]:
length = 50

vecA = np.random.randn(length)
vecB = np.random.randn(2*length)
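
Because these values are pseudo-random, every run gives different numbers unless the generator is seeded. The sketch below assumes an arbitrary seed of 0 and hypothetical variable names; it shows that the same seed reproduces the same draw, and that a large sample has a mean close to 0 and a standard deviation close to 1.

```python
import numpy as np

# Fixing the seed makes the pseudo-random draws repeatable between runs.
np.random.seed(0)
first_draw = np.random.randn(5)

np.random.seed(0)
second_draw = np.random.randn(5)

print(np.array_equal(first_draw, second_draw))  # same seed, same values

# With many samples, the empirical mean and standard deviation
# should be close to 0 and 1 respectively.
samples = np.random.randn(100_000)
print(samples.mean(), samples.std())
```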

vecA and vecB are numpy *arrays*. Their *shape* can be read from the `shape` attribute:

In [5]:
print(vecA.shape)
print(vecB.shape)

(50,)
(100,)
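
Besides `shape`, arrays expose a few other descriptive attributes. The following sketch uses two hypothetical arrays (not the ones defined above) to show `ndim`, `size` and `dtype`.

```python
import numpy as np

vec = np.random.randn(50)       # hypothetical vector, same length as vecA
tensor = np.zeros((3, 10, 4))   # hypothetical 3-dimensional array

print(vec.ndim, vec.size, vec.dtype)          # 1 50 float64
print(tensor.ndim, tensor.size, len(tensor))  # 3 120 3
```

Note that `len()` only reports the length of the first axis, whereas `size` counts all elements.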


In [6]:
print(vecA)

[ 1.31786317 -0.93286426 -1.5382329   2.50308988 -0.82139306  0.71802331
  0.74947324  0.28759989 -0.9622208  -1.38371785 -0.18745742  1.53812372
  0.19166718 -0.90937633 -0.51973173  1.87174708 -0.41917633  0.96904468
 -0.00341403  0.10015533  1.9041551   1.16280334  0.56306441  0.2292018
 -0.16539648 -0.47499637  0.76071638  0.71813539  0.38904562 -2.7099715
  0.45307036 -0.67536297 -0.88765248  1.31678023 -0.71567019 -0.35706373
  0.48241324 -0.13984262  1.70859898 -0.92693795 -0.76090749  0.50625094
 -1.35076117  0.19588787  0.23293266 -0.65288279 -0.65729463 -0.38977148
  1.62682776  1.02633145]


Numpy arrays can be vectors as well as matrices, or tensors with any number of dimensions. For example, the following code creates a tensor with 3 dimensions, sampled from the standard normal distribution.

In [12]:
arrayC = np.random.randn(3,10,4)
print(arrayC.shape)
print(arrayC)

(3, 10, 4)
[[[-1.37216677e+00 -1.48122031e+00 -1.35514767e-01 -1.49115367e+00]
  [-6.63632152e-01 -9.47742100e-01  4.42305372e-01  3.98986146e-01]
  [ 1.36424954e+00 -2.40759712e-01  6.08780433e-02 -5.39335600e-01]
  [ 9.71438665e-01  1.33901341e-01  6.43648909e-01  2.03383274e-01]
  [ 7.30484269e-01  5.75025597e-02 -2.11964236e+00 -2.29262363e-01]
  [-1.06963882e-01 -1.84871163e-01 -3.66894687e-01 -4.84100716e-01]
  [-5.00955700e-01  6.01096987e-01  2.05725414e+00 -1.41327286e+00]
  [ 1.07240130e+00  3.91823323e-01 -4.01109663e-01 -1.05706853e+00]
  [ 9.92504832e-01  6.89879409e-01 -1.04684488e+00 -8.39445069e-01]
  [ 4.25671203e-01  1.29986652e+00 -4.63912397e-02 -2.08775754e+00]]

 [[ 1.17256330e+00  4.74636363e-01  1.11601624e-01  3.49881982e-01]
  [-6.85652119e-01  1.35024190e+00  1.54809411e+00  7.44187976e-02]
  [ 6.98852913e-01 -1.34103331e+00  8.08537074e-02  1.81574815e+00]
  [-1.25027991e+00 -3.24597804e-01 -5.81419309e-01 -2.20264110e+00]
  [ 6.17464865e-01 -6.17944406e-01 

Note that the random package of Numpy has several other interesting functions. Uncomment the two functions in the cell below one by one, look up their help page, and try to use them.

In [12]:
### CELL TO BE Edited 

#np.random.randint
#np.random.permutation

In [10]:
# 4 random integers, each at least 3 and strictly below 500
arrayD = np.random.randint(3, 500, 4)
print(arrayD.shape)
print(arrayD)

(4,)
[497 253 190 341]
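
To make the meaning of the arguments more concrete, here is a small sketch with hypothetical variable names and an arbitrary seed. It checks the bounds of `np.random.randint`, shows that its size can be a tuple, and shows what `np.random.permutation` returns for an integer argument.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed, only to make the run repeatable

# 1000 integers drawn from 3 (included) to 500 (excluded)
draws = np.random.randint(3, 500, 1000)
print(draws.min() >= 3, draws.max() < 500)

# A tuple as size gives a multi-dimensional array
grid = np.random.randint(0, 10, (2, 5))
print(grid.shape)

# permutation(n) returns the integers 0..n-1 in random order
order = np.random.permutation(6)
print(sorted(order.tolist()))
```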


A very important feature of arrays is that they can be used as *iterables*. For example, you can iterate over the first dimension of an array by simply looping over it with a *for* loop. Each iteration yields a "smaller" array with one dimension less.

In [13]:
for curdim in arrayC:
    print(curdim.shape)

(10, 4)
(10, 4)
(10, 4)


It is also possible to enumerate along the first dimension in order to get the index of the current "smaller" array.


In [14]:
print('Initial shape is %d %d %d' % (arrayC.shape[0],arrayC.shape[1],arrayC.shape[2]))
print('Iterating over the first dimension using an index k')
for k,curdim in enumerate(arrayC):
    print('k = %d, shape is %d %d' % (k,curdim.shape[0],curdim.shape[1]))

Initial shape is 3 10 4
Iterating over the first dimension using an index k
k = 0, shape is 10 4
k = 1, shape is 10 4
k = 2, shape is 10 4


Use the previous principle to calculate the average of each 10x4 sub-array of arrayC, using the function np.mean().

In [15]:
### CELL TO BE COMPLETED 


Check that you obtain the same result when directly computing the average over axes 1 and 2 (look up the arguments of np.mean).

In [16]:
### CELL TO BE COMPLETED 


These features will prove to be very useful when manipulating large arrays.
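
The same loop-versus-axis idea applies to other reductions. The sketch below uses a small hypothetical array with known values and `sum` instead of `np.mean`, so that the two results can be checked by hand.

```python
import numpy as np

# Small hypothetical array with known values: shape (2, 2, 3)
small = np.array([[[0, 1, 2], [3, 4, 5]],
                  [[6, 7, 8], [9, 10, 11]]])

# Loop version: one sum per sub-array along the first axis
loop_sums = np.array([sub.sum() for sub in small])

# Vectorized version: reduce over axes 1 and 2 in a single call
axis_sums = small.sum(axis=(1, 2))

print(loop_sums)  # [15 51]
print(axis_sums)  # [15 51]
```

On large arrays, the version with the `axis` argument is generally preferred because the loop runs inside Numpy rather than in Python.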

Another important operation when working with Numpy arrays is *reshaping*. Essentially, *reshaping* consists of changing the organisation of the array (in terms of dimensions) while keeping the same number of elements. For example, a 10x20 2D array can be converted into a 4x5x10 array, since both contain 200 elements.

In [17]:
A = np.random.randint(1,5,(10,20))
print('Initial shape of A is %d x %d' % (A.shape[0],A.shape[1]))
print(A)
B = A.reshape((4,5,10))
print('B is A reshaped to %d x %d x %d' % (B.shape[0],B.shape[1],B.shape[2]))
print(B)

Initial shape of A is 10 x 20
[[3 2 4 3 4 2 1 1 3 1 3 1 3 1 2 4 2 1 3 1]
 [4 4 4 2 4 2 4 4 2 1 4 2 2 3 3 1 3 3 2 4]
 [2 4 1 1 1 1 3 4 3 4 1 4 1 2 4 4 1 2 3 2]
 [4 1 2 1 3 4 4 1 2 2 3 1 4 1 4 2 3 4 1 4]
 [1 1 3 4 2 4 4 4 1 1 1 4 3 1 3 4 1 3 3 1]
 [1 3 4 2 3 4 4 1 4 4 2 2 1 1 1 1 4 2 1 1]
 [2 1 2 4 1 1 2 1 4 2 1 1 1 3 4 1 4 4 2 2]
 [4 3 4 4 4 1 4 2 1 2 2 3 3 4 4 1 4 4 2 4]
 [2 3 4 4 4 3 1 2 1 4 4 4 2 4 3 3 1 2 3 1]
 [3 1 2 1 2 4 1 4 4 3 4 3 2 1 3 2 4 3 3 4]]
B is A reshaped to 4 x 5 x 10
[[[3 2 4 3 4 2 1 1 3 1]
  [3 1 3 1 2 4 2 1 3 1]
  [4 4 4 2 4 2 4 4 2 1]
  [4 2 2 3 3 1 3 3 2 4]
  [2 4 1 1 1 1 3 4 3 4]]

 [[1 4 1 2 4 4 1 2 3 2]
  [4 1 2 1 3 4 4 1 2 2]
  [3 1 4 1 4 2 3 4 1 4]
  [1 1 3 4 2 4 4 4 1 1]
  [1 4 3 1 3 4 1 3 3 1]]

 [[1 3 4 2 3 4 4 1 4 4]
  [2 2 1 1 1 1 4 2 1 1]
  [2 1 2 4 1 1 2 1 4 2]
  [1 1 1 3 4 1 4 4 2 2]
  [4 3 4 4 4 1 4 2 1 2]]

 [[2 3 3 4 4 1 4 4 2 4]
  [2 3 4 4 4 3 1 2 1 4]
  [4 4 2 4 3 3 1 2 3 1]
  [3 1 2 1 2 4 1 4 4 3]
  [4 3 2 1 3 2 4 3 3 4]]]
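
Two details of reshaping are worth seeing on a smaller case. This sketch uses a hypothetical 12-element array: a dimension given as -1 is inferred from the total number of elements, and a target shape with a different element count is rejected.

```python
import numpy as np

flat = np.arange(12)  # [0 1 2 ... 11]

# -1 asks numpy to infer that dimension from the total number of elements
grid = flat.reshape(3, -1)
print(grid.shape)  # (3, 4)

# Elements are laid out row by row: the first row holds the first 4 values
print(grid[0])  # [0 1 2 3]

# A shape with a different number of elements is rejected
try:
    flat.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```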


One property of numpy that is really important is broadcasting. The goal of broadcasting is to simplify the vectorization of certain operations when the arrays do not have the same shape. For example, you can easily perform element-wise multiplication between a vector and a matrix: the vector is reused for every row of the matrix.

To test this, try doing an element-wise multiplication of the vector x and matrix y below.

In [16]:
x = np.array([2,3])
y = np.array([[4,1],[9,10],[12,13]])
result = x*y
print("X: ",x)
print("Y: ")
print(y)
print("X shape is: ",x.shape)
print("Y shape is: ",y.shape)
print("Element-wise multiplication shape:", result.shape)
print("Element-wise multiplication result:")
print(result)


X:  [2 3]
Y: 
[[ 4  1]
 [ 9 10]
 [12 13]]
X shape is:  (2,)
Y shape is:  (3, 2)
Element-wise multiplication shape: (3, 2)
Element-wise multiplication result:
[[ 8  3]
 [18 30]
 [24 39]]
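
Broadcasting only works when the shapes are compatible. The sketch below reuses the values of y and adds a hypothetical per-row factor `row_scale` to show a case that fails, and how adding an axis of length 1 makes it work.

```python
import numpy as np

y = np.array([[4, 1], [9, 10], [12, 13]])  # shape (3, 2), same values as above
col_scale = np.array([2, 3])               # shape (2,): one factor per column
row_scale = np.array([1, 10, 100])         # shape (3,): hypothetical factor per row

# Shapes are compared from the last axis backwards: (3, 2) vs (2,) is compatible
print((y * col_scale).shape)  # (3, 2)

# (3, 2) vs (3,) is not compatible: the last axes have lengths 2 and 3
try:
    y * row_scale
except ValueError as err:
    print("broadcast failed:", err)

# A trailing axis of length 1 gives shape (3, 1), which is stretched over columns
scaled_rows = y * row_scale[:, np.newaxis]
print(scaled_rows)
```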


Another very powerful tool in numpy is indexing. You can use either an integer vector or a boolean vector to choose which elements you want to extract from your numpy tensor.

Suppose we want to extract all elements from the first row of the matrix y that are greater than 1. You would do:

In [19]:
first_row = y[0]
print(first_row)
first_row_higher_than_one = first_row > 1
print("Result: ", first_row[first_row_higher_than_one])

[4 1]
Result:  [4]


You can also choose specific rows to query, for example rows 0 and 2:

In [21]:
rows = [0,2]
rows_result = y[rows]
values_higher_than_one = rows_result > 1
print("Result: ", rows_result[values_higher_than_one])

Result:  [ 4 12 13]
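
A few more indexing patterns, sketched on the same values of y. The names `mask` and `clipped` are hypothetical and only used for this illustration.

```python
import numpy as np

y = np.array([[4, 1], [9, 10], [12, 13]])

# Integer indexing on the second axis: keep only column 1 of every row
print(y[:, 1])  # [ 1 10 13]

# Boolean masks can be combined with & (and) and | (or); parentheses are required
mask = (y > 1) & (y < 12)
print(y[mask])  # [ 4  9 10]

# A mask can also select the positions to overwrite (done on a copy here)
clipped = y.copy()
clipped[clipped > 10] = 10
print(clipped)
```

Note that indexing with a boolean mask returns a flat 1D array, as in the result printed above.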


You can also save and load your numpy tensors using np.savez and np.load. This will be really important in the next courses, as it enables you to generate your data only once instead of redoing all the calculations every time you need it.

In [20]:
filename = "x.npz"
source_tensor = x
np.savez(filename,data=source_tensor)

In [15]:
loaded_npz = np.load(filename)
loaded_tensor = loaded_npz["data"]
print("Your tensor was loaded and contains: ", loaded_tensor)

Your tensor was loaded and contains:  [2 3]
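
np.savez can also store several arrays in one file, each under its own keyword name. The sketch below assumes hypothetical arrays `features` and `labels` and writes to a temporary directory so that it leaves no file behind.

```python
import os
import tempfile

import numpy as np

features = np.arange(6).reshape(2, 3)  # hypothetical data
labels = np.array([0, 1])              # hypothetical labels

# Write to a temporary directory so the sketch leaves no file behind
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "dataset.npz")
    np.savez(path, features=features, labels=labels)

    # The archive behaves like a dictionary keyed by the names given to savez
    with np.load(path) as archive:
        names = sorted(archive.files)
        loaded_features = archive["features"]
        loaded_labels = archive["labels"]

print(names)  # ['features', 'labels']
print(np.array_equal(loaded_features, features))
```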
