# Talk 1

Objective of session:

> Get a working setup with Jupyter notebooks, making sure the `numpy`, `scikit-learn`, `jax`, `jupyter`, and `matplotlib` packages are installed. This can be done locally (more effort, but no internet connection is needed afterwards) or via [Google Colab](https://colab.research.google.com/) (much less effort to set up, but it requires an internet connection and a Google account; Colab also provides limited GPU access, which will be useful for neural network training later in the reading group).

To do this, there are two options for installation:

1. **Easiest**: Use [Google Colab](https://colab.research.google.com/). Copy and paste this file there, and you will be ready to go: all the packages will be available. (Also show how to switch between CPU, GPU, and TPU usage on Colab via the "Runtime" menu.)

2. **Local Copy**: First, install [Python](https://www.python.org/downloads/). Then download [VS Code](https://code.visualstudio.com/). Open VS Code and open a terminal. Now it's time to install the packages.

*Option 1*: Install them with pip. Type each of the following commands into the terminal separately, pressing Enter after each one: `pip install numpy`, `pip install -U scikit-learn`, `pip install -U jax`, `pip install jupyter`, and `pip install -U matplotlib`.

*Option 2*: Download [Anaconda](https://www.anaconda.com/download) to manage the packages, and use it to find and install `numpy`, `scikit-learn`, `jax`, `jupyter`, and `matplotlib`.

Rmk: In either case, explain the difference between "Code" blocks and "Text" blocks in a Jupyter notebook. Explain that cells can be run in any order, and that there is a "global state" where all the variables live. Encourage people to test their notebooks regularly by clicking "Restart" (which deletes all the old variables) and then rerunning all cells from top to bottom, to avoid [problems](https://erikjandevries.medium.com/when-and-how-jupyter-notebooks-fail-and-what-to-use-instead-a52c27dbaa4c).
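
Before running the cells below, it can help to confirm that the installation worked. The following is a minimal sketch, not part of the original notebook: the helper name `check_packages` and the list `REQUIRED` are made up for illustration, and the check relies only on the standard-library `importlib.metadata` module. It prints the installed version of each package, or flags it as missing.

```python
# Sanity check: report the installed version of each package, or flag it as missing.
# `check_packages` and `REQUIRED` are illustrative names, not part of any library.
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used by pip (note: "scikit-learn", not "sklearn").
REQUIRED = ["numpy", "scikit-learn", "jax", "jupyter", "matplotlib"]


def check_packages(names):
    """Return a dict mapping each package name to its version string, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in check_packages(REQUIRED).items():
    print(f"{name:<14}{ver if ver is not None else 'NOT INSTALLED'}")
```

If a package shows up as missing in a local setup, rerunning the corresponding install command is the first thing to try. On a hosted service, a missing `jupyter` entry is not necessarily a problem, since the service supplies its own notebook interface.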

In [2]:
# Import the packages we will use throughout the reading group

import jax.numpy as jnp
import numpy as np
import sklearn.datasets

In [4]:
# First, demonstrate the use of `numpy`, which allows us to use vectors, matrices, and n-dimensional arrays.
# We'll do a simple matrix multiplication:

A = np.array([[2, 0], [1, 1]])
print("A:")
print(A)
print(A.shape)

x = np.array([2, 0])
print("x:")
print(x)
print(x.shape)

# Matrix multiplication is done via the @ operator:

y = A @ x
print("y:")
print(y)
print(y.shape)

# BEWARE: don't use * for matrix multiplication
# numpy does something called "broadcasting" to multiply things of different
# shapes---details not relevant, but just remember * and @ are not the same.
y_wrong = A * x
print("y_wrong:")
print(y_wrong)
print(y_wrong.shape)

A:
[[2 0]
 [1 1]]
(2, 2)
x:
[2 0]
(2,)
y:
[4 2]
(2,)
y_wrong:
[[4 0]
 [2 0]]
(2, 2)
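
For the curious, here is a small sketch of where `y_wrong` came from. It assumes the same `A` and `x` as in the cell above and uses `np.broadcast_to` only to make the implicit stretching visible. Broadcasting treats `x` as a row repeated for every row of `A`, then multiplies entry by entry. The matrix product additionally sums along each row.

```python
import numpy as np

A = np.array([[2, 0], [1, 1]])
x = np.array([2, 0])

# Broadcasting stretches x (shape (2,)) to match A (shape (2, 2)) by repeating it as rows.
x_stretched = np.broadcast_to(x, A.shape)
print(x_stretched)

# A * x is the elementwise product of A with that stretched copy.
elementwise = A * x_stretched
print(np.array_equal(A * x, elementwise))

# The matrix-vector product A @ x instead sums each row of the elementwise product.
print(np.array_equal(A @ x, elementwise.sum(axis=1)))
```

Both comparisons print `True`, which is one way to remember that `*` stops at the elementwise step while `@` also performs the sum.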


In [5]:
# Next, we'll test `jax`. You can think of this as a version of `numpy` that allows Python to automatically differentiate
# the functions we define. This will be very useful later on; but we won't use `jax` for the first few weeks.
# The `jax` version of `numpy` looks just the same from the outside, though:

A = jnp.array([[1, 2], [0, 1]])
x = jnp.array([2, 0])
y = A @ x
print(y)
print(y.shape)

[2 0]
(2,)
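
As a preview of the automatic differentiation mentioned above, here is a minimal sketch using `jax.grad`. The function `f` is a made-up example (the sum of squared entries), chosen because its gradient, `2 * x`, is easy to check by hand.

```python
import jax
import jax.numpy as jnp


def f(x):
    # f(x) = x_1^2 + x_2^2, so the gradient should be 2 * x.
    return jnp.sum(x ** 2)


# jax.grad takes a scalar-valued function and returns a new function computing its gradient.
grad_f = jax.grad(f)

# jax.grad needs floating-point inputs, hence 2.0 and 0.0 rather than 2 and 0.
x = jnp.array([2.0, 0.0])
print(grad_f(x))
```

The printed gradient should match `2 * x`. Note that we never wrote down the derivative ourselves; `jax` derived it from the definition of `f`.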


In [6]:
# Finally, we'll test `scikit-learn`. This is a package that contains lots of prebuilt machine-learning algorithms
# that we can just use "out of the box" without needing to understand the details of each method.
# It also contains some datasets that we can play with, e.g., the "Linnerud" dataset, which contains physiological data from 20
# people, which we can think of as vectors in R^3: weight, waist, and pulse. It also contains 20 corresponding vectors in R^3, the
# number of chin-ups, sit-ups, and jumps that those people could do.
#
# The dataset is set up to use the exercise variables as the known quantities, which we call X, to predict the physiological quantities,
# which we call y.


X, y = sklearn.datasets.load_linnerud(return_X_y=True)
# The values are chin-ups, sit-ups, and jumps, respectively.
print("X")
print(X)
# The values are weight, waist, and pulse, respectively.
print("y")
print(y)

# We can access a specific value like so:
# sit-ups (index 1 in the second dimension) of the 11th person (index 10 in the first dimension)
# [!] remember Python arrays start at index 0.
print(X[10, 1])

X
[[  5. 162.  60.]
 [  2. 110.  60.]
 [ 12. 101. 101.]
 [ 12. 105.  37.]
 [ 13. 155.  58.]
 [  4. 101.  42.]
 [  8. 101.  38.]
 [  6. 125.  40.]
 [ 15. 200.  40.]
 [ 17. 251. 250.]
 [ 17. 120.  38.]
 [ 13. 210. 115.]
 [ 14. 215. 105.]
 [  1.  50.  50.]
 [  6.  70.  31.]
 [ 12. 210. 120.]
 [  4.  60.  25.]
 [ 11. 230.  80.]
 [ 15. 225.  73.]
 [  2. 110.  43.]]
y
[[191.  36.  50.]
 [189.  37.  52.]
 [193.  38.  58.]
 [162.  35.  62.]
 [189.  35.  46.]
 [182.  36.  56.]
 [211.  38.  56.]
 [167.  34.  60.]
 [176.  31.  74.]
 [154.  33.  56.]
 [169.  34.  50.]
 [166.  33.  52.]
 [154.  34.  64.]
 [247.  46.  50.]
 [193.  36.  46.]
 [202.  37.  62.]
 [176.  37.  54.]
 [157.  32.  52.]
 [156.  33.  54.]
 [138.  33.  68.]]
120.0
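
To illustrate what using an algorithm "out of the box" looks like, here is a minimal sketch that fits scikit-learn's `LinearRegression` to the same data. The choice of linear regression is ours, purely for illustration. The notebook itself only loads the dataset, and we make no claim that this model fits the data well.

```python
import sklearn.datasets
from sklearn.linear_model import LinearRegression

# X: exercise data (chin-ups, sit-ups, jumps); y: physiological data (weight, waist, pulse).
X, y = sklearn.datasets.load_linnerud(return_X_y=True)

# Create the model and fit it; scikit-learn handles the three outputs at once.
model = LinearRegression()
model.fit(X, y)

# Predict weight, waist, and pulse for the first person from their exercise data.
pred = model.predict(X[:1])
print(pred.shape)         # one person, three predicted quantities
print(model.coef_.shape)  # one row of coefficients per predicted quantity
```

Most scikit-learn models follow this same pattern: create the model object, call `fit`, then call `predict`.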
