# APPM 5720 Biweekly Report
### *Alexey Yermakov*
### *September 1, 2022*

# Summary
I am already very comfortable using git and GitHub, so I spent the past week and a half working with Tensorflow and setting up my development environment for the class.

# Main Content

## August 24:
 - This summer I experimented with Maziar's [pinned repositories](https://github.com/maziarraissi) on GitHub, which taught me which software needs to be installed locally to run Tensorflow. I made a [pull request](https://github.com/maziarraissi/DeepHPMs/pull/3) in one of his repositories listing the [software](https://github.com/maziarraissi/DeepHPMs/blob/4cb7a4d33e4315ed20c091b91e2f1298e37133e2/README.md) that needs to be installed, along with the versions. My pull request lists older versions of the software, since Maziar's code is based on an older version of Tensorflow. I was able to install these older versions using [Anaconda](https://www.anaconda.com/). Today, my goal was to get the latest version of Tensorflow working locally. Trying Anaconda again didn't lead to a successful installation, so I used [Docker](https://docs.docker.com/get-docker/) instead, since Tensorflow [recommends](https://www.tensorflow.org/install/docker) it as the easiest way to get GPU support.

 - The installation process went as follows for me on my desktop running Arch Linux with an `Intel I5-6600K` processor and `NVIDIA GeForce GTX 1070` GPU:
   - First, I installed Docker by running `sudo pacman -S docker`.
   - Then, I started the Docker daemon by running `sudo systemctl start docker.service`. Note that this needs to be run every time I restart my computer, but if I want it to start on boot I would instead run `sudo systemctl enable --now docker.service`.
   - I already had NVIDIA's proprietary drivers installed on my machine (instructions can be found [here](https://wiki.archlinux.org/title/NVIDIA)). I verified this by running `lsmod | grep nvidia` and checking that the `nvidia` kernel module was loaded.
   - I installed the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker) by installing [libnvidia-container-tools](https://aur.archlinux.org/packages/libnvidia-container-tools) and then [nvidia-container-toolkit](https://aur.archlinux.org/packages/nvidia-container-toolkit) from the AUR (Arch User Repository).
   - I added myself (`[user]`) to the `docker` group by running `sudo usermod -a -G docker [user]`, so that I don't need to be root to run docker commands. (Note: the new group only shows up in the `groups` command after you log out and back in or reboot. Alternatively, run `su - [user]` to get a shell with the updated groups.)
   - I then pulled a docker image with GPU and Jupyter notebook support by running `docker pull tensorflow/tensorflow:latest-gpu-jupyter`.
   - Finally, it took a lot of tinkering to find a command that starts the container in a way that lets me reach the Jupyter server running inside it. The command is:\
   `docker run -u $(id -u):$(id -g) -v '[Host]':'[Guest]' --network host --name [Name] --gpus all tensorflow/tensorflow:latest-gpu-jupyter`
   - Where, 
     - `docker run ... tensorflow:latest-gpu-jupyter` creates and starts a container from the image.
     - `-u $(id -u):$(id -g)` runs the container as the current user (to avoid running as root).
     - `-v '[Host]':'[Guest]'` mounts the absolute path `[Host]` on my computer to the absolute path `[Guest]` in the container. This lets me work out of a persistent folder. Otherwise, any file that isn't in a mounted folder is lost when I start a fresh container, and I'd lose all of my progress.
     - `--network host` makes the container share my host machine's network, so I can easily reach the Jupyter server in the container. I think there's a better way to do this, but I couldn't figure it out.
     - `--gpus all` lets the container access my GPU (this is why the NVIDIA Container Toolkit was installed).
     - `--name [Name]` names the container `[Name]`. This is useful for setting up an alias to attach to the container, since the name stays the same. Otherwise, Docker assigns a random name.
   - With the container running, I can open its Jupyter server in my computer's browser.
   - I can open a bash shell in the container by running `docker exec -it [Name] bash`. Here `[Name]` is the container's name, which can be found with `docker container ls` or set with the `--name [Name]` option above.
   - I ran one of the tutorial notebooks in the container while watching my graphics card with `nvtop`. This confirmed that Tensorflow was using the GPU.
   - To run on CPU only, remove `--gpus all` and change `tensorflow:latest-gpu-jupyter` to `tensorflow:latest-jupyter`. I followed my own guide to successfully install tensorflow on my laptop.
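The GPU and CPU-only variants of the run command differ by only two pieces. The sketch below assembles both in Python to make the swap explicit. `tf_docker_run_args` and `as_shell_command` are hypothetical helpers I wrote for illustration; they are not part of Docker or Tensorflow. The flags and image names are the ones used above.

```python
import os
import shlex


def tf_docker_run_args(host_dir, guest_dir, name, gpu=True, uid=None, gid=None):
    """Hypothetical helper: build the `docker run` argument list from above.

    gpu=False drops `--gpus all` and switches to the CPU-only image, as
    described for the laptop install. uid/gid default to the current user
    (the equivalent of $(id -u) and $(id -g); Unix only).
    """
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    args = [
        "docker", "run",
        "-u", f"{uid}:{gid}",             # run as the current user, not root
        "-v", f"{host_dir}:{guest_dir}",  # persistent folder shared with the host
        "--network", "host",              # share the host's network stack
        "--name", name,                   # fixed name for `docker exec -it NAME bash`
    ]
    if gpu:
        args += ["--gpus", "all"]         # needs the NVIDIA Container Toolkit
    tag = "latest-gpu-jupyter" if gpu else "latest-jupyter"
    args.append(f"tensorflow/tensorflow:{tag}")
    return args


def as_shell_command(args):
    """Quote the arguments so the result can be pasted into a shell."""
    return shlex.join(args)
```

For example, `as_shell_command(tf_docker_run_args("/path/on/host", "/path/in/container", "tf", gpu=False))` produces the CPU-only command. The paths and container name are placeholders.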

## August 25
 - The [Tensorflow website](https://www.tensorflow.org/install/docker) I followed contains a link to [Tensorflow tutorials](https://www.tensorflow.org/tutorials). Today, I started working through them.

### [Tutorial 1](https://www.tensorflow.org/guide/keras/sequential_model): `sequential_model.ipynb`

- Running the notebook gave me a GPU-related error (sorry, I forgot to copy the error message). Some quick [internet searching](https://www.tensorflow.org/guide/gpu) led me to the following code snippet, which resolved my problem:
```python
gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
```
 - I also had to restart another notebook's kernel, which was using a lot of my GPU's memory. Although this setting does limit my GPU usage, it isn't precise: the notebook uses 1439 MiB, whereas the limit is 1024 MB (~977 MiB).
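The MB vs. MiB comparison above is easy to get backwards. Here is the arithmetic spelled out. It assumes, as the text does, that `memory_limit` is in decimal megabytes (10^6 bytes) while `nvtop` reports binary mebibytes (2^20 bytes).

```python
# Assumption (as in the text): memory_limit is in MB = 10**6 bytes,
# while nvtop reports MiB = 2**20 bytes.
MB = 10**6
MiB = 2**20


def mb_to_mib(megabytes):
    """Convert decimal megabytes to binary mebibytes."""
    return megabytes * MB / MiB


limit_mib = mb_to_mib(1024)   # the 1024 MB limit from the snippet above
observed_mib = 1439           # what nvtop reported for the notebook
print(f"limit    = {limit_mib:.1f} MiB")
print(f"observed = {observed_mib} MiB ({observed_mib - limit_mib:.1f} MiB over the limit)")
```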

```python
## Setup for tensorflow
import tensorflow as tf
import numpy as np
import math
from tensorflow import keras
from tensorflow.keras import layers

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to allocate at most 4 GB (1024*4 MB) of memory on the first GPU
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024*4)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
```

```
1 Physical GPUs, 1 Logical GPUs
```


 - The first thing this notebook does is define a "Sequential Model", which looks something like this:

<img src="https://raw.githubusercontent.com/yyexela/5720-public/master/Report1/Images/SequentialModel.jpg" alt="Diagram of a sequential model" width="700"/>

 - According to the tutorial, "A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor." The image shows a "dense" neural network, where every node in one layer is connected to every node in the neighboring layers. This is the kind of network the tutorial builds, which implies that there are also networks that are not dense or not a plain stack. The exact network in the notebook has 3 layers of sizes 2, 3, and 4, respectively, with the first 2 layers using the "rectified linear unit" activation function ("relu" for short).
 - An [activation function](https://deepai.org/machine-learning-glossary-and-terms/activation-function) introduces nonlinearity into a neural network, giving it the ability to "learn" complex tasks that it otherwise couldn't. The link in the previous sentence claims that a neural network without activation functions can be collapsed into a single matrix acting on the input. That is really cool, and I believe the proof is simple. Without activations, every layer is an affine map `x -> Wx + b`, and a composition of affine maps is again affine. So all the layers multiply together into a single matrix (plus a single bias vector) acting on the input.
 - The following diagram of a "feed-forward" neural network is from [here](https://deepai.org/machine-learning-glossary-and-terms/activation-function):

<img src="https://raw.githubusercontent.com/yyexela/5720-public/d17b2794fdb4d7f4a1f1260765d14804f84ea415/Report1/Images/FeedForward.svg" alt="Diagram of a feed forward neural network" width="700"/>
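The earlier claim, that a network without activation functions collapses into a single matrix, can be checked numerically. This is a minimal pure-Python sketch with made-up weights. Biases are left out for brevity; with biases, the composition is still a single affine map. Applying two layers one after the other matches applying their pre-multiplied product. Putting a ReLU in between breaks that equality.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def relu(M):
    """Element-wise ReLU: max(0, v)."""
    return [[max(0.0, v) for v in row] for row in M]


x = [[1.0, -2.0]]                  # one input sample (row vector)
W1 = [[1.0, -1.0, 0.5],            # made-up 2 -> 3 layer weights
      [2.0, 0.0, -1.0]]
W2 = [[1.0], [-1.0], [2.0]]        # made-up 3 -> 1 layer weights

# Without activations: applying the layers one by one...
two_layers = matmul(matmul(x, W1), W2)
# ...gives the same result as the single matrix W1 @ W2.
one_matrix = matmul(x, matmul(W1, W2))
print(two_layers, one_matrix)      # [[3.0]] [[3.0]]

# With a ReLU between the layers, the collapse no longer holds in general.
with_relu = matmul(relu(matmul(x, W1)), W2)
print(with_relu)                   # [[5.0]]
```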

 - The curvy line in each circle is the activation function. Take a look at layer 2: each neuron receives the values of the previous layer's neurons after they have passed through that layer's activation function. Layer 2's neurons then combine these values using the following function:

<img src="https://raw.githubusercontent.com/yyexela/5720-public/master/Report1/Images/NeuronFunction.jpg" alt="Formula for a neuron" width="300"/>

 - Here the function `f` is the activation function and `B` is a bias. So each neuron uses the values of the previous layer's neurons to determine its own value. However, it looks like [Keras'](https://keras.io/api/layers/core_layers/dense/) `Dense` layer puts the bias inside the activation function, computing `activation(dot(input, kernel) + bias)`:

<img src="https://raw.githubusercontent.com/yyexela/5720-public/master/Report1/Images/NeuronFunction2.jpg" alt="Formula for a neuron" width="300"/>
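The Keras-style formula above can be sketched as a single dense layer in pure Python. Neuron `j` computes `f(sum_i x[i] * kernel[i][j] + bias[j])`, with the bias added before the activation. `dense_forward` is a hypothetical name and the numbers are made up. This illustrates the formula only; it is not Keras' actual implementation.

```python
def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)


def dense_forward(x, kernel, bias, activation=None):
    """One Dense layer, Keras-style: activation(x @ kernel + bias).

    x: list of n inputs; kernel: n rows x units columns; bias: one value per unit.
    """
    out = []
    for j in range(len(bias)):
        z = sum(x[i] * kernel[i][j] for i in range(len(x))) + bias[j]
        out.append(activation(z) if activation else z)
    return out


# Made-up numbers: 3 inputs feeding 2 neurons.
x = [1.0, 2.0, 3.0]
kernel = [[0.5, -1.0],
          [0.25, 0.0],
          [-0.5, 1.0]]
bias = [-0.5, -1.0]
print(dense_forward(x, kernel, bias))        # [-1.0, 1.0]  (before activation)
print(dense_forward(x, kernel, bias, relu))  # [0.0, 1.0]   (after ReLU)
```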

```python
# Option 1 to define the model
model = keras.Sequential(
    [
        layers.Dense(2, activation="relu"),
        layers.Dense(3, activation="relu"),
        layers.Dense(4),
    ]
)

model.layers
```

```
[<keras.layers.core.dense.Dense at 0x7fa6104cc670>,
 <keras.layers.core.dense.Dense at 0x7fa6104cc4f0>,
 <keras.layers.core.dense.Dense at 0x7fa6104cc490>]
```

```python
# Option 2 to define the model
model = keras.Sequential()
model.add(layers.Dense(2, activation="relu"))
model.add(layers.Dense(3, activation="relu"))
model.add(layers.Dense(4))

model.layers
```

```
[<keras.layers.core.dense.Dense at 0x7fa5b02dd790>,
 <keras.layers.core.dense.Dense at 0x7fa6105777f0>,
 <keras.layers.core.dense.Dense at 0x7fa5b02ddf10>]
```

```python
# The last layer added can be removed
model.pop()
print(len(model.layers))  # 2
```

```
2
```


```python
# Layers and models can be named
model = keras.Sequential(name="my_sequential")
model.add(layers.Dense(2, activation="relu", name="layer1"))
model.add(layers.Dense(3, activation="relu", name="layer2"))
model.add(layers.Dense(4, name="layer3"))
```

```python
# Summaries about models can also be printed once the model is "built"
#   (in this case, this means the weights have been defined, which is a result of 
#    calling the model with input)
x = tf.ones((1,4))
y = model(x)
model.summary()
```

```
Model: "my_sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer1 (Dense)              (1, 2)                    10        
                                                                 
 layer2 (Dense)              (1, 3)                    9         
                                                                 
 layer3 (Dense)              (1, 4)                    16        
                                                                 
Total params: 35
Trainable params: 35
Non-trainable params: 0
_________________________________________________________________
```
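The `Param #` column above follows a simple rule: each `Dense` layer has `inputs * units` weights plus one bias per unit. Here is a small sketch that reproduces the summary's counts from the layer widths. The input width of 4 comes from `x = tf.ones((1,4))`, and `dense_param_count` is a hypothetical helper name.

```python
def dense_param_count(n_inputs, units):
    """Weights (n_inputs * units) plus one bias per unit."""
    return n_inputs * units + units


# Input width 4 (from x = tf.ones((1,4))), then layers of 2, 3 and 4 units.
widths = [4, 2, 3, 4]
counts = [dense_param_count(n_in, n_out) for n_in, n_out in zip(widths, widths[1:])]
print(counts, sum(counts))  # [10, 9, 16] 35
```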


## August 28
  - The Keras API documentation can be found [here](https://keras.io/api/).
  - Continuing through the tutorial was mostly straightforward, but I had trouble parsing the following code:

```python
layer = layers.Dense(3)
layer.weights
x = tf.ones((1, 4))
y = layer(x)
layer.weights
```

```
[<tf.Variable 'dense_6/kernel:0' shape=(4, 3) dtype=float32, numpy=
 array([[-0.19412315,  0.025047  , -0.2495017 ],
        [ 0.8338394 , -0.693617  , -0.05902171],
        [-0.3508405 ,  0.64317787,  0.5348314 ],
        [ 0.7090317 , -0.50605094, -0.10464364]], dtype=float32)>,
 <tf.Variable 'dense_6/bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]
```

 - This code first makes a `Dense` layer with `3` neurons. The layer initially has no weights; they are only created when a tensor is first passed into it. Passing in a tensor of shape `(1,4)` creates a kernel of shape `(4,3)`, since the layer now knows both its input size and its output size. By basic matrix multiplication, `(1,4)x(4,3)=(1,3)`, which gives one output per neuron in the layer. The layer also creates one bias per neuron, hence the bias of shape `(3,)`.
 - If you're curious like me, you might wonder how the weights are randomized. It turns out that the `kernel_initializer` argument of the `Dense` layer controls that; by default, it is the `GlorotUniform` initializer (see [here for `GlorotUniform`](https://keras.io/api/layers/initializers/#glorotuniform-class) and [here for `Dense` layer initialization](https://keras.io/api/layers/core_layers/dense/)). Biases, on the other hand, are controlled by `bias_initializer`, which defaults to zeros. That matches the all-zero bias in the output above. The weights can also be set manually with [the `set_weights` method](https://keras.io/api/layers/base_layer/#set_weights-method). The call to `layer(x)` randomizes the weights once. After that, they stay the same unless a new layer is created or the weights are changed manually.
 - The rest of the tutorial seemed much more technical than I'm comfortable with so far, so I wasn't able to follow along very well despite reading it all. I will revisit it later as I learn more about Tensorflow.
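The Glorot (Xavier) uniform rule behind the default `kernel_initializer` draws each weight from `U(-limit, limit)`, where `limit = sqrt(6 / (fan_in + fan_out))`. Below is a minimal pure-Python sketch of that rule. It will not reproduce Tensorflow's exact numbers, since the random generators differ. For the `(4, 3)` kernel above, the limit is about 0.926, and every entry printed above does fall inside that range. The function names are hypothetical.

```python
import math
import random


def glorot_limit(fan_in, fan_out):
    """Half-width of the Glorot/Xavier uniform range: sqrt(6 / (fan_in + fan_out))."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def glorot_uniform(fan_in, fan_out, rng):
    """Sample a fan_in x fan_out kernel with entries drawn from U(-limit, limit)."""
    limit = glorot_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


rng = random.Random(0)              # seeded so the sketch is reproducible
kernel = glorot_uniform(4, 3, rng)  # same (4, 3) shape as the kernel above
bias = [0.0] * 3                    # Dense's default bias_initializer is zeros
print(f"limit = {glorot_limit(4, 3):.4f}")  # limit = 0.9258
for row in kernel:
    print([round(v, 4) for v in row])
```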

## August 29
### [Tutorial 2](https://www.tensorflow.org/guide/basics): `basics.ipynb`
  - The first bit of code threw me for a loop:

```python
x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])

print(x)
print(x.shape)
print(x.dtype)
print(x[0][1]) # We can get individual elements
# print(x.op) # THIS WILL FAIL under eager execution (Tensor.op is only meaningful in graph mode)
```

```
tf.Tensor(
[[1. 2. 3.]
 [4. 5. 6.]], shape=(2, 3), dtype=float32)
(2, 3)
<dtype: 'float32'>
tf.Tensor(2.0, shape=(), dtype=float32)
```


 - This shows how to create a constant tensor and inspect its attributes. I wanted to get at the tensor's values, not just its attributes. So, I checked [this documentation](https://www.tensorflow.org/api_docs/python/tf/Tensor) and tried `print(x.op)`. It turns out this is not allowed "when eager execution is enabled" (and `x.op` refers to the operation that produced the tensor, not its values anyway). What is "eager execution"? Well, [this StackOverflow page](https://stackoverflow.com/questions/58112355/what-exactly-is-eager-execution-from-a-programming-point-of-view) explains it pretty well. It doesn't help me understand *why* I can't just pull the array out of the tensor that way, though. Instead, the values can be extracted as a NumPy array with `np.array(x)` (or `x.numpy()`). Thanks Google (developers of Tensorflow).

```python
arr = np.asarray(x)

print("Numpy array")
print(arr)
print()

print("Numpy element")
print(arr[0][1]) # First row, second element of numpy array
print()

print("Tensorflow element")
print(x[0][1]) # First row, second element of tensorflow tensor
print()
```

```
Numpy array
[[1. 2. 3.]
 [4. 5. 6.]]

Numpy element
2.0

Tensorflow element
tf.Tensor(2.0, shape=(), dtype=float32)
```



```python
# Matrix addition
x + x
```

```
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]], dtype=float32)>
```

```python
# Scalar multiplication (every element is multiplied by 20)
20 * x
```

```
<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 20.,  40.,  60.],
       [ 80., 100., 120.]], dtype=float32)>
```

```python
# Showing the two matrices being multiplied to manually verify matrix multiplication
print("Matrix A:")
print(x)
print()

print("Matrix B:")
print(tf.transpose(x))
print()

# Matrix multiplication
print("A*B:")
print(x @ tf.transpose(x))
```

```
Matrix A:
tf.Tensor(
[[1. 2. 3.]
 [4. 5. 6.]], shape=(2, 3), dtype=float32)

Matrix B:
tf.Tensor(
[[1. 4.]
 [2. 5.]
 [3. 6.]], shape=(3, 2), dtype=float32)

A*B:
tf.Tensor(
[[14. 32.]
 [32. 77.]], shape=(2, 2), dtype=float32)
```


```python
# Different ways to concatenate matrices

# Side by side
print("Side by side:")
print(tf.concat([x, x, x, x], axis=1))
print()

# On top of each other
print("On top of each other:")
print(tf.concat([x, x, x, x], axis=0))
```

```
Side by side:
tf.Tensor(
[[1. 2. 3. 1. 2. 3. 1. 2. 3. 1. 2. 3.]
 [4. 5. 6. 4. 5. 6. 4. 5. 6. 4. 5. 6.]], shape=(2, 12), dtype=float32)

On top of each other:
tf.Tensor(
[[1. 2. 3.]
 [4. 5. 6.]
 [1. 2. 3.]
 [4. 5. 6.]
 [1. 2. 3.]
 [4. 5. 6.]
 [1. 2. 3.]
 [4. 5. 6.]], shape=(8, 3), dtype=float32)
```


 - Here, I am also introduced to the `tf.nn.softmax` function. It [normalizes the input](https://en.wikipedia.org/wiki/Softmax_function) vector so that the output values lie in (0,1) and sum to 1, with each output proportional to the exponential of the corresponding input. For a `K`-dimensional vector, the formula is:


<img src="https://raw.githubusercontent.com/yyexela/5720-public/master/Report1/Images/Softmax.jpg" alt="Standard softmax formula" width="300"/>

 - The base `e` can be replaced by any positive base: the larger the base, the more weight is put on larger input values, and vice versa. Bases in `(0,1)` give the smaller inputs the larger outputs. The Tensorflow function `tf.nn.softmax` just uses `e` as the base, though (I verified this manually; I couldn't find it in the documentation). For tensors, you also need to specify the axis along which the vectors lie (for example, in a 2D tensor you compute `softmax` either per row or per column).
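The effect of changing the base can be sketched in pure Python. A base-`b` softmax `b**z_i / sum_j b**z_j` is the same as the standard base-`e` softmax of `z * ln(b)`. `softmax` here is my own helper, not `tf.nn.softmax`. With the default base `e`, it reproduces the values from the manual computation in the cell that follows.

```python
import math


def softmax(z, base=math.e):
    """Softmax with a positive base b: b**z_i / sum_j b**z_j.

    Computed as the base-e softmax of z * ln(b). Shifting by max(z) first
    leaves the result unchanged (it cancels in the ratio) and avoids
    overflow for bases > 1.
    """
    m = max(z)
    scale = math.log(base)
    exps = [math.exp((v - m) * scale) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


row = [1.0, 2.0, 3.0]            # first row of x in this notebook
print(softmax(row))              # base e: matches tf.nn.softmax on this row
print(softmax(row, base=10))     # larger base: more weight on the largest entry
print(softmax(row, base=0.5))    # base in (0, 1): smaller entries get more weight
```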
 

```python
# Softmax, base 'e'
print("Softmax:")
print(tf.nn.softmax(x, axis=1))
print()

# Inspection that the values are correct:
print("Manual computation of one row:")
print([math.e**i/(math.e**1+math.e**2+math.e**3) for i in range(1,4,1)])
```

```
Softmax:
tf.Tensor(
[[0.09003057 0.24472848 0.6652409 ]
 [0.09003057 0.24472848 0.6652409 ]], shape=(2, 3), dtype=float32)

Manual computation of one row:
[0.09003057317038046, 0.24472847105479764, 0.6652409557748218]
```


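```python
# reduce_sum adds all elements of a tensor. Each of the 2 rows of the softmax
# sums to 1, so the total is 2 (float32 rounding gives 1.9999998)
x_s = tf.nn.softmax(x, axis=1)
tf.reduce_sum(x_s)
```

```
<tf.Tensor: shape=(), dtype=float32, numpy=1.9999998>
```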

```python
# Turn an array into a Tensor
tf.convert_to_tensor([1,2,3])
```

```
<tf.Tensor: shape=(3,), dtype=int32, numpy=array([1, 2, 3], dtype=int32)>
```

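 - Automatic differentiation is also introduced. It computes exact derivatives (up to floating-point precision) by applying the chain rule to each recorded operation, rather than approximating them numerically. See: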

```python
x = tf.Variable(1.0)

def f(x):
  y = x**2 + 2*x - 5
  return y

with tf.GradientTape() as tape:
  tape.watch(x) # Optional here: trainable Variables are watched automatically (needed for constants)
  y = f(x)

g_x = tape.gradient(y, x)  # g(x) = dy/dx
print(g_x) # df/dx (1) is 2x + 2 evaluated at 1, which is 4
```

```
tf.Tensor(4.0, shape=(), dtype=float32)
```
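To see how derivatives can be exact rather than approximated, here is a minimal sketch of automatic differentiation using dual numbers. Note that this is *forward-mode* AD, a different technique from the reverse-mode AD that `tf.GradientTape` performs. Both apply the chain rule operation by operation. Each value carries its derivative along, and the product and power rules update it. The `Dual` class is my own illustration and only supports the operations `f` needs.

```python
class Dual:
    """Number value + deriv*eps with eps**2 == 0; deriv carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants (derivative 0)
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual._lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (u v)' = u' v + u v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

    def __pow__(self, n):
        # Power rule for a constant exponent n: (u**n)' = n u**(n-1) u'
        return Dual(self.value ** n, n * self.value ** (n - 1) * self.deriv)


def f(x):
    return x**2 + 2*x - 5     # same function as the cell above


x = Dual(1.0, 1.0)            # evaluate at x = 1, seeding dx/dx = 1
y = f(x)
print(y.value, y.deriv)       # -2.0 4.0
```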


 - I thought this syntax was strange, especially the `tape`, but it [turns out](https://stackoverflow.com/questions/53953099/what-is-the-purpose-of-the-tensorflow-gradient-tape) that it is needed because of eager execution. Operations run immediately, so Tensorflow has to record them on a "tape" as they happen in order to differentiate them afterwards. The [with](https://stackoverflow.com/questions/1369526/what-is-the-python-keyword-with-used-for) statement is something I hadn't seen before either. It's essentially a wrapper around `try/finally` for objects that support it: it calls the object's `__enter__` method when the block starts and guarantees that `__exit__` is called when the block ends, even if an exception is raised.

 - Another important thing I learned is that `tf.Tensor` objects are *immutable*: once they are created, they cannot be modified. For values that need to change, `tf.Variable` objects are used instead, which *are* mutable:

```python
# imm: immutable (Tensor)
# mut: mutable (Variable)
t_imm = tf.constant([[1,2],[3,4]])
t_mut = tf.Variable([[1,2],[3,4]])

print("Before assignment:")
print("Tensor:\n", t_imm, "\n")
print("Variable:\n", t_mut, "\n")

#t_imm[0][0] = 5 # THIS WILL FAIL (item assignment is not supported)
#t_mut[0][0] = 5 # THIS WILL FAIL (item assignment is not supported; use assign instead)
#t_imm.assign([5,5]) # THIS WILL FAIL (immutable Tensor has no assign method)
#t_mut.assign([5,5]) # THIS WILL FAIL (needs to be the same shape)
t_mut.assign([[5,5],[5,5]]) # OK

print("After assignment:")
print("Tensor:\n", t_imm, "\n")
print("Variable:\n", t_mut, "\n")

#t_imm.assign_add(1) # THIS WILL FAIL (immutable Tensor)
#t_mut.assign_add(1) # THIS WILL FAIL (needs to be the same shape)
t_mut.assign_add([[1,2],[3,4]]) # OK

print("After addition:")
print("Tensor:\n", t_imm, "\n")
print("Variable:\n", t_mut, "\n")
```

```
Before assignment:
Tensor:
 tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32) 

Variable:
 <tf.Variable 'Variable:0' shape=(2, 2) dtype=int32, numpy=
array([[1, 2],
       [3, 4]], dtype=int32)> 

After assignment:
Tensor:
 tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32) 

Variable:
 <tf.Variable 'Variable:0' shape=(2, 2) dtype=int32, numpy=
array([[5, 5],
       [5, 5]], dtype=int32)> 

After addition:
Tensor:
 tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32) 

Variable:
 <tf.Variable 'Variable:0' shape=(2, 2) dtype=int32, numpy=
array([[6, 7],
       [8, 9]], dtype=int32)> 
```
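Going back to the `with` statement used with `tf.GradientTape()` earlier, here is a small sketch of what `with` does. `Recorder` is a toy context manager I made up for illustration. For `GradientTape`, I believe entering the block starts recording and exiting stops it. The try/finally version below is a simplification of the real protocol, which also passes exception details to `__exit__`.

```python
class Recorder:
    """Toy context manager that logs when its block starts and ends."""

    def __init__(self):
        self.log = []

    def __enter__(self):
        self.log.append("enter")
        return self          # this is what gets bound to the name after `as`

    def __exit__(self, exc_type, exc, tb):
        self.log.append("exit")
        return False         # don't swallow exceptions


# `with` form
rec = Recorder()
with rec as r:
    r.log.append("body")

# Roughly equivalent try/finally form
rec2 = Recorder()
r2 = rec2.__enter__()
try:
    r2.log.append("body")
finally:
    rec2.__exit__(None, None, None)

print(rec.log, rec2.log)     # ['enter', 'body', 'exit'] ['enter', 'body', 'exit']

# __exit__ still runs when the body raises
rec3 = Recorder()
try:
    with rec3:
        raise ValueError("boom")
except ValueError:
    pass
print(rec3.log)              # ['enter', 'exit']
```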



# MISC Content

## Getting Latex to Work
 - For this report, I wanted to convert Latex code into something I could put in this Markdown report.
 - I [installed](https://wiki.archlinux.org/title/TeX_Live) the following packages using `sudo pacman -S {package name}`: `texlive-core` (for regular Latex) and `texlive-latexextra` (for `amsmath`).
 - Here is a sample Latex file I wanted to compile:

```latex
\documentclass[]{standalone}
\usepackage{amsmath}

\begin{document}

$\sum_{1}^{2}3$

\end{document}
```

 - The `standalone` document class makes the document box only contain the formula I want to save to a file (instead of having the formula in a standard-sized page).
 - The command I use to convert from the Latex file to a jpg:\
 `pdflatex [FILE].tex && latexmk -c && convert -density 10000 [FILE].pdf -quality 100 [FILE].jpg`\
   Which is just three commands glued together:
   - `pdflatex [FILE].tex` converts the Latex file to a pdf, generating a bunch of other intermediary files.
   - `latexmk -c` cleans up the intermediary files from the previous step.
   - `convert -density 10000 [FILE].pdf -quality 100 [FILE].jpg` converts the pdf file to a jpg.
 - The final result is a jpg file which I can put in this markdown document!
 - Note that `convert` comes from ImageMagick, so it was probably installed by a package I already had (`imagemagick`).
 - I also want to mention that I tried a lot of other things before I got this solution working. I tried `dvisvgm`, which converts a DVI file (generated by TeX Live's `dvilualatex` command) to SVG. This caused problems with fonts not rendering properly, among other things, and I didn't want to spend more time getting this particular solution to work.
 - Also, converting to a PNG works fine as well, but the theme of the text editor I'm using to write this document is dark. The PNG files have transparent backgrounds and black text, so I can't see them. The jpg files have white backgrounds, which resolves this issue.
 - Note: I'm using VSCode to write this markdown document. You can press `ctrl+shift+v` to open up a preview window of your markdown, which is what I'm doing!
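The three-step conversion above (`pdflatex`, `latexmk -c`, `convert`) can be scripted so that it stops at the first failure, like `&&` does. This is a sketch, assuming `pdflatex`, `latexmk` and ImageMagick's `convert` are on the `PATH`. The commands run from the `.tex` file's folder, because `pdflatex` writes its output to the current directory. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def latex_to_jpg_commands(stem, density=10000, quality=100):
    """The three commands from the pipeline above, for FILE = stem."""
    return [
        ["pdflatex", f"{stem}.tex"],                   # .tex -> .pdf (+ aux files)
        ["latexmk", "-c"],                             # clean up the aux files
        ["convert", "-density", str(density), f"{stem}.pdf",
         "-quality", str(quality), f"{stem}.jpg"],     # .pdf -> .jpg (ImageMagick)
    ]


def latex_to_jpg(tex_path):
    """Run the commands in the .tex file's folder; check=True stops at the first failure."""
    tex_path = Path(tex_path)
    for cmd in latex_to_jpg_commands(tex_path.stem):
        subprocess.run(cmd, check=True, cwd=tex_path.parent)
```

For example, `latex_to_jpg("formula.tex")` would produce `formula.jpg` next to `formula.tex`.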