## Assignment #1

* Release date: 2022.03.15 Tue
* Due date: **2022.03.22 Tue 23:59** (will not accept late submission)
* Submission format: notebook file which can be executed in Colab environment
* Weighting: 5% (total 50 pts)

1. **(10pts)** Calculate `rotation*x` and `x*rotation`. Explain how each computation is performed and why the two results are the same.

  ```python
    import numpy as np

    x = np.array([[2, 0]])
    rotation = np.array([ [0, -1],
                          [1,  0] ])
  ```

2. **(5pts)** Suppose we have the following 2D tensor (i.e., a matrix). How can its values be rearranged into a 1D tensor (i.e., a vector) in column-major order?
```python
x = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])
```

3. **(5pts)** Compute the transpose of the matrix `x` in Problem 2 using only the `np.reshape` function.

4. **(5pts)** Perform vector arithmetic to create a `primes_squared_minus_one` vector, where the `i`th element is equal to the `i`th element of `primes` squared, minus 1. For example, the second element of `primes_squared_minus_one` would be `3^2 - 1 = 8`. Note that using a `for` loop is not allowed.
```python
import numpy as np
primes = np.array([2, 3, 5, 7, 11, 13])
primes_squared_minus_one = ?
```

5. **(10pts)** Given two random matrices of the same shape, compute their element-wise product using a naive Python implementation and a NumPy built-in function, respectively. Compare the wall-clock times of these implementations as the size of the matrices increases.



6. **(15pts)** Consider the MNIST classification problem covered in class. For the details, please refer to the course material. In this example, we used a multilayer perceptron composed of a fully connected layer with 512 hidden nodes and an output layer that produces predicted probabilities over 10 classes. In class, we used a GPU as a hardware accelerator to train our model.

  Here, let's verify the actual benefit of using a GPU for training. To do so, compare the wall-clock times of training the MNIST classifier 1) on a CPU and 2) on a GPU.


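# Problem 1  
The `*` operator on NumPy arrays is element-wise multiplication. NumPy broadcasting automatically expands `x` from shape (1, 2) to (2, 2) (`[[2,0]]` -> `[[2,0],[2,0]]`), and because element-wise multiplication is commutative, `x*rotation` and `rotation*x` give the same result.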

In [1]:
# Problem 1
import numpy as np

x = np.array([[2,0]])
rotation = np.array([[0,-1],
                    [1, 0]])

print(x*rotation)
print(rotation*x)

[[0 0]
 [2 0]]
[[0 0]
 [2 0]]
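
To make the broadcasting step concrete, the sketch below expands `x` by hand with `np.broadcast_to` and checks that both operand orders give the same element-wise product. It also prints the matrix product `rotation @ x.T` for contrast, since `*` on NumPy arrays is not matrix multiplication. The names `x_expanded` and `rotated` are illustrative and not part of the original answer.

```python
import numpy as np

x = np.array([[2, 0]])            # shape (1, 2)
rotation = np.array([[0, -1],
                     [1,  0]])    # shape (2, 2)

# Broadcasting repeats the single row of x so that it matches shape (2, 2).
x_expanded = np.broadcast_to(x, rotation.shape)
print(x_expanded)

# Element-wise multiplication is commutative, so the operand order is irrelevant.
print(np.array_equal(x * rotation, rotation * x))
print(np.array_equal(x * rotation, x_expanded * rotation))

# For contrast: the matrix product rotates the vector (2, 0) by 90 degrees.
rotated = rotation @ x.T          # shape (2, 1)
print(rotated.ravel())
```

If a true rotation of the vector is intended, the `@` operator (or `np.matmul`) is the one to use; `*` only multiplies matching entries.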


# Problem 2

In [2]:
# Problem 2
x = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])
# Flatten in column-major (Fortran) order: read down each column in turn.
new_x = x.flatten(order='F')
new_x

array([ 1,  5,  9,  2,  6, 10,  3,  7, 11,  4,  8, 12])

# Problem 3

In [3]:
# Problem 3
x = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

# Collect the values column by column (column-major order).
temp_list = []
for k in range(x.shape[1]):        # each column
    for i in range(x.shape[0]):    # each row within that column
        temp_list.append(x[i, k])

# Reshape the column-major values into (columns, rows) to get the transpose.
new_x = np.array(temp_list)
new_x = np.reshape(new_x, (x.shape[1], x.shape[0]))
new_x

array([[ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])
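
Problem 3 asks for `np.reshape` alone, so here is a minimal sketch that avoids the Python loop. It assumes that the `order` argument of `np.reshape` is permitted: reading `x` in column-major order (`order='F'`) and then writing the values back row by row into shape `(4, 3)` yields the transpose. The names `column_major` and `x_t` are illustrative.

```python
import numpy as np

x = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

# Step 1: read the values column by column (column-major order).
column_major = np.reshape(x, -1, order='F')

# Step 2: write them back row by row into the transposed shape.
x_t = np.reshape(column_major, (x.shape[1], x.shape[0]))
print(x_t)
```

Note that a single `np.reshape(x, (4, 3), order='F')` would not work, because it both reads and writes in column-major order and therefore does not swap rows with columns.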

# Problem 4

In [4]:
import numpy as np
primes = np.array([2, 3, 5, 7, 11, 13])
primes_squared_minus_one = primes**2-1
primes_squared_minus_one

array([  3,   8,  24,  48, 120, 168])

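# Problem 5  
The NumPy built-in function is far faster than the naive Python implementation: for the 1000×1000 matrices, the naive loop took 0.2089 s versus 0.0011 s for `np.multiply`, roughly 190 times slower. For the 10×10 matrices both times round to 0.0000 s, so `time.time()` is too coarse to separate them at that size.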

In [5]:
ex_1 = np.reshape(np.arange(100), (10, 10))
ex_2 = np.reshape(np.arange(10000), (100, 100))
ex_3 = np.reshape(np.arange(1000000), (1000, 1000))

ex_list = [ex_1, ex_2, ex_3]

# Naive Python implementation of element-wise multiplication
def naive_ex(np_array_1, np_array_2):
    # Element-wise multiplication requires identical shapes.
    assert np_array_1.shape == np_array_2.shape

    rows, cols = np_array_1.shape
    np_new = np_array_1.copy()
    for i in range(rows):
        for j in range(cols):
            np_new[i, j] = np_array_1[i, j] * np_array_2[i, j]
    return np_new


# compare

import time

for ex in ex_list:
    ex_temp = ex.copy()
    naive_start = time.time()
    naive_ex(ex, ex_temp)
    naive_elapsed = time.time() - naive_start
    print('naive function Elapsed time: %.4f' % naive_elapsed)
    
    numpy_start = time.time()
    np.multiply(ex, ex_temp)
    numpy_elapsed = time.time() - numpy_start
    print('numpy function Elapsed time: %.4f' % numpy_elapsed)

naive function Elapsed time: 0.0000
numpy function Elapsed time: 0.0000
naive function Elapsed time: 0.0021
numpy function Elapsed time: 0.0000
naive function Elapsed time: 0.2089
numpy function Elapsed time: 0.0011
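
Because `time.time()` can report 0.0000 for very short runs, a steadier comparison is possible with `time.perf_counter` and a few repetitions. The sketch below is one way to do that under assumed settings: the helper names `naive_multiply` and `best_time`, the matrix sizes, the seed, and the repeat count are all illustrative choices, and the measured times will differ by machine.

```python
import time
import numpy as np

def naive_multiply(a, b):
    """Element-wise product with explicit Python loops."""
    assert a.shape == b.shape
    rows, cols = a.shape
    out = np.empty_like(a)
    for i in range(rows):
        for j in range(cols):
            out[i, j] = a[i, j] * b[i, j]
    return out

def best_time(func, *args, repeats=3):
    """Return the smallest wall-clock time over several runs."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        times.append(time.perf_counter() - start)
    return min(times)

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
for n in (10, 100, 300):
    a = rng.random((n, n))
    b = rng.random((n, n))
    t_naive = best_time(naive_multiply, a, b)
    t_numpy = best_time(np.multiply, a, b)
    print(f'{n}x{n}: naive {t_naive:.6f} s, numpy {t_numpy:.6f} s')
```

Taking the minimum over several runs reduces the influence of one-off delays such as other processes competing for the CPU.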


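# Problem 6  
Measured in the Colab environment, training on the GPU took about half as long as on the CPU (times in seconds):  
CPU_Elapsed_time : 45.4480  
GPU_Elapsed_time : 23.2251  
The output shown under the cell below is from a run on an Apple M1 (Metal device), which took 34.0182 s.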

In [6]:
import time
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras.utils import to_categorical

(train_images, train_labels), (test_images,test_labels) = mnist.load_data()

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28*28,)))
network.add(layers.Dense(10, activation='softmax'))
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

train_images = train_images.reshape((60000, 28*28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28*28))
test_images = test_images.astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

start = time.time()
network.fit(train_images, train_labels, epochs=10, batch_size=128)
elapsed = time.time() - start

print('Elapsed time: %.4f' % elapsed)

Metal device set to: Apple M1
Epoch 1/10


2022-03-22 22:03:38.249715: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-03-22 22:03:38.249812: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-03-22 22:03:38.428393: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz


  8/469 [..............................] - ETA: 3s - loss: 1.4059 - accuracy: 0.5752  

2022-03-22 22:03:38.651346: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Elapsed time: 34.0182
