
* The common transformations can be divided into two categories:
    + functions (no parameters learned during training): `torch.nn.functional` contains the basic stateless functions used by the high-level object-oriented modules; they take `torch.Tensor` objects (e.g., matrices) as arguments.
    + architectures (containing learnable parameters): rather than building a neural network at the neuron/connection level, deep learning frameworks build it at the **layer** level and use object-oriented implementations of architectures. Two popular ones are TensorFlow's `tensorflow.keras.layers` (e.g., `Conv1D`) and PyTorch's `torch.nn.Module` subclasses (e.g., `Linear`, `Conv1d`, `Dropout`).
* Pros/cons of using layer-level architectures: you only need to care about the inputs and outputs of each layer, but the design decisions made inside each layer are hidden.
* How are the parameters traced?
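
The difference between the two categories can be sketched as follows. This is a minimal illustration, assuming `torch.nn.functional.linear` as the function and `torch.nn.Linear` as the matching architecture; the tensor names and sizes are made up for the example.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
x = torch.randn(4, 3)  # a batch of 4 samples with 3 features each

# Function: stateless, the caller supplies the weight and bias explicitly
W = torch.randn(2, 3)
b = torch.randn(2)
out_functional = F.linear(x, W, b)  # computes x @ W.T + b

# Architecture (module): the layer creates and owns its learnable parameters
layer = nn.Linear(3, 2)
out_module = layer(x)

# Calling the function with the layer's own parameters gives the same result
same = torch.allclose(out_module, F.linear(x, layer.weight, layer.bias))
n_params = sum(p.numel() for p in layer.parameters())
print(tuple(out_functional.shape), same, n_params)
```

The module form is usually more convenient when parameters must be handed to an optimizer, while the functional form suits operations without state.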


How does PyTorch accumulate the gradients for a batch of samples (e.g., [x1, x2]) independently and in a vectorized way?


In [None]:

import torch

# parameters (requires_grad=True so that autograd tracks them)
w = torch.tensor([1., 2., 3.], requires_grad=True)
print('Original w grad:', w.grad)


# we create two examples, `x1` and `x2`, each with 3-dimensional features
x1 = torch.Tensor([1., 2., 3.])
x2 = torch.Tensor([4., 5., 6.])

y = x1 * w
z = y.sum()
z.backward()
print('w grad with x1:', w.grad)

y = x2 * w
z = y.sum()
w.grad = None
z.backward()
print('w grad with x2:', w.grad)


# With a batch as input, the per-sample gradients are summed over the batch dimension
x = torch.Tensor([[1., 2., 3.],
                  [4., 5., 6.]])
y = x * w
z = y.sum()
w.grad = None
z.backward()
print('mini-batch gradient of parameters with x1 and x2:', w.grad)
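
The accumulation behaviour can be checked directly. The sketch below reuses the values of `w`, `x1`, and `x2` from the cell above and assumes `torch.autograd.grad` for the per-sample gradients, since it returns the gradient without writing to `w.grad`.

```python
import torch

w = torch.tensor([1., 2., 3.], requires_grad=True)
x = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])

# Per-sample gradients, computed one sample at a time
per_sample = []
for xi in x:
    (g,) = torch.autograd.grad((xi * w).sum(), w)
    per_sample.append(g)

# Batch gradient: one vectorized forward pass and one backward pass
(x * w).sum().backward()
batch_grad = w.grad.clone()

# A second backward pass without resetting w.grad adds to the stored gradient
(x * w).sum().backward()
accumulated = w.grad.clone()

print(per_sample, batch_grad, accumulated)
```

Because `backward()` adds to `w.grad` rather than overwriting it, the gradient has to be reset (here with `w.grad = None`, or with `optimizer.zero_grad()` in a training loop) before each new step.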

### High-level neural network using `nn.Module`

As noted above, a neural network is just a stack of operations on input data tensors and model parameter tensors. `nn.Module` provides the base implementation that records the model parameters and operations at a high level.

In a nutshell,

**All neural networks in PyTorch are built on the parent class `nn.Module`.**

The following code cell shows the model parameters held by the linear module `class Linear(Module)`.


In [None]:


import torch
from torch import nn

# Use the PyTorch Linear module: 5 input features, 2 output features
nn_module = nn.Linear(5, 2)
for p in nn_module.parameters():
    print('W or b: ', p.shape)
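
To see which parameter is which, and how the layer uses them, the sketch below lists them by name with `named_parameters()` and reproduces the layer's output by hand. The input `x` is random and purely illustrative.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(5, 2)

# named_parameters() yields (name, tensor) pairs
shapes = {name: tuple(p.shape) for name, p in layer.named_parameters()}
print(shapes)  # weight is (out_features, in_features), bias is (out_features,)

# The layer computes x @ W.T + b
x = torch.randn(3, 5)
manual = x @ layer.weight.T + layer.bias
matches = torch.allclose(layer(x), manual, atol=1e-6)
print(matches)
```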


`Module` is used **in a nested way**: a module can hold other modules as attributes, and their parameters are registered recursively.

In [None]:
# build customized pytorch nn modules
class Network(nn.Module):
    def __init__(self):
        super().__init__()  # initialize nn.Module so that the layers assigned below are registered
        
        self.hidden = nn.Linear(784, 256)
        self.output = nn.Linear(256, 10)
        
        self.sigmoid = nn.Sigmoid()
        self.softmax = nn.Softmax(dim=1)
        
    def forward(self, x):
        x = self.hidden(x)
        x = self.sigmoid(x)
        x = self.output(x)
        x = self.softmax(x)
        
        return x
        
model = Network()
model
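
The nesting can be inspected through the parameter names, which are prefixed with the attribute name of each submodule. The sketch below repeats the `Network` class from the cell above so that it runs on its own; the random input batch is an assumption made for illustration.

```python
import torch
from torch import nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(784, 256)
        self.output = nn.Linear(256, 10)
        self.sigmoid = nn.Sigmoid()
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.sigmoid(self.hidden(x))
        return self.softmax(self.output(x))

model = Network()

# Parameters of nested modules are collected recursively
names = [name for name, _ in model.named_parameters()]
n_params = sum(p.numel() for p in model.parameters())
print(names, n_params)

# Forward pass on a hypothetical batch of 4 flattened 28x28 inputs
probs = model(torch.randn(4, 784))
print(probs.shape, probs.sum(dim=1))  # each row of softmax outputs sums to 1
```

`Sigmoid` and `Softmax` appear as submodules but contribute no entries to `names`, since they have no learnable parameters.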

In [None]:

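import numpy as np
import torch
import torch.nn.functional as F
from sklearn.metrics import confusion_matrix, classification_report

# define the neural network: an MLP with three hidden layers and 5 output classes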
mlp = torch.nn.Sequential(
    torch.nn.Linear(2914, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 5),
)

# define optimizer for gradient descent
optimizer = torch.optim.SGD(mlp.parameters(), lr=0.001)

# assign class weights inversely proportional to the number of samples
# in each class of the training set, so that rare classes count more
class_weights = [X_train.shape[0]/np.sum(y_train==i) for i in range(len(target_names))]
# class_weights = [1.0, 1.0, 1.0, 1.0, 1.0]
class_weights = torch.tensor(class_weights, dtype=torch.float)


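# convert the training data to tensors once, outside the loop
X_train_tensor = torch.tensor(X_train, dtype=torch.float)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)

for epoch in range(100):  # each epoch is one full-batch gradient step
    optimizer.zero_grad()  # clear the gradients accumulated in the previous step
    outputs = mlp(X_train_tensor)  # forward pass: one row of 5 logits per sample
    loss = F.cross_entropy(outputs, y_train_tensor, weight=class_weights)
    loss.backward()  # compute the gradients
    optimizer.step()  # update the parameters
    print('Epoch: ', epoch, ' Loss: ', loss.item())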

# Predict (no gradient tracking is needed at inference time)
X_test_tensor = torch.tensor(X_test, dtype=torch.float)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)
with torch.no_grad():
    y_pred = mlp(X_test_tensor).argmax(dim=1).numpy()

# Evaluate
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=target_names))
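
The effect of the `weight` argument used in the training loop above can be sketched on a small synthetic example. The labels and logits here are made up (they are not the notebook's data); only the weighting rule, total samples divided by samples per class, follows the cell above.

```python
import numpy as np
import torch
import torch.nn.functional as F

# Hypothetical imbalanced labels: 6 samples of class 0, 3 of class 1, 1 of class 2
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])
n_classes = 3

# Same rule as above: total number of samples / number of samples in class i
class_weights = torch.tensor(
    [y.shape[0] / np.sum(y == i) for i in range(n_classes)], dtype=torch.float)

torch.manual_seed(0)
logits = torch.randn(len(y), n_classes)  # stand-in for the model outputs
targets = torch.tensor(y, dtype=torch.long)

loss = F.cross_entropy(logits, targets, weight=class_weights)

# Manual equivalent: weighted mean of the unweighted per-sample losses
per_sample = F.cross_entropy(logits, targets, reduction='none')
sample_w = class_weights[targets]
manual = (sample_w * per_sample).sum() / sample_w.sum()
print(class_weights, loss.item(), manual.item())
```

With these weights, the single class-2 sample contributes as much weight to the loss as all six class-0 samples together, which is the intended counterbalance for class imbalance.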