**Tensor Neural Network Framework** 

In this framework, vectors and matrices are represened by a generalized `Tensor` class. A tensor object contains the data for the vector/matrix, a unique identifier and a list of Tensor operation methods. It also includes information pertaining to how the tensor was created, e.g. if it was created by a tensor operation from other tensors, then we would call it a `child` of those `parent tensors`. (So these Tensors can be considered to form the `nodes` of a `tree`-like hierarchical structure, with data being transmitted across the node edges during forward and backward propagation). Finally, the tensor object also contains a method for computing and backpropagating deriviatives, this feature can be turned on by setting the `autograd` property to `True`. The backpropagtion occurs recursively over all the ancestors of that Tensor and stops when a Tensor which does not have any parents is reached. Any given tensor will wait until it has recieved and accumulated the backpropagted derivatives from all it's children and then it will backpropagate it's gradient to it's parents.   

In addition to this Tensor class, we also create a base `Layer` class, and define a `Linear` Layer sub-class which represents a linear layer in a neural network, i.e. it has a matrix of weights and it takes a vector of input neurons and multiplies it to the weights matrix resulting in a vector of output neurons.

We also create sub-classes for `non-lineararity layers` which take a vector of input neurons and operates on this vector with a non-linear function such as `sigmoid` or `relu`. Similarly, we also have a `loss function layer` for computing error/loss for a given target and prediction.

In [29]:
import numpy as np

class Tensor(object):
    
    def __init__(self, data, creators=None, creation_op=None, autograd=False, id=None):
        self.data = np.array(data)
        self.creators = creators
        self.creation_op = creation_op
        self.grad = None
        self.autograd = autograd
        if(id == None):
            id = np.random.randint(0,100000)
        self.id = id
        self.children = {}
        if(creators is not None):
            for creator in creators:
                if self.id not in creator.children:
                    creator.children[self.id] = 1
                else:
                    creator.children[self.id] += 1    

    def backward(self, grad=None, grad_origin=None):
        if(self.autograd):
            if(grad_origin is not None):
                # if waiting to receive gradient, decrement counter
                if(self.children[grad_origin.id] != 0):
                    self.children[grad_origin.id] -= 1
                else:
                    raise Exception("Same child cannot backpropagate more than once!")

            # if this is the beginning of the backpropagtion chain
            if(grad is None):
                grad = Tensor(np.ones_like(self.data))

            # accumulate gradients from all the children 
            if(self.grad is None):
                self.grad = grad
            else:
                self.grad += grad    

            # backpropagate to creators if all gradients from children have been received or if gradients did not originate from another node
            if((self.creators is not None) and (self.received_grads_from_all_children() or (grad_origin is None))):
                if(self.creation_op == "add"):
                    new_grad = Tensor(self.grad.data)
                    self.creators[0].backward(new_grad, self)
                    self.creators[1].backward(new_grad, self)
                if(self.creation_op == "neg"):
                    new_grad = self.grad.__neg__()
                    self.creators[0].backward(new_grad, self)    
                if(self.creation_op == "sub"):
                    new_grad = Tensor(self.grad.data)
                    self.creators[0].backward(new_grad, self)
                    new_grad = self.grad.__neg__()
                    self.creators[1].backward(new_grad, self)    
                if(self.creation_op == "mul"):
                    new_grad = self.grad * self.creators[1]
                    self.creators[0].backward(new_grad, self)
                    new_grad = self.creators[0] * self.grad
                    self.creators[1].backward(new_grad, self)
                if(self.creation_op == "mm"):
                    new_grad = self.grad.mm(self.creators[1].transpose())
                    self.creators[0].backward(new_grad, self)
                    new_grad = (self.creators[0].transpose()).mm(self.grad)
                    self.creators[1].backward(new_grad, self)
                if(self.creation_op == "transpose"):
                    new_grad = self.grad.transpose()
                    self.creators[0].backward(new_grad, self)
                if(self.creation_op == "sigmoid"):
                    ones = Tensor(np.ones_like(self.grad.data))
                    # sigmoid derivative
                    new_grad = self.grad * (self * (ones - self))
                    self.creators[0].backward(new_grad, self)
                if(self.creation_op == "tanh"):
                    ones = Tensor(np.ones_like(self.grad.data))
                    # tanh derivative
                    new_grad = self.grad * (ones - self*self)
                    self.creators[0].backward(new_grad, self)
                if(self.creation_op == "relu"):
                    # relu derivative
                    new_grad = self.grad * (self.creators[0].data > 0)
                    self.creators[0].backward(new_grad, self)
                if("sum" in self.creation_op):
                    dim = int(self.creation_op.split("_")[1])
                    ds = self.creators[0].data.shape[dim]
                    self.creators[0].backward(self.grad.expand(dim,ds))
                if("expand" in self.creation_op):
                    dim = int(self.creation_op.split("_")[1])
                    self.creators[0].backward(self.grad.sum(dim))


    # check to see if this tensor has recieved gradients from all children, which is indicated by all children counts being zero
    def received_grads_from_all_children(self):
        for id,count in self.children.items():
            if (count != 0):
                return False
        return True     

    # Note: operations always return a new tensor object 

    # element-wise addition
    def __add__(self, other):
        # return a new tensor object containing the sum
        if(self.autograd and other.autograd):
            return Tensor(self.data + other.data, creators=[self,other], creation_op ="add", autograd=True)
        return Tensor(self.data + other.data)
    
    # element-wise negation
    def __neg__(self):
        # return a new tensor object containing the negation
        if(self.autograd):
            return Tensor(-1 * self.data, creators=[self], creation_op ="neg", autograd=True)
        return Tensor(-1 * self.data)

    # element-wise subtraction
    def __sub__(self, other):
        # return a new tensor object containing the subtraction
        if(self.autograd and other.autograd):
            return Tensor(self.data - other.data, creators=[self,other], creation_op ="sub", autograd=True)
        return Tensor(self.data - other.data)

    # element-wise multiplication
    def __mul__(self, other):
        # return a new tensor object containing the multiplication
        if(self.autograd and other.autograd):
            return Tensor(self.data * other.data, creators=[self,other], creation_op ="mul", autograd=True)
        return Tensor(self.data * other.data)
    
    # sum over all elements along given axis
    def sum(self, axis):
        # return a new tensor object containing the sum
        if(self.autograd):
            return Tensor(self.data.sum(axis), creators=[self], creation_op ="sum_"+str(axis), autograd=True)
        return Tensor(self.data.sum(axis))
    
    # expands the tensor along the given axis
    def expand(self, axis, copies):
        
        trans_cmd = list(range(0,len(self.data.shape)))
        trans_cmd.insert(axis, len(self.data.shape))
        
        new_shape = list(self.data.shape) + [copies]
        new_data = self.data.repeat(copies).reshape(new_shape)
        new_data = new_data.transpose(trans_cmd)
        
        if(self.autograd):
            return Tensor(new_data, autograd=True, creators=[self], creation_op="expand_"+str(axis))
        return Tensor(new_data)

    # transpose of matrix 
    def transpose(self):
        # return a new tensor object with the transposed tensor
        if(self.autograd):
            return Tensor(self.data.transpose(), creators=[self], creation_op ="transpose", autograd=True)
        return Tensor(self.data.transpose())

    # matrix multiplication
    def mm(self, other):
        # return a new tensor object containing the multiplication
        if(self.autograd and other.autograd):
            return Tensor(np.dot(self.data, other.data), creators=[self,other], creation_op ="mm", autograd=True)
        return Tensor(np.dot(self.data, other.data))

    def __str__(self):
        return str(self.data.__str__())
    
    def __repr__(self):
        return str(self.data.__repr__())

    # Non-linearity functions

    # sigmoid function
    def sigmoid(self):
        if(self.autograd):
            return Tensor(1.0 / (1.0 + np.exp(-self.data)), creators=[self], creation_op="sigmoid", autograd=True)
        return Tensor(1.0 / (1.0 + np.exp(self.data)))

    # tanh function
    def tanh(self):
        if(self.autograd):
            return Tensor(np.tanh(self.data), creators=[self], creation_op="sigmoid", autograd=True)
        return Tensor(np.tanh(self.data))
    
    # relu function
    def relu(self):
        if(self.autograd):
            return Tensor(self.data * (self.data > 0), creators=[self], creation_op="sigmoid", autograd=True)
        return Tensor(self.data * (self.data > 0))
    


# stochastic gradient descent optimizer    
class SGD_Optimizer(object):

    def __init__(self, parameters, alpha) -> None:
        self.parameters = parameters
        self.alpha = alpha    

    def zero(self):
        for p in self.parameters:
            p.grad.data *= 0

    def step(self, zero=True):
        for p in self.parameters:
            p.data -= self.alpha * p.grad.data

            if(zero):
                p.grad.data *= 0

# layer base class
class Layer(object):   
    def __init__(self) -> None:
        self.parameters = []

    def get_parameters(self):                     
        return self.parameters
    
# layer inherited classes
class Linear(Layer):
    def __init__(self, n_inputs, n_outputs) -> None:
        super().__init__()
        # initilize the weights
        W = np.random.randn(n_inputs, n_outputs) * np.sqrt(2.0/n_inputs)
        self.weight = Tensor(W, autograd=True)
        self.bias = Tensor(np.zeros(n_outputs), autograd=True)

        self.parameters.append(self.weight)
        self.parameters.append(self.bias)

    def forward(self, input):
        return input.mm(self.weight) + self.bias.expand(0,len(input.data))   

# a class for a senquence of layer, i.e. a neral network model
class Sequential(Layer):
    def __init__(self, layers = []) -> None:
        super().__init__()
        self.layers = layers

    def add(self, layer):
        self.layers.append(layer)

    def forward(self, input):
        for layer in self.layers:
            input = layer.forward(input)
        return input
    
    def get_parameters(self):
        params = []
        for layer in self.layers:
            params += layer.get_parameters()

        return params    
    
# means squared error loss function layer    
class MSELoss(Layer):
    def __init__(self) -> None:
        super().__init__()

    def forward(self, pred, target):
        return ((pred-target) * (pred-target)).sum(0)

# nonlinearity layers
class Sigmoid(Layer):
    def __init__(self) -> None:
        super().__init__()

    def forward(self, input):
        return input.sigmoid()

class Tanh(Layer):
    def __init__(self) -> None:
        super().__init__()

    def forward(self, input):
        return input.tanh()

class Relu(Layer):
    def __init__(self) -> None:
        super().__init__()

    def forward(self, input):
        return input.relu()



In [27]:
a = Tensor([1,2,3,4,5], autograd=True)
b = Tensor([2,2,2,2,2], autograd=True)
c = Tensor([3,3,3,3,3], autograd=True)
d = a + (-b)
e = (-b) + c
f = d + e

print(f"node(a), id: {a.id}, children: {a.children}, creators: {a.creators}")
print(f"node(b), id: {b.id}, children: {b.children}, creators: {b.creators}")
print(f"node(c), id: {c.id}, children: {c.children}, creators: {c.creators}")
print(f"node(d), id: {d.id}, children: {d.children}, creators: {d.creators}")
print(f"node(e), id: {e.id}, children: {e.children}, creators: {e.creators}")
print(f"node(f), id: {f.id}, children: {f.children}, creators: {f.creators}")

D = Tensor([1,1,1,1,1])
f.backward(grad = D)

print(f"f grad: {f.grad}")
print(f"e grad: {e.grad}")
print(f"d grad: {d.grad}")
print(f"c grad: {c.grad}")
print(f"b grad: {b.grad}")
print(f"a grad: {a.grad}")


node(a), id: 68083, children: {53360: 1}, creators: None
node(b), id: 34350, children: {54247: 1, 2559: 1}, creators: None
node(c), id: 80089, children: {31077: 1}, creators: None
node(d), id: 53360, children: {59906: 1}, creators: [array([1, 2, 3, 4, 5]), array([-2, -2, -2, -2, -2])]
node(e), id: 31077, children: {59906: 1}, creators: [array([-2, -2, -2, -2, -2]), array([3, 3, 3, 3, 3])]
node(f), id: 59906, children: {}, creators: [array([-1,  0,  1,  2,  3]), array([1, 1, 1, 1, 1])]
f grad: [1 1 1 1 1]
e grad: [1 1 1 1 1]
d grad: [1 1 1 1 1]
c grad: [1 1 1 1 1]
b grad: [-2 -2 -2 -2 -2]
a grad: [1 1 1 1 1]


Example 1: Using the tensor object and autograd to train a simple two layer linear network

In [26]:
np.random.seed(1)
input_data = Tensor(np.array([[0,0], [0,1], [1,0], [1,1]]), autograd=True)
target = Tensor(np.array([[0],[1],[0],[1]]), autograd=True) 

input_neurons = input_data.data.shape[1]
hidden_neurons = 3
output_neurons = target.data.shape[1]

# initialize neural net layers
model = Sequential(layers=[Linear(input_neurons, hidden_neurons), Linear(hidden_neurons, output_neurons)])
loss_layer = MSELoss()

# initialize optimizer
optim = SGD_Optimizer(parameters=model.get_parameters(), alpha = 0.05) 

# training iterations
niters = 10
for iter in range(niters):

    # forward pass
    pred = model.forward(input_data)

    # compute loss
    loss = loss_layer.forward(pred, target)

    # backpropagation
    loss.backward()

    # optimization of weights
    optim.step()

    print(f"Iteration# {iter+1}, Loss: {loss}")


Iteration# 1, Loss: [12.2648028]
Iteration# 2, Loss: [9.54239642]
Iteration# 3, Loss: [0.65868523]
Iteration# 4, Loss: [0.44037858]
Iteration# 5, Loss: [0.30768909]
Iteration# 6, Loss: [0.2183136]
Iteration# 7, Loss: [0.15605316]
Iteration# 8, Loss: [0.11190144]
Iteration# 9, Loss: [0.08028195]
Iteration# 10, Loss: [0.05752987]


Example 2: Using the tensor object and autograd to train a network with non-linear layers

In [34]:
np.random.seed(1)
input_data = Tensor(np.array([[0,0], [0,1], [1,0], [1,1]]), autograd=True)
target = Tensor(np.array([[0],[1],[0],[1]]), autograd=True) 

input_neurons = input_data.data.shape[1]
hidden_neurons = 3
output_neurons = target.data.shape[1]

# initialize neural net layers
model = Sequential(layers=[Linear(input_neurons, hidden_neurons), Tanh(),Linear(hidden_neurons, output_neurons), Sigmoid()])
loss_layer = MSELoss()

# initialize optimizer
optim = SGD_Optimizer(parameters=model.get_parameters(), alpha = 1) 

# training iterations
niters = 10
for iter in range(niters):

    # forward pass
    pred = model.forward(input_data)

    # compute loss
    loss = loss_layer.forward(pred, target)

    # backpropagation
    loss.backward()

    # optimization of weights
    optim.step()

    print(f"Iteration# {iter+1}, Loss: {loss}")


Iteration# 1, Loss: [1.14301976]
Iteration# 2, Loss: [1.04916143]
Iteration# 3, Loss: [0.85408068]
Iteration# 4, Loss: [0.75343445]
Iteration# 5, Loss: [0.68621271]
Iteration# 6, Loss: [0.59814987]
Iteration# 7, Loss: [0.51407253]
Iteration# 8, Loss: [0.43459528]
Iteration# 9, Loss: [0.36942081]
Iteration# 10, Loss: [0.31531027]


**Adding support for language processing:**

Previously we had a `linear layer` which had a matrix of weights and forward propagatopn involved computing the vector-matrix multiplication of the inputs with the weights. We will now create a similar `embedding layer` for natural language processing. The `embedding layer` will also have a weights matrix, in this case each row of the matrix will correspond to an embedding for a word from the vocabulary, and the number of rows should be set equal to the total number of words in the vocabulary. The number of columns on the other hand will be set equal to the desired number of hidden neurons.

During forward propagation, the input vector is going to be a list of word indices and the output will be specific rows (corresponding to the input word indices) selected from the weights matrix. To do this, we will add an `index_select` operation into our tensor object. During backpropagation, the gradients accociated with only those specific rows will be computed, and so the list of input word indices will be utilized during backpropagation as well.

In [58]:
class Tensor(object):
    
    def __init__(self, data, creators=None, creation_op=None, autograd=False, id=None):
        self.data = np.array(data)
        self.creators = creators
        self.creation_op = creation_op
        self.grad = None
        self.autograd = autograd
        if(id == None):
            id = np.random.randint(0,100000)
        self.id = id
        self.children = {}
        if(creators is not None):
            for creator in creators:
                if self.id not in creator.children:
                    creator.children[self.id] = 1
                else:
                    creator.children[self.id] += 1    

    def backward(self, grad=None, grad_origin=None):
        if(self.autograd):
            if(grad_origin is not None):
                # if waiting to receive gradient, decrement counter
                if(self.children[grad_origin.id] != 0):
                    self.children[grad_origin.id] -= 1
                else:
                    raise Exception("Same child cannot backpropagate more than once!")

            # if this is the beginning of the backpropagtion chain
            if(grad is None):
                grad = Tensor(np.ones_like(self.data))

            # accumulate gradients from all the children 
            if(self.grad is None):
                self.grad = grad
            else:
                self.grad += grad    

            # backpropagate to creators if all gradients from children have been received or if gradients did not originate from another node
            if((self.creators is not None) and (self.received_grads_from_all_children() or (grad_origin is None))):
                if(self.creation_op == "add"):
                    new_grad = Tensor(self.grad.data)
                    self.creators[0].backward(new_grad, self)
                    self.creators[1].backward(new_grad, self)
                if(self.creation_op == "neg"):
                    new_grad = self.grad.__neg__()
                    self.creators[0].backward(new_grad, self)    
                if(self.creation_op == "sub"):
                    new_grad = Tensor(self.grad.data)
                    self.creators[0].backward(new_grad, self)
                    new_grad = self.grad.__neg__()
                    self.creators[1].backward(new_grad, self)    
                if(self.creation_op == "mul"):
                    new_grad = self.grad * self.creators[1]
                    self.creators[0].backward(new_grad, self)
                    new_grad = self.creators[0] * self.grad
                    self.creators[1].backward(new_grad, self)
                if(self.creation_op == "mm"):
                    new_grad = self.grad.mm(self.creators[1].transpose())
                    self.creators[0].backward(new_grad, self)
                    new_grad = (self.creators[0].transpose()).mm(self.grad)
                    self.creators[1].backward(new_grad, self)
                if(self.creation_op == "transpose"):
                    new_grad = self.grad.transpose()
                    self.creators[0].backward(new_grad, self)
                if(self.creation_op == "sigmoid"):
                    ones = Tensor(np.ones_like(self.grad.data))
                    # sigmoid derivative
                    new_grad = self.grad * (self * (ones - self))
                    self.creators[0].backward(new_grad, self)
                if(self.creation_op == "tanh"):
                    ones = Tensor(np.ones_like(self.grad.data))
                    # tanh derivative
                    new_grad = self.grad * (ones - self*self)
                    self.creators[0].backward(new_grad, self)
                if(self.creation_op == "relu"):
                    # relu derivative
                    new_grad = self.grad * (self.creators[0].data > 0)
                    self.creators[0].backward(new_grad, self)
                if(self.creation_op == "index_select"):
                    # gradient of the weights matrix of word embeddings
                    new_grad = np.zeros_like(self.creators[0].data)
                    # we only add gradients to the specific rows corresponding to the selected words 
                    indices_ = self.index_select_indices.data.flatten() 
                    grad_ = self.grad.data.reshape(len(indices_), -1)
                    for i in range(len(indices_)):
                        new_grad[indices_[i]] += grad_[i]
                    self.creators[0].backward(new_grad, self)       

                if("sum" in self.creation_op):
                    dim = int(self.creation_op.split("_")[1])
                    ds = self.creators[0].data.shape[dim]
                    self.creators[0].backward(self.grad.expand(dim,ds))
                if("expand" in self.creation_op):
                    dim = int(self.creation_op.split("_")[1])
                    self.creators[0].backward(self.grad.sum(dim))


    # check to see if this tensor has recieved gradients from all children, which is indicated by all children counts being zero
    def received_grads_from_all_children(self):
        for id,count in self.children.items():
            if (count != 0):
                return False
        return True     

    # Note: operations always return a new tensor object 

    # element-wise addition
    def __add__(self, other):
        # return a new tensor object containing the sum
        if(self.autograd and other.autograd):
            return Tensor(self.data + other.data, creators=[self,other], creation_op ="add", autograd=True)
        return Tensor(self.data + other.data)
    
    # element-wise negation
    def __neg__(self):
        # return a new tensor object containing the negation
        if(self.autograd):
            return Tensor(-1 * self.data, creators=[self], creation_op ="neg", autograd=True)
        return Tensor(-1 * self.data)

    # element-wise subtraction
    def __sub__(self, other):
        # return a new tensor object containing the subtraction
        if(self.autograd and other.autograd):
            return Tensor(self.data - other.data, creators=[self,other], creation_op ="sub", autograd=True)
        return Tensor(self.data - other.data)

    # element-wise multiplication
    def __mul__(self, other):
        # return a new tensor object containing the multiplication
        if(self.autograd and other.autograd):
            return Tensor(self.data * other.data, creators=[self,other], creation_op ="mul", autograd=True)
        return Tensor(self.data * other.data)
    
    # sum over all elements along given axis
    def sum(self, axis):
        # return a new tensor object containing the sum
        if(self.autograd):
            return Tensor(self.data.sum(axis), creators=[self], creation_op ="sum_"+str(axis), autograd=True)
        return Tensor(self.data.sum(axis))
    
    # expands the tensor along the given axis
    def expand(self, axis, copies):
        
        trans_cmd = list(range(0,len(self.data.shape)))
        trans_cmd.insert(axis, len(self.data.shape))
        
        new_shape = list(self.data.shape) + [copies]
        new_data = self.data.repeat(copies).reshape(new_shape)
        new_data = new_data.transpose(trans_cmd)
        
        if(self.autograd):
            return Tensor(new_data, autograd=True, creators=[self], creation_op="expand_"+str(axis))
        return Tensor(new_data)

    # transpose of matrix 
    def transpose(self):
        # return a new tensor object with the transposed tensor
        if(self.autograd):
            return Tensor(self.data.transpose(), creators=[self], creation_op ="transpose", autograd=True)
        return Tensor(self.data.transpose())

    # matrix multiplication
    def mm(self, other):
        # return a new tensor object containing the multiplication
        if(self.autograd and other.autograd):
            return Tensor(np.dot(self.data, other.data), creators=[self,other], creation_op ="mm", autograd=True)
        return Tensor(np.dot(self.data, other.data))

    def __str__(self):
        return str(self.data.__str__())
    
    def __repr__(self):
        return str(self.data.__repr__())

    # Non-linearity functions

    # sigmoid function
    def sigmoid(self):
        if(self.autograd):
            return Tensor(1.0 / (1.0 + np.exp(-self.data)), creators=[self], creation_op="sigmoid", autograd=True)
        return Tensor(1.0 / (1.0 + np.exp(self.data)))

    # tanh function
    def tanh(self):
        if(self.autograd):
            return Tensor(np.tanh(self.data), creators=[self], creation_op="sigmoid", autograd=True)
        return Tensor(np.tanh(self.data))
    
    # relu function
    def relu(self):
        if(self.autograd):
            return Tensor(self.data * (self.data > 0), creators=[self], creation_op="sigmoid", autograd=True)
        return Tensor(self.data * (self.data > 0))
    
    # word embedding operations (the input 'indices' are just word a vector of indices, i.e. specifix row numbers that are to be selected and returned)
    def index_select(self, indices):
        if(self.autograd):
            selected_rows =  Tensor(self.data[indices.data], creators=[self], creation_op="index_select", autograd=True)
            selected_rows.index_select_indices = indices 
            return selected_rows 
        return Tensor(self.data[indices.data])


class Embedding(Layer):
    def __init__(self, vocab_size, hidden_neurons) -> None:
        super().__init__()
        self.vocab_size = vocab_size
        self.hidden_neurons = hidden_neurons

        # initialize the weights matrix of word embeddings 
        self.weight = Tensor((np.random.rand(vocab_size, hidden_neurons)-0.5)/hidden_neurons, autograd=True)
        self.parameters.append(self.weight)   

    def forward(self, input):
        return self.weight.index_select(input)    
        

Example of word embeddinf forward pass and backprop in action

In [61]:
# initialize a weights matrix for a vocabulary of 5 words and 5 hidden neurons
w = Tensor(np.eye(5), autograd=True)
print("weights matrix:")
print(w)

# forward pass for an input vector containing two sentences with three words each
indices = Tensor(np.array([[1,2,3], [2,3,4]]))
selected_rows = w.index_select(indices)
print("Selected rows:")
print(selected_rows)

# compute gradient of weights for the given input
selected_rows.backward()
print("weights gradient:")
print(w.grad)


weights matrix:
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
Selected rows:
[[[0. 1. 0. 0. 0.]
  [0. 0. 1. 0. 0.]
  [0. 0. 0. 1. 0.]]

 [[0. 0. 1. 0. 0.]
  [0. 0. 0. 1. 0.]
  [0. 0. 0. 0. 1.]]]
weights gradient:
[[0. 0. 0. 0. 0.]
 [1. 1. 1. 1. 1.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [1. 1. 1. 1. 1.]]
