In [1]:
import numpy as np

### instance 1

The example below demonstrates the vanishing gradient problem.

When the input to the sigmoid is very large in magnitude (strongly negative or strongly positive), the output saturates near 0 or 1, and the gradient flowing back through it can shrink toward 0.
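As a rough illustration of this saturation, the sketch below evaluates the sigmoid and its derivative, a * (1 - a), at a few hand-picked inputs. The `sigmoid` helper and the chosen `z_values` are assumptions made for this illustration and are not part of the experiment in this notebook.

```python
import numpy as np

def sigmoid(z):
    """Plain logistic function; adequate for the moderate |z| values used here."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs ranging from strongly negative to strongly positive.
z_values = np.array([-500.0, -20.0, -5.0, 0.0, 5.0, 20.0, 500.0])
a_values = sigmoid(z_values)

# Derivative of the sigmoid with respect to its input: a * (1 - a).
local_grad = a_values * (1.0 - a_values)

for z_i, a_i, g_i in zip(z_values, a_values, local_grad):
    print(f"z = {z_i:7.1f}   sigmoid = {a_i:.3e}   derivative = {g_i:.3e}")
```

The derivative peaks at 0.25 when z = 0 and falls off quickly on both sides, so a unit whose input is in the hundreds passes back almost no gradient.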

When taking the gradient of the loss with respect to the sigmoid input z (for binary cross-entropy this is a - y), using labels of 0 ('grad w label0'), the gradient is extremely small, about 7.7e-226.

If this happens in deeper layers, the near-zero gradient propagates backward through the whole model, so earlier layers barely update and learning becomes ineffective or almost nonexistent.

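With labels of 1 ('grad w label1'), the gradient is comparatively large at -1. Because the gradient is a - y, a value of -1 means the sigmoid output is very close to 0, confirming that the sigmoid is saturated, which is the condition that produces vanishing gradients.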
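For reference, the a - y form of the gradient follows from the chain rule. This is a sketch that assumes the binary cross-entropy loss noted in the code comment, treats a single sample, ignores any constant averaging factor, and writes a = σ(z) for the sigmoid output:

```latex
\begin{aligned}
L &= -\bigl[\, y \log a + (1 - y)\log(1 - a) \,\bigr], \qquad a = \sigma(z) = \frac{1}{1 + e^{-z}} \\
\frac{\partial L}{\partial a} &= -\frac{y}{a} + \frac{1 - y}{1 - a} = \frac{a - y}{a(1 - a)} \\
\frac{\partial a}{\partial z} &= a(1 - a) \\
\frac{\partial L}{\partial z} &= \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} = a - y
\end{aligned}
```

With a close to 0, this gives approximately 0 for y = 0 and approximately -1 for y = 1, which matches the two printed gradients.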

The root cause of this issue is the magnitude of the input values, which are on the order of thousands. Although Xavier weight initialization was used to mitigate this kind of problem, the raw inputs were still too large because they had not been normalized.

Therefore, one way to avoid the vanishing gradient problem is to normalize the inputs, which improves the stability of the model.

In [332]:
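# Print floats at full precision so the tiny values below are not rounded away.
np.set_printoptions(precision=50000)

# Named x rather than input to avoid shadowing the Python builtin.
x = np.array([[1000, 2000, 3000], [1000, 2000, 3000]])  # 2 samples, 3 features
x = x.T  # shape (3, 2): one column per sample

# Xavier-style random draw, kept for reference:
#w = np.random.randn(1, 3) * np.sqrt(1/3)

# Fixed weights so the printed output below is reproducible.
w = np.array([[0.5390438890427759, -0.40388501506080915, -0.08320392462912277]])

b = 0

z = np.dot(w, x) + b  # sigmoid input, shape (1, 2)

# Note: eps sits inside the exponent, so it only shifts z by 1e-10.
eps = 1e-10
a = 1 / (1 + np.exp(-z + eps))

# using this loss:
# bce = - np.sum(y * np.log(a) + (1- y) * np.log(1 - a)) / 3
# Its gradient w.r.t. z is a - y per sample (constant scaling factor omitted).

grad0 = a - np.array([0, 0])
grad1 = a - np.array([1, 1])

print(f"raw input: \n{x}")
print(f"weights: {w}")
print(f"sigmoid input: {z}")
print(f'sigmoid output : {a}')
print(f'grad w label0: {grad0}')
print(f'grad w label1: {grad1}')

raw input: 
[[1000 1000]
 [2000 2000]
 [3000 3000]]
weights: [[ 0.5390438890427759  -0.40388501506080915 -0.08320392462912277]]
sigmoid input: [[-518.3379149662107 -518.3379149662107]]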
sigmoid output : [[7.739337196003053e-226 7.739337196003053e-226]]
grad w label0: [[7.739337196003053e-226 7.739337196003053e-226]]
grad w label1: [[-1. -1.]]
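The normalization suggested above can be sketched as follows. This is a minimal illustration and was not part of the original experiment. It assumes max-abs scaling (dividing by the largest absolute input value) as the normalization step. Per-feature standardization would not work on this particular toy data, because the two samples are identical and every feature has zero variance. The names `x_raw`, `x_scaled`, and the `*_scaled` variables are introduced here for illustration only.

```python
import numpy as np

# Same raw inputs and fixed weights as in the cell above.
x_raw = np.array([[1000, 2000, 3000], [1000, 2000, 3000]], dtype=float).T  # (3 features, 2 samples)
w = np.array([[0.5390438890427759, -0.40388501506080915, -0.08320392462912277]])
b = 0.0

# Assumption: max-abs scaling, so every input lies in [-1, 1].
x_scaled = x_raw / np.abs(x_raw).max()

z_scaled = np.dot(w, x_scaled) + b           # sigmoid input after scaling
a_scaled = 1.0 / (1.0 + np.exp(-z_scaled))   # sigmoid output

# Gradient of binary cross-entropy w.r.t. z is a - y.
grad0_scaled = a_scaled - np.array([0, 0])
grad1_scaled = a_scaled - np.array([1, 1])

print(f"scaled input: \n{x_scaled}")
print(f"sigmoid input: {z_scaled}")
print(f"sigmoid output: {a_scaled}")
print(f"grad w label0: {grad0_scaled}")
print(f"grad w label1: {grad1_scaled}")
```

With the inputs scaled this way, the sigmoid input moves from roughly -518 to a value near zero, so the sigmoid is no longer saturated and the gradients for both label choices have a usable magnitude.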
