# Dying ReLU

The dying ReLU problem occurs when neurons get stuck outputting zero for every input. ReLU returns zero whenever its input (the neuron's pre-activation) is negative. In moderation this gives ReLU networks useful sparsity, but it becomes a major problem when most neurons receive negative inputs for most of the data. In the worst case, the entire network dies and only a constant function remains.

When a neuron outputs zero, the gradient of ReLU at that neuron is also zero, so no gradient flows back through it and its incoming weights stop being updated. When this happens to most of the neurons, the network stops learning.
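
The link between a zero output and a zero gradient can be illustrated with a small NumPy sketch. The helper names `relu` and `relu_grad` and the sample pre-activations are made up for illustration; they are not part of the Keras model used later.

```python
import numpy as np

def relu(z):
    """ReLU: pass positive values through, clamp the rest to zero."""
    return np.maximum(z, 0.0)

def relu_grad(z):
    """Derivative of ReLU with respect to z: 1 where z > 0, else 0."""
    return (z > 0).astype(float)

# Hypothetical pre-activations of one neuron over five inputs, all negative
z = np.array([-3.0, -1.5, -0.2, -0.7, -2.1])
upstream = np.ones_like(z)  # gradient arriving from the next layer (assumed to be 1)

local_grad = upstream * relu_grad(z)  # chain rule through the activation

print(relu(z))      # every output is zero
print(local_grad)   # every gradient is zero, so the weights receive no update
```

Because the gradient is multiplied by zero at the activation, nothing upstream of the neuron can change as long as its pre-activation stays negative.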

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras import initializers

In [None]:
# Generate some synthetic training data
np.random.seed(42)
X_train = np.random.rand(1000, 10)
y_train = np.random.randint(2, size=(1000, 1))

#### ReLU Activation
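We create a simple model with the Keras functional API: two hidden layers and an output layer. Both hidden layers use the ReLU activation function. The first hidden layer's weights are initialized to the constant 0.5 and the second's to the constant -0.5. Because the inputs are non-negative, the first layer's outputs are positive, the second layer's pre-activations are negative, and its ReLU outputs are all zero. A random normal initializer with a mean of 0 and a standard deviation of 0.1 is included as a commented-out alternative for the first layer.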

In [None]:
# Model using ReLU
inputs = Input(shape=(10,))
hidden1 = Dense(10, activation='relu', kernel_initializer=initializers.Constant(0.5))(inputs)
# hidden1 = Dense(10, activation='relu', kernel_initializer=initializers.RandomNormal(mean=0.0, stddev=0.1))(inputs)
hidden2 = Dense(10, activation='relu', kernel_initializer=initializers.Constant(-0.5))(hidden1)
outputs = Dense(1, activation='sigmoid')(hidden2)
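
To see why this initialization kills the second hidden layer before any training, the forward pass can be reproduced by hand. The following is a rough NumPy sketch, not the Keras computation itself: `X` is a stand-in for `X_train` drawn with a different random generator, and the biases are taken to be zero, which is the default initial value for `Dense`.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((1000, 10))      # stand-in inputs in [0, 1), like X_train

W1 = np.full((10, 10), 0.5)     # mirrors Constant(0.5); bias assumed to be 0
W2 = np.full((10, 10), -0.5)    # mirrors Constant(-0.5); bias assumed to be 0

h1 = np.maximum(X @ W1, 0.0)    # non-negative inputs times positive weights
z2 = h1 @ W2                    # positive activations times negative weights
h2 = np.maximum(z2, 0.0)        # ReLU clamps the negative pre-activations

print((h1 > 0).all())           # first hidden layer is fully active
print((z2 <= 0).all())          # second layer pre-activations are never positive
print(np.count_nonzero(h2))     # number of non-zero outputs in the second layer
```

Since every unit of the second hidden layer outputs zero for every sample, its ReLU gradient is zero everywhere and no gradient reaches the weights of either hidden layer.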

#### Swish Activation

In [None]:
# Model using Swish (uncomment these lines to replace the ReLU model above)
# inputs = Input(shape=(10,))
# hidden1 = Dense(10, activation='swish', kernel_initializer=initializers.Constant(0.5))(inputs)
# hidden2 = Dense(10, activation='swish', kernel_initializer=initializers.Constant(-0.5))(hidden1)
# outputs = Dense(1, activation='sigmoid')(hidden2)
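
As a point of comparison with ReLU, here is a minimal NumPy sketch of Swish with beta = 1, that is, `z * sigmoid(z)`, which is what the Keras `'swish'` activation computes. The helper names and sample values are chosen for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def swish(z):
    """Swish with beta = 1: z * sigmoid(z)."""
    return z * sigmoid(z)

def swish_grad(z):
    """Derivative of z * sigmoid(z) with respect to z."""
    s = sigmoid(z)
    return s + z * s * (1.0 - s)

# Hypothetical negative pre-activations
z = np.array([-3.0, -1.0, -0.2])

print(swish(z))       # small negative values rather than exact zeros
print(swish_grad(z))  # non-zero gradients, so the weights can still be updated
```

Unlike ReLU, Swish does not clamp negative inputs to zero, so a layer that receives only negative pre-activations still passes some gradient back.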

In [None]:
model = Model(inputs=inputs, outputs=outputs)

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy')

# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32)

# Extract and print the hidden-layer activations
# (model.layers[0] is the input layer; layers[1] and layers[2] are the hidden layers)
activation_model = Model(inputs=model.input, outputs=[model.layers[1].output, model.layers[2].output])
layer1_values, layer2_values = activation_model.predict(X_train)

print("Layer 1 neuron values:")
print(layer1_values)
print("Layer 2 neuron values:")
print(layer2_values)
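
The printed arrays are large, so a single summary number can be easier to read. The sketch below uses a hypothetical helper, `dead_unit_fraction`, that reports the share of units that output zero for every sample. It is shown on a small made-up array so that it runs without the trained model.

```python
import numpy as np

def dead_unit_fraction(activations):
    """Fraction of units (columns) that output zero for every sample (row)."""
    dead = np.all(activations == 0, axis=0)
    return float(dead.mean())

# Made-up activations: 4 samples, 3 units; only the middle unit is dead
demo = np.array([[0.3, 0.0, 1.2],
                 [0.0, 0.0, 0.8],
                 [0.5, 0.0, 0.0],
                 [0.1, 0.0, 0.4]])

print(dead_unit_fraction(demo))
```

In the notebook, the same helper could be applied as `dead_unit_fraction(layer1_values)` and `dead_unit_fraction(layer2_values)` to compare the two hidden layers.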