# 1. The MNIST dataset consists of images of handwritten digits from 0 to 9. If you want to build a neural network that classifies this data, which activation function should the last fully connected (=Dense) layer use?

#Answer:- 
Activation function of the last layer for MNIST classification
For multi-class classification (digits 0–9), the last Dense layer should have 10 units with softmax activation, which outputs a probability distribution over the 10 classes.
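
To make the softmax answer concrete, the following is a minimal sketch of what the activation computes for a single image. The logit values are hypothetical and chosen only for illustration; in a real network they would be the raw outputs of the last Dense layer.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the largest logit keeps exp() numerically stable
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of the last Dense layer, one per digit 0-9
logits = [1.2, 0.3, -0.8, 2.5, 0.0, -1.1, 0.7, 4.1, 0.2, -0.5]
probs = softmax(logits)

# The predicted class is the index with the highest probability
predicted_digit = probs.index(max(probs))
print("predicted digit:", predicted_digit)
print("sum of probabilities:", round(sum(probs), 6))
```

Because the outputs are positive and sum to 1, they can be read as class probabilities and paired with a cross-entropy loss.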

# (2) Write the training set that results after the application of the preprocessing formula below.
 


# (3) Explain whether, in your observation, this data preprocessing alleviates the scale problem.
 

#Answer:-
Whether it alleviates the scale problem
Yes. After standardization, every feature has roughly zero mean and unit variance, so all features have comparable magnitudes. Weight updates then treat each feature more equally, which improves convergence.


    (1) Assuming that the weight vector of the perceptron is and the bias is 0, explain the scale problem with the training set. 
    
    (2) Write the training set that results after the application of the preprocessing formula (5.9) below.
    
    (3) Explain whether, in your observation, this data preprocessing alleviates the scale problem.

    (1) Scale problem
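Blood pressure, height, and weight have different numerical ranges. With the bias set to 0, the perceptron output depends only on the weighted sum of the inputs, so features with larger values dominate that sum (and the weight updates), while features with smaller values have little influence. This unequal influence is called the scale problem.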

    (2) Training set after preprocessing
Applying preprocessing formula (5.9) rescales each feature, so blood pressure, height, and weight are transformed into comparable values (e.g., between 0 and 1, or with mean 0).

    (3) Observation
Yes. Preprocessing alleviates the scale problem by bringing all features to the same scale, so each feature contributes comparably to the weighted sum, which improves perceptron learning.
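
The effect of rescaling can be checked numerically. The sketch below assumes that the preprocessing formula is z-score standardization (subtract the feature mean, divide by the feature standard deviation); the formula itself is not reproduced here, so this is an assumption. The training values are also hypothetical and are not the data from the question.

```python
import numpy as np

# Hypothetical training set (made-up values, not the data from the question).
# Columns: blood pressure (mmHg), height (cm), weight (kg)
X = np.array([
    [120.0, 170.0, 65.0],
    [135.0, 160.0, 80.0],
    [110.0, 182.0, 72.0],
    [142.0, 175.0, 90.0],
])

# Assumed preprocessing: z-score standardization applied per feature (column)
feature_mean = X.mean(axis=0)
feature_std = X.std(axis=0)
X_scaled = (X - feature_mean) / feature_std

print("raw feature ranges:   ", X.max(axis=0) - X.min(axis=0))
print("scaled feature means: ", X_scaled.mean(axis=0).round(6))
print("scaled feature stds:  ", X_scaled.std(axis=0).round(6))
```

After scaling, every column has mean close to 0 and standard deviation 1, so no single feature dominates the weighted sum purely because of its units.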

# 3. In deep learning, convolution layers are repeated many times in a neural network, which can cause some nodes to be omitted to a great degree. What technique can you use to prevent this problem?

#Answer:- 
Problem: Repeating convolution layers in a deep neural network can leave some nodes inactive or effectively unused, a symptom related to vanishing or exploding activations.
Solution: Use Dropout, a technique that randomly "drops" nodes during training, which:
1. Prevents overfitting
2. Discourages the network from relying on only a few nodes, so more nodes learn useful features
3. Improves generalization

Other techniques such as Batch Normalization also help stabilize training.
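
As an illustration of how Dropout behaves, here is a minimal NumPy sketch of "inverted" dropout, the variant in which surviving activations are rescaled during training so that nothing needs to change at inference time. The function name `inverted_dropout` and the activation values are hypothetical and only serve the example.

```python
import numpy as np

def inverted_dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not training or rate == 0.0:
        # At inference time dropout is a no-op
        return activations
    keep_prob = 1.0 - rate
    # Boolean mask: True where the node is kept for this training step
    mask = rng.random(activations.shape) < keep_prob
    # Rescale so the expected value of each activation is unchanged
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((4, 5))  # hypothetical layer output

dropped = inverted_dropout(activations, rate=0.5, rng=rng)
print(dropped)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

A different random mask is drawn on every training step, which is why the network cannot depend on any single node.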

# 4. Weight initialization should generate random numbers in the range [-1,1]. Please provide the Python code that does this function.

In [1]:
import numpy as np

# Example: initialize weight vector of size 5
weights = np.random.uniform(-1, 1, size=5)
print(weights)


[ 0.73732152 -0.17998871 -0.48819784 -0.84456052  0.44468135]
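
Note that `np.random.uniform` samples from the half-open interval [-1, 1), so the value 1.0 itself is never returned; for weight initialization this difference is negligible. As a variation, the sketch below uses NumPy's `Generator` API with a fixed seed so the initial weights are reproducible. The seed value and the 3x4 layer shape are arbitrary assumptions for the example.

```python
import numpy as np

# A seeded generator makes the initialization reproducible (seed chosen arbitrarily)
rng = np.random.default_rng(seed=42)

# Hypothetical layer with 3 inputs and 4 outputs
weights = rng.uniform(low=-1.0, high=1.0, size=(3, 4))

print(weights.shape)
print(weights.min() >= -1.0, weights.max() < 1.0)
```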


# 5. You want to train a classifier when you have a large amount of unlabeled training data but only a few thousand labeled samples. Describe how an autoencoder can help and how it works.

Scenario: Many unlabeled data + few labeled data.

How the autoencoder helps:
1. Train an autoencoder on all unlabeled data to learn a compressed representation (features) of the input.
2. Use the encoder part to extract meaningful features.
3. Train a classifier using the few labeled data on these learned features.

Benefits:
1. Leverages unsupervised learning to capture patterns in unlabeled data
2. Reduces dependency on large labeled datasets
3. Improves classifier performance with limited labeled data
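
The three steps can be sketched end to end on toy data. The example below is a simplified illustration under stated assumptions, not a recommended architecture: it uses a linear autoencoder trained with plain gradient descent in NumPy, a nearest-centroid classifier on the encoded features, and synthetic data produced by a hypothetical `make_data` helper. A real solution would typically use a nonlinear autoencoder built with a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_latent = 8, 2

# Hypothetical data generator: two classes that differ in a 2-D latent space,
# linearly mixed into 8 observed features with a little noise.
mixing = rng.normal(size=(n_latent, n_features)) / np.sqrt(n_features)

def make_data(n_samples):
    labels = rng.integers(0, 2, size=n_samples)
    latent = rng.normal(size=(n_samples, n_latent)) + 3.0 * labels[:, None]
    features = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))
    return features, labels

X_unlabeled, _ = make_data(500)       # labels discarded: treated as unlabeled
X_labeled, y_labeled = make_data(20)  # the few labeled samples
X_test, y_test = make_data(200)

def reconstruction_loss(X, W_enc, W_dec):
    return 0.5 * np.mean(np.sum((X @ W_enc @ W_dec - X) ** 2, axis=1))

# Step 1: train a linear autoencoder on the unlabeled data only
W_enc = 0.1 * rng.normal(size=(n_features, n_latent))
W_dec = 0.1 * rng.normal(size=(n_latent, n_features))
initial_loss = reconstruction_loss(X_unlabeled, W_enc, W_dec)
learning_rate = 0.01
for _ in range(2000):
    code = X_unlabeled @ W_enc
    error = code @ W_dec - X_unlabeled
    grad_dec = code.T @ error / len(X_unlabeled)
    grad_enc = X_unlabeled.T @ (error @ W_dec.T) / len(X_unlabeled)
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc
final_loss = reconstruction_loss(X_unlabeled, W_enc, W_dec)

# Step 2: keep only the encoder and use it as a feature extractor
def encode(X):
    return X @ W_enc

# Step 3: fit a simple classifier (nearest centroid) on the encoded labeled samples
codes_labeled = encode(X_labeled)
centroids = np.stack([codes_labeled[y_labeled == c].mean(axis=0) for c in (0, 1)])

def predict(X):
    distances = np.linalg.norm(encode(X)[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

accuracy = np.mean(predict(X_test) == y_test)
print(f"reconstruction loss: {initial_loss:.3f} -> {final_loss:.3f}")
print(f"test accuracy with {len(X_labeled)} labels: {accuracy:.2f}")
```

The encoder never sees a label during step 1; the labeled samples are needed only to place the class centroids in the learned feature space.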

# 6. Train a classification model on the 'Fashion-MNIST' dataset with reference to the following code.

In [None]:
import tensorflow as tf
import numpy as np
from tensorflow.keras.datasets.fashion_mnist import load_data
from tensorflow.keras.utils import to_categorical

# Load data
(X_train, y_train), (X_test, y_test) = load_data()

# Data preprocessing
X_train = X_train.reshape((-1, 28, 28, 1)) / 255.0
X_test = X_test.reshape((-1, 28, 28, 1)) / 255.0

# Convert labels to one-hot encoding
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Training parameters
batch_size = 8
n_epochs = 20
learn_rate = 0.0001

# CNN Model
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    
    # Convolutional layer 1
    tf.keras.layers.Conv2D(32, kernel_size=(3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2,2)),
    
    # Convolutional layer 2
    tf.keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2,2)),
    
    # Flatten and fully connected layers
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax')  # 10 classes
])

# Compile the model
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learn_rate),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

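# Train the model
# Note: the test set is reused as validation data here, so the validation
# metrics are not an independent hold-out estimate.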
history = model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    batch_size=batch_size,
    epochs=n_epochs,
    verbose=1
)

# Evaluate the model
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Accuracy: {test_acc:.4f}")
