<a href="https://www.kaggle.com/code/waheedhussain2/mnist-k-fold-resnet-model-acc-99-5?scriptVersionId=205377300" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Introduction

The MNIST digit recognition competition on Kaggle challenges participants to classify handwritten digits (0-9) from grayscale images. It’s a well-known beginner-friendly competition, ideal for exploring image classification techniques and experimenting with deep learning models. Participants use a dataset of labeled training images to build models that can accurately predict the labels for a set of unlabeled test images. The competition is a great entry point for those interested in computer vision, allowing them to apply neural networks and other machine learning techniques while competing against a global community of learners and experts.

# Hyperparameters and Techniques
**Convolutional Filters:** 64, 128, and 256 filters across different layers.

**Dropout Rates:** Increasing dropout rates (0.2, 0.3, 0.4, and 0.5) for better regularization.

**Batch Size:** 256, chosen to balance training speed and stability.

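**Optimizer:** Adam with an initial learning rate of 0.001; a `ReduceLROnPlateau` callback halves the rate when validation loss stalls for 3 epochs.

**Cross-Validation:** 5-fold, so the reported accuracy is averaged over five different train/validation partitions rather than depending on a single split.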
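
The training cell further down pairs the initial learning rate of 0.001 with `ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3)`. As a rough illustration of what that rule does, here is a simplified pure-Python sketch. It is not the Keras implementation: the function name `simulate_reduce_lr_on_plateau` and the validation-loss values are hypothetical, `min_delta=1e-4` is assumed to match the Keras default, and cooldown and `min_lr` are ignored.

```python
def simulate_reduce_lr_on_plateau(val_losses, initial_lr=0.001, factor=0.5,
                                  patience=3, min_delta=1e-4):
    """Return the learning rate in effect during each epoch (simplified rule)."""
    lr = initial_lr
    best = float("inf")
    wait = 0
    lr_per_epoch = []
    for loss in val_losses:
        lr_per_epoch.append(lr)          # rate used while this epoch trained
        if loss < best - min_delta:      # meaningful improvement: reset counter
            best = loss
            wait = 0
        else:                            # no improvement this epoch
            wait += 1
            if wait >= patience:         # plateau reached: shrink the rate
                lr *= factor
                wait = 0
    return lr_per_epoch


# Made-up validation losses: improvement stops after epoch 2.
demo_losses = [0.50, 0.30, 0.31, 0.32, 0.33, 0.29, 0.30]
demo_lrs = simulate_reduce_lr_on_plateau(demo_losses)
print(demo_lrs)
```

With these made-up losses the rate stays at 0.001 for five epochs and drops to 0.0005 from the sixth epoch on, because epochs 3 to 5 fail to beat the best loss of 0.30.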

In [3]:
import pandas as pd

In [1]:
train = '/kaggle/input/digit-recognizer/train.csv'

In [4]:
train_data = pd.read_csv(train)

In [6]:
train_data.shape

(42000, 785)

In [7]:
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers, callbacks
from sklearn.model_selection import train_test_split, KFold
import pandas as pd
import numpy as np


In [8]:
# Load the data
data = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
X = data.iloc[:, 1:].values.reshape(-1, 28, 28, 1) / 255.0  # Normalize pixel values
y = data.iloc[:, 0].values

# Train/Validation Split
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
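
To make the reshape and normalization step concrete, the sketch below applies the same expression to a small synthetic array instead of the real CSV. The array `fake_rows` is an invented stand-in for four rows of `train.csv` (one label column followed by 784 pixel columns); only the slicing, `reshape`, and division mirror the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for four rows of train.csv: label + 784 pixel values.
fake_rows = rng.integers(0, 256, size=(4, 785))
fake_rows[:, 0] = rng.integers(0, 10, size=4)   # labels in 0..9

# Same expressions as in the notebook cell above.
X_demo = fake_rows[:, 1:].reshape(-1, 28, 28, 1) / 255.0
y_demo = fake_rows[:, 0]

print(X_demo.shape)                 # (4, 28, 28, 1)
print(X_demo.min(), X_demo.max())   # both inside [0, 1]

# Pixels are laid out row by row: flat pixel k lands at (k // 28, k % 28).
k = 30
assert X_demo[0, k // 28, k % 28, 0] == fake_rows[0, 1 + k] / 255.0
```

The trailing dimension of size 1 is the single grayscale channel that `Conv2D` expects.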


In [9]:
# Model architecture: stacked Conv-BatchNorm blocks with increasing filter counts.
# Note: a Sequential model has no skip connections, so the blocks below are
# plain convolutional blocks rather than true ResNet-style residual blocks.
def build_model():
    model = models.Sequential()

    # Initial convolution layer (28x28 -> 26x26, then pooled to 13x13)
    model.add(layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.2))

    # Convolutional block 1: two 128-filter layers (pooled to 6x6)
    for _ in range(2):
        model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
        model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.3))

    # Convolutional block 2: two 256-filter layers (pooled to 3x3)
    for _ in range(2):
        model.add(layers.Conv2D(256, (3, 3), activation='relu', padding='same'))
        model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.4))

    # Fully connected classifier head
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation='relu'))
    model.add(layers.BatchNormalization())
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation='softmax'))

    return model
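
The blocks in `build_model` are stacked sequentially, so no block's input is added back to its output. For comparison, here is a minimal sketch of a residual block in the usual sense, written with the Keras functional API. It is not the model that produced the results in this notebook: `residual_block` and `build_residual_model` are hypothetical names, and the layer layout (1x1 projection on the shortcut, global average pooling in the head) is an assumption chosen for brevity.

```python
from tensorflow.keras import layers, models


def residual_block(x, filters):
    """Two 3x3 convolutions whose output is added to the block input."""
    shortcut = x
    # Project the shortcut with a 1x1 convolution when the channel count
    # changes, so both tensors have the same shape for the addition.
    if x.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, (1, 1), padding='same')(x)
    y = layers.Conv2D(filters, (3, 3), padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters, (3, 3), padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])      # the skip connection
    return layers.Activation('relu')(y)


def build_residual_model():
    inputs = layers.Input(shape=(28, 28, 1))
    x = layers.Conv2D(64, (3, 3), padding='same', activation='relu')(inputs)
    x = layers.BatchNormalization()(x)
    x = residual_block(x, 128)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Dropout(0.3)(x)
    x = residual_block(x, 256)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(10, activation='softmax')(x)
    return models.Model(inputs, outputs)


residual_model = build_residual_model()
print(residual_model.output_shape)
```

The functional API is needed here because `Sequential` only supports a single linear chain of layers, while the `Add` layer merges two branches.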


In [10]:

# Training function with cross-validation and learning rate scheduler
def train_and_evaluate(X_train, y_train):
    kfold = KFold(n_splits=5, shuffle=True, random_state=42)
    scores = []
    
    for fold, (train_idx, val_idx) in enumerate(kfold.split(X_train)):
        print(f"Training fold {fold+1}...")
        
        model = build_model()
        model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
        
        lr_scheduler = callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, verbose=1)
        early_stopping = callbacks.EarlyStopping(monitor='val_accuracy', patience=6, restore_best_weights=True)
        
        history = model.fit(X_train[train_idx], y_train[train_idx],
                            validation_data=(X_train[val_idx], y_train[val_idx]),
                            epochs=20,
                            batch_size=256,
                            callbacks=[lr_scheduler, early_stopping],
                            verbose=2)
        
        score = model.evaluate(X_train[val_idx], y_train[val_idx], verbose=0)
        print(f"Fold {fold+1} - Validation accuracy: {score[1]}")
        scores.append(score[1])
        
    print(f"Mean cross-validated accuracy: {np.mean(scores)}")
    # Note: only the model trained on the last fold is returned.
    return model

# Train and evaluate
final_model = train_and_evaluate(X_train, y_train)

# Evaluate the returned model on the 20% hold-out split (not used in cross-validation)
val_loss, val_accuracy = final_model.evaluate(X_val, y_val, verbose=0)
print(f"Final model validation accuracy: {val_accuracy}")


Training fold 1...




Epoch 1/20




105/105 - 20s - 189ms/step - accuracy: 0.8969 - loss: 0.3403 - val_accuracy: 0.1126 - val_loss: 10.4263 - learning_rate: 0.0010
Epoch 2/20
105/105 - 3s - 29ms/step - accuracy: 0.9736 - loss: 0.0828 - val_accuracy: 0.1126 - val_loss: 9.8513 - learning_rate: 0.0010
Epoch 3/20
105/105 - 3s - 29ms/step - accuracy: 0.9819 - loss: 0.0566 - val_accuracy: 0.3079 - val_loss: 2.4023 - learning_rate: 0.0010
Epoch 4/20
105/105 - 3s - 29ms/step - accuracy: 0.9852 - loss: 0.0458 - val_accuracy: 0.8357 - val_loss: 0.5166 - learning_rate: 0.0010
Epoch 5/20
105/105 - 3s - 29ms/step - accuracy: 0.9876 - loss: 0.0400 - val_accuracy: 0.9734 - val_loss: 0.0775 - learning_rate: 0.0010
Epoch 6/20
105/105 - 3s - 29ms/step - accuracy: 0.9890 - loss: 0.0323 - val_accuracy: 0.9875 - val_loss: 0.0380 - learning_rate: 0.0010
Epoch 7/20
105/105 - 3s - 29ms/step - accuracy: 0.9915 - loss: 0.0268 - val_accuracy: 0.9859 - val_loss: 0.0485 - learning_rate: 0.0010
Epoch 8/20
105/105 - 3s - 29ms/step - accuracy: 0.9919 -
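
The `105/105` step count in the log can be checked by hand from the numbers already used in the notebook: 42,000 training rows, a 20% hold-out split, 5 folds, and a batch size of 256. The short calculation below is only a sanity check; the variable names are illustrative, and the batch size of 32 for prediction is the Keras default for `model.predict`, which matches the `875/875` shown in the prediction output later on.

```python
import math

n_rows = 42_000        # rows in train.csv (train_data.shape[0])
n_folds = 5            # KFold(n_splits=5)
batch_size = 256       # batch_size passed to model.fit

n_holdout = n_rows // 5                  # test_size=0.2 -> 8400 rows held out
n_cv = n_rows - n_holdout                # 33600 rows enter cross-validation
n_fold_val = n_cv // n_folds             # 6720 rows validate each fold
n_fold_train = n_cv - n_fold_val         # 26880 rows train each fold
steps_per_epoch = math.ceil(n_fold_train / batch_size)

n_test = 28_000        # rows in test.csv (test_data.shape[0])
predict_batch_size = 32                  # Keras default for model.predict
predict_steps = math.ceil(n_test / predict_batch_size)

print(steps_per_epoch, predict_steps)
```

Each fold therefore trains on 26,880 images, which is 64% of the original file.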

In [11]:
# Evaluate the returned model on the 20% hold-out split (not used in cross-validation)
val_loss, val_accuracy = final_model.evaluate(X_val, y_val, verbose=0)
print(f"Final model validation accuracy: {val_accuracy}")


Final model validation accuracy: 0.995119035243988
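
Because `train_and_evaluate` returns only the model from the last fold, the accuracy above describes that single model. One common alternative, which this notebook does not use, is to keep all five fold models and average their predicted class probabilities. The sketch below shows just the averaging step with NumPy; `average_fold_predictions` is a hypothetical helper, and the probability arrays are toy values rather than real model output.

```python
import numpy as np


def average_fold_predictions(fold_probabilities):
    """Average per-fold class probabilities, then pick the most likely class."""
    # Shape after stacking: (n_folds, n_samples, n_classes)
    stacked = np.stack(fold_probabilities, axis=0)
    mean_probabilities = stacked.mean(axis=0)
    return np.argmax(mean_probabilities, axis=1)


# Toy example: 2 folds, 2 samples, 3 classes.
fold_a = np.array([[0.6, 0.3, 0.1],
                   [0.2, 0.5, 0.3]])
fold_b = np.array([[0.2, 0.7, 0.1],
                   [0.1, 0.3, 0.6]])

ensemble_labels = average_fold_predictions([fold_a, fold_b])
print(ensemble_labels)
```

In the toy data the two folds disagree on both samples, and the averaged probabilities settle on classes 1 and 2.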


In [12]:
test_data = pd.read_csv('/kaggle/input/digit-recognizer/test.csv')


In [20]:
test_data.shape

(28000, 784)

In [13]:
sample_sub = pd.read_csv('/kaggle/input/digit-recognizer/sample_submission.csv')


In [14]:
sample_sub.head()

|   | ImageId | Label |
|---|---------|-------|
| 0 | 1 | 0 |
| 1 | 2 | 0 |
| 2 | 3 | 0 |
| 3 | 4 | 0 |
| 4 | 5 | 0 |


In [21]:
# Load the test data and preprocess it
X_test = test_data.values.reshape(-1, 28, 28, 1) / 255.0  # Normalize and reshape


In [22]:
# Predict using the trained model
predictions = final_model.predict(X_test)
predicted_labels = np.argmax(predictions, axis=1)

[1m875/875[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2ms/step


In [23]:
# Prepare the output DataFrame in the required format
output = pd.DataFrame({
    'ImageId': np.arange(1, len(predicted_labels) + 1),
    'Label': predicted_labels
})

In [24]:
# Save to CSV
output.to_csv('submission.csv', index=False)

# Display first few rows of the submission file
output.head()


|   | ImageId | Label |
|---|---------|-------|
| 0 | 1 | 2 |
| 1 | 2 | 0 |
| 2 | 3 | 9 |
| 3 | 4 | 9 |
| 4 | 5 | 3 |
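
Before uploading, it can help to confirm that the file has the same layout as `sample_submission.csv`: an `ImageId` column counting up from 1 and a `Label` column with digits 0 to 9. The check below is a small sketch of that idea; `check_submission` is a hypothetical helper, and the five-row frame simply reuses the values displayed above. For the real file the expected row count would be 28,000, the number of rows in `test.csv`.

```python
import numpy as np
import pandas as pd


def check_submission(df, n_expected):
    """Return True if df has the ImageId/Label layout with n_expected rows."""
    return bool(
        list(df.columns) == ['ImageId', 'Label']
        and len(df) == n_expected
        and df['ImageId'].tolist() == list(range(1, n_expected + 1))
        and df['Label'].between(0, 9).all()
    )


# Five-row stand-in built the same way as `output` in the notebook.
demo_labels = np.array([2, 0, 9, 9, 3])
demo_output = pd.DataFrame({
    'ImageId': np.arange(1, len(demo_labels) + 1),
    'Label': demo_labels,
})

print(check_submission(demo_output, n_expected=5))
```

On the real data the call would be `check_submission(output, n_expected=28000)`.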
