In [None]:
from pykubegrader.tokens.validate_token import validate_token

validate_token("type the key provided by your instructor here", assignment="week9-quiz")

In [None]:
validate_token(assignment="week9-quiz")

# Run all cells in order using Shift + Enter; skipping cells or running them out of order may cause errors
from pykubegrader.initialize import initialize_assignment

responses = initialize_assignment(
    "ML_quiz_q", "week_9", "quiz", assignment_points=10.0, assignment_tag="week9-quiz"
)

# Initialize Otter
import otter

grader = otter.Notebook("ML_quiz_q.ipynb")

# ❓✍️ Training MNIST Handwritten Digits 🖋️

The MNIST dataset is a classic machine learning benchmark consisting of grayscale images of handwritten digits (0-9). This problem focuses on building a **classification model** using a **linear classifier trained with stochastic gradient descent (`SGDClassifier`)** to recognize handwritten digits from the dataset. 🤖

## Key Steps in the Problem: 🗝️

1. **Load the MNIST Dataset** 📥  
   - The dataset is retrieved from OpenML and consists of **784 features (28×28 pixel images)** and **70,000 samples**. 📊
   - The labels represent digit classes from **0 to 9**. 🔢

2. **Data Preprocessing** 🔄  
   - Pixel values are normalized to **[0,1]** for better numerical stability. 📏
   - The dataset is split into a **training set (80%)** and a **test set (20%)**. 📚

3. **Model Selection: Linear Classifier (`SGDClassifier`)** 🤔  
   - A **linear classifier trained with stochastic gradient descent (`SGDClassifier`)** is used. 🏃‍♂️  
   - The loss function is set to `"log_loss"`, which makes the model a logistic regression; `SGDClassifier` fits a linear Support Vector Machine (SVM) only with its default `"hinge"` loss. ⚙️

4. **Training the Model** 🏋️‍♂️  
   - The classifier is fitted to the preprocessed training set. 📈

5. **Model Evaluation** 📊  
   - Predictions are made on the test set. 🔍
   - Accuracy, classification report, and a confusion matrix are used to assess performance. 📑

6. **Visualization** 🖼️  
   - A confusion matrix is plotted to analyze model errors in classifying digits. 📉

This question tests the understanding of **machine learning pipelines**, **data preprocessing**, and **model evaluation** while applying a linear classifier to handwritten digit recognition. Students should focus on **understanding how data is transformed and how performance metrics are interpreted**. 🎓
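
To make the interpretation of these metrics concrete, here is a small hand-checkable sketch with made-up labels (the `y_true_toy` and `y_pred_toy` values are invented for illustration). In scikit-learn's `confusion_matrix`, rows correspond to true classes and columns to predicted classes, so off-diagonal entries count misclassifications.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Invented labels for three classes (0, 1, 2); one sample is misclassified.
y_true_toy = [0, 1, 2, 2, 1]
y_pred_toy = [0, 2, 2, 2, 1]

cm_toy = confusion_matrix(y_true_toy, y_pred_toy)
print(cm_toy)
# [[1 0 0]
#  [0 1 1]
#  [0 0 2]]

# Accuracy is the diagonal sum (correct predictions) divided by all samples.
acc_toy = np.trace(cm_toy) / cm_toy.sum()
print(acc_toy)  # 0.8
print(accuracy_score(y_true_toy, y_pred_toy))  # 0.8
```

The single off-diagonal entry in row 1, column 2 says that one true "1" was predicted as "2".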

In [None]:
import numpy as np

# import necessary libraries
# import matplotlib.pyplot as plt
# from sklearn import datasets
# from sklearn.model_selection import train_test_split
# from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# from sklearn.linear_model import SGDClassifier
...

# Set random seed for reproducibility
np.random.seed(42)

# Step 1: Load the MNIST dataset
# load the MNIST dataset, to the variable mnist using the datasets.fetch_openml function
# get version 1 of the MNIST dataset, by setting the option argument version=1
# set the as_frame argument to False, by setting the option as_frame=False
...

# Unpack the data by assigning mnist.data to X and mnist.target to y
# The labels are loaded as strings, so convert them to np.uint8 by calling the astype method of mnist.target with dtype=np.uint8
...

# Step 2: Preprocess the data
# Normalize pixel values to range [0,1], by dividing the X object by 255.0
...

# Split the data into training and testing sets (X_train, X_test, y_train, y_test) using the train_test_split function, with test_size set to 0.2 and random_state set to 42
...

# Step 3: Select a model (SGDClassifier, a linear classifier trained with stochastic gradient descent)
# Instantiate the SGDClassifier model with loss set to 'log_loss', max_iter to 2000, tol to 1e-5, and random_state to 42
...

# Step 4: Train the model
# Fit the model to the training data, by using the fit method of the model object, and passing in the X_train and y_train data
...

# Step 5: Evaluate the model
# Predict labels for the test data by calling the predict method of the model object with X_test, and store the result in y_pred
...

# Calculate the accuracy of the model by calling the accuracy_score function with y_test and y_pred, and store the result in accuracy
...

# Step 6: Display results
# Optional: print the accuracy of the model and the classification report
# Print the accuracy of the model
print(f"Model Accuracy: {accuracy:.4f}")
# Print the classification report
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# Create a plot of the confusion matrix
# Create a figure and an axis object (ax) using the plt.subplots function
# Optionally set the figsize parameter to (8, 6)
...

# plot the confusion matrix
# call the confusion_matrix function, and pass in the y_test and y_pred data
# on the ax object, call the imshow method, and pass in the confusion matrix, and set the cmap to 'Blues', and the interpolation to 'nearest'
...

# set the title of the plot to "Confusion Matrix" using the plt object
...

In [None]:
grader.check("MNIST-Handwritten-Digits")

## Submitting Assignment

Please run the following block of code using `shift + enter` to submit your assignment. You should then see your score.

In [None]:
from pykubegrader.tokens.validate_token import validate_token

validate_token(assignment="week9-quiz")


from pykubegrader.submit.submit_assignment import submit_assignment

submit_assignment("week9-quiz", "ML_quiz_q")