# Unit 1 Understanding the Problem of Drawing Recognition

Welcome to the first step in your journey to understanding drawing recognition using Convolutional Neural Networks (CNNs). In this lesson, we will explore the problem of drawing recognition, why it is important, and how CNNs can help solve it. This will set the stage for building and training your own CNN models in the upcoming lessons.

### Understanding the Final Goal

Before proceeding, it's crucial to understand the final goal of this course path.

By the end of this course path, you will have the skills and knowledge to create an AI model capable of recognizing user-drawn sketches from a user interface. This involves building and training a Convolutional Neural Network (CNN) that can accurately identify and classify hand-drawn images. You will learn how to preprocess data, design and implement CNN architectures, and evaluate model performance. This course will empower you to develop applications that can interpret and understand drawings, paving the way for innovative solutions in fields like digital art, education, and interactive design.

In this course, we focus on a simpler starting point: a small CNN that recognizes handwritten digits. The next section looks at the problem of drawing recognition in more detail.

### Understanding the Problem

Drawing recognition is a fascinating and challenging problem in the field of computer vision. It involves teaching machines to understand and interpret hand-drawn sketches, which can vary significantly in style, size, and quality. This task is not only about recognizing shapes but also about understanding the context and meaning behind the drawings.

This is a complex problem because human drawings can be highly variable. For example, the same digit can be drawn in many different ways, and the quality of the drawing can vary from person to person. Additionally, drawings may contain noise or artifacts that can make recognition difficult.

To tackle this problem, we will use Convolutional Neural Networks (CNNs), which are a type of deep learning model specifically designed for image processing tasks. CNNs are capable of automatically learning and extracting features from images, making them well-suited for recognizing patterns in hand-drawn sketches.

CNNs work by applying a series of convolutional filters to the input images, allowing them to learn hierarchical representations of the data. This means that the model can learn to recognize simple shapes in the early layers and more complex patterns in the deeper layers. By training on a large dataset of labeled images, CNNs can achieve high accuracy in recognizing and classifying hand-drawn sketches.
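To make the idea of a convolutional filter more concrete, here is a minimal NumPy sketch of a single filter sliding over a tiny image. The helper `convolve2d_valid`, the toy image, and the hand-picked kernel values are all illustrative assumptions; a real CNN learns its kernel values during training and uses an optimized library implementation.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the
    elementwise products at each position. Deep learning libraries call
    this operation "convolution"; strictly speaking it is cross-correlation."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w), dtype=np.float32)
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row:row + kh, col:col + kw]
            output[row, col] = np.sum(patch * kernel)
    return output

# Toy 5x5 "drawing": dark (0) on the left, bright (1) on the right.
toy_image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=np.float32)

# Hand-picked filter that responds to dark-to-bright vertical edges.
vertical_edge_kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=np.float32)

feature_map = convolve2d_valid(toy_image, vertical_edge_kernel)
print(feature_map)
```

The resulting feature map is large where the filter overlaps the edge and zero over the uniform bright region, which is the sense in which an early layer "detects" a simple shape.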

### Why CNNs?

Convolutional Neural Networks (CNNs) are a powerful tool for image recognition tasks, including drawing recognition. They are designed to automatically learn and extract features from images, making them particularly effective for recognizing patterns in visual data. CNNs consist of multiple layers, including convolutional layers, pooling layers, and fully connected layers, which work together to process and analyze images.
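As a rough illustration of what a pooling layer does, the sketch below applies 2x2 max pooling to a small feature map using NumPy. The function name `max_pool_2x2` and the input values are hypothetical, and the sketch assumes the height and width are even; framework pooling layers handle the general case.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block.
    Assumes the height and width of `feature_map` are both even."""
    height, width = feature_map.shape
    # Split each spatial axis into (number of blocks, position within block).
    blocks = feature_map.reshape(height // 2, 2, width // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
], dtype=np.float32)

pooled = max_pool_2x2(feature_map)
print(pooled)
```

Pooling halves each spatial dimension while keeping the strongest responses, so later layers work on a smaller, more summarized representation.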

### What You'll Learn

Drawing recognition is the process of identifying and classifying hand-drawn sketches. This is a crucial task in various applications, such as digitizing handwritten notes, enabling sketch-based search engines, and even in the development of AI-driven art tools. In this lesson, we will delve into the basics of this problem and introduce you to the MNIST dataset, a popular dataset for training image processing systems.

### Loading and Splitting the Dataset

The MNIST dataset consists of 70,000 grayscale images of handwritten digits (60,000 for training and 10,000 for testing), each labeled with the corresponding number. Here’s how you can load and preprocess a smaller sample of this dataset using Python:

```python
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load and preprocess a smaller sample of the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

TRAIN_SIZE = 3000
TEST_SIZE = 1000

train_images = train_images[:TRAIN_SIZE].reshape((TRAIN_SIZE, 28, 28, 1)).astype('float32') / 255
test_images = test_images[:TEST_SIZE].reshape((TEST_SIZE, 28, 28, 1)).astype('float32') / 255

# Convert integer labels to one-hot encoded vectors
train_labels = to_categorical(train_labels[:TRAIN_SIZE])
test_labels = to_categorical(test_labels[:TEST_SIZE])
```

This code snippet loads the MNIST dataset, selects a subset for training and testing, reshapes the images, normalizes the pixel values to the range 0 to 1 (a common preprocessing step in image recognition tasks), and one-hot encodes the labels.

Each MNIST image is originally a 28x28 grid of pixels. Reshaping the data to `(28, 28, 1)` gives every image an explicit channel dimension, where the `1` indicates a single grayscale channel. Keras convolutional layers expect image input in the form (height, width, channels), so this step allows the CNN to process the images correctly.

Normalization, which scales the pixel values to be between 0 and 1, is also important. Neural networks train more efficiently and converge faster when input values are within a small, consistent range. Normalization helps prevent issues with large gradients and makes the training process more stable and effective.
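The reshaping and normalization steps can be checked in isolation. The sketch below uses a small random array as a hypothetical stand-in for raw MNIST images (the name `raw_batch` and the batch size of 4 are assumptions), so it runs without downloading the dataset.

```python
import numpy as np

# Hypothetical stand-in for raw MNIST images: 4 images, 28x28, integer pixels 0-255.
rng = np.random.default_rng(seed=0)
raw_batch = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
raw_batch[0, 0, 0] = 0      # make sure both extremes are present
raw_batch[0, 0, 1] = 255

# Step 1: add a trailing channel axis (1 = single grayscale channel).
with_channel = raw_batch.reshape((4, 28, 28, 1))

# Step 2: convert to float32 and scale the values into the range [0, 1].
scaled = with_channel.astype('float32') / 255

print("Before:", raw_batch.shape, raw_batch.dtype, raw_batch.min(), raw_batch.max())
print("After: ", scaled.shape, scaled.dtype, scaled.min(), scaled.max())
```

Reshaping only changes how the array is indexed, not the pixel data itself, while the division changes the values but not the shape.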

The `to_categorical` function is used to convert integer class labels into one-hot encoded vectors. In the context of the MNIST dataset, each label is originally an integer from 0 to 9, representing the digit in the image. One-hot encoding transforms these integer labels into binary vectors, where each vector has the same length as the number of classes (in this case, 10). In a one-hot encoded vector, all elements are 0 except for the index corresponding to the class label, which is set to 1. For example, the label `3` becomes `[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]`. This format is commonly used in machine learning because it allows the model to output probabilities for each class and makes it easier to compute loss and accuracy during training.
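To see what one-hot encoding does without relying on Keras, here is a minimal NumPy sketch that produces the same kind of vectors described above. The sample labels and the `NUM_CLASSES` constant are illustrative assumptions, and this is not how `to_categorical` is implemented internally.

```python
import numpy as np

NUM_CLASSES = 10                 # digits 0 through 9
labels = np.array([3, 0, 9])     # hypothetical integer labels

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix with the labels encodes them all at once.
one_hot = np.eye(NUM_CLASSES, dtype=np.float32)[labels]
print(one_hot)

# argmax along each row recovers the original integer labels.
recovered = one_hot.argmax(axis=1)
print(recovered)
```

The `argmax` step is the same trick the exploration code in this lesson uses to turn one-hot labels back into digits.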

### Exploring the Dataset

After loading the data, it’s important to explore its structure and contents. You can print out the shapes of the data arrays, check the unique labels, and examine the range of pixel values:

```python
# Explore the data and its structure
print("Training data shape:", train_images.shape)
print("Training labels shape:", train_labels.shape)
print("Test data shape:", test_images.shape)
print("Test labels shape:", test_labels.shape)

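# Display the unique labels in the training and test datasets
# (.tolist() converts NumPy integers to plain Python ints so the set prints cleanly)
print("Unique labels in training data:", set(train_labels.argmax(axis=1).tolist()))
print("Unique labels in test data:", set(test_labels.argmax(axis=1).tolist()))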

# Display the range of pixel values in the training and test datasets
print("Range of pixel values in training data:", train_images.min(), "to", train_images.max())
print("Range of pixel values in test data:", test_images.min(), "to", test_images.max())
```

Example Output:

```
Training data shape: (3000, 28, 28, 1)
Training labels shape: (3000, 10)
Test data shape: (1000, 28, 28, 1)
Test labels shape: (1000, 10)
Unique labels in training data: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Unique labels in test data: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Range of pixel values in training data: 0.0 to 1.0
Range of pixel values in test data: 0.0 to 1.0
```

This output provides insights into the structure of the dataset, including the number of samples, the shape of the images, the unique labels present, and the normalization of pixel values.

### Visualizing the Dataset

Visualizing some of the images helps you understand what the data looks like and what your model will be learning from. Here’s a simple snippet to display a few sample images from the training set:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for i, ax in enumerate(axes):
    ax.imshow(train_images[i].reshape(28, 28), cmap='gray')
    ax.axis('off')
    ax.set_title(f"Label: {train_labels[i].argmax()}")
plt.savefig('static/images/plot.png')
plt.close()
```

The saved figure (`static/images/plot.png`) shows a row of five training images, each titled with its label, giving you a visual representation of the dataset.

### Why It Matters

Understanding the problem of drawing recognition is essential because it forms the foundation for many real-world applications. By learning how to recognize and classify drawings, you can contribute to advancements in technology that make our lives easier and more efficient. CNNs are particularly well-suited for this task due to their ability to automatically learn and extract features from images, making them a powerful tool in the field of computer vision.

Are you ready to dive deeper into the world of drawing recognition? Let's move on to the practice section and start exploring the MNIST dataset in more detail.

## Exploring MNIST Dataset Structure

Now that you understand the importance of drawing recognition and the MNIST dataset, let's get hands-on with the data. In this practice, you'll analyze the MNIST dataset structure to gain deeper insights into what you're working with.

After loading and preprocessing the dataset, you need to complete the code to examine its dimensions, label distribution, and pixel value ranges. This exploration is a critical first step before building any machine learning model.

Understanding these characteristics will help you confirm that the data has been properly loaded and preprocessed and will give you valuable context as you begin developing CNN models for digit recognition in the upcoming lessons.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load and preprocess a smaller sample of the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
TRAIN_SIZE = 3000
TEST_SIZE = 1000
train_images = train_images[:TRAIN_SIZE].reshape((TRAIN_SIZE, 28, 28, 1)).astype('float32') / 255
test_images = test_images[:TEST_SIZE].reshape((TEST_SIZE, 28, 28, 1)).astype('float32') / 255
train_labels = to_categorical(train_labels[:TRAIN_SIZE])
test_labels = to_categorical(test_labels[:TEST_SIZE])

# TODO: Display the shape of training and test data
print("Training data shape:", ________)
print("Training labels shape:", ________)
print("Test data shape:", ________)
print("Test labels shape:", ________)

# TODO: Display the unique labels in the training and test datasets
print("Unique labels in training data:", ________)
print("Unique labels in test data:", ________)

# TODO: Display the range of pixel values in the training and test datasets
print("Range of pixel values in training data:", ________, "to", ________)
print("Range of pixel values in test data:", ________, "to", ________)

```

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load and preprocess a smaller sample of the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
TRAIN_SIZE = 3000
TEST_SIZE = 1000
train_images = train_images[:TRAIN_SIZE].reshape((TRAIN_SIZE, 28, 28, 1)).astype('float32') / 255
test_images = test_images[:TEST_SIZE].reshape((TEST_SIZE, 28, 28, 1)).astype('float32') / 255
train_labels = to_categorical(train_labels[:TRAIN_SIZE])
test_labels = to_categorical(test_labels[:TEST_SIZE])

# Display the shape of training and test data
print("Training data shape:", train_images.shape)
print("Training labels shape:", train_labels.shape)
print("Test data shape:", test_images.shape)
print("Test labels shape:", test_labels.shape)

# Display the unique labels in the training and test datasets
# (.tolist() converts NumPy integers to plain Python ints so the set prints cleanly)
print("Unique labels in training data:", set(train_labels.argmax(axis=1).tolist()))
print("Unique labels in test data:", set(test_labels.argmax(axis=1).tolist()))

# Display the range of pixel values in the training and test datasets
print("Range of pixel values in training data:", train_images.min(), "to", train_images.max())
print("Range of pixel values in test data:", test_images.min(), "to", test_images.max())

```

## Visualizing MNIST Digits with Matplotlib

Great job exploring the MNIST dataset structure! Now, let's take your understanding to the next level by visualizing the actual digit images.

Visualization is a crucial step in working with image datasets, as it allows you to confirm what the data actually contains and verify that labels match the content. This step helps catch potential issues with the data and gives you better intuition about what your model will be learning.

In this practice, you'll add code to display sample images from the training dataset alongside their corresponding labels. You'll see firsthand how the handwritten digits appear after preprocessing and confirm that each displayed label correctly matches the drawn digit.


```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

# Load and preprocess a smaller sample of the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
TRAIN_SIZE = 3000
TEST_SIZE = 1000
train_images = train_images[:TRAIN_SIZE].reshape((TRAIN_SIZE, 28, 28, 1)).astype('float32') / 255
test_images = test_images[:TEST_SIZE].reshape((TEST_SIZE, 28, 28, 1)).astype('float32') / 255
train_labels = to_categorical(train_labels[:TRAIN_SIZE])
test_labels = to_categorical(test_labels[:TEST_SIZE])

# Explore the data and its structure
print("Training data shape:", train_images.shape)
print("Training labels shape:", train_labels.shape)
print("Test data shape:", test_images.shape)
print("Test labels shape:", test_labels.shape)

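# Display the unique labels in the training and test datasets
# (.tolist() converts NumPy integers to plain Python ints so the set prints cleanly)
print("Unique labels in training data:", set(train_labels.argmax(axis=1).tolist()))
print("Unique labels in test data:", set(test_labels.argmax(axis=1).tolist()))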

# Display the range of pixel values in the training and test datasets
print("Range of pixel values in training data:", train_images.min(), "to", train_images.max())
print("Range of pixel values in test data:", test_images.min(), "to", test_images.max())

# TODO: Plot a few sample images from the training dataset
fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for i, ax in enumerate(axes):
    # TODO: Display the image (remember to reshape it to 28x28 and use grayscale colormap)
    ax.imshow(________, cmap=________)
    # TODO: Turn off the axis
    ________
    # TODO: Set the title to show the corresponding label
    ax.set_title(f"Label: {________}")

# Save the plot to a file
plt.savefig('static/images/plot.png')
plt.close()

```

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

# Load and preprocess a smaller sample of the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
TRAIN_SIZE = 3000
TEST_SIZE = 1000
train_images = train_images[:TRAIN_SIZE].reshape((TRAIN_SIZE, 28, 28, 1)).astype('float32') / 255
test_images = test_images[:TEST_SIZE].reshape((TEST_SIZE, 28, 28, 1)).astype('float32') / 255
train_labels = to_categorical(train_labels[:TRAIN_SIZE])
test_labels = to_categorical(test_labels[:TEST_SIZE])

# Explore the data and its structure
print("Training data shape:", train_images.shape)
print("Training labels shape:", train_labels.shape)
print("Test data shape:", test_images.shape)
print("Test labels shape:", test_labels.shape)

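# Display the unique labels in the training and test datasets
# (.tolist() converts NumPy integers to plain Python ints so the set prints cleanly)
print("Unique labels in training data:", set(train_labels.argmax(axis=1).tolist()))
print("Unique labels in test data:", set(test_labels.argmax(axis=1).tolist()))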

# Display the range of pixel values in the training and test datasets
print("Range of pixel values in training data:", train_images.min(), "to", train_images.max())
print("Range of pixel values in test data:", test_images.min(), "to", test_images.max())

# Plot a few sample images from the training dataset
fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for i, ax in enumerate(axes):
    # Display the image (reshaped to 28x28, with a grayscale colormap)
    ax.imshow(train_images[i].reshape(28, 28), cmap='gray')
    # Turn off the axis
    ax.axis('off')
    # Set the title to show the corresponding label
    ax.set_title(f"Label: {train_labels[i].argmax()}")

# Save the plot to a file
plt.savefig('static/images/plot.png')
plt.close()
```

## Analyzing Digit Distribution in MNIST

Now that you've visualized individual digits, let's analyze the class distribution in our dataset. This is a critical step before building any machine learning model.

In this practice, you'll examine how frequently each digit (0 to 9) appears in the training dataset. A balanced dataset, with roughly equal numbers of each digit, helps the model learn every class about equally well, while imbalanced data can bias the model toward the more frequent classes and might require special handling techniques.

You'll count the occurrences of each digit, calculate their percentages, and create a visualization that clearly shows the distribution. This analysis will help you determine if any digits are underrepresented or overrepresented before you start building your CNN model.


```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt
import numpy as np

# Load and preprocess a smaller sample of the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
TRAIN_SIZE = 3000
TEST_SIZE = 1000
train_images = train_images[:TRAIN_SIZE].reshape((TRAIN_SIZE, 28, 28, 1)).astype('float32') / 255
test_images = test_images[:TEST_SIZE].reshape((TEST_SIZE, 28, 28, 1)).astype('float32') / 255

# TODO: Store original labels before one-hot encoding for analysis
original_train_labels = ________

# One-hot encode the labels for later use
train_labels = to_categorical(train_labels[:TRAIN_SIZE])
test_labels = to_categorical(test_labels[:TEST_SIZE])

# Explore the data and its structure
print("Training data shape:", train_images.shape)
print("Training labels shape:", train_labels.shape)
print("Test data shape:", test_images.shape)
print("Test labels shape:", test_labels.shape)

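# Display the unique labels in the training and test datasets
# (.tolist() converts NumPy integers to plain Python ints so the set prints cleanly)
print("Unique labels in training data:", set(train_labels.argmax(axis=1).tolist()))
print("Unique labels in test data:", set(test_labels.argmax(axis=1).tolist()))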

# Display the range of pixel values in the training and test datasets
print("Range of pixel values in training data:", train_images.min(), "to", train_images.max())
print("Range of pixel values in test data:", test_images.min(), "to", test_images.max())

# TODO: Count the frequency of each digit in the training set
digit_counts = ________

# TODO: Print the counts for each digit
print("\nDigit frequency in training set:")
for digit, count in enumerate(digit_counts):
    print(__________)

# TODO: Visualize the distribution with a bar chart
plt.figure(figsize=(10, 6))
plt.bar(__________, __________)
plt.xlabel('Digit')
plt.ylabel('Frequency')
plt.title('Distribution of Digits in MNIST Training Set')
plt.xticks(range(10))
plt.grid(axis='y', alpha=0.3)

# TODO: Calculate and display the percentage of each digit
total_samples = sum(digit_counts)
for i, count in enumerate(digit_counts):
    percentage = __________ 
    plt.text(i, count + 5, f"{percentage:.1f}%", ha='center')

# Save the plot to a file
plt.savefig('static/images/plot.png')
plt.close()

```
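One possible way to approach the counting and percentage steps is sketched below on a small, made-up label array, so it runs without downloading MNIST or saving a plot. The array `sample_labels` is a hypothetical stand-in for the integer labels stored before one-hot encoding, and `np.bincount` is only one of several reasonable ways to count them.

```python
import numpy as np

# Hypothetical stand-in for the integer labels saved before one-hot encoding.
sample_labels = np.array([0, 1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 7, 8, 9, 9, 9])

# np.bincount counts how often each non-negative integer occurs;
# minlength=10 guarantees one slot per digit even if a digit is missing.
digit_counts = np.bincount(sample_labels, minlength=10)

total_samples = digit_counts.sum()
percentages = digit_counts / total_samples * 100

print("Digit frequency in sample:")
for digit, count in enumerate(digit_counts):
    print(f"Digit {digit}: {count} samples ({percentages[digit]:.1f}%)")
```

In the exercise itself, the same counting could be applied to the integer labels kept before one-hot encoding (for example `train_labels[:TRAIN_SIZE]`), and the bar chart could then be drawn with `plt.bar(range(10), digit_counts)`.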