<!DOCTYPE HTML>
<!--
-->
<html>
<head>
<title>Xiao Tianyou Theo's Portfolio</title>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no" />
<link rel="stylesheet" href="assets/css/main.css" />
<noscript><link rel="stylesheet" href="assets/css/noscript.css" /></noscript>
</head>
<body class="is-preload">
<!-- Wrapper -->
<div id="wrapper">
<!-- Header -->
<header id="header">
<a href="index.html" class="logo">Theo</a>
</header>
<!-- Nav -->
<nav id="nav">
<ul class="links">
<li><a href="index.html">Projects</a></li>
<li class="active"><a href="handwriting_recognition.html">Handwriting</a></li>
<li><a href="binary_search.html">Binary_Search</a></li>
<li><a href="investment.html">Investment</a></li>
<li><a href="mnist.html">MNIST</a></li>
<li><a href="passion_computing.html">Passion_Programming</a></li>
<li><a href="myself.html">Myself</a></li>
</ul>
<ul class="icons">
<li><a href="https://github.com/theo-xiao-sg" class="icon brands fa-github" target="_blank"><span class="label">GitHub</span></a></li>
</ul>
</nav>
<!-- Main -->
<div id="main">
<!-- Post -->
<section class="post">
<header class="major">
<h1>MOUSE-WRITTEN NUMBER RECOGNITION USING KNN, NEURAL NETWORK, CONVOLUTIONAL NEURAL NETWORK</h1>
<a href="https://github.com/theo-xiao-sg/handwriting_recognition" target="_blank" class="button">Source code</a>
<p> </p>
<p>Demonstration in a GIF format:</p>
<img src = "images/handwriting_recog_demo.gif">
</header>
<!-- Lists -->
<h2>Introduction</h2>
<p>In the ever-evolving realm of AI, remarkable progress has been made, exemplified by the emergence of OpenAI's ChatGPT models in 2023. Motivated by this rapid growth and fuelled by my curiosity, I started my first AI project: a number recognition system that can predict mouse-written digits. </p>
<p>Through my Python courses, I acquired basic knowledge of K-Nearest Neighbours (KNN) and Neural Networks (NN), which motivated me to dive deeper into deep learning. During the summer holiday, I took on the challenge of teaching myself how the cutting-edge Convolutional Neural Network (CNN) works and building a CNN model of my own. To find out which model was better, I conducted a comprehensive comparison of all three models in this project, a meticulous process of coding and algorithmic exploration to ensure the reliability and precision of each model. </p>
<p>To make it fun and interactive, I developed a tool using Pygame that lets me engage with my models directly using a mouse. I think this framework of combining a user interface with an AI model can be extended to other real-life image recognition problems in the future. </p>
<h2>Data</h2>
<p>In the initial stages, I chose to leverage my previous experience with the MNIST dataset (<a href="https://en.wikipedia.org/wiki/MNIST_database" style="color: blue;">link</a>). Although MNIST is commonly used to test handwriting recognition models, many have found it too simple for most machine learning algorithms. Similarly, in my case, my KNN model easily achieved 97.1% accuracy on the MNIST test set, my NN model 97.8%, and my CNN 99.2%. The main reason for this success is the huge dataset size of 70,000 images. </p>
<p>However, there is a problem: AI models trained on MNIST handwritten images struggle to recognize my mouse-written digits in real-life testing. Not only is it hard to write neatly on-screen with a mouse but, more importantly, mouse-written digits often contain gaps when written quickly. To solve this, I decided to train my models on a mouse-written dataset of 1447 images. Sample images are shown below:</p>
<img src = "images/1447_data.jpg">
<p>However, the change to a more suitable dataset came at a cost. The dataset has only 1447 images, a huge disadvantage compared to the 70,000 images in the MNIST dataset. Since the models had less data to train on, I expected them to achieve lower accuracies than on MNIST. The less training data available, the lower the accuracy a model can reach, and therefore the more challenging the classification task. In real-life scenarios, we often struggle to gather enough training data. </p>
<h2>Models</h2>
<p>I utilized three kinds of machine learning methods, namely K-Nearest Neighbours (KNN), Neural Network (NN), and Convolutional Neural Network (CNN), to analyse and compare their performance in recognizing mouse-written digits. </p>
<h3>KNN</h3>
<p>K-Nearest Neighbours (KNN) is a simple algorithm: it classifies new cases based on the classes of their closest neighbours. To illustrate, I made a simple graph below showing how an unknown case is classified by KNN (with k = 3) using majority votes. </p>
<img src = "images/knn.png" style="width: 80%; height: auto;">
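<p>The majority-vote idea above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code; the toy 2-D points and labels are made up for the example.</p>

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(train_X - x, axis=1)       # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]                   # indices of the k closest points
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # the most common label wins

# Toy data: class 0 clusters near the origin, class 1 near (5, 5)
train_X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
train_y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(train_X, train_y, np.array([0.5, 0.5])))  # -> 0
print(knn_predict(train_X, train_y, np.array([5.5, 5.5])))  # -> 1
```

<p>Note that there is no "training" step: KNN simply stores the data and defers all the work to prediction time, which is why it is so easy to implement but slow on large datasets.</p>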
<h3>NN</h3>
<p>A neural network (NN) is like a team of neurons working together to solve problems. The neurons pass messages forward through their connections to make predictions (this is called forward propagation). They then learn from prediction errors, updating the neurons and connections backwards to improve the network (this is called backpropagation). This helps computers recognize patterns in things like images or numbers. </p>
<p>To illustrate, I made a simplified graph below showing how an NN performs forward propagation and backpropagation. </p>
<img src="images/nn.png" style="width: 80%; height: auto;">
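<p>Forward propagation and backpropagation can be sketched with NumPy. This is a minimal one-hidden-layer network trained on a toy XOR problem, not my actual model; the layer sizes, learning rate, and iteration count are arbitrary illustrations.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: learn XOR with a tiny 2-4-1 network
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)   # input -> hidden weights
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)   # hidden -> output weights
sigmoid = lambda z: 1 / (1 + np.exp(-z))

def forward(X):
    h = sigmoid(X @ W1 + b1)         # hidden-layer activations
    return h, sigmoid(h @ W2 + b2)   # network output

_, out = forward(X)
initial_mse = float(np.mean((out - y) ** 2))

for _ in range(5000):
    h, out = forward(X)                        # forward propagation
    d_out = (out - y) * out * (1 - out)        # output-layer error signal
    d_h = (d_out @ W2.T) * h * (1 - h)         # error pushed back to the hidden layer
    W2 -= 0.5 * h.T @ d_out; b2 -= 0.5 * d_out.sum(0)   # backpropagation updates
    W1 -= 0.5 * X.T @ d_h;   b1 -= 0.5 * d_h.sum(0)

final_mse = float(np.mean((forward(X)[1] - y) ** 2))
print(initial_mse, final_mse)   # the error shrinks as the network learns
```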
<h3>CNN</h3>
<p>A Convolutional Neural Network (CNN) is a cutting-edge deep neural network that extracts and distinguishes specific features within pictures, such as low-level features like edges and high-level features like shapes. CNNs can work out what objects or patterns are in images, which even helps self-driving cars recognize traffic signs! </p>
<p>To illustrate the idea of convolution, I made a graph below showing how a convolution filter can extract vertical edges from an image of a T-shirt in the Fashion-MNIST dataset. </p>
<img src="images/convolution.png" style="width: 80%; height: auto;">
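<p>The vertical-edge filter can be reproduced in NumPy. This is a minimal sketch using a hand-made 3&times;3 kernel on a tiny synthetic image, not the filter from my actual model.</p>

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2-D convolution (strictly, cross-correlation, as used in CNNs)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Multiply the kernel against each image patch and sum
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Vertical-edge kernel: responds where brightness changes left-to-right
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

# Synthetic image: bright left half, dark right half -> one vertical edge
image = np.zeros((5, 6))
image[:, :3] = 1.0

out = convolve2d(image, kernel)
print(out)   # strong response only at the columns straddling the edge
```

<p>In a real CNN the kernel values are not hand-crafted like this; they are learned during training, so the network discovers for itself which features are worth extracting.</p>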
<p>As illustrated below, a deep neural network such as a CNN improves on traditional neural networks in two ways: </p>
<ol>
<li>A CNN uses convolutional filters to extract more meaningful features from images. </li>
<li>A CNN can accommodate many more layers of neurons. </li>
</ol>
<img src="images/deep_nn.png" style="width: 80%; height: auto;">
<p>To ensure a fair test, I divided the dataset into a training set (70%) and a testing set (30%), then measured the accuracy of the three models above on the same testing set. </p>
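<p>A 70/30 split like this can be sketched as a single shuffled index. This is an illustration, not the project's actual code; the array names and the zero-filled placeholder data are made up.</p>

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.3, seed=42):
    """Shuffle once, then carve off test_fraction of the data for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))            # a random but reproducible ordering
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# e.g. 1447 placeholder 28x28 images, flattened to 784 features each
X = np.zeros((1447, 784))
y = np.zeros(1447, dtype=int)

X_train, X_test, y_train, y_test = train_test_split(X, y)
print(len(X_train), len(X_test))   # -> 1013 434
```

<p>Shuffling before splitting matters: if the images were stored grouped by digit, an unshuffled split would leave some digits out of the training set entirely.</p>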
<h2>Results</h2>
<p>K-Nearest Neighbours (KNN) achieved accuracies of 81.61% to 85.52% on the testing set, depending on the number of neighbours. Compared with the MNIST dataset of 70,000 images, the KNN accuracy dropped from 97.1% to 85.5% with 1447 images. </p>
<style>
pre {
tab-size: 4;
}
</style>
<pre><code>model training for num_neighbors: 1 ...
num_neighbors: 1, accuracy: 85.52%
model training for num_neighbors: 3 ...
num_neighbors: 3, accuracy: 84.37%
model training for num_neighbors: 5 ...
num_neighbors: 5, accuracy: 83.22%
model training for num_neighbors: 7 ...
num_neighbors: 7, accuracy: 82.76%
model training for num_neighbors: 9 ...
num_neighbors: 9, accuracy: 81.61%
</code></pre>
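<p>A neighbour sweep like the one above can be reproduced with scikit-learn. This sketch runs on scikit-learn's built-in 8&times;8 digits dataset rather than my mouse-written set, so the exact accuracies will differ from mine.</p>

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)   # 1797 small 8x8 digit images
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

results = {}
for k in (1, 3, 5, 7, 9):
    # Fit a KNN classifier for each neighbour count and score it on the held-out set
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    results[k] = model.score(X_test, y_test)
    print(f"num_neighbors: {k}, accuracy: {results[k]:.2%}")
```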
<p>The Neural Network model attained accuracies of 81.15% to 85.75% on the same testing set using one hidden layer with varying numbers of neurons. I also tested multiple hidden layers, but with very little improvement. Compared with the MNIST dataset, the accuracy of the NN model dropped from 97.8% to 85.8% with 1447 images. </p>
<pre><code>model training for number_of_neurons_i: 100 ...
number_of_neurons_i: 100, accuracy: 81.15%
model training for number_of_neurons_i: 300 ...
number_of_neurons_i: 300, accuracy: 83.68%
model training for number_of_neurons_i: 500 ...
number_of_neurons_i: 500, accuracy: 82.99%
model training for number_of_neurons_i: 700 ...
number_of_neurons_i: 700, accuracy: 85.52%
model training for number_of_neurons_i: 900 ...
number_of_neurons_i: 900, accuracy: 83.91%
model training for number_of_neurons_i: 1100 ...
number_of_neurons_i: 1100, accuracy: 85.06%
model training for number_of_neurons_i: 1300 ...
number_of_neurons_i: 1300, accuracy: 85.75%
model training for number_of_neurons_i: 1500 ...
number_of_neurons_i: 1500, accuracy: 83.91%
</code></pre>
<p>The CNN model significantly improved the accuracy on the same testing set. For example, a CNN with two convolutional layers raised the accuracy to 90.34%. </p>
<pre><code>On testing dataset:
loss: 0.4519, - accuracy: 0.9034
</code></pre>
<p>Using a CNN with four convolutional layers, I further achieved 98.62% accuracy without much tuning of the model parameters. Compared with the MNIST dataset, the accuracy of the CNN model dropped only slightly, from 99.2% to 98.6%, with 1447 images. This illustrates the power of deep neural networks such as CNNs, even in this challenging setup. </p>
<pre><code>On testing dataset:
loss: 0.0680, - accuracy: 0.9862
</code></pre>
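<p>A four-convolutional-layer architecture along these lines can be defined in Keras. This is a hypothetical sketch: the filter counts, kernel sizes, and dense-layer width here are illustrative guesses, not my actual model (whose exact configuration is in the model summary below and in the repository).</p>

```python
from tensorflow import keras
from tensorflow.keras import layers

# A hypothetical 4-conv-layer CNN for 28x28 grayscale digits (10 classes)
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),                     # downsample 28x28 -> 14x14
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),                     # 14x14 -> 7x7
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),    # one probability per digit
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```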
<p>The summary of my CNN model with four convolutional layers is shown below: </p>
<img src="images/model_summary.png" style="width: 80%; height: auto;">
<p> </p>
<p>The learning progress is shown below: </p>
<img src="images/learning_progress.png" style="width: 80%; height: auto;">
<p>Furthermore, I could still improve the model through some hyperparameter tuning. For example, by adding more layers or changing the filter sizes in the convolutional layers, I reached an astounding 99.31% accuracy on the test set, even better than on the MNIST dataset! Despite having far less data to train on, an immense disadvantage for any model, it still achieved very high accuracy. This shows that although hyperparameter tuning can be a bit tedious, it pays off in the end. </p>
<pre><code>On testing dataset:
loss: 0.0360, - accuracy: 0.9931
</code></pre>
<h2>Real-Life Test and Discussion</h2>
<p>After comparing the results, it became evident that the traditional machine learning models, KNN and NN, achieved similar accuracy on the testing dataset. With the help of convolutional filters and a deep network, the CNN performs much better at image recognition. This explains why deep learning has achieved so much success in recent years. </p>
<p>To assess their performance in real-life scenarios, I went a step further: I deployed all three models behind a Pygame GUI and interacted with them directly, particularly testing them on my messy and challenging mouse-written samples. </p>
<p>From my live experience of playing with the tool, the CNN model handled these difficult samples best, proving more adept at accurately recognizing and interpreting messy mouse-written numbers. To illustrate the models and the Pygame GUI, I have prepared animations and videos (see the bottom of this page) that demonstrate how the tool handles both normal and messy mouse-written numbers.</p>
<p>I have also included a few static images of messy mouse-written numbers below:</p>
<img src="images/messy_image_1.png" style="width: 50%; height: auto;">
<img src="images/messy_image_2.png" style="width: 50%; height: auto;">
<img src="images/messy_image_3.png" style="width: 50%; height: auto;">
<img src="images/messy_image_4.png" style="width: 50%; height: auto;">
<p> </p>
<p>To help you gain a better understanding of the models' performance, I have shared the code and models (the .pkl files) in my GitHub repository (<a href="https://github.com/theo-xiao-sg/handwriting_recognition" style="color: blue;">source code</a>). I invite you to explore and experiment with them.
You can find the README explaining how to deploy and use this tool here (<a href="https://github.com/theo-xiao-sg/handwriting_recognition#readme" style="color: blue;">readme</a>). I hope you find them as fascinating as I do. </p>
<p>Finally, these fascinating results motivate me to explore more advanced deep learning algorithms and more challenging real-life applications. </p>
<header class="major">
<p>Demonstration in a GIF format:</p>
<img src = "images/handwriting_recog_demo.gif">
</header>
<p>
<br/>
</p>
<header class="major">
<p>Demonstration in a MP4 format:</p>
<video width="640" height="512" autoplay muted controls loop>
<source src="images/handwriting_recog_demo.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
</header>
</section>
</div>
<!-- Scripts -->
<script src="assets/js/jquery.min.js"></script>
<script src="assets/js/jquery.scrollex.min.js"></script>
<script src="assets/js/jquery.scrolly.min.js"></script>
<script src="assets/js/browser.min.js"></script>
<script src="assets/js/breakpoints.min.js"></script>
<script src="assets/js/util.js"></script>
<script src="assets/js/main.js"></script>
</body>
</html>