# Cat vs Dog Image Classification using K-Nearest Neighbors (KNN)

Welcome! My name is Yasin Pourraisi.  
In this project, I will build a machine learning model to classify images of cats and dogs using the K-Nearest Neighbors (KNN) algorithm. The goal is to explore how KNN can be applied to computer vision tasks and evaluate its performance on distinguishing between cat and dog images.

**Connect with me:**  
- GitHub: [yasinpurraisi](https://github.com/yasinpurraisi)  
- Email: yasinpourraisi@gmail.com  
- Telegram: @yasinprsy


### Import Required Libraries

Let's start by importing the libraries needed for loading the annotations, processing the images, splitting the data, and building our KNN model.

In [1]:
import numpy as np
import cv2
from sklearn.model_selection import train_test_split
import json

### Load Images and Labels

In this section, I load the images and their corresponding labels from the dataset. The image file names and labels are stored in the `_annotations.json` file inside the dataset, which you can [download here](https://cv-studio-accessible.s3.us-south.cloud-object-storage.appdomain.cloud/cats_dogs_images_.zip). Each image is converted to grayscale and resized to 30 by 30 pixels, which simplifies the problem and reduces computational requirements.
I then flatten each image into a 1D array of 900 pixel values.
Finally, I append its label to the labels list: 0 for cat and 1 for dog.

In [2]:
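train_images = []
train_labels = []
images_dir = "cats_dogs_images/"

# Read the annotation file that maps each image file name to its label
with open(images_dir + "_annotations.json", "r") as d:
    data = json.load(d)

for key, value in data['annotations'].items():
    image = cv2.imread(images_dir + key)
    if image is None:
        # cv2.imread returns None for missing or unreadable files; skip them
        continue
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # 3 channels -> 1 channel
    image = cv2.resize(image, (30, 30))              # fixed 30x30 size
    image_pixel = image.flatten()                    # 2D image -> 1D vector of 900 values
    label = 0 if value[0]['label'] == "cat" else 1   # 0 = cat, 1 = dog

    train_images.append(image_pixel)
    train_labels.append(label)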
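
To make the effect of flattening concrete, here is a minimal sketch that uses a synthetic 30 by 30 array in place of a real image. The random array and the variable names are illustrative assumptions, not part of the dataset. It shows that each image becomes a 900-element vector stored in row-major order.

```python
import numpy as np

# Synthetic stand-in for a resized grayscale image (hypothetical data).
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(30, 30), dtype=np.uint8)

# flatten() copies the 2D array into a 1D vector, row after row.
feature_vector = fake_image.flatten()
print(feature_vector.shape)  # (900,)

# The pixel at (row r, column c) ends up at index r * 30 + c.
r, c = 4, 7
same_pixel = feature_vector[r * 30 + c] == fake_image[r, c]
print(same_pixel)  # True
```

Because KNN compares these vectors element by element, every image must be resized to the same dimensions before flattening.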


Next, I convert <code>train_images</code> to a NumPy array of type <code>float32</code>, because OpenCV's KNN expects its input samples in that format. I also convert <code>train_labels</code> to an integer NumPy array and reshape it into a 2D column vector (one row per sample), because OpenCV's KNN expects the labels in this shape.

In [3]:
train_images = np.array(train_images).astype('float32')
train_labels = np.array(train_labels)
train_labels = train_labels.astype(int)
train_labels = train_labels.reshape((train_labels.size,1))
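
As a quick sanity check of the resulting shapes, the same conversion can be sketched on three toy feature vectors. The toy data and the names `samples` and `responses` are assumptions made for illustration only.

```python
import numpy as np

# Three hypothetical flattened images (900 pixels each) and their labels.
toy_images = [
    np.zeros(900, dtype=np.uint8),
    np.full(900, 255, dtype=np.uint8),
    np.ones(900, dtype=np.uint8),
]
toy_labels = [0, 1, 0]

# Samples: one row per image, float32 values.
samples = np.array(toy_images).astype("float32")

# Labels: integer column vector with one row per image.
responses = np.array(toy_labels).astype(int).reshape(-1, 1)

print(samples.shape, samples.dtype)  # (3, 900) float32
print(responses.shape)               # (3, 1)
```

Here `reshape(-1, 1)` is equivalent to `reshape((train_labels.size, 1))` in the cell above; NumPy infers the number of rows.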

### Split Data

To evaluate the performance of my KNN classifier, I split the dataset into two parts: a training set and a test set. I use the `train_test_split` function from scikit-learn to randomly divide the data, reserving 80% for training and 20% for testing. Setting `random_state` makes the split reproducible. Holding out a test set does not prevent overfitting by itself, but it lets us measure how well the model generalizes to images it has not seen during training.

In [None]:
test_size = 0.2
train_samples, test_samples, train_labels, test_labels = train_test_split(
    train_images, train_labels, test_size=test_size, random_state=0)
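
To illustrate what a seeded random split does, here is a rough NumPy-only sketch. It is not scikit-learn's implementation: the helper `shuffled_split` is hypothetical, and the sample count of 200 is an assumption chosen so that 20% gives 40 test samples.

```python
import numpy as np

def shuffled_split(n_samples, test_size, seed):
    """Return (train_idx, test_idx) index arrays from a seeded shuffle."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)      # random order of 0..n_samples-1
    n_test = int(round(n_samples * test_size))
    return order[n_test:], order[:n_test]   # the two parts never overlap

train_idx, test_idx = shuffled_split(200, 0.2, seed=0)
print(len(train_idx), len(test_idx))  # 160 40

# The same seed reproduces exactly the same split.
train_again, test_again = shuffled_split(200, 0.2, seed=0)
print(np.array_equal(test_idx, test_again))  # True
```

Indexing the sample and label arrays with the same index arrays keeps every image paired with its label.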

### Train model

To train the KNN model, I use <code>cv2.ml.KNearest_create()</code> from the <code>OpenCV</code> library. KNN has one main hyperparameter, <code>k</code>: the number of nearest neighbors that take part in the majority vote for each prediction.
I will try several values of <code>k</code> to find the one that works best for this dataset.

In [5]:
knn = cv2.ml.KNearest_create()
knn.train(train_samples, cv2.ml.ROW_SAMPLE, train_labels)

k_values = [1, 2, 3, 4, 5]
k_result = []
for k in k_values:
    ret,result,neighbours,dist = knn.findNearest(test_samples,k=k)
    k_result.append(result)

# Convert each result array in k_result (shape: n_samples x 1) into a simple flat list
flattened_results = []
for res in k_result:
    flat_result = [item for sublist in res for item in sublist]
    flattened_results.append(flat_result)
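
To show the idea behind the majority vote, here is a minimal NumPy sketch of KNN on a tiny 2D toy set. It is an illustration under my own assumptions (squared Euclidean distance, ties resolved toward the smaller label), not a description of how OpenCV implements `findNearest`; the function `knn_predict` and the toy points are hypothetical.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k):
    """Classify each query row by majority vote among its k nearest training rows."""
    # Squared Euclidean distance between every query and every training sample.
    diffs = query_x[:, None, :] - train_x[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Indices of the k closest training samples for each query.
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_y[nearest]  # shape: (n_queries, k)
    # Majority vote; on a tie, argmax picks the smaller label.
    return np.array([np.bincount(row).argmax() for row in votes])

# Two well-separated toy clusters: label 0 near the origin, label 1 near (5, 5).
train_x = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=np.float32)
train_y = np.array([0, 0, 0, 1, 1, 1])
query_x = np.array([[0.5, 0.5], [5.5, 5.5]], dtype=np.float32)

predictions = knn_predict(train_x, train_y, query_x, k=3)
print(predictions)  # [0 1]
```

With two classes, an even `k` such as 2 or 4 can produce tied votes, so the result then depends on how the implementation breaks ties.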


Next, I compute the accuracy for each value of <code>k</code>.


In [7]:

accuracy_res = []

for k_res in k_result:
    matches = k_res==test_labels
    correct = np.count_nonzero(matches)
    accuracy = correct*100.0/k_res.size  # use this result's own size, not a variable left over from the previous cell
    accuracy_res.append(accuracy)

res_accuracy = {k_values[i]: accuracy_res[i] for i in range(len(k_values))}
list_res = sorted(res_accuracy.items())
k_best = max(list_res,key=lambda item:item[1])[0]
list_res

[(1, 50.0), (2, 52.5), (3, 45.0), (4, 47.5), (5, 60.0)]
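
The accuracy calculation can also be written as a small helper. This is a sketch with made-up predictions and labels; the function name `accuracy_percent` is my own and is not part of OpenCV or scikit-learn.

```python
import numpy as np

def accuracy_percent(predicted, actual):
    """Percentage of predictions that equal the true labels."""
    predicted = np.asarray(predicted).ravel()  # (n, 1) float32 -> (n,)
    actual = np.asarray(actual).ravel()        # (n, 1) int     -> (n,)
    return 100.0 * np.mean(predicted == actual)

# Hypothetical labels and predictions shaped like the arrays used above.
actual = np.array([[0], [1], [1], [0]])
predicted = np.array([[0.0], [1.0], [0.0], [0.0]], dtype=np.float32)

score = accuracy_percent(predicted, actual)
print(score)  # 75.0
```

The accuracies in the output above move in steps of 2.5 points, which is what a test set of 40 images would produce. With so few test images, a gap of a few points between two values of `k` corresponds to only one or two images, so the ranking of `k` values should be read with caution.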

Now that we know the best value of <code>k</code>, I finalize the model by running the prediction with it and saving the model to disk.

In [50]:
ret, final_result, neighbours, dist = knn.findNearest(test_samples, k=k_best)
knn.save('knn_best_model.yml')

## Conclusion

In this project, I built and trained a K-Nearest Neighbors (KNN) model to classify cat and dog images using OpenCV. By experimenting with different values of `k`, we found the best-scoring value among those tried (`k = 5`, with 60% accuracy on the test set), while the other values scored between 45% and 52.5%. These numbers suggest that KNN on raw 30 by 30 grayscale pixels separates cats from dogs only weakly.
This workflow demonstrates how KNN can be applied to computer vision tasks and provides a foundation for further exploration and improvement. Thank you for following along, and feel free to experiment with your own images and datasets!
