<a href="https://colab.research.google.com/github/udupa-varun/pyimagesearch_uni/blob/main/deep_learning/101/intro_linear_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install --upgrade --force-reinstall --no-deps kaggle

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting kaggle
  Downloading kaggle-1.5.12.tar.gz (58 kB)
     |████████████████████████████████| 58 kB 2.8 MB/s 
Building wheels for collected packages: kaggle
  Building wheel for kaggle (setup.py) ... done
  Created wheel for kaggle: filename=kaggle-1.5.12-py3-none-any.whl size=73051 sha256=b55101e35e34f4e26de1462226de6bd638b1e1094defcf30834bf490569b86ea
  Stored in directory: /root/.cache/pip/wheels/62/d6/58/5853130f941e75b2177d281eb7e44b4a98ed46dd155f556dc5
Successfully built kaggle
Installing collected packages: kaggle
  Attempting uninstall: kaggle
    Found existing installation: kaggle 1.5.12
    Uninstalling kaggle-1.5.12:
      Successfully uninstalled kaggle-1.5.12
Successfully installed kaggle-1.5.12


In [2]:
!wget https://pyimagesearch-code-downloads.s3-us-west-2.amazonaws.com/intro-linear-classification/intro-linear-classification.zip
!unzip -qq intro-linear-classification.zip
%cd intro-linear-classification

--2022-06-07 14:01:36--  https://pyimagesearch-code-downloads.s3-us-west-2.amazonaws.com/intro-linear-classification/intro-linear-classification.zip
Resolving pyimagesearch-code-downloads.s3-us-west-2.amazonaws.com (pyimagesearch-code-downloads.s3-us-west-2.amazonaws.com)... 52.218.209.217
Connecting to pyimagesearch-code-downloads.s3-us-west-2.amazonaws.com (pyimagesearch-code-downloads.s3-us-west-2.amazonaws.com)|52.218.209.217|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1956 (1.9K) [binary/octet-stream]
Saving to: ‘intro-linear-classification.zip’


2022-06-07 14:01:37 (122 MB/s) - ‘intro-linear-classification.zip’ saved [1956/1956]

/content/intro-linear-classification


### Using the Kaggle API to download the dataset into the Colab environment

In [11]:
from google.colab import drive
drive.mount("/content/gdrive", force_remount=True)

Mounted at /content/gdrive


In [13]:
!mkdir -p ~/.kaggle
!cp "/content/gdrive/MyDrive/Colab Notebooks/kaggle.json" ~/.kaggle
!chmod 600 ~/.kaggle/kaggle.json
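
The three shell commands above create the Kaggle config directory, copy the API token into it, and restrict the token to owner read/write. A rough Python equivalent is sketched below. The helper name `install_kaggle_token` is hypothetical, and the demo uses a temporary directory with a dummy token, so it does not touch the real `~/.kaggle`.

```python
import os
import shutil
import stat
import tempfile


def install_kaggle_token(token_path, kaggle_dir):
    """Copy an API token into kaggle_dir and make it owner read/write only."""
    # exist_ok=True makes the call safe to re-run, like `mkdir -p`
    os.makedirs(kaggle_dir, exist_ok=True)
    dest = shutil.copy(token_path, kaggle_dir)
    # equivalent of `chmod 600`
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)
    return dest


# demo against a throwaway directory (paths here are illustrative only)
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "kaggle.json")
    with open(src, "w") as f:
        f.write("{}")  # dummy token contents

    dest = install_kaggle_token(src, os.path.join(tmp, ".kaggle"))
    mode = stat.S_IMODE(os.stat(dest).st_mode)
    print(oct(mode))
```

On a POSIX system such as the Colab VM, the printed mode should be `0o600`.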


In [20]:
!kaggle competitions download -c dogs-vs-cats

Downloading dogs-vs-cats.zip to /content/intro-linear-classification
 97% 790M/812M [00:16<00:00, 79.7MB/s]
100% 812M/812M [00:16<00:00, 50.9MB/s]


In [29]:
!mkdir kaggle_dogs_vs_cats
!unzip -qq dogs-vs-cats.zip
!unzip -qq train.zip -d kaggle_dogs_vs_cats

In [23]:
# import the necessary packages
import argparse
import os

import cv2
import imutils
import numpy as np
from imutils import paths
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

In [24]:
def extract_color_histogram(image, bins=(8, 8, 8)):
    # extract a 3D color hist from HSV space
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, bins, [0, 180, 0, 256, 0, 256])

    # normalize histogram
    if imutils.is_cv2():
        hist = cv2.normalize(hist)
    else:
        cv2.normalize(hist, hist)

    # return flattened hist as the feature vector
    return hist.flatten()
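
With the default `bins=(8, 8, 8)`, the flattened histogram has 8 × 8 × 8 = 512 values per image. The sketch below reproduces that binning with `np.histogramdd` on random pixels, as a stand-in for `cv2.calcHist` on a real image. The random data and the explicit L2 normalization are assumptions made for illustration and are not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical stand-in for an HSV image:
# H in [0, 180), S and V in [0, 256), as in OpenCV's 8-bit HSV layout
hsv = np.stack(
    [
        rng.integers(0, 180, size=(64, 64)),
        rng.integers(0, 256, size=(64, 64)),
        rng.integers(0, 256, size=(64, 64)),
    ],
    axis=-1,
)

# one row per pixel, one column per channel
pixels = hsv.reshape(-1, 3)

# 3D histogram with 8 bins per channel over the same ranges used above
hist, _ = np.histogramdd(
    pixels, bins=(8, 8, 8), range=[(0, 180), (0, 256), (0, 256)]
)

# L2-normalize (an assumption for this sketch), then flatten to a vector
features = (hist / np.linalg.norm(hist)).flatten()
print(features.shape)  # (512,)
```

Each image, whatever its resolution, is reduced to a fixed-length vector, which is what lets the classifier treat all images uniformly.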

In [25]:
# construct the argument parser and parse the arguments
# ap = argparse.ArgumentParser()
# ap.add_argument("-d", "--dataset", required=True,
# 	help="path to input dataset")
# args = vars(ap.parse_args())

# since we are using Jupyter Notebooks we can replace our argument
# parsing code with *hard coded* arguments and values
args = {
    "dataset": "kaggle_dogs_vs_cats"
}

In [33]:
# grab the list of images that we'll be describing
print("[INFO] describing images...")
image_paths = list(paths.list_images(args["dataset"]))

# initialize the data matrix and labels list
data = []
labels = []

# loop over input images
for (i, image_path) in enumerate(image_paths):
    # load image and extract class label
    # assumes path format /path/to/dataset/{class}.{image_num}.jpg
    image = cv2.imread(image_path)
    label = image_path.split(os.path.sep)[-1].split(".")[0]

    # extract color histogram
    hist = extract_color_histogram(image)
    # update data matrix and labels list
    data.append(hist)
    labels.append(label)

    # update progress
    if (i > 0 and i % 1000 == 0) or ((i + 1) == len(image_paths)):
        print(f"[INFO] processed {i + 1}/{len(image_paths)}")

[INFO] describing images...
[INFO] processed 1001/25000
[INFO] processed 2001/25000
[INFO] processed 3001/25000
[INFO] processed 4001/25000
[INFO] processed 5001/25000
[INFO] processed 6001/25000
[INFO] processed 7001/25000
[INFO] processed 8001/25000
[INFO] processed 9001/25000
[INFO] processed 10001/25000
[INFO] processed 11001/25000
[INFO] processed 12001/25000
[INFO] processed 13001/25000
[INFO] processed 14001/25000
[INFO] processed 15001/25000
[INFO] processed 16001/25000
[INFO] processed 17001/25000
[INFO] processed 18001/25000
[INFO] processed 19001/25000
[INFO] processed 20001/25000
[INFO] processed 21001/25000
[INFO] processed 22001/25000
[INFO] processed 23001/25000
[INFO] processed 24001/25000
[INFO] processed 25000/25000
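
The class label in the loop above comes from the filename alone, so it depends on the `{class}.{image_num}.jpg` naming convention. A small sketch of that parsing step follows. The helper `label_from_path` and the example paths are hypothetical and only illustrate the expected layout.

```python
import os


def label_from_path(image_path):
    """Return the class label encoded in a '{class}.{image_num}.jpg' filename."""
    filename = image_path.split(os.path.sep)[-1]
    return filename.split(".")[0]


# illustrative paths; the real ones come from paths.list_images
examples = [
    os.path.join("kaggle_dogs_vs_cats", "train", "cat.0.jpg"),
    os.path.join("kaggle_dogs_vs_cats", "train", "dog.12499.jpg"),
]
parsed = [label_from_path(p) for p in examples]
print(parsed)  # ['cat', 'dog']
```

A file that does not follow this convention would silently get a wrong label, so it can be worth checking that the set of parsed labels contains only the expected classes.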


In [34]:
# encode labels, converting from strings to integers
le = LabelEncoder()
labels = le.fit_transform(labels)

# train/test split (75/25)
(train_data, test_data, train_labels, test_labels) = train_test_split(
    np.array(data), labels, test_size=0.25, random_state=42
)

# train a linear SVM classifier
print("[INFO] training Linear SVM classifier...")
model = LinearSVC()
model.fit(train_data, train_labels)

[INFO] training Linear SVM classifier...


LinearSVC()
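
A linear classifier scores an input with a weighted sum of its features plus a bias. The sketch below illustrates the two-class case with made-up weights `w` and bias `b`. Both are hypothetical; a trained `LinearSVC` exposes its learned values as `coef_` and `intercept_`. A positive score maps to the second class, which mirrors scikit-learn's convention for binary problems.

```python
import numpy as np

rng = np.random.default_rng(42)

num_features = 512                 # length of the flattened color histogram
w = rng.normal(size=num_features)  # hypothetical weights, not learned values
b = 0.1                            # hypothetical bias
classes = np.array(["cat", "dog"])


def predict(features):
    """Score each row with w . x + b and threshold the score at zero."""
    scores = features @ w + b
    return classes[(scores > 0).astype(int)]


# two synthetic inputs chosen so the sign of the score is known in advance:
# x = w gives a positive score, x = -w gives a negative one
batch = np.stack([w, -w])
print(predict(batch))  # ['dog' 'cat']
```

Training only changes the values of `w` and `b`. Prediction stays a single dot product per image, which keeps linear models fast at test time.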

In [35]:
# evaluate classifier
print("[INFO] evaluating classifier...")
predictions = model.predict(test_data)
print(classification_report(test_labels, predictions, target_names=le.classes_))

[INFO] evaluating classifier...
              precision    recall  f1-score   support

         cat       0.62      0.67      0.65      3142
         dog       0.64      0.58      0.61      3108

    accuracy                           0.63      6250
   macro avg       0.63      0.63      0.63      6250
weighted avg       0.63      0.63      0.63      6250
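
The precision, recall, and f1-score columns in the report are all derived from counts of true positives, false positives, and false negatives per class. A minimal sketch on a toy set of eight labels (made-up data, unrelated to the results above):

```python
import numpy as np

# toy labels for illustration only: 0 = cat, 1 = dog
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1, 0, 1, 1, 0])


def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for one class treated as positive."""
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return float(precision), float(recall), float(f1)


precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive=1)
accuracy = float(np.mean(y_true == y_pred))
print(f"{precision:.2f} {recall:.2f} {f1:.2f} {accuracy:.2f}")
# 0.75 0.75 0.75 0.75
```

The `support` column is simply the number of true samples of each class in the test split.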



For a detailed walkthrough of the concepts and code, be sure to refer to the full tutorial, [*An intro to linear classification with Python*](https://www.pyimagesearch.com/2016/08/22/an-intro-to-linear-classification-with-python/) published on 2016-08-22.