# Classification of Bottle Openers, Can Openers and Corkscrews by Means of AI Methods

This assignment applies multiple methods to classify images of bottle openers, can openers and corkscrews. The results of the different methods are to be analyzed and compared.

## 1. Data Acquisition and Augmentation

To train the classification methods, training data first has to be acquired. Many images were provided in the course; they were sorted to keep only the images suitable for training. Additionally, xxxxxxx images have been taken.

Data augmentation is used to enlarge the dataset. The following kinds of augmentation are applied:
* asödkfj
* öaskdjf

They are performed multiple times with different parameters. This results in xxxxxx images in total, compared to xxxxx original images before augmentation. The images are saved as NumPy arrays in .npy files. RELATIVE DIRECTORY???

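The loading code in Section 3.1 treats each .npy file as a sequence indexed by class label, where entry `i` holds all images of class `i`. A minimal sketch of writing and reading a file in that layout, assuming every class is stored as one array of equally sized images; the file name `augmentation_demo.npy` and the random stand-in images are purely illustrative:

```python
import os
import tempfile

import numpy as np

# Stand-in data: three classes with different numbers of 32x32 RGB images
rng = np.random.default_rng(0)
per_class = [rng.integers(0, 256, size=(n, 32, 32, 3), dtype=np.uint8) for n in (4, 2, 3)]

# The classes have different sizes, so they go into a 1-D object array
stacked = np.empty(len(per_class), dtype=object)
for idx, images in enumerate(per_class):
    stacked[idx] = images

path = os.path.join(tempfile.mkdtemp(), "augmentation_demo.npy")
np.save(path, stacked, allow_pickle=True)

# Loading mirrors Section 3.1: the position of each entry is its class label
loaded = np.load(path, allow_pickle=True)
X = np.concatenate(list(loaded))
y = np.concatenate([np.full((len(images), 1), idx) for idx, images in enumerate(loaded)])
print(X.shape, y.shape)
```

Object arrays require `allow_pickle=True` when loading, which is why the notebook passes that flag to `np.load`.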
## 2. Feature Extraction

The following features have been extracted from the original image dataset (excluding the augmented images):
* The outer contour's aspect ratio
* Number of corners detected via Harris Corner Detection
* Number of corners detected via Shi-Tomasi Corner Detection
* The outer contour's perimeter-area ratio

The extracted features are saved in "data/features.csv".

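A minimal, NumPy-only sketch of how these four features could be computed. It assumes the outer contour is already available as an (N, 2) array of x/y points (for example an OpenCV `cv2.findContours` result reshaped from its (N, 1, 2) format), takes the aspect ratio from the axis-aligned bounding box, and replaces OpenCV's Harris and Shi-Tomasi implementations with a simplified structure tensor (central-difference gradients, 3x3 box window). The corner functions count pixels above a relative threshold without non-maximum suppression, so their absolute numbers will differ from `cv2.cornerHarris` or `cv2.goodFeaturesToTrack`; all function names are hypothetical.

```python
import numpy as np

def aspect_ratio(contour):
    """Width / height of the axis-aligned bounding box of an (N, 2) x/y contour."""
    width = contour[:, 0].max() - contour[:, 0].min()
    height = contour[:, 1].max() - contour[:, 1].min()
    return float(width / height)

def perimeter_area_ratio(contour):
    """Perimeter / enclosed area of the closed polygon described by the contour."""
    nxt = np.roll(contour, -1, axis=0)  # next vertex, wrapping around to close the polygon
    perimeter = np.linalg.norm(nxt - contour, axis=1).sum()
    # Shoelace formula for the polygon area
    area = 0.5 * abs(np.sum(contour[:, 0] * nxt[:, 1] - nxt[:, 0] * contour[:, 1]))
    return float(perimeter / area)

def structure_tensor(gray):
    """Gradient products Sxx, Syy, Sxy summed over a 3x3 box window around each pixel."""
    iy, ix = np.gradient(gray.astype(np.float64))  # axis 0 = rows (y), axis 1 = columns (x)
    h, w = gray.shape

    def box3(a):
        p = np.pad(a, 1, mode="edge")
        return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))

    return box3(ix * ix), box3(iy * iy), box3(ix * iy)

def harris_count(gray, k=0.04, rel_threshold=0.01):
    """Pixels whose Harris response det(M) - k * trace(M)^2 exceeds rel_threshold * max."""
    sxx, syy, sxy = structure_tensor(gray)
    response = sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2
    return int(np.count_nonzero(response > rel_threshold * response.max()))

def shi_tomasi_count(gray, rel_threshold=0.01):
    """Pixels whose smaller structure-tensor eigenvalue exceeds rel_threshold * max."""
    sxx, syy, sxy = structure_tensor(gray)
    min_eig = 0.5 * (sxx + syy - np.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2))
    return int(np.count_nonzero(min_eig > rel_threshold * min_eig.max()))

# Usage on a synthetic white square
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
square = np.array([[5, 5], [14, 5], [14, 14], [5, 14]], dtype=float)
print(aspect_ratio(square), perimeter_area_ratio(square), harris_count(img), shi_tomasi_count(img))
```

Straight edges produce a negative Harris response and a zero minimum eigenvalue, so only the neighborhoods of the four square corners are counted.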
## 3. Apply AI Methods

Five AI methods are to be applied. Three of them are implemented from scratch: Naive Bayes Classifier, Decision Tree and Random Forest; for each of them, the corresponding scikit-learn classifier is applied as a reference. In addition, a Convolutional Neural Network (CNN) and a CNN with Transfer Learning have been implemented using TensorFlow and Keras, as well as a sixth method that was not covered in the lecture: Support Vector Machines (SVM).

In [9]:
# Some variables required by multiple methods
# Dictionary to get data by name
data = dict()

# Dictionary to get predictions by method name
predictions = dict()

### 3.1 Import Data

#### Required Imports

In [10]:
import numpy as np
import pandas as pd
import cv2
import os

import HelperFunctions
import DataAugmentation

#### Import Images

In [11]:
# Load the original images: one array per class, where the array's position is the class label
original = np.load("data/images/augmented/original.npy", allow_pickle=True)

X_original = []
y_original = []

# Flatten the per-class arrays into one list of images with a matching label per image
for idx, d in enumerate(original):
    X_original.extend(d)
    for _ in d:
        y_original.append([idx])

data["Original Images"] = np.array(X_original)
data["Original Labels"] = np.array(y_original)

In [12]:
image_shape = data["Original Images"].shape
IMG_SIZE = image_shape[1:3]   # (height, width)
IMG_SHAPE = image_shape[1:4]  # (height, width, channels)

In [13]:
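# Load the test images; the label is the name of each image's parent directory
testpaths, classes = HelperFunctions.load_images(os.path.join("data", "images", "test"))

X_tests = []
y_tests = []

for path in testpaths:
    # cv2.COLOR_BGR2RGB is a color conversion code, not an imread flag, so it never
    # converted anything; the image is read in BGR order. Add
    # cv2.cvtColor(img, cv2.COLOR_BGR2RGB) only if the training images are RGB.
    img = cv2.imread(path, cv2.IMREAD_COLOR)
    img = DataAugmentation.resizeAndPad(img, IMG_SIZE)
    X_tests.append(img)
    y_tests.append(classes[path.split(os.sep)[-2]])

data["Test Images"] = np.array(X_tests)
data["Test Labels"] = np.array(y_tests)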

In [14]:
# Load the augmented images: one array per class, where the array's position is the class label
augmented = np.load("data/images/augmented/augmentation.npy", allow_pickle=True)

X_augmented = []
y_augmented = []

# Flatten the per-class arrays into one list of images with a matching label per image
for idx, d in enumerate(augmented):
    X_augmented.extend(d)
    for _ in d:
        y_augmented.append([idx])

data["Augmented Images"] = np.array(X_augmented)
data["Augmented Labels"] = np.array(y_augmented)

# User Feedback: shapes of image arrays
print(f"Original images: {data['Original Images'].shape}, original labels: {data['Original Labels'].shape}.")
print(f"Augmented images: {data['Augmented Images'].shape}, augmented labels: {data['Augmented Labels'].shape}.")
print(f"Test images: {data['Test Images'].shape}, test labels: {data['Test Labels'].shape}.")

Original images: (5243, 32, 32, 3), original labels: (5243, 1).
Augmented images: (795438, 32, 32, 3), augmented labels: (795438, 1).
Test images: (19, 32, 32, 3), test labels: (19,).


#### Import Features

In [15]:
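# Read the features CSV file (os.path.join instead of a Windows-only backslash path)
features = pd.read_csv(os.path.join("data", "features.csv"), sep=';', header=None)

# Drop the header row (read as data because header=None) and the first three columns,
# then put the classification column (3) at the end
features = features.iloc[1:, 3:]
features = features.reindex(columns=[4, 5, 6, 7, 3])

# The header row was read as text, so convert all columns back to numbers
features = features.apply(pd.to_numeric, errors="coerce")

# Remove the multitool class (labels >= 3)
features = features[features.iloc[:, -1] < 3]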

X_features = features.iloc[:, :-1]
y_features = features.iloc[:, -1]

data["DataFrame Features"] = X_features
data["DataFrame Classes"] = y_features

### 3.2 Naive Bayes Classifier

#### Required Imports

In [16]:
import os
from random import randrange
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

import modules.naive_bayes as nb

#### Prepare Data

In [17]:
X_train, X_test, y_train, y_test = train_test_split(data["DataFrame Features"], 
                                                    data["DataFrame Classes"], 
                                                    test_size=0.3, 
                                                    random_state=2)

train = pd.concat([X_train, y_train], axis = 1)

#### Predict With Self-Implemented Method
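The module `modules.naive_bayes` is not shown in this notebook. As an illustration only, a minimal Gaussian Naive Bayes could look like the sketch below. Gaussian likelihoods are an assumption here (the identical scores of the self-implemented and scikit-learn variants in Section 4 would be consistent with it), and the function names and the `var_floor` parameter are hypothetical:

```python
import numpy as np

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate class priors and per-class, per-feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # A small floor keeps the variance away from zero for constant features
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + var_floor
    return classes, priors, means, variances

def predict_gaussian_nb(model, X):
    """Return the class with the highest log posterior for every row of X."""
    classes, priors, means, variances = model
    diff = X[:, None, :] - means[None, :, :]  # (n_samples, n_classes, n_features)
    # Sum of per-feature Gaussian log densities, i.e. the "naive" independence assumption
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None]).sum(axis=2)
    return classes[np.argmax(log_lik + np.log(priors), axis=1)]

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y = np.array([0, 0, 1, 1])
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, np.array([[0.05, 0.1], [5.1, 5.0]])))
```

Working with log probabilities avoids numerical underflow when many small likelihoods are multiplied.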

In [18]:
y_pred = nb.predict(X_test, train)
predictions["Naive Bayes"] = (y_test, y_pred)

#### Predict With sklearn Method

In [19]:
bayes_clf = GaussianNB()
bayes_clf.fit(X_train, y_train)

y_pred = bayes_clf.predict(X_test)
predictions["Naive Bayes sklearn"] = (y_test, y_pred)

### 3.3 Decision Tree Classifier

#### Required Imports

In [20]:
import os
import pandas as pd
import numpy as np

from sklearn.tree import DecisionTreeClassifier

import DecisionTree as dt

#### Prepare Data

In [21]:
X_train, X_test, y_train, y_test = train_test_split(data["DataFrame Features"], 
                                                    data["DataFrame Classes"], 
                                                    test_size=0.2, 
                                                    random_state=42)

train = pd.concat([X_train, y_train], axis=1).to_numpy(dtype='float32')
test = pd.concat([X_test, y_test], axis=1).to_numpy(dtype='float32')

#### Predict With Self-Implemented Method
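The module `DecisionTree` is not shown in this notebook either. A minimal CART-style sketch with the same call shape as `dt.build_tree(train, 8, 1)` and `dt.predict(tree, row)` could look as follows. It assumes that the second and third arguments are the maximum depth and the minimum node size, that the label is stored in the last column (as in the arrays prepared above), and that splits are chosen by Gini impurity; it illustrates the technique and is not the author's module:

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def majority(labels):
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]

def best_split(rows):
    """Try every (feature, threshold) pair and keep the one with the lowest weighted Gini."""
    best, best_score = None, np.inf
    labels = rows[:, -1]
    for f in range(rows.shape[1] - 1):
        for t in np.unique(rows[:, f]):
            mask = rows[:, f] < t
            if mask.all() or not mask.any():
                continue  # one side would be empty
            score = (mask.sum() * gini(labels[mask]) + (~mask).sum() * gini(labels[~mask])) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, max_depth, min_size, depth=0):
    labels = rows[:, -1]
    # Stop at the depth limit, for small nodes, or when the node is already pure
    if depth >= max_depth or len(rows) <= min_size or len(np.unique(labels)) == 1:
        return majority(labels)
    split = best_split(rows)
    if split is None:
        return majority(labels)
    f, t = split
    mask = rows[:, f] < t
    return {"feature": f, "threshold": t,
            "left": build_tree(rows[mask], max_depth, min_size, depth + 1),
            "right": build_tree(rows[~mask], max_depth, min_size, depth + 1)}

def predict(node, row):
    """Walk down the tree until a leaf (a plain label) is reached."""
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] < node["threshold"] else node["right"]
    return node

rows = np.array([[1, 0], [2, 0], [3, 0], [10, 1], [11, 1], [12, 1]], dtype=np.float32)
tree = build_tree(rows, max_depth=8, min_size=1)
print(predict(tree, np.array([2.5])), predict(tree, np.array([11.5])))
```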

In [22]:
tree = dt.build_tree(train, 8, 1)
y_pred = [dt.predict(tree, row) for row in test]
predictions["Decision Tree"] = (y_test, y_pred)

#### Predict With sklearn Method

In [23]:
tree_clf = DecisionTreeClassifier()
tree_clf.fit(X_train, y_train)
y_pred = tree_clf.predict(X_test)
predictions["Decision Tree sklearn"] = (y_test, y_pred)

### 3.4 Random Forest Classifier

#### Required Imports

In [24]:
import os
import pandas as pd
import numpy as np

import DecisionTree as dt
import RandomForest as rf

from sklearn.ensemble import RandomForestClassifier

#### Prepare Data

In [25]:
X_train, X_test, y_train, y_test = train_test_split(data["DataFrame Features"], 
                                                    data["DataFrame Classes"], 
                                                    test_size=0.2, 
                                                    random_state=42)

train = pd.concat([X_train, y_train], axis=1).to_numpy(dtype='float32')
test = pd.concat([X_test, y_test], axis=1).to_numpy(dtype='float32')

#### Predict With Self-Implemented Method
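The `RandomForest` module is likewise not shown. The call `rf.build_forest(train, max_depths, min_sizes)` suggests one tree per entry of `max_depths`; a minimal sketch under that assumption combines bootstrap sampling, a random feature subset at every split (the square root of the number of features, a common default) and a majority vote. The helper names and the fixed seed are hypothetical, and the label is again assumed to be in the last column:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility of the sketch

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def majority(labels):
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]

def build_random_tree(rows, max_depth, min_size, n_features, depth=0):
    labels = rows[:, -1]
    if depth >= max_depth or len(rows) <= min_size or len(np.unique(labels)) == 1:
        return majority(labels)
    # Only a random subset of the features is considered for this split
    candidates = rng.choice(rows.shape[1] - 1, size=n_features, replace=False)
    best, best_score = None, np.inf
    for f in candidates:
        for t in np.unique(rows[:, f]):
            mask = rows[:, f] < t
            if mask.all() or not mask.any():
                continue
            score = (mask.sum() * gini(labels[mask]) + (~mask).sum() * gini(labels[~mask])) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    if best is None:
        return majority(labels)
    f, t = best
    mask = rows[:, f] < t
    return (f, t,
            build_random_tree(rows[mask], max_depth, min_size, n_features, depth + 1),
            build_random_tree(rows[~mask], max_depth, min_size, n_features, depth + 1))

def predict_tree(node, row):
    while isinstance(node, tuple):  # inner nodes are (feature, threshold, left, right)
        f, t, left, right = node
        node = left if row[f] < t else right
    return node

def build_forest(train, max_depths, min_sizes):
    n_features = max(1, int(np.sqrt(train.shape[1] - 1)))
    forest = []
    for max_depth, min_size in zip(max_depths, min_sizes):
        # Bootstrap sample: len(train) rows drawn with replacement
        sample = train[rng.integers(0, len(train), size=len(train))]
        forest.append(build_random_tree(sample, max_depth, min_size, n_features))
    return forest

def predict(forest, row):
    """Majority vote over the predictions of all trees."""
    return majority(np.array([predict_tree(tree, row) for tree in forest]))

low, high = np.linspace(1, 3, 20), np.linspace(10, 12, 20)
train = np.vstack([np.column_stack([low, low, np.zeros(20)]),
                   np.column_stack([high, high, np.ones(20)])])
forest = build_forest(train, [2, 3, 4], [1, 1, 1])
print(predict(forest, np.array([0.0, 0.0])), predict(forest, np.array([20.0, 20.0])))
```

Bootstrapping and random feature subsets decorrelate the trees, so the vote is usually more stable than any single tree.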

In [26]:
# Maximum tree depths (TODO: double-check these values)
max_depths = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
# Minimum node size, the same value for every entry of max_depths
min_sizes = [3] * len(max_depths)
forest = rf.build_forest(train, max_depths, min_sizes)

In [27]:
y_pred = [rf.predict(forest, row) for row in test]
predictions["Random Forest"] = (y_test, y_pred)

#### Predict With sklearn Method

In [28]:
forest_clf = RandomForestClassifier(max_depth=20, random_state=0)
forest_clf.fit(X_train, y_train)
y_pred = forest_clf.predict(X_test)
predictions["Random Forest sklearn"] = (y_test, y_pred)

### 3.5 Convolutional Neural Network

#### Required Imports

In [29]:
from tensorflow.keras import models

#### Predict With Previously Self-Trained Network

In [33]:
# Load previously trained model
model = models.load_model('models/v4')
y_pred = np.argmax(model.predict(data["Test Images"]), axis=1)
predictions["CNN"] = (data["Test Labels"], y_pred)

### 3.6 Transfer Learning With a CNN

#### Required Imports

In [31]:
from tensorflow.keras import models

#### Predict With Previously Adapted and Self-Trained Transfer Learning Network

In [None]:
model = models.load_model('models/tl_v1')
y_pred = np.argmax(model.predict(data["Test Images"]), axis=1)
predictions["CNN Transfer Learning"] = (data["Test Labels"], y_pred)

### 3.7 Support Vector Machine
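scikit-learn's `SVC` uses the RBF kernel K(x, x') = exp(-gamma * ||x - x'||^2) by default, so `gamma=0.002` sets how slowly similarity decays with distance and `C=1000` penalizes margin violations heavily. Because the kernel works on raw Euclidean distances, features on very different scales (for example corner counts versus ratios) dominate the distance unless they are standardized. The NumPy sketch below only evaluates the kernel to illustrate the effect of `gamma`; it is not how scikit-learn computes it internally, and `rbf_kernel` is a hypothetical helper name:

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2) for row vectors A[i] and B[j]."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

# Two points 10 units apart: similarity stays high for small gamma and vanishes for large gamma
a = np.array([[0.0, 0.0]])
b = np.array([[6.0, 8.0]])
for gamma in (0.002, 0.1, 1.0):
    print(gamma, rbf_kernel(a, b, gamma)[0, 0])
```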

#### Required Imports

In [39]:
from sklearn.svm import SVC

In [41]:
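# X_train/X_test are the 80/20 feature split (random_state=42) from the Random Forest section
svm_clf = SVC(verbose=1, C=1000, gamma=0.002)
svm_clf.fit(X_train, y_train)
y_pred = svm_clf.predict(X_test)
predictions["Support Vector Machine"] = (y_test, y_pred)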

[LibSVM]

## 4. Compare Some Scores
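`HelperFunctions.get_scores` is not shown in this notebook. For every method whose recall is printed below, recall equals accuracy, which is exactly what support-weighted averaging produces: the weighted recall sum_c (n_c / N) * (TP_c / n_c) reduces to (sum_c TP_c) / N. A NumPy sketch of accuracy and weighted precision, recall and F1 under that assumption; the function name is hypothetical, and a class that is never predicted gets a precision of 0:

```python
import numpy as np

def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 over the true classes."""
    y_true, y_pred = np.asarray(y_true).ravel(), np.asarray(y_pred).ravel()
    classes, support = np.unique(y_true, return_counts=True)
    weights = support / support.sum()
    precision, recall, f1 = [], [], []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        predicted = np.sum(y_pred == c)
        p = tp / predicted if predicted else 0.0
        r = tp / np.sum(y_true == c)
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision": float(np.dot(weights, precision)),
        "recall": float(np.dot(weights, recall)),
        "f1": float(np.dot(weights, f1)),
    }

print(weighted_scores([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]))
```

Note that the F1 score is averaged per class, so the weighted F1 is generally not the harmonic mean of the weighted precision and recall.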

In [34]:
import HelperFunctions

In [42]:
for method in predictions:
    print("-----------------------------------------------------------------------------")
    print(f"{method}: ")
    print(HelperFunctions.get_scores(predictions[method][0], predictions[method][1]))

-----------------------------------------------------------------------------
Naive Bayes: 
Accuracy:	0.545
Precision:	0.4614444444444444
Loss:		-1
Recall:		0.545
F1-Score:	0.44140889225209645
-----------------------------------------------------------------------------
Naive Bayes sklearn: 
Accuracy:	0.545
Precision:	0.4614444444444444
Loss:		-1
Recall:		0.545
F1-Score:	0.44140889225209645
-----------------------------------------------------------------------------
Decision Tree: 
Accuracy:	0.47368421052631576
Precision:	0.5150636913804424
Loss:		-1
Recall:		0.47368421052631576
F1-Score:	0.4880154053086384
-----------------------------------------------------------------------------
Decision Tree sklearn: 
Accuracy:	0.42857142857142855
Precision:	0.5232350718065004
Loss:		-1
Recall:		0.42857142857142855
F1-Score:	0.46024563671622487
-----------------------------------------------------------------------------
Random Forest: 
Accuracy:	0.46616541353383456
Precision:	0.511310777304376


## 5. Presentation

#### Required Imports

In [36]:
import HelperFunctions
import modules.feature_extraction as fe


%load_ext autoreload
%autoreload 2

#### Load the Data Provided by the Examiner

In [37]:
# Load the presentation images; the label is the name of each image's parent directory
testpaths, classes = HelperFunctions.load_images(os.path.join("data", "images", "presentation"))

X_presentation = []
y_presentation = []

for path in testpaths:
    # cv2.COLOR_BGR2RGB is a color conversion code, not an imread flag; the image is read in BGR order
    img = cv2.imread(path, cv2.IMREAD_COLOR)
    img = DataAugmentation.resizeAndPad(img, IMG_SIZE)
    X_presentation.append(img)
    y_presentation.append(classes[path.split(os.sep)[-2]])

# Store the presentation data (previously the test arrays were stored here by mistake)
data["Presentation Images"] = np.array(X_presentation)
data["Presentation Labels"] = np.array(y_presentation)

print(f"Presentation images: {data['Presentation Images'].shape}, presentation labels: {data['Presentation Labels'].shape}.")

features = fe.extract_features(testpaths, classes, display_imgs=True)  # optional: show_all=True
features.info()

UnboundLocalError: local variable 'classes' referenced before assignment