## Bank note authentication

Problem statement: The data were extracted from images of genuine and forged banknote specimens. The notes were digitized with an industrial camera of the kind normally used for print inspection. The final images are 400 x 400 pixels. Given the object lens and the distance to the investigated object, the resulting gray-scale pictures have a resolution of about 660 dpi. A wavelet transform tool was used to extract features from the images.

The dataset can be used for sample binary classification problems.

In [3]:
#Dataset: https://www.kaggle.com/datasets/ritesaluja/bank-note-authentication-uci-data

import pandas as pd
import numpy as np
import os

In [2]:
data_dir = "./data/"

In [4]:
df = pd.read_csv(os.path.join(data_dir,'BankNote_Authentication.csv'))

In [7]:
df

,variance,skewness,curtosis,entropy,class
0,3.62160,8.66610,-2.8073,-0.44699,0
1,4.54590,8.16740,-2.4586,-1.46210,0
2,3.86600,-2.63830,1.9242,0.10645,0
3,3.45660,9.52280,-4.0112,-3.59440,0
4,0.32924,-4.45520,4.5718,-0.98880,0
...,...,...,...,...,...
1367,0.40614,1.34920,-1.4501,-0.55949,1
1368,-1.38870,-4.87730,6.4774,0.34179,1
1369,-3.75030,-13.45860,17.5932,-2.77710,1
1370,-3.56370,-8.38270,12.3930,-1.28230,1


In [8]:
# Independent and dependent features

X = df.iloc[:,:-1]
y = df.iloc[:,-1]
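
Selecting by position works here because `class` is the last column. A minimal sketch of a name-based alternative follows, using a small hand-made frame whose rows are copied from the output above. The names `demo`, `X_demo` and `y_demo` are illustrative and not part of the notebook.

```python
import pandas as pd

# Five rows copied from the printed dataset above, for illustration only.
demo = pd.DataFrame(
    {
        "variance": [3.62160, 4.54590, 3.86600, 0.40614, -1.38870],
        "skewness": [8.66610, 8.16740, -2.63830, 1.34920, -4.87730],
        "curtosis": [-2.8073, -2.4586, 1.9242, -1.4501, 6.4774],
        "entropy": [-0.44699, -1.46210, 0.10645, -0.55949, 0.34179],
        "class": [0, 0, 0, 1, 1],
    }
)

# Select the target by name, so a reordered file does not silently change the split.
X_demo = demo.drop(columns="class")
y_demo = demo["class"]

print(list(X_demo.columns))
print(y_demo.tolist())
```

If the column order in the CSV ever changes, the name-based version keeps selecting the same target.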

In [9]:
X.head()

,variance,skewness,curtosis,entropy
0,3.6216,8.6661,-2.8073,-0.44699
1,4.5459,8.1674,-2.4586,-1.4621
2,3.866,-2.6383,1.9242,0.10645
3,3.4566,9.5228,-4.0112,-3.5944
4,0.32924,-4.4552,4.5718,-0.9888


In [10]:
y.head()

0    0
1    0
2    0
3    0
4    0
Name: class, dtype: int64

In [11]:
# Train test split
from sklearn.model_selection import train_test_split

In [20]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

In [21]:
print(f"X_train: {X_train.shape}\nX_test: {X_test.shape}\ny_train: {y_train.shape}\ny_test: {y_test.shape}")

X_train: (960, 4)
X_test: (412, 4)
y_train: (960,)
y_test: (412,)
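
The split above does not pass `stratify`, so the class proportions in the train and test parts may differ slightly. A sketch of a stratified split is shown below on a synthetic stand-in with the same number of rows. The random features and the 800/572 class balance are made up for illustration and are not read from the file.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1372 rows, 4 features; the class balance here is made up.
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(1372, 4))
y_syn = np.array([0] * 800 + [1] * 572)

# stratify=y_syn keeps the class ratio nearly equal in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=0, stratify=y_syn
)

print(X_tr.shape, X_te.shape)
print(round(y_tr.mean(), 3), round(y_te.mean(), 3))
```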


In [28]:
y_test

1023    1
642     0
1196    1
31      0
253     0
       ..
986     1
271     0
1266    1
769     1
634     0
Name: class, Length: 412, dtype: int64

In [22]:
# Implementing Random Forest classifier
from sklearn.ensemble import RandomForestClassifier
rfclassifier = RandomForestClassifier()
rfclassifier.fit(X_train,y_train)

RandomForestClassifier()
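
`RandomForestClassifier()` is created here without a `random_state`, so the fitted forest, and possibly the score, can vary from run to run. A sketch of pinning the seed follows, using synthetic data from `make_classification`. The data, the seed value 0 and `n_estimators=50` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 4-feature binary problem standing in for the banknote features.
X_syn, y_syn = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Two forests built with the same seed on the same data are trained identically.
rf_a = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_syn, y_syn)
rf_b = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_syn, y_syn)

same = bool((rf_a.predict_proba(X_syn) == rf_b.predict_proba(X_syn)).all())
print(same)
```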

In [24]:
# IPython only: `??` displays the docstring and source of the class
RandomForestClassifier??

In [26]:
# Prediction:

y_pred = rfclassifier.predict(X_test)

In [31]:
y_pred_df = pd.DataFrame(list(y_pred), index= y_test.index)

In [33]:
y_pred_df.columns = ['y_pred']

In [35]:
df_y = pd.merge(left=y_test, right = y_pred_df, left_index=True, right_index=True)

In [39]:
df_y.shape

(412, 2)

In [38]:
df_y[df_y['class']== df_y['y_pred']]

,class,y_pred
1023,1,1
642,0,0
1196,1,1
31,0,0
253,0,0
...,...,...
986,1,1
271,0,0
1266,1,1
769,1,1


In [42]:
# From the output above, the model predicted 408 of the 412 test samples correctly.
# The accuracy therefore works out to:
perf = (df_y[df_y['class']== df_y['y_pred']].shape[0]/ df_y.shape[0])*100
print(f"Performance: {round(perf,2)}%")

Performance: 99.03%
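
The same figure can be reached without building a merged frame, by comparing the two label vectors element-wise. A small sketch follows, on made-up labels rather than the notebook's `y_test` and `y_pred`. The last line only re-checks the 408 / 412 arithmetic quoted above.

```python
import numpy as np

# Made-up labels for illustration.
y_true_demo = np.array([1, 0, 1, 0, 0, 1])
y_pred_demo = np.array([1, 0, 1, 0, 1, 1])

# The element-wise comparison gives a boolean mask; its sum is the number of hits.
correct = int((y_true_demo == y_pred_demo).sum())
accuracy_pct = round(100 * correct / len(y_true_demo), 2)
print(correct, accuracy_pct)

# Arithmetic check of the figure reported above: 408 correct out of 412.
reported_pct = round(408 / 412 * 100, 2)
print(reported_pct)
```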


In [43]:
# Checking accuracy using sklearn
from sklearn.metrics import accuracy_score
score = accuracy_score(y_test,y_pred)

In [46]:
round(score*100, 2)

99.03
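
Accuracy alone does not show which class the few misclassified samples belong to. A sketch of a per-class breakdown with `confusion_matrix` is given below, again on small made-up label vectors rather than the notebook's own results.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration; 0 and 1 are the two classes.
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred_demo = [0, 0, 0, 1, 1, 1, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_demo, y_pred_demo)
acc = accuracy_score(y_true_demo, y_pred_demo)

print(cm)
print(acc)
```

The off-diagonal entries count the errors in each direction, which a single accuracy number hides.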

In [48]:
# Serialize the trained model to a pickle file

import pickle
with open(os.path.join(data_dir, "rfclassifier.pkl"), "wb") as pickle_out:
    pickle.dump(rfclassifier, pickle_out)
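
To use the saved model later, the file can be read back with `pickle.load`. Below is a self-contained sketch of the round trip, using a temporary directory and a small synthetic model. The data and the `n_estimators=20` setting are illustrative assumptions. Pickle files should only be loaded from trusted sources.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic model standing in for the trained classifier.
X_syn, y_syn = make_classification(n_samples=200, n_features=4, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_syn, y_syn)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rfclassifier.pkl")
    # Write the model to disk, then read it back.
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions.
round_trip_ok = bool((restored.predict(X_syn) == model.predict(X_syn)).all())
print(round_trip_ok)
```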