# Signature Forgery Detection

This assignment explores ways to identify whether a signature image is **real** or **forged**.

## MSE Approach

One way to approach the problem is to compute the **mean squared error (MSE)** between two images. MSE measures how different the two images are: the lower the MSE, the more similar they are, and an MSE of 0 means the images are identical.
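To make the idea concrete, here is a minimal sketch of the MSE calculation on two tiny arrays. The function name `mse_sketch` and the 2x2 "images" are made up for illustration and are not part of the dataset.

```python
import numpy as np

def mse_sketch(a, b):
    """Mean of the squared per-pixel differences of two same-shaped 2D arrays."""
    # Convert to float first so that uint8 subtraction cannot wrap around
    diff = a.astype("float") - b.astype("float")
    return float(np.mean(diff ** 2))

# Two hypothetical 2x2 grayscale "images" that differ in a single pixel
img_a = np.array([[0, 10], [20, 30]], dtype=np.uint8)
img_b = np.array([[0, 10], [20, 40]], dtype=np.uint8)

print(mse_sketch(img_a, img_a))  # 0.0  (identical images)
print(mse_sketch(img_a, img_b))  # 25.0 (one pixel differs by 10: 10**2 / 4 pixels)
```

Only one of the four pixels differs, so the squared error of 100 is averaged over 4 pixels.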

The dataset is already split into a **train set** and a **test set**, so we do not need to split it further. We will use the existing split in our detection.

As the dataset consists of pictures (.png), we first have to convert them into numerical data before we can apply the **MSE** approach. One way is to load each picture as a 2D array of pixel intensities (the code below reads the images in grayscale).
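As a rough illustration of what "picture as an array" means, the sketch below builds a synthetic image in NumPy instead of reading a file from the dataset. The plain channel average used here is an assumption made for simplicity; image libraries typically use a weighted sum of the channels when converting to grayscale.

```python
import numpy as np

# Hypothetical 2x3 colour image: height x width x 3 channels
colour = np.zeros((2, 3, 3), dtype=np.uint8)
colour[0, 0] = (255, 255, 255)  # one white pixel, the rest black

# Collapse the channel axis so that each pixel holds a single intensity
gray = colour.mean(axis=2).astype(np.uint8)

print(colour.shape)            # (2, 3, 3)
print(gray.shape)              # (2, 3)
print(gray[0, 0], gray[0, 1])  # 255 0
```

The 2D grayscale form is what an MSE function that divides by height times width expects.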



## Converting images to 2D array

We will use an existing library, **OpenCV**, to achieve this.

___Install the OpenCV package (for example with `pip install opencv-python`) if you encounter the error___ `ImportError: No module named cv2`

In [1]:
# Import libraries
import cv2 
import numpy as np
from matplotlib import pyplot as plt
from PIL import Image


# Define folder structure as a constant
DATASET_FOLDER = "../Dataset"
TEST_SET_FOLDER = "/test"
TRAIN_SET_FOLDER = "/train"

# Define MSE function
def mse(imageA, imageB):
    # the 'Mean Squared Error' between the two images is the
    # sum of the squared per-pixel differences divided by the
    # number of pixels;
    # NOTE: the two images must have the same dimensions
    err = np.sum((imageA.astype("float") - imageB.astype("float")) ** 2)
    err /= float(imageA.shape[0] * imageA.shape[1])

    # return the MSE, the lower the error, the more "similar"
    # the two images are
    return err

# Check the image dimensions and resize if they are not the same
def image_resize(imageA, imageB):
    if imageA.shape != imageB.shape:
        # Scale the "smaller" picture up to the shape of the other one.
        # NOTE: shapes are tuples, so `<` compares them lexicographically
        # (height first, then width), not by pixel count.
        if imageA.shape < imageB.shape:
            # cv2.resize expects the target size as (width, height)
            dim = (imageB.shape[1], imageB.shape[0])
            imageA = cv2.resize(imageA, dim)
        elif imageB.shape < imageA.shape:
            dim = (imageA.shape[1], imageA.shape[0])
            imageB = cv2.resize(imageB, dim)

    return imageA, imageB


# Load one genuine signature and its forgery in grayscale (flag 0)
# NOTE: cv2.imread returns None if the file cannot be read
img_001_real = cv2.imread(DATASET_FOLDER + TRAIN_SET_FOLDER + '/real/001/001_01.png', 0)
img_001_forged = cv2.imread(DATASET_FOLDER + TRAIN_SET_FOLDER + '/forged/001_forg/0119001_01.png', 0)

# Resize the images if they are not the same size
img_001_real, img_001_forged = image_resize(img_001_real, img_001_forged)

# Calculate MSE of signature 1 and its forged
MSE = mse(img_001_real,img_001_forged)
print("The MSE of img_001_real and img_001_forge is:\t", MSE)


# Load a second genuine signature (002) in grayscale and match the sizes
img_002_real = cv2.imread(DATASET_FOLDER + TRAIN_SET_FOLDER + '/real/002/002_02.png', 0)
img_001_real, img_002_real = image_resize(img_001_real, img_002_real)

# Calculate MSE of signature 1 and signature 2
MSE = mse(img_001_real,img_002_real)
print("The MSE of img_001_real and img_002_real is:\t", MSE)

# Calculate MSE of signature 1 and itself
MSE = mse(img_001_real,img_001_real)
print("The MSE of img_001_real and itself is:\t", MSE)

The MSE of img_001_real and img_001_forge is:	 900.9136797617904
The MSE of img_001_real and img_002_real is:	 1183.9209835750648
The MSE of img_001_real and itself is:	 0.0


As shown above, the MSE between the real signature and its forgery (about 901) is below **1000**, whereas the MSE between signature 1 and signature 2 (about 1184) is above **1000**. Additionally, an MSE of **0** means the images are identical.

MSE is therefore not an ideal way to identify forgery: if an MSE **below 1000** is taken to mean *similar/genuine*, the forged signature above would be classified as genuine, which is a **false positive**.
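The false positive can be reproduced with a simple threshold rule. This is only a sketch: the helper `classify_by_mse` and its labels are hypothetical, and the two MSE values are copied from the output printed earlier.

```python
# MSE values copied from the printed output earlier in this notebook
MSE_REAL_VS_FORGED = 900.9136797617904
MSE_SIG1_VS_SIG2 = 1183.9209835750648

THRESHOLD = 1000  # cut-off discussed in the text

def classify_by_mse(mse_value, threshold=THRESHOLD):
    """Label a comparison as 'genuine' when its MSE is below the threshold."""
    return "genuine" if mse_value < threshold else "not genuine"

print(classify_by_mse(MSE_REAL_VS_FORGED))  # genuine (although the image is a forgery)
print(classify_by_mse(MSE_SIG1_VS_SIG2))    # not genuine
```

With this rule the forgery passes as genuine, which is the false positive described above.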

We need another way to measure the similarity of images, so we take **SSIM** as a second approach.

## SSIM Approach

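**SSIM**, which stands for Structural Similarity Index Measure, is another method for measuring the similarity between two images. Like MSE, it compares two images, but rather than only measuring the raw pixel error it also takes luminance, contrast, and structure into account.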
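To give a feel for what SSIM computes, here is a simplified sketch that evaluates the SSIM formula once over the whole image. This is an assumption made for brevity: library implementations such as scikit-image's `structural_similarity` evaluate the formula over sliding local windows and average the results, so their values will differ. The function name `global_ssim` and the random test images are hypothetical.

```python
import numpy as np

def global_ssim(a, b, data_range=255.0):
    """Single-window SSIM over the whole image (no sliding window)."""
    a = a.astype("float")
    b = b.astype("float")
    # Stabilising constants with the commonly used K1 = 0.01 and K2 = 0.03
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()                # luminance terms
    var_a, var_b = a.var(), b.var()                # contrast terms
    cov_ab = ((a - mu_a) * (b - mu_b)).mean()      # structure term
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov_ab + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(numerator / denominator)

# Hypothetical test images: random noise and its inverse
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32)).astype(np.uint8)
inverted = 255 - img

print(global_ssim(img, img))       # approximately 1.0 for identical images
print(global_ssim(img, inverted))  # much lower, since the structure is reversed
```

Unlike MSE, a higher SSIM means the images are more similar, with 1 reached only for identical images.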