Python script that tackles the problem of Duplicate Product Detection!
Python 3.6, numpy 1.13.3, pandas 0.23.0, spaCy 2.0.11
The problem statement and the dataset can be found here
* duplicateDetection.ipynb: This notebook contains the entire script that detects duplicate products
in the given dataset.
* duplicateProducts.json: This JSON file contains a dictionary with a Product ID as key and a list of tuples
of duplicate Product IDs as value. It is the output generated by running the script on 50 products.
NOTE: Although I ran the script only on 50 products, it does scale to cover all the Products in the given dataset.
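
As a rough illustration of how duplicateProducts.json could be read back, here is a minimal sketch. It assumes only what is described above: a top-level dictionary keyed by Product ID whose values are lists of tuples. The function names `load_duplicates` and `summarize` are hypothetical and not part of the notebook, and the exact contents of each tuple are not assumed.

```python
import json


def load_duplicates(path):
    """Load the Product ID -> duplicates mapping from a JSON file.

    Assumption: the file holds a dictionary whose values are lists.
    JSON has no tuple type, so tuples written by Python come back as
    lists; they are converted back to tuples here.
    """
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)

    duplicates = {}
    for product_id, entries in raw.items():
        # JSON object keys are always strings, so product_id is a str.
        duplicates[product_id] = [
            tuple(entry) if isinstance(entry, list) else (entry,)
            for entry in entries
        ]
    return duplicates


def summarize(duplicates):
    """Return (total products, products with at least one duplicate entry)."""
    with_duplicates = sum(1 for entries in duplicates.values() if entries)
    return len(duplicates), with_duplicates
```

Calling `load_duplicates("duplicateProducts.json")` and passing the result to `summarize` would give a quick count of how many of the 50 processed products have at least one duplicate listed.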