Python script that tackles the problem of Duplicate Product Detection!

viratc/duplicateProductDetection

duplicateProductDetection

Python script that tackles the problem of Duplicate Product Detection!

Getting Started

Prerequisites

Python 3.6, NumPy 1.13.3, pandas 0.23.0, spaCy 2.0.11

Dataset

The problem statement and the dataset can be found here

File Description

* duplicateDetection.ipynb: A Jupyter notebook containing the entire script that tackles duplicate product detection on the given dataset.

* duplicateProducts.json: A JSON file containing a dictionary that maps each product ID (key) to a list of tuples of duplicate product IDs (value). It was generated by running the script on 50 products.

NOTE: Although I ran the script on only 50 products, it scales to cover all the products in the given dataset.
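
As a rough illustration of how duplicateProducts.json might be consumed, the sketch below parses a mapping from product ID to a list of duplicate entries. It is not taken from the notebook. The helper names `parse_duplicates` and `load_duplicates`, the sample IDs (`P1`, `P2`, ...), and the exact shape of each entry are assumptions. JSON has no tuple type, so each tuple is expected to arrive as a JSON array and is converted back to a Python tuple here.

```python
import json
from typing import Dict, List, Tuple


def parse_duplicates(text: str) -> Dict[str, List[Tuple]]:
    """Parse JSON text mapping a product ID to a list of duplicate entries.

    Assumption: each entry is a JSON array (a serialized tuple); a bare
    value is wrapped in a one-element tuple so callers see a uniform type.
    """
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object keyed by product ID")
    result: Dict[str, List[Tuple]] = {}
    for product_id, entries in raw.items():
        result[product_id] = [
            tuple(entry) if isinstance(entry, list) else (entry,)
            for entry in entries
        ]
    return result


def load_duplicates(path: str) -> Dict[str, List[Tuple]]:
    """Read and parse a file such as duplicateProducts.json."""
    with open(path, encoding="utf-8") as handle:
        return parse_duplicates(handle.read())


# Hypothetical sample data; the real file's IDs and tuple contents may differ.
sample = '{"P1": [["P2", "P7"]], "P3": []}'
duplicates = parse_duplicates(sample)

# Keep only the products that have at least one duplicate entry.
products_with_duplicates = [pid for pid, entries in duplicates.items() if entries]
```

To read the generated file itself, `load_duplicates("duplicateProducts.json")` would be the entry point, assuming the file sits in the working directory.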
