Skip to content

tim5go/quora-question-pairs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 

Repository files navigation

Quora Duplicated Question Challenge

Description

The Quora dataset consists of over 400,000 lines of potential question duplicate pairs. Each line contains IDs for each question in the pair, the full text for each question, and a binary value that indicates whether the line truly contains a duplicate pair. Here are a few sample lines of the dataset:

Alt text

Architecture

A deep-learning (LSTM) approach is being used here. First of all, a pre-trained GoogleNews word embeddings is used to generate question embeddings for the two questions, and then fed those question embeddings into a representation layer. Then concatenate the two vector representation outputs from the representation layers and fed the concatenated vector into a dense layer to produce the final classification outcome. Here's a graphical representation of this approach:

Alt text

About

Quora Duplicated Question Challenge (Kaggle Competition)

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages