Quora Duplicated Question Challenge

Description

The Quora dataset consists of over 400,000 lines of potential duplicate question pairs. Each line contains the IDs of the two questions in the pair, the full text of each question, and a binary value indicating whether the pair is truly a duplicate. Here are a few sample lines of the dataset:
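A minimal sketch of reading pairs in this format follows. It assumes a CSV file whose header is `id,qid1,qid2,question1,question2,is_duplicate` (the layout of the Kaggle Quora Question Pairs training file; treat it as an assumption for this repository's copy). `QuestionPair` and `load_pairs` are hypothetical names, and the two sample rows are invented toy data, not lines from the real dataset.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class QuestionPair(NamedTuple):
    qid1: int
    qid2: int
    question1: str
    question2: str
    is_duplicate: bool


def load_pairs(fileobj: Iterable[str]) -> List[QuestionPair]:
    """Parse rows with the assumed header id,qid1,qid2,question1,question2,is_duplicate."""
    pairs = []
    for row in csv.DictReader(fileobj):
        pairs.append(QuestionPair(
            qid1=int(row["qid1"]),
            qid2=int(row["qid2"]),
            question1=row["question1"],
            question2=row["question2"],
            is_duplicate=row["is_duplicate"] == "1",  # binary label stored as "0"/"1"
        ))
    return pairs


# Toy rows invented for illustration only (not taken from the real dataset).
SAMPLE = '''id,qid1,qid2,question1,question2,is_duplicate
0,1,2,"How do I learn Python?","What is the best way to learn Python?",1
1,3,4,"How tall is Mount Everest?","Who wrote Hamlet?",0
'''

if __name__ == "__main__":
    for pair in load_pairs(io.StringIO(SAMPLE)):
        print(pair.is_duplicate, "|", pair.question1, "|", pair.question2)
```

For the real file, pass an open file handle (opened with `newline=""`, as the `csv` module recommends) instead of the `io.StringIO` wrapper.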

Architecture

A deep-learning (LSTM) approach is used here. First, pre-trained GoogleNews word embeddings are used to generate embeddings for the two questions, and these question embeddings are fed into a representation layer. The two vector representations output by the representation layers are then concatenated, and the concatenated vector is fed into a dense layer that produces the final classification. Here's a graphical representation of this approach:
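The pipeline described above can be sketched as a forward pass in plain Python. This is an illustration under stated assumptions, not the code in LSTM.py: toy 4-dimensional vectors from the hypothetical `make_toy_vectors` stand in for the 300-dimensional GoogleNews embeddings, each question gets its own single-layer LSTM whose final hidden state is its representation, out-of-vocabulary words are skipped, and the dense layer is a single sigmoid unit. `LSTMEncoder`, `DuplicateClassifier`, and `embed` are hypothetical names, and the weights are random and untrained, so the printed probability carries no meaning.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vs: Vector) -> Vector:
    # Element-wise sum of equally sized vectors.
    return [sum(xs) for xs in zip(*vs)]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_toy_vectors(vocab: List[str], dim: int, seed: int) -> Dict[str, Vector]:
    # Hypothetical stand-in for the pre-trained GoogleNews word2vec vectors.
    rng = random.Random(seed)
    return {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}


def embed(question: str, vectors: Dict[str, Vector]) -> List[Vector]:
    # Question embedding = sequence of word vectors; unknown words are skipped (assumption).
    return [vectors[t] for t in question.lower().split() if t in vectors]


class LSTMEncoder:
    """Representation layer: single-layer LSTM; the last hidden state encodes the question."""

    GATES = ("i", "f", "c", "o")  # input, forget, cell candidate, output

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.hidden_dim = hidden_dim
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in self.GATES}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in self.GATES}
        self.b = {g: [0.0] * hidden_dim for g in self.GATES}

    def encode(self, seq: List[Vector]) -> Vector:
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in seq:
            pre = {g: add(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g]) for g in self.GATES}
            i = [sigmoid(z) for z in pre["i"]]
            f = [sigmoid(z) for z in pre["f"]]
            o = [sigmoid(z) for z in pre["o"]]
            cand = [math.tanh(z) for z in pre["c"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


class DuplicateClassifier:
    """Two representation layers -> concatenation -> dense sigmoid unit."""

    def __init__(self, vectors: Dict[str, Vector], hidden_dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        input_dim = len(next(iter(vectors.values())))
        self.vectors = vectors
        self.encoder1 = LSTMEncoder(input_dim, hidden_dim, rng)
        self.encoder2 = LSTMEncoder(input_dim, hidden_dim, rng)
        self.dense_w = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]
        self.dense_b = 0.0

    def predict_proba(self, q1: str, q2: str) -> float:
        r1 = self.encoder1.encode(embed(q1, self.vectors))
        r2 = self.encoder2.encode(embed(q2, self.vectors))
        merged = r1 + r2  # concatenation of the two representations
        logit = sum(w * x for w, x in zip(self.dense_w, merged)) + self.dense_b
        return sigmoid(logit)


if __name__ == "__main__":
    vocab = "how do i learn python what is the best way to".split()
    model = DuplicateClassifier(make_toy_vectors(vocab, dim=4, seed=42), hidden_dim=8)
    p = model.predict_proba("How do I learn Python", "What is the best way to learn Python")
    print(f"P(duplicate) = {p:.3f}")  # untrained weights, so this value is arbitrary
```

Sharing one encoder between both questions (a Siamese setup) is a common alternative to two separate representation layers; the description above does not say which variant is used. Training would typically fit all weights with a binary cross-entropy loss against the duplicate labels, which in practice is left to a deep-learning framework rather than hand-written code like this.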

tim5go/quora-question-pairs

Folders and files

Latest commit

History

Repository files navigation

Quora Duplicated Question Challenge

Description

Architecture

About

Topics

Resources

Stars

Watchers

Forks

Languages