# Quora Insincere Questions Classification Using Neural Networks and Deep Learning Models
Outline:
1. Download data from Kaggle to Google Colab

2. Prepare the data for modeling using the TF-IDF technique

3. Train a deep learning model using the `PyTorch` package

## Download Data from Kaggle

In [2]:
!ls

kaggle.json  sample_data


In [None]:
!pwd

/content


In [3]:
import os

In [4]:
# Point the Kaggle CLI at the directory that holds kaggle.json
os.environ['KAGGLE_CONFIG_DIR'] = '/content'
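
Before running the download commands, it can help to confirm that the credentials file is where the Kaggle CLI will look for it. The sketch below is an optional check and not part of the original notebook. The helper name `find_kaggle_credentials` is hypothetical, and the code assumes the CLI reads `kaggle.json` from the directory named by `KAGGLE_CONFIG_DIR`, falling back to `~/.kaggle`.

```python
import os
from pathlib import Path


def find_kaggle_credentials(config_dir=None):
    """Return the path to kaggle.json, or None if it is missing.

    Hypothetical helper: uses KAGGLE_CONFIG_DIR when no directory is given,
    and falls back to ~/.kaggle when the variable is unset.
    """
    if config_dir is None:
        config_dir = os.environ.get("KAGGLE_CONFIG_DIR", str(Path.home() / ".kaggle"))
    candidate = Path(config_dir) / "kaggle.json"
    return candidate if candidate.is_file() else None


creds = find_kaggle_credentials()
if creds is None:
    print("kaggle.json not found; upload it before downloading the data")
else:
    print(f"Using credentials at {creds}")
```

If the file is missing, the `kaggle competitions download` commands are likely to fail with an authentication error, so checking first can save a confusing traceback.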

In [5]:
!kaggle competitions download -c quora-insincere-questions-classification -f train.csv -p data
!kaggle competitions download -c quora-insincere-questions-classification -f test.csv -p data
!kaggle competitions download -c quora-insincere-questions-classification -f sample_submission.csv -p data

Downloading train.csv to data
  0% 0.00/54.9M [00:00<?, ?B/s]
100% 54.9M/54.9M [00:00<00:00, 772MB/s]
Downloading test.csv to data
  0% 0.00/15.8M [00:00<?, ?B/s]
100% 15.8M/15.8M [00:00<00:00, 965MB/s]
Downloading sample_submission.csv to data
  0% 0.00/4.09M [00:00<?, ?B/s]
100% 4.09M/4.09M [00:00<00:00, 698MB/s]


In [6]:
# Paths to the downloaded files (stored as zip archives)
train_fname = 'data/train.csv.zip'
test_fname = 'data/test.csv.zip'
sample_fname = 'data/sample_submission.csv.zip'

In [7]:
import pandas as pd

In [9]:
# pandas infers zip compression from the file extension
raw_df = pd.read_csv(train_fname)
test_df = pd.read_csv(test_fname)
sub_df = pd.read_csv(sample_fname)

In [10]:
raw_df

,qid,question_text,target
0,00002165364db923c7e6,How did Quebec nationalists see their province...,0
1,000032939017120e6e44,"Do you have an adopted dog, how would you enco...",0
2,0000412ca6e4628ce2cf,Why does velocity affect time? Does velocity a...,0
3,000042bf85aa498cd78e,How did Otto von Guericke used the Magdeburg h...,0
4,0000455dfa3e01eae3af,Can I convert montra helicon D to a mountain b...,0
...,...,...,...
1306117,ffffcc4e2331aaf1e41e,What other technical skills do you need as a c...,0
1306118,ffffd431801e5a2f4861,Does MS in ECE have good job prospects in USA ...,0
1306119,ffffd48fb36b63db010c,Is foam insulation toxic?,0
1306120,ffffec519fa37cf60c78,How can one start a research project based on ...,0


In [11]:
test_df

,qid,question_text
0,0000163e3ea7c7a74cd7,Why do so many women become so rude and arroga...
1,00002bd4fb5d505b9161,When should I apply for RV college of engineer...
2,00007756b4a147d2b0b3,What is it really like to be a nurse practitio...
3,000086e4b7e1c7146103,Who are entrepreneurs?
4,0000c4c3fbe8785a3090,Is education really making good people nowadays?
...,...,...
375801,ffff7fa746bd6d6197a9,How many countries listed in gold import in in...
375802,ffffa1be31c43046ab6b,Is there an alternative to dresses on formal p...
375803,ffffae173b6ca6bfa563,Where I can find best friendship quotes in Tel...
375804,ffffb1f7f1a008620287,What are the causes of refraction of light?


In [12]:
sub_df

,qid,prediction
0,0000163e3ea7c7a74cd7,0
1,00002bd4fb5d505b9161,0
2,00007756b4a147d2b0b3,0
3,000086e4b7e1c7146103,0
4,0000c4c3fbe8785a3090,0
...,...,...
375801,ffff7fa746bd6d6197a9,0
375802,ffffa1be31c43046ab6b,0
375803,ffffae173b6ca6bfa563,0
375804,ffffb1f7f1a008620287,0


In [13]:
raw_df.sample(10)

,qid,question_text,target
748658,92ab8acba2dd8b319b49,Why do people avoid the big stall in restrooms?,0
852568,a70bd7173737306c7231,What would you do if you want to break down bu...,0
265890,3409fe2fd95c832b689a,"What does ""jardins flottant"" mean?",0
996848,c356d9780fd37b292804,How do you swallow fire?,0
353954,455f21ac3d760429db2f,How does San Francisco's approach to dealing w...,0
50772,09f3290d48a3a7cfd71f,What is Quora's policy relating to information...,0
976727,bf5bab1d8f31a629513d,Were they actually locking kids in cages?,0
731651,8f4a21ccb37a0b1fa51f,Is there a way to transform oily skin into non...,0
886707,adba1d55df9fa71e1db4,"What's the ""Charger"" in a tank crew?",0
264290,33b82b2f239bb535d1ec,Is the Mercedes C coupe assembled in Germany b...,0


In [14]:
# Draw a random subset of 100,000 rows (no seed is set, so the rows differ on each run)
SAMPLE_SIZE = 100_000
sample_df = raw_df.sample(SAMPLE_SIZE)
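
Because `sample` is called without a seed, every run draws different rows. A minimal sketch of a reproducible variant follows. It uses a small stand-in DataFrame (`toy_df` is hypothetical data, not the Quora set) and an arbitrary seed of 42; neither appears in the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for raw_df: 1,000 rows with roughly 6% positives
toy_df = pd.DataFrame({
    "question_text": [f"question {i}" for i in range(1000)],
    "target": [1 if i % 16 == 0 else 0 for i in range(1000)],
})

# Passing random_state fixes the random draw, so reruns select the same rows
sample_a = toy_df.sample(100, random_state=42)
sample_b = toy_df.sample(100, random_state=42)

# True when both draws picked identical rows in the same order
print(sample_a.index.equals(sample_b.index))
```

Applied to the notebook, this would amount to `raw_df.sample(SAMPLE_SIZE, random_state=42)`, although the sampled rows would then differ from the output shown here.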

In [15]:
sample_df

,qid,question_text,target
308585,3c72fdb4b582ab569098,Is paying off your mortgage asap the best way ...,0
1093733,d65b4a2b0c991cf76546,What do you love most about BTS V?,0
304109,3b8fa155706e263c0be6,What feelings does an attractive Indian woman ...,1
412788,50e1c1da5f8127ba6660,How do I become mentally strong and active?,0
479598,5deb8447e276dc5791a5,What are factors that determinate a computer CPU?,0
...,...,...,...
1096483,d6e711f87e424d1a65e0,What is the time gap between IBPS PO Mains res...,0
802292,9d3447b14b5320a7e542,Why water becomes hot when an acid is add?,0
635953,7c8ff4d5076471fbf57a,How do I know my greatest dream?,0
59827,0bbcca326d78872d39f5,Why is the government bringing back the one ru...,0


In [16]:
# Share of sincere (0) and insincere (1) questions in the sample
sample_df.target.value_counts(normalize=True)

target,proportion
0,0.93764
1,0.06236
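
The proportions above mean that accuracy alone says little about a classifier on this data. As a worked example (an illustration, not part of the original notebook), consider a baseline that labels every question as sincere. The counts below assume the 100,000-row sample and the proportions printed above.

```python
# Class counts implied by the proportions above for a 100,000-row sample
n_sincere, n_insincere = 93_764, 6_236

# Baseline that labels every question as sincere (predicts 0 everywhere)
true_pos = 0                # no insincere question is ever flagged
false_pos = 0               # nothing is flagged, so nothing is flagged wrongly
false_neg = n_insincere     # every insincere question is missed
true_neg = n_sincere

accuracy = (true_pos + true_neg) / (n_sincere + n_insincere)

# F1 = 2*TP / (2*TP + FP + FN); it is 0 when no positive is found
f1 = 2 * true_pos / (2 * true_pos + false_pos + false_neg)

print(f"accuracy={accuracy:.5f}, f1={f1:.1f}")  # accuracy=0.93764, f1=0.0
```

The baseline looks strong on accuracy while finding no insincere questions at all, which is why a metric such as F1 is usually more informative for a class split like this one.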


## Prepare the Data for Training
- Convert text to TF-IDF vectors

- Convert vectors to PyTorch tensors

- Create PyTorch data loaders

### Convert text to TF-IDF

In [17]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [None]:
vectorizer = TfidfVectorizer()