### Random Acts of Pizza

In machine learning, it is often said there are no free lunches. How wrong we were.

This competition contains a dataset of 5671 textual requests for pizza from the Reddit community Random Acts of Pizza, together with their outcomes (successful/unsuccessful) and metadata.

The task is to create an algorithm capable of predicting which requests will garner a cheesy (but sincere!) act of kindness.

In [1]:
#get data
import json
with open('train.json') as fin:
    trainjson = json.load(fin)

In [2]:
trainjson[0]

{'giver_username_if_known': 'N/A',
 'number_of_downvotes_of_request_at_retrieval': 0,
 'number_of_upvotes_of_request_at_retrieval': 1,
 'post_was_edited': False,
 'request_id': 't3_l25d7',
 'request_number_of_comments_at_retrieval': 0,
 'request_text': 'Hi I am in need of food for my 4 children we are a military family that has really hit hard times and we have exahusted all means of help just to be able to feed my family and make it through another night is all i ask i know our blessing is coming so whatever u can find in your heart to give is greatly appreciated',
 'request_text_edit_aware': 'Hi I am in need of food for my 4 children we are a military family that has really hit hard times and we have exahusted all means of help just to be able to feed my family and make it through another night is all i ask i know our blessing is coming so whatever u can find in your heart to give is greatly appreciated',
 'request_title': 'Request Colorado Springs Help Us Please',
 'requester_accoun

We are only interested in the text fields and the outcome label:

### Input:
    - request_id: unique identifier for the request
    - request_title: title of the Reddit post requesting pizza
    - request_text_edit_aware: body text of the pizza request (edit-aware version)

### Output:
    - requester_received_pizza: whether the requester received a pizza

Currently, we use only the request text (request_text_edit_aware) as the input to build a Naive Bayes classifier; the output is the requester_received_pizza field.
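As a rough illustration of what "text in, label out" means for a Naive Bayes classifier, here is a minimal sketch on a handful of made-up requests. The toy texts, labels, and variable names are assumptions for illustration only; they are not taken from the dataset, and this is not necessarily the exact pipeline used later in the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-ins for request texts and their outcomes.
toy_texts = [
    "lost my job and need pizza for my kids",
    "will pay it forward next week when paid",
    "pizza please",
    "just want free pizza lol",
]
toy_labels = [True, True, False, False]

# Turn each text into a vector of word counts (bag of words).
toy_vectorizer = CountVectorizer()
toy_X = toy_vectorizer.fit_transform(toy_texts)

# Fit a multinomial Naive Bayes model on the count vectors.
toy_clf = MultinomialNB()
toy_clf.fit(toy_X, toy_labels)

# New text must be transformed with the already fitted vectorizer.
new_request = ["need pizza for my kids please"]
toy_prediction = toy_clf.predict(toy_vectorizer.transform(new_request))
print(toy_prediction)
```

The key point is that the vectorizer is fitted once on the training texts and then only applied (via `transform`) to unseen texts.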

In [4]:
print('UID:\t', trainjson[0]['request_id'], '\n')
print('Title:\t', trainjson[0]['request_title'], '\n')
print('Text:\t', trainjson[0]['request_text_edit_aware'], '\n')
print('Tag:\t', trainjson[0]['requester_received_pizza'], end='\n')

UID:	 t3_l25d7 

Title:	 Request Colorado Springs Help Us Please 

Text:	 Hi I am in need of food for my 4 children we are a military family that has really hit hard times and we have exahusted all means of help just to be able to feed my family and make it through another night is all i ask i know our blessing is coming so whatever u can find in your heart to give is greatly appreciated 

Tag:	 False


### Converting json to pandas DataFrame

In [5]:
import pandas as pd
# pd.json_normalize replaces the older pd.io.json.json_normalize (pandas >= 1.0)
df = pd.json_normalize(trainjson)

df_train = df[['request_id', 'request_title','request_text_edit_aware','requester_received_pizza']]

df_train.head()

Unnamed: 0,request_id,request_title,request_text_edit_aware,requester_received_pizza
0,t3_l25d7,Request Colorado Springs Help Us Please,Hi I am in need of food for my 4 children we a...,False
1,t3_rcb83,"[Request] California, No cash and I could use ...",I spent the last money I had on gas today. Im ...,False
2,t3_lpu5j,"[Request] Hungry couple in Dundee, Scotland wo...",My girlfriend decided it would be a good idea ...,False
3,t3_mxvj3,"[Request] In Canada (Ontario), just got home f...","It's cold, I'n hungry, and to be completely ho...",False
4,t3_1i6486,[Request] Old friend coming to visit. Would LO...,hey guys:\n I love this sub. I think it's grea...,False


In [6]:
#getting the test data

import json

with open('test.json') as fin:
    testjson = json.load(fin)

In [8]:
print('UID:\t', testjson[0]['request_id'], '\n')
print('Title:\t', testjson[0]['request_title'], '\n')
print('Text:\t', testjson[0]['request_text_edit_aware'], '\n')
print('Tag:\t', testjson[0]['requester_received_pizza'], end='\n')

UID:	 t3_i8iy4 

Title:	 [request] pregger gf 95 degree house and no food.. promise to pay it forward! Northern Colorado 

Text:	 Hey all! It's about 95 degrees here and our kitchen is pretty much empty save for some bread and cereal.  My girlfriend/fiance is 8 1/2 months pregnant and we could use a good meal.  We promise to pay it forward when we get money! Thanks so much in advance! 



KeyError: 'requester_received_pizza'

In the test data, the label (requester_received_pizza) is not provided, since it is exactly what our classifier has to predict. That is why the last print statement raises a KeyError.
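If the same printing code should work for both train and test records, one option is to look up the label with `dict.get`, which returns `None` instead of raising when the key is missing. The two records and the helper `get_label` below are hypothetical, shaped like the entries shown above.

```python
# Hypothetical records: a train entry has the label, a test entry does not.
train_record = {"request_id": "t3_l25d7", "requester_received_pizza": False}
test_record = {"request_id": "t3_i8iy4"}

def get_label(record):
    """Return the label if the record has one, otherwise None."""
    return record.get("requester_received_pizza")

print('Tag:\t', get_label(train_record))
print('Tag:\t', get_label(test_record))
```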

In [9]:
#converting test to DataFrame

# pd.json_normalize replaces the older pd.io.json.json_normalize (pandas >= 1.0)
df = pd.json_normalize(testjson)
df_test = df[['request_id','request_title','request_text_edit_aware']]
df_test.head()

Unnamed: 0,request_id,request_title,request_text_edit_aware
0,t3_i8iy4,[request] pregger gf 95 degree house and no fo...,Hey all! It's about 95 degrees here and our ki...
1,t3_1mfqi0,"[Request] Lost my job day after labour day, st...",I didn't know a place like this exists! \n\nI ...
2,t3_lclka,(Request) pizza for my kids please?,Hi Reddit. Im a single dad having a really rou...
3,t3_1jdgdj,[Request] Just moved to a new state(Waltham MA...,Hi I just moved to Waltham MA from my home sta...
4,t3_t2qt4,"[Request] Two girls in between paychecks, we'v...",We're just sitting here near indianapolis on o...


### Splitting training data before vectorization

The first thing to do is to split our training data into two parts:
    - training: used for training our model
    - validation: used to check the "soundness" of the model built

Splitting the data into two parts and holding one part out to check the model is one method of validating the "soundness" of our model. It's called hold-out validation.

In [10]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# train_test_split splits the data into two parts.
# Here we call them train (80%) and valid (20%).

train, valid = train_test_split(df_train, test_size=0.2)
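A possible refinement, shown here only as a sketch on a made-up DataFrame: passing `random_state` makes the split repeatable between runs, and `stratify` keeps the share of successful requests the same in both parts. The toy frame and the `toy_` names are assumptions for illustration; the split above does not use these options.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame: 20 requests, 5 of them successful.
toy = pd.DataFrame({
    "request_text_edit_aware": [f"request {i}" for i in range(20)],
    "requester_received_pizza": [True] * 5 + [False] * 15,
})

toy_train, toy_valid = train_test_split(
    toy,
    test_size=0.2,                             # hold out 20% for validation
    random_state=42,                           # fixed seed for a repeatable split
    stratify=toy["requester_received_pizza"],  # keep the label ratio in both parts
)

print(len(toy_train), len(toy_valid))
print(toy_valid["requester_received_pizza"].mean())
```

Stratifying matters most when one class is much rarer than the other, because a purely random hold-out set could otherwise end up with very few positive examples.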
