---

_You are currently looking at **version 1.1** of this notebook. To download notebooks and datafiles, as well as get help on Jupyter notebooks in the Coursera platform, visit the [Jupyter Notebook FAQ](https://www.coursera.org/learn/python-machine-learning/resources/bANLa) course resource._

---

## Predicting Property Maintenance Fines

Based on a data challenge from the Michigan Data Science Team ([MDST](http://midas.umich.edu/mdst/)). 

[Blight violations](http://www.detroitmi.gov/How-Do-I/Report/Blight-Complaint-FAQs) are issued by the city to individuals who allow their properties to remain in a deteriorated condition. Every year, the city of Detroit issues millions of dollars in fines to residents, and every year many of these fines remain unpaid. Enforcing unpaid blight fines is a costly and tedious process, so the city wants to know: how can we increase blight ticket compliance?

**File descriptions** (Use only this data for training your model!)

    train.csv - the training set (all tickets issued 2004-2011)
    test.csv - the test set (all tickets issued 2012-2016)
    addresses.csv & latlons.csv - mapping from ticket id to addresses, and from addresses to lat/lon coordinates. 
    Note: misspelled addresses may be incorrectly geolocated.
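
The two lookup files chain together: ticket id to address, then address to coordinates. The sketch below illustrates that chain on made-up rows. The column names (`ticket_id`, `address`, `lat`, `lon`) follow how the files are used in the code later in this notebook; the values and the variable names are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for addresses.csv and latlons.csv (values are invented).
addresses = pd.DataFrame({
    "ticket_id": [1, 2, 3],
    "address": ["100 main st, Detroit MI",
                "200 oak ave, Detroit MI",
                "300 elm st, Detroit MI"],
})
latlons = pd.DataFrame({
    "address": ["100 main st, Detroit MI", "200 oak ave, Detroit MI"],
    "lat": [42.33, 42.40],
    "lon": [-83.05, -83.10],
})

# A left merge keeps every ticket; an address with no geocode gets NaN coordinates.
ticket_coords = (
    addresses.merge(latlons, on="address", how="left")
             .set_index("ticket_id")
)
print(ticket_coords[["lat", "lon"]])
```

Ticket 3 has no matching row in the toy `latlons` table, so its `lat` and `lon` come back as NaN and need to be handled before modeling.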

<br>
**Data fields**

train.csv & test.csv

    ticket_id - unique identifier for tickets
    agency_name - Agency that issued the ticket
    inspector_name - Name of inspector that issued the ticket
    violator_name - Name of the person/organization that the ticket was issued to
    violation_street_number, violation_street_name, violation_zip_code - Address where the violation occurred
    mailing_address_str_number, mailing_address_str_name, city, state, zip_code, non_us_str_code, country - Mailing address of the violator
    ticket_issued_date - Date and time the ticket was issued
    hearing_date - Date and time the violator's hearing was scheduled
    violation_code, violation_description - Type of violation
    disposition - Judgment and judgment type
    fine_amount - Violation fine amount, excluding fees
    admin_fee - $20 fee assigned to responsible judgments
    state_fee - $10 fee assigned to responsible judgments
    late_fee - 10% fee assigned to responsible judgments
    discount_amount - discount applied, if any
    clean_up_cost - DPW clean-up or graffiti removal cost
    judgment_amount - Sum of all fines and fees
    grafitti_status - Flag for graffiti violations
    
train.csv only

    payment_amount - Amount paid, if any
    payment_date - Date payment was made, if it was received
    payment_status - Current payment status as of Feb 1 2017
    balance_due - Fines and fees still owed
    collection_status - Flag for payments in collections
    compliance [target variable for prediction] 
     Null = Not responsible
     0 = Responsible, non-compliant
     1 = Responsible, compliant
    compliance_detail - More information on why each ticket was marked compliant or non-compliant
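
Because `compliance` is Null for tickets where the violator was not found responsible, those rows carry no label to learn from, and the code later in this notebook drops them. A minimal sketch of that filtering step on invented rows (the frame and its values are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy tickets (invented values): NaN compliance means "not responsible".
tickets = pd.DataFrame({
    "ticket_id": [10, 11, 12, 13],
    "compliance": [np.nan, 0.0, 1.0, 0.0],
})

# Keep only tickets with a defined label, then split off the target.
labeled = tickets.dropna(subset=["compliance"]).copy()
y = labeled["compliance"].astype(int)
print(y.value_counts().to_dict())
```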


___

Predictions will be given as the probability that the corresponding blight ticket will be paid on time.

The evaluation metric is the Area Under the ROC Curve (AUC).
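
AUC depends only on how the predicted probabilities rank the tickets, not on their absolute values. As a small illustration, using scikit-learn's `roc_auc_score` on invented labels and scores:

```python
from sklearn.metrics import roc_auc_score

# Invented labels (1 = paid on time) and predicted probabilities.
y_true = [0, 0, 1, 1]
y_prob = [0.10, 0.40, 0.35, 0.80]

# AUC equals the fraction of (positive, negative) pairs ranked correctly.
auc = roc_auc_score(y_true, y_prob)
print(auc)  # 0.75
```

Of the four positive/negative pairs here, three are ordered correctly (only 0.35 versus 0.40 is not), which gives 3/4 = 0.75.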

In [2]:
import pandas as pd
import numpy as np

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

train = pd.read_csv('train.csv', encoding="ISO-8859-1")
test = pd.read_csv('test.csv')
address = pd.read_csv('addresses.csv')
latlon = pd.read_csv('latlons.csv')

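# Keep only tickets with a defined label (0 or 1); NaN means "not responsible".
# .copy() avoids modifying a view of the original frame in the in-place drop below.
train = train[train['compliance'].isin([0, 1])].copy()
y_train = train['compliance']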


# Discard unnecessary columns
common_drop = ['agency_name', 'inspector_name','zip_code', 'city', 'state', 'country',
               'violator_name','violation_street_number', 'violation_street_name',
               'violation_zip_code', 'violation_description',
               'mailing_address_str_number', 'mailing_address_str_name',
               'non_us_str_code', 'disposition',
               'ticket_issued_date', 'hearing_date', 'grafitti_status', 'violation_code']

train_drop = ['payment_status', 'payment_date', 'payment_amount', 'balance_due', 'collection_status',
               'compliance_detail', 'compliance']

train.drop(train_drop + common_drop, axis=1, inplace=True)
test.drop(common_drop, axis=1, inplace=True)

# Attach lat/lon coordinates to each address (the left join keeps every ticket)
address = address.set_index('address').join(latlon.set_index('address'), how='left')

# Merge the addresses and lat/lons into the train and test DataFrames
train = train.set_index('ticket_id').join(address.set_index('ticket_id'))
test = test.set_index('ticket_id').join(address.set_index('ticket_id'))

# Forward-fill missing coordinates from the previous row (equivalent to the 'pad' method);
# a NaN in the very first row would remain unfilled
train['lat'] = train['lat'].ffill()
train['lon'] = train['lon'].ffill()
test['lat'] = test['lat'].ffill()
test['lon'] = test['lon'].ffill()

X_train = train

# Tune learning rate and tree depth by cross-validated ROC AUC
grid_values = {'learning_rate': [0.08, 0.09, 0.1], 'max_depth': [3, 4, 5]}
clf = GradientBoostingClassifier(random_state=0)
gridclf = GridSearchCV(clf, param_grid=grid_values, scoring='roc_auc')
gridclf.fit(X_train, y_train)

# Probability of the positive class (compliance == 1) for each test ticket
proba = gridclf.predict_proba(test)[:, 1]

# test is already indexed by ticket_id, so the predictions are too
result = pd.Series(proba, index=test.index)
print(result)



ticket_id
284932    0.053564
285362    0.020538
285361    0.060159
285338    0.071119
285346    0.079711
285345    0.071119
285347    0.083226
285342    0.330005
285530    0.036140
284989    0.029578
285344    0.074700
285343    0.029267
285340    0.030713
285341    0.095144
285349    0.079711
285348    0.071119
284991    0.029578
285532    0.061051
285406    0.028194
285001    0.078075
285006    0.065271
285405    0.020538
285337    0.026888
285496    0.053793
285497    0.056208
285378    0.025264
285589    0.021492
285585    0.053674
285501    0.065542
285581    0.024667
            ...   
376367    0.029048
376366    0.033645
376362    0.036594
376363    0.059719
376365    0.029048
376364    0.033645
376228    0.079210
376265    0.029534
376286    0.315724
376320    0.033602
376314    0.025795
376327    0.366536
376385    0.345815
376435    0.510536
376370    0.359759
376434    0.058126
376459    0.067991
376478    0.014868
376473    0.033674
376484    0.035806
376482    0.033600
37