---

_You are currently looking at **version 1.1** of this notebook. To download notebooks and datafiles, as well as get help on Jupyter notebooks in the Coursera platform, visit the [Jupyter Notebook FAQ](https://www.coursera.org/learn/python-machine-learning/resources/bANLa) course resource._

---

## Assignment 4 - Understanding and Predicting Property Maintenance Fines

This assignment is based on a data challenge from the Michigan Data Science Team ([MDST](http://midas.umich.edu/mdst/)). 

The Michigan Data Science Team ([MDST](http://midas.umich.edu/mdst/)) and the Michigan Student Symposium for Interdisciplinary Statistical Sciences ([MSSISS](https://sites.lsa.umich.edu/mssiss/)) have partnered with the City of Detroit to help solve one of the most pressing problems facing Detroit - blight. [Blight violations](http://www.detroitmi.gov/How-Do-I/Report/Blight-Complaint-FAQs) are issued by the city to individuals who allow their properties to remain in a deteriorated condition. Every year, the city of Detroit issues millions of dollars in fines to residents and every year, many of these fines remain unpaid. Enforcing unpaid blight fines is a costly and tedious process, so the city wants to know: how can we increase blight ticket compliance?

The first step in answering this question is understanding when and why a resident might fail to comply with a blight ticket. This is where predictive modeling comes in. For this assignment, your task is to predict whether a given blight ticket will be paid on time.

All data for this assignment has been provided to us through the [Detroit Open Data Portal](https://data.detroitmi.gov/). **Only the data already included in your Coursera directory can be used for training the model for this assignment.** Nonetheless, we encourage you to look into data from other Detroit datasets to help inform feature creation and model selection. We recommend taking a look at the following related datasets:

* [Building Permits](https://data.detroitmi.gov/Property-Parcels/Building-Permits/xw2a-a7tf)
* [Trades Permits](https://data.detroitmi.gov/Property-Parcels/Trades-Permits/635b-dsgv)
* [Improve Detroit: Submitted Issues](https://data.detroitmi.gov/Government/Improve-Detroit-Submitted-Issues/fwz3-w3yn)
* [DPD: Citizen Complaints](https://data.detroitmi.gov/Public-Safety/DPD-Citizen-Complaints-2016/kahe-efs3)
* [Parcel Map](https://data.detroitmi.gov/Property-Parcels/Parcel-Map/fxkw-udwf)

___

We provide you with two data files for use in training and validating your models: train.csv and test.csv. Each row in these two files corresponds to a single blight ticket, and includes information about when, why, and to whom each ticket was issued. The target variable is compliance, which is True if the ticket was paid early, on time, or within one month of the hearing date, False if the ticket was paid after the hearing date or not at all, and Null if the violator was found not responsible. Compliance, as well as a handful of other variables that will not be available at test-time, is only included in train.csv.

Note: All tickets where the violators were found not responsible are not considered during evaluation. They are included in the training set as an additional source of data for visualization, and to enable unsupervised and semi-supervised approaches. However, they are not included in the test set.
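
As a minimal sketch of how the three compliance states can be handled, the snippet below drops the Null (not responsible) rows before training and stores the remaining labels as integers. The `toy_train` frame and its values are made up for illustration; only the `ticket_id` and `compliance` column names come from the data description.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of train.csv; the real file has many more columns.
toy_train = pd.DataFrame({
    "ticket_id": [1, 2, 3, 4],
    "compliance": [1.0, 0.0, np.nan, 0.0],  # NaN = violator found not responsible
})

# Keep only tickets with a known outcome, then store the label as an integer.
# .copy() avoids assigning into a slice of the original frame.
labeled = toy_train[toy_train["compliance"].notnull()].copy()
labeled["compliance"] = labeled["compliance"].astype(int)

print(labeled)
```

The unlabeled rows can still be kept in a separate frame if they are wanted for visualization or unsupervised exploration.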

<br>

**File descriptions** (Use only this data for training your model!)

    train.csv - the training set (all tickets issued 2004-2011)
    test.csv - the test set (all tickets issued 2012-2016)
    addresses.csv & latlons.csv - mapping from ticket id to addresses, and from addresses to lat/lon coordinates. 
     Note: misspelled addresses may be incorrectly geolocated.
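
The two-step mapping described above (ticket id to address, then address to coordinates) can be sketched with two left joins. The column names `ticket_id`, `address`, `lat` and `lon` match those used later in this notebook; the rows themselves are invented, and the third address is deliberately left without coordinates to show how missing geocodes surface.

```python
import pandas as pd

# Toy stand-ins for test.csv, addresses.csv and latlons.csv (values are made up).
tickets = pd.DataFrame({"ticket_id": [10, 11, 12]})
addresses = pd.DataFrame({
    "ticket_id": [10, 11, 12],
    "address": ["1 main st, Detroit MI", "2 oak st, Detroit MI", "3 elm st, Detroit MI"],
})
latlons = pd.DataFrame({
    "address": ["1 main st, Detroit MI", "2 oak st, Detroit MI"],
    "lat": [42.33, 42.35],
    "lon": [-83.05, -83.10],
})

# Left joins keep every ticket, even when no coordinates are found.
located = (tickets
           .merge(addresses, how="left", on="ticket_id")
           .merge(latlons, how="left", on="address"))

# Count tickets that ended up without a complete lat/lon pair.
n_missing = int(located[["lat", "lon"]].isnull().any(axis=1).sum())
print(n_missing)  # 1
```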

<br>

**Data fields**

train.csv & test.csv

    ticket_id - unique identifier for tickets
    agency_name - Agency that issued the ticket
    inspector_name - Name of inspector that issued the ticket
    violator_name - Name of the person/organization that the ticket was issued to
    violation_street_number, violation_street_name, violation_zip_code - Address where the violation occurred
    mailing_address_str_number, mailing_address_str_name, city, state, zip_code, non_us_str_code, country - Mailing address of the violator
    ticket_issued_date - Date and time the ticket was issued
    hearing_date - Date and time the violator's hearing was scheduled
    violation_code, violation_description - Type of violation
    disposition - Judgment and judgement type
    fine_amount - Violation fine amount, excluding fees
    admin_fee - $20 fee assigned to responsible judgments
    state_fee - $10 fee assigned to responsible judgments
    late_fee - 10% fee assigned to responsible judgments
    discount_amount - discount applied, if any
    clean_up_cost - DPW clean-up or graffiti removal cost
    judgment_amount - Sum of all fines and fees
    grafitti_status - Flag for graffiti violations
    
train.csv only

    payment_amount - Amount paid, if any
    payment_date - Date payment was made, if it was received
    payment_status - Current payment status as of Feb 1 2017
    balance_due - Fines and fees still owed
    collection_status - Flag for payments in collections
    compliance [target variable for prediction] 
     Null = Not responsible
     0 = Responsible, non-compliant
     1 = Responsible, compliant
    compliance_detail - More information on why each ticket was marked compliant or non-compliant


___

## Evaluation

Your predictions will be given as the probability that the corresponding blight ticket will be paid on time.

The evaluation metric for this assignment is the Area Under the ROC Curve (AUC). 

Your grade will be based on the AUC score computed for your classifier. A model with an AUROC of 0.7 passes this assignment; a model with an AUROC over 0.75 will receive full points.
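
To make the metric concrete, here is a small sketch on four invented tickets. It computes the AUC twice: once as the fraction of (compliant, non-compliant) pairs that the predicted probabilities rank correctly, and once with `sklearn.metrics.roc_auc_score`. The labels and probabilities are illustrative only and are not taken from the assignment data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])            # 1 = compliant
y_prob = np.array([0.1, 0.4, 0.35, 0.8])   # predicted probability of compliance

# Pairwise view: how often does a compliant ticket score higher than a
# non-compliant one? Ties count as half a correct ordering.
pos = y_prob[y_true == 1]
neg = y_prob[y_true == 0]
diff = pos[:, None] - neg[None, :]
pairwise_auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

auc_value = roc_auc_score(y_true, y_prob)
print(pairwise_auc, auc_value)  # 0.75 0.75
```

Because only the ordering of the scores matters, the AUC rewards well-ranked probabilities and gains nothing from thresholding them into hard 0/1 labels.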
___

For this assignment, create a function that trains a model to predict blight ticket compliance in Detroit using `train.csv`. Using this model, return a series of length 61001 with the data being the probability that each corresponding ticket from `test.csv` will be paid, and the index being the ticket_id.

Example:

    ticket_id
       284932    0.531842
       285362    0.401958
       285361    0.105928
       285338    0.018572
                 ...
       376499    0.208567
       376500    0.818759
       369851    0.018528
       Name: compliance, dtype: float32
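
A minimal sketch of how a return value with this shape could be assembled, assuming the probabilities are already available as a NumPy array. The three ticket ids and probabilities are copied from the example above; a real submission would cover all 61001 test tickets.

```python
import numpy as np
import pandas as pd

ticket_ids = [284932, 285362, 285361]  # first three ids from the example above
probs = np.array([0.531842, 0.401958, 0.105928], dtype="float32")

# The index carries the ticket ids; the values are the predicted probabilities.
result = pd.Series(probs,
                   index=pd.Index(ticket_ids, name="ticket_id"),
                   name="compliance")
print(result)
```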
       
### Hints

* Make sure your code is working before submitting it to the autograder.

* Print out your result to see whether there is anything weird (e.g., all probabilities are the same).

* Generally the total runtime should be less than 10 mins. You should NOT use Neural Network related classifiers (e.g., MLPClassifier) in this question. 

* Try to avoid global variables. If you have other functions besides blight_model, you should move those functions inside the scope of blight_model.

* Refer to the pinned threads in Week 4's discussion forum when there is something you cannot figure out.
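
The "anything weird" hint can be turned into a quick check before submitting. The helper below is a hypothetical sketch (the name `check_predictions` and the particular checks are assumptions, not part of the assignment); it flags a wrong length, NaN values, values outside [0, 1], and the case where every probability is identical.

```python
import pandas as pd

def check_predictions(pred, expected_len):
    """Return a list of problems found; an empty list means nothing looked wrong."""
    problems = []
    if len(pred) != expected_len:
        problems.append("unexpected length")
    if pred.isnull().any():
        problems.append("contains NaN")
    if ((pred < 0) | (pred > 1)).any():
        problems.append("values outside [0, 1]")
    if pred.nunique() <= 1:
        problems.append("all probabilities identical")
    return problems

good = pd.Series([0.1, 0.7, 0.3], index=[1, 2, 3])
flat = pd.Series([0.5, 0.5, 0.5], index=[1, 2, 3])

print(check_predictions(good, 3))  # []
print(check_predictions(flat, 3))  # ['all probabilities identical']
```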

In [212]:
import pandas as pd
import numpy as np
import re
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

pd.set_option('display.max_rows', 600)
pd.set_option('max_colwidth', 500)

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, GridSearchCV

from sklearn.preprocessing import MinMaxScaler, StandardScaler
# from sklearn.impute import SimpleImputer  ## <- autograder errored out because of this module

from sklearn.pipeline import Pipeline

from sklearn.metrics import precision_recall_curve, roc_curve, confusion_matrix, auc

def blight_model():
    
    # Your code here
    
    ## Read in data
    date_cols = ['ticket_issued_date', 'hearing_date']

#     train_raw = pd.read_csv('train.csv', encoding = 'cp1252', parse_dates=date_cols+['payment_date'])
    train_raw = pd.read_csv('train.csv', encoding = 'cp1252')
#     test = pd.read_csv('test.csv', parse_dates=date_cols)
    test = pd.read_csv('test.csv')
    address = pd.read_csv('addresses.csv')
    latlons = pd.read_csv('latlons.csv')
    
   
    ## Define Function for Feature Extraction
    def extract(df, nm):

        # ---- Get geocode
        df = df.merge(right=address, how='left', on='ticket_id')\
               .merge(right=latlons, how='left', on='address')
        l = len(df[(df['lat'].isnull())|(df['lon'].isnull())])
#         print('Missing {} geocode in {} set'.format(l, nm))


        # ---- Extract useful info from raw data

        # 1. Violator = acorn vs bank vs LLC
        df['violator_grp'] = np.where(df['violator_name'].str.contains('bank(,|\s)|bank$', case=False)==True, 'Bank',
                                np.where(df['violator_name'].str.contains('LLC(,|\s)|LLC$', case=False)==True, 'LLC',
                                  np.where(df['violator_name'].str.contains('acorn(,|\s)|acorn$', case=False)==True, 'Acorn', 'Other'
    #                              np.where(df['violator_name'].isnull()==False, 'Other', np.nan
                                         )))

        df['violatorAcorn']=df['violator_grp'].map(lambda x: 'Y' if x=='Acorn' else 'N')

        #violator group + location
        df['violator_grp_loc'] = df['violator_grp'] +  df['city'].astype(str).str.upper() + df['zip_code'].astype(str)

        list_of_violator = sorted(df['violator_grp_loc'].unique())
        k  = pd.DataFrame({'violator_grp_loc': list_of_violator,
                            'violator_grp_id' : range(len(list_of_violator))})


        df = df.merge(right=k, how='left', on='violator_grp_loc')

        df['violatorAcorn']=df['violator_grp'].map(lambda x: 'Y' if x=='Acorn' else 'N')

        # 2. Standardize varieties of 'PO Box'
        df['POBOX']=np.where(df['mailing_address_str_name'].str.contains('P(\s|\.)*O(\s|\.)*BOX', case=False) ==True, 1, 0)


        # 3. Standardize mailing address city names
        df['city']=df['city'].str.upper()
        df.loc[ (df['city'].str.contains('\ADET(\s|\.|\,)?')==True) & 
                (df['city'].str.contains('DETOUR')==False), 'city']='DETROIT'
        df.loc[df['city']=='W. BLOOMFIELD', 'city']='WEST BLOOMFIELD'

        top5city = ['DETROIT', 'SOUTHFIELD', 'DEARBORN', 'WEST BLOOMFIELD', 'FARMINGTON HILLS']

        df['city_cat'] = df['city'].map(lambda x: x if x in top5city else 'Other')
        df['city_cat'].replace(to_replace='\s', value='_', regex=True, inplace=True)    


        # 4. violation_description = solid waste, etc
        # a=test[['violation_description', 'violation_code']]
        # b=a.groupby(['violation_code','violation_description']).size().sort_values(ascending=False)
        # b = pd.DataFrame(b, columns=['n'])
        # b.reset_index(inplace=True)
        # b.sort_values('violation_code')
    #     df['violation_type']= \
    #         list(map(lambda x: 'Solid_Waste' if x[:3] == '22-' \
    #                                          else('Prop_Maint' if (x[:2] =='9-') or (re.search('\d{4}(0901)\Z',x)!=None)\
    #                                                            else('Zoning' if x[:3] == '61-' \
    #                                                                          else None)), df['violation_code']))
        df['violation_type']= np.where(df['violation_code'].str.contains('\A22-', regex=True)==True, 'Solid_Waste',
                               np.where((df['grafitti_status']=='GRAFFITI TICKET') | 
                                     (df['violation_description'].str.contains('graffiti', case=False)==True), 'Graffiti',
                                np.where(df['violation_code'].str.contains('((\A9-)|(\d{4}(0901)\Z))', regex=True)==True, 'Prop_Maint',
                                 np.where(df['violation_code'].str.contains('\A61-', regex=True)==True, 'Zoning', 'Other'))))      


        # Numeric prefix of the violation code; NaN when the code has no recognizable prefix
        df['violation_c'] = df['violation_code'].map(lambda x: 9 if re.match(r'(\d{4}(0901)\Z)', x)
                                                             else (int(re.match(r'\A\d+(?=\-)', x).group(0))
                                                                   if re.match(r'\A\d+(?=\-)', x) else np.nan))
        #5. Extract quarters, months, hours from Ticket Issue Date
#         df['qt'] = df['ticket_issued_date'].dt.quarter
#         df['qt_num'] = df['qt']
#         df['mo'] = df['ticket_issued_date'].dt.month
#         df['mo_hearing'] = df['hearing_date'].dt.month
#         df['hr'] = df['ticket_issued_date'].dt.hour
#         df['hr_hearing'] = df['hearing_date'].dt.hour
#         df['hr10am'] = df['hr'].map(lambda x: 'Y' if x>=10 else 'N')
#         df['hr10am_hearing'] = df['hr_hearing'].map(lambda x: 'Y' if x>=10 else 'N')

#         df['dow_hearing'] = df['hearing_date'].dt.dayofweek

        #6. Days between ticket issued and hearing dates - assume dates switched if number of days is negative
#         df.loc[df['hearing_date'].notnull(), 'issued_hearing_days'] \
#             = abs((df.loc[df['hearing_date'].notnull(), 'hearing_date'] \
#                  - df.loc[df['ticket_issued_date'].notnull(),'ticket_issued_date'])).map(lambda x: x.days)

        df['issued_hearing_days'] = np.where((df['hearing_date'].notnull()) & (df['ticket_issued_date'].notnull()), 
                   abs(pd.to_datetime(df['hearing_date']) - pd.to_datetime(df['ticket_issued_date'])).dt.days, np.nan)
            
        #7 Abbreviate agency names
        df['agency']= df['agency_name'].map({'Buildings, Safety Engineering & Env Department': 'Env',
                                        'Department of Public Works' : 'Public',
                                        'Detroit Police Department': 'Police',
                                        'Health Department': 'Other',
                                        'Neighborhood City Halls': 'Other'})
        #8. Convert geocode to XYZ coordinates
           #https://datascience.stackexchange.com/questions/13567/ways-to-deal-with-longitude-latitude-feature
        #Convert lat, lon from degrees to radians
        df[['lat_rad', 'lon_rad']]=df[['lat', 'lon']].apply(np.radians)

        df['loc_x'] = np.cos(df['lat_rad']) * np.cos(df['lon_rad'])
        df['loc_y'] = np.cos(df['lat_rad']) * np.sin(df['lon_rad'])
        df['loc_z'] = np.sin(df['lat_rad']) 


        #9. Flag state fee, admin fee, discount_amount
    #     df['state_fee_YN'] = df['state_fee'].map(lambda x: 'Y' if x>0 else 'N') #Comment out as known compliant status always have a fee
    #     df['admin_fee_YN'] = df['admin_fee'].map(lambda x: 'Y' if x>0 else 'N')
        df['discount_YN']=df['discount_amount'].map(lambda x: 'Y' if x>0 else 'N')

        #10. See if judgment amount > total fee responsible (late fee applied -> not compliant) or 
            #       judgment amount <= total fee responsible (no late fee applied -> compliant)
        df['fee'] = df['fine_amount']+df['admin_fee']+df['state_fee']+df['clean_up_cost']
        df['diff'] = df['judgment_amount'] - df['fee']
        df['responsible'] = df['diff'].map(lambda x: 1 if x<=0 else 0)

        #11. Keep just columns needed
        cols_to_keep = ['ticket_id', 'fine_amount', 'late_fee', 'discount_amount', 'judgment_amount',  
                        'issued_hearing_days', 
                        'loc_x', 'loc_y', 'loc_z',
                        'lat', 'lon',
                        'discount_YN', 'diff', 'responsible',
                         'POBOX',
                        'agency', 'violator_grp', 'violatorAcorn', 'violator_grp_id', 'city_cat',
                        'violation_type', 'violation_c'
#                         'qt', 'qt_num', 'hr', 'mo', hr10am hr10am_hearing
#                         'hr_hearing',  'mo_hearing', 'dow_hearing'
                        ]
        # Add target column to training set
        if nm=='train':
            cols_to_keep += ['compliance']

        def keepcols(_df):
            _df = _df[cols_to_keep]
            return _df

        df = keepcols(df)  #<----- comment out if testing for extraction results in the next cell
        
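        #12. Impute continuous columns using fillna() because the autograder doesn't accept sklearn.impute.SimpleImputer
        # numeric_only=True restricts the medians to numeric columns (newer pandas raises on string columns otherwise)
        if nm=='train':
            df = df.fillna(df.median(numeric_only=True))
        if nm=='test':
            df = df.fillna(train.median(numeric_only=True))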
        return df

    
    # train_raw and test were already loaded above without parse_dates;
    # extract() converts the date columns with pd.to_datetime where needed.

    #Remove missing compliance
    train = train_raw[train_raw['compliance'].isnull()==False]

    #Convert compliance to integer values
    train['compliance'] = train['compliance'].astype(int)
    #a = extract(train, 'train')  # For testing extraction results in next cell
    train = extract(train, 'train')

    #a = extract(test, 'test') # For testing extraction results in next cell
    test = extract(test, 'test')


    
    ## Create dummy variables
#     dummylist = ['POBOX', 'discount_YN', 'agency', 'violator_grp', 'violatorAcorn', 'violation_type', 'city_cat']
#     prefix = list(map(lambda x: re.search('[^_]*',x).group(), dummylist))


#     for x in dummylist:
#         cat = np.union1d(train[x], test[x])
#     #     train[x] = train[x].astype(pd.CategoricalDtype(categories=cat))
#         test[x] = test[x].astype(pd.CategoricalDtype(categories=cat))

#     train_ = pd.get_dummies(train.iloc[:, :-1], columns=dummylist, prefix=prefix) #Pop compliance column to be put back as the last column
#     test = pd.get_dummies(test, columns=dummylist, prefix=prefix)

#     train = pd.concat([train_, train[['compliance']]], axis=1)


    ## Create Transforms and Classifiers
    # imp = SimpleImputer(strategy = 'median')  ## <- autograder errored out because of this module
    #     X_train = Imp.fit_transform(X_train)
    #     X_test = Imp.transform(X_test)

    scaler = MinMaxScaler()
    #     X_train_scaled = scaler.fit_transform(X_train)
    #     X_test_scaled = scaler.transform(X_test)

    def clf_lr():
        lr = LogisticRegression(C=10,
                               class_weight='balanced',
                               fit_intercept=True,
                               max_iter=1000,
                               n_jobs=None,
                               penalty='l1',
                               random_state=0,
                               solver='saga')
        ## Pipeline to chain data transformation and estimator
        # pipe.get_params().keys()
        # The 'impute' step is omitted because SimpleImputer is unavailable on the autograder
        return Pipeline(steps=[('scale', scaler), ('logistic', lr)])
    
    def clf_gb():
        gb = GradientBoostingClassifier(random_state = 0
                                          , n_estimators = 70
                                          , learning_rate = 0.05
                                          , max_depth = 4
                                          , max_features = 1)
        ## Pipeline to chain data transformation and estimator
        # pipe.get_params().keys()
#         return Pipeline(steps=[('impute', imp), ('gb', gb)])
        return Pipeline(steps=[('gb', gb)])

    ## Train Optimized Classifier
    # The full model with all variables performed the worst!
    cols = [
            'discount_amount'
            , 'late_fee'
            , 'diff'
            , 'responsible'
    #         , 'ticket_id'
            , 'judgment_amount'
            , 'fine_amount'
            , 'issued_hearing_days'
            , 'lat', 'lon'
            ]

    X = train.loc[:, cols]
    y = train.iloc[:, -1]
    
    ## Set up Pipeline
#     pipe = clf_lr()
    pipe = clf_gb()
    
    # rate = [i*j for i in np.logspace(-5,-1,5) for j in [1, 5]] #Best = 0.1
    # rate = np.linspace(0.1, 0.5, 5) # Best = 0.1
    rate = [0.05, 0.075, 0.1, 0.125, 0.15, 0.2] # Best = 0.05

    # n = [100, 200, 300, 400, 500], # Best at n = 200
    # n = np.linspace(200, 400, 5, dtype=int) # Best = 250
    # n = np.linspace(100, 400, 7, dtype=int) #Best = 100
    # n = [30, 50, 80, 100] # Best = 80
    n = [65, 70, 75] # Best = 70


    # Parameters of pipelines can be set using '__'-separated parameter names:
    grid_values = {
                     'gb__n_estimators': [70]
    #                 , 'gb__max_features': [1, 2, 3]
    #                 , 'gb__max_depth' : [2, 3, 4]
#                       , 'gb__learning_rate' : rate
                 }

    grid_gb = GridSearchCV(estimator = pipe, param_grid = grid_values, scoring = 'roc_auc', cv = 5, return_train_score=True
                           ,n_jobs = 4, pre_dispatch = '2*n_jobs')
    grid_gb.fit(X, y)


    ## Model Evaluation
    print("Best Estimator: \n{}\n".format(grid_gb.best_estimator_))
    print("Best Parameters: \n{}\n".format(grid_gb.best_params_))
    print("Best Test Score: \n{}\n".format(grid_gb.best_score_))
    print("Best Training Score: \n{}\n".format(grid_gb.cv_results_['mean_train_score'][grid_gb.best_index_]))
    print("All Test Scores: \n{}\n".format(grid_gb.cv_results_['mean_test_score']))
    print("All Training Scores: \n{}\n".format(grid_gb.cv_results_['mean_train_score']))

#     FI = grid_gb.best_estimator_.named_steps['gb'].feature_importances_
#     print(pd.Series(FI, name='Feature_Importance', index=cols).sort_values(ascending=False))

#     ## Prediction
    X_test = test.loc[:, cols]
    prob = grid_gb.predict_proba(X_test)

#     #Put together probabilities into a dataframe
    df = pd.DataFrame(prob, columns=['prob_0', 'prob_1'], index=test['ticket_id'])
    # df['prob_1'].head(3); df['prob_1'].tail(3)
    # prob[:3]
    # prob[-3:]

#     #Check tickets that have the same probability
#     dup = df[df.duplicated(keep=False)].sort_values(by=['prob_1'])
#     dup = dup.merge(right=test, on='ticket_id', how='inner')

#     dup.iloc[:, :15]
#     dup.iloc[:, 15:30]
#     dup.iloc[:, 30:]

    return df['prob_1']

In [213]:
# ans = blight_model()
# ans.shape
# ans

  if (await self.run_code(code, result,  async_=asy)):
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return func(self, *args, **kwargs)


Best Estimator: 
Pipeline(memory=None,
         steps=[('gb',
                 GradientBoostingClassifier(ccp_alpha=0.0,
                                            criterion='friedman_mse', init=None,
                                            learning_rate=0.05, loss='deviance',
                                            max_depth=4, max_features=1,
                                            max_leaf_nodes=None,
                                            min_impurity_decrease=0.0,
                                            min_impurity_split=None,
                                            min_samples_leaf=1,
                                            min_samples_split=2,
                                            min_weight_fraction_leaf=0.0,
                                            n_estimators=70,
                                            n_iter_no_change=None,
                                            presort='deprecated',
                                          

(61001,)

ticket_id
284932    0.059676
285362    0.027041
285361    0.067562
285338    0.093402
285346    0.097855
            ...   
376496    0.026945
376497    0.026945
376499    0.068448
376500    0.068448
369851    0.280130
Name: prob_1, Length: 61001, dtype: float64

### Modified for autograder:
1. change isna() to isnull()
2. comment out <code>from sklearn.impute import SimpleImputer</code>
3. comment out <code>SimpleImputer()</code> and imp step in pipeline
4. use <code>df.fillna(df.median())</code> to fill NaN values instead of using SimpleImputer
5. skip <code>parse_dates=</code> in <code>read_csv</code> and convert non-missing string dates to datetime values only when computing the number of days in between
6. remove the dummy variable creation step because of the error: <code>module 'pandas' has no attribute 'CategoricalDtype'</code>
7. remove the <code>verbose=0</code> option in the pipeline because of the error: <code>__init__() got an unexpected keyword argument 'verbose'</code>
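
Items 4 and 5 can be sketched on toy data as follows. The frames and values are invented; the column names `ticket_issued_date`, `hearing_date`, `fine_amount` and `issued_hearing_days` follow the notebook. The sketch assumes the test set should be filled with medians learned from the training set, and it selects numeric columns explicitly so the median is never taken over strings.

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({
    "ticket_issued_date": ["2005-01-01 10:00:00", "2005-02-01 09:00:00", "2005-03-01 09:00:00"],
    "hearing_date": ["2005-01-11 10:00:00", None, "2005-03-31 09:00:00"],
    "fine_amount": [100.0, np.nan, 300.0],
})
test = pd.DataFrame({"fine_amount": [np.nan, 50.0]})

# Item 5: read dates as plain strings and convert only when computing the gap.
# Missing dates become NaT, so the resulting day count is NaN for those rows.
gap = (pd.to_datetime(train["hearing_date"])
       - pd.to_datetime(train["ticket_issued_date"])).abs().dt.days
train["issued_hearing_days"] = gap

# Item 4: median imputation with fillna(), using training medians for both sets.
num_cols = ["fine_amount", "issued_hearing_days"]
train_medians = train[num_cols].median()
train[num_cols] = train[num_cols].fillna(train_medians)
test["fine_amount"] = test["fine_amount"].fillna(train_medians["fine_amount"])

print(train[num_cols])
print(test)
```

Reusing the training medians on the test set keeps the imputation consistent between fitting and prediction.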
    
### Finally passed after 10 submissions:    
Your AUC of 0.782798486394 was awarded a value of 1.0 out of 1.0 total grades.  
My cv test score was only 0.7690514557693442

In [155]:
## This is autograder sanity check code:

# import numpy as np
# bm = blight_model()
# res = '{:40s}'.format('Object Type:')
# res += ['Failed: type(bm) should Series\n','Passed\n'][type(bm)==pd.Series]
# res += '{:40s}'.format('Data Shape:')
# res += ['Failed: len(bm) should be 61001\n','Passed\n'][len(bm)==61001]
# res += '{:40s}'.format('Data Values Type:')
# res += ['Failed: bm.dtype should be float\n','Passed\n'][str(bm.dtype).count('float')>0]
# res += '{:40s}'.format('Data Values Infinity:')
# res += ['Failed: values should not be infinity\n','Passed\n'][not any(np.isinf(bm))]
# res += '{:40s}'.format('Data Values NaN:')
# res += ['Failed: values should not be NaN\n','Passed\n'][not any(np.isnan(bm))]
# res += '{:40s}'.format('Data Values in [0,1] Range:')
# res += ['Failed: all values should be in [0.,1.]\n','Passed\n'][all((bm<=1.) & (bm>=0.))]
# res += '{:40s}'.format('Data Values not all 0 or 1:')
# res += ['Failed: values should be scores not predicted labels\n','Passed\n'][not all((bm.isin({0,1,0.0,1.0})))]
# res += '{:40s}'.format('Index Type:')
# res += ['Failed: type(bm.index) should be Int64Index\n','Passed\n'][type(bm.index)==pd.Int64Index]
# res += '{:40s}'.format('Index Values:')
# if bm.index.shape==(61001,):
#     res +=['Failed: index values should match test.csv\n','Passed\n'
#           ][all(pd.read_csv('test.csv',usecols=[0],index_col=0
#                            ).sort_index().index.values==bm.sort_index().index.values)]
# else:
#     res+='Failed: bm.index length should be 61001'
# res += '{:40s}'.format('Can run model twice:')
# bm2 = None
# try:
#     bm2 = blight_model()
#     res += 'Passed\n'
# except:
#     res += ['Failed: second run of blight_model() threw an Exception']
# res += '{:40s}'.format('Can run model twice with same results:')
# if not bm2 is None:
#     res += ['Failed: second run of blight_model() produced different results (this might not be a problem)\n','Passed\n'][
#         all(bm.apply(lambda x:round(x,3))==bm2.apply(lambda x:round(x,3))) and all(bm.index==bm2.index)]    
# print(res)

  if (await self.run_code(code, result,  async_=asy)):
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Missing 2 geocode in train set


  return func(self, *args, **kwargs)


Missing 5 geocode in test set


  if (await self.run_code(code, result,  async_=asy)):
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Missing 2 geocode in train set


  return func(self, *args, **kwargs)


Missing 5 geocode in test set
Object Type:                            Passed
Data Shape:                             Passed
Data Values Type:                       Passed
Data Values Infinity:                   Passed
Data Values NaN:                        Passed
Data Values in [0,1] Range:             Passed
Data Values not all 0 or 1:             Passed
Index Type:                             Passed
Index Values:                           Passed
Can run model twice:                    Passed
Can run model twice with same results:  Passed

