In [244]:
# A linear regression learning algorithm example using TensorFlow library.

from __future__ import print_function

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.contrib import learn
from sklearn import metrics
import random

rng = np.random

#read csv file
datapath = "./"
Ha_Noi = pd.read_csv(datapath+"OnlineDrivers_HaNoi_10days.csv")
#Add an additional column into the table
# sLength = len(Ha_Noi['accept_rate'])
Ha_Noi['accept_rate_timeT'] = pd.Series(Ha_Noi['accept_rate'], index=Ha_Noi.index)
#Shift the entries in the accept_rate column upward
Ha_Noi.accept_rate = Ha_Noi.accept_rate.shift(-1)
Ha_Noi['Pricing_timeT'] =  pd.Series(Ha_Noi['Pricing'], index=Ha_Noi.index)
Ha_Noi.Pricing = Ha_Noi.Pricing.shift(-1)

#Drop all the "na" entries in the original table
Ha_Noi = Ha_Noi.dropna(subset = ["longwait_percent4"])
Ha_Noi = Ha_Noi.dropna(subset=["accept_rate"])
Ha_Noi = Ha_Noi.dropna(subset = ["longwait_percent2"])
Ha_Noi = Ha_Noi.drop(Ha_Noi[Ha_Noi.Percentchange_onlinedrivers == 0].index)
Ha_Noi = Ha_Noi.dropna(subset = ["DriverBusyRate"])


#define normalized function for our dataset
# def normalize(array):
#     return (array - array.mean()) / array.std()

df2 = pd.DataFrame(Ha_Noi)

#split the dataset into training and testing sets
train_set, test_set = train_test_split(Ha_Noi, test_size=0.2, random_state = random.randint(20, 200))


# Training Data
train_X =  train_set[['Pricing_timeT', 'accept_rate_timeT', 'request', 'long_waiting', 'longwait_percent3','Request/Supply', 'DriverBusyRate', 'Hour', 'wd1', 'wd2', 'wd3', 'wd4', 'wd5', 'wd6', 'wd7']]
train_Y =  train_set['Pricing']

#Testing Data
Xtest = test_set[['Pricing_timeT','accept_rate_timeT', 'request','long_waiting', 'longwait_percent3', 'Request/Supply','DriverBusyRate', 'Hour', 'wd1', 'wd2', 'wd3', 'wd4', 'wd5', 'wd6', 'wd7']]
Ytest = test_set['Pricing']


# 'longwait_percent2', 'accept_rate_timeT', 'Percentchange_onlinedrivers','DriverBusyRate'





Comment: We choose all the features that have the highest positive correlations with Pricing based on the correlation testing obtained in the 3rd slide. This resulted in *7* features plus *2* obvious *lattitude features* to take into account the weekly and seasonality effects (i.e, Hour and Day of the week).

The reason we decided to add the acceptance rate in the previous 20-mins is because of its strong negative correlations with pricing, which prevents the model from overfitting and *gives* it the ability to now when to *decrease* the price (as it's clear that pricing has strong negative correlation with acceptance rate in the previous 20-mins, at least based on our historical data from July 3rd to August 2nd). Without it, our best model - Random Forest Regressor - would always *increase* the price at the moment when we *actually* decreased it (we obtain this observation by testing our Random Forest Regression model with historical data). Also, we chose the factor "longwait_percent3" mainly due to its stronger positive correlation with pricing compared to other "longwait_percentX" factors (X = 2,3,4). Longwait_Percent3 is used to explain the fact that the later a request is made in the 20-min time interval, the more it matters to the acceptance rate in the next 20-min period. The formula to compute longwait_percent3 is in *card 1868* (https://bi.ahamove.com/question/1868?service_id=HAN-BIKE&time_interval=20&order_date=2017-06-04&end_date=2017-06-13). The factor Request/Supply is added because our goal for this project is to choose a price that *BALANCES* supply (number of online drivers) vs demand (total request), which is the same as balance the magnitude of Request/Supply. 

In [228]:
# Parameters
num_epochs = 1000
STEPS = 150000
BATCH_SIZE = 80

#Deep Neural Network Regressor 
feature_column1 = learn.infer_real_valued_columns_from_input(train_X)
# feature_column2 = learn.infer_real_valued_columns_from_input(train_X2)
regressor = learn.DNNRegressor(feature_columns = feature_column1, hidden_units= [100,3,100], model_dir = "./SP3")
regressor.fit(train_X, train_Y, max_steps= STEPS, batch_size= BATCH_SIZE)
Ypred = regressor.predict_scores(Xtest, as_iterable=False)
Ypred = np.asarray(list(Ypred))
rmse = np.sqrt(((Ypred - Ytest) ** 2).mean(axis=0))
print("Root mean square Error: %.3f" %rmse)


INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_task_type': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd9746dbfd0>, '_model_dir': './SP3', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_session_config': None, '_tf_random_seed': None, '_environment': 'local', '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_evaluation_master': '', '_master': ''}
Instructions for updating:
Estimator is decoupled from Scikit Learn interface by moving into
separate class SKCompat. Arguments x, y and batch_size are only
available in the SKCompat class, Estimator will only accept input_fn.
Example conversion:
  est = Estimator(...) -> est = SKCompat(Estimator(...))
Instructions for updating:
Estimator is decoupled from Sci

INFO:tensorflow:global_step/sec: 228.975
INFO:tensorflow:loss = 0.00741892, step = 5201 (0.438 sec)
INFO:tensorflow:global_step/sec: 247.265
INFO:tensorflow:loss = 0.0075751, step = 5301 (0.403 sec)
INFO:tensorflow:global_step/sec: 208.257
INFO:tensorflow:loss = 0.00161943, step = 5401 (0.483 sec)
INFO:tensorflow:global_step/sec: 239.295
INFO:tensorflow:loss = 0.0034957, step = 5501 (0.415 sec)
INFO:tensorflow:global_step/sec: 224.132
INFO:tensorflow:loss = 0.00771807, step = 5601 (0.447 sec)
INFO:tensorflow:global_step/sec: 258.079
INFO:tensorflow:loss = 0.0102293, step = 5701 (0.388 sec)
INFO:tensorflow:global_step/sec: 214.603
INFO:tensorflow:loss = 0.00659596, step = 5801 (0.466 sec)
INFO:tensorflow:global_step/sec: 233.035
INFO:tensorflow:loss = 0.00816559, step = 5901 (0.429 sec)
INFO:tensorflow:global_step/sec: 177.253
INFO:tensorflow:loss = 0.00473394, step = 6001 (0.564 sec)
INFO:tensorflow:global_step/sec: 238.507
INFO:tensorflow:loss = 0.00790863, step = 6101 (0.421 sec)
INF

INFO:tensorflow:global_step/sec: 233.493
INFO:tensorflow:loss = 0.00461209, step = 13401 (0.428 sec)
INFO:tensorflow:global_step/sec: 240.138
INFO:tensorflow:loss = 0.00762883, step = 13501 (0.414 sec)
INFO:tensorflow:global_step/sec: 286.77
INFO:tensorflow:loss = 0.00849416, step = 13601 (0.349 sec)
INFO:tensorflow:global_step/sec: 286.682
INFO:tensorflow:loss = 0.00786008, step = 13701 (0.349 sec)
INFO:tensorflow:global_step/sec: 265.66
INFO:tensorflow:loss = 0.00320125, step = 13801 (0.376 sec)
INFO:tensorflow:global_step/sec: 278.82
INFO:tensorflow:loss = 0.00546685, step = 13901 (0.358 sec)
INFO:tensorflow:global_step/sec: 234.306
INFO:tensorflow:loss = 0.0034741, step = 14001 (0.427 sec)
INFO:tensorflow:global_step/sec: 278.234
INFO:tensorflow:loss = 0.00724205, step = 14101 (0.360 sec)
INFO:tensorflow:global_step/sec: 248.487
INFO:tensorflow:loss = 0.0057576, step = 14201 (0.403 sec)
INFO:tensorflow:global_step/sec: 213.091
INFO:tensorflow:loss = 0.00970365, step = 14301 (0.473 

INFO:tensorflow:loss = 0.00609772, step = 21501 (0.433 sec)
INFO:tensorflow:global_step/sec: 268.648
INFO:tensorflow:loss = 0.00642629, step = 21601 (0.373 sec)
INFO:tensorflow:global_step/sec: 276.432
INFO:tensorflow:loss = 0.00881566, step = 21701 (0.363 sec)
INFO:tensorflow:global_step/sec: 271.568
INFO:tensorflow:loss = 0.00963953, step = 21801 (0.364 sec)
INFO:tensorflow:global_step/sec: 290.929
INFO:tensorflow:loss = 0.0103906, step = 21901 (0.344 sec)
INFO:tensorflow:global_step/sec: 259.276
INFO:tensorflow:loss = 0.00523292, step = 22001 (0.387 sec)
INFO:tensorflow:global_step/sec: 291.506
INFO:tensorflow:loss = 0.00470502, step = 22101 (0.342 sec)
INFO:tensorflow:global_step/sec: 248.934
INFO:tensorflow:loss = 0.0109216, step = 22201 (0.402 sec)
INFO:tensorflow:global_step/sec: 236.524
INFO:tensorflow:loss = 0.00551679, step = 22301 (0.427 sec)
INFO:tensorflow:global_step/sec: 258.236
INFO:tensorflow:loss = 0.00865517, step = 22401 (0.382 sec)
INFO:tensorflow:global_step/sec: 

INFO:tensorflow:global_step/sec: 223.948
INFO:tensorflow:loss = 0.00969238, step = 29701 (0.447 sec)
INFO:tensorflow:global_step/sec: 249.325
INFO:tensorflow:loss = 0.0105287, step = 29801 (0.401 sec)
INFO:tensorflow:global_step/sec: 270.981
INFO:tensorflow:loss = 0.00577302, step = 29901 (0.369 sec)
INFO:tensorflow:global_step/sec: 260.738
INFO:tensorflow:loss = 0.00147255, step = 30001 (0.384 sec)
INFO:tensorflow:global_step/sec: 237.909
INFO:tensorflow:loss = 0.00473812, step = 30101 (0.423 sec)
INFO:tensorflow:global_step/sec: 249.966
INFO:tensorflow:loss = 0.00694917, step = 30201 (0.397 sec)
INFO:tensorflow:global_step/sec: 261.515
INFO:tensorflow:loss = 0.00668267, step = 30301 (0.383 sec)
INFO:tensorflow:global_step/sec: 247.688
INFO:tensorflow:loss = 0.00666166, step = 30401 (0.404 sec)
INFO:tensorflow:global_step/sec: 262.05
INFO:tensorflow:loss = 0.0055582, step = 30501 (0.382 sec)
INFO:tensorflow:global_step/sec: 256.026
INFO:tensorflow:loss = 0.00356825, step = 30601 (0.39

INFO:tensorflow:global_step/sec: 269.688
INFO:tensorflow:loss = 0.00538466, step = 37901 (0.372 sec)
INFO:tensorflow:global_step/sec: 209.055
INFO:tensorflow:loss = 0.00329627, step = 38001 (0.478 sec)
INFO:tensorflow:global_step/sec: 207.046
INFO:tensorflow:loss = 0.00458977, step = 38101 (0.484 sec)
INFO:tensorflow:global_step/sec: 207.348
INFO:tensorflow:loss = 0.00674194, step = 38201 (0.481 sec)
INFO:tensorflow:global_step/sec: 208.958
INFO:tensorflow:loss = 0.0115099, step = 38301 (0.485 sec)
INFO:tensorflow:global_step/sec: 217.625
INFO:tensorflow:loss = 0.00117469, step = 38401 (0.454 sec)
INFO:tensorflow:global_step/sec: 256.723
INFO:tensorflow:loss = 0.00873238, step = 38501 (0.388 sec)
INFO:tensorflow:global_step/sec: 170.956
INFO:tensorflow:loss = 0.0110124, step = 38601 (0.585 sec)
INFO:tensorflow:global_step/sec: 250.495
INFO:tensorflow:loss = 0.00331995, step = 38701 (0.399 sec)
INFO:tensorflow:global_step/sec: 214.23
INFO:tensorflow:loss = 0.00411677, step = 38801 (0.46

INFO:tensorflow:global_step/sec: 282.972
INFO:tensorflow:loss = 0.00489059, step = 46101 (0.356 sec)
INFO:tensorflow:global_step/sec: 256.372
INFO:tensorflow:loss = 0.00775761, step = 46201 (0.388 sec)
INFO:tensorflow:global_step/sec: 283.281
INFO:tensorflow:loss = 0.0111132, step = 46301 (0.353 sec)
INFO:tensorflow:global_step/sec: 254.951
INFO:tensorflow:loss = 0.00252268, step = 46401 (0.392 sec)
INFO:tensorflow:global_step/sec: 225.265
INFO:tensorflow:loss = 0.0042706, step = 46501 (0.444 sec)
INFO:tensorflow:global_step/sec: 252.555
INFO:tensorflow:loss = 0.00269286, step = 46601 (0.396 sec)
INFO:tensorflow:global_step/sec: 250.238
INFO:tensorflow:loss = 0.0115775, step = 46701 (0.400 sec)
INFO:tensorflow:global_step/sec: 292.78
INFO:tensorflow:loss = 0.00487062, step = 46801 (0.341 sec)
INFO:tensorflow:global_step/sec: 246.907
INFO:tensorflow:loss = 0.0175711, step = 46901 (0.405 sec)
INFO:tensorflow:global_step/sec: 279.149
INFO:tensorflow:loss = 0.00732405, step = 47001 (0.361 

INFO:tensorflow:global_step/sec: 196.692
INFO:tensorflow:loss = 0.00764319, step = 54301 (0.513 sec)
INFO:tensorflow:global_step/sec: 186.492
INFO:tensorflow:loss = 0.00662433, step = 54401 (0.527 sec)
INFO:tensorflow:global_step/sec: 265.692
INFO:tensorflow:loss = 0.00903456, step = 54501 (0.377 sec)
INFO:tensorflow:global_step/sec: 178.836
INFO:tensorflow:loss = 0.00857108, step = 54601 (0.561 sec)
INFO:tensorflow:global_step/sec: 228.741
INFO:tensorflow:loss = 0.00257552, step = 54701 (0.441 sec)
INFO:tensorflow:global_step/sec: 193.539
INFO:tensorflow:loss = 0.0134497, step = 54801 (0.516 sec)
INFO:tensorflow:global_step/sec: 179.231
INFO:tensorflow:loss = 0.00733751, step = 54901 (0.557 sec)
INFO:tensorflow:global_step/sec: 187.142
INFO:tensorflow:loss = 0.0118541, step = 55001 (0.531 sec)
INFO:tensorflow:global_step/sec: 233.818
INFO:tensorflow:loss = 0.0067439, step = 55101 (0.429 sec)
INFO:tensorflow:global_step/sec: 163.435
INFO:tensorflow:loss = 0.00692286, step = 55201 (0.61

INFO:tensorflow:loss = 0.00210967, step = 62401 (0.426 sec)
INFO:tensorflow:global_step/sec: 283.329
INFO:tensorflow:loss = 0.00808525, step = 62501 (0.353 sec)
INFO:tensorflow:global_step/sec: 254.768
INFO:tensorflow:loss = 0.00412726, step = 62601 (0.392 sec)
INFO:tensorflow:global_step/sec: 242.392
INFO:tensorflow:loss = 0.00449245, step = 62701 (0.412 sec)
INFO:tensorflow:global_step/sec: 240.526
INFO:tensorflow:loss = 0.00433594, step = 62801 (0.416 sec)
INFO:tensorflow:global_step/sec: 231.835
INFO:tensorflow:loss = 0.0105083, step = 62901 (0.431 sec)
INFO:tensorflow:global_step/sec: 218.234
INFO:tensorflow:loss = 0.00798331, step = 63001 (0.458 sec)
INFO:tensorflow:global_step/sec: 212.533
INFO:tensorflow:loss = 0.00539181, step = 63101 (0.471 sec)
INFO:tensorflow:global_step/sec: 273.062
INFO:tensorflow:loss = 0.0115753, step = 63201 (0.366 sec)
INFO:tensorflow:global_step/sec: 228.173
INFO:tensorflow:loss = 0.00326733, step = 63301 (0.438 sec)
INFO:tensorflow:global_step/sec: 

INFO:tensorflow:global_step/sec: 236.013
INFO:tensorflow:loss = 0.00328951, step = 70601 (0.424 sec)
INFO:tensorflow:global_step/sec: 237.226
INFO:tensorflow:loss = 0.00600551, step = 70701 (0.426 sec)
INFO:tensorflow:global_step/sec: 254.748
INFO:tensorflow:loss = 0.010159, step = 70801 (0.388 sec)
INFO:tensorflow:global_step/sec: 263.633
INFO:tensorflow:loss = 0.00812969, step = 70901 (0.380 sec)
INFO:tensorflow:global_step/sec: 249.517
INFO:tensorflow:loss = 0.0103557, step = 71001 (0.400 sec)
INFO:tensorflow:global_step/sec: 245.442
INFO:tensorflow:loss = 0.00353305, step = 71101 (0.408 sec)
INFO:tensorflow:global_step/sec: 267.006
INFO:tensorflow:loss = 0.00868721, step = 71201 (0.377 sec)
INFO:tensorflow:global_step/sec: 266.208
INFO:tensorflow:loss = 0.00784388, step = 71301 (0.376 sec)
INFO:tensorflow:global_step/sec: 262.67
INFO:tensorflow:loss = 0.0058244, step = 71401 (0.381 sec)
INFO:tensorflow:global_step/sec: 214.017
INFO:tensorflow:loss = 0.00250806, step = 71501 (0.466 

INFO:tensorflow:global_step/sec: 281.213
INFO:tensorflow:loss = 0.0024338, step = 78801 (0.356 sec)
INFO:tensorflow:global_step/sec: 259.622
INFO:tensorflow:loss = 0.00245564, step = 78901 (0.385 sec)
INFO:tensorflow:global_step/sec: 264.687
INFO:tensorflow:loss = 0.00557763, step = 79001 (0.381 sec)
INFO:tensorflow:global_step/sec: 251.429
INFO:tensorflow:loss = 0.00541344, step = 79101 (0.394 sec)
INFO:tensorflow:global_step/sec: 283.811
INFO:tensorflow:loss = 0.00619874, step = 79201 (0.352 sec)
INFO:tensorflow:global_step/sec: 258.457
INFO:tensorflow:loss = 0.0104804, step = 79301 (0.387 sec)
INFO:tensorflow:global_step/sec: 272.895
INFO:tensorflow:loss = 0.00461574, step = 79401 (0.366 sec)
INFO:tensorflow:global_step/sec: 255.876
INFO:tensorflow:loss = 0.00726495, step = 79501 (0.391 sec)
INFO:tensorflow:global_step/sec: 279.21
INFO:tensorflow:loss = 0.00201098, step = 79601 (0.358 sec)
INFO:tensorflow:global_step/sec: 261.177
INFO:tensorflow:loss = 0.00621093, step = 79701 (0.38

INFO:tensorflow:global_step/sec: 268.26
INFO:tensorflow:loss = 0.00156957, step = 87001 (0.373 sec)
INFO:tensorflow:global_step/sec: 255.277
INFO:tensorflow:loss = 0.00688479, step = 87101 (0.394 sec)
INFO:tensorflow:global_step/sec: 278.146
INFO:tensorflow:loss = 0.00552047, step = 87201 (0.357 sec)
INFO:tensorflow:global_step/sec: 304.376
INFO:tensorflow:loss = 0.00426389, step = 87301 (0.331 sec)
INFO:tensorflow:global_step/sec: 301.917
INFO:tensorflow:loss = 0.0117493, step = 87401 (0.329 sec)
INFO:tensorflow:global_step/sec: 219.432
INFO:tensorflow:loss = 0.0080913, step = 87501 (0.456 sec)
INFO:tensorflow:global_step/sec: 257.05
INFO:tensorflow:loss = 0.00433552, step = 87601 (0.388 sec)
INFO:tensorflow:global_step/sec: 269.92
INFO:tensorflow:loss = 0.00804727, step = 87701 (0.374 sec)
INFO:tensorflow:global_step/sec: 255.394
INFO:tensorflow:loss = 0.00606916, step = 87801 (0.391 sec)
INFO:tensorflow:global_step/sec: 296.272
INFO:tensorflow:loss = 0.00867797, step = 87901 (0.339 

INFO:tensorflow:loss = 0.00541902, step = 95101 (0.449 sec)
INFO:tensorflow:global_step/sec: 229.125
INFO:tensorflow:loss = 0.00728286, step = 95201 (0.441 sec)
INFO:tensorflow:global_step/sec: 184.971
INFO:tensorflow:loss = 0.00381962, step = 95301 (0.537 sec)
INFO:tensorflow:global_step/sec: 242.776
INFO:tensorflow:loss = 0.000672615, step = 95401 (0.412 sec)
INFO:tensorflow:global_step/sec: 259.601
INFO:tensorflow:loss = 0.00488237, step = 95501 (0.387 sec)
INFO:tensorflow:global_step/sec: 232.299
INFO:tensorflow:loss = 0.00621086, step = 95601 (0.430 sec)
INFO:tensorflow:global_step/sec: 246.907
INFO:tensorflow:loss = 0.00389641, step = 95701 (0.404 sec)
INFO:tensorflow:global_step/sec: 248.077
INFO:tensorflow:loss = 0.00276146, step = 95801 (0.402 sec)
INFO:tensorflow:global_step/sec: 280.622
INFO:tensorflow:loss = 0.00206786, step = 95901 (0.356 sec)
INFO:tensorflow:global_step/sec: 285.265
INFO:tensorflow:loss = 0.00794365, step = 96001 (0.351 sec)
INFO:tensorflow:global_step/se

INFO:tensorflow:loss = 0.0115782, step = 103201 (0.417 sec)
INFO:tensorflow:global_step/sec: 259.294
INFO:tensorflow:loss = 0.0116265, step = 103301 (0.386 sec)
INFO:tensorflow:global_step/sec: 247.557
INFO:tensorflow:loss = 0.00930513, step = 103401 (0.404 sec)
INFO:tensorflow:global_step/sec: 219.995
INFO:tensorflow:loss = 0.0091981, step = 103501 (0.459 sec)
INFO:tensorflow:global_step/sec: 241.205
INFO:tensorflow:loss = 0.0100845, step = 103601 (0.411 sec)
INFO:tensorflow:global_step/sec: 220.323
INFO:tensorflow:loss = 0.0153934, step = 103701 (0.454 sec)
INFO:tensorflow:global_step/sec: 194.869
INFO:tensorflow:loss = 0.00619262, step = 103801 (0.513 sec)
INFO:tensorflow:global_step/sec: 240.211
INFO:tensorflow:loss = 0.00421828, step = 103901 (0.415 sec)
INFO:tensorflow:global_step/sec: 196.473
INFO:tensorflow:loss = 0.004718, step = 104001 (0.509 sec)
INFO:tensorflow:global_step/sec: 212.636
INFO:tensorflow:loss = 0.00497493, step = 104101 (0.472 sec)
INFO:tensorflow:global_step/

INFO:tensorflow:loss = 0.00215208, step = 111301 (0.429 sec)
INFO:tensorflow:global_step/sec: 207.489
INFO:tensorflow:loss = 0.0102702, step = 111401 (0.483 sec)
INFO:tensorflow:global_step/sec: 232.163
INFO:tensorflow:loss = 0.0065801, step = 111501 (0.429 sec)
INFO:tensorflow:global_step/sec: 216.238
INFO:tensorflow:loss = 0.003176, step = 111601 (0.462 sec)
INFO:tensorflow:global_step/sec: 226.251
INFO:tensorflow:loss = 0.00586915, step = 111701 (0.443 sec)
INFO:tensorflow:global_step/sec: 247.127
INFO:tensorflow:loss = 0.00650615, step = 111801 (0.403 sec)
INFO:tensorflow:global_step/sec: 229.494
INFO:tensorflow:loss = 0.00960783, step = 111901 (0.436 sec)
INFO:tensorflow:global_step/sec: 228.186
INFO:tensorflow:loss = 0.015971, step = 112001 (0.439 sec)
INFO:tensorflow:global_step/sec: 237.553
INFO:tensorflow:loss = 0.0059348, step = 112101 (0.419 sec)
INFO:tensorflow:global_step/sec: 258.109
INFO:tensorflow:loss = 0.00613497, step = 112201 (0.389 sec)
INFO:tensorflow:global_step/

INFO:tensorflow:loss = 0.0106939, step = 119401 (0.385 sec)
INFO:tensorflow:global_step/sec: 289.303
INFO:tensorflow:loss = 0.0090291, step = 119501 (0.346 sec)
INFO:tensorflow:global_step/sec: 275.424
INFO:tensorflow:loss = 0.00560622, step = 119601 (0.363 sec)
INFO:tensorflow:global_step/sec: 234.915
INFO:tensorflow:loss = 0.0107291, step = 119701 (0.426 sec)
INFO:tensorflow:global_step/sec: 247.325
INFO:tensorflow:loss = 0.00363472, step = 119801 (0.405 sec)
INFO:tensorflow:global_step/sec: 255.638
INFO:tensorflow:loss = 0.0112181, step = 119901 (0.393 sec)
INFO:tensorflow:global_step/sec: 277.02
INFO:tensorflow:loss = 0.00916401, step = 120001 (0.358 sec)
INFO:tensorflow:global_step/sec: 267.333
INFO:tensorflow:loss = 0.0111734, step = 120101 (0.374 sec)
INFO:tensorflow:global_step/sec: 282.59
INFO:tensorflow:loss = 0.00581267, step = 120201 (0.354 sec)
INFO:tensorflow:global_step/sec: 247.244
INFO:tensorflow:loss = 0.00405161, step = 120301 (0.408 sec)
INFO:tensorflow:global_step/

INFO:tensorflow:loss = 0.0176195, step = 127501 (0.337 sec)
INFO:tensorflow:global_step/sec: 291.078
INFO:tensorflow:loss = 0.00393031, step = 127601 (0.343 sec)
INFO:tensorflow:global_step/sec: 295.504
INFO:tensorflow:loss = 0.0107374, step = 127701 (0.339 sec)
INFO:tensorflow:global_step/sec: 287.687
INFO:tensorflow:loss = 0.00194443, step = 127801 (0.350 sec)
INFO:tensorflow:global_step/sec: 279.44
INFO:tensorflow:loss = 0.00868802, step = 127901 (0.355 sec)
INFO:tensorflow:global_step/sec: 258.792
INFO:tensorflow:loss = 0.00965399, step = 128001 (0.387 sec)
INFO:tensorflow:global_step/sec: 290.752
INFO:tensorflow:loss = 0.0120196, step = 128101 (0.344 sec)
INFO:tensorflow:global_step/sec: 280.146
INFO:tensorflow:loss = 0.00292056, step = 128201 (0.356 sec)
INFO:tensorflow:global_step/sec: 238.837
INFO:tensorflow:loss = 0.00763167, step = 128301 (0.419 sec)
INFO:tensorflow:global_step/sec: 256.345
INFO:tensorflow:loss = 0.00772084, step = 128401 (0.391 sec)
INFO:tensorflow:global_st

INFO:tensorflow:loss = 0.00512061, step = 135601 (0.342 sec)
INFO:tensorflow:global_step/sec: 251.321
INFO:tensorflow:loss = 0.0110824, step = 135701 (0.398 sec)
INFO:tensorflow:global_step/sec: 292.875
INFO:tensorflow:loss = 0.00466448, step = 135801 (0.341 sec)
INFO:tensorflow:global_step/sec: 288.991
INFO:tensorflow:loss = 0.0018848, step = 135901 (0.346 sec)
INFO:tensorflow:global_step/sec: 267.531
INFO:tensorflow:loss = 0.00549407, step = 136001 (0.374 sec)
INFO:tensorflow:global_step/sec: 276.079
INFO:tensorflow:loss = 0.00731052, step = 136101 (0.362 sec)
INFO:tensorflow:global_step/sec: 271.738
INFO:tensorflow:loss = 0.00706047, step = 136201 (0.368 sec)
INFO:tensorflow:global_step/sec: 274.969
INFO:tensorflow:loss = 0.00414335, step = 136301 (0.364 sec)
INFO:tensorflow:global_step/sec: 241.055
INFO:tensorflow:loss = 0.00855257, step = 136401 (0.414 sec)
INFO:tensorflow:global_step/sec: 288.822
INFO:tensorflow:loss = 0.00124869, step = 136501 (0.346 sec)
INFO:tensorflow:global_

INFO:tensorflow:loss = 0.0172038, step = 143701 (0.387 sec)
INFO:tensorflow:global_step/sec: 213.471
INFO:tensorflow:loss = 0.00324081, step = 143801 (0.467 sec)
INFO:tensorflow:global_step/sec: 234.144
INFO:tensorflow:loss = 0.0051611, step = 143901 (0.427 sec)
INFO:tensorflow:global_step/sec: 258.453
INFO:tensorflow:loss = 0.00909664, step = 144001 (0.386 sec)
INFO:tensorflow:global_step/sec: 254.666
INFO:tensorflow:loss = 0.0052549, step = 144101 (0.393 sec)
INFO:tensorflow:global_step/sec: 280.65
INFO:tensorflow:loss = 0.0110936, step = 144201 (0.359 sec)
INFO:tensorflow:global_step/sec: 255.333
INFO:tensorflow:loss = 0.00812007, step = 144301 (0.389 sec)
INFO:tensorflow:global_step/sec: 246.506
INFO:tensorflow:loss = 0.00569292, step = 144401 (0.407 sec)
INFO:tensorflow:global_step/sec: 277.999
INFO:tensorflow:loss = 0.0100304, step = 144501 (0.358 sec)
INFO:tensorflow:global_step/sec: 253.881
INFO:tensorflow:loss = 0.00580155, step = 144601 (0.394 sec)
INFO:tensorflow:global_step

Root mean square Error: 0.090


Comment: We use DNN Regression Model with 2 layers: 1 input with 100 nodes, and 1 output with 100 nodes, and train our neural networks using the train set X obtained from a random cut to our original dataset,  and then measure its accuracy with RMSE using the test set X. We save our model into the same folder as our Python code with the file name "SP". In this case, the RMSE of DNN Regression is much higher than that of RF or Linear Regression Model

In [234]:
x = np.array([0.56,0.2, 18,0, 0, 0, 0, 0, 0, 1])
feature_column1 = learn.infer_real_valued_columns_from_input(x)

y = np.array([[1, 0.875, 40, 5, 0.225, 0.1132, 0.0679, 0.83333*24, 0, 1, 0, 0, 0, 0, 0]])
new_regressor = learn.DNNRegressor(feature_columns = feature_column1, hidden_units= [100,3,100], model_dir = './SP3')
new_regressor.predict_scores(y, as_iterable = False)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_task_type': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd9746496d0>, '_model_dir': './SP3', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_session_config': None, '_tf_random_seed': None, '_environment': 'local', '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_evaluation_master': '', '_master': ''}
Instructions for updating:
The default behavior of predict() is changing. The default value for
as_iterable will change to True, and then the flag will be removed
altogether. The behavior of this flag is described below.
Instructions for updating:
Estimator is decoupled from Scikit Learn interface by moving into
separate class SKCompat. Arguments x, y and batc

array([ 0.81819803], dtype=float32)

Comment: We load our saved neural network model, and then use it to make prediction with new input. Note that the dimension of the *VECTOR* input always *EQUALS* to (number of columns of train_X x 1). The prediction here is sometimes more accurate than that of RF (although they are pretty close), but most of the time it is worse than RF. Furthermore, RF is way better than Linear Regression model for this particular problem when testing with real-world inputs (based on trends and how accurate the predicted value is compared to actual).

# Correlation testing between acceptance rate, online drivers and percent change in online drivers

In [240]:
corr_matrix = Ha_Noi.corr()
corr_matrix["Pricing"].sort_values(ascending=False)

# %matplotlib inline
# import matplotlib.pyplot as plt
# df2.hist(bins = 50, figsize = (15, 15))



Pricing                                          1.000000
Pricing_timeT                                    0.885364
wd7                                              0.114458
wd5                                              0.085784
longwait_percent3                                0.083549
longwait_percent4                                0.082287
longwait_percent2                                0.081695
longwait_percent1                                0.078280
long_waiting                                     0.046830
Hour                                             0.039753
Request/Supply                                   0.017537
wd2                                             -0.018119
wd3                                             -0.021234
request                                         -0.022793
wd6                                             -0.022800
Percentchange_onlinedrivers                     -0.029610
wd1                                             -0.049747
accept_rate   

# Random Forest Algorithm and Model Evaluations using Cross-Validation

In [245]:
import numpy as np
from sklearn.preprocessing import LabelEncoder  
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.externals import joblib


forest_reg = RandomForestRegressor()
forest_model = forest_reg.fit(train_X, train_Y.ravel())
Ypred2 = forest_reg.predict(Xtest)

lin_reg = LinearRegression()
linreg_model = lin_reg.fit(train_X, train_Y.ravel())
Ypred3 = lin_reg.predict(Xtest)

joblib.dump(linreg_model, 'LinReg_model.csv', protocol=2) #save the Lin-Reg model into the file named "LinReg_model.pkl"
joblib.dump(forest_model, 'Forest_Model.csv', protocol=2) #save the RF model into the file named "Forest_model.pkl"


lin_mse = mean_squared_error(Ytest, Ypred2)
forest_rmse = np.sqrt(lin_mse)
print("Root Mean Square Error of RF Algo:\t",forest_rmse)

lin_mse2 = mean_squared_error(Ytest, Ypred3)
lin_rmse = np.sqrt(lin_mse2)
print("Root Mean Square Error of Linear Regression Algo:\t", lin_rmse)

#Evaluate RF algo on the whole training set by cross-validation
scores = cross_val_score(forest_reg, train_X, train_Y.ravel(), scoring = "neg_mean_squared_error", cv = 10)
forest_rmse_scores = np.sqrt(-scores)

#Evaluate RF algo on the whole test set by cross-validation
scores3 = cross_val_score(forest_reg, Xtest, Ytest.ravel(), scoring = "neg_mean_squared_error", cv = 10)
forest_rmse_scores3 = np.sqrt(-scores3)

#Evaluate Lin-Reg algo on the whole training set by cross-validation with k = 50 folds
scores2 = cross_val_score(lin_reg, train_X, train_Y.ravel(), scoring = "neg_mean_squared_error", cv = 10)
linreg_rmse_scores2 = np.sqrt(-scores2)

#Evaluate Lin-Reg algo on the test set by cross-validation
scores4 = cross_val_score(lin_reg, Xtest, Ytest.ravel(), scoring = "neg_mean_squared_error", cv = 10)
linreg_rmse_scores4 = np.sqrt(-scores4)

def display_scores(scores):
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard", scores.std())
    print("Max:", scores.max())
    print("Min:", scores.min())

display_scores(linreg_rmse_scores2)
display_scores(linreg_rmse_scores4)
lin_mae_RF = mean_absolute_error(Ytest, Ypred2)
lin_mae_LR = mean_absolute_error(Ytest, Ypred3)

# display_scores(Accept_rate_prediction)
# print("Mean Square Error:\t", linreg_rmse_scores2)
# print("Mean Absolute Error:\t", lin_mae)

Root Mean Square Error of RF Algo:	 0.0360283597271
Root Mean Square Error of Linear Regression Algo:	 0.0352627974441
Scores: [ 0.0276238   0.05822091  0.06850968  0.04625425  0.03871351  0.03433169
  0.01718119  0.0427755   0.02523202  0.03400342]
Mean: 0.039284595989
Standard 0.0146567723974
Max: 0.0685096755811
Min: 0.0171811892026
Scores: [ 0.01145857  0.09111746  0.01300991  0.03629501  0.04353716  0.01177619
  0.02941098  0.01463893  0.03188275  0.02766631]
Mean: 0.0310793278391
Standard 0.0227077781564
Max: 0.0911174619879
Min: 0.0114585676898


Comment: We generated Random Forest and Linear Regression model using the same train and test sets generated in the first slide. We then compute the RMSE of each model, as well as the RMSE using cross-validation with 50 folds. In both ways, RF performs, *at the very least*, as worse as Linear Regression model in terms of the RMSE.  

In [250]:
Pricing_timeT = 1.1
Accept_rate = 0.75325
Request = 77
long_waiting = 18
longwait_percent3 = 0.23377
Supply = 593
DriverBusyRate = 0.0955
Hour = 0.71875
wd = [0,0,0,0,0,0,0]
wd[3] = 1

a = np.array([Pricing_timeT, Accept_rate, Request, long_waiting, longwait_percent3, Request/Supply, DriverBusyRate, Hour])
Xtest = np.array([np.concatenate([a, wd])],  dtype=np.float32)

lin_model = joblib.load('LinReg_model.csv')
forest_model = joblib.load('Forest_Model.csv')

float(lin_model.predict(Xtest)), float(forest_model.predict(Xtest))


(1.1057942273499437, 1.19)

Comment: When testing with real data, we observed that the Random Forest Regression works much better than either lin-Reg or DNN Regression because it predicts the correct trend of the pricing that we used in the past (although for downward trends, it resuled in overfits, and for upward trends, it resulted in underfit. One possible explanation  might be due to our bign increase/decrease in pricing suddenly that it's hard to predict such magnitude exactly). I wrote this code so that anh Thuc only needs to supply the inputs, and click "Run" to automatically get the result:)