In [26]:
# Predicting the acceptance rate of the next 20-minute interval (Ha Noi) with TensorFlow and scikit-learn: data preparation

from __future__ import print_function

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.contrib import learn
from sklearn import metrics
import random

rng = np.random

# Read the CSV file
datapath = "./"
Ha_Noi = pd.read_csv(datapath + "OnlineDrivers_HaNoi_10days.csv")

# Keep copies of the current-interval (time T) values as features
Ha_Noi['accept_rate_timeT'] = Ha_Noi['accept_rate'].copy()
Ha_Noi['online_drivers'] = Ha_Noi['online_drivers (no data on day 13 - 09:20:00)'].copy()
Ha_Noi['Pricing_timeT'] = Ha_Noi['Pricing'].copy()

# Shift accept_rate and Pricing up by one row, so that each row's target is
# the value in the *next* 20-minute interval (time T+1)
Ha_Noi['accept_rate'] = Ha_Noi['accept_rate'].shift(-1)
Ha_Noi['Pricing'] = Ha_Noi['Pricing'].shift(-1)

# Drop rows with missing values in the key columns (this includes the last row,
# whose shifted target is NaN) and rows where the online-driver count did not change
Ha_Noi = Ha_Noi.dropna(subset=["longwait_percent4", "accept_rate", "longwait_percent2", "DriverBusyRate"])
Ha_Noi = Ha_Noi.drop(Ha_Noi[Ha_Noi.Percentchange_onlinedrivers == 0].index)


#define normalized function for our dataset
# def normalize(array):
#     return (array - array.mean()) / array.std()

df2 = pd.DataFrame(Ha_Noi)


# Split the dataset into training (80%) and testing (20%) sets.
# random_state is itself drawn at random, so every run produces a different split.
train_set, test_set = train_test_split(Ha_Noi, test_size=0.2, random_state=random.randint(20, 200))

# Feature columns (the same order must be used for training, testing and new predictions)
features = ['accept_rate_timeT', 'longwait_percent3', 'Request/Supply', 'DriverBusyRate', 'Hour',
            'wd1', 'wd2', 'wd3', 'wd4', 'wd5', 'wd6', 'wd7']

# Training data: the target is the acceptance rate of the next 20-minute interval
train_X = train_set[features]
train_Y = train_set['accept_rate']

# Testing data
Xtest = test_set[features]
Ytest = test_set['accept_rate']

# Other candidate features: 'longwait_percent2', 'accept_rate_timeT', 'Percentchange_onlinedrivers', 'DriverBusyRate'

Comment: This dataset combines the query results of 3 cards: 1868, 1943 and 1950. Our system only keeps the last 10 days of data for the Number of Online Drivers, so DriverBusyRate is also only available for those 10 days; the author therefore had to assemble the rest of the data by querying successive days. As a consequence, DriverBusyRate and Request/Supply may not be completely correct, which might be why our RF model overfits. Remarkably, the RF model still often predicts the trend of the acceptance rate *accurately*, even with such a noisy dataset!

We chose the features with the highest positive correlations with Pricing, based on the correlation test obtained in the 3rd slide. This resulted in *7* features plus *2* obvious *time features* (Hour and day of the week) to capture daily and weekly seasonality effects.
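A minimal sketch of ranking candidate features by their correlation with a target column, in the spirit of the correlation test above. It uses a small synthetic DataFrame (the columns f1, f2, f3 and target are made up), and the hypothetical helper rank_features_by_correlation ranks by absolute value, so that a strongly negative feature (like the previous acceptance rate discussed below) is kept alongside strongly positive ones.

```python
import numpy as np
import pandas as pd

def rank_features_by_correlation(df, target, k):
    """Return the k feature names with the largest absolute Pearson correlation with target."""
    corr = df.corr()[target].drop(target)  # correlation of every other column with the target
    return corr.abs().sort_values(ascending=False).index[:k].tolist()

# Synthetic data (hypothetical): target rises with f1, falls with f2; f3 is pure noise
rng = np.random.RandomState(0)
n = 500
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
f3 = rng.normal(size=n)
target = 2.0 * f1 - 3.0 * f2 + 0.1 * rng.normal(size=n)
df = pd.DataFrame({"f1": f1, "f2": f2, "f3": f3, "target": target})

# Signed correlations, sorted the same way as the correlation table in this notebook
print(df.corr()["target"].sort_values(ascending=False))
# Top-2 features by absolute correlation
print(rank_features_by_correlation(df, "target", k=2))
```

Sorting by the signed value (as the notebook's correlation table does) puts f2 at the bottom even though it is the most informative feature here; ranking by absolute value avoids that.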

We decided to add the acceptance rate of the previous 20 minutes because of its strong negative correlation with pricing, which prevents the model from overfitting and *gives* it the ability to know when to *decrease* the price (pricing clearly has a strong negative correlation with the acceptance rate of the previous 20 minutes, at least in our historical data from July 3rd to August 2nd). Without it, our best model, the Random Forest Regressor, would always *increase* the price at moments when we *actually* decreased it (we observed this by testing our Random Forest Regression model on historical data).

We also chose the factor "longwait_percent3", mainly because its negative correlation with pricing is stronger than that of the other factors in the "longwait_percentX" family (X = 2, 3, 4). Longwait_percent3 reflects the fact that the later a request is made within the 20-minute interval, the more it matters to the acceptance rate in the next 20-minute period. The formula for longwait_percent3 is in *card 1868* (https://bi.ahamove.com/question/1868?service_id=HAN-BIKE&time_interval=20&order_date=2017-06-04&end_date=2017-06-13). This factor is also used to prevent our model from *overfitting*.
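The first slide builds the model's input and target by shifting rows: the current-interval acceptance rate is kept as accept_rate_timeT, and accept_rate is shifted up one row so that it holds the next interval's value. A minimal sketch of that construction on a toy table of four consecutive 20-minute intervals (the timestamps and values are made up):

```python
import pandas as pd

# Toy data: four consecutive 20-minute intervals (hypothetical values)
df = pd.DataFrame({
    "interval": ["09:00", "09:20", "09:40", "10:00"],
    "accept_rate": [0.80, 0.75, 0.70, 0.78],
})

# Feature: acceptance rate in the current interval (time T)
df["accept_rate_timeT"] = df["accept_rate"].copy()
# Target: acceptance rate in the next interval (time T+1)
df["accept_rate"] = df["accept_rate"].shift(-1)

# The last interval has no "next" value, so its target is NaN and the row is dropped
df = df.dropna(subset=["accept_rate"])
print(df)
```

Because shift works on row order, this assumes the rows are sorted by time with no missing intervals; if a row were missing, an interval would be paired with a later one than intended.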

In [38]:
# Parameters
num_epochs = 1000   # not used below; the training length is set by STEPS
STEPS = 150000      # number of mini-batch training steps
BATCH_SIZE = 80

# Deep Neural Network Regressor with two hidden layers of 100 units each;
# checkpoints are saved in the ./AR3 folder
feature_column1 = learn.infer_real_valued_columns_from_input(train_X)
regressor = learn.DNNRegressor(feature_columns = feature_column1, hidden_units= [100,100], model_dir = "./AR3")
regressor.fit(train_X, train_Y, max_steps= STEPS, batch_size= BATCH_SIZE)

# Predict on the test set and compute the root mean square error
Ypred = regressor.predict_scores(Xtest, as_iterable=False)
Ypred = np.asarray(list(Ypred))
rmse = np.sqrt(((Ypred - Ytest.values) ** 2).mean())
print("Root mean square Error: %.3f" % rmse)


INFO:tensorflow:Restoring parameters from ./AR3/model.ckpt-150000
Root mean square Error: 0.067


Comment: We use a DNN Regression model with 2 hidden layers of 100 nodes each (hidden_units=[100,100]), followed by a single output node. We train the network on the training set train_X, obtained from a random split of our original dataset, and then measure its accuracy with the RMSE on the test set Xtest. The model checkpoints are saved in the folder "./AR3" next to our Python code. In this case, the RMSE of the DNN Regression model (0.067) is higher than that of RF (0.064) or the Linear Regression model (0.060).
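To make the architecture concrete, here is a minimal NumPy sketch of the forward pass of a regressor with hidden_units=[100,100] on 12 input features (the number of columns in train_X), followed by the same RMSE formula used above. It is an illustration with random weights and ReLU hidden activations (the default activation of tf.contrib.learn's DNNRegressor), not the trained model stored in ./AR3; init_mlp and forward are hypothetical helpers.

```python
import numpy as np

def init_mlp(sizes, seed=0):
    """Random weights and zero biases for a fully connected network with the given layer sizes."""
    rng = np.random.RandomState(seed)
    return [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """ReLU on the hidden layers, linear output layer (one regression score per row)."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W_out, b_out = params[-1]
    return (h @ W_out + b_out).ravel()

sizes = [12, 100, 100, 1]   # 12 inputs -> two hidden layers of 100 units -> 1 output
params = init_mlp(sizes)
n_params = sum(W.size + b.size for W, b in params)
print(n_params)             # 12*100+100 + 100*100+100 + 100*1+1 = 11501

X = np.random.RandomState(1).normal(size=(5, 12))   # 5 hypothetical input rows
y_pred = forward(params, X)
y_true = np.zeros(5)                                 # hypothetical targets
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
print(y_pred.shape, rmse)
```

The parameter count shows that most of the roughly 11,500 weights sit in the 100x100 layer between the two hidden layers.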

In [49]:
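# Rebuild the same feature columns that were used for training (inferred from train_X)
feature_column1 = learn.infer_real_valued_columns_from_input(train_X)

# One new observation with the 12 features in train_X column order:
# accept_rate_timeT, longwait_percent3, Request/Supply, DriverBusyRate, Hour, wd1..wd7
# ('Hour' must be on the same scale as the 'Hour' column used for training)
x_new = np.array([[0.8515, 0.1485, 0.15981, 0.1328, 0.46875*24, 0, 1, 0, 0, 0, 0, 0]])

# Same architecture and model_dir as in training, so the latest checkpoint in ./AR3 is restored
new_regressor = learn.DNNRegressor(feature_columns = feature_column1, hidden_units= [100,100], model_dir = './AR3')
new_regressor.predict_scores(x_new, as_iterable = False)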

array([ 0.36590445], dtype=float32)

Comment: We load our saved neural network model and use it to make a prediction for a new input. Note that the input must be a single row with *exactly* as many entries as train_X has columns (12 here), in the same order. The prediction here is sometimes more accurate than that of RF (although the two are usually close), but most of the time it is worse. Furthermore, when tested with real-world inputs (judging both the predicted trend and how close the predicted value is to the actual one), RF is considerably better than the Linear Regression model for this particular problem.
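Because the saved models expect exactly the columns of train_X, in the same order, it can help to build a new input row from named values instead of by position. A minimal sketch: the helper make_row and its day argument (1 to 7, mapped to wd1..wd7) are hypothetical, and the example values are the ones used in the prediction cell (In [29]) further below.

```python
import numpy as np

FEATURES = ['accept_rate_timeT', 'longwait_percent3', 'Request/Supply', 'DriverBusyRate', 'Hour',
            'wd1', 'wd2', 'wd3', 'wd4', 'wd5', 'wd6', 'wd7']

def make_row(values, day):
    """Return a (1, 12) float array in FEATURES order; day (1..7) sets the matching wd indicator."""
    if not 1 <= day <= 7:
        raise ValueError("day must be between 1 and 7")
    row = dict(values)
    for d in range(1, 8):
        row['wd%d' % d] = 1.0 if d == day else 0.0   # one-hot day-of-week encoding
    missing = [f for f in FEATURES if f not in row]
    if missing:
        raise KeyError("missing features: %s" % missing)
    return np.array([[float(row[f]) for f in FEATURES]])

x_new = make_row({'accept_rate_timeT': 0.76, 'longwait_percent3': 0.43,
                  'Request/Supply': 129.0 / 597, 'DriverBusyRate': 0.15,
                  'Hour': 18.0 / 24}, day=6)
print(x_new.shape)
```

Building the row by name makes a missing or misplaced feature fail loudly instead of silently shifting every value by one column.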

# Correlation testing between acceptance rate, online drivers and percent change in online drivers

In [137]:
corr_matrix = Ha_Noi.corr()
corr_matrix["accept_rate"].sort_values(ascending=False)

# %matplotlib inline
# import matplotlib.pyplot as plt
# df2.hist(bins = 50, figsize = (15, 15))



accept_rate                                      1.000000
accept_rate_timeT                                0.683009
online_drivers (no data on day 13 - 09:20:00)    0.140001
Percentchange_onlinedrivers                      0.082624
wd3                                              0.068297
wd2                                              0.048107
wd4                                              0.039887
wd7                                              0.009160
wd5                                             -0.023075
Pricing_timeT                                   -0.028831
DriverBusyRate                                  -0.030489
wd6                                             -0.048887
Pricing                                         -0.055921
wd1                                             -0.090935
request                                         -0.102674
Hour                                            -0.219941
Request/Supply                                  -0.264265
long_waiting  

# Random Forest Algorithm and Model Evaluations using Cross-Validation

In [27]:
import numpy as np
from sklearn.preprocessing import LabelEncoder  
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
import joblib  # sklearn.externals.joblib was removed in newer scikit-learn versions


# Random Forest Regression model, trained on the training set
forest_reg = RandomForestRegressor()
forest_model = forest_reg.fit(train_X, train_Y.values)
Ypred2 = forest_model.predict(Xtest)

# Linear Regression model, trained on the same training set
lin_reg = LinearRegression()
linreg_model = lin_reg.fit(train_X, train_Y.values)
Ypred3 = linreg_model.predict(Xtest)

# Save both models with joblib (pickle protocol 2).
# Despite the .csv extension, these files contain pickled models, not CSV data.
joblib.dump(linreg_model, 'LinReg_model_AR.csv', protocol=2)
joblib.dump(forest_model, 'Forest_Model_AR.csv', protocol=2)

# Test-set RMSE of each model
forest_mse = mean_squared_error(Ytest, Ypred2)
forest_rmse = np.sqrt(forest_mse)
print("Root Mean Square Error of RF Algo:\t",forest_rmse)

lin_mse = mean_squared_error(Ytest, Ypred3)
lin_rmse = np.sqrt(lin_mse)
print("Root Mean Square Error of Linear Regression Algo:\t", lin_rmse)

# 10-fold cross-validation: the scorer returns negative MSEs, so negate before the square root
# RF on the training set
scores = cross_val_score(forest_reg, train_X, train_Y.values, scoring = "neg_mean_squared_error", cv = 10)
forest_rmse_scores = np.sqrt(-scores)

# RF within the test set (each fold refits the model on the remaining test folds)
scores3 = cross_val_score(forest_reg, Xtest, Ytest.values, scoring = "neg_mean_squared_error", cv = 10)
forest_rmse_scores3 = np.sqrt(-scores3)

# Lin-Reg on the training set
scores2 = cross_val_score(lin_reg, train_X, train_Y.values, scoring = "neg_mean_squared_error", cv = 10)
linreg_rmse_scores2 = np.sqrt(-scores2)

# Lin-Reg within the test set
scores4 = cross_val_score(lin_reg, Xtest, Ytest.values, scoring = "neg_mean_squared_error", cv = 10)
linreg_rmse_scores4 = np.sqrt(-scores4)

def display_scores(scores):
    print("Scores:", scores)
    print("Mean:", scores.mean())
    print("Standard", scores.std())   # standard deviation
    print("Max:", scores.max())
    print("Min:", scores.min())

# Only the Lin-Reg scores are displayed: first the training-set CV, then the test-set CV
display_scores(linreg_rmse_scores2)
display_scores(linreg_rmse_scores4)

# Mean absolute errors on the test set (computed but not printed)
lin_mae_RF = mean_absolute_error(Ytest, Ypred2)
lin_mae_LR = mean_absolute_error(Ytest, Ypred3)

# display_scores(Accept_rate_prediction)
# print("Mean Square Error:\t", linreg_rmse_scores2)
# print("Mean Absolute Error:\t", lin_mae)

Root Mean Square Error of RF Algo:	 0.0642034489374
Root Mean Square Error of Linear Regression Algo:	 0.0596156868056
Scores: [ 0.0604544   0.05998684  0.0669917   0.05900586  0.05670346  0.05710712
  0.06771698  0.06268166  0.05831281  0.07541041]
Mean: 0.0624371229392
Standard 0.00563018121526
Max: 0.0754104112358
Min: 0.0567034594764
Scores: [ 0.04659366  0.06794019  0.07100747  0.05820386  0.06165992  0.05306184
  0.06218072  0.05445435  0.06362233  0.06404962]
Mean: 0.0602773965271
Standard 0.00694493762257
Max: 0.0710074696513
Min: 0.0465936579765


Comment: We generated the Random Forest and Linear Regression models using the same training and test sets generated in the first slide. We then computed the RMSE of each model on the test set, as well as RMSE scores from 10-fold cross-validation (the scores displayed above are those of the Linear Regression model, first on the training set and then on the test set). In terms of RMSE, RF performs no better than the Linear Regression model, and on this test set slightly worse (0.064 vs. 0.060). Note that because our training and test sets are generated *randomly*, there is *NO* guarantee that they include every trend that actually occurred. Thus, we can sometimes improve our model simply by re-running the first slide and this slide to obtain a new RF model.
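A minimal sketch of the cross-validated RMSE comparison, on synthetic data (the data, coefficients and seeds are assumptions, not the Ha Noi dataset). cross_val_score with scoring="neg_mean_squared_error" returns negative MSEs, hence the sign flip before the square root. Passing a KFold with shuffle=True and a fixed random_state makes the folds reproducible, unlike the randomly seeded split in the first slide; cv_rmse is a hypothetical helper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (hypothetical): a linear signal plus small noise
rng = np.random.RandomState(42)
X = rng.uniform(size=(300, 4))
y = X @ np.array([0.3, -0.2, 0.1, 0.05]) + 0.02 * rng.normal(size=300)

cv = KFold(n_splits=10, shuffle=True, random_state=0)   # reproducible 10-fold split

def cv_rmse(model):
    """Per-fold RMSE from 10-fold cross-validation."""
    neg_mse = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=cv)
    return np.sqrt(-neg_mse)

lin_scores = cv_rmse(LinearRegression())
rf_scores = cv_rmse(RandomForestRegressor(n_estimators=50, random_state=0))
print("Lin-Reg RMSE: mean %.4f, std %.4f" % (lin_scores.mean(), lin_scores.std()))
print("RF RMSE:      mean %.4f, std %.4f" % (rf_scores.mean(), rf_scores.std()))
```

When the true relationship is linear, as in this synthetic data, Linear Regression is expected to beat RF; which model wins on the real data is an empirical question.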

In [29]:
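# New observation, in the same column order as train_X
Accept_rate = 0.76          # acceptance rate in the current interval (accept_rate_timeT)
Longwait_percent3 = 0.43
Request = 129
Supply = 597
DriverBusyRate = 0.15
Hour = 18/24.0              # float division (18/24 is 0 under Python 2); must match the training scale of 'Hour'
# Percentchange = (Supply - 625)/625
wd = [0,0,0,0,0,0,0]        # day-of-week indicators wd1..wd7
wd[5] = 1                   # sets wd6

a = np.array([Accept_rate, Longwait_percent3, float(Request)/Supply, DriverBusyRate, Hour])

# Use a new name so that the test set Xtest is not overwritten
X_new = np.array([np.concatenate([a, wd])], dtype=np.float32)
lin_model = joblib.load('LinReg_model_AR.csv')
forest_model = joblib.load('Forest_Model_AR.csv')

float(lin_model.predict(X_new)[0]), float(forest_model.predict(X_new)[0])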


(0.759521484375, 0.772627796)

Comment: When testing with real data, we observed that Random Forest Regression works much better than either Linear Regression or DNN Regression, because it predicts the correct trend of the pricing that we used in the past. However, for downward trends it *often* overshoots, by 2% to 5% on average, and for upward trends it *often* undershoots, by 2% to 4% on average. One possible explanation is that our pricing sometimes increases or decreases suddenly and by a large amount, which makes the exact magnitude hard to predict, especially in extreme cases such as a change from 0.85 to 1 or from 1 to 0.85. Note that, for some reason, this model is *extremely* good at predicting downward trends :)

Note that to train this model we need to compute the feature Request/Supply (the author used cards 1943 and 1868 together with Excel to compute this ratio, just for convenience), so there should be a BI card dedicated to computing this feature, together with the feature DriverBusyRate (currently, DriverBusyRate can be queried from *card 1950*: https://bi.ahamove.com/question/1950?order_date=2017-06-04&end_date=2017-06-06&service=HAN-BIKE&time_interval=20). All the remaining features can be queried from *card 1868* (https://bi.ahamove.com/question/1868?service_id=HAN-BIKE&time_interval=20&order_date=2017-06-04&end_date=2017-06-13).
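A minimal pandas sketch of the Request/Supply feature that such a BI card would need to produce. It assumes, hypothetically, that supply is the number of online drivers in each 20-minute interval; the column names request and online_drivers appear in the dataset, but the values below are made up, and add_request_supply is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def add_request_supply(df):
    """Add a 'Request/Supply' column; intervals with zero supply get NaN instead of infinity."""
    supply = df["online_drivers"].replace(0, np.nan)   # assumption: supply = online drivers
    out = df.copy()
    out["Request/Supply"] = df["request"] / supply
    return out

# Hypothetical 20-minute intervals
df = pd.DataFrame({"request": [129, 80, 40], "online_drivers": [597, 400, 0]})
print(add_request_supply(df))
```

Rows with a NaN ratio can then be dropped with dropna, as the first slide does for the other key columns.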

Further idea for development: demand and supply could each be expressed as a linear combination of certain features, one of which is *pricing*. For example, demand = a_0 + a_1*pricing + a_2*request + a_3*feature3 and supply = b_0 + b_1*pricing + b_2*online_drivers + b_3*feature4, where a_0, a_1, a_2, a_3 and b_0, b_1, b_2, b_3 are parameters to be optimized. Our goal is then to find the pricing at which demand = supply, which means we only need to solve the equation a_0 + a_1*pricing + a_2*request + a_3*feature3 = b_0 + b_1*pricing + b_2*online_drivers + b_3*feature4. Whenever a_1 differs from b_1, this gives pricing = (b_0 + b_2*online_drivers + b_3*feature4 - a_0 - a_2*request - a_3*feature3) / (a_1 - b_1). The demand and supply models themselves could be fitted on our historical data with Scikit-Learn's LinearRegression or RandomForestRegressor!
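A minimal sketch of this idea on synthetic data: fit one LinearRegression for demand on [pricing, request] and one for supply on [pricing, online_drivers], then solve for the price at which the two predictions are equal. The data-generating coefficients are made up, feature3 and feature4 are left out for brevity, and equilibrium_price is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
n = 500
pricing = rng.uniform(0.8, 1.5, size=n)
request = rng.uniform(50, 200, size=n)
online = rng.uniform(300, 700, size=n)

# Synthetic demand falls with price; synthetic supply rises with price (hypothetical coefficients)
demand = 100 - 60 * pricing + 0.5 * request + rng.normal(scale=2, size=n)
supply = -200 + 80 * pricing + 0.3 * online + rng.normal(scale=2, size=n)

demand_model = LinearRegression().fit(np.column_stack([pricing, request]), demand)
supply_model = LinearRegression().fit(np.column_stack([pricing, online]), supply)

def equilibrium_price(req, onl):
    """Solve a_0 + a_1*p + a_2*req = b_0 + b_1*p + b_2*onl for the price p."""
    a0, (a1, a2) = demand_model.intercept_, demand_model.coef_
    b0, (b1, b2) = supply_model.intercept_, supply_model.coef_
    if np.isclose(a1, b1):
        raise ValueError("demand and supply have the same price slope; no unique solution")
    return (b0 + b2 * onl - a0 - a2 * req) / (a1 - b1)

p_star = equilibrium_price(req=129, onl=597)
d = demand_model.predict([[p_star, 129]])[0]   # predicted demand at p_star
s = supply_model.predict([[p_star, 597]])[0]   # predicted supply at p_star
print(p_star, d, s)
```

The closed form only exists because both models are linear in pricing; with a RandomForestRegressor for demand or supply, the price would have to be found numerically instead, for example by evaluating both models over a grid of candidate prices.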