<h1><center>Code and Algorithms</center></h1>

We implement here the application's algorithm, which uses the database fields 'minute', 'hour', 'day', 'day_w' (day of the week), 'month', 'year' and 'available'. The database records the number of available parking spots in the Kuala Lumpur City Center (KLCC) parking lot over a year and a half. We use these data to predict the number of available spots at a given moment in the future.

### 1: Handling the database

In [34]:
# Creating .csv file from the database
# Database: 'parking-klcc-2016-2017.txt'
# Source: 'https://www.kaggle.com/mypapit/klccparking'
# Created database file: 'KLCC.csv'

import csv
import datetime

def day_week(year, month, day):
    # strftime('%w') numbers days Sunday=0 ... Saturday=6; shift to Monday=1 ... Sunday=7
    return (int(datetime.date(int(year), int(month), int(day)).strftime('%w')) - 1)%7 + 1

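def create_database():
    # newline='' is recommended by the csv module; it prevents blank rows on Windows
    with open('KLCC.csv', 'w', newline='') as database:
        writer = csv.writer(database)
        writer.writerow(['minute','hour','day','day_w','month','year','available'])
        with open('parking-klcc-2016-2017.txt', 'r') as data:
            for line in data:
                # Characters 5-8 hold the free-slot count, 'FULL' or 'OPEN'; 'OPEN' lines are skipped
                if line[5:9] != 'OPEN':
                    # Date and time fields are read from fixed character positions
                    minute = line[24:26]
                    hour = line[21:23]
                    day = line[18:20]
                    month = line[15:17]
                    year = line[10:14]
                    day_w = str(day_week(year, month, day))
                    # 'FULL' is stored as 0 available spots
                    available = 0 if line[5:9] == 'FULL' else int(line[5:9])
                    writer.writerow([minute,hour,day,day_w,month,year,available])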
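The `day_week` helper turns `strftime('%w')` (Sunday = 0 ... Saturday = 6) into Monday = 1 ... Sunday = 7. A quick check, given as a sketch, suggests this is the same numbering as Python's built-in `date.isoweekday()`, which could replace the expression:

```python
import datetime

def day_week(year, month, day):
    # Same expression as in create_database: shift %w so Monday=1 and Sunday=7
    return (int(datetime.date(int(year), int(month), int(day)).strftime('%w')) - 1) % 7 + 1

def day_week_iso(year, month, day):
    # Built-in equivalent: ISO weekday, Monday=1 ... Sunday=7
    return datetime.date(int(year), int(month), int(day)).isoweekday()

# Compare both versions on every day of 2016 and 2017 (731 days, 2016 is a leap year)
start = datetime.date(2016, 1, 1)
mismatches = [
    d for d in (start + datetime.timedelta(days=i) for i in range(731))
    if day_week(d.year, d.month, d.day) != day_week_iso(d.year, d.month, d.day)
]
print(len(mismatches))       # 0
print(day_week(2016, 6, 1))  # 3 (1 June 2016 was a Wednesday)
```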

In [35]:
#Test
create_database()
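`create_database` relies on fixed character offsets rather than a delimiter. The raw file's exact layout is not shown here, so the sample lines below are hypothetical, built only to agree with those offsets (status in characters 5-8, date in 10-19, time in 21-25). The sketch moves the parsing into a pure function, `parse_line` (a hypothetical name), which is easier to test than the file-writing loop:

```python
import datetime

def parse_line(line):
    """Hypothetical helper: parse one raw line into a KLCC.csv row, or None."""
    status = line[5:9]
    if status == 'OPEN':
        return None  # same rule as create_database: OPEN lines are skipped
    year, month, day = int(line[10:14]), int(line[15:17]), int(line[18:20])
    hour, minute = int(line[21:23]), int(line[24:26])
    day_w = datetime.date(year, month, day).isoweekday()  # Monday=1 ... Sunday=7
    available = 0 if status == 'FULL' else int(status)
    return [minute, hour, day, day_w, month, year, available]

# Hypothetical sample lines laid out to match the slice offsets above
print(parse_line('KLCC;1642;2016-06-01;10:12:05'))  # [12, 10, 1, 3, 6, 2016, 1642]
print(parse_line('KLCC;FULL;2016-06-01;13:00:00'))  # [0, 13, 1, 3, 6, 2016, 0]
print(parse_line('KLCC;OPEN;2016-06-01;09:00:00'))  # None
```

Unlike the original loop, this version returns integers instead of the sliced strings; `csv.writer` writes both the same way.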

In [36]:
#Updating database file 'KLCC.csv'

def update_database(data_line):
    """
    Input: data_line of the form: 'minute,hour,day,day_w,month,year,available'
    Output: appends the data_line as a new row to the database file 'KLCC.csv'
    """
    # strip() removes a trailing newline so it does not end up in the 'available' field
    data = data_line.strip().split(',')
    with open('KLCC.csv', 'a', newline='') as database:
        writer = csv.writer(database)
        writer.writerow(data)

In [38]:
#Test
#data_line = 
#update_database(data_line)
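The test cell leaves `data_line` unspecified. As a sketch, a hypothetical helper `make_data_line` could build a correctly ordered line from a timestamp and a spot count, computing `day_w` with the same Monday = 1 convention as `day_week`:

```python
import datetime

def make_data_line(timestamp, available):
    """Hypothetical helper: format one observation in KLCC.csv column order."""
    fields = [
        timestamp.minute,
        timestamp.hour,
        timestamp.day,
        timestamp.isoweekday(),  # day_w: Monday=1 ... Sunday=7
        timestamp.month,
        timestamp.year,
        available,
    ]
    return ','.join(str(f) for f in fields)

line = make_data_line(datetime.datetime(2016, 6, 1, 10, 15), 1609)
print(line)  # 15,10,1,3,6,2016,1609
# update_database(line) would then append this row to KLCC.csv
```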

### 2: Implementing the machine learning program

In [39]:
# loading packages

import pandas as pd

# data visualization

import matplotlib.pyplot as plt

# machine learning

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

In [40]:
#Reading data from database

data = pd.read_csv('KLCC.csv')

In [41]:
#Previewing the first five rows of the dataset

data.head()

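|   | minute | hour | day | day_w | month | year | available |
|---|--------|------|-----|-------|-------|------|-----------|
| 0 | 12 | 10 | 1 | 3 | 6 | 2016 | 1642 |
| 1 | 15 | 10 | 1 | 3 | 6 | 2016 | 1609 |
| 2 | 30 | 10 | 1 | 3 | 6 | 2016 | 1458 |
| 3 | 45 | 10 | 1 | 3 | 6 | 2016 | 1357 |
| 4 | 0 | 11 | 1 | 3 | 6 | 2016 | 1235 |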


In [42]:
#Setting input values and output values

X = data.drop('available', axis=1)
y = data['available']

In [43]:
#Dividing data into training set and testing set

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
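`train_test_split` shuffles rows, so readings taken minutes apart can land on both sides of the split, and the test score then measures interpolation inside the observed period more than forecasting. Since the stated goal is prediction at a future moment, a time-ordered split is a reasonable alternative. A minimal sketch with pandas, assuming the column names of `KLCC.csv`; the helper `chronological_split` and the toy values are hypothetical:

```python
import pandas as pd

def chronological_split(data, test_size=0.2):
    """Hypothetical helper: train on the earliest rows, test on the most recent ones."""
    # pd.to_datetime can assemble timestamps from year/month/day/hour/minute columns
    timestamps = pd.to_datetime(data[['year', 'month', 'day', 'hour', 'minute']])
    ordered = data.loc[timestamps.sort_values(kind='mergesort').index]
    n_test = int(round(len(ordered) * test_size))
    cut = len(ordered) - n_test
    train, test = ordered.iloc[:cut], ordered.iloc[cut:]
    feature_cols = [c for c in data.columns if c != 'available']
    return train[feature_cols], test[feature_cols], train['available'], test['available']

# Toy frame (made-up values) whose rows are deliberately out of time order
toy = pd.DataFrame({
    'minute': [0, 15, 30, 45, 0],
    'hour': [11, 10, 10, 10, 10],
    'day': [1] * 5, 'day_w': [3] * 5, 'month': [6] * 5, 'year': [2016] * 5,
    'available': [1235, 1609, 1458, 1357, 1700],
})
X_tr, X_te, y_tr, y_te = chronological_split(toy)
print(list(y_tr), list(y_te))  # [1700, 1609, 1458, 1357] [1235]
```

Calling `chronological_split(data)` would give a drop-in replacement for the `train_test_split` line. The resulting score may well be lower than with a shuffled split, but it is closer to how the model would perform on dates it has not seen.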

In [44]:
#Creating Random Forest Regressor Model based on the training set

rf = RandomForestRegressor(random_state=42)
model = rf.fit(X_train, y_train)

In [45]:
#Evaluating the model on the test set (coefficient of determination, R²)

r_sq = model.score(X_test, y_test)
print('coefficient of determination:', r_sq)

coefficient of determination: 0.9825809957025983
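`model.score` returns the coefficient of determination, R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. A score of 0.98 therefore means the residual error is under 2 % of the total variation in the test targets. A plain-Python sketch of the same formula:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # prediction errors
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # spread around the mean
    return 1 - ss_res / ss_tot

print(r_squared([10, 20, 30], [10, 20, 30]))  # 1.0 (perfect predictions)
print(r_squared([10, 20, 30], [20, 20, 20]))  # 0.0 (always predicting the mean)
print(r_squared([10, 20, 30], [12, 18, 30]))  # 0.96
```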


In [46]:
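#Using the machine learning model to predict the number of available spots at a given moment

import datetime

def day_week(year, month, day):
    # Monday=1 ... Sunday=7, the same numbering as the day_w column built in section 1
    return datetime.date(year, month, day).isoweekday()

def predict_available(minute, hour, day, month, year):
    """
    Predicts the number of available spots in the parking lot at the future moment given by minute, hour, day, month and year
    """
    day_w = day_week(year, month, day)
    # Use a DataFrame with the training column names; a plain list triggers
    # scikit-learn's "X does not have valid feature names" warning
    row = pd.DataFrame([[minute, hour, day, day_w, month, year]],
                       columns=['minute', 'hour', 'day', 'day_w', 'month', 'year'])
    available = model.predict(row)[0]
    return available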
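`predict_available` needs the six features in the training column order. A hypothetical helper `feature_row` could derive that row directly from a `datetime`, so callers pass a single moment instead of its separate parts. This is a sketch that only prepares the input; it does not call the model:

```python
import datetime

FEATURES = ['minute', 'hour', 'day', 'day_w', 'month', 'year']

def feature_row(moment):
    """Hypothetical helper: build one feature row in training-column order."""
    values = {
        'minute': moment.minute,
        'hour': moment.hour,
        'day': moment.day,
        'day_w': moment.isoweekday(),  # Monday=1 ... Sunday=7, as in day_week
        'month': moment.month,
        'year': moment.year,
    }
    return [values[name] for name in FEATURES]

print(feature_row(datetime.datetime(2017, 12, 25, 14, 30)))  # [30, 14, 25, 1, 12, 2017]
```

With the trained `model`, a call such as `model.predict(pd.DataFrame([feature_row(moment)], columns=FEATURES))` would return the estimate. Because a random forest is built from threshold splits, it does not extrapolate: a `year` later than any year in the training data is treated like the latest year seen.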

In [47]:
def importance(feature):
    """
    Determines the importance of each of the features: minute, hour, day, day_w, month, year
    """
    features = ['minute', 'hour', 'day', 'day_w', 'month', 'year']
    importances = model.feature_importances_
    index = features.index(feature)
    return f'{round(importances[index]*100,2)} %'

In [48]:
#Test
features = ['minute', 'hour', 'day', 'day_w', 'month', 'year']
for feature in features:
    print(f'The importance of the {feature} for prediction', importance(feature))

The importance of the minute for prediction 0.74 %
The importance of the hour for prediction 51.97 %
The importance of the day for prediction 16.2 %
The importance of the day_w for prediction 8.29 %
The importance of the month for prediction 18.3 %
The importance of the year for prediction 4.5 %
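Scikit-learn's `feature_importances_` are impurity-based scores normalized to sum to 1, which is why the percentages above add up to 100 %. Impurity-based importances can overstate features with many distinct values, so `sklearn.inspection.permutation_importance` on the test set is a useful cross-check. The sketch below ranks features from a list of importance values; the numbers are the percentages printed above, written as fractions:

```python
def rank_importances(names, importances):
    """Return (name, percent) pairs sorted from most to least important."""
    total = sum(importances)
    pairs = [(n, round(100 * v / total, 2)) for n, v in zip(names, importances)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

names = ['minute', 'hour', 'day', 'day_w', 'month', 'year']
# Fractions taken from the printed output above
importances = [0.0074, 0.5197, 0.162, 0.0829, 0.183, 0.045]
ranked = rank_importances(names, importances)
for name, pct in ranked:
    print(f'{name}: {pct} %')
```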
