# Iteration 1 - Melbourne Public Toilets
This Jupyter Notebook prepares the 'Melbourne Public Toilets' CSV file for use in the Mo-Buddy website solution. It covers three steps:
1. Read Raw Data
2. Clean Raw Data
3. Export Clean Data

In [3]:
# Import Packages
import pandas as pd

In [4]:
# Set option to display all columns
pd.set_option('display.max_columns', None)

## 1. Read in Raw Data from a CSV file

In [5]:
# Function for reading in raw data from a CSV file
def read_in_data(file_path):
    """
    Function for reading in raw data from CSV file.
    Inputs: 
        - file_path, type: string, desc: CSV file path
    Outputs:
        - raw_data, type: dataframe, desc: Raw data
    """

    raw_data = pd.read_csv(file_path)
    
    return raw_data
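
The reader above passes the path straight to `pd.read_csv`. As a minimal sketch, assuming an early and explicit failure on a wrong path is wanted, the variant below checks the path first. The name `read_in_data_checked` and the demo file are hypothetical and are not part of the original notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_in_data_checked(file_path):
    """Hypothetical variant of read_in_data that fails early on a bad path."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"CSV file not found: {path}")
    return pd.read_csv(path)


# Demonstration on a throwaway file with made-up rows (not the real data)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_toilets.csv"
    demo_path.write_text("name,female,male\nDemo toilet A,yes,no\nDemo toilet B,no,yes\n")
    df_demo = read_in_data_checked(demo_path)

print(df_demo.shape)
```

Using `pathlib.Path` also sidesteps the difference between Windows and POSIX path separators.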

In [6]:
# Read in data
filepath_raw_data = r'DataBases\Public_toilets.csv'
df_raw_toilets = read_in_data(filepath_raw_data)

In [7]:
# Check what the data look like
df_raw_toilets

| | name | female | male | wheelchair | operator | baby_facil | lat | lon |
|---|---|---|---|---|---|---|---|---|
| 0 | Public Toilet - Toilet 140 - Queensberry Stree... | no | yes | no | City of Melbourne | no | -37.803995 | 144.959091 |
| 1 | Public Toilet - Toilet 106 - Kings Domain Gove... | yes | yes | no | City of Melbourne | no | -37.826916 | 144.974648 |
| 2 | Public Toilet - Queen Victoria Market (153 Vic... | yes | yes | no | City of Melbourne | no | -37.806121 | 144.956538 |
| 3 | Public Toilet - Victoria Harbour, Shed 3 (Nort... | no | yes | no | City of Melbourne | no | -37.819796 | 144.937665 |
| 4 | Public Toilet - Toilet 6 - Elizabeth Street (T... | yes | no | no | City of Melbourne | no | -37.813838 | 144.963097 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 69 | Public Toilet - Toilet 43 - Queen Street (oppo... | yes | yes | yes | City of Melbourne | no | -37.815838 | 144.961062 |
| 70 | Public Toilet - Toilet 119 - Fitzroy Gardens T... | yes | yes | yes | City of Melbourne | no | -37.812036 | 144.983075 |
| 71 | Public Toilet - Toilet 110 - Queen Victoria Ga... | yes | yes | yes | City of Melbourne | no | -37.822948 | 144.970986 |
| 72 | Public Toilet - Toilet 41 - Flinders Street (N... | yes | yes | yes | City of Melbourne | no | -37.817903 | 144.966264 |


## 2. Clean up Raw Data

Check whether there are duplicates or strange values. First, check the unique values in each column.

In [8]:
df_raw_toilets.loc[:,'female'].value_counts()

yes    61
no     11
U       1
Name: female, dtype: int64

In [9]:
df_raw_toilets.loc[:,'male'].value_counts()

yes    68
no      4
U       1
Name: male, dtype: int64

In [10]:
df_raw_toilets.loc[:,'wheelchair'].value_counts()

yes    48
no     24
U       1
Name: wheelchair, dtype: int64

In [11]:
df_raw_toilets.loc[:,'operator'].value_counts()

City of Melbourne    74
Name: operator, dtype: int64

In [12]:
df_raw_toilets.loc[:,'baby_facil'].value_counts()

no     67
yes     5
U       2
Name: baby_facil, dtype: int64
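
The per-column checks above can also be run in one loop, and the duplicate check mentioned earlier can be done with `DataFrame.duplicated`. The following is a minimal sketch on made-up rows that only mimic the structure of the toilets data; `df_demo` and its values are assumptions, not the real file.

```python
import pandas as pd

# Made-up rows that mimic the structure of the toilets data (not the real file)
df_demo = pd.DataFrame({
    "name": ["Toilet A", "Toilet B", "Toilet B", "Toilet C"],
    "female": ["yes", "no", "no", None],
    "male": ["yes", "yes", "yes", "U"],
    "lat": [-37.80, -37.81, -37.81, -37.82],
    "lon": [144.95, 144.96, 144.96, 144.97],
})

# Count every distinct value per categorical column, missing values included
for column in ["female", "male"]:
    print(df_demo[column].value_counts(dropna=False), end="\n\n")

# Flag rows that are exact copies of an earlier row
duplicate_mask = df_demo.duplicated()
print("Duplicated rows:", int(duplicate_mask.sum()))
```

Passing `dropna=False` makes missing values show up in the counts, which plain `value_counts()` would silently leave out.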

Secondly, check for missing values.

In [13]:
# Get list of column names
columns_list = df_raw_toilets.columns

In [14]:
# Check for missing values
missing_index_list = []
for column in columns_list:
    missing_index = df_raw_toilets[df_raw_toilets[column].isnull()].index.tolist()
    missing_index_list.append(missing_index)

In [15]:
missing_index_list

[[], [67], [67], [67], [], [], [], []]
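
The loop above collects the row indices of missing values column by column. A more compact way to get a similar overview, sketched here on made-up data (`df_demo` is an assumption, not the real dataset), is to count missing values per column and then select every row that has at least one.

```python
import pandas as pd

# Made-up rows with one incomplete entry (not the real file)
df_demo = pd.DataFrame({
    "name": ["Toilet A", "Toilet B", "Toilet C"],
    "female": ["yes", None, "no"],
    "male": ["yes", None, "yes"],
    "lat": [-37.80, -37.81, -37.82],
})

# Number of missing values in each column
missing_per_column = df_demo.isnull().sum()
print(missing_per_column)

# Rows that contain at least one missing value
rows_with_missing = df_demo[df_demo.isnull().any(axis=1)]
print(rows_with_missing.index.tolist())
```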

Row 67 appears to have missing values. Let's check what it looks like.

In [16]:
df_raw_toilets[67:68]

| | name | female | male | wheelchair | operator | baby_facil | lat | lon |
|---|---|---|---|---|---|---|---|---|
| 67 | Public Toilet - Toilet 13 - Queensberry Street... | NaN | NaN | NaN | City of Melbourne | no | -37.803094 | 144.949946 |


Indeed, row 67 has 'NaN' in the 'female', 'male' and 'wheelchair' columns. Because the dataset is small, the entry is kept rather than dropped, and each 'NaN' is replaced with 'U'.

In [17]:
# Replace the missing values in row 67 with 'U'
df_raw_toilets.loc[67,['female', 'male', 'wheelchair']] = 'U'

In [18]:
# Check that the imputation worked
df_raw_toilets.iloc[65:70,:]

| | name | female | male | wheelchair | operator | baby_facil | lat | lon |
|---|---|---|---|---|---|---|---|---|
| 65 | Public Toilet - Toilet 55 - Royal Park (Nature... | yes | yes | yes | City of Melbourne | no | -37.795523 | 144.952143 |
| 66 | Public Toilet - Toilet 112 - Alexandra Gardens... | yes | yes | yes | City of Melbourne | no | -37.820355 | 144.973313 |
| 67 | Public Toilet - Toilet 13 - Queensberry Street... | U | U | U | City of Melbourne | no | -37.803094 | 144.949946 |
| 68 | Public Toilet - Toilet 179 - Lincoln Square (1... | yes | yes | yes | City of Melbourne | no | -37.802712 | 144.962268 |
| 69 | Public Toilet - Toilet 43 - Queen Street (oppo... | yes | yes | yes | City of Melbourne | no | -37.815838 | 144.961062 |
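
The imputation above targets row 67 by its index label. If the row labels are not known in advance, one possible alternative is to fill every missing value in the chosen columns with `fillna`. This is a sketch on made-up data; `df_demo` and `flag_columns` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Made-up rows with one incomplete entry (not the real file)
df_demo = pd.DataFrame({
    "female": ["yes", None, "no"],
    "male": ["yes", None, "yes"],
    "wheelchair": ["no", None, "yes"],
    "baby_facil": ["no", "no", "U"],
})

# Columns whose missing values should become 'U'
flag_columns = ["female", "male", "wheelchair"]
df_demo[flag_columns] = df_demo[flag_columns].fillna("U")

# Confirm that no missing values remain in those columns
remaining_missing = int(df_demo[flag_columns].isnull().sum().sum())
print(remaining_missing)
print(df_demo.loc[1, flag_columns].tolist())
```

Unlike the label-based assignment, this fills any row with a gap in those columns, so it is only appropriate when 'U' is the intended value for all of them.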


## 3. Export Clean Data to a CSV file

In [19]:
df_raw_toilets.to_csv('OK_Public_Toilets_Melbourne_V1.csv')
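
By default, `to_csv` also writes the row index as an extra, unnamed first column. If that column is not wanted by whatever consumes the file, `index=False` leaves it out. The sketch below uses made-up data and a temporary file (both assumptions) to show the export and a quick read-back check.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up clean rows (not the real file)
df_demo = pd.DataFrame({
    "name": ["Toilet A", "Toilet B"],
    "female": ["yes", "U"],
    "lat": [-37.80, -37.81],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "demo_clean.csv"
    # index=False keeps the row index out of the exported file
    df_demo.to_csv(out_path, index=False)
    # Read the file back to confirm its columns and size
    df_roundtrip = pd.read_csv(out_path)

print(df_roundtrip.columns.tolist())
print(df_roundtrip.shape)
```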