# Metered Parking in Boston
We are going to analyze the metered parking available in the Boston area, using data from Boston's [Open Data Portal](https://data.boston.gov/dataset/parking-meters).

This data is available in the repository as a [csv](https://www.computerhope.com/issues/ch001356.htm) file (comma-separated values, similar to the kind of tabular data you would work with in Excel).  
![Parking Map](./data/parking.png)

#### Exercise Notes:
  `Syntax will be contained in code blocks like this.`
  
*Italicized portions of the example syntax should be replaced with your own variables.* Normal text (not italicized) should be copied exactly.

We will cover:  
[Step 1: Importing Libraries](#Step-1:-Import-the-libraries-you-plan-to-use)  
[Step 2: Loading a CSV](#Step-2:-Loading-a-CSV)  
[Step 3: Exploring the data](#Step-3:-Exploring-the-data)  
[Step 4: Reorganizing the Data](#Step-4:-Reorganizing-the-Data)  
[Step 5: Mapping the Data](#Step-5:-Mapping-the-Data)  
[Step 6: Exporting Files](#Step-6:-Exporting-Files)


  

## Step 1: Import the libraries you plan to use

(This is done in the first lines of your script.  Always keep in mind that the script will run in order and won't have access to variables and functions set later in the file, just as you wouldn't be able to give someone the weather report if you hadn't looked it up yet.)

We will use:
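- [pandas](https://pandas.pydata.org/pandas-docs/stable/reference/index.html). This library allows us to easily manipulate and analyze tabular data.
- [folium](https://python-visualization.github.io/folium/) for data visualization with Leaflet maps
- geopandas, which adds geometry (points, lines, shapes) to pandas dataframes; we use it to export a GeoJSON file in Step 6
- geopy, a geocoding library, along with its optional `RateLimiter` for spacing out requests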

Importing pandas "as pd" gives the library a nickname, so instead of typing the full name later, we can substitute "pd".  
Example: `pandas.read_csv()` can instead be typed `pd.read_csv()`

In [4]:
import pandas as pd
import geopandas as gp
from geopy.geocoders import Nominatim 
from geopy.extra.rate_limiter import RateLimiter #optional for our purposes
import folium


## Step 2: Loading a CSV 
![csv example](./data/meters_csv.png)

Pandas comes with built-in functionality to read a CSV.  
The syntax is:  
`pd.read_csv('`*`file_path`*`')`

To make this data easier to refer back to later, we are going to save it to a variable name of our choice. I'm going to call it boston_meters. The two Charlestown files are saved to their own variables the same way.

In [5]:
# Remember, variable names cannot contain spaces or dashes (Python reads a dash as subtraction).
# To make the name more readable you can separate words with_underscores

boston_meters = pd.read_csv('./data/parking_meters_boston.csv')
charlestown_pay = pd.read_csv('./data/charlestown_pay.csv')
charlestown_locations = pd.read_csv('./data/charlestown_location.csv')
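`pd.read_csv` also accepts any file-like object, which makes it easy to experiment without a file on disk. The sketch below reads a two-row sample whose values are copied from the Boston outputs shown later in this notebook; `io.StringIO` stands in for the real CSV file, and `sample_meters` is a hypothetical name.

```python
import io

import pandas as pd

# A tiny CSV sample using a few of the Boston meter columns (values copied from outputs in this notebook)
sample_csv = io.StringIO(
    "OBJECTID,VENDOR,STREET,LONGITUDE,LATITUDE,BASE_RATE\n"
    "1001,IPS,CAMBRIDGE ST,-71.060641,42.360431,0.25\n"
    "72,Parkeon,BOYLSTON ST C-B,-71.073597,42.351037,0.25\n"
)

# read_csv works the same way on a file path or a file-like object
sample_meters = pd.read_csv(sample_csv)
print(sample_meters)
```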


In [6]:
#load the charlestown_location.csv





## Step 3: Exploring the data
There are several techniques we can use to get a sense of what sort of data is available. 

Keep in mind that running code does not automatically display its results. If you want the program to report back to you, wrap the command (or the variable it is saved to) in a `print()` function. (In a Jupyter notebook, the value of the last line in a cell is also displayed automatically.)


#### How many datapoints?
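To start, let's find out how much data we are dealing with. Since each row gives information about a specific parking meter, we can find out how many parking meters are reported in this dataset by counting the rows in our CSV. `.shape` reports the number of rows and the number of columns, in that order, so the output below describes 6,955 meters with 14 columns each.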

The syntax is:
*`dataframe`*`.shape`

In [7]:
# Remember we named our dataframe "boston_meters" in step 2
# Keep in mind that running code does not automatically display results.
# If you want the program to report back to you, wrap the command (or the variable) in a print() function

print(boston_meters.shape)

(6955, 14)
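Because `.shape` is a `(rows, columns)` tuple, the row count alone is `shape[0]`, and `len()` gives the same number. A minimal sketch on a hypothetical three-row frame (not the real data):

```python
import pandas as pd

# Hypothetical three-meter frame standing in for boston_meters
meters = pd.DataFrame({
    "OBJECTID": [1001, 1002, 1003],
    "STREET": ["CAMBRIDGE ST", "CAMBRIDGE ST", "CAMBRIDGE ST"],
})

rows, columns = meters.shape  # unpack the (rows, columns) tuple
print(f"{rows} meters, {columns} columns")
print(len(meters))  # len() counts rows too
```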


#### What columns does this csv have?
Let's take a look at the data available in the csv by printing the column headings.  The data structure is identical for the Charlestown and Boston dataframes.

The syntax is:
*`dataframe`*`.columns`

In [8]:
print(boston_meters.columns)

Index(['OBJECTID', 'METER_ID', 'VENDOR', 'PAY_POLICY', 'PARK_NO_PAY',
       'TOW_AWAY', 'BLK_NO', 'STREET', 'LONGITUDE', 'LATITUDE', 'G_DISTRICT',
       'G_SUBZONE', 'G_ZONE', 'BASE_RATE'],
      dtype='object')


#### Finding all unique values
The "VENDOR" column tells us which vendor services each meter. How many vendors service the Boston-area meters?  
Syntax: *`dataframe.column`*`.unique()`

In [9]:
print(boston_meters.VENDOR.unique())

['IPS' 'Parkeon' nan]
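The `nan` in that list is not a third vendor; it marks meters with no vendor recorded. To count meters per vendor, including the missing ones, `value_counts(dropna=False)` is a handy companion to `unique()`, and `nunique()` counts distinct values while ignoring NaN by default. A sketch on a small stand-in Series (the real column has 6,955 values):

```python
import numpy as np
import pandas as pd

# Stand-in for boston_meters.VENDOR
vendor = pd.Series(["IPS", "IPS", "Parkeon", np.nan, "IPS"], name="VENDOR")

print(vendor.unique())                    # distinct values, NaN included
print(vendor.nunique())                   # NaN is not counted by default
print(vendor.value_counts(dropna=False))  # meters per vendor, with a row for NaN
```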


In [10]:
# What are the distinct types of pay policies for meters? 

print(boston_meters.PAY_POLICY.unique())


['08:00AM-04:00PM MON-FRI $0.25 120,08:00AM-08:00PM SAT $0.25 120,06:00PM-08:00PM MON-FRI $0.25 120'
 '08:00AM-08:00PM MON-SAT $0.25 120' '11:00AM-08:00PM MON-SAT $0.25 120'
 '08:00AM-06:00PM MON-SAT $0.25 240' '08:00AM-05:00PM MON-SAT $0.25 120'
 '08:00AM-08:00PM SAT $0.25 120, 09:30AM-08:00PM MON-FRI $0.25 120'
 '08:00AM-08:00PM MON-SAT $0.25 720' '08:00AM-06:00PM MON-SAT $0.25 120'
 '08:00AM-08:00PM SAT $0.25 120' '08:00AM-04:00PM MON-SAT $0.25 120'
 '08:00AM-04:00PM MON-FRI $0.25 120, 08:00AM-08:00PM SAT $0.25 120, 06:00PM-08:00PM MON-FRI $0.25 120'
 '08:00AM-08:00PM SAT $0.25 120, 09:30AM-04:00PM MON-FRI $0.25 120, 06:00PM-08:00PM MON-FRI $0.25 120'
 '09:00AM-05:00PM MON-SAT $0.25 120'
 '08:00AM-06:00PM SAT $0.25 120, 10:00AM-04:00PM MON-FRI $0.25 120'
 '08:00AM-08:00PM SAT $0.25 120, 10:00AM-06:00PM MON-FRI $0.25 120' nan
 '08:00AM-08:00PM MON-FRI $0.25 120'
 '08:00AM-06:00PM SAT $0.25 120, 09:30AM-06:00PM MON-FRI $0.25 120'
 '08:00AM-06:00PM SAT $0.25 120, 09:30AM-04:00PM MON-FR

  
  
  
  
## Step 4: Reorganizing the Data

#### Dropping Columns and Rows
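Since we will ultimately be putting this data on a map, we want to drop every row that doesn't include a location. We tell `dropna` which NaN (missing) values matter by passing a subset of columns. Note that `dropna` returns a new dataframe and leaves the original unchanged unless you save the result to a variable or pass `inplace=True`.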

Syntax:  
*`DataFrame`*`.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)`

In [11]:
boston_meters.dropna(subset=['LONGITUDE', 'LATITUDE'])

,OBJECTID,METER_ID,VENDOR,PAY_POLICY,PARK_NO_PAY,TOW_AWAY,BLK_NO,STREET,LONGITUDE,LATITUDE,G_DISTRICT,G_SUBZONE,G_ZONE,BASE_RATE
0,1001,,IPS,"08:00AM-04:00PM MON-FRI $0.25 120,08:00AM-08:0...","00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",,SOME,CAMBRIDGE ST,-71.060641,42.360431,DISTRICT 0,0BA,BA,0.25
1,1002,,IPS,08:00AM-08:00PM MON-SAT $0.25 120,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",,SOME,CAMBRIDGE ST,-71.060526,42.360332,DISTRICT 0,0BA,BA,0.25
2,1003,,IPS,08:00AM-08:00PM MON-SAT $0.25 120,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",,SOME,CAMBRIDGE ST,-71.060479,42.360287,DISTRICT 0,0BA,BA,0.25
3,1004,,IPS,08:00AM-08:00PM MON-SAT $0.25 120,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",,SOME,CAMBRIDGE ST,-71.060373,42.360188,DISTRICT 0,0BA,BA,0.25
4,1005,,IPS,08:00AM-08:00PM MON-SAT $0.25 120,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",,SOME,CAMBRIDGE ST,-71.060330,42.360139,DISTRICT 0,0BA,BA,0.25
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6949,6950,,IPS,08:00AM-06:00PM MON-SAT $0.25 120,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",,AMOR,COMMONWEALTH AV,-71.111985,42.350522,DISTRICT 0,0KE,KE,0.25
6950,6951,,IPS,08:00AM-06:00PM MON-SAT $0.25 120,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",,AMOR,COMMONWEALTH AV,-71.111871,42.350509,DISTRICT 0,0KE,KE,0.25
6951,6952,,IPS,08:00AM-06:00PM MON-SAT $0.25 120,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",,AMOR,COMMONWEALTH AV,-71.111757,42.350495,DISTRICT 0,0KE,KE,0.25
6952,6953,,IPS,08:00AM-06:00PM MON-SAT $0.25 120,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",,AMOR,COMMONWEALTH AV,-71.111643,42.350482,DISTRICT 0,0KE,KE,0.25
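Because `dropna` returns a new dataframe, the cleaned rows are lost unless you keep them. A minimal sketch on a hypothetical stand-in frame: assign the result to a variable (passing `inplace=True` would modify the original instead).

```python
import numpy as np
import pandas as pd

# Stand-in for boston_meters: the last meter has no coordinates
meters = pd.DataFrame({
    "OBJECTID": [1001, 1002, 6955],
    "LONGITUDE": [-71.060641, -71.060526, np.nan],
    "LATITUDE": [42.360431, 42.360332, np.nan],
})

meters.dropna(subset=["LONGITUDE", "LATITUDE"])  # returns a copy; `meters` is unchanged
print(len(meters))

located = meters.dropna(subset=["LONGITUDE", "LATITUDE"])  # keep the result this time
print(len(located))
```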


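Next, we can remove columns we don't need with `drop`. Setting `axis=1` tells pandas the labels are column names rather than row labels. Like `dropna`, `drop` returns a new dataframe instead of changing `boston_meters`.

Syntax:  
*`DataFrame`*`.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')`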

In [18]:
boston_meters.drop(['TOW_AWAY','G_DISTRICT', 'G_ZONE', 'G_SUBZONE', 'METER_ID'], axis=1)

,OBJECTID,VENDOR,PAY_POLICY,PARK_NO_PAY,BLK_NO,STREET,LONGITUDE,LATITUDE,BASE_RATE
0,1001,IPS,"08:00AM-04:00PM MON-FRI $0.25 120,08:00AM-08:0...","00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",SOME,CAMBRIDGE ST,-71.060641,42.360431,0.25
1,1002,IPS,08:00AM-08:00PM MON-SAT $0.25 120,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",SOME,CAMBRIDGE ST,-71.060526,42.360332,0.25
2,1003,IPS,08:00AM-08:00PM MON-SAT $0.25 120,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",SOME,CAMBRIDGE ST,-71.060479,42.360287,0.25
3,1004,IPS,08:00AM-08:00PM MON-SAT $0.25 120,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",SOME,CAMBRIDGE ST,-71.060373,42.360188,0.25
4,1005,IPS,08:00AM-08:00PM MON-SAT $0.25 120,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",SOME,CAMBRIDGE ST,-71.060330,42.360139,0.25
...,...,...,...,...,...,...,...,...,...
6950,6951,IPS,08:00AM-06:00PM MON-SAT $0.25 120,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",AMOR,COMMONWEALTH AV,-71.111871,42.350509,0.25
6951,6952,IPS,08:00AM-06:00PM MON-SAT $0.25 120,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",AMOR,COMMONWEALTH AV,-71.111757,42.350495,0.25
6952,6953,IPS,08:00AM-06:00PM MON-SAT $0.25 120,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",AMOR,COMMONWEALTH AV,-71.111643,42.350482,0.25
6953,6954,IPS,08:00AM-06:00PM MON-SAT $0.25 120,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",AMOR,COMMONWEALTH AV,-71.111529,42.350469,0.25
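`drop` also accepts a `columns=` keyword, which reads a little more clearly than `axis=1`. A sketch on a hypothetical stand-in frame, also showing `errors='ignore'` for labels that may not exist:

```python
import pandas as pd

# Stand-in with a few of the boston_meters columns
meters = pd.DataFrame({
    "OBJECTID": [1001, 1002],
    "TOW_AWAY": [None, None],
    "G_ZONE": ["BA", "BA"],
    "STREET": ["CAMBRIDGE ST", "CAMBRIDGE ST"],
})

# Equivalent to meters.drop(['TOW_AWAY', 'G_ZONE'], axis=1)
slim = meters.drop(columns=["TOW_AWAY", "G_ZONE"])
print(list(slim.columns))

# errors='ignore' skips labels that aren't present instead of raising a KeyError
slim = slim.drop(columns=["METER_ID"], errors="ignore")
print(list(slim.columns))
```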


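#### Merging Dataframes  

![Merge Types](./data/merges.png)

The Charlestown meters are split across two files, which we loaded as `charlestown_pay` and `charlestown_locations`, while the Boston meters are in a single dataframe.
Try combining them into one dataframe: first merge the two Charlestown tables, then stack the result underneath the Boston dataframe. Merging lines up the columns of two dataframes by matching values in the columns they share; concatenating stacks the rows of one dataframe under another.


Syntax: *`dataframe`*`.merge(`*`dataframe_2`*`, how="`*`merge_type`*`")`, where `how` is `"inner"` (the default), `"outer"`, `"left"`, or `"right"`  
To stack rows: `pd.concat([`*`dataframe`*`, `*`dataframe_2`*`])`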
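To see how `how` changes the result, here is a sketch with two tiny hypothetical tables that share an `OBJECTID` column. They only loosely mimic the Charlestown pay and location files, whose real columns may differ.

```python
import pandas as pd

# Hypothetical stand-ins for charlestown_pay and charlestown_locations
pay = pd.DataFrame({"OBJECTID": [859, 860, 861], "BASE_RATE": [0.25, 0.25, 0.25]})
locations = pd.DataFrame({"OBJECTID": [860, 861, 862],
                          "STREET": ["SIXTH STREET", "SIXTH STREET", "SIXTH STREET"]})

inner = pay.merge(locations, how="inner")  # only OBJECTIDs found in both tables
outer = pay.merge(locations, how="outer")  # every OBJECTID; gaps filled with NaN
left = pay.merge(locations, how="left")    # every row of `pay`

print(inner)
print(outer)
```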


In [10]:
# We will merge charlestown_pay & charlestown_locations
# With no on= argument, pandas matches rows on every column name the two tables share
charlestown_meters = charlestown_pay.merge(charlestown_locations, how="outer")

In [11]:
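# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows (sort=True sorts the column names)
pd.concat([boston_meters, charlestown_meters], sort=True)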


,BASE_RATE,BLK_NO,G_DISTRICT,G_SUBZONE,G_ZONE,LATITUDE,LONGITUDE,METER_ID,OBJECTID,PARK_NO_PAY,PAY_POLICY,STREET,TOW_AWAY,VENDOR
0,0.25,SOME,DISTRICT 0,0BA,BA,42.360431,-71.060641,,1001,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...","08:00AM-04:00PM MON-FRI $0.25 120,08:00AM-08:0...",CAMBRIDGE ST,,IPS
1,0.25,SOME,DISTRICT 0,0BA,BA,42.360332,-71.060526,,1002,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",08:00AM-08:00PM MON-SAT $0.25 120,CAMBRIDGE ST,,IPS
2,0.25,SOME,DISTRICT 0,0BA,BA,42.360287,-71.060479,,1003,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",08:00AM-08:00PM MON-SAT $0.25 120,CAMBRIDGE ST,,IPS
3,0.25,SOME,DISTRICT 0,0BA,BA,42.360188,-71.060373,,1004,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",08:00AM-08:00PM MON-SAT $0.25 120,CAMBRIDGE ST,,IPS
4,0.25,SOME,DISTRICT 0,0BA,BA,42.360139,-71.060330,,1005,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",08:00AM-08:00PM MON-SAT $0.25 120,CAMBRIDGE ST,,IPS
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
66,0.25,FIRS,DISTRICT 0,0AG,AG,42.375360,-71.055247,,859,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",08:00AM-08:00PM MON-SAT $0.25 120,SIXTH STREET,,IPS
67,0.25,FIRS,DISTRICT 0,0AG,AG,42.375356,-71.055167,,860,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",08:00AM-08:00PM MON-SAT $0.25 120,SIXTH STREET,,IPS
68,0.25,FIRS,DISTRICT 0,0AG,AG,42.375327,-71.055138,,861,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",08:00AM-08:00PM MON-SAT $0.25 120,SIXTH STREET,,IPS
69,0.25,FIRS,DISTRICT 0,0AG,AG,42.375277,-71.055072,,862,"00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-SAT, ...",08:00AM-08:00PM MON-SAT $0.25 120,SIXTH STREET,,IPS
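Notice that the combined output reuses index labels: the Charlestown rows are numbered from 0 again (the last ones shown are 66 to 69), so those labels now appear twice. If you want a fresh 0-to-n index, `pd.concat` accepts `ignore_index=True`. A sketch with hypothetical two-row stand-ins:

```python
import pandas as pd

# Hypothetical stand-ins for boston_meters and charlestown_meters
boston = pd.DataFrame({"OBJECTID": [1001, 1002], "STREET": ["CAMBRIDGE ST", "CAMBRIDGE ST"]})
charlestown = pd.DataFrame({"OBJECTID": [859, 860], "STREET": ["SIXTH STREET", "SIXTH STREET"]})

stacked = pd.concat([boston, charlestown])
print(stacked.index.tolist())  # [0, 1, 0, 1]: duplicate labels

renumbered = pd.concat([boston, charlestown], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```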


#### Filtering
Sometimes you only want data with certain attributes. You can filter the data and save the result to a new dataframe, or use a filter to remove rows from the table. Filtering is also useful when you want to count how many rows match your query.

In [12]:
# Keep rows whose PAY_POLICY mentions 'SUN'; na=False treats missing policies as non-matches
sun = boston_meters[boston_meters['PAY_POLICY'].str.contains('SUN', na=False)]
print(sun)

      OBJECTID  METER_ID   VENDOR                         PAY_POLICY  \
2071        72  450074.0  Parkeon  08:00AM-08:00PM SUN-SAT $0.25 120   

                                           PARK_NO_PAY TOW_AWAY BLK_NO  \
2071  00:00AM-08:00AM SUN-SAT, 08:00PM-24:00AM SUN-SAT      NaN   CLAR   

               STREET  LONGITUDE   LATITUDE  G_DISTRICT G_SUBZONE G_ZONE  \
2071  BOYLSTON ST C-B -71.073597  42.351037  DISTRICT 0       0MS     MS   

      BASE_RATE  
2071       0.25  



![string.contains documentation](./data/str-contains-method.png)  
One way to do this is to check whether the cell contains a certain string (remember, a string is a sequence of characters).  
Syntax: *`dataframe[dataframe['column']`*`.str.contains(`*`'string we are looking for'`*`)]`

This returns every row where the condition evaluates to True. In the next example we want all the results that *do not contain* a certain string. We are in luck! We can invert the results by putting `~` in front of the condition, like this: *`dataframe[~dataframe['column']`*`.str.contains(`*`'string we are looking for'`*`)]`
   
   
   
Some additional methods include `str.startswith("")` and `str.endswith("")`
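These methods combine with `~` and `na=False` the same way. Because a filter is a Series of True/False values, calling `.sum()` on it counts the matches without building a new dataframe. A sketch on stand-in pay policies copied from the `unique()` output above (the `policy` Series is hypothetical):

```python
import numpy as np
import pandas as pd

# Stand-in for boston_meters['PAY_POLICY']
policy = pd.Series([
    "08:00AM-08:00PM MON-SAT $0.25 120",
    "08:00AM-08:00PM MON-FRI $0.25 120",
    "11:00AM-08:00PM MON-SAT $0.25 120",
    np.nan,
])

mentions_sat = policy.str.contains("SAT", na=False)  # NaN counts as False
print(mentions_sat.sum())     # meters that charge on Saturday
print((~mentions_sat).sum())  # everything else, including the missing policy

starts_at_8 = policy.str.startswith("08:00AM", na=False)
print(starts_at_8.sum())
```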


In [20]:
# Let's find out which meters don't require payment on Saturdays
# na=False makes missing (NaN) policies evaluate to False; because ~ then flips them to True,
# meters with no pay policy also end up in free_saturdays (see rows 1222 and 6954 below)

free_saturdays = boston_meters[~ boston_meters['PAY_POLICY'].str.contains('SAT', na=False)]
print(free_saturdays)

      OBJECTID  METER_ID   VENDOR                         PAY_POLICY  \
1222      2223       NaN      IPS                                NaN   
1381      2382       NaN      IPS  08:00AM-08:00PM MON-FRI $0.25 120   
2037        38  450040.0  Parkeon  08:00AM-08:00PM MON-FRI $0.25 120   
4575      3576       NaN      IPS  08:00AM-06:00PM MON-FRI $0.25 120   
4576      3577       NaN      IPS  08:00AM-06:00PM MON-FRI $0.25 120   
...        ...       ...      ...                                ...   
4883      3884       NaN      IPS  08:00AM-06:00PM MON-FRI $0.25 120   
4884      3885       NaN      IPS  08:00AM-06:00PM MON-FRI $0.25 120   
4885      3886       NaN      IPS  08:00AM-06:00PM MON-FRI $0.25 120   
4886      3887       NaN      IPS  08:00AM-06:00PM MON-FRI $0.25 120   
6954      6955       NaN      NaN                                NaN   

                                            PARK_NO_PAY TOW_AWAY BLK_NO  \
1222                                                NaN     

#### Reverse Geocoding

#### Separating Lists in Columns
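Some `PAY_POLICY` cells pack several policies into one string, separated by commas (sometimes with a space after the comma and sometimes without, as the `unique()` output shows). One way to give each policy its own row is to split on a regular expression and then `explode` the resulting lists. A sketch on two policies copied from that output:

```python
import pandas as pd

# Two PAY_POLICY values copied from the unique() output above
policy = pd.Series([
    "08:00AM-04:00PM MON-FRI $0.25 120,08:00AM-08:00PM SAT $0.25 120,06:00PM-08:00PM MON-FRI $0.25 120",
    "08:00AM-08:00PM MON-SAT $0.25 120",
], name="PAY_POLICY")

# A pattern longer than one character is treated as a regular expression:
# split on a comma followed by any amount of whitespace
policy_lists = policy.str.split(r",\s*")

# explode() puts each list element on its own row, repeating the original index label
one_per_row = policy_lists.explode()
print(one_per_row)
```

On the full dataframe, the same idea applies to a single column with `DataFrame.explode('PAY_POLICY')` after replacing that column with the split lists.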

## Step 5: Mapping the Data

#### Initializing the map
Start by creating a map object, specifying where the map should be centered and which basemap to use.

Syntax:  
`folium.Map(location=[`*`latitude, longitude`*`], zoom_start=`*`zoom_level`*`, tiles='`*`optional basemap name`*`')`

Note: higher `zoom_start` values zoom in closer.
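The map in this step is centered on hard-coded coordinates. As an alternative (our assumption, not the approach used here), you could center it on the average position of the meters you plan to plot. The sketch below computes that center with pandas alone, using stand-in coordinates copied from earlier outputs; the result could be passed as `location=` to `folium.Map`.

```python
import numpy as np
import pandas as pd

# Stand-in coordinates copied from earlier outputs; the NaN row mimics a meter without a location
points = pd.DataFrame({
    "LATITUDE": [42.360431, 42.351037, 42.375360, np.nan],
    "LONGITUDE": [-71.060641, -71.073597, -71.055247, np.nan],
})

# mean() skips NaN by default, so rows without coordinates don't skew the center
center = [points["LATITUDE"].mean(), points["LONGITUDE"].mean()]
print(center)  # [latitude, longitude], the order folium.Map(location=...) expects
```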


In [21]:
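# Center the map on downtown Boston. Stamen tiles are now served by Stadia Maps and may
# require an API key, so this uses CartoDB Positron, another light basemap built into folium
parking_map = folium.Map(location=[42.3621, -71.0570], zoom_start=14, tiles='cartodbpositron')

# folium raises "Location values cannot contain NaNs." for rows without coordinates, so drop them first
mappable = free_saturdays.dropna(subset=['LATITUDE', 'LONGITUDE'])

# Add one circle per meter; clicking a circle shows its pay policy
for _, row in mappable.iterrows():
    location = [float(row['LATITUDE']), float(row['LONGITUDE'])]
    popup_text = str(row['PAY_POLICY'])  # a missing policy becomes the text "nan"
    folium.CircleMarker(location=location, radius=10, popup=popup_text,
                        color='#FA8072', fill=True, fill_color='#FA8072').add_to(parking_map)

# In a notebook, ending the cell with the map object displays it
parking_map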


## Step 6: Exporting Files

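Now we want to save the data we have just created. We could save it as a CSV, but suppose we need it in another format? Let's write it out as a GeoJSON file using built-in functionality from the GeoPandas library. First we turn the longitude and latitude columns into point geometries (note that `points_from_xy` takes x, the longitude, before y, the latitude).  
Syntax:  
*`gdf`*` = gp.GeoDataFrame(`*`df`*`, geometry=gp.points_from_xy(`*`df.LONGITUDE`*`, `*`df.LATITUDE`*`))`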

In [22]:
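# Rows without coordinates can't become points, so drop them first
located_saturdays = free_saturdays.dropna(subset=['LONGITUDE', 'LATITUDE'])
# EPSG:4326 marks the coordinates as plain longitude/latitude (WGS 84), which GeoJSON expects
geo_dataframe = gp.GeoDataFrame(located_saturdays, geometry=gp.points_from_xy(located_saturdays.LONGITUDE, located_saturdays.LATITUDE), crs="EPSG:4326")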

print(geo_dataframe.head())

      OBJECTID  METER_ID   VENDOR                         PAY_POLICY  \
1222      2223       NaN      IPS                                NaN   
1381      2382       NaN      IPS  08:00AM-08:00PM MON-FRI $0.25 120   
2037        38  450040.0  Parkeon  08:00AM-08:00PM MON-FRI $0.25 120   
4575      3576       NaN      IPS  08:00AM-06:00PM MON-FRI $0.25 120   
4576      3577       NaN      IPS  08:00AM-06:00PM MON-FRI $0.25 120   

                                            PARK_NO_PAY TOW_AWAY BLK_NO  \
1222                                                NaN      NaN   CHAN   
1381  00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-FRI, ...      NaN   EAST   
2037  00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-FRI, ...      NaN   FAIR   
4575  00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-FRI, ...      NaN   BEAC   
4576  00:00AM-24:00AM SUN, 00:00AM-08:00AM MON-FRI, ...      NaN   BEAC   

              STREET  LONGITUDE   LATITUDE  G_DISTRICT G_SUBZONE G_ZONE  \
1222     COLUMBUS AV -71.073805  42.34689

In [23]:
geo_dataframe.to_file("parking.geojson", driver='GeoJSON')
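The CSV route mentioned above takes one line with `to_csv`. The sketch below writes a hypothetical stand-in for `free_saturdays` to a temporary folder and reads it back; `index=False` keeps pandas from writing the row labels, which would otherwise come back as an extra column named `Unnamed: 0`.

```python
import os
import tempfile

import pandas as pd

# Stand-in for free_saturdays (values copied from the output above)
free_saturdays_sample = pd.DataFrame({
    "OBJECTID": [2382, 38],
    "PAY_POLICY": ["08:00AM-08:00PM MON-FRI $0.25 120", "08:00AM-08:00PM MON-FRI $0.25 120"],
}, index=[1381, 2037])

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "free_saturdays.csv")
    free_saturdays_sample.to_csv(path, index=False)  # don't write the row labels
    round_trip = pd.read_csv(path)

print(list(round_trip.columns))
```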