# Mapping bike lanes in Pittsburgh
As a cyclist whose primary mode of transportation is a bicycle, I wanted to get a better understanding of 

## Setting up the notebook
I cheated and wrote a wrapper for the Overpass API in a separate file. Here is the link to the GitHub gist, which includes the API class and a quick function to calculate length from the lat/lon coordinates returned by the API.
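
The gist is not reproduced in this notebook, so here is a minimal sketch of what a length helper like `overpass.length` might look like. It assumes each way's `geometry` is a list of `{"lat": ..., "lon": ...}` dicts (the shape Overpass returns with `out geom`) and sums haversine distances between consecutive points. The names `haversine_m` and `way_length_m`, and the choice of metres as the unit, are assumptions for illustration and not the gist's actual code.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def way_length_m(geometry):
    """Sum segment lengths along a way (hypothetical stand-in for overpass.length)."""
    return sum(
        haversine_m(p["lat"], p["lon"], q["lat"], q["lon"])
        for p, q in zip(geometry, geometry[1:])
    )
```

A way with zero or one point has no segments, so the helper returns 0 for it.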


In [20]:
from bikeability import overpass
import pandas as pd
api = overpass.API()

In [2]:
cycle_query = """(way[highway=cycleway](area.a);way[cycleway=lane](area.a);
                   way[cycleway=asl](area.a);way[cycleway=bike_box](area.a);
                   way[cycleway=crossing](area.a);way[cycleway=opposite_lane](area.a);
                   way[cycleway=opposite_track](area.a);way[cycleway=separate](area.a);
                   way[cycleway=sidepath](area.a);way[cycleway=track](area.a);
                   way[bicycle=designated](area.a);way[bicycle=yes](area.a);
                   way[bicycle_road=yes](area.a);
                   );"""

top_cycling_cities = {'Seattle':237385,
                      'San Francisco': 111968,
                      'Fort Collins': 112524,
                      'Minneapolis': 136712,
                      'Portland': 186579,
                      'Chicago': 122604,
                      'Eugene': 186706,
                      'Madison': 3352040,
                      'New York': 175905,
                      'Cambridge': 1933745
                      }
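
`cycle_query` only contains the union of `way` filters; it relies on a named area `.a` being defined elsewhere in the request. The sketch below shows one way a wrapper might assemble the full Overpass QL query from a relation ID. The one firm fact is that Overpass derives an area ID from a relation ID by adding 3600000000; the helper name `build_query`, the timeout value, and the `out geom;` output statement are assumptions about what `api.get` does internally.

```python
AREA_ID_OFFSET = 3_600_000_000  # Overpass convention: area id = relation id + 3600000000


def build_query(body: str, relation_id: int, timeout: int = 120) -> str:
    """Wrap a union of filters in a complete Overpass QL request (hypothetical helper)."""
    area_id = AREA_ID_OFFSET + relation_id
    return (
        f"[out:json][timeout:{timeout}];"
        f"area({area_id})->.a;"  # bind the city boundary to the named set .a
        f"{body.strip()}"         # the union of way filters, e.g. cycle_query
        "out geom;"              # return node coordinates with each way
    )


query = build_query("(way[highway=cycleway](area.a););", relation_id=237385)
print(query)
```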

In [3]:
results = {}
for city, relation_id in top_cycling_cities.items():
    df = api.get(cycle_query, area_id=relation_id)
    df['city'] = city
    df['length'] = df['geometry'].apply(overpass.length)
    results[city] = df

Retrieved 4358 entries from area: 237385
	Time: 11.79 seconds
	Attempts: 1
Retrieved 1780 entries from area: 111968
	Time: 4.03 seconds
	Attempts: 1
Retrieved 1482 entries from area: 112524
	Time: 3.54 seconds
	Attempts: 1
Retrieved 2043 entries from area: 136712
	Time: 4.98 seconds
	Attempts: 1
Retrieved 12167 entries from area: 186579
	Time: 11.76 seconds
	Attempts: 1
Retrieved 4410 entries from area: 122604
	Time: 9.91 seconds
	Attempts: 1
Retrieved 1761 entries from area: 186706
	Time: 3.31 seconds
	Attempts: 1
Retrieved 1561 entries from area: 3352040
	Time: 4.55 seconds
	Attempts: 1
Retrieved 6119 entries from area: 175905
	Time: 10.92 seconds
	Attempts: 1
Retrieved 661 entries from area: 1933745
	Time: 6.08 seconds
	Attempts: 1


In [17]:
raw_data = pd.concat(results.values(), sort=False)
raw_data.reset_index(inplace=True)

In [22]:
bike_lanes = pd.read_json('2021-04-18_raw_data.json')

In [23]:
# bike_lanes = raw_data.copy()
bike_lanes.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 36351 entries, 0 to 36350
Data columns (total 518 columns):
 #    Column                                      Dtype  
---   ------                                      -----  
 0    index                                       int64  
 1    type                                        object 
 2    id                                          int64  
 3    nodes                                       object 
 4    geometry                                    object 
 5    bounds.minlat                               float64
 6    bounds.minlon                               float64
 7    bounds.maxlat                               float64
 8    bounds.maxlon                               float64
 9    tags.bicycle                                object 
 10   tags.cycleway:surface                       object 
 11   tags.foot                                   object 
 12   tags.highway                                object 
 13   tags.name     

We're going to remove the major highways (motorways, trunk roads, and their link roads) that don't have some sort of designated bicycle passageway on them, that is, those with no cycleway tag and no bicycle=designated tag.

In [24]:
remove = ['motorway', 'trunk', 'motorway_link', 'trunk_link']
no_bike = (bike_lanes['tags.highway'].isin(remove) & bike_lanes['tags.cycleway'].isnull()
           & (bike_lanes['tags.bicycle'] != 'designated'))
bike_lanes.drop(bike_lanes[no_bike].index, inplace=True)
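
To see which rows this filter drops, here is a small sketch on invented rows (the data is made up for illustration). Note that `!= 'designated'` evaluates to `True` for missing values, so a motorway with no `tags.bicycle` value at all is dropped too.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'tags.highway':  ['motorway', 'trunk', 'trunk', 'residential'],
    'tags.cycleway': [np.nan, 'lane', np.nan, np.nan],
    'tags.bicycle':  [np.nan, np.nan, 'designated', 'yes'],
})

remove = ['motorway', 'trunk', 'motorway_link', 'trunk_link']
drop_mask = (
    toy['tags.highway'].isin(remove)
    & toy['tags.cycleway'].isnull()
    & (toy['tags.bicycle'] != 'designated')
)
kept = toy[~drop_mask]
print(kept['tags.highway'].tolist())  # ['trunk', 'trunk', 'residential']
```

Only the motorway row is removed: the first trunk road has a cycleway tag, the second is designated for bicycles, and the residential street is not in the `remove` list.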

In [25]:
remove_columns = ['tiger', 'bounds', 'source', 'note', 'ref', 'horse', 'maxweight',
                  'layer','name', 'description', 'lanes:']
columns = bike_lanes.columns
for r in remove_columns:
    columns = [c for c in columns if r not in c]

bike_lanes = bike_lanes[columns]

columns = bike_lanes.columns
columns = [c.replace('tags.', '') for c in columns]
bike_lanes.columns = columns
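
Because the filter is a plain substring test, each term removes every column whose name contains it anywhere, not only columns that start with it. The sketch below uses a handful of made-up column names and a shortened term list to show the effect, including why `'lanes:'` (with the colon) keeps the plain `lanes` column.

```python
remove_columns = ['tiger', 'bounds', 'name', 'lanes:']
columns = ['id', 'bounds.minlat', 'tags.name', 'tags.tiger:cfcc',
           'tags.lanes', 'tags.lanes:forward', 'tags.surface']

# Keep a column only if none of the terms occurs anywhere in its name.
kept = [c for c in columns if not any(term in c for term in remove_columns)]
# Strip the 'tags.' prefix, as in the cell above.
kept = [c.replace('tags.', '') for c in kept]
print(kept)  # ['id', 'lanes', 'surface']
```

The `any(...)` form is equivalent to looping over the terms one at a time; it just does the filtering in a single pass.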

In [26]:
search_terms = ['cycleway:', 'bicycle:']

# Work on an explicit copy so the assignments below don't trigger SettingWithCopyWarning.
bike_lanes = bike_lanes.copy()

# Fill gaps in the main tag from its sub-tags, in priority order.
for col in ['cycleway:right', 'cycleway:left', 'cycleway:buffer']:
    bike_lanes['cycleway'] = bike_lanes['cycleway'].fillna(bike_lanes[col])
bike_lanes['bicycle'] = bike_lanes['bicycle'].fillna(bike_lanes['bicycle:designated'])

bike_lanes.drop(columns=['cycleway:right', 'cycleway:left', 'cycleway:buffer', 'bicycle:designated'],
                inplace=True)
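
The chain of `fillna` calls in this cell is a coalesce: for each row, take the first non-null value among several columns, in priority order. As a sketch on invented data, the same result can be obtained by back-filling across the columns and keeping the first one. The helper name `coalesce` is hypothetical.

```python
import numpy as np
import pandas as pd


def coalesce(df, columns):
    """Return, per row, the first non-null value among `columns` (in order)."""
    return df[columns].bfill(axis=1).iloc[:, 0]


toy = pd.DataFrame({
    'cycleway':       ['lane', np.nan, np.nan, np.nan],
    'cycleway:right': [np.nan, 'track', np.nan, np.nan],
    'cycleway:left':  ['no', 'lane', 'shared', np.nan],
})
toy['cycleway'] = coalesce(toy, ['cycleway', 'cycleway:right', 'cycleway:left'])
print(toy['cycleway'].tolist())
```

The last row stays missing because none of the three columns has a value for it.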


In [27]:
# Keep only the columns that are at most 99% null.
bike_lanes = bike_lanes.loc[:, bike_lanes.isnull().mean() <= 0.99]
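
`isnull().mean()` gives the fraction of missing values in each column, so this cell keeps only the columns that are at most 99% empty. Here is a sketch on invented data, with a lower threshold so the effect is visible in four rows:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'highway': ['cycleway', 'path', 'residential', 'cycleway'],
    'lit':     ['yes', np.nan, np.nan, np.nan],
    'ramp':    [np.nan, np.nan, np.nan, np.nan],
})

null_fraction = toy.isnull().mean()  # highway 0.0, lit 0.75, ramp 1.0
toy = toy.loc[:, null_fraction <= 0.75]
print(list(toy.columns))  # ['highway', 'lit']
```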

In [28]:
bike_lanes.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 35804 entries, 0 to 36350
Data columns (total 34 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   index                 35804 non-null  int64  
 1   type                  35804 non-null  object 
 2   id                    35804 non-null  int64  
 3   nodes                 35804 non-null  object 
 4   geometry              35804 non-null  object 
 5   bicycle               28709 non-null  object 
 6   foot                  8392 non-null   object 
 7   highway               35727 non-null  object 
 8   segregated            2915 non-null   object 
 9   surface               13898 non-null  object 
 10  bridge                1687 non-null   object 
 11  lit                   2331 non-null   object 
 12  lcn                   729 non-null    object 
 13  motor_vehicle         2333 non-null   object 
 14  oneway                10660 non-null  object 
 15  lanes              

In [29]:
b_tag = bike_lanes[(bike_lanes['highway'] != 'cycleway') & (bike_lanes['cycleway'].isna())]
b_tag.notna().mean()

index                   1.000000
type                    1.000000
id                      1.000000
nodes                   1.000000
geometry                1.000000
bicycle                 1.000000
foot                    0.249240
highway                 0.995967
segregated              0.072877
surface                 0.417813
bridge                  0.039892
lit                     0.074535
lcn                     0.014365
motor_vehicle           0.052821
oneway                  0.287253
lanes                   0.370794
maxspeed                0.407260
sidewalk                0.345875
cycleway                0.000000
width                   0.033704
parking:lane:both       0.009780
access                  0.034532
trolley_wire            0.014531
turn:lanes              0.035914
crossing                0.019946
footway                 0.060224
city                    1.000000
length                  1.000000
NHS                     0.038566
hgv                     0.089950
hgv:nation

In [12]:
b_tag['foot'].unique()


array([nan, 'yes', 'designated', 'official', 'no', 'permissive',
       'use_sidepath', 'discouraged', 'destination'], dtype=object)

In [13]:
b_tag['surface'].unique()


array(['concrete', nan, 'asphalt', 'paved', 'gravel', 'paving_stones',
       'ground', 'compacted', 'fine_gravel', 'concrete;metal', 'wood',
       'metal', 'dirt', 'unpaved', 'metal_grid', 'concrete:plates',
       'sand', 'tile', 'grass', 'brick', 'pebblestone', 'grass_paver',
       'sett', 'open_grate', 'cobblestone', 'dirt,gravel',
       'paved_concrete', 'paved;wood', 'cement', 'earth', 'Sett'],
      dtype=object)

In [19]:
raw_data.to_json('raw_data.json', orient='columns')

In [13]:
# To assign values based on a conditional, index rows and column in a single .loc call:
df.loc[df['c1'] == 'Value', 'c1'] = 10

Features to include from sources outside of OSM:
- cycling deaths per year
- cycling accidents per year
