# Mapping bike lanes in Pittsburgh
As a cyclist whose primary mode of transportation is a bicycle, I wanted to get a better understanding of 

## Setting up the notebook
I cheated and wrote a wrapper for accessing the Overpass API in a separate file. Here is the link to the GitHub gist, which includes the API class and a quick function to calculate length from the lat/lon coordinates returned by the API.
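
The gist itself is not reproduced here, so the following is only a minimal sketch of how a length helper of that kind could work, assuming each way's `geometry` is a list of `[lat, lon]` pairs as in the data shown later. The names `haversine_m` and `way_length_m` are hypothetical and not taken from the gist.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (hypothetical helper)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def way_length_m(geometry):
    """Sum the segment lengths along a way given as [[lat, lon], ...]."""
    return sum(haversine_m(*p, *q) for p, q in zip(geometry, geometry[1:]))
```

Applied to the frame, this could be used as `bike_lanes['geometry'].apply(way_length_m)`. The haversine formula treats the Earth as a sphere, which should be accurate enough for city-scale street segments.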


In [1]:
import pandas as pd

In [None]:
bike_lanes = pd.read_json('bikes.json')
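
How `bikes.json` was produced is not shown. The dotted column names in the output below (`bounds.minlat`, `tags.cycleway`) look like what `pd.json_normalize` generates from nested records, so here is a small sketch of that flattening under that assumption. The `elements` list is made-up sample data in the general shape of an Overpass response.

```python
import pandas as pd

# Hypothetical Overpass-style elements; real responses carry many more tags.
elements = [
    {
        "type": "way",
        "id": 1,
        "bounds": {"minlat": 40.44, "minlon": -79.95, "maxlat": 40.45, "maxlon": -79.94},
        "tags": {"highway": "residential", "cycleway": "lane"},
    },
    {
        "type": "way",
        "id": 2,
        "bounds": {"minlat": 40.46, "minlon": -79.93, "maxlat": 40.47, "maxlon": -79.92},
        "tags": {"highway": "cycleway", "bicycle": "designated"},
    },
]

# Nested dicts become dotted column names; tags missing from an element become NaN.
flat = pd.json_normalize(elements)
print(sorted(flat.columns))
```

Because every distinct tag becomes its own column, a real extract ends up very wide and mostly empty, which is consistent with the 184 columns reported below.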

In [28]:
bike_lanes.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2850 entries, 0 to 2849
Data columns (total 184 columns):
 #    Column                                 Dtype  
---   ------                                 -----  
 0    index                                  int64  
 1    type                                   object 
 2    id                                     int64  
 3    nodes                                  object 
 4    geometry                               object 
 5    bounds.minlat                          float64
 6    bounds.minlon                          float64
 7    bounds.maxlat                          float64
 8    bounds.maxlon                          float64
 9    tags.cycleway                          object 
 10   tags.highway                           object 
 11   tags.lcn                               object 
 12   tags.name                              object 
 13   tags.tiger:cfcc                        object 
 14   tags.tiger:name_base                  

We're going to remove the major roads (motorways, trunk roads, and their link ramps) that don't have some sort of designated bicycle passageway on them.

In [29]:
remove = ['motorway', 'trunk', 'motorway_link', 'trunk_link']
no_bike_access = (
    bike_lanes['tags.highway'].isin(remove)
    & bike_lanes['tags.cycleway'].isnull()
    & (bike_lanes['tags.bicycle'] != 'designated')
)
bike_lanes.drop(bike_lanes[no_bike_access].index, inplace=True)
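
To make the three-part condition easier to check, here is the same logic on a small made-up frame. The rows are hypothetical; only the column names and the `remove` list come from the cell above.

```python
import pandas as pd

# Hypothetical rows covering each branch of the condition.
toy = pd.DataFrame({
    "tags.highway": ["motorway", "trunk", "trunk", "residential", "motorway_link"],
    "tags.cycleway": [None, "lane", None, None, None],
    "tags.bicycle": [None, None, "designated", None, "yes"],
})

remove = ["motorway", "trunk", "motorway_link", "trunk_link"]

# A row is dropped only when all three conditions hold.
mask = (
    toy["tags.highway"].isin(remove)
    & toy["tags.cycleway"].isnull()
    & (toy["tags.bicycle"] != "designated")
)
kept = toy[~mask]
print(kept)
```

Rows 0 and 4 are dropped. Row 1 survives because it has a `cycleway` tag, row 2 because it is `bicycle=designated`, and row 3 because it is not a major road. Note that a missing `bicycle` value compares as "not designated", so it counts toward removal.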

In [30]:
remove_columns = ['index','type','tiger', 'bounds', 'source', 'note', 'ref', 'horse', 'maxweight',
                  'layer','name', 'description', 'lanes:']
columns = bike_lanes.columns
for r in remove_columns:
    columns = [c for c in columns if r not in c]

bike_lanes = bike_lanes[columns]

columns = bike_lanes.columns
columns = [c.replace('tags.', '') for c in columns]
bike_lanes.columns = columns
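
One thing to keep in mind with the cell above is that `r not in c` is a substring test, so a short pattern such as `ref` or `name` removes any column containing it anywhere. A minimal sketch on a made-up column list (the column `tags.preference` is hypothetical, chosen to show an accidental match):

```python
# Hypothetical column names; "tags.preference" is invented to show over-matching.
columns = [
    "tags.name", "tags.cycleway", "tags.surface", "tags.lanes",
    "tags.lanes:forward", "tags.preference", "bounds.minlat", "id",
]
remove_columns = ["name", "ref", "lanes:", "bounds"]

# One-pass equivalent of the nested loop: keep a column only if no pattern occurs in it.
kept = [c for c in columns if not any(r in c for r in remove_columns)]

# Strip the "tags." prefix, as in the cell above.
renamed = [c.replace("tags.", "") for c in kept]
print(renamed)
```

Here `tags.preference` disappears because it contains `ref`, while `tags.lanes` survives because the pattern is `lanes:` with the colon. If that kind of accidental match matters, comparing against exact names or using `str.startswith` would be a stricter alternative.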

In [31]:
# get search terms programmatically
search_terms = ['cycleway:', 'bicycle:']

# Assign the result back: fillna(inplace=True) on a selected column
# does not update the frame under pandas Copy-on-Write.
for col in ['cycleway:right', 'cycleway:left', 'cycleway:buffer']:
    bike_lanes['cycleway'] = bike_lanes['cycleway'].fillna(bike_lanes[col])

bike_lanes['bicycle'] = bike_lanes['bicycle'].fillna(bike_lanes['bicycle:designated'])
bike_lanes = bike_lanes.drop(columns=['cycleway:right', 'cycleway:left'])
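
The comment in the cell above mentions finding the search terms programmatically. One possible way to do that, sketched on made-up data, is to collect every column that starts with a given prefix and fold those columns into the base column in order. The toy values are hypothetical.

```python
import pandas as pd

# Hypothetical data: the base column plus two side-specific variants.
toy = pd.DataFrame({
    "cycleway": ["lane", None, None, None],
    "cycleway:right": [None, "track", None, None],
    "cycleway:left": [None, None, "shared_lane", None],
    "highway": ["primary", "secondary", "tertiary", "service"],
})

prefix = "cycleway:"
variants = [c for c in toy.columns if c.startswith(prefix)]

# Fill gaps in the base column from each variant in turn; earlier columns win.
coalesced = toy["cycleway"]
for col in variants:
    coalesced = coalesced.fillna(toy[col])

toy["cycleway"] = coalesced
toy = toy.drop(columns=variants)
print(toy)
```

The order of `variants` follows the column order of the frame, so if one side should take priority over the other, the list would need to be sorted explicitly.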


In [32]:
bike_lanes = bike_lanes.loc[:, bike_lanes.isnull().mean() <= 0.8]
bike_lanes.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2819 entries, 0 to 2849
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   index     2819 non-null   int64 
 1   type      2819 non-null   object
 2   id        2819 non-null   int64 
 3   nodes     2819 non-null   object
 4   geometry  2819 non-null   object
 5   highway   2815 non-null   object
 6   lcn       1116 non-null   object
 7   bicycle   2634 non-null   object
 8   surface   951 non-null    object
 9   foot      658 non-null    object
 10  city      2819 non-null   object
dtypes: int64(2), object(9)
memory usage: 264.3+ KB
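
The filter in the cell above keeps only columns where at most 80% of the values are missing. A small sketch on made-up data shows how the pieces fit together: `isnull()` gives a boolean frame, `mean()` turns each column into its share of missing values, and the comparison yields a column mask for `.loc`.

```python
import pandas as pd

# Hypothetical frame: one complete column, one 60% empty, one entirely empty.
toy = pd.DataFrame({
    "highway": ["primary", "service", "path", "steps", "track"],
    "surface": ["asphalt", "paved", None, None, None],
    "oneway": [None, None, None, None, None],
})

null_share = toy.isnull().mean()      # fraction of missing values per column
kept = toy.loc[:, null_share <= 0.8]  # boolean mask selects the columns to keep
print(null_share)
print(list(kept.columns))
```

Only `oneway` is dropped here. On the real data the same threshold is what reduces the frame to the handful of columns listed above.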


In [33]:
bike_lanes.sample(20)

| | index | type | id | nodes | geometry | highway | lcn | bicycle | surface | foot | city |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 566 | 566 | way | 45116548 | [572065243, 572065248, 572065251, 572065254, 5... | [[40.46761, -79.9281859], [40.466965, -79.9262... | secondary | yes | | asphalt | | Pittsburgh |
| 215 | 215 | way | 11805275 | [105097664, 2715045158, 8006500881, 104931010,... | [[40.4470707, -79.9472806], [40.4466574, -79.9... | residential | yes | designated | | | Pittsburgh |
| 543 | 543 | way | 32149965 | [348503522, 3577378231, 3577378224, 3577378200... | [[40.4425347, -79.9005591], [40.4424911, -79.9... | cycleway | | designated | | designated | Pittsburgh |
| 1785 | 1785 | way | 366826460 | [3708024661, 3708024665, 3708024677, 370802468... | [[40.4734153, -80.0209537], [40.4734781, -80.0... | service | | yes | | | Pittsburgh |
| 192 | 192 | way | 11749562 | [104667658, 105200935, 105200941] | [[40.4549439, -79.8876957], [40.4546961, -79.8... | service | | yes | | | Pittsburgh |
| 771 | 771 | way | 109353387 | [1252101696, 6325959025, 1252102247, 125210211... | [[40.4755834, -79.9568282], [40.4755026, -79.9... | service | yes | yes | | | Pittsburgh |
| 1470 | 1470 | way | 340732969 | [360939188, 360939186, 360939184, 360939182, 3... | [[40.4460114, -79.9030646], [40.445948, -79.90... | cycleway | yes | designated | | designated | Pittsburgh |
| 161 | 161 | way | 11734431 | [104667609, 105062455, 105029429] | [[40.4530834, -79.8884226], [40.4528174, -79.8... | service | | yes | | | Pittsburgh |
| 1253 | 1253 | way | 302256387 | [105075863, 8026959671, 105711578, 2985643145,... | [[40.4457173, -79.9563859], [40.4456646, -79.9... | tertiary | yes | designated | asphalt | | Pittsburgh |
| 2285 | 2285 | way | 550521575 | [5317313595, 5317313317] | [[40.4346955, -79.9732857], [40.434484, -79.97... | cycleway | yes | designated | concrete | no | Pittsburgh |


In [35]:
for c in bike_lanes.columns:
    if c != 'nodes' and c != 'geometry':
        print(f'{c}:')
        values = bike_lanes[c].unique()
        print(f'{values}')

index:
[   0    1    2 ... 2847 2848 2849]
type:
['way']
id:
[ 11651864  11652289  11652864 ... 932572338 934280874 934280875]
highway:
['secondary' 'service' 'path' 'residential' 'unclassified' 'tertiary'
 'cycleway' 'primary' None 'steps' 'footway' 'secondary_link'
 'construction' 'tertiary_link' 'pedestrian' 'primary_link' 'track'
 'busway']
lcn:
['yes' None]
bicycle:
[None 'yes' 'designated']
surface:
[None 'paved' 'asphalt' 'concrete' 'fine_gravel' 'sett' 'unpaved' 'gravel'
 'compacted' 'ground' 'dirt' 'paving_stones' 'brick' 'cobblestone' 'earth'
 'wood']
foot:
[None 'yes' 'designated' 'permissive' 'use_sidepath' 'no']
city:
['Pittsburgh']
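
The listing above shows which values occur but not how often. As a possible follow-up, `value_counts(dropna=False)` reports the frequency of each value, including missing ones. The sketch below uses made-up rows and skips the list-valued column, since lists are unhashable and cannot be counted this way.

```python
import pandas as pd

# Hypothetical rows; "nodes" holds lists, like the real column.
toy = pd.DataFrame({
    "highway": ["cycleway", "service", "cycleway", None],
    "bicycle": ["designated", "yes", "designated", None],
    "nodes": [[1, 2], [3, 4], [5, 6], [7, 8]],
})

# Count each value per column, keeping missing values as their own bucket.
summary = {
    col: toy[col].value_counts(dropna=False)
    for col in toy.columns.drop("nodes")
}

for col, counts in summary.items():
    print(f"{col}:\n{counts}\n")
```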
