# Cleaning and harmonization

Data cleaning and harmonization were conducted to ensure internal consistency, reproducibility, and alignment across all datasets used in the analysis.

## Citi Bike trip data cleaning

Citi Bike trip data were cleaned with an emphasis on schema consistency and data quality.

### Schema standardization

* Trip data were loaded using a standardized schema.
* Column names were harmonized across vintages.
* Data types were enforced consistently for timestamps, identifiers, and numeric fields.
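
The standardization steps above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the legacy column names in `COLUMN_MAP` and the helper `standardize_trips` are assumptions, and only the target names (`started_at`, `ended_at`, `start_station_id`, `end_station_id`) come from the trip table shown later in this notebook.

```python
import pandas as pd

# Hypothetical mapping from legacy column names to the standardized schema.
# The left-hand names are assumptions about older vintages.
COLUMN_MAP = {
    "starttime": "started_at",
    "stoptime": "ended_at",
    "start station id": "start_station_id",
    "end station id": "end_station_id",
}


def standardize_trips(raw: pd.DataFrame) -> pd.DataFrame:
    """Rename legacy columns and coerce core fields to consistent dtypes."""
    out = raw.rename(columns=COLUMN_MAP).copy()
    # Unparseable timestamps become NaT so they can be filtered later.
    for col in ("started_at", "ended_at"):
        out[col] = pd.to_datetime(out[col], errors="coerce")
    # Station identifiers are kept as strings so they compare equal across vintages.
    for col in ("start_station_id", "end_station_id"):
        out[col] = out[col].astype("string")
    return out


# Toy input in the assumed legacy layout.
legacy = pd.DataFrame({
    "starttime": ["2017-01-01 00:00:21", "not a date"],
    "stoptime": ["2017-01-01 00:11:41", "2017-01-01 00:22:08"],
    "start station id": [3226, 3263],
    "end station id": [3165, 498],
})
trips = standardize_trips(legacy)
```

Coercing bad timestamps to `NaT` rather than raising keeps the loading step separate from the filtering step, so invalid rows can be counted before they are dropped.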

### Filtering and corrections

* Records with missing or invalid timestamps, station identifiers, or latitude/longitude values were removed.
* Trip duration was computed (in seconds and in minutes), and records with implausible durations were excluded.
* Exact duplicate trip records were identified and removed.
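
A minimal sketch of these filters is given below. The duration bounds `MIN_SECONDS` and `MAX_SECONDS` and the helper `clean_trips` are assumptions chosen for illustration; the source does not state the thresholds actually used. The `tripduration_min` column is likewise a hypothetical name.

```python
import pandas as pd

# Assumed plausibility bounds for trip duration, in seconds (not from the source).
MIN_SECONDS = 60
MAX_SECONDS = 24 * 3600

REQUIRED = [
    "started_at", "ended_at", "start_station_id", "end_station_id",
    "start_lat", "start_lng", "end_lat", "end_lng",
]


def clean_trips(trips: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows, compute duration, remove outliers and exact duplicates."""
    out = trips.dropna(subset=REQUIRED).copy()
    # Keep only coordinates inside the valid latitude/longitude ranges.
    coords_ok = (
        out["start_lat"].between(-90, 90) & out["end_lat"].between(-90, 90)
        & out["start_lng"].between(-180, 180) & out["end_lng"].between(-180, 180)
    )
    out = out.loc[coords_ok].copy()
    out["tripduration"] = (out["ended_at"] - out["started_at"]).dt.total_seconds()
    out["tripduration_min"] = out["tripduration"] / 60
    out = out.loc[out["tripduration"].between(MIN_SECONDS, MAX_SECONDS)]
    return out.drop_duplicates().reset_index(drop=True)


# Toy input: one valid trip (duplicated), one missing a latitude, one ending before it starts.
sample = pd.DataFrame({
    "started_at": pd.to_datetime(
        ["2017-01-01 00:00:21"] * 2 + ["2017-01-01 00:00:45", "2017-01-01 00:05:00"]
    ),
    "ended_at": pd.to_datetime(
        ["2017-01-01 00:11:41"] * 2 + ["2017-01-01 00:22:08", "2017-01-01 00:04:00"]
    ),
    "start_station_id": ["3226", "3226", "3263", "3143"],
    "end_station_id": ["3165", "3165", "498", "3152"],
    "start_lat": [40.78275, 40.78275, None, 40.776321],
    "start_lng": [-73.97137, -73.97137, -73.990753, -73.964274],
    "end_lat": [40.775794, 40.775794, 40.748549, 40.768737],
    "end_lng": [-73.976206, -73.976206, -73.988084, -73.961199],
})
cleaned = clean_trips(sample)
```

Only the first trip survives: its duplicate is dropped, the second trip lacks a start latitude, and the last one has a negative duration.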

### Station master table

* A canonical station master table was constructed.
* Station locations were spatially joined to census tracts and ZIP codes.
* Station identifier changes, relocations, and deprecations were reconciled to maintain longitudinal consistency.
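
One way the non-spatial part of the master table could be assembled is sketched below. It assumes a long table of per-year station observations with columns `station_id`, `station_name`, `lat`, `lng`, and `year`; that input layout, the helper `build_station_master`, and station `9001` are hypothetical. The output columns mirror `first_year`, `last_year`, `unique_names`, and `unique_coords` from the station table loaded later in this notebook, but how those were actually derived is not stated in the source. The spatial joins to tracts and ZIP codes are not covered here.

```python
import pandas as pd


def build_station_master(obs: pd.DataFrame) -> pd.DataFrame:
    """Collapse yearly station observations into one canonical row per station."""
    obs = obs.sort_values("year", kind="stable").copy()
    # Rounded coordinate key so tiny floating-point jitter does not count as a relocation.
    obs["coord"] = obs["lat"].round(5).astype(str) + "," + obs["lng"].round(5).astype(str)
    return (
        obs.groupby("station_id")
        .agg(
            station_name=("station_name", "last"),  # most recent name wins
            first_year=("year", "min"),
            last_year=("year", "max"),
            lat=("lat", "last"),                    # most recent location wins
            lng=("lng", "last"),
            unique_names=("station_name", "nunique"),
            unique_coords=("coord", "nunique"),
        )
        .reset_index()
    )


# Toy observations; station 9001 is invented to show a rename plus relocation.
obs = pd.DataFrame({
    "station_id": [116, 116, 9001, 9001],
    "station_name": ["W 17 St & 8 Ave", "W 17 St & 8 Ave", "Old Name", "New Name"],
    "lat": [40.741776, 40.741776, 40.70000, 40.70100],
    "lng": [-74.001497, -74.001497, -73.90000, -73.90000],
    "year": [2017, 2019, 2017, 2018],
})
master = build_station_master(obs)
```

Stations with `unique_names > 1` or `unique_coords > 1` are the ones that would need manual or rule-based reconciliation.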

## Contextual dataset cleaning

All contextual datasets were cleaned and prepared to support temporal and spatial integration with the trip data.

### Weather data

* Timestamps were standardized to local time.
* Variable names were aligned across sources.
* Indicator variables were created for precipitation presence and extreme heat or cold conditions.
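
The indicator variables could be derived roughly as follows. The column names (`prcp`, `tmax`, `tmin`, `precip_flag`, `extreme_heat`, `extreme_cold`) match the weather table loaded later in this notebook, but the thresholds `HEAT_F` and `COLD_F` and the helper `add_weather_flags` are assumptions; the source does not state the cut-offs used.

```python
import pandas as pd

# Assumed thresholds in degrees Fahrenheit (hypothetical, not from the source).
HEAT_F = 90.0
COLD_F = 20.0


def add_weather_flags(weather: pd.DataFrame) -> pd.DataFrame:
    """Add 0/1 indicators for precipitation and temperature extremes."""
    out = weather.copy()
    out["precip_flag"] = (out["prcp"] > 0).astype(int)
    out["extreme_heat"] = (out["tmax"] >= HEAT_F).astype(int)
    out["extreme_cold"] = (out["tmin"] <= COLD_F).astype(int)
    return out


# Three Central Park rows taken from the weather table in this notebook.
weather = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-05"]),
    "prcp": [0.0, 0.21, 0.0],
    "tmax": [48, 41, 34],
    "tmin": [40, 37, 27],
})
flagged = add_weather_flags(weather)
```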

### Holiday data

* Every record was assigned a consistent date or datetime key.

### Air quality data

* Obvious outliers and erroneous dates were removed.
* AQI values were aggregated to daily resolution at both the borough and citywide levels.
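
A minimal sketch of this aggregation is shown below. It assumes the citywide value is an unweighted mean of the county (borough) values for each day, which the source does not confirm, and it treats anything outside the 0 to 500 AQI scale as an outlier. The helper `aggregate_aqi` and the toy values are illustrative only.

```python
import pandas as pd


def aggregate_aqi(aqi: pd.DataFrame):
    """Return (county-by-day, citywide-by-day) AQI tables after basic cleaning."""
    clean = aqi.copy()
    clean["date"] = pd.to_datetime(clean["date"], errors="coerce")
    # Drop unparseable dates and values outside the AQI scale.
    clean = clean[clean["date"].notna() & clean["aqi"].between(0, 500)]
    by_county = clean.groupby(["county_name", "date"], as_index=False)["aqi"].mean()
    citywide = (
        by_county.groupby("date", as_index=False)["aqi"]
        .mean()  # assumption: simple unweighted mean across counties
        .rename(columns={"aqi": "aqi_citywide"})
    )
    return by_county, citywide


# Toy input with one out-of-range value and one bad date (values are made up).
raw = pd.DataFrame({
    "county_name": ["Bronx", "Kings", "Bronx", "Kings", "Queens"],
    "date": ["2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02", "bad date"],
    "aqi": [60, 50, 55, 9999, 40],
})
by_county, citywide = aggregate_aqi(raw)
```

A population-weighted or maximum-based citywide index would be equally easy to substitute in the second `groupby`.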

### Traffic and bicycle counts

* Traffic and bicycle count data were cleaned and aggregated to daily or hourly resolution where supported.
* These datasets were prepared for subsequent station-proximal aggregation via spatial joins.
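
As an illustration of the temporal aggregation, the sketch below rolls 15-minute traffic volumes up to hourly totals per segment. The column names (`segment_id`, `timestamp`, `volume`) and the sample rows come from the traffic table loaded later in this notebook; summing across directions and the output column name `hour_start` are assumptions.

```python
import pandas as pd

# Five 15-minute records for one segment, copied from the traffic table in this notebook.
counts = pd.DataFrame({
    "segment_id": [21399] * 5,
    "direction": ["WB", "WB", "EB", "WB", "EB"],
    "timestamp": pd.to_datetime([
        "2017-01-04 10:00:00", "2017-01-04 10:15:00", "2017-01-04 10:30:00",
        "2017-01-04 10:30:00", "2017-01-04 10:45:00",
    ]),
    "volume": [50, 208, 45, 185, 206],
})

# Floor each timestamp to the start of its hour, then sum volumes per segment and hour.
hourly = (
    counts.assign(hour_start=counts["timestamp"].dt.floor("60min"))
    .groupby(["segment_id", "hour_start"], as_index=False)["volume"]
    .sum()
)
```

Replacing `dt.floor("60min")` with `dt.normalize()` would give daily totals instead.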

### ACS, PLUTO, and bicycle route data

* All spatial datasets were projected into a common coordinate reference system.
* Duplicate geometries were removed.
* Data were prepared for station-level spatial joins and feature construction.

In [1]:
import pandas as pd

pd.set_option('display.max_columns', None)

In [2]:
df_trip = pd.read_parquet('/Users/zoltanjelovich/Documents/ISEG/MFW/data/trip/year=2017/2017_01.parquet')
df_trip.head()

Unnamed: 0,ride_id,rideable_type,member_casual,birth_year,gender,gender_label,started_at,ended_at,tripduration,year,start_station_id,start_station_name,start_lat,start_lng,end_station_id,end_station_name,end_lat,end_lng
0,0,,member,1965.0,2,female,2017-01-01 00:00:21,2017-01-01 00:11:41,680,2017,3226,W 82 St & Central Park West,40.78275,-73.97137,3165,Central Park West & W 72 St,40.775794,-73.976206
1,1,,member,1987.0,2,female,2017-01-01 00:00:45,2017-01-01 00:22:08,1283,2017,3263,Cooper Square & Astor Pl,40.729515,-73.990753,498,Broadway & W 32 St,40.748549,-73.988084
2,2,,casual,,0,unknown,2017-01-01 00:00:57,2017-01-01 00:11:46,649,2017,3143,5 Ave & E 78 St,40.776321,-73.964274,3152,3 Ave & E 71 St,40.768737,-73.961199
3,3,,casual,,0,unknown,2017-01-01 00:01:10,2017-01-01 00:11:42,632,2017,3143,5 Ave & E 78 St,40.776321,-73.964274,3152,3 Ave & E 71 St,40.768737,-73.961199
4,4,,casual,,0,unknown,2017-01-01 00:01:25,2017-01-01 00:11:47,622,2017,3143,5 Ave & E 78 St,40.776321,-73.964274,3152,3 Ave & E 71 St,40.768737,-73.961199


In [3]:
df_station = pd.read_parquet('/Users/zoltanjelovich/Documents/ISEG/MFW/data/station/station_master.parquet')
df_station.head()

Unnamed: 0,station_id,station_name,first_year,last_year,years_active,is_real_station,lat,lng,borough_code,borough_name,nta_code,nta_name,tract_geoid,tract_name,state_fips,county_fips,tract_code,county_name,nearest_pluto_distance_m,pluto_zipcode,pluto_landuse,pluto_landuse_label,pluto_far,pluto_pct_residential,pluto_pct_commercial,pluto_units_density_per_10k_sqft,unique_names,unique_coords,missing_id,invalid_coords,suspect_coords,acs_population,acs_median_income,acs_pct_bike_commute,acs_pct_transit_commute,acs_pct_no_vehicle,acs_pct_bachelors_plus,acs_bike_commute_share,acs_transit_commute_share,acs_no_vehicle_share,acs_bachelors_plus_share,acs_pop_density,acs_bachelors_density,acs_no_vehicle_density,buffer_area,n_parcels,lotarea_total,bldgarea_total,resarea_total,comarea_total,officearea_total,retailarea_total,unitsres_total,unitstotal_total,far_mean,far_max,pct_residential_mean,pct_commercial_mean,units_density10k_mean,buffer_area_safe,lotarea_per_buffer,bldgarea_per_buffer,resarea_per_buffer,comarea_per_buffer,officearea_per_buffer,retailarea_per_buffer,unitsres_per_buffer,unitstotal_per_buffer,landuse_entropy,residential_share,commercial_share,industrial_share,institutional_share,dist_nearest_bikeroute_ft,dist_nearest_bikeroute_m,bike_lane_length_500m_m,bike_lane_length_500m_ft
0,116,W 17 St & 8 Ave,2017,2019,2,True,40.741776,-74.001497,1.0,Manhattan,MN0401,Chelsea-Hudson Yards,36061008300,Census Tract 83; New York County; New York,36,61,8300,New York,11.517767,10011,3.0,Multi-Family Elevator Buildings,7.158696,93.546918,12.906165,71.73913,1,1,False,False,False,3798.0,70986.0,2.272727,59.284333,82.726854,61.099302,0.022727,0.592843,0.827269,0.610993,0.002028,0.001239,0.001677,784137.122636,99.0,616937.0,3799954.0,1099082.0,2692671.0,2393621.0,189980.0,1415.0,1635.0,3.599714,13.087131,76.527105,30.188812,38.556401,784137.122636,0.786772,4.846033,1.401645,3.433929,3.052554,0.242279,0.001805,0.002085,0.483164,0.607605,0.392395,0.0,0.0,108.242234,32.992233,9134.617339,29969.216992
1,119,Park Ave & St Edwards St,2017,2019,3,True,40.696089,-73.978034,3.0,Brooklyn,BK0203,Fort Greene,36047002901,Census Tract 29.01; Kings County; New York,36,47,2901,Kings,19.929868,11205,10.0,Parking Facilities,0.0,,,0.0,1,1,False,False,False,3974.0,22943.0,0.0,66.666667,85.644938,13.169257,0.0,0.666667,0.856449,0.131693,0.003404,0.000448,0.002915,784137.122636,12.0,781143.0,1023100.0,762095.0,261005.0,130233.0,522.0,776.0,781.0,1.529815,3.528613,59.228609,41.542781,12.818844,784137.122636,0.996182,1.304746,0.97189,0.332856,0.166084,0.000666,0.00099,0.000996,-7.214116e-13,1.0,0.0,0.0,0.0,12.167774,3.708738,4984.633479,16353.784379
2,120,Lexington Ave & Classon Ave,2017,2019,3,True,40.686768,-73.959282,3.0,Brooklyn,BK0301,Bedford-Stuyvesant (West),36047023300,Census Tract 233; Kings County; New York,36,47,23300,Kings,7.329749,11238,5.0,Commercial & Office Buildings,1.12478,0.0,100.0,5.858231,1,1,False,False,False,6821.0,80000.0,4.929779,49.641731,56.444241,55.545287,0.049298,0.496417,0.564442,0.555453,0.003688,0.002049,0.002082,784137.122636,195.0,539802.0,1015726.0,854567.0,104706.0,26372.0,13193.0,804.0,833.0,1.707069,5.013764,81.158198,9.650598,14.96483,784137.122636,0.688403,1.295342,1.089818,0.13353,0.033632,0.016825,0.001025,0.001062,0.1511581,0.955873,0.015668,0.028459,0.0,83.007311,25.300628,4176.952532,13703.912506
3,127,Barrow St & Hudson St,2017,2019,3,True,40.731724,-74.006744,1.0,Manhattan,MN0203,West Village,36061006900,Census Tract 69; New York County; New York,36,61,6900,New York,21.634021,10014,4.0,Mixed Residential & Commercial Buildings,2.599308,80.830671,38.338658,41.522491,1,1,False,False,False,2491.0,231136.0,1.703801,34.993447,73.717949,88.251073,0.017038,0.349934,0.737179,0.882511,0.001054,0.00093,0.000777,784137.122636,121.0,516210.0,2254477.0,1450615.0,695185.0,261200.0,174215.0,1519.0,1610.0,3.109563,10.236949,83.812573,16.637462,29.159467,784137.122636,0.658316,2.875106,1.849951,0.886561,0.333105,0.222174,0.001937,0.002053,0.2614116,0.882262,0.117738,0.0,0.0,35.424687,10.797445,9186.235867,30138.569117
4,128,MacDougal St & Prince St,2017,2019,3,True,40.727103,-74.002971,1.0,Manhattan,MN0201,SoHo-Little Italy-Hudson Square,36061003700,Census Tract 37; New York County; New York,36,61,3700,New York,12.524124,10014,3.0,Multi-Family Elevator Buildings,5.713008,87.953259,24.093483,84.308328,1,1,False,False,False,3380.0,246250.0,1.201717,53.776824,73.879781,90.508354,0.012017,0.537768,0.738798,0.905084,0.000821,0.000743,0.000607,784137.122636,140.0,463028.0,1683675.0,1374597.0,281687.0,51037.0,106998.0,1986.0,2093.0,3.252841,10.576184,80.864782,19.826017,42.15774,784137.122636,0.590494,2.147169,1.753006,0.359232,0.065087,0.136453,0.002533,0.002669,0.04787624,0.987689,0.012311,0.0,0.0,92.331503,28.142642,8746.799894,28696.850046


In [4]:
df_weather = pd.read_parquet('/Users/zoltanjelovich/Documents/ISEG/MFW/data/weather/central_park.parquet')
df_weather.head()

Unnamed: 0,station_name,date,station,name,latitude,longitude,elevation,awnd,prcp,snow,snwd,tavg,tmax,tmin,wt01,wt02,wt03,wt04,wt05,wt06,wt08,wt07,wt09,precip_flag,snow_flag,snow_depth_flag,extreme_heat,extreme_cold,fog,winter_precip
0,central_park,2017-01-01,USW00094728,"NY CITY CENTRAL PARK, NY US",40.77898,-73.96925,42.7,5.59,0.0,0.0,0.0,44.0,48,40,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
1,central_park,2017-01-02,USW00094728,"NY CITY CENTRAL PARK, NY US",40.77898,-73.96925,42.7,9.17,0.21,0.0,0.0,39.0,41,37,1,0,0,1,0,0,0,0,0,1,0,0,0,0,1,1
2,central_park,2017-01-03,USW00094728,"NY CITY CENTRAL PARK, NY US",40.77898,-73.96925,42.7,10.74,0.58,0.0,0.0,41.0,43,39,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0
3,central_park,2017-01-04,USW00094728,"NY CITY CENTRAL PARK, NY US",40.77898,-73.96925,42.7,8.05,0.0,0.0,0.0,43.0,52,34,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0
4,central_park,2017-01-05,USW00094728,"NY CITY CENTRAL PARK, NY US",40.77898,-73.96925,42.7,7.83,0.0,0.0,0.0,30.5,34,27,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [5]:
df_weighted = pd.read_parquet('/Users/zoltanjelovich/Documents/ISEG/MFW/data/weather/weighted.parquet')
df_weighted.head()

Unnamed: 0,date,tmin,tmax,tavg,prcp,snow,snwd,awnd,precip_flag,snow_flag,snow_depth_flag,extreme_heat,extreme_cold,fog,winter_precip,blend_method
0,2017-01-01,38.3,49.0,45.0,0.0,0.0,0.0,8.188,0,0,0,0,0,0,0,weighted_cp0.4_lga0.4_jfk0.1_ewr0.1
1,2017-01-02,36.6,41.7,39.4,0.216,0.0,0.0,10.379,1,0,0,0,0,1,1,weighted_cp0.4_lga0.4_jfk0.1_ewr0.1
2,2017-01-03,39.6,45.0,41.6,0.501,0.0,0.0,13.312,1,0,0,0,0,1,0,weighted_cp0.4_lga0.4_jfk0.1_ewr0.1
3,2017-01-04,34.8,53.3,45.1,0.0,0.0,0.0,12.93,0,0,0,0,0,1,0,weighted_cp0.4_lga0.4_jfk0.1_ewr0.1
4,2017-01-05,27.4,34.8,32.4,0.0,0.0,0.0,12.168,0,0,0,0,0,0,0,weighted_cp0.4_lga0.4_jfk0.1_ewr0.1


In [6]:
df_holidays = pd.read_parquet('/Users/zoltanjelovich/Documents/ISEG/MFW/data/calendar/federal_holidays.parquet')
df_holidays.head()

Unnamed: 0,holiday_name,holiday_short,holiday_id,day_of_week,date,is_weekend,is_holiday
0,New Year's Day,NYD,8,0,2017-01-02,False,True
1,Martin Luther King Jr. Day,MLK,6,0,2017-01-16,False,True
2,Washington's Birthday,PRES,11,0,2017-02-20,False,True
3,Memorial Day,MEM,7,0,2017-05-29,False,True
4,Independence Day,IND,3,1,2017-07-04,False,True


In [7]:
df_aqi_county = pd.read_parquet('/Users/zoltanjelovich/Documents/ISEG/MFW/data/environmental/aqi_by_county.parquet')
df_aqi_county.head()

Unnamed: 0,state_name,county_name,state_code,county_code,fips,date,aqi,category,defining_parameter
0,New York,Bronx,36,5,36005,2017-01-01,60,Moderate,PM2.5
1,New York,Bronx,36,5,36005,2017-01-02,55,Moderate,PM2.5
2,New York,Bronx,36,5,36005,2017-01-03,27,Good,PM2.5
3,New York,Bronx,36,5,36005,2017-01-04,52,Moderate,PM2.5
4,New York,Bronx,36,5,36005,2017-01-05,53,Moderate,PM2.5


In [8]:
df_aqi_city = pd.read_parquet('/Users/zoltanjelovich/Documents/ISEG/MFW/data/environmental/aqi_daily_citywide.parquet')
df_aqi_city.head()

Unnamed: 0,date,aqi_citywide
0,2017-01-01,61.0
1,2017-01-02,56.0
2,2017-01-03,34.8
3,2017-01-04,51.0
4,2017-01-05,53.6


In [9]:
df_traffic = pd.read_parquet('/Users/zoltanjelovich/Documents/ISEG/MFW/data/transportation/traffic_counts/traffic_counts.parquet')
df_traffic.head()

Unnamed: 0,request_id,borough,year,month,day,hour,minute,volume,segment_id,wkt_geom,street_name,from_street,to_street,direction,timestamp,geometry
0,25036,Brooklyn,2017,1,4,10,0,50,21399,POINT (987912.1134608461 174329.61681697544),FT HAMILTON PARKWAY,Chester Avenue,36 Street,WB,2017-01-04 10:00:00,b'\x01\x01\x00\x00\x00?\x8a\x17:\x10&.A\t\xbd=...
1,25036,Brooklyn,2017,1,4,10,15,208,21399,POINT (987912.1134608461 174329.61681697544),FT HAMILTON PARKWAY,Chester Avenue,36 Street,WB,2017-01-04 10:15:00,b'\x01\x01\x00\x00\x00?\x8a\x17:\x10&.A\t\xbd=...
2,25036,Brooklyn,2017,1,4,10,30,45,21399,POINT (987912.1134608461 174329.61681697544),FT HAMILTON PARKWAY,Chester Avenue,36 Street,EB,2017-01-04 10:30:00,b'\x01\x01\x00\x00\x00?\x8a\x17:\x10&.A\t\xbd=...
3,25036,Brooklyn,2017,1,4,10,30,185,21399,POINT (987912.1134608461 174329.61681697544),FT HAMILTON PARKWAY,Chester Avenue,36 Street,WB,2017-01-04 10:30:00,b'\x01\x01\x00\x00\x00?\x8a\x17:\x10&.A\t\xbd=...
4,25036,Brooklyn,2017,1,4,10,45,206,21399,POINT (987912.1134608461 174329.61681697544),FT HAMILTON PARKWAY,Chester Avenue,36 Street,EB,2017-01-04 10:45:00,b'\x01\x01\x00\x00\x00?\x8a\x17:\x10&.A\t\xbd=...


In [10]:
df_bicycle_counts_with_counters = pd.read_parquet('/Users/zoltanjelovich/Documents/ISEG/MFW/data/transportation/bicycle_counts/bicycle_counts_with_counters.parquet')
df_bicycle_counts_with_counters.head()

Unnamed: 0,count_record_id,counter_id,timestamp,count,status_code,counter_name,interval_minutes,sensor_type,counter_device_id,geometry,borough_code,borough_name
0,10277709,100009424,2017-01-01 00:00:00,4,4,2nd Avenue - 26th St S,15,5,,b'\x01\x01\x00\x00\x00\x1cn\xb1\x92\xbf5.A\xed...,1,Manhattan
1,10277710,100009424,2017-01-01 00:15:00,6,4,2nd Avenue - 26th St S,15,5,,b'\x01\x01\x00\x00\x00\x1cn\xb1\x92\xbf5.A\xed...,1,Manhattan
2,10277711,100009424,2017-01-01 00:30:00,6,4,2nd Avenue - 26th St S,15,5,,b'\x01\x01\x00\x00\x00\x1cn\xb1\x92\xbf5.A\xed...,1,Manhattan
3,10277712,100009424,2017-01-01 00:45:00,10,4,2nd Avenue - 26th St S,15,5,,b'\x01\x01\x00\x00\x00\x1cn\xb1\x92\xbf5.A\xed...,1,Manhattan
4,10277713,100009424,2017-01-01 01:00:00,16,4,2nd Avenue - 26th St S,15,5,,b'\x01\x01\x00\x00\x00\x1cn\xb1\x92\xbf5.A\xed...,1,Manhattan


In [11]:
df_acs = pd.read_parquet('/Users/zoltanjelovich/Documents/ISEG/MFW/data/demographics/acs_2018_2022.parquet')
df_acs.head()

Unnamed: 0,geoid,state_fips,county_fips,fips,tract_code,county_name,name,population,median_income,pct_bike_commute,pct_transit_commute,pct_no_vehicle,pct_bachelors_plus,bike_commute_share,transit_commute_share,no_vehicle_share,bachelors_plus_share,area,pop_density,bachelors_density,no_vehicle_density,borough_code,borough_name,centroid_x,centroid_y,geom,geom_srid
0,36005003700,36,5,36005,3700,Bronx,Census Tract 37; Bronx County; New York,331,,0.0,34.090909,86.702128,25.0,0.0,0.340909,0.867021,0.25,1787327.0,0.000185,4.6e-05,0.000161,2,Bronx,1008562.0,234924.861434,b'\x01\x03\x00\x00\x00\x01\x00\x00\x004\x00\x0...,2263
1,36005004400,36,5,36005,4400,Bronx,Census Tract 44; Bronx County; New York,4592,17319.0,0.0,52.659574,70.434783,5.759162,0.0,0.526596,0.704348,0.057592,2870570.0,0.0016,9.2e-05,0.001127,2,Bronx,1021213.0,240426.04009,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00N\x00\x0...,2263
2,36005005100,36,5,36005,5100,Bronx,Census Tract 51; Bronx County; New York,6492,22850.0,0.0,59.183673,85.720339,10.660039,0.0,0.591837,0.857203,0.1066,5513652.0,0.001177,0.000126,0.001009,2,Bronx,1004294.0,235553.786791,"b""\x01\x03\x00\x00\x00\x01\x00\x00\x00p\x00\x0...",2263
3,36005005400,36,5,36005,5400,Bronx,Census Tract 54; Bronx County; New York,5715,48750.0,0.0,64.66877,66.198765,10.278114,0.0,0.646688,0.661988,0.102781,2131833.0,0.002681,0.000276,0.001775,2,Bronx,1017530.0,241657.368299,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00$\x00\x0...,2263
4,36005007900,36,5,36005,7900,Bronx,Census Tract 79; Bronx County; New York,7086,34604.0,0.0,61.863676,82.80446,13.128991,0.0,0.618637,0.828045,0.13129,2126626.0,0.003332,0.000437,0.002759,2,Bronx,1010583.0,236505.815582,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00-\x00\x0...,2263


In [12]:
# Raw, cleaned parcel-level PLUTO data
# Each row = one tax lot (parcel)
df_pluto = pd.read_parquet('/Users/zoltanjelovich/Documents/ISEG/MFW/data/demographics/pluto.parquet')
df_pluto.head()

Unnamed: 0,bbl,borough_code,borough_name,zipcode,latitude,longitude,landuse,landuse_label,zonedist1,lotarea,bldgarea,resarea,comarea,officearea,retailarea,numbldgs,numfloors,unitsres,unitstotal,yearbuilt,yearalter1,far,pct_residential,pct_commercial,units_density_per_10k_sqft,geom_point,geom_point_srid
0,4064210038,QN,Queens,11355.0,40.743955,-73.819475,1.0,One & Two Family Buildings,R4B,1800,1296.0,1296.0,0.0,0.0,0.0,1,2.0,1.0,1.0,1950,0,0.72,100.0,0.0,5.555556,b'\x01\x01\x00\x00\x00NX\x87HrtR\xc0w\x99\x06\...,4326
1,4051750020,QN,Queens,11355.0,40.744709,-73.819221,1.0,One & Two Family Buildings,R4B,2000,1260.0,1260.0,0.0,0.0,0.0,1,2.0,2.0,2.0,1940,0,0.63,100.0,0.0,10.0,b'\x01\x01\x00\x00\x00hM=\x1cntR\xc0vCd\xa2R_D@',4326
2,4051730111,QN,Queens,11355.0,40.745663,-73.819727,1.0,One & Two Family Buildings,R4,2506,3386.0,2258.0,0.0,0.0,0.0,1,2.0,2.0,2.0,1935,2018,1.351157,66.686356,0.0,7.980846,b'\x01\x01\x00\x00\x00\xb2\xc9\xd0gvtR\xc0C\xa...,4326
3,4051740016,QN,Queens,11355.0,40.74537,-73.820287,1.0,One & Two Family Buildings,R3X,3800,2285.0,2285.0,0.0,0.0,0.0,2,2.0,2.0,2.0,1930,2023,0.601316,100.0,0.0,5.263158,b'\x01\x01\x00\x00\x00*\xf5v\x95\x7ftR\xc0\xf4...,4326
4,4064130046,QN,Queens,11355.0,40.744404,-73.820864,1.0,One & Two Family Buildings,R4,2200,974.0,974.0,0.0,0.0,0.0,1,1.0,2.0,2.0,1950,0,0.442727,100.0,0.0,9.090909,b'\x01\x01\x00\x00\x00\xba\xbe\x0f\x07\x89tR\x...,4326


In [13]:
# Tract-level built environment summary, keyed by geoid
# For each Census tract, summarizes the PLUTO lots that fall inside it
# Each row = one Census tract
df_pluto_tract = pd.read_parquet('/Users/zoltanjelovich/Documents/ISEG/MFW/data/demographics/pluto_by_tract.parquet')
df_pluto_tract.head()

Unnamed: 0,geoid,borough_name,borough_code,n_lots,total_lotarea,total_bldgarea,total_resarea,total_comarea,total_officearea,total_retailarea,total_numbldgs,avg_numfloors,total_unitsres,total_unitstotal,avg_yearbuilt,max_yearbuilt,max_yearalter1,avg_far,avg_pct_residential,avg_pct_commercial,avg_units_density_per_10k_sqft
0,36047006800,Brooklyn,3,480,1399348.0,2373525.0,1640108.0,605656.0,99618.0,199707.0,527.0,2.446316,1832.0,2022.0,1900.1,2015.0,2021.0,1.485578,77.084496,21.320074,13.541736
1,36081101002,Queens,4,664,3953047.0,2723026.0,2327011.0,322085.0,21172.0,1800.0,730.0,2.040331,1781.0,1847.0,1804.016566,2024.0,2016.0,0.580671,93.415603,2.287156,3.345944
2,36085014608,Staten Island,5,1068,3358872.0,1743301.0,1555254.0,130556.0,0.0,2144.0,1071.0,2.055081,1269.0,1273.0,1938.470037,2024.0,2021.0,0.580891,97.832062,0.286533,4.416022
3,36085027704,Staten Island,5,1019,4467859.0,2003930.0,1970422.0,17747.0,9749.0,0.0,1040.0,2.12463,1471.0,1481.0,1971.77527,2024.0,2020.0,0.598912,98.838905,0.75785,4.458413
4,36005042902,Bronx,2,155,703855.0,1833598.0,1572843.0,235804.0,21044.0,51515.0,172.0,3.143791,1716.0,1756.0,1903.929032,2022.0,2023.0,1.949838,89.241783,6.675164,19.142956


In [14]:
# Master tract-level feature layer
# One row per tract, with ACS demographics, commuting, car ownership, education, built environment, geometry, and centroids
df_acs_pluto_tract = pd.read_parquet('/Users/zoltanjelovich/Documents/ISEG/MFW/data/demographics/acs_pluto_tract_master.parquet')
df_acs_pluto_tract.head()

Unnamed: 0,geoid,state_fips,county_fips,fips,tract_code,county_name,name,population,median_income,pct_bike_commute,pct_transit_commute,pct_no_vehicle,pct_bachelors_plus,bike_commute_share,transit_commute_share,no_vehicle_share,bachelors_plus_share,area,pop_density,bachelors_density,no_vehicle_density,borough_code,borough_name,centroid_x,centroid_y,geom,geom_srid,n_lots,total_lotarea,total_bldgarea,total_resarea,total_comarea,total_officearea,total_retailarea,total_numbldgs,avg_numfloors,total_unitsres,total_unitstotal,avg_yearbuilt,max_yearbuilt,max_yearalter1,avg_far,avg_pct_residential,avg_pct_commercial,avg_units_density_per_10k_sqft
0,36005003700,36,5,36005,3700,Bronx,Census Tract 37; Bronx County; New York,331,,0.0,34.090909,86.702128,25.0,0.0,0.340909,0.867021,0.25,1787327.0,0.000185,4.6e-05,0.000161,2,Bronx,1008562.0,234924.861434,b'\x01\x03\x00\x00\x00\x01\x00\x00\x004\x00\x0...,2263,12,1626813.0,155999.0,91234.0,63929.0,0.0,1200.0,12.0,3.928571,96.0,100.0,1124.0,1996,1991,1.12511,61.197845,35.298131,10.456266
1,36005004400,36,5,36005,4400,Bronx,Census Tract 44; Bronx County; New York,4592,17319.0,0.0,52.659574,70.434783,5.759162,0.0,0.526596,0.704348,0.057592,2870570.0,0.0016,9.2e-05,0.001127,2,Bronx,1021213.0,240426.04009,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00N\x00\x0...,2263,104,1948163.0,1907996.0,1549020.0,337680.0,2718.0,81588.0,148.0,2.618557,1942.0,1975.0,1846.548077,2022,2022,1.064221,74.457403,18.27049,10.862352
2,36005005100,36,5,36005,5100,Bronx,Census Tract 51; Bronx County; New York,6492,22850.0,0.0,59.183673,85.720339,10.660039,0.0,0.591837,0.857203,0.1066,5513652.0,0.001177,0.000126,0.001009,2,Bronx,1004294.0,235553.786791,"b""\x01\x03\x00\x00\x00\x01\x00\x00\x00p\x00\x0...",2263,159,3663178.0,6879947.0,3977032.0,2779286.0,751643.0,178899.0,173.0,4.093254,4169.0,4428.0,1543.597484,2023,2024,2.066074,22.124103,77.380365,11.909948
3,36005005400,36,5,36005,5400,Bronx,Census Tract 54; Bronx County; New York,5715,48750.0,0.0,64.66877,66.198765,10.278114,0.0,0.646688,0.661988,0.102781,2131833.0,0.002681,0.000276,0.001775,2,Bronx,1017530.0,241657.368299,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00$\x00\x0...,2263,427,1403327.0,2118452.0,1686942.0,400629.0,36334.0,129539.0,571.0,2.347418,1750.0,1861.0,1922.480094,1991,2023,1.319076,90.291449,7.971599,12.778176
4,36005007900,36,5,36005,7900,Bronx,Census Tract 79; Bronx County; New York,7086,34604.0,0.0,61.863676,82.80446,13.128991,0.0,0.618637,0.828045,0.13129,2126626.0,0.003332,0.000437,0.002759,2,Bronx,1010583.0,236505.815582,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00-\x00\x0...,2263,189,1362638.0,3042707.0,2218469.0,779093.0,288998.0,116700.0,189.0,3.666667,2770.0,2908.0,1745.417989,2024,2024,2.042675,79.350013,20.188682,22.313939


In [15]:
df_bicycle_routes = pd.read_parquet('/Users/zoltanjelovich/Documents/ISEG/MFW/data/transportation/bicycle_routes/bicycle_routes.parquet')
df_bicycle_routes.head()

Unnamed: 0,geom_wkt,geom,segment_id,bike_id,prev_bike_id,status,borough_code,borough_name,street,from_street,to_street,on_off_st,facility_class,all_classes,bike_dir,lane_count,ft_facility,tf_facility,install_date,retire_date,install_year,retire_year,greenway,greenway_system,spur,greenway_jurisdiction,length_m,facility_type
0,MULTILINESTRING ((-74.192429667502 40.52174149...,b'\x01\x05\x00\x00\x00\x01\x00\x00\x00\x01\x02...,2579,6562,,Current,5,Staten Island,HYLAN BLVD,HOLTEN AV,LUTEN AV,ON,II,II,2,2,Curbside Buffered,Curbside Buffered,2007-10-01,,2007.0,,Greenway,Staten Island Waterfront,Main Alignment,NYCDOT,5223.409416,Curbside Buffered
1,MULTILINESTRING ((-74.16031384424598 40.589027...,b'\x01\x05\x00\x00\x00\x01\x00\x00\x00\x01\x02...,5033,4272,,Current,5,Staten Island,MERRYMOUNT ST,RICHMOND HILL RD,ROCKLAND AV,ON,II,II,2,2,Conventional Buffered,Conventional Buffered,2021-08-12,,2021.0,,,,,,1830.420338,Conventional Buffered
2,MULTILINESTRING ((-74.12631986979562 40.635252...,b'\x01\x05\x00\x00\x00\x01\x00\x00\x00\x01\x02...,10186,2107,,Current,5,Staten Island,CLOVE ROAD,RICHMOND TERR,FOREST AVE,ON,III,III,2,2,Shared,Shared,2015-09-11,,2015.0,,,,,,975.296345,Shared
3,MULTILINESTRING ((-74.00973537112554 40.645666...,b'\x01\x05\x00\x00\x00\x01\x00\x00\x00\x01\x02...,20716,942,,Current,3,Brooklyn,5 AV,23 ST,50 ST,ON,III,III,2,2,Shared,Shared,2013-07-02,,2013.0,,,,,,1104.603353,Shared
4,MULTILINESTRING ((-74.02089492413252 40.626542...,b'\x01\x05\x00\x00\x00\x01\x00\x00\x00\x01\x02...,126857,951,,Current,3,Brooklyn,6 AVENUE,67 ST,FT HAMILTON PKWY,ON,III,III,2,2,Shared,Shared,2015-06-29,,2015.0,,,,,,198.751034,Shared


In [16]:
df_bicycle_routes_yearly = pd.read_parquet('/Users/zoltanjelovich/Documents/ISEG/MFW/data/transportation/bicycle_routes/bicycle_routes_yearly.parquet')
df_bicycle_routes_yearly.head()

Unnamed: 0,segment_id,bike_id,prev_bike_id,status,borough_code,borough_name,street,from_street,to_street,on_off_st,facility_class,all_classes,bike_dir,lane_count,ft_facility,tf_facility,install_date,retire_date,install_year,retire_year,greenway,greenway_system,spur,greenway_jurisdiction,length_m,facility_type,year
0,2579,6562,,Current,5,Staten Island,HYLAN BLVD,HOLTEN AV,LUTEN AV,ON,II,II,2,2,Curbside Buffered,Curbside Buffered,2007-10-01,,2007,,Greenway,Staten Island Waterfront,Main Alignment,NYCDOT,5223.409416,Curbside Buffered,2025
1,5033,4272,,Current,5,Staten Island,MERRYMOUNT ST,RICHMOND HILL RD,ROCKLAND AV,ON,II,II,2,2,Conventional Buffered,Conventional Buffered,2021-08-12,,2021,,,,,,1830.420338,Conventional Buffered,2025
2,10186,2107,,Current,5,Staten Island,CLOVE ROAD,RICHMOND TERR,FOREST AVE,ON,III,III,2,2,Shared,Shared,2015-09-11,,2015,,,,,,975.296345,Shared,2025
3,20716,942,,Current,3,Brooklyn,5 AV,23 ST,50 ST,ON,III,III,2,2,Shared,Shared,2013-07-02,,2013,,,,,,1104.603353,Shared,2025
4,126857,951,,Current,3,Brooklyn,6 AVENUE,67 ST,FT HAMILTON PKWY,ON,III,III,2,2,Shared,Shared,2015-06-29,,2015,,,,,,198.751034,Shared,2025
