# Get Bi-Annual Pedestrian Counts from NYC

We will use New York City's bi-annual pedestrian count data to test whether a convolutional neural network (CNN) can feasibly measure pedestrian volume. The city describes the dataset as follows:

> An index of pedestrian volumes tracking the long-term trends of neighborhood commercial corridors. Data is collected at 114 locations, including 100 on-street locations (primarily retail corridors), 13 East River and Harlem River bridge locations, and the Hudson River Greenway. Screenline sampling is conducted during May and September on the sidewalk, mid-block (or mid-bridge) on both sides of street where applicable. Pedestrian volumes at 50 sample locations around the City are combined to create the Pedestrian Volume Index for the Mayor’s Management Report. Click here for metadata - http://www.nyc.gov/html/dot/downloads/pdf/bi-annual-ped-count-readme.pdf

> from https://data.cityofnewyork.us/Transportation/Bi-Annual-Pedestrian-Counts/2de2-6x2h/about

## streetscape

I'll use my streetscape package to collect Google Street View images: https://github.com/yonghah/streetscape

## import libraries

In [1]:
import os
import glob

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

import pandas as pd
import streetscape as ss
from shapely.geometry import Point


## get the dataset from NYC open data

In [2]:
query = ("https://data.cityofnewyork.us/resource/cqsj-cfgu.json")
df = pd.read_json(query)
df.head()

| | borough | the_geom | objectid | loc | street_nam | from_stree | to_street | index | may_07_am | may_07_pm | ... | sept_17_pm | sept_17_md | may_18_am | may_18_pm | may_18_md | sept_18_pm | sept_18_md | may_19_am | may_19_pm | may_19_md |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bronx | {'type': 'Point', 'coordinates': [-73.90459140... | 1 | 1 | Broadway | West 231st Street | Naples Terrace | N | 1189 | 4094 | ... | 4044 | 2731 | 1271 | 4502 | 2899 | 4464 | 2967 | - | - | - |
| 1 | Bronx | {'type': 'Point', 'coordinates': [-73.92188432... | 2 | 2 | East 161st Street | Grand Concourse | Sheridan Avenue | Y | 1511 | 3184 | ... | 5952 | 2832 | 1749 | 5148 | 2156 | 4723 | 1604 | 1702 | 4347 | 1576 |
| 2 | Bronx | {'type': 'Point', 'coordinates': [-73.89535781... | 3 | 3 | East Fordham Road | Valentine Avenue | Tiebout Avenue | Y | 1832 | 12311 | ... | 12388 | 7076 | 2209 | 9634 | 7066 | 8931 | 6212 | 1625 | 11739 | 7468 |
| 3 | Bronx | {'type': 'Point', 'coordinates': [-73.87892467... | 4 | 4 | East Gun Hill Road | Bainbridge Avenue | Rochambeau Avenue | N | 764 | 2673 | ... | 3429 | 1551 | 1648 | 2892 | 1323 | 2682 | 1693 | - | - | - |
| 4 | Bronx | {'type': 'Point', 'coordinates': [-73.88956389... | 5 | 5 | East Tremont Avenue | Prospect Avenue | Clinton Avenue | N | 650 | 2538 | ... | 3330 | 2479 | 1016 | 3781 | 2565 | 3761 | 2461 | - | - | - |
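
The URL above is a Socrata (SODA) resource endpoint, and SODA applies a default row cap when no `$limit` is given. The sketch below shows one way to make that cap explicit with the standard `$limit` and `$order` parameters. The helper name `build_query` and the chosen values are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.cityofnewyork.us/resource/cqsj-cfgu.json"


def build_query(base_url, limit=1000, order=None):
    """Return base_url with SODA query parameters appended (hypothetical helper)."""
    params = {"$limit": limit}
    if order is not None:
        params["$order"] = order
    # keep "$" unescaped so the resulting URL stays readable
    return base_url + "?" + urlencode(params, safe="$")


query = build_query(BASE_URL, limit=200, order="loc")
print(query)
```

The resulting string can be passed to `pd.read_json` exactly like the bare URL.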


In [42]:
df.iloc[1].the_geom

{'type': 'Point', 'coordinates': [-73.92188432870219, 40.82662794123289]}

### convert json geometry to geopandas geometry
The dataset stores each geometry as a GeoJSON-style dict. Let's convert it to shapely Point objects (the format a geopandas GeoSeries holds) so that we can use streetscape's get_street_views_from_df function.

In [3]:
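# GeoJSON coordinates are [lon, lat], which matches shapely's Point(x, y)
df['geometry'] = df['the_geom'].apply(lambda r: Point(r['coordinates']))
df['pano_id'] = None
# overwrite the Y/N 'index' column with the location number
df['index'] = df['loc']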
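
One detail worth keeping in mind during this conversion is coordinate order: GeoJSON stores a point as `[longitude, latitude]`, while the Street View Static API expects `latitude,longitude`. A minimal sketch of the unpacking, using only the standard library; the helper name `lon_lat` is hypothetical.

```python
def lon_lat(geom):
    """Return (longitude, latitude) from a GeoJSON-style Point dict."""
    if geom.get("type") != "Point":
        raise ValueError(f"expected a Point, got {geom.get('type')!r}")
    lon, lat = geom["coordinates"][:2]
    return lon, lat


# the value shown above for df.iloc[1].the_geom
geom = {"type": "Point", "coordinates": [-73.92188432870219, 40.82662794123289]}
lon, lat = lon_lat(geom)
print(f"{lat},{lon}")  # "lat,lng" order, as a Street View location string
```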

In [4]:
len(df['loc'].unique())

114

### get street view images
Each location gets 18 images (FOV = 20° each, so the set covers a full 360°), with a 2° overlap between neighbouring images.

In [35]:
di = ss.make_gsv_urls(df, npics=18, size=600, pad=2)

Total 2052 urls created.
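
The count follows from 114 locations × 18 headings = 2052. The sketch below reproduces that arithmetic and shows roughly what such URLs could look like. It is not streetscape's implementation: `make_headings` and `streetview_url` are hypothetical helpers, and treating `pad` as extra degrees added to each 20° slice is an assumption. The `size`, `location`, `heading`, and `fov` parameters are those of the Street View Static API.

```python
def make_headings(npics):
    """Evenly spaced compass headings (degrees) covering a full circle."""
    step = 360 / npics
    return [i * step for i in range(npics)]


def streetview_url(lat, lon, heading, fov, size=600):
    """Street View Static API URL without the key (appended at download time)."""
    return (
        "https://maps.googleapis.com/maps/api/streetview"
        f"?size={size}x{size}&location={lat},{lon}"
        f"&heading={heading:g}&fov={fov:g}"
    )


npics, pad = 18, 2
headings = make_headings(npics)
fov = 360 / npics + pad  # assumption: pad widens each 20-degree slice
urls = [
    streetview_url(40.82662794123289, -73.92188432870219, h, fov)
    for h in headings
]
print(len(urls), 114 * npics)
```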


In [14]:
import nest_asyncio
nest_asyncio.apply()

In [27]:
import asyncio

import aiohttp


def download_gsvs(gsv_df, save_dir='', max_conn=50, max_sem=10, timeout=0):
    ''' asynchronously retrieve gsv images
    Args:
        gsv_df (DataFrame): dataframe of download urls, one row per image
        save_dir (str): directory for downloaded images
        max_conn (int): maximum number of simultaneous connections
        max_sem (int): maximum number of downloads holding the semaphore at once
        timeout (int): maximum total running time (0 unlimited)
    '''
    key = os.environ['GSV_API_KEY']
    
    async def fetch(session, gsv, sem):
        url = gsv['gsv_url'] + "&key=" + key
        filename = os.path.join(save_dir, gsv['gsv_name'])

        # acquire the semaphore before issuing the request so that
        # max_sem actually caps the number of in-flight downloads
        async with sem:
            async with session.get(url) as response:
                with open(filename, 'wb') as f_handle:
                    while True:
                        chunk = await response.content.read(1024)
                        if not chunk:
                            break
                        f_handle.write(chunk)

    async def fetch_all(gsvs, loop):
        conn = aiohttp.TCPConnector(limit=max_conn)
        timeout_c = aiohttp.ClientTimeout(total=timeout)  
        sem = asyncio.Semaphore(max_sem)

        async with aiohttp.ClientSession(
            loop=loop, connector=conn, timeout=timeout_c) as session:
            
            tasks = list()
            for gsv in gsvs:
                task = asyncio.ensure_future(fetch(session, gsv, sem))
                tasks.append(task)
            
            results = await asyncio.gather(*tasks, return_exceptions=True)
            
    loop = asyncio.get_event_loop()
    loop.run_until_complete(fetch_all(gsv_df.to_records(), loop))
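
The function above relies on an `asyncio.Semaphore` to bound how many downloads run at once. As a standalone illustration of that pattern, here is a minimal sketch with dummy jobs in place of HTTP requests; `run_limited` is a hypothetical name, and the sleep merely stands in for network latency.

```python
import asyncio


async def run_limited(n_tasks, limit):
    """Run n_tasks dummy jobs with at most `limit` in flight; return the peak."""
    sem = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def job():
        nonlocal active, peak
        async with sem:  # blocks once `limit` jobs hold the semaphore
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for a network request
            active -= 1

    await asyncio.gather(*(job() for _ in range(n_tasks)))
    return peak


peak = asyncio.run(run_limited(n_tasks=20, limit=5))
print(peak)
```

In a plain script `asyncio.run` works as written. Inside a notebook, where an event loop is already running, `await run_limited(20, 5)` or the `nest_asyncio` patch used earlier would be needed instead.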



In [36]:
download_gsvs(di, save_dir='../data/gsv')
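
Because `asyncio.gather` is called with `return_exceptions=True`, failed downloads do not raise, so it may be worth checking the output directory afterwards. A minimal sketch, assuming one file per `gsv_name`; the helper `missing_files` is hypothetical, and a temporary directory stands in for `../data/gsv`.

```python
import os
import tempfile


def missing_files(names, save_dir):
    """Return the names with no file, or an empty file, in save_dir."""
    missing = []
    for name in names:
        path = os.path.join(save_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing


# demo with a temporary directory standing in for ../data/gsv
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.jpg"), "wb") as f:
        f.write(b"\xff\xd8")  # non-empty placeholder file
    open(os.path.join(tmp, "b.jpg"), "wb").close()  # empty file
    result = missing_files(["a.jpg", "b.jpg", "c.jpg"], tmp)
    print(result)
```

With the notebook's data, the equivalent call would be `missing_files(di['gsv_name'], '../data/gsv')`, and any names it returns could be downloaded again.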