This folder contains all of the data used in The Pudding essay "A Tale of Two Cities," published in March 2018.
Below you'll find metadata for each file.
top5_Seattle.csv & top5_NewYorkCity.csv
- What is this?: Data representing the 5 business types that over-index the most in each of Seattle's and New York City's neighborhoods (respectively).
- Source(s) & Methods: Neighborhoods in Seattle and New York City were identified using Zillow's neighborhood-level shapefiles. We then used these neighborhood names to query the Yelp API for any available Yelp business categories using a radius of 2 miles (3210m). Using the latitude/longitude coordinates, we mapped each business to a neighborhood and compared the share of businesses from each category in each neighborhood to that category's share in the city as a whole (e.g., if hairdressers made up 10% of a particular neighborhood's businesses but only 5% of the businesses in the city overall, they were judged to "over-index" in that neighborhood). If a single business had more than one category (e.g., three categories, including Sports Wear and Bikes), that business would be repeated three times, once for each category. Thus, a single business could be counted in more than one category, but never more than once in the same category.
- Last Modified: Seattle - February 15, 2018. New York City - March 10, 2018.
- Contact Information: Amber Thomas
- Spatial Applicability: These data apply to businesses within the Seattle and New York City limits, as defined by Zillow's neighborhood-level shapefiles.
- Temporal Applicability: These data represent a collection of all operating businesses with at least one review on Yelp at the time of collection (Seattle: February 13, 2018 - February 15, 2018, New York City: March 6, 2018 - March 10, 2018).
- Observations (Rows): Each row represents a single category within a single neighborhood.
- Variables (Columns):
||The plain text name of a neighborhood (as defined by Zillow)||text|
||The Yelp-specified string to denote a specific category. These strings contain no spaces or special characters. Full list||text|
||The cleaned and human-readable version of the
||The number of businesses in a specific neighborhood that match a specific
||The total number of businesses operating in that neighborhood.||number|
||The number of businesses in the entire city that match a specific
||The total number of businesses operating in the entire city.||number|
||The amount that a business-type over-indexes in a neighborhood compared to the city. This is calculated as (
||A ranking of the businesses that over-index the most in a specific neighborhood. The business that over-indexes the most has
- Other Notes: Any use of these data needs to include a link back to Yelp as attribution.
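
The over-index comparison described under Source(s) & Methods can be sketched in a few lines of Python. This is only an illustration, not the code used for the essay. It assumes the over-index value is the neighborhood share divided by the city share, and that rank 1 marks the category that over-indexes the most. The sample records, the function name `over_index_rows`, and the output keys are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sample records: (business id, neighborhood, Yelp category aliases).
SAMPLE = [
    ("b1", "Ballard", ["bikes", "sportswear"]),
    ("b2", "Ballard", ["hair"]),
    ("b3", "Fremont", ["hair"]),
    ("b4", "Fremont", ["hair", "coffee"]),
    ("b5", "Fremont", ["coffee"]),
]


def over_index_rows(records, top_n=5):
    """Return the top_n over-indexing categories for each neighborhood."""
    city_total = len(records)          # businesses in the whole city
    hood_total = Counter()             # businesses per neighborhood
    city_count = Counter()             # businesses per category, citywide
    hood_count = defaultdict(Counter)  # businesses per category, per neighborhood

    for _business_id, hood, categories in records:
        hood_total[hood] += 1
        # set() ensures a business counts at most once per category.
        for category in set(categories):
            city_count[category] += 1
            hood_count[hood][category] += 1

    rows = []
    for hood in sorted(hood_count):
        scored = []
        for category, n in hood_count[hood].items():
            hood_share = n / hood_total[hood]
            city_share = city_count[category] / city_total
            # Assumption: over-index = neighborhood share / city share.
            scored.append((hood_share / city_share, category))
        # Highest score first; ties broken alphabetically (an arbitrary choice).
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for rank, (score, category) in enumerate(scored[:top_n], start=1):
            rows.append({"neighborhood": hood, "category": category,
                         "over_index": score, "rank": rank})
    return rows


for row in over_index_rows(SAMPLE):
    print(row)
```

In this made-up sample, a business with two categories adds one to each category's count but only one to its neighborhood's total, which mirrors the counting rule described above.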