Analysis of Baltimore Shootings — The Trace and BuzzFeed News
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
inputs
notebooks
outputs
.gitignore
Pipfile
Pipfile.lock
README.md

README.md

Analysis of Baltimore Shootings

This repository includes methodologies, data, and code supporting the following articles, published by The Trace and BuzzFeed News:

  • "Shoot Someone In A Major US City, And Odds Are You’ll Get Away With It" (January 24, 2019) — The Trace / BuzzFeed News
  • "5 Things To Know About Cities’ Failure To Arrest Shooters" (January 24, 2019) — The Trace / BuzzFeed News

Click here for additional data and code from The Trace and BuzzFeed News's collaboration.

Data sources

The data used in these analysis came from three sources:

  • Baltimore PD's incident database
  • Baltimore PD's police-district-boundary data
  • Google Maps' Geocoding API

Incident database

In response to a public records request submitted by Sarah Ryley of The Trace, the Baltimore Police Department provided a series of spreadsheets detailing the victims and suspects for the following:

  • All homicides between 2012 and mid-2017
  • All non-fatal shootings between 2010 and mid-2017

The records were provided on August 2, 2017, and contained the following columns:

Column Description
DATABASE Either "Non Fatal Shooting" or "Homicide".
CASE_NUMBER The unique identifier assigned to the incident, e.g., "17V0351".
CC_NUMBER The "criminal complaint" number associated with the case. Not present for suspect files.
CASE_DATE The date associated with the case, "2017-07-26".
CASE_ADDRESS The address associated with the case, e.g, "1700 SHERWOOD AV". Not present for suspect files.
LAST_NAME The person's last name.
FIRST_NAME The person's first name.
DESCRIPTION Either "Victim", "Suspect", or "Person of Interest".
VICT_TYPE Either "Non Fatal Shooting" or "Homicide {x}", where {x} is "Shooting", "Stab/Cut", "Blunt Force", "Beating", "Asphyxiation", "Arson", "Auto", "Poison", or "Other". Not present
RACE The person's race.
SEX The person's gender.
DOB The person's date of birth. *
CASE_STATUS Either "Open," "Closed," or "Open/Warrant".

* The earliest date of birth in the database is from 1950, despite many people in the database being verifiably older than that. Conversely, about 0.9% of victims have DOBs in 2018 or later — impossible, given that no incident in the data occurred after 2017. What appears to have happened is that dates of birth before 1950 were pushed forward by 100 years, so that (for example) 1949 became 2049.

Police district shapefiles

The Baltimore Police Department publishes geospatial data, in the form of "shapefiles", detailing the boundaries of each of its nine police districts. The Trace and BuzzFeed News used these boundaries to determine the district in which the incidents above occurred.

Google Maps geocoding

The incident data did not include geocoordinates (latitude/longitude) for the incidents. It did, however, include street addresses for each incident. The Trace and BuzzFeed News submitted those addresses to Google Maps' Geocoding API to obtain geocoordinates. The file inputs/baltimore-case-addresses.csv contains the results of that geocoding. For each incident, the file contains the following columns:

Column Description
agency_ori Always "MDBPD0000"
agency_incident_id The unique identifier assigned to the incident, e.g., "17H0198"
location_address The address provided in Baltimore PD's data, e.g., "1200 GREENMOUNT AV"
location_address_adjusted The address submitted to the Geocoding API, e.g., "1200 GREENMOUNT AV, Baltimore, Maryland, USA"
geocoded_location_interpreted The Geocoding API's interpretation of the address, e.g., "1200 Greenmount Ave, Baltimore, MD 21202, USA"
geocoded_location_type The Geocoding API's determination of what type of address this is, e.g., "street_address", "premise", "route". If the API returned multiple values, they are
geocoding_imprecise A True/False indicator of whether the API returned a precise result (rather than, e.g., the centroid of the city).
lat The latitude returned by the Geocoding API
lng The longitude returned by the Geocoding API
tract_2010 The Census tract ID corresponding to the latitude/longitude in the 2010 Census
tract_state_2010 The state in which that Census tract resides
tract_2000 The Census tract ID corresponding to the latitude/longitude in the 2000 Census
tract_state_2000 The state in which that Census tract resides

Data standardization and anonymization

To aid the analysis, The Trace and BuzzFeed News standardized raw data, taking the following steps:

  • Selected only cases that occurred in 2012 or later, so that the analysis operates on the same timeframe for both fatal and non-fatal shootings
  • Excluded non-shooting homicides
  • Fixed one incorrect case number
  • Identified names that were vague/non-specific (e.g., "SUSPECT", "UNKNOWN", etc.).
  • Combined the victim and suspect spreadsheets into a single "persons" table
  • Constructed a person_uid based on the person's listed name and date of birth
  • Manually adjusted the person_uids for people whose listed names appear to involve typos

The file in inputs/baltimore-shootings-persons.csv contains the results of that data-standardization, with the following addition steps taken for the purposes of anonymization:

  • Removes the FIRST_NAME and LAST_NAME columns
  • Replaces the DOB column with an age_at_case column
  • Replaces the person_uid values with a cryptographic hash of the values (first 10 characters of SHA-256 with salt)

Data Analysis

The notebooks/analyze-baltimore-shootings.ipynb notebook, written in Python, takes the data output in the previous step, and does the following:

  • Calculates basic case trends and victim demographics
  • Identifies groups of shootings connected connected by a common suspect or victim
  • Calculates the number of people listed as a victim in one case and a suspect in another, and similar statistics
  • Identifies, whether, for persons shot in multiple incidents, their final recorded victimization was fatal or non-fatal
  • Counts the number of people listed as suspects in multiple cases, and the number of victims for those cases
  • Calculates the demographic groupings of mulitply-involved persons
  • Maps and spatially analyzes cases connected by a common victim/suspect

Licensing

All code in this repository is available under the MIT License. All the incident and address data files are available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. The police district shapefile data is published by the City of Baltimore with "no license specified".

Questions / Comments?

Please contact Jeremy Singer-Vine at jeremy.singer-vine@buzzfeed.com.