# Prepare example data

Some example datasets are provided to illustrate the tools contained here. This notebook outlines the pre-processing steps used to prepare them.

The data is drawn from the [Crime Open Database (CODE)](https://osf.io/zyaqn/), maintained by Matt Ashby, which collates crime data from a number of open sources in a harmonised format. Annual snapshots of this data were downloaded in CSV format.

Incident locations are provided as latitude/longitude coordinates; here the pyproj library is used to re-project them to metric units so that distances can be calculated directly.
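To see why re-projection matters, the sketch below compares the ground length of one degree of latitude with one degree of longitude near Chicago. It is not part of the preparation pipeline: the latitude of 41.88 degrees and the spherical Earth radius of 6,371 km are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)
CHICAGO_LAT_DEG = 41.88   # assumed approximate latitude of central Chicago


def km_per_degree(lat_deg):
    """Return (km per degree of latitude, km per degree of longitude) at lat_deg."""
    km_per_deg_lat = math.pi * EARTH_RADIUS_KM / 180.0
    # Meridians converge towards the poles, so a degree of longitude shrinks
    # by a factor of cos(latitude).
    km_per_deg_lon = km_per_deg_lat * math.cos(math.radians(lat_deg))
    return km_per_deg_lat, km_per_deg_lon


lat_km, lon_km = km_per_degree(CHICAGO_LAT_DEG)
print(f"1 degree of latitude  is roughly {lat_km:.1f} km")
print(f"1 degree of longitude is roughly {lon_km:.1f} km")
```

Because the two axes scale differently, a Euclidean distance computed on raw degrees would be distorted, whereas projected coordinates in metres can be used as they are.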

In [1]:
import pandas as pd
from pyproj import CRS, Transformer

For the test data, data from the city of **Chicago** will be used, for the offence category '**residential burglary/breaking & entering**'. Data is concatenated for 2014-2017, inclusive.

In [2]:
# Load each annual snapshot, parsing the incident timestamp column as a datetime
data14 = pd.read_csv("../data/crime_open_database_core_2014.csv", parse_dates=['date_single'])
data15 = pd.read_csv("../data/crime_open_database_core_2015.csv", parse_dates=['date_single'])
data16 = pd.read_csv("../data/crime_open_database_core_2016.csv", parse_dates=['date_single'])
data17 = pd.read_csv("../data/crime_open_database_core_2017.csv", parse_dates=['date_single'])
# Stack the four years, then keep only Chicago residential burglaries
data = pd.concat([data14, data15, data16, data17], axis=0)
data = data[data['city_name'] == "Chicago"]
data = data[data['offense_type'] == "residential burglary/breaking & entering"]
data.shape

(45319, 14)

The total number of incidents across the four years is 45,319.

The re-projection will use the [Illinois State Plane](http://www.spatialreference.org/ref/epsg/26971/) (East zone, NAD83, EPSG:26971) as the target reference system; its coordinates are expressed in metres.

In [3]:
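wgs84 = CRS.from_epsg(4326)   # source: WGS 84 latitude/longitude
isp = CRS.from_epsg(26971)    # target: NAD83 / Illinois East, in metres
transformer = Transformer.from_crs(wgs84, isp)

# EPSG:4326 uses (latitude, longitude) axis order, so latitude is passed first;
# the result is returned as (easting, northing)
x, y = transformer.transform(data["latitude"].values, data["longitude"].values)
data = data.assign(x=x, y=y)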

Finally, save the derived data in minimal form.

In [4]:
data.to_csv("../data/chicago_burglary_2014_2017.csv", 
            columns=['x','y','date_single'], 
            date_format='%d/%m/%Y', index=False)
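Because the dates are written day-first (`%d/%m/%Y`), anything that reads the file back should parse them with the same format rather than rely on inference. The following is a minimal sketch using only the standard library; the two rows are made-up values standing in for the saved file, not real records.

```python
import csv
import io
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"  # same day-first format used when saving

# Hypothetical rows in the layout of the saved file (values are made up).
sample = io.StringIO(
    "x,y,date_single\n"
    "358000.0,580000.0,03/04/2015\n"
    "359500.0,575250.0,25/12/2016\n"
)

rows = []
for record in csv.DictReader(sample):
    rows.append({
        "x": float(record["x"]),
        "y": float(record["y"]),
        "date": datetime.strptime(record["date_single"], DATE_FORMAT),
    })

# Under the day-first format, 03/04/2015 is 3 April, not 4 March.
print(rows[0]["date"].date())
```

When loading the file with pandas instead, passing `dayfirst=True` alongside `parse_dates=['date_single']` to `pd.read_csv` should serve the same purpose.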