<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px; height: 163px">
</div>

# Project: Exploratory Data Analysis
Perform exploratory data analysis (EDA) to gain insights from a data lake.

## Audience
* Primary Audience: Data Analysts
* Additional Audiences: Data Engineers and Data Scientists

## Prerequisites
* Web browser: **Chrome**
* A cluster configured with **8 cores** and **DBR 6.2**
* Familiarity with <a href="https://www.w3schools.com/sql/" target="_blank">ANSI SQL</a> is required
* Suggested Courses from <a href="https://academy.databricks.com/" target="_blank">Databricks Academy</a>:
  - Spark-SQL

## Instructions

In `dbfs:/mnt/training/crime-data-2016`, there are a number of Parquet files containing 2016 crime data from seven United States cities:

* New York
* Los Angeles
* Chicago
* Philadelphia
* Dallas
* Boston
* New Orleans

The data is cleaned up a little but has not been normalized. Each city reports crime data slightly differently, so you have to
examine the data for each city to determine how to query it properly.

Your job is to use some of this data to gain insights about certain kinds of crimes.

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Classroom-Setup

For each lesson to execute correctly, please make sure to run the **`Classroom-Setup`** cell at the<br/>
start of each lesson (see the next cell) and the **`Classroom-Cleanup`** cell at the end of each lesson.

In [4]:
%run ./Includes/Classroom-Setup

## Step 1

Start by creating temporary views for Los Angeles, Philadelphia, and Dallas.

Use `CREATE TEMPORARY VIEW` to create a named view for each file. The syntax is similar to `CREATE TABLE`:

```
CREATE OR REPLACE TEMPORARY VIEW name
  USING parquet
  OPTIONS (
    ...
  )
```

Use the following view names:

| City          | View Name               | Path to DBFS file
| ------------- | ----------------------- | -----------------
| Los Angeles   | `CrimeDataLosAngeles`   | `dbfs:/mnt/training/crime-data-2016/Crime-Data-Los-Angeles-2016.parquet`
| Philadelphia  | `CrimeDataPhiladelphia` | `dbfs:/mnt/training/crime-data-2016/Crime-Data-Philadelphia-2016.parquet`
| Dallas        | `CrimeDataDallas`       | `dbfs:/mnt/training/crime-data-2016/Crime-Data-Dallas-2016.parquet`



<img alt="Hint" title="Hint" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.3em" src="https://files.training.databricks.com/static/images/icon-light-bulb.svg"/>&nbsp;**Hint:** You learned how to create a table from an external file in [Lesson 3]($./03-Accessing-Data). The syntax is exactly the same, except that you use `CREATE OR REPLACE TEMPORARY VIEW` instead of `CREATE TABLE IF NOT EXISTS`.
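
The three statements differ only in the view name and the file path, so they can be generated from the table above. The Python sketch below builds the SQL text only. The helper name `build_view_sql` is hypothetical, and passing each string to `spark.sql(...)` in a notebook is an assumption that is not exercised here.

```python
# Hypothetical helper: builds the CREATE OR REPLACE TEMPORARY VIEW statement
# for one Parquet file. View names and paths come from the table above.
BASE = "dbfs:/mnt/training/crime-data-2016"

VIEWS = {
    "CrimeDataLosAngeles": f"{BASE}/Crime-Data-Los-Angeles-2016.parquet",
    "CrimeDataPhiladelphia": f"{BASE}/Crime-Data-Philadelphia-2016.parquet",
    "CrimeDataDallas": f"{BASE}/Crime-Data-Dallas-2016.parquet",
}


def build_view_sql(view_name: str, path: str) -> str:
    """Return the SQL text that defines a temporary view over a Parquet file."""
    return (
        f"CREATE OR REPLACE TEMPORARY VIEW {view_name}\n"
        f"  USING parquet\n"
        f"  OPTIONS (\n"
        f'    path "{path}"\n'
        f"  )"
    )


statements = [build_view_sql(name, path) for name, path in VIEWS.items()]
for sql in statements:
    print(sql)
    print()
```

In a notebook, each generated string could presumably be run with `spark.sql(sql)` instead of being printed.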

#### Los Angeles

In [7]:
%sql
CREATE OR REPLACE TEMPORARY VIEW CrimeDataLosAngeles
  USING parquet
  OPTIONS (
    path "dbfs:/mnt/training/crime-data-2016/Crime-Data-Los-Angeles-2016.parquet"
  )
  
  


In [8]:
# TEST - Run this cell to test your solution.

rowsLosAngeles = spark.sql('SELECT count(*) FROM CrimeDataLosAngeles').collect()[0][0]
dbTest("SQL-L7-crimeDataLA-count", 217945, rowsLosAngeles)

print("Tests passed!")

#### Philadelphia

In [10]:
%sql
-- TODO
CREATE OR REPLACE TEMPORARY VIEW CrimeDataPhiladelphia
  USING parquet
  OPTIONS (
    path "dbfs:/mnt/training/crime-data-2016/Crime-Data-Philadelphia-2016.parquet"
  )

In [11]:
# TEST - Run this cell to test your solution.

rowsPhiladelphia = spark.sql('SELECT count(*) FROM CrimeDataPhiladelphia').collect()[0][0]
dbTest("SQL-L7-crimeDataPA-count", 168664, rowsPhiladelphia)

print("Tests passed!")

#### Dallas

In [13]:
%sql
CREATE OR REPLACE TEMPORARY VIEW CrimeDataDallas
  USING parquet
  OPTIONS (
    path "dbfs:/mnt/training/crime-data-2016/Crime-Data-Dallas-2016.parquet"
  )

In [14]:
# TEST - Run this cell to test your solution.

rowsDallas = spark.sql('SELECT count(*) FROM CrimeDataDallas').collect()[0][0]
dbTest("SQL-L7-crimeDataDAL-count", 99642, rowsDallas) 

print("Tests passed!")

## Step 2

For each table, examine the data to figure out how to extract _robbery_ statistics.

<img alt="Side Note" title="Side Note" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.05em; transform:rotate(15deg)" src="https://files.training.databricks.com/static/images/icon-note.webp"/> Each city uses different values to indicate robbery. Some cities use "larceny", "burglary", and "robbery". These challenges are common in data lakes. To simplify things, restrict yourself to only the word "robbery" (and not attempted robbery, larceny, or burglary).

Explore the data for the three cities until you understand how each city records robbery information. If you don't want to worry about upper- or lower-case, remember that SQL has a `LOWER()` function that converts a column's value to lowercase.

Create a temporary view containing only the robbery-related rows, as shown in the table below.

<img alt="Hint" title="Hint" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.3em" src="https://files.training.databricks.com/static/images/icon-light-bulb.svg"/>&nbsp;**Hint:** For each table, focus your efforts on the column listed below.

Use the following view names and columns:

| Table Name              | Robbery View Name     | Column
| ----------------------- | ----------------------- | -------------------------------
| `CrimeDataLosAngeles`   | `RobberyLosAngeles`   | `crimeCodeDescription`
| `CrimeDataPhiladelphia` | `RobberyPhiladelphia` | `ucr_general_description`
| `CrimeDataDallas`       | `RobberyDallas`       | `typeOfIncident`
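
As a rough illustration of the matching rule, the Python sketch below mirrors a SQL predicate of the form `lower(column) LIKE 'robbery%'`. The function name `is_robbery` and the sample descriptions are hypothetical and are not taken from the real data.

```python
def is_robbery(description):
    """Mirror of the SQL predicate lower(col) LIKE 'robbery%'."""
    if description is None:
        # In SQL, a NULL value never satisfies a LIKE predicate.
        return False
    # 'robbery%' matches values that START with "robbery", ignoring case
    # because the value is lowercased first.
    return description.lower().startswith("robbery")


# Hypothetical sample values, not taken from the real data set.
samples = ["ROBBERY", "Robbery - Firearm", "ATTEMPTED ROBBERY", "BURGLARY", None]
matches = [s for s in samples if is_robbery(s)]
print(matches)
```

Because the pattern anchors at the start of the value, a description such as "ATTEMPTED ROBBERY" is excluded, which matches the restriction described above.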

#### Los Angeles

In [17]:
%sql
CREATE OR REPLACE TEMPORARY VIEW RobberyLosAngeles AS
  SELECT *
  FROM CrimeDataLosAngeles
  WHERE lower(crimeCodeDescription) LIKE 'robbery%' 


In [18]:
# TEST - Run this cell to test your solution.

totalLosAngeles = spark.sql("SELECT count(*) AS total FROM RobberyLosAngeles").collect()[0].total
dbTest("SQL-L7-robberyDataLA-count", 9048, totalLosAngeles)

print("Tests passed!")

#### Philadelphia

In [20]:
%sql
CREATE OR REPLACE TEMPORARY VIEW RobberyPhiladelphia AS
  SELECT *
  FROM CrimeDataPhiladelphia
  WHERE lower(ucr_general_description) LIKE 'robbery%'

In [21]:
# TEST - Run this cell to test your solution.

totalPhiladelphia = spark.sql("SELECT count(*) AS total FROM RobberyPhiladelphia").collect()[0].total
dbTest("SQL-L7-robberyDataPA-count", 6149, totalPhiladelphia)

print("Tests passed!")

#### Dallas

In [23]:
%sql
CREATE OR REPLACE TEMPORARY VIEW RobberyDallas AS
  SELECT *
  FROM CrimeDataDallas
  WHERE lower(typeOfIncident) LIKE 'robbery%'

In [24]:
# TEST - Run this cell to test your solution.

totalDallas = spark.sql("SELECT count(*) AS total FROM RobberyDallas").collect()[0].total
dbTest("SQL-L7-robberyDataDAL-count", 6824, totalDallas)

print("Tests passed!")

## Step 3

Now that you have views of only the robberies in each city, create temporary views for each city, summarizing the number of robberies in each month.

Your views must contain two columns:
* `month`: The month number (e.g., 1 for January, 2 for February, etc.)
* `robberies`: The total number of robberies in the month

Use the following temporary view names and date columns:


| City          | View Name     | Date Column 
| ------------- | ------------- | -------------
| Los Angeles   | `RobberiesByMonthLosAngeles` | `timeOccurred`
| Philadelphia  | `RobberiesByMonthPhiladelphia` | `dispatch_date_time`
| Dallas        | `RobberiesByMonthDallas` | `startingDateTime`

<img alt="Side Note" title="Side Note" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.05em; transform:rotate(15deg)" src="https://files.training.databricks.com/static/images/icon-note.webp"/> For each city, figure out which column contains the date of the incident. Then, extract the month from that date.
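
The grouping logic can be sketched in plain Python: take the month from each timestamp, then count rows per month. The timestamps below are hypothetical sample values, not rows from the crime data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical incident timestamps, not taken from the real data set.
incidents = [
    datetime(2016, 1, 5, 22, 30),
    datetime(2016, 1, 19, 3, 15),
    datetime(2016, 2, 2, 14, 0),
    datetime(2016, 12, 31, 23, 59),
]

# Rough equivalent of:
#   SELECT month(ts) AS month, count(*) AS robberies ... GROUP BY month(ts)
robberies_by_month = Counter(ts.month for ts in incidents)

for month, robberies in sorted(robberies_by_month.items()):
    print(month, robberies)
```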

#### Los Angeles

In [27]:
%sql
-- DESCRIBE RobberyLosAngeles
CREATE OR REPLACE TEMPORARY VIEW RobberiesByMonthLosAngeles AS
  SELECT month(timeOccurred) AS month, count(*) AS robberies
  FROM RobberyLosAngeles
  GROUP BY month(timeOccurred);

SELECT * FROM RobberiesByMonthLosAngeles

month,robberies
12,853
1,719
6,698
3,709
5,790
9,722
4,713
8,765
7,826
10,814


In [28]:
# TEST - Run this cell to test your solution.

rows = spark.sql("SELECT month, robberies FROM RobberiesByMonthLosAngeles ORDER BY month").collect()
la = [ f"{r[0]}: {r[1]}" for r in rows ]

dbTest("SQL-L7-robberyByMonthLA-counts-1", "1: 719", la[0])
dbTest("SQL-L7-robberyByMonthLA-counts-2", "2: 675", la[1])
dbTest("SQL-L7-robberyByMonthLA-counts-3", "3: 709", la[2])
dbTest("SQL-L7-robberyByMonthLA-counts-4", "4: 713", la[3])
dbTest("SQL-L7-robberyByMonthLA-counts-5", "5: 790", la[4])
dbTest("SQL-L7-robberyByMonthLA-counts-6", "6: 698", la[5])
dbTest("SQL-L7-robberyByMonthLA-counts-7", "7: 826", la[6])
dbTest("SQL-L7-robberyByMonthLA-counts-8", "8: 765", la[7])
dbTest("SQL-L7-robberyByMonthLA-counts-9", "9: 722", la[8])
dbTest("SQL-L7-robberyByMonthLA-counts-10", "10: 814", la[9])
dbTest("SQL-L7-robberyByMonthLA-counts-11", "11: 764", la[10])
dbTest("SQL-L7-robberyByMonthLA-counts-12", "12: 853", la[11])

print("Tests passed!")

#### Philadelphia

In [30]:
%sql
Describe RobberyPhiladelphia

col_name,data_type,comment
district,int,
dispatch_date_time,timestamp,
dispatch_date,timestamp,
dispatch_time,string,
hour,int,
unique_id,bigint,
location_block,string,
ucr_general,int,
text_general_code,string,
point_x,double,


In [31]:
%sql
-- DESCRIBE RobberyPhiladelphia
CREATE OR REPLACE TEMPORARY VIEW RobberiesByMonthPhiladelphia AS
  SELECT month(dispatch_date_time) AS month, count(*) AS robberies
  FROM RobberyPhiladelphia
  GROUP BY month(dispatch_date_time);

SELECT * FROM RobberiesByMonthPhiladelphia

month,robberies
12,544
1,520
6,509
3,432
5,533
9,514
4,466
8,561
7,537
10,572


In [32]:
# TEST - Run this cell to test your solution.

rows = spark.sql("SELECT month, robberies FROM RobberiesByMonthPhiladelphia ORDER BY month").collect()
philadelphia = [ f"{r[0]}: {r[1]}" for r in rows ]

dbTest("SQL-L7-robberyByMonthPA-counts-1", "1: 520", philadelphia[0])
dbTest("SQL-L7-robberyByMonthPA-counts-2", "2: 416", philadelphia[1])
dbTest("SQL-L7-robberyByMonthPA-counts-3", "3: 432", philadelphia[2])
dbTest("SQL-L7-robberyByMonthPA-counts-4", "4: 466", philadelphia[3])
dbTest("SQL-L7-robberyByMonthPA-counts-5", "5: 533", philadelphia[4])
dbTest("SQL-L7-robberyByMonthPA-counts-6", "6: 509", philadelphia[5])
dbTest("SQL-L7-robberyByMonthPA-counts-7", "7: 537", philadelphia[6])
dbTest("SQL-L7-robberyByMonthPA-counts-8", "8: 561", philadelphia[7])
dbTest("SQL-L7-robberyByMonthPA-counts-9", "9: 514", philadelphia[8])
dbTest("SQL-L7-robberyByMonthPA-counts-10", "10: 572", philadelphia[9])
dbTest("SQL-L7-robberyByMonthPA-counts-11", "11: 545", philadelphia[10])
dbTest("SQL-L7-robberyByMonthPA-counts-12", "12: 544", philadelphia[11])

print("Tests passed!")

#### Dallas

In [34]:
%sql
Describe RobberyDallas

col_name,data_type,comment
incidentNumberWithYear,string,
incidentNumberWithoutYear,int,
offenseServiceNumber,string,
serviceNumberID,string,
watch,string,
call911Problem,string,
typeOfIncident,string,
penaltyClass,string,
typeOfLocation,string,
typeOfProperty,string,


In [35]:
%sql
-- DESCRIBE RobberyDallas
CREATE OR REPLACE TEMPORARY VIEW RobberiesByMonthDallas AS
  SELECT month(startingDateTime) AS month, count(*) AS robberies
  FROM RobberyDallas
  GROUP BY month(startingDateTime);

SELECT * FROM RobberiesByMonthDallas

month,robberies
12,664
1,743
6,495
3,412
5,615
9,512
4,594
8,627
7,535
10,603


In [36]:
# TEST - Run this cell to test your solution.

rows = spark.sql("SELECT month, robberies FROM RobberiesByMonthDallas ORDER BY month").collect()
dallas = [ f"{r[0]}: {r[1]}" for r in rows ]

dbTest("SQL-L7-robberyByMonthDAL-counts-1", "1: 743", dallas[0])
dbTest("SQL-L7-robberyByMonthDAL-counts-2", "2: 435", dallas[1])
dbTest("SQL-L7-robberyByMonthDAL-counts-3", "3: 412", dallas[2])
dbTest("SQL-L7-robberyByMonthDAL-counts-4", "4: 594", dallas[3])
dbTest("SQL-L7-robberyByMonthDAL-counts-5", "5: 615", dallas[4])
dbTest("SQL-L7-robberyByMonthDAL-counts-6", "6: 495", dallas[5])
dbTest("SQL-L7-robberyByMonthDAL-counts-7", "7: 535", dallas[6])
dbTest("SQL-L7-robberyByMonthDAL-counts-8", "8: 627", dallas[7])
dbTest("SQL-L7-robberyByMonthDAL-counts-9", "9: 512", dallas[8])
dbTest("SQL-L7-robberyByMonthDAL-counts-10", "10: 603", dallas[9])
dbTest("SQL-L7-robberyByMonthDAL-counts-11", "11: 589", dallas[10])
dbTest("SQL-L7-robberyByMonthDAL-counts-12", "12: 664", dallas[11])

print("Tests passed!")


## Step 4

Plot the robberies per month for each of your three cities, producing a plot similar to the following:

<img src="https://files.training.databricks.com/images/eLearning/robberies-by-month.png" style="max-width: 700px; border: 1px solid #aaaaaa; border-radius: 10px 10px 10px 10px"/>

When you first run your cell, you'll get an HTML table as the result. To configure the plot:

1. Click the graph button
2. If the plot doesn't look correct, click the **Plot Options** button
3. Configure the plot similar to the following example

<img src="https://files.training.databricks.com/images/eLearning/capstone-plot-1.png" style="width: 440px; margin: 10px; border: 1px solid #aaaaaa; border-radius: 10px 10px 10px 10px"/>
<img src="https://files.training.databricks.com/images/eLearning/capstone-plot-2.png" style="width: 268px; margin: 10px; border: 1px solid #aaaaaa; border-radius: 10px 10px 10px 10px"/>
<img src="https://files.training.databricks.com/images/eLearning/capstone-plot-3.png" style="width: 362px; margin: 10px; border: 1px solid #aaaaaa; border-radius: 10px 10px 10px 10px"/>

#### Los Angeles

In [39]:
%sql
Select * From RobberiesByMonthLosAngeles

month,robberies
12,853
1,719
6,698
3,709
5,790
9,722
4,713
8,765
7,826
10,814


#### Philadelphia

In [41]:
%sql
Select * From RobberiesByMonthPhiladelphia

month,robberies
12,544
1,520
6,509
3,432
5,533
9,514
4,466
8,561
7,537
10,572


#### Dallas

In [43]:
%sql
Select * From RobberiesByMonthDallas

month,robberies
12,664
1,743
6,495
3,412
5,615
9,512
4,594
8,627
7,535
10,603


## Step 5

Create another temporary view called `CombinedRobberiesByMonth` that combines all three robberies-per-month views into one.
In creating this view, add a new column called `city` that identifies the city associated with each row.
The final view will have the following columns:

* `city`: The name of the city associated with the row (Use the strings "Los Angeles", "Philadelphia", and "Dallas".)
* `month`: The month number associated with the row
* `robberies`: The number of robberies in that month (for that city)

<img alt="Hint" title="Hint" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.3em" src="https://files.training.databricks.com/static/images/icon-light-bulb.svg"/>&nbsp;**Hint:** You may want to use `UNION` in this example to combine the three datasets.

<img alt="Hint" title="Hint" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.3em" src="https://files.training.databricks.com/static/images/icon-light-bulb.svg"/>&nbsp;**Hint:** Temporary views cannot be modified in place, so standard SQL commands such as `ALTER…ADD` and `UPDATE…SET` do not work for adding the new `city` column.

Instead, new columns can be added by simply naming them in the `SELECT` statement within the `CREATE OR REPLACE TEMPORARY VIEW` statement.
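
To show the idea outside Spark, the sketch below uses Python's built-in `sqlite3` module as a stand-in for Spark SQL. That substitution is an assumption made for illustration: SQLite has no `CREATE OR REPLACE`, so a plain `CREATE TEMPORARY VIEW` is used, and ordinary tables stand in for the per-city views. The counts are the January to March values from the Step 3 test cells.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-in tables holding the January-March counts from the Step 3 tests.
monthly = {
    "RobberiesByMonthLosAngeles": [(1, 719), (2, 675), (3, 709)],
    "RobberiesByMonthPhiladelphia": [(1, 520), (2, 416), (3, 432)],
    "RobberiesByMonthDallas": [(1, 743), (2, 435), (3, 412)],
}
for table, rows in monthly.items():
    cur.execute(f"CREATE TABLE {table} (month INTEGER, robberies INTEGER)")
    cur.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

# The string literal in each SELECT becomes the new city column.
cur.execute("""
    CREATE TEMPORARY VIEW CombinedRobberiesByMonth AS
      SELECT 'Los Angeles' AS city, month, robberies FROM RobberiesByMonthLosAngeles
      UNION ALL
      SELECT 'Philadelphia' AS city, month, robberies FROM RobberiesByMonthPhiladelphia
      UNION ALL
      SELECT 'Dallas' AS city, month, robberies FROM RobberiesByMonthDallas
""")

combined = cur.execute(
    "SELECT city, month, robberies FROM CombinedRobberiesByMonth "
    "ORDER BY robberies, month"
).fetchall()
for row in combined:
    print(row)
```

`UNION ALL` keeps every row from each input, which is what is wanted here; plain `UNION` would additionally remove duplicate rows.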

In [45]:
%sql
CREATE OR REPLACE TEMPORARY VIEW CombinedRobberiesByMonth AS
  SELECT 'Dallas' AS city, * FROM RobberiesByMonthDallas
  UNION ALL
  SELECT 'Los Angeles' AS city, * FROM RobberiesByMonthLosAngeles
  UNION ALL
  SELECT 'Philadelphia' AS city, * FROM RobberiesByMonthPhiladelphia


In [46]:
# TEST - Run this cell to test your solution.

rows = spark.sql("SELECT concat(city,'|',month,'|',robberies) FROM CombinedRobberiesByMonth order by robberies, month").collect()

dbTest("SQL-L7-combinedRobberiesByMonth-counts-0",  "Dallas|3|412",  rows[0][0])
dbTest("SQL-L7-combinedRobberiesByMonth-counts-10", "Philadelphia|5|533", rows[10][0])
dbTest("SQL-L7-combinedRobberiesByMonth-counts-20", "Dallas|5|615", rows[20][0])

print("Tests passed!")

## Step 6

Graph the contents of `CombinedRobberiesByMonth`, producing a graph similar to the following. (The diagram below deliberately
uses different data.)

<img src="https://files.training.databricks.com/images/eLearning/combined-homicides.png" style="width: 800px; border: 1px solid #aaaaaa; border-radius: 10px 10px 10px 10px"/>

Adjust the plot options to configure the plot properly, as shown below:

<img src="https://files.training.databricks.com/images/eLearning/capstone-plot-4.png" style="width: 362px; margin: 10px; border: 1px solid #aaaaaa; border-radius: 10px 10px 10px 10px"/>

<img alt="Hint" title="Hint" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.3em" src="https://files.training.databricks.com/static/images/icon-light-bulb.svg"/>&nbsp;**Hint:** Order your results by `month`, then `city`.

In [48]:
%sql
SELECT * FROM CombinedRobberiesByMonth ORDER BY month, city

city,month,robberies
Dallas,1,743
Los Angeles,1,719
Philadelphia,1,520
Dallas,2,435
Los Angeles,2,675
Philadelphia,2,416
Dallas,3,412
Los Angeles,3,709
Philadelphia,3,432
Dallas,4,594


## Step 7

While the above graph is interesting, it's flawed: it's comparing the raw numbers of robberies, not the per capita robbery rates.

The `CityData` table (already created) contains, among other data, estimated 2016 population values for all United States cities
with populations of at least 100,000. (The data is from [Wikipedia](https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population).)

* Use the population values in that table to normalize the robberies so they represent per capita values (i.e., total robberies divided by population)
* Save your results in a temporary view called `RobberyRatesByCity`
* The robbery rate value must be stored in a new column, `robberyRate`

Next, graph the results, as above.
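
The arithmetic behind the normalization can be sketched in plain Python. The populations are the `estPopulation2016` values that appear in the `CityData` output in this notebook, and the robbery counts are the January values from the Step 3 test cells; the variable names are illustrative only.

```python
# estPopulation2016 values as shown in the CityData output.
population = {
    "Los Angeles": 3976322,
    "Philadelphia": 1567872,
    "Dallas": 1317929,
}

# January robbery counts from the Step 3 test cells.
january = {"Los Angeles": 719, "Philadelphia": 520, "Dallas": 743}

# Per capita rate: robberies divided by population, one value per city.
robbery_rate = {city: january[city] / population[city] for city in january}

# Scale to "per 100,000 residents" only for easier reading.
for city, rate in sorted(robbery_rate.items(), key=lambda kv: kv[1]):
    print(f"{city}: {rate * 100_000:.1f} robberies per 100,000 residents")
```

Dallas and Los Angeles report similar raw counts for January, but the per capita rates differ considerably because Los Angeles has roughly three times the population.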

In [50]:
%sql
Select * from CityData

rankIn2016,state,stateAbbrev,population2010,estPopulation2016,city
1,New York,NY,8175133,8537673,New York
2,California,CA,3792621,3976322,Los Angeles
3,Illinois,IL,2695598,2704958,Chicago
4,Texas,TX,2100263,2303482,Houston
5,Arizona,AZ,1445632,1615017,Phoenix
6,Pennsylvania,PA,1526006,1567872,Philadelphia
7,Texas,TX,1327407,1492510,San Antonio
8,California,CA,1307402,1406630,San Diego
9,Texas,TX,1197816,1317929,Dallas
10,California,CA,945942,1025350,San Jose


In [51]:
%sql
CREATE OR REPLACE TEMPORARY VIEW RobberyRatesByCity AS
  SELECT r.city, r.month, r.robberies / c.estPopulation2016 AS robberyRate
  FROM CombinedRobberiesByMonth r
  LEFT JOIN CityData c ON r.city = c.city;

SELECT * FROM RobberyRatesByCity ORDER BY month

city,month,robberyRate
Dallas,1,0.0005637632983263893
Los Angeles,1,0.00018082036615746917
Philadelphia,1,0.0003316597273246796
Philadelphia,2,0.00026532778185974366
Los Angeles,2,0.00016975486391695643
Dallas,2,0.00033006330386538274
Philadelphia,3,0.0002755326965466569
Los Angeles,3,0.00017830547928462533
Dallas,3,0.0003126116809023855
Dallas,4,0.0004507071321747985


In [52]:
%sql
SELECT concat(city,'|',month,'|',cast(robberyRate*10000000 as int)) FROM RobberyRatesByCity order by robberyRate, month

"concat(city, |, CAST(month AS STRING), |, CAST(CAST((robberyRate * CAST(10000000 AS DOUBLE)) AS INT) AS STRING))"
Los Angeles|2|1697
Los Angeles|6|1755
Los Angeles|3|1783
Los Angeles|4|1793
Los Angeles|1|1808
Los Angeles|9|1815
Los Angeles|11|1921
Los Angeles|8|1923
Los Angeles|5|1986
Los Angeles|10|2047


In [53]:
# TEST - Run this cell to test your solution.

rows = spark.sql("SELECT concat(city,'|',month,'|',cast(robberyRate*10000000 as int)) FROM RobberyRatesByCity order by robberyRate, month").collect()

dbTest("SQL-L7-roberryRatesByCity-counts-0",  "Los Angeles|2|1697", rows[0][0])
dbTest("SQL-L7-roberryRatesByCity-counts-10", "Los Angeles|7|2077", rows[10][0])
dbTest("SQL-L7-roberryRatesByCity-counts-20", "Philadelphia|5|3399", rows[20][0])

print("Tests passed!")

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Classroom-Cleanup

Run the **`Classroom-Cleanup`** cell below to remove any artifacts created by this lesson.

In [55]:
%run ./Includes/Classroom-Cleanup

<h2><img src="https://files.training.databricks.com/images/105/logo_spark_tiny.png"> All done!</h2>

Thank you for your participation!

## References

The crime data used in this notebook comes from the following locations:

| City          | Original Data 
| ------------- | -------------
| Boston        | <a href="https://data.boston.gov/group/public-safety" target="_blank">https&#58;//data.boston.gov/group/public-safety</a>
| Chicago       | <a href="https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2" target="_blank">https&#58;//data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2</a>
| Dallas        | <a href="https://www.dallasopendata.com/Public-Safety/Police-Incidents/tbnj-w5hb/data" target="_blank">https&#58;//www.dallasopendata.com/Public-Safety/Police-Incidents/tbnj-w5hb/data</a>
| Los Angeles   | <a href="https://data.lacity.org/A-Safe-City/Crime-Data-From-2010-to-Present/y8tr-7khq" target="_blank">https&#58;//data.lacity.org/A-Safe-City/Crime-Data-From-2010-to-Present/y8tr-7khq</a>
| New Orleans   | <a href="https://data.nola.gov/Public-Safety-and-Preparedness/Electronic-Police-Report-2016/4gc2-25he/data" target="_blank">https&#58;//data.nola.gov/Public-Safety-and-Preparedness/Electronic-Police-Report-2016/4gc2-25he/data</a>
| New York      | <a href="https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Historic/qgea-i56i" target="_blank">https&#58;//data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Historic/qgea-i56i</a>
| Philadelphia  | <a href="https://www.opendataphilly.org/dataset/crime-incidents" target="_blank">https&#58;//www.opendataphilly.org/dataset/crime-incidents</a>

&copy; 2020 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>