# Map Creation Code

## Data Preprocessing

First, we read the CSV file into a pandas DataFrame (df).

In [3]:
import pandas as pd
import numpy as np
from arcgis.gis import *
from ast import literal_eval

df = pd.read_csv('../Crime_Data_from_2010_to_Present.csv')
df

Unnamed: 0,DR Number,Date Reported,Date Occurred,Time Occurred,Area ID,Area Name,Reporting District,Crime Code,Crime Code Description,MO Codes,...,Weapon Description,Status Code,Status Description,Crime Code 1,Crime Code 2,Crime Code 3,Crime Code 4,Address,Cross Street,Location
0,1208575,03/14/2013,03/11/2013,1800,12,77th Street,1241,626,INTIMATE PARTNER - SIMPLE ASSAULT,0416 0446 1243 2000,...,"STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)",AO,Adult Other,626.0,,,,6300 BRYNHURST AV,,"(33.9829, -118.3338)"
1,102005556,01/25/2010,01/22/2010,2300,20,Olympic,2071,510,VEHICLE - STOLEN,,...,,IC,Invest Cont,510.0,,,,VAN NESS,15TH,"(34.0454, -118.3157)"
2,418,03/19/2013,03/18/2013,2030,18,Southeast,1823,510,VEHICLE - STOLEN,,...,,IC,Invest Cont,510.0,,,,200 E 104TH ST,,"(33.942, -118.2717)"
3,101822289,11/11/2010,11/10/2010,1800,18,Southeast,1803,510,VEHICLE - STOLEN,,...,,IC,Invest Cont,510.0,,,,88TH,WALL,"(33.9572, -118.2717)"
4,42104479,01/11/2014,01/04/2014,2300,21,Topanga,2133,745,VANDALISM - MISDEAMEANOR ($399 OR UNDER),0329,...,,IC,Invest Cont,745.0,,,,7200 CIRRUS WY,,"(34.2009, -118.6369)"
5,120125367,01/08/2013,01/08/2013,1400,1,Central,111,110,CRIMINAL HOMICIDE,1243 2000 1813 1814 2002 0416 0400,...,"STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)",AA,Adult Arrest,110.0,,,,600 N HILL ST,,"(34.0591, -118.2412)"
6,101105609,01/28/2010,01/27/2010,2230,11,Northeast,1125,510,VEHICLE - STOLEN,,...,,IC,Invest Cont,510.0,,,,YORK,AVENUE 51,"(34.1211, -118.2048)"
7,101620051,11/11/2010,11/07/2010,1600,16,Foothill,1641,510,VEHICLE - STOLEN,,...,,IC,Invest Cont,510.0,,,,EL DORADO,TRUESDALE,"(34.241, -118.3987)"
8,101910498,04/07/2010,04/07/2010,1600,19,Mission,1902,510,VEHICLE - STOLEN,,...,,IC,Invest Cont,510.0,,,,GLENOAKS,DRELL,"(34.3147, -118.4589)"
9,120908292,03/29/2013,01/15/2013,800,9,Van Nuys,904,668,"EMBEZZLEMENT, GRAND THEFT ($950.01 & OVER)",0344 1300,...,,IC,Invest Cont,668.0,,,,7200 SEPULVEDA BL,,"(34.2012, -118.4662)"


pandas reads the Location column as strings. The column has two kinds of impurities: NaN values, which are of type float, and the string '(0, 0)', which carries no real location information. We first collect the indices of the NaN values in the Location column.

In [4]:
temp = list(df['Location '])
# Anything that is not a string is a NaN location
badentries = [i for i, p in enumerate(temp) if not isinstance(p, str)]
badentries  # positions of the rows whose location is NaN

[659026, 736132, 1473914, 1532070, 1532072, 1532073, 1532086, 1532087, 1532089]
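
The same check can also be written with vectorized pandas operations. The sketch below is only an illustration: `toy` and its values are made up, and it assumes the column is named 'Location ' with a trailing space, as in the cells above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the crime data; the values are made up for illustration.
toy = pd.DataFrame({
    'DR Number': [1, 2, 3, 4],
    'Location ': ['(33.9829, -118.3338)', np.nan, '(0, 0)', '(34.0454, -118.3157)'],
})

# Positions of the rows whose location is missing (NaN is read in as a float).
nan_positions = np.flatnonzero(toy['Location '].isna().to_numpy()).tolist()
print(nan_positions)
```

Using `isna()` avoids the explicit Python loop, which may matter on a frame with more than a million rows.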

There are 9 rows with NaN values in the Location column. We drop the rows at these indices, store the result as df1, and verify that no non-string values remain in the column.

In [5]:
df1 = df.drop(df.index[badentries])
temp = [p for p in df1['Location ']]
for i in range(len(temp)):
    if type(temp[i]) != str:
        print(i)

We apply literal_eval to the Location column to convert the strings to tuples.

In [6]:
df1['Location '] = df1['Location '].apply(literal_eval) 
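
As a small illustration of what this conversion does to a single value, the sketch below parses one location string taken from the table above. The variable names `raw`, `parsed`, `lat` and `lon` are illustrative only.

```python
from ast import literal_eval

raw = '(33.9829, -118.3338)'  # a location as it is stored in the CSV
parsed = literal_eval(raw)    # parses the Python literal without running arbitrary code

print(type(raw).__name__, '->', type(parsed).__name__)

# The tuple stores latitude first and longitude second.
lat, lon = parsed
```

literal_eval only accepts Python literals, so it raises an error on anything that is not a valid literal instead of executing it.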

We also find the rows whose location is (0, 0) and drop them. The lookup uses the original df, whose locations are still strings. There are 5869 rows with location (0, 0).

In [7]:
grp = df.groupby('Location ')
Nan = grp.get_group('(0, 0)')  # rows with "no location"
Nan.count()
df1 = df1.drop(Nan.index)
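
An alternative to grouping is a boolean mask. The sketch below is a hedged illustration on a made-up frame (`toy`, `is_origin` and `cleaned` are illustrative names); it assumes the comparison is made while the locations are still strings, as they are in the original df.

```python
import pandas as pd

# Toy frame with string locations, as they appear before literal_eval is applied.
toy = pd.DataFrame({
    'DR Number': [1, 2, 3],
    'Location ': ['(33.9829, -118.3338)', '(0, 0)', '(34.0454, -118.3157)'],
})

is_origin = toy['Location '] == '(0, 0)'  # True where the location is the placeholder
n_origin = int(is_origin.sum())           # number of placeholder rows
cleaned = toy[~is_origin]                 # keep only rows with a real location

print(n_origin, len(cleaned))
```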

In [8]:
df1

Unnamed: 0,DR Number,Date Reported,Date Occurred,Time Occurred,Area ID,Area Name,Reporting District,Crime Code,Crime Code Description,MO Codes,...,Weapon Description,Status Code,Status Description,Crime Code 1,Crime Code 2,Crime Code 3,Crime Code 4,Address,Cross Street,Location
0,1208575,03/14/2013,03/11/2013,1800,12,77th Street,1241,626,INTIMATE PARTNER - SIMPLE ASSAULT,0416 0446 1243 2000,...,"STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)",AO,Adult Other,626.0,,,,6300 BRYNHURST AV,,"(33.9829, -118.3338)"
1,102005556,01/25/2010,01/22/2010,2300,20,Olympic,2071,510,VEHICLE - STOLEN,,...,,IC,Invest Cont,510.0,,,,VAN NESS,15TH,"(34.0454, -118.3157)"
2,418,03/19/2013,03/18/2013,2030,18,Southeast,1823,510,VEHICLE - STOLEN,,...,,IC,Invest Cont,510.0,,,,200 E 104TH ST,,"(33.942, -118.2717)"
3,101822289,11/11/2010,11/10/2010,1800,18,Southeast,1803,510,VEHICLE - STOLEN,,...,,IC,Invest Cont,510.0,,,,88TH,WALL,"(33.9572, -118.2717)"
4,42104479,01/11/2014,01/04/2014,2300,21,Topanga,2133,745,VANDALISM - MISDEAMEANOR ($399 OR UNDER),0329,...,,IC,Invest Cont,745.0,,,,7200 CIRRUS WY,,"(34.2009, -118.6369)"
5,120125367,01/08/2013,01/08/2013,1400,1,Central,111,110,CRIMINAL HOMICIDE,1243 2000 1813 1814 2002 0416 0400,...,"STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)",AA,Adult Arrest,110.0,,,,600 N HILL ST,,"(34.0591, -118.2412)"
6,101105609,01/28/2010,01/27/2010,2230,11,Northeast,1125,510,VEHICLE - STOLEN,,...,,IC,Invest Cont,510.0,,,,YORK,AVENUE 51,"(34.1211, -118.2048)"
7,101620051,11/11/2010,11/07/2010,1600,16,Foothill,1641,510,VEHICLE - STOLEN,,...,,IC,Invest Cont,510.0,,,,EL DORADO,TRUESDALE,"(34.241, -118.3987)"
8,101910498,04/07/2010,04/07/2010,1600,19,Mission,1902,510,VEHICLE - STOLEN,,...,,IC,Invest Cont,510.0,,,,GLENOAKS,DRELL,"(34.3147, -118.4589)"
9,120908292,03/29/2013,01/15/2013,800,9,Van Nuys,904,668,"EMBEZZLEMENT, GRAND THEFT ($950.01 & OVER)",0344 1300,...,,IC,Invest Cont,668.0,,,,7200 SEPULVEDA BL,,"(34.2012, -118.4662)"


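We split the location tuple into two columns, x and y, where x is longitude and y is latitude. Note that each tuple stores latitude first and longitude second, so the first element goes to y and the second to x. The line that deletes the Location column is left commented out because the column is still used for grouping below.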

In [18]:
new_col_list = ['y','x']
for n,col in enumerate(new_col_list):
    df1[col] = df1['Location '].apply(lambda location: location[n])
#del df1['Location ']

KeyError: 0
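
For reference, the split can also be sketched without a per-row lambda by expanding the tuples into a two-column frame. This is an illustration on made-up data, assuming every entry is a (latitude, longitude) tuple; `toy` and `coords` are illustrative names.

```python
import pandas as pd

# Toy frame whose locations are already tuples of (latitude, longitude).
toy = pd.DataFrame({'Location ': [(33.9829, -118.3338), (34.0454, -118.3157)]})

# Expand each tuple into two columns: the first element is y, the second is x.
coords = pd.DataFrame(toy['Location '].tolist(), columns=['y', 'x'], index=toy.index)
toy = toy.join(coords)

print(toy[['y', 'x']])
```

Passing `index=toy.index` keeps the new columns aligned with the original rows even after rows have been dropped.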

In [10]:
df1.columns

Index(['DR Number', 'Date Reported', 'Date Occurred', 'Time Occurred',
       'Area ID', 'Area Name', 'Reporting District', 'Crime Code',
       'Crime Code Description', 'MO Codes', 'Victim Age', 'Victim Sex',
       'Victim Descent', 'Premise Code', 'Premise Description',
       'Weapon Used Code', 'Weapon Description', 'Status Code',
       'Status Description', 'Crime Code 1', 'Crime Code 2', 'Crime Code 3',
       'Crime Code 4', 'Address', 'Cross Street', 'Location ', 'y', 'x'],
      dtype='object')

In [17]:
# Count rows per location without overwriting df1, which other cells need as a DataFrame
df1.groupby('Location ').count()

Location,DR Number,Date Reported,Date Occurred,Time Occurred,Area ID,Area Name,Reporting District,Crime Code,Crime Code Description,MO Codes,...,Status Code,Status Description,Crime Code 1,Crime Code 2,Crime Code 3,Crime Code 4,Address,Cross Street,y,x
"(33.3427, -118.3258)",6,6,6,6,6,6,6,6,6,6,...,6,6,6,0,0,0,6,6,6,6
"(33.7058, -118.2906)",1,1,1,1,1,1,1,1,1,1,...,1,1,1,0,0,0,1,1,1,1
"(33.706, -118.2898)",5,5,5,5,5,5,5,5,5,5,...,5,5,5,0,0,0,5,0,5,5
"(33.7062, -118.2917)",28,28,28,28,28,28,28,28,28,24,...,28,28,28,3,0,0,28,0,28,28
"(33.7065, -118.2928)",30,30,30,30,30,30,30,30,30,27,...,30,30,30,5,1,0,30,30,30,30
"(33.7065, -118.2879)",3,3,3,3,3,3,3,3,3,2,...,3,3,3,0,0,0,3,1,3,3
"(33.7067, -118.2879)",3,3,3,3,3,3,3,3,3,2,...,3,3,3,0,0,0,3,3,3,3
"(33.7068, -118.2879)",1,1,1,1,1,1,1,1,1,1,...,1,1,1,0,0,0,1,0,1,1
"(33.707, -118.2939)",39,39,39,39,39,39,39,39,39,39,...,39,39,39,1,0,0,39,4,39,39
"(33.707, -118.2934)",1,1,1,1,1,1,1,1,1,1,...,1,1,1,0,0,0,1,0,1,1

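If only the number of incidents per location is needed, a single count per group is enough. The sketch below is a hedged illustration on made-up rows; it groups on the numeric y and x columns rather than on the tuple column, which is a choice made here for simplicity and not what the cell above does.

```python
import pandas as pd

# Toy frame with the y (latitude) and x (longitude) columns already split out.
toy = pd.DataFrame({
    'DR Number': [1, 2, 3, 4],
    'y': [33.7062, 33.7062, 33.707, 33.7062],
    'x': [-118.2917, -118.2917, -118.2939, -118.2917],
})

# size() returns one count per (y, x) pair instead of one count per column.
per_location = toy.groupby(['y', 'x']).size().sort_values(ascending=False)

print(per_location)
```
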

## ArcGIS Import

Our data is now free of impurities, and we are ready to upload it to the ArcGIS server. Doing so required us to register as an ArcGIS developer and create credentials tied to our account. For that reason, we log in as an anonymous user to demonstrate the method and leave the cells that require credentials unexecuted.

We log into https://www.arcgis.com and create a map object m1.

In [12]:
from arcgis.gis import *
from arcgis import SpatialDataFrame
import pandas as pd

gis = GIS("https://www.arcgis.com", "yessenbayev", "ece180final")
m1 = gis.map('Los Angeles')

Due to the limitations of the free developer ArcGIS account we only upload the 'DR Number' and 'Crime Code Description' columns.

We create the SHAPE column, which holds an ArcGIS Geometry object in each row. This column tells ArcGIS the coordinates at which to plot each record.

In [13]:
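from arcgis.geometry import Geometry  # needed for the SHAPE column below

df2 = pd.DataFrame({'DR Number':df1['DR Number'],
                    'Crime Code Description': df1["Crime Code Description"],
                    'SHAPE':[0]*len(df1['DR Number'])})
# Build one point geometry per row from the x (longitude) and y (latitude) columns
df2['SHAPE'] = df1.apply(lambda row: Geometry({'x': row['x'], 'y': row['y']}), axis=1)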
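
To make the shape of the geometry input concrete, the sketch below builds the point dictionary in plain Python, without the arcgis package. The helper `point_geometry` is hypothetical, and the spatial reference with wkid 4326 (WGS84 latitude and longitude) is an assumption; the cell above passes only x and y.

```python
def point_geometry(lon, lat, wkid=4326):
    """Build an Esri-JSON-style point dict.

    Hypothetical helper for illustration; wkid 4326 (WGS84) is an assumption.
    """
    return {'x': lon, 'y': lat, 'spatialReference': {'wkid': wkid}}

# x is longitude and y is latitude, matching the columns created earlier.
geom = point_geometry(lon=-118.3338, lat=33.9829)
print(geom)
```

A dictionary like this is the kind of input that is handed to the Geometry constructor in the cell above.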

Finally, we pass the truncated dataframe df2 to the SpatialDataFrame constructor and store the result as sdf. This object is what gets uploaded to ArcGIS, and this method was the only way we found to upload a dataset of this size.

In [14]:
sdf = SpatialDataFrame(df2)
layer = gis.content.import_data(sdf,"LA Crime 2010-2017")
m1.add_layer(layer)
m1

Exception: SpatialDataFrame's must have either pyshp or arcpy available to use import_data
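
The exception names two packages, pyshp and arcpy. A quick way to see which of them is importable in the current environment is sketched below; the helper `available_backends` is hypothetical and only checks for the modules, it does not call any ArcGIS API.

```python
import importlib.util

def available_backends():
    """Report which of the packages named in the error message can be imported."""
    # pyshp is imported under the module name `shapefile`; arcpy uses its own name.
    candidates = {'pyshp': 'shapefile', 'arcpy': 'arcpy'}
    return {name: importlib.util.find_spec(module) is not None
            for name, module in candidates.items()}

backends = available_backends()
print(backends)
```

If both values are False, installing pyshp (for example with pip) is the lighter option, since arcpy is distributed with Esri's desktop software.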

In [None]:
layer

The layer has already been uploaded to ArcGIS. The link below leads to the map.

https://yessenbayev.maps.arcgis.com/home/webmap/viewer.html?layers=e6fea7082d50465a92b8df060ba4a134