This example uses Pandas to correct a malformed field so the data can be used later in GIS.  The <a href="https://data.baltimorecity.gov/Public-Safety/911-Police-Calls-for-Service/xviu-ezkt" target="_blank">Baltimore 911 call data</a> includes a "location" field that combines a street address with a latitude/longitude pair.  We can use Pandas to split that field into separate latitude and longitude columns.

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('911_Police_Calls_for_Service.csv',nrows=10000)

In [3]:
# How many rows are there?
print(len(df))

10000


In [4]:
# Inspect the dataframe
df.head()

Unnamed: 0,recordId,callDateTime,priority,district,description,callNumber,incidentLocation,location
0,1423624,05/04/2016 09:58:00 PM,High,ND,SILENT ALARM,P161253035,400 WINSTON AV,"400 WINSTON AV\nBALTIMORE, MD\n(39.349792, -76..."
1,1402097,04/27/2016 03:57:00 PM,Medium,SW,911/HANGUP,P161182081,1400 BRADDISH AV,"1400 BRADDISH AV\nBALTIMORE, MD\n(39.303941, -..."
2,1420176,05/03/2016 06:40:00 PM,Medium,ED,DISORDERLY,P161242705,200 E NORTH AV,"200 E NORTH AV\nBALTIMORE, MD\n(39.311294, -76..."
3,1423653,05/04/2016 10:10:00 PM,Medium,NE,911/NO VOICE,P161253068,2500-1 HARFORD RD,"2500 1 HARFORD RD\nBALTIMORE, MD\n(39.316763, ..."
4,1417949,05/03/2016 12:29:00 AM,Non-Emergency,SD,Private Tow,P161240063,100 W PATAPSCO AV,"100 W PATAPSCO AV\nBALTIMORE, MD\n(39.239215, ..."


We'll fix the bad "location" column in the blocks below.  The field has several problems.  Each value is generally a street address and a city/state line followed by a latitude/longitude pair, all packed into one string.  The "\n" is a newline character separating the parts, and the lat/lon pair is enclosed in parentheses.
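To make that structure concrete, here is a minimal sketch that takes one value apart with plain Python string methods.  The sample string is reconstructed from the first row shown above (the displayed value is truncated, so the full coordinates are taken from the later output), and the variable names are only illustrative.

```python
# One sample value, reconstructed from the first row of the data:
# street address, city/state, then "(lat, lon)", separated by newlines.
sample = "400 WINSTON AV\nBALTIMORE, MD\n(39.349792, -76.613468)"

# The newline characters separate the three logical parts
street, city_state, coords = sample.split("\n")

# Drop the surrounding parentheses, then split the pair on the comma
lat_text, lon_text = coords.strip("()").split(",")

print(street)
print(city_state)
print(lat_text.strip(), lon_text.strip())
```

This assumes every value has exactly three newline-separated parts, which may not hold for every row in the full dataset.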

In [5]:
# Create a temporary Series from the malformed "location" column: split once at the
# last left parenthesis and keep the part to its right (the "lat, lon)" text)
location = df['location'].str.rsplit('(', n=1, expand=True)[1]
# Remove the right parenthesis; regex=False treats ')' as a literal character
# rather than a regular expression, which behaves the same across pandas versions
location = location.str.replace(')', '', regex=False)
# Split again on the comma to get lat and lon, yielding a two-column dataframe
location = location.str.split(',', expand=True)
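
As an alternative to the split-and-replace steps above, the same pair can be pulled out in one pass with a regular expression.  This is only a sketch on a small hand-built Series: the first two values are reconstructed from the rows shown earlier, the third is a hypothetical value with no coordinates, and the pattern assumes plain decimal numbers inside the parentheses.

```python
import pandas as pd

locations = pd.Series([
    "400 WINSTON AV\nBALTIMORE, MD\n(39.349792, -76.613468)",
    "1400 BRADDISH AV\nBALTIMORE, MD\n(39.303941, -76.66084)",
    "HYPOTHETICAL ST\nBALTIMORE, MD",  # hypothetical row without coordinates
])

# Two capture groups: latitude and longitude inside the trailing parentheses
pattern = r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)\s*$"

# str.extract returns one column per capture group; non-matching rows become NaN
coords = locations.str.extract(pattern)
coords.columns = ["latitude", "longitude"]
print(coords)
```

Rows that do not match the pattern come back as missing values instead of raising an error, which makes them easy to find afterwards.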

In [6]:
# Now we can assign the latitude and longitude to new fields of their own

df['latitude'] = location.loc[:,0]
df['longitude'] = location.loc[:,1]

In [7]:
# And inspect the result
df.head()

Unnamed: 0,recordId,callDateTime,priority,district,description,callNumber,incidentLocation,location,latitude,longitude
0,1423624,05/04/2016 09:58:00 PM,High,ND,SILENT ALARM,P161253035,400 WINSTON AV,"400 WINSTON AV\nBALTIMORE, MD\n(39.349792, -76...",39.349792,-76.613468
1,1402097,04/27/2016 03:57:00 PM,Medium,SW,911/HANGUP,P161182081,1400 BRADDISH AV,"1400 BRADDISH AV\nBALTIMORE, MD\n(39.303941, -...",39.303941,-76.66084
2,1420176,05/03/2016 06:40:00 PM,Medium,ED,DISORDERLY,P161242705,200 E NORTH AV,"200 E NORTH AV\nBALTIMORE, MD\n(39.311294, -76...",39.311294,-76.612806
3,1423653,05/04/2016 10:10:00 PM,Medium,NE,911/NO VOICE,P161253068,2500-1 HARFORD RD,"2500 1 HARFORD RD\nBALTIMORE, MD\n(39.316763, ...",39.316763,-76.595269
4,1417949,05/03/2016 12:29:00 AM,Non-Emergency,SD,Private Tow,P161240063,100 W PATAPSCO AV,"100 W PATAPSCO AV\nBALTIMORE, MD\n(39.239215, ...",39.239215,-76.61151
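
Note that splitting strings leaves the new columns as text, and the longitude appears to keep the space that followed the comma.  If numeric coordinates are wanted before export, a sketch along these lines could be used.  It runs on a small stand-in dataframe built from the values shown above; "df_sample" is an illustrative name, not part of the original notebook.

```python
import pandas as pd

# Stand-in for the two new columns as they come out of the string split
df_sample = pd.DataFrame({
    "latitude": ["39.349792", "39.303941"],
    "longitude": [" -76.613468", " -76.66084"],
})

for col in ["latitude", "longitude"]:
    # Strip stray whitespace, then convert; unparseable values become NaN
    df_sample[col] = pd.to_numeric(df_sample[col].str.strip(), errors="coerce")

print(df_sample.dtypes)
```

Using errors="coerce" is a design choice: bad values are turned into NaN so they can be counted and inspected, rather than stopping the conversion.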


In [8]:
# And finally save it back out

df.to_csv('baltimore_911_10k.csv',index=False)