Geocoding (converting a physical address or location into latitude/longitude) and reverse geocoding (converting a lat/long to a physical address or location) are common tasks when working with geo-data.

Python offers a number of packages that make these tasks easy. In the tutorial below, I use pygeocoder, a wrapper for Google's geo-API, and geopy's Nominatim geocoder, which queries OpenStreetMap, to both geocode and reverse geocode.

## Preliminaries

First we load the packages we will use in the script. Specifically, I am loading pygeocoder for its geo-functionality, pandas for its dataframe structures, and numpy for its missing value (np.nan) functionality.

In [19]:
from IPython.core.interactiveshell import InteractiveShell
# Display the output of every expression in a cell, not only the last one
InteractiveShell.ast_node_interactivity = "all"

In [2]:
# Load packages
from pygeocoder import Geocoder
import pandas as pd
import numpy as np

## Create some simulated geo data

Geo-data comes in a wide variety of forms. In this case we have a Python dictionary of five latitude and longitude strings, with the two coordinates in each pair separated by a comma.

In [3]:
# Create a dictionary of raw data
data = {'Site 1': '31.336968, -109.560959',
        'Site 2': '31.347745, -108.229963',
        'Site 3': '32.277621, -107.734724',
        'Site 4': '31.655494, -106.420484',
        'Site 5': '30.295053, -104.014528'}

Although it is technically unnecessary, I originally come from R and am a big fan of dataframes, so let us turn the dictionary of simulated data into a dataframe.

In [32]:
# Convert the dictionary into a pandas dataframe
df = pd.DataFrame.from_dict(data, orient='index')

In [14]:
# View the dataframe
df

|        | 0                      |
|--------|------------------------|
| Site 1 | 31.336968, -109.560959 |
| Site 2 | 31.347745, -108.229963 |
| Site 3 | 32.277621, -107.734724 |
| Site 4 | 31.655494, -106.420484 |
| Site 5 | 30.295053, -104.014528 |


In [52]:
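# Alternative ways to split the "lat, long" strings:
# df[0].apply(lambda x: float(x.split(',')[0]))  # latitude only, as floats
# df[0].apply(lambda x: float(x.split(',')[1]))  # longitude only, as floats
# df[0].str.split(',', expand=True).astype(float).rename(columns={0: 'lat', 1: 'long'})
# Split on the comma into two columns (the values are still strings here)
df[0].str.split(',', expand=True)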

|        | 0         | 1           |
|--------|-----------|-------------|
| Site 1 | 31.336968 | -109.560959 |
| Site 2 | 31.347745 | -108.229963 |
| Site 3 | 32.277621 | -107.734724 |
| Site 4 | 31.655494 | -106.420484 |
| Site 5 | 30.295053 | -104.014528 |


You can see now that we have a dataframe with five rows, each containing a string of latitude and longitude. Before we can work with the data, we need to 1) separate the strings into latitude and longitude and 2) convert them into floats. The loop below does just that.

In [53]:
# Create two lists to hold the loop results
lat = []
lon = []

# For each row in the column of coordinate strings,
for row in df[0]:
    try:
        # Split the row on the comma and convert both halves to float;
        # the part before the comma is latitude, the part after is longitude
        lat_str, lon_str = row.split(',')
        lat_val, lon_val = float(lat_str), float(lon_str)
    # If the row is missing (not a string) or malformed,
    except (AttributeError, ValueError):
        # use a missing value for both coordinates
        lat_val, lon_val = np.nan, np.nan
    # Append exactly one value to each list, so they stay aligned with df
    lat.append(lat_val)
    lon.append(lon_val)

# Create two new columns from lat and lon
df['latitude'] = lat
df['longitude'] = lon

Let's take a look at what we have now.

In [54]:
# View the dataframe
df

|        | 0                      | latitude  | longitude   |
|--------|------------------------|-----------|-------------|
| Site 1 | 31.336968, -109.560959 | 31.336968 | -109.560959 |
| Site 2 | 31.347745, -108.229963 | 31.347745 | -108.229963 |
| Site 3 | 32.277621, -107.734724 | 32.277621 | -107.734724 |
| Site 4 | 31.655494, -106.420484 | 31.655494 | -106.420484 |
| Site 5 | 30.295053, -104.014528 | 30.295053 | -104.014528 |


Awesome. This is exactly what we want to see, one column of floats for latitude and one column of floats for longitude.
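The loop above can also be written without an explicit Python loop. The following is a minimal sketch, assuming pandas' Series.str.extract with a regular expression that accepts only plain decimal numbers (no scientific notation). The names COORD_PATTERN and split_coordinates are hypothetical, not part of the tutorial. As in the loop, rows that are missing or cannot be parsed become NaN.

```python
import pandas as pd

# Hypothetical pattern: two signed decimal numbers separated by a comma,
# with optional whitespace around each number
COORD_PATTERN = r"^\s*([-+]?\d+(?:\.\d+)?)\s*,\s*([-+]?\d+(?:\.\d+)?)\s*$"


def split_coordinates(coords: pd.Series) -> pd.DataFrame:
    """Parse "lat, lon" strings into float latitude/longitude columns.

    Rows that are missing or do not match COORD_PATTERN become NaN.
    """
    # str.extract returns one column per capture group and NaN where
    # the pattern does not match (or the value is missing)
    parts = coords.str.extract(COORD_PATTERN)
    parts.columns = ["latitude", "longitude"]
    # The captured groups are strings; convert them to floats
    return parts.astype(float)
```

Calling split_coordinates(df[0]) would return a dataframe indexed by site with the same latitude and longitude values the loop produced. The regular expression doubles as validation: a string with an extra comma or a non-numeric part simply fails to match instead of raising.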

## Reverse Geocoding

To reverse geocode, we feed a specific latitude and longitude pair, in this case the first row (position 0), into a reverse geocoder. We try two: geopy's Nominatim geocoder, which queries OpenStreetMap, and pygeocoder's reverse_geocode function.

In [59]:
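# geopy's Nominatim geocoder queries OpenStreetMap. Nominatim requires a
# custom user_agent; "geocoding-tutorial" is a placeholder name.
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="geocoding-tutorial")
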
In [85]:
# Reverse geocode the first site; geopy expects the coordinates as one (lat, lon) pair
rs = geolocator.reverse((df['latitude'].iloc[0], df['longitude'].iloc[0]))
# The full Nominatim response as a dictionary
rs.raw
# The matched point as (latitude, longitude, altitude)
rs.point
# The city and country from the response's address details
rs.raw.get('address').get('city')
rs.raw.get('address').get('country')

rs.raw is the Nominatim response as a dictionary. Besides the licence notice and the OpenStreetMap IDs, it holds the matched lat and lon, a display_name, a boundingbox, and an address dictionary with components such as city, county, state, country, and country_code.

Note that the coordinates must be passed as a single pair. Calling geolocator.reverse(lat, lon) with two separate arguments does not include the longitude in the query, so the lookup effectively lands on the prime meridian at latitude 31.34 and returns Lebnoud, Algeria, instead of a place near Douglas, Arizona.
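rs.raw.get('address').get('city') returns None whenever Nominatim files the place under a different key; smaller places are often reported as town, village, or hamlet instead. The helper below is a hypothetical sketch (place_name and PLACE_KEYS are illustrative names, and the fallback order is an assumption) that works on the raw dictionary.

```python
from typing import Optional

# Assumed fallback order of Nominatim settlement keys, largest to smallest
PLACE_KEYS = ("city", "town", "village", "hamlet")


def place_name(raw: dict) -> Optional[str]:
    """Return the first settlement name found in a Nominatim raw response."""
    # Treat a missing or empty address block as "no address details"
    address = raw.get("address") or {}
    for key in PLACE_KEYS:
        if key in address:
            return address[key]
    return None
```

With the reverse-geocoding result above, place_name(rs.raw) returns the city if there is one and otherwise falls back to the smaller place types.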

In [56]:
# Google's Geocoding API requires an API key; "YOUR_API_KEY" is a placeholder
geocoder = Geocoder(api_key="YOUR_API_KEY")
# Convert latitude and longitude to a location
results = geocoder.reverse_geocode(df['latitude'].iloc[0], df['longitude'].iloc[0])

Without a valid key, Google rejects the request and pygeocoder raises GeocoderError: Error REQUEST_DENIED.
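Under the hood, pygeocoder calls Google's Geocoding web service over HTTP, passing the coordinates as a single latlng=lat,lng parameter. The following is a minimal sketch of building such a request URL yourself with the standard library. It assumes the current endpoint https://maps.googleapis.com/maps/api/geocode/json and a key parameter; reverse_geocode_url is a hypothetical helper, and the range check guards against swapped or garbled coordinates before any request is sent.

```python
from urllib.parse import urlencode

# Google's Geocoding web service, JSON output
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def reverse_geocode_url(lat: float, lon: float, api_key: str) -> str:
    """Build a reverse-geocoding request URL for one (lat, lon) pair."""
    # Latitude must lie in [-90, 90] and longitude in [-180, 180]
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range: {lat}, {lon}")
    # Both coordinates travel together in one "lat,lng" parameter;
    # without a valid key Google answers with status REQUEST_DENIED
    params = {"latlng": f"{lat},{lon}", "key": api_key}
    return f"{GEOCODE_URL}?{urlencode(params)}"


print(reverse_geocode_url(31.336968, -109.560959, "YOUR_API_KEY"))
```

urlencode percent-encodes the comma as %2C, so the printed URL contains latlng=31.336968%2C-109.560959 followed by the key.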

Now we can start pulling out the data that we want.

In [57]:
# Print the lat/long
results.coordinates


In [None]:
# Print the city
results.city

In [None]:
# Print the country
results.country

In [None]:
# Print the street address (if applicable)
results.street_address

In [None]:
# Print the admin1 level
results.administrative_area_level_1
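The cells above reverse geocode only the first site. To process every row, you would call the geocoder once per coordinate pair and pause between requests; Nominatim's public usage policy asks for no more than one request per second. The following is a hypothetical sketch (reverse_all is an illustrative name) that accepts any callable taking a (lat, lon) tuple, such as geolocator.reverse. geopy also ships a ready-made RateLimiter in geopy.extra.rate_limiter for the same purpose.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def reverse_all(
    points: Iterable[Tuple[float, float]],
    reverse_fn: Callable[[Tuple[float, float]], T],
    min_delay: float = 1.0,
) -> List[Optional[T]]:
    """Reverse geocode each (lat, lon) pair, pausing between requests.

    Any exception raised by reverse_fn (network or service errors) is
    recorded as None so one bad row does not stop the whole batch.
    """
    results: List[Optional[T]] = []
    for i, point in enumerate(points):
        if i > 0:
            time.sleep(min_delay)  # throttle to respect the rate limit
        try:
            results.append(reverse_fn(point))
        except Exception:
            results.append(None)
    return results
```

For example, reverse_all(list(zip(df['latitude'], df['longitude'])), geolocator.reverse) would return one geopy Location (or None) per site, in the same order as the dataframe rows.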

## Geocoding

For geocoding, we submit a string containing an address or location (such as a city) to the geocode function. However, not all strings are formatted in a way that Google's geo-API can make sense of. We can test whether an input is valid with the valid_address attribute of the result returned by Geocoder.geocode(). geopy's Nominatim geocoder accepts the same kind of string.
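pygeocoder computes valid_address from Google's JSON response. The sketch below approximates such a check on the raw JSON under a stated assumption: "valid" means the request succeeded (status is "OK") and the top result is typed as a street_address. It is an illustration, not pygeocoder's own code, and is_street_address is a hypothetical name.

```python
def is_street_address(payload: dict) -> bool:
    """True if a Geocoding API JSON payload's top result is a street address."""
    # Any status other than "OK" (e.g. ZERO_RESULTS, REQUEST_DENIED) is a failure
    if payload.get("status") != "OK" or not payload.get("results"):
        return False
    # Each result carries a list of types; check the best (first) match
    return "street_address" in payload["results"][0].get("types", [])
```

A city name such as "Tucson, AZ" would typically come back typed as a locality rather than a street address, so under this assumption it would not count as valid.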

In [61]:
geolocator.geocode("4207 N Washington Ave, Douglas, AZ 85607")

Location(Washington Avenue, Douglas, Cochise County, Arizona, 85607-6261, USA, (31.3786809, -109.5283535, 0.0))

In [58]:
# Geocode the address; Google requires an API key ("YOUR_API_KEY" is a placeholder)
address = Geocoder(api_key="YOUR_API_KEY").geocode("4207 N Washington Ave, Douglas, AZ 85607")
# Verify that the address is valid (i.e. in Google's system)
address.valid_address

If the output is True, this is a valid address, and we can print its latitude and longitude coordinates.

In [None]:
# Print the lat/long of the geocoded address
address.coordinates

More interesting still, once Google's geo-API has processed an address, we can easily separate out its parts, such as the street number and street name.

In [None]:
# Find the lat/long of a certain address ("YOUR_API_KEY" is a placeholder for a Google API key)
result = Geocoder(api_key="YOUR_API_KEY").geocode("7250 South Tucson Boulevard, Tucson, AZ 85756")

In [None]:
# Print the street number
result.street_number

In [None]:
# Print the street name
result.route
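pygeocoder's street_number and route attributes appear to correspond to entries in the address_components list of Google's JSON response, where each component carries a long_name, a short_name, and a list of types. The following is a hypothetical sketch (component is an illustrative name) of that lookup on the raw JSON. The sample result is hand-written for illustration, not real API output.

```python
from typing import Optional


def component(result: dict, kind: str, name_key: str = "long_name") -> Optional[str]:
    """Return the name of the first address component of the given type."""
    for comp in result.get("address_components", []):
        if kind in comp.get("types", []):
            return comp.get(name_key)
    return None


# Hand-written sample shaped like one Geocoding API result (illustrative only)
sample = {
    "address_components": [
        {"long_name": "7250", "short_name": "7250", "types": ["street_number"]},
        {"long_name": "South Tucson Boulevard", "short_name": "S Tucson Blvd",
         "types": ["route"]},
    ]
}

print(component(sample, "street_number"))             # 7250
print(component(sample, "route"))                     # South Tucson Boulevard
print(component(sample, "route", name_key="short_name"))  # S Tucson Blvd
```

The same function would pull out other parts of the address, such as postal_code or locality, by passing the corresponding component type.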

And there you have it. Python makes this entire process easy and inserting it into an analysis only takes a few minutes. Good luck!