# Geocoding
__Geocoding__ is the process of converting addresses to longitude and latitude coordinates. __Reverse geocoding__ is the inverse operation.

The `geopy` package can do both.

> ⚠️ You need an internet connection for this, since `geopy` retrieves the data from the internet.

## Set up

In [1]:
import sys
sys.path.append('/usr/local/lib/python3.9/site-packages')
# needed because my Jupyter installation doesn't look in this directory for packages, and it is where they are kept

In [2]:
import os
os.listdir()

['supermarkets.json',
 'supermarkets.csv',
 'data.csv',
 'How to jupyter & pandas.ipynb',
 'supermarkets-semi-colons.txt',
 'supermarkets.xlsx',
 '.ipynb_checkpoints',
 'Geocoding addresses.ipynb',
 'supermarkets-commas.txt']

## Testing geocoding

In [3]:
from geopy.geocoders import ArcGIS
nom = ArcGIS()

We can test this with an address entered manually; this example is copied from the first row of the supermarkets data loaded later in this notebook. The result contains the full address as well as a tuple with the latitude, longitude and altitude.

In [4]:
n = nom.geocode( "3666 21st St, San Francisco, CA 94114" )
n

Location(3666 21st St, San Francisco, California, 94114, (37.756648011392286, -122.42937496976432, 0.0))

Printing `n` shows only the address, not the latitude and longitude tuple. To print the coordinates, select them by index:

In [5]:
print(     'This is what is printed when you select n only  :  ', n, 
       '\n\nThis is what is printed when specifying n[1]    :  ', n[1] )

This is what is printed when you select n only  :   3666 21st St, San Francisco, California, 94114 

This is what is printed when specifying n[1]    :   (37.756648011392286, -122.42937496976432)


If the address cannot be found, `geocode` returns `None`:

In [6]:
print( nom.geocode( "añlkdsfhlajks" ) )

None
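
Because a failed lookup gives `None`, indexing the result directly (as in `n[1]`) would raise a `TypeError`. The sketch below shows one way to guard against that. The helper name `coordinates_or_none` and the offline stand-in `fake_geocode` are hypothetical and exist only so the example runs without a network connection; with the real geocoder you would pass `nom.geocode` instead.

```python
def coordinates_or_none(geocode, address):
    """Return (latitude, longitude) for an address, or None if the lookup fails."""
    location = geocode(address)
    if location is None:
        return None  # nothing found, so there is nothing to index
    return location[1]  # same selection as n[1] above


# Hypothetical offline stand-in for nom.geocode (illustration only).
# It returns an (address, (latitude, longitude)) pair or None.
def fake_geocode(address):
    known = {
        "3666 21st St, San Francisco, CA 94114": (
            "3666 21st St, San Francisco, California, 94114",
            (37.7566, -122.4294),
        )
    }
    return known.get(address)


found = coordinates_or_none(fake_geocode, "3666 21st St, San Francisco, CA 94114")
missing = coordinates_or_none(fake_geocode, "añlkdsfhlajks")
print(found, missing)
```

With the real service, `coordinates_or_none(nom.geocode, "añlkdsfhlajks")` should return `None` instead of raising an error.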


The clearest way to extract the latitude and longitude is to use the object's built-in attributes:

In [7]:
n.longitude        # same can be done for n.latitude

-122.42937496976432

These attributes are available because `n` is a `Location` object:

In [8]:
type(n)

geopy.location.Location
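
Reverse geocoding, mentioned in the introduction, also returns a `Location` object. The following is a minimal sketch, assuming the geocoder's `reverse` method is given a `(latitude, longitude)` tuple and that the result exposes an `address` attribute. The helper `address_at` and the class `FakeGeocoder` are hypothetical names; the fake class only stands in for `nom` so the example runs offline.

```python
from types import SimpleNamespace


def address_at(geocoder, latitude, longitude):
    """Return the address found at a coordinate pair, or None if nothing is found."""
    location = geocoder.reverse((latitude, longitude))
    return None if location is None else location.address


# Hypothetical offline stand-in for the ArcGIS geocoder (illustration only).
class FakeGeocoder:
    def reverse(self, query):
        latitude, longitude = query
        if round(latitude, 3) == 37.757 and round(longitude, 3) == -122.429:
            return SimpleNamespace(
                address="3666 21st St, San Francisco, California, 94114"
            )
        return None


fake = FakeGeocoder()
print(address_at(fake, 37.756648011392286, -122.42937496976432))
print(address_at(fake, 0.0, 0.0))
```

With an internet connection, `address_at(nom, n.latitude, n.longitude)` should give back an address close to the one that was geocoded.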

## Manipulating the dataframe to make it easily accessible to `geopy`

We will rewrite the `Address` column so that it holds the complete address, which we can then feed straight into the `geocode` method to return a latitude and longitude. At the moment this data is spread out over four columns: `Address`, `City`, `State` and `Country`.

First, we import the data.

In [9]:
import pandas

data = pandas.read_csv("supermarkets.csv")
data = data.set_index( "ID" ) 
data

ID,Address,City,State,Country,Name,Employees
1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
3,332 Hill St,San Francisco,California 94114,USA,Super River,25
4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20


In [10]:
data[ "Address" ] = data[ "Address" ] + ', ' + data[ "City" ] + ', ' + data[ "State" ] + ", " + data[ "Country" ]

# we can now drop all the other columns since they've been integrated
data = data.drop( columns = [ "City", "State", "Country" ] )
data

ID,Address,Name,Employees
1,"3666 21st St, San Francisco, CA 94114, USA",Madeira,8
2,"735 Dolores St, San Francisco, CA 94119, USA",Bready Shop,15
3,"332 Hill St, San Francisco, California 94114, USA",Super River,25
4,"3995 23rd St, San Francisco, CA 94114, USA",Ben's Shop,10
5,"1056 Sanchez St, San Francisco, California, USA",Sanchez,12
6,"551 Alvarado St, San Francisco, CA 94114, USA",Richvalley,20


Now we can use `geopy` to find the coordinates for each address, and `pandas` to store them in the dataframe.

The `apply` method (from `pandas`) calls a function on every value in a column, so we don't have to write a for loop ourselves.

In [11]:
data[ "Location" ] = data[ "Address" ].apply( nom.geocode )  #  location object

# column for latitude and longitude
data[ "Coordinates" ] = data[ "Location" ].apply( lambda x: (x.latitude, x.longitude) )

data = data.drop( columns = [ "Location" ] )

data

ID,Address,Name,Employees,Coordinates
1,"3666 21st St, San Francisco, CA 94114, USA",Madeira,8,"(37.756648011392286, -122.42937496976432)"
2,"735 Dolores St, San Francisco, CA 94119, USA",Bready Shop,15,"(37.757819005175406, -122.42533698790956)"
3,"332 Hill St, San Francisco, California 94114, USA",Super River,25,"(37.755874990371936, -122.42881598064156)"
4,"3995 23rd St, San Francisco, CA 94114, USA",Ben's Shop,10,"(37.75292200397371, -122.43169700840102)"
5,"1056 Sanchez St, San Francisco, California, USA",Sanchez,12,"(37.75213100377104, -122.43002800384072)"
6,"551 Alvarado St, San Francisco, CA 94114, USA",Richvalley,20,"(37.75351100030984, -122.43322896884439)"
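
Each `geocode` call is a separate request over the internet, and a lookup that returns `None` has no `latitude` attribute. The sketch below is one possible way to handle both points: it geocodes each distinct address once, pauses between requests, and records `None` for addresses that are not found. The function `geocode_unique`, its `pause_seconds` parameter and the stand-in `fake_geocode` are hypothetical names used for illustration, and whether a pause is needed at all depends on the service being used.

```python
import time
from types import SimpleNamespace


def geocode_unique(geocode, addresses, pause_seconds=1.0):
    """Geocode each distinct address once.

    Returns a dict mapping each address to (latitude, longitude),
    or to None when the lookup found nothing.
    """
    results = {}
    for address in addresses:
        if address in results:
            continue  # already looked up, skip the repeated request
        if results:
            time.sleep(pause_seconds)  # wait between consecutive requests
        location = geocode(address)
        results[address] = (
            None if location is None
            else (location.latitude, location.longitude)
        )
    return results


# Hypothetical offline stand-in for nom.geocode (illustration only).
calls = []

def fake_geocode(address):
    calls.append(address)
    if address == "3666 21st St, San Francisco, CA 94114, USA":
        return SimpleNamespace(latitude=37.7566, longitude=-122.4294)
    return None


lookup = geocode_unique(
    fake_geocode,
    [
        "3666 21st St, San Francisco, CA 94114, USA",
        "añlkdsfhlajks",
        "3666 21st St, San Francisco, CA 94114, USA",
    ],
    pause_seconds=0,
)
print(lookup)
```

With the dataframe from this notebook, `lookup = geocode_unique(nom.geocode, data["Address"])` followed by `data["Coordinates"] = data["Address"].apply(lookup.get)` should fill the column, leaving `None` for any address that could not be found.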
