<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Test" data-toc-modified-id="Test-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Test</a></span></li><li><span><a href="#User-input" data-toc-modified-id="User-input-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>User input</a></span></li></ul></div>

In [1]:
!pip install geopy



In [37]:
import pandas as pd
import numpy as np 
from geopy import distance # distance calculations on the Earth's surface

We'll use the US cities dataset from [kelvins's GitHub repository](https://github.com/kelvins/US-Cities-Database).

In [100]:
# Load df
df = pd.read_csv(r'C:\Users\yuvem\OneDrive\Documents\us_cities.csv')
df.head()

Unnamed: 0,ID,STATE_CODE,STATE_NAME,CITY,COUNTY,LATITUDE,LONGITUDE
0,1,AK,Alaska,Adak,Aleutians West,55.999722,-161.207778
1,2,AK,Alaska,Akiachak,Bethel,60.891854,-161.39233
2,3,AK,Alaska,Akiak,Bethel,60.890632,-161.199325
3,4,AK,Alaska,Akutan,Aleutians East,54.143012,-165.785368
4,5,AK,Alaska,Alakanuk,Kusilvak,62.746967,-164.60228


In [101]:
df_subset = df[['STATE_CODE', 'CITY', 'LATITUDE','LONGITUDE']]
df_subset.head()

Unnamed: 0,STATE_CODE,CITY,LATITUDE,LONGITUDE
0,AK,Adak,55.999722,-161.207778
1,AK,Akiachak,60.891854,-161.39233
2,AK,Akiak,60.890632,-161.199325
3,AK,Akutan,54.143012,-165.785368
4,AK,Alakanuk,62.746967,-164.60228


We'll use the geodesic, i.e. the shortest distance along the surface of the Earth. There are several ways to approximate it:

- Great-circle distance, which treats the Earth as a sphere
- Geodesic distance on an ellipsoid, which is more accurate because the Earth is better approximated as an oblate ellipsoid
- Haversine formula, a common way to compute the great-circle distance: https://en.wikipedia.org/wiki/Haversine_formula, https://towardsdatascience.com/calculating-distance-between-two-geolocations-in-python-26ad3afe287b
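
To make the great-circle option concrete, here is a minimal haversine sketch that uses only the standard library. The function name haversine_km and the mean Earth radius of 6371.009 km are assumptions chosen for illustration; this is not how geopy computes its distances internally.

```python
import math

# Assumed mean Earth radius in kilometers (an illustrative choice, not a fixed standard)
EARTH_RADIUS_KM = 6371.009


def haversine_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Central angle times radius gives the arc length
    return 2 * radius_km * math.asin(math.sqrt(a))


# Los Angeles and New York coordinates as they appear in the dataset
print(haversine_km(33.973093, -118.247896, 40.74838, -73.996705))
```

Because this treats the Earth as a sphere, the result should land near a spherical great-circle value (roughly 3940 km for this pair) rather than the slightly longer ellipsoidal geodesic.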


The geopy.distance module already implements the great-circle and geodesic calculations, and the result can be read in kilometers (km), miles (mi), nautical miles (nm), or feet (ft). These functions all live in the distance module we have already imported from geopy.

distance((lat_1, lon_1), (lat_2, lon_2)) - the default method, which is the geodesic on the WGS-84 ellipsoid

geodesic((lat_1, lon_1), (lat_2, lon_2)) - geodesic on an ellipsoid (WGS-84 unless another one is passed)

great_circle((lat_1, lon_1), (lat_2, lon_2)) - great-circle distance on a sphere
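
The unit attributes are plain conversions of one underlying length. As a rough illustration, the sketch below converts a value in kilometers using the standard definitions of the mile, nautical mile, and foot. The helper name convert_km and the constant names are hypothetical and only mirror what the unit attributes report.

```python
# Standard definitions of each unit, expressed in kilometers
KM_PER_MILE = 1.609344
KM_PER_NAUTICAL_MILE = 1.852
KM_PER_FOOT = 0.0003048


def convert_km(km):
    """Return the same distance expressed in km, miles, nautical miles and feet."""
    return {
        "km": km,
        "miles": km / KM_PER_MILE,
        "nautical": km / KM_PER_NAUTICAL_MILE,
        "feet": km / KM_PER_FOOT,
    }


# Example: a 100 km distance in every unit
for unit, value in convert_km(100.0).items():
    print(f"{unit}: {value:.3f}")
```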

In [102]:
d = distance.distance((df_subset.loc[0, 'LATITUDE'], df_subset.loc[0, 'LONGITUDE']), (df_subset.loc[1, 'LATITUDE'], df_subset.loc[1, 'LONGITUDE']))
d, d.km, d.miles

(Distance(545.0169553145254), 545.0169553145254, 338.65783531334836)

In [103]:
df_subset.CITY.unique()

array(['Adak', 'Akiachak', 'Akiak', ..., 'Worland', 'Wyarno',
       'Yellowstone National Park'], dtype=object)

In [104]:
print(len(df_subset))

29880


## Test

In [129]:
# Let's start with New York and Los Angeles
ny_la = df_subset[df_subset['CITY'].isin(['New York', 'Los Angeles'])].reset_index()
ny_la

Unnamed: 0,index,STATE_CODE,CITY,LATITUDE,LONGITUDE
0,2304,CA,Los Angeles,33.973093,-118.247896
1,18874,NY,New York,40.74838,-73.996705


In [130]:
d = distance.distance((ny_la.loc[0, 'LATITUDE'], ny_la.loc[0, 'LONGITUDE']), (ny_la.loc[1, 'LATITUDE'], ny_la.loc[1, 'LONGITUDE']))
d, d.km, d.miles

(Distance(3948.894057720738), 3948.894057720738, 2453.7290086648586)

In [131]:
results = []

# Coordinates as (latitude, longitude) tuples
la = (ny_la.loc[0, "LATITUDE"], ny_la.loc[0, "LONGITUDE"])
ny = (ny_la.loc[1, "LATITUDE"], ny_la.loc[1, "LONGITUDE"])

# distance.distance is the default method (geodesic), so it shares the "geodesic" row in the output
for f in [distance.distance, distance.great_circle, distance.geodesic]:
    d2 = f(la, ny)  # compute once per method, then read the result in each unit
    for mes in ["kilometers","km","miles","mi","nautical","nm","feet","ft"]:
        results.append({"method": f.__name__, "measurement": mes, "value": getattr(d2, mes)})

# show as dataframe
results_df = pd.DataFrame(results)
results_df.pivot_table(index="method", columns="measurement", values="value")

method,feet,ft,kilometers,km,mi,miles,nautical,nm
geodesic,12955690.0,12955690.0,3948.894058,3948.894058,2453.729009,2453.729009,2132.232213,2132.232213
great_circle,12927290.0,12927290.0,3940.23831,3940.23831,2448.350577,2448.350577,2127.558483,2127.558483


In [132]:
# the distance for various ellipsoids
for ellipsoid in distance.ELLIPSOIDS:
    for mes in ["kilometers","km","miles","mi","nautical","nm","feet","ft"]:
        d3 = distance.geodesic((ny_la.loc[0, "LATITUDE"], ny_la.loc[0, "LONGITUDE"]), 
                              (ny_la.loc[1, "LATITUDE"], ny_la.loc[1, "LONGITUDE"]), ellipsoid=ellipsoid)
        results.append({"method": f"geodesic: {ellipsoid}", "measurement": mes, "value": getattr(d3, mes)})

# show as dataframe
results_df1 = pd.DataFrame(results)
results_df1.pivot_table(index="method", columns="measurement", values="value")

method,feet,ft,kilometers,km,mi,miles,nautical,nm
geodesic,12955690.0,12955690.0,3948.894058,3948.894058,2453.729009,2453.729009,2132.232213,2132.232213
geodesic: Airy (1830),12954470.0,12954470.0,3948.52377,3948.52377,2453.498922,2453.498922,2132.032273,2132.032273
geodesic: Clarke (1880),12956140.0,12956140.0,3949.032865,3949.032865,2453.81526,2453.81526,2132.307163,2132.307163
geodesic: GRS-67,12955740.0,12955740.0,3948.908401,3948.908401,2453.737921,2453.737921,2132.239957,2132.239957
geodesic: GRS-80,12955690.0,12955690.0,3948.894058,3948.894058,2453.729009,2453.729009,2132.232213,2132.232213
geodesic: Intl 1924,12956260.0,12956260.0,3949.067443,3949.067443,2453.836745,2453.836745,2132.325833,2132.325833
geodesic: WGS-84,12955690.0,12955690.0,3948.894058,3948.894058,2453.729009,2453.729009,2132.232213,2132.232213
great_circle,12927290.0,12927290.0,3940.23831,3940.23831,2448.350577,2448.350577,2127.558483,2127.558483
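
To put the spread in this table into perspective, the sketch below computes how far the spherical and alternative-ellipsoid results deviate from the WGS-84 geodesic. The kilometer values are copied from the table above; the helper name relative_difference_pct is hypothetical.

```python
def relative_difference_pct(reference, other):
    """Percentage by which `other` deviates from `reference`."""
    return abs(reference - other) / reference * 100


# Kilometer values for New York - Los Angeles, copied from the table above
wgs84_km = 3948.894058
great_circle_km = 3940.23831
airy_km = 3948.52377
intl_1924_km = 3949.067443

sphere_pct = relative_difference_pct(wgs84_km, great_circle_km)
airy_pct = relative_difference_pct(wgs84_km, airy_km)
intl_pct = relative_difference_pct(wgs84_km, intl_1924_km)

print(f"great_circle vs WGS-84: {sphere_pct:.3f}%")
print(f"Airy (1830) vs WGS-84:  {airy_pct:.4f}%")
print(f"Intl 1924 vs WGS-84:    {intl_pct:.4f}%")
```

For this city pair the spherical model is off by roughly 0.2%, while the choice of ellipsoid changes the result by less than 0.01%.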


In [133]:
!pip install folium



In [134]:
#from tkinter import *

#master = Tk()
#e = Entry(master)
#e.pack()

#e.focus_set()

#def callback():
#    print(e.get()) # This is the text you may want to use later

#my_label

#b = Button(master, text = "OK", width = 10, command = callback)
#b.pack()

#mainloop()

## User input

In [135]:
# Create a list of U.S. cities by converting city column to list
us_list = df_subset.CITY.to_list()

In [136]:
print(len(us_list))

29880


In [137]:
# Create a list of U.S. states by converting state code to list
us_state_codes = df_subset.STATE_CODE.to_list()

In [138]:
print(len(us_state_codes))

29880


In [140]:
var1 = input('Enter a U.S. city: ')
if var1 not in us_list:
    print('Not in data. Enter another city')

Enter a U.S. city: Cleveland


In [141]:
var1

'Cleveland'

In [142]:
state1 = input('Enter the state code for first city: ')
if state1 not in us_state_codes:
    print('Incorrect code. Try again')

Enter the state code for first city: OH


In [143]:
state1

'OH'

In [144]:
var2 = input('Enter a U.S. city: ')
if var2 not in us_list:
    print('Not in data. Enter another city')

Enter a U.S. city: Boston


In [145]:
var2

'Boston'

In [146]:
state2 = input('Enter the state code for second city: ')
if state2 not in us_state_codes:
    print('Incorrect code. Try again')

Enter the state code for second city: MA


In [147]:
state2

'MA'
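
The input cells above only print a warning when a value is not found; they do not ask again, and they check the city and the state code separately. A possible alternative is sketched below: it keeps prompting until the (city, state) pair exists in the data. The function name prompt_city_and_state and its read parameter are hypothetical; read defaults to input and is swapped for scripted answers here so the example runs without a keyboard.

```python
def prompt_city_and_state(city_state_pairs, read=input):
    """Ask for a city and a state code until the pair exists in the data."""
    pairs = set(city_state_pairs)  # set membership is fast even for ~30k rows
    while True:
        city = read("Enter a U.S. city: ").strip()
        state = read("Enter the state code: ").strip().upper()
        if (city, state) in pairs:
            return city, state
        print(f"{city}, {state} is not in the data. Try again.")


# Demo with a tiny hand-made sample and scripted answers instead of keyboard input.
# In the notebook the pairs could come from zip(df_subset.CITY, df_subset.STATE_CODE).
sample_pairs = [("Cleveland", "OH"), ("Boston", "MA")]
scripted = iter(["Cleveland", "MA", "Cleveland", "oh"])
city, state = prompt_city_and_state(sample_pairs, read=lambda _prompt: next(scripted))
print(city, state)
```

Validating the pair rather than each value on its own avoids accepting a real city with a real state code that do not belong together.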

In [148]:
# Filtering to get city, state
us_df_var1 = df_subset.loc[df_subset.loc[:,"CITY"] == var1, :]
first_city = us_df_var1.loc[us_df_var1.loc[:,"STATE_CODE"] == state1, :]

In [149]:
# Apply the same filtering for the second city
us_df_var2 = df_subset.loc[df_subset.loc[:,"CITY"] == var2, :]
second_city = us_df_var2.loc[us_df_var2.loc[:,"STATE_CODE"] == state2, :]

In [155]:
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
two_cities = pd.concat([first_city, second_city]).reset_index()
two_cities

Unnamed: 0,index,STATE_CODE,CITY,LATITUDE,LONGITUDE
0,19696,OH,Cleveland,41.4918,-81.6757
1,10122,MA,Boston,42.357603,-71.068432


In [156]:
d1 = distance.distance((two_cities.loc[0, 'LATITUDE'], two_cities.loc[0, 'LONGITUDE']), (two_cities.loc[1, 'LATITUDE'], two_cities.loc[1, 'LONGITUDE']))
d1, d1.km, d1.miles

(Distance(884.5082333676318), 884.5082333676318, 549.6079355113833)

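According to this, the shortest distance between Cleveland and Boston is approximately 550 miles. I Googled the distance and Google says it is 640 miles. That is about 90 miles more, most likely because Google reports a driving distance along roads, while geopy computes the shortest path over the Earth's surface.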

In [160]:
results2 = []

# Coordinates as (latitude, longitude) tuples
city_1 = (two_cities.loc[0, "LATITUDE"], two_cities.loc[0, "LONGITUDE"])
city_2 = (two_cities.loc[1, "LATITUDE"], two_cities.loc[1, "LONGITUDE"])

# distance.distance is the default method (geodesic), so it shares the "geodesic" row in the output
for f in [distance.distance, distance.great_circle, distance.geodesic]:
    d3 = f(city_1, city_2)  # compute once per method, then read the result in each unit
    for mes in ["kilometers","km","miles","mi","nautical","nm","feet","ft"]:
        results2.append({"method": f.__name__, "measurement": mes, "value": getattr(d3, mes)})

# show as dataframe
results_df2 = pd.DataFrame(results2)
results_df2.pivot_table(index="method", columns="measurement", values="value")

method,feet,ft,kilometers,km,mi,miles,nautical,nm
geodesic,2901930.0,2901930.0,884.508233,884.508233,549.607936,549.607936,477.596238,477.596238
great_circle,2894480.0,2894480.0,882.237544,882.237544,548.196994,548.196994,476.370164,476.370164


In [161]:
# the distance for various ellipsoids
for ellipsoid in distance.ELLIPSOIDS:
    for mes in ["kilometers","km","miles","mi","nautical","nm","feet","ft"]:
        d4 = distance.geodesic((two_cities.loc[0, "LATITUDE"], two_cities.loc[0, "LONGITUDE"]), 
                              (two_cities.loc[1, "LATITUDE"], two_cities.loc[1, "LONGITUDE"]), ellipsoid=ellipsoid)
        results2.append({"method": f"geodesic: {ellipsoid}", "measurement": mes, "value": getattr(d4, mes)})

# show as dataframe
results_df3 = pd.DataFrame(results2)
results_df3.pivot_table(index="method", columns="measurement", values="value")

method,feet,ft,kilometers,km,mi,miles,nautical,nm
geodesic,2901930.0,2901930.0,884.508233,884.508233,549.607936,549.607936,477.596238,477.596238
geodesic: Airy (1830),2901654.0,2901654.0,884.424106,884.424106,549.555661,549.555661,477.550813,477.550813
geodesic: Clarke (1880),2902050.0,2902050.0,884.544759,884.544759,549.630631,549.630631,477.615961,477.615961
geodesic: GRS-67,2901940.0,2901940.0,884.511454,884.511454,549.609937,549.609937,477.597977,477.597977
geodesic: GRS-80,2901930.0,2901930.0,884.508233,884.508233,549.607936,549.607936,477.596238,477.596238
geodesic: Intl 1924,2902062.0,2902062.0,884.548479,884.548479,549.632943,549.632943,477.617969,477.617969
geodesic: WGS-84,2901930.0,2901930.0,884.508233,884.508233,549.607936,549.607936,477.596238,477.596238
great_circle,2894480.0,2894480.0,882.237544,882.237544,548.196994,548.196994,476.370164,476.370164


In [162]:
# If you want to print the full list of cities
for city in us_list:
    print(city)

Adak
Akiachak
Akiak
Akutan
Alakanuk
Aleknagik
Allakaket
Ambler
Anaktuvuk Pass
Anchor Point
Anchorage
Anderson
Angoon
Aniak
Anvik
Arctic Village
Atka
Atqasuk
Auke Bay
Barrow
Beaver
Bethel
Bettles Field
Big Lake
Brevig Mission
Buckland
Cantwell
Central
Chalkyitsik
Chefornak
Chevak
Chicken
Chignik
Chignik Lagoon
Chignik Lake
Chitina
Chugiak
Circle
Clam Gulch
Clarks Point
Clear
Coffman Cove
Cold Bay
Cooper Landing
Copper Center
Cordova
Craig
Crooked Creek
Deering
Delta Junction
Denali National Park
Dillingham
Douglas
Dutch Harbor
Eagle
Eagle River
Eek
Egegik
Eielson Afb
Ekwok
Elfin Cove
Elim
Elmendorf Afb
Emmonak
Ester
Fairbanks
False Pass
Fort Greely
Fort Richardson
Fort Wainwright
Fort Yukon
Gakona
Galena
Gambell
Girdwood
Glennallen
Goodnews Bay
Grayling
Gustavus
Haines
Healy
Holy Cross
Homer
Hoonah
Hooper Bay
Hope
Houston
Hughes
Huslia
Hydaburg
Hyder
Iliamna
Indian
Juneau
Kake
Kaktovik
Kalskag
Kaltag
Karluk
Kasigluk
Kasilof
Kenai
Ketchikan
Kiana
King Cove
King Salmon
Kipnuk
Kivalina
Kla