### WM Versioning

* v0 - Takes date; returns start date of next shower
* v1 - Above + takes location; returns location of closest park
* v2 - Above + takes max driving time, returns optimal dark sky location within max driving time
* v3 - Above + takes date range, returns best date + best time
* v4 - Above + available as a website!

### v1 steps

1. Request user location through browser
2. Create dataframe of local parks
3. Compare user location to local parks
4. Return closest park

### Scoping
Should decide what area we'll be looking at -- Bay Area only, CA only, US only?

## 1. Request user location through browser

In [23]:
import pandas as pd
import numpy as np
import urllib2
import json

from math import radians, sin, cos, asin, sqrt, pi, atan2
import numpy as np
import itertools


In [2]:
def get_user_location():
    # Automatically geolocate the connecting IP
    f = urllib2.urlopen('http://freegeoip.net/json/')
    
    # Processes and formats JSON file from URL
    json_string = f.read()
    f.close()
    location = json.loads(json_string)
    return location
    
    # This actually returns a string and we can't access dictionary values
    # return json.dumps(location, indent=4, sort_keys=True)

In [3]:
print get_user_location()

{u'city': u'San Francisco', u'region_code': u'CA', u'region_name': u'California', u'ip': u'8.21.168.105', u'time_zone': u'America/Los_Angeles', u'longitude': -122.4382, u'metro_code': 807, u'latitude': 37.8018, u'country_code': u'US', u'country_name': u'United States', u'zip_code': u'94123'}


## 2. Create dataframe of local parks

Get a list of parks - names/addresses, best case is long/lat

Deciding against US parks (US national parks: 59), going for CA national + state parks (9 + 118)

Using campground data http://www.uscampgrounds.info/:

> We include only vehicle-accessible, family campgrounds with 4 or more campsites - whose existence and location we can verify.  We want to be sure you find an actual family campground when you drive there. We try to tell you if you need a 4 wheel drive or high-clearance vehicle.  (no backpack-in,  boat-in, horse camps or group-only camps). We do include some selected "boondocks" (no facilities), (we call "dispersed") We will tell you if you need to take a ferry or if a short walk to the sites is required. We do not include privately owned campgrounds.  

Starting with west coast camps

In [4]:
df = pd.read_csv('west-coast-camps.csv')
df.shape

(3926, 16)

In [5]:
df.head(1)

Unnamed: 0,lon,lat,gps composite field,campground code,campground name,type,phone,dates open,comments,number of campsites,elevation (ft),amenities,state,distance from nearest town,bearing from nearest town,nearest town
0,-146.343,61.086,ALLI/Allison Point CP PH:907.835.2282 mid may...,ALLI,Allison Point,CP,907.835.2282,mid may-mid sep,,61,,45ft,AK,3.1,S,Valdez


In [6]:
df['state'].value_counts()

CA    1301
OR     720
WA     576
ID     497
MT     481
WY     251
AK     100
Name: state, dtype: int64

In [7]:
df.columns.values

array(['lon', 'lat', 'gps composite field', 'campground code',
       'campground name', 'type', 'phone', 'dates open', 'comments',
       'number of campsites', 'elevation (ft)', 'amenities', 'state',
       'distance from nearest town', 'bearing from nearest town',
       'nearest town'], dtype=object)

### Getting CA campgrounds only

In [8]:
df_ca = df.loc[df['state']=='CA']
df_ca.shape

(1301, 16)

Adding legend info for full_type

In [9]:
df_ca['full_type'] = df_ca['type'].map({
    'NP'  :'US National Park',
    'NM'  :'National Monument', 
    'CNP' :'Canadian National Park', 
    'NF'  :'US National Forest', 
    'BLM' :'US Bureau of Land Management', 
    'USFW':'US Fish and Wildlife', 
    'BOR' :'US Bureau of Reclamation', 
    'COE' :'US Corps of Engineers', 
    'TVA' :'Tennessee Valley Auth.', 
    'SP'  :'State Park', 
    'PP'  :'Canadian Provincial Park', 
    'SRA' :'State Rec. Area',
    'SRVA':'State Rec. Vehicular Area',
    'SPR' :'State Preserve', 
    'SB'  :'State Beach', 
    'SF'  :'State Forest', 
    'SFW' :'State Fish and Wildlife',
    'MIL' :'Military only',
    'CP'  :'County/City/Regional Park', 
    'AUTH':'Authority', 
    'UTIL':'Utility', 
    'RES' :'Native American Reservation', 
    ' '   :'Unknown type'
})

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


In [10]:
df_ca['type'].value_counts()

NF      749
CP      183
SP       80
BLM      72
NP       52
SRA      31
MIL      30
         24
UTIL     20
COE      18
SB       15
BOR      14
SF        6
NM        5
SRVA      1
RES       1
Name: type, dtype: int64

In [11]:
df_ca['full_type'].value_counts()

US National Forest              749
County/City/Regional Park       183
State Park                       80
US Bureau of Land Management     72
US National Park                 52
State Rec. Area                  31
Military only                    30
Unknown type                     24
Utility                          20
US Corps of Engineers            18
State Beach                      15
US Bureau of Reclamation         14
State Forest                      6
National Monument                 5
State Rec. Vehicular Area         1
Native American Reservation       1
Name: full_type, dtype: int64

Drop original 'type'

In [12]:
df_ca.drop('type', axis=1, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':


In [13]:
df_ca.shape

(1301, 16)

Checking to see campground names that include 'State Park' or 'National Park' to see if a value in the 'full_type' column would also show up in the campground name. 

In [14]:
df_national_park = df_ca.loc[df_ca['campground name'].str.contains('National Park')]
df_state_park = df_ca.loc[df_ca['campground name'].str.contains('State Park')]

print 'Number of campground names with State Park:', len(df_state_park)
print 'Number of campground names with National Park:', len(df_national_park)

Number of campground names with State Park: 69
Number of campground names with National Park: 50


Checking to see campground names that include 'Military'; same idea as above. 

In [15]:
df_ca.loc[df_ca['campground name'].str.contains('Military')][['lon', 'lat', 'campground name', 'comments', 'state']].head(3)

Unnamed: 0,lon,lat,campground name,comments,state
105,-117.103,32.792,Admiral Baker Military - San Diego NS,military only - do not use our lat/lon - conta...,CA
159,-121.385,39.13,Beale AFB Military,voted best - military only - do not use our la...,CA
178,-116.879,34.235,Big Bear Military,military only - do not use our lat/lon - conta...,CA


## 3. Compare user location to local parks

In [16]:
from math import radians, cos, sin, asin, sqrt

In [21]:
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    Returns in miles
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 

    # 6367 km is the radius of the Earth
    km = 6367 * c
    return km * 0.621371

### Unit tests on haversine:
#### 1. User location compared against itself
Should return 0.0

In [18]:
user_location = get_user_location()

lonUser = user_location['longitude']
latUser = user_location['latitude']

lonPark = user_location['longitude']
latPark = user_location['latitude']

print haversine(lonUser, latUser, lonPark, latPark)

0.0


#### 2. User location compared against Mt. Diablo

In [19]:
df_ca.loc[df_ca['campground name'].str.contains('Mt. Diablo')][['lon', 'lat', 'campground name', 'comments', 'state']].head(3)

Unnamed: 0,lon,lat,campground name,comments,state
923,-121.933,37.851,Mt. Diablo State Park,3 campgrounds,CA


In [20]:
user_location = get_user_location()

lonUser = user_location['longitude']
latUser = user_location['latitude']

lonPark = df_ca.loc[df_ca['campground name'].str.contains('Mt. Diablo'), 'lon']
latPark = df_ca.loc[df_ca['campground name'].str.contains('Mt. Diablo'), 'lat']

print haversine(lonUser, latUser, lonPark, latPark)

44.6794846374


In [19]:
# We should use a bounding box as an additional test

### Function comparing user location to all campground locations:

In [None]:
def get_nearest_park():
    # call get_user_location and save lon/lat
    user_location = get_user_location()

    lonUser = user_location['longitude']
    latUser = user_location['latitude']
    
    # create local version of df_ca
    df_user_distance = df_ca.copy()
    
    # create distance column, populate with distance between lon/lat of row using haversine formula
    # return row with smallest distance value

In [21]:
df_user_distance = df_ca.copy()

In [22]:
# df_user_distance['Distance'] = haversine(lonUser, latUser, df_user_distance['lon'], df_user_distance['lat'])

# may not need this if we can use iterrows:
# http://stackoverflow.com/questions/19914937/applying-function-with-multiple-arguments-to-create-a-new-pandas-column

In [23]:
# try iterrows

In [36]:
# Testing a way to compare distances faster
# source: http://stackoverflow.com/questions/6656475/python-speeding-up-geographic-comparison

earth_radius_miles = 3956.0

def get_shortest_in(needle, haystack):
    """needle is a single (lat,long) tuple.
        haystack is a numpy array to find the point in
        that has the shortest distance to needle
    """
    dlat = np.radians(haystack[:,0]) - radians(needle[0])
    #print dlat
    dlon = np.radians(haystack[:,1]) - radians(needle[1])
    
    a = np.square(np.sin(dlat/2.0)) + cos(radians(needle[0])) * np.cos(np.radians(haystack[:,0])) * np.square(np.sin(dlon/2.0))
    print a
    great_circle_distance = 2 * np.arcsin(np.minimum(np.sqrt(a), np.repeat(1, len(a))))
    d = earth_radius_miles * great_circle_distance
    print d
    return np.min(d)



In [43]:
x = (37.160316546736745, -78.75)
y = (39.095962936305476, -121.2890625)


lots = np.array(list(itertools.repeat(y, 100)))
def donumpy():
    get_shortest_in(x, np.array(df_ca_lon_lat))
    


In [40]:
# We've decided to pull the first 2 columns from our original table, then use index of the shortest distance to look up the name

# Pull columns from table
df_ca_lon_lat = df_ca[['lon','lat']]


# Find shortest distance and return index

# Turn index into campground name

In [44]:
donumpy()

[ 0.64097472  0.64762637  0.66703835 ...,  0.67457282  0.66419697
  0.69877087]
[ 7344.79547573  7399.76277507  7561.58468113 ...,  7625.01409946
  7537.75855694  7831.42637733]
