# Finding The Best Locality Based On Interest

## 1. Introduction
### 1.1 Background

People move from one place to another; it is part of life. Most people relocate at least once, whether for a new job, a school for their children, or a different rented house.

### 1.2 Problem

Finding a place that matches your interests can be challenging, because everyone has different needs and preferences. In this article we will see how publicly available data can be used to find a solution.

### 1.3 Interest

Anyone moving from one location to another will want to understand the place they are moving to.

## 2. About the Data

- www.unitedstateszipcodes.org for the zip code and location information
- Foursquare APIs to get the nearby venues

While searching for data, I found www.unitedstateszipcodes.org. It provides all US zip codes along with their location information, which is an added advantage because I don't have to look up the map coordinates separately. Here is a sample of the data:

| zip | type     | decommissioned | primary_city | acceptable_cities | unacceptable_cities                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | state | county              | timezone            | area_codes | world_region | country | latitude | longitude | irs_estimated_population_2015 |
|-----|----------|----------------|--------------|-------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|---------------------|---------------------|------------|--------------|---------|----------|-----------|-------------------------------|
| 501 | UNIQUE   | 0              | Holtsville   |                   | I R S Service Center                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | NY    | Suffolk County      | America/New_York    | 631        | NA           | US      | 40.81    | -73.04    | 562                           |
| 544 | UNIQUE   | 0              | Holtsville   |                   | Irs Service Center                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | NY    | Suffolk County      | America/New_York    | 631        | NA           | US      | 40.81    | -73.04    | 0                             |
| 601 | STANDARD | 0              | Adjuntas     |                   | Colinas Del Gigante, Jard De Adjuntas, Urb San Joaquin                                                                                                                                                                                                                                                                                                                                                                                                                                      | PR    | Adjuntas Municipio  | America/Puerto_Rico | 787,939    | NA           | US      | 18.16    | -66.72    | 0                             |
| 602 | STANDARD | 0              | Aguada       |                   | Alts De Aguada, Bo Guaniquilla, Comunidad Las Flores, Ext Los Robles, Urb Isabel La Catolica                                                                                                                                                                                                                                                                                                                                                                                                | PR    | Aguada Municipio    | America/Puerto_Rico | 787,939    | NA           | US      | 18.38    | -67.18    | 0                             |
| 603 | STANDARD | 0              | Aguadilla    | Ramey             | Bda Caban, Bda Esteves, Bo Borinquen, Bo Ceiba Baja, Ext El Prado, Ext Marbella, Repto Jimenez, Repto Juan Aguiar, Repto Lopez, Repto Tres Palmas, Sect Las Villas, Urb Borinquen, Urb El Prado, Urb Esteves, Urb Garcia, Urb Las Americas, Urb Las Casitas Country Club, Urb Maleza Gdns, Urb Marbella, Urb Ramey, Urb Rubianes, Urb San Carlos, Urb Santa Marta, Urb Victoria, Villa Alegria, Villa Linda, Villa Lydia, Villa Universitaria, Villas De Almeria, Vista Alegre, Vista Verde | PR    | Aguadilla Municipio | America/Puerto_Rico | 787        | NA           | US      | 18.43    | -67.15    | 0                             |
| 604 | PO BOX   | 0              | Aguadilla    | Ramey             |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | PR    |                     | America/Puerto_Rico |            | NA           | US      | 18.43    | -67.15    | 0                             |
| 605 | PO BOX   | 0              | Aguadilla    |                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | PR    |                     | America/Puerto_Rico |            | NA           | US      | 18.43    | -67.15    | 0                             |
| 606 | STANDARD | 0              | Maricao      |                   | Urb San Juan Bautista                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | PR    | Maricao Municipio   | America/Puerto_Rico | 787,939    | NA           | US      | 18.18    | -66.98    | 0                             |
| 610 | STANDARD | 0              | Anasco       |                   | Brisas De Anasco, Est De Valle Verde, Jard De Anasco, Paseo Del Valle, Repto Daguey, Urb San Antonio                                                                                                                                                                                                                                                                                                                                                                                        | PR    | Anasco Municipio    | America/Puerto_Rico | 787        | NA           | US      | 18.28    | -67.14    | 0                             |
| 611 | PO BOX   | 0              | Angeles      |                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | PR    |                     | America/Puerto_Rico |            | NA           | US      | 18.28    | -66.79    | 0                             |

I am going to use the **Foursquare APIs** to get the venues near each of these zip codes in the chosen city, and then find which zip codes have all or most of my interests within a specified distance.
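To make "within a specified distance" concrete, here is a minimal sketch of a great-circle (haversine) distance check between a zip code location and a venue. The function name `haversine_m` and the `within_radius` variable are illustrative assumptions, not part of the notebook; the notebook itself simply passes a radius to the API, so this is only a way to sanity-check results locally.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Coordinates taken from the venue sample shown later in this article:
# zip 73301 and "Ladybird Lake Hike & Bike Trail (Rainey St.)".
distance = haversine_m(30.26, -97.74, 30.258917, -97.740925)
within_radius = distance <= 4000  # 4000 m mirrors the RADIUS value used in the notebook
print(round(distance), within_radius)
```

A check like this could be applied to every venue row to confirm that the returned venues really fall inside the circle drawn for each zip code.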

## 3. Analysis/Code

Import the libraries; the Foursquare credentials are set in the next cell.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment out this line if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform a JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed yet
import folium # map rendering library

print('Libraries imported.')


Set your credentials.

In [2]:
CLIENT_ID = 'xxxx' # your Foursquare ID
CLIENT_SECRET = 'xxxx' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
RADIUS = 4000 # search radius in meters around each zip code location
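Hard-coded secrets are easy to leak when a notebook is shared. As a hedged alternative, the sketch below reads the credentials from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical, chosen only for this illustration.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read credentials from a mapping; the variable names are hypothetical."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first"
        )
    return client_id, client_secret

# Example with an explicit mapping instead of the real environment
creds = load_foursquare_credentials(
    {"FOURSQUARE_CLIENT_ID": "xxxx", "FOURSQUARE_CLIENT_SECRET": "yyyy"}
)
```

In the notebook, the returned pair could be assigned to `CLIENT_ID` and `CLIENT_SECRET` so the rest of the code stays unchanged.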

Import the zip code database and clean the data.

In [3]:
zip_info=pd.read_csv("zip_code_database.csv")
zip_info.head()

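|   | zip | type     | decommissioned | primary_city | acceptable_cities | unacceptable_cities                                | state | county              | timezone            | area_codes | world_region | country | latitude | longitude | irs_estimated_population_2015 |
|---|-----|----------|----------------|--------------|-------------------|----------------------------------------------------|-------|---------------------|---------------------|------------|--------------|---------|----------|-----------|-------------------------------|
| 0 | 501 | UNIQUE   | 0              | Holtsville   |                   | I R S Service Center                               | NY    | Suffolk County      | America/New_York    | 631        |              | US      | 40.81    | -73.04    | 562                           |
| 1 | 544 | UNIQUE   | 0              | Holtsville   |                   | Irs Service Center                                 | NY    | Suffolk County      | America/New_York    | 631        |              | US      | 40.81    | -73.04    | 0                             |
| 2 | 601 | STANDARD | 0              | Adjuntas     |                   | Colinas Del Gigante, Jard De Adjuntas, Urb San...  | PR    | Adjuntas Municipio  | America/Puerto_Rico | 787,939    |              | US      | 18.16    | -66.72    | 0                             |
| 3 | 602 | STANDARD | 0              | Aguada       |                   | Alts De Aguada, Bo Guaniquilla, Comunidad Las ...  | PR    | Aguada Municipio    | America/Puerto_Rico | 787,939    |              | US      | 18.38    | -67.18    | 0                             |
| 4 | 603 | STANDARD | 0              | Aguadilla    | Ramey             | Bda Caban, Bda Esteves, Bo Borinquen, Bo Ceiba...  | PR    | Aguadilla Municipio | America/Puerto_Rico | 787        |              | US      | 18.43    | -67.15    | 0                             |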


In [4]:
print(zip_info.dtypes)
zip_info=zip_info.astype({'zip':'object'})
print(zip_info.dtypes)

zip                                int64
type                              object
decommissioned                     int64
primary_city                      object
acceptable_cities                 object
unacceptable_cities               object
state                             object
county                            object
timezone                          object
area_codes                        object
world_region                      object
country                           object
latitude                         float64
longitude                        float64
irs_estimated_population_2015      int64
dtype: object
zip                               object
type                              object
decommissioned                     int64
primary_city                      object
acceptable_cities                 object
unacceptable_cities               object
state                             object
county                            object
timezone                          object
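area_codes                        object
world_region                      object
country                           object
latitude                         float64
longitude                        float64
irs_estimated_population_2015      int64
dtype: object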
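One side effect of loading `zip` as an integer is visible in the sample: Holtsville appears as 501, although US zip codes are five digits long, so leading zeros are lost. Casting to `object` afterwards does not bring them back. The sketch below shows one possible way to keep them, assuming the column is read as text and padded; the two-row CSV is a stand-in for `zip_code_database.csv`, not the real file.

```python
import io
import pandas as pd

# Two-row stand-in for zip_code_database.csv (values from the sample table above)
sample_csv = io.StringIO(
    "zip,primary_city,state\n"
    "501,Holtsville,NY\n"
    "78702,Austin,TX\n"
)

# dtype=str keeps the column as text; zfill pads each code to five characters
zips = pd.read_csv(sample_csv, dtype={"zip": str})
zips["zip"] = zips["zip"].str.zfill(5)
print(zips["zip"].tolist())
```

For the Austin analysis this makes no practical difference, because Texas zip codes do not start with zero, but it matters for states in the Northeast and for Puerto Rico.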

In [5]:
pd.DataFrame(zip_info['zip'].isna()).value_counts()

zip  
False    42632
dtype: int64

In [6]:
zip_info_filtered=zip_info[['zip','primary_city','state','latitude','longitude']]
zip_info_filtered.head()

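|   | zip | primary_city | state | latitude | longitude |
|---|-----|--------------|-------|----------|-----------|
| 0 | 501 | Holtsville   | NY    | 40.81    | -73.04    |
| 1 | 544 | Holtsville   | NY    | 40.81    | -73.04    |
| 2 | 601 | Adjuntas     | PR    | 18.16    | -66.72    |
| 3 | 602 | Aguada       | PR    | 18.38    | -67.18    |
| 4 | 603 | Aguadilla    | PR    | 18.43    | -67.15    |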


In [7]:
print("There are {0} zip codes available in USA".format(zip_info.shape[0]))

There are 42632 zip codes available in USA


In [8]:
# drop rows that share the same latitude and longitude
# (reassigning instead of inplace=True avoids pandas' SettingWithCopyWarning)
zip_info_filtered=zip_info_filtered.drop_duplicates(['latitude','longitude'])
zip_info_filtered.shape

(36391, 5)
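Dropping duplicates by coordinates keeps the first zip code at each location and discards the others that share it. The sketch below, using three rows from the sample data, shows which rows would be dropped; the names `sample`, `dropped`, and `kept` are illustrative only.

```python
import pandas as pd

# Three rows from the sample data; 501 and 544 share the same coordinates
sample = pd.DataFrame({
    "zip": [501, 544, 601],
    "latitude": [40.81, 40.81, 18.16],
    "longitude": [-73.04, -73.04, -66.72],
})

# Rows flagged True repeat the coordinates of an earlier row
is_repeat = sample.duplicated(["latitude", "longitude"], keep="first")
dropped = sample[is_repeat]
kept = sample.drop_duplicates(["latitude", "longitude"])
print(dropped["zip"].tolist(), kept["zip"].tolist())
```

This is worth keeping in mind when reading the final ranking: a zip code missing from the result may simply share coordinates with one that was kept.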

In [9]:
print("There are {0} zip codes available in USA (After removing duplicate location)".format(zip_info_filtered.shape[0]))

There are 36391 zip codes available in USA (After removing duplicate location)


Here comes the user's input: which city they are moving to and what their preferences are.

In [10]:
new_city="Austin"
new_state="TX"
my_preference=['Gym','Park','Indian Restaurant','Bar']

Now filter the zip codes belonging to Austin

In [11]:
# filter by city and state, then renumber the rows from 0
new_city_zip_info=zip_info_filtered[(zip_info_filtered['primary_city']==new_city) & (zip_info_filtered['state']==new_state)].reset_index(drop=True)
print("There are {0} zip codes in {1},{2}".format(new_city_zip_info.shape[0],new_city,new_state))

There are 47 zip codes in Austin,TX


In [12]:
new_city_zip_info.head()

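|   | zip   | primary_city | state | latitude | longitude |
|---|-------|--------------|-------|----------|-----------|
| 0 | 73301 | Austin       | TX    | 30.26    | -97.74    |
| 1 | 78702 | Austin       | TX    | 30.26    | -97.71    |
| 2 | 78703 | Austin       | TX    | 30.29    | -97.77    |
| 3 | 78704 | Austin       | TX    | 30.24    | -97.77    |
| 4 | 78705 | Austin       | TX    | 30.29    | -97.74    |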


In [13]:
new_city_zip_info=new_city_zip_info.astype({'zip':'object'})
new_city_zip_info.dtypes

zip              object
primary_city     object
state            object
latitude        float64
longitude       float64
dtype: object

Let's visualize how the zip codes are distributed.

In [14]:
latitude=30.3076863
longitude=-97.8934863
map_austin = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(new_city_zip_info['latitude'], new_city_zip_info['longitude'], new_city_zip_info['zip']):
    label = folium.Popup(str(label), parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=RADIUS,
        popup=label,
        color='blue',
        fill=False,
        parse_html=False).add_to(map_austin)  
    
map_austin
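The map above is centered on hard-coded coordinates. As a hedged alternative, the center could be derived from the zip code locations themselves, which would also work when `new_city` is changed. The sketch below uses the five Austin rows shown earlier as stand-in data; `sample` and `center` are illustrative names.

```python
import pandas as pd

# Coordinates from the first five rows of new_city_zip_info shown earlier
sample = pd.DataFrame({
    "latitude": [30.26, 30.26, 30.29, 30.24, 30.29],
    "longitude": [-97.74, -97.71, -97.77, -97.77, -97.74],
})

# Use the mean of the zip code locations as the map center
center = [sample["latitude"].mean(), sample["longitude"].mean()]

# In the notebook this could replace the hard-coded values, for example:
# map_austin = folium.Map(location=center, zoom_start=11)
print(center)
```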

Function to download nearby venues from Foursquare. I downloaded the data once and saved it to a CSV file so that I don't have to call the API every time.

In [15]:
def getNearbyVenues(names, latitudes, longitudes):
    """Return a DataFrame of venues found within RADIUS of each location."""
    url = 'https://api.foursquare.com/v2/venues/explore'
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # build the query parameters for the API request
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': RADIUS,
            'limit': LIMIT}

        # make the GET request and stop early on an HTTP error
        response = requests.get(url, params=params)
        response.raise_for_status()
        results = response.json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        for v in results:
            categories = v['venue']['categories']
            venues_list.append((
                name,
                lat,
                lng,
                v['venue']['name'],
                v['venue']['location']['lat'],
                v['venue']['location']['lng'],
                categories[0]['name'] if categories else None)) # venues without a category get None

    nearby_venues = pd.DataFrame(venues_list, columns=['zip',
                  'Latitude',
                  'Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category'])

    return nearby_venues
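The function above indexes straight into the JSON response, so an unexpected payload raises a `KeyError` or `IndexError`. The sketch below shows one defensive way to flatten a response, tested against a hand-written payload. The helper `extract_venue_rows`, the mock payload, and the venue "Unnamed Spot" are assumptions made for illustration; only the nesting (`response`, `groups`, `items`, `venue`) is taken from the code above.

```python
def extract_venue_rows(zip_code, lat, lng, payload):
    """Flatten one explore-style response into rows, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        location = venue.get("location", {})
        rows.append((zip_code, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows

# Hand-written payload shaped like the fields used above (not real API output)
mock_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Craft Pride",
               "location": {"lat": 30.257896, "lng": -97.738862},
               "categories": [{"name": "Beer Garden"}]}},
    {"venue": {"name": "Unnamed Spot",
               "location": {"lat": 30.26, "lng": -97.74},
               "categories": []}},
]}]}}

rows = extract_venue_rows(73301, 30.26, -97.74, mock_payload)
empty = extract_venue_rows(73301, 30.26, -97.74, {"meta": {"code": 400}})
```

Separating the parsing from the HTTP call also makes it possible to test the parsing offline, without spending API quota.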

Download the nearby venue information from Foursquare and save it to a CSV file. If the CSV file already exists, load the data from it instead.

In [16]:
import os

if os.path.isfile("austin_city_neighbors_by_zip.csv"):
    new_city_neighbor=pd.read_csv("austin_city_neighbors_by_zip.csv")
    new_city_neighbor.drop('Unnamed: 0',axis=1,inplace=True)
    new_city_neighbor=new_city_neighbor.astype({'zip':'object'})
else:
    new_city_neighbor=getNearbyVenues(names=new_city_zip_info['zip'],
                                       latitudes=new_city_zip_info['latitude'],
                                       longitudes=new_city_zip_info['longitude'])
    new_city_neighbor.to_csv("austin_city_neighbors_by_zip.csv")
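The cache-or-download pattern in this cell can be factored into a small reusable helper. The sketch below is one possible shape for it, exercised with a fake fetch function and a temporary directory; `load_or_fetch`, `fake_fetch`, and the file name `venues.csv` are hypothetical names introduced only for this example.

```python
import os
import tempfile
import pandas as pd

def load_or_fetch(path, fetch):
    """Return the cached CSV at `path` if present; otherwise call `fetch` and cache it."""
    if os.path.isfile(path):
        return pd.read_csv(path)
    frame = fetch()
    frame.to_csv(path, index=False)  # index=False avoids an extra 'Unnamed: 0' column
    return frame

calls = []

def fake_fetch():
    """Stand-in for the Foursquare download; records how often it is called."""
    calls.append(1)
    return pd.DataFrame({"zip": [73301], "Venue Category": ["Bar"]})

with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "venues.csv")
    first = load_or_fetch(cache_path, fake_fetch)   # downloads and writes the cache
    second = load_or_fetch(cache_path, fake_fetch)  # reads the cache, no download
```

Writing the file with `index=False` means the cached copy can be read back without dropping an index column afterwards.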


In [17]:
print(new_city_neighbor.dtypes)
new_city_neighbor.head()

zip                 object
Latitude           float64
Longitude          float64
Venue               object
Venue Latitude     float64
Venue Longitude    float64
Venue Category      object
dtype: object


|   | zip   | Latitude | Longitude | Venue                                        | Venue Latitude | Venue Longitude | Venue Category |
|---|-------|----------|-----------|----------------------------------------------|----------------|-----------------|----------------|
| 0 | 73301 | 30.26    | -97.74    | Ladybird Lake Hike & Bike Trail (Rainey St.) | 30.258917      | -97.740925      | Trail          |
| 1 | 73301 | 30.26    | -97.74    | Kimpton Hotel Van Zandt                      | 30.260078      | -97.739147      | Hotel          |
| 2 | 73301 | 30.26    | -97.74    | Craft Pride                                  | 30.257896      | -97.738862      | Beer Garden    |
| 3 | 73301 | 30.26    | -97.74    | Rainey Street Outdoor Food Trucks            | 30.258436      | -97.738968      | Food Truck     |
| 4 | 73301 | 30.26    | -97.74    | Fairmont Austin                              | 30.262074      | -97.738261      | Hotel          |


In [18]:
print('There are {} unique categories.'.format(new_city_neighbor['Venue Category'].nunique()))

There are 292 unique categories.


Now keep only the venue categories the user has provided.

In [19]:
new_city_neighbor_filtered=new_city_neighbor[new_city_neighbor['Venue Category'].isin(my_preference)].reset_index(drop=True)
new_city_neighbor_filtered.head()

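|   | zip   | Latitude | Longitude | Venue                                 | Venue Latitude | Venue Longitude | Venue Category |
|---|-------|----------|-----------|---------------------------------------|----------------|-----------------|----------------|
| 0 | 73301 | 30.26    | -97.74    | Lustre Pearl                          | 30.260715      | -97.738281      | Bar            |
| 1 | 73301 | 30.26    | -97.74    | Auditorium Shores at Lady Bird Lake   | 30.259792      | -97.749013      | Park           |
| 2 | 73301 | 30.26    | -97.74    | RIDE Indoor Cycling                   | 30.26456       | -97.746643      | Gym            |
| 3 | 73301 | 30.26    | -97.74    | Butler Park (formerly Town Lake Park) | 30.261792      | -97.754974      | Park           |
| 4 | 73301 | 30.26    | -97.74    | Rustic Tap                            | 30.269533      | -97.749145      | Bar            |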


In [20]:
new_city_neighbor_filtered[['Venue Category']].value_counts()

Venue Category   
Park                 205
Bar                  129
Gym                   92
Indian Restaurant     12
dtype: int64

Let's visualize how these venues are distributed across Austin. I am using folium's layer control, a handy feature that lets me select which category to show on the map.

In [21]:
unique_venus=new_city_neighbor_filtered[['Venue','Venue Latitude','Venue Longitude','Venue Category']].drop_duplicates()
print(new_city_neighbor_filtered.shape)
print(unique_venus.shape)

(438, 7)
(150, 4)


In [22]:
map_austin_neighbor = folium.Map(location=[latitude, longitude], zoom_start=11)
my_color_map=['red','blue','gray','darkred','lightred','orange','beige','green','darkgreen',
              'lightgreen','darkblue','lightblue','purple','darkpurple','pink','cadetblue','lightgray','black']

feature_group={}
for f in my_preference:
    feature_group[f] = folium.map.FeatureGroup(name=f)

# add markers to map
for lat,lng,cat,name in zip(unique_venus['Venue Latitude'],unique_venus['Venue Longitude'],unique_venus['Venue Category'],unique_venus['Venue']):
    color=my_color_map[my_preference.index(cat)]
    label = folium.Popup(name, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        popup=label,
        radius=5,
        color=color
        ).add_to(feature_group[cat])  

for f in feature_group:
    map_austin_neighbor.add_child(feature_group[f])
map_austin_neighbor.add_child(folium.map.LayerControl())

map_austin_neighbor

Now one-hot encode the venue categories and add the zip/location information. Next, we group by zip code and sum to find how many venues there are in each category. Since the user is not interested in how many, we convert the counts into present or not present. Ordering by the total then gives the user a ranked list of zip codes to go through.

In [23]:
new_city_neighbor_onehot = pd.get_dummies(new_city_neighbor_filtered[['Venue Category']], prefix="", prefix_sep="")

In [24]:
new_city_neighbor_onehot[['zip','Latitude','Longitude']]=new_city_neighbor_filtered[['zip','Latitude','Longitude']]
new_city_neighbor_onehot.head()

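|   | Bar | Gym | Indian Restaurant | Park | zip   | Latitude | Longitude |
|---|-----|-----|-------------------|------|-------|----------|-----------|
| 0 | 1   | 0   | 0                 | 0    | 73301 | 30.26    | -97.74    |
| 1 | 0   | 0   | 0                 | 1    | 73301 | 30.26    | -97.74    |
| 2 | 0   | 1   | 0                 | 0    | 73301 | 30.26    | -97.74    |
| 3 | 0   | 0   | 0                 | 1    | 73301 | 30.26    | -97.74    |
| 4 | 1   | 0   | 0                 | 0    | 73301 | 30.26    | -97.74    |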


In [25]:
new_city_neighbor_onehot['total']=new_city_neighbor_onehot[my_preference].sum(axis=1)

In [26]:
filtered_zips=new_city_neighbor_onehot[new_city_neighbor_onehot['total']>0][my_preference+['zip','Latitude','Longitude','total']].groupby(['zip','Latitude','Longitude']).sum()


In [27]:
filtered_zips.reset_index(inplace=True)


In [28]:
filtered_zips.sort_values('total',ascending=False,inplace=True)

In [29]:
filtered_zips.reset_index(drop=True,inplace=True)

In [30]:
filtered_zips_new=filtered_zips.copy() # copy so the venue counts in filtered_zips are not overwritten
filtered_zips_new[my_preference]=filtered_zips[my_preference]>0

In [31]:
filtered_zips_new['total']=filtered_zips_new[my_preference].sum(axis=1)

In [32]:
filtered_zips_new.sort_values('total',ascending=False,inplace=True)

In [33]:
filtered_zips_new.reset_index(drop=True,inplace=True)
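The steps in cells 23 to 33 can also be expressed more compactly. The sketch below is an alternative formulation using `pd.crosstab`, not the notebook's own method, and it runs on a small made-up venue list rather than the Austin data. The names `venues`, `preferences`, `present`, and `ranking` are illustrative.

```python
import pandas as pd

# Made-up venue list for two zip codes (not the real Foursquare data)
venues = pd.DataFrame({
    "zip": [73301, 73301, 73301, 73301, 78702, 78702],
    "Venue Category": ["Bar", "Park", "Park", "Gym", "Bar", "Park"],
})
preferences = ["Gym", "Park", "Indian Restaurant", "Bar"]

# Count venues per zip and category, adding preferences that never occur as zeros
counts = pd.crosstab(venues["zip"], venues["Venue Category"])
counts = counts.reindex(columns=preferences, fill_value=0)

# Convert counts to present / not present and rank by number of preferences met
present = counts > 0
ranking = present.assign(total=present.sum(axis=1)).sort_values(
    "total", ascending=False
)
print(ranking)
```

The `reindex` step matters when a preferred category never appears in the data: without it, that column would be missing from the table altogether.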

## 4. Result

In [34]:
from folium.features import DivIcon

top=5

# add markers to map
for index, (lat, lng, label) in enumerate(zip(filtered_zips_new['Latitude'][0:top], filtered_zips_new['Longitude'][0:top], filtered_zips_new['zip'][0:top]),start=1):
    label1 = folium.Popup(str(label), parse_html=True)
    label2 = folium.Popup(str(label), parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=RADIUS,
        popup=label1,
        color='blue',
        fill=False,
        parse_html=False
        ).add_to(map_austin_neighbor)
    folium.Marker([lat, lng], 
        popup=label2,
        icon=DivIcon(
        icon_size=(36,36),
        icon_anchor=(7,20),
        html='<div style="font-size: 18pt; color : blue">{0}</div>'.format(index),
        )).add_to(map_austin_neighbor)
    
map_austin_neighbor

In [35]:
filtered_zips_new[0:10]

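|   | zip   | Latitude | Longitude | Gym  | Park | Indian Restaurant | Bar  | total |
|---|-------|----------|-----------|------|------|-------------------|------|-------|
| 0 | 78717 | 30.49    | -97.75    | True | True | True              | True | 4     |
| 1 | 78745 | 30.21    | -97.8     | True | True | True              | True | 4     |
| 2 | 78727 | 30.43    | -97.72    | True | True | True              | True | 4     |
| 3 | 78728 | 30.45    | -97.69    | True | True | True              | True | 4     |
| 4 | 78757 | 30.35    | -97.73    | True | True | True              | True | 4     |
| 5 | 78753 | 30.38    | -97.67    | True | True | True              | True | 4     |
| 6 | 78731 | 30.35    | -97.77    | True | True | True              | True | 4     |
| 7 | 78716 | 30.26    | -97.74    | True | True | False             | True | 3     |
| 8 | 78715 | 30.26    | -97.74    | True | True | False             | True | 3     |
| 9 | 78714 | 30.26    | -97.74    | True | True | False             | True | 3     |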


## 5. Discussion

In this model, I mostly relied on visualization to identify the solution. The problem can be solved in multiple ways, as I discussed in the Data section. This shows how data science can be used to solve day-to-day problems. One possible improvement is to group nearby zip codes using the DBSCAN algorithm. After seeing the final result, I felt that I didn't need that option.
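For readers who want to try the DBSCAN idea, the sketch below shows one way it could look with scikit-learn, which the notebook already imports. It was not part of the analysis. The coordinates are four rows from the results table, and the 10 km neighbourhood radius (`eps_km`) and `min_samples=2` are arbitrary assumptions that would need tuning for a real city.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# (lat, lng) of four zip codes from the results table; the last one lies far south
coords_deg = np.array([
    [30.49, -97.75],
    [30.45, -97.69],
    [30.43, -97.72],
    [30.21, -97.80],
])

EARTH_RADIUS_KM = 6371.0
eps_km = 10.0  # assumed neighbourhood radius, chosen only for this example

# The haversine metric expects [lat, lng] in radians and eps as an angle in radians
model = DBSCAN(
    eps=eps_km / EARTH_RADIUS_KM,
    min_samples=2,
    metric="haversine",
    algorithm="ball_tree",
)
labels = model.fit_predict(np.radians(coords_deg))
print(labels)
```

Zip codes that receive the same label could be presented as one area, while a label of -1 marks a zip code with no close neighbours under the chosen radius.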

## 6. Conclusion
As more people move to new places, this model can help them identify the areas where they can start looking before they move in. As always, there is room for improvement.