# Capstone Project - The Battle of Neighborhoods (Week 1)

## Title: Explore new business opportunities in cities across North Carolina

### Introduction & Business problem

A businessman wants to start a new business in North Carolina, USA. To decide which city to invest in, he wants answers to the following questions:  
1)	What are the 10 most populous cities in North Carolina?  
2)	What kinds of businesses operate in these cities?  
3)	How many businesses of each kind are there in each of the top-10 cities?  
4)	What are the 10 most common and 10 least common businesses per 10,000 residents in these cities? 

Fortunately for this businessman, and for other potential investors, we can provide this information using data science methods and Python.

### Solution approach and Data points

Here are the data points that we need for this project:  

1)	The top-10 cities in NC by population. This data is publicly available online.  
2)	The geographic coordinates (latitude and longitude) of the top-10 cities.  
3)	The businesses and venues in these cities.  
4)	The number of businesses of each type, their frequency, and the 10 most and least common businesses in these cities, normalized per 10,000 residents.  
5)	The business density in each city, derived from the data points above.  
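The per-10,000 normalization in point 4 can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's final implementation: the venue category list is made up for the example, and the helper names `per_10k` and `most_and_least_common` are assumptions. Only the Charlotte population (905,318) comes from the population table used later in this notebook.

```python
from collections import Counter


def per_10k(category_counts, population):
    """Scale raw venue counts to counts per 10,000 residents."""
    return {cat: n / population * 10_000 for cat, n in category_counts.items()}


def most_and_least_common(rates, k=10):
    """Return (k most common, k least common) categories as (name, rate) pairs."""
    ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]


# Hypothetical venue categories for one city (illustrative data only).
venues = ["Coffee Shop", "Hotel", "Coffee Shop", "Bar", "Coffee Shop", "Bar"]
charlotte_population = 905318  # 2020 population from the source table

rates = per_10k(Counter(venues), charlotte_population)
most, least = most_and_least_common(rates, k=2)
print(most)
print(least)
```

Normalizing by population makes cities of very different sizes comparable: a raw count of 50 coffee shops means something different in Charlotte than in Concord.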


#### Data Sources:
Here are the sources for this data:  

1) The list of NC cities by population is available at the following website:

https://worldpopulationreview.com/states/cities/north-carolina

2)	We can get the latitude and longitude of the top-10 cities using geopy.geocoders.  
3)	Using the coordinates from the previous step, we can explore the businesses and venues in these cities with the Foursquare API.  
4)	We can compute the number of businesses, their frequency, and the most and least common businesses from the venue data in the previous step and each city's population.  
5)	We can visualize the business densities in these cities using the Folium visualization library.

## Download and Explore data sets

#### First, install and import all the required libraries/packages

In [2]:
import requests # library to handle HTTP requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images and HTML
from IPython.display import Image, HTML

# transform JSON into a pandas DataFrame (pandas >= 1.0 also exposes this as pd.json_normalize)
from pandas.io.json import json_normalize

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2020.6.20          |   py36h9f0ad1d_0         151 KB  conda-forge
    ca-certificates-2020.6.20  |       hecda079_0         145 KB  conda-forge
    openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-2.0.0                |     pyh9f0ad1d_0          63 KB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.50-py_0          conda-forge
    geopy:           

#### Load and explore the first data set (i.e. NC Population by City)

In [3]:
url = 'https://worldpopulationreview.com/states/cities/north-carolina'
html = requests.get(url).content
df_list = pd.read_html(html)
df_list

[     Rank                 Name  2020 Pop  2010 Census   Change  Density (km²)
 0       1            Charlotte    905318       738534   22.58%           1138
 1       2              Raleigh    481958       406355   18.61%           1276
 2       3           Greensboro    299946       269587   11.26%            897
 3       4               Durham    282737       230710   22.55%            973
 4       5        Winston-Salem    251762       230033    9.45%            733
 5       6         Fayetteville    205646       208335   -1.29%            537
 6       7                 Cary    175102       137021   27.79%           1151
 7       8           Wilmington    126669       106750   18.66%            951
 8       9           High Point    112900       104697    7.83%            779
 9      10              Concord     98842        79510   24.31%            601
 10     11           Greenville     95183        85109   11.84%           1031
 11     12            Asheville     93758        834
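With the table loaded, the top-10 cities can be selected by population. The sketch below rebuilds a small DataFrame from the complete rows shown in the output above (only the Rank, Name and 2020 Pop columns) instead of downloading the page again. The helper name `top_cities` is illustrative and not part of the original notebook; in the notebook itself, the same call would presumably be applied to `df_list[0]`.

```python
import pandas as pd

# Rows copied from the scraped table output (Rank, Name, 2020 Pop).
rows = [
    (1, "Charlotte", 905318),
    (2, "Raleigh", 481958),
    (3, "Greensboro", 299946),
    (4, "Durham", 282737),
    (5, "Winston-Salem", 251762),
    (6, "Fayetteville", 205646),
    (7, "Cary", 175102),
    (8, "Wilmington", 126669),
    (9, "High Point", 112900),
    (10, "Concord", 98842),
    (11, "Greenville", 95183),
]
cities = pd.DataFrame(rows, columns=["Rank", "Name", "2020 Pop"])


def top_cities(df, n=10, pop_col="2020 Pop"):
    """Return the n most populous cities with a fresh 0..n-1 index."""
    return df.nlargest(n, pop_col)[["Name", pop_col]].reset_index(drop=True)


top10 = top_cities(cities)
print(top10)
```

Sorting on the population column, rather than relying on the Rank column, keeps the selection correct even if the source page changes its row order.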

### Install the required libraries/packages for identifying the longitude & latitude

In [4]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim

Solving environment: done

# All requested packages already installed.



### Sample code for getting the coordinates and venue info for the first city. This will be expanded to the other cities in the next lab assignment.

In [5]:
address1 = 'Charlotte, NC'
geolocator = Nominatim(user_agent="ny_explorer")
location1 = geolocator.geocode(address1)
latitude1 = location1.latitude
longitude1 = location1.longitude
print('The geographical coordinates of Charlotte are {}, {}.'.format(latitude1, longitude1))

The geographical coordinates of Charlotte are 35.2272, -80.843083.
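As a rough sketch of how the single-city lookup could later be extended to all ten cities, the helper below loops over city names and collects coordinates. The function name `geocode_cities` is hypothetical. It takes the geocoding callable as an argument, so the same code works with `geolocator.geocode` from the cell above or with a stand-in during testing. It relies on the fact that geopy's `geocode` returns `None` when an address cannot be resolved.

```python
def geocode_cities(cities, geocode, state="NC"):
    """Return {city: (latitude, longitude)} for every city the geocoder resolves.

    `geocode` is any callable that takes an address string and returns an
    object with .latitude and .longitude, or None if nothing was found.
    """
    coords = {}
    for city in cities:
        location = geocode("{}, {}".format(city, state))
        if location is None:
            # Skip unresolved names instead of failing on attribute access.
            continue
        coords[city] = (location.latitude, location.longitude)
    return coords


# Usage with geopy (needs network access, so it is left commented out):
# from geopy.geocoders import Nominatim
# from geopy.extra.rate_limiter import RateLimiter
# geolocator = Nominatim(user_agent="nc_business_explorer")  # hypothetical agent name
# geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
# coords = geocode_cities(["Charlotte", "Raleigh"], geocode)
```

Wrapping the call in `RateLimiter` spaces out the requests, which matters because the public Nominatim service asks clients to stay at or below one request per second.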


#### Extracting Foursquare location data for the first city (This will be expanded to other cities in the next lab)

In [6]:
CLIENT_ID = 'MSNMMGSCRZIA0SASHUDGUISUSIUL2NM2UZNLLOD5VUGC3KA2' # your Foursquare ID
CLIENT_SECRET = 'PBKYG2N2JJQAMSAW0DBNWUDTGGENQTI2NJ5OOX4B04EEXMFB' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: MSNMMGSCRZIA0SASHUDGUISUSIUL2NM2UZNLLOD5VUGC3KA2
CLIENT_SECRET:PBKYG2N2JJQAMSAW0DBNWUDTGGENQTI2NJ5OOX4B04EEXMFB
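Hard-coding the Foursquare credentials and printing them exposes the secret to anyone who can read the notebook. One common alternative, sketched here under assumptions, is to read them from environment variables and mask them before display. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, not something Foursquare or the original notebook defines.

```python
import os


def load_foursquare_credentials(env=None):
    """Read (client_id, client_secret) from environment variables.

    The variable names are an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(
            "Set {} before running the notebook".format(missing.args[0])
        ) from None


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Usage (requires the two variables to be set in the environment):
# CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
# print('CLIENT_ID: ' + mask(CLIENT_ID))
```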


In [7]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

url1 = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude1, 
    longitude1, 
    radius, 
    LIMIT)
url1 

'https://api.foursquare.com/v2/venues/explore?&client_id=MSNMMGSCRZIA0SASHUDGUISUSIUL2NM2UZNLLOD5VUGC3KA2&client_secret=PBKYG2N2JJQAMSAW0DBNWUDTGGENQTI2NJ5OOX4B04EEXMFB&v=20180605&ll=35.2272,-80.843083&radius=500&limit=100'
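The same request URL can also be assembled from a dictionary of query parameters, which avoids the stray `?&` and makes it easier to reuse for other cities. This is a minimal sketch using only the standard library. The function name `build_explore_url` is illustrative, and the endpoint and parameter names are copied from the URL above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng,
                      version="20180605", radius=500, limit=100):
    """Build the venues/explore URL for one latitude/longitude pair."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter unescaped, as in the original URL.
    return "{}?{}".format(EXPLORE_ENDPOINT, urlencode(params, safe=","))


# Placeholder credentials; the coordinates are the Charlotte values from above.
example_url = build_explore_url("ID", "SECRET", 35.2272, -80.843083)
print(example_url)
```

With `requests`, an equivalent option is to pass the dictionary via `requests.get(EXPLORE_ENDPOINT, params=params)` and let the library handle the encoding.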

In [8]:
results = requests.get(url1).json()
results

{'meta': {'code': 200, 'requestId': '5f0e2a93ca618c567bc5e447'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Fourth Ward',
  'headerFullLocation': 'Fourth Ward, Charlotte',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 119,
  'suggestedBounds': {'ne': {'lat': 35.231700004500006,
    'lng': -80.83758445541405},
   'sw': {'lat': 35.2226999955, 'lng': -80.84858154458594}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bad5829f964a52071483be3',
       'name': 'Blumenthal Performing Arts Center',
       'location': {'address': '130 N Tryon St',
        'crossStreet': 'at 5th St',
        'lat': 35.22792953956913,
        