## Introduction/Business Problem

With the new skills I have developed as a data scientist, I want to apply for a new role in an educational community to begin my career in data science. I picture myself working for a college or university, and I really want to work in my hometown of Boston. My problem is: which college or university in the Greater Boston area do I want to work for? To assist in my selection, I am going to leverage public datasets and location data to select the school that I think will best fit my new career as a data scientist. With over 50 colleges and universities in the city of Boston, the criteria I am going to use to select a school to apply to are:

- School must be located in the Greater Boston area.
- School must have over 5000 students. As a data scientist, I want to take in as much data as possible, and I believe 5000 is a solid minimum for retrieving valuable and diverse data.
- School must have access to public transportation within the immediate radius (500 m).
- Sports complexes/arenas must be located on the central campus. Sports is one of the industries with high demand for data scientists, and I want to be within walking distance of all sports events on campus.
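
As a rough illustration, the four criteria above can be written as a single check on a school record. This is only a sketch: the `School` record and its `near_transit` and `sports_on_campus` flags are hypothetical names introduced here, and the example records are made up rather than taken from the dataset.

```python
from dataclasses import dataclass

MIN_STUDENTS = 5000  # enrollment threshold from the criteria above


@dataclass
class School:
    # hypothetical record; field names are illustrative only
    name: str
    in_greater_boston: bool
    num_students: int
    near_transit: bool      # public transportation within 500 m
    sports_on_campus: bool  # sports complex/arena on central campus


def meets_criteria(school: School) -> bool:
    """Return True only if the school satisfies all four criteria."""
    return (
        school.in_greater_boston
        and school.num_students > MIN_STUDENTS
        and school.near_transit
        and school.sports_on_campus
    )


# made-up example records, not taken from the dataset
example_ok = School("Example University", True, 12000, True, True)
example_small = School("Example College", True, 900, True, True)
print(meets_criteria(example_ok), meets_criteria(example_small))
```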


The target audience for this project is any prospective student or employee who is interested in applying to schools in Boston with over 5000 students and wants to learn more about their enrollment and surrounding neighborhoods.

## Data

For this project I will be leveraging public datasets from Analyze Boston (https://data.boston.gov/) to retrieve the names, coordinates, and number of students enrolled for all colleges and universities. There are over 159 datasets available, and I will be using the College and Universities CSV. With the information retrieved from this dataset, I will create a pandas DataFrame to compare the student bodies of each university. With the location coordinates, I can mark each college on a map of Boston using the Folium library. I will also use the location coordinates to analyze the venues within a 500 m radius of each university to get a sense of each area.
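
To make the idea of a 500 m radius concrete, here is a minimal sketch of a great-circle (haversine) distance check between two coordinate pairs. The function names `haversine_m` and `within_radius` are my own for illustration, and the Earth radius is the usual spherical approximation; the venue API applies its own radius filter, so this is not the method it uses internally.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(lat1, lon1, lat2, lon2, radius_m=500):
    """True if the second point lies within radius_m meters of the first."""
    return haversine_m(lat1, lon1, lat2, lon2) <= radius_m


# a point 0.001 degrees of latitude north of a campus coordinate (about 111 m away)
campus = (42.34956, -71.099709)
print(within_radius(*campus, campus[0] + 0.001, campus[1]))
```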

### 1.) Import Necessary Libraries: NumPy, Pandas, JSON, GeoPy and Folium

In [3]:
import numpy as np 

import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json

!pip install geopy 
from geopy.geocoders import Nominatim 

import requests
from pandas import json_normalize # the pandas.io.json import path is deprecated since pandas 1.0


import folium # map rendering library




### 2.) Retrieve Boston College and University CSV and create Dataframe listing each College located in Boston, the location coordinates and number of students

In [8]:
df = pd.read_csv('BostonCU.csv')
d1 = df[['Name', 'City', 'Latitude', 'Longitude', 'NumStudents13']]
# drop() returns a new DataFrame, so assign the result back to keep the change
d1 = d1.drop(d1.index[[57, 58, 59]])
d1

| | Name | City | Latitude | Longitude | NumStudents13 |
|---|---|---|---|---|---|
| 0 | Massachusetts General Hospital Dietetic Intern... | West End | 42.362591 | -71.070141 | 20 |
| 1 | Suffolk University | Boston | 42.358905 | -71.061948 | 8675 |
| 2 | Benjamin Franklin Institute of Technology | South End | 42.346103 | -71.070186 | 482 |
| 3 | Bunker Hill Community College | Charlestown | 42.375117 | -71.069572 | 14023 |
| 4 | MGH Institute of Health Professions | Charlestown | 42.374917 | -71.053972 | 1096 |
| 5 | Emmanuel College | Fenway/Kenmore | 42.341516 | -71.103478 | 2436 |
| 6 | School of the Museum of Fine Arts-Boston | Fenway/Kenmore | 42.338538 | -71.096694 | 651 |
| 7 | Simmons | Fenway/Kenmore | 42.339187 | -71.09994 | 4900 |
| 8 | Boston University | Fenway/Kenmore | 42.34956 | -71.099709 | 32411 |
| 9 | The Boston Conservatory | Fenway/Kenmore | 42.346058 | -71.090011 | 774 |


### 3.) Create new dataframe for only schools with more than 5000 students

In [9]:
d2 = d1.sort_values(by='NumStudents13', ascending=False)
# keep schools with at least 5000 students and renumber the rows from 0
topschools = d2[d2['NumStudents13'] >= 5000].reset_index(drop=True)
topschools


| | Name | City | Latitude | Longitude | NumStudents13 |
|---|---|---|---|---|---|
| 0 | Boston University | Fenway/Kenmore | 42.34956 | -71.099709 | 32411 |
| 1 | University of Massachusetts-Boston | North Dorchester | 42.313809 | -71.039202 | 16277 |
| 2 | Boston College | Chestnut Hill | 42.333833 | -71.169719 | 14309 |
| 3 | Bunker Hill Community College | Charlestown | 42.375117 | -71.069572 | 14023 |
| 4 | Suffolk University | Boston | 42.358905 | -71.061948 | 8675 |
| 5 | Northeastern University | Fenway/Kenmore | 42.340048 | -71.088892 | 8479 |
| 6 | MCPHS University | Fenway/Kenmore | 42.33688 | -71.10112 | 6548 |

### 4.) Create a Visual Map of Boston with the Locations marked for each college. 

In [10]:
address =   'Boston, MA'

geolocator = Nominatim(user_agent="b_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

42.3602534 -71.0582912


In [11]:
map_boston = folium.Map(location=[latitude, longitude], zoom_start=12)


for lat, lng, name in zip(topschools['Latitude'], topschools['Longitude'], topschools['Name']):
    # parse_html=True makes the school name render as plain text in the popup
    label = folium.Popup(name, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=1).add_to(map_boston)
    
map_boston

### 5.) Leverage the Foursquare API to Retrieve the Top 10 Most Common Venues Within a 500 Meter Radius of Each College/University

This step includes creating a function that retrieves the venues and their categories within a 500 meter radius of each college/university, computing how frequently each category occurs, and creating a dataframe that displays the results.

In [13]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20200601'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)

print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


In [14]:
radius = 250 # radius in meters for this single test query around central Boston
LIMIT = 100 # overrides the earlier LIMIT = 30
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=YOUR_FOURSQUARE_CLIENT_ID&client_secret=YOUR_FOURSQUARE_CLIENT_SECRET&ll=42.3602534,-71.0582912&v=20200601&radius=250&limit=100'

In [15]:
results = requests.get(url).json()
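
As an alternative to formatting the query string by hand, the same search URL can be assembled from a dictionary of parameters. This is a sketch under stated assumptions: the helper `build_search_url` and the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical names chosen here, used so the credentials do not have to appear in the notebook itself.

```python
import os
from urllib.parse import urlencode


def build_search_url(lat, lng, radius=250, limit=100, version="20200601"):
    """Build a venues/search URL with properly encoded query parameters."""
    params = {
        # hypothetical environment variable names; empty string if unset
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", ""),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", ""),
        "ll": f"{lat},{lng}",
        "v": version,
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)


url = build_search_url(42.3602534, -71.0582912)
print(url.split("?")[0])
```

`urlencode` percent-encodes the comma in the `ll` value, which a server decodes back to the original coordinate pair.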


In [16]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
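
A quick way to see what `get_category_type` does is to call it on rows in the two layouts it handles. The sketch below repeats the function, with the exception narrowed to `KeyError`, so it runs on its own; the sample rows are made up and only mimic the shape of the venue records.

```python
def get_category_type(row):
    # same lookup as the function above, repeated so this snippet is self-contained
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# made-up rows shaped like the two layouts the function handles
search_row = {'name': 'Example Cafe', 'categories': [{'name': 'Café'}]}
explore_row = {'venue.name': 'Example Rink', 'venue.categories': [{'name': 'Hockey Arena'}]}
no_category_row = {'name': 'Unlabeled Place', 'categories': []}

for row in (search_row, explore_row, no_category_row):
    print(get_category_type(row))
```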

In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['School', 
                  'School Latitude', 
                  'School Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
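
The nested lookups in `getNearbyVenues` assume that every response contains at least one group and that every venue has at least one category. A more defensive version of the parsing step could look like the sketch below. The helper `extract_venues` is a hypothetical name, and the sample payload is made up to mirror the `response` / `groups` / `items` nesting used above; it is not real API output.

```python
def extract_venues(payload):
    """Flatten an explore-style payload into (name, lat, lng, category) tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # fall back to None instead of raising IndexError on an empty list
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# made-up payload for illustration only
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Example Cafe",
                   "location": {"lat": 42.35, "lng": -71.10},
                   "categories": [{"name": "Café"}]}},
        {"venue": {"name": "Unlabeled Place",
                   "location": {"lat": 42.36, "lng": -71.11},
                   "categories": []}},
    ]}]}
}
print(extract_venues(sample_payload))
```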

In [18]:
Bostonvenues = getNearbyVenues(names=topschools['Name'],
                                   latitudes=topschools['Latitude'],
                                   longitudes=topschools['Longitude']
                                  )
 

Boston University
University of Massachusetts-Boston
Boston College
Bunker Hill Community College
Suffolk University
Northeastern University
MCPHS University


In [19]:
b_onehot = pd.get_dummies(Bostonvenues[['Venue Category']], prefix="", prefix_sep="")

b_onehot['Name'] = Bostonvenues['School']


fixed_columns = [b_onehot.columns[-1]] + list(b_onehot.columns[:-1])
b_onehot =b_onehot[fixed_columns]

bgrouped = b_onehot.groupby('Name').mean().reset_index()
bgrouped


Unnamed: 0,Name,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Beer Garden,Belgian Restaurant,Bookstore,Bowling Alley,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Station,Bus Stop,Café,Caribbean Restaurant,Chinese Restaurant,Circus,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Theater,Concert Hall,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dessert Shop,Donut Shop,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Food Court,Food Truck,French Restaurant,Gastropub,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Historic Site,History Museum,Hockey Arena,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Library,Light Rail Station,Liquor Store,Lounge,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Monument / Landmark,Museum,Music Venue,Neighborhood,New American Restaurant,Optical Shop,Other Repair Shop,Park,Pedestrian Plaza,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pub,Record Shop,Restaurant,Roof Deck,Salad Place,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Souvenir Shop,Sports Bar,Steakhouse,Street Food Gathering,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Tour Provider,Tourist Information Center,Trail,Train Station,Video Game Store,Wine Shop,Yoga Studio
0,Boston College,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Boston University,0.060241,0.024096,0.0,0.0,0.0,0.012048,0.0,0.012048,0.012048,0.012048,0.024096,0.012048,0.0,0.0,0.0,0.012048,0.024096,0.0,0.012048,0.012048,0.0,0.0,0.024096,0.0,0.0,0.0,0.0,0.0,0.012048,0.048193,0.0,0.0,0.0,0.012048,0.0,0.012048,0.012048,0.024096,0.0,0.0,0.012048,0.012048,0.012048,0.0,0.0,0.0,0.0,0.012048,0.012048,0.024096,0.0,0.0,0.0,0.024096,0.036145,0.0,0.0,0.012048,0.012048,0.012048,0.012048,0.0,0.012048,0.012048,0.0,0.0,0.0,0.0,0.060241,0.0,0.0,0.012048,0.0,0.0,0.0,0.012048,0.012048,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.024096,0.0,0.0,0.024096,0.012048,0.012048,0.012048,0.0,0.012048,0.0,0.012048,0.012048,0.0,0.0,0.012048,0.0,0.0,0.012048,0.060241,0.0,0.0,0.024096,0.0,0.012048,0.012048,0.012048,0.012048,0.0,0.012048,0.012048,0.0,0.0,0.0
2,Bunker Hill Community College,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824
3,MCPHS University,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Northeastern University,0.02439,0.02439,0.02439,0.04878,0.02439,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.02439,0.0,0.0,0.073171,0.02439,0.02439,0.0,0.0,0.0,0.0,0.02439,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.02439,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.02439,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.04878,0.0,0.02439,0.0,0.0,0.02439,0.0,0.0,0.121951,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.02439,0.0,0.0,0.0,0.0,0.02439,0.0,0.02439,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Suffolk University,0.02,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.02,0.07,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.01,0.06,0.02,0.0,0.0,0.03,0.01,0.0,0.0,0.01,0.02,0.0,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.02,0.03,0.01,0.0,0.02,0.0,0.0,0.0,0.04,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.0,0.03,0.0,0.02,0.02,0.01,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.01
6,University of Massachusetts-Boston,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
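
Each value in the table above is the share of a school's retrieved venues that fall in a given category. For example, the seven non-zero entries of 0.142857 for Boston College correspond to seven venues, each in a different category (1/7). The toy example below reproduces the one-hot and group-mean steps on made-up data; the names `venues`, `onehot`, and `share` are my own, and `dtype=float` is passed so the averaging behaves the same across pandas versions.

```python
import pandas as pd

# made-up venues for illustration only
venues = pd.DataFrame({
    "School": ["School A", "School A", "School A", "School B"],
    "Venue Category": ["Café", "Café", "Gym", "Gym"],
})

# one column per category, 1.0 where the venue belongs to that category
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Name", venues["School"])

# the mean of the indicators per school is the share of venues in each category
share = onehot.groupby("Name").mean()
print(share)
```

Because every venue contributes exactly one indicator, each row of `share` sums to 1.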


In [20]:
num_top_venues = 10

for school in bgrouped['Name']:
    print("----"+school+"----")
    temp = bgrouped[bgrouped['Name'] == school].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Boston College----
                   venue  freq
0         Baseball Field  0.14
1      Convenience Store  0.14
2            Bus Station  0.14
3        College Theater  0.14
4               Bus Stop  0.14
5                   Café  0.14
6           Hockey Arena  0.14
7    American Restaurant  0.00
8       Pedestrian Plaza  0.00
9  Performing Arts Venue  0.00


----Boston University----
                  venue  freq
0   American Restaurant  0.06
1                Lounge  0.06
2            Sports Bar  0.06
3           Coffee Shop  0.05
4                 Hotel  0.04
5               Brewery  0.02
6  Gym / Fitness Center  0.02
7            Donut Shop  0.02
8         Hot Dog Joint  0.02
9           Art Gallery  0.02


----Bunker Hill Community College----
                 venue  freq
0          Coffee Shop  0.12
1  American Restaurant  0.06
2      Thai Restaurant  0.06
3        Shopping Mall  0.06
4          Pizza Place  0.06
5             Pharmacy  0.06
6            Pet Store  0.06
7     

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']


columns = ['School']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Cool Close Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Cool Close Venue'.format(ind+1))

# create a new dataframe
school_venues_sorted = pd.DataFrame(columns=columns)
school_venues_sorted['School'] = bgrouped['Name']

for ind in np.arange(bgrouped.shape[0]):
    school_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bgrouped.iloc[ind, :], num_top_venues)

school_venues_sorted

| | School | 1st Cool Close Venue | 2nd Cool Close Venue | 3rd Cool Close Venue | 4th Cool Close Venue | 5th Cool Close Venue | 6th Cool Close Venue | 7th Cool Close Venue | 8th Cool Close Venue | 9th Cool Close Venue | 10th Cool Close Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Boston College | College Theater | Café | Convenience Store | Bus Station | Hockey Arena | Bus Stop | Baseball Field | Fast Food Restaurant | Falafel Restaurant | Donut Shop |
| 1 | Boston University | American Restaurant | Lounge | Sports Bar | Coffee Shop | Hotel | Sushi Restaurant | Donut Shop | Gym / Fitness Center | Hot Dog Joint | Pub |
| 2 | Bunker Hill Community College | Coffee Shop | Yoga Studio | Shopping Mall | Bank | Convenience Store | Donut Shop | Gastropub | Grocery Store | Light Rail Station | Liquor Store |
| 3 | MCPHS University | Gym | Coffee Shop | Sandwich Place | Sushi Restaurant | American Restaurant | Bus Station | Italian Restaurant | Gastropub | Falafel Restaurant | Pizza Place |
| 4 | Northeastern University | Sandwich Place | Café | Middle Eastern Restaurant | Concert Hall | Theater | Pizza Place | Arts & Crafts Store | Grocery Store | Donut Shop | Ethiopian Restaurant |
| 5 | Suffolk University | Coffee Shop | Historic Site | New American Restaurant | Seafood Restaurant | Restaurant | Mediterranean Restaurant | Hotel | Falafel Restaurant | Sandwich Place | Salad Place |
| 6 | University of Massachusetts-Boston | Museum | Donut Shop | Fast Food Restaurant | Coffee Shop | Yoga Studio | Food Truck | Concert Hall | Convenience Store | Cosmetics Shop | Deli / Bodega |
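
The column labels above are built by indexing into `['st', 'nd', 'rd']` and falling back to `th`. That works for ranks 1 through 10, but it would label rank 21 as "21th" if `num_top_venues` were raised. A small helper, sketched here under the hypothetical name `ordinal`, handles the general case:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# same column labels as in the table above
columns = ["School"] + [f"{ordinal(i)} Cool Close Venue" for i in range(1, 11)]
print(columns[1], "|", columns[-1])
```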
