# Capstone Project - The Battle of Neighborhoods

This file and its associated files make up my submission for the final peer-reviewed assignment of the Coursera Applied Data Science Capstone, the final module of the IBM Data Science Professional Certificate programme.

For reference I include the original definition for each part of the assignment.

### Part 1 [Week 1]

Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.

This submission will eventually become your Introduction / Business Problem section in your final report. So I recommend that you push the report (having your Introduction/Business Problem section only for now) to your Github repository and submit a link to it.

### Part 2 [Week 1]

Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.

This submission will eventually become your Data section in your final report. So I recommend that you push the report (having your Data section) to your Github repository and submit a link to it.

### Part 3 [Week 2]

In this week, you will continue working on your capstone project. Please remember by the end of this week, you will need to submit the following:

- A full report consisting of all of the following components (15 marks):
  - Introduction where you discuss the business problem and who would be interested in this project.
  - Data where you describe the data that will be used to solve the problem and the source of the data.
  - Methodology section which represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, and what machine learnings were used and why.
  - Results section where you discuss the results.
  - Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.
  - Conclusion section where you conclude the report.
- A link to your Notebook on your Github repository pushed showing your code. (15 marks).
- Your choice of a presentation or blogpost. (10 marks)

### Section 1: Introduction

In this section I define the idea I have chosen and explain how I use the Foursquare location data to address the imagined business opportunity.

### Background

There are hundreds, maybe even thousands, of travel sites on the Internet, including Foursquare, that will tell you all about places to go, things to see, restaurants to eat at, bars to drink in, nightclubs to party the night away in, and where to go the next morning for breakfast and a strong coffee. The problem with these sites is that they are one-dimensional. If you want to find out all this information about a city you plan to visit next month, you have to do the hard work yourself. Also, just because a venue is the hottest place to go for a night out does not mean that the unwitting tourist should ramble in unprepared. The areas surrounding the venue might be riddled with crime, including muggings, car theft and assault. Approach the venue from any direction other than the north and you could be putting your life in danger. This is where my idea comes in.

Imagine the following scenario:

- You like to plan ahead and always review your options and make your choices about where you will visit and eat up front before you travel.
- You are flying to Chicago for a Data Science Conference.
- You arrive in Chicago the day the conference starts, but you've managed to convince your boss to delay your return by a few days, giving you time to explore.
- But you know no one in Chicago who could show you around the top sites and bring you to the best restaurants.
- Also, the last time you went to a conference you were mugged and had your passport, money and credit cards stolen, so you're now nervous of going somewhere without first researching the venue and the surrounding area.
- The conference is next week and you don't have time to do all the research you'd like.

What do you do ... ?

## Project

### Introduction

#### Description & Discussion of the Background

My Capstone Project uses New York City to show that, driven by venue and location data from Foursquare, it is possible to present potential house renters and buyers with a list of attractive areas to rent or buy in. Graphics show the location and availability of attractive social amenities such as schools and restaurants.

New York City's demographics show that it is a large and ethnically diverse metropolis. It is the largest city in the United States and the 10th largest city in the world by population. New York City has a long history of international immigration. It was home to nearly 8.5 million people in 2014, accounting for over 40% of the population of New York State. Over the last decade the city has been growing faster than the region, and one of the many reasons for this is that New York City has been a major point of entry for immigration: the term "melting pot" was coined to describe densely populated immigrant neighborhoods on the Lower East Side.

With the influx of immigrants comes an increased need to rent or buy property in the city.

This project focuses on listing and visualizing the major parts of New York City that have essential social amenities, such as schools and hospitals, which families take into consideration when moving into a new neighborhood.

### Data Description

To achieve this, the project uses the data sources below:

- New York City data listing the boroughs and neighborhoods along with their latitude and longitude.
  - Data source: https://cocl.us/new_york_dataset
  - Description: this data set contains the required information, and we will use it to explore the various neighborhoods of New York City.
- Social amenities in each neighborhood of New York City.
  - Data source: Foursquare API
  - Description: using this API we will get all the venues in each neighborhood. We can then filter these venues to get the type and number of social amenities, such as schools and hospitals, in each borough.
- Geospatial data.
  - Data source: https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm
  - Description: using this geospatial data we will get the New York borough boundaries, which will help us visualize a choropleth map.
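
As a minimal sketch of how the first data source might be prepared: assuming the downloaded file is a GeoJSON FeatureCollection whose features carry `borough` and `name` properties and a point geometry (an assumption about the layout, not something stated here), the neighborhoods could be flattened into one row each. The function name and the sample values are made up for illustration.

```python
def neighborhoods_from_geojson(geojson):
    """Flatten a GeoJSON FeatureCollection into one row per neighborhood."""
    rows = []
    for feature in geojson.get("features", []):
        props = feature["properties"]
        # GeoJSON points store coordinates as [longitude, latitude].
        longitude, latitude = feature["geometry"]["coordinates"][:2]
        rows.append({
            "Borough": props["borough"],        # assumed property name
            "Neighborhood": props["name"],      # assumed property name
            "Latitude": latitude,
            "Longitude": longitude,
        })
    return rows


# Tiny made-up sample in the assumed layout (coordinates are illustrative).
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"borough": "Bronx", "name": "Wakefield"},
         "geometry": {"type": "Point", "coordinates": [-73.85, 40.89]}},
        {"type": "Feature",
         "properties": {"borough": "Queens", "name": "Astoria"},
         "geometry": {"type": "Point", "coordinates": [-73.92, 40.77]}},
    ],
}

rows = neighborhoods_from_geojson(sample)
print(rows[0])
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(rows)` if a pandas table is preferred.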

### Approach

- Use an open-source New York City dataset from https://cocl.us/new_york_dataset
- Use the Foursquare API to explore the boroughs and segment them.
- Use the Foursquare API to query venues for each neighborhood.
- Filter the Foursquare API results by social amenities such as hospitals and schools.
- Rate each neighborhood based on the number of amenities.
- Use the Python folium library to visualize geographic details of each neighborhood based on its rating.
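
The query, filter and rate steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the legacy Foursquare v2 `venues/explore` endpoint and its usual query parameters, the category names `School` and `Hospital` are placeholders that may not match Foursquare's real labels, and the helper names and sample venues are hypothetical. No network request is made; the code only builds the URL and rates made-up results.

```python
from collections import Counter
from urllib.parse import urlencode

# Assumption: the legacy Foursquare v2 "venues/explore" endpoint.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

# Hypothetical category names; real Foursquare category labels may differ.
AMENITY_CATEGORIES = {"School", "Hospital"}


def explore_url(lat, lng, client_id, client_secret,
                version="20180605", radius=500, limit=100):
    """Build the request URL for venues around one neighborhood centre."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date (illustrative default)
        "ll": f"{lat},{lng}",  # latitude,longitude of the neighborhood
        "radius": radius,      # search radius in metres
        "limit": limit,        # maximum number of venues to return
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def rate_neighborhoods(venues, amenities=AMENITY_CATEGORIES):
    """Count amenity venues per neighborhood, highest count first."""
    counts = Counter(name for name, category in venues if category in amenities)
    return counts.most_common()


# Made-up (neighborhood, category) pairs standing in for parsed API results.
venues = [
    ("Chelsea", "School"),
    ("Chelsea", "Hospital"),
    ("Chelsea", "Pizza Place"),
    ("Astoria", "School"),
    ("Astoria", "Bar"),
]

url = explore_url(40.7, -74.0, "YOUR_ID", "YOUR_SECRET")
ratings = rate_neighborhoods(venues)
print(ratings)
```

Note that a neighborhood with no matching amenities does not appear in the rating at all; if every neighborhood must be listed, the counts would need to be merged with the full neighborhood list.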

### Analysis

#### Required Libraries

- pandas and numpy for handling data.
- requests module for using the Foursquare API.
- geopy to get the coordinates of New York City.
- folium to visualize the results on a map.

In [None]:
# Install the mapping and geocoding packages (notebook shell commands).
!conda install -c conda-forge folium=0.5.0 --yes
print('folium installed')
!conda install -c conda-forge geocoder --yes
print('geocoder installed')

import os
import sys

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import geocoder
import folium # map rendering library
from geopy.geocoders import Nominatim
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline

# Show every column and row when displaying DataFrames
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

print('Libraries imported.')

In [None]:
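def geo_location(address):
    """Return the (latitude, longitude) of an address using Nominatim."""
    geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(address)
    # geocode() returns None when the address cannot be resolved
    if location is None:
        raise ValueError(f"Could not geocode address: {address!r}")
    return location.latitude, location.longitude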