# Finding your next apartment in another city: IBM Applied Data Science Capstone (Coursera)
Nam Le

June 12, 2020

### 1. Introduction

##### 1.1 Background
I currently live in Dupont Circle in Washington, DC. I enjoy the venues in the neighborhood, specifically gyms/fitness centers, restaurants, and local parks. Recently, I was offered a job in Manhattan, NY. If I take the job, my main concern is finding a neighborhood comparable to my current one. What better way to end this Coursera journey than to use the skills I have learned to find neighborhoods that will not cause a huge culture shock?

New York City is the center of financial markets, business, and global culture in the US. Each of its neighborhoods offers different types of venues and a different experience of city living. Dupont Circle, my current neighborhood, sits at the center of Washington, DC, which is also a diverse city with a lot to offer. I know the move will be a big change to my daily environment; however, I want to use what I have learned throughout the certification track to make the transition as seamless as possible.

##### 1.2 Problem
The challenge is to create a process that helps me search for an apartment in NYC with characteristics and amenities similar to my current situation. Specifically, I would like to find an apartment that meets the following conditions:

- A two- to three-bedroom apartment with monthly rent not exceeding $6,000

- Amenities and venues similar to my current location (e.g., gyms, food, parks)

- Close proximity to my future office in Chelsea

Thus, the high-level business problem is: given unit specifications, nearby venues, rent costs, and proximity to the office, can a clustering and segmentation model recommend an apartment unit that satisfies my expectations?
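The three conditions above can be expressed as a simple filter over listings. The sketch below is a minimal illustration rather than the notebook's code. Several pieces are assumptions: the `Listing` fields, the 2 km cutoff standing in for "close proximity," and the approximate Chelsea office coordinates. Distance is computed with the haversine (great-circle) formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Listing:
    # Hypothetical record shape for one scraped listing.
    address: str
    bedrooms: int
    rent: float  # monthly rent in USD
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Assumed, approximate coordinates for an office in Chelsea (illustrative only).
OFFICE_LAT, OFFICE_LON = 40.7465, -74.0014


def meets_criteria(listing, max_rent=6000, max_km=2.0):
    """True if the listing has 2-3 bedrooms, rent <= max_rent, and lies within max_km of the office."""
    return (
        2 <= listing.bedrooms <= 3
        and listing.rent <= max_rent
        and haversine_km(listing.lat, listing.lon, OFFICE_LAT, OFFICE_LON) <= max_km
    )
```

Raising `max_km` trades a longer commute for more candidate listings. The venue-similarity condition is handled separately by the clustering step.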
    

##### 1.3 Target Audience
Overall, I believe this study can provide a base model for anyone looking to move to another metropolitan area without a complete culture shock. Moving to NYC from far away, in particular, can bring many complications regarding housing. This study combines clustering and segmentation with exploratory data analysis and visualization to recommend a specific unit and neighborhood.


### Data
##### To solve the problem, we will need the following data:
- Manhattan neighborhoods with their geographic coordinates (latitude and longitude): https://cocl.us/new_york_dataset
- Listed residences for rent in Manhattan with descriptions (bedrooms, price, location, address)
    - http://www.rentmanhattan.com/index.cfm?page=search&state=results 
    - https://www.nestpick.com/search?city=new-york&page=1&order=relevance&district=manhattan&gclid=CjwKCAiAjNjgBRAgEiwAGLlf2hkP3A-cPxjZYkURqQEswQK2jKQEpv_MvKcrIhRWRzNkc_r-fGi0lxoCA7cQAvD_BwE&type=apartment&display=list 
    - https://www.realtor.com/apartments/Manhattan_NY
- Venues and amenities within Manhattan 

##### Sources of data and methods to extract them:

The list of Manhattan neighborhoods comes from the NYC neighborhood clustering exercise earlier in the course, which uses the 2014 New York City Neighborhood Names dataset provided by NYU. The rental listings were extracted from the sources above into CSV files and uploaded to GitHub. The venue and amenity data is provided through the Foursquare API.
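The NYU neighborhood file is GeoJSON. Below is a minimal sketch of pulling the Manhattan rows out of it. It assumes each feature carries `properties.borough`, `properties.name`, and a point `geometry.coordinates` in GeoJSON's [longitude, latitude] order, which is the layout used in the course lab. The two-feature sample is made up for illustration.

```python
import json


def manhattan_neighborhoods(geojson):
    """Return (neighborhood, latitude, longitude) tuples for Manhattan features."""
    rows = []
    for feature in geojson["features"]:
        props = feature["properties"]
        if props.get("borough") != "Manhattan":
            continue
        # GeoJSON points are stored as [longitude, latitude].
        lon, lat = feature["geometry"]["coordinates"][:2]
        rows.append((props["name"], lat, lon))
    return rows


# Illustrative sample; in practice the downloaded file would be read with json.load().
sample = json.loads("""
{"type": "FeatureCollection", "features": [
  {"properties": {"borough": "Manhattan", "name": "Chelsea"},
   "geometry": {"type": "Point", "coordinates": [-74.0, 40.74]}},
  {"properties": {"borough": "Bronx", "name": "Wakefield"},
   "geometry": {"type": "Point", "coordinates": [-73.85, 40.89]}}
]}
""")
print(manhattan_neighborhoods(sample))  # [('Chelsea', 40.74, -74.0)]
```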

### Methodology
The first step of the data analysis is to fetch the list of neighborhoods in Washington, DC. I then used the Foursquare API and the folium package to display a map of Dupont Circle along with its top 10 nearby venues.
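The venue lookup can be sketched as two small helpers. The first builds a request URL for Foursquare's legacy v2 `venues/explore` endpoint, the version used in the course labs, which may no longer be available. The second reads category names out of that endpoint's JSON response. The credentials, approximate Dupont Circle coordinates, 500 m radius, and version date are placeholders. No request is sent here.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lon,
                      radius=500, limit=10, version="20180605"):
    """Build a GET URL asking for up to `limit` venues within `radius` metres of (lat, lon)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date
        "ll": f"{lat},{lon}",  # search centre
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def venue_categories(response_json, n=10):
    """Return the primary category name of the first n venues (None if uncategorised)."""
    items = response_json["response"]["groups"][0]["items"]
    names = []
    for item in items[:n]:
        categories = item["venue"].get("categories", [])
        names.append(categories[0]["name"] if categories else None)
    return names


# Placeholder credentials and approximate Dupont Circle coordinates.
# The real call would be something like requests.get(url).json().
url = build_explore_url("CLIENT_ID", "CLIENT_SECRET", 38.9096, -77.0434)
```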
![image.png](attachment:image.png)

![image.png](attachment:image.png)

For the Manhattan data, I used web scraping to extract the apartment listings from the links above, using the Python requests library and the BeautifulSoup package, and used the Python Geocoder package to obtain location data (latitude and longitude). I then loaded the results into a pandas DataFrame and visualized the neighborhoods with the Folium package.
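The scraping step depends on each site's markup, which is not reproduced here. As a dependency-free sketch, the parser below uses the standard library's `html.parser` in place of BeautifulSoup. It assumes hypothetical markup in which each listing is a `div` with class `listing` containing elements with classes `address`, `price`, and `beds`. Geocoding is left out.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect address/price/beds text from hypothetical listing markup."""

    FIELDS = {"address", "price", "beds"}

    def __init__(self):
        super().__init__()
        self.rows = []        # one dict per listing
        self._current = None  # dict for the listing being parsed
        self._field = None    # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "listing" in classes:
            self._current = {}
            self.rows.append(self._current)
        elif self._current is not None:
            field = next((c for c in classes if c in self.FIELDS), None)
            if field:
                self._field = field

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field and data.strip():
            self._current[self._field] = data.strip()


def to_record(row):
    """Convert scraped strings to typed values, e.g. '$3,900' -> 3900.0."""
    return {
        "address": row["address"],
        "rent": float(row["price"].replace("$", "").replace(",", "")),
        "bedrooms": int(row["beds"]),
    }


# Made-up markup for illustration only.
html = """
<div class="listing"><span class="address">100 Example St</span>
  <span class="price">$3,900</span><span class="beds">2</span></div>
<div class="listing"><span class="address">200 Sample Ave</span>
  <span class="price">$5,400</span><span class="beds">3</span></div>
"""
parser = ListingParser()
parser.feed(html)
parser.close()
records = [to_record(r) for r in parser.rows]
```

With BeautifulSoup the same extraction would typically be a `find_all` over the listing containers. The parsing-then-typing split stays the same either way.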

![image.png](attachment:image.png)

![image.png](attachment:image.png)

Next, we perform clustering and segmentation analysis using k-means clustering. At a high level, k-means partitions the data into a user-chosen number of clusters, k. It assigns each data point to the nearest centroid (by Euclidean distance), moves each centroid to the mean of its assigned points, and repeats until the assignments stop changing, which locally minimizes the within-cluster sum of squared distances. Because the listings have no predefined labels, this unsupervised machine learning algorithm suits the study. We will cluster neighborhoods into 5 separate clusters based on rent price. From there, I can narrow down specific units within the different clusters. Ultimately, this will help me decide on a location and unit based on the given specifications.
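To make the algorithm concrete, here is a minimal, dependency-free sketch of Lloyd's algorithm (the standard k-means procedure) on one-dimensional rent values. It is not the notebook's code. In practice a library implementation such as scikit-learn's `KMeans(n_clusters=5)` would normally be used. The seed, iteration cap, and sample rents are illustrative assumptions.

```python
import random


def kmeans_1d(values, k, iters=100, seed=0):
    """Lloyd's algorithm on a list of numbers; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # start from k randomly chosen data points
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid (ties go to the lower index).
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return centroids, labels


# Illustrative monthly rents (USD), not data from this study.
rents = [1000, 1100, 1200, 5000, 5100, 5200]
centroids, labels = kmeans_1d(rents, k=2)
```

Cluster ids are arbitrary: the same grouping can come back with labels swapped depending on initialisation. This is why the clusters are described by their contents rather than their numbers.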

### Results
The results from k-means clustering show our categorization of neighborhoods into 5 clusters based on rent price:

Cluster 0

![image.png](attachment:image.png)

Cluster 1

![image.png](attachment:image.png)

Cluster 2

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

Cluster 3

![image.png](attachment:image.png)

Cluster 4

![image.png](attachment:image.png)

The clusters are visualized in the map below using the following colors:

- Cluster 0 is orange
- Cluster 1 is purple
- Cluster 2 is blue
- Cluster 3 is green
- Cluster 4 is yellow (not visible in the map below; it lies at the northern tip of Manhattan)


![image.png](attachment:image.png)
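To read the map alongside the numbers, the cluster-to-color mapping listed above can be kept in a dictionary and each cluster's rent range summarized. This is a minimal sketch. `rents` and `labels` are assumed to be parallel lists, such as the output of a k-means step. The sample values are illustrative, not the study's results.

```python
# Colors used for each cluster on the map.
CLUSTER_COLORS = {0: "orange", 1: "purple", 2: "blue", 3: "green", 4: "yellow"}


def summarize_clusters(rents, labels):
    """Return {cluster: (count, min_rent, mean_rent, max_rent)}, ordered by cluster id."""
    groups = {}
    for rent, label in zip(rents, labels):
        groups.setdefault(label, []).append(rent)
    return {
        label: (len(vals), min(vals), sum(vals) / len(vals), max(vals))
        for label, vals in sorted(groups.items())
    }


summary = summarize_clusters([4000, 4500, 6000], [2, 2, 3])
for label, (count, low, mean, high) in summary.items():
    print(f"Cluster {label} ({CLUSTER_COLORS[label]}): {count} listing(s), "
          f"${low:,}-${high:,}, mean ${mean:,.0f}")
```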

##### Apartment Selection
The last step of the analysis is to narrow the search down to specific apartments. After comparing prices and locations against my requirements, I narrowed the choice down to the following two apartments:

Apartment 1: West 23rd St, Flatiron. The rent is $4,250, well within my specified range. My future office in Chelsea is within walking distance of this location. Nearby venues for this apartment match Cluster 2. The top venues in this neighborhood that match my interests are gyms, Mediterranean restaurants, Italian food, and parks.

![image.png](attachment:image.png)

Apartment 2: 52 Spring St, Little Italy. The rent is $6,000, which is at the cap of my specified range. This apartment is not close to the future office, so I would have to take the subway or a taxi. Nearby venues for this apartment match Cluster 3. The top venues in this neighborhood that match my interests are food places (e.g., Italian restaurants, bubble tea), which is convenient and suits my interest in cuisine.

![image.png](attachment:image.png)

I feel that Cluster 2's mix of venues more closely resembles my current environment in Dupont Circle. Because I weigh proximity to parks, gyms, and food heavily in my selection, Apartment 1 in the Flatiron District is the better choice.

### Discussion
As the results above show, the k-means clustering of rent prices allowed me to narrow my search down to two clusters. I evaluated the two apartments using the nearby venue data, their location in Manhattan, and their rent. This is where a data scientist's judgment comes in: the data provides sufficient mapping and results for the required criteria, but the final step is to use personal judgment to weigh these factors. On that basis, West 23rd St in the Flatiron District, whose accommodations and amenities resemble those of my current residence in DC, is the recommended unit and neighborhood for my transition. As a newly aspiring data scientist, rather than relying on existing open-source tools, I wanted to see if I could build my own system to help me, and I did.

### Conclusion
In this study, I analyzed the relationships between Manhattan neighborhoods, the venues around them, and rent costs to recommend a future living arrangement in a different city that is similar to my current residence. Throughout the study, I leveraged web data, Python visualization libraries such as Folium, and the Foursquare API. Although the project used only a few simple variables, it provides a straightforward method for creating a recommendation. Based on this approach, I believe the unit on W 23rd St in the Flatiron District is the most suitable for me.

However, this project used only a few variables for analysis, which resulted in a simplified algorithm and approach. Future iterations of this study could add variables such as distance to subway stations and population density to create a more powerful recommender system using Python and the Foursquare API.

Overall, the content and labs in the Capstone course have been extremely useful for learning data science. I have created a process that anyone looking to move from one metropolitan area to another could replicate.