## Scenario

Food trucks have grown steadily in popularity since the concept became known to the public. The main aim of a food truck is to take the food to the customers, and the business plan succeeds only if the truck targets the right customers. Unlike traditional restaurants and fast-food chains, a food truck cooks the food to order and serves it fresh, so the turnaround time for orders must be short to keep customers satisfied. The following business problem is based on this scenario.

### 1. Business Problem

A client of mine based in Houston, Texas, wants to start a food truck selling Middle Eastern cuisine, mainly shawarma, wraps, and salads with hummus. These foods are quick to cook and serve. The client's main goal is to target customers such as students and business employees, as well as construction sites, public parks, and other places commonly visited by the public. The challenge is to find the top three neighborhoods among the most popular neighborhoods in Houston, Texas, so that my client can kick-start the business there and estimate the profits from those three neighborhoods over the next two to three years.

This business problem mainly concentrates on finding the top three neighborhoods to start the food truck business.

### 2. Data

##### Sources: 

1. List of super neighborhoods in Houston, TX: https://en.wikipedia.org/wiki/List_of_Houston_neighborhoods
2. Foursquare data to find the popular venues in each neighborhood

##### How will the data be used to answer the business needs?
The data above will be used to explore the venues in each neighborhood and to identify promising target locations.

1. Use geopy to geocode the super neighborhoods of Houston and Foursquare data to find the top 10 venues in each one, then cluster the neighborhoods into groups
2. Use Wikipedia data to get the list of neighborhoods
3. If the data proves insufficient, add data from open data sources

By extracting the venues in each neighborhood, we can identify the most frequently visited venues, which indicate areas with high customer traffic. Using the Foursquare data together with Houston's neighborhood data, we can recommend the top three neighborhoods by applying machine learning techniques such as clustering, and visualize the results on a graph or a map.
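The clustering step planned above can be sketched in a few lines. This is a minimal illustration, assuming the venue data has been reduced to one (neighborhood, venue category) pair per row; the neighborhood names, categories, the `Venue Category` column name, and the choice of `k=2` are all hypothetical, not Foursquare results. It one-hot encodes the categories with `pd.get_dummies`, averages them per neighborhood, and groups neighborhoods with scikit-learn's `KMeans`.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical venue records: one row per (neighborhood, venue category).
# In the project these would come from Foursquare; here they are made up.
venues = pd.DataFrame({
    "Neighborhood": ["Midtown", "Midtown", "Midtown",
                     "Greater Greenspoint", "Greater Greenspoint",
                     "Montrose", "Montrose", "Montrose"],
    "Venue Category": ["Coffee Shop", "Park", "Office",
                       "Fast Food Restaurant", "Gas Station",
                       "Coffee Shop", "Park", "Bar"],
})

# One-hot encode the categories, then average per neighborhood so each row
# holds the share of each venue category in that neighborhood.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
freq = onehot.groupby("Neighborhood").mean()

# Group neighborhoods with similar venue mixes; k=2 is an arbitrary choice here.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(freq)
freq["Cluster"] = kmeans.labels_
print(freq["Cluster"])
```

Averaging the one-hot columns turns each neighborhood into a vector of category shares, so a neighborhood with many venues does not dominate purely by count. With real data, `k` would still need to be chosen, for example by comparing several values.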

```python
import numpy as np  # library to handle data in a vectorized manner

import pandas as pd  # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json  # library to handle JSON files

# !conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

import requests  # library to handle requests
from pandas import json_normalize  # transform JSON into a pandas DataFrame (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# import k-means for the clustering stage
from sklearn.cluster import KMeans

# !conda install -c conda-forge folium=0.5.0 --yes
import folium  # map rendering library

# !conda install -c conda-forge lxml --yes
# Library for HTML handling; pd.read_html also uses lxml as its default parser
from lxml import html

print('Libraries imported.')
```

```
Libraries imported.
```


##### Storing the neighborhood data in a DataFrame

```python
url = 'https://en.wikipedia.org/wiki/List_of_Houston_neighborhoods'

# read_html returns every table on the page; the first one is the neighborhood list
tables = pd.read_html(url, header=0)
df = tables[0]

# Data preprocessing: keep only the name and location columns
df_Houston = df.drop(['#', 'Approximate boundaries'], axis=1)
df_Houston = df_Houston.rename(columns={"Name": "Neighborhood",
                                        "Location relative to Downtown Houston": "Borough"})

# Drop alternate names written as "Name (alias)" or "Name/alias" and trim whitespace.
# regex=False makes "(" a literal character rather than a regex group.
df_Houston['Neighborhood'] = (df_Houston['Neighborhood']
                              .str.replace("(", "/", regex=False)
                              .str.split("/").str[0]
                              .str.strip())

# Rows 45 and 19 are fixed individually since the data is static; this does not scale to changing data.
# .loc avoids chained assignment, which may fail to update df_Houston.
df_Houston.loc[45, 'Neighborhood'] = df_Houston.loc[45, 'Neighborhood'].replace("-", "/").split("/")[0].strip()
df_Houston.loc[19, 'Neighborhood'] = df_Houston.loc[19, 'Neighborhood'].replace("-", " ")

df_Houston.head()
```

|   | Neighborhood | Borough |
|---|---|---|
| 0 | Willowbrook | Northwest |
| 1 | Greater Greenspoint | North |
| 2 | Carverdale | Northwest |
| 3 | Fairbanks | Northwest |
| 4 | Greater Inwood | Northwest |
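The name-cleaning chain used above (`str.replace` followed by `str.split(...).str[0]`) is easiest to follow on a few strings. The sample names below are hypothetical, not rows from the Wikipedia table, and `clean_neighborhood_names` is an illustrative helper. Passing `regex=False` makes pandas treat `(` as a literal character regardless of the version's default for `regex`, and `str.strip()` removes the trailing space the split leaves behind.

```python
import pandas as pd


def clean_neighborhood_names(names: pd.Series) -> pd.Series:
    """Keep only the primary name, dropping "(alias)" or "/alias" suffixes."""
    # Turn "(" into "/" so both suffix styles can be cut with a single split.
    unified = names.str.replace("(", "/", regex=False)
    # Keep the text before the first "/" and trim surrounding whitespace.
    return unified.str.split("/").str[0].str.strip()


# Hypothetical sample names, for illustration only.
sample = pd.Series(["Greater Uptown (Galleria)", "Midtown/Fourth Ward", "Willowbrook"])
print(clean_neighborhood_names(sample).tolist())
# ['Greater Uptown', 'Midtown', 'Willowbrook']
```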


##### Using geopy to retrieve the latitude and longitude of Houston

```python
address = 'Houston, TX'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print("Geographical co-ordinates of Houston, TX are (lat):{} and (long): {}".format(latitude, longitude))
```

```
Geographical co-ordinates of Houston, TX are (lat):29.7589382 and (long): -95.3676974
```


The following script retrieves the latitude and longitude of each neighborhood with the geolocator.

```python
# Lists to collect the coordinates of each neighborhood
lat = []
lng = []

# Loop through all neighborhoods in Houston
for adr in df_Houston['Neighborhood']:
    # Use the geolocator to get the coordinates of the neighborhood.
    # Note: a bare name can match a same-named place outside Houston.
    loc = geolocator.geocode(adr)
    if loc is None:
        # No match found: record missing values as real NaNs
        lat.append(np.nan)
        lng.append(np.nan)
    else:
        # Append the coordinates to the lists
        lat.append(loc.latitude)
        lng.append(loc.longitude)

# Map the coordinate lists to the DataFrame
df_Houston['lat'] = lat
df_Houston['lng'] = lng
```
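Geocoding a bare neighborhood name leaves Nominatim free to return a same-named place anywhere in the world. A minimal sketch of a more targeted lookup, assuming that appending `", Houston, TX"` to each name is enough to disambiguate it: `geocode_neighborhood`, `FakeLocation`, and `fake_geocode` are hypothetical names, and the fake geocoder exists only so the idea runs offline.

```python
import math
from collections import namedtuple
from typing import Any, Callable, Tuple

# Hypothetical stand-in for geopy's Location object (only the two fields used here).
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])


def geocode_neighborhood(name: str,
                         geocode: Callable[[str], Any],
                         suffix: str = ", Houston, TX") -> Tuple[float, float]:
    """Return (lat, lng) for a neighborhood, or (nan, nan) when nothing matches."""
    # Qualify the bare name with city and state to avoid same-named places elsewhere.
    loc = geocode(name.strip() + suffix)
    if loc is None:
        # A float NaN keeps the lat/lng columns numeric.
        return math.nan, math.nan
    return loc.latitude, loc.longitude


def fake_geocode(query: str):
    """Offline stand-in for geolocator.geocode, for illustration only."""
    known = {"Greater Greenspoint, Houston, TX": FakeLocation(29.9527, -95.4053)}
    return known.get(query)


print(geocode_neighborhood("Greater Greenspoint", fake_geocode))  # (29.9527, -95.4053)
print(geocode_neighborhood("Unknown Place", fake_geocode))        # (nan, nan)
```

With a live `geolocator`, the fake could be replaced by `RateLimiter(geolocator.geocode, min_delay_seconds=1)` from `geopy.extra.rate_limiter`, which spaces out requests to respect Nominatim's usage policy of at most one request per second.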

##### The following DataFrame lists each neighborhood with its borough, latitude, and longitude, which will be used to cluster the venues by neighborhood

```python
df_Houston.head()
```

|   | Neighborhood | Borough | lat | lng |
|---|---|---|---|---|
| 0 | Willowbrook | Northwest | 33.9188 | -118.234 |
| 1 | Greater Greenspoint | North | 29.9527 | -95.4053 |
| 2 | Carverdale | Northwest | 29.8487 | -95.5395 |
| 3 | Fairbanks | Northwest | 64.8378 | -147.717 |
| 4 | Greater Inwood | Northwest | 51.4626 | -0.361756 |
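In the `head()` output above, Willowbrook, Fairbanks, and Greater Inwood have coordinates far outside the Houston area (for example, latitude 64.8 for Fairbanks versus about 29.76 for Houston), so these bare-name lookups appear to have matched same-named places elsewhere. One way to catch such rows is to compute each point's great-circle (haversine) distance from the Houston coordinates retrieved earlier and flag anything missing or beyond a cut-off. This sketch assumes a 60 km cut-off, an arbitrary choice rather than the city boundary; `flag_far_neighborhoods` and the `km_from_center`/`suspect` columns are hypothetical names.

```python
import numpy as np
import pandas as pd

HOUSTON_CENTER = (29.7589382, -95.3676974)  # Houston, TX as geocoded earlier


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km; works element-wise on NumPy arrays and Series."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlmb = np.radians(lng2) - np.radians(lng1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlmb / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))


def flag_far_neighborhoods(df, center=HOUSTON_CENTER, max_km=60.0):
    """Add a distance column and mark rows that are missing or too far from downtown."""
    out = df.copy()
    # Coerce placeholder strings such as 'NAN' to real NaN values.
    lat = pd.to_numeric(out["lat"], errors="coerce")
    lng = pd.to_numeric(out["lng"], errors="coerce")
    out["km_from_center"] = haversine_km(center[0], center[1], lat, lng)
    # max_km is an assumed cut-off, not a city-limit definition.
    out["suspect"] = out["km_from_center"].isna() | (out["km_from_center"] > max_km)
    return out


# The five rows shown in the head() output above.
sample = pd.DataFrame({
    "Neighborhood": ["Willowbrook", "Greater Greenspoint", "Carverdale",
                     "Fairbanks", "Greater Inwood"],
    "lat": [33.9188, 29.9527, 29.8487, 64.8378, 51.4626],
    "lng": [-118.234, -95.4053, -95.5395, -147.717, -0.361756],
})
checked = flag_far_neighborhoods(sample)
print(checked.loc[checked["suspect"], "Neighborhood"].tolist())
```

Flagged rows could then be re-geocoded with a city-qualified query or dropped before the venue search.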
