# FOOD AND BEVERAGE BUSINESS OPPORTUNITY SEEKER

This is an independent project centered around data science application in food and beverage industry. 

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 4>

1. <a href="#item1">Introduction</a>
    
2. <a href="#item2">Data</a>

</font>
</div>

## 1. Introduction

There is a businessman lives in the capital city of Indonesia, Jakarta. He wants to start a food and Beverage (FnB) business in the city where he is living, but he doesn't know how to start. Without any experience in this kind of business, he comes to us and tells us about his plan. This new businessman has two critical questions here, they are:
1. What kind of food and beverage business that he should start with?
2. Where in the city should he start this business?

He hopes that we can help him to answer these questions to lead him make a right decision. As a Data Scientist, you are challenged to support him by giving him best recommendation in order to build a sucessful FnB business.

## 2. Data

In order to answer those two questions, as a data scientist, we need to have data related to the food and beverage market in Jakarta. But first, we need to have list of neighborhoods in Jakarta to find the list of existing foods venues near each neighborhood.

In Indonesia, we name neighborhood as "Kecamatan". We will use list of neighborhood in the city of Jakarta from wikipedia page (<a href="https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Daerah_Khusus_Ibukota_Jakarta">here</a>). The coordinate of the neighborhood will be used to find nearest food venues. Foursquare has public API that can provide us the data.  Foursquare has some account tiers for developers. Each tier has different set of available features. You can read the full documentation <a href="https://developer.foursquare.com/comparison">here</a> and <a href="https://developer.foursquare.com/docs">here</a>. In this project we are using personal tier. You can see the available endpoint for this tier <a href="https://developer.foursquare.com/docs/api/endpoints">here</a>. In this tier, we can get the data of venues near the specific location with specific category. We can use this endpoint to find existing restaurant venues in Jakarta. The data should contain:
+ longitude
+ latitude
+ venue category

We will do some analysis on this data to answer the these two questions. We can use the data in determining what kind of FnB business that we want to begin with and in which neighborhood we should start our new business.

## 3. Methodology

There are steps that we need to do to answer the business problem. Here they are:
1. Web scrapping Wikipedia Page to get list of neighborhood in Jakarta
2. Get longitude and latitude of each neighborhood
3. Get list of food venues near each neighborhood
4. Analyze these data using descriptive statistics
5. Cluster all these food venues using DBscan algorithm
6. Analyze each cluster using bar chart of top nearby venue types vs frequency
7. Visualize the clusters in map

In [5]:
import pandas as pd
import numpy as np
import requests
from geopy.geocoders import Nominatim
import time
from bs4 import BeautifulSoup
import folium

In [55]:
# URL of the wikipedia page
url = "https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Daerah_Khusus_Ibukota_Jakarta"
source = requests.get(url).text

# Create beautiful soup object
soup = BeautifulSoup(source, 'lxml')

# Find table that contains the neighborhoods data in the page
tables = soup.findAll("table", {"class": "wikitable sortable"})
jakarta_regions = ["Jakarta Pusat", "Jakarta Utara", "Jakarta Timur", "Jakarta Selatan", "Jakarta Barat"]

neighborhoods = pd.DataFrame({"Region":[], "Neighborhood":[]})
for i,table in enumerate(tables[1:-1]):
    body = table.find('tbody')
    region = jakarta_regions[i]
    for row in body.find_all('tr')[1:-1]:
        items = row.find_all('td')
        neighborhood_name = items[1].a.text
        new_row = {"Region":region, "Neighborhood":neighborhood_name}
        neighborhoods = neighborhoods.append(new_row, ignore_index=True)


In [56]:
neighborhoods

Unnamed: 0,Region,Neighborhood
0,Jakarta Pusat,Cempaka Putih
1,Jakarta Pusat,Cempaka Putih
2,Jakarta Pusat,Johar Baru
3,Jakarta Pusat,Kemayoran
4,Jakarta Pusat,Menteng
5,Jakarta Pusat,Sawah Besar
6,Jakarta Pusat,Senen
7,Jakarta Pusat,Tanah Abang
8,Jakarta Utara,Cilincing
9,Jakarta Utara,Kelapa Gading


In [57]:
def getLongLat(neighborhood_df):
    geolocator = Nominatim(user_agent="test")
    longlat_df = pd.DataFrame({"Neighborhood":[], "Longtiude":[], "Latitude":[]})
    for index, row in neighborhood_df.iterrows():
        neighborhood = row["Neighborhood"]
        region = row["Region"]
        location = geolocator.geocode(neighborhood, timeout=5)
        new_row = {
            "Neighborhood": neighborhood + "," + region,
            "Longitude": location.longitude,
            "Latitude": location.latitude,
        }
        longlat_df.append(new_row, ignore_index=True)
        time.sleep(1)
    neighborhood_df.set_index("Neighborhood", inplace=True)
    longlat_df.set_index("Neighborhood", inplace=True)
    result_df = pd.concat([neighborhood_df, longlat_df], axis=1)
    
    return result_df

In [58]:
neighborhood_df = getLongLat(neighborhoods)

In [59]:
neighborhood_df.head()

Unnamed: 0_level_0,Region,Longtiude,Latitude
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Cempaka Putih,Jakarta Pusat,,
Cempaka Putih,Jakarta Pusat,,
Johar Baru,Jakarta Pusat,,
Kemayoran,Jakarta Pusat,,
Menteng,Jakarta Pusat,,


In [61]:
neighborhood_df.to_csv(r"jakarta_neighborhoods.csv")