# Final project
# Where should we open a new Café in Gold Coast, Queensland, Australia?

## 1. Import library needed

In [78]:
import pandas as pd                                        # library to process data as dataframes
!conda install -c conda-forge folium=0.5.0 --yes           # uncomment this line if 'folium' library is not recognized
import folium                                              # map rendering library
# !conda install -c conda-forge geopy --yes                # uncomment this line if 'geopy' library is not recognized
from geopy.geocoders import Nominatim                      # convert an address into latitude and longitude values
import numpy as np                                         # library for vectorized computation
import matplotlib.pyplot as plt                            # plotting library
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline
import requests
from sklearn.cluster import KMeans 
from sklearn.datasets.samples_generator import make_blobs
print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Libraries imported.


## 2. Read the data
Information about suburbs in Goldcoast is available at Wikipedia  
Use the **read_html** function in the **pandas** library and then select the returned component needed.  
Keep only columns needed for this assignment and rename the column name

In [82]:
url='https://en.wikipedia.org/wiki/List_of_Gold_Coast_suburbs#Suburbs'
df = pd.read_html(url, header=1)[1]                    # Get the second element of the returned data and use the first line as header
df = df.loc[:,['Name', 'Postcode', '2016 census']]     # Only get 3 columns: the name of the suburb, its postcode and population in 2016
df.columns = ['Suburb', 'PostalCode', 'Population']    # Rename the columns
print(df.shape)                                        # Check if there are 81 suburbs in Gold Coast, Queensland, Australia as compared to Wikipedia Website
df.head(10)                                            # Print the first 10 suburbs

(81, 3)


Unnamed: 0,Suburb,PostalCode,Population
0,Advancetown,4211,482
1,Alberton,4207,590
2,Arundel,4214,10246
3,Ashmore,4214,11910
4,Austinville,4213,356
5,Benowa,4217,8741
6,Biggera Waters,4216,8534
7,Bilinga,4225,1804
8,Bonogin,4213,4573
9,Broadbeach,4218,5514


## 3. Clean the data  
Remove the redundant information (i.e. [a]) from PostalCode   
Remove suburbs with no PostalCode

In [83]:
df.replace('4220[a]', '4220', inplace=True)
df = df[df.PostalCode != '0']
print(df.shape)
df.head(10)

(80, 3)


Unnamed: 0,Suburb,PostalCode,Population
0,Advancetown,4211,482
1,Alberton,4207,590
2,Arundel,4214,10246
3,Ashmore,4214,11910
4,Austinville,4213,356
5,Benowa,4217,8741
6,Biggera Waters,4216,8534
7,Bilinga,4225,1804
8,Bonogin,4213,4573
9,Broadbeach,4218,5514


## 4. Get the latitude and the longitude coordinates of each suburb  
Use **Nominatim** function from **geopy.geocoders** library to get the coordinates  
Add the retrieved coordinates to the dataframe

In [84]:
df['Latitude'] = ''
df['Longitude'] = ''
for index, data in df.iterrows():
    address = data['Suburb'] + ', Queensland, Australia'
    geolocator = Nominatim(user_agent="gc_explorer")
    location = geolocator.geocode(address)
    df.at[index, 'Latitude'] = location.latitude
    df.at[index, 'Longitude'] = location.longitude
    print('The geograpical coordinate: {}, {}, {}.'.format(address, location.latitude, location.longitude))
print(df.shape)
df.head(10)

The geograpical coordinate: Advancetown, Queensland, Australia, -28.0255182, 153.2828917.
The geograpical coordinate: Alberton, Queensland, Australia, -27.7002085, 153.2754081.
The geograpical coordinate: Arundel, Queensland, Australia, -27.9360655, 153.3653514.
The geograpical coordinate: Ashmore, Queensland, Australia, -27.9909266, 153.3770522.
The geograpical coordinate: Austinville, Queensland, Australia, -28.1331291, 153.3158014.
The geograpical coordinate: Benowa, Queensland, Australia, -28.0041744, 153.3834595.
The geograpical coordinate: Biggera Waters, Queensland, Australia, -27.9271595, 153.3983923.
The geograpical coordinate: Bilinga, Queensland, Australia, -28.1585408, 153.505324.
The geograpical coordinate: Bonogin, Queensland, Australia, -28.1277689, 153.3604583.
The geograpical coordinate: Broadbeach, Queensland, Australia, -28.0235886, 153.4293335.
The geograpical coordinate: Broadbeach Waters, Queensland, Australia, -28.0316652, 153.4122307.
The geograpical coordinate:

Unnamed: 0,Suburb,PostalCode,Population,Latitude,Longitude
0,Advancetown,4211,482,-28.0255,153.283
1,Alberton,4207,590,-27.7002,153.275
2,Arundel,4214,10246,-27.9361,153.365
3,Ashmore,4214,11910,-27.9909,153.377
4,Austinville,4213,356,-28.1331,153.316
5,Benowa,4217,8741,-28.0042,153.383
6,Biggera Waters,4216,8534,-27.9272,153.398
7,Bilinga,4225,1804,-28.1585,153.505
8,Bonogin,4213,4573,-28.1278,153.36
9,Broadbeach,4218,5514,-28.0236,153.429


## 5. Map the suburbs  
Apart from the coordinates retrieved above, we need the background map of Goldcoast (i.e. shapefile or GeoJSON file)  
This can be extracted from Australian Goverment website at https://data.gov.au/dataset/ds-dga-6bedcb55-1b1f-457b-b092-58e88952e9f0/distribution/dist-dga-d20d0a54-7680-43c4-8c46-a08e3bc43fa0/details?q=queensland%20suburb

In [87]:
!wget --quiet https://www.trim.vn/GoldCoast_QLD.json -O gold_coast.json            # Download GeoJSON file
gold_coast_geo = r'gold_coast.json'                                                # Read geojson file

### Get the coordinates of Goldcoast, Australia

In [88]:
address = 'Gold Coast, Queensland, Australia'
geolocator = Nominatim(user_agent="gc_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Gold Coast are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Gold Coast are -28.0023731, 153.4145987.


### Map the suburbs  
The map includes the background indicating the border of Goldcoast area and markers indicating suburbs' name and postal code

In [89]:
df['Suburb'] = df['Suburb'].str.upper()
map_goldcoast = folium.Map(location=[latitude, longitude], zoom_start=10)

# Add background to map
map_goldcoast.choropleth(
    geo_data=gold_coast_geo,
    fill_color='Yellow',
    fill_opacity=0.4,
    line_opacity=0.2
)

# Add markers to map
for lat, lng, suburb, postalcode in zip(df['Latitude'], df['Longitude'], df['Suburb'], df['PostalCode']):    
    label = 'QLD{}, {}'.format(postalcode, suburb)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_goldcoast)  

map_goldcoast

## 6. Retreive information about the suburbs using Foursquare

### Declare credential to access Foursquare

In [90]:
# The code was removed by Watson Studio for sharing.

### Get data from Foursquare about the venue in each suburb

In [91]:
radius = 500
LIMIT = 500
venues = []
for lat, long, post, suburb in zip(df['Latitude'], df['Longitude'], df['PostalCode'], df['Suburb']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        venues.append((
            post, 
            suburb,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

### Convert the list of venues into a dataframe

In [92]:
venues_df = pd.DataFrame(venues)
venues_df.columns = ['PostalCode', 'Suburb', 'SuburbLatitude', 'SuburbLongitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
print(venues_df.shape)
venues_df.head(10)

(345, 8)


Unnamed: 0,PostalCode,Suburb,SuburbLatitude,SuburbLongitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,4211,ADVANCETOWN,-28.025518,153.282892,Advancetown Hotel,-28.026245,153.281276,Pub
1,4214,ARUNDEL,-27.936066,153.365351,Absolute Glass Fencing,-27.93582,153.365235,Pool
2,4214,ARUNDEL,-27.936066,153.365351,Camtech Engineering Pty Ltd,-27.935612,153.365235,Laser Tag
3,4214,ARUNDEL,-27.936066,153.365351,PranaOn,-27.936332,153.365321,Health Food Store
4,4214,ARUNDEL,-27.936066,153.365351,Oporto,-27.93768,153.36302,Fast Food Restaurant
5,4214,ASHMORE,-27.990927,153.377052,Bus Tour Rhymes,-27.990915,153.377112,Bus Station
6,4214,ASHMORE,-27.990927,153.377052,Elite Football Academy,-27.993511,153.374319,Sports Club
7,4214,ASHMORE,-27.990927,153.377052,Sushi Haru,-27.99056,153.381582,Sushi Restaurant
8,4214,ASHMORE,-27.990927,153.377052,Mualla Dr Bus Stop,-27.990766,153.381812,Bus Stop
9,4214,ASHMORE,-27.990927,153.377052,Fishermans Wharf,-27.992641,153.372394,Harbor / Marina


## 7. Explorer the venus for each suburb

In [93]:
venues_df.groupby(["Suburb"]).count()

Unnamed: 0_level_0,PostalCode,SuburbLatitude,SuburbLongitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Suburb,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
ADVANCETOWN,1,1,1,1,1,1,1
ARUNDEL,4,4,4,4,4,4,4
ASHMORE,5,5,5,5,5,5,5
BENOWA,3,3,3,3,3,3,3
BILINGA,5,5,5,5,5,5,5
BONOGIN,1,1,1,1,1,1,1
BROADBEACH,20,20,20,20,20,20,20
BROADBEACH WATERS,3,3,3,3,3,3,3
BUNDALL,11,11,11,11,11,11,11
BURLEIGH HEADS,6,6,6,6,6,6,6


In [94]:
print('There are {} uniques categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 121 uniques categories.


In [95]:
venues_df['VenueCategory'].unique()[:50]

array(['Pub', 'Pool', 'Laser Tag', 'Health Food Store',
       'Fast Food Restaurant', 'Bus Station', 'Sports Club',
       'Sushi Restaurant', 'Bus Stop', 'Harbor / Marina', 'Golf Course',
       'Home Service', 'National Park', 'Beer Garden', 'Airport Terminal',
       'Airport', 'Roof Deck', 'Japanese Restaurant', 'Café',
       'Italian Restaurant', 'Seafood Restaurant',
       'Residential Building (Apartment / Condo)', 'Thai Restaurant',
       'Korean Restaurant', 'Hotel', 'Bowling Green', 'Restaurant',
       'Resort', 'Noodle House', 'Liquor Store', 'Gym',
       'Portuguese Restaurant', 'Park', 'Shopping Mall',
       'Department Store', 'Furniture / Home Store', 'Grocery Store',
       'Boutique', 'Pet Store', 'Paper / Office Supplies Store', 'Office',
       'Arts & Crafts Store', 'Mexican Restaurant', 'Electronics Store',
       'Construction & Landscaping', 'Dog Run', 'Business Service',
       'Beach', 'Pizza Place', 'Ice Cream Shop'], dtype=object)

### Manipulate data for each area

In [96]:
# One hot encoding
goldcoast_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# Add postal code and suburb to dataframe
goldcoast_onehot['PostalCode'] = venues_df['PostalCode'] 
goldcoast_onehot['Suburb'] = venues_df['Suburb'] 

# Move postal, borough and neighborhood column to the first column
fixed_columns = list(goldcoast_onehot.columns[-2:]) + list(goldcoast_onehot.columns[:-2])
goldcoast_onehot = goldcoast_onehot[fixed_columns]

print(goldcoast_onehot.shape)
goldcoast_onehot.head(10)

(345, 123)


Unnamed: 0,PostalCode,Suburb,Airport,Airport Terminal,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Australian Restaurant,Bakery,...,Supermarket,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Video Store,Whisky Bar,Yoga Studio,Zoo Exhibit
0,4211,ADVANCETOWN,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,4214,ARUNDEL,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,4214,ARUNDEL,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,4214,ARUNDEL,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,4214,ARUNDEL,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,4214,ASHMORE,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,4214,ASHMORE,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,4214,ASHMORE,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
8,4214,ASHMORE,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,4214,ASHMORE,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [97]:
# Group rows by suburb and by taking the mean of the frequency of occurrence of each category
goldcoast_grouped = goldcoast_onehot.groupby(["Suburb"]).mean().reset_index()
print(goldcoast_grouped.shape)
goldcoast_grouped.head(10)

(54, 122)


Unnamed: 0,Suburb,Airport,Airport Terminal,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Australian Restaurant,Bakery,Bar,...,Supermarket,Sushi Restaurant,Tapas Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Video Store,Whisky Bar,Yoga Studio,Zoo Exhibit
0,ADVANCETOWN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,ARUNDEL,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,ASHMORE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,BENOWA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,BILINGA,0.2,0.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,BONOGIN,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,BROADBEACH,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0
7,BROADBEACH WATERS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,BUNDALL,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,BURLEIGH HEADS,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Create the new dataframe and display the top 10 venues for each PostalCode

In [98]:
indicators = ['st', 'nd', 'rd']
num_top_venues = 10

# create columns according to number of top venues
areaColumns = ['PostalCode', 'Suburb']
freqColumns = []
for ind in np.arange(num_top_venues):
    try:
        freqColumns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        freqColumns.append('{}th Most Common Venue'.format(ind+1))
columns = areaColumns+freqColumns

# create a new dataframe
goldcoast_venues_sorted = pd.DataFrame(columns=columns)
goldcoast_venues_sorted['Suburb'] = goldcoast_grouped['Suburb']

for ind in np.arange(goldcoast_grouped.shape[0]):
    row_categories = goldcoast_grouped.iloc[ind, :].iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    goldcoast_venues_sorted.iloc[ind, 2:] = row_categories_sorted.index.values[0:num_top_venues]

# neighborhoods_venues_sorted.sort_values(freqColumns, inplace=True)
print(goldcoast_venues_sorted.shape)
goldcoast_venues_sorted

(54, 12)


Unnamed: 0,PostalCode,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,,ADVANCETOWN,Pub,Zoo Exhibit,Construction & Landscaping,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store
1,,ARUNDEL,Health Food Store,Laser Tag,Fast Food Restaurant,Pool,Halal Restaurant,Gym,Convenience Store,Deli / Bodega,Department Store,Diner
2,,ASHMORE,Sushi Restaurant,Harbor / Marina,Bus Station,Sports Club,Bus Stop,Donut Shop,Farmers Market,Event Service,Electronics Store,Zoo Exhibit
3,,BENOWA,Home Service,Golf Course,National Park,Flea Market,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop
4,,BILINGA,Beer Garden,Zoo Exhibit,Flower Shop,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store
5,,BONOGIN,Roof Deck,Zoo Exhibit,Construction & Landscaping,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store
6,,BROADBEACH,Italian Restaurant,Café,Resort,Fast Food Restaurant,Seafood Restaurant,Residential Building (Apartment / Condo),Liquor Store,Bus Stop,Korean Restaurant,Japanese Restaurant
7,,BROADBEACH WATERS,Portuguese Restaurant,Park,Bus Stop,Construction & Landscaping,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop
8,,BUNDALL,Café,Furniture / Home Store,Boutique,Department Store,Pet Store,Grocery Store,Sushi Restaurant,Shopping Mall,Restaurant,Donut Shop
9,,BURLEIGH HEADS,Mexican Restaurant,Department Store,Arts & Crafts Store,Electronics Store,Paper / Office Supplies Store,Office,Zoo Exhibit,Diner,Discount Store,Dog Run


## 8. Fit k-means to cluster the Goldcoast areas into 3 clusters the data

### Prepare the data

In [99]:
goldcoast_grouped_cafe = goldcoast_grouped[["Suburb", "Café"]]
goldcoast_grouped_cafe = df.merge(goldcoast_grouped_cafe, on="Suburb", how="left")
goldcoast_grouped_cafe['Café'] = goldcoast_grouped_cafe['Café'].fillna(0)
goldcoast_grouped_cafe.head(10)

Unnamed: 0,Suburb,PostalCode,Population,Latitude,Longitude,Café
0,ADVANCETOWN,4211,482,-28.0255,153.283,0.0
1,ALBERTON,4207,590,-27.7002,153.275,0.0
2,ARUNDEL,4214,10246,-27.9361,153.365,0.0
3,ASHMORE,4214,11910,-27.9909,153.377,0.0
4,AUSTINVILLE,4213,356,-28.1331,153.316,0.0
5,BENOWA,4217,8741,-28.0042,153.383,0.0
6,BIGGERA WATERS,4216,8534,-27.9272,153.398,0.0
7,BILINGA,4225,1804,-28.1585,153.505,0.0
8,BONOGIN,4213,4573,-28.1278,153.36,0.0
9,BROADBEACH,4218,5514,-28.0236,153.429,0.1


### Fit k-means to cluster

In [100]:
kclusters = 3
goldcoast_grouped_clustering = goldcoast_grouped_cafe.drop(["Suburb", "Population", "PostalCode", "Latitude", "Longitude"], 1)
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(goldcoast_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 2], dtype=int32)

### Add results from k-mean clustering to the dataset

In [101]:
goldcoast_merged = df.copy()
goldcoast_merged["Cluster Label"] = kmeans.labels_
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
goldcoast_merged = goldcoast_merged.join(goldcoast_venues_sorted.drop(["PostalCode"], 1).set_index("Suburb"), on="Suburb")
print(goldcoast_merged.shape)
goldcoast_merged.head(10)

(80, 16)


Unnamed: 0,Suburb,PostalCode,Population,Latitude,Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ADVANCETOWN,4211,482,-28.0255,153.283,0,Pub,Zoo Exhibit,Construction & Landscaping,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store
1,ALBERTON,4207,590,-27.7002,153.275,0,,,,,,,,,,
2,ARUNDEL,4214,10246,-27.9361,153.365,0,Health Food Store,Laser Tag,Fast Food Restaurant,Pool,Halal Restaurant,Gym,Convenience Store,Deli / Bodega,Department Store,Diner
3,ASHMORE,4214,11910,-27.9909,153.377,0,Sushi Restaurant,Harbor / Marina,Bus Station,Sports Club,Bus Stop,Donut Shop,Farmers Market,Event Service,Electronics Store,Zoo Exhibit
4,AUSTINVILLE,4213,356,-28.1331,153.316,0,,,,,,,,,,
5,BENOWA,4217,8741,-28.0042,153.383,0,Home Service,Golf Course,National Park,Flea Market,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop
6,BIGGERA WATERS,4216,8534,-27.9272,153.398,0,,,,,,,,,,
7,BILINGA,4225,1804,-28.1585,153.505,0,Beer Garden,Zoo Exhibit,Flower Shop,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store
8,BONOGIN,4213,4573,-28.1278,153.36,0,Roof Deck,Zoo Exhibit,Construction & Landscaping,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store
9,BROADBEACH,4218,5514,-28.0236,153.429,2,Italian Restaurant,Café,Resort,Fast Food Restaurant,Seafood Restaurant,Residential Building (Apartment / Condo),Liquor Store,Bus Stop,Korean Restaurant,Japanese Restaurant


### Sort the data to see the cluster

In [102]:
print(goldcoast_merged.shape)
goldcoast_merged.sort_values(["Cluster Label"], inplace=True)
goldcoast_merged.head(10)

(80, 16)


Unnamed: 0,Suburb,PostalCode,Population,Latitude,Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ADVANCETOWN,4211,482,-28.0255,153.283,0,Pub,Zoo Exhibit,Construction & Landscaping,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store
37,LUSCOMBE,4207,307,-27.7851,153.206,0,,,,,,,,,,
79,WORONGARY,4213,5613,-28.0361,153.339,0,Print Shop,Construction & Landscaping,Convenience Store,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store
41,MERMAID WATERS,4218,12045,-28.0513,153.419,0,,,,,,,,,,
42,MERRIMAC,4226,7071,-28.0524,153.374,0,,,,,,,,,,
43,MIAMI,4220,6843,-28.068,153.438,0,Coffee Shop,Golf Course,Beer Garden,Farmers Market,Fast Food Restaurant,Brewery,Flea Market,Flower Shop,Department Store,Diner
44,MOLENDINAR,4214,6375,-27.9746,153.375,0,Coffee Shop,Ice Cream Shop,Noodle House,Bakery,Gym,Liquor Store,Zoo Exhibit,Event Service,Fast Food Restaurant,Farmers Market
45,MOUNT NATHAN,4211,1214,-27.9953,153.273,0,,,,,,,,,,
46,MUDGEERABA,4213,13624,-28.0805,153.358,0,Thrift / Vintage Store,Zoo Exhibit,Flower Shop,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store
47,NATURAL BRIDGE,4211,108,-28.2128,153.234,0,,,,,,,,,,


## 9. Build folium map with clusters  
The background of the map is the population density of each suburb, the darker and red color indicates high population density  
The markers are the categories of clusters with labels indicating the suburb's name and postal code

In [103]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add background to map
map_clusters.choropleth(
    geo_data=gold_coast_geo,
    data=df,
    columns=['Suburb', 'Population'],
    key_on='feature.properties.Suburb',    # These are names in json file to match between geo_data and data
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Population'
)

# add markers to the map
markers_colors = []
for lat, lon, post, sub, cluster in zip(goldcoast_merged['Latitude'], goldcoast_merged['Longitude'], goldcoast_merged['PostalCode'], goldcoast_merged['Suburb'], goldcoast_merged['Cluster Label']):
    label = folium.Popup('{} ({}): - Cluster {}'.format(sub, post, cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters),
        
map_clusters

## 10. Dislay and explore each cluster

In [104]:
goldcoast_merged.loc[goldcoast_merged['Cluster Label'] == 0, goldcoast_merged.columns[[0] + list(range(5, goldcoast_merged.shape[1]))]]

Unnamed: 0,Suburb,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ADVANCETOWN,0,Pub,Zoo Exhibit,Construction & Landscaping,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store
37,LUSCOMBE,0,,,,,,,,,,
79,WORONGARY,0,Print Shop,Construction & Landscaping,Convenience Store,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store
41,MERMAID WATERS,0,,,,,,,,,,
42,MERRIMAC,0,,,,,,,,,,
43,MIAMI,0,Coffee Shop,Golf Course,Beer Garden,Farmers Market,Fast Food Restaurant,Brewery,Flea Market,Flower Shop,Department Store,Diner
44,MOLENDINAR,0,Coffee Shop,Ice Cream Shop,Noodle House,Bakery,Gym,Liquor Store,Zoo Exhibit,Event Service,Fast Food Restaurant,Farmers Market
45,MOUNT NATHAN,0,,,,,,,,,,
46,MUDGEERABA,0,Thrift / Vintage Store,Zoo Exhibit,Flower Shop,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store
47,NATURAL BRIDGE,0,,,,,,,,,,


In [105]:
goldcoast_merged.loc[goldcoast_merged['Cluster Label'] == 1, goldcoast_merged.columns[[0] + list(range(5, goldcoast_merged.shape[1]))]]

Unnamed: 0,Suburb,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
66,SPRINGBROOK,1,Café,Scenic Lookout,Zoo Exhibit,Flea Market,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop
51,NUMINBAH VALLEY,1,Café,Zoo Exhibit,Construction & Landscaping,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store
71,TALLEBUDGERA,1,Café,Lawyer,Zoo Exhibit,Flower Shop,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store


In [106]:
goldcoast_merged.loc[goldcoast_merged['Cluster Label'] == 2, goldcoast_merged.columns[[0] + list(range(5, goldcoast_merged.shape[1]))]]

Unnamed: 0,Suburb,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
73,TUGUN,2,Fast Food Restaurant,Grocery Store,Fruit & Vegetable Store,Gas Station,Café,Beach,Fish & Chips Shop,Park,Thai Restaurant,Indian Restaurant
35,LABRADOR,2,Soccer Field,Café,Food Court,Flea Market,Flower Shop,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run
57,PARADISE POINT,2,Café,Malay Restaurant,Supermarket,Park,Fish & Chips Shop,Liquor Store,Sporting Goods Shop,Deli / Bodega,Australian Restaurant,Italian Restaurant
11,BUNDALL,2,Café,Furniture / Home Store,Boutique,Department Store,Pet Store,Grocery Store,Sushi Restaurant,Shopping Mall,Restaurant,Donut Shop
59,PIMPAMA,2,Café,Restaurant,Shopping Plaza,Zoo Exhibit,Fish & Chips Shop,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run
33,JACOBS WELL,2,Park,Pier,Australian Restaurant,Bakery,Campground,Café,Donut Shop,Fast Food Restaurant,Farmers Market,Event Service
56,PALM BEACH,2,Pizza Place,Café,Liquor Store,Coffee Shop,Convenience Store,Pub,Rental Car Location,Food Court,Fish & Chips Shop,Sushi Restaurant
21,CURRUMBIN,2,Zoo Exhibit,Park,Café,Sporting Goods Shop,Fish & Chips Shop,Flea Market,Deli / Bodega,Department Store,Diner,Discount Store
48,NERANG,2,Skate Park,Farmers Market,Multiplex,Theater,Sandwich Place,Café,Burger Joint,Electronics Store,Event Service,Dog Run
40,MERMAID BEACH,2,Café,Steakhouse,Beach,Mini Golf,Fast Food Restaurant,Resort,Japanese Restaurant,Italian Restaurant,Noodle House,Mexican Restaurant


## 11. Conclusion

There are a few Café in Goldcoast, Queensland, Australia. Most of the suburbs (69/80, 86.3%) are classified into Cluster 0 with no Café. Three suburbs (Springbrook, Numibah Valley and Tallebudgera) and eight suburbs (Jacobs Well, Main Beach, Palm Beach, Pimpama, Coolangatta, Broadbeach and Yatala) are classified into Cluster 1 (moderate density of Café) and Cluster 2 (high density of Café) respectively.  
    Based on the map which took into account the population density, **Robina, Upper Coomera and Southport** should be considered as places to open new Café because the population density is high in these suburbs and there is no Café in these suburbs and their neiboughood.

#### Copy the list of clusters with suburbs to appendix for the report

In [107]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Suburb,PostalCode,Population,Latitude,Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ADVANCETOWN,4211,482,-28.0255,153.283,0,Pub,Zoo Exhibit,Construction & Landscaping,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store
37,LUSCOMBE,4207,307,-27.7851,153.206,0,,,,,,,,,,
79,WORONGARY,4213,5613,-28.0361,153.339,0,Print Shop,Construction & Landscaping,Convenience Store,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store
41,MERMAID WATERS,4218,12045,-28.0513,153.419,0,,,,,,,,,,
42,MERRIMAC,4226,7071,-28.0524,153.374,0,,,,,,,,,,
43,MIAMI,4220,6843,-28.068,153.438,0,Coffee Shop,Golf Course,Beer Garden,Farmers Market,Fast Food Restaurant,Brewery,Flea Market,Flower Shop,Department Store,Diner
44,MOLENDINAR,4214,6375,-27.9746,153.375,0,Coffee Shop,Ice Cream Shop,Noodle House,Bakery,Gym,Liquor Store,Zoo Exhibit,Event Service,Fast Food Restaurant,Farmers Market
45,MOUNT NATHAN,4211,1214,-27.9953,153.273,0,,,,,,,,,,
46,MUDGEERABA,4213,13624,-28.0805,153.358,0,Thrift / Vintage Store,Zoo Exhibit,Flower Shop,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store
47,NATURAL BRIDGE,4211,108,-28.2128,153.234,0,,,,,,,,,,
