# Michelin Guide Restaurant Exploratory Data Analysis

Author: Yuwen Xiang

The data is from https://www.kaggle.com/datasets/ngshiheng/michelin-guide-restaurants-2021

The Michelin Guide is published by the Michelin Group. It lists selected restaurants around the world for travelers. Its standards are very high, and it rates the best restaurants on a scale of one to three stars.
The 3 levels are described as:
- ⭐️: "A very good restaurant in its category" 
- ⭐️⭐️: "Excellent cooking, worth a detour"
- ⭐️⭐️⭐️: "Exceptional cuisine, worth a special journey"

In addition, there is an award called *Bib Gourmand*, given to restaurants that serve high-quality food at an affordable price.

Gaining a Michelin star can be a lifelong dream for a chef. When a restaurant earns a star, tourists flock to it. Losing a star, meanwhile, can be a chef's nightmare.

The dataset contains 6353 observations of restaurants in the Michelin Guide 2021. I like to travel, and when I plan a trip to a new country, I usually make reservations at local Michelin restaurants in advance. So, for my final project for BDI 475, I decided that an exploratory data analysis of the Michelin Guide would be useful for me and for anyone who enjoys traveling. Though COVID has not ended yet, it is great to know these restaurants in advance in case of an unplanned trip!

### Potential Questions
- What are the average prices of Michelin restaurants in different cities?
- What are the average prices of different cuisines?
- Which cuisine accounts for the most restaurants in the Michelin Guide? And which cuisine has the most 3-star Michelin restaurants?
- Which city has the most Michelin restaurants? And which city has the most 3-star Michelin restaurants?
- What cuisine and city are recommended for a tight budget? What cuisine and city are recommended for an adequate budget?
- What is the breakdown of Michelin restaurants in different cities?

## Exploratory Data Analysis

### Clean the data

First, let's load the required packages and the dataset.

In [78]:
import pandas as pd
import numpy as np
import plotly
import plotly.graph_objects as go
import plotly.express as px

In [77]:
df = pd.read_csv('https://raw.githubusercontent.com/viviennexiang/BDI475FinalProj/main/michelin_my_maps.csv')

Now, let's have a look at the dataset's basic information.

In [79]:
# Display the number of rows and columns

nrow = df.shape[0]
ncol = df.shape[1]
print(f'Michelin data contains {nrow} rows and {ncol} columns.')

Michelin data contains 6502 rows and 13 columns.


In [80]:
# Display all the columns

pd.set_option('display.max_columns', 50)
display(df)

Unnamed: 0,Name,Address,Location,MinPrice,MaxPrice,Currency,Cuisine,Longitude,Latitude,PhoneNumber,Url,WebsiteUrl,Award
0,Aqua,"Parkstraße 1, Wolfsburg, 38440, Germany",Wolfsburg,225,225,EUR,"Creative, Modern Cuisine",10.789999,52.433172,4.953616e+11,https://guide.michelin.com/en/niedersachsen/wo...,http://www.restaurant-aqua.com,3 MICHELIN Stars
1,The Table Kevin Fehling,"Shanghaiallee 15, Hamburg, 20457, Germany",Hamburg,230,230,EUR,Creative,10.002980,53.542623,4.940229e+11,https://guide.michelin.com/en/hamburg-region/h...,http://www.the-table-hamburg.de/,3 MICHELIN Stars
2,Restaurant Überfahrt Christian Jürgens,"Überfahrtstraße 10, Rottach-Egern, 83700, Germany",Rottach-Egern,259,319,EUR,Creative,11.758229,47.696685,4.980227e+09,https://guide.michelin.com/en/bayern/rottach-e...,http://www.althoffcollection.com,3 MICHELIN Stars
3,Victor's Fine Dining by christian bau,"Schlossstraße 27, Perl, 66706, Germany",Perl,205,295,EUR,Creative,6.387211,49.535173,4.968668e+10,https://guide.michelin.com/en/saarland/perl/re...,https://www.victors-fine-dining.de/,3 MICHELIN Stars
4,Rutz,"Chausseestraße 8, Berlin, 10115, Germany",Berlin,198,245,EUR,"Modern Cuisine, Creative",13.386087,52.528351,4.930246e+11,https://guide.michelin.com/en/berlin-region/be...,https://www.rutz-restaurant.de/,3 MICHELIN Stars
...,...,...,...,...,...,...,...,...,...,...,...,...,...
6497,Szóstka,"Plac Powstańców Warszawy 9, Warsaw, 00 039, Po...",Warsaw,180,180,PLN,"Polish, Modern Cuisine",21.012698,52.235803,4.822470e+10,https://guide.michelin.com/en/masovia/warsaw/r...,http://www.warszawa.hotel.com.pl/hotel-warszaw...,Bib Gourmand
6498,Fiorentina,"ul. Grodzka 63, Cracow, 31 044, Poland",Cracow,175,290,PLN,"Creative, Polish",19.938179,50.055898,4.812426e+10,https://guide.michelin.com/en/lesser-poland/cr...,https://www.fiorentina.com.pl/,Bib Gourmand
6499,Zazie,"ul. Józefa 34, Cracow, 32 056, Poland",Cracow,41,95,PLN,French,19.946949,50.051240,4.850041e+10,https://guide.michelin.com/en/lesser-poland/cr...,https://www.zaziebistro.pl/,Bib Gourmand
6500,Butchery & Wine,"ul. Żurawia 22, Warsaw, 00 515, Poland",Warsaw,235,235,PLN,"Meats and Grills, Traditional Cuisine",21.015495,52.228581,4.822502e+10,https://guide.michelin.com/en/masovia/warsaw/r...,https://www.butcheryandwine.pl/,Bib Gourmand


In [81]:
# Check the data types of each column and non-missing rows

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6502 entries, 0 to 6501
Data columns (total 13 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Name         6502 non-null   object 
 1   Address      6502 non-null   object 
 2   Location     6502 non-null   object 
 3   MinPrice     6501 non-null   object 
 4   MaxPrice     6501 non-null   object 
 5   Currency     6501 non-null   object 
 6   Cuisine      6502 non-null   object 
 7   Longitude    6502 non-null   float64
 8   Latitude     6502 non-null   float64
 9   PhoneNumber  6381 non-null   float64
 10  Url          6502 non-null   object 
 11  WebsiteUrl   5375 non-null   object 
 12  Award        6502 non-null   object 
dtypes: float64(3), object(10)
memory usage: 660.5+ KB


Next, the data is cleaned: only the columns of interest are kept, and rows with missing values in those columns are dropped.

In [82]:
# Clean the data: keep rows with non-missing Name, Address, Location, MinPrice, MaxPrice, Currency, Cuisine, Longitude, Latitude, and Award.
michelin = df.loc[:, ["Name", "Address", "Location", "MinPrice", "MaxPrice", "Currency", "Cuisine", "Longitude", "Latitude","Award"]]
michelin = michelin.dropna()
michelin = michelin.reset_index(drop=True)
michelin.head(5)

Unnamed: 0,Name,Address,Location,MinPrice,MaxPrice,Currency,Cuisine,Longitude,Latitude,Award
0,Aqua,"Parkstraße 1, Wolfsburg, 38440, Germany",Wolfsburg,225,225,EUR,"Creative, Modern Cuisine",10.789999,52.433172,3 MICHELIN Stars
1,The Table Kevin Fehling,"Shanghaiallee 15, Hamburg, 20457, Germany",Hamburg,230,230,EUR,Creative,10.00298,53.542623,3 MICHELIN Stars
2,Restaurant Überfahrt Christian Jürgens,"Überfahrtstraße 10, Rottach-Egern, 83700, Germany",Rottach-Egern,259,319,EUR,Creative,11.758229,47.696685,3 MICHELIN Stars
3,Victor's Fine Dining by christian bau,"Schlossstraße 27, Perl, 66706, Germany",Perl,205,295,EUR,Creative,6.387211,49.535173,3 MICHELIN Stars
4,Rutz,"Chausseestraße 8, Berlin, 10115, Germany",Berlin,198,245,EUR,"Modern Cuisine, Creative",13.386087,52.528351,3 MICHELIN Stars
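
To make the effect of this cleaning step concrete, here is a minimal sketch on a made-up three-row frame. The names and values are invented for illustration and are not taken from the dataset. It shows that a missing value in a column that was not selected (here `WebsiteUrl`) does not cause a row to be dropped.

```python
import pandas as pd

# Toy data (hypothetical values): row "B" lacks a price and a currency
toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "MinPrice": ["25", None, "1,200"],
    "Currency": ["EUR", None, "JPY"],
    "WebsiteUrl": [None, "http://example.com", None],
})

# Keep only the columns of interest, then drop rows with any missing value
cleaned = toy.loc[:, ["Name", "MinPrice", "Currency"]].dropna().reset_index(drop=True)
print(cleaned)
```

Rows "A" and "C" survive even though their `WebsiteUrl` is missing, because that column is excluded before `dropna()` runs.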


The next step is to convert currencies. To allow an intuitive comparison, all prices are converted to USD.

In [83]:
michelin2 = michelin.copy()

In [84]:
# Initialize the helper columns as floats, since rates and converted prices are not integers
michelin2['Rate'] = 0.0
michelin2['cminprice'] = 0.0
michelin2['cmaxprice'] = 0.0

In [85]:
%%capture --no-stderr
!pip install requests

In [86]:
st = ','.join(set(list((michelin['Currency']))))

In [87]:
import requests

# Fetch the latest exchange rates quoted against USD
# (each rate is the amount of that currency per 1 USD)
url = "https://v6.exchangerate-api.com/v6/a3b189d15daf0fe8105487e6/latest/USD"
response = requests.get(url)
response.raise_for_status()  # stop early if the request failed

data = response.json()['conversion_rates']

# Look up the exchange rate for each restaurant's currency
michelin2['Rate'] = michelin2['Currency'].map(data)


In [88]:
# Remove thousands separators from the price strings, then convert to USD
michelin2['cminprice'] = michelin2['MinPrice'].str.replace(',', '', regex=False).astype(float) / michelin2['Rate']
michelin2['cmaxprice'] = michelin2['MaxPrice'].str.replace(',', '', regex=False).astype(float) / michelin2['Rate']
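
The conversion can be checked by hand on a toy example. The sketch below assumes the rates are quoted as units of local currency per 1 USD, so dividing a local price by its rate gives USD. The rates and prices are hypothetical round numbers chosen to make the arithmetic easy, not real market rates.

```python
import pandas as pd

# Hypothetical rates, expressed as units of local currency per 1 USD
rates = {"USD": 1.0, "EUR": 0.8, "JPY": 100.0}

toy = pd.DataFrame({
    "MinPrice": ["40", "1,000", "30"],
    "Currency": ["EUR", "JPY", "USD"],
})

# Look up each row's rate, strip thousands separators, and divide to get USD
toy["Rate"] = toy["Currency"].map(rates)
toy["MinPriceUSD"] = (
    toy["MinPrice"].str.replace(",", "", regex=False).astype(float) / toy["Rate"]
)
print(toy)
```

With these made-up rates, 40 EUR becomes about 50 USD and 1,000 JPY becomes 10 USD.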

In [89]:
michelin = michelin2.loc[:,['Name', 'Address', 'Location','Cuisine', 'Longitude', 'Latitude', 'Award','cminprice', 'cmaxprice']]
michelin.rename(columns={"cminprice": "MinPrice", "cmaxprice": "MaxPrice"}, inplace=True)


In [90]:
michelin.to_csv("michelin_cleaned.csv")

### Explore the price differences between cuisines and cities.

First, let's have a look at the distribution of the minimum and maximum prices.

In [91]:
fig = px.box(michelin,
             x='MinPrice',
             orientation='h',
             title='Range of MinPrice')
fig.show()

The lowest minimum price is about \\$0.5, while the highest is \\$600. There are many outliers, which makes sense: the minimum price at some 3-star Michelin restaurants can be extremely high.

In [92]:
fig = px.box(michelin,
             x='MaxPrice',
             orientation='h',
             title='Range of MaxPrice')
fig.show()

The lowest maximum price is about \\$0.5, the same as the lowest minimum price; a possible interpretation is that it is a stand that sells only one street food. The highest maximum price is about \\$1320.7. There are many outliers here, too.
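
For readers who want to see which points a box plot would typically mark as outliers, here is a small sketch of the common 1.5 × IQR rule on made-up prices. The numbers are hypothetical, and Plotly's own quartile computation may differ slightly from the pandas default used here.

```python
import pandas as pd

# Hypothetical USD prices with one extreme value
prices = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 600], name="MinPrice")

q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are conventionally flagged as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = prices[(prices < lower) | (prices > upper)]
print(outliers)
```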

Let's check the top 10 cuisines. Modern Cuisine leads with 1045 Michelin restaurants. Creative is second with 438. As expected, Japanese cuisine is also well represented, with 272 Michelin restaurants.

In [93]:
by_cuisine = michelin.groupby('Cuisine', as_index=False).agg({'Name': 'count'})
by_cuisine.rename(columns={'Name': 'Count'}, inplace=True)
by_cuisine.sort_values('Count', ascending=False, inplace=True)
by_cuisine.head(10)

Unnamed: 0,Cuisine,Count
514,Modern Cuisine,1045
196,Creative,438
409,Japanese,272
748,Traditional Cuisine,260
277,French,169
722,Street Food,142
378,Italian,131
105,Classic Cuisine,108
621,Regional Cuisine,107
74,Cantonese,101
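
Note that the counts above group by the exact text in `Cuisine`, so a combined entry such as "Creative, Modern Cuisine" is its own category, separate from both "Creative" and "Modern Cuisine". As a possible alternative (not what the analysis above does), the sketch below splits combined entries so that each label is counted on its own. It assumes labels are separated by a comma and a space, and uses invented rows.

```python
import pandas as pd

toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Cuisine": ["Creative", "Creative, Modern Cuisine", "Modern Cuisine", "Japanese"],
})

# Counting exact strings, as in the analysis above
exact = toy["Cuisine"].value_counts()

# Alternative: split combined labels so each cuisine is counted on its own
labels = toy["Cuisine"].str.split(", ").explode()
per_label = labels.value_counts()
print(per_label)
```

Under the exact-string count, "Creative" appears once in this toy frame; after splitting, it appears twice.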


In [72]:
fig = px.bar(by_cuisine[0:10], 
             x="Cuisine", 
             y="Count", 
             title="Top 10 cuisines with the highest number of Michelin restaurants",
             color="Cuisine", 
             width=800,height=500)
fig.show()

Then, let's check the top 10 cities by number of Michelin restaurants. If you are a gourmet, the cities below are recommended places to travel.

In [94]:
by_city = michelin.groupby('Location', as_index=False).agg({'Name': 'count'})
by_city.rename(columns={'Name': 'Count'}, inplace=True)
by_city.sort_values('Count', ascending=False, inplace=True)
by_city.head(10)

Unnamed: 0,Location,Count
2403,Tokyo,430
1165,Kyoto,207
1714,Osaka,206
1623,New York,180
1760,Paris,159
1049,Hong Kong,135
2256,Singapore,118
1351,London,106
215,Bangkok,94
2019,SEOUL,91


In [73]:
fig = px.bar(by_city[0:10], 
             x="Location", 
             y="Count", 
             title="Top 10 cities with the highest number of Michelin restaurants",
             color="Location", 
             width=800,
             height=500)
fig.show()

Next, let's explore the maximum price and minimum price differences between cuisines and cities.

In [95]:
max_by_cuisine = michelin.groupby('Cuisine', as_index=False).agg({'Name': 'count','MaxPrice': ['min', 'max', 'mean']})
max_by_cuisine.rename(columns={'count': 'Number','min': 'Minimum', 'max': 'Maximum', 'mean': 'Average'}, inplace=True)
max_by_cuisine.sort_values(('MaxPrice', 'Average'), ascending=False, inplace=True)
max_by_cuisine.head(10)

Unnamed: 0_level_0,Cuisine,Name,MaxPrice,MaxPrice,MaxPrice
Unnamed: 0_level_1,Unnamed: 1_level_1,Number,Minimum,Maximum,Average
195,Crab Specialities,1,536.615992,536.615992,536.615992
530,"Modern Cuisine, French Contemporary",1,444.912856,444.912856,444.912856
173,"Corsican, Modern Cuisine",1,321.899736,321.899736,321.899736
210,"Creative, Classic French",1,300.791557,300.791557,300.791557
90,"Chinese, Beijing Cuisine",1,300.0,300.0,300.0
563,"Modern French, French",2,277.022263,322.353178,299.68772
421,"Japanese, Seafood",1,295.0,295.0,295.0
155,"Contemporary, French",5,68.601583,450.0,289.320317
298,Fugu / Pufferfish,4,260.642053,314.303653,283.639882
167,"Contemporary, Seafood",3,89.709763,600.0,279.903254


Sorted by average maximum price, the cuisines above are the 10 most expensive. Their average maximum prices are all over $250, although most of these categories contain only one restaurant. If you plan to visit restaurants serving these cuisines, you may need to increase your budget.
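
Because an average over a single restaurant is not very informative, one option is to rank only the cuisines that have a minimum number of restaurants. The sketch below is one way to do that with named aggregation, which also yields flat column names. The data, the cuisine labels, and the threshold `min_count` are all hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "Cuisine": ["Crab", "Sushi", "Sushi", "Sushi", "Ramen", "Ramen", "Ramen"],
    "MaxPrice": [500.0, 300.0, 200.0, 100.0, 20.0, 10.0, 15.0],
})

# Named aggregation gives flat column names instead of a two-level header
summary = toy.groupby("Cuisine", as_index=False).agg(
    Number=("MaxPrice", "count"),
    Average=("MaxPrice", "mean"),
)

# Hypothetical threshold: only rank cuisines with at least 3 restaurants
min_count = 3
ranked = summary[summary["Number"] >= min_count].sort_values("Average", ascending=False)
print(ranked)
```

Here the single "Crab" entry is excluded from the ranking even though its price is the highest.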

In [96]:
min_by_cuisine = michelin.groupby('Cuisine', as_index=False).agg({'Name': 'count','MinPrice': ['min', 'max', 'mean']})
min_by_cuisine.rename(columns={'count': 'Number','min': 'Minimum', 'max': 'Maximum', 'mean': 'Average'}, inplace=True)
min_by_cuisine.sort_values(('MinPrice', 'Average'), ascending=True, inplace=True)
min_by_cuisine.head(10)

Unnamed: 0_level_0,Cuisine,Name,MinPrice,MinPrice,MinPrice
Unnamed: 0_level_1,Unnamed: 1_level_1,Number,Minimum,Maximum,Average
700,Small eats,5,1.485928,13.472686,5.820987
465,"Market Cuisine, Home Cooking",1,5.856744,5.856744,5.856744
699,Singaporean and Malaysian,1,6.366831,6.366831,6.366831
722,Street Food,142,0.471544,64.715611,6.898714
586,Onigiri,1,7.665943,7.665943,7.665943
620,Ramen,37,7.665943,19.100494,8.604723
688,"Shanghainese, Sichuan",1,8.658891,8.658891,8.658891
776,Udon,17,7.665943,22.997828,8.820262
252,Curry,18,7.665943,15.331885,8.9436
576,Noodles,38,3.368171,39.744423,10.793053


Sorted by average minimum price, the cuisines above are the 10 most affordable. Most of them have an average minimum price under 10 dollars. If you have a tight budget, try them!

Below are the top 10 cities with the highest average maximum price. If you plan to visit them, remember to bring enough money.

In [97]:
max_by_city = michelin.groupby('Location', as_index=False).agg({'Name': 'count','MaxPrice': ['min', 'max', 'mean']})
max_by_city.rename(columns={'count': 'Number','min': 'Minimum', 'max': 'Maximum', 'mean': 'Average'}, inplace=True)
max_by_city.sort_values(('MaxPrice', 'Average'), ascending=False, inplace=True)
max_by_city.head(10)

Unnamed: 0_level_0,Location,Name,MaxPrice,MaxPrice,MaxPrice
Unnamed: 0_level_1,Unnamed: 1_level_1,Number,Minimum,Maximum,Average
1163,Kruishoutem,1,559.366755,559.366755,559.366755
1416,Manigod,1,416.886544,416.886544,416.886544
1485,Menton,1,401.055409,401.055409,401.055409
1162,Kruiningen,1,401.055409,401.055409,401.055409
378,Brusaporto,1,401.055409,401.055409,401.055409
2506,Vejle,2,325.373472,466.840199,396.106836
643,Crissier,1,392.867936,392.867936,392.867936
1388,Machynlleth,1,370.050574,370.050574,370.050574
1723,Ouches,1,348.28496,348.28496,348.28496
2532,Veyrier-du-Lac,1,348.28496,348.28496,348.28496


So, which cities have the 10 lowest average minimum prices? Chiang Mai looks like a good place to visit, given both the number of Michelin restaurants it has and their prices.

In [98]:
min_by_city = michelin.groupby('Location', as_index=False).agg({'Name': 'count','MinPrice': ['min', 'max', 'mean']})
min_by_city.rename(columns={'count': 'Number','min': 'Minimum', 'max': 'Maximum', 'mean': 'Average'}, inplace=True)
min_by_city.sort_values(('MinPrice', 'Average'), ascending=True, inplace=True)
min_by_city.head(10)

Unnamed: 0_level_0,Location,Name,MinPrice,MinPrice,MinPrice
Unnamed: 0_level_1,Unnamed: 1_level_1,Number,Minimum,Maximum,Average
1592,Nakhon Pathom,1,2.748111,2.748111,2.748111
1908,Quintanar de la Orden,1,10.55409,10.55409,10.55409
1772,Pedra Furada,1,10.55409,10.55409,10.55409
581,Coimbra,1,10.55409,10.55409,10.55409
9,Abrantes,1,10.55409,10.55409,10.55409
2305,Split,1,11.206051,11.206051,11.206051
2155,Sant Climent de Llobregat,1,11.609499,11.609499,11.609499
1887,Puente-Genil,1,11.609499,11.609499,11.609499
548,Chiang Mai,23,0.57855,37.60573,12.375933
796,Esteiro,1,12.664908,12.664908,12.664908


As mentioned in the introduction, stars are important to chefs. Let's check the cuisine breakdown of the 3-star Michelin restaurants.

In [99]:
threestar = michelin[michelin['Award'] == '3 MICHELIN Stars']
fig = px.pie(
    threestar,
    names='Cuisine',
    title='Three star Michelin restaurant cuisine breakdown',
    width=800,
    height=700
)

fig.show()

The top 10 cuisines with the most 3-star Michelin restaurants are shown in detail below.

In [100]:
by_cuisine_star = threestar.groupby(['Cuisine'], as_index=False).agg({'Name': 'count'})
by_cuisine_star.rename(columns={'Name':'Count'}, inplace = True)
by_cuisine_star.sort_values('Count', ascending=False, inplace=True)

fig = px.bar(
    by_cuisine_star[0:10],
    x = 'Cuisine',
    y = 'Count',
    title = 'Top 10 cuisines with the most 3-star Michelin Restaurants',
    color="Cuisine", 
    width=800,
    height=500
)

fig.show()

Creative and Japanese cuisine account for a high proportion of both Michelin restaurants overall and 3-star Michelin restaurants. Although Modern Cuisine has the most Michelin restaurants, more than twice as many as Creative, it has few 3-star restaurants.

Similar to cuisines, we are also interested in the city breakdown of 3-star Michelin restaurants. Because most cities have only one 3-star Michelin restaurant, which makes the plot messy, only cities with at least 2 three-star Michelin restaurants are shown.

In [101]:
threestar2 = threestar.groupby(['Location'], as_index=False).agg({'Name': 'count'})
threestar2 = threestar2[threestar2['Name']>=2]
st2 = ','.join(set(list((threestar2['Location']))))
unique_city = list(set(list(threestar2['Location'])))

threestar3 = threestar[(threestar['Location'].isin(unique_city))]
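
The same filter can also be written in one step with `groupby(...).transform`, which returns a per-row count aligned with the original frame. This is only an alternative sketch on invented rows, not a replacement for the cell above.

```python
import pandas as pd

toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Location": ["Tokyo", "Tokyo", "Paris", "Perl"],
})

# Number of rows sharing each row's Location, aligned with the original index
per_city = toy.groupby("Location")["Name"].transform("count")

# Keep only restaurants in cities that have at least 2 entries
multi = toy[per_city >= 2]
print(multi)
```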

In [102]:
fig = px.pie(
    threestar3,
    names='Location',
    title='Three star Michelin restaurant city breakdown(>=2 three-star Michelins)',
    width=800,
    height=700
)

fig.show()

The top 10 cities with the most 3-star Michelin restaurants:

In [103]:
by_city_star = threestar.groupby(['Location'], as_index=False).agg({'Name': 'count'})
by_city_star.rename(columns={'Name':'Count'}, inplace = True)
by_city_star.sort_values('Count', ascending=False, inplace=True)

fig = px.bar(
    by_city_star[0:10],  # start at 0 so the top-ranked city is included
    x = 'Location',
    y = 'Count',
    title = 'Top 10 cities with the most 3-star Michelin Restaurants',
    color="Location", 
    width=800,
    height=500
)

fig.show()

Tokyo has both the most Michelin restaurants and the most 3-star Michelin restaurants, making it well worth visiting. Paris comes second: though it doesn't have as many Michelin restaurants as Tokyo, the quality of its Michelin restaurants is relatively high. Hong Kong is similar to Paris; if you are considering a visit to Hong Kong, don't hesitate!
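
One way to put a number on "relatively high quality" is the share of a city's listed restaurants that hold three stars. The sketch below computes that share on invented rows. The award label "1 MICHELIN Star" is a placeholder assumed for the example; only "3 MICHELIN Stars" and "Bib Gourmand" are visible in the outputs above.

```python
import pandas as pd

toy = pd.DataFrame({
    "Location": ["Tokyo"] * 4 + ["Paris"] * 2,
    "Award": [
        "3 MICHELIN Stars", "Bib Gourmand", "Bib Gourmand", "1 MICHELIN Star",
        "3 MICHELIN Stars", "1 MICHELIN Star",
    ],
})

# Fraction of each city's listed restaurants that hold three stars
is_three = toy["Award"] == "3 MICHELIN Stars"
share = is_three.groupby(toy["Location"]).mean().sort_values(ascending=False)
print(share)
```

In this toy frame both cities have one 3-star restaurant, but the share is higher for the city with fewer listings overall.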

However, considering the price and the difficulty of getting a reservation, 3-star Michelin restaurants are not the main goal for most tourists. An award breakdown of the cities with at least 2 three-star Michelin restaurants follows, for travelers who are interested in awards other than 3 stars.

In [104]:
top_cities = michelin[(michelin['Location'].isin(unique_city))]
by_city_award = top_cities.groupby(['Location', 'Award'], as_index=False).agg({
        'Name': 'count'
    }).rename(columns={
        'Name': 'num_listings'
    })

In [105]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

fig = px.treemap(
    by_city_award,
    path=['Location', 'Award'],
    title='Top cities breakdown',
    values='num_listings',
    height=700
)

fig.show()

Most of the top cities show a similar pattern, with Bib Gourmand in the lead. In cities like Paris, London, Shanghai, and Beijing, 1-star restaurants account for the largest share. The cities above are definitely worth visiting: you can enjoy both affordable Michelin restaurants and expensive delicacies.
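
To read the proportions behind a breakdown like this as numbers rather than areas, a row-normalized cross-tabulation is one option. The sketch below uses invented rows with the two award labels visible in the outputs above; each row of the result sums to 1.

```python
import pandas as pd

toy = pd.DataFrame({
    "Location": ["Tokyo", "Tokyo", "Tokyo", "Tokyo", "Paris", "Paris"],
    "Award": [
        "Bib Gourmand", "Bib Gourmand", "Bib Gourmand", "3 MICHELIN Stars",
        "3 MICHELIN Stars", "Bib Gourmand",
    ],
})

# Row-normalized crosstab: the share of each award within each city
shares = pd.crosstab(toy["Location"], toy["Award"], normalize="index")
print(shares)
```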

### Summary
- Tokyo has both the most Michelin restaurants and the most 3-star Michelin restaurants, offering tourists a wide choice of high-quality food. Kyoto and Osaka, also in Japan, are worth visiting, too.
- Michelin restaurants are mostly located in East Asia, Europe, and the United States.
- If you have a tight budget, Chiang Mai is recommended. It has 23 Michelin restaurants, and the average minimum price is as low as about 12 dollars.
- Among all cuisines, Modern Cuisine has the most Michelin restaurants, while Creative has the most 3-star Michelin restaurants. Crab Specialities is the most expensive cuisine by average maximum price, and Street Food is among the most affordable, with an average minimum price of about 7 dollars across 142 restaurants.