# Case Study 1 : Yelp Data Analysis

**Required Readings:** 
* [Yelp Dataset Challenge](https://www.yelp.com/dataset_challenge) 
* Please download the Yelp dataset from the above webpage.

**NOTE**
* Please don't forget to save the notebook frequently when working in Jupyter Notebook, otherwise the changes you made can be lost.


Here is an example of the data format. More details are included [here](https://www.yelp.com/dataset_challenge)

## Business Objects

Business objects contain basic information about local businesses. The fields are as follows:

```json
{
  'type': 'business',
  'business_id': (a unique identifier for this business),
  'name': (the full business name),
  'neighborhoods': (a list of neighborhood names, might be empty),
  'full_address': (localized address),
  'city': (city),
  'state': (state),
  'latitude': (latitude),
  'longitude': (longitude),
  'stars': (star rating, rounded to half-stars),
  'review_count': (review count),
  'photo_url': (photo url),
  'categories': [(localized category names)]
  'open': (is the business still open for business?),
  'schools': (nearby universities),
  'url': (yelp url)
}
```
## Checkin Objects
```json
{
    'type': 'checkin',
    'business_id': (encrypted business id),
    'checkin_info': {
        '0-0': (number of checkins from 00:00 to 01:00 on all Sundays),
        '1-0': (number of checkins from 01:00 to 02:00 on all Sundays),
        ...
        '14-4': (number of checkins from 14:00 to 15:00 on all Thursdays),
        ...
        '23-6': (number of checkins from 23:00 to 00:00 on all Saturdays)
    }, # if there was no checkin for a hour-day block it will not be in the dict
}
```

# Problem: pick a data science problem that you plan to solve using Yelp Data
* The problem should be important and interesting, which has a potential impact in some area.
* The problem should be solvable using yelp data and data science solutions.

Please briefly describe in the following cell: what problem are you trying to solve? why this problem is important and interesting?

The second aspect to consider the problem of the open or close status of a restaurant is from the rating star and all the attributes (e.g. food types, business time, parking availabilty and type, noiselevel, seat arrangement, decoration style, TV and WiFi availabilty, kids friendly, alcohol availability, food delivery, drive through, smoking allowing, disability friendly, pets friendly, music style, date of week operation, allergy identification, etc). The way to predict the restaurant business good or bad is very important and interesting since both restaurant owners and customers want to know what factors significantly influence a restaurant's profits. The restaurant owners may make usage of the discoveries to promote their businesses. Customers may better notice what kind of restaurants are popular and deserve to try.

# Data Collection/Processing: 

In [0]:
#----------------------------------------------
# Your code starts here
#   Please add comments or text cells in between to explain the general idea of each block of the code.
#   Please feel free to add more cells below this cell if necessary

import json
import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score

In [0]:
# read the business and the checkins json as pandas dataframe
business = pd.read_json('yelp_dataset/business.json', lines=True)
checkins = pd.read_json('yelp_dataset/checkin.json', lines=True)

In [0]:
business

Unnamed: 0,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
0,1SWheh84yJXfytovILXOAQ,Arizona Biltmore Golf Club,2818 E Camino Acequia Drive,Phoenix,AZ,85016,33.522143,-112.018481,3.0,5,0,{'GoodForKids': 'False'},"Golf, Active Life",
1,QXAEGFB4oINsVuTFxEYKFQ,Emerald Chinese Restaurant,30 Eglinton Avenue W,Mississauga,ON,L5R 3E7,43.605499,-79.652289,2.5,128,1,"{'RestaurantsReservations': 'True', 'GoodForMe...","Specialty Food, Restaurants, Dim Sum, Imported...","{'Monday': '9:0-0:0', 'Tuesday': '9:0-0:0', 'W..."
2,gnKjwL_1w79qoiV3IC_xQQ,Musashi Japanese Restaurant,"10110 Johnston Rd, Ste 15",Charlotte,NC,28210,35.092564,-80.859132,4.0,170,1,"{'GoodForKids': 'True', 'NoiseLevel': 'u'avera...","Sushi Bars, Restaurants, Japanese","{'Monday': '17:30-21:30', 'Wednesday': '17:30-..."
3,xvX2CttrVhyG2z1dFg_0xw,Farmers Insurance - Paul Lorenz,"15655 W Roosevelt St, Ste 237",Goodyear,AZ,85338,33.455613,-112.395596,5.0,3,1,,"Insurance, Financial Services","{'Monday': '8:0-17:0', 'Tuesday': '8:0-17:0', ..."
4,HhyxOkGAM07SRYtlQ4wMFQ,Queen City Plumbing,"4209 Stuart Andrew Blvd, Ste F",Charlotte,NC,28217,35.190012,-80.887223,4.0,4,1,"{'BusinessAcceptsBitcoin': 'False', 'ByAppoint...","Plumbing, Shopping, Local Services, Home Servi...","{'Monday': '7:0-23:0', 'Tuesday': '7:0-23:0', ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
192604,nqb4kWcOwp8bFxzfvaDpZQ,Sanderson Plumbing,,North Las Vegas,NV,89032,36.213732,-115.177059,5.0,9,1,{'BusinessAcceptsCreditCards': 'True'},"Water Purification Services, Water Heater Inst...","{'Monday': '0:0-0:0', 'Tuesday': '0:0-0:0', 'W..."
192605,vY2nLU5K20Pee-FdG0br1g,Chapters,17440 Yonge Street,Newmarket,ON,L3Y 6Y9,44.052658,-79.481850,4.5,3,1,"{'RestaurantsPriceRange2': '2', 'BikeParking':...","Books, Mags, Music & Video, Shopping",
192606,MiEyUDKTjeci5TMfxVZPpg,Phoenix Pavers,21230 N 22nd St,Phoenix,AZ,85024,33.679992,-112.035569,4.5,14,1,"{'BusinessAcceptsCreditCards': 'True', 'ByAppo...","Home Services, Contractors, Landscaping, Mason...","{'Monday': '7:0-15:0', 'Tuesday': '7:0-15:0', ..."
192607,zNMupayB2jEHVDOji8sxoQ,Beasley's Barber Shop,4406 E Main St,Mesa,AZ,85205,33.416137,-111.735743,4.5,15,1,"{'RestaurantsPriceRange2': '1', 'BusinessAccep...","Beauty & Spas, Barbers","{'Tuesday': '8:30-17:30', 'Wednesday': '8:30-1..."


In [0]:
checkins

Unnamed: 0,business_id,date
0,--1UhMGODdWsrMastO9DZw,"2016-04-26 19:49:16, 2016-08-30 18:36:57, 2016..."
1,--6MefnULPED_I942VcFNA,"2011-06-04 18:22:23, 2011-07-23 23:51:33, 2012..."
2,--7zmmkVg-IMGaXbuVd0SQ,"2014-12-29 19:25:50, 2015-01-17 01:49:14, 2015..."
3,--8LPVSo5i0Oo61X01sV9A,2016-07-08 16:43:30
4,--9QQLMTbFzLJ_oT-ON3Xw,"2010-06-26 17:39:07, 2010-08-01 20:06:21, 2010..."
...,...,...
161945,zzvlwkcNR1CCqOPXwuvz2A,"2017-05-06 20:05:15, 2017-05-12 22:37:03, 2017..."
161946,zzwaS0xn1MVEPEf0hNLjew,"2010-02-16 02:09:56, 2010-07-05 05:40:48, 2010..."
161947,zzwhN7x37nyjP0ZM8oiHmw,"2016-03-06 13:27:02, 2016-03-09 00:41:53, 2016..."
161948,zzwicjPC9g246MK2M1ZFBA,"2012-09-22 00:26:15, 2012-09-23 20:12:00, 2012..."


# Data Exploration: Exploring the Yelp Dataset

**(1) Finding the most popular business categories:** 
* print the top 10 most popular business categories in the dataset and their counts (i.e., how many business objects in each category). Here we say a category is "popular" if there are many business objects in this category (such as 'restaurants').

In [0]:
# Your code starts here
#   Please add comments or text cells in between to explain the general idea of each block of the code.
#   Please feel free to add more cells below this cell if necessary

# split the categories column, so that we can group and count the containing objects
business["categories"]= business["categories"].str.split(",", expand = True) 

# group the categories unique and count their appearances
categories = business.groupby('categories')['business_id'].nunique()

# sort the appearances descending
categories = categories.sort_values(ascending=False)

# only show the first 10 rows
categories.head(10)

categories
Restaurants         17948
Food                 8023
Shopping             7791
Beauty & Spas        5832
Home Services        5267
Health & Medical     4570
Automotive           4130
Local Services       3413
Nightlife            2713
Active Life          2319
Name: business_id, dtype: int64

** (2) Find the most popular business objects** 
* print the top 10 most popular business objects in the dataset and their counts (i.e., how many checkins in total for each business object).  Here we say a business object is "popular" if the business object attracts a large number of checkins from the users.

In [0]:
# Your code starts here
#   Please add comments or text cells in between to explain the general idea of each block of the code.
#   Please feel free to add more cells below this cell if necessary

# split the date column, so that we can group and count the containing objects
checkins['date'] = checkins["date"].str.split(",")

# count the amount of dates in each row to get the total amount of check ins
checkins['num_checkins'] = checkins.apply(lambda row: len(row.date), axis = 1)

# sort the amount of checkins descending
checkins = checkins.sort_values(by=['num_checkins'], ascending=False)


In [0]:
most_checkins = checkins.head(10).copy()

# search for the business name with the corresponding business_id as key
most_checkins['name'] = most_checkins.apply(lambda row: business.loc[business['business_id'] == row['business_id']].name.values[0], axis = 1)

In [0]:
most_checkins

Unnamed: 0,business_id,date,num_checkins,name
42142,FaHADZARwnY4yvlvpnsfGA,"[2010-01-15 22:59:12, 2010-01-16 02:39:11, 2...",143061,McCarran International Airport
52735,JmI9nslLD7KZqRr__Bg6NQ,"[2010-01-16 04:30:54, 2010-01-16 13:47:11, 2...",123126,Phoenix Sky Harbor International Airport
157953,yQab5dxZzgBLTEHCw9V7_w,"[2010-01-17 18:28:10, 2010-01-23 18:18:38, 2...",54787,Charlotte Douglas International Airport
15955,5LNZ67Yw9RD6nf4_UhXOjw,"[2010-07-20 13:11:27, 2010-11-08 16:56:23, 2...",46384,The Cosmopolitan of Las Vegas
49667,IZivKqtHyz4-ts8KsnvMrA,"[2014-07-03 05:26:09, 2014-07-04 02:08:10, 2...",38277,Kung Fu Tea
74469,SMPbvZLSMMb7KU76YNYMGg,"[2010-01-17 06:14:44, 2010-01-17 19:51:31, 2...",34353,ARIA Resort & Casino
86134,Wxxvi3LZbHNIDwJ-ZimtnA,"[2010-01-16 18:05:13, 2010-01-24 10:27:06, 2...",32343,The Venetian Las Vegas
130714,na4Th5DrNauOv-c43QQFvA,"[2010-01-17 02:56:04, 2010-01-23 22:18:47, 2...",31185,Bellagio Hotel
83644,VyjyHoBg3KC5BSFRlD0ZPQ,"[2010-01-17 03:59:05, 2010-01-18 02:23:32, 2...",30782,Caesars Palace Las Vegas Hotel & Casino
39989,El4FC8jcawUVgw_0EIcbaQ,"[2010-01-24 03:48:14, 2010-01-24 05:50:43, 2...",30098,MGM Grand Hotel


# The Solution: implement a data science solution to the problem you are trying to solve.

Briefly describe the idea of your solution to the problem in the following cell:

In [0]:
# The problem we are trying to solve is:
# We want to figure out the relationship between some given factors that may impact a business to terminate and then communicate that reasoning to Yelp so that struggling businesses have a chance to survive in the market.
# We will be investigating the restaurant business category because it the number one most occurring category out of the whole Yelp dataset.

For the second part of predicting restaurants are whether open or permenantly closed, we primarily apply four steps: (a) feature selection (b) Missing value handling (c) model selection (d) evaluation. For feature selection, we consider features from initilly 90 narrow down to 41 features; for missing value handling, we tried mean and extreme value; for model selection we focus on logstic regression and XGBoost; for evaluation we use cross validation, confusion matrix, precision, recall, and F1 score.

In [0]:
# Our solution is:
# Collect data of the Closed Business regarding its ratings and customer reviews
# Develop a Review Attitude Model and Rating Model to analyze how various factors (ratings and reviews in this case) impact the business closure

Write codes to implement the solution in python:

In [0]:
# Your code starts here
#   Please add comments or text cells in between to explain the general idea of each block of the code.
#   Please feel free to add more cells below this cell if necessary

# load business dataframe
data_df = pd.read_json('yelp_dataset/business.json', lines=True)

#restaurant selection and review number threshold 
## pick only restaurants
drop_array = []
for index, row in data_df.iterrows():
    if row['categories'] is None:
        drop_array.append(index)
        continue
    cat = row['categories'].split(', ')
    if not 'Restaurants' in cat:
        drop_array.append(index)
data_df = data_df.drop(data_df.index[drop_array])

## review number threshold >= 50
data_df = data_df[data_df['review_count'] >= 50]

In [0]:
#(a) feature selection
delete_column = ['index', 'business_id', 'Unnamed: 0', 'is_open']
X = data_df.drop(delete_column, axis=1)
y = data_df['is_open'].values

## remove columns full of nan
mean_X = X
sum(mean_X.isna().sum() == mean_X.shape[0])
delcol = []
for column in mean_X:
    if mean_X[column].isna().sum() == mean_X.shape[0]:
        delcol.append(column)
print(delcol)
mean_X = mean_X.drop(delcol, axis = 1)
print(mean_X.shape)
print(X.shape)

## delete #nan > 10000
proportion_X = mean_X
del_column = []
for column in proportion_X:
    if proportion_X[column].isna().sum() > 10000:
        del_column.append(column)
proportion_X = proportion_X.drop(del_column, axis=1)
proportion_X.shape #41 columns

In [0]:
#(b) missing value handling
proportion_X_1 = proportion_X
proportion_X_2 = proportion_X
proportion_X_3 = proportion_X
## replace nan with mean value -- method 1
for column in proportion_X_1:
    tempsum = proportion_X_1[column].sum()
    tempcount = proportion_X_1.shape[0] - proportion_X_1[column].isna().sum()
    if tempcount==0:
        continue
    meanvalue = tempsum / tempcount
    proportion_X_1[column] = proportion_X_1[column].fillna(meanvalue)

## replace nan with -99999 -- method 2
proportion_X_2 = proportion_X_2.fillna(-9999)

## replace nan with 0 -- method 3
proportion_X_3 = proportion_X_3.fillna(0)

In [0]:
#(c) model selection
input_set = proportion_X_3
output_set = y
input_set = np.array(input_set)
kf = KFold(n_splits=2)
kf.get_n_splits(input_set)
for train_index, test_index in kf.split(input_set):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = input_set[train_index], input_set[test_index]
    y_train, y_test = output_set[train_index], output_set[test_index]

## logistic 
lr = LogisticRegression()
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)

##XGBOOST
xgb_model = xgb.XGBClassifier(objective="binary:logistic", random_state=42)
xgb_model.fit(X_train, y_train)
y_pred_xgb = xgb_model.predict(X_test)

In [0]:
#(d) evalution
print(confusion_matrix(y_test, y_pred_xgb))
print(classification_report(y_test, y_pred_xgb))

In [0]:
# Your code starts here
#   Please add comments or text cells in between to explain the general idea of each block of the code.
#   Please feel free to add more cells below this cell if necessary
import json

business = "yelp_dataset/business.json"
yelpdata = ""

with open(business, 'r') as file:
    yelpdata = file.read()
# print(json.dumps(yelpdata, separators=('\n', ':')))
str_close_business = yelpdata.split("\n")

close_business = []

for obj in str_close_business:
    if obj != "":
        close_business.append(json.loads(obj))
        
print('Collect data from: ', business, ' Complete !!!')

In [0]:
close_restaurant = []
res_review = dict()

for c in close_business:
    number = 0
    business_id = c['business_id']
    business_name = c['name']
    b_categories = str(c['categories']).split(", ")
    if 'is_open' in c and c['is_open'] == 0 and 'Restaurants' in b_categories:
        close_restaurant.append(business_id)
        res_review[business_id] = 0
print(len(close_restaurant))

In [0]:
review = "yelp_dataset/review.json"
reviewdata = ""

with open(review, 'r') as file:
    sf = file.readline()
    while(sf != ""):
        onereview = json.loads(sf)
        bID = onereview['business_id']
        if bID in close_restaurant:  # only reviews from closed business!
            res_review[bID] += 1
        sf = file.readline()

print('Collect data from: ', review, ' Complete !!!')
print(len(res_review))

In [0]:
import operator
print(len(res_review))
sorted_review = sorted(res_review.items(), key=operator.itemgetter(1), reverse=True)
review_ten = dict()

count = 10
for key, value in sorted_review:
    if count < 1:
        break
    for c in close_business:
        if key in c['business_id']:
            review_ten[key] = (c['name'], value, [])
            print(key, ":  (", c['name'], ":\t", value, ")")
            break
    count -= 1

In [0]:
review = "yelp_dataset/review.json"
idList = list(review_ten.keys())

with open(review, 'r') as file:
    sf = file.readline()
    while(sf != ""):
        onereview = json.loads(sf)
        bID = onereview['business_id']
        if bID in idList:  # only reviews from closed business!
            review_ten[bID][2].append(onereview['text'])
        sf = file.readline()

In [0]:
#######
#
#  Apply the NaiveBayesModel to analysis the top 10 closed restaurants. 
#
#######

def NaiveBayesModel(review_list):
    neg = 0
    pos = 0
    total = len(review_list)
    for sentence in review_list:
        words = sentence.lower().split(' ')
        negSum = 0
        posSum = 0
        for word in words:
            classResult = classifier.classify(word_feats(word))
            if classResult == 'neg':
                negSum += 1
            elif classResult == 'pos':
                posSum += 1
        neg += float(negSum)/len(words)
        pos += float(posSum)/len(words)
    
    return (float(pos)/total, float(neg)/total, 1 - float(pos)/total - float(neg)/total)

In [0]:
nb_results = dict()

for key, value in review_ten.items():
    result = NaiveBayesModel(value[2])
    nb_results[key] = result
    print(value[0], result[0], result[1], result[2])

In [0]:
# Here is the NaiveBayesClassifier Model to analyze the review with positive, nagative and neutral.
from random import shuffle
import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import names
import string

def word_feats(words):
    return dict([(word, True) for word in words])

positive_vocab = ['thanks','awesome', 'outstanding', 'fantastic', 'terrific', 'good', 'nice', 'great', 'super', 'wonderful', 'achievable', 'Delicious','tasty','again']
negative_vocab = ['bad', 'terrible', 'useless', 'hate', ':(', 'not', 'no', 'aweful', 'sad', 'accuse', 'aggressive' , 'annoy']
neutral_vocab = ['when','where','who','why','what','which','is','was','do','did','how','were']

# input the positive words list
with open('yelp_dataset/positive.txt', 'r') as file:  
    sf = file.readline()
    while(sf != ""):
        positive_vocab.append(sf.rstrip())
        sf = file.readline()

# input the negative words list
with open('yelp_dataset/negative.txt', 'r') as file:
    sf = file.readline()
    while(sf != ""):
        negative_vocab.append(sf.rstrip())
        sf = file.readline()
        
positive_features = [(word_feats(pos), 'pos') for pos in positive_vocab]
negative_features = [(word_feats(neg), 'neg') for neg in negative_vocab]
neutral_features = [(word_feats(neu), 'neu') for neu in neutral_vocab]

# shuffle to make sure the length of negtive features is the same as the positive features
shuffle(negative_features)
negative_features = negative_features[0:len(positive_features)]

train_set = negative_features + positive_features + neutral_features
 
classifier = NaiveBayesClassifier.train(train_set) 

# Predict
neg = 0
pos = 0
sentence = "Delicious! Ordered through Grubhub and it came within an hour on a Sunday night. It was hot still. We ordered a Hawaiian pizza and it was delicious. Perfect amount of cheese and sauce was sweet. The dough was super tasty- I usually skip the crust but this crust was too good! The fries were good but the to-go box should have some vent holes in it so they don't get too soggy. Great pizza place. I'll be ordering again!"
words = sentence.split(' ')
for word in words:
    classResult = classifier.classify(word_feats(word))
    if classResult == 'neg':
        neg = neg + 1
    elif classResult == 'pos':
        pos = pos + 1

print('Positive: ' + str(float(pos)/len(words)))
print('Negative: ' + str(float(neg)/len(words)))
print('Neutral: ' + str(1 - float(pos)/len(words) - float(neg)/len(words)))

# Results: summarize and visualize the results discovered from the analysis

Please use figures, tables, or videos to communicate the results with the audience.


In [0]:
# Your code starts here
#   Please add comments or text cells in between to explain the general idea of each block of the code.
#   Please feel free to add more cells below this cell if necessary

#################### results for the second aspect ##################
## logistic 
#[[ 758  797]
# [ 318 6967]]
#              precision    recall  f1-score   support
#
#          -1       0.70      0.49      0.58      1555
#           1       0.90      0.96      0.93      7285
#


##XGBoost
#[[ 989  566]
# [ 247 7038]]
#              precision    recall  f1-score   support
#
#          -1       0.80      0.63      0.70      1555
#           1       0.92      0.97      0.94      7285
#






*-----------------
# Done

All set! 

**What do you need to submit?**

* **Notebook File**: Save this Jupyter notebook, and find the notebook file in your folder (for example, "filename.ipynb"). This is the file you need to submit. Please make sure all the plotted tables and figures are in the notebook. If you used "jupyter notebook --pylab=inline" to open the notebook, all the figures and tables should have shown up in the notebook.

* **PPT Slides**: please prepare PPT slides (for 7 minutes' talk) to present about the case study . Each team present their case studies in class for 7 minutes.

Please compress all the files in a zipped file.


**How to submit:**

        Please submit through Canvas, in the Assignment "Case Study 1".
        
**Note: Each team only needs to submit one submission in Canvas**


# Peer-Review Grading Template:

**Total Points: (100 points)** Please don't worry about the absolute scores, we will rescale the final grading according to the performance of all teams in the class.

Please add an "**X**" mark in front of your rating: 

For example:

*2: bad*
          
**X** *3: good*
    
*4: perfect*


    ---------------------------------
    The Problem: 
    ---------------------------------
    
    1. (10 points) how well did the team describe the problem they are trying to solve using the data? 
       0: not clear
       2: I can barely understand the problem
       4: okay, can be improved
       6: good, but can be improved
       8: very good
       10: crystal clear
    
    2. (10 points) do you think the problem is important or has a potential impact?
        0: not important at all
        2: not sure if it is important
        4: seems important, but not clear
        6: interesting problem
        8: an important problem, which I want to know the answer myself
       10: very important, I would be happy invest money on a project like this.
    
    ----------------------------------
    Data Collection and Processing:
    ----------------------------------
    
    3. (10 points) Do you think the data collected/processed are relevant and sufficient for solving the above problem? 
       0: not clear
       2: I can barely understand what data they are trying to collect/process
       4: I can barely understand why the data is relevant to the problem
       6: the data are relevant to the problem, but better data can be collected
       8: the data collected are relevant and at a proper scale
      10: the data are properly collected and they are sufficient

    -----------------------------------
    Data Exploration:
    -----------------------------------
    4. How well did the team solve the following task:
    
    (1) Finding the most popular business categories (5 points):
       0: missing answer
       1: okay, but with major problems
       3: good, but with minor problems
       5: perfect
    
    (2) Find the most popular business objects (5 points)
       0: missing answer
       1: okay, but with major problems
       3: good, but with minor problems
       5: perfect
    
    -----------------------------------
    The Solution
    -----------------------------------
    5.  how well did the team describe the solution they used to solve the problem? (10 points)
       0: not clear
       2: I can barely understand
       4: okay, can be improved
       6: good, but can be improved
       8: very good
       10: crystal clear
       
    6. how well is the solution in solving the problem? (10 points)
       0: not relevant
       2: barely relevant to the problem
       4: okay solution, but there is an easier solution.
       6: good, but can be improved
       8: very good, but solution is simple/old
       10: innovative and technically sound
       
    7. how well did the team implement the solution in python? (10 points)
       0: the code is not relevant to the solution proposed
       2: the code is barely understandable, but not relevant
       4: okay, the code is clear but incorrect
       6: good, the code is correct, but with major errors
       8: very good, the code is correct, but with minor errors
      10: perfect 
   
    -----------------------------------
    The Results
    -----------------------------------
     8.  How well did the team present the results they found in the data? (10 points)
       0: not clear
       2: I can barely understand
       4: okay, can be improved
       6: good, but can be improved
       8: very good
      10: crystal clear
       
     9.  How do you think of the results they found in the data?  (5 points)
       0: not clear
       1: likely to be wrong
       2: okay, maybe wrong
       3: good, but can be improved
       4: make sense, but not interesting
       5: make sense and very interesting
     
    -----------------------------------
    The Presentation
    -----------------------------------
    10. How all the different parts (data, problem, solution, result) fit together as a coherent story?  
       0: they are irrelevant
       1: I can barely understand how they are related to each other
       2: okay, the problem is good, but the solution doesn't match well, or the problem is not solvable.
       3: good, but the results don't make much sense in the context
       4: very good fit, but not exciting (the storyline can be improved/polished)
       5: a perfect story
      
    11. Did the presenter make good use of the 10 minutes for presentation?  
       0: the team didn't present
       1: bad, barely finished a small part of the talk
       2: okay, barely finished most parts of the talk.
       3: good, finished all parts of the talk, but some part is rushed
       4: very good, but the allocation of time on different parts can be improved.
       5: perfect timing and good use of time      

    12. How well do you think of the presentation (overall quality)?  
       0: the team didn't present
       1: bad
       2: okay
       3: good
       4: very good
       5: perfect


    -----------------------------------
    Overall: 
    -----------------------------------
    13. How many points out of the 100 do you give to this project in total?  Please don't worry about the absolute scores, we will rescale the final grading according to the performance of all teams in the class.
    Total score:
    
    14. What are the strengths of this project? Briefly, list up to 3 strengths.
       1: 
       2:
       3:
    
    15. What are the weaknesses of this project? Briefly, list up to 3 weaknesses.
       1:
       2:
       3:
    
    16. Detailed comments and suggestions. What suggestions do you have for this project to improve its quality further.
    
    
    

    ---------------------------------
    Your Vote: 
    ---------------------------------
    [Overall Quality] Between the two submissions that you are reviewing, which team would you vote for a better score?  (5 bonus points)
        0: I vote the other team is better than this team
        5: I vote this team is better than the other team 
        


