# DS-SF-34 | 02 | The `pandas` Library | Assignment | Starter Code

## The Bistro Meets `pandas`

You've just told one of your friends that you are taking a Data Science class.  (Yeah!)  Your friend runs a bistro, a small restaurant serving moderately priced simple meals in a modest setting ([Wikipedia](https://en.wikipedia.org/wiki/Bistro)).  Over some period of time, she collected the following information about her patrons' visits.

| Variable's name | Its meaning |
|:---:|:---|
| `name` | Patron's first name |
| `gender` | Patron's gender |
| `is_smoker` | Whether the patron smokes |
| `party` | Party's size |
| `check` | Check amount (\$) (after taxes but before tips) |
| `tip` | Tip (\$) that the patron added to the check |
| `day` | Week day of the visit |
| `time` | Rough time estimate of the visit |

In this assignment, we will be exploring this dataset using `pandas`.<sup>(*)</sup>

<sup>(*)</sup> this dataset was adapted from the `tips` dataset of the `seaborn` package (https://github.com/mwaskom/seaborn-data)

> ### Question 1.  Import `numpy` (as `np`) and `pandas` (as `pd`).

In [1]:
import os

import numpy as np

import pandas as pd

pd.set_option('display.max_rows', 10)
pd.set_option('display.notebook_repr_html', True)
pd.set_option('display.max_columns', 10)

> ### Question 2.  Read the `dataset-02-bistro.csv` dataset.

In [2]:
df = pd.read_csv(os.path.join('..', 'datasets', 'dataset-02-bistro.csv'))
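`read_csv` infers a dtype for every column as it parses the file. The sketch below is illustrative only: it feeds two rows of the bistro data through an in-memory CSV, assuming the real file uses the same header and has no extra index column, so that you can see which dtypes come back.

```python
import io

import pandas as pd

# Two bistro rows in an in-memory CSV. Assumption: the real file has this
# header and no extra index column.
csv_text = """day,time,name,gender,is_smoker,party,check,tip
Sunday,Dinner,Kimberly,Female,False,2,16.99,1.01
Sunday,Dinner,Nicholas,Male,False,3,10.34,1.66
"""

sample = pd.read_csv(io.StringIO(csv_text))

# "True"/"False" are parsed as bool, whole numbers as integers,
# decimals as floats; the text columns hold strings.
print(sample.dtypes)
```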

> ### Question 3.  What is the class of the `pandas` object storing the dataset?

In [3]:
type(df)

pandas.core.frame.DataFrame

Answer: TODO

> ### Question 4.  How many samples (i.e., rows) are in this dataset?

In [4]:
df.shape[0]

244

Answer: TODO

> ### Question 5.  How many variables (i.e., columns) are in this dataset?

In [5]:
df.shape[1]

8

Answer: TODO
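`.shape` is a `(rows, columns)` tuple, so both counts can be read in one step. A minimal sketch on a small hypothetical frame (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frame with two of the bistro's columns (not the real data).
toy = pd.DataFrame({
    "day": ["Sunday", "Sunday", "Saturday"],
    "tip": [1.01, 1.66, 5.92],
})

n_rows, n_cols = toy.shape          # unpack the (rows, columns) tuple
print(n_rows, n_cols)               # 3 2
print(len(toy), len(toy.columns))   # equivalent counts: 3 2
```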

> ### Question 6.  Print the name of each column in the dataset, one name per line.

In [6]:
for column in df.columns:
    print(column)

day
time
name
gender
is_smoker
party
check
tip


> ### Question 7.  Print the first two rows of the dataset to the console.  What does the output look like?

In [7]:
df.head(2)

|   | day | time | name | gender | is_smoker | party | check | tip |
|---:|:---|:---|:---|:---|:---|---:|---:|---:|
| 0 | Sunday | Dinner | Kimberly | Female | False | 2 | 16.99 | 1.01 |
| 1 | Sunday | Dinner | Nicholas | Male | False | 3 | 10.34 | 1.66 |


Answer: TODO
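`.loc` selects rows by index *label*, while `.head()` and `.iloc` select by *position*. With the default 0-based index, `df.loc[[1, 2]]` therefore returns the second and third rows, not the first two. A small sketch on a hypothetical four-row frame:

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex 0, 1, 2, 3.
toy = pd.DataFrame({"name": ["Kimberly", "Nicholas", "Larry", "Joseph"]})

first_two = toy.head(2)       # positional: rows 0 and 1
by_position = toy.iloc[:2]    # positional slice: same rows as head(2)
by_label = toy.loc[[1, 2]]    # label-based: rows labelled 1 and 2

print(first_two["name"].tolist())  # ['Kimberly', 'Nicholas']
print(by_label["name"].tolist())   # ['Nicholas', 'Larry']
```

The same reasoning applies to the last rows: `df.tail(2)` or `df.iloc[-2:]` work whatever the index labels are.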

> ### Question 8.  Extract the last 2 rows of the data frame and print them to the console.  What does the output look like?

In [8]:
df.tail(2)

|   | day | time | name | gender | is_smoker | party | check | tip |
|---:|:---|:---|:---|:---|:---|---:|---:|---:|
| 242 | Saturday | Dinner | Jon | Male | False | 2 | 17.82 | 1.75 |
| 243 | Thursday | Dinner | Brandi | Female | False | 2 | 18.78 | 3.00 |


Answer: TODO

> ### Question 9.  Does the dataset contain any missing values?

In [9]:
df.isnull().sum()

day          0
time         0
name         0
gender       0
is_smoker    0
party        0
check        0
tip          0
dtype: int64

Answer: TODO
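`isnull().sum()` gives one count per column. If you only need a single yes/no answer for the whole frame, you can chain `.values.any()`. A sketch on a hypothetical frame containing one missing tip:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing tip (the real dataset has none).
toy = pd.DataFrame({"check": [16.99, 10.34], "tip": [1.01, np.nan]})

per_column = toy.isnull().sum()          # missing values per column
any_missing = toy.isnull().values.any()  # one boolean for the whole frame
print(per_column)
print(any_missing)  # True
```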

> ### Question 10.  What can you say about the `is_smoker` variable?  I.e., will it bring any insights when analyzing the dataset?  What do you want to do with it?  (and do it...)

In [10]:
# Convert booleans (True/False) to integers (1/0)

df.is_smoker *= 1
df.is_smoker

0      0
1      0
2      0
3      0
4      0
      ..
239    0
240    0
241    0
242    0
243    0
Name: is_smoker, dtype: int64

Answer: TODO
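After a boolean column is converted to 0/1, its mean equals the fraction of rows that are 1. This is a quick way to check whether `is_smoker` varies enough to be informative. A sketch on hypothetical flags:

```python
import pandas as pd

# Hypothetical smoker flags for five visits (not the real data).
toy = pd.DataFrame({"is_smoker": [False, True, False, False, True]})

toy["is_smoker"] = toy["is_smoker"].astype(int)  # True -> 1, False -> 0
share = toy["is_smoker"].mean()                  # fraction of visits by smokers
print(share)  # 0.4
```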

> ### Question 11.  For which week days does the dataset have data?

In [11]:
df.day.unique()


array(['Sunday', 'Saturday', 'Thursday', 'Friday'], dtype=object)

Answer: TODO

> ### Question 12.  How often was the bistro patronized for each week day?

(check `.value_counts()`; it could come in handy)

(http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html)

In [12]:
df.day.value_counts()

Saturday    87
Sunday      76
Thursday    62
Friday      19
Name: day, dtype: int64

Answer: TODO
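`value_counts()` also accepts `normalize=True`, which returns each day's share of all visits instead of raw counts. A sketch on hypothetical visit days:

```python
import pandas as pd

# Hypothetical visit days (not the real data).
days = pd.Series(["Saturday", "Sunday", "Saturday", "Friday"], name="day")

counts = days.value_counts()                # visits per day
shares = days.value_counts(normalize=True)  # fraction of all visits per day
print(counts["Saturday"], shares["Saturday"])  # 2 0.5
```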

> ### Question 13.  How much did the waiters collect in tips for each week day?

In [22]:
df[ ['day', 'tip'] ].groupby('day').sum()

| day | tip |
|:---|---:|
| Friday | 51.96 |
| Saturday | 260.40 |
| Sunday | 247.39 |
| Thursday | 171.83 |


Answer: TODO

> ### Question 14.  What is the average tip per check (in absolute \$) for each week day?

In [23]:
df[ ['day', 'tip'] ].groupby('day').mean()

| day | tip |
|:---|---:|
| Friday | 2.734737 |
| Saturday | 2.993103 |
| Sunday | 3.255132 |
| Thursday | 2.771452 |


Answer: TODO
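The per-day totals and averages can also be computed in a single pass by passing a list of aggregations to `.agg()`. A sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical tips for three visits (not the real data).
toy = pd.DataFrame({
    "day": ["Friday", "Friday", "Sunday"],
    "tip": [2.0, 4.0, 3.0],
})

# One row per day, one column per aggregation.
summary = toy.groupby("day")["tip"].agg(["sum", "mean"])
print(summary)
```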

> ### Question 15.  What is the average tip per check (as a percentage of the check) for each week day?

In [29]:
# Tip as a percentage of each check, averaged per week day
(df.tip / df.check * 100).groupby(df.day).mean()

Answer: TODO
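A Series such as `df.tip / df.check` has no `day` column of its own. It can still be grouped by another Series that shares its index. Keep in mind that "average percentage" has two reasonable readings. The mean of per-check percentages weights every check equally. The ratio of total tips to total checks weights large checks more. A sketch of both on hypothetical data:

```python
import pandas as pd

# Hypothetical checks and tips (not the real data).
toy = pd.DataFrame({
    "day": ["Friday", "Friday", "Sunday"],
    "check": [10.0, 40.0, 20.0],
    "tip": [2.0, 4.0, 3.0],
})

# Mean of per-check percentages: every check counts equally.
pct_per_check = (toy["tip"] / toy["check"] * 100).groupby(toy["day"]).mean()

# Ratio of totals: larger checks weigh more.
totals = toy.groupby("day")[["tip", "check"]].sum()
pct_of_totals = totals["tip"] / totals["check"] * 100

print(pct_per_check)
print(pct_of_totals)
```

For Friday the two readings differ (15% vs 12%), so state which one your answer uses.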

> ### Question 16.  Are there any names in common between male and female patrons?  (E.g., `Chris` can refer to either a man or a woman)

(check `numpy.intersect1d()`; it could come in handy)

(https://docs.scipy.org/doc/numpy/reference/generated/numpy.intersect1d.html)

In [33]:
male_names = df.name[df.gender == 'Male']
female_names = df.name[df.gender == 'Female']
np.intersect1d(male_names, female_names)

Answer: TODO
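To find the shared names, filter the names by gender first and then intersect the two groups. `np.intersect1d` returns the sorted unique values present in both inputs. A sketch on a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical patrons (not the real data).
toy = pd.DataFrame({
    "name": ["Chris", "Anna", "Chris", "Carl"],
    "gender": ["Male", "Female", "Female", "Male"],
})

male_names = toy.loc[toy["gender"] == "Male", "name"]
female_names = toy.loc[toy["gender"] == "Female", "name"]

# Sorted, de-duplicated names appearing in both groups.
shared = np.intersect1d(male_names, female_names)
print(shared)  # ['Chris']
```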

> ### Question 17.  If no patrons share the same name, how many unique patrons are in the dataset?

In [None]:
# TODO

Answer: TODO
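Under the question's assumption (one patron per name), the answer is the number of distinct names, which `Series.nunique()` returns. If a name shared by a man and a woman should count as two patrons, count distinct `(name, gender)` pairs instead. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical visits (not the real data).
toy = pd.DataFrame({
    "name": ["Kevin", "Alice", "Kevin", "Chris", "Chris"],
    "gender": ["Male", "Female", "Male", "Male", "Female"],
})

# One patron per distinct name.
n_by_name = toy["name"].nunique()  # 3: Kevin, Alice, Chris

# One patron per distinct (name, gender) pair.
n_by_name_gender = len(toy[["name", "gender"]].drop_duplicates())  # 4
```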

> ### Question 18.  How many times did `Kevin` patronize the bistro?  How about `Alice`?

In [None]:
# TODO

Answer: TODO
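Comparing a column with a value gives a boolean mask, and summing the mask counts the matches. Alternatively, look the name up in `value_counts()`; `.get()` with a default avoids a `KeyError` for names that never appear. A sketch on hypothetical names:

```python
import pandas as pd

# Hypothetical visit names (not the real data).
names = pd.Series(["Kevin", "Alice", "Kevin", "Jon"], name="name")

kevin_visits = (names == "Kevin").sum()  # True counts as 1
counts = names.value_counts()
alice_visits = counts.get("Alice", 0)    # 0 if the name is absent
print(kevin_visits, alice_visits)  # 2 1
```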

> ### Question 19.  Who are the top 3 female and male patrons?

In [None]:
# TODO

Answer: TODO
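One possible reading of "top patrons" is "most visits". That is an assumption: ranking by total spent or total tipped would be just as valid. Under that reading, count visits per `(gender, name)` pair, sort, and keep the first three rows within each gender. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical visits (not the real data); "top" is assumed to mean most visits.
toy = pd.DataFrame({
    "name": ["Carl"] * 4 + ["Kevin"] * 3 + ["Jon"] * 2 + ["Fred"]
            + ["Ann"] * 3 + ["Dora"] * 2 + ["Eve"],
    "gender": ["Male"] * 10 + ["Female"] * 6,
})

visits = toy.groupby(["gender", "name"]).size()  # visits per (gender, name)
top3 = (
    visits.sort_values(ascending=False)
    .groupby(level="gender")
    .head(3)                                     # first 3 rows within each gender
)
print(top3)
```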

> ### Question 20.  Who's the best tipper (as a fraction of all tips over all check totals)?  Who's the worst?  How many times did they patronize the bistro?

(check `.sort_values()`; it could come in handy)

- (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html)
- (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sort_values.html)

In [None]:
# TODO

Answer: TODO
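Following the question's definition (all tips over all check totals), sum `tip` and `check` per name, divide, and sort by that rate. The sketch below uses hypothetical data and counts visits with `"count"`. That equals the number of rows only because the dataset has no missing tips.

```python
import pandas as pd

# Hypothetical visits (not the real data).
toy = pd.DataFrame({
    "name": ["Kevin", "Kevin", "Alice", "Jon"],
    "check": [20.0, 30.0, 10.0, 40.0],
    "tip": [4.0, 6.0, 1.0, 10.0],
})

per_name = toy.groupby("name").agg(
    total_tip=("tip", "sum"),
    total_check=("check", "sum"),
    visits=("tip", "count"),  # rows per name (no missing tips assumed)
)
per_name["tip_rate"] = per_name["total_tip"] / per_name["total_check"]

ranked = per_name.sort_values("tip_rate", ascending=False)
best, worst = ranked.index[0], ranked.index[-1]
print(best, worst)                                    # Jon Alice
print(ranked.loc[[best, worst], "visits"].tolist())   # [1, 1]
```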