In [1]:
import pandas as pd
import numpy as np

### 1) Attendance Data

Read the data from the attendance table and calculate an attendance percentage for each student. One half day is worth 50% of a full day, and 10 tardies equal one absence.

You should end up with something like this:

```
name
Billy    0.5250
Jane     0.6875
John     0.9125
Sally    0.7625
Name: grade, dtype: float64
```
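
The weighting follows from the rules: a present day counts as 1, an absence as 0, a half day as 0.5, and, since ten tardies equal one absence, each tardy costs 0.1 of a day and so counts as 0.9. Here is a minimal sketch of that arithmetic, assuming two rows typed in by hand rather than read from the CSV (the names `weights`, `sample`, and `rates` are illustrative only):

```python
import pandas as pd

# Weights implied by the rules: ten tardies equal one absence,
# so a single tardy costs 0.1 of a day.
weights = {'P': 1.0, 'A': 0.0, 'H': 0.5, 'T': 0.9}

# Two rows typed in by hand for illustration (one letter per day).
sample = pd.DataFrame(
    [list('PTTHPATT'), list('PTHPPTPP')],
    index=['Sally', 'John'],
)

# Map every cell to its weight, then average across the days in each row.
rates = sample.apply(lambda col: col.map(weights)).mean(axis=1)
print(rates.round(4))
```

For the first row this is (1 + 0.9 + 0.9 + 0.5 + 1 + 0 + 0.9 + 0.9) / 8 = 0.7625, which matches the expected value for Sally above.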

In [2]:
attend = pd.read_csv('https://gist.githubusercontent.com/o0amandagomez0o/20c8edc2cb83d33da03c8fd2f9db4c4c/raw/attendance.csv', index_col=0)

attend

Unnamed: 0,2018-01-01,2018-01-02,2018-01-03,2018-01-04,2018-01-05,2018-01-06,2018-01-07,2018-01-08
Sally,P,T,T,H,P,A,T,T
Jane,A,P,T,T,T,T,A,T
Billy,A,T,A,A,H,T,P,T
John,P,T,H,P,P,T,P,P


In [3]:
# Write out what each value means numerically, to help figure out how to map values in the next step.

P = 1
A = 0
H = .5
T = P - 0.1

In [4]:
# Map each attendance code to its numeric weight, one column at a time.
weights = {'P': 1, 'A': 0, 'H': .5, 'T': .9}
for col in attend:
    attend[col] = attend[col].map(weights)

In [5]:
attend

Unnamed: 0,2018-01-01,2018-01-02,2018-01-03,2018-01-04,2018-01-05,2018-01-06,2018-01-07,2018-01-08
Sally,1.0,0.9,0.9,0.5,1.0,0.0,0.9,0.9
Jane,0.0,1.0,0.9,0.9,0.9,0.9,0.0,0.9
Billy,0.0,0.9,0.0,0.0,0.5,0.9,1.0,0.9
John,1.0,0.9,0.5,1.0,1.0,0.9,1.0,1.0


In [6]:
attend['rate'] = attend.mean(axis=1)

In [7]:
attend[['rate']]

Unnamed: 0,rate
Sally,0.7625
Jane,0.6875
Billy,0.525
John,0.9125


### 2) Coffee Levels

#### a. Read the coffee_levels table.

In [8]:
coffee = pd.read_csv('https://gist.githubusercontent.com/o0amandagomez0o/f6ea956fedae90420fd2ce4bd382ea8a/raw/coffee_levels.csv')

In [9]:
coffee.head()

Unnamed: 0,hour,coffee_carafe,coffee_amount
0,8,x,0.816164
1,9,x,0.451018
2,10,x,0.843279
3,11,x,0.335533
4,12,x,0.898291


#### b. Transform the data so that each carafe is in its own column.

In [10]:
coffee.pivot_table(index='hour', columns='coffee_carafe')

Unnamed: 0_level_0,coffee_amount,coffee_amount,coffee_amount
coffee_carafe,x,y,z
hour,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
8,0.816164,0.189297,0.999264
9,0.451018,0.521502,0.91599
10,0.843279,0.023163,0.144928
11,0.335533,0.235529,0.311495
12,0.898291,0.017009,0.771947
13,0.310711,0.997464,0.39852
14,0.507288,0.058361,0.864464
15,0.215043,0.144644,0.436364
16,0.183891,0.544676,0.280621
17,0.39156,0.594126,0.436677


#### c. Is this the best shape for the data?

It really depends on what you want to do with the data. Since we now have a column for each carafe, it's easier to run calculations on a single carafe by selecting just that column. For me, personally, it is also easier to read.
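
To make the trade-off concrete, here is a small sketch comparing the two shapes. It assumes a hand-made frame with the same column names as `coffee_levels` but made-up values; `long_df`, `wide_df`, and the other variable names are illustrative only.

```python
import pandas as pd

# Hand-made frame in the same long layout as coffee_levels (values are made up).
long_df = pd.DataFrame({
    'hour': [8, 8, 9, 9],
    'coffee_carafe': ['x', 'y', 'x', 'y'],
    'coffee_amount': [0.8, 0.2, 0.4, 0.6],
})

# Long shape: per-carafe summaries come from a groupby.
mean_by_carafe = long_df.groupby('coffee_carafe')['coffee_amount'].mean()

# Wide shape: one column per carafe, so comparing carafes is a column operation.
wide_df = long_df.pivot(index='hour', columns='coffee_carafe', values='coffee_amount')
x_minus_y = wide_df['x'] - wide_df['y']

# melt converts wide back to long when a tool expects one observation per row.
back_to_long = wide_df.reset_index().melt(
    id_vars='hour', var_name='coffee_carafe', value_name='coffee_amount'
)
```

The wide shape tends to suit side-by-side comparison and reading, while the long shape tends to suit grouping and most plotting libraries.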

### 3) Cake Recipes

#### a. Read the cake_recipes table. This data set contains cake tastiness scores for combinations of different recipes, oven rack positions, and oven temperatures.

In [11]:
cake = pd.read_csv('https://gist.githubusercontent.com/o0amandagomez0o/6bb870ddd6cae613999b9cf33ac41c33/raw/cake_recipes.csv')

In [12]:
cake.head()

Unnamed: 0,recipe:position,225,250,275,300
0,a:bottom,61.738655,53.912627,74.41473,98.786784
1,a:top,51.709751,52.009735,68.576858,50.22847
2,b:bottom,57.09532,61.904369,61.19698,99.248541
3,b:top,82.455004,95.224151,98.594881,58.169349
4,c:bottom,96.470207,52.001358,92.893227,65.473084


#### b. Tidy the data as necessary.

In [13]:
cake[['recipe:position', 'position']] = cake['recipe:position'].str.split(':', expand=True)

In [14]:
cake = cake.rename(columns = {'recipe:position': 'recipe'})

In [15]:
cake = cake.melt(id_vars=['recipe', 'position'], var_name='temp', value_name='t_score')

In [16]:
cake.head()

Unnamed: 0,recipe,position,temp,t_score
0,a,bottom,225,61.738655
1,a,top,225,51.709751
2,b,bottom,225,57.09532
3,b,top,225,82.455004
4,c,bottom,225,96.470207


#### c. Which recipe, on average, is the best?

In [17]:
cake.groupby('recipe').t_score.mean().sort_values(ascending=False)

recipe
b    76.736074
c    75.874748
a    63.922201
d    62.864844
Name: t_score, dtype: float64

#### d. Which oven temperature, on average, produces the best results?

In [18]:
cake.groupby('temp').t_score.mean().sort_values(ascending=False)

temp
275    74.886754
225    71.306022
300    66.627655
250    66.577437
Name: t_score, dtype: float64

#### e. Which combination of recipe, rack position, and temperature gives the best result?

In [19]:
cake[cake.t_score == cake.t_score.max()]

Unnamed: 0,recipe,position,temp,t_score
26,b,bottom,300,99.248541
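
Because the tidy frame has one row per combination of recipe, position, and temperature, the best combination is simply the row with the highest score. An alternative sketch using `idxmax`, assuming a small hand-made frame with the same column names (the values and the `tidy` and `best` names are illustrative only):

```python
import pandas as pd

# Hand-made tidy frame: one row per recipe/position/temp combination.
tidy = pd.DataFrame({
    'recipe': ['a', 'a', 'b', 'b'],
    'position': ['bottom', 'top', 'bottom', 'top'],
    'temp': ['225', '225', '300', '300'],
    't_score': [61.7, 51.7, 99.2, 58.2],
})

# idxmax returns the index label of the first maximum; loc fetches that row.
best = tidy.loc[tidy['t_score'].idxmax()]
print(best)
```

Note that `idxmax` returns only the first row when several rows tie for the maximum, whereas the boolean mask used above returns every tied row.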
