## Group and mutate

**Problem:** we want to compute a value for each row that depends on an aggregate of the row's group. For example, we want to standardize a variable using the mean and standard deviation computed within each level of another variable.

 * **Solution in R:** we use `dplyr::group_by(...)` followed by `dplyr::mutate(...)`. 

 * **Solution in Python:** we use `pandas.DataFrame.groupby(..., group_keys=False).apply(...)`

In [1]:
# Activate the interface between R and Python, and load the rpy2 extension to use R in the notebook cells
from rpy2.robjects import r, pandas2ri
pandas2ri.activate()
%load_ext rpy2.ipython

#### In R

In [2]:
%%R
suppressWarnings(suppressMessages(library(dplyr)))

iris_z_score <- iris %>% 
    group_by(Species) %>%
    mutate(Sepal.Length_z_score = (Sepal.Length - mean(Sepal.Length)) / sd(Sepal.Length))

head(iris_z_score)

# A tibble: 6 x 6
# Groups:   Species [1]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_z_sco…
         <dbl>       <dbl>        <dbl>       <dbl> <fct>                 <dbl>
1         5.10        3.50         1.40       0.200 setosa               0.267 
2         4.90        3.00         1.40       0.200 setosa              -0.301 
3         4.70        3.20         1.30       0.200 setosa              -0.868 
4         4.60        3.10         1.50       0.200 setosa              -1.15  
5         5.00        3.60         1.40       0.200 setosa              -0.0170
6         5.40        3.90         1.70       0.400 setosa       

#### In Python

In [3]:
iris = r['iris']

def z_score(df, var):
    return (df[var] - df[var].mean()) / df[var].std()

iris_z_score = iris.copy()
iris_z_score['Sepal.Length_z_score'] = (iris_z_score
                                        .groupby('Species', group_keys=False)
                                        .apply(lambda g: z_score(g, 'Sepal.Length')))
iris_z_score.head()

   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species  Sepal.Length_z_score
0           5.1          3.5           1.4          0.2  setosa              0.266674
1           4.9          3.0           1.4          0.2  setosa             -0.300718
2           4.7          3.2           1.3          0.2  setosa             -0.868111
3           4.6          3.1           1.5          0.2  setosa             -1.151807
4           5.0          3.6           1.4          0.2  setosa             -0.017022


### Explanation
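The key concept here is the `group_keys=False` parameter. It tells pandas not to add the grouping keys to the index of the result, so the values returned by `apply` keep the original row index and line up with the rows of `iris_z_score` when we assign them to the new column.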
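
To see the effect of `group_keys` without the R bridge, here is a minimal sketch on a toy data frame. The data, the column names `group` and `value`, and the one-argument `z_score` helper are made up for illustration and are not the iris example above. Unlike the cell above, it selects the column before calling `apply`, and it also shows `transform`, a different pandas method that can serve the same purpose because it returns a result indexed like its input.

```python
import pandas as pd

# Toy data (hypothetical, not the iris dataset): two groups of three rows.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

def z_score(s):
    # Standardize a Series with its own mean and sample standard deviation.
    return (s - s.mean()) / s.std()

# group_keys=False: the result carries the original row labels,
# so it can be assigned straight back to df.
flat = df.groupby("group", group_keys=False)["value"].apply(z_score)

# group_keys=True: recent pandas versions prepend the group label as an
# extra index level, so the result no longer lines up with df's index.
nested = df.groupby("group", group_keys=True)["value"].apply(z_score)

# Alternative: transform returns a result indexed like the input.
via_transform = df.groupby("group")["value"].transform(z_score)

df["value_z_score"] = flat
print(df)
print(nested.index)
```

Assignment to a column aligns on index labels, which is why the flat result can be assigned directly. Whether to use `apply` with `group_keys=False` or `transform` is largely a matter of taste for a single column like this one.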