# Split-Apply-Combine of the frog data set {#exr-split-apply-combine-frog}

<hr>

We will continue working with the frog tongue adhesion data set.


You'll now practice your split-apply-combine skills. First, load in the data set. Then:

**a)** Compute the standard deviation of the impact forces for each frog.

**b)** Compute the coefficient of variation of the impact forces *and* adhesive forces for each frog.

**c)** Compute a data frame that has the mean, median, standard deviation, and coefficient of variation of the impact forces and adhesive forces for each frog.

<br />

## Solution

<hr>

In [1]:
# Colab setup ------------------
import os, sys, subprocess
if "google.colab" in sys.modules:
    cmd = "pip install --upgrade polars bebi103 watermark"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    data_path = "https://s3.amazonaws.com/bebi103.caltech.edu/data/"
else:
    data_path = "../data/"
# ------------------------------

import numpy as np
import polars as pl

We start by loading the data set into a data frame.

In [2]:
df = pl.read_csv(os.path.join(data_path, 'frog_tongue_adhesion.csv'), comment_prefix='#')

**a)** To compute the standard deviation of impact forces for each frog, we group by the frog ID and, within the `agg()` context, apply the `std()` method to the impact force column.

In [3]:
df.group_by('ID', maintain_order=True).agg(pl.col('impact force (mN)').std())

| ID | impact force (mN) |
|---|---|
| str | f64 |
| "I" | 630.207952 |
| "II" | 424.573256 |
| "III" | 124.273849 |
| "IV" | 234.864328 |


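**b)** We can write a function that returns a Polars expression for the coefficient of variation (the standard deviation divided by the absolute value of the mean) of a given column. We can then use it to generate expressions in our group by/agg context. Note that the function defaults to `ddof=0`, whereas the `std()` method of a Polars expression defaults to `ddof=1`.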

In [4]:
def cov_expr(col, ddof=0):
    """A Polars expression that computes the coefficient of variation."""
    return col.std(ddof=ddof) / col.mean().abs()


df.group_by('ID', maintain_order=True).agg(
    cov_expr(pl.col('impact force (mN)')),
    cov_expr(pl.col('adhesive force (mN)')),
)

| ID | impact force (mN) | adhesive force (mN) |
|---|---|---|
| str | f64 | f64 |
| "I" | 0.401419 | 0.247435 |
| "II" | 0.585033 | 0.429701 |
| "III" | 0.220191 | 0.415435 |
| "IV" | 0.546212 | 0.308042 |


**c)** Now we will compute all of the summary statistics for the impact force and adhesive force. We simply pass one expression per statistic to the `agg()` method of the `GroupBy` object, using `alias()` to give each resulting column a distinct name.

In [5]:
(
    df
    .group_by("ID", maintain_order=True)
    .agg(
        pl.col("impact force (mN)").mean().alias('mean impact force (mN)'),
        pl.col("impact force (mN)").median().alias('median impact force (mN)'),
        pl.col("impact force (mN)").std().alias('std impact force (mN)'),
        cov_expr(pl.col('impact force (mN)')).alias('cov impact force'),
        pl.col("adhesive force (mN)").mean().alias('mean adhesive force (mN)'),
        pl.col("adhesive force (mN)").median().alias('median adhesive force (mN)'),
        pl.col("adhesive force (mN)").std().alias('std adhesive force (mN)'),
        cov_expr(pl.col('adhesive force (mN)')).alias('cov adhesive force'),
    )
)

| ID | mean impact force (mN) | median impact force (mN) | std impact force (mN) | cov impact force | mean adhesive force (mN) | median adhesive force (mN) | std adhesive force (mN) | cov adhesive force |
|---|---|---|---|---|---|---|---|---|
| str | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
| "I" | 1530.2 | 1550.5 | 630.207952 | 0.401419 | -658.4 | -664.5 | 167.143619 | 0.247435 |
| "II" | 707.35 | 573.0 | 424.573256 | 0.585033 | -462.3 | -517.0 | 203.8116 | 0.429701 |
| "III" | 550.1 | 544.0 | 124.273849 | 0.220191 | -206.75 | -201.5 | 88.122448 | 0.415435 |
| "IV" | 419.1 | 460.5 | 234.864328 | 0.546212 | -263.6 | -233.5 | 83.309442 | 0.308042 |


## Computing environment

In [6]:
%load_ext watermark
%watermark -v -p numpy,polars,jupyterlab

Python implementation: CPython
Python version       : 3.12.9
IPython version      : 9.1.0

numpy     : 2.1.3
polars    : 1.29.0
jupyterlab: 4.3.7

