# Python for (open) Neuroscience

_Lecture 1.4_ - In progress

Luigi Petrucco

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vigji/python-cimec-2024/blob/main/lectures/Lecture1.3_(In-progress).ipynb)

## Aggregate statistics

It can be useful to aggregate statistics based on the values of a column.

Imagine we want to quickly compute the mean of the values across trials for each subject.



### `.groupby()`

We have a handy syntax to average within each category with `.groupby()`.

The syntax is:
```python
df.groupby("name_of_the_category_column").operation()
```

Now, we want to compute the average for every subject:

In [None]:
trials_df.head(5)

In [None]:
# In this case, the operation is `mean()`.
# Note how the result will have the variable we group by as index:

subj_means_df = trials_df.groupby("subject").mean()
subj_means_df

By the way, this is a reason why methods are better than functions in this case: they can be chained with a clearer syntax!
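To try `.groupby()` without the lecture data, here is a minimal sketch on a made-up table. The names `toy_trials_df`, `reaction_time` and `accuracy`, and all the values, are assumptions for illustration only; they are not the columns of `trials_df`.

```python
import pandas as pd

# Hypothetical toy data: 2 subjects, 3 trials each (made-up names and values).
toy_trials_df = pd.DataFrame(
    {
        "subject": ["s1", "s1", "s1", "s2", "s2", "s2"],
        "reaction_time": [0.5, 0.7, 0.6, 1.0, 1.2, 1.1],
        "accuracy": [1.0, 0.0, 1.0, 1.0, 1.0, 0.0],
    }
)

# One row per subject; the grouping column becomes the index of the result.
toy_means_df = toy_trials_df.groupby("subject").mean()
print(toy_means_df)

# Other operations work the same way, e.g. several statistics at once
# for a single column:
toy_summary = toy_trials_df.groupby("subject")["reaction_time"].agg(["mean", "std"])
print(toy_summary)

# Methods can be chained: group, average, then round, in a single expression.
toy_means_rounded = toy_trials_df.groupby("subject").mean().round(2)
```

Reading the chained line left to right gives the order of the operations, which is the clearer syntax mentioned above.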

## Index broadcasting in `pandas`

Let's subtract from each trial the mean of its subject, for each variable.

In [None]:
trials_df.head(3)

In [None]:
subj_means_df.head(3)

The shapes obviously don't match:

In [None]:
print(trials_df.shape)
print(subj_means_df.shape)

In [None]:
trials_df - subj_means_df  # the indices don't match, so the result is not what we want

But pandas will broadcast values using indices if we make them consistent!

In [None]:
subj_means_df

In [None]:
trials_df.set_index("subject") - subj_means_df

So now we can write:

In [None]:
normalized = trials_df.set_index("subject") - subj_means_df
normalized.head()

This broadcasting is super powerful! It gives us a very expressive and concise syntax to work with aggregated data without using loops.
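The same idea on a self-contained toy table, as a sketch only: `toy_trials_df` and `reaction_time` are made-up names and values, not the lecture data. The check at the end confirms that, after the subtraction, every subject has a mean of (numerically) zero.

```python
import pandas as pd

# Hypothetical toy data (made-up names and values).
toy_trials_df = pd.DataFrame(
    {
        "subject": ["s1", "s1", "s2", "s2"],
        "reaction_time": [0.5, 0.7, 1.0, 1.2],
    }
)
toy_means_df = toy_trials_df.groupby("subject").mean()

# The two tables have different shapes: one row per trial vs. one row per subject.
print(toy_trials_df.shape, toy_means_df.shape)

# With "subject" as the index on both sides, pandas matches each trial row
# with the mean of its own subject, using the index labels.
toy_normalized = toy_trials_df.set_index("subject") - toy_means_df
print(toy_normalized)

# Sanity check: the per-subject means of the normalized data are ~0.
residual_means = toy_normalized.groupby(level="subject").mean()
print(residual_means)
```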

## Multi-indexing

Sometimes, we might want to average while keeping the data split by more than one category:

In [None]:
# Recreate our trials_df (how we do it is not relevant here):
trials_df = pd.DataFrame([dict(subject=i, trial_type=j % 2, **all_subjects_data[i][j])
                             for i in all_subjects_data.keys()
                             for j in range(n_repetitions)])

trials_df

In [None]:
trial_subj_avg = trials_df.groupby(["subject", "trial_type"]).mean()
trial_subj_avg

In [None]:
trials_df.set_index(["subject", "trial_type"]) - trial_subj_avg
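A minimal sketch of the same steps on made-up data, to show what the multi-index looks like and how to read values out of it. The table `toy_trials_df` and its values are assumptions for illustration; only the column names `subject` and `trial_type` mirror the lecture.

```python
import pandas as pd

# Hypothetical toy data: 2 subjects x 2 trial types x 2 repetitions.
toy_trials_df = pd.DataFrame(
    {
        "subject": ["s1", "s1", "s1", "s1", "s2", "s2", "s2", "s2"],
        "trial_type": [0, 1, 0, 1, 0, 1, 0, 1],
        "reaction_time": [0.5, 0.9, 0.7, 1.1, 1.0, 1.4, 1.2, 1.6],
    }
)

# Grouping by two columns gives a two-level (multi) index.
toy_avg = toy_trials_df.groupby(["subject", "trial_type"]).mean()
print(toy_avg)

# A row of a multi-index is selected with a tuple of labels, one per level.
s1_type0_mean = toy_avg.loc[("s1", 0), "reaction_time"]

# Broadcasting works as before, as long as both sides share the same levels.
toy_centered = toy_trials_df.set_index(["subject", "trial_type"]) - toy_avg

# .reset_index() turns the index levels back into ordinary columns.
toy_avg_flat = toy_avg.reset_index()
```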

(Practicals 1.3.1)

## (bonus) Rolling functions with `.rolling()`

Imagine we have a time series of data, and we want to compute the mean over a window of time (e.g., for smoothing).

In [None]:
# Let's create a time series:
time_series = pd.Series(np.random.rand(100))

In [None]:
# This will compute the mean in a rolling window, i.e., smooth the series!
rolling_wnd_size = 10
smoothed = time_series.rolling(rolling_wnd_size, center=True).mean()

In [None]:
time_series.plot()
smoothed.plot()

When the operation is an average, this gives the same result as other moving-average smoothing tools.
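As a quick check of this equivalence, the sketch below compares the rolling mean with a convolution with a flat kernel. Using `np.convolve` as the "other smoothing tool" is an assumption made here for illustration; the names `toy_series`, `kernel` and `convolved` are also made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the example is reproducible
toy_series = pd.Series(rng.random(100))
window = 10

rolled = toy_series.rolling(window, center=True).mean()

# A moving average written as a convolution with a flat (boxcar) kernel.
kernel = np.ones(window) / window
convolved = np.convolve(toy_series.to_numpy(), kernel, mode="valid")

# The rolling mean is NaN where the window is incomplete (at the edges);
# after dropping those values, the two results can be compared directly.
same = np.allclose(rolled.dropna().to_numpy(), convolved)
print(same)
```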

But now we can use arbitrary functions (standard deviation, significance tests, etc.)!
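A short sketch of what this can look like: a built-in rolling standard deviation, and a custom function passed through `.apply()`. The peak-to-peak range is just an arbitrary example of a custom statistic, and the variable names are made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the example is reproducible
toy_series = pd.Series(rng.random(100))

window = 10
rolling_obj = toy_series.rolling(window, center=True)

# Built-in operations are available as methods:
rolling_std = rolling_obj.std()

# Any function that maps a window of values to a single number can be
# passed to .apply(). With raw=True the function receives a numpy array.
rolling_range = rolling_obj.apply(lambda w: w.max() - w.min(), raw=True)

# As with the mean, positions where the window is incomplete are NaN.
n_missing = rolling_std.isna().sum()
print(n_missing)
```

Custom functions passed to `.apply()` are typically slower than the built-in methods, so it is worth checking whether a built-in method exists first.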

(Practicals 1.3.2)
