# Python for (open) Neuroscience

_Lecture 1.4_ - More on `pandas` and plotting

Luigi Petrucco

Jean-Charles Mariani

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vigji/python-cimec/blob/main/lectures/Lecture1.4_More-panda-plotting.ipynb)

In [None]:
import pandas as pd
import numpy as np

## `.groupby()`

`.groupby()` gives us a handy syntax to apply an operation (for example, an average) within each category.

The syntax is:
```python
df.groupby("name_of_the_category_column").operation()
```

Let's create a dataframe with some data for each of three subjects:

In [None]:
n_subjects, n_trials = 3, 4
trials_df = pd.DataFrame(dict(subject=[f"subj{i}" for i in range(n_subjects) for _ in range(n_trials)],
                              accuracy=np.random.uniform(0, 1, n_trials*n_subjects),
                              rt=np.random.uniform(0, 100, n_trials*n_subjects)))
trials_df

In [None]:
# In this case, the operation is `mean()`.
# Note how the result will have the variable we group by as index:

subj_means_df = trials_df.groupby("subject").mean()
subj_means_df

By the way, this is a reason why methods are better than functions in this case: they can be chained with a clearer syntax!
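
As a small illustration of chaining, here is a minimal sketch on a hand-made dataframe. The name `demo_df` and its values are made up for this example, chosen so that the means are easy to check by eye.

```python
import pandas as pd

# Hypothetical toy data (not the lecture's random data)
demo_df = pd.DataFrame(dict(subject=["subj0", "subj0", "subj1", "subj1"],
                            accuracy=[0.5, 1.0, 0.2, 0.4],
                            rt=[10.0, 30.0, 50.0, 70.0]))

# Chain: group by subject -> select one column -> average -> sort
mean_rt = demo_df.groupby("subject")["rt"].mean().sort_values(ascending=False)
mean_rt
```

Each step returns a new pandas object, so the next method can be called directly on it, and the line reads left to right in the order the operations happen.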

## Index broadcasting in `pandas`

Let's subtract from each subject's trials that subject's mean, for each variable.

In [None]:
trials_df.head(5)

In [None]:
subj_means_df.head(3)

The shapes obviously don't match:

In [None]:
print(trials_df.shape)
print(subj_means_df.shape)

In [None]:
trials_df - subj_means_df  # this is obviously funny:

But pandas will broadcast values using indices if we make them consistent!

In [None]:
subj_means_df

In [None]:
trials_df.set_index("subject") - subj_means_df

So now we can write:

In [None]:
normalized = trials_df.set_index("subject") - subj_means_df
normalized.head()
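
To see the label-based matching more explicitly, here is a minimal sketch with made-up numbers (`toy_df`, `toy_means` and `toy_centered` are names introduced only for this example):

```python
import pandas as pd

# Hypothetical toy data: two subjects, two trials each
toy_df = pd.DataFrame(dict(subject=["subj0", "subj0", "subj1", "subj1"],
                           rt=[10.0, 30.0, 50.0, 70.0]))
toy_means = toy_df.groupby("subject").mean()  # index: subj0, subj1

# Rows are matched by index label, not by position:
# every "subj0" row gets the "subj0" mean subtracted, and so on
toy_centered = toy_df.set_index("subject") - toy_means

# reset_index() brings "subject" back as a regular column
toy_centered_flat = toy_centered.reset_index()
toy_centered_flat
```

After the subtraction, the values of each subject average to zero, which is a quick sanity check for this kind of normalization.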

## Multi-indexing

Sometimes we want to average while keeping the data split over several categories at once:

In [None]:
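# Create again our trials_df (not relevant how here):
trials_df = pd.DataFrame(dict(subject=[f"subj{i}" for i in range(3) for _ in range(3)],
                              # NOTE: the values of trial_type were missing in the original;
                              # the labels below are placeholders assumed for illustration
                              trial_type=[f"type{j % 2}" for _ in range(3) for j in range(3)],
                              accuracy=np.random.uniform(0, 1, 9),
                              rt=np.random.uniform(0, 100, 9)))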

trials_df

In [None]:
trial_subj_avg = trials_df.groupby(["subject", "trial_type"]).mean()
trial_subj_avg

In [None]:
trials_df.set_index(["subject", "trial_type"]) - trial_subj_avg
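
The result of a multi-column `.groupby()` has a `MultiIndex`. A minimal sketch of how its entries can be selected, using made-up data (the names `multi_df` and `multi_means` and the trial labels `"go"`/`"nogo"` are assumptions for this example):

```python
import pandas as pd

# Hypothetical toy data: 2 subjects x 2 trial types x 2 repetitions
multi_df = pd.DataFrame(dict(subject=["subj0"] * 4 + ["subj1"] * 4,
                             trial_type=["go", "go", "nogo", "nogo"] * 2,
                             rt=[10.0, 20.0, 30.0, 50.0, 60.0, 80.0, 90.0, 100.0]))

multi_means = multi_df.groupby(["subject", "trial_type"]).mean()

# A tuple of labels selects one (subject, trial_type) combination
one_value = multi_means.loc[("subj0", "nogo"), "rt"]

# A single label selects everything under the first index level
subj1_means = multi_means.loc["subj1"]
subj1_means
```

Here `one_value` is the mean of 30 and 50, and `subj1_means` is a smaller dataframe indexed by `trial_type` only.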

(Practicals 1.4.0)

## Rolling functions with `.rolling()`

Imagine we have a time series of data, and we want to compute the mean in a sliding window (e.g., for smoothing).

In [None]:
# Let's create a time series:
time_series = pd.Series(np.random.rand(100))
time_series.plot()

In [None]:
# This will compute the mean in a rolling window - ie, smooth it!
rolling_wnd_size = 10
smoothed = time_series.rolling(rolling_wnd_size).mean()

In [None]:
# `label` sets the legend entry; `legend=True` shows the legend
time_series.plot(label="Original", legend=True)
smoothed.plot(label="Time averaged", legend=True)

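Note that there will be `NaN` values at the borders, where the window does not contain enough data to compute the mean. With the default (non-centered) window, these are the first `rolling_wnd_size - 1` points.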
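
If those missing values are a problem, the `min_periods` argument of `.rolling()` lowers the number of points required to produce a result. A minimal sketch on a short, made-up series (the names `short_series`, `strict` and `lenient` are just for this example):

```python
import numpy as np
import pandas as pd

short_series = pd.Series(np.arange(6, dtype=float))  # 0, 1, 2, 3, 4, 5

# Default: the first (window - 1) values are NaN
strict = short_series.rolling(3).mean()

# With min_periods=1, the incomplete windows at the start are averaged anyway
lenient = short_series.rolling(3, min_periods=1).mean()

pd.DataFrame(dict(strict=strict, lenient=lenient))
```

Keep in mind that the first values of `lenient` are averages over fewer points, so they are noisier than the rest.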

By default, the window will not be centered:

In [None]:
dirac_series = pd.Series(np.zeros(30))
dirac_series[15] = 1

dirac_series.plot(figsize=(3,2))
dirac_series.rolling(8).mean().plot()

In [None]:
dirac_series = pd.Series(np.zeros(30))
dirac_series[15] = 1

dirac_series.plot(figsize=(3,2))
dirac_series.rolling(8, center=True).mean().plot()
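
The same comparison can be checked numerically instead of by eye. The sketch below uses an odd window of 5 (an assumption made here so that the centered window is symmetric around each point), and names such as `impulse` are introduced only for this example:

```python
import numpy as np
import pandas as pd

impulse = pd.Series(np.zeros(30))
impulse[15] = 1

window = 5  # odd, so that the centered window is symmetric
trailing = impulse.rolling(window).mean()
centered = impulse.rolling(window, center=True).mean()

# Indices where the smoothed impulse is non-zero
trailing_idx = trailing[trailing > 0].index.tolist()
centered_idx = centered[centered > 0].index.tolist()
print(trailing_idx)  # [15, 16, 17, 18, 19]
print(centered_idx)  # [13, 14, 15, 16, 17]
```

Without centering, the smoothed signal is delayed with respect to the original one, which matters when comparing the timing of events.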

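When the operation is `.mean()`, the result is the same as with other moving-average smoothing tools (e.g., convolution with a flat kernel), apart from how the borders are handled.

But with `.rolling()` we can use other functions too (standard deviation, significance tests, etc.)!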

In [None]:
win_size = 10
time_series.plot()
time_series.rolling(window=win_size, center=True).min().plot()
time_series.rolling(window=win_size, center=True).max().plot()
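
Beyond built-in reductions such as `.min()` and `.max()`, a custom function can be passed with `.rolling(...).apply()`. A minimal sketch, where `peak_to_peak` is a hypothetical helper written for this example and the data are made up:

```python
import numpy as np
import pandas as pd

signal = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0])

def peak_to_peak(window_values):
    """Range (max - min) of the values in one window."""
    return window_values.max() - window_values.min()

# raw=True passes each window to the function as a plain numpy array
rolling_range = signal.rolling(window=3).apply(peak_to_peak, raw=True)

# Sanity check: a rolling mean matches a convolution with a flat kernel
boxcar = np.convolve(signal.to_numpy(), np.ones(3) / 3, mode="valid")
same_as_convolution = np.allclose(signal.rolling(3).mean().dropna(), boxcar)
rolling_range, same_as_convolution
```

Custom functions applied this way run in Python for every window, so they are usually slower than the built-in methods on long series.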

(Practicals 1.4.1)

# Object-oriented plotting using `matplotlib`

Recap: the standard plotting library in Python is `matplotlib`.

In [None]:
import matplotlib.pyplot as plt

To open a new empty figure, we call `plt.figure()` (if we don't, matplotlib will plot on the last figure we opened).

In [None]:
plt.figure(figsize=(3,2))  # with the figsize argument we can control the dimension of the plot

In [None]:
# We can plot a line with the plt.plot() function:
plt.figure()
plt.plot([1, 2, 2, 3])


In a simple plot we can control attributes of the plot with some functions:

In [None]:
plt.figure(figsize=(3,2))
plt.hist(np.random.randn(1000))
plt.xlabel("Values")
plt.ylabel("Count")

## Object oriented interface

`matplotlib` has two interfaces: a simple one, based on `plt.` functions that act on the current figure, and an object-oriented one.

To make more complex plots, we should use the object-oriented interface: it's more flexible and expressive.

In [None]:
fig, ax = plt.subplots(figsize=(3,2)) # this will create a figure and an axis object
type(fig), type(ax)

In [None]:

# We can now call methods on the axis object:
ax.plot([1, 2, 2, 3])

# We can modify the aspect of the axis using its methods:
ax.set(xlabel="Time", ylabel="Money", title="My plot")
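
In a notebook it is usually convenient to create the figure and call the axis methods in the same cell, so that the displayed figure includes all the changes. A minimal self-contained sketch (the `"data"` label is made up for the example):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(3, 2))
ax.plot([1, 2, 2, 3], label="data")
ax.set(xlabel="Time", ylabel="Money", title="My plot")
ax.legend()

# The same properties can be read back from the axis object
ax.get_xlabel(), ax.get_title()
```

The function-based equivalents would be `plt.plot`, `plt.xlabel`, `plt.ylabel` and `plt.title`, which always act on the most recently used axis.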

My recommendation is to get used to the object-oriented interface!

Also, I would generally stick to the `matplotlib` functions to generate panels and not mix code with the pandas plotting functions.
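
For instance, a pandas series and its smoothed version can be drawn with `matplotlib` calls only. This is a sketch under assumptions: the random data are seeded just to make it reproducible, and the names `noisy` and `smooth` are introduced for the example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
noisy = pd.Series(rng.random(100))
smooth = noisy.rolling(10, center=True).mean()

fig, ax = plt.subplots(figsize=(4, 2))
# Pass index and values explicitly instead of calling noisy.plot()
ax.plot(noisy.index, noisy.values, label="Original")
ax.plot(smooth.index, smooth.values, label="Time averaged")
ax.set(xlabel="Sample", ylabel="Value")
ax.legend()
```

This way every element of the panel is controlled through the same `ax` object.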

## Matplotlib subplots

Using `plt.subplots()` we can create a figure with multiple subplots:

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(6, 4))

Now `axes` will be a 2x2 `numpy` array of axis objects!

In [None]:
type(axes)

In [None]:
axes.shape

In [None]:
type(axes[0, 0])

We can now plot on each of the axes indexing them the numpy way:

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(6, 4))
axes[0, 0].plot([1, 2, 2, 3])

It is easy to iterate over multiple axes:

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(6, 4))
x = np.arange(-5, 5, 0.1)
for i in range(4):
    current_axis = axes.flat[i]  # Select one axis
    current_axis.plot(x, x**i)  # Plot on it
    current_axis.set(title=f"Power {i}")  # Set title

plt.tight_layout()
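
A variation on the loop above, as a sketch: `zip` pairs each axis with the thing to plot on it, and `sharex=True` makes all panels use the same x range. The `functions` dictionary is made up for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-5, 5, 0.1)
# Hypothetical set of functions to show, one per panel
functions = dict(sin=np.sin, cos=np.cos, tanh=np.tanh, abs=np.abs)

fig, axes = plt.subplots(2, 2, figsize=(6, 4), sharex=True)
for current_axis, (name, func) in zip(axes.flat, functions.items()):
    current_axis.plot(x, func(x))  # Plot on the current axis
    current_axis.set(title=name)   # Set title

fig.tight_layout()
```

Iterating with `zip` avoids indexing by hand, and the loop stops on its own if there are fewer items than axes.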

(Practicals 1.4.2)