# Import and Pick Session

In [8]:
from utils import get_dataset

# Get data for both grating stimuli from one session,
# restricted to two visual-cortex and two hippocampal regions
session_id = 750332458
regions = ["VISam", "VISp", "CA1", "CA3"]
s_df = get_dataset(session_id, "static_gratings", regions)
d_df = get_dataset(session_id, "drifting_gratings", regions)
print(f"There are {len(s_df)} rows in the Static Gratings dataset and {len(d_df)} in the Drifting Gratings dataset.")

There are 1629407 rows in the Static Gratings dataset and 1362054 in the Drifting Gratings dataset.


# EDA

For this study, we analyzed two subsets of the data from session 750332458 to investigate how different regions of the mouse brain respond to distinct visual stimuli. We focused on static and drifting gratings, which are similar in nature but differ enough that they may evoke different neural responses. To examine regional differences in activity, we selected brain areas from the visual cortex and the hippocampus. Within the visual cortex, we chose the anteromedial visual area (VISam) and the primary visual cortex (VISp). Within the hippocampus, we chose CA1 and CA3.

Let's start with a general overview of the data.

In [6]:
s_df.describe()

statistic,stimulus_presentation_id,unit_id,time_since_stimulus_presentation_onset,start_time,stop_time,duration,stimulus_condition_id,waveform_PT_ratio,waveform_amplitude,amplitude_cutoff,...,probe_channel_number,probe_horizontal_position,probe_id,probe_vertical_position,ecephys_structure_id,anterior_posterior_ccf_coordinate,dorsal_ventral_ccf_coordinate,left_right_ccf_coordinate,probe_sampling_rate,probe_lfp_sampling_rate
count,1629407.0,1629407.0,1629407.0,1629407.0,1629407.0,1629407.0,1629407.0,1629407.0,1629407.0,1629407.0,...,1629407.0,1629407.0,1629407.0,1629407.0,1629407.0,322300.0,322300.0,322300.0,1629407.0,1629407.0
mean,59177.9,951816200.0,0.1204053,7431.514,7431.764,0.2502121,4845.586,0.567007,147.2483,0.01826956,...,236.9589,29.72459,757904500.0,2383.371,398.683,8428.072457,2978.717645,8773.482004,29999.97,1249.999
std,7860.903,3889.181,0.07014959,1291.422,1291.422,0.0003191465,35.33415,0.3416659,86.75452,0.02461307,...,60.72519,20.48353,3.432027,607.3455,25.38088,216.566207,416.314095,271.678546,0.02692248,0.00112177
min,49434.0,951808900.0,2.769411e-08,5398.833,5399.083,0.2501888,4787.0,0.01457269,26.21795,7.623134e-06,...,74.0,11.0,757904500.0,760.0,382.0,8192.0,2308.0,8462.0,29999.92,1249.997
25%,51090.0,951815400.0,0.06194211,5813.213,5813.463,0.2502062,4815.0,0.4390617,92.93914,0.001047949,...,234.0,11.0,757904500.0,2360.0,385.0,8267.0,2482.0,8569.0,29999.97,1249.999
50%,56301.0,951815800.0,0.1144032,7476.602,7476.852,0.2502088,4845.0,0.5316867,129.7331,0.005789122,...,257.0,27.0,757904600.0,2580.0,394.0,8297.0,3244.0,8614.0,29999.97,1249.999
75%,68775.0,951820000.0,0.1794591,8747.914,8748.164,0.2502137,4876.0,0.6093458,166.2309,0.02858403,...,273.0,59.0,757904600.0,2740.0,394.0,8694.0,3283.0,9103.0,30000.0,1250.0
max,70389.0,951821000.0,0.2829064,9151.752,9152.002,0.2835805,4907.0,7.531912,634.3024,0.09127016,...,300.0,59.0,757904600.0,3020.0,463.0,8767.0,3390.0,9200.0,30000.0,1250.0


In [7]:
d_df.describe()

statistic,stimulus_presentation_id,unit_id,time_since_stimulus_presentation_onset,start_time,stop_time,duration,stimulus_condition_id,waveform_PT_ratio,waveform_amplitude,amplitude_cutoff,...,probe_channel_number,probe_horizontal_position,probe_id,probe_vertical_position,ecephys_structure_id,anterior_posterior_ccf_coordinate,dorsal_ventral_ccf_coordinate,left_right_ccf_coordinate,probe_sampling_rate,probe_lfp_sampling_rate
count,1362054.0,1362054.0,1362054.0,1362054.0,1362054.0,1362054.0,1362054.0,1362054.0,1362054.0,1362054.0,...,1362054.0,1362054.0,1362054.0,1362054.0,1362054.0,312497.0,312497.0,312497.0,1362054.0,1362054.0
mean,25755.81,951816200.0,0.9888546,3302.242,3304.244,2.001672,266.0146,0.5852441,152.7562,0.01862476,...,232.5288,29.02414,757904500.0,2338.842,400.6358,8424.022003,2987.061786,8768.11811,29999.97,1249.999
std,18521.57,4119.901,0.583459,1287.456,1287.456,1.808359e-05,11.70509,0.5811659,90.43233,0.02506648,...,64.07804,20.74508,3.605622,640.7423,26.86643,213.800919,409.565029,268.143565,0.02729562,0.001137318
min,3798.0,951808900.0,3.899372e-06,1585.648,1587.649,2.00161,246.0,0.01457269,26.21795,7.623134e-06,...,74.0,11.0,757904500.0,760.0,382.0,8192.0,2308.0,8462.0,29999.92,1249.997
25%,3924.0,951815200.0,0.4709797,1963.964,1965.965,2.00166,256.0,0.4265929,100.0804,0.0008969164,...,229.0,11.0,757904500.0,2300.0,385.0,8267.0,2498.0,8569.0,29999.97,1249.999
50%,31076.0,951815800.0,0.9866576,3402.165,3404.167,2.00167,266.0,0.5166103,137.0916,0.004899159,...,257.0,11.0,757904600.0,2580.0,394.0,8297.0,3244.0,8614.0,29999.97,1249.999
75%,49220.0,951820000.0,1.495985,4759.299,4761.301,2.00168,276.0,0.6001969,170.5581,0.03024052,...,273.0,59.0,757904600.0,2740.0,394.0,8687.0,3283.0,9094.0,30000.0,1250.0
max,49432.0,951821000.0,2.001673,5395.831,5397.832,2.00174,286.0,7.531912,634.3024,0.09127016,...,300.0,59.0,757904600.0,3020.0,463.0,8767.0,3390.0,9200.0,30000.0,1250.0


In [15]:
d_df.columns

Index(['stimulus_presentation_id', 'unit_id',
       'time_since_stimulus_presentation_onset', 'stimulus_block',
       'start_time', 'stop_time', 'orientation', 'temporal_frequency',
       'spatial_frequency', 'phase', 'stimulus_name', 'contrast', 'size',
       'duration', 'stimulus_condition_id', 'waveform_PT_ratio',
       'waveform_amplitude', 'amplitude_cutoff', 'cluster_id',
       'cumulative_drift', 'd_prime', 'firing_rate', 'isi_violations',
       'isolation_distance', 'L_ratio', 'local_index', 'max_drift',
       'nn_hit_rate', 'nn_miss_rate', 'peak_channel_id', 'presence_ratio',
       'waveform_recovery_slope', 'waveform_repolarization_slope',
       'silhouette_score', 'snr', 'waveform_spread', 'waveform_velocity_above',
       'waveform_velocity_below', 'waveform_duration', 'filtering',
       'probe_channel_number', 'probe_horizontal_position', 'probe_id',
       'probe_vertical_position', 'structure_acronym', 'ecephys_structure_id',
       'ecephys_structure_acronym'

There are more than 40 columns capturing information about the neural response to each stimulus presentation: stimulus characteristics, neural unit responses, waveform features, probe position, and location in the brain. The key identifier in this table, `unit_id`, represents the neural unit. Because there is no guarantee that a unit corresponds to exactly one neuron, a neural unit may reflect a single neuron or several.
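To make the relationship between rows and units concrete, here is a minimal sketch on a toy stand-in for `s_df`. The toy values are invented for illustration; the column names come from the listing above. The `describe()` output shows over a million rows spanning a narrow range of `unit_id` values, so each unit appears on many rows.

```python
import pandas as pd

# Toy stand-in for s_df (values are made up): several rows per unit.
toy_df = pd.DataFrame({
    "unit_id": [951808900, 951808900, 951815400, 951815400, 951815400, 951821000],
    "stimulus_presentation_id": [49434, 49435, 49434, 49434, 49436, 49435],
    "time_since_stimulus_presentation_onset": [0.01, 0.12, 0.05, 0.20, 0.07, 0.11],
})

n_rows, n_cols = toy_df.shape
n_units = toy_df["unit_id"].nunique()          # distinct neural units
rows_per_unit = toy_df.groupby("unit_id").size()  # how many rows each unit contributes

print(f"{n_rows} rows, {n_cols} columns, {n_units} distinct units")
print(rows_per_unit)
```

Running the same three lines on `s_df` and `d_df` shows how unevenly the rows are spread across units, which matters whenever a per-unit property is summarized over rows.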

### Neural Response & Quality Metrics

By default, the data is already filtered on three unit quality metrics: `isi_violations`, `amplitude_cutoff`, and `presence_ratio`. These measure, respectively, refractory period violations, the proportion of missed spikes, and the fraction of the recording during which the unit is detected. Other metrics also help us understand the quality of our data: `firing_rate`, `snr`, `isolation_distance`, `d_prime`, and `nn_hit_rate`/`nn_miss_rate`. The Allen Brain Observatory provides boxplots of these metrics to contextualize our results. Finally, `L_ratio`, `local_index`, `cluster_id`, and `filtering` provide further details on the quality of the recorded response.
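A quick sanity check is to confirm that every unit actually satisfies the filter bounds. The sketch below assumes bounds of 0.5, 0.1, and 0.95, which we believe match the AllenSDK defaults but which this notebook does not state; `check_quality_filters` and the toy data are hypothetical. The assumed `amplitude_cutoff` bound is at least consistent with the maximum of about 0.091 in the `describe()` output.

```python
import pandas as pd

# ASSUMPTION: bounds believed to match the AllenSDK defaults;
# they are not stated anywhere in this notebook.
ASSUMED_BOUNDS = {
    "isi_violations": ("max", 0.5),
    "amplitude_cutoff": ("max", 0.1),
    "presence_ratio": ("min", 0.95),
}

def check_quality_filters(df, bounds=ASSUMED_BOUNDS):
    """Return {metric: bool}: True if every unit respects the bound."""
    units = df.drop_duplicates("unit_id")  # metrics are per unit, not per row
    passed = {}
    for metric, (kind, bound) in bounds.items():
        if kind == "max":
            passed[metric] = bool((units[metric] <= bound).all())
        else:
            passed[metric] = bool((units[metric] >= bound).all())
    return passed

# Toy data (made-up values) standing in for s_df.
toy_df = pd.DataFrame({
    "unit_id": [1, 1, 2, 3],
    "isi_violations": [0.02, 0.02, 0.30, 0.00],
    "amplitude_cutoff": [0.001, 0.001, 0.091, 0.03],
    "presence_ratio": [0.99, 0.99, 0.97, 0.96],
})

report = check_quality_filters(toy_df)
print(report)
```

If any entry came back `False` on the real data, it would suggest the dataset was loaded with different filter settings than assumed here.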

### Waveform Features

This next group of variables describes the shape and properties of the spike waveforms:

`waveform_PT_ratio`: Peak-to-trough amplitude ratio.

`waveform_amplitude`: Amplitude of the waveform.

`waveform_duration`: Duration of the spike waveform.

`waveform_recovery_slope`: Slope after the waveform peak, as the signal returns to baseline (recovery phase).

`waveform_repolarization_slope`: Slope from the trough up toward the peak (repolarization phase).

`waveform_spread`: Spatial extent of waveform across electrodes.

`waveform_velocity_above`, `waveform_velocity_below`: Velocity of waveform propagation above and below the peak channel.
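Waveform features are properties of a unit, yet each unit is repeated on many rows, so statistics computed over rows (as in `describe()`) are weighted toward units with more rows. A minimal sketch of summarizing per unit instead follows. The toy values are invented, and both the 0.4 threshold and the assumption that `waveform_duration` is in milliseconds are illustrative choices, not taken from this notebook.

```python
import pandas as pd

WAVEFORM_COLS = ["waveform_PT_ratio", "waveform_amplitude", "waveform_duration"]

# Toy stand-in for s_df: unit 1 contributes three rows, unit 2 two, unit 3 one.
toy_df = pd.DataFrame({
    "unit_id": [1, 1, 1, 2, 2, 3],
    "waveform_PT_ratio": [0.50, 0.50, 0.50, 0.60, 0.60, 0.40],
    "waveform_amplitude": [90.0, 90.0, 90.0, 150.0, 150.0, 200.0],
    "waveform_duration": [0.25, 0.25, 0.25, 0.60, 0.60, 0.70],
})

# One row per unit, so each unit counts once regardless of how many rows it has.
per_unit = toy_df.drop_duplicates("unit_id").set_index("unit_id")[WAVEFORM_COLS]

# HYPOTHETICAL threshold (assumes durations are in ms) to flag narrow waveforms.
NARROW_MAX = 0.4
per_unit = per_unit.assign(narrow_spiking=per_unit["waveform_duration"] < NARROW_MAX)

row_weighted_mean = toy_df["waveform_duration"].mean()
per_unit_mean = per_unit["waveform_duration"].mean()
print(per_unit)
print(f"row-weighted mean: {row_weighted_mean:.3f}, per-unit mean: {per_unit_mean:.3f}")
```

In the toy data the two means differ because the unit with the shortest waveform has the most rows; the same effect can bias the waveform columns in the `describe()` tables.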

### Stimulus Presentation Info

The dataset includes the `stimulus_presentation_id` and `stimulus_name` which we've filtered to be *static_gratings* and *drifting_gratings*. For each stimulus, we get information on its `orientation`, `phase`, `contrast`, `size`, and `duration`. The `temporal_frequency` and `spatial_frequency` are also available.
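To see how the stimulus parameters combine, we can count presentations per condition. This is a sketch on invented toy data using the column names above. Real columns may also hold non-numeric placeholders for blank trials, which this toy example ignores.

```python
import pandas as pd

# Toy stand-in for s_df (made-up values): several rows per presentation.
toy_df = pd.DataFrame({
    "stimulus_presentation_id": [1, 1, 1, 2, 2, 3, 4, 4],
    "orientation": [0, 0, 0, 0, 0, 90, 90, 90],
    "spatial_frequency": [0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.08, 0.08],
})

# Keep one row per presentation so each presentation is counted once.
presentations = toy_df.drop_duplicates("stimulus_presentation_id")

counts = (
    presentations
    .groupby(["orientation", "spatial_frequency"])
    .size()
    .rename("n_presentations")
    .reset_index()
)
print(counts)
```

Deduplicating first matters: grouping the raw rows would count rows per condition rather than presentations.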

### Probe & Recording Info

Probes are the devices used to record the neural responses. Each probe has a unique identifier, `probe_id`, and a `probe_description`. Related columns:

`probe_channel_number`: Channel on the probe.

`peak_channel_id`: Channel with peak unit activity.

`probe_horizontal_position`, `probe_vertical_position`: Position on the probe grid.

`probe_sampling_rate`, `probe_lfp_sampling_rate`: Sampling rates (for spikes and LFPs).

`probe_has_lfp_data`: Boolean, whether local field potential data is available.
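As a small illustration of how these columns fit together, the sketch below counts distinct units per probe and reports the range of vertical positions they occupy. The probe ids and positions are invented toy values, not taken from the session.

```python
import pandas as pd

# Toy stand-in for s_df (hypothetical probe ids and positions).
toy_df = pd.DataFrame({
    "unit_id": [1, 1, 2, 3, 3, 4],
    "probe_id": [1001, 1001, 1001, 1002, 1002, 1002],
    "probe_vertical_position": [760, 760, 2580, 2300, 2300, 3020],
})

per_probe = (
    toy_df
    .drop_duplicates("unit_id")      # one row per unit
    .groupby("probe_id")
    .agg(
        n_units=("unit_id", "nunique"),
        min_position=("probe_vertical_position", "min"),
        max_position=("probe_vertical_position", "max"),
    )
)
print(per_probe)
```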


### Anatomical Mapping

These columns link the recorded neural data to the anatomical regions in the Allen Common Coordinate Framework (CCF), allowing mapping to specific brain structures and 3D spatial localization:

`structure_acronym`: Acronym of the brain structure.

`ecephys_structure_acronym`: Brain structure acronym aligned with electrophysiology data.

`ecephys_structure_id`: Integer ID for the brain region.

`anterior_posterior_ccf_coordinate`: Position along the anterior-posterior axis (µm).

`dorsal_ventral_ccf_coordinate`: Position along the dorsal-ventral axis (µm).

`left_right_ccf_coordinate`: Position along the left-right axis (µm).

`location`: Text label of the anatomical location (may combine region and layer).
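The `describe()` output shows that the CCF coordinate columns are sparsely populated: 322,300 of 1,629,407 rows (roughly 20%) in the static set and 312,497 of 1,362,054 rows (roughly 23%) in the drifting set have values. A sketch for checking which structures are affected, on invented toy data:

```python
import pandas as pd

CCF_COLS = [
    "anterior_posterior_ccf_coordinate",
    "dorsal_ventral_ccf_coordinate",
    "left_right_ccf_coordinate",
]

# Toy stand-in for s_df (made-up values): None marks a missing coordinate.
toy_df = pd.DataFrame({
    "structure_acronym": ["VISp", "VISp", "CA1", "CA1", "CA3", "VISam"],
    "anterior_posterior_ccf_coordinate": [8200.0, 8250.0, None, None, 8700.0, None],
    "dorsal_ventral_ccf_coordinate": [2400.0, 2450.0, None, None, 3300.0, None],
    "left_right_ccf_coordinate": [8500.0, 8550.0, None, None, 9100.0, None],
})

# Fraction of rows with a missing coordinate, per structure and per axis.
missing_frac = toy_df[CCF_COLS].isna().groupby(toy_df["structure_acronym"]).mean()
print(missing_frac)
```

On the real data, a table like this would show whether the missing coordinates are concentrated in particular regions or probes before any spatial analysis is attempted.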

### Temporal Tracking & Drift

These final features track when recordings occurred and how stable the recording was across time:

`start_time`, `stop_time`: Start and stop times of the stimulus or recording window (relative to session start).

`cumulative_drift`: Total drift in unit position over the session (likely due to tissue movement).

`max_drift`: Maximum displacement observed for the unit during recording.
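The presentation window also lets us turn row counts into rates. The `describe()` output shows static gratings lasting about 0.25 s and drifting gratings about 2.0 s, so raw counts are not directly comparable between the two stimuli. The sketch below assumes that each row corresponds to one recorded spike, which the notebook does not state explicitly; the toy values are invented.

```python
import pandas as pd

# Toy stand-in for s_df. ASSUMPTION: one row per recorded spike.
toy_df = pd.DataFrame({
    "unit_id": [1, 1, 1, 1, 2, 2],
    "stimulus_presentation_id": [10, 10, 10, 11, 10, 11],
    "start_time": [100.0, 100.0, 100.0, 100.25, 100.0, 100.25],
    "stop_time": [100.25, 100.25, 100.25, 100.5, 100.25, 100.5],
})

# Length of each presentation window in seconds.
toy_df["window"] = toy_df["stop_time"] - toy_df["start_time"]

rates = (
    toy_df
    .groupby(["unit_id", "stimulus_presentation_id"])
    .agg(n_spikes=("start_time", "size"), window=("window", "first"))
    .reset_index()
)
rates["rate_hz"] = rates["n_spikes"] / rates["window"]
print(rates)
```

One caveat of this layout: a unit that fired no spikes during a presentation has no rows for it, so silent unit and presentation pairs are absent from `rates` rather than appearing with a rate of zero.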