## Purpose of the notebook

This notebook prints statistics on the duration of passive periods and on sequence lengths for each participant's data.

In [1]:
import os
import sys

# adds the utilities folder to the path so we can import modules from it; won't be needed after packaging
sys.path.insert(1, os.path.join("..", "utilities"))

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import os
import loading_utils as load

participant_list = [200, 201, 202, 204, 205, 206, 207, 209, 210, 211, 212, 213]

In [None]:
# This block builds participant dataframes from the raw files in data_dir (here, the Windows path to the retinawise mirror folder on the drive).
# It saves each participant df to save_path (about 300-400 MB per participant) as 2xx_recording_data.csv.
# Uncomment and run it if you don't have these dataframes yet (the folder specified in save_path must already exist).
# data_dir = "D:/retinawise_mirror/raw/"
# save_path = './results/'
# for participant_id in participant_list:
#     data_df, protocol_timecourse_df, protocol_vars_df = load.load_participant_data(participant_no=participant_id,
#                                                                                    data_dir=data_dir,
#                                                                                    include_failed=False,
#                                                                                    save=True,
#                                                                                    save_path=save_path)

### Passive times

In [3]:
data_dir = "./results/new/"

for file in sorted(os.listdir(data_dir)):  # sorted so participants print in order
    if not file.endswith("recording_data.csv"):
        continue
    print(f"Statistics of passive time for participant {file[0:3]}")
    participant_df = pd.read_csv(os.path.join(data_dir, file))
    transition_times = []  # passive-period lengths in samples, pooled over recordings
    transition_times_sec = []  # passive-period lengths in seconds, pooled over recordings
    for recording in participant_df["Recording id"].unique():
        recording_df = participant_df[participant_df["Recording id"] == recording]
        # skip the first (adaptation) sequence of each recording
        for seq in sorted(recording_df["Sequence index"].unique())[1:]:
            passive_time = recording_df.loc[
                (recording_df["Sequence index"] == seq)
                & (recording_df["Experiment state"] == "Passive"),
                "Overall time Sec",
            ]
            n_samples = len(passive_time)
            duration_sec = passive_time.max() - passive_time.min()
            transition_times.append(n_samples)
            transition_times_sec.append(duration_sec)
            if duration_sec > 1:
                print(
                    f"Transition time longer than 1 s in recording: {recording}, sequence: {seq}, outlier time in seconds: {duration_sec}, outlier time in samples: {n_samples}"
                )
    del participant_df  # free memory before loading the next participant (300-400 MB each)
    print("In samples:")
    print(
        f"max: {np.max(transition_times)}, min: {np.min(transition_times)}, "
        f"mean: {np.mean(transition_times)}, stdev: {np.std(transition_times)}"
    )
    print("In seconds:")
    print(
        f"max: {np.max(transition_times_sec)}, min: {np.min(transition_times_sec)}, "
        f"mean: {np.mean(transition_times_sec)}, stdev: {np.std(transition_times_sec)}\n"
    )

Statistics of passive time for participant 200
In samples:
max: 11, min: 4, mean: 8.238260869565217, stdev: 1.1285683588700872
In seconds:
max: 0.1449999999999818, min: 0.038000000000010914, mean: 0.10964347826087063, stdev: 0.01370838607463436

Statistics of passive time for participant 201
Transition time longer than 1 s in recording: 6, sequence: 8, outlier time in seconds: 19.605999999999995, outlier time in samples: 10
In samples:
max: 11, min: 4, mean: 8.201904761904762, stdev: 1.197297031402774
In seconds:
max: 19.605999999999995, min: 0.05899999999996908, mean: 0.1469219047619067, stdev: 0.850172103358905

Statistics of passive time for participant 202
Transition time longer than 1 s in recording: 13, sequence: 4, outlier time in seconds: 19.58499999999998, outlier time in samples: 9
In samples:
max: 11, min: 4, mean: 8.273333333333333, stdev: 1.0654993612803756
In seconds:
max: 19.58499999999998, min: 0.06799999999998363, mean: 0.1425983333333352, stdev: 0.7944842773757205


Passive time does not seem to have outliers in terms of length in samples, but in a few trials a passive period of only 9 or 10 samples spans about 20 seconds. This appears to amount to at most one outlier sequence per participant; apart from these, the passive times do not exceed 1 s.

Segmenting the signal would be easier if the passive times were excluded, because they appear to be inhomogeneous. However, the 1 second baseline would then no longer be taken from right before the stimulation start. I would suggest including the passive times in the 1 second baseline for non-outlier trials and examining the outliers separately, to determine how the 20 second transition is distributed among its 9 or 10 samples.
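The suggestion above can be sketched as two small helpers. This is a minimal sketch, not code from the utilities package: `passive_sample_intervals` and `baseline_window` are hypothetical names, the toy DataFrame is synthetic, and it assumes that the Passive samples of a sequence precede its Active samples, that stimulation starts at the first Active sample, and that "Overall time Sec" increases monotonically within a recording.

```python
from typing import Optional

import numpy as np
import pandas as pd


def passive_sample_intervals(recording_df: pd.DataFrame, seq: int) -> np.ndarray:
    """Time gaps (s) between consecutive Passive samples of one sequence."""
    times = recording_df.loc[
        (recording_df["Sequence index"] == seq)
        & (recording_df["Experiment state"] == "Passive"),
        "Overall time Sec",
    ].to_numpy()
    return np.diff(times)


def baseline_window(
    recording_df: pd.DataFrame,
    seq: int,
    baseline_sec: float = 1.0,
    max_passive_sec: float = 1.0,
) -> Optional[pd.DataFrame]:
    """All samples in the `baseline_sec` seconds before the first Active sample of `seq`.

    Returns None for outlier sequences whose passive period spans more than
    `max_passive_sec` seconds, so they can be inspected separately.
    """
    in_seq = recording_df["Sequence index"] == seq
    state = recording_df["Experiment state"]
    t = recording_df["Overall time Sec"]

    passive_t = t[in_seq & (state == "Passive")]
    if len(passive_t) > 0 and passive_t.max() - passive_t.min() > max_passive_sec:
        return None  # outlier transition: handle separately

    # Assumption: stimulation starts at the first Active sample of the sequence.
    stim_start = t[in_seq & (state == "Active")].min()
    # The window may contain Passive samples and the end of the previous sequence.
    return recording_df[(t >= stim_start - baseline_sec) & (t < stim_start)]


# Toy recording (synthetic values): sequence 1 is active, sequence 2 starts
# with a short passive transition followed by its active part.
toy = pd.DataFrame(
    {
        "Sequence index": [1] * 5 + [2] * 6,
        "Experiment state": ["Active"] * 5 + ["Passive"] * 3 + ["Active"] * 3,
        "Overall time Sec": [0.0, 0.5, 1.0, 1.5, 2.0, 2.05, 2.10, 2.15, 2.2, 2.7, 3.2],
    }
)
print(passive_sample_intervals(toy, 2))  # gaps between the passive samples
print(baseline_window(toy, 2))  # samples with 1.2 <= t < 2.2
```

Applied to an outlier sequence (for example recording 6, sequence 8 of participant 201), `passive_sample_intervals` would show whether the ~20 s lies in a single gap or is spread across the samples.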

### Sequence lengths

In [4]:
data_dir = "./results/new/"

for file in sorted(os.listdir(data_dir)):  # sorted so participants print in order
    if not file.endswith("recording_data.csv"):
        continue
    print(f"Statistics of sequence length for participant {file[0:3]}")
    participant_df = pd.read_csv(os.path.join(data_dir, file))
    sequence_times = []  # active-period lengths in samples, pooled over recordings
    sequence_times_sec = []  # active-period lengths in seconds, pooled over recordings
    for recording in participant_df["Recording id"].unique():
        recording_df = participant_df[participant_df["Recording id"] == recording]
        # skip the first (adaptation) sequence of each recording
        for seq in sorted(recording_df["Sequence index"].unique())[1:]:
            active_time = recording_df.loc[
                (recording_df["Sequence index"] == seq)
                & (recording_df["Experiment state"] == "Active"),
                "Sequence time Sec",
            ]
            n_samples = len(active_time)
            duration_sec = active_time.max() - active_time.min()
            sequence_times.append(n_samples)
            sequence_times_sec.append(duration_sec)
            # flag sequences that are unusually long or short (in samples)
            if n_samples > 1200 or n_samples < 500:
                print(
                    f"Outlier in recording: {recording}, sequence: {seq}, outlier time in samples: {n_samples}, outlier time in seconds: {duration_sec}"
                )
    del participant_df  # free memory before loading the next participant
    sequence_times = np.array(sequence_times)
    sequence_times_sec = np.array(sequence_times_sec)
    min_samples = sequence_times.min()
    print(
        f"max: {sequence_times.max()}, min: {min_samples}, mean: {sequence_times.mean()}, stdev: {sequence_times.std()} samples"
    )
    # "min" in seconds lists the duration of the sequence(s) with the fewest samples, then the shortest duration overall
    print(
        f"max: {np.max(sequence_times_sec)}, min: {sequence_times_sec[sequence_times == min_samples]} or {np.min(sequence_times_sec)} , mean: {np.mean(sequence_times_sec)}, stdev: {np.std(sequence_times_sec)} seconds \n"
    )

Statistics of sequence length for participant 200
Outlier in recording: 23, sequence: 26, outlier time in samples: 3306, outlier time in seconds: 57.401
max: 3306, min: 1001, mean: 1138.175652173913, stdev: 96.4597979708369 samples
max: 57.401, min: [19.421] or 19.267000000000003 , mean: 19.497813913043476, stdev: 1.5821939314369755 seconds 

Statistics of sequence length for participant 201
max: 1176, min: 953, mean: 1131.8209523809523, stdev: 36.04761849405548 samples
max: 19.468, min: [19.445] or 19.235999999999997 , mean: 19.432224761904763, stdev: 0.01965092888124353 seconds 

Statistics of sequence length for participant 202
max: 1176, min: 1007, mean: 1142.325, stdev: 27.471367670115495 samples
max: 19.467, min: [19.459] or 19.389 , mean: 19.433439999999997, stdev: 0.013328030612209675 seconds 

Statistics of sequence length for participant 205
max: 1176, min: 1058, mean: 1148.3809523809523, stdev: 20.01712645396244 samples
max: 19.464, min: [19.435] or 19.398 , mean: 19.4339695

Looking at sequence durations (excluding the adaptation sequence), there are a few outliers (defined as a sequence longer than 1200 samples or shorter than 500 samples). Even the extremely short ones still contain more than 5 seconds of recording, so the stimulation period should be recorded properly.
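The same per-sequence lengths and the outlier rule (fewer than 500 or more than 1200 samples) can also be computed without explicit loops using a pandas `groupby`. This is a sketch under assumptions: `sequence_lengths` and `flag_outliers` are hypothetical helpers, and the toy data below is synthetic. One difference from the loop: sequences with no samples in the chosen state simply do not appear in the result, whereas the loop counts them as 0-sample outliers.

```python
import pandas as pd


def sequence_lengths(participant_df: pd.DataFrame, state: str = "Active") -> pd.DataFrame:
    """Sample count and duration of one experiment state per (recording, sequence)."""
    subset = participant_df[participant_df["Experiment state"] == state]
    stats = (
        subset.groupby(["Recording id", "Sequence index"])["Sequence time Sec"]
        .agg(n_samples="size", t_min="min", t_max="max")
        .reset_index()
    )
    stats["duration_sec"] = stats["t_max"] - stats["t_min"]
    # Drop the first (adaptation) sequence of each recording, as the loop above does.
    first_seq = participant_df.groupby("Recording id")["Sequence index"].min()
    stats = stats[stats["Sequence index"] != stats["Recording id"].map(first_seq)]
    return stats.drop(columns=["t_min", "t_max"])


def flag_outliers(stats: pd.DataFrame, low: int = 500, high: int = 1200) -> pd.DataFrame:
    """Rows whose sample count is below `low` or above `high`."""
    return stats[(stats["n_samples"] < low) | (stats["n_samples"] > high)]


# Synthetic example with two recordings; thresholds are shrunk to fit the toy data.
toy = pd.DataFrame(
    {
        "Recording id": [1] * 6 + [2] * 4,
        "Sequence index": [1, 1, 2, 2, 2, 3, 1, 2, 2, 2],
        "Experiment state": ["Active"] * 10,
        "Sequence time Sec": [0, 1, 0, 1, 2, 0, 0, 0, 0.5, 1.0],
    }
)
stats = sequence_lengths(toy)
print(stats)
print(flag_outliers(stats, low=2, high=10))
```

On real data, `flag_outliers(sequence_lengths(participant_df))` with the default thresholds should list the same sequences as the printed outlier lines, apart from the empty-sequence case noted above.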

The times in seconds show that sampling is not uniform. For example, for participant 210, 6031 samples span 430 seconds (an effective sampling frequency of about 14 Hz), whereas 815 samples span 16.5 seconds (about 50 Hz).
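The arithmetic behind these effective rates, plus a way to see where the sampling is uneven, can be sketched as follows. The helper names `effective_rate_hz` and `interval_summary` are hypothetical, and the timestamps in the last call are synthetic. Dividing by the span between first and last sample strictly covers n - 1 intervals, but for sequences of hundreds of samples the difference is negligible.

```python
import numpy as np


def effective_rate_hz(n_samples: int, duration_sec: float) -> float:
    """Average sampling frequency: number of samples divided by their time span."""
    return n_samples / duration_sec


def interval_summary(times) -> dict:
    """Median and maximum gap between consecutive timestamps (in seconds)."""
    dt = np.diff(np.asarray(times, dtype=float))
    median_dt = float(np.median(dt))
    return {
        "median_dt": median_dt,
        "max_dt": float(np.max(dt)),
        "median_rate_hz": 1.0 / median_dt,
    }


# The two cases quoted above for participant 210
print(round(effective_rate_hz(6031, 430), 1))  # 14.0
print(round(effective_rate_hz(815, 16.5), 1))  # 49.4

# Synthetic timestamps: regular 50 Hz sampling with one 1 s gap
print(interval_summary([0.0, 0.02, 0.04, 0.06, 1.06]))
```

Passing a recording's "Overall time Sec" column to `interval_summary` would separate a regular underlying rate (the median gap) from occasional long gaps (the maximum), which a single samples-per-second figure hides.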