# Overview

#### Here, we will be looking at a portion of code which scores participant behavioral data collected during [Will Decker's](w-decker.github.io) honors thesis project. 

# Project description

#### The overall aim of this project is to characterize the spatiotemporal neural dynamics of statistical learning (SL), a neurocognitive mechanism critical for perceptual learning. Participants listened to either a structured or an un-structured phonological sequence made up of 12 syllables while lying in an MRI scanner. The structure of the sequence is determined by transitional probabilities (TPs).

#### In the structured sequence, three syllables were grouped together to make a "word", which is the base unit repeated throughout the sequence. The un-structured sequence contained no "words": all syllables were played in random order, so the base unit is a single syllable. This means that the TP across base units in the structured group is $\frac{1}{3}$[^1] (each of the four words can be followed by any of the other three) and the TP within base units is $1$, while the TP between base units in the un-structured group is $\frac{1}{11}$ (each syllable can be followed by any of the other 11). Three decades of evidence have shown that humans are sensitive to these TPs and are able to segment continuous input using the structure they define. Below is an illustrative example of the TPs in the structured versus un-structured sequence.

```mermaid
---
title: Structured TPs
---
flowchart LR

    t1((/pu/))-.->|1.0| t2((/bi/))-.->|1.0| t3((/ka/))-.->|0.33| t4((/di/))-.->|1.0| t5((/da/))-.->|1.0| t6((/bu/))
    t1---|WORD|t3
    t4---|WORD|t6
    style t1 fill:#f9f,stroke:#333,stroke-width:4px,color:#000
    style t2 fill:#f9f,stroke:#333,stroke-width:4px,color:#000
    style t3 fill:#f9f,stroke:#333,stroke-width:4px,color:#000
    style t4 fill:#69f,stroke:#333,stroke-width:4px,color:#000
    style t5 fill:#69f,stroke:#333,stroke-width:4px,color:#000
    style t6 fill:#69f,stroke:#333,stroke-width:4px,color:#000

```

```mermaid
---
title: Un-structured TPs
---
flowchart LR

    t1((/pu/))-.->|0.09| t2((/bi/))-.->|0.09| t3((/ka/))-.->|0.09| t4((/di/))-.->|0.09| t5((/da/))-.->|0.09| t6((/bu/))
    style t1 fill:#f45,stroke:#333,stroke-width:4px,color:#000
    style t2 fill:#69f,stroke:#333,stroke-width:4px,color:#000
    style t3 fill:#f9f,stroke:#333,stroke-width:4px,color:#000
    style t4 fill:#451,stroke:#333,stroke-width:4px,color:#fff
    style t5 fill:#205,stroke:#333,stroke-width:4px,color:#fff
    style t6 fill:#904,stroke:#333,stroke-width:4px,color:#000

```
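#### The TP values in the diagrams can be checked empirically. The sketch below is not the algorithm used to build the actual stimuli (see the footnote on sequence construction); it is a minimal simulation under the stated constraint that a base unit never immediately repeats. The syllables /go/, /la/, /tu/, /mi/, /ro/ and /ne/, along with the function names `make_stream` and `transitional_probabilities`, are made up for illustration.

```python
import random
from collections import Counter

def make_stream(units, n_units, rng):
    """Concatenate n_units randomly chosen base units, never repeating one back to back."""
    stream, prev = [], None
    for _ in range(n_units):
        unit = rng.choice([u for u in units if u != prev])
        stream.extend(unit)
        prev = unit
    return stream

def transitional_probabilities(stream):
    """Estimate TP(b | a) as count(a followed by b) / count(a followed by anything)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Hypothetical inventory: 12 syllables grouped into 4 three-syllable "words"
words = [("pu", "bi", "ka"), ("di", "da", "bu"), ("go", "la", "tu"), ("mi", "ro", "ne")]
# In the un-structured case every syllable is its own base unit
syllables = [(s,) for word in words for s in word]

structured_tp = transitional_probabilities(make_stream(words, 20000, rng))
unstructured_tp = transitional_probabilities(make_stream(syllables, 60000, rng))

print(structured_tp[("pu", "bi")])    # within a word: exactly 1 by construction
print(structured_tp[("ka", "di")])    # across a word boundary: close to 1/3
print(unstructured_tp[("pu", "bi")])  # un-structured: close to 1/11
```

#### The estimates converge on $1$, $\frac{1}{3}$ and $\frac{1}{11}$ as the simulated streams get longer.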

#### Ultimately, using advanced computational techniques such as a [Hidden Markov Model (HMM)](https://brainiak.org/tutorials/12-hmm/), I expect to uncover three distinct subprocesses of SL in the brain: a perceptual, an encoding, and a predictive process.[^2] This would provide a spatially detailed mechanistic account of SL and give credence to existing evidence positing that SL is compositional.[^3] However, we must first confirm that participants actually learned the structure.

#### After completing the sequence, participants exited the scanner and took a test assessing whether they had learned the structure of the phonological sequence. The test is a three-alternative forced-choice (3AFC) task in which a word from the structured sequence (dubbed the "target word") is pitted against two foil words that the participant has never heard. The participant's job is to discriminate the target word from the foils by selecting it. Below is an example trial.

```mermaid
---
title: Example trial
---
flowchart LR

A[FOIL]-.-> B[TARGET]-.-> C[FOIL]

style A fill:#025
style B fill:#f45
style C fill:#025
```

```mermaid
flowchart LR
D[Each presentation is played sequentially with a total of 12 trials.]
```
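#### To make the scoring logic concrete, here is a minimal sketch of how a single participant's 3AFC accuracy could be computed. It is not the code in `scoring_module.py`; the trial data and the name `proportion_correct` are invented for illustration.

```python
# Hypothetical data for 12 trials: the position (1, 2 or 3) at which the target
# was played, and the position the participant selected. Values are made up.
target_positions = [2, 1, 3, 3, 1, 2, 2, 3, 1, 1, 3, 2]
responses        = [2, 1, 1, 3, 2, 2, 2, 3, 3, 1, 3, 1]

def proportion_correct(targets, chosen):
    """Fraction of trials on which the chosen position matches the target position."""
    if len(targets) != len(chosen):
        raise ValueError("targets and chosen must have the same number of trials")
    return sum(t == c for t, c in zip(targets, chosen)) / len(targets)

chance = 1 / 3  # three alternatives, exactly one of which is the target
accuracy = proportion_correct(target_positions, responses)
print(f"accuracy = {accuracy:.2f}, chance = {chance:.2f}")
```

#### A participant who guesses at random is expected to score near the chance level of $\frac{1}{3}$.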

#### The code used to determine whether participants reliably learned the structure is reviewed in this notebook. 

[^1]: The construction of the sequences was constrained such that individual units could not immediately repeat themselves. More on the algorithm used to create sequences and its implementation can be found [here](/Honors-Thesis/README.md).
[^2]: Information on the HMM implementation can be found [here](https://github.com/w-decker/SNL23_plots/blob/main/plotting.ipynb).
[^3]: See [Batterink & Paller (2017)](https://www.batterinklab.com/_files/ugd/a9b75d_53f0f5269f5942cb81105ef47c84dba5.pdf) and [Moser et al. (2021)](https://www.batterinklab.com/_files/ugd/a9b75d_ab33c519fa7a406e92e68369eacfce2f.pdf).

# More on the behavioral assessment

#### As seen above, participants had to pick out the target word from two foil words. To assess learning, I examined whether individuals who were exposed to the structured sequence performed above chance ($0.33$) using a one-tailed one-sample t-test, and whether their performance was significantly greater than that of individuals exposed to the un-structured sequence using a one-tailed independent-samples t-test.
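#### The two tests can be sketched with `scipy.stats`, which `scoring_module.py` also imports. The scores below are invented and this is not the `Stats` class itself. The sketch assumes SciPy 1.6 or newer (for the `alternative` keyword) and uses `ttest_ind`'s default equal-variance setting, which may differ from the options chosen in the actual analysis.

```python
from scipy.stats import ttest_1samp, ttest_ind

# Made-up proportion-correct scores, one per participant (illustration only)
structured = [0.58, 0.42, 0.67, 0.50, 0.75, 0.33, 0.58, 0.50]
unstructured = [0.33, 0.25, 0.42, 0.33, 0.25, 0.42, 0.33, 0.17]

chance = 1 / 3  # 3AFC chance level

# One-tailed one-sample t-test: is the structured group's mean above chance?
one_sample = ttest_1samp(structured, popmean=chance, alternative="greater")

# One-tailed independent-samples t-test: is the structured group's mean
# greater than the un-structured group's mean?
two_sample = ttest_ind(structured, unstructured, alternative="greater")

print(f"vs. chance:        t = {one_sample.statistic:.2f}, p = {one_sample.pvalue:.4f}")
print(f"vs. un-structured: t = {two_sample.statistic:.2f}, p = {two_sample.pvalue:.4f}")
```

#### Setting `alternative="greater"` is what makes each test one-tailed in the predicted direction.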

#### Additionally, this experiment was run in PsychoPy, whose output is noisy: the actual results of the assessment must be extracted from a file that, in our particular case, contains highly erroneous data.

#### Therefore, to analyze participants' SL abilities, I created a custom module specifically for handling the PsychoPy behavioral output. It is located within the repo submodule. For the purposes of this notebook, I'll extract it and bring it over to the current path.

# Getting the custom module

#### The custom module, named `scoring_module.py`, is housed locally and cannot be installed via `pip` or another package manager. Therefore, it must be downloaded/cloned from GitHub. I have included `scoring_module.py` as part of a $\texttt{git}$ submodule in this repo. Let's bring `scoring_module.py` into the current directory so we can run through it. 

#### First we need to check whether it already exists in the current working directory. To see the current working directory, type `pwd` in the terminal or run `!pwd` in a Jupyter/IPython cell. Below is a function which checks whether `scoring_module.py` exists in the current directory and, if it does not, copies it over from the submodule.

In [1]:
# Import necessary modules
import os
import shutil

def does_scoring_module_exist():
    curr = os.getcwd() # gets current working directory
    module = os.path.join(curr, 'Honors-Thesis', 'scoring', 'scoring_module.py') # location of scoring_module.py inside the submodule

    if os.path.exists(os.path.join(curr, 'scoring_module.py')): # check if scoring_module.py is already in current path/directory
        print('scoring_module.py already in current directory\n')
    else: # if scoring_module.py is not in current path/directory, then add it
        shutil.copy(module, curr)
        print('scoring_module.py successfully added\n')


#### Let's call `does_scoring_module_exist()`

In [13]:
does_scoring_module_exist()

scoring_module.py successfully added



#### As you can see, `scoring_module.py` has now been added to a place where you and this notebook can easily access it.

# What does `scoring_module.py` consist of?

#### `scoring_module.py` is written in an object-oriented fashion. There are two classes, `Data` and `Stats`. The former cleans and prepares the data for the latter to compute the correct statistical tests mentioned in a [previous section](#more-on-the-behavioral-assessment). Let's see what's actually in `scoring_module.py`.

In [14]:
!cat scoring_module.py # the '!' prefix runs a unix/bash command from a Jupyter/IPython cell. This particular one prints the contents of a file

#!/usr/bin/env python

# imports for this module
import pandas as pd
import os 
from scipy.stats import ttest_1samp, ttest_ind

class Data(object):
    """Class for getting all the files you wish to analyze and putting them in a single object
    
    Parameters
    ----------
    path: str, default: current path
        Absolute path to the folder which holds data files.
        Must be in .csv format
    """

    def __init__(self, path=os.path.dirname(os.path.abspath(__name__))):
        self.path = path
        self.files = os.listdir(self.path)

    def parse_files(self, subids):
        """Find all of the files you wish to score
        
        Parameters
        ----------
        subids: list, str
            List of subject IDs that match the filenames. 
            Example: subids = ['sub-001', 'sub-002', 'sub-003']
        """
        
        files = []
        for id in subids:
            found= False
            for filename in self.files:
                filename2 = fi

#### However, what we just did doesn't tell us anything important unless you're up for reading the entire module. For now, look at each class and its attributes and methods individually. To start, let's import the module.

In [2]:
import scoring_module as sm # import scoring_module.py and give it a short alias
# alternatively...
# from scoring_module import Data, Stats

#### Now that we've imported the module, let's look at `Data`

In [3]:
print(sm.Data.__doc__) # prints the docstring. Certain IDEs allow easier access to this (e.g., VS Code or PyCharm)

Class for getting all the files you wish to analyze and putting them in a single object
    
    Parameters
    ----------
    path: str, default: current path
        Absolute path to the folder which holds data files.
        Must be in .csv format
    


#### What this is telling us is that `Data` requires a path pointing the object to the data. Let's do this! First we need to get some data. Fortunately, there is some sample data in the $\texttt{git}$ submodule. Let's move it over here. 

In [43]:
import os, shutil 
curr = os.getcwd() # current working directory
sample_data = os.path.join(curr, "Honors-Thesis", "scoring", "testdata") # sample data inside the submodule
shutil.copytree(sample_data, curr, dirs_exist_ok=True) # copies contents of subdir recursively to parent dir

'/Users/lendlab/Box Sync/willdecker/GitHub/GaTech_Code'

#### The filename is `sub-001-3afc.csv`. Let's give it to `Data`.

In [4]:
data = sm.Data(path="sub-001-3afc.csv")

#### But what happens next? Well, we need to use the *methods* within the `Data` class to clean and prep the data for the necessary statistical tests. To see which methods belong to the `Data` class, let's run the following code.

In [5]:
Data_methods = [i for i in dir(sm.Data) if callable(getattr(sm.Data, i))] # collects the names of all callable attributes (i.e., methods) of the Data class in a list

# However, this list contains some unnecessary items (the double-underscore methods), so let's filter them out (comment out this line to see the extra stuff)
Data_methods = [i for i in Data_methods if not i.startswith("__")]

# see what methods are in Data
print(Data_methods)

['clean', 'indiv_score', 'parse_files', 'rm_subs', 'score']


#### The methods are what the object can *do*. To see what is required of each method, you can call the docstring using the same steps to access the docstring for `Data`.
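#### As a sketch of that idea, the snippet below collects the docstring of every public method of a class in one pass. `ToyData` is a stand-in defined here so the example runs on its own; it is not the real `sm.Data`, and `public_method_docs` is a hypothetical helper, not part of `scoring_module.py`.

```python
import inspect

class ToyData:
    """Stand-in class for illustration; it is NOT the real sm.Data."""

    def clean(self):
        """Remove unwanted columns."""

    def score(self):
        """Compute a score."""

def public_method_docs(cls):
    """Map each public method name of cls to its cleaned-up docstring."""
    return {
        name: inspect.getdoc(member)
        for name, member in inspect.getmembers(cls, predicate=inspect.isfunction)
        if not name.startswith("_")
    }

for name, doc in public_method_docs(ToyData).items():
    print(f"{name}: {doc}")
```

#### Passing `sm.Data` instead of `ToyData` should work the same way once the module is imported.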

#### Let's look at what's under the hood of each method. To do this, we can use the `inspect` module, which is part of Python's standard library.

In [39]:
from inspect import getsource # getsource returns the source code of a specified function, class, method, etc. as a string
print(getsource(sm.Data.clean))

    def clean(self):
        """Remove erroneous columns from .csv file generated from PsychoPy"""

        # remove unnecessary columns 
        label = ['example_outer_loop.thisRepN',
        'example_outer_loop.thisTrialN',
        'example_outer_loop.thisN',
        'example_outer_loop.thisIndex',
        'example_inner_loop.thisRepN',
        'example_inner_loop.thisTrialN',
        'example_inner_loop.thisN',
        'example_inner_loop.thisIndex',
        'example_shift_loop.thisRepN',
        'example_shift_loop.thisTrialN',
        'example_shift_loop.thisN',
        'example_shift_loop.thisIndex',
        'trials_loop.thisRepN',
        'trials_loop.thisTrialN',
        'trials_loop.thisN',
        'trials_loop.thisIndex',
        'transition_loop.thisRepN',
        'transition_loop.thisTrialN',
        'transition_loop.thisN',
        'transition_loop.thisIndex',
        'blocks_loop.thisRepN',
        'blocks_loop.thisTrialN',
        'blocks_loop.thisN',
        'blocks_lo

#### If you scroll to the top of this output, you can see the docstring, wrapped in triple quotes `""" """`. You can see that this method, `clean`, "Remove[s] erroneous columns from .csv file generated from PsychoPy". It does this by creating a list of the to-be-deleted column headers and assigning it to the variable `label`.

#### Using `pandas` (a powerful and widely used Python library), it drops the columns listed in the `label` variable. Let's do this.

In [6]:
data.clean()

FileNotFoundError: [Errno 2] No such file or directory: 's'
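
#### Independently of the error above, the column-dropping step that `clean` describes can be illustrated on a toy table. This is a sketch, not the body of `clean`: the DataFrame values and the `response` column are invented, and whether the real method passes `errors="ignore"` is an assumption.

```python
import pandas as pd

# Toy table standing in for a PsychoPy export; the values are invented.
df = pd.DataFrame({
    "trials_loop.thisRepN": [0, 0, 0],
    "trials_loop.thisN": [0, 1, 2],
    "response": [2, 1, 3],  # hypothetical column holding the participant's choice
})

# Column headers to discard: three of the names listed in `clean`,
# one of which ('blocks_loop.thisN') is absent from this toy table.
label = ["trials_loop.thisRepN", "trials_loop.thisN", "blocks_loop.thisN"]

# errors="ignore" skips names that are not present instead of raising a KeyError
cleaned = df.drop(columns=label, errors="ignore")
print(list(cleaned.columns))
```

#### `DataFrame.drop` returns a new DataFrame by default, so `df` itself is left unchanged.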