
# Process Dissociation Procedure (PDP) task analysis code


---

1. Upload the ASRT output files, merged into a single .csv file (*asrt.csv* on OSF).
2. Upload the PDP output files, merged into a single .csv file (*pdp.csv* on OSF).
3. Hit *Run all*.
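The notebook expects each task's output to be merged into one file already. If the raw exports are separate CSV files, they can be concatenated with `glob` and `pandas.concat`. Below is a minimal sketch: the helper name `merge_csv_files`, the folder and the file pattern are hypothetical, and the ISO-8859-1 encoding simply matches the one the notebook uses when reading the files.

```python
import glob

import pandas as pd


def merge_csv_files(pattern, out_path, encoding="ISO-8859-1"):
    """Concatenate every CSV matching `pattern` into one CSV at `out_path` (hypothetical helper)."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"No files match {pattern!r}")
    # Stack the files row-wise and renumber the rows 0..n-1
    merged = pd.concat(
        (pd.read_csv(path, encoding=encoding) for path in paths),
        ignore_index=True,
    )
    merged.to_csv(out_path, index=False, encoding=encoding)
    return merged


# Hypothetical usage; adjust the pattern to the actual export file names:
# merge_csv_files("pdp_raw/*.csv", "pdp.csv")
```

Keep the output file outside the matched pattern, otherwise a second run would merge the previous result as well.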

Code by Teodóra Vékony https://github.com/vekteo

Lyon Neuroscience Research Center (CRNL), Université Claude Bernard Lyon 1

# Import Python packages

In [None]:
import pandas as pd
import numpy as np
from google.colab import files
import glob

# Read the PDP data file

In [None]:
df = pd.read_csv("pdp.csv", encoding="ISO-8859-1")

# Drop excluded participants

In [None]:
participants_to_drop = ['wxxo75wg','9f29r2qv','7dh53ycb','86pcb2mb','esmmznl1','npsaibu7','qrzq7ts2','sbkvhjhy','trkopxt7','ql49dn7x','c6yyouxw','297gibh5','wunf7i3o','7m76soye','cayu2l31','uqgl5lcz','bxb8djmc','w6vcdqme','p3fyit39','30mhiihl','01hhoswd','j0g4kqyn','bg1yka4d','2j62a51o','9hy16b49','o9aanl9z','t1t2dyid','3pw99p9u','4oltkref','p4f8l137','q965bort','smvr23e8','95f4z9l1','cv841tyx','qmcdul7z','79byw8v4','4xnyxjl0','139lnsjl','zxyx8jio','dvs1rx99','j1mttxb7','ye3yxiiu','02lil6a8','clkmmxkg','rra05osw','32zod29m','l452387h','ne1zmdp0','olyffqcf','bj8l5rz6','ic7gxv2p','k4vkynk3','qc8kj5y4','w5xvw32d','2fjf1gdf','993mmbxg','da8q2m1i','fehyc211','m5zau3bf','54pj2wer','fdvkiue5','fv4eqjr5','3wif8txo','ymmro5ep','sn22rw6s','ia5zwn5m','4yiwcr2r','858nish3','ofenlmw3','ovmb6sgq','wcsfo7p6','dor4944c','hk8jwhaz','bsh7t661','536ektl5','gr7qbqqj','my49x5bc','z63ep3ee','2uaif6hy','ran6tboj']
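df = df[~df['Participant Public ID'].isin(participants_to_drop)]
# Reset the index so that the row labels run 0..n-1 again after filtering
df = df.reset_index(drop=True)
df.head()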
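Boolean filtering with `~isin(...)` keeps the original row labels, so the filtered dataframe has gaps in its index. Code that looks rows up by consecutive integer labels (e.g. `df.loc[i - 2, ...]`) then hits labels that no longer exist and raises a `KeyError`. A toy illustration with made-up IDs (not the real exclusion list):

```python
import pandas as pd

df = pd.DataFrame({
    "Participant Public ID": ["a", "a", "b", "b", "c"],
    "response_button": [0, 1, 2, 3, 0],
})
participants_to_drop = ["b"]  # toy IDs for illustration only

kept = df[~df["Participant Public ID"].isin(participants_to_drop)]
print(kept.index.tolist())  # [0, 1, 4]: labels 2 and 3 are gone

# reset_index(drop=True) renumbers the rows and discards the old labels
kept = kept.reset_index(drop=True)
print(kept.index.tolist())  # [0, 1, 2]
```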

# Preprocessing

**1. Drop unnecessary columns**

In [None]:
df = df.loc[:, ['Participant Public ID','randomiser-bcas','block','trial_number','response_key', 'response_button','task_type']]
df.head()

**2. Create a new column with the positions numbered 1-4 (`response_button` is coded 0-3)**

In [None]:
df["position"] = df["response_button"] + 1
df.head()

**3. Create three new columns, one for each element of the triplet**

For each trial, we record the first (n-2), second (n-1), and last (n) element of the triplet that ends with the current trial.

In [None]:
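# Elements n-2 and n-1 come from the two preceding trials of the same participant
# and block, so the first two trials of each block get NaN. Unlike the label-based
# loops (range(..., df.last_valid_index())), this also covers the last row and
# does not break when the index has gaps.
by_block = df.groupby(['Participant Public ID', 'block'])['position']
df['triplet_1'] = by_block.shift(2)   # element n-2
df['triplet_2'] = by_block.shift(1)   # element n-1
df['triplet_3'] = df['position']      # element n (the current trial)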

df.head()

**4. Drop unnecessary rows**

We drop the rows that contain NaN values, including the first two trials of each response block, which do not complete a triplet.

In [None]:
df = df[df['trial_number'].notna()]
df = df.dropna()
df.head()
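The triplet columns and the NaN-based row removal can be illustrated on toy data. This sketch builds the triplets with a per-participant, per-block `shift` (one possible formulation, not necessarily the author's); the column names follow the notebook, while the trial values are made up.

```python
import pandas as pd

# Toy data: one participant, two response blocks of three trials (positions 1-4)
trials = pd.DataFrame({
    "Participant Public ID": ["p1"] * 6,
    "block": [1, 1, 1, 2, 2, 2],
    "position": [1, 3, 2, 4, 1, 2],
})

# Shift within each participant and block so a triplet never spans two blocks
by_block = trials.groupby(["Participant Public ID", "block"])["position"]
trials["triplet_1"] = by_block.shift(2)   # element n-2
trials["triplet_2"] = by_block.shift(1)   # element n-1
trials["triplet_3"] = trials["position"]  # element n (the current trial)

# The first two trials of each block have no complete triplet (NaN) and are dropped
complete = trials.dropna()
print(complete[["block", "triplet_1", "triplet_2", "triplet_3"]])
```

Only the third trial of each block survives here, because each toy block has just three trials.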

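**5. Convert the triplet columns to strings and keep only the first character**

Because the position columns contain NaN values, pandas stores them as floats (e.g. `1.0`), so only the first character of the string (`1`) is kept.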

In [None]:
df['triplet_1'] = df['triplet_1'].apply(str)
df['triplet_2'] = df['triplet_2'].apply(str)
df['triplet_3'] = df['triplet_3'].apply(str)

df['triplet_1'] = df['triplet_1'].str[:1]
df['triplet_2'] = df['triplet_2'].str[:1]
df['triplet_3'] = df['triplet_3'].str[:1]
df.head()

**6. Create a new column containing the concatenated value of the three triplet columns**

In [None]:
df['triplet'] = df['triplet_1'] + df['triplet_2'] + df['triplet_3'] 
df.head()
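Steps 5 and 6 on a toy dataframe, assuming the positions were read as floats such as `1.0` (the values here are made up). Keeping the first character works only because every position is a single digit (1-4).

```python
import pandas as pd

tri = pd.DataFrame({
    "triplet_1": [1.0, 4.0],
    "triplet_2": [3.0, 1.0],
    "triplet_3": [2.0, 2.0],
})

for col in ["triplet_1", "triplet_2", "triplet_3"]:
    # str(1.0) is "1.0"; the first character is the position itself
    tri[col] = tri[col].apply(str).str[:1]

# String concatenation yields the triplet label, e.g. "132"
tri["triplet"] = tri["triplet_1"] + tri["triplet_2"] + tri["triplet_3"]
print(tri["triplet"].tolist())  # ['132', '412']
```

An equivalent alternative is `tri[col].astype(int).astype(str)`, which avoids relying on the float formatting but requires that the column has no NaN left.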

**7. Load the ASRT output**

In [None]:
asrt = pd.read_csv("asrt.csv", encoding="ISO-8859-1")
asrt.head()

**8. Drop the unnecessary columns of the ASRT dataframe**

In [None]:
asrt = asrt.loc[:, ['Participant Public ID', 'sequence']]

**9. Drop the rows with missing values from the ASRT dataframe**

In [None]:
asrt = asrt.dropna()
asrt.head()

**10. Drop the duplicates of the ASRT dataframe**

In [None]:
asrt = asrt.drop_duplicates()

**11. Convert the sequence column to strings and keep only the first four characters**

In [None]:
asrt['sequence'] = asrt['sequence'].apply(str)
asrt['sequence'] = asrt['sequence'].str[:4]
asrt.head()

**12. Drop the duplicates again (truncating the sequences can make previously distinct rows identical)**

In [None]:
asrt = asrt.drop_duplicates()
asrt.head()
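A sketch of steps 9 to 12 on made-up data. It assumes the sequence is stored as a four-digit number that pandas reads as a float (e.g. `1324.0`) because the column also contains empty cells; this is an assumption about the export format inferred from the `str[:4]` truncation, not something the notebook states.

```python
import numpy as np
import pandas as pd

asrt = pd.DataFrame({
    "Participant Public ID": ["p1", "p1", "p1", "p2"],
    "sequence": [1324.0, np.nan, 1324.0, 2143.0],
})

asrt = (
    asrt.dropna()              # step 9: rows without a sequence
        .drop_duplicates()     # step 10: one row per participant and sequence
        # step 11: "1324.0" -> "1324"
        .assign(sequence=lambda d: d["sequence"].apply(str).str[:4])
        .drop_duplicates()     # step 12: duplicates created by the truncation
)
print(asrt)
```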

**13. Merge the sequence column of the ASRT dataframe into the PDP dataframe by participant public ID**

This shows which sequence each participant completed.

In [None]:
df = df.merge(asrt, how='outer', on='Participant Public ID')
df.head(5)
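The rows that an outer merge adds for unmatched participants contain NaN, so an outer merge followed by `dropna()` keeps the same rows as an inner merge, provided the dataframes have no other missing values (as ensured by the earlier `dropna()` calls). A toy check with hypothetical IDs; `indicator=True` labels where each row came from:

```python
import pandas as pd

pdp = pd.DataFrame({"Participant Public ID": ["p1", "p2"], "triplet": ["132", "412"]})
seqs = pd.DataFrame({"Participant Public ID": ["p1", "p3"], "sequence": ["1324", "2143"]})

outer = pdp.merge(seqs, how="outer", on="Participant Public ID", indicator=True)
# p2 appears only in the PDP data (left_only), p3 only in the ASRT data (right_only)
print(outer[["Participant Public ID", "_merge"]])

matched = outer.dropna().drop(columns="_merge").reset_index(drop=True)
inner = pdp.merge(seqs, how="inner", on="Participant Public ID")
print(matched.equals(inner))
```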

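**14. Drop the rows without a match**

The outer merge leaves NaN values in the rows of participants who appear in only one of the two files; these rows are dropped.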

In [None]:
df = df.dropna()

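**15. Create the triplet type columns**

`is_high` is 1 if the trial ends a high-probability (H) triplet and 0 if it ends a low-probability (L) triplet. A triplet is H when its first and last elements follow each other in the sequence, including the wrap-around from the fourth element back to the first. `r_or_t` is 1 for L triplets whose first and last elements are identical (repetitions such as 111 and trills such as 121) and 0 otherwise.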

In [None]:
# Iterate over the actual row labels: after dropna() the index has gaps, and
# range(0, df.last_valid_index()) would also skip the last row
for i in df.index:
    sequence = df.loc[i, 'sequence']
    first, last = df.loc[i, 'triplet_1'], df.loc[i, 'triplet_3']
    # H triplet: first and last elements are consecutive in the sequence,
    # including the wrap-around from the 4th element back to the 1st
    if sequence.find(first + last) != -1 or (first == sequence[3] and last == sequence[0]):
        df.loc[i, 'is_high'] = 1
        df.loc[i, 'r_or_t'] = 0
    else:
        df.loc[i, 'is_high'] = 0
        # Repetitions (e.g. 111) and trills (e.g. 121) share first and last elements
        df.loc[i, 'r_or_t'] = 1 if first == last else 0

df.head()
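The classification rule of step 15 can be written as a small standalone function and checked by hand on a few triplets. This is a sketch mirroring the notebook's logic; the function name `classify_triplet` and the example sequence `1324` are hypothetical. With the sequence 1-3-2-4, the H pairs (first, last) are 1→3, 3→2, 2→4 and, wrapping around, 4→1.

```python
def classify_triplet(sequence, first, last):
    """Return (is_high, r_or_t) for a triplet, given its first and last elements.

    sequence: the four pattern elements as a string, e.g. "1324"
    first, last: elements n-2 and n of the triplet as one-character strings
    """
    # H triplet: first and last are consecutive pattern elements,
    # including the wrap-around from the 4th element back to the 1st
    if (first + last) in sequence or (first == sequence[3] and last == sequence[0]):
        return 1, 0
    # L triplet: flag repetitions and trills (identical first and last elements)
    return 0, int(first == last)


seq = "1324"
for triplet in ["123", "421", "131", "112"]:
    print(triplet, classify_triplet(seq, triplet[0], triplet[2]))
# 123 (1, 0)  H: 1 -> 3
# 421 (1, 0)  H: wrap-around 4 -> 1
# 131 (0, 1)  L: trill
# 112 (0, 0)  L: other
```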

**16. Create a new block column numbered 1-4 instead of 1-8 (blocks 5-8 become 1-4)**

In [None]:
df.loc[df['block'] == 1, 'new_block'] = 1
df.loc[df['block'] == 2, 'new_block'] = 2
df.loc[df['block'] == 3, 'new_block'] = 3
df.loc[df['block'] == 4, 'new_block'] = 4
df.loc[df['block'] == 5, 'new_block'] = 1
df.loc[df['block'] == 6, 'new_block'] = 2
df.loc[df['block'] == 7, 'new_block'] = 3
df.loc[df['block'] == 8, 'new_block'] = 4
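The eight assignments above can also be written as a single mapping. A minimal sketch on a toy series; `Series.map` leaves any block outside 1-8 as NaN, just like the `.loc` version, while the arithmetic form is shown only as an equivalent for blocks 1-8.

```python
import pandas as pd

blocks = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

# Blocks 5-8 are renumbered 1-4; unmapped values would become NaN
new_block = blocks.map({1: 1, 2: 2, 3: 3, 4: 4, 5: 1, 6: 2, 7: 3, 8: 4})
print(new_block.tolist())  # [1, 2, 3, 4, 1, 2, 3, 4]

# Equivalent arithmetic form for blocks 1-8: wrap around every 4 blocks
print(new_block.equals((blocks - 1) % 4 + 1))  # True
```

In the notebook this would read `df['new_block'] = df['block'].map({...})`.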

**17. Remove blocks in which 50% or more of the answers are invalid (repetitions, trills) from the PDP dataframe**

In [None]:
r_or_t_proportion = df.groupby(['Participant Public ID', 'randomiser-bcas','task_type','new_block']).agg({'r_or_t': 'mean'})
r_or_t_proportion

In [None]:
df_high = df.groupby(['Participant Public ID', 'randomiser-bcas','task_type','new_block']).agg({'is_high': 'mean'})
df_high

In [None]:
df_high_r_or_t = pd.concat([df_high, r_or_t_proportion], join='outer', axis=1)
df_high_r_or_t

In [None]:
df_invalid_removed = df_high_r_or_t[df_high_r_or_t['r_or_t'] < 0.5]
df_invalid_removed
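The block-level exclusion on a toy example. Computing both means in one `agg` call gives the same table as the two separate aggregations joined with `pd.concat`; the data and task label are made up, and the `randomiser-bcas` grouping level is omitted for brevity.

```python
import pandas as pd

# Toy trial-level data: one participant, one task type, two blocks of four trials
trials = pd.DataFrame({
    "Participant Public ID": ["p1"] * 8,
    "task_type": ["inclusion"] * 8,
    "new_block": [1, 1, 1, 1, 2, 2, 2, 2],
    "is_high": [1, 0, 1, 1, 0, 0, 1, 0],
    "r_or_t": [0, 0, 0, 1, 1, 1, 0, 0],
})

per_block = trials.groupby(["Participant Public ID", "task_type", "new_block"]).agg(
    {"is_high": "mean", "r_or_t": "mean"}
)
# Block 2 has exactly 50% repetitions/trills, so the strict "< 0.5" test drops it
valid_blocks = per_block[per_block["r_or_t"] < 0.5]
print(valid_blocks)
```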

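**18. Calculate the proportion of H triplets per participant and task type (inclusion/exclusion)**

The proportion is averaged over the remaining valid blocks. Participants left without any valid block in one of the task types get a NaN value after `unstack()` and are dropped.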

In [None]:
final = df_invalid_removed.groupby(['Participant Public ID', 'randomiser-bcas','task_type']).agg({'is_high': 'mean'})
final = final.unstack()
final_after_exclusion = final.dropna()
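A toy version of the final aggregation, with made-up values and the `randomiser-bcas` level omitted. `unstack()` moves the task type into the columns, so a participant missing one task type gets NaN there and is removed by `dropna()`.

```python
import pandas as pd

valid_blocks = pd.DataFrame({
    "Participant Public ID": ["p1", "p1", "p1", "p2"],
    "task_type": ["exclusion", "inclusion", "inclusion", "inclusion"],
    "new_block": [1, 1, 2, 1],
    "is_high": [0.40, 0.60, 0.80, 0.70],
}).set_index(["Participant Public ID", "task_type", "new_block"])

# Mean of the block-level proportions per participant and task type
final = valid_blocks.groupby(level=["Participant Public ID", "task_type"]).agg({"is_high": "mean"})
final = final.unstack()                 # one row per participant, one column per task type
final_after_exclusion = final.dropna()  # p2 has no valid exclusion block
print(final_after_exclusion)
```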

**19. Save and download the PDP results file**

In [None]:
final_after_exclusion.to_csv("pdp_excl_results.csv",index=True)
files.download("pdp_excl_results.csv")
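`google.colab` can only be imported inside Colab, so the import at the top and `files.download` fail when the notebook runs locally. A hedged sketch of a helper (the name `save_results` is hypothetical) that always writes the CSV and triggers the browser download only when Colab is available:

```python
import pandas as pd


def save_results(df, path="pdp_excl_results.csv"):
    """Write `df` to `path`; download it as well when running in Colab (hypothetical helper)."""
    df.to_csv(path, index=True)
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        return path  # running locally: the CSV is already on disk
    files.download(path)
    return path


# Usage in the notebook: save_results(final_after_exclusion)
```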