# P19 Data Repair: Merge Corrupt Session Files

This notebook reconstructs the study files for participant P19 after the recording export was split across two files and partly corrupted.

## Notebook Summary

- Rebuild participant `P19` data from split source files in `data/corrupt`.
- Merge Block 1 and Blocks 2-4 into complete CSV and JSON datasets.
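- From each source file, keep only the block range it is responsible for (`Block < 1` from the Block 1 file, `Block > 0` from the Blocks 2-4 file) and drop all other rows.
- Detect and fix rows where the provided result number and the provided result layer were swapped.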
- Export repaired files to `data/` for downstream analysis notebooks.


## Rebuild CSV Tracking Data

Load split CSV exports, filter the expected block ranges, and save one merged participant file.


In [2]:
# Import shared helpers and dependencies from functions.ipynb
%run functions.ipynb

# Define source files for separated block recordings
b1 = '../data/corrupt/P19_2022-08-15_Block_1.csv'
b2_4 = '../data/corrupt/P19_2022-08-15_Block_2-4.csv'

# Load the Block 1 export and keep rows with Block < 1 (training blocks have negative IDs; 0 is presumably the first experiment block)
df_b1 = pd.read_csv(b1, sep=";")
df_b1 = df_b1[df_b1['Block'] < 1]

# Load the Blocks 2-4 export and keep only rows with Block > 0 (the remaining experiment blocks)
df_b2_4 = pd.read_csv(b2_4, sep=";")
df_b2_4 = df_b2_4[df_b2_4['Block'] > 0]

# Concatenate both subsets to reconstruct the complete CSV dataset for P19
df_complete = pd.concat([df_b1, df_b2_4])

# Quick sanity check of merged records before export
display(df_complete)

# Save merged CSV used by later analysis notebooks
df_complete.to_csv('../data/P19_2022-08-15_merged.csv', sep=';', index=False)


| | Date | ProbandId | Block | Trial | TrialNumber | Condition | ResultNumber | ResultLayer | Layer01 | Layer02 | ... | CenterReached | Frequency | LayerBorderBump | EnterCenterBump | inTargetArea | delayCount | delayIdx | visualizationLastLayerIdx | delayElapsedSuccessful | Unnamed: 41 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-08-15T13:53:11.407Z | 19 | -4 | 0 | -8 | Combined Feedback | 150 | 1 | 150 | 135 | ... | - | - | - | - | False | 8 | 49 | -1 | False | |
| 1 | 2022-08-15T13:53:11.458Z | 19 | -4 | 0 | -8 | Combined Feedback | 150 | 1 | 150 | 135 | ... | - | - | - | - | False | 8 | 49 | -1 | False | |
| 2 | 2022-08-15T13:53:11.484Z | 19 | -4 | 0 | -8 | Combined Feedback | 150 | 1 | 150 | 135 | ... | - | - | - | - | False | 8 | 49 | -1 | False | |
| 3 | 2022-08-15T13:53:11.487Z | 19 | -4 | 0 | -8 | Combined Feedback | 150 | 1 | 150 | 135 | ... | - | - | - | - | False | 8 | 49 | -1 | False | |
| 4 | 2022-08-15T13:53:11.513Z | 19 | -4 | 0 | -8 | Combined Feedback | 150 | 1 | 150 | 135 | ... | - | - | - | - | False | 8 | 49 | -1 | False | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 86737 | 2022-08-15T15:13:08.374Z | 19 | 3 | 20 | 83 | Visual Feedback | 161 | 1 | 161 | 120 | ... | - | - | - | - | False | 8 | 1562 | -1 | False | |
| 86738 | 2022-08-15T15:13:08.405Z | 19 | 3 | 20 | 83 | Visual Feedback | 161 | 1 | 161 | 120 | ... | - | - | - | - | False | 8 | 1562 | -1 | False | |
| 86739 | 2022-08-15T15:13:08.435Z | 19 | 3 | 20 | 83 | Visual Feedback | 161 | 1 | 161 | 120 | ... | - | - | - | - | False | 8 | 1562 | -1 | False | |
| 86740 | 2022-08-15T15:13:08.467Z | 19 | 3 | 20 | 83 | Visual Feedback | 161 | 1 | 161 | 120 | ... | - | - | - | - | False | 8 | 1562 | -1 | False | |

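Before relying on a merged file like this one, it can help to confirm that the two filtered parts do not overlap and that the combined rows are still in recording order. The following is a minimal sketch on synthetic stand-in data, not part of the original notebook: the helper name `check_merge` is hypothetical, and the assumption that rows should be chronological by `Date` is inferred from the timestamps in the output above.

```python
import pandas as pd

# Synthetic stand-ins for the two filtered exports (illustrative values only)
df_first = pd.DataFrame({
    "Date": pd.to_datetime([
        "2022-08-15T13:53:11.407Z",
        "2022-08-15T13:53:11.458Z",
        "2022-08-15T14:05:00.000Z",
    ]),
    "Block": [-4, -4, 0],
})
df_rest = pd.DataFrame({
    "Date": pd.to_datetime([
        "2022-08-15T14:20:00.000Z",
        "2022-08-15T14:40:00.000Z",
        "2022-08-15T15:13:08.467Z",
    ]),
    "Block": [1, 2, 3],
})


def check_merge(first, rest):
    """Return simple consistency facts about a two-part merge (hypothetical helper)."""
    merged = pd.concat([first, rest], ignore_index=True)
    return {
        # No block ID should be contributed by both source files
        "disjoint_blocks": set(first["Block"]).isdisjoint(set(rest["Block"])),
        # Assumption: the recording is ordered by timestamp
        "chronological": bool(merged["Date"].is_monotonic_increasing),
        # Sorted list of block IDs present after the merge, to spot gaps by eye
        "blocks": sorted(merged["Block"].unique().tolist()),
    }


report = check_merge(df_first, df_rest)
print(report)
```

In the notebook itself, the same helper could be called with `df_b1` and `df_b2_4` before exporting, provided the `Date` column is parsed to datetimes first.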

## Rebuild and Correct JSON Trial Results

Load split JSON-line exports, merge valid blocks, correct swapped result fields, and write a cleaned JSON file.


In [20]:
# Define source files for JSON line exports
b1j = '../data/corrupt/P19_2022-08-15_Block_1.json'
b2_4j = '../data/corrupt/P19_2022-08-15_Block_2-4.json'

# Load Block 1 JSON rows and keep only block IDs below 1
df_b1j = pd.read_json(b1j, lines=True)
df_b1j = df_b1j[df_b1j['BlockId'] < 1]

# Load Blocks 2-4 JSON rows and keep only positive block IDs
df_b2_4j = pd.read_json(b2_4j, lines=True)
df_b2_4j = df_b2_4j[df_b2_4j['BlockId'] > 0]

# Merge JSON subsets to rebuild the complete participant results
df_completej = pd.concat([df_b1j, df_b2_4j])

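# Heuristic: in this data result numbers are three-digit values while layers are small indices,
# so a row whose result number is smaller than its result layer has the two fields swapped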
switchedValues = df_completej[df_completej['ProvidedResultNumber'] < df_completej['ProvidedResultLayer']]

# Swap back both fields in affected rows
for idx, row in switchedValues.iterrows():
    display(idx)
    df_completej.at[idx, 'ProvidedResultNumber'] = row['ProvidedResultLayer']
    df_completej.at[idx, 'ProvidedResultLayer'] = row['ProvidedResultNumber']

# Quick sanity check of corrected merged JSON records
display(df_completej)

# Export corrected JSON lines for downstream study notebooks
df_completej.to_json('../data/P19_2022-08-15_merged.json', lines=True, orient='records')


16

| | BlockId | CommitResultDate | Condition | EndInteractionDate | ExpectedResultLayer | ExpectedResultNumber | LayerNumberConfiguration | ProbandId | ProvidedResultLayer | ProvidedResultNumber | StartDate | Training | TrialId | level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -4 | 2022-08-15T13:55:12.494Z | Combined Feedback | 2022-08-15T13:54:42.888Z | 1 | 150 | [150, 135, 148, 149, 131, 143, 145] | 19 | 3 | 149 | 2022-08-15T13:53:56.112Z | True | 0 | results |
| 1 | -4 | 2022-08-15T13:56:21.543Z | Combined Feedback | 2022-08-15T13:56:17.010Z | 2 | 155 | [152, 155, 149, 138, 147, 124, 135] | 19 | 2 | 155 | 2022-08-15T13:55:12.496Z | True | 1 | results |
| 2 | -3 | 2022-08-15T13:57:16.133Z | Visual Feedback | 2022-08-15T13:57:10.414Z | 6 | 164 | [145, 154, 116, 148, 143, 164, 130] | 19 | 6 | 164 | 2022-08-15T13:56:27.277Z | True | 0 | results |
| 3 | -3 | 2022-08-15T13:58:21.958Z | Visual Feedback | 2022-08-15T13:58:18.094Z | 3 | 158 | [141, 148, 158, 140, 126, 146, 142] | 19 | 3 | 158 | 2022-08-15T13:57:16.134Z | True | 1 | results |
| 4 | -2 | 2022-08-15T13:59:08.558Z | Tactile Feedback | 2022-08-15T13:59:04.814Z | 2 | 154 | [145, 154, 139, 136, 146, 147, 135] | 19 | 2 | 154 | 2022-08-15T13:58:28.125Z | True | 0 | results |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 87 | 3 | 2022-08-15T15:11:15.684Z | Visual Feedback | 2022-08-15T15:11:11.500Z | 4 | 159 | [132, 140, 128, 159, 149, 148, 144] | 19 | 4 | 159 | 2022-08-15T15:10:46.821Z | False | 16 | results |
| 88 | 3 | 2022-08-15T15:11:52.532Z | Visual Feedback | 2022-08-15T15:11:48.428Z | 2 | 158 | [148, 158, 144, 139, 152, 131, 128] | 19 | 2 | 158 | 2022-08-15T15:11:15.685Z | False | 17 | results |
| 89 | 3 | 2022-08-15T15:12:15.357Z | Visual Feedback | 2022-08-15T15:12:11.102Z | 6 | 150 | [134, 148, 130, 149, 147, 150, 143] | 19 | 6 | 150 | 2022-08-15T15:11:52.533Z | False | 18 | results |
| 90 | 3 | 2022-08-15T15:12:40.749Z | Visual Feedback | 2022-08-15T15:12:35.085Z | 5 | 153 | [139, 134, 142, 145, 153, 138, 149] | 19 | 5 | 153 | 2022-08-15T15:12:15.357Z | False | 19 | results |

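The row-by-row swap in the cell above relies on index labels being unique after `pd.concat`. As a possible alternative, the same correction can be written as one vectorized assignment with a boolean mask. This is a sketch on synthetic values, not the notebook's own method; only the two column names are taken from the data above.

```python
import pandas as pd

# Synthetic example: the middle row has number and layer swapped
df = pd.DataFrame({
    "ProvidedResultNumber": [155, 6, 158],
    "ProvidedResultLayer": [2, 164, 3],
})

cols = ["ProvidedResultNumber", "ProvidedResultLayer"]

# Same heuristic as in the notebook: a swapped row has number < layer
swapped = df["ProvidedResultNumber"] < df["ProvidedResultLayer"]

# Assign the reversed column order as a NumPy array so that pandas
# does not re-align the values by column label and undo the swap
df.loc[swapped, cols] = df.loc[swapped, cols[::-1]].to_numpy()

# Index labels of the corrected rows, comparable to the display(idx) output
swapped_index = df.index[swapped].tolist()
print(swapped_index)
print(df)
```

Because the mask selects rows by position rather than by label, this variant behaves the same whether or not the merged frame was built with `ignore_index=True`.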