# Convert E-Prime output files to more manageable formats

In [1]:
from os import remove
from os.path import join

import pandas as pd
from convert_eprime.utils import remove_unicode
from convert_eprime.tests.utils import get_test_data_path

data_dir = get_test_data_path()
config_dir = get_test_data_path()

## Convert a raw E-Prime output text file to csv
The text files that E-Prime writes automatically contain all of the information available in the edat file, but in an unusual format that cannot be used directly for analysis. The function `text_to_csv` reads the data from the text file into a pandas DataFrame and writes the DataFrame out to a csv file without otherwise manipulating the data.
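
To give a feel for the raw format, here is a minimal sketch that collects `Key: Value` lines like those printed below into a dictionary. It is an illustration only, not the code `text_to_csv` uses; the helper name `parse_key_value_lines` is hypothetical, and the sample lines are copied from the header of the test file.

```python
# Sample lines in the style of the raw E-Prime text file header.
raw_lines = [
    '*** Header Start ***',
    'VersionPersist: 1',
    'Experiment: Cuetask',
    'SessionTime: 12:26:17',
    'Subject: PILOT',
    'Session: 1',
]


def parse_key_value_lines(lines):
    """Collect 'Key: Value' lines into a dict (hypothetical helper)."""
    record = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and '*** ... ***' marker lines.
        if not line or line.startswith('***'):
            continue
        # Split on the first ': ' only, so values such as times stay intact.
        key, sep, value = line.partition(': ')
        if sep:
            record[key] = value
    return record


header = parse_key_value_lines(raw_lines)
print(header)
```

A plain dictionary keeps only the last value for a repeated key, so a field such as `LevelName`, which appears several times in the real header, would need a list of pairs instead.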

In [2]:
from convert_eprime.convert import text_to_csv

text_file = join(data_dir, 'Cuetask-PILOT-1.txt')
out_file = join(data_dir, 'out_csv.csv')

with open(text_file, 'r') as fo:
    raw_data = fo.readlines()[:20]
    raw_data = [line.rstrip() for line in raw_data]

# Remove unicode characters.
filtered_data = [remove_unicode(row) for row in raw_data]

print('The raw text file (after removing unicode characters):')
for line in filtered_data:
    print(line)
print('')

text_to_csv(text_file, out_file)
print('')

df = pd.read_csv(out_file)
print('The converted csv file:')
print(df.head(10))

remove(out_file)

The raw text file (after removing unicode characters):
*** Header Start ***
VersionPersist: 1
LevelName: Session
LevelName: Block
LevelName: Trial
LevelName: SubTrial
LevelName: LogLevel5
LevelName: LogLevel6
LevelName: LogLevel7
LevelName: LogLevel8
LevelName: LogLevel9
LevelName: LogLevel10
Experiment: Cuetask
SessionDate: 01-01-1800
SessionTime: 12:26:17
SessionStartDateTimeUtc: 1/1/1800 4:26:17 PM
Subject: PILOT
Session: 1
DataFile.Basename: Cuetask-PILOT-1
RandomSeed: 1345330782

Output file successfully created- /Users/tsalo/Documents/tsalo/convert-eprime/convert_eprime/tests/data/out_csv.csv

The converted csv file:
  StudioVersion  TestButton6.ACC  TestButton7.CRESP  \
0           NaN              NaN                NaN   
1           NaN              NaN                NaN   
2           NaN              NaN                NaN   
3           NaN              NaN                NaN   
4           NaN              NaN                NaN   
5           NaN              NaN       

## Convert a raw E-Prime output text file to reduced csv
The text files that E-Prime writes automatically contain all of the information available in the edat file, but the desired columns (the ones from the edat file) may be named differently in the text file or split across multiple columns. The function `text_to_rcsv` (text to reduced csv) reads the data from the text file into a pandas DataFrame, just like `text_to_csv`, and then performs a series of operations on the DataFrame based on a parameters file specific to the task.

These operations include:
- Renaming columns
- Merging data from multiple columns into new ones
- Reducing the rows of the DataFrame based on NaNs in one or more columns
- Reducing the columns of the DataFrame based on a desired subset
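
The four operations listed above can be sketched in plain pandas on a toy DataFrame. This is not the implementation inside `text_to_rcsv`. The `params` dictionary and its keys are hypothetical stand-ins for whatever the real parameters file contains, and the column names only loosely follow the test data.

```python
import numpy as np
import pandas as pd

# Toy data; the 'Unused' column is made up for illustration.
df = pd.DataFrame({
    'Crave.RESP': [7.0, np.nan, np.nan, 3.0],
    'Crave.RT': [9469.0, np.nan, np.nan, 1200.0],
    'FixDur': [10531.0, np.nan, np.nan, np.nan],
    'Duration': [np.nan, 5500.0, 3500.0, np.nan],
    'Unused': ['a', 'b', 'c', 'd'],
})

# Hypothetical parameters; the real parameters file may use different keys.
params = {
    'rename': {'Crave.RESP': 'response', 'Crave.RT': 'rt'},
    'merge': {'duration': ['FixDur', 'Duration']},
    'drop_rows_if_nan': ['duration'],
    'keep_columns': ['response', 'rt', 'duration'],
}

# 1. Rename columns.
df = df.rename(columns=params['rename'])

# 2. Merge columns: take the first non-NaN value across the source columns.
for new_col, sources in params['merge'].items():
    df[new_col] = df[sources].bfill(axis=1).iloc[:, 0]

# 3. Drop rows that have NaN in any of the listed columns.
df = df.dropna(subset=params['drop_rows_if_nan'])

# 4. Keep only the desired subset of columns.
df = df[params['keep_columns']].reset_index(drop=True)
print(df)
```

In this toy example the last row is dropped because neither source column supplies a duration, and the `Unused` column disappears in the final step.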

In [3]:
from convert_eprime.convert import text_to_rcsv

text_file = join(data_dir, 'Cuetask-PILOT-1.txt')
edat_file = join(data_dir, 'Cuetask-PILOT-1.edat2')
param_file = join(config_dir, 'test_cue.json')
out_file = join(data_dir, 'out_rcsv.csv')

with open(text_file, 'r') as fo:
    raw_data = fo.readlines()[:10]
    raw_data = [line.rstrip() for line in raw_data]

# Remove unicode characters.
filtered_data = [remove_unicode(row) for row in raw_data]

print('The raw text file (after removing unicode characters):')
for line in filtered_data:
    print(line)
print('')

text_to_rcsv(text_file, edat_file, param_file, out_file)
print('')

df = pd.read_csv(out_file)
print('The converted and reduced csv file:')
print(df.head(10))

remove(out_file)

The raw text file (after removing unicode characters):
*** Header Start ***
VersionPersist: 1
LevelName: Session
LevelName: Block
LevelName: Trial
LevelName: SubTrial
LevelName: LogLevel5
LevelName: LogLevel6
LevelName: LogLevel7
LevelName: LogLevel8

Output file successfully created- /Users/tsalo/Documents/tsalo/convert-eprime/convert_eprime/tests/data/out_rcsv.csv

The converted and reduced csv file:
   Crave.CRESP  Crave.RESP  Crave.RT  Crave.RTTime   FixDur  Duration  \
0          NaN         7.0    9469.0      135249.0  10531.0       NaN   
1          NaN         NaN       NaN           NaN      NaN    5500.0   
2          NaN         NaN       NaN           NaN      NaN    3500.0   
3          NaN         NaN       NaN           NaN      NaN    6000.0   
4          NaN         NaN       NaN           NaN      NaN    7000.0   
5          NaN         NaN       NaN           NaN      NaN    5000.0   
6          NaN         NaN       NaN           NaN      NaN    6000.0   
7         

## Convert an exported E-Prime text file to a reduced csv
The standard steps for preparing behavioral data from E-Prime for analysis are: (1) open the edat file with **E-DataAid**, (2) export the file as an "E-Prime text" file (a tab-delimited text file with three empty rows at the top), (3) convert the resulting text file to csv format, and (4) reduce the csv file based on which columns are relevant for analysis.

The function `etext_to_rcsv` (E-Prime text to reduced csv) simply reads in the "E-Prime text" file as a pandas DataFrame, reduces the columns of the DataFrame based on a parameters file specific to the task, and writes out the data to a csv file.
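
As a rough sketch of steps (3) and (4), the snippet below reads a small tab-delimited string with pandas, skips three leading rows so that the real header row is used, keeps a subset of columns, and writes csv to an in-memory buffer. It is not the code inside `etext_to_rcsv`. The toy text, the assumption that exactly three rows precede the header, and the `keep_columns` list are all illustrative.

```python
from io import StringIO

import pandas as pd

# Toy stand-in for an exported "E-Prime text" file: three leading rows,
# then the real header row, then tab-delimited data.
etext = (
    'STRING\tINTEGER\tINTEGER\n'
    'EXPNAME\tSUBJECT\tSESSION\n'
    '1\t1\t1\n'
    'ExperimentName\tSubject\tSession\n'
    'Cuetask\tPILOT\t1\n'
    'Cuetask\tPILOT\t1\n'
)

# Hypothetical subset of columns to keep for analysis.
keep_columns = ['Subject', 'Session']

# Skip the three leading rows so the fourth line becomes the header.
etext_df = pd.read_csv(StringIO(etext), sep='\t', skiprows=3)
reduced_df = etext_df[keep_columns]

# Write the reduced data as csv to an in-memory buffer instead of a file.
buffer = StringIO()
reduced_df.to_csv(buffer, index=False)
print(buffer.getvalue())
```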

In [4]:
from convert_eprime.convert import etext_to_rcsv

etext_file = join(data_dir, 'PILOT_cuetask_exported.txt')
param_file = join(config_dir, 'test_cue.json')
out_file = join(data_dir, 'out_rcsv.csv')

etext_df = pd.read_csv(etext_file, sep='\t')
print('The E-Prime text file:')
print(etext_df.head(10))
print('')

etext_to_rcsv(etext_file, param_file, out_file)
print('')

df = pd.read_csv(out_file)
print('The converted and reduced csv file:')
print(df.head(10))

remove(out_file)

The E-Prime text file:
           STRING  INTEGER INTEGER.1  \
0         EXPNAME  SUBJECT   SESSION   
1               1        1         1   
2  ExperimentName  Subject   Session   
3         Cuetask    PILOT         1   
4         Cuetask    PILOT         1   
5         Cuetask    PILOT         1   
6         Cuetask    PILOT         1   
7         Cuetask    PILOT         1   
8         Cuetask    PILOT         1   
9         Cuetask    PILOT         1   

                                            STRING.1           STRING.2  \
0                                           VARIABLE           VARIABLE   
1                                                  1                  1   
2                                  Clock.Information  DataFile.Basename   
3  <?xml version="1.0"?>\n<Clock xmlns:dt="urn:sc...    Cuetask-PILOT-1   
4  <?xml version="1.0"?>\n<Clock xmlns:dt="urn:sc...    Cuetask-PILOT-1   
5  <?xml version="1.0"?>\n<Clock xmlns:dt="urn:sc...    Cuetask-PILOT-1   
6  <?xml ve