# Processing Zooniverse Data Exports
Use Case: Two-task workflow -- survey (T0) and text (T1)

**Instructions**

1.   Upload the data file using the file browser: click the folder icon in the left sidebar, click "Upload", and select the file from your local machine.
2.   Edit the `filename_classifications`, `filename_output`, and `columns_out` variables to use your input file and desired output columns.
3.   Run the full notebook by selecting the "Run all" option from the "Runtime" menu.
4.   The final step automatically downloads the output CSV file.

### Configuration

In [0]:
filename_classifications = 'future-workflow-wainting-for-approval-classifications.csv'
filename_output = 'classifications_flat+trim.csv'

In [0]:
columns_out = ['classification_id', 'created_at', 'user_name', 'user_id',
               'workflow_id', 'workflow_version', 'subject_ids', 
               'taskvalue_text', 'taskvalue_survey']

In [0]:
# Reference: column names to choose from

columns_in = ['classification_id', 'user_name', 'user_id', 'user_ip', 
              'workflow_id', 'workflow_name', 'workflow_version', 'created_at',
              'gold_standard', 'expert', 'metadata', 'annotations', 
              'subject_data', 'subject_ids']
       
columns_new = ['metadata_json', 'annotations_json', 'subject_data_json', 
               'taskvalue_text', 'taskvalue_survey']
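
Before running the rest of the notebook, it can help to confirm that every name in `columns_out` is either an input column or one of the derived columns. The following is a minimal sketch: `check_columns` is a hypothetical helper that is not part of the original notebook, and the lists are copied from the cells above.

```python
# Hypothetical helper: report requested output columns that will not exist.
def check_columns(requested, available):
    """Return the requested column names missing from `available`, in order."""
    available_set = set(available)
    return [name for name in requested if name not in available_set]


# Lists copied from the configuration and reference cells above.
columns_in = ['classification_id', 'user_name', 'user_id', 'user_ip',
              'workflow_id', 'workflow_name', 'workflow_version', 'created_at',
              'gold_standard', 'expert', 'metadata', 'annotations',
              'subject_data', 'subject_ids']
columns_new = ['metadata_json', 'annotations_json', 'subject_data_json',
               'taskvalue_text', 'taskvalue_survey']
columns_out = ['classification_id', 'created_at', 'user_name', 'user_id',
               'workflow_id', 'workflow_version', 'subject_ids',
               'taskvalue_text', 'taskvalue_survey']

# Fail early with a readable message instead of a KeyError at the trim step.
missing = check_columns(columns_out, columns_in + columns_new)
if missing:
    raise ValueError('Unknown output columns: {}'.format(missing))
```

Running this check right after editing `columns_out` surfaces a misspelled column name before any data is loaded.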

### Load Data

In [0]:
import pandas as pd
import json

In [0]:
classifications = pd.read_csv(filename_classifications)

#### Expanding JSON Fields

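Parses the JSON strings in the `metadata`, `annotations`, and `subject_data` columns into Python objects, providing access to their key-value pairs. Note that `annotations` parses to a list containing one dictionary per task.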

In [0]:
classifications['metadata_json'] = [json.loads(q) for q in classifications.metadata]
classifications['annotations_json'] = [json.loads(q) for q in classifications.annotations]
classifications['subject_data_json'] = [json.loads(q) for q in classifications.subject_data]
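
To illustrate what a parsed value looks like, here is a small sketch using a made-up `annotations` string. The choice and text values are invented for illustration, and a real export may carry additional keys; only the `task`, `value`, and `choice` keys used later in this notebook are assumed.

```python
import json

# Made-up example of one row's raw `annotations` string (illustrative values only).
raw_annotations = (
    '[{"task": "T0", "value": [{"choice": "ZEBRA"}]}, '
    '{"task": "T1", "value": "two adults "}]'
)

parsed = json.loads(raw_annotations)

# The string parses to a list with one dictionary per task.
first_task = parsed[0]['task']                  # task identifier of the first entry
first_choice = parsed[0]['value'][0]['choice']  # survey choice of the first entry
text_value = parsed[1]['value']                 # free text, trailing space intact
```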

### Flatten Annotations

In [0]:
taskvalue_survey = []
taskvalue_text = []

for i, row in classifications.iterrows():

  # Hard-coded to parse two task annotations (T0, T1)
  entries = len(row['annotations_json'])
  if entries != 2:
    raise ValueError('Assumes two annotation entries; found {}.'.format(entries))

  # Default to empty strings so each list gains exactly one value per row,
  # keeping both lists aligned with the DataFrame index
  survey = ''
  text = ''

  for t in row['annotations_json']:
    # Survey Task = T0: keep the first selected choice, if any
    if t['task'] == 'T0' and t['value']:
      survey = t['value'][0]['choice']

    # Text Task = T1: strip trailing whitespace; treat a null value as empty
    if t['task'] == 'T1':
      text = (t['value'] or '').rstrip()

  taskvalue_survey.append(survey)
  taskvalue_text.append(text)

In [0]:
classifications['taskvalue_text'] = taskvalue_text
classifications['taskvalue_survey'] = taskvalue_survey
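
The per-row logic can also be written as a small function, which makes it easy to try on a single row before processing a whole export. This is a sketch rather than the notebook's own method: `flatten_annotations` is a hypothetical name, and the sample annotations are invented.

```python
def flatten_annotations(annotations):
    """Return (survey_choice, text) for one row's parsed two-task annotations."""
    if len(annotations) != 2:
        raise ValueError(
            'Assumes two annotation entries; found {}.'.format(len(annotations)))

    survey, text = '', ''
    for entry in annotations:
        if entry['task'] == 'T0':
            # Survey task: keep the first selected choice, if any.
            if entry['value']:
                survey = entry['value'][0]['choice']
        elif entry['task'] == 'T1':
            # Text task: strip trailing whitespace; a null value becomes ''.
            text = (entry['value'] or '').rstrip()
    return survey, text


# Invented sample rows, already parsed from JSON.
sample = [{'task': 'T0', 'value': [{'choice': 'ZEBRA'}]},
          {'task': 'T1', 'value': 'two adults \n'}]
sample_no_choice = [{'task': 'T0', 'value': []},
                    {'task': 'T1', 'value': 'nothing here'}]

result = flatten_annotations(sample)
result_no_choice = flatten_annotations(sample_no_choice)
```

Because the function returns one pair per row, the survey and text columns cannot drift out of alignment with the DataFrame index.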

### Trim Columns & Download File

In [0]:
output = classifications[columns_out]
output.to_csv(filename_output, index=False)

In [0]:
from google.colab import files
files.download(filename_output)
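
The `google.colab` module is only importable inside Colab, so the cell above fails in other environments. A minimal sketch of a guarded download follows; `download_if_colab` is a hypothetical helper that is not part of the original notebook.

```python
def download_if_colab(path):
    """Trigger a browser download in Colab; return False in other environments."""
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        # Not running in Colab: the CSV is already written to `path` on disk.
        return False
    files.download(path)
    return True
```

Outside Colab the function returns `False`, and the output file written by `to_csv` remains in the working directory.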