We shouldn't turn on workflows 2a, 2b, and 2c until the final subject set for workflow 1 is complete. 

The following steps need to be carried out before workflows 2a, 2b, and 2c can be activated. 

-  Determine the final, aggregated consensus results for the subjects (the journal pages) that have been processed through workflow 1. 

-  Use these consensus results to create 3 new subject sets: 

   - subject set 1) images that should be associated with workflow 2a, i.e., pages with a single sky figure with axes labeled

   - subject set 2) images that should be associated with workflow 2b, i.e., pages with multiple sky figures with axes labeled

   - subject set 3) images that should be associated with workflow 2c, i.e., pages with sky figures without axes labeled
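
The consensus step above could be sketched as a simple majority vote per subject. This is only an illustration, not the project's official aggregation: it assumes the classification export has a `subject_ids` column, that `annotations` has already been reduced to one dict per row (as `parse_classifications` does further down), and the function name `consensus_by_subject` and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter

import pandas as pd


def consensus_by_subject(data, min_fraction=0.5):
    """Return one row per subject with its majority-vote answer.

    A subject gets a consensus answer only if the most common answer was
    chosen in more than `min_fraction` of its classifications; otherwise
    the consensus is None.
    """
    rows = []
    for subject_id, group in data.groupby('subject_ids'):
        votes = Counter()
        for annotation in group['annotations']:
            value = annotation['value']
            # Multiple-choice tasks store a list; single-choice store a string.
            answers = value if isinstance(value, list) else [value]
            votes.update(answers)
        if not votes:
            continue  # no answers recorded for this subject
        answer, count = votes.most_common(1)[0]
        n_classifications = len(group)
        agreed = count / n_classifications > min_fraction
        rows.append({'subject_ids': subject_id,
                     'consensus': answer if agreed else None,
                     'n_votes': count,
                     'n_classifications': n_classifications})
    return pd.DataFrame(rows)


# Tiny made-up example: subject 1 has a clear majority, subject 2 is tied.
example = pd.DataFrame({
    'subject_ids': [1, 1, 1, 2, 2],
    'annotations': [{'value': ['A']}, {'value': ['A']}, {'value': ['B']},
                    {'value': ['A']}, {'value': ['B']}],
})
result = consensus_by_subject(example)
```

Subjects whose consensus is None would need a manual look or more classifications before being assigned to a subject set.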

Once you have access to the Project Editor for Astronomy Rewind, you can click on 'Workflows' to see what workflows 2a, 2b, and 2c are. In the editor for each workflow, you can click on 'Test this workflow' to see what the volunteer experience will be when answering the tasks/questions for that workflow.

Note: with this many images, it's best to use one of our API clients to automate the upload of subjects into a new subject set. See https://github.com/zooniverse/panoptes-python-client or https://github.com/zooniverse/panoptes-cli
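
A rough sketch of what that automation might look like with `panoptes-python-client`. Since the journal pages already exist as subjects in workflow 1, this version links existing subject IDs into a new subject set rather than re-uploading images. The helper names, the `batch_size` value, and the assumption that the export's `subject_ids` column holds one integer ID per row are mine; check the client documentation before relying on the exact calls.

```python
import pandas as pd


def subject_ids_for_set(df):
    """Unique subject IDs from a classifications dataframe, first-seen order."""
    return list(dict.fromkeys(int(s) for s in df['subject_ids']))


def link_subjects(project_id, display_name, subject_ids, batch_size=500):
    """Create a subject set and link existing subjects to it in batches.

    Requires an authenticated session, e.g.
    Panoptes.connect(username=..., password=...), before calling.
    """
    # Imported here so the helper above works without the client installed.
    from panoptes_client import Project, SubjectSet

    project = Project.find(project_id)
    subject_set = SubjectSet()
    subject_set.links.project = project
    subject_set.display_name = display_name
    subject_set.save()

    # Link in batches to keep each API request small (batch size is a guess).
    for start in range(0, len(subject_ids), batch_size):
        subject_set.add(subject_ids[start:start + batch_size])
    return subject_set


# Made-up dataframe standing in for one of the filtered workflow CSVs.
demo = pd.DataFrame({'subject_ids': [3, 3, 1, 2, 1]})
demo_ids = subject_ids_for_set(demo)
```

`link_subjects` is deliberately not called here because it needs credentials and a real project ID.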

- Associate the new subject sets with their respective workflow (under 'associated subject set' in the project editor for that workflow). Also de-associate the current 'test images' subject sets that are just used for testing. 

- Once the subject sets are associated, have Alyssa, Julie, and Gretchen review each workflow to make sure all the help context, text, etc. is correct and that the hand-off into WWT still works properly. 

- Once workflow 1 is finished, go to the 'Visibility' tab and make Workflows 2a, 2b, and 2c active. 


In [1]:
import os
import json

import pandas as pd

In [2]:
def JSONParser(data):
    """call json.loads"""
    return json.loads(data)


def load_classifications(filename, json_columns=None):
    """
    Load classifications into pandas dataframe.
    
    Some columns of the csv are embedded json and need special parsing.
    """
    json_columns = json_columns or ['metadata', 'annotations', 'subject_data']
    converters = {i: JSONParser for i in json_columns}

    return pd.read_csv(filename, converters=converters)


def unpack(series):
    """
    Return the first element of each entry in a series.

    All annotations values are lists because a few tasks allow multiple
    answers. The second multiple-answer task always has the value
    'None of the above' (for this dataset!), so only the first is kept.
    """
    return [a[0] for a in series]


def parse_classifications(filename):
    """
    Load classifications and datamunge annotations column.
    """
    data = load_classifications(filename)

    # Only need the first item in the annotations list of json objects
    data['annotations'] = unpack(data['annotations'])
    return data


def explore(data):
    """ 
    Print the unique values found in the annotations.
    """
    import numpy as np
    values = np.unique(np.concatenate([a['value'] for a in data['annotations']]))
    print(values)
    return


def write_workflow_csvs(filename, w2s=None, overwrite=False):
    """
    Cull classifications file and write to new workflows.
    
    Parameters
    ----------
    filename : str
        input csv file
    
    w2s : dict
        key: workflow title (for file naming)
        value: Single string to select on within annotations values.
    
    overwrite : bool [False]
        overwrite with new file 
    """
    # load and munge data
    data = parse_classifications(filename)
    
    if w2s is None:
        w2s = {'2a': 'A single sky\xa0figure *with* axes labeled',
               '2b': 'Two or more sky figures *with* axes labeled',
               '2c': 'Sky figure(s) *without* axes labeled'}

    for wf in w2s.keys():
        # new filename assumes wf1 is in the first filename!
        outname = filename.replace('wf1', 'wf{0:s}'.format(wf))
        
        # Identify matches to next workflow
        iwf = [w2s[wf] in a['value'] for a in data['annotations']]

        # create sub-copy of dataframe with only workflow matches
        df = data.iloc[iwf]

        # write ... or not
        if not os.path.isfile(outname) or overwrite:
            df.to_csv(outname)
            msg = 'wrote'
        else:
            msg = 'not overwriting'
        print('{0:s} {1:s}'.format(msg, outname))

    return

In [3]:
# Slice out all other workflows and testing (and maintain the header line)
! head -1 astronomy-rewind-classifications.csv > astronomy-rewind-classifications_wf1.csv 
! grep "Workflow 1: Identifying figure types" astronomy-rewind-classifications.csv >> astronomy-rewind-classifications_wf1.csv 

write_workflow_csvs('astronomy-rewind-classifications_wf1.csv')

wrote astronomy-rewind-classifications_wf2c.csv
wrote astronomy-rewind-classifications_wf2a.csv
wrote astronomy-rewind-classifications_wf2b.csv
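
One caveat when reading these output files back in: the JSON columns were parsed into Python objects before writing, and `DataFrame.to_csv` stores such objects using their Python string form (single quotes, `None`), which `json.loads` will generally reject. A minimal sketch of a reader that uses `ast.literal_eval` instead; `load_workflow_csv` is a hypothetical helper, and `index_col=0` assumes the file was written with the default index column as above.

```python
import ast
import io

import pandas as pd


def load_workflow_csv(source, object_columns=None):
    """Read a CSV written by write_workflow_csvs, restoring dict columns."""
    object_columns = object_columns or ['metadata', 'annotations',
                                        'subject_data']
    # literal_eval safely parses Python literals such as dicts and lists.
    converters = {col: ast.literal_eval for col in object_columns}
    return pd.read_csv(source, converters=converters, index_col=0)


# Round trip on a made-up row, using an in-memory buffer instead of a file.
original = pd.DataFrame({
    'subject_ids': [1],
    'annotations': [{'task': 'T0', 'value': ['x'], 'note': None}],
})
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)
restored = load_workflow_csv(buffer, object_columns=['annotations'])
```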
