This repository has been archived by the owner on Sep 26, 2019. It is now read-only.

Project Workflows

Alex Gorski edited this page May 17, 2016 · 11 revisions

A workflow represents the series of marking, transcribing, or verification tasks, both ordered and unordered, for a set of subjects (e.g. documents, images). There is a one-to-one correspondence between a subject and a workflow. Defining your workflows is the most important and most involved step in configuring your project.

A project's workflows are defined in the following files:

my_project/
+-- workflows/
|   +-- mark.json
|   +-- transcribe.json
|   +-- verify.json

  • Mark workflows ask users to annotate the location of specific fields, or simply to answer questions about them, for example whether or not a field is present in the document. If your project does not require transcription or verification, this is the only workflow you need.
  • Transcribe workflows ask users to transcribe the text of a particular field that was annotated during the Mark workflow.
  • Verify workflows ask users to compare two or more transcriptions of the same field and select the most accurate. This workflow is not required if you trust the accuracy of the annotations generated in the previous steps.

A basic workflow.json file might look like this:

{
  "name":"mark",
  "label":"Mark Workflow",
  "retire_limit": 3,
  "generates_subjects": true,
  "generates_subjects_for": "transcribe",
  "first_task":"the_first_task",
  "tasks": {
    "the_first_task": {
      "next_task": "the_second_task"
    },
    "the_second_task": {
      "next_task": "the_last_task"
    },
    "the_last_task": {
      "next_task": null
    }
  }
}

All workflows support the following properties:

  • name: String - Unique key for this workflow. Must be one of 'mark', 'transcribe', or 'verify'
  • label: String - Friendly name for workflow
  • retire_limit: Int - Threshold for retiring the subject operated on in a given workflow. Mostly relevant to the Mark workflow, where retire_limit is the number of times we require someone to say "There is nothing left to mark." In the Transcribe and Verify workflows, retire_limit is ignored in favor of generates_subjects_after. Default 3.
  • generates_subjects: Bool - Indicates that some submitted classifications may generate secondary/tertiary subjects. Default true.
  • generates_subjects_for: String - Name of next workflow (if any) to associate with generated subjects, e.g. 'transcribe','verify', or null, which indicates there is no next workflow. Default null.
  • generates_subjects_after: Int - Number of classifications a generated subject must represent before it's activated for the next workflow. In Transcribe and Verify workflow, upon activating a generated subject, the parent subject acquires status "complete". Default 1.
  • generates_subjects_max: Int - Max number of distinct annotations that a generated subject may represent before we mark its status 'contentious'. Default 10.
  • generates_subjects_method: String - Available options are:
    • one-per-classification: (default) Indicates that the submitted classification's annotation should be used to generate a single subject without considering any other classifications submitted for that subject. Used in Mark when generating subjects for Transcribe.
    • collect-unique: Indicates that the generated subject should assemble all distinct classifications for the given subject as a list. Used in Transcribe when generating subjects for Verify.
    • select-most-popular: Indicates that the generated subject should consider all classifications for the given subject and select the annotation value that is most popular. Used in Verify for generating
  • first_task: TASK key of first task to invoke.
  • tasks: Hash mapping task keys to TASKs
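
As a rough sketch of how these properties combine, a Transcribe workflow that passes its results on to Verify might look like the following. The task key transcribe_record_number and the chosen values (for instance, activating a generated subject after 3 classifications) are illustrative assumptions, not taken from any sample project.

```json
{
  "name": "transcribe",
  "label": "Transcribe Workflow",
  "generates_subjects": true,
  "generates_subjects_for": "verify",
  "generates_subjects_after": 3,
  "generates_subjects_max": 10,
  "generates_subjects_method": "collect-unique",
  "first_task": "transcribe_record_number",
  "tasks": {
    "transcribe_record_number": {
      "tool": "textTool",
      "instruction": "Transcribe the record number",
      "next_task": null
    }
  }
}
```

With collect-unique, the distinct transcriptions are gathered into one generated subject, which is what the Verify workflow then asks users to compare.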

You can see sample configurations in our sample projects: anzac, emigrant, and whale_tales.

Tasks

A task is an action you ask a user to perform on a subject. A workflow has one or more tasks that can be ordered or unordered. Here is an example of a typical task:

  "tasks": {
    "determine_has_records": {
      "tool": "pickOne",
      "tool_config": {
        "options": {
          "yes": {
            "label": "Yes",
            "next_task": "identify_records"
          },
          "no": {
            "label": "No"
          }
        }
      }
    },
    "identify_records": {
      ...
    }
  }

A task supports the following properties:

  • key: String - Workflow-unique alphanumeric (e.g. '0','1','mark_one'). Established as key of the task in the tasks hash in the workflow.json.

  • tool: Enum - One of "pickOneMarkOne", "pointTool", "rectangleTool", "pickOne", "textTool", "numberTool", "dateTool", "compositeTool", "verifyTool". See below for detailed descriptions of these tools.

  • instruction: Text - Friendly prompt given to user, which contextualizes task (e.g. "How many penguins are there?", "What color is this penguin?", "Choose the type of document")

  • help: Hash - Help text can be constructed in one of two ways:

    • As a markdown file. For example, this configuration looks for the file identify_records.md in the /project/my-project/content/help folder:

      "help": {
        "file": "identify_records"
      },
    • As a hash in the following format:

      "help": {
        "title": "How To Identify Records",
        "body": "Typically records have a unique ID on the top right-hand side of the document"
      }
  • generates_subjects: Bool - Indicates that some submitted classifications may generate secondary/tertiary subjects.

  • generates_subject_type: String - Unique string identifying the type of subject generated. This must be unique across all tasks in the workflow. The value must match a task key in the destination workflow.

  • tool_config: Hash - Specify arbitrary tool options. See the section on Tools below.

  • export_name: String - Friendly name for the data point collected by this prompt (e.g. "Record Number", "Document Date"). Useful when you export data later.
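
Putting several of these properties together, a single task entry might be sketched as below. The keys transcribe_record_number and verify_record_number, the help file name, and the character limit are all hypothetical; following the description above, the generates_subject_type value is assumed to match a task key in the destination workflow.

```json
{
  "tasks": {
    "transcribe_record_number": {
      "tool": "textTool",
      "instruction": "Transcribe the record number",
      "help": {
        "file": "transcribe_record_number"
      },
      "generates_subjects": true,
      "generates_subject_type": "verify_record_number",
      "tool_config": {
        "limit": 20
      },
      "export_name": "Record Number"
    }
  }
}
```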

Tools

Tools are pluggable, configurable widgets that perform a single, simple task related to identifying an area of the subject ("marking"), adding data to a subject ("transcribing"), or moving the user from one tool to the next ("core").

Tools are specified in a task config via tool, and tool-specific configuration is specified via tool_config. A simple yes-or-no tool might look like this:

  "tasks": {
    "determine_has_animals": {
      "tool": "pickOne",
      "instruction": "Are there animals in this image?",
      "tool_config": {
        "options": {
          "yes": {
            "label": "Yes",
            "next_task": "identify_animals"
          },
          "no": {
            "label": "No"
          }
        }
      }
    }
  }

Or you could ask the user to select multiple options:

  "tasks": {
    "identify_animals": {
      "tool": "pickMany",
      "instruction": "Which of these animals do you see?",
      "tool_config": {
        "options": [
          {"value": "lion", "label": "Lion" },
          {"value": "tiger", "label": "Tiger" },
          {"value": "panther", "label": "Panther" },
          {"value": "jaguar", "label": "Jaguar" }
        ]
      }
    }
  }

Core Tools

Certain tools (e.g. 'pickOne') are "core tools", meaning they can appear in any workflow.

Pick One (pickOne)

Pick One is a simple tool that presents two or more options, each of which can lead to a different next task. This is equivalent to a radio button. Supported configuration options include:

  • options: Array of Hashes - each hash can have the following properties:
    • value: String - The key of the option chosen.
    • label: String - Friendly label of option (e.g. "This looks like a Casualty Form...", "This looks like an attestation..")
    • next_task: String - Key of TASK to jump to if user clicks this option.
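
Based on the option properties listed above, an array-style Pick One configuration might be sketched as follows. The task keys here are hypothetical. Note that the yes-or-no examples earlier on this page key their options by name rather than using an array with value, so it is worth checking the sample projects for the form your installation expects.

```json
{
  "tasks": {
    "choose_document_type": {
      "tool": "pickOne",
      "instruction": "Choose the type of document",
      "tool_config": {
        "options": [
          {
            "value": "casualty_form",
            "label": "This looks like a Casualty Form",
            "next_task": "mark_casualty_fields"
          },
          {
            "value": "attestation",
            "label": "This looks like an attestation",
            "next_task": "mark_attestation_fields"
          }
        ]
      }
    }
  }
}
```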

Pick Many (pickMany)

Pick Many is similar to Pick One, but allows the user to select multiple options before continuing, all of which are stored in a single generated classification. Supported configuration options include:

  • options: Array of Hashes - each hash can have the following properties:
    • value: String - The key of the option chosen.
    • label: String - Friendly label of option (e.g. "This looks like a Casualty Form...", "This looks like an attestation..")

Marking Tools

Marking tools include various methods for identifying specific points and areas of images.

All marking tools accept the following config params (in addition to tool specific params noted below):

  • fill_color: String - CSS color. Default "rgba(0,0,0,0.30)"
  • stroke_color: String - CSS color. Default "#fff"
  • stroke_width: Integer - Pixel stroke width. Default 3

All marking tools generate the following classification data (in addition to tool-specific data noted below):

  • x: Integer - Pixel coordinate within parent subject
  • y: Integer - Pixel coordinate within parent subject

Pick One Mark One (pickOneMarkOne)

PickOneMarkOne is the sole marking tool. It produces a menu of "marking types" in the right column, which are associated with user-supplied labels.

Tool-specific config options include:

  • options: List of Hashes - Each hash passed to options should define a marking type using the following properties:
    • type: Enum - The marking type. Must be one of "pointTool" (a single point), "rectangleTool" (a rectangle), "textRowTool" (a rectangle that spans the width of the document)
    • label: The label to display, which the user clicks on to activate the marking type.
    • color: The color of the displayed mark
    • generates_subject_type: String - Unique string identifying the type of subject generated. This must be unique across all tasks in the workflow. The value must match a task key in the destination workflow.
    • help: Hash - See how to define help text in the Tasks section

The supported marking types and their optional (proposed) additional config params are described below:

i. Point Tool (pointTool)

A simple point on the document. Optional config:

  • radius: Integer - Pixel radius. Default 40

ii. Rectangle Tool (rectangleTool)

Rectangular selector for identifying arbitrary rectangular regions of a document. Tool-specific config options include:

  • min_height: Integer in pixels, or Float as percentage of subject
  • max_height: same as above

Tool-specific classification data generated by rectangleTool:

  • width: Integer - Width of region
  • height: Integer - Height of region

iii. Text Row Tool (textRowTool)

Document-wide rectangular selector suited to identifying rows of horizontal text that span the width of the document. Tool-specific config options include:

  • min_height: Integer in pixels, or Float as percentage of subject
  • max_height: same as above

Tool-specific classification data generated by textRow tools:

  • yUpper: Integer - y-coordinate of the upper boundary of the row
  • yLower: Integer - y-coordinate of the lower boundary of the row

Example pickOneMarkOne config:

"identify_records": {
  "tool": "pickOneMarkOne",
  "instruction": "Pick a field and mark it with the corresponding marking tool.",
  "tool_config": {
    "options": [
      { "type": "rectangleTool",
        "label": "Blocky region of the doc",
        "color": "green",
        "max_height": 0.6
      },
      { "type": "textRowTool",
        "label": "Row of text",
        "color": "blue"
      }
    ]
  }
}
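
A further sketch, combining the point and text row marking types with the shared styling parameters, is shown below. It assumes that per-type parameters (radius, min_height, stroke_color, and so on) sit in the option hash alongside type, mirroring where max_height appears in the example above. The task key, labels, and generates_subject_type values are hypothetical.

```json
{
  "tasks": {
    "identify_fields": {
      "tool": "pickOneMarkOne",
      "instruction": "Pick a field and mark it with the corresponding marking tool.",
      "tool_config": {
        "options": [
          {
            "type": "pointTool",
            "label": "Page number",
            "color": "red",
            "radius": 20,
            "generates_subject_type": "transcribe_page_number"
          },
          {
            "type": "textRowTool",
            "label": "Row of text",
            "color": "blue",
            "min_height": 30,
            "max_height": 0.2,
            "stroke_color": "#fff",
            "stroke_width": 3,
            "generates_subject_type": "transcribe_row"
          }
        ]
      }
    }
  }
}
```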

Transcribe Tools

Transcribe tools are widgets suitable for gathering typed data with configurable constraints.

Text Tool (textTool)

Probably the simplest transcription tool, the text tool presents a single text input. The tool can be augmented with options below.

  • limit: Integer - Character limit.
  • suggest: (not yet supported) Indicates whether and how the input should autocomplete. suggest supports the following possible values:
    • An array of literal strings (e.g. ["cat","dog","other"])
    • A URL returning auto-complete suggestions for current entry (e.g. "http://example.com/terms/suggest?term=%%TERM%%" )
    • The phrase "common", which indicates the most commonly typed values for the current input will be suggested.
  • multiline: Boolean - Indicates whether or not value is expected to have line-breaks. Note that sufficiently large values of limit imply use of a textarea regardless.
  • match: String - Regex defining valid strings (e.g. "^[a-z]+$")

Configuration example:

  ...
  "tasks": {
    "transcribe_mortgager_name": {
      "tool": "textTool",
      "tool_config": {
        "limit": 100
      }
    }
  }
  ...
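
A variant that also uses the multiline and match options might be sketched like this. The task key, instruction, and regular expression are illustrative assumptions; the pattern shown accepts letters, apostrophes, spaces, and hyphens.

```json
{
  "tasks": {
    "transcribe_surname": {
      "tool": "textTool",
      "instruction": "Transcribe the surname",
      "tool_config": {
        "limit": 40,
        "multiline": false,
        "match": "^[A-Za-z' -]+$"
      }
    }
  }
}
```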

Number Tool (numberTool)

An extension of the Text Tool, perhaps using the match option to restrict input to numeric characters (e.g. "^-?\d+([,.]\d+)?$"). Supported config options:

  • minimum: Number
  • maximum: Number
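
For instance, a task that bounds the accepted value might be sketched as follows. The task key, instruction, and bounds are hypothetical.

```json
{
  "tasks": {
    "transcribe_age": {
      "tool": "numberTool",
      "instruction": "How old was this person?",
      "tool_config": {
        "minimum": 0,
        "maximum": 120
      }
    }
  }
}
```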

Date Tool (dateTool)

A date (and date range) picker that supports approximate dates and pre-1970 dates. Supported config options:

  • minimum: String - ISO 8601 date string establishing oldest allowed date (e.g. "-30000101" for 3000 BCE)
  • maximum: String - ISO 8601 date string establishing maximum allowed date (e.g. "20150227")
  • range: Boolean - If true, a date range may be selected
  • allow_approximate: Boolean - If true, user may check a box to indicate date is approximate.
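
A Date Tool task using all four options might be sketched as below. The task key, instruction, and minimum date are assumptions; the date strings follow the compact form used in the examples above.

```json
{
  "tasks": {
    "transcribe_record_date": {
      "tool": "dateTool",
      "instruction": "Enter the date of this record",
      "tool_config": {
        "minimum": "18500101",
        "maximum": "20150227",
        "range": false,
        "allow_approximate": true
      }
    }
  }
}
```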

Composite Tool (compositeTool)

A composite tool is a tool composed of two or more basic tools. A composite tool presents multiple tools side by side for cases where the mark being considered contains multiple distinct data that are confusing to consider in isolation. Config options include:

  • tools: Array of Hashes defining what tools to compose. Each hash should include:
    • tool: String - Key of tool
    • tool_config: Hash - tool specific config options (refer to tool specific config options above)

Note that composite tool classifications are special in that they are a hash of the classifications generated by each of their constituent tools. For example, if a composite tool is configured like this:

"em_transcribe_valuation": {
  "tool": "compositeTool",
  "tool_config": {
    "tools": {
      "em_valuation_date": {
        "tool": "dateTool",
        "tool_config": {},
        "label": "Record Date"
      },  
      "em_valuation_amount": {
        "tool": "textTool",
        "tool_config": {},
        "label": "Amount"
      }
    }
  },  
  "instruction": "Enter any dated property valuations that were recorded"
}
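
Since a composite classification is described as a hash of its constituent classifications, the result for the configuration above would presumably be keyed by the constituent tool keys. The sketch below assumes each constituent tool contributes a plain string; the values are made up for illustration, and the actual per-tool classification format is not specified on this page.

```json
{
  "em_valuation_date": "1852-03-14",
  "em_valuation_amount": "1,200"
}
```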

Next step: Define Your Project Subjects