unthinkingly edited this page · 5 revisions

Welcome to the Talkingpapers Wiki.

Initial Writeup

Please see Robert’s post

First Draft Release Plan

1. Blog post describing the project, project repository, initial draft of use cases, list of iterations
2. HTML/CSS design of initial forms-builder interface
3. Form Creator web application (database, script)
4. Paper Form Generator that emits text entry/OCR friendly printable form
5. Paper Form Generator that contains 2-D schema bar code (header/footer) and machine-readable field labels
6. Paper Form Reader that OCRs the form and deserializes the schema
7. HTML/CSS design of online Data Massage interface (tabular data + attached scan + OCR confidence scores, potential validation errors)
8. Data Massage web application with CRUD support, merge with schema, validation.
9. Data Massage web application with adapters to publish into Sahana, Freebase, GeoCommons.
10. End-to-end week-long field test involving hundreds of forms with substantial schema evolution.

Initial Mockup

The mock-up below is based on the excellent work that Sahana has been doing to allow generation of OCR-friendly forms for field data collection. In this example, I’ve used their Missing Person form as a starting point. Additions are in red.

Design assumptions:

  • The form was generated with a forms creator that exposes schema in a standard format such as XForms.
  • Data entered in the form may need to be validated using a desktop application while offline in the field, so it needs to contain its own schema.
  • The schema is serialized compactly and encoded in a high-capacity 2-D bar code on the page.
  • The bar code is duplicated across the top and bottom of the printed page for redundancy.
  • Bar coding formats that do not require high-resolution printers, color printers, or high-resolution scanners are preferable, as such equipment is rarely available in the developing world.
  • Bar coding formats that do not use color or even grayscale are preferable, as paper that has been in the field for days often returns dusty, smudged, and in otherwise poor condition.
  • The schema encoded might NOT be XForms, but something more flexible such as Turtle.
  • Individual fields on the form are also labeled with bar codes for the unique item names/IDs defined within the form schema.
  • Forms may consist of multiple pages, so the header/footer bar code should indicate which page, and of how many.
  • The data within a form is ultimately destined for an online repository designed to hold data of that type, so the header should contain a default URI for that destination repository.
  • Completed forms are to be scanned or photographed and uploaded to an online service that
    • performs OCR,
    • deserializes the schema, and
    • provides a temporary sandbox where sets of data based on that schema may be cleansed of any errors created during form completion or OCR, before being routed on to their destination.
  • Multiple options are provided at the bottom for anyone handed the form to make sure it finds its way home.
  • Individual copies of printed forms may be duplicated, or accidentally resubmitted. Individual sheets of paper cannot have guaranteed unique IDs. Identifying and consolidating duplicates is a task to be handled online in the sandbox once forms have been uploaded.
  • A Walking Paper map may be printed on the back side of a form and cross-referenced with the Talking Paper schema.
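
To make the header/footer assumptions above more concrete, here is a minimal Python sketch of what the bar code payload might carry: a compactly serialized schema, the page number and page count, and the default destination URI. Everything specific here is an assumption for illustration, not a decision made by the project: JSON stands in for whatever schema format is eventually chosen (XForms, Turtle, or something else), zlib and Base32 are just one way to get a compact text payload, and the function and key names (`encode_header`, `decode_header`, `decode_redundant`, `dest`, `pages`) are hypothetical. Rendering the resulting string as an actual 2-D bar code is left out.

```python
import base64
import json
import zlib


def encode_header(schema: dict, page: int, page_count: int, destination_uri: str) -> str:
    """Pack the schema and page metadata into a compact ASCII string.

    The string is what would be rendered as the 2-D bar code (hypothetical format).
    """
    if not 1 <= page <= page_count:
        raise ValueError("page must be between 1 and page_count")
    payload = {
        "schema": schema,         # JSON stand-in for the real schema format
        "page": page,             # which page this is ...
        "pages": page_count,      # ... and of how many
        "dest": destination_uri,  # default repository for the collected data
    }
    raw = json.dumps(payload, separators=(",", ":"), sort_keys=True).encode("utf-8")
    packed = zlib.compress(raw, 9)
    # A CRC32 prefix lets the reader reject a smudged or misread bar code.
    checksum = zlib.crc32(packed).to_bytes(4, "big")
    return base64.b32encode(checksum + packed).decode("ascii")


def decode_header(text: str) -> dict:
    """Reverse encode_header; raises ValueError if the payload is damaged."""
    blob = base64.b32decode(text)
    checksum, packed = blob[:4], blob[4:]
    if zlib.crc32(packed).to_bytes(4, "big") != checksum:
        raise ValueError("checksum mismatch")
    return json.loads(zlib.decompress(packed).decode("utf-8"))


def decode_redundant(copies: list[str]) -> dict:
    """Try each scanned copy (e.g. header, then footer) and return the first good one."""
    for text in copies:
        try:
            return decode_header(text)
        except (ValueError, zlib.error):
            continue
    raise ValueError("no readable copy of the header bar code")
```

In this sketch the reader would pass the strings recovered from the top and bottom bar codes to `decode_redundant`, so a page with one damaged copy can still be matched to its schema, page position, and destination.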