Pipeline for inducing structured workflows from raw screen-and-keystroke recordings of a user working on a task, and for computing an offloading score for the session.
The pipeline takes a directory of raw recordings as input and produces:
- A segmented, labeled trajectory of what the user was doing at each moment
- A timeline with gaps closed, pauses detected, and remaining gaps classified
- Visualizations of the timeline and activity breakdown
- Per-step output-use labels and an aggregate offloading score for the session, by matching each step to a reference workflow
The pipeline expects a records/ directory in the format produced by the computer-recorder in the workflow-induction-toolkit:
records/
├── actions.db
└── screenshots/
    ├── <timestamp>_<action>_first.jpg
    ├── <timestamp>_<action>_final.jpg
    ├── <timestamp>_<action>_before.jpg
    ├── <timestamp>_<action>_after.jpg
    └── ...
To generate recordings in this format, use the computer-recorder tool from the workflow-induction-toolkit repo linked above. It captures user activity on macOS / Windows into exactly this layout, so any recording produced there can be fed directly into this pipeline.
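
Before running the pipeline, it can help to sanity-check that a recording matches the layout above. The following is a minimal sketch, not part of the pipeline: `summarize_records` is a hypothetical helper, and it assumes that the screenshot suffix (`first`, `final`, `before`, `after`) is the text after the last underscore in the file name.

```python
from collections import Counter
from pathlib import Path

# Suffixes taken from the layout shown above.
KNOWN_SUFFIXES = {"first", "final", "before", "after"}


def summarize_records(records_dir):
    """Check the expected layout and count screenshots per suffix."""
    root = Path(records_dir)
    problems = []
    if not (root / "actions.db").is_file():
        problems.append("missing actions.db")
    shots_dir = root / "screenshots"
    if not shots_dir.is_dir():
        problems.append("missing screenshots/")

    counts = Counter()
    if shots_dir.is_dir():
        for shot in sorted(shots_dir.glob("*.jpg")):
            # Assumption: the suffix is the text after the last underscore.
            suffix = shot.stem.rsplit("_", 1)[-1]
            counts[suffix if suffix in KNOWN_SUFFIXES else "other"] += 1
    return problems, dict(counts)
```

An empty `problems` list means both `actions.db` and `screenshots/` were found; a large `other` count suggests the file names do not follow the expected pattern.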
Optional inputs the pipeline will pick up automatically if present:
- A .specstory/ directory adjacent to or inside records/, used to extract AI-assistant interaction logs.
- A logs/ directory inside records/, a fallback location for AI interaction logs.
- A parent directory containing multiple issue_* subfolders, used for data trimming across overlapping recordings and for fetching/summarizing GitHub issue context.
python run_all.py --data /path/to/records
This runs all five stages (preprocessing through visualization) in sequence. Outputs are written into sibling directories next to records/ (i.e. into the parent of the --data directory):
<parent>/
├── records/ # input
├── 0_preprocessing/
│ └── processed_trajectory.json
├── 1_segment/
│ ├── segments.json
│ └── annotated.json
├── 2_induction/
│ ├── grouped.json
│ └── merged.json
├── 3_timeline/
│ ├── timeline.json
│ ├── 2_gaps/gaps_closed.json
│ ├── 3_pauses/with_pauses.json
│ └── 4_classified/classified.json
└── 5_visualizations/
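
To see which of the outputs listed above already exist for a given recording, something like the following sketch could be used. It is illustrative only: `missing_outputs` is a hypothetical helper, the relative paths are copied from the tree above, and the contents of 5_visualizations/ are left out because they are not listed there.

```python
from pathlib import Path

# Relative output paths copied from the tree above.
STAGE_OUTPUTS = {
    "preprocessing": ["0_preprocessing/processed_trajectory.json"],
    "segmentation": ["1_segment/segments.json", "1_segment/annotated.json"],
    "induction": ["2_induction/grouped.json", "2_induction/merged.json"],
    "timeline": [
        "3_timeline/timeline.json",
        "3_timeline/2_gaps/gaps_closed.json",
        "3_timeline/3_pauses/with_pauses.json",
        "3_timeline/4_classified/classified.json",
    ],
}


def missing_outputs(data_dir):
    """Map each stage to the expected files that do not exist yet."""
    # Outputs sit next to records/, i.e. in the parent of the --data directory.
    parent = Path(data_dir).resolve().parent
    return {
        stage: [rel for rel in rel_paths if not (parent / rel).is_file()]
        for stage, rel_paths in STAGE_OUTPUTS.items()
    }
```

A stage whose list is empty has all of its listed outputs in place.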
If the pipeline fails partway through, re-run with --resume and it will detect existing intermediate outputs and skip completed stages:
python run_all.py --data /path/to/records --resume
To skip individual stages, pass the corresponding flags:
python run_all.py --data /path/to/records \
--skip-preprocessing \
--skip-segmentation \
--skip-induction \
--skip-timeline \
--skip-visualization
Other useful flags:
- --threshold <float>: fixed MSE threshold for segmentation (default: derived from --auto-percentile, default 75).
- --no-remerge: disable LLM-based re-merging of adjacent segments.
- --min-gap <seconds>: minimum gap length to keep when closing gaps in the timeline (default 5.0).
- --issues-root <path>: directory of issue_* folders for data trimming and issue summarization.
- --verbose: print detailed progress for each stage.
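
To make the relationship between --threshold and --auto-percentile concrete, here is a minimal sketch of threshold-based splitting on frame-to-frame MSE. It is not the pipeline's implementation: the function names are hypothetical, frames are assumed to be flat, equal-length sequences of pixel intensities, and the percentile uses plain linear interpolation.

```python
def mse(frame_a, frame_b):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / len(frame_a)


def percentile(values, pct):
    """Linearly interpolated percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def split_on_visual_change(frames, threshold=None, auto_percentile=75):
    """Return lists of frame indices; a new segment starts after a large change."""
    if not frames:
        return []
    diffs = [mse(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
    if threshold is None and diffs:
        # No fixed threshold given: derive one from the observed changes.
        threshold = percentile(diffs, auto_percentile)

    segments, current = [], [0]
    for i, diff in enumerate(diffs, start=1):
        if diff > threshold:
            segments.append(current)
            current = []
        current.append(i)
    segments.append(current)
    return segments
```

With a percentile-derived threshold, only the largest changes in a recording start a new segment, so the cut points adapt to how visually busy the session is. A fixed threshold applies the same cut-off to every recording.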
| Stage | Directory | What it does |
|---|---|---|
| 0. Preprocessing | 0_preprocessing/ | Optionally trim overlapping recordings; extract a processed trajectory from actions.db + screenshots; extract AI interactions from logs; optionally summarize a linked GitHub issue. |
| 1. Segmentation | 1_segment/ | Split the trajectory into segments by visual change, then annotate each segment with an activity description. |
| 2. Induction | 2_induction/ | Induce high-level activity labels (labels.txt) over the annotated segments and merge adjacent segments that share a label. |
| 3. Timeline | 3_timeline/ | Map merged segments onto a wall-clock timeline, close small gaps, insert detected pauses, and classify remaining gaps using gap_labels.txt. |
| 4. Visualization | 4_visualize/ | Render timeline and breakdown charts into 5_visualizations/. |
| 5. Scoring | 5_scoring/ | Match each non-human step to a reference workflow and apply per-step output-use labels used to compute the offloading score. |
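
Two of the operations in the table, merging adjacent segments that share a label (Induction) and closing small gaps (Timeline), can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: the `start`/`end`/`label` keys are hypothetical, and closing a gap by extending the earlier segment is one possible policy.

```python
def merge_adjacent(segments):
    """Merge consecutive segments that carry the same label."""
    merged = []
    for seg in segments:
        if merged and merged[-1]["label"] == seg["label"]:
            merged[-1]["end"] = seg["end"]
        else:
            merged.append(dict(seg))  # copy so the input is not modified
    return merged


def close_small_gaps(segments, min_gap=5.0):
    """Absorb gaps shorter than min_gap; return (segments, kept_gaps)."""
    closed, kept_gaps = [], []
    for seg in segments:
        seg = dict(seg)
        if closed:
            gap = seg["start"] - closed[-1]["end"]
            if 0 < gap < min_gap:
                # Assumption: a small gap is closed by extending the earlier segment.
                closed[-1]["end"] = seg["start"]
            elif gap >= min_gap:
                kept_gaps.append((closed[-1]["end"], seg["start"]))
        closed.append(seg)
    return closed, kept_gaps
```

The gaps returned in `kept_gaps` are the ones that would remain for pause detection and gap classification.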
Once run_all.py has produced the per-step annotations for a session, the offloading score is computed by matching each non-human step in the recording to a reference workflow and applying output-use labels.
The reference workflow is generated separately, from the same recording, by the workflow-induction-toolkit. That toolkit's induce_workflow.sh produces two artifacts you need here:
- workflow.json: structured workflow representation
- workflow.txt: flat, line-per-step natural-language version of the workflow
python 5_scoring/parse_json_to_turns.py --input-json /path/to/workflow.json
Writes /path/to/workflow_annotated.json by default. Optional flags:
python 5_scoring/parse_json_to_turns.py \
--input-json /path/to/workflow.json \
--output-json /path/to/custom_annotated.json \
--print-tree
Then label the annotated steps against the workflow:
python 5_scoring/analyze_jsons_with_workflow_context.py \
--annotated-json /path/to/workflow_annotated.json \
--workflow-txt /path/to/workflow.txt \
--time-split long \
--task-description "Build a local web app that ..." \
--task-id timer_app
Or pass the task description as a file:
python 5_scoring/analyze_jsons_with_workflow_context.py \
--annotated-json /path/to/workflow_annotated.json \
--workflow-txt /path/to/workflow.txt \
--time-split long \
--task-description-file /path/to/task.txt \
--task-id timer_app
Required flags:
- --annotated-json: the _annotated.json from parse_json_to_turns.py.
- --workflow-txt: flat workflow text from the workflow-induction-toolkit.
- --time-split {long,short}: label indicating whether this run is from the long or short time split.
- Exactly one of --task-description TEXT or --task-description-file PATH: free-text description of what the user was trying to do.
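
The "exactly one of" constraint on the task description is the kind of rule that argparse expresses with a required mutually exclusive group. The sketch below shows that pattern for the required flags listed above. It is illustrative only: the real script may be structured differently, and `build_parser` and `load_task_description` are hypothetical names.

```python
import argparse


def build_parser():
    """Illustrative parser for the required flags; not the real script."""
    parser = argparse.ArgumentParser(description="Illustrative parser")
    parser.add_argument("--annotated-json", required=True)
    parser.add_argument("--workflow-txt", required=True)
    parser.add_argument("--time-split", required=True, choices=["long", "short"])
    # Exactly one of the two task-description flags must be given.
    task = parser.add_mutually_exclusive_group(required=True)
    task.add_argument("--task-description")
    task.add_argument("--task-description-file")
    return parser


def load_task_description(args):
    """Return the task description from the inline flag or from the file."""
    if args.task_description is not None:
        return args.task_description
    with open(args.task_description_file, encoding="utf-8") as handle:
        return handle.read().strip()
```

With this setup, passing both flags or neither makes argparse exit with a usage error before any work is done.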
Optional flags:
- --task-id ID: free-form identifier recorded in the output JSON for bookkeeping (default: task).
- --input-json: raw merged JSON, used only to compute time_taken from top-level time.start / time.end.
- --output-json: explicit output path. Defaults to replacing _annotated.json with _annotated_labeled_workflow_context.json.
- --output-use-only: compute only the workflow matching + output-use labels; skip rewrite/counterfactual and cognitive annotations.
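
The default output names described above follow a simple suffix rule, which the sketch below reproduces so downstream scripts can locate the files. The helper names are hypothetical, and raising an error for an unexpected file name is an assumption of this sketch, not documented behavior of the scripts.

```python
from pathlib import Path


def default_annotated_path(workflow_json):
    """workflow.json -> workflow_annotated.json, next to the input."""
    path = Path(workflow_json)
    return path.with_name(f"{path.stem}_annotated{path.suffix}")


def default_labeled_path(annotated_json):
    """Swap the _annotated.json ending for the labeled-output ending."""
    path = Path(annotated_json)
    ending = "_annotated.json"
    if not path.name.endswith(ending):
        # Assumption: this sketch rejects names that do not match the pattern.
        raise ValueError(f"expected a name ending in {ending!r}: {path.name}")
    new_name = path.name[: -len(ending)] + "_annotated_labeled_workflow_context.json"
    return path.with_name(new_name)
```

For example, /path/to/workflow.json maps to /path/to/workflow_annotated.json, which in turn maps to /path/to/workflow_annotated_labeled_workflow_context.json.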
The output JSON contains per-step output-use labels and the matched workflow indices, from which the aggregate offloading score for the session is computed.
- Python 3.10+
- An OpenAI API key in the environment (OPENAI_API_KEY), used by the segmentation, annotation, induction, and gap-classification stages.
- workflow-induction-toolkit: the source of the recording format and the computer-recorder tool used to produce inputs for this pipeline.