
Best practice document: Extracting data for TTS and a "reader mode" #69

HadrienGardeur opened this issue Jan 30, 2024 · 5 comments

HadrienGardeur commented Jan 30, 2024

Text-to-speech (TTS) is among the most popular features in reading apps and is slowly becoming a must-have feature in Web browsers as well.

But despite the popularity and usefulness of TTS, there is no best practice document providing guidance for developers on how to implement this feature.

The group working on accessibility for FXL publications has also identified that, in addition to TTS, extracting text from an FXL resource could be used to provide a "reader mode" of the current page/spread, enabling users to adjust the text and layout to their needs.

For both TTS and a reader mode, reading systems need guidance on how they should extract data from XHTML to build these alternate renderings:

  • using accessibility metadata to infer what might be possible (accessModeSufficient, readingOrder, alternativeText, longDescription)
  • walking the DOM to create an alternate tree-like structure (see the first sketch after this list)
  • rules to extract context (language for example) and semantics (HTML and ARIA) that will be relevant for these alternate renderings
  • recommendations for either breaking down longer text into multiple utterances (a paragraph broken down into sentences) or merging multiple text nodes to re-create a full utterance (a single sentence divided into multiple strings in an FXL resource) before passing them to the TTS engine (also sketched below)
  • skippability and escapability rules
  • building a reader mode view from that tree-like structure
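
To make the DOM-walking and context-extraction items above more concrete, here is a minimal sketch in TypeScript against standard DOM APIs. The names (`TextFragment`, `extractFragments`) are purely illustrative, not a proposed API, and a real implementation would also need to apply the skippability/escapability rules mentioned above:

```typescript
// Illustrative sketch only: walk the DOM of an XHTML resource and build a
// flat list of text fragments, each carrying its inherited language and the
// nearest explicit ARIA role, as raw material for TTS and a reader mode.

interface TextFragment {
  text: string;
  lang: string;         // nearest inherited lang / xml:lang value
  role: string | null;  // nearest explicit ARIA role, if any
}

function extractFragments(root: Element, inheritedLang = "en"): TextFragment[] {
  const fragments: TextFragment[] = [];

  function walk(node: Node, lang: string, role: string | null): void {
    if (node.nodeType === Node.TEXT_NODE) {
      const text = node.textContent?.trim();
      if (text) fragments.push({ text, lang, role });
      return;
    }
    if (node.nodeType !== Node.ELEMENT_NODE) return;

    const el = node as Element;
    // Skip content hidden from assistive technology.
    if (el.getAttribute("aria-hidden") === "true") return;

    // Simplified context extraction: language and role are inherited down the tree.
    const nextLang = el.getAttribute("lang") ?? el.getAttribute("xml:lang") ?? lang;
    const nextRole = el.getAttribute("role") ?? role;

    // Images contribute their alternative text as a fragment.
    if (el.tagName.toLowerCase() === "img") {
      const alt = el.getAttribute("alt");
      if (alt) fragments.push({ text: alt, lang: nextLang, role: nextRole });
      return;
    }

    for (const child of Array.from(el.childNodes)) {
      walk(child, nextLang, nextRole);
    }
  }

  walk(root, inheritedLang, null);
  return fragments;
}
```

And a companion sketch for the utterance item, reusing the `TextFragment` interface above: re-create the full text of a block from its scattered text nodes, then split it into sentence-sized utterances with `Intl.Segmenter` before handing them to the Web Speech API. The function name and choice of APIs are again assumptions, not recommendations:

```typescript
// Illustrative sketch only: merge the fragments of one block into a single
// string (FXL content often splits a sentence across spans), then segment it
// into sentences so each SpeechSynthesisUtterance stays short.

function toUtterances(fragments: TextFragment[], lang: string): SpeechSynthesisUtterance[] {
  const fullText = fragments.map(f => f.text).join(" ");

  const segmenter = new Intl.Segmenter(lang, { granularity: "sentence" });
  const utterances: SpeechSynthesisUtterance[] = [];

  for (const { segment } of segmenter.segment(fullText)) {
    const sentence = segment.trim();
    if (!sentence) continue;
    const utterance = new SpeechSynthesisUtterance(sentence);
    utterance.lang = lang;
    utterances.push(utterance);
  }
  return utterances;
}

// Usage: queue each sentence with the platform TTS engine.
// toUtterances(extractFragments(document.body), "en").forEach(u => speechSynthesis.speak(u));
```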

sueneu commented Jan 30, 2024

I agree.
Building a reader mode view from the TTS extraction would be an efficient way to give users choices for accessing the content of a book. A single source would mean consistency between audio mode and visual mode, and using the same code for reader mode and TTS would reduce redundant work in EPUB production.

A best practice document would be helpful even if TTS doesn't ultimately work out as a basis for reader mode. Improved and consistent TTS among reading systems would lower the cost of making an accessible ebook. Publishers who can't create media overlays could rely on robust TTS to make compliant EPUBs, and end users who need smaller EPUB files would benefit from an audio option without media overlays. And anecdotally, few publishers and users are satisfied with the current TTS experience.

wareid commented Feb 5, 2024

Research to do/Questions to ask:

  • How do you break things down using DOM/HTML elements (span, div), particularly non-semantic elements? (See the sketch after this list.)
  • What non-textual content is extracted? (Alt text, roles)
  • What kind of semantic structure is extracted, and how is it used?
  • Could this extracted version be used as a remediation/assessment tool?
  • How is MathML handled?
  • Skippability/Escapability/Personalization? (How do we handle elements that may need to be skipped, escaped, or included based on user settings?)
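
On the first question, one possible approach (a sketch, not a recommendation) is to treat non-semantic inline elements as contributing text to their parent block, and block-level containers as boundaries between utterances. The element list below is illustrative and would need tuning against real FXL content:

```typescript
// Illustrative sketch only: non-semantic inline wrappers (span, b, i, ...)
// are flattened into the current block; block-level containers start a new
// utterance candidate.

const BLOCK_BOUNDARIES = new Set(["div", "p", "section", "li", "h1", "h2", "h3", "figure", "blockquote"]);

function collectBlocks(root: Element): string[] {
  const blocks: string[] = [];
  let current: string[] = [];

  function flush(): void {
    const text = current.join(" ").replace(/\s+/g, " ").trim();
    if (text) blocks.push(text);
    current = [];
  }

  function walk(node: Node): void {
    if (node.nodeType === Node.TEXT_NODE) {
      if (node.textContent) current.push(node.textContent);
      return;
    }
    if (node.nodeType !== Node.ELEMENT_NODE) return;

    const el = node as Element;
    if (BLOCK_BOUNDARIES.has(el.tagName.toLowerCase())) {
      // A block-level container closes the current block and opens a new one.
      flush();
      Array.from(el.childNodes).forEach(walk);
      flush();
    } else {
      // Inline, non-semantic wrappers just contribute their text to the current block.
      Array.from(el.childNodes).forEach(walk);
    }
  }

  walk(root);
  flush();
  return blocks;
}
```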

@cookiecrook

There's also overlap with the CSS algorithm for converting content to plain text:
https://www.w3.org/TR/css-text-4/#plaintext
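
As a quick, non-normative approximation of that conversion in a reading system, `innerText` works from the rendered text (collapsing white space, skipping `display: none` content, inserting line breaks at block boundaries), so it is closer in spirit to the css-text-4 plaintext conversion than `textContent`, which concatenates raw text nodes:

```typescript
// Rough approximation only, not an implementation of the css-text-4 algorithm.
function plaintextApproximation(el: HTMLElement): string {
  return el.innerText;          // rendering-aware: close to "what the user sees"
}

function rawText(el: HTMLElement): string {
  return el.textContent ?? "";  // raw tree text: includes hidden/unrendered content
}
```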

cookiecrook commented Feb 9, 2024

And work in ARIA/AccName...
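
For the text-extraction side of that, a drastically simplified sketch of the accname precedence (aria-labelledby, then aria-label, then host-language features such as alt, then content) might look like the following. The real algorithm (https://www.w3.org/TR/accname/) covers many more cases (naming prohibitions, recursion rules, CSS-generated content, etc.):

```typescript
// Drastically simplified accessible-name sketch, for deciding what a TTS pass
// should announce for an element. Not a conformant accname implementation.
function simplifiedAccessibleName(el: Element): string {
  // 1. aria-labelledby: concatenate the text of the referenced elements.
  const labelledby = el.getAttribute("aria-labelledby");
  if (labelledby) {
    const parts = labelledby
      .split(/\s+/)
      .map(id => el.ownerDocument?.getElementById(id)?.textContent?.trim() ?? "")
      .filter(Boolean);
    if (parts.length) return parts.join(" ");
  }

  // 2. aria-label.
  const ariaLabel = el.getAttribute("aria-label")?.trim();
  if (ariaLabel) return ariaLabel;

  // 3. Host-language features: alt text on images.
  if (el.tagName.toLowerCase() === "img") {
    const alt = el.getAttribute("alt");
    if (alt !== null) return alt;
  }

  // 4. Name from content, then title as a last resort.
  const text = el.textContent?.trim();
  if (text) return text;
  return el.getAttribute("title")?.trim() ?? "";
}
```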

@HadrienGardeur (Author)

VitalSource seems to have a two-fold approach with a simplified and a detailed reading mode, as described by @rickj in the following comment: #72 (comment)

This is exactly the kind of information that we're looking for to kickstart this joint effort on TTS and reader mode.
