
Best practice document: Extracting data for TTS and a "reader mode" #69

HadrienGardeur opened this issue Jan 30, 2024 · 5 comments

HadrienGardeur commented Jan 30, 2024

Text-to-speech (TTS) is among the most popular features in reading apps and is slowly becoming a must-have feature in Web browsers as well.

But despite the popularity and usefulness of TTS, there is no best practice document providing guidance for developers on how to implement this feature.

The group working on accessibility for FXL publications has also identified that, in addition to TTS, extracting text from an FXL resource could be used to provide a "reader mode" of the current page/spread, enabling users to adjust the text and layout to their needs.

For both TTS and a reader mode, reading systems need guidance on how they should extract data from XHTML to build these alternate renderings:

  • using accessibility metadata to infer what might be possible (accessModeSufficient, readingOrder, alternativeText, longDescription)
  • walking the DOM to create an alternate tree-like structure (see the first sketch after this list)
  • rules to extract context (language for example) and semantics (HTML and ARIA) that will be relevant for these alternate renderings
  • recommendations for either breaking down longer text into multiple utterances (a paragraph broken down into sentences) or merging multiple text nodes to re-create a full utterance (a single sentence divided into multiple strings in an FXL resource) before passing them to the TTS engine (also sketched below)
  • skippability and escapability rules
  • building a reader mode view from that tree-like structure
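
To make the DOM-walking and context-extraction items above more concrete, here is a minimal sketch in TypeScript against standard DOM APIs. The names (`TextFragment`, `extractFragments`) are purely illustrative, not a proposed API, and a real implementation would also need to apply the skippability/escapability rules mentioned above:

```typescript
// Illustrative sketch only: walk the DOM of an XHTML resource and build a
// flat list of text fragments, each carrying its inherited language and the
// nearest explicit ARIA role, as raw material for TTS and a reader mode.

interface TextFragment {
  text: string;
  lang: string;         // nearest inherited lang / xml:lang value
  role: string | null;  // nearest explicit ARIA role, if any
}

function extractFragments(root: Element, inheritedLang = "en"): TextFragment[] {
  const fragments: TextFragment[] = [];

  function walk(node: Node, lang: string, role: string | null): void {
    if (node.nodeType === Node.TEXT_NODE) {
      const text = node.textContent?.trim();
      if (text) fragments.push({ text, lang, role });
      return;
    }
    if (node.nodeType !== Node.ELEMENT_NODE) return;

    const el = node as Element;
    // Skip content hidden from assistive technology.
    if (el.getAttribute("aria-hidden") === "true") return;

    // Simplified context extraction: language and role are inherited down the tree.
    const nextLang = el.getAttribute("lang") ?? el.getAttribute("xml:lang") ?? lang;
    const nextRole = el.getAttribute("role") ?? role;

    // Images contribute their alternative text as a fragment.
    if (el.tagName.toLowerCase() === "img") {
      const alt = el.getAttribute("alt");
      if (alt) fragments.push({ text: alt, lang: nextLang, role: nextRole });
      return;
    }

    for (const child of Array.from(el.childNodes)) {
      walk(child, nextLang, nextRole);
    }
  }

  walk(root, inheritedLang, null);
  return fragments;
}
```

And a companion sketch for the utterance item, reusing the `TextFragment` interface above: re-create the full text of a block from its scattered text nodes, then split it into sentence-sized utterances with `Intl.Segmenter` before handing them to the Web Speech API. The function name and choice of APIs are again assumptions, not recommendations:

```typescript
// Illustrative sketch only: merge the fragments of one block into a single
// string (FXL content often splits a sentence across spans), then segment it
// into sentences so each SpeechSynthesisUtterance stays short.

function toUtterances(fragments: TextFragment[], lang: string): SpeechSynthesisUtterance[] {
  const fullText = fragments.map(f => f.text).join(" ");

  const segmenter = new Intl.Segmenter(lang, { granularity: "sentence" });
  const utterances: SpeechSynthesisUtterance[] = [];

  for (const { segment } of segmenter.segment(fullText)) {
    const sentence = segment.trim();
    if (!sentence) continue;
    const utterance = new SpeechSynthesisUtterance(sentence);
    utterance.lang = lang;
    utterances.push(utterance);
  }
  return utterances;
}

// Usage: queue each sentence with the platform TTS engine.
// toUtterances(extractFragments(document.body), "en").forEach(u => speechSynthesis.speak(u));
```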

sueneu commented Jan 30, 2024

I agree.
Building a reader mode view from the TTS extraction would be an efficient way to give users choices for accessing the content of a book. A single source would mean consistency between audio mode and visual mode, and using the same code for reader mode and TTS would reduce redundant work in EPUB production.

A best practice document would be helpful even if TTS doesn't ultimately work out as a basis for reader mode. Improved and consistent TTS among reading systems would lower the cost of making an accessible ebook. Publishers who can't create media overlays could rely on robust TTS to make compliant EPUBs, and end users who need smaller EPUB files would benefit from an audio option without media overlays. And anecdotally, few publishers and users are satisfied with the current TTS experience.

wareid commented Feb 5, 2024

Research to do/Questions to ask:

  • How do you break things down using DOM/HTML elements (span, div), particularly non-semantic elements? (See the sketch after this list.)
  • What non-textual content is extracted? (Alt text, roles)
  • What kind of semantic structure is extracted, and how is it used?
  • Could this extracted version be used as a remediation/assessment tool?
  • How is MathML handled?
  • Skippability/Escapability/Personalization? (How do we handle elements that may need to be skipped, escaped, or included based on user settings?)
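
On the first question, one possible approach (a sketch, not a recommendation) is to treat non-semantic inline elements as contributing text to their parent block, and block-level containers as boundaries between utterances. The element list below is illustrative and would need tuning against real FXL content:

```typescript
// Illustrative sketch only: non-semantic inline wrappers (span, b, i, ...)
// are flattened into the current block; block-level containers start a new
// utterance candidate.

const BLOCK_BOUNDARIES = new Set(["div", "p", "section", "li", "h1", "h2", "h3", "figure", "blockquote"]);

function collectBlocks(root: Element): string[] {
  const blocks: string[] = [];
  let current: string[] = [];

  function flush(): void {
    const text = current.join(" ").replace(/\s+/g, " ").trim();
    if (text) blocks.push(text);
    current = [];
  }

  function walk(node: Node): void {
    if (node.nodeType === Node.TEXT_NODE) {
      if (node.textContent) current.push(node.textContent);
      return;
    }
    if (node.nodeType !== Node.ELEMENT_NODE) return;

    const el = node as Element;
    if (BLOCK_BOUNDARIES.has(el.tagName.toLowerCase())) {
      // A block-level container closes the current block and opens a new one.
      flush();
      Array.from(el.childNodes).forEach(walk);
      flush();
    } else {
      // Inline, non-semantic wrappers just contribute their text to the current block.
      Array.from(el.childNodes).forEach(walk);
    }
  }

  walk(root);
  flush();
  return blocks;
}
```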

@cookiecrook

There's also overlap with the CSS algorithm for converting content to plain text:
https://www.w3.org/TR/css-text-4/#plaintext
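
As a quick, non-normative approximation of that conversion in a reading system, `innerText` works from the rendered text (collapsing white space, skipping `display: none` content, inserting line breaks at block boundaries), so it is closer in spirit to the css-text-4 plaintext conversion than `textContent`, which concatenates raw text nodes:

```typescript
// Rough approximation only, not an implementation of the css-text-4 algorithm.
function plaintextApproximation(el: HTMLElement): string {
  return el.innerText;          // rendering-aware: close to "what the user sees"
}

function rawText(el: HTMLElement): string {
  return el.textContent ?? "";  // raw tree text: includes hidden/unrendered content
}
```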

cookiecrook commented Feb 9, 2024

And work in ARIA/AccName...
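
For the text-extraction side of that, a drastically simplified sketch of the accname precedence (aria-labelledby, then aria-label, then host-language features such as alt, then content) might look like the following. The real algorithm (https://www.w3.org/TR/accname/) covers many more cases (naming prohibitions, recursion rules, CSS-generated content, etc.):

```typescript
// Drastically simplified accessible-name sketch, for deciding what a TTS pass
// should announce for an element. Not a conformant accname implementation.
function simplifiedAccessibleName(el: Element): string {
  // 1. aria-labelledby: concatenate the text of the referenced elements.
  const labelledby = el.getAttribute("aria-labelledby");
  if (labelledby) {
    const parts = labelledby
      .split(/\s+/)
      .map(id => el.ownerDocument?.getElementById(id)?.textContent?.trim() ?? "")
      .filter(Boolean);
    if (parts.length) return parts.join(" ");
  }

  // 2. aria-label.
  const ariaLabel = el.getAttribute("aria-label")?.trim();
  if (ariaLabel) return ariaLabel;

  // 3. Host-language features: alt text on images.
  if (el.tagName.toLowerCase() === "img") {
    const alt = el.getAttribute("alt");
    if (alt !== null) return alt;
  }

  // 4. Name from content, then title as a last resort.
  const text = el.textContent?.trim();
  if (text) return text;
  return el.getAttribute("title")?.trim() ?? "";
}
```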

@HadrienGardeur (Author)

VitalSource seems to have a two-fold approach with a simplified and a detailed reading mode, as described by @rickj in the following comment: #72 (comment)

This is exactly the kind of information that we're looking for to kickstart this joint effort on TTS and reader mode.
