
Need for Authors to be able to set Social and Emotional Characteristics of TTS (text to speech) #114

Open
SuzanneTaylor opened this issue May 26, 2022 · 2 comments

SuzanneTaylor commented May 26, 2022

[This GitHub entry is from the Accessibility for Children Community Group]

Although more research is needed to specify which types of voices would be best for which applications at the content level, it is important at the technology-ecosystem level to introduce the ability to set social and emotional speech characteristics.

Situations in which setting these characteristics can be important, with rough examples of markup solutions

The markup suggestions below are intended only to illustrate the type of control that is needed; they have not been carefully crafted or edited. Affect-bias defines core categories such as Joy, Shame, Anger, Interest, Excitement, Startle, etc. These categories may help us design markup that will allow authors to specify appropriate voice tones.

"Friendliness"

  • Case where a child/person might imagine the voice sounds increasingly frustrated even when it is neutral (e.g. a GPS repeating “recalculating”) | friendliness="20%", friendliness="25%", etc., to offset the effect
  • Emergency / High Risk / Low Support | voice-type="reassuring respecting_urgency"
  • Responses to Child’s Actions in Educational or Entertainment Games | excitement="10%" joy="25%"
  • Child has not responded or answered - need to attract attention without sounding angry | interest="20%" friendliness="60%" (also a little louder in case the child walked away, but with friendliness high so as not to sound angry)
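Purely as an illustrative sketch: the attribute names and percentage syntax below are hypothetical (taken from the examples above, not from SSML or any existing specification), layered onto SSML-style markup to show how an author might combine them. Only the `<speak>` and `<prosody>` elements are real SSML; `<affect>` and its attributes are invented here.

```xml
<!-- Hypothetical markup: the <affect> element and its "friendliness" and
     "interest" attributes are illustrative only; they are not defined in
     SSML or any current W3C specification. -->
<speak>
  <!-- GPS prompt: offset a perceived-frustration effect on repetition -->
  <affect friendliness="25%">Recalculating.</affect>

  <!-- Re-engage a child who has not responded: slightly louder, but warm -->
  <affect interest="20%" friendliness="60%">
    <prosody volume="+3dB">Are you still there?</prosody>
  </affect>
</speak>
```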

Neutral

  • Test item about inferring emotional information from text alone. | voice-type="device" means no tone of voice (honor device settings); voice-type="neutral" means read in a neutral voice regardless of device settings
  • Child needs to understand they are talking to an AI | voice-type="computer" | Means: ensure this doesn’t sound human
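Again purely illustrative: a `voice-type` attribute like the one proposed above might sit alongside SSML's existing `<voice>` element. The `<voice>` element is real SSML, but the `voice-type` attribute and its values are hypothetical.

```xml
<!-- Hypothetical "voice-type" attribute on SSML's <voice> element;
     the attribute and its values are illustrative only. -->
<speak>
  <!-- Test item: read with no emotional coloring, overriding device defaults -->
  <voice voice-type="neutral">What emotion is the character feeling?</voice>

  <!-- Disclosure that the speaker is an AI: must not sound human -->
  <voice voice-type="computer">I am an automated assistant.</voice>
</speak>
```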

Additional Situations to be Addressed

Education

  • Responses to Child’s Actions in High Stakes Testing
  • Responses to Child’s Actions in Testing - Correct Feedback
  • Responses to Child’s Actions in Testing - Incorrect Feedback
  • Responses to Child’s Actions in Testing - Ungraded Feedback
  • Responses to Child’s Actions in Instruction

AI

  • Cases where AI detects the child’s mood (sentiment analysis).
  • AI detects that child is not taking a situation or warning seriously (e.g. laughing, not looking at the screen)
@AutoSponge (Contributor)

This reminds me of https://www.w3.org/TR/emotionml/. We may need to review it for hints of how to incorporate emotion into this spec.
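For context, EmotionML already expresses graded emotion categories in a way that resembles the percentage-valued attributes sketched in this issue. A fragment along the following lines shows the general shape; the vocabulary URI and attribute details here are recalled from the spec's examples and should be checked against it.

```xml
<!-- EmotionML-style fragment: category names come from a declared
     vocabulary, and "value" grades intensity on a 0..1 scale. -->
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
  <emotion>
    <category name="interested" value="0.2"/>
  </emotion>
</emotionml>
```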

@brennanyoung

I strongly approve of anticipating the need for 'affective' characteristics for synthetic voices.
In our use case (medical simulation), we use voices that can sound in pain, out of breath, anxious, or relaxed.
I agree that EmotionML is a promising place to start. Some great work in there.
