
Need for Authors to be able to set Social and Emotional Characteristics of TTS (text to speech) #114

Open
SuzanneTaylor opened this issue May 26, 2022 · 2 comments

SuzanneTaylor commented May 26, 2022

[This GitHub entry is from the Accessibility for Children Community Group]

Although more research is needed to specify which types of voices would be best for which applications at the content level, it is important at the technology-ecosystem level to introduce the ability to set social and emotional speech characteristics.

Situations in which setting these characteristics can be important, with rough examples of markup solutions

The markup suggestions below are intended only to illustrate the type of control that is needed; they have not been carefully crafted or edited. Affect-bias defines core categories such as Joy, Shame, Anger, Interest, Excitement, Startle, etc. These categories may help us design markup that will allow authors to specify appropriate voice tones.

"Friendliness"

  • Case where a child/person might imagine the voice sounds increasingly frustrated even when it is neutral (e.g. a GPS repeating “recalculating”) | friendliness="20%", friendliness="25%", etc., to offset the effect
  • Emergency / High Risk / Low Support | voice-type="reassuring respecting_urgency"
  • Responses to Child’s Actions in Educational or Entertainment Games | excitement="10%" joy="25%"
  • Child has not responded or answered - need to attract attention without sounding angry | interest="20%" friendliness="60%" (also a little louder in case the child walked away, but with friendliness high so as not to sound angry)
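Purely as an illustrative sketch: the attribute names and percentage syntax below are hypothetical (taken from the examples above, not from SSML or any existing specification), layered onto SSML-style markup to show how an author might combine them. Only the `<speak>` and `<prosody>` elements are real SSML; `<affect>` and its attributes are invented here.

```xml
<!-- Hypothetical markup: the <affect> element and its "friendliness" and
     "interest" attributes are illustrative only; they are not defined in
     SSML or any current W3C specification. -->
<speak>
  <!-- GPS prompt: offset a perceived-frustration effect on repetition -->
  <affect friendliness="25%">Recalculating.</affect>

  <!-- Re-engage a child who has not responded: slightly louder, but warm -->
  <affect interest="20%" friendliness="60%">
    <prosody volume="+3dB">Are you still there?</prosody>
  </affect>
</speak>
```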

Neutral

  • Test item about inferring emotional information from text alone. | voice-type="device" means no tone of voice (honor device settings); voice-type="neutral" means read in a neutral voice regardless of device settings
  • Child needs to understand they are talking to an AI | voice-type="computer" | Means: ensure this doesn’t sound human
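Again purely illustrative: a `voice-type` attribute like the one proposed above might sit alongside SSML's existing `<voice>` element. The `<voice>` element is real SSML, but the `voice-type` attribute and its values are hypothetical.

```xml
<!-- Hypothetical "voice-type" attribute on SSML's <voice> element;
     the attribute and its values are illustrative only. -->
<speak>
  <!-- Test item: read with no emotional coloring, overriding device defaults -->
  <voice voice-type="neutral">What emotion is the character feeling?</voice>

  <!-- Disclosure that the speaker is an AI: must not sound human -->
  <voice voice-type="computer">I am an automated assistant.</voice>
</speak>
```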

Additional Situations to be Addressed

Education

  • Responses to Child’s Actions in High Stakes Testing
  • Responses to Child’s Actions in Testing - Correct Feedback
  • Responses to Child’s Actions in Testing - Incorrect Feedback
  • Responses to Child’s Actions in Testing - Ungraded Feedback
  • Responses to Child’s Actions in Instruction

AI

  • Cases where AI detects the child’s mood (sentiment analysis).
  • AI detects that child is not taking a situation or warning seriously (e.g. laughing, not looking at the screen)
@AutoSponge (Contributor)

This reminds me of https://www.w3.org/TR/emotionml/. We may need to review it for hints of how to incorporate emotion into this spec.
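For context, EmotionML already expresses graded emotion categories in a way that resembles the percentage-valued attributes sketched in this issue. A fragment along the following lines shows the general shape; the vocabulary URI and attribute details here are recalled from the spec's examples and should be checked against it.

```xml
<!-- EmotionML-style fragment: category names come from a declared
     vocabulary, and "value" grades intensity on a 0..1 scale. -->
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
  <emotion>
    <category name="interested" value="0.2"/>
  </emotion>
</emotionml>
```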

@brennanyoung

I strongly approve of anticipating the need for 'affective' characteristics for synthetic voices.
In our use case (medical simulation), we use voices that can sound in pain, out of breath, anxious, or relaxed.
I agree that EmotionML is a promising place to start. Some great work in there.
