How is a complete SSML document expected to be parsed when set once at .text property of SpeechSynthesisUtterance instance? #10

Open
guest271314 opened this issue Dec 17, 2017 · 12 comments

Comments

@guest271314

commented Dec 17, 2017

According to the specification

5.2.3 SpeechSynthesisUtterance Attributes

text attribute: This attribute specifies the text to be synthesized and spoken for this utterance. This may be either plain text or a complete, well-formed SSML document.

It is not clear how an entire SSML document is expected to be parsed when it is assigned to the .text property of a single SpeechSynthesisUtterance instance.

See guest271314/SpeechSynthesisSSMLParser#1

@foolip

Member

commented Feb 20, 2018

If nobody implements the SSML bits, maybe it should just be removed from the spec, rather than trying to clarify this? @andrenatal @gshires, WDYT?

@guest271314

Author

commented Feb 20, 2018

@foolip Not sure what you mean by "nobody implements the SSML bits"?

SSML parsing is definitely utilized by *mazon *lexa and *olly, *BM *luemix, *oogle *ctions as a web service (for a fee or with an EUL agreement).

We should be able to implement the specification without using an external web service or licensing agreement.

@guest271314

Author

commented Feb 20, 2018

@foolip There is a patch available that implements SSML parsing in Chromium by way of speech-dispatcher, see https://bugs.chromium.org/p/chromium/issues/detail?id=88072. Unfortunately, I have not yet had access to a 64-bit device.

@guest271314

Author

commented Feb 20, 2018

@foolip The part of the Web Speech API specification that is difficult to navigate is determining whether SSML parsing is supported on a specific platform, see https://bugs.chromium.org/p/chromium/issues/detail?id=88072#c48. We know that neither Chromium nor Firefox actually sets the SSML parsing flag to "on" when initializing SSIP communication with speech-dispatcher, though the capability to do so is available. I cannot state the reason for this in Chromium other than this comment:

Setting SSML to true when passing speech to the Linux speech-dispatcher would help, but we couldn't land that by itself - we'd want to at a minimum try to support that on other platforms, or at least parse SSML and strip out the tags, converting them to plaintext, on platforms without SSML support.

This addresses whether the string or document is SSML in the first instance https://github.com/guest271314/SpeechSynthesisSSMLParser/blob/master/SpeechSynthesisSSMLParser.js#L89.
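
The fallback described in the quoted Chromium comment (detect SSML, then strip the tags to plain text) could look roughly like the sketch below. The function names looksLikeSsml, ssmlToPlainText and textForUtterance are hypothetical and are not taken from the Web Speech API or from SpeechSynthesisSSMLParser. The sketch uses regular expressions rather than a real XML parser, so it assumes attribute values do not contain ">" and it only decodes the five predefined XML entities.

```javascript
// Hypothetical helpers for illustration only; none of these names exist in
// the Web Speech API or in SpeechSynthesisSSMLParser.

// Cheap pre-check: skip an optional XML declaration, doctype and comments,
// then require a <speak> root element that is closed at the end of the string.
function looksLikeSsml(text) {
  if (typeof text !== "string") return false;
  const prolog = /^\s*(<\?xml[\s\S]*?\?>\s*)?(<!DOCTYPE[^>]*>\s*)?(<!--[\s\S]*?-->\s*)*/i;
  const rest = text.replace(prolog, "");
  return /^<speak[\s>]/.test(rest) && /<\/speak\s*>\s*$/.test(rest);
}

// Regex-based tag stripping. This is an assumption-laden shortcut, not a
// conforming XML parse: it breaks on ">" inside attribute values.
function ssmlToPlainText(ssml) {
  return ssml
    .replace(/<\?[\s\S]*?\?>/g, " ")    // XML declaration, processing instructions
    .replace(/<!--[\s\S]*?-->/g, " ")   // comments
    .replace(/<!DOCTYPE[^>]*>/gi, " ")  // doctype without internal subset
    .replace(/<[^>]+>/g, " ")           // element tags
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&")             // decoded last to avoid double decoding
    .replace(/\s+/g, " ")
    .trim();
}

// Returns the string to assign to utterance.text on a platform that cannot
// interpret SSML itself.
function textForUtterance(input) {
  return looksLikeSsml(input) ? ssmlToPlainText(input) : String(input);
}
```

In a browser this would be used as utterance.text = textForUtterance(input). Where DOMParser is available, parsing as "application/xml" and reading documentElement.textContent is the more robust route.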

@guest271314

Author

commented Feb 20, 2018

@foolip Further, we could

  1. Drop speech-dispatcher altogether and ship espeak-ng with browsers, with the appropriate option for SSML parsing set by default.

  2. Ideally, build the speech synthesizer from scratch using only Web Audio API.

@foolip

Member

commented Feb 20, 2018

So, does any browser engine (Chrome, EdgeHTML, Gecko or WebKit) try to parse the text property as SSML? That's what I mean by being implemented.

@guest271314

Author

commented Feb 20, 2018

@foolip I have not tried Edge or WebKit, which both use different approaches than Chromium and Firefox. Edge has SAPI; macOS does not parse SSML at all, though it has its own form of markup.

For Chromium and Firefox the bridge is speech-dispatcher, which calls a speech synthesis module to parse the text or SSML.

The issue is that neither Chromium nor Firefox implementations actually pass the appropriate flags to the SSIP socket to turn on SSML parsing for the speech synthesizer module.
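
As a rough illustration of what "passing the appropriate flags" means at the protocol level, the sketch below builds the sequence of SSIP messages a client might send to speech-dispatcher to speak an SSML document. It assumes the SSIP commands SET self CLIENT_NAME, SET self SSML_MODE on, SPEAK and QUIT as described in the speech-dispatcher SSIP documentation; the function name buildSsipMessages and the default client name are hypothetical, and the exact command syntax should be checked against that documentation before relying on it.

```javascript
// Hypothetical helper: builds the SSIP messages for one SSML utterance.
// It only produces strings; it does not open a socket.
function buildSsipMessages(ssml, clientName = "user:web-speech-sketch:main") {
  const CRLF = "\r\n";
  // Inside the SPEAK data block a line starting with "." must be escaped
  // by doubling the dot, because a lone "." terminates the block.
  const body = ssml
    .split(/\r?\n/)
    .map((line) => (line.startsWith(".") ? "." + line : line))
    .join(CRLF);
  return [
    `SET self CLIENT_NAME ${clientName}${CRLF}`,
    `SET self SSML_MODE on${CRLF}`, // the flag discussed in this comment
    `SPEAK${CRLF}`,
    `${body}${CRLF}.${CRLF}`,       // data block, terminated by a lone "."
    `QUIT${CRLF}`,
  ];
}
```

A real client would write each message to the speech-dispatcher socket and wait for the server's reply before sending the next one; that exchange is omitted here.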

@guest271314

Author

commented Feb 20, 2018

@guest271314

Author

commented Feb 21, 2018

@foolip The code below should meet the requirement of the specification using JavaScript:

<!DOCTYPE html>
<html>

<head>
  <title>Parse Text or SSML for 5.2.3 SpeechSynthesisUtterance Attributes text attribute test </title>
  <script>
        
  // https://w3c.github.io/speech-api/speechapi.html#utterance-attributes
  // "5.2.3 SpeechSynthesisUtterance Attributes text attribute This attribute specifies the text to be synthesized and spoken for this utterance. This may be either plain text or a complete, well-formed SSML document."

const text_or_ssml = [
  "hello universe"
  , `<?xml version="1.0"?><!DOCTYPE speak PUBLIC "-//W3C//DTD SYNTHESIS 1.0//EN" "http://www.w3.org/TR/speech-synthesis/synthesis.dtd"><speak version="1.1"
        xmlns="http://www.w3.org/2001/10/synthesis"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
        xml:lang="en-US">hello universe</speak>`
  , (new DOMParser()).parseFromString(`<?xml version="1.0"?><speak version="1.1"
        xmlns="http://www.w3.org/2001/10/synthesis"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
        xml:lang="en-US">hello universe</speak>`, "application/xml")
  , (() => {
    // createDocumentType(qualifiedName, publicId, systemId)
    const doc_type = document.implementation.createDocumentType("speak", "-//W3C//DTD SYNTHESIS 1.0//EN", "http://www.w3.org/TR/speech-synthesis/synthesis.dtd");
    const ssml = document.implementation.createDocument ("http://www.w3.org/2001/10/synthesis", "speak", doc_type);
    ssml.documentElement.textContent = "hello universe";
    return ssml;
  })()
];

window.speechSynthesis.cancel();

window.speechSynthesis.onvoiceschanged = async () => {

  for (let text_ssml of text_or_ssml) {

    await new Promise(resolve => {

      const utterance = new SpeechSynthesisUtterance();
    
      const parser = new DOMParser();
      
      let parsed_text_ssml;
      
      // Strings are parsed as XML. A string that fails to parse, or whose root
      // element is not <speak>, is left as-is and treated as plain text.
      if (text_ssml && typeof text_ssml === "string") {
        parsed_text_ssml = parser.parseFromString(text_ssml, "application/xml");
        const parser_error = parsed_text_ssml.querySelector("parsererror");
        if (parser_error || parsed_text_ssml.documentElement.nodeName !== "speak") {
          console.warn("not a complete, well-formed SSML document.", parser_error ? parser_error.textContent : text_ssml);
        } 
        else {
          text_ssml = parsed_text_ssml;
        }
      }

      if (text_ssml instanceof XMLDocument && text_ssml.documentElement.nodeName === "speak") {
        console.log("complete, well-formed SSML document.", text_ssml.documentElement);
        utterance.text = text_ssml.documentElement.textContent;
      } 
      else {
        console.log("plain text", text_ssml);
        utterance.text = text_ssml;
      }
      
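      utterance.onend = resolve;
      // Resolve on error as well so that one failed utterance does not stall the loop.
      utterance.onerror = resolve;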
      window.speechSynthesis.speak(utterance);
    })
  }

};

</script>
</head>

<body>
</body>

</html>

@dtturcotte

commented Apr 9, 2018

@guest271314 So this hasn't been implemented yet? I'm trying to set an attribute on the utterance that tells it that the input is SSML: https://stackoverflow.com/questions/49724626/google-speech-api-or-web-speech-api-support-for-ssml

@guest271314

Author

commented Apr 9, 2018

@dtturcotte SSML parsing is not implemented in Chromium by default, see https://bugs.chromium.org/p/chromium/issues/detail?id=88072, https://bugs.chromium.org/p/chromium/issues/detail?id=806592.

This is the beginning of a client-side implementation in JavaScript: https://github.com/guest271314/SpeechSynthesisSSMLParser. It is also possible to use Native Messaging (https://src.chromium.org/viewvc/chrome/trunk/src/chrome/common/extensions/docs/examples/api/nativeMessaging/host/) in Chromium/Chrome to communicate with espeak or espeak-ng on the host and get the result back in the app as a data URL (https://github.com/jdiamond/chrome-native-messaging), or to use a local server to pass the stdout of the command to JavaScript (https://stackoverflow.com/questions/48219981/how-to-programmatically-send-a-unix-socket-command-to-a-system-server-autospawne).
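
The host side of the native messaging or local server approach could be sketched as below. This is a minimal illustration, assuming a Node.js process on the host and an installed espeak-ng binary. The -m option (interpret SSML markup) and --stdout option (write WAV data to stdout) are espeak-ng command line options; the function names espeakArgs, wavToDataUrl and synthesize are hypothetical, and the native messaging framing or HTTP handling around them is left out.

```javascript
const { execFile } = require("node:child_process");

// Hypothetical helper: argument list for espeak-ng.
// -m       interpret SSML (or other XML) markup in the input text
// --stdout write the WAV data to stdout instead of playing it
function espeakArgs(ssml) {
  return ["-m", "--stdout", ssml];
}

// Hypothetical helper: wrap raw WAV bytes in a data URL for the page.
function wavToDataUrl(buffer) {
  return "data:audio/wav;base64," + buffer.toString("base64");
}

// Runs espeak-ng and resolves with a data URL of the synthesized audio.
// Assumes the binary is on PATH; pass another command name if it is not.
function synthesize(ssml, command = "espeak-ng") {
  return new Promise((resolve, reject) => {
    execFile(
      command,
      espeakArgs(ssml),
      { encoding: "buffer", maxBuffer: 64 * 1024 * 1024 },
      (error, stdout) => (error ? reject(error) : resolve(wavToDataUrl(stdout)))
    );
  });
}
```

On the page, the returned data URL could be assigned to the src of an audio element. Passing the SSML as a single argument through execFile avoids shell quoting problems with the markup.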

@foolip

Member

commented Aug 20, 2018

I added a test for SSML in web-platform-tests/wpt#12568 to see if it's supported anywhere.
