
synthesizeUsingWebSocket producing bad files #956

Closed
luiscastro193 opened this issue Sep 15, 2019 · 10 comments · Fixed by #957
Comments

@luiscastro193

Text to speech using synthesizeUsingWebSocket apparently works, but the output file can't be opened by any player. The files produced using synthesize (without the WebSocket) work fine, but I can't always use the HTTP API because my texts are too long.

This is the code:

"use strict";
require('dotenv').config();
const fs = require('fs');
const path = require('path');
const TextToSpeechV1 = require('ibm-watson/text-to-speech/v1');
const textToSpeech = new TextToSpeechV1({
	iam_apikey: process.env.APIKEY,
	url: process.env.URL
});
const files = process.argv.slice(2);

function toSpeech(text, fileName) {
	let audio = textToSpeech.synthesizeUsingWebSocket({text, voice: 'es-ES_EnriqueV3Voice', accept: 'audio/ogg;codecs=opus'});
	audio.pipe(fs.createWriteStream(`${fileName}.ogg`));
}

for (let file of files)
	toSpeech(fs.readFileSync(file, 'utf8'), path.basename(file, path.extname(file)));

SDK Version
4.4.0

Additional information:

  • OS: Linux Mint 19.2
  • Which version of Node are you using? 10.16.3
@luiscastro193
Author

A simplified version of the code:

"use strict";
require('dotenv').config();
const fs = require('fs');
const TextToSpeechV1 = require('ibm-watson/text-to-speech/v1');
const textToSpeech = new TextToSpeechV1({
	iam_apikey: process.env.APIKEY,
	url: process.env.URL
});

function toSpeech(text, fileName) {
	let audio = textToSpeech.synthesizeUsingWebSocket({text, voice: 'es-ES_EnriqueV3Voice', accept: 'audio/ogg;codecs=opus'});
	audio.pipe(fs.createWriteStream(`${fileName}.ogg`));
}

toSpeech("Esto es una prueba", "test");

@MasterOdin
Contributor

MasterOdin commented Sep 15, 2019

I've seen this too. I think the reason is here:

socket.onmessage = message => {
  const chunk = message.data;
  // some messages are strings - emit those unencoded, but push them to
  // the stream as binary
  const data = typeof chunk === 'string' ? chunk : Buffer.from(chunk);
  /**
   * Emit any messages received over the wire, mainly used for debugging.
   *
   * @event SynthesizeStream#message
   * @param {Object} message - frame object received from service
   * @param {Object} data - a data attribute of the frame that's either a string or a Buffer/TypedArray
   */
  self.emit('message', message, data);
  self.push(Buffer.from(chunk));
};

where chunk is a string only at the very beginning: per https://cloud.ibm.com/docs/services/text-to-speech?topic=text-to-speech-usingWebSocket, the service initially sends a JSON string that tells you the shape of the file:

{
  "binary_streams": [
    {
      "content_type": "audio/ogg;codecs=opus"
    }
  ]
}

which the above implementation (and example in the repo) is appending to the front of the generated *.ogg file.
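The corruption is easy to demonstrate: an Ogg container must begin with the "OggS" capture pattern at byte 0, so any bytes prepended to the stream make the file unplayable. A minimal sketch (the JSON prefix mirrors the descriptor above; the Ogg bytes are a stand-in for real audio data):

```javascript
// Sketch of the corruption: Ogg requires the "OggS" capture pattern at
// byte 0, but the buggy stream writes the JSON descriptor first.
const jsonPrefix = Buffer.from(
  '{"binary_streams":[{"content_type":"audio/ogg;codecs=opus"}]}'
);
const oggPage = Buffer.from('OggS'); // start of a real Ogg page header
const broken = Buffer.concat([jsonPrefix, oggPage]);

console.log(broken.slice(0, 4).toString()); // '{"bi' - not "OggS", so players reject it
```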

Changing the check for a string to be something like:

if (typeof chunk === 'string') {
  try {
    JSON.parse(chunk);
    return; // it parsed as JSON, so it's a metadata frame - drop it
  } catch (e) {
    // not JSON - fall through and push it as binary
  }
}

seems to fix the problem, though this throws away the above JSON object as well as the timings JSON object. Should both of these be handled downstream of the synthesizeUsingWebSocket method or dealt with inside of it, possibly by emitting them on a dedicated event off the stream?

@luiscastro193 For a current workable solution, you'll have to put a filter on the piped binary stream doing something like:

const { Transform } = require('stream')

class Filter extends Transform {
  _transform(chunk, encoding, next) {
    try {
      // metadata frames parse cleanly as JSON - drop them
      JSON.parse(chunk.toString());
      return next();
    } catch (e) {
      // not JSON - binary audio, pass it through unchanged
    }
    next(null, chunk)
  }
}

const transformer = new Filter();

audio.pipe(transformer).pipe(fs.createWriteStream(`${fileName}.ogg`))

@dpopp07
Contributor

dpopp07 commented Sep 16, 2019

@luiscastro193 Thanks for the issue, and @MasterOdin, thanks for the detailed explanation. Unless there is a sensible use case for including that information in the audio file itself, I like the idea of leaving it out of the file and emitting it from a separate event so the user can capture it if they want. That seems better, given that the current approach generates unusable (or difficult-to-use) files.

Is there a reason users might want this info in their files? If not, we can go ahead with this change. Ideally, we would get it in before the major release. I can open a PR unless you've already started working on this @MasterOdin

@MasterOdin
Contributor

MasterOdin commented Sep 16, 2019

I agree that there isn't a sensible reason to embed it in the file itself (pulling the JSON structure back out without corrupting the file is a bit tricky), so we should just emit it on a specifically named event in case the user wants that info.

@dpopp07 I'll make a PR, as this gives me immense satisfaction in finally figuring out (and fixing) why ffplay would fail on audio that was too short. I'll make the PR against the v4 line though, as I don't think it requires anything that was new in the v5 release candidate, and that function remained largely the same.

@dpopp07
Contributor

dpopp07 commented Sep 16, 2019

Since this changes default behavior, my instinct is to be concerned that it has the potential to be breaking. That said, if this purely removes a pain point for users, then I can consider it a fix and a necessary change in behavior (and/or a feature, for the new event).

Sounds good though, I appreciate it!

@dpopp07 dpopp07 added the bug label Sep 16, 2019
@MasterOdin
Contributor

MasterOdin commented Sep 16, 2019

Yeah, it's technically "breaking", but it's breaking in a way that I think makes things more in line with how people would expect it to work (certainly how I was expecting it to work), especially based on prior usage in older versions and on what the examples themselves document. I'd also very much like to use this in the v4 branch immediately, as I'm not yet ready to spend time dealing with the authentication changes for v5 in my work.

@dpopp07
Contributor

dpopp07 commented Sep 16, 2019

Right, I think it is fine. In that case, I will await the PR. Thanks again

@MasterOdin
Contributor

No problem, I've got some dead time in the ramp-up for my IBM internship, and I figure no one can yell at me for working on an IBM project to fill in the day.

@dpopp07
Contributor

dpopp07 commented Sep 16, 2019

Oh, nice! I certainly won't yell at you 😄

@watson-github-bot
Member

🎉 This issue has been resolved in version 4.5.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
