-
Notifications
You must be signed in to change notification settings - Fork 668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
synthesizeUsingWebSocket producing bad files #956
Comments
A simplified version of the code: "use strict";
require('dotenv').config();
const fs = require('fs');
const TextToSpeechV1 = require('ibm-watson/text-to-speech/v1');
const textToSpeech = new TextToSpeechV1({
iam_apikey: process.env.APIKEY,
url: process.env.URL
});
function toSpeech(text, fileName) {
let audio = textToSpeech.synthesizeUsingWebSocket({text, voice: 'es-ES_EnriqueV3Voice', accept: 'audio/ogg;codecs=opus'});
audio.pipe(fs.createWriteStream(`${fileName}.ogg`));
}
toSpeech("Esto es una prueba", "test"); |
I've seen this to. I think the reason is here: node-sdk/lib/synthesize-stream.ts Lines 127 to 141 in 86506e3
where {
'binary_streams': [
{
content_type: 'audio/ogg;codecs=opus'
}
]
} which the above implementation (and example in the repo) is appending to the front of the generated Changing the check for a string to be something like: if (typeof chunk === 'string') {
try {
chunk = JSON.parse(chunk);
return
}
catch (e) {
}
} seems to fix the problem, though this throws away the above JSON object as well as the @luiscastro193 For a current workable solution, you'll have to put a filter on the piped binary stream doing something like: const { Transform } = require('stream')
class Filter extends Transform {
constructor() {
super({
readableObjectMode: true,
writableObjectMode: true
})
}
_transform(chunk, encoding, next) {
try {
JSON.parse(chunk.toString());
next();
}
catch (e) {}
return next(null, chunk)
}
has(value) {
return !!value
}
}
const transformer = new Filter();
audio.pipe(transformer).pipe(fs.createWriteStream(`${fileName}.ogg`)) |
@luiscastro193 Thanks for the issue and @MasterOdin thanks for the detailed explanation. Unless there is a sensible use case for including that information in the audio file itself, I like the idea of not including it in the file and emitting it from a different event so the user can capture it if they want. That seems better if the current approach is currently generating unusable (or difficult to use) files. Is there a reason users might want this info in their files? If not, we can go ahead with this change. Ideally, we would get it in before the major release. I can open a PR unless you've already started working on this @MasterOdin |
I would agree that I don't think there is a sensible reason to embed it into the file itself (as pulling out the JSON structure without corrupting the file is a bit tricky) , and just emit on a specifically named events in-case the user wants said info. @dpopp07 I'll make a PR as this does give me immense satisfaction in finally figuring out/fixing why ffplay would fail on audio that was too short. I'll make the PR against the v4 line though as I don't think it requires anything that was explicitly new in the v5 release candidate and that that function remained largely the same. |
Since this changes default behavior, my instinct is to be concerned this has the potential to be breaking. That said, if this is purely removing a pain for users than I could consider it a fix and a necessary change in behavior (and/or feature for the new event). Sounds good though, I appreciate it! |
Yeah, it's technically "breaking", but it's breaking in that I think it makes things more inline with how people would expect it to work (certainly how I was expecting it to work), especially based on prior usage of things in older versions and the examples themselves document it. I'd also very much like to use this in the v4 branch immediately as I'm not yet ready to spend time in dealing with the authentication changes for v5 for my work. |
Right, I think it is fine. In that case, I will await the PR. Thanks again |
No problem, I've got some amount of dead time in my ramp-up for my IBM internship and I figure no one can yell at me for working on a IBM project to fill in the day. |
Oh, nice! I certainly won't yell at you 😄 |
🎉 This issue has been resolved in version 4.5.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
Text to speech using synthesizeUsingWebSocket apparently works but the output file can't be opened by any player. The files produced using synthesize (without web socket) work fine but I can't always use the http API because texts are too long.
This is the code:
SDK Version
4.4.0
Additional information:
Node
are you using? 10.16.3The text was updated successfully, but these errors were encountered: