[text-to-speech] Spaces in text are encoded as + #635

@arthurfabre

  • Steps to reproduce:
    TSS.synthesize("Hello Bob", Voice.EN_LISA);

  • Expected behavior
    Audio for "Hello Bob"

  • Actual behavior
    Audio for "Hello+Bob"

  • JDK version: OpenJDK 1.8.0_121

  • java-sdk version: 3.7.1

With commit 7d9bbd7, to resolve #602, the text is now url-encoded before being passed off to okhttp.

Unfortunately, RequestUtils.encode calls URLEncoder.encode(), which performs form-encoding (a space becomes +) instead of %-encoding. okhttp then does proper %-encoding of that +, which results in requests for synthesizing "Hi Bob" becoming:

https://stream.watsonplatform.net/text-to-speech/api/v1/synthesize?text=Hi%2BBob&voice=en-US_LisaVoice&accept=audio/l16;%20rate%3D48000
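The double encoding can be reproduced with the JDK alone. This is only a minimal sketch: the class EncodingDemo and its methods are hypothetical names, not part of the SDK, and the second pass merely stands in for the %-encoding okhttp applies to query values rather than calling okhttp itself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EncodingDemo {
    // Form-encoding (application/x-www-form-urlencoded): a space becomes '+'.
    static String formEncode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Percent-encoding for a query value: a space becomes "%20".
    // Replacing '+' afterwards is safe because URLEncoder emits a
    // literal '+' in the input as "%2B", never as '+'.
    static String percentEncode(String text) {
        return formEncode(text).replace("+", "%20");
    }

    // Assumption: approximates what happens when already form-encoded
    // text is %-encoded a second time by the HTTP client.
    static String doubleEncode(String text) {
        return percentEncode(formEncode(text));
    }

    public static void main(String[] args) {
        System.out.println(formEncode("Hi Bob"));    // Hi+Bob
        System.out.println(percentEncode("Hi Bob")); // Hi%20Bob
        System.out.println(doubleEncode("Hi Bob"));  // Hi%2BBob
    }
}
```

The last line shows why the backend ends up synthesizing a literal +: it decodes %2B once and gets "Hi+Bob".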

Issue #602 seems to be caused by the é character being encoded as UTF-8 by okhttp (0xC3 0xA9) but decoded as ASCII by the backend, hence the BadRequestException: 'ascii' codec can't decode byte 0xc3 error.
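The suspected cause of #602 can be illustrated the same way. This is a sketch under the assumption that the backend decodes the raw bytes strictly as ASCII; AsciiDecodeDemo and its helpers are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class AsciiDecodeDemo {
    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Renders bytes as space-separated uppercase hex, e.g. "C3 A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Returns true only if every byte is valid US-ASCII.
    static boolean decodesAsAscii(byte[] bytes) {
        try {
            StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false; // a byte >= 0x80 such as 0xC3 is rejected
        }
    }

    public static void main(String[] args) {
        byte[] eAcute = utf8("\u00e9"); // the character é
        System.out.println(hex(eAcute));                  // C3 A9
        System.out.println(decodesAsAscii(eAcute));       // false
        System.out.println(decodesAsAscii(utf8("Bob")));  // true
    }
}
```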
