How to prevent JSON unicode escaping #809

Closed
NCrashed opened this Issue Sep 5, 2014 · 5 comments

Contributor

NCrashed commented Sep 5, 2014

Hi, I've found that Unicode strings are always escaped by the toPrettyString method (and by any other JSON-to-string conversion). For example:

{
    "id": null,
    "result": [
        {
            "test_field": [
                "\u041A\u0443\u0441\u043E\u043A \u0442\u0435\u043A\u0441\u0442\u0430"
            ]
        }
    ],
    "jsonrpc": "2.0"
}

Is there any way to control this behavior? It makes responses hard to analyze for huge JSON documents (mostly for frontend developers).

Member

s-ludwig commented Sep 5, 2014

In fact, I disabled Unicode output escaping for the std_data_json candidate. @etcimon, do you think that doing so would do any harm in practice, based on what you have experienced so far? I would at least make it the default to pass Unicode as-is. For the std.data.json module, this could be made a configuration option, but in the case of vibe.d I'd rather keep the interface fixed before it gets deprecated and removed.

Contributor

etcimon commented Sep 5, 2014

Two ways to do it would be an ensure_ascii option like in Python, or a special UTF wrapper function around the output string for when it's necessary to have it "pretty". I can't really know whether turning forced ASCII off by default would cause problems, because that depends on the remote peer's engine, but playing it safe would mean leaving it on as a document-specific setting. I've never seen a website using plain UTF-8 characters but I can see why it would be preferable for debugging.
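For reference, this is roughly what the Python option mentioned above looks like. It is only a sketch of Python's standard `json` module, not of vibe.d; the `payload` value is borrowed from the example at the top of the issue.

```python
import json

# Sample payload reusing the string from the original report.
payload = {"test_field": ["Кусок текста"]}

# Default behaviour: ensure_ascii=True, so every non-ASCII character
# is written as a \uXXXX escape.
escaped = json.dumps(payload)

# With ensure_ascii=False the characters are passed through as-is.
plain = json.dumps(payload, ensure_ascii=False)

print(escaped)
print(plain)

# Both forms decode to the same value, so a conforming parser
# should not care which one it receives.
assert json.loads(escaped) == json.loads(plain) == payload
```

Either output is valid JSON; the flag only changes how readable the serialized text is.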

Member

s-ludwig commented Sep 5, 2014

> I've never seen a website using plain UTF-8 characters but I can see why it would be preferable for debugging.

I don't see why it would ever not be preferable. After all, JSON is defined to be UTF encoded, so this just seems to be done for some kind of legacy support in existing libraries. I had hoped that the time of 7-bit bytes was over, though.

Since this change was introduced recently without any preceding complaints, I'm leaning towards just disabling it for vibe.d and keeping it as an option in std.data.json.

Contributor

etcimon commented Sep 5, 2014

> Since this change was introduced recently without any preceding complaints, I'm leaning towards just disabling it for vibe.d and to keep it as an option in std.data.json.

So it would be preferable to disable it by default if there is a simple legacy wrapper that outputs legacy ASCII from a UTF JSON string. After giving it some thought, I don't see any issue with sending UTF strings by default, whereas being able to decode UTF-8/UTF-16 from ASCII strings was obviously a necessity.
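A legacy wrapper of that kind could look something like the following Python sketch. The function name `to_ascii_json` is hypothetical and this is not the vibe.d implementation. It relies on the assumption that, in valid JSON text, non-ASCII characters can only occur inside string literals, so escaping them everywhere in the serialized text is safe.

```python
import json


def to_ascii_json(text: str) -> str:
    """Hypothetical helper: escape every non-ASCII character in
    already-serialized JSON text as \\uXXXX."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII is copied through unchanged.
            out.append(ch)
        elif cp <= 0xFFFF:
            # Basic Multilingual Plane: a single \uXXXX escape.
            out.append(f"\\u{cp:04X}")
        else:
            # Outside the BMP: JSON requires a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append(f"\\u{high:04X}\\u{low:04X}")
    return "".join(out)


utf_json = json.dumps({"test_field": ["Кусок текста"]}, ensure_ascii=False)
ascii_json = to_ascii_json(utf_json)
print(ascii_json)

# The wrapped output is pure ASCII and decodes to the same value.
assert json.loads(ascii_json) == json.loads(utf_json)
```

With a wrapper like this available, plain UTF output can be the default while peers that need 7-bit output still have a way to get it.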

@s-ludwig s-ludwig closed this in 26e1fba Sep 5, 2014

Contributor

NCrashed commented Sep 5, 2014

Thanks a lot for the fast response!

> I've never seen a website using plain UTF-8 characters but I can see why it would be preferable for debugging.

Although any browser understands escaped characters, most Russian websites are served as plain UTF-8 (an escaped character is 6 bytes long, while the plain one is only 2 bytes long).
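The size difference is easy to check. The sketch below uses Python's standard `json` module purely as a measuring tool (it is not vibe.d code) and reuses the string from the original report.

```python
import json

text = "Кусок текста"  # 11 Cyrillic letters and one space

# Serialize the same string both ways and measure the UTF-8 byte length.
plain = json.dumps(text, ensure_ascii=False).encode("utf-8")
escaped = json.dumps(text, ensure_ascii=True).encode("utf-8")

# Per character: 2 bytes as UTF-8 versus 6 bytes as a \uXXXX escape.
plain_char_size = len("К".encode("utf-8"))
escaped_char_size = len(json.dumps("К")) - 2  # drop the two quotes

print(len(plain), len(escaped))              # 25 69
print(plain_char_size, escaped_char_size)    # 2 6
```

For Cyrillic-heavy payloads the escaped form is close to three times larger before any compression is applied.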
