Queries interface fails if response has non-utf8 symbols #405

Closed
Krock21 opened this issue Mar 6, 2024 · 4 comments
Labels: query-tracker (All related to Queries page)

Comments

@Krock21
Member

Krock21 commented Mar 6, 2024

Queries interface fails if response has non-utf8 symbols

How to reproduce:

  1. Submit the CHYT query "привет".
  2. The interface fails with the error: Error occurred while parsing YSON[1], Invalid UTF-8 string in JSON[1]

Or reproduce it with the CLI:

yt start-query chyt "привет"
queryID
yt get-query queryID --format '<encode_utf8=%false>json'
HTTP 400 Error occurred while parsing YSON Invalid UTF-8 string in JSON

This happens because CHYT converts the std::exception thrown by ClickHouse directly into a YT TError, copying the raw bytes of exception.what() into the message.

The error is then stored in a dynamic table as a binary string:

"inner_errors" = [
        {
            "attributes" = {
                "datetime" = "2024-03-08T17:27:47.033681Z";
                "fid" = 0u;
                "host" = "our-host.net";
                "pid" = 3673;
                "thread" = "ThreadPool";
                "tid" = 7905737751553949491u
            };
            "code" = 1;
            "message" = 53 79 6e 74 61 78 20 65 72 72 6f 72 3a 20 66 61 69 6c 65 64 20 61 74 20 70 6f 73 69 74 69 6f 6e 20 31 20 28 27 d0 27 29 3a 20 d0 be d1 88 d0 b8 d0 b1 d0 ba d0 b0 31 2e 20 55 6e 72 65 63 6f 67 6e 69 7a 65 64 20 74 6f 6b 65 6e 3a 20 27 d0 27
        }

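Because this byte sequence is not valid UTF-8, it can't be converted into UTF-8 JSON on the server side, and the request fails. Note the ('\xd0') fragment (bytes 28 27 d0 27): ClickHouse quotes only the first byte (0xD0) of a two-byte Cyrillic character, which leaves a truncated UTF-8 sequence in the message.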
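To see exactly where the stored message stops being valid UTF-8, one can decode the hex dump of "message" above offline and let Python locate the first offending byte. This is a minimal sketch that does not talk to YT at all; the helper name find_invalid_utf8 is hypothetical.

```python
# Offline check: is the stored error message valid UTF-8, and if not, where does it break?

# Bytes of "message", copied verbatim from the hex dump above.
MESSAGE_HEX = (
    "53 79 6e 74 61 78 20 65 72 72 6f 72 3a 20 66 61 69 6c 65 64 20 61 74 20 "
    "70 6f 73 69 74 69 6f 6e 20 31 20 28 27 d0 27 29 3a 20 d0 be d1 88 d0 b8 "
    "d0 b1 d0 ba d0 b0 31 2e 20 55 6e 72 65 63 6f 67 6e 69 7a 65 64 20 74 6f "
    "6b 65 6e 3a 20 27 d0 27"
)


def find_invalid_utf8(data: bytes):
    """Return (offset, value) of the first byte that breaks UTF-8 decoding, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


message = bytes.fromhex(MESSAGE_HEX)
print(find_invalid_utf8(message))  # (37, 208): the lone 0xD0 inside ('...')
print(find_invalid_utf8("привет".encode("utf-8")))  # None: well-formed UTF-8 is fine
# Lossy, human-readable view: the lone 0xD0 bytes become U+FFFD.
print(message.decode("utf-8", errors="replace"))
```

The Cyrillic word later in the same message ("ошибка") is well-formed; only the quoted single bytes are broken, which is why a strict UTF-8 conversion rejects the whole string.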

Krock21 changed the title from "Queries interface fails if response has utf8 symbolS" to "Queries interface fails if response has utf8 symbols" on Mar 6, 2024
@Krock21
Member Author

Krock21 commented Mar 7, 2024

As a solution, we could simply use the plain 'json' format (without <encode_utf8=%false>). It should not affect other UI internals that parse JSON, unless they handle Unicode differently.
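The trade-off between the two settings can be modelled in a few lines. This sketch rests on an assumption: that the default encode_utf8=%true maps every byte to the Unicode code point of the same value (what Python calls Latin-1 decoding), while <encode_utf8=%false> requires the string to already be valid UTF-8. It models that behaviour only; it is not the YT server code, and both function names are hypothetical.

```python
import json


def json_encode_utf8_true(data: bytes) -> str:
    # Assumed encode_utf8=%true behaviour: byte 0xNN becomes code point U+00NN,
    # so any byte string (binary or not) can be represented in JSON.
    return json.dumps(data.decode("latin-1"), ensure_ascii=False)


def json_encode_utf8_false(data: bytes) -> str:
    # Assumed encode_utf8=%false behaviour: the bytes must already be valid UTF-8;
    # otherwise conversion fails (UnicodeDecodeError here, HTTP 400 on the server).
    return json.dumps(data.decode("utf-8"), ensure_ascii=False)


binary_message = b"Unrecognized token: '\xd0'"
print(json_encode_utf8_true(binary_message))  # succeeds, 0xD0 shows up as U+00D0
try:
    json_encode_utf8_false(binary_message)
except UnicodeDecodeError as err:
    print("encode_utf8=%false fails:", err.reason)

# The catch: valid Cyrillic text arrives one code point per byte, not as "привет",
# so a consumer has to map it back before displaying it.
print(json_encode_utf8_true("привет".encode("utf-8")))
```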

ma-efremoff added the query-tracker (All related to Queries page) label on Mar 7, 2024
vitshev self-assigned this on Mar 11, 2024
@Krock21
Member Author

Krock21 commented Mar 12, 2024

I researched this a bit, and it seems the current way of fetching data (format <encode_utf8=%false>json) is correct.

The problem is that the binary YSON string written to the table is not a valid UTF-8 sequence, so the YSON -> JSON conversion fails.

In this case, I think query-tracker/CHYT should ensure that valid UTF-8 is written. I will investigate further and report back.
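One way to make sure valid UTF-8 is written would be to sanitize every message in the error tree before storing it, replacing invalid sequences with U+FFFD. The sketch below is hypothetical: it models a TError as a plain dict with "message", "attributes" and "inner_errors" keys (mirroring the dump above), the outer "Query failed" message is made up, and none of it is CHYT or query-tracker code. The replacement is lossy: the original bytes cannot be recovered afterwards.

```python
from typing import Any, Dict


def sanitize_error(error: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a TError-like dict whose messages are all valid UTF-8 text.

    Hypothetical model: "message" may hold raw bytes (e.g. from std::exception::what()),
    and "inner_errors" holds nested errors of the same shape.
    """
    message = error.get("message", "")
    if isinstance(message, bytes):
        # Invalid sequences (like a lone 0xD0 lead byte) become U+FFFD.
        message = message.decode("utf-8", errors="replace")
    return {
        **error,
        "message": message,
        "inner_errors": [sanitize_error(inner) for inner in error.get("inner_errors", [])],
    }


raw = {
    "code": 1,
    "message": b"Unrecognized token: '\xd0'",
    "attributes": {"host": "our-host.net"},
}
wrapped = {"code": 1, "message": "Query failed", "inner_errors": [raw]}  # made-up outer error

# The lone 0xD0 in the inner message is replaced by U+FFFD.
print(sanitize_error(wrapped)["inner_errors"][0]["message"])
```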

Krock21 changed the title from "Queries interface fails if response has utf8 symbols" to "Queries interface fails if response has non-utf8 symbols" on Mar 13, 2024
@Krock21
Member Author

Krock21 commented Mar 18, 2024

After some discussion, we decided that binary exceptions are acceptable and the UI should learn to handle them.

The UI already handles binary data on the Operation page: it calls get_operation with format=json and decodes the output into valid UTF-8, or shows it in binary form.
The UI also handles binary data from tables by using the web_json format. However, web_json isn't available for non-streaming APIs, so it is not a solution here.

The suggested solution for the UI is to never use <encode_utf8=%false>json, since it fails on non-UTF-8 data. Instead, the UI should use a format that tolerates binary data and perform the additional decoding itself.
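The additional decoding step could look roughly like this, assuming the response is fetched with plain json (encode_utf8=%true), where each byte arrives as the code point of the same value: map the code points back to bytes, try UTF-8, and fall back to a hex view like the one shown for "message" earlier. The real UI is not written in Python, and the helper name decode_yt_string and the example payloads are hypothetical; this only illustrates the logic.

```python
import json


def decode_yt_string(value: str) -> str:
    """Turn a string from encode_utf8=%true JSON back into readable text.

    Assumes every code point is <= 0xFF (one code point per original byte);
    raises UnicodeEncodeError otherwise. Returns the UTF-8 text when the bytes
    are valid UTF-8, otherwise a space-separated hex dump.
    """
    raw = value.encode("latin-1")  # code point U+00NN -> byte 0xNN
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Binary data: show it the same way as the dump in the issue ("53 79 6e ...").
        return " ".join(f"{byte:02x}" for byte in raw)


# Two example responses as they might arrive from the server (hypothetical payloads).
text_reply = json.loads('"\\u00d0\\u00bf\\u00d1\\u0080\\u00d0\\u00b8\\u00d0\\u00b2\\u00d0\\u00b5\\u00d1\\u0082"')
binary_reply = json.loads('"token: \'\\u00d0\'"')

print(decode_yt_string(text_reply))    # привет
print(decode_yt_string(binary_reply))  # 74 6f 6b 65 6e 3a 20 27 d0 27
```

Falling back to hex keeps the exact bytes visible, whereas a U+FFFD replacement would hide which byte was broken.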

@vrozaev
Collaborator

vrozaev commented Jun 28, 2024

Fixed in #533

vrozaev closed this as completed on Jun 28, 2024

4 participants