clp-s is truncating the json bytes while compression #289

Open
satya256 opened this issue Feb 16, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@satya256

Bug

I am trying to compress a JSON input file that contains large JSON records (some lines may be larger than 1 MB). The total file size is around 100 MB. Compression outputs the following error line:

[error] Truncated JSON (323 bytes) at end of file

CLP version

a7368cf

Environment

Ubuntu 22.04

Reproduction steps

Compress a JSON file that is more than 30 MB in size and contains large JSON lines of more than 1 MB each.
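
For anyone trying to reproduce this, a minimal Python sketch for generating such an input file follows. The function name `write_repro_file`, the `id` and `payload` field names, and the sizes are hypothetical; they are not taken from the original data, and the real logs may be structured differently.

```python
import json


def write_repro_file(path, total_bytes, record_bytes):
    """Write newline-delimited JSON until the file holds at least total_bytes.

    Each record carries a "payload" string of record_bytes characters, so every
    line is slightly longer than record_bytes. Returns the number of records.
    """
    count = 0
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        while written < total_bytes:
            # Hypothetical record shape: an id plus one large string field.
            record = {"id": count, "payload": "x" * record_bytes}
            line = json.dumps(record) + "\n"
            f.write(line)
            written += len(line.encode("utf-8"))
            count += 1
    return count
```

For example, `write_repro_file("repro.jsonl", 32 * 1024 * 1024, 2 * 1024 * 1024)` would produce a file of at least 32 MB with lines slightly over 2 MB each. Whether such a file triggers the same error has not been verified here.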

@satya256 satya256 added the bug Something isn't working label Feb 16, 2024
@satya256
Author

satya256 commented Feb 16, 2024

If I change the buffer size from 1 MB to, say, 300 MB, the above issue is not observed:

https://github.com/y-scope/clp/blob/main/components/core/src/clp_s/JsonFileIterator.hpp#L25
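
To illustrate why a read buffer smaller than a single record needs special handling, here is a minimal Python sketch of a chunked newline-delimited JSON reader that carries an incomplete record over to the next read. It is an illustration only: it is not the clp-s implementation (which is C++), and it is not a confirmed explanation of this error. `iter_ndjson` and `chunk_size` are hypothetical names.

```python
import io
import json


def iter_ndjson(stream, chunk_size=1024 * 1024):
    """Yield parsed records from a binary newline-delimited JSON stream.

    Reads chunk_size bytes at a time and keeps any incomplete trailing record
    in `pending`, so a record longer than chunk_size is assembled across reads.
    """
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is complete; the rest is carried over.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            if line.strip():
                yield json.loads(line)
    if pending.strip():
        # Final record without a trailing newline.
        yield json.loads(pending)
```

Splitting on the byte `0x0A` is safe for UTF-8 input because that byte never occurs inside a multi-byte character. A reader that instead discarded or rejected the carried-over bytes once the buffer filled would fail on any record larger than the buffer.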

@gibber9809
Contributor

Hi @satya256, thanks for the report. We're looking into this and putting together some changes to make the JSON parser a bit more robust.

I just have some clarifying questions that should help us narrow down the specific issue you've run into.

  1. Does your JSON log data contain UTF-8 characters?
  2. Is your JSON log data newline-delimited or delimited another way, and do JSON records ever contain a newline in the middle?

@bb-rajakarthik

Hi @gibber9809. To answer your questions on behalf of @satya256:

  1. Yes, all characters in the JSON records are UTF-8.
  2. Yes, our JSON data is newline-delimited, and the records do contain newlines in the middle, but those are escaped.
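
As a small illustration of what "escaped" means here, the Python sketch below serializes a record whose string value contains a newline and a non-ASCII character. The record contents are made up for the example and are not taken from the actual logs.

```python
import json

# Hypothetical record: a newline inside a string value plus a non-ASCII character.
record = {"msg": "line one\nline two", "user": "Zoë"}

# The serializer writes the newline as the two characters '\' and 'n',
# so the record still occupies exactly one physical line.
line = json.dumps(record, ensure_ascii=False)
encoded = line.encode("utf-8")

# Parsing the single line restores the original newline in the value.
decoded = json.loads(encoded)
```

Here `encoded` contains no raw newline byte, so splitting the file on newlines keeps the record intact. The byte length of `encoded` is also larger than the character length of `line`, because `ë` takes two bytes in UTF-8.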

@gibber9809
Contributor

Hey @bb-rajakarthik and @satya256, we merged #310 which significantly improves error handling and error reporting during compression. The issue you ran into should be fixed, but please let us know if you're still encountering any issues with compression.
