Out-of-order logs for large split files with timestamped and un-timestamped sections #236

Open
davemarco opened this issue Jan 20, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@davemarco
Contributor

Bug

clp builds timestamped and un-timestamped segments in parallel, but relies on their sequential order during decompression. As a result, a large split file that contains both timestamped and un-timestamped sections may come back with its logs in a different order from the original after a compress/decompress round trip.
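To illustrate the failure mode, here is a toy model. It is not clp's code: the segment IDs, the `decompress_order` helper, and the rule "emit pieces in ascending segment ID" are all assumptions made only for this sketch. It shows how a split file can come back reordered if a later piece lands in a segment with a lower ID than an earlier piece.

```python
# Toy model only (hypothetical names, not clp's implementation).
# Assumption: a split file is stored as pieces in several segments, and
# decompression emits the pieces in ascending segment ID.

def decompress_order(pieces):
    """pieces: list of (segment_id, text) in ORIGINAL file order.

    Returns the texts in the order a segment-ID-ordered reader would emit them.
    """
    # sorted() is stable, so pieces in the same segment keep their order.
    return [text for _, text in sorted(pieces, key=lambda p: p[0])]


# Two segments are built in parallel. The first piece of the file happens to
# land in segment 1, the second piece in segment 0.
pieces = [
    (1, "first piece of the file"),
    (0, "second piece of the file"),
]

for text in decompress_order(pieces):
    print(text)
```

In this model the second piece is printed before the first, which is the kind of reordering described above.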

CLP version

776fc3a

Environment

Ubuntu 22.04

Reproduction steps

  1. Copy HDFS_1 from Loghub-dataset (a log file without timestamps)
  2. Append logs with timestamps to the end of HDFS.log
    • For example, take the Hadoop logs from Loghub-dataset
    • cat Hadoop/application_1445062781478_0011/container_1445062781478_0011_01_000001.log >> HDFS.log
  3. Copy the HDFS_1 and Hadoop directories into a new directory
    • Copy HDFS_1 first, then Hadoop, so that Hadoop gets compressed first
    • The Hadoop directory is there so that a segment with timestamps is created first in sequential order
  4. Compress, then decompress, the new directory
  5. The beginning of the decompressed HDFS.log is out of order
    • head HDFS.log returns lines starting with 081111... instead of the expected 081109...
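As a convenience for step 5, here is a minimal sketch of a check that reports the first line where the decompressed file differs from the original. The helper names and the file paths in the usage note are hypothetical and are not part of clp.

```python
from itertools import zip_longest


def first_mismatch(original, restored):
    """Return (line_number, original_line, restored_line) for the first
    differing line, or None if both sequences are identical.

    A missing line (one input is shorter) is reported as None.
    """
    for n, (a, b) in enumerate(zip_longest(original, restored), start=1):
        if a != b:
            return n, a, b
    return None


def first_mismatch_in_files(original_path, restored_path):
    """Same check, reading both files line by line."""
    with open(original_path, errors="replace") as f, \
         open(restored_path, errors="replace") as g:
        return first_mismatch(f, g)
```

For example, `first_mismatch_in_files("HDFS_1/HDFS.log", "decompressed/HDFS_1/HDFS.log")` (paths are placeholders) returns `None` when the round trip preserved the order, and the first differing line otherwise.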

Note: This bug reproduces only once PR #231, which fixes issue #167, is merged. Until then, issue #167 produces similar output.

@davemarco davemarco added the bug Something isn't working label Jan 20, 2024