When the cache disk returns corrupted data on read, the first read of a file is inconsistent #5981

Closed
YunhuiChen opened this issue Apr 15, 2025 · 5 comments · Fixed by #6103

Comments

@YunhuiChen
Contributor

YunhuiChen commented Apr 15, 2025

What happened:
1. Mount JuiceFS:
bin/mount.juicefs redis://redis.kube-system:6379/0 /jfs/pvc-64dab018-ed69-4f82-a1aa-52ff8c82a5fb-umvyyf -o enable-cap,writeback=false,cache-dir=/data/cache,cache-items=10,enable-xattr,max-readahead=100,subdir=pvc-64dab018-ed69-4f82-a1aa-52ff8c82a5fb

2. Simulate cache-disk read errors (the IOChaos "mistake" action) with Chaos Mesh:

    - name: juicefs-io-mistake-read
      templateType: IOChaos
      ioChaos:
        action: mistake
        mode: one
        selector:
          namespaces:
            - kube-system
          # A YAML mapping cannot repeat the labelSelectors key; both labels go in one map
          labelSelectors:
            app.kubernetes.io/name: juicefs-mount
            chaostest: "true"
          nodes:
            - chaos-k8s-001
        volumePath: /data/cache
        path: /data/cache/**/*
        mistake:
          filling: random
          maxOccurrences: 1
          maxLength: 1
        methods:
          - READ
        percent: 100

3. Run a data consistency check with vdbench.

4. vdbench reports data inconsistency:

04:09:58.839 localhost-0: 04:09:58.825    All 1 sectors in this key block are corrupted.
04:09:58.840 localhost-0: 04:09:58.825    All corruptions are of the same type:
04:09:58.840 localhost-0: 04:09:58.825    ===> Compression pattern miscompare.
04:09:58.840 localhost-0: 04:09:58.825    Only the FIRST sector will be reported:
04:09:58.840 localhost-0: 04:09:58.825
04:09:58.840 localhost-0: 04:09:58.825         Data Validation error for fsd=fsd1; FSD lba: 0x2df01400; Key block size: 512; relative sector in data block: 0x00
04:09:58.841 localhost-0: 04:09:58.825         File name: /data/vdb.1_3.dir/vdb_f0002.file; file block lba: 0x05500000; bad sector file lba: 0x05501400
04:09:58.841 localhost-0: 04:09:58.827 0x000   00000000 2df01400 ........ ........   00000000 2df01400 00000196 37a37a18
04:09:58.841 localhost-0: 04:09:58.827 0x010   01..0000 31647366 20202020 00000000   01030000 31647366 20202020 0000001f
04:09:58.842 localhost-0: 04:09:58.828 0x0c0*  7f3850fa 3b59e396 1abbfdd2 01374e77   7f3850fa 3b59e396 1abbfdd2 01374eec
04:09:58.846 localhost-0: 04:09:58.833 Key block lba: 0x2df24800
04:09:58.846 localhost-0: 04:09:58.833    Key block of 512 bytes has 1 512-byte sectors.
04:09:58.847 localhost-0: 04:09:58.833    Timeline:
04:09:58.847 localhost-0: 04:09:58.833    Tue Apr 15 2025 04:09:42.680 GMT Sector last written. (As found in the first corrupted sector, timestamp is taken just BEFORE the actual write).
04:09:58.847 localhost-0: 04:09:58.833    Tue Apr 15 2025 04:09:58.758 GMT Key block first found to be corrupted during a read-before-write.
04:09:58.848 localhost-0: 04:09:58.833
04:09:58.848 localhost-0: 04:09:58.833    All 1 sectors in this key block are corrupted.
04:09:58.848 localhost-0: 04:09:58.833    All corruptions are of the same type:
04:09:58.848 localhost-0: 04:09:58.834    ===> Compression pattern miscompare.
04:09:58.849 localhost-0: 04:09:58.834    Only the FIRST sector will be reported:
04:09:58.849 localhost-0: 04:09:58.834
04:09:58.850 localhost-0: 04:09:58.834         Data Validation error for fsd=fsd1; FSD lba: 0x2df24800; Key block size: 512; relative sector in data block: 0x00
04:09:58.850 localhost-0: 04:09:58.834         File name: /data/vdb.1_3.dir/vdb_f0002.file; file block lba: 0x05500000; bad sector file lba: 0x05524800
04:09:58.850 localhost-0: 04:09:58.842 0x000   00000000 2df24800 ........ ........   00000000 2df24800 00000196 37a37a18
04:09:58.851 localhost-0: 04:09:58.843 0x010   01..0000 31647366 20202020 00000000   01030000 31647366 20202020 0000001f
04:09:58.851 localhost-0: 04:09:58.844 0x140*  5fa8f949 168a7473 6b723be5 3cebdcd8   5fa84449 168a7473 6b723be5 3cebdcd8
04:09:58.859 localhost-0: 04:09:58.850 Key block lba: 0x2df4c000
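
In both reported sectors, the starred row differs between the two columns in a single 32-bit word, and within that word in a single byte (0x77 vs 0xec in row 0x0c0, 0xf9 vs 0x44 in row 0x140), which fits a fault injected with maxLength: 1. The short Go check below XORs the two words from each starred row to count the differing bytes and bits; diffBytes is a helper written for this illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// diffBytes returns how many of the four byte positions differ between two
// 32-bit words, and how many bits differ in total.
func diffBytes(a, b uint32) (bytesDiff, bitsDiff int) {
	x := a ^ b
	for i := 0; i < 4; i++ {
		if (x>>(8*i))&0xff != 0 {
			bytesDiff++
		}
	}
	return bytesDiff, bits.OnesCount32(x)
}

func main() {
	// The only differing word in row 0x0c0* and in row 0x140* of the log.
	pairs := [][2]uint32{
		{0x01374e77, 0x01374eec},
		{0x5fa8f949, 0x5fa84449},
	}
	for _, p := range pairs {
		nb, nbits := diffBytes(p[0], p[1])
		fmt.Printf("%08x vs %08x: %d byte(s), %d bit(s) differ\n", p[0], p[1], nb, nbits)
	}
}
```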

5. Running md5sum on a file shows that the first read returns different data from all subsequent reads:

root@dynamic-ce-juicefs-85545b6bfc-844z6:/data/vdb.1_3.dir# md5sum vdb_f0001.file
51fc69b3d5030f5dafc1ea00cced02fc  vdb_f0001.file
root@dynamic-ce-juicefs-85545b6bfc-844z6:/data/vdb.1_3.dir#
root@dynamic-ce-juicefs-85545b6bfc-844z6:/data/vdb.1_3.dir# md5sum vdb_f0001.file
febf97c778d7a4fe3d586dd84fa5c767  vdb_f0001.file
root@dynamic-ce-juicefs-85545b6bfc-844z6:/data/vdb.1_3.dir#
root@dynamic-ce-juicefs-85545b6bfc-844z6:/data/vdb.1_3.dir# md5sum vdb_f0001.file
febf97c778d7a4fe3d586dd84fa5c767  vdb_f0001.file
root@dynamic-ce-juicefs-85545b6bfc-844z6:/data/vdb.1_3.dir# md5sum vdb_f0001.file
febf97c778d7a4fe3d586dd84fa5c767  vdb_f0001.file
root@dynamic-ce-juicefs-85545b6bfc-844z6:/data/vdb.1_3.dir# md5sum vdb_f0001.file
febf97c778d7a4fe3d586dd84fa5c767  vdb_f0001.file

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?

Environment:

  • JuiceFS version (use juicefs --version) or Hadoop Java SDK version:
    root@chaos-k8s-001:~/chenyunhui/juicefs# ./juicefs version
    juicefs version 1.3.0-dev+2025-04-14.196db13e
  • Cloud provider or hardware configuration running JuiceFS:
  • OS (e.g cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Object storage (cloud provider and region, or self maintained):
  • Metadata engine info (version, cloud provider managed or self maintained):
  • Network connectivity (JuiceFS to metadata engine, JuiceFS to object storage):
  • Others:
YunhuiChen added the kind/bug (Something isn't working) label Apr 15, 2025
@jiefenghuang
Contributor

You can adjust the checksum mode.

jiefenghuang self-assigned this Apr 18, 2025
@davies
Contributor

davies commented Apr 24, 2025

We should change the default checksum level to extend.

davies reopened this Apr 24, 2025
davies removed the kind/bug (Something isn't working) label Apr 24, 2025
davies added this to the Release 1.3 milestone Apr 24, 2025
@YunhuiChen
Contributor Author

No data errors have occurred since the checksum level was changed to extend.

@jiefenghuang
Contributor

jiefenghuang commented Apr 24, 2025

Quoting davies: "We should change the default checksum level to extend."

This increases read amplification for random reads and hurts performance. Is it necessary? @davies
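
The read amplification mentioned here can be estimated with a simple model. Assuming a 4 MiB block (JuiceFS's default block size) checksummed in 32 KiB pieces, the Go sketch below computes, for one read, how many bytes are read from the cache disk and how many of them are checksum-verified under each level. It follows our hedged reading of the documentation: full verifies only whole-block reads, shrink verifies only pieces lying entirely inside the requested range, and extend widens the read to piece boundaries and verifies all of it. The function readCost and both constants are hypothetical.

```go
package main

import "fmt"

const (
	blockSize = 4 << 20  // JuiceFS default block size
	chunkSize = 32 << 10 // checksum granularity (assumed)
)

// readCost models one read of n bytes at offset off inside a cached block:
// bytes read from the cache disk, and bytes covered by checksum verification.
func readCost(level string, off, n int) (read, verified int) {
	switch level {
	case "none":
		return n, 0
	case "full":
		if off == 0 && n == blockSize {
			return n, n // only a whole-block read is verified
		}
		return n, 0
	case "shrink":
		start := (off + chunkSize - 1) / chunkSize * chunkSize // round up
		end := (off + n) / chunkSize * chunkSize               // round down
		if end > start {
			return n, end - start
		}
		return n, 0
	case "extend":
		start := off / chunkSize * chunkSize                     // round down
		end := (off + n + chunkSize - 1) / chunkSize * chunkSize // round up
		if end > blockSize {
			end = blockSize
		}
		return end - start, end - start
	}
	panic("unknown level " + level)
}

func main() {
	const off, n = 10000, 4 << 10 // a 4 KiB random read
	for _, level := range []string{"none", "full", "shrink", "extend"} {
		read, verified := readCost(level, off, n)
		fmt.Printf("%-6s read %6d B, verified %6d B, amplification %.1fx\n",
			level, read, verified, float64(read)/float64(n))
	}
}
```

Under this model, a 4 KiB random read at offset 10000 costs 32 KiB of disk reads with extend (8x amplification), while full and shrink read only 4 KiB but verify none of it. That trade-off is the one being weighed in this thread.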

@davies
Contributor

davies commented Apr 24, 2025

I think data integrity has higher priority


3 participants