ZFS pool corrupt, core dump on running zfs list: "internal error: Invalid exchange", can no longer zpool import. #6805

Closed
AaronFriel opened this issue Nov 1, 2017 · 7 comments
Labels
Bot: Not Stale Override for the stale bot

Comments

AaronFriel commented Nov 1, 2017

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 17.10
Linux Kernel 4.13.0-16
Architecture x86_64
ZFS Version 0.6.5.11-1ubuntu3
SPL Version 0.6.5.11-1ubuntu1

Describe the problem you're observing

My continuous integration service, which uses Docker in Docker, stopped working. The Docker logs showed the cause:

➜ docker logs e038e8d38a06
... (miscellaneous)
time="2017-11-01T12:19:54.982955029Z" level=info msg="Setting the storage driver from the $DOCKER_DRIVER environment variable (zfs)"
Error starting daemon: error initializing graphdriver: Cannot find root filesystem main/docker: signal: aborted (core dumped): "/usr/sbin/zfs zfs list -rHp -t filesystem -o name,origin,used,available,mountpoint,compression,type,volsize,quota,written,logicalused,usedbydataset main/docker" => internal error: No error information

So I tried running the command myself:

➜ sudo zfs list -rHp -t filesystem -o name,origin,used,available,mountpoint,compression,type,volsize,quota,written,logicalused,usedbydataset main/docker
internal error: Invalid exchange
[1]    89325 abort      sudo zfs list -rHp -t filesystem -o  main/docker

Upon rebooting, the ZFS pool was not imported, and attempts to import were met with:

➜ sudo zpool import -aN
cannot import 'main': one or more devices is currently unavailable

However, all of the devices are available: there is a ZIL log device (at /dev/vg-main/log) and a data volume (at /dev/vg-main/capacity). Both show as ONLINE here:

➜ sudo zpool import -F -n -m
   pool: main
     id: 8637001018325559436
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        main                 ONLINE
          vg--main-capacity  ONLINE
        logs
          log                ONLINE

And I cannot seem to force the pool to import, because of this invalid exchange error:

➜ sudo zpool import -a -f -m -F
internal error: Invalid exchange
[1]    30818 abort      sudo zpool import -a -f -m -F

Describe how to reproduce the problem

No idea. Needed to reboot system to see if that would clear it up. It did not.

Include any warning/errors/backtraces from the system logs

No warnings or errors other than the zpool import one above, and the sudden inability to run zfs list commands.

AaronFriel changed the title from 'Error running zfs list, core dump: "internal error: Invalid exchange"' to 'ZFS pool corrupt, core dump on running zfs list: "internal error: Invalid exchange", can no longer zpool import.' on Nov 1, 2017
@AaronFriel
Author

Further searches yielded a number of other "Invalid exchange" errors, e.g.: #4360 #6597.

To clarify: I am not using ZFS encryption. The ZFS pool is set up on two logical volumes on redundant storage (the cloud provider's triply replicated disks). The corruption occurred while the server was running (see the Docker errors above).

@behlendorf
Contributor

@AaronFriel it appears that somehow the pool has been corrupted. The "Invalid exchange" error you're seeing during zpool import is EBADE, which is what ZFS uses internally to report a checksum error. Normally these errors get automatically repaired when the pool is configured with redundancy, but in this case that's not possible, so the error bubbles up and produces this mysterious error message.
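
The mapping between the error name and the message can be checked outside ZFS. This is a minimal sketch that assumes a Linux system with python3 installed; the helper name ebade_message is made up for illustration. It asks the C library for the number and text of EBADE, the errno that ZFS reuses for checksum failures as described above.

```shell
# Hypothetical helper: print the errno number and the C library's message
# for EBADE. Assumes Linux and python3; EBADE is not defined on every OS.
ebade_message() {
    python3 -c 'import errno, os; print(errno.EBADE, os.strerror(errno.EBADE))'
}

ebade_message
```

On Linux the text printed should match the "Invalid exchange" string that zfs and zpool show before aborting.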

Depending on how extensive the damage is to the pool you can try using -Fn or -FXn to attempt to rewind the pool to an importable state.
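
As a rough sketch of how those rewind attempts might be ordered, the helper below only prints the commands so they can be reviewed before anything is run as root. The function name print_rewind_plan and the final read-only step are assumptions added for illustration, not something suggested in this thread; the pool name main is taken from the report above. The -n flag makes the first two steps dry runs, and -X is documented as potentially hazardous when used without -n.

```shell
# Hypothetical helper: print (do not execute) an escalating sequence of
# zpool import attempts for the pool named in $1.
print_rewind_plan() {
    pool="$1"
    # 1. Dry run: check whether discarding the last few transactions
    #    would make the pool importable. Nothing is modified.
    echo "zpool import -F -n $pool"
    # 2. Dry run of the extreme rewind, which searches further back for
    #    a usable transaction group. Still nothing is modified.
    echo "zpool import -F -X -n $pool"
    # 3. Assumption: if a dry run reports success, import read-only with
    #    rewind so data can be copied off before any further repair.
    echo "zpool import -o readonly=on -F $pool"
}

print_rewind_plan main
```

Printing the plan instead of executing it keeps the sketch safe to run on a machine with a damaged pool; each line can then be run by hand once the previous step's output has been read.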

@bazzawill

Sorry to revive an old thread, but this was the first result I came across when searching Google for "zfs invalid exchange".
I have tried -Fn and -FXn; both complete with no output, but I am still unable to import my pool.
I am hoping someone is still monitoring this and can point me in the right direction.
I am on Arch Linux.

@stale

stale bot commented Aug 24, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Aug 24, 2020
@nh2

nh2 commented Oct 3, 2020

Happens to me too:

I have ZFS on a single SSD on my laptop (NixOS 20.03, Linux 5.4.61) with ZFS encryption enabled (no dedup or compression). The laptop ran out of power, and afterwards zpool status -v shows multiple permanent errors. I suspect this is because SSDs without power loss protection capacitors can corrupt data on power loss. I am migrating to an enterprise SSD that has power loss protection, and I get this when making a backup (to be transferred to the new SSD):

$ sudo zfs send -R --raw rpool@before-ssd-upgrade | nc othermachine 1235
internal error: Invalid exchange

I had run echo 1 > /sys/module/zfs/parameters/zfs_send_corrupt_data beforehand to allow creating the backup despite some corrupted files, and to avoid the warning "cannot send 'rpool/home@before-ssd-upgrade': Input/output error", which would otherwise stop the zfs send.

So right now, it seems impossible to back up the file system using zfs send.

@stale stale bot removed the Status: Stale No recent activity for issue label Oct 3, 2020
@helamonster

I am encountering this error while attempting to import a pool. The pool was degraded and in the process of repairing when the system locked up, so it had to be power cycled. Afterwards, the pool won't import:

root@daytapod1 ~ $ zpool import
   pool: dpool
     id: 14246488695543634538
  state: FAULTED
status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        dpool                                            FAULTED  corrupted data
          raidz2-0                                       FAULTED  corrupted data
            sde                                          ONLINE
            sdd                                          ONLINE
            sdb                                          ONLINE
            sdc                                          ONLINE
        cache
          ata-Samsung_SSD_850_PRO_512GB_S250NX0H606935A

root@daytapod1 ~ $ zpool import dpool
internal error: cannot import 'dpool': Invalid exchange
Aborted

root@daytapod1 ~ $ zpool import -F dpool
internal error: cannot import 'dpool': Invalid exchange
Aborted

This is on an HP Microserver Gen10 by the way, on which the storage controller seems to be extremely unstable. I recommend not using that device if you have the option.

Is there any hope of recovering?

@stale

stale bot commented Feb 11, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Feb 11, 2022
@stale stale bot closed this as completed Jun 19, 2022
@ryao ryao added Bot: Not Stale Override for the stale bot and removed Status: Stale No recent activity for issue labels Sep 15, 2022
@ryao ryao reopened this Sep 15, 2022
andrewc12 pushed a commit to andrewc12/openzfs that referenced this issue Sep 27, 2022
Add a meaningful error message for ECKSUM to common error messages.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#6805 
Closes openzfs#13808
Closes openzfs#13898
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Nov 23, 2022
Add a meaningful error message for ECKSUM to common error messages.

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#6805
Closes openzfs#13808
Closes openzfs#13898