[Linux] Veracrypt freeze at the end of big volume creation #474

Closed
Schweineschwarte opened this issue Jul 22, 2019 · 13 comments

@Schweineschwarte

Schweineschwarte commented Jul 22, 2019

Hello,
if I create a "small" volume with VeraCrypt 1.23 for Linux (64-bit), it works without problems.
I now have a new external 2 TB Seagate HDD, which I tested with SeaChest without any errors. When I try to encrypt this HDD as a partition/drive, the VeraCrypt GUI freezes at the end of volume creation, after the progress bar has reached 100% (see image). The console version of VeraCrypt 1.23 has the same problem (I let it run overnight, which should have been more than enough time to finish).
I can create a normal partition with fdisk and a filesystem with mkfs.ext4 without errors. I then tried to create a 1.6 TB container in this partition, but the VeraCrypt GUI froze at the end again.
I am not sure whether VeraCrypt freezes completely or only the GUI/console output does. If I unplug the external HDD, some kworker threads use a lot of CPU. I find no error reports in dmesg, so I think VeraCrypt has some trouble with big volumes.

My system: openSUSE 15.0 64 bit with KDE
Hardware: https://pastebin.com/3G47NSzm

Freeze image:

Screenshot_20190722_203049

@Schweineschwarte
Author

I tested it on a new computer with an AMD Ryzen 5 2600 processor and openSUSE 15.0 64-bit with KDE.
During the encryption I logged the temperature with "sensors" and the CPU load with "ps". Shortly before the freeze, the speaker beeped and Konsole showed the message
Message from syslogd@linux-9ilm at Aug 25 15:36:42 ...
kernel: [15264.337904] NMI watchdog: BUG soft lockup - CPU#2 stuck for 23s! [ksoftirqd/2:22]
This happened 4 times. Each time the computer stalled and lagged badly. Afterwards, the computer stopped responding (see the image below).
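
A minimal sketch of how such periodic logging with "sensors" and "ps" could be scripted. The exact commands used for the logs in this comment are not stated; the `ps -eo pcpu,pmem,args --sort=pcpu` invocation and the helper names (`run_command`, `take_snapshot`, `log_forever`) are assumptions for illustration. Each snapshot is flushed and synced to disk so that the last entries survive if the machine locks up.

```python
import os
import subprocess
import time
from datetime import datetime


def run_command(argv):
    """Run a command and return its stdout, or a note if it is not installed."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return f"({argv[0]} not found)\n"
    return result.stdout


def take_snapshot(run=run_command, top_n=10):
    """Return one log entry: timestamp, sensor readings, busiest processes."""
    temps = run(["sensors"])
    # Assumed invocation: sort ascending by %CPU so the busiest rows come last.
    ps_out = run(["ps", "-eo", "pcpu,pmem,args", "--sort=pcpu"])
    lines = ps_out.splitlines()
    header, rows = lines[:1], lines[1:]
    busiest = header + rows[-top_n:]
    stamp = datetime.now().isoformat(timespec="seconds")
    return f"=== {stamp} ===\n{temps}\n" + "\n".join(busiest) + "\n"


def log_forever(path, interval_s=60):
    """Append a snapshot to `path` every `interval_s` seconds until interrupted."""
    with open(path, "a") as log:
        while True:
            log.write(take_snapshot())
            log.flush()
            os.fsync(log.fileno())  # make sure the entry reaches the disk
            time.sleep(interval_s)
```

Calling `log_forever("vc_monitor.log")` (a hypothetical file name) in a second terminal before starting the volume creation would produce a log similar in spirit to the excerpts below.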

At the beginning, the temperatures had the following values:

Temperature
acpitz-acpi-0
Adapter: ACPI interface
temp1: +16.8°C (crit = +20.8°C)

amdgpu-pci-0900
Adapter: PCI adapter
fan1: 995 RPM
temp1: +47.0°C (crit = +0.0°C, hyst = +0.0°C)

k10temp-pci-00c3
Adapter: PCI adapter
Tdie: +43.0°C (high = +70.0°C)
Tctl: +43.0°C

At the end, the temperatures were:

Temperature
acpitz-acpi-0
Adapter: ACPI interface
temp1: +16.8°C (crit = +20.8°C)

amdgpu-pci-0900
Adapter: PCI adapter
fan1: 1001 RPM
temp1: +43.0°C (crit = +0.0°C, hyst = +0.0°C)

k10temp-pci-00c3
Adapter: PCI adapter
Tdie: +60.1°C (high = +70.0°C)
Tctl: +60.1°C

I don't think this is too high. AMD specifies a maximum temperature of 95°C.
https://www.amd.com/de/products/cpu/amd-ryzen-5-2600

The CPU load at the beginning:

%CPU %MEM ARGS So 25. Aug 13:36:36 CEST 2019
0.5 0.0 [kswapd0]
0.7 0.0 [dmcrypt_write]
1.2 1.4 /usr/bin/plasmashell
1.3 0.0 [kworker/6:1]
1.6 0.7 /usr/bin/kwin_x11
1.8 0.0 [wlan0]
2.5 0.5 /usr/bin/X
2.8 0.0 [ksoftirqd/2]
56.6 0.0 [kworker/u64:1]
72.7 0.2 /usr/bin/veracrypt

The CPU load at the end:

%CPU %MEM ARGS So 25. Aug 15:42:37 CEST 2019
1.3 1.4 /usr/bin/plasmashell
1.5 0.0 [ksoftirqd/7]
1.6 0.8 /usr/bin/kwin_x11
2.6 0.5 /usr/bin/X
3.0 0.0 [dmcrypt_write]
4.0 0.0 [ksoftirqd/2]
4.0 0.0 [kworker/6:2]
11.0 0.0 [wlan0]
26.1 0.2 /usr/bin/veracrypt
84.3 0.0 [kworker/u64:1]

The loads of veracrypt and kworker have swapped places: at the beginning VeraCrypt was at 72.7 and kworker at 56.6; at the end VeraCrypt was at 26.1 and kworker at 84.3.
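
As a rough illustration, the comparison of the two "ps" snapshots can be automated. The sketch below assumes the three-column "%CPU %MEM ARGS" layout shown above; the function names are made up for this example, and the sample data is a subset of the rows from the two listings.

```python
def parse_ps_snapshot(text):
    """Parse '%CPU %MEM ARGS' rows into a {command: cpu_percent} mapping."""
    usage = {}
    for line in text.strip().splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        try:
            cpu = float(parts[0])
        except ValueError:
            continue  # header line such as '%CPU %MEM ARGS ...'
        usage[parts[2]] = cpu
    return usage


def cpu_deltas(before, after):
    """Return {command: change in %CPU} for commands present in both snapshots."""
    return {cmd: round(after[cmd] - before[cmd], 1) for cmd in before if cmd in after}


START = """\
%CPU %MEM ARGS
2.8 0.0 [ksoftirqd/2]
56.6 0.0 [kworker/u64:1]
72.7 0.2 /usr/bin/veracrypt
"""

END = """\
%CPU %MEM ARGS
4.0 0.0 [ksoftirqd/2]
26.1 0.2 /usr/bin/veracrypt
84.3 0.0 [kworker/u64:1]
"""

if __name__ == "__main__":
    deltas = cpu_deltas(parse_ps_snapshot(START), parse_ps_snapshot(END))
    for cmd, change in sorted(deltas.items(), key=lambda item: item[1]):
        print(f"{change:+6.1f}  {cmd}")
```

Only commands that appear in both snapshots are compared, since kernel worker names such as kworker/6:1 and kworker/6:2 change between the two listings.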

The new system:
openSUSE 15.0 64 bit with KDE
Hardware:
https://pastebin.com/KvsqqnjQ

Veracrypt_Watchdog_klein

@Schweineschwarte
Author

Both times I tried to encrypt with AES(Twofish) and SHA-512.
The benchmark reports a much higher speed than VeraCrypt achieves during the actual encryption. The encryption speed is high only at the beginning and drops off very quickly.

Benchmark:
Screenshot_20190825_163832

@alt3r-3go
Contributor

@Schweineschwarte, thanks for providing details of the problem you observe and doing additional exploration. That low performance bug is unlikely to be related to the one you're observing, on the face of it anyway. The one there doesn't cause any stalls, it's just a benchmark producing an unexpectedly low result.

So let's look into this one here in more detail. Those soft lockup messages should also be accompanied by stack traces - could you please post either your syslog excerpts for those (full stack traces together with soft lockup messages) or [preferred, as it will provide a better picture] full log output starting from the machine boot, then with VeraCrypt starting and doing the operation that gets stuck for you and ends in a soft lockup. That should provide additional information for troubleshooting.

The temperature doesn't look like a problem in this one - the values are in the "okay-ish" zone and overheating wouldn't cause soft lockups anyway, that must be a purely SW-level problem. The temperature increase per se is also expected - your CPU is doing additional work of encryption after all.
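
For anyone collecting such excerpts, here is a small sketch that pulls soft lockup messages, plus a number of following lines where the stack trace would normally appear, out of saved log text. The regular expression is modelled only on the single message quoted earlier in this thread, so the exact format on other kernels is an assumption; `find_soft_lockups` is a name invented for this example.

```python
import re
from typing import NamedTuple


class SoftLockup(NamedTuple):
    cpu: int      # CPU number reported as stuck
    seconds: int  # how long the CPU was reported stuck
    task: str     # task name, e.g. "ksoftirqd/2"
    pid: int      # process ID of that task


# Matches e.g. "soft lockup - CPU#2 stuck for 23s! [ksoftirqd/2:22]".
# The greedy ".+" keeps colons inside task names such as "kworker/u64:1".
_PATTERN = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]")


def find_soft_lockups(log_text, after=0):
    """Return a list of (SoftLockup, following_lines) found in log_text.

    `after` is the number of lines to keep after each hit.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        match = _PATTERN.search(line)
        if match:
            event = SoftLockup(int(match[1]), int(match[2]), match[3], int(match[4]))
            hits.append((event, lines[i + 1 : i + 1 + after]))
    return hits
```

For example, `find_soft_lockups(open("/var/log/warn").read(), after=40)` would return each lockup together with the 40 lines that follow it; the path and the line count are placeholders to adjust.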

@Schweineschwarte
Author

Schweineschwarte commented Sep 10, 2019

@alt3r-3go
Here are some log files. The external HDD that should be encrypted is sdc. I aborted this run at the end because the computer was very, very slow, though it did not crash. The mouse didn't work, but I could still toggle the Num Lock light, so I don't think the computer had crashed at that moment. It was impossible to work with the machine, however. The volume creation speed dropped to 11 MB/s, so I think the computer had reached the state we want to observe. No watchdog message appeared this time, but I see some error messages in the log files that could be of interest to you.

dmesg before volume creation starts:
https://pastebin.com/na2PiaDW

journalctl before volume creation starts:
https://pastebin.com/dgnMJgT1

dmesg log during volume creation (starts with the same entries):
https://pastebin.com/CkjYhsHV

journalctl log during volume creation:
https://pastebin.com/MUj4adPE

/var/log/warn:
https://pastebin.com/z1QRFx7x

/var/log/messages (too big for pastebin):
https://gist.github.com/Schweineschwarte/96c463d67ab4d7b2ff5d1ee690a059e7

@Schweineschwarte
Author

Here you can see the /var/log/warn from 25 August 2019, with the soft lockup messages.
https://gist.github.com/Schweineschwarte/f1a6a1ff385fd3cd77b478d968eba3cd

@alt3r-3go
Contributor

alt3r-3go commented Sep 11, 2019

Thanks, that helps a lot. I don't have time to look into all the details this week, but from a first scan of the dmesg and the warn log, this actually doesn't look like the VeraCrypt driver at all, but reminds me of a bad sector (or a set thereof) on the disk drive.

The USB and SCSI drivers scream errors when writing, and they are both "below" the VeraCrypt driver. Plus, a soft lockup looks like a natural consequence in this case, because drives, especially the "spinning rust" type you seem to have here, tend to stall the I/O operation by trying to read (write) the sector again and again, instead of just returning an error. That in turn leads to the driver getting stuck in IO wait, which eventually gets noticed by the scheduler and shows up as a soft lockup error. And Linux used to be rather allergic to prolonged IO waits (in my experience, anyway, and that's from a while ago), so general OS stalls and all sorts of glitches are expected.

So please run a full bad sector check on your drive - there's usually a vendor utility for that, Linux also has some, Windows disk tools also can do that - but it would be best to do it on a physical host, not a virtual machine as you seem to have here (there are Virtual Box drivers trying to load themselves anyway, so this is a guess), to prevent any additional complications from the VMM middleman.
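
To make the idea of a read check concrete, below is a minimal read-only sketch, not a replacement for the vendor utility or for "badblocks". It reads a file or block device sequentially in chunks and records the offsets where the operating system reports a read error. The function name and chunk size are arbitrary choices; reading a raw device such as /dev/sdc normally requires root, and data already in the page cache may be served without touching the disk.

```python
import os


def scan_readable(path, chunk_size=1024 * 1024):
    """Read `path` sequentially and return (bytes_read, bad_offsets).

    bad_offsets lists the start offset of every chunk whose read raised OSError.
    Nothing is ever written to `path`.
    """
    bad_offsets = []
    total = 0
    with open(path, "rb", buffering=0) as f:
        size = f.seek(0, os.SEEK_END)  # also works for block devices on Linux
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                f.seek(offset)
                data = f.read(want)
            except OSError:
                # Remember the failing chunk and skip past it.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of data
            total += len(data)
            offset += len(data)
    return total, bad_offsets
```

If the drive drops off the bus entirely during the scan, every remaining chunk is expected to fail, so a long run of consecutive bad offsets points at a connection or power problem rather than at individual sectors.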

@Schweineschwarte
Author

Schweineschwarte commented Sep 13, 2019

I have tested the external Seagate HDD again (on my Linux "host system", not in a virtual machine), but SeaChest can't find any errors.

Available devices:
https://pastebin.com/k3eeMnai

Device information:
https://pastebin.com/xVjscpfH

SMART check (unsupported):
https://pastebin.com/TESafZTN

SMART error log (unsupported)
https://pastebin.com/P0ETyJcS

Long generic test:
https://pastebin.com/dmhP0Qss

I saved the full log of the long generic test, but the log file is 1.9 GiB in size (a bit too much for pastebin 😄). If you want to see this file, I can upload it to a file hoster. But you will only see "Reading LBA: 0" through "Reading LBA: 3907029120".

@alt3r-3go
Contributor

Thanks. That's interesting then. One other possible reason, though IMHO much less likely, is a power brownout during more intensive operations, but that would be harder to test. Is this drive powered directly from USB, or does it have a separate power adapter?

@git70

git70 commented Sep 25, 2019

I thought it might be related to your problems:
keepassxreboot/keepassxc#3569
keepassxreboot/keepassxc#3415
Common features: AMD Ryzen + OpenSUSE
Maybe it's worth checking ...

@Schweineschwarte
Author

Schweineschwarte commented Sep 27, 2019

@alt3r-3go
Thank you for your efforts :) Yes, this device is connected directly to my front USB port, without any adapters.
I observed that the connection to the HDD is lost when I check the HDD with the program "badblocks" or try to compute a sha512sum over a very big file (2 TB).
I have done some more tests, but I am not finished yet and will have no time for this in the next two weeks. When I am done, I will post the results here. ;)
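
For reference, a checksum over a very large file can be computed in fixed-size chunks so that memory use stays constant, and so that a dropped connection shows up as a read error at a known position. This is a minimal sketch using Python's hashlib; the function name and chunk size are arbitrary, and it is an illustration of the idea rather than the exact test that was run.

```python
import hashlib


def sha512_of_file(path, chunk_size=4 * 1024 * 1024):
    """Return the SHA-512 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    done = 0
    with open(path, "rb") as f:
        while True:
            try:
                chunk = f.read(chunk_size)
            except OSError as exc:
                # Report how far the read got before the device failed.
                raise OSError(f"read failed after {done} bytes: {exc}") from exc
            if not chunk:
                break
            digest.update(chunk)
            done += len(chunk)
    return digest.hexdigest()
```

The result should match the first field printed by `sha512sum` for the same file.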

@git70
I will take a look to see if it helps. Thanks!

@alt3r-3go
Contributor

Thanks and sure, take your time. This indeed sounds like insufficient power (errors or malfunction under load), something that happens frequently with those external drives that are powered only from USB, despite the manufacturer's advertising. I, for one, always buy those with an additional external power adapter because of that - not that it's convenient, oh well.

@Schweineschwarte
Author

Schweineschwarte commented Jan 17, 2020

After a long time, I want to report back.
I did some more testing, and I think I had multiple problems. First, my front USB ports are not very stable. I noticed problems with my WiFi stick while copying large amounts of files with another hard disk (one that has an external power supply). That other hard disk works fine, but the WiFi stick had connection trouble during the copy. So I connected the problem HDD to the rear ports, but it still had trouble there (encryption didn't work, checksums over huge files aborted, etc.). Next, I wanted to check whether the cause of these problems is the HDD controller or the enclosure's controller. I removed the HDD from its enclosure, bought a USB Y-cable (1x power, 1x data) to SATA adapter (DeLOCK Konverter SATA-22-Pin zu USB-3.0-/2.0, Adapter), and connected the extracted HDD to the rear USB ports. I was surprised to see in SeaChest that the HDD "now" has SMART support… Now I can encrypt my HDD, compute checksums over huge files, etc. So I think the second problem was the enclosure's controller (the low power available from a single USB port may have been a third cause).
Thank you very much for your help!

@alt3r-3go
Contributor

No worries, glad you've got it working now and thanks for reporting back, that's going to help other people in similar situations.
