When the disk is almost full, the storage will crash #3423

handsonbao · 2021-12-07T05:57:02Z

Please check the FAQ documentation before raising an issue

Describe the bug (required)
When I used the importer to load SF300 into Nebula, I found some errors in the output log. I stopped it and found that the storage had crashed. I checked everything, including the storaged log, but found nothing.
Finally I found that the disk was almost full, so I deleted some data.
I restarted storaged and reimported SF300. It worked, with no errors.

When the disk is almost full, the storage will crash
Your Environments (required)
ent 2.6.1 c074eeb

OS: uname -a
4.18.0-305.7.1.el8_4.x86_64 #1 SMP Tue Jun 29 21:55:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Compiler: g++ --version or clang++ --version
g++ (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1)
Copyright © 2018 Free Software Foundation, Inc.
CPU: lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz
BIOS Model name: Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz
Stepping: 7
CPU MHz: 3100.013
CPU max MHz: 3900.0000
CPU min MHz: 1000.0000
BogoMIPS: 5000.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-23,48-71
NUMA node1 CPU(s): 24-47,72-95
Commit id (e.g. a3ffc7d8)

How To Reproduce(required)

Steps to reproduce the behavior:

See "Describe the bug" above.

Expected behavior

Additional context


critical27 · 2021-12-07T06:08:59Z

Check if your storage log contains a log like "Failed to appendLogs because of no more space".
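One way to run this check is sketched below. It is a minimal helper, not part of Nebula: the function name `find_no_space_warnings` and the regular expression are assumptions, and the pattern only targets glog-style lines of the shape `W1215 10:29:13.669871 10755 FileBasedWal.cpp:520] ...`.

```python
import re
from typing import Dict, Iterable, List

# Assumed glog-style layout: severity letter, MMDD, time, thread id,
# "source.cpp:line]" and then the message text.
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)

# The message quoted in the comment above.
NEEDLE = "Failed to appendLogs because of no more space"


def find_no_space_warnings(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return the parsed fields of every log line that contains NEEDLE."""
    hits = []
    for line in lines:
        match = GLOG_LINE.match(line.strip())
        if match and NEEDLE in match.group("msg"):
            hits.append(match.groupdict())
    return hits


# Two sample lines in the same format as a storaged log.
sample = [
    "W1215 10:29:13.669871 10755 FileBasedWal.cpp:520] "
    "[Port: 9780, Space: 1, Part: 9] "
    "Failed to appendLogs because of no more space",
    "W1215 10:29:13.669879 10755 RaftPart.cpp:718] "
    "[Port: 9780, Space: 1, Part: 9] Failed to write into WAL",
]
hits = find_no_space_warnings(sample)
for hit in hits:
    print(hit["mmdd"], hit["time"], hit["src"], hit["msg"])
```

To scan a real file, pass an open file object instead of `sample`, for example `find_no_space_warnings(open("nebula-storaged.INFO", errors="replace"))`; the file name and location depend on the deployment.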

handsonbao · 2021-12-07T06:39:51Z

I found no mention of "appendLogs" in the log.

handsonbao · 2021-12-15T10:34:49Z

I tested this on a single node just now, and I found this in nebula-storage.INFO:

W1215 10:29:13.669871 10755 FileBasedWal.cpp:520] [Port: 9780, Space: 1, Part: 9] Failed to appendLogs because of no more space
W1215 10:29:13.669879 10755 RaftPart.cpp:718] [Port: 9780, Space: 1, Part: 9] Failed to write into WAL
W1215 10:29:13.669885 10755 RaftPart.cpp:731] [Port: 9780, Space: 1, Part: 9] Failed to write wal

In the end storaged crashed:
Wed Dec 15 10:33:10 UTC 2021
[vesoft@handson scripts]$ sudo ./nebula.service status all
[INFO] nebula-metad(de03025): Running as 10702, Listening on 9559
[INFO] nebula-graphd(de03025): Running as 10716, Listening on 9669
[INFO] nebula-storaged(de03025): Exited

In a cluster, if the disk on one node is almost full, the storaged on that node will crash.

Nivras · 2021-12-23T05:48:59Z

This happens because diskManager fetches the disk space info and caches freebytes only once every 10 seconds. Every time it writes a WAL log, storaged checks whether the cached freebytes has dropped below minimum_reserved_bytes and rejects the write if it has. In this case minimum_reserved_bytes is 256M and the write speed is very high, so the disk runs out of space before diskManager updates freebytes, and storaged hits a fatal error.

I changed minimum_reserved_bytes to 2.5G and tested again. This time storaged returned an error when about 2.1G of disk space was left, and it did not crash.
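The effect of the stale value can be illustrated with a toy model. This is only a sketch under stated assumptions, not Nebula's actual code: the `simulate` function, the 150 MB/s write rate, and the 2900 MB of starting free space are all hypothetical, chosen to show how a reserve that is smaller than what can be written in one refresh interval fails to protect the disk.

```python
from typing import Tuple


def simulate(initial_free: int, reserved: int, write_rate: int,
             refresh_interval: int, duration: int) -> Tuple[str, int, int]:
    """Toy model of a free-space check that reads a periodically cached value.

    Sizes are in MB and time is in whole seconds.
    Returns (outcome, second, actual_free_mb), where outcome is
    "rejected" (the check refused the write), "out_of_space" (the write
    passed the stale check but the disk could not hold it), or "ok".
    """
    actual_free = initial_free
    cached_free = initial_free
    for second in range(duration):
        # The cached value is refreshed only every refresh_interval seconds.
        if second % refresh_interval == 0:
            cached_free = actual_free
        # The write path consults the cached value, not the real one.
        if cached_free < reserved:
            return ("rejected", second, actual_free)
        # The check passed, but the disk may really be full by now.
        if actual_free < write_rate:
            return ("out_of_space", second, actual_free)
        actual_free -= write_rate
    return ("ok", duration, actual_free)


# Hypothetical workload: 2900 MB free, 150 MB/s writes, 10 s refresh.
small_reserve = simulate(2900, 256, 150, 10, 60)
large_reserve = simulate(2900, 2560, 150, 10, 60)
print(small_reserve)  # ('out_of_space', 19, 50)
print(large_reserve)  # ('rejected', 10, 1400)
```

With the small reserve, the cached value still reads 1400 MB at second 19 while only 50 MB is really left, so the write is attempted and fails. With the larger reserve, the first refresh already falls below the threshold and the write is refused while plenty of space remains, which matches the behavior described above.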

handsonbao added the type/bug Type: something is unexpected label Dec 7, 2021

Sophie-Xie assigned critical27 Dec 7, 2021

Sophie-Xie added this to the v3.0.0 milestone Dec 7, 2021

jamieliu1023 mentioned this issue Dec 11, 2021

Weekly Report 2021-12-10 vesoft-inc/nebula-community#68

Closed

Sophie-Xie added the need info Solution: need more information (ex. can't reproduce) label Dec 15, 2021

Sophie-Xie assigned Nivras and unassigned critical27 Dec 15, 2021

Sophie-Xie removed the need info Solution: need more information (ex. can't reproduce) label Dec 20, 2021

Nivras mentioned this issue Dec 28, 2021

add LogMonitor to check log disk freeBytes and change log level when space is almost full #3576

Merged

7 tasks

Sophie-Xie linked a pull request Jan 4, 2022 that will close this issue

add LogMonitor to check log disk freeBytes and change log level when space is almost full #3576

Merged

7 tasks

critical27 closed this as completed in #3576 Jan 10, 2022

jamieliu1023 mentioned this issue Jan 15, 2022

Weekly Report 2022-01-14 vesoft-inc/nebula-community#85

Closed
