The graphd process crashes on CentOS 6 servers #3356

Closed
Donald-Su opened this issue Nov 25, 2021 · 2 comments
Labels
type/bug Type: something is unexpected

Donald-Su commented Nov 25, 2021

Describe the bug (must be provided)

  • The graphd process crashes on CentOS 6 servers, while it works well on CentOS 7 servers.

Your Environments (must be provided)

  • OS: CentOS release 6.6 (Final)
  • nebula version: v2.6.1

How To Reproduce (must be provided)

Steps to reproduce the behavior:

  1. Step 1: Modify the system_memory_high_watermark_ratio parameter in the nebula-storaged.conf configuration file.
  2. Step 2: Start the Nebula graphd service; the process crashes.

Other info

  • The log:
Log file created at: 2021/11/08 16:05:13
Running on machine: TEST
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
[2021-11-08 16:05:13.573] [INFO] [04270] [GraphDaemon.cpp] >>> [GraphDaemon] msg=line 125, Starting Graph HTTP Service
[2021-11-08 16:05:13.582] [INFO] [04276] [WebService.cpp] >>> [WebService] msg=line 125, Web service started on HTTP[19669], HTTP2[19670]
[2021-11-08 16:05:13.583] [INFO] [04270] [GraphDaemon.cpp] >>> [GraphDaemon] msg=line 139, Number of networking IO threads: 32
[2021-11-08 16:05:13.583] [INFO] [04270] [GraphDaemon.cpp] >>> [GraphDaemon] msg=line 148, Number of worker threads: 32
[2021-11-08 16:05:13.583] [INFO] [04270] [MetaClient.cpp] >>> [MetaClient] msg=line 58, Create meta client to "127.0.0.1":9559
[2021-11-08 16:05:15.617] [INFO] [04270] [MetaClient.cpp] >>> [MetaClient] msg=line 3013, Load leader ok
[2021-11-08 16:05:15.618] [INFO] [04270] [MetaClient.cpp] >>> [MetaClient] msg=line 118, Register time task for heartbeat!
[2021-11-08 16:05:15.621] [FATAL] [04270] [MemoryUtils.cpp] >>> [MemoryUtils] msg=line 57, Check failed: memorySize.size() == 2U (1 vs. 2)
  • The backtrace of the graphd process:
(gdb) bt
#0  0x00007ff44061b9d9 in raise () from /lib64/libc.so.6
#1  0x00007ff44061d0e8 in abort () from /lib64/libc.so.6
#2  0x0000000001cb6714 in google::LogMessage::Fail() ()
#3  0x0000000001cb6671 in google::LogMessage::SendToLog() ()
#4  0x0000000001cb5fd4 in google::LogMessage::Flush() ()
#5  0x0000000001cb943e in google::LogMessageFatal::~LogMessageFatal() ()
#6  0x00000000017cfd2c in nebula::MemoryUtils::hitsHighWatermark() ()
#7  0x0000000000d82b60 in nebula::graph::QueryEngine::setupMemoryMonitorThread()::{lambda()#1}::operator()() const [clone .isra.218] ()
#8  0x0000000000d82d8c in nebula::graph::QueryEngine::setupMemoryMonitorThread() ()
#9  0x0000000000d832a6 in nebula::graph::QueryEngine::init(std::shared_ptr<folly::IOThreadPoolExecutor>, nebula::meta::MetaClient*) ()
#10 0x0000000000d62f08 in nebula::graph::GraphService::init(std::shared_ptr<folly::IOThreadPoolExecutor>, nebula::HostAddr const&) ()
#11 0x0000000000d0b586 in main ()

The Root Cause

  • Nebula v2.6.x estimates available memory in MemoryUtils::hitsHighWatermark() using the MemAvailable field of /proc/meminfo. That field is not available on CentOS 6, so only one of the two expected values is found, the check memorySize.size() == 2U fails (1 vs. 2), and graphd aborts. See the Linux kernel commit that introduced the field: /proc/meminfo: provide estimated available memory

icella commented Jan 5, 2022

I ran into a similar problem, but my CentOS version is CentOS Linux release 7.6.1810 (Core). Other configuration:

  • nebula version: v2.6.1
  • 3 Alibaba Cloud ECS instances, each with 32 GB of memory and 16 cores

What happened

While running a MATCH query on a dense vertex, the memory of the graphd process kept growing, and then the process exited.

graphd-stderr.log:

616864 E1230 16:52:54.249279 25636 QueryInstance.cpp:108] N5folly13BrokenPromiseE: Broken promise for type name `nebula::Status`
616865 E1230 18:41:18.204828  5209 StorageAccessExecutor.h:41] GetVerticesExecutor failed, error E_LEADER_CHANGED, part 11
616866 E1230 18:41:18.208159  5216 QueryInstance.cpp:108] Storage Error: The leader has changed. Try again later
616867 E1230 18:41:32.132153  5210 StorageAccessExecutor.h:41] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 8
616868 E1230 18:41:32.132206  5210 StorageAccessExecutor.h:41] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 5
616869 E1230 18:41:32.132216  5210 StorageAccessExecutor.h:41] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 12
616870 E1230 18:41:32.132226  5210 StorageAccessExecutor.h:41] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 2
616871 E1230 18:41:32.132236  5210 StorageAccessExecutor.h:41] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 13
616872 E1230 18:41:32.132246  5210 StorageAccessExecutor.h:41] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 3
616873 E1230 18:41:32.132257  5210 StorageAccessExecutor.h:41] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 10
616874 E1230 18:41:32.132266  5210 StorageAccessExecutor.h:41] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 15
616875 E1230 18:41:32.132278  5210 StorageAccessExecutor.h:41] GetNeighborsExecutor failed, error E_LEADER_CHANGED, part 7
616876 E1230 18:41:32.132334  5216 QueryInstance.cpp:108] Storage Error: The leader has changed. Try again later

Sophie-Xie (Contributor) commented:

  • 3 Alibaba Cloud ECS instances, each with 32 GB of memory and 16 cores

This looks like a different issue; please create a new issue to track it. Thanks.

yixinglu pushed a commit to yixinglu/nebula that referenced this issue Mar 21, 2022
#### What type of PR is this?
- [x] bug
- [ ] feature
- [ ] enhancement

#### What does this PR do?
Fix the meminfo bug.

#### Which issue(s)/PR(s) this PR relates to?
Fixes issue vesoft-inc#3356
  
#### Special notes for your reviewer, ex. impact of this fix, etc:


#### Additional context/ Design document:


#### Checklist:
- [ ] Documentation affected (Please add the label if documentation needs to be modified.)
- [ ] Incompatibility (If it breaks the compatibility, please describe it and add the corresponding label.)
- [ ] If it's needed to cherry-pick (If cherry-pick to some branches is required, please label the destination version(s).)
- [ ] Performance impacted: Consumes more CPU/Memory

#### Release notes:

Please confirm whether this should be reflected in the release notes, and how to describe it:


Migrated from vesoft-inc#3534

Co-authored-by: yuehua.jia <3423893+jiayuehua@users.noreply.github.com>

5 participants