Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ardb hangs after continual use for 48 hours #141

Closed
guateandrew opened this issue Dec 1, 2014 · 13 comments
Closed

Ardb hangs after continual use for 48 hours #141

guateandrew opened this issue Dec 1, 2014 · 13 comments

Comments

@guateandrew
Copy link

I seem to get around 24-28 hours of our ardb before it slows to a halt. It still accepts connections but returns nothing, eventually my client is timing out. This is despite not yet deploying it on production so very little volume of queries. Database size is around 40GB.

When I go to ardb and type "info" it halts and nothing is displayed. I can resolve this by killing the server and bringing it back up again, but not sustainable for being in production.

I previously saw this reported here and was closed. I am running the latest on the master branch

@yinqiwen
Copy link
Owner

yinqiwen commented Dec 1, 2014

is the server build with rocksdb as the storage engine? and what commands used in your application env?
there is similar issue in our production env. in our production env, the server would hang tens of minutes if too many write commands sent to server in a short time. to resovle this, wemake the server start with 2 listening port, read command would send to one, while write commands sent to another one.

@guateandrew
Copy link
Author

Yes, there is nothing custom about the setup, so RocksDB. I dont think we are sending it much load, mainly some scripts running against it but couldn't be more that 20-30 writes/second. Average write size is probably 1KB

@guateandrew
Copy link
Author

What would be the config for the 2 port setup? Sounds like a fix in the meantime. Thanks.

@yinqiwen
Copy link
Owner

yinqiwen commented Dec 1, 2014

just add 'listen' configuration in ardb.conf

@yinqiwen
Copy link
Owner

yinqiwen commented Dec 2, 2014

and can you use gdb attach running process to get the stack trace when the server hang next time?

@guateandrew
Copy link
Author

I'm experiencing the situation right now. I'm not that familiar with gdb: here is the output when i attach to the process but i'm not sure what to read into it:
Attaching to process 9482
Reading symbols from /ardb-disk/ardb/src/ardb-server...done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libpthread-2.19.so...done.
done.
[New LWP 12344]
[New LWP 9502]
[New LWP 9501]
[New LWP 9500]
[New LWP 9499]
[New LWP 9498]
[New LWP 9497]
[New LWP 9496]
[New LWP 9495]
[New LWP 9494]
[New LWP 9493]
[New LWP 9492]
[New LWP 9491]
[New LWP 9490]
[New LWP 9489]
[New LWP 9488]
[New LWP 9487]
[New LWP 9486]
[New LWP 9485]
[New LWP 9484]
[New LWP 9483]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Loaded symbols for /lib/x86_64-linux-gnu/libpthread.so.0
Reading symbols from /lib/x86_64-linux-gnu/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libz.so.1
Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/librt.so.1
Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/libstdc++.so.6
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libm.so.6
Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libgcc_s.so.1
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.19.so...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185 ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.

@guateandrew
Copy link
Author

Here is the backtrace:
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00000000005aa95d in rocksdb::port::CondVar::Wait (this=this@entry=0x7fd022c10560) at port/port_posix.cc:83
#2 0x0000000000520e78 in rocksdb::DBImpl::DelayWrite (this=this@entry=0x7fd022c10400, expiration_time=expiration_time@entry=0) at db/db_impl.cc:4171
#3 0x000000000052b736 in rocksdb::DBImpl::Write (this=0x7fd022c10400, options=..., my_batch=) at db/db_impl.cc:4039
#4 0x0000000000521114 in rocksdb::DB::Put (this=this@entry=0x7fd022c10400, opt=..., column_family=0x7fd022e837a0, key=..., value=...) at db/db_impl.cc:4599
#5 0x0000000000521161 in rocksdb::DBImpl::Put (this=this@entry=0x7fd022c10400, o=..., column_family=, key=..., val=...) at db/db_impl.cc:3931
#6 0x000000000053b35a in rocksdb::DB::Put (this=0x7fd022c10400, options=..., key=..., value=...) at ./include/rocksdb/db.h:171
#7 0x000000000051be8a in ardb::RocksDBEngine::Put (this=, key=..., value=..., options=...) at engine/rocksdb_engine.cpp:340
#8 0x0000000000424883 in ardb::Ardb::SetRaw (this=this@entry=0x7fffffe77560, ctx=..., key=..., value=...) at ardb.cpp:485
#9 0x0000000000424a09 in ardb::Ardb::SetKeyValue (this=this@entry=0x7fffffe77560, ctx=..., value=...) at ardb.cpp:519
#10 0x00000000004db582 in ardb::Ardb::HashMultiSet (this=this@entry=0x7fffffe77560, ctx=..., meta=..., fs=...) at command/t_hash.cpp:55
#11 0x00000000004dcbe7 in ardb::Ardb::HashSet (this=this@entry=0x7fffffe77560, ctx=..., meta=..., field=..., value=...) at command/t_hash.cpp:120
#12 0x00000000004ddb18 in ardb::Ardb::HSet (this=0x7fffffe77560, ctx=..., cmd=...) at command/t_hash.cpp:230
#13 0x0000000000425221 in ardb::Ardb::DoCall (this=this@entry=0x7fffffe77560, ctx=..., setting=..., args=...) at ardb.cpp:795
#14 0x0000000000427946 in ardb::Ardb::Call (this=0x7fffffe77560, ctx=..., args=..., flags=0) at ardb.cpp:946
#15 0x00000000004548b5 in ardb::RedisRequestHandler::MessageReceived (this=0x7fcfed7b49c0, ctx=..., e=...) at network.cpp:65
#16 0x00000000004245d3 in HandleStreamEvent (e=..., ctx=..., this=) at ./common/channel/channel_upstream_handler.hpp:133
#17 ardb::ChannelPipeline::SendUpstreamardb::codec::RedisCommandFrame (ctx=, e=..., this=) at ./common/channel/all_includes.hpp:89
#18 0x000000000043eafd in SendUpstreamardb::MessageEvent<ardb::codec::RedisCommandFrame > (this=, this=, e=...) at ./common/channel/all_includes.hpp:164
#19 fire_message_receivedardb::codec::RedisCommandFrame (destructor=0x0, message=0x7fffffe76eb0, ctx=...) at ./common/channel/channel_helper.hpp:91
#20 ardb::codec::StackFrameDecoderardb::codec::RedisCommandFrame::CallDecode (this=0x7fd022ffdb20, context=..., channel=0x7fd022ebc880, cumulation=...)
at ./common/channel/codec/stack_frame_decoder.hpp:102
#21 0x000000000043f02f in ardb::codec::StackFrameDecoderardb::codec::RedisCommandFrame::MessageReceived (this=0x7fd022ffdb20, ctx=..., e=...) at ./common/channel/codec/stack_frame_decoder.hpp:157
#22 0x0000000000482bd3 in HandleStreamEvent (e=..., ctx=..., this=) at ./common/channel/channel_upstream_handler.hpp:133
#23 ardb::ChannelPipeline::SendUpstreamardb::Buffer (ctx=, e=..., this=) at ./common/channel/all_includes.hpp:89
#24 0x0000000000483ec0 in SendUpstreamardb::MessageEvent<ardb::Buffer > (event=..., this=0x7fd022ebc8b8) at ./common/channel/all_includes.hpp:128
#25 fire_message_receivedardb::Buffer (destructor=0x0, message=0x7fd022ebc8e8, channel=0x7fd022ebc880) at ./common/channel/channel_helper.hpp:83
#26 ardb::Channel::OnRead (this=0x7fd022ebc880) at common/channel/channel.cpp:523
#27 0x00000000004829fd in ardb::Channel::IOEventCallback (eventLoop=, fd=, clientData=0x7fd022ebc880, mask=1) at common/channel/channel.cpp:53
#28 0x000000000049fea0 in aeProcessEvents (eventLoop=eventLoop@entry=0x7fd022c29ec8, flags=flags@entry=3) at common/channel/redis/ae.c:429
#29 0x00000000004a018b in aeMain (eventLoop=0x7fd022c29ec8) at common/channel/redis/ae.c:485
#30 0x0000000000425fb8 in ardb::Ardb::Start (this=this@entry=0x7fffffe77560) at ardb.cpp:453
#31 0x000000000040961f in main (argc=, argv=) at main.cpp:119

@guateandrew
Copy link
Author

There was no open file limit error in this case so this bug is unrelated to the write issue #142 (which means you can probably close #142

@guateandrew
Copy link
Author

Restarted and an hour later a similar hang. Backtrace:

#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00000000005aa95d in rocksdb::port::CondVar::Wait (this=this@entry=0x7f2db9810560) at port/port_posix.cc:83
#2 0x0000000000520e78 in rocksdb::DBImpl::DelayWrite (this=this@entry=0x7f2db9810400, expiration_time=expiration_time@entry=0) at db/db_impl.cc:4171
#3 0x000000000052b736 in rocksdb::DBImpl::Write (this=0x7f2db9810400, options=..., my_batch=) at db/db_impl.cc:4039
#4 0x0000000000521114 in rocksdb::DB::Put (this=this@entry=0x7f2db9810400, opt=..., column_family=0x7f2db9a837a0, key=..., value=...) at db/db_impl.cc:4599
#5 0x0000000000521161 in rocksdb::DBImpl::Put (this=this@entry=0x7f2db9810400, o=..., column_family=, key=..., val=...) at db/db_impl.cc:3931
#6 0x000000000053b35a in rocksdb::DB::Put (this=0x7f2db9810400, options=..., key=..., value=...) at ./include/rocksdb/db.h:171
#7 0x000000000051be8a in ardb::RocksDBEngine::Put (this=, key=..., value=..., options=...) at engine/rocksdb_engine.cpp:340
#8 0x0000000000424883 in ardb::Ardb::SetRaw (this=this@entry=0x7fffe9d15dc0, ctx=..., key=..., value=...) at ardb.cpp:485
#9 0x0000000000424a09 in ardb::Ardb::SetKeyValue (this=this@entry=0x7fffe9d15dc0, ctx=..., value=...) at ardb.cpp:519
#10 0x00000000004db582 in ardb::Ardb::HashMultiSet (this=this@entry=0x7fffe9d15dc0, ctx=..., meta=..., fs=...) at command/t_hash.cpp:55
#11 0x00000000004dcbe7 in ardb::Ardb::HashSet (this=this@entry=0x7fffe9d15dc0, ctx=..., meta=..., field=..., value=...) at command/t_hash.cpp:120
#12 0x00000000004ddb18 in ardb::Ardb::HSet (this=0x7fffe9d15dc0, ctx=..., cmd=...) at command/t_hash.cpp:230
#13 0x0000000000425221 in ardb::Ardb::DoCall (this=this@entry=0x7fffe9d15dc0, ctx=..., setting=..., args=...) at ardb.cpp:795
#14 0x0000000000427946 in ardb::Ardb::Call (this=0x7fffe9d15dc0, ctx=..., args=..., flags=0) at ardb.cpp:946
#15 0x00000000004548b5 in ardb::RedisRequestHandler::MessageReceived (this=0x7f2d73d95240, ctx=..., e=...) at network.cpp:65
#16 0x00000000004245d3 in HandleStreamEvent (e=..., ctx=..., this=) at ./common/channel/channel_upstream_handler.hpp:133
#17 ardb::ChannelPipeline::SendUpstreamardb::codec::RedisCommandFrame (ctx=, e=..., this=) at ./common/channel/all_includes.hpp:89
#18 0x000000000043eafd in SendUpstreamardb::MessageEvent<ardb::codec::RedisCommandFrame > (this=, this=, e=...) at ./common/channel/all_includes.hpp:164
#19 fire_message_receivedardb::codec::RedisCommandFrame (destructor=0x0, message=0x7fffe9d15710, ctx=...) at ./common/channel/channel_helper.hpp:91
#20 ardb::codec::StackFrameDecoderardb::codec::RedisCommandFrame::CallDecode (this=0x7f2db9b0bdc0, context=..., channel=0x7f2d98df3a40, cumulation=...)
at ./common/channel/codec/stack_frame_decoder.hpp:102
#21 0x000000000043f02f in ardb::codec::StackFrameDecoderardb::codec::RedisCommandFrame::MessageReceived (this=0x7f2db9b0bdc0, ctx=..., e=...) at ./common/channel/codec/stack_frame_decoder.hpp:157
#22 0x0000000000482bd3 in HandleStreamEvent (e=..., ctx=..., this=) at ./common/channel/channel_upstream_handler.hpp:133
#23 ardb::ChannelPipeline::SendUpstreamardb::Buffer (ctx=, e=..., this=) at ./common/channel/all_includes.hpp:89
#24 0x0000000000483ec0 in SendUpstreamardb::MessageEvent<ardb::Buffer > (event=..., this=0x7f2d98df3a78) at ./common/channel/all_includes.hpp:128
#25 fire_message_receivedardb::Buffer (destructor=0x0, message=0x7f2d98df3aa8, channel=0x7f2d98df3a40) at ./common/channel/channel_helper.hpp:83
#26 ardb::Channel::OnRead (this=0x7f2d98df3a40) at common/channel/channel.cpp:523
#27 0x00000000004829fd in ardb::Channel::IOEventCallback (eventLoop=, fd=, clientData=0x7f2d98df3a40, mask=1) at common/channel/channel.cpp:53
#28 0x000000000049fea0 in aeProcessEvents (eventLoop=eventLoop@entry=0x7f2db9b74d38, flags=flags@entry=3) at common/channel/redis/ae.c:429
#29 0x00000000004a018b in aeMain (eventLoop=0x7f2db9b74d38) at common/channel/redis/ae.c:485
#30 0x0000000000425fb8 in ardb::Ardb::Start (this=this@entry=0x7fffe9d15dc0) at ardb.cpp:453
#31 0x000000000040961f in main (argc=, argv=) at main.cpp:119

@yinqiwen
Copy link
Owner

yinqiwen commented Dec 2, 2014

It's blocked in rocksdb, the backtrace seems that rocksdb serv too many writes at the time.
Is the system have high iowait?

@guateandrew
Copy link
Author

I checked top and -w is less than 2%. Then on iostat I checked %util for each drive and it was at or below 7% for several tests.

@yinqiwen
Copy link
Owner

yinqiwen commented Dec 2, 2014

I checked rocksdb's code, the thread with write operation would be blocked until background compaction thread complete.
Is the compact-enable configured yes in ardb.conf? Disable it would let ardb not trigger compact task background automatically. (rocksdb would trigger the task too)

@yinqiwen
Copy link
Owner

yinqiwen commented Dec 2, 2014

And is the disk HDD? Rocksdb perform worse than leveldb on HDD disk, much better on SSD according our tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants