
Support to reload configuration file without killing process #6

Open

yashh opened this issue Jul 9, 2012 · 16 comments

yashh commented Jul 9, 2012

Having a signal like HUP or something to reload the configuration file will be really helpful since we do swap instances sometimes.

@manjuraj
Collaborator

This is a good idea.

The startup time of twemproxy is really small, especially when preconnect is not set. So, if your clients have retries built in, you can trivially propagate configuration changes by doing a rolling restart of the twemproxy instances. This is a good enough solution, imo.

@bmatheny

Would you be open to a patch for this? The problem we have is that if the server is busy enough, the ports for twemproxy may become used in the short restart window, causing twemproxy to fail to start. In that case we have to shut down apache, restart twemproxy, and start apache.

@manjuraj
Collaborator

> Would you be open to a patch for this?

I am always open to accepting patches :)

I just think that reloading of configuration file on-the-fly is tricky to get right.

> The problem we have is that if the server is busy enough, the ports for twemproxy may become used in the short restart window, causing twemproxy to fail to start. In that case we have to shut down apache, restart twemproxy, and start apache.

I am curious as to how this is happening. I set the SO_REUSEADDR option on twemproxy's listening socket, which means we would reuse the port even if it is busy (see the proxy_reuse() function in nc_proxy.c).

Could you paste the log file dump with the error that you are seeing?
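
For reference, proxy_reuse() amounts to roughly the following (a minimal sketch, not the actual twemproxy code): setting SO_REUSEADDR on the listening socket before bind() so the address can be rebound while old sockets sit in TIME_WAIT.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* illustration only: bind a listening socket with SO_REUSEADDR set,
 * so the address can be rebound while old sockets are in TIME_WAIT */
static int
listen_with_reuseaddr(uint16_t port)
{
    int sd, reuse = 1;
    struct sockaddr_in addr;

    sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd < 0) {
        return -1;
    }

    if (setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse)) < 0) {
        close(sd);
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(sd, 128) < 0) {
        close(sd);
        return -1;
    }

    return sd;
}
```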

@bmatheny

SO_REUSEADDR will only reuse a TIME_WAIT socket, not an ESTABLISHED one. If twemproxy is running on a busy web server (as a local proxy) and the port gets used, SO_REUSEADDR is no help.

I agree the on-the-fly reload is tricky; I'll put some thought into it.

@manjuraj
Collaborator

manjuraj commented Aug 2, 2012

> SO_REUSEADDR will only reuse a TIME_WAIT socket, not an ESTABLISHED one. If twemproxy is running on a busy web server (as a local proxy) and the port gets used, SO_REUSEADDR is no help.

Is there a test case that would reproduce this? I am unable to reproduce the scenario, even if I bombard twemproxy with a constant stream of traffic and restart it.

@manjuraj
Collaborator

manjuraj commented Aug 2, 2012

Blake, I wonder if this issue would go away if we set l_linger = 0 by calling nc_set_linger(sd, 0) on the listening sockets (see notes/socket.txt for details)?
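
For context, that call presumably boils down to something like the following (a sketch, not the actual twemproxy helper): enabling SO_LINGER with l_linger = 0 makes close() abortive, so the socket sends a RST and skips TIME_WAIT.

```c
#include <sys/socket.h>

/* sketch: abortive close -- close() sends RST and the socket skips TIME_WAIT */
static int
set_linger_zero(int sd)
{
    struct linger linger;

    linger.l_onoff = 1;   /* enable lingering */
    linger.l_linger = 0;  /* zero timeout: reset the connection on close */

    return setsockopt(sd, SOL_SOCKET, SO_LINGER, &linger, sizeof(linger));
}
```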

@grzegorzlyczba

@manjuraj I think that playing with SO_LINGER is not the right way to do this.
The best way to reload the configuration on the fly is to modify the internal configuration structures. Another solution is to handle requests in child processes: when the configuration changes, the master process can create new child process(es) with the new configuration and kill the old ones, as nginx does, for example.
An interesting option in Linux since 3.9 is SO_REUSEPORT - https://lwn.net/Articles/542629/
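
For illustration, the SO_REUSEPORT approach would look roughly like this (a sketch, not something twemproxy does today): with the option set on both the old and the new listening socket, a reloaded process can bind the same ip:port while the old one is still running, then the old one can exit whenever it has drained. Requires Linux >= 3.9.

```c
#include <sys/socket.h>

/* sketch: allow several processes to bind the same ip:port (Linux >= 3.9) */
static int
enable_reuseport(int sd)
{
    int one = 1;

#ifdef SO_REUSEPORT
    return setsockopt(sd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
#else
    (void)sd;
    (void)one;
    return -1;   /* option not available on this platform */
#endif
}
```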

@Roguelazer

As far as I'm concerned, this issue makes using twemproxy extremely impractical in production. The number of redis connections that are dropped in a few seconds of restart time is pretty crazy. Clients shouldn't have to have custom retry logic (never mind the number of backed-up connections over a couple-second restart window).

Would it be easier to do something like HAProxy (where a new process started with the -sf <pid> flag opens a unix domain socket with <pid> and transfers over the listening socket)? I could probably hack together a quick patch that does that...
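
The mechanism would be ordinary SCM_RIGHTS fd passing over a unix domain socket; roughly like the sketch below (error handling trimmed; this is not HAProxy's or twemproxy's actual code, just the general technique).

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* sketch: old process hands its listening fd to the new process */
static int
send_listen_fd(int unix_sd, int listen_fd)
{
    struct msghdr msg;
    struct cmsghdr *cmsg;
    struct iovec iov;
    char cbuf[CMSG_SPACE(sizeof(int))];
    char dummy = 'x';

    iov.iov_base = &dummy;   /* must send at least one byte of real data */
    iov.iov_len = 1;

    memset(&msg, 0, sizeof(msg));
    memset(cbuf, 0, sizeof(cbuf));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = sizeof(cbuf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;              /* we are passing an fd */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &listen_fd, sizeof(int));

    return sendmsg(unix_sd, &msg, 0) < 0 ? -1 : 0;
}
```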

@eklitzke

eklitzke commented Dec 3, 2013

This has become a fairly significant issue for us. We run a redis cluster with several hosts, and each host is running many instances of redis, each on a different port. The idea is that when a new service/thing needs to use redis, a new redis gets launched on a new port on each of the hosts in the cluster, and we use twemproxy to shard across the hosts in the pool. For instance, redis for thing1 is on port 12001, redis for thing2 is on port 12002, redis for thing3 is on port 12003. This provides different isolated redis pools, so that they can have different memory limits, for instance.

The problem is that our twemproxy config has dozens of these backend pools and is rapidly growing. Adding a new stanza (for a new service/redis port, one that doesn't affect any of the existing sections in the twemproxy config) requires a full restart of twemproxy due to the limitation described in this issue. Generally restarting twemproxy is very fast, but in some cases we've observed it to randomly be very slow. Having the client applications gracefully handle twemproxy being down is complicated by the fact that it's a heterogeneous environment with a number of different languages/libraries used to access redis/twemproxy.

The solution we've come up with in the interim is to have clients connect to an HAProxy instance on the same host as twemproxy, and the HAProxy then forwards the TCP connections to one of two twemproxies. When a configuration change is made, a shell script chooses one twemproxy, drains it of requests, updates its config, puts it back in the HAProxy backend, and then repeats for the other twemproxy. However, this is incredibly byzantine and adds extra latency due to the extra HAProxy process.

I just want to encourage you to think about this use case (and provide a possible solution to others frustrated by this issue) and hopefully provide some additional impetus to work on this problem.

@manjuraj
Collaborator

@eklitzke I feel your pain. Happy to accept a pull request if anyone wants to take a stab at this. The approach suggested by @Roguelazer sounds promising, though I haven't researched it.

When designing twemproxy, I did take into consideration dynamic configuration reload scenario. In fact all state in twemproxy hangs off the context object (https://github.com/twitter/twemproxy/blob/master/src/nc_core.h#L114). This was done to make configuration updates seamless. Updating configuration would involve replacing the existing context object with the new one. This replacement can also happen safely before/after the wait in the event loop (https://github.com/twitter/twemproxy/blob/master/src/nc_core.c#L316)

Anyway, at the high level, any solution to this problem would have to take the following scenarios into consideration:

  1. A brand new server pool is added to the configuration
  2. An existing server pool is updated. The update could involve either adding a server to the pool or removing a server from it
  3. An existing server pool is removed

From an implementation perspective, (1) and (3) are easy to achieve; (2) might be hairy.
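
To sketch the shape of that swap (illustrative only; context_create_from_config() and context_destroy() are hypothetical stand-ins for the real nc_core machinery, not existing functions): build the new context completely off to the side, and only replace the pointer between event-loop iterations, keeping the old context if parsing fails.

```c
#include <stdlib.h>

/* stand-in for the real struct context in nc_core.h */
struct context {
    /* server pools, stats, event base, ... */
    int placeholder;
};

/* hypothetical: parse conf_file and build a brand new context */
static struct context *
context_create_from_config(const char *conf_file)
{
    (void)conf_file;
    return calloc(1, sizeof(struct context));
}

static void
context_destroy(struct context *old)
{
    free(old);
}

/* called between event_wait() iterations, so no callback ever sees a
 * half-built context; on parse failure we keep running on the old config */
static struct context *
maybe_swap_context(struct context *cur, int reload_requested, const char *conf_file)
{
    struct context *fresh;

    if (!reload_requested) {
        return cur;
    }

    fresh = context_create_from_config(conf_file);
    if (fresh == NULL) {
        return cur;
    }

    context_destroy(cur);
    return fresh;
}
```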

idning added a commit to idning/twemproxy that referenced this issue May 25, 2014
today we got a core of twemproxy::

    $ gdb -c core.14420 ./bin/nutcracker

    (gdb) bt
    #0  0x000000302af2e2ed in raise () from /lib64/tls/libc.so.6
    #1  0x000000302af2fa3e in abort () from /lib64/tls/libc.so.6
    #2  0x0000000000419c82 in nc_assert (cond=0x444dc0 "!TAILQ_EMPTY(&send_msgq) && nsend != 0", file=0x444aa8 "nc_message.c", line=745, panic=1) at nc_util.c:308
    #3  0x000000000040d0d6 in msg_send_chain (ctx=0x553090, conn=0x55b380, msg=0x0) at nc_message.c:745
    #4  0x000000000040d568 in msg_send (ctx=0x553090, conn=0x55b380) at nc_message.c:820
    #5  0x00000000004059af in core_send (ctx=0x553090, conn=0x55b380) at nc_core.c:173
    #6  0x0000000000405ffe in core_core (arg=0x55b380, events=65280) at nc_core.c:301
    #7  0x0000000000429297 in event_wait (evb=0x5652e0, timeout=389) at nc_epoll.c:269
    #8  0x000000000040606f in core_loop (ctx=0x553090) at nc_core.c:316
    #9  0x000000000041b109 in nc_run (nci=0x7fbfffea80) at nc.c:530
    #10 0x000000000041b20d in main (argc=14, argv=0x7fbfffecc8) at nc.c:579
    (gdb) f 3
    #3  0x000000000040d0d6 in msg_send_chain (ctx=0x553090, conn=0x55b380, msg=0x0) at nc_message.c:745
    745         ASSERT(!TAILQ_EMPTY(&send_msgq) && nsend != 0);
    (gdb) l
    740             if (msg == NULL) {
    741                 break;
    742             }
    743         }
    744
    745         ASSERT(!TAILQ_EMPTY(&send_msgq) && nsend != 0);
    746
    747         conn->smsg = NULL;
    748
    749         n = conn_sendv(conn, &sendv, nsend);

it is caused by this ``ASSERT`` at nc_message.c:745.

``conn_send`` sends no more than ``NC_IOV_MAX`` (128) pieces in ``msg_send_chain``;

if the first fragment of a MULTI-DEL response is sent in the last batch, and it is the last msg in the send queue, the next call of ``msg_send_chain`` will get ``nsend == 0``.

the following case shows such a scenario:
1. mget on ``126`` keys
2. a multi-del cmd

::

    import redis

    def test_multi_delete_20140525():
        conn = redis.Redis('127.0.0.5', 4100)
        cnt = 126
        keys = ['key-%s' % i for i in range(cnt)]
        pipe = conn.pipeline(transaction=False)
        pipe.mget(keys)       # first command: mget on 126 keys
        pipe.delete(*keys)    # second command: a multi-key del
        print pipe.execute()

see: https://github.com/idning/test-twemproxy/blob/master/test_redis/test_del.py#L56-L63

more detail: http://idning.github.io/twemproxy-core-20140523.html
@zerobfd

zerobfd commented Aug 27, 2014

I'm curious as to whether something simple like this would work:

  1. Catch SIGHUP, make event
  2. Reparse config file, construct new context
  3. For each server pool in the contexts:
    • If new pool, add connections
    • If old pool is gone, mark connections for deletion
    • If current pool changed
      1. Compare servers in pool
      2. Add necessary connections
      3. Mark unnecessary connections
  4. Swap out contexts
  5. Close unneeded connections

Zero testing done yet, but I think this would allow the reload to fail at any point up until actually swapping out the contexts, while being able to keep running with the current config. The only possible issue I can think of is that this process takes so long that something times out (e.g. if someone has a very large ketama pool?), in which case the nginx-style parent/child model may work better.
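
A minimal sketch of step 1 ("Catch SIGHUP, make event"), assuming the handler only sets a flag and the reparse/compare/swap work then runs in the main loop between event_wait() calls (names here are illustrative, not twemproxy code):

```c
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t reload_pending;

/* async-signal-safe: just record that a reload was requested */
static void
sighup_handler(int sig)
{
    (void)sig;
    reload_pending = 1;
}

static void
install_sighup_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sighup_handler;
    sa.sa_flags = SA_RESTART;   /* don't interrupt event_wait() with EINTR */
    sigemptyset(&sa.sa_mask);

    sigaction(SIGHUP, &sa, NULL);
}
```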

@vlm

vlm commented Feb 6, 2015

I created a hot config reload patch which does not drop connections. I will release it as a pull request by the end of this week.

@bobrik

bobrik commented Feb 9, 2015

@vlm looking forward to it

@mckelvin
Contributor

mckelvin commented Feb 9, 2015

@vlm looking forward to it +1

@vlm

vlm commented Feb 12, 2015

@bobrik @mckelvin check out #321

@elukey

elukey commented Dec 23, 2016

Any news about the pull request? I can see tons of feedback on it but no update on this thread.

wmfgerrit pushed a commit to wikimedia/operations-puppet that referenced this issue Feb 7, 2017
Nutcracker is the daemon that runs on every mw host and lets them
connect to all the redis/memcached shards (object/session caches).
It doesn't support graceful config reload for the moment
(twitter/twemproxy#6), so every config
file change triggers puppet restart actions. This behavior might
not be ideal for all the use cases, so it is safer to perform
the action manually (maybe via salt) in a controlled way.

Bug: T155755
Change-Id: I224f689cdb9ed4c7f7c53905c8394bdecded7a6a