Support to reload configuration file without killing process #6
This is a good idea. The start-up time of twemproxy is really small, especially when preconnect is not set. So, if your clients have retries built in, then you can trivially propagate configuration changes by doing rolling restarts of twemproxies. This is a good enough solution, imo.
Would you be open to a patch for this? The problem we have is that if the server is busy enough, the ports for twem may become used in the short restart window, causing twem to not start. In that case we have to shut down apache, restart twem, and start apache.
I just think that reloading the configuration file on the fly is tricky to get right.
I am curious as to how this is happening. I set the SO_REUSEADDR option on twemproxy's listening socket, which means that we would reuse ports even if they are busy (see the proxy_reuse() function in nc_proxy.c). Could you paste me the log file dump with the error that you are seeing?
SO_REUSEADDR will only reuse a TIME_WAIT socket, not an ESTABLISHED socket. If twem is running on a busy web server (as a local proxy) and the port gets used, reuseaddr is no help. I agree the on-the-fly reload is tricky; I'll put some thought into it.
Is there a test case that would reproduce this scenario? I am unable to reproduce it, even when I bombard twemproxy with a constant stream of traffic and restart it.
blake, I wonder if this issue would go away if we set l_linger = 0 by calling nc_set_linger(sd, 0) on the listening sockets (see notes/socket.txt for details on it)?
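For context, here is a minimal sketch (plain POSIX sockets, not twemproxy's actual helpers) of what these two options do on a listening socket; the function name is made up for illustration:

```c
#include <sys/socket.h>
#include <netinet/in.h>

/* Sketch only. SO_REUSEADDR lets bind() succeed while old sockets sit in
 * TIME_WAIT; SO_LINGER with l_onoff = 1 and l_linger = 0 makes close()
 * drop the socket immediately (RST) instead of lingering, which is the
 * effect nc_set_linger(sd, 0) is being asked about above. */
static int
set_reuse_and_linger(int sd)
{
    int reuse = 1;
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };

    if (setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse)) < 0) {
        return -1;
    }
    return setsockopt(sd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```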
@manjuraj I think that playing with SO_LINGER is not the right way to do this.
As far as I'm concerned, this issue makes using twemproxy extremely impractical in production. The number of redis connections that are dropped in a few seconds of restart time is pretty crazy, and clients shouldn't have to have custom retry logic (never mind the number of backed-up connections over a couple-second restart window). Would it be easier to do something like HAProxy, where a new process is started that takes over the listening sockets while the old process finishes its existing connections and exits?
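Not twemproxy code, but a minimal sketch of the kind of handover being suggested, assuming a plain POSIX listener; the LISTEN_FD environment variable and both function names are made up for illustration:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* The new process inherits the already-bound listening fd instead of
 * calling bind(), so no connection attempt is refused during the switch;
 * the old process would then stop accepting and drain (not shown). */
static int
get_or_inherit_listener(uint16_t port)
{
    const char *inherited = getenv("LISTEN_FD");
    if (inherited != NULL) {
        return atoi(inherited);              /* fd stays open across exec() */
    }

    int sd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa = { 0 };
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(sd, (struct sockaddr *)&sa, sizeof(sa));
    listen(sd, 128);
    return sd;
}

static void
start_new_process_with_listener(int sd, char *const argv[])
{
    if (fork() == 0) {
        char buf[16];
        snprintf(buf, sizeof(buf), "%d", sd);
        setenv("LISTEN_FD", buf, 1);         /* hand the fd to the child */
        execv(argv[0], argv);                /* child re-reads the new config */
        _exit(1);                            /* exec failed */
    }
}
```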
This has become a fairly significant issue for us. We run a redis cluster with several hosts, and each host is running many instances of redis, each on a different port. The idea is that when a new service/thing needs to use redis, a new redis gets launched on a new port on each of the hosts in the cluster, and we use twemproxy to shard across the hosts in the pool. For instance, redis for thing1 is on port 12001, redis for thing2 is on port 12002, redis for thing3 is on port 12003. This provides different isolated redis pools, so that they can have different memory limits, for instance.

The problem is that our twemproxy config has dozens of these backend pools and is rapidly growing. Adding a new stanza (for a new service/redis port, one that doesn't affect any of the existing sections in the twemproxy config) requires a full restart of twemproxy due to the limitation described in this issue. Generally restarting twemproxy is very fast, but in some cases we've observed it to randomly be very slow. Having the client applications gracefully handle twemproxy being down is complicated by the fact that it's a heterogeneous environment with a number of different languages/libraries used to access redis/twemproxy.

The solution we've come up with in the interim is to have clients connect to an HAProxy instance on the same host as twemproxy, and HAProxy then forwards the TCP connections to one of two twemproxies. When a configuration change is made, a shell script chooses one twemproxy, drains it of requests, updates its config, puts it back in the HAProxy backend, and then repeats for the other twemproxy. However, this is incredibly byzantine and adds extra latency due to the extra HAProxy process.

I just want to encourage you to think about this use case (and provide a possible solution to others frustrated by this issue) and hopefully provide some additional impetus to work on this problem.
@eklitzke I feel your pain. Happy to accept a pull request if anyone wants to take a stab at this. The approach suggested by @Roguelazer sounds promising, though I haven't researched it.

When designing twemproxy, I did take the dynamic configuration reload scenario into consideration. In fact, all state in twemproxy hangs off the context object (https://github.com/twitter/twemproxy/blob/master/src/nc_core.h#L114). This was done to make configuration updates seamless. Updating configuration would involve replacing the existing context object with the new one. This replacement can also happen safely before/after the wait in the event loop (https://github.com/twitter/twemproxy/blob/master/src/nc_core.c#L316).

Anyway, at a high level, any solution to this problem would have to take the following scenarios into consideration:
From an implementation perspective, (1) and (3) are easy to achieve; (2) might be hairy.
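A hypothetical sketch of the context-swap approach described above; the struct and function bodies here are illustrative stand-ins, not twemproxy's actual code. A SIGHUP handler only sets a flag, and the event loop builds a fresh context and swaps it in between iterations, so a bad config file leaves the old context untouched:

```c
#include <signal.h>
#include <stdlib.h>

struct context {                  /* stand-in for twemproxy's struct context */
    const char *conf_file;
    /* server pools, connections, stats, event base ... */
};

static volatile sig_atomic_t reload_requested;

static void
on_sighup(int sig)
{
    (void)sig;
    reload_requested = 1;
}

static struct context *
context_create(const char *conf_file)
{
    /* parse conf_file, resolve servers, build pools; return NULL on error */
    struct context *ctx = calloc(1, sizeof(*ctx));
    if (ctx != NULL) {
        ctx->conf_file = conf_file;
    }
    return ctx;
}

static void
context_destroy(struct context *ctx)
{
    /* drain or migrate in-flight requests, then free everything */
    free(ctx);
}

static void
core_loop_sketch(const char *conf_file)
{
    struct context *ctx = context_create(conf_file);

    signal(SIGHUP, on_sighup);
    for (;;) {
        if (reload_requested) {
            reload_requested = 0;
            struct context *fresh = context_create(conf_file);
            if (fresh != NULL) {          /* swap only if the new config loaded */
                context_destroy(ctx);
                ctx = fresh;
            }
            /* on failure, keep serving with the old context */
        }
        /* event_wait(...) and dispatch against ctx would go here */
    }
}
```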
Today we got a core of twemproxy:

```
$ gdb -c core.14420 ./bin/nutcracker
(gdb) bt
#0  0x000000302af2e2ed in raise () from /lib64/tls/libc.so.6
#1  0x000000302af2fa3e in abort () from /lib64/tls/libc.so.6
#2  0x0000000000419c82 in nc_assert (cond=0x444dc0 "!TAILQ_EMPTY(&send_msgq) && nsend != 0", file=0x444aa8 "nc_message.c", line=745, panic=1) at nc_util.c:308
#3  0x000000000040d0d6 in msg_send_chain (ctx=0x553090, conn=0x55b380, msg=0x0) at nc_message.c:745
#4  0x000000000040d568 in msg_send (ctx=0x553090, conn=0x55b380) at nc_message.c:820
#5  0x00000000004059af in core_send (ctx=0x553090, conn=0x55b380) at nc_core.c:173
#6  0x0000000000405ffe in core_core (arg=0x55b380, events=65280) at nc_core.c:301
#7  0x0000000000429297 in event_wait (evb=0x5652e0, timeout=389) at nc_epoll.c:269
#8  0x000000000040606f in core_loop (ctx=0x553090) at nc_core.c:316
#9  0x000000000041b109 in nc_run (nci=0x7fbfffea80) at nc.c:530
#10 0x000000000041b20d in main (argc=14, argv=0x7fbfffecc8) at nc.c:579
(gdb) f 3
#3  0x000000000040d0d6 in msg_send_chain (ctx=0x553090, conn=0x55b380, msg=0x0) at nc_message.c:745
745         ASSERT(!TAILQ_EMPTY(&send_msgq) && nsend != 0);
(gdb) l
740             if (msg == NULL) {
741                 break;
742             }
743         }
744
745         ASSERT(!TAILQ_EMPTY(&send_msgq) && nsend != 0);
746
747         conn->smsg = NULL;
748
749         n = conn_sendv(conn, &sendv, nsend);
```

It is caused by this ``ASSERT`` at nc_message.c:745. ``msg_send_chain`` sends no more than ``NC_IOV_MAX`` (128) pieces per call; if the first fragment of a MULTI-DEL response is sent in the last batch and it is the last msg in the send queue, the next call of ``msg_send_chain`` will get ``nsend == 0``.

The following test shows such a case:

1. mget on ``126`` keys
2. a multi-del cmd:

```python
import redis

def test_multi_delete_20140525():
    conn = redis.Redis('127.0.0.5', 4100)
    cnt = 126
    keys = ['key-%s' % i for i in range(cnt)]
    pipe = conn.pipeline(transaction=False)
    pipe.mget(keys)
    pipe.delete(*keys)
    print pipe.execute()
```

see: https://github.com/idning/test-twemproxy/blob/master/test_redis/test_del.py#L56-L63
more detail: http://idning.github.io/twemproxy-core-20140523.html
I'm curious as to whether something simple like this would work:
Zero testing done yet, but I think this would allow the reload to fail at any point, up until actually swapping out the contexts, while keeping the process running with the current config. The only possible issue I can think of is that this process takes so long that something times out (e.g. if someone has a very large ketama pool?), in which case the nginx-style parent/child model may work better.
I created a hot config reload patch which does not drop connections. I will release it in the form of a pull request by the end of this week.
@vlm looking forward to it
> @vlm looking forward to it

+1
Any news about the pull request? I can see tons of feedback on it but no update on this thread.
Nutcracker is the daemon that runs on every mw host and lets them connect to all the redis/memcached shards (object/session caches). It doesn't support graceful config reload for the moment (twitter/twemproxy#6), so every config file change triggers puppet restart actions. This behavior might not be ideal for all the use cases, so it is safer to perform the action manually (maybe via salt) in a controlled way.

Bug: T155755
Change-Id: I224f689cdb9ed4c7f7c53905c8394bdecded7a6a
Having a signal like HUP or something to reload the configuration file would be really helpful, since we do swap instances sometimes.