
Support to reload configuration file without killing process #6

Open

yashh opened this issue Jul 9, 2012 · 16 comments

yashh commented Jul 9, 2012

Having a signal like HUP or something to reload the configuration file will be really helpful since we do swap instances sometimes.

@manjuraj
Collaborator

This is a good idea.

The startup time of twemproxy is really small, especially when preconnect is not set. So, if your clients have retries built in, you can trivially propagate configuration changes by doing a rolling restart of the twemproxy instances. This is a good enough solution, imo.

@bmatheny

Would you be open to a patch for this? The problem we have is that if the server is busy enough, the ports for twemproxy may become used in the short restart window, causing twemproxy to fail to start. In that case we have to shut down apache, restart twemproxy, and start apache.

@manjuraj
Collaborator

> Would you be open to a patch for this?

I am always open to accepting patches :)

I just think that reloading of configuration file on-the-fly is tricky to get right.

> The problem we have is that if the server is busy enough, the ports for twemproxy may become used in the short restart window, causing twemproxy to fail to start. In that case we have to shut down apache, restart twemproxy, and start apache.

I am curious as to how this is happening. I set the SO_REUSEADDR option on twemproxy's listening socket, which means we would reuse the port even if it is busy (see the proxy_reuse() function in nc_proxy.c).

Could you paste the log file dump with the error that you are seeing?
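
For reference, proxy_reuse() amounts to roughly the following (a minimal sketch, not the actual twemproxy code): setting SO_REUSEADDR on the listening socket before bind() so the address can be rebound while old sockets sit in TIME_WAIT.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* illustration only: bind a listening socket with SO_REUSEADDR set,
 * so the address can be rebound while old sockets are in TIME_WAIT */
static int
listen_with_reuseaddr(uint16_t port)
{
    int sd, reuse = 1;
    struct sockaddr_in addr;

    sd = socket(AF_INET, SOCK_STREAM, 0);
    if (sd < 0) {
        return -1;
    }

    if (setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse)) < 0) {
        close(sd);
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(sd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(sd, 128) < 0) {
        close(sd);
        return -1;
    }

    return sd;
}
```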

@bmatheny

SO_REUSEADDR will only reuse a TIME_WAIT socket, not an ESTABLISHED one. If twemproxy is running on a busy web server (as a local proxy) and the port gets used, SO_REUSEADDR is no help.

I agree the on-the-fly reload is tricky; I'll put some thought into it.

@manjuraj
Collaborator

manjuraj commented Aug 2, 2012

> SO_REUSEADDR will only reuse a TIME_WAIT socket, not an ESTABLISHED one. If twemproxy is running on a busy web server (as a local proxy) and the port gets used, SO_REUSEADDR is no help.

Is there a test case that would reproduce this? I am unable to reproduce the scenario, even if I bombard twemproxy with a constant stream of traffic and restart it.

@manjuraj
Collaborator

manjuraj commented Aug 2, 2012

Blake, I wonder if this issue would go away if we set l_linger = 0 by calling nc_set_linger(sd, 0) on the listening sockets (see notes/socket.txt for details)?
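
For context, that call presumably boils down to something like the following (a sketch, not the actual twemproxy helper): enabling SO_LINGER with l_linger = 0 makes close() abortive, so the socket sends a RST and skips TIME_WAIT.

```c
#include <sys/socket.h>

/* sketch: abortive close -- close() sends RST and the socket skips TIME_WAIT */
static int
set_linger_zero(int sd)
{
    struct linger linger;

    linger.l_onoff = 1;   /* enable lingering */
    linger.l_linger = 0;  /* zero timeout: reset the connection on close */

    return setsockopt(sd, SOL_SOCKET, SO_LINGER, &linger, sizeof(linger));
}
```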

@grzegorzlyczba

@manjuraj I think that playing with SO_LINGER is not the right way to do this.
The best way to reload the configuration on the fly is to modify the internal configuration structures. Another solution is to handle requests in child processes: when the configuration changes, the master process can create new child process(es) with the new configuration and kill the old ones, as nginx does, for example.
An interesting option in Linux since 3.9 is SO_REUSEPORT - https://lwn.net/Articles/542629/
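
For illustration, the SO_REUSEPORT approach would look roughly like this (a sketch, not something twemproxy does today): with the option set on both the old and the new listening socket, a reloaded process can bind the same ip:port while the old one is still running, then the old one can exit whenever it has drained. Requires Linux >= 3.9.

```c
#include <sys/socket.h>

/* sketch: allow several processes to bind the same ip:port (Linux >= 3.9) */
static int
enable_reuseport(int sd)
{
    int one = 1;

#ifdef SO_REUSEPORT
    return setsockopt(sd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
#else
    (void)sd;
    (void)one;
    return -1;   /* option not available on this platform */
#endif
}
```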

@Roguelazer

As far as I'm concerned, this issue makes using twemproxy extremely impractical in production. The number of redis connections that are dropped in a few seconds of restart time is pretty crazy. Clients shouldn't have to have custom retry logic (never mind the number of backed-up connections over a couple-second restart window).

Would it be easier to do something like HAProxy (where a new process started with the -sf <pid> flag opens a unix domain socket with <pid> and transfers over the listening socket)? I could probably hack together a quick patch that does that...
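
The mechanism would be ordinary SCM_RIGHTS fd passing over a unix domain socket; roughly like the sketch below (error handling trimmed; this is not HAProxy's or twemproxy's actual code, just the general technique).

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* sketch: old process hands its listening fd to the new process */
static int
send_listen_fd(int unix_sd, int listen_fd)
{
    struct msghdr msg;
    struct cmsghdr *cmsg;
    struct iovec iov;
    char cbuf[CMSG_SPACE(sizeof(int))];
    char dummy = 'x';

    iov.iov_base = &dummy;   /* must send at least one byte of real data */
    iov.iov_len = 1;

    memset(&msg, 0, sizeof(msg));
    memset(cbuf, 0, sizeof(cbuf));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = sizeof(cbuf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;              /* we are passing an fd */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &listen_fd, sizeof(int));

    return sendmsg(unix_sd, &msg, 0) < 0 ? -1 : 0;
}
```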

@eklitzke

eklitzke commented Dec 3, 2013

This has become a fairly significant issue for us. We run a redis cluster with several hosts, and each host is running many instances of redis, each on a different port. The idea is that when a new service/thing needs to use redis, a new redis gets launched on a new port on each of the hosts in the cluster, and we use twemproxy to shard across the hosts in the pool. For instance, redis for thing1 is on port 12001, redis for thing2 is on port 12002, redis for thing3 is on port 12003. This provides different isolated redis pools, so that they can have different memory limits, for instance.

The problem is that our twemproxy config has dozens of these backend pools and is rapidly growing. Adding a new stanza (for a new service/redis port, one that doesn't affect any of the existing sections in the twemproxy config) requires a full restart of twemproxy due to the limitation described in this issue. Generally restarting twemproxy is very fast, but in some cases we've observed it to randomly be very slow. Having the client applications gracefully handle twemproxy being down is complicated by the fact that it's a heterogeneous environment with a number of different languages/libraries used to access redis/twemproxy.

The solution we've come up with in the interim is to have clients connect to an HAProxy instance on the same host as twemproxy, and the HAProxy then forwards the TCP connections to one of two twemproxies. When a configuration change is made, a shell script chooses one twemproxy, drains it of requests, updates its config, puts it back in the HAProxy backend, and then repeats for the other twemproxy. However, this is incredibly byzantine and adds extra latency due to the extra HAProxy process.

I just want to encourage you to think about this use case (and provide a possible solution to others frustrated by this issue) and hopefully provide some additional impetus to work on this problem.

@manjuraj
Collaborator

@eklitzke I feel your pain. Happy to accept a pull request if anyone wants to take a stab at this. The approach suggested by @Roguelazer sounds promising, though I haven't researched it.

When designing twemproxy, I did take into consideration dynamic configuration reload scenario. In fact all state in twemproxy hangs off the context object (https://github.com/twitter/twemproxy/blob/master/src/nc_core.h#L114). This was done to make configuration updates seamless. Updating configuration would involve replacing the existing context object with the new one. This replacement can also happen safely before/after the wait in the event loop (https://github.com/twitter/twemproxy/blob/master/src/nc_core.c#L316)

Anyway, at the high level, any solution to this problem would have to take the following scenarios into consideration:

  1. A brand new server pool is added to the configuration
  2. An existing server pool is updated. The update could involve either adding a server to the pool or removing a server from it
  3. An existing server pool is removed

From an implementation perspective, (1) and (3) are easy to achieve; (2) might be hairy.
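
To sketch the shape of that swap (illustrative only; context_create_from_config() and context_destroy() are hypothetical stand-ins for the real nc_core machinery, not existing functions): build the new context completely off to the side, and only replace the pointer between event-loop iterations, keeping the old context if parsing fails.

```c
#include <stdlib.h>

/* stand-in for the real struct context in nc_core.h */
struct context {
    /* server pools, stats, event base, ... */
    int placeholder;
};

/* hypothetical: parse conf_file and build a brand new context */
static struct context *
context_create_from_config(const char *conf_file)
{
    (void)conf_file;
    return calloc(1, sizeof(struct context));
}

static void
context_destroy(struct context *old)
{
    free(old);
}

/* called between event_wait() iterations, so no callback ever sees a
 * half-built context; on parse failure we keep running on the old config */
static struct context *
maybe_swap_context(struct context *cur, int reload_requested, const char *conf_file)
{
    struct context *fresh;

    if (!reload_requested) {
        return cur;
    }

    fresh = context_create_from_config(conf_file);
    if (fresh == NULL) {
        return cur;
    }

    context_destroy(cur);
    return fresh;
}
```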

idning added a commit to idning/twemproxy that referenced this issue May 25, 2014
today we got a core of twemproxy::

    $ gdb -c core.14420 ./bin/nutcracker

    (gdb) bt
    #0  0x000000302af2e2ed in raise () from /lib64/tls/libc.so.6
    #1  0x000000302af2fa3e in abort () from /lib64/tls/libc.so.6
    #2  0x0000000000419c82 in nc_assert (cond=0x444dc0 "!TAILQ_EMPTY(&send_msgq) && nsend != 0", file=0x444aa8 "nc_message.c", line=745, panic=1) at nc_util.c:308
    #3  0x000000000040d0d6 in msg_send_chain (ctx=0x553090, conn=0x55b380, msg=0x0) at nc_message.c:745
    #4  0x000000000040d568 in msg_send (ctx=0x553090, conn=0x55b380) at nc_message.c:820
    #5  0x00000000004059af in core_send (ctx=0x553090, conn=0x55b380) at nc_core.c:173
    #6  0x0000000000405ffe in core_core (arg=0x55b380, events=65280) at nc_core.c:301
    #7  0x0000000000429297 in event_wait (evb=0x5652e0, timeout=389) at nc_epoll.c:269
    #8  0x000000000040606f in core_loop (ctx=0x553090) at nc_core.c:316
    #9  0x000000000041b109 in nc_run (nci=0x7fbfffea80) at nc.c:530
    #10 0x000000000041b20d in main (argc=14, argv=0x7fbfffecc8) at nc.c:579
    (gdb) f 3
    #3  0x000000000040d0d6 in msg_send_chain (ctx=0x553090, conn=0x55b380, msg=0x0) at nc_message.c:745
    745         ASSERT(!TAILQ_EMPTY(&send_msgq) && nsend != 0);
    (gdb) l
    740             if (msg == NULL) {
    741                 break;
    742             }
    743         }
    744
    745         ASSERT(!TAILQ_EMPTY(&send_msgq) && nsend != 0);
    746
    747         conn->smsg = NULL;
    748
    749         n = conn_sendv(conn, &sendv, nsend);

it is caused by this ``ASSERT`` at nc_message.c:745.

``conn_send`` sends no more than ``NC_IOV_MAX`` (128) pieces in ``msg_send_chain``;

if the first fragment of a MULTI-DEL response is sent in the last batch, and it is the last msg in the send queue, the next call of ``msg_send_chain`` will get ``nsend == 0``.

the following case shows such a scenario:
1. mget on ``126`` keys
2. a multi-del cmd

::

    import redis

    def test_multi_delete_20140525():
        conn = redis.Redis('127.0.0.5', 4100)
        cnt = 126
        keys = ['key-%s' % i for i in range(cnt)]
        pipe = conn.pipeline(transaction=False)
        pipe.mget(keys)       # first command: mget on 126 keys
        pipe.delete(*keys)    # second command: a multi-key del
        print pipe.execute()

see: https://github.com/idning/test-twemproxy/blob/master/test_redis/test_del.py#L56-L63

more detail: http://idning.github.io/twemproxy-core-20140523.html
@zerobfd

zerobfd commented Aug 27, 2014

I'm curious as to whether something simple like this would work:

  1. Catch SIGHUP, make event
  2. Reparse config file, construct new context
  3. For each server pool in the contexts:
    • If new pool, add connections
    • If old pool is gone, mark connections for deletion
    • If current pool changed
      1. Compare servers in pool
      2. Add necessary connections
      3. Mark unnecessary connections
  4. Swap out contexts
  5. Close unneeded connections

Zero testing done yet, but I think this would allow the reload to fail at any point up until actually swapping out the contexts, while being able to keep running with the current config. The only possible issue I can think of is that this process takes so long that something times out (e.g. if someone has a very large ketama pool?), in which case the nginx-style parent/child model may work better.
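
A minimal sketch of step 1 ("Catch SIGHUP, make event"), assuming the handler only sets a flag and the reparse/compare/swap work then runs in the main loop between event_wait() calls (names here are illustrative, not twemproxy code):

```c
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t reload_pending;

/* async-signal-safe: just record that a reload was requested */
static void
sighup_handler(int sig)
{
    (void)sig;
    reload_pending = 1;
}

static void
install_sighup_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sighup_handler;
    sa.sa_flags = SA_RESTART;   /* don't interrupt event_wait() with EINTR */
    sigemptyset(&sa.sa_mask);

    sigaction(SIGHUP, &sa, NULL);
}
```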

@vlm

vlm commented Feb 6, 2015

I created a hot config reload patch which does not drop connections. I will release it as a pull request by the end of this week.

@bobrik

bobrik commented Feb 9, 2015

@vlm looking forward to it

@mckelvin
Contributor

mckelvin commented Feb 9, 2015

@vlm looking forward to it +1

@vlm

vlm commented Feb 12, 2015

@bobrik @mckelvin check out #321

@elukey

elukey commented Dec 23, 2016

Any news about the pull request? I can see tons of feedback on it but no update on this thread.

wmfgerrit pushed a commit to wikimedia/operations-puppet that referenced this issue Feb 7, 2017
Nutcracker is the daemon that runs on every mw host and lets them
connect to all the redis/memcached shards (object/session caches).
It doesn't support graceful config reload for the moment
(twitter/twemproxy#6), so every config
file change triggers puppet restart actions. This behavior might
not be ideal for all the use cases, so it is safer to perform
the action manually (maybe via salt) in a controlled way.

Bug: T155755
Change-Id: I224f689cdb9ed4c7f7c53905c8394bdecded7a6a