twemproxy work with redis-sentinel #324

Open
andyqzb wants to merge 36 commits into master
Conversation

andyqzb
Contributor

@andyqzb andyqzb commented Mar 2, 2015

Hi @manjuraj, @idning, I made a patch that lets twemproxy work with redis-sentinel.
The patch implements the design discussed in issue 297.

In addition, I added TCP keepalive to the sentinel pub/sub connection. Because the sentinel pub/sub connection has no heartbeat, we need keepalive to detect a dead connection. The implementation of nc_set_keepalive is copied from redis.
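For context, enabling TCP keepalive on a socket looks roughly like the sketch below; it is modeled on redis's anetKeepAlive and is purely illustrative, not the patch's actual nc_set_keepalive:

/* Illustrative sketch only (not the patch's nc_set_keepalive): enable keepalive
 * probing on a socket, modeled on redis's anetKeepAlive. Interval values are examples. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static int
set_keepalive(int fd, int interval)
{
    int yes = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &yes, sizeof(yes)) != 0) {
        return -1;
    }
#ifdef TCP_KEEPIDLE
    /* start probing after `interval` seconds of idleness */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &interval, sizeof(interval)) != 0) {
        return -1;
    }
#endif
#ifdef TCP_KEEPINTVL
    /* then probe every interval/3 seconds ... */
    int val = interval / 3;
    if (val == 0) {
        val = 1;
    }
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &val, sizeof(val)) != 0) {
        return -1;
    }
#endif
#ifdef TCP_KEEPCNT
    /* ... and give up after 3 unanswered probes */
    int cnt = 3;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) != 0) {
        return -1;
    }
#endif
    return 0;
}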

You can configure a server_pool like sigma in conf/nutcracker.yml to use this feature.

I hope to hear your feedback about this patch.

servers:
- 127.0.0.1:6379:1 server1
- 127.0.0.1:6380:1 server2
sentinels:
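For context, the sigma pool in conf/nutcracker.yml has roughly this shape; the single sentinel entry here is illustrative (it assumes the default sentinel port), and its weight field is not used for sentinels:

sigma:
  listen: 127.0.0.1:22125
  hash: fnv1a_64
  distribution: ketama
  auto_eject_hosts: false
  redis: true
  server_retry_timeout: 2000
  server_failure_limit: 1
  servers:
   - 127.0.0.1:6379:1 server1
   - 127.0.0.1:6380:1 server2
  sentinels:
   - 127.0.0.1:26379:1   # illustrative sentinel address; the weight field is ignored for sentinels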

nit: don't you need three sentinels for quorum in this example?

Contributor Author

@mishan You can configure any number of sentinels, no matter how many you actually run, because twemproxy only needs to connect to one of them to fetch the redis addresses.

Of course, configuring all of the sentinels you use is better: if some of them are down, twemproxy can still connect to one that is alive.

Contributor Author

To avoid confusion, I added one sentinel address to the example configuration.


There is actually a good reason to connect to multiple Sentinels. If the Sentinel you are connected to is on the minority side of a network partition, and the original master is in that partition, it could report its "old" master. By querying all of the Sentinels you would detect this and route to the majority's master.

Contributor Author

@therealbill In a partition, if twemproxy can only reach some of the sentinels, and some report the new master while others report the old one, twemproxy cannot tell which is right when neither side holds more than half of the instances. Besides, sentinel itself doesn't work well under a network partition, so I don't think handling network partitions makes much sense here.


On Mar 24, 2015, at 10:48 PM, andy notifications@github.com wrote:

In conf/nutcracker.yml #324 (comment):

@@ -65,3 +65,18 @@ omega:
   servers:
    - 127.0.0.1:11214:100000
    - 127.0.0.1:11215:1
+
+sigma:
+  listen: 127.0.0.1:22125
+  hash: fnv1a_64
+  distribution: ketama
+  auto_eject_hosts: false
+  redis: true
+  server_retry_timeout: 2000
+  server_failure_limit: 1
+  servers:
+   - 127.0.0.1:6379:1 server1
+   - 127.0.0.1:6380:1 server2
+  sentinels:

@therealbill https://github.com/TheRealBill In a partition, if twemproxy can only reach some of the sentinels, and some report the new master while others report the old one, twemproxy cannot tell which is right when neither side holds more than half of the instances. Besides, sentinel itself doesn't work well under a network partition, so I don't think handling network partitions is needed.

I think it is quite the contrary. If Twemproxy can't verify the current master, it should refuse to proxy commands for that server. If you've only got one sentinel you can communicate with, it would be wise to assume the current master is no longer the master, or at minimum to cautiously refuse connections for that slot. That is preferable to having data go to multiple places. One of the big benefits of using Twemproxy+sentinel is the ability, if done correctly, to make split-brain scenarios something the clients are immune to. I've done it outside of Twemproxy, so it is certainly doable - and not terribly complicated.

Conceptually the best place to run those sentinels is the Twemproxy nodes themselves. However, that isn't always possible. As such, the network between Sentinels and Sentinels+Redis can partition while twemproxy still sees the whole picture. Regardless, by having the proxy not accept commands (or at minimum not accept write commands) when the results aren't confirmable, you remove split-brain by nipping it at the very point it can happen - you stop proxying that node/slot.

The design of sentinel works fine in network partitions; it is the client which needs to be informed of changes, and Twemproxy is functionally the client in this scenario. Ideally, Twemproxy would subscribe to each sentinel's master-change event channel.[1] Then, on a master-change event, it would update that slot's master setting and proxy to the new master. By only checking one sentinel you can't reliably handle that case - and even when you do get the information, you don't get it as fast. By checking each of the sentinel servers you will either know you can't communicate and can fail into a deny-to-be-safe mode, or catch the event from the majority's side and know you need to reconfigure even though the other sentinel(s) haven't caught the change yet. Incidentally, there is a lag between the failover and the non-leader Sentinels, so even outside of a network split the proper route is to catch it from the leader.
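For reference, the event channel referred to here is Sentinel's +switch-master channel; a quick way to watch it (the port is illustrative, the message format is per the Sentinel docs):

redis-cli -p 26379 SUBSCRIBE +switch-master
# each event message is: <master-name> <old-ip> <old-port> <new-ip> <new-port>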

Ideally you want these actions to be idempotent as well. In other words, if you get a change notification multiple times, or the current poll reports a "new master" that is the same as the last-updated master, don't make changes. I suspect you've already done that, but sometimes people miss it, so I mention it solely out of that possibility. But always communicate with more than just one sentinel.

Perhaps to do this we could either:

  1. specify a quorum of sentinels to communicate with
  2. pull the list of known sentinels from sentinel and use that discovery to learn of and communicate with all of the other sentinels.
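The second option maps directly onto Sentinel's own commands; for instance (host, port, and master name below are illustrative):

redis-cli -h 127.0.0.1 -p 26379 SENTINEL get-master-addr-by-name server1   # current master ip/port for that pod
redis-cli -h 127.0.0.1 -p 26379 SENTINEL sentinels server1                 # the other sentinels monitoring that pod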

On the idea of not listing servers, perhaps we could do something like:

Where server name is used by twemproxy for slot identification, but podname is the name you pass to sentinel to get the master & slaves (remember, a sentinel can manage multiple Redis master/slave combos). Then you can do all discovery via sentinel.
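Purely as a hypothetical sketch of that pairing (the layout and names are assumptions for illustration, not the original example):

servers:
 - server1:pod1    # server1 = twemproxy's slot name, pod1 = the master name registered in sentinel
 - server2:pod2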

On a related note, I'm not seeing which code maps a given pod ("server" in Twemproxy parlance) to a given set of Sentinels. Each pod (server) may be managed by a different sentinel pool. I've also not found where we allow you to use a pod-name other than the server name in the servers section. We should cover that as well. Perhaps something along the lines of (assuming we do not do full discovery via sentinel):
sentinel: true
servers:

Or (better, IMO):
sentinel: true
servers:


sentinels:
IP:PORT:constellation1
IP:PORT:constellation1
IP:PORT:constellation1
IP:PORT:constellation2
IP:PORT:constellation2
IP:PORT:constellation2

Where constellation = "the group of sentinels managing a given pod". This latter option could also work well for full discovery via sentinel. I also think it looks the cleanest and most readable. We might consider quoting pod-name; IIRC you can use pretty much any ascii character other than a space in a sentinel name.
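A hypothetical, filled-in rendering of that latter option (every address and name below is an illustrative assumption, not from the original proposal):

sentinel: true
servers:
 - 127.0.0.1:6379:1 pod1
 - 127.0.0.1:6380:1 pod2
sentinels:
 - 10.0.0.1:26379:constellation1   # constellation1 = the sentinels managing pod1
 - 10.0.0.2:26379:constellation1
 - 10.0.0.3:26379:constellation1
 - 10.0.1.1:26379:constellation2   # constellation2 = the sentinels managing pod2
 - 10.0.1.2:26379:constellation2
 - 10.0.1.3:26379:constellation2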

Just throwing out some ideas to better express what I find missing, or ways we can do it even better. To say I use sentinel heavily might be a bit of an understatement.

I see we currently have IP:port:weight for sentinels, but I hope we aren’t using weight at all - ideally we should not have it at all to avoid confusion.

It's been a long day, and I'll have more time over the next week or two to really dig into this. I apologize for not getting to it sooner, and am really glad to see someone taking the first steps toward what I maintain would be a significant improvement to Twemproxy! So thanks for getting this ball rolling. :D

Cheers,
Bill

  1. OK, Ideally I’d love to be able to fire twemproxy up unconfigured, then configure it entirely via API calls the same way you can w/Redis and Sentinel.

P.S. I’ll probably work on or find someone to work on adding Red Skull (http://redskull.io) connectivity as an alternative to Sentinel connectivity to simplify things - but not until I’ve added non-JSON API interfaces to Red Skull.

Contributor Author

@therealbill Thanks for your suggestions.
On the first suggestion, that twemproxy connect to all the sentinels: I think we shouldn't make twemproxy too complicated. It's just a client which supports sentinels. The problems under a network partition should be solved by sentinel. For example, if the minority of sentinels refused to serve clients, twemproxy would keep trying until it successfully connects to a sentinel in the majority partition. Doing this in sentinel is simpler than doing it in twemproxy.
The podname suggestion is a good idea. I have thought about making the master name in sentinel a compound like poolname-servername, so that servers from different pools can all be managed by the same sentinel. But I don't think specifying a different sentinel group for each server is needed; I doubt anyone wants to put one pool's servers into different sentinel groups.
The weight in the sentinel config is not used; I just wanted to reuse the server-config loading code when loading the sentinel config. Modifying the code to drop the weight for sentinels would be better.


On Mar 25, 2015, at 12:14 PM, andy notifications@github.com wrote:

@therealbill https://github.com/TheRealBill Thanks for your suggestions.
On the first suggestion, that twemproxy connect to all the sentinels: I think we shouldn't make twemproxy too complicated. It's just a client which supports sentinels.

Not quite - it is a client of sentinel. Thus it should be following the guidelines at http://redis.io/topics/sentinel-clients

The problems under a network partition should be solved by sentinel. For example, if the minority of sentinels refused to serve clients, twemproxy would keep trying until it successfully connects to a sentinel in the majority partition.

How does it determine whether the sentinel it connects to is in a minority? To do that, it would have to compare master settings across the sentinel constellation. You can't just ask a sentinel whether it is in a minority network partition. You also can't query the "known sentinels" info for the pod, because that isn't updated except on new sentinel discovery and pod resets.

If a net split does occur and the original master is in the minority partition, twemproxy will still be connected to it. When sentinel initiates a failover normally, it sends a client kill to the old master (if available) to disconnect existing connections; in this scenario, however, that won't disconnect twemproxy. Thus Twemproxy needs to be checking/monitoring for failovers and updating/reconnecting as appropriate. Anything short of that means that, to have reliable redundancy and avoid or minimize split-brain, you have to use the current method of rewriting the config and restarting twemproxy.

It just occurred to me that you might be trying to implement this as "just" a TCP proxy/load balancer to Sentinels. I surely hope that isn't the case, as you can't do it reliably. The way the Sentinel and client setup works, clients need direct access to every sentinel, for the reasons I listed above regarding why Twemproxy would need to do the things I'm talking about. Clients need to connect to each sentinel and do more than just get the current master: they need PUBSUB and the ability to talk to each sentinel directly to ensure they aren't talking to a minority sentinel which contained the original master. Please tell me you're not trying to make Twemproxy a load balancer for Sentinel. :) It would be either as complicated or, more likely, more complicated than what I am talking about above. If you're using twemproxy you don't want to talk directly to the backend redis instances - that defeats the main purpose of twemproxy.

Doing this in sentinel is simpler than doing it in twemproxy.

Except that Sentinel doesn't control the clients directly. The mechanism for Sentinel helping clients is either 1) having each client talk to and monitor every sentinel, 2) having a script on the sentinels which reaches out to reconfigure the clients, or 3) disconnecting connected clients after a reconfiguration of the pod (such as a failover).

The podname suggestion is a good idea. I have thought about making the master name in sentinel a compound like poolname-servername, so that servers from different pools can all be managed by the same sentinel. But I don't think specifying a different sentinel group for each server is needed; I doubt anyone wants to put one pool's servers into different sentinel groups.

While it isn't the pattern I generally recommend, especially in the case of twemproxy supporting sentinel, it is actually very common for people to have each pod run its own sentinels, and I consistently run into objections to doing it any other way. Furthermore, in environments where a tool such as Red Skull is used to provide a cluster of managed sentinel constellations, it would be the case there as well. Sentinel does start running into issues when it monitors a high number of pods, so breaking that out across banks of sentinel instances is also an expected use case. Not supporting it backs a potential user into a corner and can increase the time it takes to fail over. Given that Twemproxy is by design set up to proxy to essentially a bank of Redis instances, there is a higher than normal likelihood of multiple sentinel constellations being used.

You can see some of these issues in redis/redis#2257 and redis/redis#2045

The weight in the sentinel config is not used; I just wanted to reuse the server-config loading code when loading the sentinel config. Modifying the code to drop the weight for sentinels would be better.

Ok, sounds reasonable. You should update the docs for that section to be quite clear that weight isn't used for sentinels. A few lines in the docs can save many questions about why Twemproxy isn't respecting the weights assigned. ;)

Cheers,
Bill

Contributor Author

@therealbill I have read the guidelines for sentinel clients. They say the client just needs to connect to one of the sentinels, which is what the patch does. And as the guidelines themselves note below, that won't work well under a network partition.

Note: it is possible that a stale master returns online at the same time a client contacts a stale Sentinel instance, so the client may connect with a stale master, and yet the ROLE output will match. However when the master is back again Sentinel will try to demote it to slave, triggering a new disconnection. The same reasoning applies to connecting to stale slaves that will get reconfigured to replicate with a different master.

I think twemproxy shouldn't do too much about the distributed system. Sentinel is responsible for the cluster. So if you want the cluster to work well under a network partition, the sentinels in the minority should refuse to serve clients, just like zookeeper or chubby.
I have read redis issue 2257. I think it's a suggestion to reduce the number of connections to the sentinels. Sentinel is fine with 2000+ connections (under the conditions in that issue); epoll can handle tens of thousands of connections easily. Maybe it becomes a problem when the cluster has thousands of redis instances.


On Mar 30, 2015, at 12:00, andy notifications@github.com wrote:

@therealbill I have read the guidelines for sentinel clients. They say the client just needs to connect to one of the sentinels, which is what the patch does. And as the guidelines themselves note below, that won't work well under a network partition.

But the client of twemproxy can't use sentinel directly, because twemproxy is "in the way". Since that was written we have developed operational knowledge and wisdom that tells us how to do it correctly, and it really isn't that hard. At a bare minimum you connect to each sentinel and pick the master reported by a majority of sentinels. If you can't reach said quorum, don't proxy. You can do it in a relatively simple script with HAProxy, so arguing that it is overly complicated to do in Twemproxy doesn't hold, IMO. It really is the least you should do.
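To make that majority rule concrete, here is a minimal standalone sketch (hypothetical code, not part of this patch or twemproxy):

/* Hypothetical illustration: pick the master address reported by a strict majority of
 * all *known* sentinels; NULL means "no quorum, stop proxying writes for this pod". */
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static const char *
majority_master(const char **reported, size_t n_reached, size_t n_known)
{
    for (size_t i = 0; i < n_reached; i++) {
        size_t votes = 0;
        for (size_t j = 0; j < n_reached; j++) {
            if (strcmp(reported[i], reported[j]) == 0) {
                votes++;
            }
        }
        if (votes > n_known / 2) {      /* quorum over known sentinels, not just reachable ones */
            return reported[i];
        }
    }
    return NULL;
}

int
main(void)
{
    /* three of three known sentinels answered; two agree, so we have a quorum */
    const char *answers[] = { "10.0.0.2:6379", "10.0.0.2:6379", "10.0.0.1:6379" };
    const char *master = majority_master(answers, 3, 3);
    printf("%s\n", master != NULL ? master : "no quorum - refuse to proxy");
    return 0;
}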
Note: it is possible that a stale master returns online at the same time a client contacts a stale Sentinel instance, so the client may connect with a stale master, and yet the ROLE output will match. However when the master is back again Sentinel will try to demote it to slave, triggering a new disconnection. The same reasoning applies to connecting to stale slaves that will get reconfigured to replicate with a different master.

I think twemproxy shouldn't do too much about the distributed system.

Then don't use it with sentinel. By placing Twemproxy in front and having sentinel hidden from the clients you're preventing the clients from doing what you describe. You take on the responsibility of protecting the clients when you prevent them from protecting themselves.

Doing the simple thing is not always the right thing. If you're going to make it impossible for the client to protect itself, you must take the precautions on its behalf. Twemproxy partitions data, and with the addition of sentinel support it thus becomes a core part of a distributed system. At that point the assertion that it shouldn't do the right thing with regard to proper behavior in distributed systems is rather moot.

Sentinel is responsible for the cluster. So if you want the cluster to work well under a network partition, the sentinels in the minority should refuse to serve clients, just like zookeeper or chubby.

Again, how exactly do you think the clients can learn that the sentinel they are talking to is in a minority partition? If you're going to propose they can, you have to show how. It would also be how sentinel itself would have to do it. There is one way: ask all the sentinels you can connect to, and if you can't get a majority of the known sentinels, stop proxying the connections.

You can wax on all day about what sentinel could or should do, but you're dealing with Twemproxy, not sentinel. I'm not sure you understand what sentinel is: it doesn't serve clients, it is a lookup service.

When a network partition happens, a sentinel has no way to know whether a new master has been elected, and as such takes no action. It doesn't know whether it is alone and all the sentinels are alone (in which case no master change happens), or whether it alone is in the minority. And even then, it doesn't know who the new master is. Thus it can't know it is "in the minority" partition, and therefore it can't tell the clients to stop talking to the server, as they may still be talking to the right one.

But Twemproxy can figure it out. Any client can, until you put Twemproxy in front of it and prevent it from doing so.

I have read redis issue 2257. I think it's a suggestion to reduce the number of connections to the sentinels. Sentinel is fine with 2000+ connections (under the conditions in that issue); epoll can handle tens of thousands of connections easily.

The point of that issue, Andy, is to show that the condition you think is rare is actually common, and in a non-trivial share of deployments it is required. This is just one of many such conditions I've seen. I've seen thousands of these setups; multiple sets of sentinels across a bank of pods is not a rare condition, nor is it going to become one anytime soon.

Cheers,
Bill

@mishan

mishan commented Mar 5, 2015

This is looking really good. I'm going to see about testing this under load on Tagged and simulating some failovers to give it a real world test at scale.

@andyqzb
Contributor Author

andyqzb commented Mar 5, 2015

@mishan Great! I have tested the patch with some cases and used redis-benchmark to test it under load, but it's essential to test it under a real-world workload.

@mishan

mishan commented Mar 19, 2015

@andyqzb Sorry for the lack of updates from me; I should get the chance to test this within a couple of weeks. I've made it a higher priority.

@therealbill

Question: if we're using sentinel, why have Redis server entries in the config when we should be getting them from Sentinel?

@andyqzb
Contributor Author

andyqzb commented Mar 24, 2015

@therealbill It's a good idea; we could drop the servers config when sentinel is used. But we should think about some of the detail problems I mentioned in issue #297, so I didn't implement it in this patch, to keep things simple at the beginning. It's a candidate feature for the future.

@mishan

mishan commented Mar 25, 2015

@andyqzb I'm having some issues running this with redis-sentinel 2.9.11

[2015-03-25 16:04:36.736] nc.c:192 run, rabbit run / dig that hole, forget the sun / and when at last the work is done / don't sit down / it's time to dig another one
[2015-03-25 16:04:36.739] nc_sentinel.c:186 unknown server name:7
[2015-03-25 16:04:36.739] nc_sentinel.c:457 sentinel's response error, close sentinel conn.
[2015-03-25 16:04:36.739] nc_mbuf.c:300 mbuf read string failed, split char :
[2015-03-25 16:04:36.739] nc_sentinel.c:457 sentinel's response error, close sentinel conn.
[2015-03-25 16:04:36.739] nc_mbuf.c:300 mbuf read string failed, split char :
[2015-03-25 16:04:36.739] nc_sentinel.c:457 sentinel's response error, close sentinel conn.

Here's the config I'm using: http://pastebin.com/txtRVRj6

Any ideas?

@andyqzb
Contributor Author

andyqzb commented Mar 26, 2015

@mishan Maybe the output of sentinel has changed. I have tested the patch with sentinel 2.6 and 2.8. I will check this problem later.

@andyqzb
Contributor Author

andyqzb commented Mar 26, 2015

@mishan Maybe you have a server named 7 configured in sentinel but not in twemproxy. Twemproxy pulls all the server addresses from sentinel and assumes that every address it gets back is also configured in twemproxy, so it reports an error if sentinel returns a server that isn't in twemproxy's config.

@axot

axot commented Feb 5, 2016

+1, this is a better solution than using an agent to update the config file. When will this be merged into master?

@andyqzb
Contributor Author

andyqzb commented Feb 9, 2016

@axot @TrumanDu @mishan Thanks for your attention. @idning told me he is busy with work these days, so the patch will have to wait longer to be merged. @manjuraj, do you have time to review this patch?

@jacan

jacan commented Mar 10, 2016

Any news about this PR being merged?

@kri5

kri5 commented Mar 10, 2016

👍

@mishan

mishan commented Mar 11, 2016

There's been no response from anyone at Twitter, it seems. I forked it on if(we)'s behalf and merged @andyqzb's PR into our fork last year, and we've been using it for a while now. I think I need to rebase it. I bumped the version to "0.5.0": https://github.com/ifwe/twemproxy if anyone's interested in giving it a try despite this PR not being merged.

@stiller-leser

Any updates on this? @mishan Are you still actively using @andyqzb's patch in production?

@mishan

mishan commented May 18, 2016

@stiller-leser We are still actively using it in production. One tier has been using it for the better part of a year now without issue. I'm now rolling it out to the rest of our production tiers.

I also updated our unofficial fork with @andyqzb's rebase and most recent commits, and bumped the version to 0.5.1.

@NuSkooler

FWIW we are using @mishan's fork in development, and will be in production very soon (there are currently no known issues). This is greatly simplified compared to the old method of using additional agents to update configurations.

@stiller-leser

@mishan @NuSkooler Thanks for your feedback. Ultimately we decided to go with HAProxy, to support MySQL as well. We will keep an eye on Twemproxy though.

@ghost

ghost commented Feb 9, 2017

Why was this feature never merged?! :(

}

/* get sentinel master num at line 3 */
msg_read_line(msg, line_buf, 3);
Collaborator

This will/does have issues if redis-sentinel adds new fields to the response, changes the order of the fields, or sends more lines than this expects. See #527 (wrong repo)

After upgrading from redis 2.x to 3.2.6, I noticed this. I'll try to find a solution.

cd path/to/src/of/redis/; git diff 2.8.9 3.2.8 -- src/sentinel.c

@@ -2729,11 +3242,13 @@ void sentinelInfoCommand(redisClient *c) {
             "sentinel_masters:%lu\r\n"
             "sentinel_tilt:%d\r\n"
             "sentinel_running_scripts:%d\r\n"
-            "sentinel_scripts_queue_length:%ld\r\n",
+            "sentinel_scripts_queue_length:%ld\r\n"
+            "sentinel_simulate_failure_flags:%lu\r\n",
             dictSize(sentinel.masters),
             sentinel.tilt,
             sentinel.running_scripts,
-            listLength(sentinel.scripts_queue));
+            listLength(sentinel.scripts_queue),
+            sentinel.simfailure_flags);
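One way to make the parsing robust to such changes, sketched here as standalone code rather than the patch's actual parser (the function name and sample payload are illustrative assumptions), is to look the field up by key instead of by line position:

/* Hypothetical sketch: find "sentinel_masters:" by key in the INFO SENTINEL payload
 * instead of assuming it sits on line 3, so added or reordered fields (e.g.
 * sentinel_simulate_failure_flags in redis 3.2) don't break parsing. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int
sentinel_masters_from_info(const char *info, long *out)
{
    const char *key = "sentinel_masters:";
    const char *p = strstr(info, key);

    if (p == NULL) {
        return -1;                      /* field missing: treat as a protocol error */
    }
    *out = strtol(p + strlen(key), NULL, 10);
    return 0;
}

int
main(void)
{
    const char *info = "# Sentinel\r\nsentinel_masters:2\r\nsentinel_tilt:0\r\n";
    long n;

    if (sentinel_masters_from_info(info, &n) == 0) {
        printf("masters: %ld\n", n);
    }
    return 0;
}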

Collaborator

https://github.com/ifwe/twemproxy/commits/build-nutredis contains a fix for this and other issues - feel free to use ifwe@9820caa

I've tested this with redis 3.2, and to some extent 4.0 (but not for long periods of time)

@FractalizeR

Any news on this PR? It looks promising, because other solutions like https://github.com/yak0/twemsentinel look abandoned.

@CLAassistant

CLAassistant commented Jul 18, 2019

CLA assistant check
All committers have signed the CLA.

@andyqzb andyqzb requested a review from TysonAndre May 6, 2022 08:57