Snowball effect with reconnecting to poor performing node #1252

Open
Spikhalskiy opened this Issue Apr 6, 2016 · 2 comments

@Spikhalskiy
Contributor

We have a problem with JedisPool: it aggravates performance issues on Redis nodes just as those issues start to appear.

What we have:

  • Short timeouts, around 3 ms.
  • A heavily loaded Redis, with around 40,000 connections per node, that sometimes starts to respond slowly.

If Redis starts to stall for 4 ms, then instead of a single read, each request turns into:

  • a read
  • after the timeout, Jedis is marked as broken and JedisFactory politely sends QUIT to Redis in destroyObject
  • we establish a new connection
  • PING-PONG

Only after that do we have a new Jedis instance for the next read, but... nothing has actually changed; we could have just kept using the old instance.
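To put a rough number on the amplification, here is a back-of-the-envelope sketch. It assumes each step in the list above costs the server one unit of work (in practice, accepting a new connection is likely more expensive than that), and the class name, method name and figures are made up for illustration.

```java
public class ReconnectCost {
    // Server-side operations caused by one read that times out under the flow above:
    // the read itself, QUIT from destroyObject, the new connection, and the PING.
    static final int OPS_WITH_INVALIDATION = 4;
    // If the client simply kept the old connection, the server would only see the read.
    static final int OPS_WITH_REUSE = 1;

    /** Total server-side operations for `reads` reads, of which `timedOut` hit the client timeout. */
    static long serverOps(long reads, long timedOut, boolean invalidateOnTimeout) {
        long onTime = reads - timedOut;
        int perTimeout = invalidateOnTimeout ? OPS_WITH_INVALIDATION : OPS_WITH_REUSE;
        return onTime + timedOut * perTimeout;
    }

    public static void main(String[] args) {
        long reads = 40_000;    // hypothetical: one read per connection
        long timedOut = 10_000; // hypothetical: a quarter of them exceed the timeout
        System.out.println(serverOps(reads, timedOut, true));  // prints 70000
        System.out.println(serverOps(reads, timedOut, false)); // prints 40000
    }
}
```

Under these assumptions, a node that is already too slow for a quarter of its reads gets 75% more work from the client, which is the snowball.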

So, when our Redis Cluster starts to experience performance issues, we finish it off by invalidating Jedis instances.

Any thoughts?
I have only one: maybe we could add the ability to pass some kind of "InvalidationStrategy" to Jedis. The default strategy would mark the instance as broken and do everything as it does now, while a third party could implement its own strategy, for example, sending PING-PONG before QUIT. "Read with timeout, PING-PONG to give it a chance, read" looks better than the current mandatory invalidation flow.
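A minimal sketch of what such a hook might look like. Nothing here exists in Jedis: the `InvalidationStrategy` interface, its method and both implementations are hypothetical, and the PING is modelled as a plain `BooleanSupplier` so the example runs without a Redis server. A real implementation would also have to discard the late reply to the timed-out read, which may still arrive on the same socket ahead of the PONG.

```java
import java.util.function.BooleanSupplier;

public class InvalidationStrategyDemo {
    /** Hypothetical hook (not a Jedis API): decides the fate of a connection after a read timeout. */
    interface InvalidationStrategy {
        /**
         * @param ping sends PING on the same connection; returns true if PONG arrived in time
         * @return true if the connection should be marked broken and destroyed
         */
        boolean shouldInvalidate(BooleanSupplier ping);
    }

    /** Models the current behaviour: every timeout invalidates the connection. */
    static final InvalidationStrategy ALWAYS_INVALIDATE = ping -> true;

    /** Gives the connection a chance: keep it if the node still answers PING. */
    static final InvalidationStrategy PING_BEFORE_QUIT = ping -> !ping.getAsBoolean();

    public static void main(String[] args) {
        BooleanSupplier slowButAlive = () -> true; // stand-in for a PING that gets its PONG
        BooleanSupplier unreachable = () -> false; // stand-in for a PING that also times out

        System.out.println(ALWAYS_INVALIDATE.shouldInvalidate(slowButAlive)); // prints true
        System.out.println(PING_BEFORE_QUIT.shouldInvalidate(slowButAlive));  // prints false
        System.out.println(PING_BEFORE_QUIT.shouldInvalidate(unreachable));   // prints true
    }
}
```

With the second strategy, a node that is merely slow keeps its connections, and only a node that fails the PING as well goes through QUIT and reconnect.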

I could implement and submit a PR for any solution that fixes this or makes it possible to improve the current standard flow.

What do you think?

@Spikhalskiy Spikhalskiy added 12 commits to Spikhalskiy/jedis that referenced this issue Apr 6, 2016
@Spikhalskiy Spikhalskiy Issue #1252: Fix creating lot of new Jedis instances on unstable cluster, fix slots clearing without filling
4b7747c 8500b9a fb84869 3798c1e a5961d0
@Spikhalskiy Spikhalskiy Issue #1252: Acquire one long lock for trying all nodes when rediscover cluster
9dfa685 c0b94b2 d9c7d52 b24cb3a f934dee 0635acf ec87a52
@Spikhalskiy Spikhalskiy added 5 commits to Spikhalskiy/jedis that referenced this issue Apr 13, 2016
@Spikhalskiy Spikhalskiy Issue #1252:
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException
84545ea ef5758c c567161
453c247 36783ed (cherry picked from commit c567161)
@Spikhalskiy
Contributor
Spikhalskiy commented Apr 25, 2016 edited

The final version, which is running fine in our production: https://github.com/Spikhalskiy/jedis/releases/tag/PP-2

It's master with the related pull requests merged.
The issue can be closed once the PRs are merged upstream.

@marcosnils
Collaborator
marcosnils commented Apr 25, 2016 edited

#1249 #1251 #1253 #1256

@Spikhalskiy amazing contribution. I've had a rough few weeks lately. As soon as I have some time, I promise to look at those changes.

@marcosnils marcosnils added a commit that referenced this issue Jul 11, 2016
@Spikhalskiy @marcosnils Spikhalskiy + marcosnils Issue #1252: Fix creating a lot of new Jedis instances on unstable cluster, fix slots clearing without filling (#1253)

* Issue #1252: Fix creating lot of new Jedis instances on unstable cluster, fix slots clearing without filling

* Issue #1252: Acquire one long lock for trying all nodes when rediscover cluster
69d4080
@sheinbergon sheinbergon pushed a commit to sheinbergon/jedis that referenced this issue Jul 14, 2016
@Spikhalskiy @idans-ybrantdigital-com Spikhalskiy + idans-ybrantdigital-com Issue #1252: Fix creating a lot of new Jedis instances on unstable cluster, fix slots clearing without filling (#1253)

* Issue #1252: Fix creating lot of new Jedis instances on unstable cluster, fix slots clearing without filling

* Issue #1252: Acquire one long lock for trying all nodes when rediscover cluster
f9e49cd
@Spikhalskiy Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Jul 18, 2016
@Spikhalskiy Spikhalskiy Issue #1252:
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException
271c32c
@marcosnils marcosnils added 2 commits that referenced this issue Jul 19, 2016
@Spikhalskiy @marcosnils Spikhalskiy + marcosnils Issue #1252: Fix creating a lot of new Jedis instances on unstable cluster, fix slots clearing without filling (#1253)

* Issue #1252: Fix creating lot of new Jedis instances on unstable cluster, fix slots clearing without filling

* Issue #1252: Acquire one long lock for trying all nodes when rediscover cluster

Conflicts:
	src/main/java/redis/clients/jedis/JedisClusterInfoCache.java
848fca2 51f99c3
@marcosnils marcosnils added 3 commits that referenced this issue Jul 19, 2016
@Spikhalskiy @marcosnils Spikhalskiy + marcosnils Issue #1252: Random node + rediscovery on connection exception replaced with rediscovery at the end (#1256)

* Issue #1252:
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException

Conflicts (244efbd and 73505a1 only):
	src/main/java/redis/clients/jedis/BinaryJedisCluster.java
	src/main/java/redis/clients/jedis/JedisCluster.java
252dd87 244efbd 73505a1