New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snowball effect with reconnecting to poor performing node #1252

Closed
Spikhalskiy opened this Issue Apr 6, 2016 · 3 comments

Comments

Projects
None yet
3 participants
@Spikhalskiy
Contributor

Spikhalskiy commented Apr 6, 2016

We have a problem with JedisPool that aggravates perf issues of redis nodes when this issues only start to appear.

What we have:

  • Short timeouts. Like 3ms.
  • Significantly loaded redis that sometimes starts to respond slowly with number of connections like 40000 per node.

If redis is starting to stuck for 4ms, instead of each read, we do

  • read
  • after timeout and marking Jedis as broken, JedisFactory gently sends quit to Redis in destroyObject
  • we establish new connection
  • PING-PONG

and only after that we have new Jedis instance for new read, but... actually nothing changed, we could just continue to use old instance.

So, when our Redis Cluster start to experience some perf issues - we finish it off by invalidating Jedis.

Any thoughts?
Only one from me - maybe we could add an ability to pass some type of "InvalidationStrategy" to Jedis? For example, strategy by default will mark as broken and do everything like now and 3rd party can implement it's own strategy, for example, send PING-PONG before quit. "read with timeout - PING-PONG, give it a chance - read" looks better than current mandatory invalidation flow.

I could implement and provide PR for any solution solving or providing possibility to improve current standard flow.

What do you think?

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016

Issue xetorthio#1252: Fix creating lot of new Jedis instances on unst…
…able cluster, fix slots clearing without filling

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016

Issue xetorthio#1252: Fix creating lot of new Jedis instances on unst…
…able cluster, fix slots clearing without filling

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016

Issue xetorthio#1252: Fix creating lot of new Jedis instances on unst…
…able cluster, fix slots clearing without filling

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016

Issue xetorthio#1252: Fix creating lot of new Jedis instances on unst…
…able cluster, fix slots clearing without filling

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016

Issue xetorthio#1252: Fix creating lot of new Jedis instances on unst…
…able cluster, fix slots clearing without filling

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 6, 2016

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 13, 2016

Issue xetorthio#1252:
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 13, 2016

Issue xetorthio#1252:
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 13, 2016

Issue xetorthio#1252:
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 13, 2016

Issue xetorthio#1252:
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException
(cherry picked from commit c567161)

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Apr 13, 2016

Issue xetorthio#1252:
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException
(cherry picked from commit c567161)
@Spikhalskiy

This comment has been minimized.

Show comment
Hide comment
@Spikhalskiy

Spikhalskiy Apr 25, 2016

Contributor

Final version which is working in our prod and does it fine: https://github.com/Spikhalskiy/jedis/releases/tag/PP-2

It's master with merged related pull requests.
Issue could be closed after merging PRs to upstream.

Contributor

Spikhalskiy commented Apr 25, 2016

Final version which is working in our prod and does it fine: https://github.com/Spikhalskiy/jedis/releases/tag/PP-2

It's master with merged related pull requests.
Issue could be closed after merging PRs to upstream.

@marcosnils

This comment has been minimized.

Show comment
Hide comment
@marcosnils

marcosnils Apr 25, 2016

Collaborator

#1249 #1251 #1253 #1256

@Spikhalskiy amazing contribution. I've had a rough weeks lately. As soon as I have some time I promise to look at those changes.

Collaborator

marcosnils commented Apr 25, 2016

#1249 #1251 #1253 #1256

@Spikhalskiy amazing contribution. I've had a rough weeks lately. As soon as I have some time I promise to look at those changes.

marcosnils added a commit that referenced this issue Jul 11, 2016

Issue #1252: Fix creating a lot of new Jedis instances on unstable cl…
…uster, fix slots clearing without filling (#1253)

* Issue #1252: Fix creating lot of new Jedis instances on unstable cluster, fix slots clearing without filling

* Issue #1252: Acquire one long lock for trying all nodes when rediscover cluster

Spikhalskiy added a commit to Spikhalskiy/jedis that referenced this issue Jul 18, 2016

Issue xetorthio#1252:
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException

marcosnils added a commit that referenced this issue Jul 19, 2016

Issue #1252: Fix creating a lot of new Jedis instances on unstable cl…
…uster, fix slots clearing without filling (#1253)

* Issue #1252: Fix creating lot of new Jedis instances on unstable cluster, fix slots clearing without filling

* Issue #1252: Acquire one long lock for trying all nodes when rediscover cluster

Conflicts:
	src/main/java/redis/clients/jedis/JedisClusterInfoCache.java

marcosnils added a commit that referenced this issue Jul 19, 2016

Issue #1252: Fix creating a lot of new Jedis instances on unstable cl…
…uster, fix slots clearing without filling (#1253)

* Issue #1252: Fix creating lot of new Jedis instances on unstable cluster, fix slots clearing without filling

* Issue #1252: Acquire one long lock for trying all nodes when rediscover cluster

Conflicts:
	src/main/java/redis/clients/jedis/JedisClusterInfoCache.java

marcosnils added a commit that referenced this issue Jul 19, 2016

Issue #1252: Random node + rediscovery on connection exception replac…
…ed with rediscovery at the end (#1256)

* Issue #1252:
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException

marcosnils added a commit that referenced this issue Jul 19, 2016

Issue #1252: Random node + rediscovery on connection exception replac…
…ed with rediscovery at the end (#1256)

* Issue #1252:
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException

Conflicts:
	src/main/java/redis/clients/jedis/BinaryJedisCluster.java
	src/main/java/redis/clients/jedis/JedisCluster.java

marcosnils added a commit that referenced this issue Jul 19, 2016

Issue #1252: Random node + rediscovery on connection exception replac…
…ed with rediscovery at the end (#1256)

* Issue #1252:
1. New special exception for “No reachable nodes”
2. Fixed situation when simple connection exception with tryRandomNode=true proceed as just no reachable nodes
3. Don’t try random node with immediate initiate cluster renewal after that if we got ConnectionTimeout from proper slot jedis
4. Fix rewriting real root cause JedisConnectionException with JedisClusterMaxRedirectionsException

Conflicts:
	src/main/java/redis/clients/jedis/BinaryJedisCluster.java
	src/main/java/redis/clients/jedis/JedisCluster.java
@sazzad16

This comment has been minimized.

Show comment
Hide comment
@sazzad16

sazzad16 Nov 21, 2017

Collaborator

Resolved by #1253 and #1256

Collaborator

sazzad16 commented Nov 21, 2017

Resolved by #1253 and #1256

@sazzad16 sazzad16 closed this Nov 21, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment