Q1. Will RssInvalidServerVersionException occur when RSS-i is restarted by a shell script right after it crashes, while some applications are still using it? The clients still store the former RSS-i version, but the version of the newly registered RSS-i has already changed.
Could the following exception also be caused by the same reason?
org.apache.spark.shuffle.FetchFailedException: Detected server restart, current server: Server{rss04.xxx:12203, 1675897753258, rss04xxx:/data/}, previous server: Server{rss04.xxxx:12203, 1675895945858, rss04xxx:/data/}
at org.apache.spark.shuffle.RssShuffleManager$$anon$2.resolveConnection(RssShuffleManager.scala:220)
at com.uber.rss.clients.ServerConnectionCacheUpdateRefresher.refreshConnection(ServerConnectionCacheUpdateRefresher.java:49)
at com.uber.rss.clients.ServerIdAwareSyncWriteClient.connectImpl(ServerIdAwareSyncWriteClient.java:133)
at ...
Q2. What may cause this exception?
org.apache.spark.shuffle.FetchFailedException: Cannot fetch shuffle 0 partition 362 due to RssAggregateException (RssShuffleStageNotStartedException (Shuffle not started: DataBlockSocketReadClient 274 [/10.2xxx44973 -> /10.20xxx:12212 (1xxxx28)])
com.uber.rss.exceptions.RssShuffleStageNotStartedException: Shuffle not started: DataBlockSocketReadClient 274 [/10.2xxxx:44973 -> /10.2xxx12212 (10.xxxx)]
at com.uber.rss.clients.ClientBase.checkOKResponseStatus(ClientBase.java:291)
at com.uber.rss.clients.ClientBase.readResponseStatus(ClientBase.java:275)
at ...
Q1
You are right. This happened because the server restarted while the client had initially connected to the earlier server instance. Ideally this should not be an issue. Maybe we can remove this check, @hiboyang?
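For anyone following along, here is a minimal sketch of the kind of check being discussed. It is not the actual RSS code: the class and method names are hypothetical, and it assumes the second field of Server{...} in the log above is a version that changes on every restart (it looks like a start timestamp in milliseconds). The client keeps the server info it first connected with and compares it with the info currently registered for the same address.

```java
// Hypothetical sketch only; none of these names come from the RSS code base.
class RestartCheckSketch {

    /** Stand-in for the Server{hostAndPort, version, ...} value printed in the log. */
    static final class ServerInfo {
        final String hostAndPort;
        final long version; // assumed to change whenever the server process restarts

        ServerInfo(String hostAndPort, long version) {
            this.hostAndPort = hostAndPort;
            this.version = version;
        }
    }

    /** Same address but a different version means the server was restarted. */
    static boolean isRestarted(ServerInfo cached, ServerInfo current) {
        return cached.hostAndPort.equals(current.hostAndPort)
                && cached.version != current.version;
    }

    public static void main(String[] args) {
        // Versions taken from the log above; the host name is a placeholder.
        ServerInfo previous = new ServerInfo("rss04.example:12203", 1675895945858L);
        ServerInfo current = new ServerInfo("rss04.example:12203", 1675897753258L);
        System.out.println("restart detected: " + isRestarted(previous, current));
    }
}
```

With a check like this, any client that cached the old version fails as soon as the restarted instance registers with a new one, which matches the situation described in Q1.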
Q2
That basically means the server you are trying to connect to has not yet received the shuffle data for the corresponding partition (identified by appId, appAttemptId, and shuffleId). Does this also happen when the server has restarted?
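To illustrate the idea, here is a rough sketch under stated assumptions. It is not the real server implementation: the class, the string key format, and the use of IllegalStateException in place of RssShuffleStageNotStartedException are all made up for the example. The server is assumed to track which shuffle stages it has seen, keyed by appId, appAttemptId, and shuffleId, and a read for a key it has never seen is rejected as "not started".

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; none of these names come from the RSS code base.
class ShuffleStageLookupSketch {

    // Stages for which this server instance has already received a write.
    private final Set<String> startedStages = ConcurrentHashMap.newKeySet();

    private static String key(String appId, String appAttemptId, int shuffleId) {
        return appId + "/" + appAttemptId + "/" + shuffleId;
    }

    /** Called when a writer first sends data for a shuffle stage. */
    void markStarted(String appId, String appAttemptId, int shuffleId) {
        startedStages.add(key(appId, appAttemptId, shuffleId));
    }

    boolean isStarted(String appId, String appAttemptId, int shuffleId) {
        return startedStages.contains(key(appId, appAttemptId, shuffleId));
    }

    /** Called before serving a read; rejects stages this server knows nothing about. */
    void checkStarted(String appId, String appAttemptId, int shuffleId) {
        if (!isStarted(appId, appAttemptId, shuffleId)) {
            throw new IllegalStateException(
                    "Shuffle not started: " + key(appId, appAttemptId, shuffleId));
        }
    }

    public static void main(String[] args) {
        ShuffleStageLookupSketch server = new ShuffleStageLookupSketch();
        server.markStarted("app-1", "attempt-1", 0);
        server.checkStarted("app-1", "attempt-1", 0); // passes
        try {
            server.checkStarted("app-1", "attempt-2", 0); // different attempt: unknown stage
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch a reader that reaches a server instance before any writer for the same appId, appAttemptId, and shuffleId has registered there would get the "not started" error.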