Unexpected errors, sudden freeze #507

Open
hongpxiaozhu opened this issue Aug 14, 2017 · 10 comments

Comments


hongpxiaozhu commented Aug 14, 2017

2017-08-13 11:47:11,023 [buyerInfoCompleteQueueListener-1] ERROR error - RefererInvocationHandler invoke Error: uri=motan://11.30.114.29:0/com.rpcapi.risk.rule.RiskRuleRpcService requestId=1575586135825979072 interface=com.rpcapi.risk.rule.RiskRuleRpcService method=executeRules(com.rpcapi.risk.rule.param.RuleActionParam)
com.weibo.api.motan.exception.MotanServiceException: error_message: ActiveWeightLoadBalance No available referers for call : referers_size=2 requestId=1575586135825979072 interface=com.rpcapi.risk.rule.RiskRuleRpcService method=executeRules(com.rpcapi.risk.rule.param.RuleActionParam), status: 503, error_code: 10001,r=null
at com.weibo.api.motan.cluster.loadbalance.AbstractLoadBalance.selectToHolder(AbstractLoadBalance.java:81)
at com.weibo.api.motan.cluster.ha.FailoverHaStrategy.selectReferers(FailoverHaStrategy.java:90)
at com.weibo.api.motan.cluster.ha.FailoverHaStrategy.call(FailoverHaStrategy.java:53)
at com.weibo.api.motan.cluster.support.ClusterSpi.call(ClusterSpi.java:73)
at com.weibo.api.motan.proxy.RefererInvocationHandler.invoke(RefererInvocationHandler.java:125)
at com.sun.proxy.$Proxy140.executeRulesAsync(Unknown Source)
at com.riskjms.buyer.listener.BuyerInfoCompleteListener.executeRuleEngine(BuyerInfoCompleteListener.java:453)
at com.riskjms.buyer.listener.BuyerInfoCompleteListener.onMessage(BuyerInfoCompleteListener.java:254)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:562)
at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:500)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:468)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:326)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:264)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1069)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1061)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:958)
at java.lang.Thread.run(Thread.java:745)

Everything was running normally when the service suddenly froze and stopped responding; we had to restart the server side to recover. Is there a known solution?


rayeaster commented Aug 14, 2017

Are you sure ZooKeeper and the services registered with Motan are healthy?
uri=motan://11.30.114.29:0/
Why is the port zero? @rayzhang0603

rayzhang0603 (Collaborator) commented:

The client-side error "No available referers for call : referers_size=2" means the registry returned two server nodes, but both nodes failed 10 consecutive requests and were marked unavailable by the client's protection strategy.
This needs to be investigated on the server side: check whether business processing latency is normal, whether full GC is occurring, whether the Netty IO threads are healthy, and so on.

hongpxiaozhu (Author) commented:

Hi, how can we configure this client-side protection strategy? Or, once a node is marked unavailable, how can we detect and handle that?

Server-side configuration:

<!-- service implementation bean -->
<motan:registry regProtocol="zookeeper" name="risk_zookeeper" address="11.170.159.174:2181" connectTimeout="10000" requestTimeout="10000"/>

<!-- business implementation beans -->
<bean id="ruleServiceImpl" class="com.ruleengine.rpc.service.impl.RuleServiceImpl" />
<bean id="ruleSetServiceImpl" class="com.ruleengine.rpc.service.impl.RuleSetServiceImpl" />
<bean id="riskRuleRpcServiceImpl" class="com.ruleengine.rpc.service.impl.RiskRuleRpcServiceImpl" />

<motan:protocol id="riskMotan" default="true" name="motan"
                maxServerConnection="80000" maxContentLength="1048576"
                maxWorkerThread="800" minWorkerThread="20"
                requestTimeout="90000" />
<motan:basicService id="serviceBasicConfig"
                    export="riskMotan:9001"
                    group="XXX-risk"
                    module="ruleengine"
                    registry="risk_zookeeper"
/>
<motan:service interface="com.rpcapi.risk.rule.RiskRuleRpcService"
               id="riskRuleRpcRemoteService"
               ref="riskRuleRpcServiceImpl"
               export="riskMotan:9005"
               basicService="serviceBasicConfig"/>

Client-side configuration:

<!-- reference to the remote service -->
<motan:registry regProtocol="zookeeper" name="risk_zookeeper" address="11.170.159.174:2181" connectTimeout="10000" requestTimeout="10000"/>

<!-- motan protocol config -->
<motan:protocol default="true" id="riskMotan" name="motan" haStrategy="failover"
                loadbalance="activeWeight" maxClientConnection="10" minClientConnection="2"
                requestTimeout="90000"/>
<!-- common basicReferer config -->
<motan:basicReferer id="clientBasicConfig"
                    connectTimeout="30000"
                    requestTimeout="90000"
                    accessLog="false"
                    group="XXX-risk"
                    module="ruleengine"
                    protocol="riskMotan"
                    throwException="true"
                    registry="risk_zookeeper"
                    check="false"
                     />

<!-- motan referer interfaces -->

<motan:referer id="riskRuleRpcService"
               interface="com.rpcapi.risk.rule.RiskRuleRpcService"
               basicReferer="clientBasicConfig"/>

<motan:referer id="riskRuleRpcServiceAsync"
interface="com.rpcapi.risk.rule.RiskRuleRpcServiceAsync"
basicReferer="clientBasicConfig"/>
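For reference, a minimal sketch of how a caller might consume the referer defined above and detect the degradation at the business layer. The file name motan-client.xml, the no-arg RuleActionParam constructor, and the fallback comment are illustrative assumptions, not part of the original setup:

import org.springframework.context.support.ClassPathXmlApplicationContext;

import com.rpcapi.risk.rule.RiskRuleRpcService;
import com.rpcapi.risk.rule.param.RuleActionParam;
import com.weibo.api.motan.exception.MotanServiceException;

public class RiskRuleClientDemo {
    public static void main(String[] args) {
        // Assumed file name for the client XML shown above.
        ClassPathXmlApplicationContext ctx =
                new ClassPathXmlApplicationContext("classpath:motan-client.xml");
        // Bean id matches <motan:referer id="riskRuleRpcService"> above.
        RiskRuleRpcService service =
                (RiskRuleRpcService) ctx.getBean("riskRuleRpcService");
        try {
            // Same method as in the stack trace; assumes a no-arg constructor.
            service.executeRules(new RuleActionParam());
        } catch (MotanServiceException e) {
            // With throwException="true", 'No available referers' (status 503,
            // error_code 10001) surfaces here once both nodes are demoted;
            // the caller can fall back or alert instead of hanging.
        }
        ctx.close();
    }
}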


rayeaster commented Aug 14, 2017

@rayzhang0603 The server side is deployed inside Tomcat. When the problem occurred the Tomcat process was still alive and the service was still registered in ZooKeeper, but the logs showed no obvious errors or exceptions.
0813.113141.604 [New I/O server worker #1-8] INFO L:55 info - NettyChannelHandler channelDisconnected: remote=/10.28.30.178:58536 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.113141.607 [New I/O server boss #1 ([id: 0x7637d5a3, /0.0.0.0:9005])] INFO L:55 info - NettyChannelHandler channelConnected: remote=/10.28.30.178:58246 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.114141.582 [New I/O server worker #1-11] INFO L:55 info - NettyChannelHandler channelDisconnected: remote=/10.31.120.130:59542 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.114141.584 [New I/O server boss #1 ([id: 0x7637d5a3, /0.0.0.0:9005])] INFO L:55 info - NettyChannelHandler channelConnected: remote=/10.31.120.130:48068 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.114141.593 [New I/O server worker #1-12] INFO L:55 info - NettyChannelHandler channelDisconnected: remote=/10.29.189.59:55557 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.114141.595 [New I/O server boss #1 ([id: 0x7637d5a3, /0.0.0.0:9005])] INFO L:55 info - NettyChannelHandler channelConnected: remote=/10.29.189.59:39303 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.114141.601 [New I/O server worker #1-13] INFO L:55 info - NettyChannelHandler channelDisconnected: remote=/10.30.200.62:58529 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.114141.603 [New I/O server boss #1 ([id: 0x7637d5a3, /0.0.0.0:9005])] INFO L:55 info - NettyChannelHandler channelConnected: remote=/10.30.200.62:42511 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.114142.125 [Timer-0 ] INFO L:55 info - NettyChannelHandler channelDisconnected: remote=/10.27.99.226:8002 local=/10.29.113.28:4794 event=UpstreamChannelStateEvent
0813.114142.126 [Timer-0 ] INFO L:55 info - NettyChannelFactory_10.27.99.226_8002 client disconnect Success: motan://10.27.99.226:8002/com.xx.rpcapi.risk.contact.ContactInfoService
0813.114142.127 [New I/O client worker #1-5] INFO L:55 info - NettyChannelHandler channelConnected: remote=/10.27.99.226:8002 local=/10.29.113.28:54172 event=UpstreamChannelStateEvent
0813.114142.185 [Timer-0 ] INFO L:55 info - NettyChannelHandler channelDisconnected: remote=/10.28.30.178:8002 local=/10.29.113.28:14436 event=UpstreamChannelStateEvent
0813.114142.185 [Timer-0 ] INFO L:55 info - NettyChannelFactory_10.28.30.178_8002 client disconnect Success: motan://10.28.30.178:8002/com.xx.rpcapi.risk.contact.ContactInfoService
0813.114142.187 [New I/O client worker #1-6] INFO L:55 info - NettyChannelHandler channelConnected: remote=/10.28.30.178:8002 local=/10.29.113.28:63810 event=UpstreamChannelStateEvent
0813.115151.604 [New I/O server worker #1-4] INFO L:55 info - NettyChannelHandler channelDisconnected: remote=/10.30.200.62:53223 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.115151.605 [New I/O server worker #1-5] INFO L:55 info - NettyChannelHandler channelDisconnected: remote=/10.30.200.62:53225 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.115151.606 [New I/O server boss #1 ([id: 0x7637d5a3, /0.0.0.0:9005])] INFO L:55 info - NettyChannelHandler channelConnected: remote=/10.30.200.62:60309 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.115151.607 [New I/O server boss #1 ([id: 0x7637d5a3, /0.0.0.0:9005])] INFO L:55 info - NettyChannelHandler channelConnected: remote=/10.30.200.62:60311 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent

rayzhang0603 (Collaborator) commented Aug 14, 2017

The client-side protection strategy is not configurable. The strategy is: after 10 consecutive request failures, the corresponding server node is marked unavailable and subsequent requests will not be routed to it; at the same time the client periodically (every 500ms) sends heartbeat packets to probe the node, and if the node handles the heartbeat requests normally it is automatically restored to available.
When a node is demoted there is an error log with the keyword 'NettyClient unavailable Error'; when it recovers there is an info log with the keyword 'NettyClient recover available'.

Check whether the client has these demotion and recovery logs. Also check whether the client and server have any suspicious warn/error logs, whether server-side business processing is normal, whether GC is abnormal, and whether the IO threads are healthy.
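To make the described strategy concrete, here is a hedged sketch of the consecutive-failure guard and heartbeat recovery loop. Class and method names (AvailabilityGuard, sendHeartbeat, the constants) are illustrative; this is not Motan's actual implementation:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class AvailabilityGuard {
    // After 10 consecutive failures the node is marked unavailable.
    private static final int MAX_CONSECUTIVE_FAILURES = 10;
    // Heartbeat probes run every 500ms, matching the description above.
    private static final long HEARTBEAT_PERIOD_MS = 500;

    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private final AtomicBoolean available = new AtomicBoolean(true);
    private final ScheduledExecutorService probes =
            Executors.newSingleThreadScheduledExecutor();
    private volatile ScheduledFuture<?> probeTask;

    public boolean isAvailable() { return available.get(); }

    public void onSuccess() { consecutiveFailures.set(0); }

    public void onFailure() {
        if (consecutiveFailures.incrementAndGet() >= MAX_CONSECUTIVE_FAILURES
                && available.compareAndSet(true, false)) {
            // Corresponds to the 'NettyClient unavailable Error' log line.
            probeTask = probes.scheduleAtFixedRate(this::probe,
                    HEARTBEAT_PERIOD_MS, HEARTBEAT_PERIOD_MS, TimeUnit.MILLISECONDS);
        }
    }

    private void probe() {
        if (sendHeartbeat()) {
            // Corresponds to the 'NettyClient recover available' log line.
            consecutiveFailures.set(0);
            available.set(true);
            probeTask.cancel(false);
        }
    }

    // Placeholder: a real client would send a Motan heartbeat frame here.
    private boolean sendHeartbeat() { return false; }
}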


rayeaster commented Aug 14, 2017

There are indeed demotion and recovery logs, but after the last demotion there is no recovery log anymore.
2017-08-13 11:45:19,747 [New I/O client worker #1-12] ERROR error - NettyClient unavailable Error: url=motan://10.31.120.130:9005/xx-risk/com.xx.rpcapi.risk.rule.RiskRuleRpcService/1.0/referer 10.31.120.130:9005
2017-08-13 11:45:19,747 [New I/O client worker #1-12] ERROR error - NettyResponseFuture notifyListener Error: com.weibo.api.motan.exception.MotanFrameworkException: error_message: decode response error: class not found, status: 503, error_code: 20003,r=null
    at com.weibo.api.motan.protocol.rpc.DefaultRpcCodec.decode(DefaultRpcCodec.java:130)
    at com.weibo.api.motan.transport.netty.NettyDecoder.decode(NettyDecoder.java:105)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:282)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
    at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.weibo.api.motan.exception.MotanServiceException
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:274)
    at com.weibo.api.motan.util.ReflectUtil.forNameWithoutCache(ReflectUtil.java:142)
    at com.weibo.api.motan.util.ReflectUtil.forName(ReflectUtil.java:130)
    at com.weibo.api.motan.protocol.rpc.DefaultRpcCodec.decodeResponse(DefaultRpcCodec.java:371)
    at com.weibo.api.motan.protocol.rpc.DefaultRpcCodec.decode(DefaultRpcCodec.java:125)
    ... 16 more
2017-08-13 11:45:20,132 [pool-5-thread-1] INFO info - NettyClient heartbeat request: url=motan://10.31.120.130:9005/com.xx.rpcapi.risk.rule.RiskRuleRpcService
2017-08-13 11:45:20,132 [pool-5-thread-1] INFO info - NettyClient heartbeat request: url=motan://10.29.113.28:9005/com.xx.rpcapi.risk.rule.RiskRuleRpcService

rayzhang0603 (Collaborator) commented:

    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.weibo.api.motan.exception.MotanServiceException

Why is this class missing on the client side?
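One quick way to verify is a classpath probe run in the client JVM; this snippet is illustrative and just reports where (or whether) the class is loaded from:

import java.security.CodeSource;

public class ClasspathCheck {
    public static void main(String[] args) throws ClassNotFoundException {
        // Throws ClassNotFoundException if the motan jar is missing or shadowed.
        Class<?> c = Class.forName(
                "com.weibo.api.motan.exception.MotanServiceException");
        CodeSource src = c.getProtectionDomain().getCodeSource();
        System.out.println(c.getName() + " loaded from "
                + (src == null ? "bootstrap/unknown" : src.getLocation()));
    }
}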

rayeaster commented:

We are using Motan's async feature; from the logs it looks like a transient network error at the moment of the call?
We checked the caller's logs for the past half month; demotion and recovery only occurred yesterday.

2017-08-13 11:02:11,670 [pool-6-thread-1] ERROR error - NettyClient unavailable Error: url=motan://10.31.120.130:9005/xx-risk/com.xx.rpcapi.risk.rule.RiskRuleRpcService/1.0/referer 10.31.120.130:9005
2017-08-13 11:02:11,670 [pool-6-thread-1] ERROR error - NettyResponseFuture notifyListener Error: com.weibo.api.motan.exception.MotanServiceException: error_message: NettyResponseFuture task cancel: serverPort=10.31.120.130:9005 requestId=1575583210878798839 interface=com.xx.rpcapi.risk.rule.RiskRuleRpcService method=executeRules(com.xx.rpcapi.risk.rule.param.RuleActionParam) cost=90094, status: 503, error_code: 10001,r=null
    at com.weibo.api.motan.transport.netty.NettyResponseFuture.cancel(NettyResponseFuture.java:151)
    at com.weibo.api.motan.transport.netty.NettyClient$TimeoutMonitor.run(NettyClient.java:486)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

rayzhang0603 (Collaborator) commented:

After the last demotion, check whether the client's info log keeps showing 'NettyClient heartbeat request' entries. If it does, the client kept sending heartbeats after the demotion but never received a response. A quick log scan like the one below can confirm this.
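As a concrete way to run that check, here is a small illustrative scan of the client's info log; the log file path is an assumption:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HeartbeatScan {
    public static void main(String[] args) throws IOException {
        // Assumed log path; filter for the heartbeat keyword named above.
        Files.lines(Paths.get("logs/client-info.log"))
                .filter(line -> line.contains("NettyClient heartbeat request"))
                .forEach(System.out::println);
    }
}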

rayeaster commented:

The client has been sending heartbeats the whole time. The server-side log just looks like this:
0813.113141.604 [New I/O server worker #1-8] INFO L:55 info - NettyChannelHandler channelDisconnected: remote=/10.28.30.178:58536 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.113141.607 [New I/O server boss #1 ([id: 0x7637d5a3, /0.0.0.0:9005])] INFO L:55 info - NettyChannelHandler channelConnected: remote=/10.28.30.178:58246 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.114141.582 [New I/O server worker #1-11] INFO L:55 info - NettyChannelHandler channelDisconnected: remote=/10.31.120.130:59542 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.114141.584 [New I/O server boss #1 ([id: 0x7637d5a3, /0.0.0.0:9005])] INFO L:55 info - NettyChannelHandler channelConnected: remote=/10.31.120.130:48068 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.114141.593 [New I/O server worker #1-12] INFO L:55 info - NettyChannelHandler channelDisconnected: remote=/10.29.189.59:55557 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.114141.595 [New I/O server boss #1 ([id: 0x7637d5a3, /0.0.0.0:9005])] INFO L:55 info - NettyChannelHandler channelConnected: remote=/10.29.189.59:39303 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.114141.601 [New I/O server worker #1-13] INFO L:55 info - NettyChannelHandler channelDisconnected: remote=/10.30.200.62:58529 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
0813.114141.603 [New I/O server boss #1 ([id: 0x7637d5a3, /0.0.0.0:9005])] INFO L:55 info - NettyChannelHandler channelConnected: remote=/10.30.200.62:42511 local=/10.29.113.28:9005 event=UpstreamChannelStateEvent
