[bug] After a container has run for one year, executor resubscribe causes a mesos-slave coredump #82

Closed
zmberg opened this issue Jul 3, 2019 · 2 comments
Labels
bug (Something isn't working), confirmed (issue is confirmed), enhancement (New feature or request), inner (issue comes from Tencent side), planning (issue is under planning)

Comments

@zmberg
Contributor

zmberg commented Jul 3, 2019

Problem description

After the container had been running for one year, an executor resubscribe caused mesos-slave to core dump. The relevant executor logs are below:
[Image: executor log screenshot]

@zmberg added the bug label Jul 3, 2019
@zmberg
Contributor Author

zmberg commented Jul 3, 2019

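After investigating, we found the following.
The executor and mesos-slave communicate over a long-lived TCP connection; if the connection drops, the executor registers again (resubscribes). Inspecting how the client sets up the connection turned up this code: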
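```go
// 86400 s/day * 365 days: the client timeout is exactly one year
const CONNKEEPALIVE = 86400 * 365 * time.Second

// refresh http transport & client
httpConn.transport = httpsTransport
httpConn.client = &http.Client{
	Timeout:   CONNKEEPALIVE, // caps the whole request, including reading the streamed body
	Transport: httpsTransport,
}
```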

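http.Client.Timeout bounds the total lifetime of each request made through the client, including reading the response body. With CONNKEEPALIVE set to 365 days, the long-lived subscribe connection is therefore forcibly closed after exactly one year, which triggers the resubscribe. Timeout should not be set on a long-lived connection.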

@DeveloperJim added this to the 1.13.x功能迭代 (1.13.x feature iteration) milestone Jul 4, 2019
@DeveloperJim added the confirmed, enhancement, inner, and planning labels Jul 7, 2019
DeveloperJim added a commit that referenced this issue Jul 15, 2019
fix: do not set a timeout on the long-lived executor/mesos-slave connection; issue #82
DeveloperJim added a commit that referenced this issue Jul 26, 2019
fix: do not set a timeout on the long-lived executor/mesos-slave connection; issue #82
@zmberg
Contributor Author

zmberg commented Jul 26, 2019

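The commit above changed the executor's subscribe time to mesos-slave from 1s to 15s.
Testing showed that mesos-slave's default configuration requires the subscribe to happen within 2s, so the time has been set back to 1s.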

zmberg added a commit to zmberg/bk-bcs that referenced this issue Jul 26, 2019
DeveloperJim added a commit that referenced this issue Jul 26, 2019
fix: set executor subscribe time to 1s; issue #82
DeveloperJim added a commit that referenced this issue Nov 4, 2019
fix: do not set a timeout on the long-lived executor/mesos-slave connection; issue #82
DeveloperJim added a commit that referenced this issue Nov 4, 2019
fix: set executor subscribe time to 1s; issue #82

2 participants