
Restore check for closed TCP connection in exporter process #360

Merged
merged 2 commits into vmware:main from antoninbas:restore-tcp-closed-conn-check on Sep 5, 2024

Conversation

antoninbas
Member

This is a partial reversal of d072ed8.
It turns out that the check can actually be useful as it can detect that the collector ("server" side) has closed its side of the connection, and this can be used as a signal to close our side of the connection as well. This can happen when a process using our collector implementation is closed, but no TCP RST is received when sendign a data set (in particular, this can happen when running in K8s and using a virtual IP to connect to the collector). This check can detect the issue much faster than relying on a keep-alive timeout. Furthermore, a client of this library could end up blocking if the connection has not timed out yet and the send buffer is full.

@yuntanghsu
Contributor

Just want to double-check: when the connection is closed by the server while we are sending data, will the client close the connection once it gets an error, and then reconnect?

@antoninbas
Member Author

Just want to double-check: when the connection is closed by the server while we are sending data, will the client close the connection once it gets an error, and then reconnect?

If you are referring to what happens (after merging this patch) when the server closes the connection: after at most 10s, the code will detect that the connection is closed and closeConnToCollector will be called internally; the next time there is data to be sent, there will be an error (attempting to send on a closed connection), and the client will try to reconnect.
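
To illustrate that flow, here is a hedged sketch of a periodic monitoring loop, building on the checkConnClosed sketch above (the type and field names are assumptions; closeConnToCollector is the internal helper mentioned above). Every 10s the exporter probes the connection and, on EOF, closes its own side so the next send fails fast and triggers a reconnect:

```go
// exportingProcess is a stand-in for the library's exporter type;
// the real type and fields may differ.
type exportingProcess struct {
	conn net.Conn
}

// closeConnToCollector closes our side of the connection
// (assumed internal helper).
func (ep *exportingProcess) closeConnToCollector() {
	ep.conn.Close()
}

// monitorConnection probes the connection every 10s until stopCh is
// closed. When the collector is detected to have closed its side, we
// close ours too, so the next send fails immediately instead of
// blocking on a full send buffer.
func (ep *exportingProcess) monitorConnection(stopCh <-chan struct{}) {
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-stopCh:
			return
		case <-ticker.C:
			if checkConnClosed(ep.conn) {
				ep.closeConnToCollector()
				return
			}
		}
	}
}
```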

This is a partial reversal of d072ed8.
It turns out that the check can actually be useful as it can detect that
the collector ("server" side) has closed its side of the connection, and
this can be used as a signal to close our side of the connection as
well. This can happen when a process using our collector implementation
is closed, but no TCP RST is received when sending a data set (in
particular, this can happen when running in K8s and using a virtual IP
to connect to the collector). This check can detect the issue much
faster than relying on a keep-alive timeout. Furthermore, a client of
this library could end up blocking if the connection has not timed out
yet and the send buffer is full.

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
@antoninbas antoninbas merged commit 2c76c1e into vmware:main Sep 5, 2024
7 checks passed
@antoninbas antoninbas deleted the restore-tcp-closed-conn-check branch September 5, 2024 23:21
antoninbas added a commit to antoninbas/vmware-go-ipfix that referenced this pull request Sep 5, 2024
antoninbas added a commit to antoninbas/vmware-go-ipfix that referenced this pull request Sep 5, 2024
heanlan pushed a commit that referenced this pull request Sep 5, 2024