Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug with accessing already released connection #2182

Merged
merged 1 commit into from
Aug 7, 2024

Conversation

EvgeniiMekhanik
Copy link
Contributor

There was a problem with accessing already released connection when we send error response to client,
after error in processing http response. Reponse
processing can occurs on different cpus (moreover
we can process several responses in parallel on
several cpus). Problem we fail to process two or
more responses in parallel.
We call tfw_h2_error_resp->tfw_h2_send_err_resp
and free request inside tfw_h2_send_err_resp
and also decrement connection reference counter
because of it. Also we call tfw_h2_conn_terminate_close and send FIN to the client. If FIN from the client comes immediatly ss_conn_drop_guard_exit will
be called on another cpu in parallel. For one
response everything is ok, we drop connection
in ss_conn_drop_guard_exit and release it, but
if we also call tfw_h2_error_resp for second response we release connection inside tfw_h2_send_err_resp after sending response and access already deleted connection in tfw_h2_conn_terminate_close!

  • To fix this problem we should increment connection reference counter before tfw_h2_error_resp and
    release it after ss_conn_drop_guard_exit.
  • Also we have a problem with sk->security - we release this pointer in ss_conn_drop_guard_exit but this function can be called when connection stil has active processed responses and we can pass this responses to frang. So we should release sk->security when we totally release connection.

@EvgeniiMekhanik
Copy link
Contributor Author

This PR fixes #2178 and possible fixes #2169 (need check)

Copy link
Contributor

@krizhanovsky krizhanovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks good for me, but why do we face a case when the same upstream connection is used by many CPUs concurrently? Didn't we recently fix the problem with TCP connections CPU locality , even for the loopback interface?

@EvgeniiMekhanik
Copy link
Contributor Author

Thanks to @kingluo he pay my attention on our private discussion, that we also have a problem with tfw_cli_conn_send. I implement new commit to fix it. @kingluo please look my PR and if you see some other problem places please write me about it here!

Copy link
Contributor

@kingluo kingluo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

* on the socket. A zeroed conn->sk is that indicator.
*/
void
tfw_connection_unlink_to_sk(TfwConn *conn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the reason for the move, yet I wonder if there's a way to leave this function where it was, together with the opposite function. If there was it would be preferable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think about it, but we need to tfw_classify_conn_close, which defined in http_limits.h. http_limits.h include connection.h, so I don't see any way to stay it in header except forward tfw_classify_conn_close declaration. I prefer move this function to connection.c

fw/sock_clnt.c Outdated
* and connection will be freed, so we should increment
* connection reference counter here to access `cli_conn`
* after sending data in this function.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please allow me to rephrase a little bit. :-) You are free to use this any way you want.

msg can be freed in ss_do_send() callback. If it's the last msg for this connection the ref counter becomes equal to 1. However if the client closes the connection and sends FIN, the connection ref counter is decremented to zero and the connection is freed. To avoid that, increment the connection ref counter here to guarantee access to cli_conn after sending data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes fixed.

@EvgeniiMekhanik
Copy link
Contributor Author

The fix looks good for me, but why do we face a case when the same upstream connection is used by many CPUs concurrently? Didn't we recently fix the problem with TCP connections CPU locality , even for the loopback interface?

Here we access client connection when we process response! Response processing can occurs and occurs on different CPU, because server socket is not a client socket!

@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-2178 branch 2 times, most recently from 830aaa7 to 718a5a5 Compare July 29, 2024 11:37
@EvgeniiMekhanik
Copy link
Contributor Author

It seems my second commit was wrong. We call tfw_cli_conn_send from two places:
tfw_h2_resp_fwd and we get connection before call tfw_cli_conn_send in this function!
__tfw_http_resp_fwd which is called for http1 only where we have no skb destructor and where we never destroy message on other cpu

@const-t const-t linked an issue Aug 5, 2024 that may be closed by this pull request
@EvgeniiMekhanik EvgeniiMekhanik force-pushed the MekhanikEvgenii/fix-2178 branch 2 times, most recently from 254a295 to e551a20 Compare August 5, 2024 16:12
Copy link
Contributor

@const-t const-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

There was a problem with accessing already released
connection when we send error response to client,
after error in processing http response. Reponse
processing can occurs on different cpus (moreover
we can process several responses in parallel on
several cpus). Problem we fail to process two or
more responses in parallel.
We call tfw_h2_error_resp->tfw_h2_send_err_resp
and free request inside `tfw_h2_send_err_resp`
and also decrement connection reference counter
because of it. Also we call `tfw_h2_conn_terminate_close`
and send FIN to the client. If FIN from the client
comes immediatly `ss_conn_drop_guard_exit` will
be called on another cpu in parallel. For one
response everything is ok, we drop connection
in `ss_conn_drop_guard_exit` and release it, but
if we also call `tfw_h2_error_resp` for second response
we release connection inside `tfw_h2_send_err_resp` after
sending response and access already deleted connection in
`tfw_h2_conn_terminate_close`!
- To fix this problem we should increment connection
reference counter before `tfw_h2_error_resp` and
release it after `ss_conn_drop_guard_exit`.
- Also we have a problem with `sk->security` -
  we release this pointer in `ss_conn_drop_guard_exit`
  but this function can be called when connection stil
  has active processed responses and we can pass this
  responses to frang. So we should release `sk->security`
  when we totally release connection.
@EvgeniiMekhanik EvgeniiMekhanik merged commit d4e4ae0 into master Aug 7, 2024
1 check passed
@EvgeniiMekhanik EvgeniiMekhanik deleted the MekhanikEvgenii/fix-2178 branch August 7, 2024 07:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG in sock.c
5 participants