
Support for http generate request cancellation and segfault fix #6591

Merged
merged 22 commits into main on Nov 21, 2023

Conversation

@nnshah1 (Contributor) commented Nov 16, 2023

This change hooks the request free callback from libevhtp to support request cancellation.

From inspection, all request free operations have to go through:

https://github.com/triton-inference-server/third_party/blob/8af7da2ae3f4938a8b497728fcca6460797c402f/libevhtp/libevhtp/evhtp.c#L1233

And connection free frees the request first:

https://github.com/triton-inference-server/third_party/blob/8af7da2ae3f4938a8b497728fcca6460797c402f/libevhtp/libevhtp/evhtp.c#L5151

So when a client disconnects, we can trap the request free callback and cancel the request.

There are three objects with lifetimes to maintain:

  1. evhtp_req. The request object for libevhtp. It is created by libevhtp and freed by libevhtp when a connection is freed. We add the request_fini hook to get notified when this object is freed, and use that as an indication that the client has disconnected and the request should be cancelled. The req object cannot be used after the request free hook fires. We remove the hook if the response is complete.

  2. InferRequestClass. Created by the HTTP server to wrap the evhtp_req and TRITONSERVER_InferenceRequest objects; it is used in the response callbacks to send HTTP responses. Its lifetime spans from request creation to the final response for the request.

  3. TRITONSERVER_InferenceRequest. Created by the HTTP server and used to send inputs to the core; it is also used to cancel requests. Its lifetime extends from request creation until both the final response has been sent AND the request release callback (sent from the core) has run.

We use a shared_ptr to manage the lifetime of the InferenceRequest. The shared_ptr is held by the request_release_payload (freed in the request_release_callback) and by the InferRequestClass (freed in the final response callbacks). This enables us to use the request object to cancel requests after it has been released by the core but before the final response is received.

Some minor refactoring stores the inference request in the InferRequestClass. The callbacks were also moved into InferRequestClass, which makes the generate and infer paths similar and avoids having to make protected members public.
 

@nnshah1 nnshah1 marked this pull request as draft November 16, 2023 16:00
@nnshah1 nnshah1 marked this pull request as ready for review November 16, 2023 20:40
@nnshah1 nnshah1 changed the title updates with initial fix for connection gone Support for http generate request cancellation and segfault fix Nov 16, 2023
@Tabrizian (Member) left a comment:

Also, please make sure that L0_infer_valgrind tests pass after this change.

(Inline review threads on src/http_server.cc — all resolved.)
rmccorm4 previously approved these changes Nov 17, 2023

@rmccorm4 (Collaborator) left a comment:

Looks generally good to me. Left some comments for follow-ups; they don't need to block the cherry-pick.

tanmayv25 previously approved these changes Nov 20, 2023

@tanmayv25 (Contributor) left a comment:

This looks great! I will introduce the RequestReleasePayload class in the gRPC frontend too, which should fix some lifetime issues we are still observing.

rmccorm4 previously approved these changes Nov 20, 2023

@rmccorm4 (Collaborator) left a comment:

LGTM. Not sure whether the SageMaker server has any unique behavior, so we will want to make sure L0_sagemaker passes.

GuanLuo previously approved these changes Nov 20, 2023

@nnshah1 (Contributor, Author) commented Nov 21, 2023:

> Also, please make sure that L0_infer_valgrind tests pass after this change.

Tested; the expected tests pass.

@nnshah1 nnshah1 dismissed stale reviews from GuanLuo and rmccorm4 via 74c8ba5 November 21, 2023 15:45
@nnshah1 nnshah1 merged commit b876a90 into main Nov 21, 2023
3 checks passed

5 participants