Clean up ResponseFactory when a final complete flag is set #358

Closed
krishung5 wants to merge 5 commits

Conversation

krishung5 (Contributor) commented May 18, 2024

In the current implementation, users need to ensure that the response_sender goes out of scope or is properly cleaned up before unloading the model to guarantee that the unloading process executes correctly. This is because the response_sender is responsible for cleaning up the response_factory, which holds a shared_ptr to the underlying model. Only when the reference count of the model's shared_ptr drops to 0 will the model be unloaded.

This PR enhances the lifetime management of response_factory so that the response_factory is cleaned up immediately once the complete final flag is sent. Users no longer need to worry about the lifetime of response_sender, and may reuse response_sender if they choose to.
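For context, here is a minimal sketch of the decoupled-model pattern this affects. It is illustrative only, not code from this PR; it assumes the standard `triton_python_backend_utils` API, and the `OUTPUT0` tensor is hypothetical:

```python
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """Minimal decoupled model: one response per request, then the final flag."""

    def execute(self, requests):
        for request in requests:
            sender = request.get_response_sender()
            out = pb_utils.Tensor("OUTPUT0", np.array([1], dtype=np.int32))
            sender.send(pb_utils.InferenceResponse(output_tensors=[out]))
            # Previously, the response_factory behind `sender` was only freed
            # when `sender` itself was destructed, and the model could not be
            # unloaded while any sender was still referenced. With this PR,
            # sending the final flag below cleans up the factory immediately.
            sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        return None  # decoupled mode: responses go through the sender
```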

For rescheduled or cancelled requests, this PR updates the documentation instead, since it is tricky to propagate that information back to the response_sender. I think we can enhance that after Jacky's ongoing changes to response_sender are in place.

Testing: triton-inference-server/server#7246

README.md Outdated
@@ -599,6 +599,11 @@ Starting from 23.10, request cancellation can be checked directly on the
the TRITONSERVER_RESPONSE_COMPLETE_FINAL flag at the end of response is still
needed even the request is cancelled.

*Note:*: If `response_sender` is reused across multiple requests, the model must
Contributor:

Can you elaborate on "response_sender is reused across multiple requests"? I thought each request has exactly one response sender?

Contributor Author:

Apologies for the confusion. It is correct that each request has only one response sender. What I meant was specific to the case the TRT-LLM team was experiencing:
A separate thread is launched to handle all requests instead of spawning a new thread for each request, and that thread is terminated in the finalize stage. In this scenario, the last response_sender in the thread is not cleaned up before the model gets unloaded, so the model cannot be unloaded properly. (You can refer to the testing model here for more details.)
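Roughly the pattern being described, as a hedged sketch — the queue and attribute names (`_queue`, `_work`, the `None` shutdown sentinel) are illustrative, not from the actual TRT-LLM model:

```python
import queue
import threading

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # One long-lived worker thread handles all requests.
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._work)
        self._worker.start()

    def execute(self, requests):
        for request in requests:
            self._queue.put(request)
        return None  # decoupled mode: responses go through the sender

    def _work(self):
        while True:
            request = self._queue.get()
            if request is None:  # shutdown sentinel
                break
            response_sender = request.get_response_sender()
            response_sender.send(
                flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
            # `response_sender` stays referenced by this frame until it is
            # reassigned on the next iteration, so the *last* sender lives
            # until the loop exits, i.e. until finalize() joins the thread.

    def finalize(self):
        self._queue.put(None)
        self._worker.join()  # only now does the last sender go out of scope
```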

On second thought, I think we can remove this section, since the complete final flag is required for both request rescheduling and cancellation. In that case, even with that specific thread implementation, the response_sender should clean up the response_factory when the complete final flag is set.

Let me know if it makes sense to you!

Contributor:

Thanks for the clarification! It makes sense.

One last question: Does the thread let the response_sender go out of scope, before the thread is terminated?

Contributor Author:

For this particular example, I think the response_sender goes out of scope when the thread starts terminating, via thread.join() or something like that. The issue was that thread.join() is called within finalize, so the response_sender only goes out of scope after finalize is invoked. If we manually do del response_sender in the thread, then the response_factory can be cleaned up.

Contributor:

Shouldn't the thread let the response_sender(s) go out of scope as soon as they are no longer used, i.e. after sending the response final flag, before thread.join() is called?

Contributor Author:

Yes, I think that is the ideal way to manage response_sender. One possible case where the response_sender does not go out of scope as soon as the final flag is sent is, for example, this model: the response_sender variable is only cleaned up when the next request arrives and the variable is reassigned to the new response_sender; that is when the destructor of the previous response_sender gets called. So unless users manually call del response_sender at the end of the thread, the last response_sender will not go out of scope until thread.join() is called.
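Concretely, in a worker loop like the sketch above (same hypothetical names), the model-side fix would be an explicit del on the last use:

```python
def _work(self):
    while True:
        request = self._queue.get()
        if request is None:  # shutdown sentinel
            break
        # Reassignment on the next iteration is what finally destructs the
        # *previous* sender; the last one would otherwise survive the loop.
        response_sender = request.get_response_sender()
        response_sender.send(
            flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        # Explicitly dropping the reference lets the response_factory be
        # cleaned up now, instead of after thread.join() in finalize().
        del response_sender
```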

// The flag to indicate if the response sender is closed. It is set to true
// once the TRITONSERVER_RESPONSE_COMPLETE_FINAL flag is set, meaning that the
// response_sender should not be used anymore. This flag is separate from the
// `is_response_factory_cleaned_` flag because errors might occur after
Contributor:

Can you elaborate on the errors that might occur after sending the complete_final flag but before deleting the response_factory? Is it the model that is holding on to the response_sender forever?

Contributor Author:

When the complete final flag is set, it is still possible that some error occurs before the response actually gets sent to the backend (for example, here and here).
In that case, the response_factory will not be automatically deleted in the backend, and can only be deleted within the destructor of response_sender by sending the cleanup message to the backend.

I think this bool variable closed_ is just to make sure users are not using the response sender after they set the final flag. It cannot be used to determine whether the response_factory has been cleaned up on the backend side.
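From the model's perspective, the closed_ flag would surface roughly like this — a hedged sketch, where the exact exception type raised by the backend is an assumption:

```python
sender = request.get_response_sender()
sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
try:
    # closed_ is now true, so further use is rejected up front; this check
    # is independent of whether the factory was actually cleaned up yet.
    sender.send(pb_utils.InferenceResponse(output_tensors=[]))
except Exception as err:  # exact exception type is an assumption
    print(f"response_sender already closed: {err}")
```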

mc-nv requested a review from kthui May 21, 2024 23:58
Comment on lines +52 to +55
stub->EnqueueCleanupId(
reinterpret_cast<void*>(response_factory_address_),
PYTHONSTUB_DecoupledResponseFactoryCleanup);
}
Member:

Could we remove EnqueueCleanupId (and the associated logic on the server side) now that the response factory is deleted when it sees the final flag?

Contributor Author:

I think we would still need to keep it for rescheduled requests. It's possible that the response sender hasn't sent the final flag yet when the request gets rescheduled. A new response factory is created for each request, even a rescheduled one, so I think we would still need to clean up the response factory in the destructor.

krishung5 (Contributor Author):

Closing: per discussion, a better approach would be to handle this in the Triton core instead of the Python backend (PYBE). Filed ticket DLIS-6795 to track this.

krishung5 closed this May 30, 2024
krishung5 deleted the krish-python-finalize branch May 30, 2024 18:44