-
Notifications
You must be signed in to change notification settings - Fork 1k
fix: engine cannot give response to the second user request with stre… #701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
thanks a lot for the contribution. Would it be possible for you to provide a script for reproducing the issue / elaborate on the issue? Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR restructures the streaming generation method in MLCEngine
to ensure the model lock is always released once, fixing an issue where the engine could not handle a second streaming request.
- Flattened multiple nested
try/catch
blocks into a singletry
+finally
around the main logic. - Converted inner helper functions (
_countTrailingReplacementChar
,_getChunk
) to arrow function expressions. - Removed redundant
lock.release()
calls spread across catches; now released once infinally
.
Comments suppressed due to low confidence (2)
src/engine.ts:706
- This TODO is still open—either implement the usage support for non-chat completions or file a tracking issue to ensure it isn't overlooked.
// TODO(Charlie): support usage for completion
src/engine.ts:483
- Add a test case that performs two consecutive streaming requests (with and without errors) to verify the lock is always released and reacquired correctly.
genConfig: GenerationConfig,
@@ -483,18 +483,15 @@ export class MLCEngine implements MLCEngineInterface { | |||
genConfig: GenerationConfig, | |||
timeReceived: number, | |||
): AsyncGenerator<ChatCompletionChunk | Completion, void, void> { | |||
// Since it is an async generator, we need to do fine-grained try-catch to ensure lock is | |||
// released only when errors occur. Then release at the very end when no error occurs. | |||
// TODO: This makes code less readable, is there a better way to do this? | |||
const lock = this.loadedModelIdToLock.get(model)!; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a brief comment above this try/finally
to clarify its scope is for lock acquisition and release, improving future readability.
const lock = this.loadedModelIdToLock.get(model)!; | |
const lock = this.loadedModelIdToLock.get(model)!; | |
// Acquire the lock and ensure its release in the `finally` block. |
Copilot uses AI. Check for mistakes.
Hey there! So sorry for the super late reply! 😊 |
see title