-
-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[Model] Support nested structures for TensorSchema #26212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Model] Support nested structures for TensorSchema #26212
Conversation
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces support for nested tensor structures within TensorSchema
, a significant enhancement for managing complex multimodal inputs. The refactoring in hyperclovax_vision.py
to adopt this new schema notably improves code clarity and maintainability. I have identified two critical issues in the _call_hf_processor
method where video input data structures are incorrectly constructed, which would result in runtime errors. My review provides specific suggestions to rectify these problems. The other modifications, including the core logic updates in tensor_schema.py
and the corresponding test adjustments, appear to be well-implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces support for nested structures in TensorSchema
, which simplifies the implementation for models like HyperCLOVAX-SEED-Vision. The changes include refactoring hyperclovax_vision.py
to use the new TensorSchema
capabilities, renaming a field in Phi3VImagePixelInputs
for clarity, and fixing a minor bug in glm4_1v.py
. The refactoring in hyperclovax_vision.py
is substantial and improves code clarity and maintainability. I have one suggestion to further improve the maintainability of the parsing logic for multimodal inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Looks good. Thanks for improving @DarkLight1337! |
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Purpose
This helps simplify the implementation of HyperCLOVAX-SEED-Vision and enable
merge_by_field_config=True
for that model.Other changes:
Phi3VImagePixelInputs
cc @bbeckca
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md
andexamples
for a new model.