Support Mistral Model Inference with transformers-neuronx #3153

Merged
merged 18 commits into from
Mar 11, 2024

DAIZHENWEI
Contributor

This PR enables Mistral model inference on Inferentia with the transformers-neuronx backend.

To demonstrate offline inference with transformers-neuronx, run:

python3 examples/offline_inference_neuron.py
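For readers who have not opened the example, the following is a minimal sketch of what an offline inference script of this kind might look like. It is not the file from this PR. It assumes vLLM's `LLM` and `SamplingParams` classes and that `LLM` accepts `device="neuron"` along with the other keyword arguments shown. The helper names `build_neuron_engine_args` and `run_offline_inference`, and all numeric values, are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List


def build_neuron_engine_args(model: str,
                             tensor_parallel_size: int = 2) -> Dict[str, Any]:
    """Collect keyword arguments for vllm.LLM (hypothetical helper).

    All values are illustrative assumptions, not settings taken from the PR.
    """
    max_model_len = 128
    return {
        "model": model,
        # Upper bound on sequences processed together (assumed value).
        "max_num_seqs": 8,
        # Maximum sequence length the model is set up for (assumed value).
        "max_model_len": max_model_len,
        # Assumption: block size is set equal to the maximum sequence length.
        "block_size": max_model_len,
        # Assumption: this selects the Neuron (Inferentia) backend.
        "device": "neuron",
        "tensor_parallel_size": tensor_parallel_size,
    }


def run_offline_inference(prompts: List[str], model: str) -> List[str]:
    """Generate one completion per prompt and return the generated texts."""
    # Imported lazily so the helper above can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
    llm = LLM(**build_neuron_engine_args(model))
    outputs = llm.generate(prompts, sampling_params)
    # Each request output holds a list of completions; take the first one.
    return [output.outputs[0].text for output in outputs]
```

On a machine with vLLM and the Neuron SDK installed, calling `run_offline_inference(["The capital of France is"], "mistralai/Mistral-7B-v0.1")` would be one way to try it. The model identifier here is only an example of a Mistral checkpoint name.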

Contributor

@liangfu liangfu left a comment

@DAIZHENWEI Thank you for your contribution. I think the overall changes look good. I left some comments on undoing unnecessary changes.

examples/offline_inference_neuron.py (outdated, resolved)
vllm/model_executor/models/__init__.py (outdated, resolved)
vllm/model_executor/models/neuron/mistral.py (outdated, resolved)
@liangfu
Contributor

liangfu commented Mar 5, 2024

Thanks for addressing the comments. The format.sh script here would help fix the format issue:
https://github.com/vllm-project/vllm/blob/main/format.sh

@DAIZHENWEI
Contributor Author

DAIZHENWEI commented Mar 6, 2024

> Thanks for addressing the comments. The format.sh script here would help fix the format issue: https://github.com/vllm-project/vllm/blob/main/format.sh

@liangfu The format issue has been fixed. Ready to merge.

examples/offline_inference_neuron.py (outdated, resolved)
vllm/model_executor/models/neuron/mistral.py (outdated, resolved)
vllm/model_executor/models/neuron/mistral.py (outdated, resolved)
Contributor

@liangfu liangfu left a comment

Thanks for addressing the comments. The changes look good to me.

examples/offline_inference_neuron.py (outdated, resolved)
@DAIZHENWEI
Contributor Author

@liangfu @WoosukKwon ready to merge

@WoosukKwon WoosukKwon self-requested a review March 11, 2024 17:55
Collaborator

@WoosukKwon WoosukKwon left a comment

LGTM! Thanks for submitting the PR!

@WoosukKwon WoosukKwon merged commit 654865e into vllm-project:main Mar 11, 2024
24 checks passed
dtransposed pushed a commit to afeldman-nm/vllm that referenced this pull request Mar 26, 2024
3 participants