[ci] speedup fused moe tests #5726

Merged
merged 2 commits into NVIDIA:main on Jul 7, 2025

Conversation

omera-nv
Collaborator

@omera-nv omera-nv commented Jul 3, 2025

[ci] speedup fused moe tests

@omera-nv
Collaborator Author

omera-nv commented Jul 3, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #10862 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #10862 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8028 completed with status: 'FAILURE'

Collaborator

@djns99 djns99 left a comment

The tolerances on these tests will need to be improved, but any issues appear to be pre-existing rather than introduced by this change.
I do worry about the implication that a torch randn update could randomly cause these tests to fail, as that would be a very confusing bug to track down.
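
To illustrate the concern above (this is not code from the PR): a test whose tolerance only holds for the values one particular seed happens to produce will break if the random stream changes. A minimal sketch of a guard against that is to sweep several seeds and confirm the tolerance holds for all of them. The sketch uses only the Python standard library as a stand-in for torch; the names `to_half`, `dot_error`, and `failing_seeds`, the dot-product workload, and the tolerance value are all hypothetical.

```python
import random
import struct


def to_half(x: float) -> float:
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack("e", struct.pack("e", x))[0]


def dot_error(seed: int, n: int = 256) -> float:
    """Absolute error of a dot product with half-precision inputs,
    measured against a float64 reference on the same random data."""
    rng = random.Random(seed)  # local generator, no global state
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    reference = sum(x * y for x, y in zip(a, b))
    low_precision = sum(to_half(x) * to_half(y) for x, y in zip(a, b))
    return abs(low_precision - reference)


def failing_seeds(seeds, tol: float) -> list:
    """Return the seeds whose error exceeds the tolerance.

    An empty list suggests the tolerance is not tuned to one
    specific stream of random values."""
    return [s for s in seeds if dot_error(s) > tol]
```

Calling something like `failing_seeds(range(20), tol=0.1)` during test development gives a quick signal of whether a chosen tolerance depends on a lucky seed; the tolerance here is illustrative and would need to be derived for the real kernels and dtypes.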

@omera-nv
Collaborator Author

omera-nv commented Jul 4, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #10906 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #10906 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8060 completed with status: 'FAILURE'

@omera-nv
Collaborator Author

omera-nv commented Jul 4, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #10933 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #10933 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8081 completed with status: 'FAILURE'

@omera-nv omera-nv force-pushed the fix/create_on_device_in_tests branch from 2dc3ed3 to 4bdb478 Compare July 4, 2025 09:15
@omera-nv
Collaborator Author

omera-nv commented Jul 4, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #10983 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #10983 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #8113 completed with status: 'FAILURE'

@omera-nv
Collaborator Author

omera-nv commented Jul 4, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #11000 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11000 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #8127 completed with status: 'FAILURE'

omera-nv added 2 commits July 7, 2025 13:54
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
@omera-nv omera-nv force-pushed the fix/create_on_device_in_tests branch from 4bdb478 to 479ce55 Compare July 7, 2025 10:54
@omera-nv
Collaborator Author

omera-nv commented Jul 7, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #11145 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11145 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8241 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@omera-nv omera-nv merged commit 1191555 into NVIDIA:main Jul 7, 2025
3 checks passed
zhou-yuxin pushed a commit to zhou-yuxin/TensorRT-LLM that referenced this pull request Jul 15, 2025
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
Signed-off-by: Yuxin <yuxinz@nvidia.com>
3 participants