Skip to content

Conversation

david6666666
Copy link
Contributor

@david6666666 david6666666 commented Sep 22, 2025

Purpose

fix gsm8k eval doc

Run standalone evaluation script

# Start vLLM server first
vllm serve Qwen/Qwen2.5-1.5B-Instruct --port 8000

# Run evaluation
python tests/evals/gsm8k/gsm8k_eval.py --port 8000

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: David Chen <530634352@qq.com>
@david6666666 david6666666 requested a review from mgoin as a code owner September 22, 2025 02:19
@david6666666 david6666666 changed the title GSM8K Accuracy Evaluation doc update [Docs] GSM8K Accuracy Evaluation doc update Sep 22, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates a command in the README.md for the GSM8K evaluation, correcting the path to the gsm8k_eval.py script. The change is a good improvement. I've added a suggestion to use python3 explicitly to avoid potential issues on systems where python might refer to Python 2. Additionally, please note that the path in the pytest command on line 10 of the same README file also appears to be incorrect and could be fixed for consistency.


# Run evaluation
python tests/gsm8k/gsm8k_eval.py --port 8000
python tests/evals/gsm8k/gsm8k_eval.py --port 8000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The associated script gsm8k_eval.py uses a python3 shebang (#!/usr/bin/env python3). To ensure this command runs reliably across different user environments, it's best practice to use python3 explicitly. On some systems, python may still point to an older, incompatible Python 2 installation, which would cause the script to fail.

Suggested change
python tests/evals/gsm8k/gsm8k_eval.py --port 8000
python3 tests/evals/gsm8k/gsm8k_eval.py --port 8000

@jeejeelee jeejeelee added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 22, 2025
@jeejeelee jeejeelee enabled auto-merge (squash) September 22, 2025 02:23
@jeejeelee jeejeelee merged commit 793be8d into vllm-project:main Sep 22, 2025
26 of 32 checks passed
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Sep 22, 2025
Signed-off-by: David Chen <530634352@qq.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: David Chen <530634352@qq.com>
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: charlifu <charlifu@amd.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants