[Docs] GSM8K Accuracy Evaluation doc update #25360

david6666666 · 2025-09-22T02:19:45Z

Purpose

Fix the script path in the GSM8K evaluation README: `tests/gsm8k/gsm8k_eval.py` → `tests/evals/gsm8k/gsm8k_eval.py`.

Run standalone evaluation script

# Start vLLM server first
vllm serve Qwen/Qwen2.5-1.5B-Instruct --port 8000

# Run evaluation
python tests/evals/gsm8k/gsm8k_eval.py --port 8000
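`vllm serve` can take a while to download and load the model, so starting the evaluation immediately may fail with connection errors. The following is a minimal sketch that waits for the server before you run the evaluation. It assumes the server answers HTTP 200 on a `/health` endpoint on the same port, which vLLM's OpenAI-compatible server provides. `wait_for_server` is a hypothetical helper and is not part of the repository.

```python
import http.client
import time
import urllib.request


def wait_for_server(url: str = "http://localhost:8000/health",
                    timeout: float = 600.0,
                    interval: float = 2.0) -> bool:
    """Poll `url` until it returns HTTP 200 or roughly `timeout` seconds pass.

    Hypothetical helper; assumes the server exposes a /health endpoint.
    Returns True once the server is ready, False if the deadline expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each probe uses `interval` as its own socket timeout.
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout, or HTTP error
            # (URLError/HTTPError subclass OSError): not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, call `wait_for_server()` from a wrapper script and run the evaluation command above only after it returns `True`.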

Test Plan

Test Result


Signed-off-by: David Chen <530634352@qq.com>

gemini-code-assist

Code Review

This pull request updates a command in the README.md for the GSM8K evaluation, correcting the path to the gsm8k_eval.py script. The change is a good improvement. I've added a suggestion to use python3 explicitly to avoid potential issues on systems where python might refer to Python 2. Additionally, please note that the path in the pytest command on line 10 of the same README file also appears to be incorrect and could be fixed for consistency.

gemini-code-assist · 2025-09-22T02:20:41Z

tests/evals/gsm8k/README.md


 # Run evaluation
-python tests/gsm8k/gsm8k_eval.py --port 8000
+python tests/evals/gsm8k/gsm8k_eval.py --port 8000


The associated script gsm8k_eval.py uses a python3 shebang (#!/usr/bin/env python3). To ensure this command runs reliably across different user environments, it's best practice to use python3 explicitly. On some systems, python may still point to an older, incompatible Python 2 installation, which would cause the script to fail.

Suggested change

-python tests/evals/gsm8k/gsm8k_eval.py --port 8000
+python3 tests/evals/gsm8k/gsm8k_eval.py --port 8000


GSM8K Accuracy Evaluation doc update

bfb4029

Signed-off-by: David Chen <530634352@qq.com>

david6666666 requested a review from mgoin as a code owner September 22, 2025 02:19

david6666666 changed the title ~~GSM8K Accuracy Evaluation doc update~~ [Docs] GSM8K Accuracy Evaluation doc update Sep 22, 2025

gemini-code-assist bot reviewed Sep 22, 2025


jeejeelee approved these changes Sep 22, 2025


jeejeelee added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 22, 2025

jeejeelee enabled auto-merge (squash) September 22, 2025 02:23

jeejeelee merged commit 793be8d into vllm-project:main Sep 22, 2025
26 of 32 checks passed

kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Sep 22, 2025

[Docs] GSM8K Accuracy Evaluation doc update (vllm-project#25360)

b608cb4

Signed-off-by: David Chen <530634352@qq.com>

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[Docs] GSM8K Accuracy Evaluation doc update (vllm-project#25360)

647668c

Signed-off-by: David Chen <530634352@qq.com>

charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025

[Docs] GSM8K Accuracy Evaluation doc update (vllm-project#25360)

756a3d8

Signed-off-by: David Chen <530634352@qq.com> Signed-off-by: charlifu <charlifu@amd.com>

yewentao256 pushed a commit that referenced this pull request Oct 3, 2025

[Docs] GSM8K Accuracy Evaluation doc update (#25360)

dba6db9

Signed-off-by: David Chen <530634352@qq.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
