NVIDIA vLLM Blog: Now Serving NVIDIA Nemotron with vLLM #101
Signed-off-by: Chris Alexiuk <calexiuk@nvidia.com>
23096e4 to d9a25c7
The image does not seem to be rendering.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
> Figure 1: Chart showing accuracy of Nemotron Nano 2 9B on various popular benchmarks
- **Optimized Thinking:** The model has a new feature called thinking budget, which prevents agent overthinking and keeps inference cost predictable. The chart below shows that, left unchecked, models can overthink, which increases inference cost and in some cases also reduces accuracy. Thinking budget addresses this by letting developers tune the model to the optimal accuracy vs. token-generation *sweet spot* for their applications.
Fix broken asset URLs in Nemotron blog post
Both image references use `../assets/…`. Because posts are rendered at `/YYYY/MM/DD/slug.html`, those relative paths resolve to `/2025/10/assets/...` and return 404 in production. Other posts use root-relative URLs (`/assets/...`) to avoid this. Switch these links to root-relative paths or `{{ '/assets/...'}}` so the images render on the published page.
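To make the failure mode concrete, here is a minimal sketch using Python's standard `urllib.parse.urljoin`, which applies RFC 3986 relative-reference resolution (the same rules a browser follows for simple paths like these). The host `example.com`, the date `2025/10/01`, the slug `nemotron`, and the filename `figure1.png` are hypothetical placeholders, not the real post's values.

```python
from urllib.parse import urljoin

# Hypothetical post URL following the /YYYY/MM/DD/slug.html pattern;
# host, date, and slug are illustrative placeholders only.
POST_URL = "https://example.com/2025/10/01/nemotron.html"

# Hypothetical image filename; the real asset names are not shown here.
IMAGE = "figure1.png"


def resolve(href: str, base: str = POST_URL) -> str:
    """Resolve an <img src> value against the page URL (RFC 3986 rules)."""
    return urljoin(base, href)


# Document-relative path: ".." only climbs out of the DD/ directory,
# so the result still sits under /2025/10/ and does not exist.
broken = resolve(f"../assets/{IMAGE}")

# Root-relative path: always anchored at the site root.
fixed = resolve(f"/assets/{IMAGE}")

print(broken)  # https://example.com/2025/10/assets/figure1.png
print(fixed)   # https://example.com/assets/figure1.png
```

The document-relative form only works when every page sits exactly one directory below the site root, which is not true for date-based post URLs; the root-relative form is independent of the page's depth.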
@simon-mo: modified the image references to match the other examples.
The image resolution is very low. Is there any way we can swap them?
simon-mo left a comment
Content LGTM
Hey! We'd like to move ahead with publishing, and I will update the images once we have higher-resolution versions, if that's alright?
Done.
This PR submits both the content of the blog post and the two .png assets used within it.