Qwen1.5-chat 72B int4 on 4 GPUs (V100): OOM error during inference once the token count reaches 10k #1419

Open
EthanD4869 opened this issue Apr 30, 2024 · 1 comment
Labels
question Further information is requested
Comments

@EthanD4869

image
image

@EthanD4869 EthanD4869 added the question Further information is requested label Apr 30, 2024
@XprobeBot XprobeBot added this to the v0.11.0 milestone Apr 30, 2024
@Channingss

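Could I ask what speed you are getting, in tokens/s? I am deploying the 32k int4 model, and both AWQ and GPTQ give me less than 1 token/s. I'm quite puzzled.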

@XprobeBot XprobeBot modified the milestones: v0.11.0, v0.11.1, v0.11.2 May 11, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.2, v0.11.3 May 24, 2024
3 participants