[Misc]: vllm-ascend inference is very slow #171

ryys1122 · 2025-02-26T03:47:35Z

Anything you want to discuss about vllm on ascend.

After deploying vllm-ascend successfully, I started an inference service on four 910B cards.

VLLM_USE_MODELSCOPE=true NPU_VISIBLE_DEVICES=4,5,6,7 ASCEND_RT_VISIBLE_DEVICES=4,5,6,7 vllm serve --tensor-parallel-size 4 deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

Testing the inference service:

curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
"prompt": "描述一下北京的秋天",
"max_tokens": 512
}'

It took nearly 3 minutes to return a result.
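To turn the observation above into a number that can be compared across setups, here is a minimal Python sketch that times one request and computes the generation rate. It assumes the server returns an OpenAI-style `usage.completion_tokens` field in the response; the helper names `tokens_per_second` and `time_completion` are hypothetical and not part of vllm or vllm-ascend. If all 512 requested tokens were generated in roughly 180 seconds (both figures are assumptions based on this report), the rate works out to about 2.8 tokens per second.

```python
import json
import time
import urllib.request
from typing import Tuple


def tokens_per_second(completion_tokens: int, elapsed_seconds: float) -> float:
    """Average generation rate for a single request."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completion_tokens / elapsed_seconds


def time_completion(url: str, payload: dict) -> Tuple[int, float]:
    """POST a completion request and return (completion_tokens, elapsed_seconds).

    Assumption: the response body carries an OpenAI-style
    `usage.completion_tokens` field.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    elapsed = time.perf_counter() - start
    return body["usage"]["completion_tokens"], elapsed


# Rough estimate from the figures in this report
# (assumed: 512 generated tokens, about 180 seconds).
estimated_rate = tokens_per_second(512, 180.0)
```

Against a running server, `time_completion("http://localhost:8000/v1/completions", payload)` with the same JSON body as the curl request would return the measured token count and wall-clock time, which can then be passed to `tokens_per_second`.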

caolicaoli · 2025-03-05T06:38:33Z

On my side, running llama70b on 4 cards gives 9 tokens per second for a single request, and 700 tokens per second at 100 concurrent requests.
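As a quick sanity check on these figures, the sketch below splits the reported throughput per request. It assumes that the 700 tokens per second is the aggregate across all 100 concurrent requests, which the comment does not state explicitly; the function name `per_request_rate` is hypothetical.

```python
def per_request_rate(aggregate_tokens_per_second: float, concurrency: int) -> float:
    """Average rate seen by each request, assuming the aggregate is shared evenly."""
    if concurrency <= 0:
        raise ValueError("concurrency must be positive")
    return aggregate_tokens_per_second / concurrency


# Figures from the comment above (assumed: 700 tokens/s is the aggregate).
single_request_rate = 9.0
concurrent_rate = per_request_rate(700.0, 100)

# How much total throughput grows when going from 1 to 100 requests.
aggregate_speedup = 700.0 / single_request_rate
```

Under that assumption, each request still receives about 7 tokens per second at 100-way concurrency, close to the single-request rate of 9.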

ryys1122 · 2025-03-06T00:50:57Z

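> On my side, running llama70b on 4 cards gives 9 tokens per second for a single request, and 700 tokens per second at 100 concurrent requests.

What command and parameters did you run it with?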

wangxiyuan added the performance label Feb 26, 2025
