[v0.7.1rc1] FAQ & Feedback #19
Comments
Any plans to support qwen2.5-vl?
@shannanyinxiang According to our test, the …
Thank you for your prompt reply!
Could you share the launch parameters you used for qwen2-vl?
@whu-dft Please follow the install guide: https://vllm-ascend.readthedocs.io/en/v0.7.1rc1/installation.html
Thanks!
Is there any table comparing vllm-ascend vs. MindIE in terms of speed, model support, etc.?
Same as above, we need performance numbers for vllm-ascend on different hardware. We tested both vllm-ascend and MindIE on 910B, and it seems MindIE performs better.
@Infinite666 @sisrfeng Thanks for your feedback. Currently, the performance and accuracy of vLLM on Ascend still need to be improved. We are working together with the MindIE team on this. The first release will be v0.7.3 in 2025 Q1. In the short term we will keep focusing on performance improvements for vLLM Ascend, and everyone is welcome to join us and help improve it.
When deploying the DeepSeek-R1-Distill-70B model with the launch command python3 -m vllm.entrypoints.openai.api_server --model /workspace/models/DeepSeek-R1-Distill-70B/ --tensor-parallel-size 8, I got the following error:
@WWCTF According to the log, it's an OOM. IIUC, please try a smaller model or use a multi-node deployment: https://vllm-ascend.readthedocs.io/en/latest/tutorials.html#online-serving-on-multi-machine
The actual error is as follows: root@4ffad0458746:/# python3 -m vllm.entrypoints.openai.api_server --model /workspace/models/DeepSeek-R1-Distill-70B/ --tensor-parallel-size 8 --gpu-memory-utilization 0.95
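For reference, a hedged sketch of the memory-saving knobs that usually help with startup OOMs, using the offline LLM API. The model path is taken from the comment above; the max_model_len and gpu_memory_utilization values are illustrative only, and the same options exist as --max-model-len and --gpu-memory-utilization on the OpenAI API server:

```python
# Hedged sketch: reduce NPU memory pressure when loading a large model OOMs.
# Values below are illustrative, not recommendations.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/workspace/models/DeepSeek-R1-Distill-70B/",
    tensor_parallel_size=8,
    max_model_len=8192,           # smaller KV cache than the model's full context
    gpu_memory_utilization=0.90,  # leave some headroom on each device
)

out = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```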
Please leave comments here about your usage of the vLLM Ascend Plugin.
Does it work? Does it not work? Which models do you need? Which features do you need? Any bugs?
For in-depth discussion, please feel free to join #sig-ascend in the vLLM Slack workspace.
Next RC release: v0.7.3rc1 will be ready in early March (2025.03).
FAQ:
1. What devices are currently supported?
Currently, only the Atlas A2 series is supported.
2. How to set up the dev env, build and test?
Here is a step-by-step guide for building and testing.
If you just want to install stable vLLM, please refer to: https://vllm-ascend.readthedocs.io/en/latest/installation.html
3. How to do multi-node deployment?
You can launch a multi-node service with Ray; find more details in our tutorials: Online Serving on Multi Machine.
If you see "ray: command not found", install Ray with pip install ray.
If you see "fatal error: numa.h: No such file or directory", install the NUMA headers with yum install numactl-devel or apt install libnuma-dev.
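Before launching the server across machines, it can also help to confirm that every node has actually joined the Ray cluster. Below is a minimal sketch (not part of the tutorial; it assumes "ray start" has already been run on the head node and on each worker):

```python
# Hedged sketch: verify that every machine has joined the Ray cluster before
# starting the multi-node vLLM server. Assumes "ray start" was already run on
# the head node and on each worker node.
import ray

ray.init(address="auto")  # attach to the running cluster instead of starting a new one

alive = [n for n in ray.nodes() if n["Alive"]]
print(f"{len(alive)} node(s) alive")
for node in alive:
    # The exact resource keys (e.g. whether NPUs show up here) depend on your
    # Ray / CANN setup; this just prints whatever each node registered.
    print(node["NodeManagerAddress"], node["Resources"])
```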
4. RuntimeError: Failed to infer device type, or ImportError: libatb.so: cannot open shared object file: No such file or directory.
This is usually caused by a wrong torch_npu version or a missing Ascend CANN NNAL package. Make sure you install the correct version of torch_npu, and install it together with the matching CANN and NNAL. The details of torch_npu and CANN NNAL can be found in our docs.
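A quick way to confirm the Ascend stack is usable is a minimal import check. This is a sketch only, not part of the docs; it assumes torch_npu is installed and the CANN (and NNAL) environment scripts have been sourced in the current shell:

```python
# Hedged sketch: sanity-check torch_npu and the CANN runtime before running vLLM.
# If any step here fails, vLLM will typically fail with "Failed to infer device type"
# or with a missing shared library such as libatb.so.
import torch
import torch_npu  # noqa: F401  registers the "npu" device with PyTorch

print("torch:", torch.__version__)
print("torch_npu:", torch_npu.__version__)
print("NPU available:", torch.npu.is_available())
print("NPU count:", torch.npu.device_count())

# A tiny computation on the first device confirms the runtime actually works.
x = torch.ones(2, 2, device="npu:0")
print((x + x).cpu())
```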
5. Is Atlas 300 currently supported?
Not supported yet; currently only Atlas A2 series devices are supported, as shown here.
From a technical point of view, vllm-ascend support would be possible once torch-npu supports it; otherwise, we would have to implement it with custom ops. Everyone is welcome to join us and improve it together.
6. Are quantization algorithms currently supported?
Not supported yet, but we will support the W8A8 and FA3 quantization algorithms in the future.
7. Inference speed is slow.
Currently, the performance of vLLM on Ascend still needs to be improved. We are working together with the Ascend team to improve it. The first release will be v0.7.3 in 2025 Q1. In the meantime, everyone is welcome to join us and help improve it.
8. DeepSeek V3 / R1 related errors.
These known issues will be fixed in vllm-ascend v0.7.3rc1 (March 2025) together with CANN 8.1.RC1.alpha001 (Feb. 2025):
AssertionError: Torch not compiled with CUDA enabled.
RuntimeError: GroupTopkOperation CreateOperation failed.
ValueError: Unknown quantization method: ascend.
Find more details in #72, which tracks initial support for the DeepSeek V3 model with vllm-ascend.
9. Qwen2-VL / Qwen2.5-VL related errors.
Q1: Qwen2-VL-72B-Instruct inference failure: RuntimeError: call aclnnFlashAttentionScore failed. (#115)
This is caused by an internal error in CANN ops, which will be fixed in the next CANN version.
BTW, Qwen2 in vLLM currently only works with the torch SDPA backend on non-GPU platforms. We will improve vLLM to support more backends in the next release. Find more details here.
10. Error: TBE Subprocess Task Distribute Failure when TP > 1 (#198)
[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared!
This does not mean the model failed to load; it means the process did not exit cleanly. Adding code that manually cleans up the engine objects, following the tutorials, resolves this error (see the sketch below).
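A minimal cleanup sketch along the lines of the tutorial. The helper import path reflects current vLLM internals and may change between versions (assumption), and the model name and TP size are illustrative only:

```python
# Hedged sketch: explicitly tear down the engine and distributed state at the end
# of an offline-inference script so the TBE subprocess does not outlive the main
# process. Import locations may differ across vLLM versions (assumption).
import gc

import torch
import torch_npu  # noqa: F401  provides torch.npu
from vllm import LLM, SamplingParams
from vllm.distributed.parallel_state import (destroy_distributed_environment,
                                             destroy_model_parallel)

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", tensor_parallel_size=2)  # illustrative
print(llm.generate(["Hello"], SamplingParams(max_tokens=16))[0].outputs[0].text)

# Cleanup: drop the engine, destroy the TP/distributed groups, and free NPU memory.
del llm
destroy_model_parallel()
destroy_distributed_environment()
gc.collect()
torch.npu.empty_cache()
```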
(Updated on: 2025.03.06)