[New Model]: DeepSeek V3 / R1 #72

Yikun · 2025-02-17T07:48:09Z

This issue tracks initial support for the DeepSeek V3 / R1 models in vllm-ascend:

https://huggingface.co/deepseek-ai/DeepSeek-R1
https://huggingface.co/deepseek-ai/DeepSeek-V3

cc @wangxiyuan, feel free to update this issue with any findings.

Yikun · 2025-02-18T12:31:04Z

For v0.7.1-dev: #68 #88

update (2025.02.19): #88 has been merged to v0.7.1-dev and the DeepSeek test passed (via DeepSeek-V2-Lite). V3 uses the same architecture as V2, so it should also work. This will be backported to main soon.

Deployment notes for DeepSeek-V2-Lite: https://vllm-ascend.readthedocs.io/en/latest/tutorials.html#online-serving-on-multi-machine

update (2025.02.22): DeepSeek V3 / R1 support will be ready in the next RC release of vLLM Ascend (v0.7.3rc1) in early March 2025.

Known issues, to be fixed in vllm-ascend v0.7.3rc1 (March 2025) with CANN 8.1.RC1.alpha001 (March 2025):

AssertionError: Torch not compiled with CUDA enabled

Issue link: DeepSeek-R1 on 0.7.1-dev with Torch not compiled with CUDA enabled #122 (comment)

Workaround: upstream vLLM hard-codes the device as 'cuda' in vllm/vllm/model_executor/layers/rotary_embedding.py. Either manually replace these occurrences of 'cuda' with 'npu', or add "from torch_npu.contrib import transfer_to_npu" at the beginning of the script.
Fixed by:
- vLLM PR (works in v0.7.4): [model][refactor] remove cuda hard code in models and layers vllm#13658
- vLLM Ascend workaround PR (works in v0.7.3-dev): [BugFix] Add transfer_to_npu in worker.py to replace hard-code 'cuda' in vllm. #228
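
The second form of the workaround (the `transfer_to_npu` import) could be wrapped as in the sketch below. The helper names `npu_available` and `enable_cuda_to_npu_redirect` are hypothetical and exist only for this illustration; the only thing taken from the issue is the import line itself. The guard is there so the snippet also runs, and does nothing, on a machine without torch_npu.

```python
import importlib.util


def npu_available() -> bool:
    """Return True if the torch_npu package is installed in this environment."""
    return importlib.util.find_spec("torch_npu") is not None


def enable_cuda_to_npu_redirect() -> bool:
    """Apply the workaround from this issue (hypothetical helper).

    Importing transfer_to_npu patches torch as a side effect so that code
    which hard-codes 'cuda' is redirected to 'npu'. Returns True if applied.
    """
    if not npu_available():
        return False
    import torch_npu  # noqa: F401  (registers the NPU backend)
    from torch_npu.contrib import transfer_to_npu  # noqa: F401  (side-effect import)
    return True


if __name__ == "__main__":
    # Call this before importing or launching vLLM so the patch is in place first.
    print("cuda -> npu redirect applied:", enable_cuda_to_npu_redirect())
```

The import has to run before any vLLM code that constructs tensors on 'cuda', which is why the issue suggests placing it at the very beginning of the script.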

w8a8 quantization is not supported yet:
ValueError: Unknown quantization method: ascend. Must be one of ['aqlm', 'awq', 'deepspeedfp', 'tpu_int8', 'fp8', 'fbgemm_fp8', 'modelopt', 'marlin', 'gguf', 'gptq_marlin_24', 'gptq_marlin', 'awq_marlin', 'gptq', 'compressed-tensors', 'bitsandbytes', 'qqq', 'hqq', 'experts_int8', 'neuron_quant', 'ipex', 'quark', 'moe_wna16'].

Issue Link: Quantization error while running Deepseek-V3-w8a8 #119

Workaround: do not use quantization, and wait for the next final release (late March 2025).

Quantization is not supported yet:
KeyError: 'model.layers.0.self_attn.q_a_proj.weight'
Issue link: DeepSeek-R1 on 0.7.1-dev with Torch not compiled with CUDA enabled #122 (comment)
Workaround: remove the lines at https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/config.json#L39-L47 from the model's config.json.
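
Editing config.json by hand works, but the same change can be scripted. The sketch below assumes that the linked lines 39-47 are the `quantization_config` block of the DeepSeek-R1 config; check this against your local copy before relying on it. The function name `strip_quantization_config` is hypothetical.

```python
import json
from pathlib import Path


def strip_quantization_config(config_path: str) -> bool:
    """Remove the top-level "quantization_config" key from a model config.json.

    Assumption: this key is what config.json#L39-L47 refers to.
    Returns True if the key was present and removed, False otherwise.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if "quantization_config" not in config:
        return False
    del config["quantization_config"]
    # Rewrite the file in place; keep a backup of the original if you need it.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True


if __name__ == "__main__":
    import sys

    # Usage: python strip_quant.py /path/to/DeepSeek-R1/config.json
    changed = strip_quantization_config(sys.argv[1])
    print("quantization_config removed" if changed else "nothing to remove")
```

Running it a second time on the same file is a no-op, so it is safe to include in a setup script.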

RuntimeError: GroupTopkOperation CreateOperation failed

Cause: this comes from internal ops in CANN and will be fixed in the next RC release of vLLM Ascend (v0.7.3rc1) in early March 2025. It requires bumping CANN to 8.1.RC1.alpha001 (to be published publicly in ~~end of Feb. 2025~~ March 2025).

Will be fixed by: [Misc]: Bump CANN version to CANN 8.1.RC1.alpha001 #142

Workaround: [Fix] Remove npu_group_topk before CANN version update #242

update (2025.03.05): we are still waiting for the CANN 8.1.RC1.alpha001 release: https://www.hiascend.com/zh/developer/download/community/result?module=cann

caolicaoli · 2025-03-04T03:14:35Z

Very good.

staugust · 2025-03-06T08:17:17Z

It seems that torch_npu in the docker image quay.io/ascend/vllm-ascend:v0.7.1rc1 is a dev version. Its commit id cannot be found in the torch_npu repo. I'm wondering whether I missed something. Could you please help me find the source code of the torch_npu in this docker image? Thanks in advance. @Yikun

>>> torch_npu.version.git_version
'0e8c5249aacfcf94f3d61c6ff0938fadada1cc6a'
>>> torch_npu.version.__version__
'2.5.1.dev20250218'

wangxiyuan · 2025-03-06T08:52:11Z

@staugust sorry, the torch-npu used by 0.7.1rc1 is a private version whose source code has not been merged to the main branch. vllm-ascend will rely on an official, open-source version of torch-npu in the next release.

staugust · 2025-03-06T09:05:33Z

> @staugust sorry, the torch-npu used by 0.7.1rc1 is a private version whose source code has not been merged to the main branch. vllm-ascend will rely on an official, open-source version of torch-npu in the next release.

Is https://github.com/Ascend/pytorch the official/open source repository of torch-npu? By the way, which branch is the developing branch for next release? I'm working on on-demand profiling, and found that calling localhost:8000/start_profile and localhost:8000/stop_profile blocks vllm inference. Is there any plan to make profiling available for production environment?
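
For reference, the two endpoints mentioned above can be exercised with a small client like the sketch below. It assumes both routes accept an empty POST, as in upstream vLLM's API server, where (to my knowledge) they are only registered when the server is started with `VLLM_TORCH_PROFILER_DIR` set. The helper names and the default timeout are hypothetical.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # server address from the comment above


def build_profile_request(action: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for /start_profile or /stop_profile."""
    if action not in ("start", "stop"):
        raise ValueError(f"action must be 'start' or 'stop', got {action!r}")
    # Assumption: the endpoints take an empty POST body.
    return urllib.request.Request(f"{base_url}/{action}_profile", data=b"", method="POST")


def toggle_profile(action: str, base_url: str = BASE_URL, timeout: float = 600.0) -> int:
    """Send the request and return the HTTP status code.

    The long timeout is deliberate: stopping the profiler can take a while
    because the trace is written out before the call returns.
    """
    with urllib.request.urlopen(build_profile_request(action, base_url), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    print("start ->", toggle_profile("start"))
    # ... send a few inference requests here ...
    print("stop  ->", toggle_profile("stop"))
```

Timing normal inference requests between the two calls is one way to measure the blocking behaviour described above.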

wangxiyuan · 2025-03-06T09:26:27Z

@staugust it's here: https://gitee.com/ascend/pytorch. I don't know its branch policy; you can ask about release plans there.

I assume profiling works. @Potabk Please take a look at the profile problem mentioned by @staugust

This was referenced Feb 17, 2025

vLLM Ascend Roadmap Q1 2025 #71

Open

Does this project support the deployment of deepseek-v3 and deepseek-r1 on Ascend? #39

Closed

Yikun assigned wangxiyuan Feb 17, 2025

Yikun added the new model label Feb 17, 2025

Yikun mentioned this issue Feb 26, 2025

[v0.7.1rc1] FAQ & Feedback #19

Open

MengqingCao mentioned this issue Feb 26, 2025

[Bug]: Qwen/Qwen1.5-MoE-A2.7B-Chat model fails with an error on startup #176

Closed

ApsarasX mentioned this issue Feb 27, 2025

[Usage]: Checkpoint loading error when running Deepseek-V3/R1 #183

Closed

Yikun mentioned this issue Feb 28, 2025

[Usage]: Error when running Deepseek-V3 inference on four machines with 0.7.1rc1: RuntimeError: GroupTopkOperation CreateOperation failed! #206

Open

Yikun mentioned this issue Mar 5, 2025

DeepSeek-R1 on 0.7.1-dev with Torch not compiled with CUDA enabled #122

Open

Yikun pinned this issue Mar 5, 2025
