[Hardware][Ascend] MLA for deepseek #88
Conversation
vllm_ascend/attention.py
Outdated
kv_b_proj_weight = self.kv_b_proj.weight.reshape(self.num_heads,
                                                 self.qk_nope_head_dim + self.v_head_dim,
                                                 self.kv_lora_rank)
w_kc = kv_b_proj_weight[:, :self.qk_nope_head_dim, :].contiguous()
Should modify the model loader to enable this at the initialization stage.
We can work around this by setting this weight as an attribute at runtime; that way the slice + contiguous is done only once, compared with every call in this version.
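Something like the following sketch (assumptions: `w_kc` / `w_vc` names mirror the diff, the `w_vc` slice and the `hasattr`-based lazy caching are illustrative, not the PR's actual code):

```python
# Sketch only: lazily cache the sliced projections as attributes so the
# reshape + slice + contiguous cost is paid once, not on every forward.
if not hasattr(self, "w_kc"):
    kv_b_proj_weight = self.kv_b_proj.weight.reshape(
        self.num_heads,
        self.qk_nope_head_dim + self.v_head_dim,
        self.kv_lora_rank)
    self.w_kc = kv_b_proj_weight[:, :self.qk_nope_head_dim, :].contiguous()
    self.w_vc = kv_b_proj_weight[:, self.qk_nope_head_dim:, :].contiguous()
w_kc, w_vc = self.w_kc, self.w_vc
```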
vllm_ascend/attention.py
Outdated
compressType=0, calcType=0, scaleType=0, quantType=0,
inputLayout=0, outDataType=-1, attnOut=attn_output)
attn_output_t = torch_npu.npu_transpose(attn_output, (1, 0, 2), require_contiguous=True)
attn_output_t = torch_npu.npu_bmmV2(attn_output_t, w_vc, [])
torch.bmm can probably do the same here.
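For reference, a rough sketch of that idea (assuming `attn_output` is `(num_tokens, num_heads, kv_lora_rank)` and `w_vc` is `(num_heads, kv_lora_rank, v_head_dim)`; whether the native ops match the npu kernels in performance would still need profiling):

```python
# Sketch: native torch equivalent of npu_transpose + npu_bmmV2.
# transpose(0, 1) -> (num_heads, num_tokens, kv_lora_rank), then a batched
# matmul against w_vc of shape (num_heads, kv_lora_rank, v_head_dim).
attn_output_t = attn_output.transpose(0, 1).contiguous()
attn_output_t = torch.bmm(attn_output_t, w_vc)
```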
kv = self.kv_b_proj(kv_c_normed)[0].view(num_tokens, kv_heads_num, -1)
k_nope, value = kv.split([self.qk_nope_head_dim, self.v_head_dim], dim=-1)
k_cache = torch.cat([kv_c_normed.view(num_tokens, self.num_kv_heads, -1), k_pe], dim=2)
k_pe = k_pe.repeat(1, self.num_heads, 1)
Can you test torch.expand here? Unlike repeat, it does not touch global memory.
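For reference, a sketch of that suggestion (assuming `k_pe` has a singleton head dimension, e.g. shape `(num_tokens, 1, qk_rope_head_dim)`; expand returns a view without copying, so downstream ops must accept the non-contiguous layout, otherwise a later `.contiguous()` would bring the copy back):

```python
# Sketch: broadcast the singleton head dim as a view instead of copying it
# num_heads times with repeat.
k_pe = k_pe.expand(-1, self.num_heads, -1)
```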
Signed-off-by: YHT <[email protected]>
Co-authored-by: YHT <[email protected]>
Signed-off-by: angazenn <[email protected]>
What this PR does / why we need it?
To adapt the MLA structure of vLLM's DeepSeek support to Ascend hardware, this PR adds the AscendMLAAttentionBackendImpl class.
Does this PR introduce any user-facing change?
Users can set VLLM_MLA_DISABLE to 1 to disable MLA or to 0 to enable it.
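For example (an assumed usage sketch; only the environment variable name and values come from this PR description):

```python
import os

# Set the flag before vLLM is imported/launched so the backend sees it;
# "1" disables MLA, "0" leaves it enabled.
os.environ["VLLM_MLA_DISABLE"] = "1"
```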
How was this patch tested?