v0.7.3 support speculative decoding #252

mengwei805 · 2025-03-06T08:57:03Z

What this PR does / why we need it?

Support speculative decoding on Ascend. Four modes are covered: speculating with a draft model, matching n-grams in the prompt, using MLP speculators, and using EAGLE-based draft models.
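For readers unfamiliar with the n-gram mode, here is a minimal sketch of the idea behind prompt lookup: find an earlier occurrence of the current token suffix and propose the tokens that followed it as the draft. This is an illustration only, not the code in this PR or in vLLM; the function name `propose_ngram_draft` and its parameters are hypothetical.

```python
from typing import List


def propose_ngram_draft(
    token_ids: List[int],
    ngram_max: int = 3,
    ngram_min: int = 1,
    num_speculative_tokens: int = 4,
) -> List[int]:
    """Propose draft tokens by matching the trailing n-gram earlier in the sequence.

    Hypothetical helper for illustration; not the vLLM implementation.
    """
    n_tokens = len(token_ids)
    # Prefer longer n-grams, since a longer match is a stronger signal.
    for n in range(min(ngram_max, n_tokens - 1), ngram_min - 1, -1):
        suffix = token_ids[n_tokens - n:]
        # Scan earlier positions, most recent first, excluding the suffix itself.
        for start in range(n_tokens - n - 1, -1, -1):
            if token_ids[start:start + n] == suffix:
                # The tokens that followed the earlier match become the draft.
                return token_ids[start + n:start + n + num_speculative_tokens]
    # No match: propose nothing, so the target model decodes normally.
    return []


draft = propose_ngram_draft([1, 2, 3, 4, 5, 1, 2, 3])
print(draft)
```

The target model then verifies the proposed tokens in a single forward pass and keeps only the accepted prefix, so a bad proposal costs speed but not correctness.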

Does this PR introduce any user-facing change?

Usage follows the upstream vLLM speculative decoding documentation: https://docs.vllm.ai/en/latest/features/spec_decode.html#
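As a rough guide, the four modes map to engine arguments along the following lines. This sketch assumes the keyword names used in the vLLM v0.7.x documentation (`speculative_model`, `num_speculative_tokens`, `ngram_prompt_lookup_max`, `speculative_draft_tensor_parallel_size`); newer vLLM releases may spell these differently, so check the linked page for the version in use. The helper `spec_decode_kwargs` and the model placeholders are hypothetical.

```python
from typing import Any, Dict


def spec_decode_kwargs(mode: str) -> Dict[str, Any]:
    """Return example engine kwargs for one speculative decoding mode.

    Hypothetical helper; keyword names assume the vLLM v0.7.x docs.
    Model names are placeholders, not tested checkpoints.
    """
    if mode == "draft_model":
        return {
            "speculative_model": "<small-draft-model>",
            "num_speculative_tokens": 5,
        }
    if mode == "ngram":
        return {
            # "[ngram]" selects prompt n-gram lookup instead of a draft model.
            "speculative_model": "[ngram]",
            "num_speculative_tokens": 5,
            "ngram_prompt_lookup_max": 4,
        }
    if mode in ("mlp_speculator", "eagle"):
        return {
            "speculative_model": f"<{mode}-checkpoint>",
            "speculative_draft_tensor_parallel_size": 1,
        }
    raise ValueError(f"unknown speculative decoding mode: {mode}")


# Assumed usage (requires vLLM and, for Ascend, vllm-ascend to be installed):
#   from vllm import LLM
#   llm = LLM(model="<target-model>", **spec_decode_kwargs("ngram"))
print(spec_decode_kwargs("ngram"))
```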

How was this patch tested?

All four speculative decoding modes were tested on Ascend, and the results are consistent with those on GPU devices.

Signed-off-by: mengwei805 <[email protected]>

mengwei805 added 3 commits March 6, 2025 08:51

v0.7.3 support speculative decoding

100c2fd

Signed-off-by: mengwei805 <[email protected]>

fix codecheck

eb7c4c7

Signed-off-by: mengwei805 <[email protected]>

fix codecheck

89864f2

Signed-off-by: mengwei805 <[email protected]>
