
Does LLM Cache support V100 hardware? #791

Open

jlcoo opened this issue Mar 4, 2025 · 2 comments

jlcoo commented Mar 4, 2025

I'm using V100 GPUs to test deploying the Distributed KV Cache example. Unfortunately, it fails because the feature requires the flash attention backend.
[Screenshot of the error attached]
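
For anyone hitting the same error: FlashAttention 2 generally requires an NVIDIA GPU with compute capability 8.0 or newer (Ampere+), while the V100 is compute capability 7.0 (Volta), so a flash-attention-only code path cannot run on it. A minimal sketch to verify this locally, assuming PyTorch with CUDA is installed:

```python
import torch

# FlashAttention 2 generally needs compute capability >= 8.0 (Ampere or newer).
# The V100 is sm_70 (Volta), so a flash-attention-only path will fail on it.
assert torch.cuda.is_available(), "No CUDA device visible"
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)}, compute capability {major}.{minor}")
if (major, minor) < (8, 0):
    print("FlashAttention backend is unsupported on this GPU (e.g., V100).")
```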

DwyaneShi (Collaborator) commented

@jlcoo Thanks for trying out the distributed KV cache offloading feature. We will support more attention backends soon; please stay tuned.
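
As a side note, if the deployment runs on top of vLLM, the attention backend can usually be forced via the `VLLM_ATTENTION_BACKEND` environment variable (e.g. `XFORMERS` instead of `FLASH_ATTN`), which lets vLLM itself run on a V100. Whether the offloading path will accept those alternative backends once supported is an assumption here; today it rejects them, which is the failure reported above. A hedged sketch:

```python
import os

# Assumption: the stack runs on vLLM, which reads VLLM_ATTENTION_BACKEND at
# startup. XFORMERS works on pre-Ampere GPUs such as the V100, but the
# distributed KV cache offloading feature currently requires FLASH_ATTN.
os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

from vllm import LLM  # import after setting the env var so it takes effect

llm = LLM(model="facebook/opt-125m")  # small placeholder model for illustration
print(llm.generate("Hello"))
```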

jlcoo (Author) commented Mar 5, 2025

@DwyaneShi Thanks for the update! I'm really looking forward to support for more attention backends. Will the distributed KV cache offloading feature, with support for more attention backends, be available in version 0.3?
