I ran two experiments to evaluate the optimization effectiveness:
1. Using examples/blend_kv/blend_kv.py.
2. Running the full prefill directly with vLLM.
Comparing the two runs, I found that the improvement from blend_kv.py over the full prefill was not significant. However, with the original CacheBlend repository code, the results were excellent on the same dataset and setup.
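For reference, here is a minimal sketch of how such a side-by-side comparison can be timed. It is not the actual benchmark code: `run_full_prefill` and `run_blend` are hypothetical stand-ins (they only sleep) that would need to be replaced with the real vLLM and blend_kv.py calls, and the warmup and repeat counts are assumptions.

```python
import statistics
import time
from typing import Callable


def median_latency(fn: Callable[[], object], warmup: int = 1, repeats: int = 5) -> float:
    """Return the median wall-clock time of fn() in seconds, after warmup calls."""
    for _ in range(warmup):
        fn()  # warmup runs are not timed (e.g. to exclude one-time setup cost)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_s: float, optimized_s: float) -> float:
    """Ratio of baseline latency to optimized latency (>1 means faster)."""
    return baseline_s / optimized_s


# Hypothetical stand-ins: replace with the real full-prefill and blending runs.
def run_full_prefill() -> None:
    time.sleep(0.02)


def run_blend() -> None:
    time.sleep(0.01)


if __name__ == "__main__":
    base = median_latency(run_full_prefill)
    blend = median_latency(run_blend)
    print(f"full prefill: {base:.4f}s, blend: {blend:.4f}s, speedup: {speedup(base, blend):.2f}x")
```

Timing both paths with the same prompts, the same warmup, and several repeats makes it easier to tell a genuinely small gain apart from measurement noise.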
What could be the reasons for this discrepancy? Are there any additional configurations or dependencies in blend_kv.py that I might have missed?
Looking forward to your insights!