We should have a testing script that future users can run to compare performance against naive vLLM. To this end, we need to write two scripts, test_vllm_with_trace.py and test_lmcache_with_trace.py. test_lmcache_with_trace.py should also contain the setup for the LMCache configuration (storage device, size used, model, ...).
Each script takes in a trace in the JSON format below. (Output length is the maximum output length; if the model emits EOS before reaching that length, keep generating until the output meets it.)
Each script outputs its results in the following format:
The idea is that, given a request trace (e.g. the request trace received by a service), we will be able to compare the performance of different versions of vLLM and LMCache.
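As a rough illustration of the replay idea, here is a minimal sketch of the loop both scripts could share. It is not the proposed implementation: the trace field names (`request_id`, `prompt`, `output_length`), the result field `latency_s`, and the helper names are all hypothetical, and the backend is passed in as a plain callable so the same loop could wrap either vLLM or LMCache.

```python
import json
import time
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical backend signature: (prompt, output_length) -> generated text.
Generate = Callable[[str, int], str]


def load_trace(text: str) -> List[Dict]:
    """Parse a JSON array of request records (field names are assumptions)."""
    return json.loads(text)


def replay_trace(trace: List[Dict], generate: Generate) -> List[Dict]:
    """Send each request to the backend in order and record wall-clock latency."""
    results = []
    for req in trace:
        start = time.perf_counter()
        generate(req["prompt"], req["output_length"])
        elapsed = time.perf_counter() - start
        results.append({"request_id": req["request_id"], "latency_s": elapsed})
    return results


def mean_latency(results: List[Dict]) -> float:
    """Average per-request latency in seconds."""
    return mean(r["latency_s"] for r in results)
```

Running `replay_trace` once with a vLLM-backed callable and once with an LMCache-backed one on the same trace would give two result lists to compare. On the vLLM side, one way to hold every request to its full output length is `SamplingParams(max_tokens=n, ignore_eos=True)`, though whether the scripts should do it that way is left open here.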