I quantized the openai/whisper-medium model from 2.8 GB down to 640 MB with the W4A16 GPTQ method. However, when I load two quantized models simultaneously, they use almost 6 GB of VRAM. Does anyone know why this uses so much memory?
```python
# loading the two quantized models
from vllm import LLM

llm = LLM(
    model="Bakht123/whisper-medium-gptq-W4A16-G128",
    max_model_len=448,
    max_num_seqs=64,
    limit_mm_per_prompt={"audio": 1},
)
llm1 = LLM(
    model="Bakht123/whisper-medium-gptq-W4A16-G128",
    max_model_len=448,
    max_num_seqs=64,
    limit_mm_per_prompt={"audio": 1},
)
```
Memory utilization:
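One possible explanation worth checking: vLLM pre-allocates a large fraction of total GPU memory per engine for the KV cache (controlled by `gpu_memory_utilization`, which defaults to 0.9), so the footprint of each `LLM` instance can be far larger than the weight files alone. A minimal sketch of capping each instance's share; the 0.3 value is an arbitrary illustration, not a recommendation for this model:

```python
from vllm import LLM

# Sketch: limit how much VRAM each engine reserves.
# gpu_memory_utilization caps the fraction of total GPU memory this
# instance may claim (weights + KV cache); 0.3 is chosen only for
# illustration and may need tuning for whisper-medium.
llm = LLM(
    model="Bakht123/whisper-medium-gptq-W4A16-G128",
    max_model_len=448,
    max_num_seqs=64,
    limit_mm_per_prompt={"audio": 1},
    gpu_memory_utilization=0.3,
)
```

If the 6 GB figure roughly matches the default reservation on your GPU, the usage is coming from the pre-allocated KV cache rather than the quantized weights themselves.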