[Bug]: Memory Leak or Abnormal Memory Increase When Deploying Fine-Tuned Qwen2VL-72B Model with vLLM Serve #216

XuyaoWang · 2025-03-02T11:46:27Z

Your current environment

The output of `npu-smi info`

root@1c518a2e9ee2:/workspace# npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 23.0.7                   Version: 23.0.7                                               |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 0     910B3               | OK            | 102.8       55                0    / 0             |
| 0                         | 0000:C1:00.0  | 0           0    / 0          3338 / 65536         |
+===========================+===============+====================================================+
| 1     910B3               | OK            | 97.6        54                0    / 0             |
| 0                         | 0000:C2:00.0  | 0           0    / 0          3337 / 65536         |
+===========================+===============+====================================================+
| 2     910B3               | OK            | 96.5        56                0    / 0             |
| 0                         | 0000:81:00.0  | 0           0    / 0          3334 / 65536         |
+===========================+===============+====================================================+
| 3     910B3               | OK            | 108.7       57                0    / 0             |
| 0                         | 0000:82:00.0  | 0           0    / 0          3334 / 65536         |
+===========================+===============+====================================================+
| 4     910B3               | OK            | 103.9       58                0    / 0             |
| 0                         | 0000:01:00.0  | 0           0    / 0          3334 / 65536         |
+===========================+===============+====================================================+
| 5     910B3               | OK            | 99.3        56                0    / 0             |
| 0                         | 0000:02:00.0  | 0           0    / 0          3334 / 65536         |
+===========================+===============+====================================================+
| 6     910B3               | OK            | 107.7       58                0    / 0             |
| 0                         | 0000:41:00.0  | 0           0    / 0          3334 / 65536         |
+===========================+===============+====================================================+
| 7     910B3               | OK            | 105.1       59                0    / 0             |
| 0                         | 0000:42:00.0  | 0           0    / 0          3338 / 65536         |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU     Chip              | Process id    | Process name             | Process memory(MB)      |
+===========================+===============+====================================================+
| No running processes found in NPU 0                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 1                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 2                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 3                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 4                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 5                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 6                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 7                                                            |
+===========================+===============+====================================================+

The output of `cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info`

root@1c518a2e9ee2:/workspace# cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
package_name=Ascend-cann-toolkit
version=8.0.0
innerversion=V100R001C20SPC001B251
compatible_version=[V100R001C15],[V100R001C17],[V100R001C18],[V100R001C19],[V100R001C20]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.0.0/aarch64-linux

The output of `python collect_env.py`

root@1c518a2e9ee2:/workspace# python collect_env.py
INFO 02-27 17:19:50 __init__.py:28] Available plugins for group vllm.platform_plugins:
INFO 02-27 17:19:50 __init__.py:30] name=ascend, value=vllm_ascend:register
INFO 02-27 17:19:50 __init__.py:32] all available plugins for group vllm.platform_plugins will be loaded.
INFO 02-27 17:19:50 __init__.py:34] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 02-27 17:19:50 __init__.py:42] plugin ascend loaded.
INFO 02-27 17:19:51 __init__.py:28] Available plugins for group vllm.platform_plugins:
INFO 02-27 17:19:51 __init__.py:30] name=ascend, value=vllm_ascend:register
INFO 02-27 17:19:51 __init__.py:32] all available plugins for group vllm.platform_plugins will be loaded.
INFO 02-27 17:19:51 __init__.py:34] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 02-27 17:19:51 __init__.py:42] plugin ascend loaded.
INFO 02-27 17:19:51 __init__.py:187] No platform detected, vLLM is running on UnspecifiedPlatform
WARNING 02-27 17:19:51 _custom_ops.py:19] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 02-27 17:19:51 __init__.py:174] Platform plugin ascend is activated
Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35

Python version: 3.10.15 (main, Nov 27 2024, 06:51:55) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.10.0-60.139.0.166.oe2203.aarch64-aarch64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       aarch64
CPU op-mode(s):                     64-bit
Byte Order:                         Little Endian
CPU(s):                             192
On-line CPU(s) list:                0-191
Vendor ID:                          HiSilicon
Model name:                         Kunpeng-920
Model:                              0
Thread(s) per core:                 1
Core(s) per cluster:                48
Socket(s):                          -
Cluster(s):                         4
Stepping:                           0x1
Frequency boost:                    disabled
CPU max MHz:                        2600.0000
CPU min MHz:                        200.0000
BogoMIPS:                           200.00
Flags:                              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache:                          12 MiB (192 instances)
L1i cache:                          12 MiB (192 instances)
L2 cache:                           96 MiB (192 instances)
L3 cache:                           192 MiB (8 instances)
NUMA node(s):                       8
NUMA node0 CPU(s):                  0-23
NUMA node1 CPU(s):                  24-47
NUMA node2 CPU(s):                  48-71
NUMA node3 CPU(s):                  72-95
NUMA node4 CPU(s):                  96-119
NUMA node5 CPU(s):                  120-143
NUMA node6 CPU(s):                  144-167
NUMA node7 CPU(s):                  168-191
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; __user pointer sanitization
Vulnerability Spectre v2:           Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==26.2.1
[pip3] torch==2.5.1
[pip3] torch-npu==2.5.1.dev20250218
[pip3] torchaudio==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.49.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.7.1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect

LD_LIBRARY_PATH=/usr/local/python3.10/lib/python3.10/site-packages/cv2/../../lib64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:
NCCL_CUMEM_ENABLE=0
TORCHINDUCTOR_COMPILE_THREADS=1

🐛 Describe the bug

When deploying the fine-tuned Qwen2VL-72B model with `vllm serve`, memory usage keeps increasing abnormally after the model is loaded. Once it has consumed all the memory on the server, the process throws an error and exits.

The original serving script is:

export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
source /usr/local/Ascend/ascend-toolkit/set_env.sh

vllm serve <path_to_finetuned_qwen2vl-72b> \
--served-model-name finetuned_qwen2vl-72b \
--tensor-parallel-size 8 \
--distributed_executor_backend "mp"

Screenshots taken while the program hangs: loading interface, memory usage, and NPU memory usage.

The error message when the program crashes:

error_message.log

The debugging script is:

export VLLM_LOGGING_LEVEL=DEBUG
export VLLM_TRACE_FUNCTION=1
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
source /usr/local/Ascend/ascend-toolkit/set_env.sh

vllm serve <path_to_finetuned_qwen2vl-72b> \
--served-model-name finetuned_qwen2vl-72b \
--tensor-parallel-size 8 \
--distributed_executor_backend "mp"

The detailed log is:

ascend_detailed_log.log
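
To quantify the growth described above, one option is to log host memory at a fixed interval in a second shell while `vllm serve` loads the model. The following is a minimal sketch and not part of the original report: it assumes a Linux host that exposes `/proc/meminfo`, and the helper names (`parse_meminfo`, `used_mib`, `sample`) are hypothetical.

```python
import time
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Size fields are reported by the kernel in kB; counters such as
    HugePages_Total are unitless.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def used_mib(fields: dict[str, int]) -> float:
    """Host memory in use, computed as MemTotal - MemAvailable, in MiB."""
    return (fields["MemTotal"] - fields["MemAvailable"]) / 1024


def sample(interval_s: float = 5.0, count: int = 120) -> None:
    """Print a timestamped host-memory reading every interval_s seconds."""
    for _ in range(count):
        fields = parse_meminfo(Path("/proc/meminfo").read_text())
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} used={used_mib(fields):.0f} MiB", flush=True)
        time.sleep(interval_s)
```

Calling `sample()` alongside the serving script gives a timestamped series that can be lined up against the DEBUG log to see which startup phase coincides with the increase.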


XuyaoWang added the `bug` label on Mar 2, 2025
