You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
root@1c518a2e9ee2:/workspace# python collect_env.py
INFO 02-27 17:19:50 __init__.py:28] Available plugins for group vllm.platform_plugins:
INFO 02-27 17:19:50 __init__.py:30] name=ascend, value=vllm_ascend:register
INFO 02-27 17:19:50 __init__.py:32] all available plugins for group vllm.platform_plugins will be loaded.
INFO 02-27 17:19:50 __init__.py:34] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 02-27 17:19:50 __init__.py:42] plugin ascend loaded.
INFO 02-27 17:19:51 __init__.py:28] Available plugins for group vllm.platform_plugins:
INFO 02-27 17:19:51 __init__.py:30] name=ascend, value=vllm_ascend:register
INFO 02-27 17:19:51 __init__.py:32] all available plugins for group vllm.platform_plugins will be loaded.
INFO 02-27 17:19:51 __init__.py:34] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 02-27 17:19:51 __init__.py:42] plugin ascend loaded.
INFO 02-27 17:19:51 __init__.py:187] No platform detected, vLLM is running on UnspecifiedPlatform
WARNING 02-27 17:19:51 _custom_ops.py:19] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 02-27 17:19:51 __init__.py:174] Platform plugin ascend is activated
Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35
Python version: 3.10.15 (main, Nov 27 2024, 06:51:55) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.10.0-60.139.0.166.oe2203.aarch64-aarch64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Vendor ID: HiSilicon
Model name: Kunpeng-920
Model: 0
Thread(s) per core: 1
Core(s) per cluster: 48
Socket(s): -
Cluster(s): 4
Stepping: 0x1
Frequency boost: disabled
CPU max MHz: 2600.0000
CPU min MHz: 200.0000
BogoMIPS: 200.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache: 12 MiB (192 instances)
L1i cache: 12 MiB (192 instances)
L2 cache: 96 MiB (192 instances)
L3 cache: 192 MiB (8 instances)
NUMA node(s): 8
NUMA node0 CPU(s): 0-23
NUMA node1 CPU(s): 24-47
NUMA node2 CPU(s): 48-71
NUMA node3 CPU(s): 72-95
NUMA node4 CPU(s): 96-119
NUMA node5 CPU(s): 120-143
NUMA node6 CPU(s): 144-167
NUMA node7 CPU(s): 168-191
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==26.2.1
[pip3] torch==2.5.1
[pip3] torch-npu==2.5.1.dev20250218
[pip3] torchaudio==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.49.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.7.1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect
LD_LIBRARY_PATH=/usr/local/python3.10/lib/python3.10/site-packages/cv2/../../lib64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:
NCCL_CUMEM_ENABLE=0
TORCHINDUCTOR_COMPILE_THREADS=1
🐛 Describe the bug
When deploying the fine-tuned Qwen2VL-72B model using vllm serve, the memory usage will abnormally increase after the model is loaded. Once it consumes all the memory on the server, it throws an error and exits.
Your current environment
The output of `npu-smi info`
The output of `cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info`
The output of `python collect_env.py`
🐛 Describe the bug
When deploying the fine-tuned Qwen2VL-72B model using vllm serve, the memory usage will abnormally increase after the model is loaded. Once it consumes all the memory on the server, it throws an error and exits.
The original serving script is:
When the program hangs, the loading interface, memory usage, and GPU memory usage are as follows:
The error message when the program crashes:
error_message.log
The debugging script is:
The detailed log is:
ascend_detailed_log.log
The text was updated successfully, but these errors were encountered: