[Bug]: ImportError: libatb.so: cannot open shared object file: No such file or directory #152

Open
phellonchen opened this issue Feb 24, 2025 · 6 comments
Labels
question Further information is requested

Comments

@phellonchen

Your current environment

npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc3                 Version: 24.1.rc3                                             |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 0     910B1               | OK            | 94.6        49                0    / 0             |
| 0                         | 0000:C1:00.0  | 0           0    / 0          3373 / 65536         |
+===========================+===============+====================================================+
| 1     910B1               | OK            | 97.8        48                0    / 0             |
| 0                         | 0000:01:00.0  | 0           0    / 0          3367 / 65536         |
+===========================+===============+====================================================+
| 2     910B1               | OK            | 93.0        48                0    / 0             |
| 0                         | 0000:C2:00.0  | 0           0    / 0          3371 / 65536         |
+===========================+===============+====================================================+
| 3     910B1               | OK            | 98.0        48                0    / 0             |
| 0                         | 0000:02:00.0  | 0           0    / 0          3367 / 65536         |
+===========================+===============+====================================================+
| 4     910B1               | OK            | 100.6       48                0    / 0             |
| 0                         | 0000:81:00.0  | 0           0    / 0          3371 / 65536         |
+===========================+===============+====================================================+
| 5     910B1               | OK            | 101.3       49                0    / 0             |
| 0                         | 0000:41:00.0  | 0           0    / 0          3369 / 65536         |
+===========================+===============+====================================================+
| 6     910B1               | OK            | 100.3       49                0    / 0             |
| 0                         | 0000:82:00.0  | 0           0    / 0          3369 / 65536         |
+===========================+===============+====================================================+
| 7     910B1               | OK            | 94.9        49                0    / 0             |
| 0                         | 0000:42:00.0  | 0           0    / 0          3369 / 65536         |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU     Chip              | Process id    | Process name             | Process memory(MB)      |
+===========================+===============+====================================================+
| No running processes found in NPU 0                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 1                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 2                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 3                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 4                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 5                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 6                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 7                                                            |
+===========================+===============+====================================================+

cat /usr/local/Ascend/ascend-toolkit/latest/"$(uname -i)"-linux/ascend_toolkit_install.info
package_name=Ascend-cann-toolkit
version=8.0.RC1
innerversion=V100R001C17SPC001B240
compatible_version=[V100R001C15,V100R001C18],[V100R001C30],[V100R001C13],[V100R003C11],[V100R001C29],[V100R001C10]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.0.RC1/aarch64-linux

torch 2.5.1
torch-npu 2.5.1.dev20250218
vllm 0.7.1+empty
vllm-ascend 0.7.1rc1

🐛 Describe the bug

No module named 'vllm._version'
from vllm.version import __version__ as VLLM_VERSION
INFO 02-24 16:11:22 __init__.py:28] Available plugins for group vllm.platform_plugins:
INFO 02-24 16:11:22 __init__.py:30] name=ascend, value=vllm_ascend:register
INFO 02-24 16:11:22 __init__.py:32] all available plugins for group vllm.platform_plugins will be loaded.
INFO 02-24 16:11:22 __init__.py:34] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 02-24 16:11:22 __init__.py:42] plugin ascend loaded.
INFO 02-24 16:11:22 __init__.py:187] No platform detected, vLLM is running on UnspecifiedPlatform
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch_npu/__init__.py", line 17, in <module>
    import torch_npu.npu
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch_npu/npu/__init__.py", line 114, in <module>
    from torch_npu.utils import _should_print_warning
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch_npu/utils/__init__.py", line 1, in <module>
    from torch_npu import _C
ImportError: libatb.so: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ma-user/work/algorithm/llama_factory_code/vllm_npus/vllm/vllm_test.py", line 2, in <module>
    import torch_npu
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch_npu/__init__.py", line 19, in <module>
    from torch_npu.utils._error_code import ErrCode, pta_error
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch_npu/utils/__init__.py", line 1, in <module>
    from torch_npu import _C
ImportError: libatb.so: cannot open shared object file: No such file or directory

@phellonchen phellonchen added the bug Something isn't working label Feb 24, 2025
@wangxiyuan
Collaborator

Please follow the doc to install the correct vllm-ascend and torch-npu: https://vllm-ascend.readthedocs.io/en/latest/installation.html

@wangxiyuan wangxiyuan added question Further information is requested and removed bug Something isn't working labels Feb 24, 2025
@phellonchen
Author

Please follow the doc to install the correct vllm-ascend and torch-npu: https://vllm-ascend.readthedocs.io/en/latest/installation.html

There were no errors during installation, but the error occurred when running example.py.

# Install vllm from source
git clone  --depth 1 --branch v0.7.1 https://github.com/vllm-project/vllm
cd vllm
VLLM_TARGET_DEVICE=empty pip install . --extra-index-url https://download.pytorch.org/whl/cpu/

# Install vllm-ascend from source
git clone  --depth 1 --branch v0.7.1rc1 https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -e . --extra-index-url https://download.pytorch.org/whl/cpu/

# You need to install `torch-npu` manually, because vllm-ascend relies on an unreleased version of torch-npu.
# This step will be removed in the next vllm-ascend release.
#
# Here we take python 3.10 on aarch64 as an example. Feel free to install the correct version for your environment. See:
#
# https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v2.5.1/20250218.4/pytorch_v2.5.1_py39.tar.gz
# https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v2.5.1/20250218.4/pytorch_v2.5.1_py310.tar.gz
# https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v2.5.1/20250218.4/pytorch_v2.5.1_py311.tar.gz
#
mkdir pta
cd pta
wget https://pytorch-package.obs.cn-north-4.myhuaweicloud.com/pta/Daily/v2.5.1/20250218.4/pytorch_v2.5.1_py310.tar.gz
tar -xvf pytorch_v2.5.1_py310.tar.gz
pip install ./torch_npu-2.5.1.dev20250218-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
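
The snippet above uses Python 3.10 as its example, while the traceback paths show a `python3.9` environment, so the archive has to match the interpreter actually in use. A minimal sketch of picking the matching archive, assuming the naming pattern of the three URLs listed above holds; `pta_archive_url` is a hypothetical helper, not part of torch-npu or vllm-ascend:

```python
import sys

# Base URL and file-name pattern are copied from the three links listed above.
# Only the py39, py310 and py311 archives appear there; anything else is rejected.
BASE_URL = (
    "https://pytorch-package.obs.cn-north-4.myhuaweicloud.com"
    "/pta/Daily/v2.5.1/20250218.4"
)
LISTED_TAGS = {"py39", "py310", "py311"}


def pta_archive_url(major=None, minor=None):
    """Return the archive URL for a Python version (hypothetical helper).

    Defaults to the version of the running interpreter.
    """
    if major is None or minor is None:
        major, minor = sys.version_info[:2]
    tag = f"py{major}{minor}"
    if tag not in LISTED_TAGS:
        raise ValueError(f"no archive listed for Python {major}.{minor}")
    return f"{BASE_URL}/pytorch_v2.5.1_{tag}.tar.gz"
```

Calling `pta_archive_url()` inside the target environment gives the URL to pass to `wget`; the wheel inside the archive should then carry the matching `cp39`, `cp310` or `cp311` tag.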

@phellonchen
Author

phellonchen commented Feb 24, 2025

Please follow the doc to install the correct vllm-ascend and torch-npu: https://vllm-ascend.readthedocs.io/en/latest/installation.html

Thanks. I found that the following steps may work:

wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%208.0.0/Ascend-cann-nnal_8.0.0_linux-aarch64.run
chmod +x ./Ascend-cann-nnal_8.0.0_linux-aarch64.run
./Ascend-cann-nnal_8.0.0_linux-aarch64.run --install

source /usr/local/Ascend/nnal/atb/set_env.sh

https://github.com/PaddlePaddle/Paddle/issues/65797

@wangxiyuan
Collaborator

Correct. The new torch-npu relies on the nnal package.
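
To check whether the fix took effect before importing `torch_npu`, one option is to look for `libatb.so` on the library search path. The sketch below assumes that `set_env.sh` exposes the library through `LD_LIBRARY_PATH`, which this thread does not confirm; `find_shared_library` is a hypothetical helper:

```python
import os
from pathlib import Path


def find_shared_library(name, search_path=None):
    """List files called `name` in the directories of a search path.

    `search_path` uses the platform path separator (':' on Linux) and
    defaults to the current LD_LIBRARY_PATH. Hypothetical helper.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for entry in search_path.split(os.pathsep):
        if not entry:
            continue  # skip empty components such as a trailing ':'
        candidate = Path(entry) / name
        if candidate.is_file():
            hits.append(candidate)
    return hits
```

Running `find_shared_library("libatb.so")` in the same shell that sourced `set_env.sh` should return at least one path if the assumption holds. An empty list is not conclusive, because the dynamic loader also consults other locations such as the `ld.so` cache.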

@zmh2000829

Please follow the doc to install the correct vllm-ascend and torch-npu: https://vllm-ascend.readthedocs.io/en/latest/installation.html

Thanks. I found that the following steps may work:

wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%208.0.0/Ascend-cann-nnal_8.0.0_linux-aarch64.run
chmod +x ./Ascend-cann-nnal_8.0.0_linux-aarch64.run
./Ascend-cann-nnal_8.0.0_linux-aarch64.run --install

source /usr/local/Ascend/nnal/atb/set_env.sh

https://github.com/PaddlePaddle/Paddle/issues/65797

Why does the output look like this? Is it a loss of precision?

Image

@MengqingCao
Contributor

Why does the output look like this? Is it a loss of precision?

Image

Please post your inference script here so that I can find out why this happens.
