Failed to build transformer-engine #1506
Same issue when installing. OS: Ubuntu 22.04
note: This error originates from a subprocess, and is likely not a problem with pip.
def _load_library():
The error in the original issue indicates a problem with finding the cuDNN headers:
/workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/ATen/cudnn/cudnn-wrapper.h:3:10: fatal error: cudnn.h: No such file or directory
@skr3178 could you confirm that you are seeing the same error (it should be above the lines you posted)?
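A quick way to confirm whether the header is visible at all is to look for `cudnn.h` in the directories a build is likely to search. The snippet below is only a minimal sketch: the function name `find_cudnn_header` and the candidate list (`$CUDA_HOME/include`, `/usr/include`, `/usr/include/x86_64-linux-gnu`) are assumptions for illustration, not the paths Transformer Engine itself consults.

```python
import os
from pathlib import Path


def find_cudnn_header(candidate_dirs):
    """Return the candidate directories that actually contain cudnn.h."""
    return [str(d) for d in map(Path, candidate_dirs) if (d / "cudnn.h").is_file()]


if __name__ == "__main__":
    # Hypothetical candidate list; adjust it to your own installation.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    candidates = [
        Path(cuda_home) / "include",
        Path("/usr/include"),
        Path("/usr/include/x86_64-linux-gnu"),
    ]
    hits = find_cudnn_header(candidates)
    print(hits if hits else "cudnn.h not found in any candidate directory")
```

If the list comes back empty, the compiler has no chance of resolving `#include <cudnn.h>` from those locations, which would match the error above.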
@ptrendx I'm having the same issue as the OP with the cuDNN headers. Here's a reproducible example:
conda create --name tr_engine \
python=3.10 \
nvidia/label/cuda-12.6.3::cuda \
nvidia::cudnn
conda activate tr_engine
pip install torch --index-url https://download.pytorch.org/whl/cu126
export CUDA_HOME=$CONDA_PREFIX
export NVTE_FRAMEWORK=pytorch
pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable --verbose
This gives:
~/miniconda3/envs/tr_engine/bin/x86_64-conda-linux-gnu-c++ -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-_h4f8bra/transformer_engine/common/.. -I/tmp/pip-req-build-_h4f8bra/transformer_engine/common/include -I/tmp/pip-req-build-_h4f8bra/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-_h4f8bra/build/cmake/string_headers -isystem /lustre/fsw/portfolios/llmservice/users/cmccarthy/miniconda3/envs/tr_engine/targets/x86_64-linux/include -Wl,--version-script=/tmp/pip-req-build-_h4f8bra/transformer_engine/common/libtransformer_engine.version -O3 -DNDEBUG -std=gnu++17 -fPIC -MD -MT CMakeFiles/transformer_engine.dir/comm_gemm_overlap/comm_gemm_overlap.cpp.o -MF CMakeFiles/transformer_engine.dir/comm_gemm_overlap/comm_gemm_overlap.cpp.o.d -o CMakeFiles/transformer_engine.dir/comm_gemm_overlap/comm_gemm_overlap.cpp.o -c /tmp/pip-req-build-_h4f8bra/transformer_engine/common/comm_gemm_overlap/comm_gemm_overlap.cpp
In file included from /tmp/pip-req-build-_h4f8bra/transformer_engine/common/normalization/common.cpp:9:
/tmp/pip-req-build-_h4f8bra/transformer_engine/common/normalization/common.h:10:10: fatal error: cudnn.h: No such file or directory
   10 | #include <cudnn.h>
      |          ^~~~~~~~~
In file included from /tmp/pip-req-build-_h4f8bra/transformer_engine/common/cudnn_utils.cpp:7:
/tmp/pip-req-build-_h4f8bra/transformer_engine/common/cudnn_utils.h:10:10: fatal error: cudnn.h: No such file or directory
   10 | #include <cudnn.h>
      |          ^~~~~~~~~
compilation terminated.
compilation terminated.
But the file exists at the standard path (I'm assuming $CUDA_HOME/include is standard):
(tr_engine) ~/$ cd $CUDA_HOME && find . -name cudnn.h
./lib/python3.10/site-packages/nvidia/cudnn/include/cudnn.h
./include/cudnn.h
Explicitly adding I've attached the full pip install / build output. Thanks for taking a look.
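One thing that may be worth checking: the compile command in the verbose output passes only the project's own `-I` directories plus `-isystem .../targets/x86_64-linux/include`, while `find` reports the header under `./include`. A minimal sketch for testing that against any pasted compiler line follows. The helper names `include_dirs` and `dirs_with_header` are made up for illustration, and the parser only handles the `-Idir`, `-I dir` and `-isystem dir` spellings.

```python
import shlex
from pathlib import Path


def include_dirs(compile_command):
    """Extract directories passed via -I or -isystem from a compiler command line."""
    tokens = shlex.split(compile_command)
    dirs = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-I", "-isystem") and i + 1 < len(tokens):
            # Separated form: the directory is the next token.
            dirs.append(tokens[i + 1])
            i += 2
            continue
        if tok.startswith("-I") and len(tok) > 2:
            # Attached form: -I/path/to/dir
            dirs.append(tok[2:])
        i += 1
    return dirs


def dirs_with_header(compile_command, header="cudnn.h"):
    """Return the include directories on the command line that contain `header`."""
    return [d for d in include_dirs(compile_command) if (Path(d) / header).is_file()]
```

Calling `dirs_with_header(...)` with the failing `x86_64-conda-linux-gnu-c++` line as a string, on the machine where the build ran, shows whether any explicitly passed directory holds `cudnn.h`. An empty result would suggest that `$CUDA_HOME/include` is simply not on the search path for that command, although the compiler's built-in default directories are not covered by this check.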
Python 3.12.7
pytorch: 2.6.0+cu126
cuda: 12.6
cudnn 9.3.0.75
gcc: 13.3.0
RTX4090
Ubuntu
I have already exported the path.
pip install transformer_engine[pytorch]
Defaulting to user installation because normal site-packages is not writeable
Collecting transformer_engine[pytorch]
Using cached transformer_engine-1.13.0-py3-none-any.whl.metadata (16 kB)
Collecting transformer_engine_cu12==1.13.0 (from transformer_engine[pytorch])
Using cached transformer_engine_cu12-1.13.0-py3-none-manylinux_2_28_x86_64.whl.metadata (16 kB)
Collecting transformer_engine_torch==1.13.0 (from transformer_engine[pytorch])
Downloading transformer_engine_torch-1.13.0.tar.gz (121 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: pydantic in /workspace/shared/anaconda3/lib/python3.12/site-packages (from transformer_engine_cu12==1.13.0->transformer_engine[pytorch]) (2.8.2)
Requirement already satisfied: importlib-metadata>=1.0 in /workspace/shared/anaconda3/lib/python3.12/site-packages (from transformer_engine_cu12==1.13.0->transformer_engine[pytorch]) (7.0.1)
Requirement already satisfied: packaging in /workspace/shared/anaconda3/lib/python3.12/site-packages (from transformer_engine_cu12==1.13.0->transformer_engine[pytorch]) (24.1)
Requirement already satisfied: torch in ./.local/lib/python3.12/site-packages (from transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (2.6.0+cu126)
Requirement already satisfied: zipp>=0.5 in /workspace/shared/anaconda3/lib/python3.12/site-packages (from importlib-metadata>=1.0->transformer_engine_cu12==1.13.0->transformer_engine[pytorch]) (3.17.0)
Requirement already satisfied: annotated-types>=0.4.0 in /workspace/shared/anaconda3/lib/python3.12/site-packages (from pydantic->transformer_engine_cu12==1.13.0->transformer_engine[pytorch]) (0.6.0)
Requirement already satisfied: pydantic-core==2.20.1 in /workspace/shared/anaconda3/lib/python3.12/site-packages (from pydantic->transformer_engine_cu12==1.13.0->transformer_engine[pytorch]) (2.20.1)
Requirement already satisfied: typing-extensions>=4.6.1 in /workspace/shared/anaconda3/lib/python3.12/site-packages (from pydantic->transformer_engine_cu12==1.13.0->transformer_engine[pytorch]) (4.11.0)
Requirement already satisfied: filelock in /workspace/shared/anaconda3/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (3.13.1)
Requirement already satisfied: setuptools in /workspace/shared/anaconda3/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (75.1.0)
Requirement already satisfied: sympy==1.13.1 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (1.13.1)
Requirement already satisfied: networkx in /workspace/shared/anaconda3/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (3.3)
Requirement already satisfied: jinja2 in /workspace/shared/anaconda3/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (3.1.4)
Requirement already satisfied: fsspec in /workspace/shared/anaconda3/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (2024.6.1)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.6.77 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (12.6.77)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.6.77 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (12.6.77)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.6.80 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (12.6.80)
Requirement already satisfied: nvidia-cudnn-cu12==9.5.1.17 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (9.5.1.17)
Requirement already satisfied: nvidia-cublas-cu12==12.6.4.1 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (12.6.4.1)
Requirement already satisfied: nvidia-cufft-cu12==11.3.0.4 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (11.3.0.4)
Requirement already satisfied: nvidia-curand-cu12==10.3.7.77 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (10.3.7.77)
Requirement already satisfied: nvidia-cusolver-cu12==11.7.1.2 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (11.7.1.2)
Requirement already satisfied: nvidia-cusparse-cu12==12.5.4.2 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (12.5.4.2)
Requirement already satisfied: nvidia-cusparselt-cu12==0.6.3 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (0.6.3)
Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (2.21.5)
Requirement already satisfied: nvidia-nvtx-cu12==12.6.77 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (12.6.77)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.6.85 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (12.6.85)
Requirement already satisfied: triton==3.2.0 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (3.2.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /workspace/shared/anaconda3/lib/python3.12/site-packages (from sympy==1.13.1->torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /workspace/shared/anaconda3/lib/python3.12/site-packages (from jinja2->torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (2.1.3)
Using cached transformer_engine_cu12-1.13.0-py3-none-manylinux_2_28_x86_64.whl (125.2 MB)
Using cached transformer_engine-1.13.0-py3-none-any.whl (459 kB)
Building wheels for collected packages: transformer_engine_torch
Building wheel for transformer_engine_torch (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [22 lines of output]
/workspace/shared/anaconda3/lib/python3.12/site-packages/setuptools/_distutils/dist.py:261: UserWarning: Unknown distribution option: 'tests_require'
warnings.warn(msg)
running bdist_wheel
/workspace/jmwang/.local/lib/python3.12/site-packages/torch/utils/cpp_extension.py:529: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
running build
running build_ext
/workspace/jmwang/.local/lib/python3.12/site-packages/torch/utils/cpp_extension.py:458: UserWarning: There are no g++ version bounds defined for CUDA version 12.6
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
building 'transformer_engine_torch' extension
creating build/temp.linux-x86_64-cpython-312/csrc
creating build/temp.linux-x86_64-cpython-312/csrc/extensions
creating build/temp.linux-x86_64-cpython-312/csrc/extensions/multi_tensor
g++ -pthread -B /workspace/shared/anaconda3/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /workspace/shared/anaconda3/include -fPIC -O2 -isystem /workspace/shared/anaconda3/include -fPIC -I/tmp/pip-install-d8mpwx1x/transformer-engine-torch_84f4d864065842a4a131c88cea3e6872/common_headers -I/tmp/pip-install-d8mpwx1x/transformer-engine-torch_84f4d864065842a4a131c88cea3e6872/common_headers/common -I/tmp/pip-install-d8mpwx1x/transformer-engine-torch_84f4d864065842a4a131c88cea3e6872/common_headers/common/include -I/tmp/pip-install-d8mpwx1x/transformer-engine-torch_84f4d864065842a4a131c88cea3e6872/csrc -I/workspace/jmwang/.local/lib/python3.12/site-packages/torch/include -I/workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/TH -I/workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/THC -I/usr/local/cuda/include -I/workspace/shared/anaconda3/include/python3.12 -c csrc/common.cpp -o build/temp.linux-x86_64-cpython-312/csrc/common.o -O3 -fvisibility=hidden -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1016" -DTORCH_EXTENSION_NAME=transformer_engine_torch -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
In file included from /workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/ATen/cudnn/Handle.h:4,
from csrc/common.h:14,
from csrc/common.cpp:7:
/workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/ATen/cudnn/cudnn-wrapper.h:3:10: fatal error: cudnn.h: No such file or directory
    3 | #include <cudnn.h>
      |          ^~~~~~~~~
compilation terminated.
error: command '/usr/bin/g++' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer_engine_torch
Running setup.py clean for transformer_engine_torch
Failed to build transformer_engine_torch
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (transformer_engine_torch)
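In this log the only CUDA include directory on the `g++` command line is `/usr/local/cuda/include`, while pip reports `nvidia-cudnn-cu12` as already installed under `./.local`. The other report in this thread shows that such a wheel places the header at `site-packages/nvidia/cudnn/include/cudnn.h`. The sketch below locates that directory. The function name `pip_cudnn_include_dirs` is hypothetical, and exporting the result through GCC's `CPATH` variable is a possible workaround to try, not a fix confirmed for Transformer Engine.

```python
import importlib.util
from pathlib import Path


def pip_cudnn_include_dirs(package="nvidia.cudnn"):
    """Return the include directories of an installed package that hold cudnn.h."""
    try:
        spec = importlib.util.find_spec(package)
    except ImportError:
        # Raised when a parent package (e.g. `nvidia`) is not installed.
        return []
    if spec is None or spec.submodule_search_locations is None:
        return []
    hits = []
    for location in spec.submodule_search_locations:
        inc = Path(location) / "include"
        if (inc / "cudnn.h").is_file():
            hits.append(str(inc))
    return hits


if __name__ == "__main__":
    dirs = pip_cudnn_include_dirs()
    if dirs:
        # CPATH is read by GCC as an additional list of include directories.
        print("export CPATH=" + ":".join(dirs) + "${CPATH:+:$CPATH}")
    else:
        print("no pip-installed cuDNN headers found")
```

Running the script and evaluating the printed `export` line in the same shell before `pip install` would make the header visible to `g++`. Whether the rest of the build then succeeds is untested here.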