eager_generator': corrupted double-linked list: 0x0000000006ee2200 *** #61834

Open
WanwanLinLin opened this issue Feb 19, 2024 · 10 comments
Labels
status/following-up 跟进中 type/build 编译/安装问题

Comments

@WanwanLinLin

Issue Description

I am trying to build locally on CentOS 7.9. `ldd --version` reports 2.17, the version that ships with the system. These are my configure and build commands:
cmake .. -DPY_VERSION=3.9 -DWITH_GPU=OFF -DWITH_NCCL=OFF -DWITH_MKLDNN=OFF -DWITH_RCCL=OFF -DCMAKE_INSTALL_PREFIX=/home/cproject/Paddle/install

make -j1

But every time it fails with this error:
*** Error in `/home/cproject/Paddle/build/paddle/fluid/eager/auto_code_generator/eager_generator': corrupted double-linked list: 0x0000000006ee2200 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x8097f)[0x7f922a79f97f]
/lib64/libc.so.6(+0x8120e)[0x7f922a7a020e]
/home/cproject/Paddle/build/paddle/phi/libphi.so(_ZN3phi13KernelFactoryD1Ev+0x18a)[0x7f922cafcaca]
/lib64/libc.so.6(__cxa_finalize+0x9a)[0x7f922a75905a]
/home/cproject/Paddle/build/paddle/phi/libphi.so(+0xce3707)[0x7f922c5e9707]
======= Memory map: ========
00400000-004c1000 r--p 00000000 fd:02 79165216 /home/cproject/Paddle/build/paddle/fluid/eager/auto_code_generator/eager_generator
004c1000-048e2000 r-xp 000c1000 fd:02 79165216 /home/cproject/Paddle/build/paddle/fluid/eager/auto_code_generator/eager_generator
048e2000-05433000 r--p 044e2000 fd:02 79165216 /home/cproject/Paddle/build/paddle/fluid/eager/auto_code_generator/eager_generator
05434000-054e6000 r--p 05033000 fd:02 79165216 /home/cproject/Paddle/build/paddle/fluid/eager/auto_code_generator/eager_generator
054e6000-05520000 rw-p 050e5000 fd:02 79165216 /home/cproject/Paddle/build/paddle/fluid/eager/auto_code_generator/eager_generator
05520000-05557000 rw-p 00000000 00:00 0
06b55000-087c4000 rw-p 00000000 00:00 0 [heap]
7f9224000000-7f9224021000 rw-p 00000000 00:00 0
7f9224021000-7f9228000000 ---p 00000000 00:00 0
7f922a4f9000-7f922a51e000 r-xp 00000000 fd:00 33555743 /usr/lib64/libgomp.so.1.0.0
7f922a51e000-7f922a71d000 ---p 00025000 fd:00 33555743 /usr/lib64/libgomp.so.1.0.0
7f922a71d000-7f922a71e000 r--p 00024000 fd:00 33555743 /usr/lib64/libgomp.so.1.0.0
7f922a71e000-7f922a71f000 rw-p 00025000 fd:00 33555743 /usr/lib64/libgomp.so.1.0.0
7f922a71f000-7f922a8e3000 r-xp 00000000 fd:00 33555275 /usr/lib64/libc-2.17.so
7f922a8e3000-7f922aae2000 ---p 001c4000 fd:00 33555275 /usr/lib64/libc-2.17.so
7f922aae2000-7f922aae6000 r--p 001c3000 fd:00 33555275 /usr/lib64/libc-2.17.so
7f922aae6000-7f922aae8000 rw-p 001c7000 fd:00 33555275 /usr/lib64/libc-2.17.so
7f922aae8000-7f922aaed000 rw-p 00000000 00:00 0
7f922aaed000-7f922ab02000 r-xp 00000000 fd:00 33554508 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f922ab02000-7f922ad01000 ---p 00015000 fd:00 33554508 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f922ad01000-7f922ad02000 r--p 00014000 fd:00 33554508 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f922ad02000-7f922ad03000 rw-p 00015000 fd:00 33554508 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
7f922ad03000-7f922ae04000 r-xp 00000000 fd:00 33555286 /usr/lib64/libm-2.17.so
7f922ae04000-7f922b003000 ---p 00101000 fd:00 33555286 /usr/lib64/libm-2.17.so
7f922b003000-7f922b004000 r--p 00100000 fd:00 33555286 /usr/lib64/libm-2.17.so
7f922b004000-7f922b005000 rw-p 00101000 fd:00 33555286 /usr/lib64/libm-2.17.so
7f922b005000-7f922b0ee000 r-xp 00000000 fd:00 33555391 /usr/lib64/libstdc++.so.6.0.19
7f922b0ee000-7f922b2ee000 ---p 000e9000 fd:00 33555391 /usr/lib64/libstdc++.so.6.0.19
7f922b2ee000-7f922b2f6000 r--p 000e9000 fd:00 33555391 /usr/lib64/libstdc++.so.6.0.19
7f922b2f6000-7f922b2f8000 rw-p 000f1000 fd:00 33555391 /usr/lib64/libstdc++.so.6.0.19
7f922b2f8000-7f922b30d000 rw-p 00000000 00:00 0
7f922b30d000-7f922b30f000 r-xp 00000000 fd:00 33555283 /usr/lib64/libdl-2.17.so
7f922b30f000-7f922b50f000 ---p 00002000 fd:00 33555283 /usr/lib64/libdl-2.17.so
7f922b50f000-7f922b510000 r--p 00002000 fd:00 33555283 /usr/lib64/libdl-2.17.so
7f922b510000-7f922b511000 rw-p 00003000 fd:00 33555283 /usr/lib64/libdl-2.17.so
7f922b511000-7f922b6cb000 r-xp 00000000 fd:02 211392241 /home/cproject/Paddle/build/third_party/install/mklml/lib/libiomp5.so
7f922b6cb000-7f922b8ca000 ---p 001ba000 fd:02 211392241 /home/cproject/Paddle/build/third_party/install/mklml/lib/libiomp5.so
7f922b8ca000-7f922b8cd000 r--p 001b9000 fd:02 211392241 /home/cproject/Paddle/build/third_party/install/mklml/lib/libiomp5.so
7f922b8cd000-7f922b8d7000 rw-p 001bc000 fd:02 211392241 /home/cproject/Paddle/build/third_party/install/mklml/lib/libiomp5.so
7f922b8d7000-7f922b906000 rw-p 00000000 00:00 0
7f922b906000-7f922c17e000 r--p 00000000 fd:02 84662122 /home/cproject/Paddle/build/paddle/phi/libphi.so
7f922c17e000-7f922f9ed000 r-xp 00878000 fd:02 84662122 /home/cproject/Paddle/build/paddle/phi/libphi.so
7f922f9ed000-7f922ff02000 r--p 040e7000 fd:02 84662122 /home/cproject/Paddle/build/paddle/phi/libphi.so
7f922ff02000-7f922ff34000 r--p 045fb000 fd:02 84662122 /home/cproject/Paddle/build/paddle/phi/libphi.so
7f922ff34000-7f922ff6d000 rw-p 0462d000 fd:02 84662122 /home/cproject/Paddle/build/paddle/phi/libphi.so
7f922ff6d000-7f922ffd9000 rw-p 00000000 00:00 0
7f922ffd9000-7f922ffe0000 r-xp 00000000 fd:00 33555312 /usr/lib64/librt-2.17.so
7f922ffe0000-7f92301df000 ---p 00007000 fd:00 33555312 /usr/lib64/librt-2.17.so
7f92301df000-7f92301e0000 r--p 00006000 fd:00 33555312 /usr/lib64/librt-2.17.so
7f92301e0000-7f92301e1000 rw-p 00007000 fd:00 33555312 /usr/lib64/librt-2.17.so
7f92301e1000-7f92301f8000 r-xp 00000000 fd:00 33555307 /usr/lib64/libpthread-2.17.so
7f92301f8000-7f92303f7000 ---p 00017000 fd:00 33555307 /usr/lib64/libpthread-2.17.so
7f92303f7000-7f92303f8000 r--p 00016000 fd:00 33555307 /usr/lib64/libpthread-2.17.so
Subprocess aborted
make[2]: *** [paddle/fluid/eager/auto_code_generator/CMakeFiles/legacy_eager_codegen] Error 1
make[1]: *** [paddle/fluid/eager/auto_code_generator/CMakeFiles/legacy_eager_codegen.dir/all] Error 2
make: *** [all] Error 2
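
For readers following the crash report above, here is a minimal Python sketch of how the one symbolized frame and the reported address can be read against the memory map. It is an illustration, not part of the original report: the helpers `find_mapping` and `demangle_simple_dtor` are hypothetical, the decoder handles only simple `_ZN...D1Ev`-style destructor names, and `c++filt` is the usual tool for real demangling. The two mapping lines and the frame are copied from the log.

```python
import re

# Two mapping lines and one backtrace frame copied from the report above.
MAPS = """\
06b55000-087c4000 rw-p 00000000 00:00 0 [heap]
7f922c17e000-7f922f9ed000 r-xp 00878000 fd:02 84662122 /home/cproject/Paddle/build/paddle/phi/libphi.so
"""
FRAME = ("/home/cproject/Paddle/build/paddle/phi/libphi.so"
         "(_ZN3phi13KernelFactoryD1Ev+0x18a)[0x7f922cafcaca]")


def find_mapping(maps_text, address):
    """Return the name of the mapping that contains `address`, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        if start <= address < end:
            return fields[5] if len(fields) > 5 else "(anonymous)"
    return None


def demangle_simple_dtor(mangled):
    """Decode only names shaped like _ZN<len><id>...<D0|D1|D2>Ev.

    This is a deliberately tiny subset of the Itanium C++ ABI mangling:
    nested, length-prefixed identifiers followed by a destructor code.
    """
    if not (mangled.startswith("_ZN") and mangled.endswith("Ev")):
        return None
    body = mangled[3:-2]
    if body[-2:] not in ("D0", "D1", "D2"):
        return None
    body = body[:-2]
    parts, pos = [], 0
    while pos < len(body):
        end = pos
        while end < len(body) and body[end].isdigit():
            end += 1
        if end == pos:
            return None  # not a length-prefixed identifier
        length = int(body[pos:end])
        parts.append(body[end:end + length])
        pos = end + length
    if not parts:
        return None
    return "::".join(parts) + "::~" + parts[-1] + "()"


frame_address = int(re.search(r"\[(0x[0-9a-f]+)\]$", FRAME).group(1), 16)
frame_symbol = re.search(r"\(([^)+]*)\+", FRAME).group(1)

print(find_mapping(MAPS, frame_address))         # module containing the frame
print(demangle_simple_dtor(frame_symbol))        # decoded destructor name
print(find_mapping(MAPS, 0x0000000006ee2200))    # address from the error line
```

Read this way, the symbolized frame is the `phi::KernelFactory` destructor inside `libphi.so`, called under `__cxa_finalize`, and the address in the "corrupted double-linked list" message falls inside the process `[heap]` mapping.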

Version & Environment Information

CPU build

@WanwanLinLin WanwanLinLin added status/new-issue 新建 type/build 编译/安装问题 labels Feb 19, 2024
@WanwanLinLin
Author

I am building the develop branch of Paddle, following the official tutorial.

@risemeup1
Contributor

What about make -j10?

@paddle-bot paddle-bot bot added status/following-up 跟进中 and removed status/new-issue 新建 labels Feb 20, 2024
@WanwanLinLin
Author

> What about make -j10?

Same result.

@WanwanLinLin
Author

If I set -DWITH_PYTHON=OFF, the build succeeds.

@risemeup1
Contributor

The build succeeds on our CentOS machines. Are you building inside Docker?

@TimeYWL
Contributor

TimeYWL commented Feb 27, 2024

I also hit this problem when building on CentOS 7.6 + ROCm.
Environment: Python 3.8, gcc 11.2 / gcc 7.3, DTK 23.10.
Running valgrind on eager_generator gives the following:
==9562== Invalid read of size 4
==9562== at 0x33AFE56D: ??? (in /usr/lib64/libstdc++.so.6.0.19)
==9562== by 0x33B60E22: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (in /usr/lib64/libstdc++.so.6.0.19)
==9562== by 0x34523059: __cxa_finalize (in /usr/lib64/libc-2.17.so)
==9562== by 0x16B2BA26: ??? (in /workspace/Paddle-2.6.0/build/paddle/phi/libphi.so)
==9562== by 0xAAB1089: _dl_fini (in /usr/lib64/ld-2.17.so)
==9562== by 0x34522CE8: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==9562== by 0x34522D36: exit (in /usr/lib64/libc-2.17.so)
==9562== by 0x3450B55B: (below main) (in /usr/lib64/libc-2.17.so)
==9562== Address 0x35da3030 is 16 bytes inside a block of size 34 free'd
==9562== at 0xB6CC51D: operator delete(void*) (vg_replace_malloc.c:586)
==9562== by 0x33B60E22: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (in /usr/lib64/libstdc++.so.6.0.19)
==9562== by 0x34522CE8: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==9562== by 0x34522D36: exit (in /usr/lib64/libc-2.17.so)
==9562== by 0x3450B55B: (below main) (in /usr/lib64/libc-2.17.so)
==9562== Block was alloc'd at
==9562== at 0xB6CB593: operator new(unsigned long) (vg_replace_malloc.c:344)
==9562== by 0x33B60CD8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib64/libstdc++.so.6.0.19)
==9562== by 0x114652B: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) (basic_string.tcc:610)
==9562== by 0x113B053: char* std::string::_S_construct_aux<char const*>(char const*, char const*, std::allocator<char> const&, std::__false_type) (basic_string.h:5180)
==9562== by 0x112CD50: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&) (basic_string.h:5201)
==9562== by 0x11213FB: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<std::allocator<char> >(char const*, std::allocator<char> const&) (basic_string.h:3663)
==9562== by 0xE5179F: _GLOBAL__sub_I_logging.cc (in /workspace/Paddle-2.6.0/build/paddle/fluid/eager/auto_code_generator/eager_generator)
==9562== by 0x5C1677C: __libc_csu_init (in /workspace/Paddle-2.6.0/build/paddle/fluid/eager/auto_code_generator/eager_generator)
==9562== by 0x3450B4E4: (below main) (in /usr/lib64/libc-2.17.so)
==9562==
==9562== Invalid free() / delete / delete[] / realloc()
==9562== at 0xB6CC51D: operator delete(void*) (vg_replace_malloc.c:586)
==9562== by 0x33B60E22: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (in /usr/lib64/libstdc++.so.6.0.19)
==9562== by 0x34523059: __cxa_finalize (in /usr/lib64/libc-2.17.so)
==9562== by 0x16B2BA26: ??? (in /workspace/Paddle-2.6.0/build/paddle/phi/libphi.so)
==9562== by 0xAAB1089: _dl_fini (in /usr/lib64/ld-2.17.so)
==9562== by 0x34522CE8: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==9562== by 0x34522D36: exit (in /usr/lib64/libc-2.17.so)
==9562== by 0x3450B55B: (below main) (in /usr/lib64/libc-2.17.so)
==9562== Address 0x35da3020 is 0 bytes inside a block of size 34 free'd
==9562== at 0xB6CC51D: operator delete(void*) (vg_replace_malloc.c:586)
==9562== by 0x33B60E22: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (in /usr/lib64/libstdc++.so.6.0.19)
==9562== by 0x34522CE8: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==9562== by 0x34522D36: exit (in /usr/lib64/libc-2.17.so)
==9562== by 0x3450B55B: (below main) (in /usr/lib64/libc-2.17.so)
==9562== Block was alloc'd at
==9562== at 0xB6CB593: operator new(unsigned long) (vg_replace_malloc.c:344)
==9562== by 0x33B60CD8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib64/libstdc++.so.6.0.19)
==9562== by 0x114652B: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) (basic_string.tcc:610)
==9562== by 0x113B053: char* std::string::_S_construct_aux<char const*>(char const*, char const*, std::allocator<char> const&, std::__false_type) (basic_string.h:5180)
==9562== by 0x112CD50: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&) (basic_string.h:5201)
==9562== by 0x11213FB: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<std::allocator<char> >(char const*, std::allocator<char> const&) (basic_string.h:3663)
==9562== by 0xE5179F: _GLOBAL__sub_I_logging.cc (in /workspace/Paddle-2.6.0/build/paddle/fluid/eager/auto_code_generator/eager_generator)
==9562== by 0x5C1677C: __libc_csu_init (in /workspace/Paddle-2.6.0/build/paddle/fluid/eager/auto_code_generator/eager_generator)
==9562== by 0x3450B4E4: (below main) (in /usr/lib64/libc-2.17.so)
==9562==
==9562==
==9562== HEAP SUMMARY:
==9562== in use at exit: 7,212,792 bytes in 112,575 blocks
==9562== total heap usage: 746,270 allocs, 633,696 frees, 193,979,195 bytes allocated
==9562==
==9562== LEAK SUMMARY:
==9562== definitely lost: 869,834 bytes in 8,866 blocks
==9562== indirectly lost: 4,778,202 bytes in 98,973 blocks
==9562== possibly lost: 8,575 bytes in 122 blocks
==9562== still reachable: 1,556,181 bytes in 4,614 blocks
==9562== of which reachable via heuristic:
==9562== stdstring : 65,309 bytes in 1,029 blocks
==9562== newarray : 3,080 bytes in 1 blocks
==9562== suppressed: 0 bytes in 0 blocks
==9562== Rerun with --leak-check=full to see details of leaked memory
==9562==
==9562== For lists of detected and suppressed errors, rerun with: -s
==9562== ERROR SUMMARY: 4 errors from 2 contexts (suppressed: 0 from 0)
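
The two valgrind errors above quote different addresses, so it may help to check whether they refer to the same allocation. Below is a minimal Python sketch, assuming the "Address X is N bytes inside a block of size M" wording shown in the report; `block_bases` is a hypothetical helper, and the sample lines are copied from the output above.

```python
import re

# Matches valgrind's description of where a bad address sits in a heap block.
ADDRESS_RE = re.compile(
    r"Address (0x[0-9A-Fa-f]+) is (\d+) bytes inside a block of size (\d+) "
    r"(free'd|alloc'd)"
)


def block_bases(report):
    """Return (block_start, block_size, state) for each 'Address ...' line."""
    results = []
    for match in ADDRESS_RE.finditer(report):
        address = int(match.group(1), 16)
        offset = int(match.group(2))
        # Subtracting the offset recovers the start of the heap block.
        results.append((address - offset, int(match.group(3)), match.group(4)))
    return results


SAMPLE = """\
==9562== Address 0x35da3030 is 16 bytes inside a block of size 34 free'd
==9562== Address 0x35da3020 is 0 bytes inside a block of size 34 free'd
"""

for start, size, state in block_bases(SAMPLE):
    print(hex(start), size, state)
```

Both lines resolve to the same 34-byte block starting at 0x35da3020. According to the report, that block was allocated from `_GLOBAL__sub_I_logging.cc` in `eager_generator`, freed once from `__run_exit_handlers`, and then read and freed again under `__cxa_finalize` called from `libphi.so`. That pattern is consistent with one global `std::string` being destroyed twice, although nothing in this thread confirms a root cause.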

> The build succeeds on our CentOS machines. Are you building inside Docker?

@ronny1996
Contributor

Hello, does the build work inside a container?

@TimeYWL
Contributor

TimeYWL commented Feb 29, 2024

> Hello, does the build work inside a container?

I am already using an image pulled from 光源 (image.sourcefind.cn):
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-centos7.6-dtk23.10.1-py38

@llseek

llseek commented Mar 6, 2024

> The build succeeds on our CentOS machines. Are you building inside Docker?

Could you share your build environment details, such as the Python version and DTK version?

@WanwanLinLin
Author

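> The build succeeds on our CentOS machines. Are you building inside Docker?

No, I am building locally on CentOS 7. If I do not build the Python library, Paddle builds successfully; otherwise it fails with the error above. Also, I could not find a CentOS 7 image among the official images you provide.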
