mirrored from git://gcc.gnu.org/git/gcc.git
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Releases/gcc 12 #65
Open
jacopobrusini
wants to merge
2,671
commits into
master
Choose a base branch
from
releases/gcc-12
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Releases/gcc 12 #65
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is an unofficial mirror that has nothing to do with the GCC project, so submitting pull requests here is a waste of time. Also, I have no idea what this pull request is trying to do but it would never be accepted even if it was submitted to the right place. |
atahanozbayram
approved these changes
Apr 2, 2024
testing matrix multiplication benchmarks shows that FMA on a critical chain is a perofrmance loss over separate multiply and add. While the latency of 4 is lower than multiply + add (3+2) the problem is that all values needs to be ready before computation starts. While on znver4 AVX512 code fared well with FMA, it was because of the split registers. Znver5 benefits from avoding FMA on all widths. This may be different with the mobile version though. On naive matrix multiplication benchmark the difference is 8% with -O3 only since with -Ofast loop interchange solves the problem differently. It is 30% win, for example, on S323 from TSVC: real_t s323(struct args_t * func_args) { // recurrences // coupled recurrence initialise_arrays(__func__); gettimeofday(&func_args->t1, NULL); for (int nl = 0; nl < iterations/2; nl++) { for (int i = 1; i < LEN_1D; i++) { a[i] = b[i-1] + c[i] * d[i]; b[i] = a[i] + c[i] * e[i]; } dummy(a, b, c, d, e, aa, bb, cc, 0.); } gettimeofday(&func_args->t2, NULL); return calc_checksum(__func__); } gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): Enable for znver5. (X86_TUNE_AVOID_256FMA_CHAINS): Likewise. (X86_TUNE_AVOID_512FMA_CHAINS): Likewise. (cherry picked from commit d6360b4)
split_constant_offset when looking through SSA defs can end up picking SSA leafs that are subject to abnormal coalescing. This can lead to downstream consumers to insert code based on the result (like from dataref analysis) in places that violate constraints for abnormal coalescing. It's best to not expand defs whose operands are subject to abnormal coalescing - and not either do something when a subexpression has operands like that already. PR tree-optimization/116585 * tree-data-ref.cc (split_constant_offset_1): When either operand is subject to abnormal coalescing do no further processing. * gcc.dg/torture/pr116585.c: New testcase. (cherry picked from commit 1d0cb3b)
Since naked functions should not enable stack protector, define TARGET_STACK_PROTECT_RUNTIME_ENABLED_P to disable stack protector for naked functions. gcc/ PR target/116962 * config/i386/i386.cc (ix86_stack_protect_runtime_enabled_p): New function. (TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): New. gcc/testsuite/ PR target/116962 * gcc.target/i386/pr116962.c: New file. Signed-off-by: H.J. Lu <[email protected]> (cherry picked from commit 7d2845d)
Noticed testing LRA. 2024-10-05 John David Anglin <[email protected]> gcc/ChangeLog: * config/pa/pa.md: Fix indirect_got constraint.
When the library is configured with --disable-libstdcxx-verbose the assertions just abort instead of calling __glibcxx_assert_fail, and so I didn't export that function for the non-verbose build. However, that option is documented to not change the library ABI, so we still need to export the symbol from the library. It could be needed by programs compiled against the headers from a verbose build. The non-verbose definition can just call abort so that it doesn't pull in I/O symbols, which are unwanted in a non-verbose build. libstdc++-v3/ChangeLog: PR libstdc++/115585 * src/c++11/assert_fail.cc (__glibcxx_assert_fail): Add definition for non-verbose builds. (cherry picked from commit 52370c8)
…116641] The changes to implement LWG 2579 (r10-327-gdb33efde17932f) made std::string::assign use the propagate_on_container_copy_assignment (POCCA) trait, for consistency with operator=(const basic_string&). However, this also unintentionally affected operator=(basic_string&&) which calls assign(str) to make a deep copy when performing a move is not possible. The fix is for the move assignment operator to call _M_assign(str) instead of assign(str), as this just does the deep copy and doesn't check the POCCA trait first. The bug only affects the unlikely/useless combination of POCCA==true and POCMA==false, but we should fix it for correctness anyway. it should also make move assignment slightly cheaper to compile and execute, because we skip the extra code in assign(const basic_string&). libstdc++-v3/ChangeLog: PR libstdc++/116641 * include/bits/basic_string.h (operator=(basic_string&&)): Call _M_assign instead of assign. * testsuite/21_strings/basic_string/allocator/116641.cc: New test. (cherry picked from commit c07cf41)
I misused the AC_CHECK_DECL macro, assuming that it behaved like AC_CHECK_DECLS and always defined a HAVE_xxx macro if the decl was found. Instead, the [action-if-found] shell commands are needed to defined HAVE_O_NONBLOCK explicitly. libstdc++-v3/ChangeLog: * configure.ac: Fix check for O_NONBLOCK. * config.h.in: Regenerate. * configure: Regenerate. (cherry picked from commit b68561d)
We should not use [[unlikely]] before C++20, so use [[__unlikely__]] instead. libstdc++-v3/ChangeLog: * include/std/variant (_Variant_storage::_M_reset): Use __unlikely__ form of attribute instead of unlikely. (cherry picked from commit 9f1cd51)
A few of these files self-identified as ext/random.tcc, update to use the actual basename. libstdc++-v3/ChangeLog: * config/cpu/aarch64/opt/ext/opt_random.h: Improve doxygen file docs. * config/cpu/i486/opt/ext/opt_random.h: Likewise. (cherry picked from commit c2ad7b2)
There is no file ext/type_traits, point it to ext/type_traits.h instead. libstdc++-v3/ChangeLog: * include/bits/cpp_type_traits.h: Improve doxygen file docs. (cherry picked from commit f6ed7a6)
The shift operations for dynamic_bitset fail to zero out words where the non-zero bits were shifted to a completely different word. For a right shift we don't need to sanitize the unused bits in the high word, because we know they were already clear and a right shift doesn't change that. libstdc++-v3/ChangeLog: PR libstdc++/115399 * include/tr2/dynamic_bitset (operator>>=): Remove redundant call to _M_do_sanitize. * include/tr2/dynamic_bitset.tcc (_M_do_left_shift): Zero out low bits in words that should no longer be populated. (_M_do_right_shift): Likewise for high bits. * testsuite/tr2/dynamic_bitset/pr115399.cc: New test. (cherry picked from commit bd3a312)
Although POSIX requires ELOOP, FreeBSD documents that openat with O_NOFOLLOW returns EMLINK if the last component of a filename is a symbolic link. Check for EMLINK as well as ELOOP, so that the TOCTTOU mitigation in remove_all works correctly. See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=214633 or the FreeBSD man page for reference. According to its man page, DragonFlyBSD also uses EMLINK for this error, and NetBSD uses its own EFTYPE. OpenBSD follows POSIX and uses EMLINK. This fixes these failures on FreeBSD: FAIL: 27_io/filesystem/operations/remove_all.cc -std=gnu++17 execution test FAIL: experimental/filesystem/operations/remove_all.cc -std=gnu++17 execution test libstdc++-v3/ChangeLog: * src/c++17/fs_ops.cc (remove_all) [__FreeBSD__ || __DragonFly__]: Check for EMLINK as well as ELOOP. [__NetBSD__]: Check for EFTYPE as well as ELOOP.
This fixes a warning from one of the test allocators: warning: base class 'class std::allocator<__gnu_test::copy_tracker>' should be explicitly initialized in the copy constructor [-Wextra] libstdc++-v3/ChangeLog: * testsuite/util/testsuite_allocator.h (tracker_allocator): Initialize base class in copy constructor. (cherry picked from commit e2fb245)
For H8/300 with -msx -mn -mint32 the type of (_M_len - __pos) is int, because int is wider than size_t so the operands are promoted. libstdc++-v3/ChangeLog: * include/std/string_view (basic_string_view::copy) Use explicit template argument for call to std::min<size_t>. (basic_string_view::substr): Likewise.
Add crtbeginT.o to extra_parts on FreeBSD. This ensures we use GCC's crt objects for static linking. Otherwise it could mix crtbeginT.o from the base system with libgcc's crtend.o, possibly leading to segfaults. libgcc: PR target/118685 * config.host (*-*-freebsd*): Add crtbeginT.o to extra_parts. Signed-off-by: Dimitry Andric <[email protected]>
During combine we may end up with (set (reg:DI 66 [ _6 ]) (ashift:DI (reg:DI 72 [ x ]) (subreg:QI (and:TI (reg:TI 67 [ _1 ]) (const_wide_int 0x0aaaaaaaaaaaaaabf)) 15))) where the shift count operand does not trivially fit the scheme of address operands. Reject those operands, especially since strip_address_mutations() expects expressions of the form (and ... (const_int ...)) and fails for (and ... (const_wide_int ...)). Thus, be more strict here and accept only CONST_INT operands. Done by replacing immediate_operand() with const_int_operand() which is enough since the former only additionally checks for LEGITIMATE_PIC_OPERAND_P and targetm.legitimate_constant_p which are always true for CONST_INT operands. While on it, fix indentation of the if block. gcc/ChangeLog: PR target/118835 * config/s390/s390.cc (s390_valid_shift_count): Reject shift count operands which do not trivially fit the scheme of address operands. gcc/testsuite/ChangeLog: * gcc.target/s390/pr118835.c: New test. (cherry picked from commit ac9806d)
Floating-point emulation in the D front-end is done via a type named `struct longdouble`, which in GDC is a small interface around the real_value type. Because the D code cannot include gcc/real.h directly, a big enough buffer is used for the data instead. On x86_64, this buffer is actually bigger than real_value itself, so when a new longdouble object is created with longdouble r; real_from_string3 (&r.rv (), buffer, mode); return r; there is uninitialized padding at the end of `r`. This was never a problem when D was implemented in C++ (until GCC 12) as comparing two longdouble objects with `==' would be forwarded to the relevant operator== overload that extracted the underlying real_value. However when the front-end was translated to D, such conditions were instead rewritten into identity comparisons return exp.toReal() is CTFloat.zero The `is` operator gets lowered as a call to `memcmp() == 0', which is where the read of uninitialized memory occurs, as seen by valgrind. ==26778== Conditional jump or move depends on uninitialised value(s) ==26778== at 0x911F41: dmd.dstruct._isZeroInit(dmd.expression.Expression) (dstruct.d:635) ==26778== by 0x9123BE: StructDeclaration::finalizeSize() (dstruct.d:373) ==26778== by 0x86747C: dmd.aggregate.AggregateDeclaration.determineSize(ref const(dmd.location.Loc)) (aggregate.d:226) [...] To avoid accidentally reading uninitialized data, explicitly initialize all `longdouble` variables with an empty constructor on C++ side of the implementation before initializing underlying real_value type it holds. PR d/116961 gcc/d/ChangeLog: * d-codegen.cc (build_float_cst): Change new_value type from real_t to real_value. * d-ctfloat.cc (CTFloat::fabs): Default initialize the return value. (CTFloat::ldexp): Likewise. (CTFloat::parse): Likewise. * d-longdouble.cc (longdouble::add): Likewise. (longdouble::sub): Likewise. (longdouble::mul): Likewise. (longdouble::div): Likewise. (longdouble::mod): Likewise. (longdouble::neg): Likewise. * d-port.cc (Port::isFloat32LiteralOutOfRange): Likewise. (Port::isFloat64LiteralOutOfRange): Likewise. gcc/testsuite/ChangeLog: * gdc.dg/pr116961.d: New test. (cherry picked from commit f7bc17e)
…ed in i3 [PR118739] The combine pass is trying to combine: Trying 16, 22, 21 -> 23: 16: r104:QI=flags:CCNO>0 22: {r120:QI=r104:QI^0x1;clobber flags:CC;} REG_UNUSED flags:CC 21: r119:QI=flags:CCNO<=0 REG_DEAD flags:CCNO 23: {r110:QI=r119:QI|r120:QI;clobber flags:CC;} REG_DEAD r120:QI REG_DEAD r119:QI REG_UNUSED flags:CC and creates the following two insn sequence: modifying insn i2 22: r104:QI=flags:CCNO>0 REG_DEAD flags:CC deferring rescan insn with uid = 22. modifying insn i3 23: r110:QI=flags:CCNO<=0 REG_DEAD flags:CC deferring rescan insn with uid = 23. where the REG_DEAD note in i2 is not correct, because the flags register is still referenced in i3. In try_combine() megafunction, we have this part: --cut here-- /* Distribute all the LOG_LINKS and REG_NOTES from I1, I2, and I3. */ if (i3notes) distribute_notes (i3notes, i3, i3, newi2pat ? i2 : NULL, elim_i2, elim_i1, elim_i0); if (i2notes) distribute_notes (i2notes, i2, i3, newi2pat ? i2 : NULL, elim_i2, elim_i1, elim_i0); if (i1notes) distribute_notes (i1notes, i1, i3, newi2pat ? i2 : NULL, elim_i2, local_elim_i1, local_elim_i0); if (i0notes) distribute_notes (i0notes, i0, i3, newi2pat ? i2 : NULL, elim_i2, elim_i1, local_elim_i0); if (midnotes) distribute_notes (midnotes, NULL, i3, newi2pat ? i2 : NULL, elim_i2, elim_i1, elim_i0); --cut here-- where the compiler distributes REG_UNUSED note from i2: 22: {r120:QI=r104:QI^0x1;clobber flags:CC;} REG_UNUSED flags:CC via distribute_notes() using the following: --cut here-- /* Otherwise, if this register is used by I3, then this register now dies here, so we must put a REG_DEAD note here unless there is one already. */ else if (reg_referenced_p (XEXP (note, 0), PATTERN (i3)) && ! (REG_P (XEXP (note, 0)) ? find_regno_note (i3, REG_DEAD, REGNO (XEXP (note, 0))) : find_reg_note (i3, REG_DEAD, XEXP (note, 0)))) { PUT_REG_NOTE_KIND (note, REG_DEAD); place = i3; } --cut here-- Flags register is used in I3, but there already is a REG_DEAD note in I3. The above condition doesn't trigger and continues in the "else" part where REG_DEAD note is put to I2. The proposed solution corrects the above logic to trigger every time the register is referenced in I3, avoiding the "else" part. PR rtl-optimization/118739 gcc/ChangeLog: * combine.cc (distribute_notes) <case REG_UNUSED>: Correct the logic when the register is used by I3. gcc/testsuite/ChangeLog: * gcc.target/i386/pr118739.c: New test. (cherry picked from commit a92dc3f)
Uros' r15-7793 fixed this PR as well, I'm just committing tests from the PR so that it can be closed. 2025-03-04 Jakub Jelinek <[email protected]> PR rtl-optimization/119071 * gcc.dg/pr119071.c: New test. * gcc.c-torture/execute/pr119071.c: New test. (cherry picked from commit ccf9db9)
…485) Commit r9-4307-g89d7557202d25a forgot to accept a fixed PIC register when extending the assert in require_pic_register. arm_pic_register can be set explicitly by the user (e.g. -mpic-register=r9) or implicitly as the default value with -fpic/-fPIC/-fPIE and -mno-pic-data-is-text-relative -mlong-calls, and we want to use/accept it when recording cfun->machine->pic_reg as used to be the case. PR target/115485 gcc/ * config/arm/arm.cc (require_pic_register): Fix typos in comment. Handle fixed arm_pic_register. gcc/testsuite/ * g++.target/arm/pr115485.C: New test. (cherry picked from commit b1d0ac2)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Support for Apple Silicon!!!