Releases/gcc 12 #65

Open
wants to merge 2,671 commits into
base: master

jacopobrusini

Support for Apple Silicon!!!

@jwakely
Contributor

jwakely commented Feb 21, 2024

This is an unofficial mirror that has nothing to do with the GCC project, so submitting pull requests here is a waste of time.

Also, I have no idea what this pull request is trying to do but it would never be accepted even if it was submitted to the right place.

GCC Administrator and others added 28 commits September 29, 2024 00:19
Testing matrix multiplication benchmarks shows that FMA on a critical chain
is a performance loss over separate multiply and add. While the latency of 4
is lower than multiply + add (3+2), the problem is that all values need to
be ready before computation starts.

While on znver4 AVX512 code fared well with FMA, it was because of the split
registers. Znver5 benefits from avoiding FMA on all widths.  This may be different
with the mobile version though.

On a naive matrix multiplication benchmark the difference is 8% with -O3
only, since with -Ofast loop interchange solves the problem differently.
It is a 30% win, for example, on S323 from TSVC:

real_t s323(struct args_t * func_args)
{

//    recurrences
//    coupled recurrence

    initialise_arrays(__func__);
    gettimeofday(&func_args->t1, NULL);

    for (int nl = 0; nl < iterations/2; nl++) {
        for (int i = 1; i < LEN_1D; i++) {
            a[i] = b[i-1] + c[i] * d[i];
            b[i] = a[i] + c[i] * e[i];
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }

    gettimeofday(&func_args->t2, NULL);
    return calc_checksum(__func__);
}

gcc/ChangeLog:

	* config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): Enable for
	znver5.
	(X86_TUNE_AVOID_256FMA_CHAINS): Likewise.
	(X86_TUNE_AVOID_512FMA_CHAINS): Likewise.

(cherry picked from commit d6360b4)
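
To make the latency argument above concrete, here is a minimal sketch of the s323 recurrence written both ways. The function names and the use of std::vector are assumptions for illustration and are not part of TSVC or GCC. In the fused form every operation waits for the previous recurrence value, so the whole FMA latency sits on the loop-carried chain. In the split form the two products do not depend on the recurrence, so only the adds are on the chain. Note that a compiler may still contract the split form into FMA depending on -ffp-contract and the tuning flags this commit changes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// Fused form: each std::fma needs its addend (the previous recurrence
// value) before it can start, so the full FMA latency is on the chain.
inline void s323_fma(vec& a, vec& b, const vec& c, const vec& d, const vec& e)
{
    assert(a.size() == b.size() && a.size() == c.size());
    assert(a.size() == d.size() && a.size() == e.size());
    for (std::size_t i = 1; i < a.size(); ++i) {
        a[i] = std::fma(c[i], d[i], b[i - 1]);
        b[i] = std::fma(c[i], e[i], a[i]);
    }
}

// Split form: the products are independent of the recurrence and can be
// computed ahead of time; only the two adds are loop-carried.
inline void s323_mul_add(vec& a, vec& b, const vec& c, const vec& d, const vec& e)
{
    assert(a.size() == b.size() && a.size() == c.size());
    assert(a.size() == d.size() && a.size() == e.size());
    for (std::size_t i = 1; i < a.size(); ++i) {
        const double cd = c[i] * d[i];  // off the critical chain
        const double ce = c[i] * e[i];  // off the critical chain
        a[i] = b[i - 1] + cd;
        b[i] = a[i] + ce;
    }
}
```

The two forms can differ in the last bit because FMA rounds once; with small integer inputs they agree exactly.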
split_constant_offset when looking through SSA defs can end up
picking SSA leafs that are subject to abnormal coalescing.  This
can lead downstream consumers to insert code based on the
result (like from dataref analysis) in places that violate constraints
for abnormal coalescing.  It's best to not expand defs whose operands
are subject to abnormal coalescing, and likewise to do nothing when
a subexpression already has operands like that.

	PR tree-optimization/116585
	* tree-data-ref.cc (split_constant_offset_1): When either
	operand is subject to abnormal coalescing do no further
	processing.

	* gcc.dg/torture/pr116585.c: New testcase.

(cherry picked from commit 1d0cb3b)
Since naked functions should not enable stack protector, define
TARGET_STACK_PROTECT_RUNTIME_ENABLED_P to disable stack protector
for naked functions.

gcc/

	PR target/116962
	* config/i386/i386.cc (ix86_stack_protect_runtime_enabled_p): New
	function.
	(TARGET_STACK_PROTECT_RUNTIME_ENABLED_P): New.

gcc/testsuite/

	PR target/116962
	* gcc.target/i386/pr116962.c: New file.

Signed-off-by: H.J. Lu <[email protected]>
(cherry picked from commit 7d2845d)
Noticed while testing LRA.

2024-10-05  John David Anglin  <[email protected]>

gcc/ChangeLog:

	* config/pa/pa.md: Fix indirect_got constraint.
When the library is configured with --disable-libstdcxx-verbose the
assertions just abort instead of calling __glibcxx_assert_fail, and so I
didn't export that function for the non-verbose build. However, that
option is documented to not change the library ABI, so we still need to
export the symbol from the library. It could be needed by programs
compiled against the headers from a verbose build.

The non-verbose definition can just call abort so that it doesn't pull
in I/O symbols, which are unwanted in a non-verbose build.

libstdc++-v3/ChangeLog:

	PR libstdc++/115585
	* src/c++11/assert_fail.cc (__glibcxx_assert_fail): Add
	definition for non-verbose builds.

(cherry picked from commit 52370c8)
…116641]

The changes to implement LWG 2579 (r10-327-gdb33efde17932f) made
std::string::assign use the propagate_on_container_copy_assignment
(POCCA) trait, for consistency with operator=(const basic_string&).
However, this also unintentionally affected operator=(basic_string&&)
which calls assign(str) to make a deep copy when performing a move is
not possible. The fix is for the move assignment operator to call
_M_assign(str) instead of assign(str), as this just does the deep copy
and doesn't check the POCCA trait first.

The bug only affects the unlikely/useless combination of POCCA==true and
POCMA==false, but we should fix it for correctness anyway. It should
also make move assignment slightly cheaper to compile and execute,
because we skip the extra code in assign(const basic_string&).

libstdc++-v3/ChangeLog:

	PR libstdc++/116641
	* include/bits/basic_string.h (operator=(basic_string&&)): Call
	_M_assign instead of assign.
	* testsuite/21_strings/basic_string/allocator/116641.cc: New
	test.

(cherry picked from commit c07cf41)
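
A minimal sketch of the POCCA==true, POCMA==false combination described above. TaggedAlloc, its id member, and move_assign_unequal are hypothetical names made up for this illustration; they are not taken from the new testsuite file. The allocators of the two strings compare unequal, so move assignment has to fall back to a deep copy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical allocator: propagates on copy assignment (POCCA == true)
// but not on move assignment (POCMA == false).
template<typename T>
struct TaggedAlloc
{
    using value_type = T;
    using propagate_on_container_copy_assignment = std::true_type;
    using propagate_on_container_move_assignment = std::false_type;

    int id = 0;  // allocators with different ids compare unequal

    TaggedAlloc() = default;
    explicit TaggedAlloc(int i) : id(i) { }
    template<typename U>
    TaggedAlloc(const TaggedAlloc<U>& other) : id(other.id) { }

    T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
    void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};

template<typename T, typename U>
bool operator==(const TaggedAlloc<T>& a, const TaggedAlloc<U>& b)
{ return a.id == b.id; }

template<typename T, typename U>
bool operator!=(const TaggedAlloc<T>& a, const TaggedAlloc<U>& b)
{ return !(a == b); }

using String = std::basic_string<char, std::char_traits<char>, TaggedAlloc<char>>;

// Move-assign between strings with unequal allocators. Because POCMA is
// false, the target cannot steal the buffer and must deep-copy instead.
inline String move_assign_unequal()
{
    String src("a string long enough to need heap storage", TaggedAlloc<char>(1));
    String dst("destination", TaggedAlloc<char>(2));
    dst = std::move(src);
    return dst;
}
```

The contents of the result are the same with or without the fix. What the fix should change is the allocator: on a library that includes it, `move_assign_unequal().get_allocator().id` is expected to stay 2, whereas a library that still routes through assign(str) would propagate the source allocator.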
I misused the AC_CHECK_DECL macro, assuming that it behaved like
AC_CHECK_DECLS and always defined a HAVE_xxx macro if the decl was
found. Instead, the [action-if-found] shell commands are needed to
define HAVE_O_NONBLOCK explicitly.

libstdc++-v3/ChangeLog:

	* configure.ac: Fix check for O_NONBLOCK.
	* config.h.in: Regenerate.
	* configure: Regenerate.

(cherry picked from commit b68561d)
We should not use [[unlikely]] before C++20, so use [[__unlikely__]]
instead.

libstdc++-v3/ChangeLog:

	* include/std/variant (_Variant_storage::_M_reset): Use
	__unlikely__ form of attribute instead of unlikely.

(cherry picked from commit 9f1cd51)
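
A short sketch of why the reserved spelling matters. Before C++20, `unlikely` is an ordinary identifier that user code may define as a macro, while names with double underscores are reserved for the implementation. The macro and the checked_index function below are assumptions for illustration only, and the example relies on the compiler accepting the `__unlikely__` spelling, as GCC does.

```cpp
#include <cassert>

// A C++17 user program is allowed to do this. A library header that wrote
// [[unlikely]] would then expand to an invalid attribute.
#define unlikely "user-defined macro"

// Library-style code using the reserved spelling, which user macros
// cannot legally interfere with.
inline int checked_index(int i, int size)
{
    assert(size >= 0);
    if (i < 0 || i >= size) [[__unlikely__]]
        return -1;  // out of range
    return i;
}
```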
A few of these files self-identified as ext/random.tcc; update them to use
the actual basename.

libstdc++-v3/ChangeLog:

	* config/cpu/aarch64/opt/ext/opt_random.h: Improve doxygen file
	docs.
	* config/cpu/i486/opt/ext/opt_random.h: Likewise.

(cherry picked from commit c2ad7b2)
There is no file ext/type_traits; point it to ext/type_traits.h instead.

libstdc++-v3/ChangeLog:

	* include/bits/cpp_type_traits.h: Improve doxygen file docs.

(cherry picked from commit f6ed7a6)
The shift operations for dynamic_bitset fail to zero out words where the
non-zero bits were shifted to a completely different word.

For a right shift we don't need to sanitize the unused bits in the high
word, because we know they were already clear and a right shift doesn't
change that.

libstdc++-v3/ChangeLog:

	PR libstdc++/115399
	* include/tr2/dynamic_bitset (operator>>=): Remove redundant
	call to _M_do_sanitize.
	* include/tr2/dynamic_bitset.tcc (_M_do_left_shift): Zero out
	low bits in words that should no longer be populated.
	(_M_do_right_shift): Likewise for high bits.
	* testsuite/tr2/dynamic_bitset/pr115399.cc: New test.

(cherry picked from commit bd3a312)
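
The kind of bug described above can be illustrated with a standalone multi-word left shift. This is a sketch over a plain std::vector of 64-bit words, not the dynamic_bitset implementation; shift_left and its layout (least significant word first) are assumptions for illustration. The final loop is the step that is easy to forget: words whose bits have all moved to higher words must be cleared.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shift a multi-word bit sequence left by `shift` bits, in place.
// words[0] holds the least significant 64 bits.
inline void shift_left(std::vector<std::uint64_t>& words, std::size_t shift)
{
    const std::size_t bits = 64;
    const std::size_t wshift = shift / bits;  // whole words to move
    const std::size_t offset = shift % bits;  // remaining bit offset
    const std::size_t n = words.size();

    if (wshift >= n) {
        for (std::size_t i = 0; i < n; ++i)
            words[i] = 0;
        return;
    }

    // Walk from the high word down so sources are read before overwritten.
    for (std::size_t i = n; i-- > wshift; ) {
        std::uint64_t v = words[i - wshift] << offset;
        if (offset != 0 && i > wshift)
            v |= words[i - wshift - 1] >> (bits - offset);
        words[i] = v;
    }

    // Zero the low words that should no longer be populated.
    for (std::size_t i = 0; i < wshift; ++i)
        words[i] = 0;
}
```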
Although POSIX requires ELOOP, FreeBSD documents that openat with
O_NOFOLLOW returns EMLINK if the last component of a filename is a
symbolic link.  Check for EMLINK as well as ELOOP, so that the TOCTTOU
mitigation in remove_all works correctly.

See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=214633 or the
FreeBSD man page for reference.

According to its man page, DragonFlyBSD also uses EMLINK for this error,
and NetBSD uses its own EFTYPE. OpenBSD follows POSIX and uses ELOOP.

This fixes these failures on FreeBSD:
FAIL: 27_io/filesystem/operations/remove_all.cc  -std=gnu++17 execution test
FAIL: experimental/filesystem/operations/remove_all.cc  -std=gnu++17 execution test

libstdc++-v3/ChangeLog:

	* src/c++17/fs_ops.cc (remove_all) [__FreeBSD__ || __DragonFly__]:
	Check for EMLINK as well as ELOOP.
	[__NetBSD__]: Check for EFTYPE as well as ELOOP.
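
The per-platform check described above can be sketched as a small predicate. The helper name is_symlink_error is hypothetical and this is not the code in fs_ops.cc; it only shows the shape of the errno test.

```cpp
#include <cassert>
#include <cerrno>

// Returns true if `err`, as reported by openat with O_NOFOLLOW, means the
// last path component was a symbolic link on the current platform.
inline bool is_symlink_error(int err)
{
    assert(err != 0);  // only meaningful after a failed call
#if defined(__FreeBSD__) || defined(__DragonFly__)
    if (err == EMLINK)
        return true;
#elif defined(__NetBSD__)
    if (err == EFTYPE)
        return true;
#endif
    return err == ELOOP;  // the POSIX-specified value
}
```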
This fixes a warning from one of the test allocators:
warning: base class 'class std::allocator<__gnu_test::copy_tracker>' should be explicitly initialized in the copy constructor [-Wextra]

libstdc++-v3/ChangeLog:

	* testsuite/util/testsuite_allocator.h (tracker_allocator):
	Initialize base class in copy constructor.

(cherry picked from commit e2fb245)
For H8/300 with -msx -mn -mint32 the type of (_M_len - __pos) is int,
because int is wider than size_t so the operands are promoted.

libstdc++-v3/ChangeLog:

	* include/std/string_view (basic_string_view::copy): Use explicit
	template argument for call to std::min<size_t>.
	(basic_string_view::substr): Likewise.
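
A sketch of the deduction problem on a common platform, using a 16-bit type as a stand-in for a size_t that is narrower than int. The alias small_size and the helper clamped_length are assumptions for illustration. The subtraction promotes both operands to int, so a plain std::min(n, len - pos) would see two different argument types and fail template argument deduction; naming the template argument converts both to the intended type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Stand-in for a size_t that is narrower than int (as on the H8/300
// configuration mentioned above).
using small_size = std::uint16_t;

// Number of characters available from position `pos`, capped at `n`.
inline small_size clamped_length(small_size n, small_size len, small_size pos)
{
    assert(pos <= len);
    // `len - pos` has type int here because of integer promotion.
    // std::min(n, len - pos) would not compile: T deduces to both
    // small_size and int. The explicit argument fixes T.
    return std::min<small_size>(n, len - pos);
}
```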
GCC Administrator and others added 30 commits February 15, 2025 00:18
Add crtbeginT.o to extra_parts on FreeBSD. This ensures we use GCC's
crt objects for static linking. Otherwise it could mix crtbeginT.o
from the base system with libgcc's crtend.o, possibly leading to
segfaults.

libgcc:
	PR target/118685
	* config.host (*-*-freebsd*): Add crtbeginT.o to extra_parts.

Signed-off-by: Dimitry Andric <[email protected]>
During combine we may end up with

(set (reg:DI 66 [ _6 ])
     (ashift:DI (reg:DI 72 [ x ])
                (subreg:QI (and:TI (reg:TI 67 [ _1 ])
                                   (const_wide_int 0x0aaaaaaaaaaaaaabf))
                           15)))

where the shift count operand does not trivially fit the scheme of
address operands.  Reject those operands, especially since
strip_address_mutations() expects expressions of the form
(and ... (const_int ...)) and fails for (and ... (const_wide_int ...)).

Thus, be more strict here and accept only CONST_INT operands.  Done by
replacing immediate_operand() with const_int_operand() which is enough
since the former only additionally checks for LEGITIMATE_PIC_OPERAND_P
and targetm.legitimate_constant_p which are always true for CONST_INT
operands.

While at it, fix indentation of the if block.

gcc/ChangeLog:

	PR target/118835
	* config/s390/s390.cc (s390_valid_shift_count): Reject shift
	count operands which do not trivially fit the scheme of
	address operands.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/pr118835.c: New test.

(cherry picked from commit ac9806d)
Floating-point emulation in the D front-end is done via a type named
`struct longdouble`, which in GDC is a small interface around the
real_value type. Because the D code cannot include gcc/real.h directly,
a big enough buffer is used for the data instead.

On x86_64, this buffer is actually bigger than real_value itself, so
when a new longdouble object is created with

    longdouble r;
    real_from_string3 (&r.rv (), buffer, mode);
    return r;

there is uninitialized padding at the end of `r`.  This was never a
problem when D was implemented in C++ (until GCC 12) as comparing two
longdouble objects with `==' would be forwarded to the relevant
operator== overload that extracted the underlying real_value.

However when the front-end was translated to D, such conditions were
instead rewritten into identity comparisons

    return exp.toReal() is CTFloat.zero

The `is` operator gets lowered as a call to `memcmp() == 0', which is
where the read of uninitialized memory occurs, as seen by valgrind.

==26778== Conditional jump or move depends on uninitialised value(s)
==26778==    at 0x911F41: dmd.dstruct._isZeroInit(dmd.expression.Expression) (dstruct.d:635)
==26778==    by 0x9123BE: StructDeclaration::finalizeSize() (dstruct.d:373)
==26778==    by 0x86747C: dmd.aggregate.AggregateDeclaration.determineSize(ref const(dmd.location.Loc)) (aggregate.d:226)
[...]

To avoid accidentally reading uninitialized data, explicitly initialize
all `longdouble` variables with an empty constructor on the C++ side of the
implementation before initializing the underlying real_value type they hold.

	PR d/116961

gcc/d/ChangeLog:

	* d-codegen.cc (build_float_cst): Change new_value type from real_t to
	real_value.
	* d-ctfloat.cc (CTFloat::fabs): Default initialize the return value.
	(CTFloat::ldexp): Likewise.
	(CTFloat::parse): Likewise.
	* d-longdouble.cc (longdouble::add): Likewise.
	(longdouble::sub): Likewise.
	(longdouble::mul): Likewise.
	(longdouble::div): Likewise.
	(longdouble::mod): Likewise.
	(longdouble::neg): Likewise.
	* d-port.cc (Port::isFloat32LiteralOutOfRange): Likewise.
	(Port::isFloat64LiteralOutOfRange): Likewise.

gcc/testsuite/ChangeLog:

	* gdc.dg/pr116961.d: New test.

(cherry picked from commit f7bc17e)
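
The padding problem described above can be reproduced in miniature. In this sketch, Wrapper is a hypothetical stand-in for a type whose opaque buffer is larger than the payload stored in it, and a double plays the role of real_value. Value-initializing the wrapper before writing the payload makes every byte defined, so a bytewise comparison like the one `is` lowers to becomes well behaved.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical wrapper: an opaque buffer deliberately larger than the
// payload it stores, as in the x86_64 case described above.
struct Wrapper
{
    alignas(double) unsigned char storage[32];
};

// Zero the whole object first, then write the payload. Without the `{}`
// the trailing bytes of `storage` would be indeterminate.
inline Wrapper make_wrapper(double value)
{
    static_assert(sizeof(double) <= sizeof(Wrapper::storage),
                  "buffer must be able to hold the payload");
    Wrapper w{};
    std::memcpy(w.storage, &value, sizeof value);
    return w;
}

// Bytewise identity comparison over the full object, tail included.
inline bool identical(const Wrapper& a, const Wrapper& b)
{
    return std::memcmp(&a, &b, sizeof(Wrapper)) == 0;
}
```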
…ed in i3 [PR118739]

The combine pass is trying to combine:

Trying 16, 22, 21 -> 23:
   16: r104:QI=flags:CCNO>0
   22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
      REG_UNUSED flags:CC
   21: r119:QI=flags:CCNO<=0
      REG_DEAD flags:CCNO
   23: {r110:QI=r119:QI|r120:QI;clobber flags:CC;}
      REG_DEAD r120:QI
      REG_DEAD r119:QI
      REG_UNUSED flags:CC

and creates the following two insn sequence:

modifying insn i2    22: r104:QI=flags:CCNO>0
      REG_DEAD flags:CC
deferring rescan insn with uid = 22.
modifying insn i3    23: r110:QI=flags:CCNO<=0
      REG_DEAD flags:CC
deferring rescan insn with uid = 23.

where the REG_DEAD note in i2 is not correct, because the flags
register is still referenced in i3.  In the try_combine() megafunction,
we have this part:

--cut here--
    /* Distribute all the LOG_LINKS and REG_NOTES from I1, I2, and I3.  */
    if (i3notes)
      distribute_notes (i3notes, i3, i3, newi2pat ? i2 : NULL,
			elim_i2, elim_i1, elim_i0);
    if (i2notes)
      distribute_notes (i2notes, i2, i3, newi2pat ? i2 : NULL,
			elim_i2, elim_i1, elim_i0);
    if (i1notes)
      distribute_notes (i1notes, i1, i3, newi2pat ? i2 : NULL,
			elim_i2, local_elim_i1, local_elim_i0);
    if (i0notes)
      distribute_notes (i0notes, i0, i3, newi2pat ? i2 : NULL,
			elim_i2, elim_i1, local_elim_i0);
    if (midnotes)
      distribute_notes (midnotes, NULL, i3, newi2pat ? i2 : NULL,
			elim_i2, elim_i1, elim_i0);
--cut here--

where the compiler distributes the REG_UNUSED note from i2:

   22: {r120:QI=r104:QI^0x1;clobber flags:CC;}
      REG_UNUSED flags:CC

via distribute_notes() using the following:

--cut here--
	  /* Otherwise, if this register is used by I3, then this register
	     now dies here, so we must put a REG_DEAD note here unless there
	     is one already.  */
	  else if (reg_referenced_p (XEXP (note, 0), PATTERN (i3))
		   && ! (REG_P (XEXP (note, 0))
			 ? find_regno_note (i3, REG_DEAD,
					    REGNO (XEXP (note, 0)))
			 : find_reg_note (i3, REG_DEAD, XEXP (note, 0))))
	    {
	      PUT_REG_NOTE_KIND (note, REG_DEAD);
	      place = i3;
	    }
--cut here--

The flags register is used in I3, but there already is a REG_DEAD note in I3.
The above condition doesn't trigger and control continues in the "else" part,
where a REG_DEAD note is put on I2.  The proposed solution corrects the above
logic to trigger every time the register is referenced in I3, avoiding the
"else" part.

	PR rtl-optimization/118739

gcc/ChangeLog:

	* combine.cc (distribute_notes) <case REG_UNUSED>: Correct the
	logic when the register is used by I3.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr118739.c: New test.

(cherry picked from commit a92dc3f)
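
The control-flow change described above can be modelled with two small decision functions. This is a toy model, not the code in combine.cc: the Placement enum and both function names are made up for illustration, and the assumption that no further note is placed when I3 already carries a REG_DEAD note is mine rather than something stated in the commit message.

```cpp
#include <cassert>

// Possible outcomes for a REG_UNUSED note in this simplified model.
enum class Placement
{
    OnI3AsRegDead,  // note becomes REG_DEAD and is placed on I3
    None,           // assumed: nothing more to place, I3 already has it
    FallThrough     // the later "else" handling, which may pick I2
};

// Logic as described before the fix: one combined condition.
inline Placement place_before(bool used_in_i3, bool i3_has_dead_note)
{
    if (used_in_i3 && !i3_has_dead_note)
        return Placement::OnI3AsRegDead;
    return Placement::FallThrough;
}

// Logic as described after the fix: the branch is taken whenever the
// register is referenced in I3, so the "else" part is never reached then.
inline Placement place_after(bool used_in_i3, bool i3_has_dead_note)
{
    if (used_in_i3)
        return i3_has_dead_note ? Placement::None : Placement::OnI3AsRegDead;
    return Placement::FallThrough;
}
```

The case from the PR is `used_in_i3 == true` with `i3_has_dead_note == true`: the old logic falls through, the new logic does not.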
Uros' r15-7793 fixed this PR as well, I'm just committing tests
from the PR so that it can be closed.

2025-03-04  Jakub Jelinek  <[email protected]>

	PR rtl-optimization/119071
	* gcc.dg/pr119071.c: New test.
	* gcc.c-torture/execute/pr119071.c: New test.

(cherry picked from commit ccf9db9)
…485)

Commit r9-4307-g89d7557202d25a forgot to accept a fixed PIC register
when extending the assert in require_pic_register.

arm_pic_register can be set explicitly by the user
(e.g. -mpic-register=r9) or implicitly as the default value with
-fpic/-fPIC/-fPIE and -mno-pic-data-is-text-relative -mlong-calls, and
we want to use/accept it when recording cfun->machine->pic_reg as used
to be the case.

	PR target/115485
	gcc/
	* config/arm/arm.cc (require_pic_register): Fix typos in
	comment. Handle fixed arm_pic_register.

	gcc/testsuite/
	* g++.target/arm/pr115485.C: New test.

(cherry picked from commit b1d0ac2)