Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IPSCCP] Infer attributes on arguments #107114

Merged
merged 2 commits into from
Sep 16, 2024
Merged

[IPSCCP] Infer attributes on arguments #107114

merged 2 commits into from
Sep 16, 2024

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Sep 3, 2024

During inter-procedural SCCP, also infer attributes on arguments, not just return values. This allows other non-interprocedural passes to make use of the information lateron.

Draft because this causes a huge 4% compile-time regression during the clang thinlto link stage (https://llvm-compile-time-tracker.com/compare.php?from=0797c184c636889f2897746dc71390ae28005c7c&to=47673cd4f5e9ba87f13a94e23574ebf93c53f339&stat=instructions:u).

@nikic
Copy link
Contributor Author

nikic commented Sep 4, 2024

$ bloaty clang-new -d symbols -- clang-old
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +3.1%  +849Ki  +4.3%  +851Ki    [32444 Others]
  +3.3% +21.0Ki  +3.4% +21.0Ki    clang::RecursiveASTVisitor<>::TraverseStmt()
  +1.8% +20.1Ki  +1.8% +20.1Ki    [section .text]
   +30% +16.4Ki   +37% +16.4Ki    clang::RecursiveASTVisitor<>::TraverseType()
   +19% +15.2Ki   +25% +15.2Ki    clang::RecursiveASTVisitor<>::TraverseGCCAsmStmt()
   +27% +10.4Ki   +56% +10.4Ki    clang::RecursiveASTVisitor<>::TraverseCXXForRangeStmt()
  +8.0% +10.2Ki  +9.5% +10.2Ki    clang::RecursiveASTVisitor<>::TraverseLambdaExpr()
   +25% +9.30Ki   +62% +9.30Ki    clang::RecursiveASTVisitor<>::TraversePseudoObjectExpr()
  +8.8% +6.19Ki   +13% +6.19Ki    clang::RecursiveASTVisitor<>::TraverseArrayInitLoopExpr()
  +8.1% +6.08Ki   +11% +6.08Ki    clang::RecursiveASTVisitor<>::TraverseEmbedExpr()
  +8.5% +5.89Ki   +12% +5.89Ki    clang::RecursiveASTVisitor<>::TraverseExpressionTraitExpr()
  [NEW] +5.86Ki  [NEW] +5.68Ki    _ZN12_GLOBAL__N_119getMemberAttributesB5cxx11ERN4llvm8codeview16CodeViewRecordIOENS1_12MemberAccessENS1_10MethodKindENS1_13MethodOptionsE.llvm.5976913522079867229
  +7.7% +5.31Ki   +11% +5.31Ki    clang::RecursiveASTVisitor<>::TraverseTypeTraitExpr()
  +7.3% +5.28Ki   +11% +5.28Ki    clang::RecursiveASTVisitor<>::TraverseOpenACCComputeConstruct()
  +7.3% +5.28Ki   +11% +5.28Ki    clang::RecursiveASTVisitor<>::TraverseOpenACCLoopConstruct()
  +6.6% +5.09Ki  +9.4% +5.09Ki    clang::RecursiveASTVisitor<>::TraverseObjCPropertyRefExpr()
  +6.7% +5.00Ki  +9.6% +5.00Ki    clang::RecursiveASTVisitor<>::TraverseUnresolvedLookupExpr()
  +6.7% +5.00Ki  +9.6% +5.00Ki    clang::RecursiveASTVisitor<>::TraverseUnresolvedMemberExpr()
  +6.9% +4.91Ki   +10% +4.91Ki    clang::RecursiveASTVisitor<>::TraverseMaterializeTemporaryExpr()
  [DEL] -5.86Ki  [DEL] -5.68Ki    _ZN12_GLOBAL__N_119getMemberAttributesB5cxx11ERN4llvm8codeview16CodeViewRecordIOENS1_12MemberAccessENS1_10MethodKindENS1_13MethodOptionsE.llvm.17377905528244513630
 -78.7% -15.8Ki -76.7% -4.35Ki    clang::RecursiveASTVisitor<>::TraverseSubstTemplateTypeParmPackType()
  +0.7%  +990Ki  +0.8% +1003Ki    TOTAL

Inlining of RecursiveASTVisitor has been perturbed, which has outsized effects.

@nikic
Copy link
Contributor Author

nikic commented Sep 6, 2024

Okay, I've tracked down what is happening here. We infer more nonnull attributes, and then based on those determine that some recursive RecursiveASTVisitor::TraverseStmt calls reduce to just a push_back call (

Queue->push_back({S, false});
) and inline it. This adds a decent chunk of extra code to many methods. An additional effect on top of that is that this allows us to optimize away the %this argument on some methods, which means that the recursive calls now no longer have a uniform signature, so we get additional register moves. These effects are extremely amplified because they apply across a very large number of RecursiveASTVisitor methods, which in turn are instantiated a large number of times.

The issue can be added by adding a noinline attribute to this method: #107601

With that done, the compile-time numbers look good: http://llvm-compile-time-tracker.com/compare.php?from=0797c184c636889f2897746dc71390ae28005c7c&to=464036f828c4c9f46f7411a40b49e249e6b47dce&stat=instructions%3Au

@nikic nikic marked this pull request as ready for review September 6, 2024 15:56
@llvmbot
Copy link
Member

llvmbot commented Sep 6, 2024

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-function-specialization

@llvm/pr-subscribers-llvm-transforms

Author: Nikita Popov (nikic)

Changes

During inter-procedural SCCP, also infer attributes on arguments, not just return values. This allows other non-interprocedural passes to make use of the information lateron.

Draft because this causes a huge 4% compile-time regression during the clang thinlto link stage (https://llvm-compile-time-tracker.com/compare.php?from=0797c184c636889f2897746dc71390ae28005c7c&amp;to=47673cd4f5e9ba87f13a94e23574ebf93c53f339&amp;stat=instructions:u).


Full diff: https://github.com/llvm/llvm-project/pull/107114.diff

12 Files Affected:

  • (modified) llvm/include/llvm/Transforms/Utils/SCCPSolver.h (+3)
  • (modified) llvm/lib/Transforms/IPO/SCCP.cpp (+1)
  • (modified) llvm/lib/Transforms/Utils/SCCPSolver.cpp (+45-22)
  • (modified) llvm/test/Transforms/FunctionSpecialization/discover-transitive-phis.ll (+1-1)
  • (modified) llvm/test/Transforms/SCCP/ip-add-range-to-call.ll (+2-2)
  • (modified) llvm/test/Transforms/SCCP/ip-constant-ranges.ll (+8-8)
  • (modified) llvm/test/Transforms/SCCP/ip-ranges-casts.ll (+7-7)
  • (modified) llvm/test/Transforms/SCCP/ip-ranges-phis.ll (+3-3)
  • (modified) llvm/test/Transforms/SCCP/ip-ranges-select.ll (+2-2)
  • (modified) llvm/test/Transforms/SCCP/musttail-call.ll (+1-1)
  • (modified) llvm/test/Transforms/SCCP/resolvedundefsin-tracked-fn.ll (+2-2)
  • (modified) llvm/test/Transforms/SCCP/switch.ll (+1-1)
diff --git a/llvm/include/llvm/Transforms/Utils/SCCPSolver.h b/llvm/include/llvm/Transforms/Utils/SCCPSolver.h
index 61a500b82875fb..696f39ca984d12 100644
--- a/llvm/include/llvm/Transforms/Utils/SCCPSolver.h
+++ b/llvm/include/llvm/Transforms/Utils/SCCPSolver.h
@@ -104,6 +104,8 @@ class SCCPSolver {
   /// argument-tracked functions.
   bool isArgumentTrackedFunction(Function *F);
 
+  const SmallPtrSetImpl<Function *> &getArgumentTrackedFunctions() const;
+
   /// Solve - Solve for constants and executable blocks.
   void solve();
 
@@ -191,6 +193,7 @@ class SCCPSolver {
                               BasicBlock *&NewUnreachableBB) const;
 
   void inferReturnAttributes() const;
+  void inferArgAttributes() const;
 
   bool tryToReplaceWithConstant(Value *V);
 
diff --git a/llvm/lib/Transforms/IPO/SCCP.cpp b/llvm/lib/Transforms/IPO/SCCP.cpp
index f0d75a2016363a..3b88e290f7a327 100644
--- a/llvm/lib/Transforms/IPO/SCCP.cpp
+++ b/llvm/lib/Transforms/IPO/SCCP.cpp
@@ -278,6 +278,7 @@ static bool runIPSCCP(
   SmallVector<ReturnInst*, 8> ReturnsToZap;
 
   Solver.inferReturnAttributes();
+  Solver.inferArgAttributes();
   for (const auto &[F, ReturnValue] : Solver.getTrackedRetVals()) {
     assert(!F->getReturnType()->isVoidTy() &&
            "should not track void functions");
diff --git a/llvm/lib/Transforms/Utils/SCCPSolver.cpp b/llvm/lib/Transforms/Utils/SCCPSolver.cpp
index 1dc82c6e9aa716..674ac86ee98766 100644
--- a/llvm/lib/Transforms/Utils/SCCPSolver.cpp
+++ b/llvm/lib/Transforms/Utils/SCCPSolver.cpp
@@ -354,31 +354,45 @@ bool SCCPSolver::removeNonFeasibleEdges(BasicBlock *BB, DomTreeUpdater &DTU,
   return true;
 }
 
+static void inferAttribute(Function *F, unsigned AttrIndex,
+                           const ValueLatticeElement &Val) {
+  // If there is a known constant range for the value, add range attribute.
+  if (Val.isConstantRange() && !Val.getConstantRange().isSingleElement()) {
+    // Do not add range attribute if the value may include undef.
+    if (Val.isConstantRangeIncludingUndef())
+      return;
+
+    // Take the intersection of the existing attribute and the inferred range.
+    Attribute OldAttr = F->getAttributeAtIndex(AttrIndex, Attribute::Range);
+    ConstantRange CR = Val.getConstantRange();
+    if (OldAttr.isValid())
+      CR = CR.intersectWith(OldAttr.getRange());
+    F->addAttributeAtIndex(
+        AttrIndex, Attribute::get(F->getContext(), Attribute::Range, CR));
+    return;
+  }
+  // Infer nonnull attribute.
+  if (Val.isNotConstant() && Val.getNotConstant()->getType()->isPointerTy() &&
+      Val.getNotConstant()->isNullValue() &&
+      !F->getAttributeAtIndex(AttrIndex, Attribute::NonNull).isValid()) {
+    F->addAttributeAtIndex(AttrIndex,
+                           Attribute::get(F->getContext(), Attribute::NonNull));
+  }
+}
+
 void SCCPSolver::inferReturnAttributes() const {
-  for (const auto &[F, ReturnValue] : getTrackedRetVals()) {
-
-    // If there is a known constant range for the return value, add range
-    // attribute to the return value.
-    if (ReturnValue.isConstantRange() &&
-        !ReturnValue.getConstantRange().isSingleElement()) {
-      // Do not add range metadata if the return value may include undef.
-      if (ReturnValue.isConstantRangeIncludingUndef())
-        continue;
+  for (const auto &[F, ReturnValue] : getTrackedRetVals())
+    inferAttribute(F, AttributeList::ReturnIndex, ReturnValue);
+}
 
-      // Take the intersection of the existing attribute and the inferred range.
-      ConstantRange CR = ReturnValue.getConstantRange();
-      if (F->hasRetAttribute(Attribute::Range))
-        CR = CR.intersectWith(F->getRetAttribute(Attribute::Range).getRange());
-      F->addRangeRetAttr(CR);
-      continue;
-    }
-    // Infer nonnull return attribute.
-    if (F->getReturnType()->isPointerTy() && ReturnValue.isNotConstant() &&
-        ReturnValue.getNotConstant()->isNullValue() &&
-        !F->hasRetAttribute(Attribute::NonNull)) {
-      F->addRetAttr(Attribute::NonNull);
+void SCCPSolver::inferArgAttributes() const {
+  for (Function *F : getArgumentTrackedFunctions()) {
+    if (!isBlockExecutable(&F->front()))
       continue;
-    }
+    for (Argument &A : F->args())
+      if (!A.getType()->isStructTy())
+        inferAttribute(F, AttributeList::FirstArgIndex + A.getArgNo(),
+                       getLatticeValueFor(&A));
   }
 }
 
@@ -779,6 +793,10 @@ class SCCPInstVisitor : public InstVisitor<SCCPInstVisitor> {
     return TrackingIncomingArguments.count(F);
   }
 
+  const SmallPtrSetImpl<Function *> &getArgumentTrackedFunctions() const {
+    return TrackingIncomingArguments;
+  }
+
   void solve();
 
   bool resolvedUndef(Instruction &I);
@@ -2153,6 +2171,11 @@ bool SCCPSolver::isArgumentTrackedFunction(Function *F) {
   return Visitor->isArgumentTrackedFunction(F);
 }
 
+const SmallPtrSetImpl<Function *> &
+SCCPSolver::getArgumentTrackedFunctions() const {
+  return Visitor->getArgumentTrackedFunctions();
+}
+
 void SCCPSolver::solve() { Visitor->solve(); }
 
 bool SCCPSolver::resolvedUndefsIn(Function &F) {
diff --git a/llvm/test/Transforms/FunctionSpecialization/discover-transitive-phis.ll b/llvm/test/Transforms/FunctionSpecialization/discover-transitive-phis.ll
index d0095231a30f93..8a172db2665aa4 100644
--- a/llvm/test/Transforms/FunctionSpecialization/discover-transitive-phis.ll
+++ b/llvm/test/Transforms/FunctionSpecialization/discover-transitive-phis.ll
@@ -29,7 +29,7 @@ entry:
 
 define internal i64 @foo(i64 %n, i1 %c1, i1 %c2, i1 %c3, i1 %c4, i1 %c5, i1 %c6, i1 %c7, i1 %c8, i1 %c9, i1 %c10) {
 ; NOFUNCSPEC-LABEL: define internal range(i64 2, 7) i64 @foo(
-; NOFUNCSPEC-SAME: i64 [[N:%.*]], i1 [[C1:%.*]], i1 [[C2:%.*]], i1 [[C3:%.*]], i1 [[C4:%.*]], i1 [[C5:%.*]], i1 [[C6:%.*]], i1 [[C7:%.*]], i1 [[C8:%.*]], i1 [[C9:%.*]], i1 [[C10:%.*]]) {
+; NOFUNCSPEC-SAME: i64 range(i64 3, 5) [[N:%.*]], i1 [[C1:%.*]], i1 [[C2:%.*]], i1 [[C3:%.*]], i1 [[C4:%.*]], i1 [[C5:%.*]], i1 [[C6:%.*]], i1 [[C7:%.*]], i1 [[C8:%.*]], i1 [[C9:%.*]], i1 [[C10:%.*]]) {
 ; NOFUNCSPEC-NEXT:  entry:
 ; NOFUNCSPEC-NEXT:    br i1 [[C1]], label [[L1:%.*]], label [[L9:%.*]]
 ; NOFUNCSPEC:       l1:
diff --git a/llvm/test/Transforms/SCCP/ip-add-range-to-call.ll b/llvm/test/Transforms/SCCP/ip-add-range-to-call.ll
index 91efbcc4ee3825..14a5900b9e9908 100644
--- a/llvm/test/Transforms/SCCP/ip-add-range-to-call.ll
+++ b/llvm/test/Transforms/SCCP/ip-add-range-to-call.ll
@@ -6,7 +6,7 @@
 ; can be added to call sites.
 define internal i32 @callee(i32 %x) {
 ; CHECK-LABEL: define internal range(i32 0, 21) i32 @callee(
-; CHECK-SAME: i32 [[X:%.*]]) {
+; CHECK-SAME: i32 range(i32 0, 21) [[X:%.*]]) {
 ; CHECK-NEXT:    ret i32 [[X]]
 ;
   ret i32 %x
@@ -133,7 +133,7 @@ define void @caller_cb3() {
 ; should be added at call sites.
 define internal i32 @callee5(i32 %x, i32 %y) {
 ; CHECK-LABEL: define internal i32 @callee5(
-; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-SAME: i32 range(i32 10, 21) [[X:%.*]], i32 range(i32 100, 201) [[Y:%.*]]) {
 ; CHECK-NEXT:    [[C:%.*]] = icmp slt i32 [[X]], 15
 ; CHECK-NEXT:    br i1 [[C]], label [[BB1:%.*]], label [[BB2:%.*]]
 ; CHECK:       bb1:
diff --git a/llvm/test/Transforms/SCCP/ip-constant-ranges.ll b/llvm/test/Transforms/SCCP/ip-constant-ranges.ll
index c0cdfafe71cb2d..618c6f6413ef0b 100644
--- a/llvm/test/Transforms/SCCP/ip-constant-ranges.ll
+++ b/llvm/test/Transforms/SCCP/ip-constant-ranges.ll
@@ -4,7 +4,7 @@
 ; Constant range for %a is [1, 48) and for %b is [301, 1000)
 define internal i32 @f1(i32 %a, i32 %b) {
 ; CHECK-LABEL: define {{[^@]+}}@f1
-; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]]) {
+; CHECK-SAME: (i32 range(i32 1, 48) [[A:%.*]], i32 range(i32 301, 1000) [[B:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    ret i32 poison
 ;
@@ -27,7 +27,7 @@ entry:
 ; Constant range for %x is [47, 302)
 define internal i32 @f2(i32 %x) {
 ; CHECK-LABEL: define {{[^@]+}}@f2
-; CHECK-SAME: (i32 [[X:%.*]]) {
+; CHECK-SAME: (i32 range(i32 47, 302) [[X:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp sgt i32 [[X]], 300
 ; CHECK-NEXT:    [[CMP4:%.*]] = icmp ugt i32 [[X]], 300
@@ -79,7 +79,7 @@ entry:
 
 define internal i32 @f3(i32 %x) {
 ; CHECK-LABEL: define {{[^@]+}}@f3
-; CHECK-SAME: (i32 [[X:%.*]]) {
+; CHECK-SAME: (i32 range(i32 0, 2) [[X:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    ret i32 poison
 ;
@@ -116,7 +116,7 @@ end:
 
 define internal i32 @f4(i32 %x) {
 ; CHECK-LABEL: define {{[^@]+}}@f4
-; CHECK-SAME: (i32 [[X:%.*]]) {
+; CHECK-SAME: (i32 range(i32 301, -2147483648) [[X:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    ret i32 poison
 ;
@@ -170,7 +170,7 @@ entry:
 
 define internal i1 @test_unreachable_callee(i32 %a) {
 ; CHECK-LABEL: define {{[^@]+}}@test_unreachable_callee
-; CHECK-SAME: (i32 [[A:%.*]]) {
+; CHECK-SAME: (i32 range(i32 1, 3) [[A:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    ret i1 poison
 ;
@@ -199,7 +199,7 @@ define double @test_struct({ double, double } %test) {
 ; Constant range for %x is [47, 302)
 define internal i32 @f5(i32 %x) {
 ; CHECK-LABEL: define {{[^@]+}}@f5
-; CHECK-SAME: (i32 [[X:%.*]]) {
+; CHECK-SAME: (i32 range(i32 47, 302) [[X:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp sgt i32 [[X]], undef
 ; CHECK-NEXT:    [[CMP2:%.*]] = icmp ne i32 undef, [[X]]
@@ -282,7 +282,7 @@ entry:
 
 define internal i32 @callee6.1(i32 %i) {
 ; CHECK-LABEL: define {{[^@]+}}@callee6.1
-; CHECK-SAME: (i32 [[I:%.*]]) {
+; CHECK-SAME: (i32 range(i32 30, 44) [[I:%.*]]) {
 ; CHECK-NEXT:    [[RES:%.*]] = call i32 @callee6.2(i32 [[I]])
 ; CHECK-NEXT:    ret i32 poison
 ;
@@ -292,7 +292,7 @@ define internal i32 @callee6.1(i32 %i) {
 
 define internal i32 @callee6.2(i32 %i) {
 ; CHECK-LABEL: define {{[^@]+}}@callee6.2
-; CHECK-SAME: (i32 [[I:%.*]]) {
+; CHECK-SAME: (i32 range(i32 30, 44) [[I:%.*]]) {
 ; CHECK-NEXT:    br label [[IF_THEN:%.*]]
 ; CHECK:       if.then:
 ; CHECK-NEXT:    ret i32 poison
diff --git a/llvm/test/Transforms/SCCP/ip-ranges-casts.ll b/llvm/test/Transforms/SCCP/ip-ranges-casts.ll
index e8d417546def85..d9dc0459b5ced6 100644
--- a/llvm/test/Transforms/SCCP/ip-ranges-casts.ll
+++ b/llvm/test/Transforms/SCCP/ip-ranges-casts.ll
@@ -4,7 +4,7 @@
 ; x = [100, 301)
 define internal i1 @f.trunc(i32 %x) {
 ; CHECK-LABEL: define internal i1 @f.trunc(
-; CHECK-SAME: i32 [[X:%.*]]) {
+; CHECK-SAME: i32 range(i32 100, 301) [[X:%.*]]) {
 ; CHECK-NEXT:    [[T_1:%.*]] = trunc nuw nsw i32 [[X]] to i16
 ; CHECK-NEXT:    [[C_2:%.*]] = icmp sgt i16 [[T_1]], 299
 ; CHECK-NEXT:    [[C_4:%.*]] = icmp slt i16 [[T_1]], 101
@@ -60,7 +60,7 @@ define i1 @caller1() {
 ; x = [100, 301)
 define internal i1 @f.zext(i32 %x, i32 %y) {
 ; CHECK-LABEL: define internal i1 @f.zext(
-; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-SAME: i32 range(i32 100, 301) [[X:%.*]], i32 range(i32 -120, 901) [[Y:%.*]]) {
 ; CHECK-NEXT:    [[T_1:%.*]] = zext nneg i32 [[X]] to i64
 ; CHECK-NEXT:    [[C_2:%.*]] = icmp sgt i64 [[T_1]], 299
 ; CHECK-NEXT:    [[C_4:%.*]] = icmp slt i64 [[T_1]], 101
@@ -114,7 +114,7 @@ define i1 @caller.zext() {
 ; x = [100, 301)
 define internal i1 @f.sext(i32 %x, i32 %y) {
 ; CHECK-LABEL: define internal i1 @f.sext(
-; CHECK-SAME: i32 [[X:%.*]], i32 [[Y:%.*]]) {
+; CHECK-SAME: i32 range(i32 100, 301) [[X:%.*]], i32 range(i32 -120, 901) [[Y:%.*]]) {
 ; CHECK-NEXT:    [[T_1:%.*]] = zext nneg i32 [[X]] to i64
 ; CHECK-NEXT:    [[C_2:%.*]] = icmp sgt i64 [[T_1]], 299
 ; CHECK-NEXT:    [[C_4:%.*]] = icmp slt i64 [[T_1]], 101
@@ -166,7 +166,7 @@ define i1 @caller.sext() {
 ; There's nothing we can do besides going to the full range or overdefined.
 define internal i1 @f.fptosi(i32 %x) {
 ; CHECK-LABEL: define internal i1 @f.fptosi(
-; CHECK-SAME: i32 [[X:%.*]]) {
+; CHECK-SAME: i32 range(i32 100, 301) [[X:%.*]]) {
 ; CHECK-NEXT:    [[TO_DOUBLE:%.*]] = uitofp nneg i32 [[X]] to double
 ; CHECK-NEXT:    [[ADD:%.*]] = fadd double 0.000000e+00, [[TO_DOUBLE]]
 ; CHECK-NEXT:    [[TO_I32:%.*]] = fptosi double [[ADD]] to i32
@@ -208,7 +208,7 @@ define i1 @caller.fptosi() {
 ; There's nothing we can do besides going to the full range or overdefined.
 define internal i1 @f.fpext(i16 %x) {
 ; CHECK-LABEL: define internal i1 @f.fpext(
-; CHECK-SAME: i16 [[X:%.*]]) {
+; CHECK-SAME: i16 range(i16 100, 301) [[X:%.*]]) {
 ; CHECK-NEXT:    [[TO_FLOAT:%.*]] = uitofp nneg i16 [[X]] to float
 ; CHECK-NEXT:    [[TO_DOUBLE:%.*]] = fpext float [[TO_FLOAT]] to double
 ; CHECK-NEXT:    [[TO_I64:%.*]] = fptoui float [[TO_FLOAT]] to i64
@@ -251,7 +251,7 @@ define i1 @caller.fpext() {
 ; There's nothing we can do besides going to the full range or overdefined.
 define internal i1 @f.inttoptr.ptrtoint(i64 %x) {
 ; CHECK-LABEL: define internal i1 @f.inttoptr.ptrtoint(
-; CHECK-SAME: i64 [[X:%.*]]) {
+; CHECK-SAME: i64 range(i64 100, 301) [[X:%.*]]) {
 ; CHECK-NEXT:    [[TO_PTR:%.*]] = inttoptr i64 [[X]] to ptr
 ; CHECK-NEXT:    [[TO_I64:%.*]] = ptrtoint ptr [[TO_PTR]] to i64
 ; CHECK-NEXT:    [[C_1:%.*]] = icmp sgt i64 [[TO_I64]], 300
@@ -325,7 +325,7 @@ entry:
 
 define internal i64 @f.sext_to_zext(i32 %t) {
 ; CHECK-LABEL: define internal range(i64 0, 2) i64 @f.sext_to_zext(
-; CHECK-SAME: i32 [[T:%.*]]) {
+; CHECK-SAME: i32 range(i32 0, 2) [[T:%.*]]) {
 ; CHECK-NEXT:    [[A:%.*]] = zext nneg i32 [[T]] to i64
 ; CHECK-NEXT:    ret i64 [[A]]
 ;
diff --git a/llvm/test/Transforms/SCCP/ip-ranges-phis.ll b/llvm/test/Transforms/SCCP/ip-ranges-phis.ll
index 4fb73c532f2e26..5db270415cf0e2 100644
--- a/llvm/test/Transforms/SCCP/ip-ranges-phis.ll
+++ b/llvm/test/Transforms/SCCP/ip-ranges-phis.ll
@@ -3,7 +3,7 @@
 
 define internal i32 @f1(i32 %x) {
 ; CHECK-LABEL: define {{[^@]+}}@f1
-; CHECK-SAME: (i32 [[X:%.*]]) {
+; CHECK-SAME: (i32 range(i32 0, 2) [[X:%.*]]) {
 ; CHECK-NEXT:    ret i32 poison
 ;
   %cmp = icmp sgt i32 %x, 300
@@ -40,7 +40,7 @@ end:
 
 define internal i32 @f2(i32 %x, i32 %y, i32 %z, i1 %cmp.1, i1 %cmp.2) {
 ; CHECK-LABEL: define {{[^@]+}}@f2
-; CHECK-SAME: (i32 [[X:%.*]], i32 [[Y:%.*]], i32 [[Z:%.*]], i1 [[CMP_1:%.*]], i1 [[CMP_2:%.*]]) {
+; CHECK-SAME: (i32 range(i32 0, 2) [[X:%.*]], i32 range(i32 -10, 2) [[Y:%.*]], i32 range(i32 1, 11) [[Z:%.*]], i1 [[CMP_1:%.*]], i1 [[CMP_2:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 [[CMP_1]], label [[IF_TRUE_1:%.*]], label [[END:%.*]]
 ; CHECK:       if.true.1:
@@ -133,7 +133,7 @@ end:
 
 define internal i32 @f3(i32 %x, i32 %y, i1 %cmp.1) {
 ; CHECK-LABEL: define {{[^@]+}}@f3
-; CHECK-SAME: (i32 [[X:%.*]], i32 [[Y:%.*]], i1 [[CMP_1:%.*]]) {
+; CHECK-SAME: (i32 range(i32 0, 6) [[X:%.*]], i32 [[Y:%.*]], i1 [[CMP_1:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 [[CMP_1]], label [[IF_TRUE_1:%.*]], label [[END:%.*]]
 ; CHECK:       if.true.1:
diff --git a/llvm/test/Transforms/SCCP/ip-ranges-select.ll b/llvm/test/Transforms/SCCP/ip-ranges-select.ll
index 7a507eaa74b1ab..6ce46afa0909f1 100644
--- a/llvm/test/Transforms/SCCP/ip-ranges-select.ll
+++ b/llvm/test/Transforms/SCCP/ip-ranges-select.ll
@@ -18,7 +18,7 @@ define void @caller.1(ptr %arg) {
 
 define internal i32 @callee.1(i32 %arg) {
 ; CHECK-LABEL: define {{[^@]+}}@callee.1
-; CHECK-SAME: (i32 [[ARG:%.*]]) {
+; CHECK-SAME: (i32 range(i32 2, 5) [[ARG:%.*]]) {
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 false, i32 16, i32 [[ARG]]
 ; CHECK-NEXT:    br label [[BB10:%.*]]
 ; CHECK:       bb10:
@@ -40,7 +40,7 @@ declare void @use(i32)
 
 define internal i1 @f1(i32 %x, i32 %y, i1 %cmp) {
 ; CHECK-LABEL: define {{[^@]+}}@f1
-; CHECK-SAME: (i32 [[X:%.*]], i32 [[Y:%.*]], i1 [[CMP:%.*]]) {
+; CHECK-SAME: (i32 range(i32 10, 21) [[X:%.*]], i32 range(i32 100, 201) [[Y:%.*]], i1 [[CMP:%.*]]) {
 ; CHECK-NEXT:    [[SEL_1:%.*]] = select i1 [[CMP]], i32 [[X]], i32 [[Y]]
 ; CHECK-NEXT:    [[C_2:%.*]] = icmp sgt i32 [[SEL_1]], 100
 ; CHECK-NEXT:    [[C_3:%.*]] = icmp eq i32 [[SEL_1]], 50
diff --git a/llvm/test/Transforms/SCCP/musttail-call.ll b/llvm/test/Transforms/SCCP/musttail-call.ll
index 085662aad8b850..d5e0dfd0835a81 100644
--- a/llvm/test/Transforms/SCCP/musttail-call.ll
+++ b/llvm/test/Transforms/SCCP/musttail-call.ll
@@ -71,7 +71,7 @@ define internal ptr @no_side_effects(i8 %v) readonly nounwind willreturn {
 ; return value should stay as it is, and should not be zapped.
 define internal ptr @dont_zap_me(i8 %v) {
 ; CHECK-LABEL: define {{[^@]+}}@dont_zap_me
-; CHECK-SAME: (i8 [[V:%.*]]) {
+; CHECK-SAME: (i8 range(i8 2, 0) [[V:%.*]]) {
 ; CHECK-NEXT:    [[I1:%.*]] = call i32 @external()
 ; CHECK-NEXT:    ret ptr null
 ;
diff --git a/llvm/test/Transforms/SCCP/resolvedundefsin-tracked-fn.ll b/llvm/test/Transforms/SCCP/resolvedundefsin-tracked-fn.ll
index 88e9e6568c6475..07913cd3ecb4f3 100644
--- a/llvm/test/Transforms/SCCP/resolvedundefsin-tracked-fn.ll
+++ b/llvm/test/Transforms/SCCP/resolvedundefsin-tracked-fn.ll
@@ -56,7 +56,7 @@ entry:
 
 define internal i1 @test1_g(ptr %h, i32 %i) #0 {
 ; CHECK-LABEL: define {{[^@]+}}@test1_g
-; CHECK-SAME: (ptr [[H:%.*]], i32 [[I:%.*]]) {
+; CHECK-SAME: (ptr [[H:%.*]], i32 range(i32 0, 2) [[I:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TOBOOL:%.*]] = icmp ne i32 [[I]], 0
 ; CHECK-NEXT:    br i1 [[TOBOOL]], label [[LAND_RHS:%.*]], label [[LAND_END:%.*]]
@@ -221,7 +221,7 @@ exit:
 
 define internal i1 @test3_g(ptr %h, i32 %i) {
 ; CHECK-LABEL: define {{[^@]+}}@test3_g
-; CHECK-SAME: (ptr [[H:%.*]], i32 [[I:%.*]]) {
+; CHECK-SAME: (ptr [[H:%.*]], i32 range(i32 0, 2) [[I:%.*]]) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[TOBOOL:%.*]] = icmp ne i32 [[I]], 0
 ; CHECK-NEXT:    br i1 [[TOBOOL]], label [[LAND_RHS:%.*]], label [[LAND_END:%.*]]
diff --git a/llvm/test/Transforms/SCCP/switch.ll b/llvm/test/Transforms/SCCP/switch.ll
index 5208213de210c1..fb81213ed12dc8 100644
--- a/llvm/test/Transforms/SCCP/switch.ll
+++ b/llvm/test/Transforms/SCCP/switch.ll
@@ -207,7 +207,7 @@ switch.2:
 ; range information.
 define internal i32 @test_ip_range(i32 %x) {
 ; CHECK-LABEL: define internal range(i32 1, 4) i32 @test_ip_range(
-; CHECK-SAME: i32 [[X:%.*]]) {
+; CHECK-SAME: i32 range(i32 1, 4) [[X:%.*]]) {
 ; CHECK-NEXT:    switch i32 [[X]], label [[DEFAULT_UNREACHABLE:%.*]] [
 ; CHECK-NEXT:      i32 3, label [[SWITCH_3:%.*]]
 ; CHECK-NEXT:      i32 1, label [[SWITCH_1:%.*]]

During inter-procedural SCCP, also infer attributes on arguments,
not just return values. This allows other non-interprocedural
passes to make use of the information lateron.
Copy link
Contributor

@andjo403 andjo403 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LTGM

@nikic nikic requested review from fhahn and dtcxzyw September 9, 2024 09:55
dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Sep 9, 2024
@dtcxzyw
Copy link
Member

dtcxzyw commented Sep 9, 2024

Missing fold: https://alive2.llvm.org/ce/z/dufVNK
Missing canonicalization: https://alive2.llvm.org/ce/z/8ViuoR

@nikic
Copy link
Contributor Author

nikic commented Sep 9, 2024

Missing fold: https://alive2.llvm.org/ce/z/dufVNK

Isn't this handled by CVP? https://alive2.llvm.org/ce/z/2sDmsW

@dtcxzyw
Copy link
Member

dtcxzyw commented Sep 9, 2024

Missing fold: https://alive2.llvm.org/ce/z/dufVNK

Isn't this handled by CVP? https://alive2.llvm.org/ce/z/2sDmsW

It seems like a phase-ordering problem.

Reduced reproducer:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

define internal i64 @Abc_Tt6Stretch(i64 %0, i32 noundef %1) {
  %3 = icmp eq i32 %1, 0
  %spec.select = call i32 @llvm.umax.i32(i32 %1, i32 1)
  %.1 = select i1 %3, i32 0, i32 %spec.select
  %4 = icmp eq i32 %.1, 2
  %.06 = select i1 %4, i64 0, i64 %0
  %5 = icmp eq i32 %.1, 1
  %.17 = select i1 %5, i64 1, i64 %.06
  ret i64 %.17
}

define void @Wlc_BlastLut(i64 %0, i32 %1, ptr %2) {
  %4 = icmp slt i32 %1, 6
  br i1 %4, label %5, label %7

5:                                                ; preds = %3
  %6 = call i64 @Abc_Tt6Stretch(i64 %0, i32 %1)
  store i64 %6, ptr %2, align 8
  br label %7

7:                                                ; preds = %5, %3
  ret void
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.umax.i32(i32, i32) #0

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }

Before (0f50330):

; opt -O3 reduced.ll -S
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write)
define void @Wlc_BlastLut(i64 %0, i32 %1, ptr nocapture writeonly %2) local_unnamed_addr #0 {
  %4 = icmp slt i32 %1, 6
  br i1 %4, label %5, label %8

5:                                                ; preds = %3
  %6 = icmp eq i32 %1, 2
  %.06.i = select i1 %6, i64 0, i64 %0
  %7 = icmp eq i32 %1, 1
  %.17.i = select i1 %7, i64 1, i64 %.06.i
  store i64 %.17.i, ptr %2, align 8
  br label %8

8:                                                ; preds = %5, %3
  ret void
}

attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write) }

After:

; bin/opt -O3 reduced.ll -S
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write)
define void @Wlc_BlastLut(i64 %0, i32 %1, ptr nocapture writeonly %2) local_unnamed_addr #0 {
  %4 = icmp slt i32 %1, 6
  br i1 %4, label %5, label %9

5:                                                ; preds = %3
  %6 = icmp eq i32 %1, 0
  %spec.select.i = tail call i32 @llvm.umax.i32(i32 %1, i32 1)
  %.1.i = select i1 %6, i32 0, i32 %spec.select.i
  %7 = icmp eq i32 %.1.i, 2
  %.06.i = select i1 %7, i64 0, i64 %0
  %8 = icmp eq i32 %.1.i, 1
  %.17.i = select i1 %8, i64 1, i64 %.06.i
  store i64 %.17.i, ptr %2, align 8
  br label %9

9:                                                ; preds = %5, %3
  ret void
}

; Function Attrs: mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i32 @llvm.umax.i32(i32, i32) #1

attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write) }
attributes #1 = { mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none) }

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@nikic
Copy link
Contributor Author

nikic commented Sep 11, 2024

Missing fold: https://alive2.llvm.org/ce/z/dufVNK

Isn't this handled by CVP? https://alive2.llvm.org/ce/z/2sDmsW

It seems like a phase-ordering problem.

The problem is loss of information due to ConstantRange intersection. We already have a range for the value from the argument attribute, and the select condition excludes part of that range. But this exclusion is small, so we stay with the original range as the more "optimal" one (by size). I don't think there's any easy solution to this. (We could track ranges for two sign preferences, but this will add a lot of overhead.)

nikic added a commit that referenced this pull request Sep 16, 2024
As things are now, this reduces size of clang bootstrapped with ThinLTO
by 0.3% and reduces thin link time by 0.3%. More importantly, it avoids
a large regression once #107114
is merged. Without this change, there would be a 0.4% regression in code
size and 4% (!) regression in thin link time. There is no impact on
run-time performance.
@nikic nikic merged commit b7e51b4 into llvm:main Sep 16, 2024
9 checks passed
@nikic nikic deleted the ipsccp-arg-attrs branch September 16, 2024 08:23
@alanzhao1
Copy link
Contributor

alanzhao1 commented Sep 27, 2024

EDIT: clarification - at this point, I'm not sure if this is a bug in Chrome or in this patch.

FYI this is causing chrome's base_unittests to fail on Windows: https://crbug.com/369900509

code in question: https://crsrc.org/c/base/logging_unittest.cc;l=389-425;drc=11d26b01fde8090cf7ac9a0afd6426c7af4dd75a

The failure error message is:

logging_unittest.cc(423): Expected: (addr1) != (addr2), actual: 00C08AF3 vs 00C08AF3

[ RUN      ] LoggingTest.CheckCausesDistinctBreakpoints
..\..\base\logging_unittest.cc(423): error: Expected: (addr1) != (addr2), actual: 00C08AF3 vs 00C08AF3
Stack trace:
	logging::`anonymous namespace'::LoggingTest_CheckCausesDistinctBreakpoints_Test::TestBody [0x00D28025+1301] (o:\base\logging_unittest.cc:423)

..\..\base\logging_unittest.cc(424): error: Expected: (addr1) != (addr3), actual: 00C08AF3 vs 00C08AF3
Stack trace:
	logging::`anonymous namespace'::LoggingTest_CheckCausesDistinctBreakpoints_Test::TestBody [0x00D28132+1570] (o:\base\logging_unittest.cc:424)

..\..\base\logging_unittest.cc(425): error: Expected: (addr2) != (addr3), actual: 00C08AF3 vs 00C08AF3
Stack trace:
	logging::`anonymous namespace'::LoggingTest_CheckCausesDistinctBreakpoints_Test::TestBody [0x00D2823F+1839] (o:\base\logging_unittest.cc:425)

[  FAILED  ] LoggingTest.CheckCausesDistinctBreakpoints (474 ms)

@alanzhao1
Copy link
Contributor

EDIT: clarification - at this point, I'm not sure if this is a bug in Chrome or in this patch.

FYI this is causing chrome's base_unittests to fail on Windows: https://crbug.com/369900509

code in question: https://crsrc.org/c/base/logging_unittest.cc;l=389-425;drc=11d26b01fde8090cf7ac9a0afd6426c7af4dd75a

The failure error message is:

logging_unittest.cc(423): Expected: (addr1) != (addr2), actual: 00C08AF3 vs 00C08AF3

[ RUN      ] LoggingTest.CheckCausesDistinctBreakpoints
..\..\base\logging_unittest.cc(423): error: Expected: (addr1) != (addr2), actual: 00C08AF3 vs 00C08AF3
Stack trace:
	logging::`anonymous namespace'::LoggingTest_CheckCausesDistinctBreakpoints_Test::TestBody [0x00D28025+1301] (o:\base\logging_unittest.cc:423)

..\..\base\logging_unittest.cc(424): error: Expected: (addr1) != (addr3), actual: 00C08AF3 vs 00C08AF3
Stack trace:
	logging::`anonymous namespace'::LoggingTest_CheckCausesDistinctBreakpoints_Test::TestBody [0x00D28132+1570] (o:\base\logging_unittest.cc:424)

..\..\base\logging_unittest.cc(425): error: Expected: (addr2) != (addr3), actual: 00C08AF3 vs 00C08AF3
Stack trace:
	logging::`anonymous namespace'::LoggingTest_CheckCausesDistinctBreakpoints_Test::TestBody [0x00D2823F+1839] (o:\base\logging_unittest.cc:425)

[  FAILED  ] LoggingTest.CheckCausesDistinctBreakpoints (474 ms)

We no longer believe this is an issue with the compiler - see https://crbug.com/369900509#comment5

@rnk
Copy link
Collaborator

rnk commented Oct 2, 2024

This is a late breaking comment, but RecursiveASTVisitor is awful, and we should ban new instantiations of it in the Clang binary ASAP. cc @AaronBallman

@nikic
Copy link
Contributor Author

nikic commented Oct 2, 2024

@rnk It's a work in progress :) See #105195 and #110040.

aarongable pushed a commit to chromium/chromium that referenced this pull request Oct 3, 2024
An upcoming LLVM change
(llvm/llvm-project#107114) causes the optimizer
to remove the checks for the value of the death_location variable since
all calls to CheckContainingFunc(...) pass one of the 3 checked values.
To avoid this, we add [[clang:optnone]] to the function to prevent this
optimization.

Bug: 369900509
Fixed: 369900509
Change-Id: I502f1dd301590b8673f000b3067327813628ae3c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5905531
Reviewed-by: Nico Weber <[email protected]>
Commit-Queue: Nico Weber <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1363470}
github-actions bot pushed a commit to kaidokert/chrome_base_mirror that referenced this pull request Oct 5, 2024
An upcoming LLVM change
(llvm/llvm-project#107114) causes the optimizer
to remove the checks for the value of the death_location variable since
all calls to CheckContainingFunc(...) pass one of the 3 checked values.
To avoid this, we add [[clang:optnone]] to the function to prevent this
optimization.

Bug: 369900509
Fixed: 369900509
Change-Id: I502f1dd301590b8673f000b3067327813628ae3c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5905531
Reviewed-by: Nico Weber <[email protected]>
Commit-Queue: Nico Weber <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1363470}
NOKEYCHECK=True
GitOrigin-RevId: 2343bacd0698a54f1b58b633874cb65d6c725404
jrguzman-ms pushed a commit to msft-mirror-aosp/platform.external.libchrome that referenced this pull request Nov 20, 2024
An upcoming LLVM change
(llvm/llvm-project#107114) causes the optimizer
to remove the checks for the value of the death_location variable since
all calls to CheckContainingFunc(...) pass one of the 3 checked values.
To avoid this, we add [[clang:optnone]] to the function to prevent this
optimization.

Bug: 369900509
Fixed: 369900509
Change-Id: I502f1dd301590b8673f000b3067327813628ae3c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5905531
Reviewed-by: Nico Weber <[email protected]>
Commit-Queue: Nico Weber <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1363470}


CrOS-Libchrome-Original-Commit: 2343bacd0698a54f1b58b633874cb65d6c725404
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants