Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StatisticsMedian: Fix bug #1658

Merged
merged 2 commits into from
Aug 25, 2023
Merged

StatisticsMedian: Fix bug #1658

merged 2 commits into from
Aug 25, 2023

Conversation

jmr
Copy link
Member

@jmr jmr commented Aug 24, 2023

Previously, this could return the wrong result when there was an even number of elements.

There were two nth_element calls. The second call could change elements in [center2, end]), which was where center pointed. Therefore, *center sometimes had the wrong value after the second nth_element call.

Rewrite to use max_element instead of the second call to nth_element. This avoids modifying the vector.

Previously, this could return the wrong result when there
was an even number of elements.

There were two `nth_element` calls.  The second call could
change elements in `[center2, end])`, which was where
`center` pointed.  Therefore, `*center` sometimes had the
wrong value after the second `nth_element` call.

Rewrite to use `max_element` instead of the second call to
`nth_element`.  This avoids modifying the vector.
@LebedevRI
Copy link
Collaborator

Please add tests.

@jmr
Copy link
Member Author

jmr commented Aug 24, 2023

Please add tests.

The tests already exist. They failed sometimes. That's how I found it.

@LebedevRI
Copy link
Collaborator

Please add tests.

The tests already exist. They failed sometimes. That's how I found it.

Could you show the failure? Define sometimes? Was that failure non-deterministic? I've never seen it...
This should be tested in

TEST(StatisticsTest, Median) {
EXPECT_DOUBLE_EQ(benchmark::StatisticsMedian({42, 42, 42, 42}), 42.0);
EXPECT_DOUBLE_EQ(benchmark::StatisticsMedian({1, 2, 3, 4}), 2.5);
EXPECT_DOUBLE_EQ(benchmark::StatisticsMedian({1, 2, 5, 10, 10}), 5.0);
}
.

@jmr
Copy link
Member Author

jmr commented Aug 24, 2023

Could you show the failure?

[ RUN      ] StatisticsTest.Median
third_party/benchmark/test/statistics_gtest.cc:17: Failure
Expected equality of these values:
  benchmark::StatisticsMedian({1, 2, 3, 4})
    Which is: 3
  2.5

Define sometimes? Was that failure non-deterministic?

1/2 of the time, non-deterministically.

I've never seen it...

If you use libc++ with appropriate debug flags, you will see it. It will randomize [begin, nth) and (nth, end):

https://github.com/llvm/llvm-project/blob/5ec13535235d07eafd64058551bc495f87c283b1/libcxx/include/__algorithm/nth_element.h#L237

A longer sequence will decrease the probability of success factorially.

This should be tested in

TEST(StatisticsTest, Median) {
EXPECT_DOUBLE_EQ(benchmark::StatisticsMedian({42, 42, 42, 42}), 42.0);
EXPECT_DOUBLE_EQ(benchmark::StatisticsMedian({1, 2, 3, 4}), 2.5);
EXPECT_DOUBLE_EQ(benchmark::StatisticsMedian({1, 2, 5, 10, 10}), 5.0);
}

.

It is.

// before
// Did we have an odd number of samples? If yes, then center is the median.
// If not, then we are looking for the average between center and the value
// before. Instead of resorting, we just look for the max value before it.
Copy link
Collaborator

@LebedevRI LebedevRI Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  // before.  Instead of resorting, we just look for the max value before it,
  // which is not necessarily the element immediately preceding the `center`,
  // since the `copy` is only *partially* sorted.

@LebedevRI
Copy link
Collaborator

Ok, so the actual bug here is iterator invalidation, effectively.
We should dereference center before calling std::nth_element() again, and then everything would still work.
This was really not at all obvious from the patch description, maybe improve it a bit?

But this is still an improvement, so okay.

@jmr
Copy link
Member Author

jmr commented Aug 24, 2023

Ok, so the actual bug here is iterator invalidation, effectively.

No, the iterator is still valid. The problem is an assumption that nth_element(begin, nth, end) doesn't affect (nth, end).

@LebedevRI
Copy link
Collaborator

Ok, so the actual bug here is iterator invalidation, effectively.

No, the iterator is still valid. The problem is an assumption that nth_element(begin, nth, end) doesn't affect (nth, end).

I did say "effectively" - if you consider that the center is an iterator
pointing to the median-ish element of the copy.

@dmah42 dmah42 merged commit 9f254bd into google:main Aug 25, 2023
@dmah42
Copy link
Member

dmah42 commented Aug 25, 2023

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants