Improved rhat diagnostic #3266
Conversation
Hi @aleksgorica, and welcome to the Stan project. In terms of process, there should be an issue specifying the feature which this PR addresses. Please add an issue. I would also suggest not removing the old functionality but instead just adding new functionality for the ranked version. That way, it won't break backward compatibility when it's added. When it's ready to review for inclusion, please ping me and I can do it.
Just leaving comments for now. Not sure if we want to modify this code directly or have a new function
int rows = draws.rows();
int cols = draws.cols();
int size = rows * cols;
Just FYI, Eigen types use Eigen::Index for index values and std vectors use std::size_t. Generally, for sizes that we know will not change we make them const.
Suggested change:

const Eigen::Index rows = draws.rows();
const Eigen::Index cols = draws.cols();
const Eigen::Index size = rows * cols;
(and change the rest of the places that use int to Eigen::Index for storing Eigen matrix sizes)
for (int col = 0; col < cols; ++col) {
  for (int row = 0; row < rows; ++row) {
    int index
        = col * rows + row;  // Calculating linear index in column-major order
    valueWithIndex[index] = {draws(row, col), index};
  }
}
For column-major Eigen matrices you can just iterate through each column by row using operator()(Eigen::Index). Also note that we use snake_case for objects and CamelCase for template parameters.
Suggested change:

for (Eigen::Index i = 0; i < size; ++i) {
  value_with_index[i] = {draws(i), i};
}
for (int k = i; k < j; ++k) {
  int index = valueWithIndex[k].second;
  int row = index % rows;  // Adjusting row index for column-major order
  int col = index / rows;  // Adjusting column index for column-major order
  double p = (avgRank - 3.0 / 8.0) / (size - 2.0 * 3.0 / 8.0 + 1.0);
  rankMatrix(row, col) = boost::math::quantile(dist, p);
}
You should be able to do the index without calculating rows / cols here like above
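Something along these lines should work, using Eigen's linear indexing on the column-major matrix (a sketch only, assuming the snake_case renames suggested above):

for (int k = i; k < j; ++k) {
  // column-major Eigen matrices support linear indexing via operator()(Eigen::Index)
  const Eigen::Index index = value_with_index[k].second;
  double p = (avg_rank - 3.0 / 8.0) / (size - 2.0 * 3.0 / 8.0 + 1.0);
  rank_matrix(index) = boost::math::quantile(dist, p);
}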
int rows = draws.rows();
int cols = draws.cols();
int size = rows * cols;
Eigen::MatrixXd rankMatrix = Eigen::MatrixXd::Zero(rows, cols);
We like to declare things as near as possible to where they are used so I'd move this down to right before the if
int j = i;
double sumRanks = 0;
int count = 0;

while (j < size && valueWithIndex[j].first == valueWithIndex[i].first) {
  sumRanks += j + 1;  // Rank starts from 1
  ++j;
  ++count;
}
Right now this while loop will always execute at least once. Can you start j at j = i + 1, etc., to avoid this? Then the while loop only runs for cases of duplicates.
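For example, a sketch of that restructuring, reusing the snake_case names suggested above (illustrative, not the exact code):

double sum_ranks = i + 1;  // rank of the current draw; ranks start at 1
int count = 1;
int j = i + 1;
// only entered when value_with_index[j] ties with value_with_index[i]
while (j < size && value_with_index[j].first == value_with_index[i].first) {
  sum_ranks += j + 1;
  ++j;
  ++count;
}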
 * Computes square root of marginal posterior variance of the estimand by
 * weigted average of within-chain variance W and between-chain variance B.
Suggested change:

 * Computes square root of marginal posterior variance of the estimand by the
 * weighted average of within-chain variance W and between-chain variance B.
for (int chain = 0; chain < num_chains; ++chain) {
  boost::accumulators::accumulator_set<
      double, boost::accumulators::stats<boost::accumulators::tag::mean,
                                          boost::accumulators::tag::variance>>
      acc_draw;
  for (int n = 0; n < num_draws; ++n) {
    acc_draw(draws(n, chain));
  }
  chain_mean(chain) = boost::accumulators::mean(acc_draw);
  acc_chain_mean(chain_mean(chain));
  chain_var(chain)
      = boost::accumulators::variance(acc_draw) * unbiased_var_scale;
}
It doesn't look like you need an online way to update and view the current mean and variance. You can calculate the mean for each chain via draws.colwise().mean() and the overall mean via draws.array().mean(), then a loop for the chain variance calculation.
Now that I'm reading this again, should that be named acc_chain_variance because of line 105? I've never used boost accumulators before, so I'm not sure I'm following all the way here.
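A rough sketch of the non-accumulator version, reusing num_chains and num_draws from the surrounding code (an illustration, not a drop-in replacement):

Eigen::VectorXd chain_mean = draws.colwise().mean().transpose();
double overall_mean = draws.array().mean();
Eigen::VectorXd chain_var(num_chains);
for (Eigen::Index chain = 0; chain < num_chains; ++chain) {
  // unbiased sample variance of each chain (divide by n - 1)
  chain_var(chain)
      = (draws.col(chain).array() - chain_mean(chain)).square().sum()
        / (num_draws - 1);
}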
…y of calculating variance and averages
 *
 */

Eigen::MatrixXd rank_transform(const Eigen::MatrixXd& draws) {
For the error on Jenkins: this makes it so that there are not multiple definitions across different translation units.
Suggested change:

inline Eigen::MatrixXd rank_transform(const Eigen::MatrixXd& draws) {
I think this is good! The suggested changes mostly cover some C++ things, some questions about the code's math, and handling some edge cases.
As a first PR this is very good so far!
 */
inline double compute_potential_scale_reduction_rank(
    std::vector<const double*> draws, std::vector<size_t> sizes) {
  int num_chains = sizes.size();
std::vector's size type is std::size_t.

Suggested change:

  std::size_t num_chains = sizes.size();

or use auto.
Why is only the first size checked to see if it is zero before we return a NaN? If one chain failed, we should still be able to use information from all of the other chains. Looking at the rest of the code, unless there is a math reason not to ignore zero-sized chains, I think we should just prune them:
std::vector<const double*> nonzero_chain_begins;
std::vector<std::size_t> nonzero_chain_sizes;
for (std::size_t i = 0; i < chain_sizes.size(); ++i) {
  if (chain_sizes[i] != 0) {
    nonzero_chain_begins.push_back(chain_begins[i]);
    nonzero_chain_sizes.push_back(chain_sizes[i]);
  }
}
if (nonzero_chain_sizes.empty()) {
  return std::numeric_limits<double>::quiet_NaN();
}
  size_t num_draws = sizes[0];
  if (num_draws == 0) {
    return std::numeric_limits<double>::quiet_NaN();
  }
Also, should num_draws be min_num_draws, since it's the minimum number of draws received from each chain?
 * @return potential scale reduction for the specified parameter
 */
inline double compute_potential_scale_reduction_rank(
    std::vector<const double*> draws, std::vector<size_t> sizes) {
I'd rename these to be a little more clear. begin() is a function in standard library containers that means "an iterator pointing to the first element of a container", so using begin in the name here will signal to people that these pointers are the pointers to the first element of each chain.
Suggested change:

    std::vector<const double*> chain_begins, std::vector<size_t> chain_sizes) {
if (are_all_const) {
  // If all chains are constant then return NaN
  // if they all equal the same constant value
  if (init_draw.isApproxToConstant(init_draw(0))) {
    return std::numeric_limits<double>::quiet_NaN();
  }
}
But is it fine if each chain is constant, but each one is a different value? To be clear, I'm asking because I don't know if that is how the paper is written or not. I suppose this makes sense in the case of many short chains.
Yes, you are correct. The current implementation fails if different chains are constant. For example, C1: [1, 1, 1]; C2: [2, 2, 2] would have a within-variance of 0, and the rhat function would return inf due to division by zero. I think the best way to correct this is to check if there exists a non-constant chain.
Doing something intelligent around constant chains would be a big improvement on our current NaN behavior. But I'm not sure what that is as there's not a number that makes sense as the ESS.
If all chains have the same constant value, we can't tell the difference between all chains being stuck and the variable actually being constant (e.g. the diagonal of a correlation matrix), as Stan doesn't tag the variables. In that case the diagnostics in R return NA. If the chains have different constant values, then the variable can't be a true constant, and Rhat Inf is fine.
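In code, a minimal sketch of that behavior, reusing the names from the quoted snippet (illustrative only):

if (are_all_const) {
  // all chains stuck at the same value: indistinguishable from a true constant -> NaN
  if (init_draw.isApproxToConstant(init_draw(0))) {
    return std::numeric_limits<double>::quiet_NaN();
  }
  // chains constant at different values: within-chain variance is zero -> Rhat is Inf
  return std::numeric_limits<double>::infinity();
}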
  }
}

Eigen::MatrixXd matrix(num_draws, num_chains);
Suggested change:

Eigen::MatrixXd draws_matrix(num_draws, num_chains);
double rhat_tail = rhat(rank_transform(
    (matrix.array() - math::quantile(matrix.reshaped(), 0.5)).abs()));

return std::max(rhat_bulk, rhat_tail);
Question for @avehtari
Do we want to just return the max or should we return a pair so the user can see the bulk and tail rhats?
It is useful for the user to know both. A diagnostic message could be simplified by reporting whether the max of these is too low, but otherwise I would prefer that both be available to the user. Making them both available does change the I/O via CSV, and changing CSV structures needs to be considered carefully.
Okay, then @aleksgorica, can you have this return back an std::pair?
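For example, the signature could become something like this (a sketch, assuming the arguments stay the same and rhat_bulk/rhat_tail are computed as in the current code):

inline std::pair<double, double> compute_potential_scale_reduction_rank(
    const std::vector<const double*>& chain_begins,
    const std::vector<size_t>& chain_sizes) {
  // ... compute rhat_bulk and rhat_tail as before ...
  return std::make_pair(rhat_bulk, rhat_tail);
}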
inline double compute_split_potential_scale_reduction_rank(
    std::vector<const double*> draws, std::vector<size_t> sizes) {
We want these arguments to come in as constant references. As written, this will make a hard copy of the input vectors when you call this function. Making the arguments references (&) means the function will just use the already existing object without making a copy, and const means the arguments will be constant in the function (i.e. we will not modify them).
Suggested change:

inline double compute_split_potential_scale_reduction_rank(
    const std::vector<const double*>& draws,
    const std::vector<size_t>& sizes) {
We want containers (like std::vector or Eigen::MatrixXd or std::map) to be passed by constant reference. Small types such as double, int, std::size_t, etc. can be passed by value, as making copies of them is trivial. Happy to explain this more if you like, but I don't want to overload you with info. A nice place to read about things like this is Scott Meyers' "Effective Modern C++", which, if you google it, should be easy to find a free copy of online.
This comment applies to all the function signatures you added here.
 * Current implementation assumes draws are stored in contiguous
 * blocks of memory. Chains are trimmed from the back to match the
 * length of the shortest chain.
You use split_chains, which also assumes each chain is the same length.
double half = num_draws / 2.0;
std::vector<size_t> half_sizes(2 * num_chains, std::floor(half));
Just to make it clearer that you are using floating-point division and then taking the floor to get the index:
Suggested change:

  std::size_t half = std::floor(num_draws / 2.0);
  std::vector<size_t> half_sizes(2 * num_chains, half);
Okay, I hope I have understood correctly: I have just reverted the changes in the original non-rank functions to the previous code. However, I would like to know why that is necessary, since I believe the new code yields the same results as the previous code according to the tests, and it is also a bit better written.
There's no need to keep the old code, but we need to keep the old interfaces so we don't break anyone's existing code. The new code should be doing ranked R-hat, right? That won't provide the same answers. If you wrote general enough code to do both, you can have the old interface delegate to the new code. Then we can rewrite the interfaces to use the new code and get rid of the old code. @SteveBronder will know more about the specifics here.
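A sketch of what such delegation might look like, assuming the old and new functions share compatible signatures and the rank version returns a single double (hypothetical, for illustration):

inline double compute_potential_scale_reduction(
    const std::vector<const double*>& chain_begins,
    const std::vector<size_t>& chain_sizes) {
  // the old interface simply forwards to the new ranked implementation
  return compute_potential_scale_reduction_rank(chain_begins, chain_sizes);
}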
@bob-carpenter I rewrote this to dispatch to the old code from the previous API. @aleksgorica there's one signature missing for https://github.com/stan-dev/stan/blob/develop/src/stan/mcmc/chains.hpp#L219
This reverts commit f12f259.
Actually, I reverted my old PR. Changing all the old functions to use the new one causes the cmdstan tests to fail. Let's have the double split_potential_scale_reduction(const Eigen::Matrix<Eigen::VectorXd, Dynamic, 1>& samples) signature, then we are good! After this is in, we can then change cmdstan etc. to use the new ranked version. I think this is better because we need to change the output anyway to report back both the bulk and tail rhat.
Ok, but why should we even keep
Oh, you're right, I didn't see that it was a private member that is never called. Let's leave it for now, but I don't think you need to write a rank version for that one.
@aleksgorica you should have received an email that adds you to the Stan project :) Clicking on the link in that email from GitHub should then allow you to press the "merge pull request" button.
Thank you all for accepting me to the Stan project :). I had a really great time working on the pull request. Also, thank you @SteveBronder and @bob-carpenter for mentoring. But GitHub doesn't allow me to merge the pull request.
I just verified that Steve gave you permission to merge. Can you verify that works on your end by merging this? Thanks, and welcome to the Stan team! |
Did you click on the link in the email that was sent to you from github? After you click that and refresh the page you should have permission to merge |
Yes, I clicked multiple times; now it always redirects me to https://github.com/stan-dev/stan. But it still does not allow me to merge. If I go to https://github.com/settings/organizations, it says that I am an outside collaborator on 1 repository for Stan.
Sorry, you should have just gotten a new email that should give you the right permissions.
Submission Checklist
./runTests.py src/test/unit
make cpplint
Summary
Solves issue #3269
Changes in compute_potential_scale_reduction, based on Vehtari
Rhat computation moved into the rhat function
Added function rank_transform
Changed test values to match the new output
Intended Effect
rank_transform: computes normalized average ranks for draws, transforming them to normal scores using the inverse normal transformation and a fractional offset.
rhat: computes rhat as before; the computational part is just moved into a new function.
compute_potential_scale_reduction: copies draws into a matrix object, then computes the bulk rhat and tail rhat and returns the maximum.
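As a rough sketch of the rank-normalization step described above (a standalone illustration of the transform, not the exact PR code):

#include <boost/math/distributions/normal.hpp>

// Map an average rank in 1..size to a normal score using the fractional offset
// (rank - 3/8) / (size - 2 * 3/8 + 1) before applying the inverse normal CDF.
inline double rank_to_normal_score(double avg_rank, double size) {
  boost::math::normal_distribution<double> std_normal(0.0, 1.0);
  double p = (avg_rank - 3.0 / 8.0) / (size - 2.0 * 3.0 / 8.0 + 1.0);
  return boost::math::quantile(std_normal, p);
}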
How to Verify
Compare the results with the ArviZ rhat function.
Side Effects
Documentation
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):
Aleks Stepančič
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: