TFactor: bulkerify GEMV (both MC and GPU) #1214
Conversation
Force-pushed from 6558780 to bf17fcc
Force-pushed from 37ea75b to 2845339
cscs-ci run, just to check if there are any major problems.
LGTM.
Please name new functions in snake case, and rename existing internal functions if possible.
matrix::Matrix<T, Device::GPU> ws_T({nworkspaces * nrefls_step, nrefls_step},
                                    {nrefls_step, nrefls_step});
All the Ts are allocated at scheduling time; better to reuse them.
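For context, here is a minimal sketch of the stacked-workspace layout implied by the ws_T allocation above (plain C++, not the DLA-Future API; workspace_rows is a hypothetical helper introduced only for this example): the matrix holds nworkspaces square tiles of size nrefls_step stacked vertically, so workspace w owns the global row range [w * nrefls_step, (w + 1) * nrefls_step).

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper (not DLA-Future API): the [begin, end) global row range
// of the nrefls_step x nrefls_step tile owned by workspace w inside ws_T,
// whose size is {nworkspaces * nrefls_step, nrefls_step} with square tiles.
std::pair<std::size_t, std::size_t> workspace_rows(std::size_t w, std::size_t nrefls_step) {
  return {w * nrefls_step, (w + 1) * nrefls_step};
}
```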
Only a few questions, nothing blocking.
Force-pushed from 05dc48d to f83f317
Force-pushed from 0397c9e to d370894
cscs-ci run
Converted to draft in order to prevent merging, since I'm still doing some checks.
Force-pushed from 3749abc to ffba9a4
@@ -1063,7 +1075,14 @@ Matrix<T, Device::CPU> ReductionToBand<B, D, T>::call(Matrix<T, D>& mat_a, const
// TODO probably the first one in any panel is ok?
Random thought: I just saw this comment and I'm wondering if we should have a look and see whether we can "merge" this workspace into the other one. @rasolca
I will leave it for another PR 😉
Force-pushed from aaba87f to 0d2def8
Force-pushed from 0d2def8 to c306078
cscs-ci run
Only a minor include-nit, otherwise this looks good to me.
@albestro make sure to merge latest master since it's got the new CI changes now.
cscs-ci run
cscs-ci run
In this PR we try to increase the parallelisation of the GEMV step of TFactor, which is by construction serial (all results go into the same block tile), by using workspaces to store partial results and then reducing them before the final TRMV step.
Algorithmically, the main change is that the stepGEMV loop has been replaced with a single stepGEMVAll, and the concept has been applied in a similar way for both backends, MC and GPU: pika::bulk splits the input tiles over multiple tasks, each task stores its partial results in its own workspace, and a single task at the end performs the reduction.
In order to implement this solution, workspaces for intermediate results have been added (there is another option which requires neither an additional workspace nor a reduction, and which we might explore for MC in another PR).
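To make the description above concrete, here is a minimal, self-contained sketch of the partial-results-plus-reduction pattern in plain C++ with std::thread. It is not the DLA-Future implementation and does not use pika::bulk; Tile, partial_gemv, reduce_workspaces and bulk_gemv are hypothetical names introduced only for this example, and real-valued data is assumed so the conjugate transpose reduces to a plain transpose.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical dense m x n column-major tile (not the DLA-Future tile type).
struct Tile {
  std::size_t m, n;
  std::vector<double> data;  // column-major, size m * n
  double operator()(std::size_t i, std::size_t j) const {
    return data[i + j * m];
  }
};

// Worker w accumulates the GEMV contribution of its subset of input tiles
// (round-robin assignment, purely for illustration) into its own workspace ws:
// ws[j] += (A_t^T x_t)[j] for every tile t owned by this worker.
void partial_gemv(const std::vector<Tile>& a_tiles,
                  const std::vector<std::vector<double>>& x_tiles,
                  std::size_t w, std::size_t nworkers, std::vector<double>& ws) {
  for (std::size_t t = w; t < a_tiles.size(); t += nworkers) {
    const Tile& a = a_tiles[t];
    const std::vector<double>& x = x_tiles[t];
    for (std::size_t j = 0; j < a.n; ++j)
      for (std::size_t i = 0; i < a.m; ++i)
        ws[j] += a(i, j) * x[i];
  }
}

// A single task reduces the per-worker workspaces into the final vector y,
// which would then feed the (unchanged) TRMV step.
std::vector<double> reduce_workspaces(const std::vector<std::vector<double>>& ws,
                                      std::size_t n) {
  std::vector<double> y(n, 0.0);
  for (const auto& w : ws)
    for (std::size_t j = 0; j < n; ++j)
      y[j] += w[j];
  return y;
}

// Bulk-style driver: one workspace per worker, partial GEMVs run in parallel,
// then a single reduction.
std::vector<double> bulk_gemv(const std::vector<Tile>& a_tiles,
                              const std::vector<std::vector<double>>& x_tiles,
                              std::size_t n, std::size_t nworkers) {
  std::vector<std::vector<double>> ws(nworkers, std::vector<double>(n, 0.0));
  std::vector<std::thread> tasks;
  for (std::size_t w = 0; w < nworkers; ++w)
    tasks.emplace_back(partial_gemv, std::cref(a_tiles), std::cref(x_tiles), w,
                       nworkers, std::ref(ws[w]));
  for (auto& t : tasks)
    t.join();
  return reduce_workspaces(ws, n);
}
```

The design point illustrated here is that each worker writes only to its own workspace, so the partial GEMVs need no synchronisation among themselves; the only serial part left is the single reduction task that feeds the final TRMV.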
TODO:
Close #798.
EDIT: since this is conceptually similar to #798, and that issue is going to be closed as soon as this gets merged, I migrated here the doc fixes that happened there.