V2 parallelize #123

Open · wants to merge 15 commits into master
Conversation

@QuiteAFoxtrot (Collaborator) commented Feb 28, 2025

So far this is just a draft PR - I want to get the changes online so Hernan can see what I'm doing.

So far I've shaved a few seconds off the overall runtime on DR7 just by converting some indexing into less redundant copying.

I'm going through the V2 codebase to comment things, add docstrings, and eventually make an attempt to switch from the SciPy/Qhull backend to a parallelized Voronoi/Delaunay backend, potentially https://github.com/ParAlg/ParGeo or https://github.com/eleftherioszisis/multivoro

It might also be good to come up with some code that can convert between Voronoi and Delaunay, since they are dual: right now, during the Tessellation phase, we're basically recomputing both from the input coordinates instead of using the already-computed Voronoi class to produce the Delaunay class.
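For reference, here is a minimal sketch of what that duality already gives us for free with the current scipy backend (illustrative only; recovering the full Delaunay simplices would still take more work): the `ridge_points` attribute of a computed Voronoi diagram lists, for each Voronoi ridge, the pair of input points it separates, and those pairs are exactly the Delaunay edges.

```python
import numpy as np
from scipy.spatial import Voronoi

rng = np.random.default_rng(0)
points = rng.random((1000, 3))  # stand-in for the galaxy coordinates

vor = Voronoi(points)

# Each row of ridge_points is a pair (i, j): input points i and j share a
# Voronoi ridge, which is the same as saying (i, j) is a Delaunay edge.
delaunay_edges = vor.ridge_points

# Per-galaxy neighbor lists recovered from the dual edges, no Delaunay call.
neighbors = [[] for _ in range(len(points))]
for i, j in delaunay_edges:
    neighbors[i].append(j)
    neighbors[j].append(i)
```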

Created some minor API changes in the main Zobov class (for example, instead of "start" and "end" it will now use a single "stages" parameter, to simplify).
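Hypothetical usage of the reworked interface (only "stages" is the real proposed parameter name; the config filename and stage values are placeholders, not the final signature):

```python
# Hypothetical sketch of the new API described above -- placeholder
# arguments, not the actual signature.
from vast.vsquared.zobov import Zobov

zob = Zobov("vsquared_config.ini", stages=[0, 1, 2, 3])  # run all stages
```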

@QuiteAFoxtrot (Collaborator, Author) commented Mar 1, 2025

Some basic profiling/scaling times on a single core of an AMD Ryzen 7 5800H CPU. Times below are in seconds, and rows prefixed with '--' are subsections of the Tessellation. SortVoids scales particularly poorly, though the Tessellation section is still the dominant time sink.

Factor is the ratio of the time to run each section on DR12 vs DR7. Since the number of galaxies grows by a factor of 4.83, we'd hope to see close to that value in every category, though I think some sections have O(n log n) performance, which would give a factor of (580k × log 580k) / (120k × log 120k) ≈ 5.5.

| Code section      | DR7    | DR12   | Factor |
|-------------------|--------|--------|--------|
| Num gals          | 120k   | 580k   | 4.83   |
| Catalog load time | 0.281  | 1.28   | 4.55   |
| Tessellation time | 21.902 | 136.33 | 6.22   |
| -- voronoi time   | 4.00   | 23.45  | 5.86   |
| -- cuts time      | 1.51   | 9.00   | 5.96   |
| -- volume time    | 12.389 | 78.34  | 6.32   |
| -- delaunay time  | 2.389  | 15.13  | 6.33   |
| -- neighbor time  | 1.577  | 9.789  | 6.21   |
| Zones time        | 7.53   | 53.07  | 7.04   |
| Voids time        | 0.05   | 1.31   | 26.2   |
| SortVoids time    | 1.962  | 82.417 | 42.06  |
| SaveVoids time    | 0.100  | 0.054  | 0.5    |
| SaveZones time    | 3.698  | 97.54  | 26.37  |

Some basic things to speed up:

  • Right now the code calculates the Voronoi first and then later calculates the Delaunay (both scipy.spatial classes). Since these are dual to each other, it may be faster to calculate just one and then derive the required parts of the other from its output. Based on the timings here we would calculate the Delaunay, as it appears to be some factor faster than the Voronoi, though if we add a parallelized backend we may have to re-evaluate.
  • Calculating the volume (the --volume time: section) is surprisingly the largest time sink in the Tessellation: it's basically a for loop going through each Voronoi cell and using the SciPy ConvexHull class to get the volume of that cell. This is a great, easy candidate for parallelization (see the sketch after this list).
  • Many, many things can be optimized further; even just rewriting some things in Cython, like we did for VoidFinder, should result in some huge speedups here.
  • The Zones section, and especially SortVoids (aka pruning) and SaveZones, need more investigation.
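For bullet 2, a minimal sketch of the kind of parallelization I mean, using a plain multiprocessing pool over the bounded cells (function and variable names here are illustrative, not the actual V2 code):

```python
import numpy as np
from multiprocessing import Pool
from scipy.spatial import Voronoi, ConvexHull

def cell_volume(vertices):
    """ConvexHull volume of one Voronoi cell; -1 flags unbounded cells."""
    if len(vertices) < 4:  # sentinel for unbounded/degenerate cells
        return -1.0
    return ConvexHull(vertices).volume

def all_cell_volumes(points, num_cpus=4):
    vor = Voronoi(points)
    # Collect each input point's region vertices; regions containing the
    # index -1 extend to infinity and get an empty vertex set.
    cells = []
    for region_idx in vor.point_region:
        region = vor.regions[region_idx]
        if -1 in region:
            cells.append(np.empty((0, 3)))
        else:
            cells.append(vor.vertices[region])
    # The per-cell hull computations are independent, so they map
    # cleanly onto a process pool.
    with Pool(num_cpus) as pool:
        return np.array(pool.map(cell_volume, cells))
```

(On platforms that spawn rather than fork, the pool call needs the usual `if __name__ == "__main__":` guard.)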

@QuiteAFoxtrot (Collaborator, Author) commented Mar 1, 2025

Did some refactoring that combines the --cuts time: and --volume time: sections, and added multiprocessing for the volume calculation as described in bullet point 2 of my previous comment. That got the combined time down from 13.899 s (1.51 + 12.389) to 4.06 s with 4-CPU parallelization on DR7, and from 87.34 s (9.00 + 78.34) to 27.46 s on 4 CPUs with DR12.

  • We may be able to further improve the volume calculation time by leveraging our special Euclidean case of Voronoi cells to avoid the scipy.spatial ConvexHull class entirely (they're already convex hulls!) and just calculate volume directly (see the sketch after this list).
  • Updated vsquared.py to include a num_cpus parameter (maybe we can further update the argparse stuff to add a command-line parameter, perhaps -j like make uses?).
  • Added a Cythonized version of the volume calculation, which surprisingly didn't really improve the speed (or hurt it), but the code is much cleaner and better documented so I'm going to keep it.
  • Very happy that all vsquared unit tests are still passing.
  • Have not added the parallelization to Periodic mode yet.
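On the first bullet, a hedged sketch of the direct calculation I have in mind: scipy's Voronoi output already exposes every cell face as a ridge (ridge_points says which two generators share it, ridge_vertices gives the face polygon, which qhull returns ordered around the face), so fanning tetrahedra from each generator through its faces gives the cell volume without ever building a ConvexHull. This assumes 3D and marks unbounded cells with NaN:

```python
import numpy as np
from scipy.spatial import Voronoi

def direct_cell_volumes(points):
    vor = Voronoi(points)
    volumes = np.zeros(len(points))
    for (p, q), ridge in zip(vor.ridge_points, vor.ridge_vertices):
        if -1 in ridge:
            # Face of an unbounded cell: both adjacent cells are unusable.
            volumes[p] = volumes[q] = np.nan
            continue
        face = vor.vertices[ridge]
        a = face[0]
        # Fan-triangulate the (convex, planar) face, then form a
        # tetrahedron from each triangle to the generator point; the
        # generator lies inside its own cell, so the unsigned tetrahedron
        # volumes sum to the cell volume.
        for b, c in zip(face[1:-1], face[2:]):
            for gen in (p, q):  # each face bounds both adjacent cells
                mat = np.array([b - a, c - a, points[gen] - a])
                volumes[gen] += abs(np.linalg.det(mat)) / 6.0
    return volumes
```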

@QuiteAFoxtrot (Collaborator, Author)

Added the parallelization to Periodic mode as well, but I don't have a good test setup or unit test for that.

@QuiteAFoxtrot (Collaborator, Author) commented Mar 2, 2025

Eliminated the --neighbor time: section; it turns out the SciPy Delaunay class already provides this functionality and we were recalculating it for no reason. We may have to re-add something if we manage to reduce the Voronoi and Delaunay sections down to a single computation, but for now this saves a nice little chunk of runtime.
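For the record, the built-in adjacency I'm referring to is `Delaunay.vertex_neighbor_vertices`, a CSR-style (indptr, indices) pair:

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
points = rng.random((1000, 3))  # stand-in for the galaxy coordinates
tri = Delaunay(points)

# indptr/indices use the usual CSR layout: the Delaunay neighbors of
# point k are indices[indptr[k]:indptr[k + 1]], no extra pass needed.
indptr, indices = tri.vertex_neighbor_vertices
k = 42
neighbors_of_k = indices[indptr[k]:indptr[k + 1]]
```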

@hbrincon (Contributor) commented Mar 4, 2025

@QuiteAFoxtrot Thank you for starting work on this! This will be incredibly helpful for running V2 with DESI. To let you know, some of the commonly used V2 functionality was never added to the master branch. @dveyrat created branches for calculating which voids are edge voids and for running on non-periodic, cubic simulation volumes (a so-called "xyz" mode), and I combined these branches in https://github.com/DESI-UR/VAST/tree/fits_edge. This is as good a time as any to pull these features into the master branch (in particular, the edge void calculation code is some of the most time-intensive code in V2 and could benefit greatly from parallelization). I can start work on adding the fits_edge features to v2_parallelize.

Let me know if there's anything else I can do to help out with testing.

@QuiteAFoxtrot (Collaborator, Author) commented Mar 4, 2025

Sure @hbrincon, I could definitely use a hand. I think you had started going through and commenting code; if you are able to do that some more, that would be hugely helpful. I got stuck on the names of some of the data structures:

Zones - zlinks - I know this is actually two things: zlinks[0, i, ...] is all the zone IDs which are linked to the ith zone, and zlinks[1, i, ...] is, I think, the watershed breakpoint volumes at which those zones become part of the ith zone (or vice versa?)

Voids - zlu - pretty sure this is just the sorted watershed breakpoint values
zlut - I think, the zone IDs that flow at each breakpoint
voids, mvols, ovols, vlut, mvlut, ovlut - these are the variables I am struggling with. For voids the name is obvious, but it could refer to many different things (even pre-pruning), and the remaining variables I think all have something to do with void/zone volumes and volume indices, but I haven't quite figured it out yet. If you have some knowledge of what these variables are or what they do, throwing some comments into the code or comments here on GitHub would be hugely helpful. Or a reference on what the output data structures are supposed to look like - for example, in VF it's super simple: an array of the hole x, y, z, radius, and void_ID values. Obviously in V2 we have a hierarchical relationship between the cells, but I don't quite have a handle on what auxiliary information needs to live alongside that. Even just a layman's description of all these things would be great.

And yeah, we can pull in those other branches too - thanks for consolidating them into that one.

@hbrincon (Contributor) commented Mar 4, 2025

The ones you don't know are the ones I struggle with too, but my understanding of the terms is this:

zlinks[0][i][j] - list of zone IDs j for zones that border the ith zone.
zlinks[1][i][j] - the volume of the largest zone link between zones i and j (a hypothetical sketch of how this might be assembled is at the end of this comment). For some reason, the zone link seems to be defined in terms of the volume of the smaller of the two adjacent cells on the zone boundary, rather than the larger of them:

nl = np.amin([vol[i],vol[n]])

zlut - zone IDs in descending order of maximum linking volume
voids[i] - a list of which zone IDs are in void i, pre-pruning. Based on

voids = [[c for q in v for c in q] for v in self.prevoids.voids]

I'm guessing that the voids in voids are equivalent to the output of the zobov method, i.e. one void for every possible void in the void hierarchy as defined in section 2.3 of https://arxiv.org/abs/0712.3049
mvols - (probably short for maximum volumes) the maximum-volume cell in each zone, which I'm guessing based on section 2.4 of https://arxiv.org/abs/0712.3049
ovols[i][j] - (probably short for overflow volumes) the maximum linking volume that links zone i to each zone j that has a larger maximum cell volume than i, which I'm guessing based on

if j > 0 and vl[j] < minvol:

and section 3 paragraph 2 of https://arxiv.org/abs/1406.1191. The idea of overflow/void depth is discussed in section 2.3 of https://arxiv.org/abs/0712.3049 again
mvlut - lookup table used in the construction of mvols, not really sure beyond that
ovlut - lookup table used in the construction of ovols, not really sure beyond that
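To make the zlinks description concrete, here is a hypothetical sketch of how the structure could be assembled from those definitions (this is my reading, not the actual zobov.py implementation; every name except vol is made up):

```python
import numpy as np

def build_zlinks(cell_zone, cell_neighbors, vol, nzones):
    """cell_zone[c]      -> zone ID of Voronoi cell c
    cell_neighbors[c] -> cells adjacent to c (Delaunay neighbors)
    vol[c]            -> Voronoi volume of cell c
    """
    zone_ids = [[] for _ in range(nzones)]   # plays the role of zlinks[0]
    link_vols = [[] for _ in range(nzones)]  # plays the role of zlinks[1]
    best = {}  # (zi, zj) with zi < zj -> largest linking volume seen
    for c, nbrs in enumerate(cell_neighbors):
        for n in nbrs:
            zi, zj = cell_zone[c], cell_zone[n]
            if zi == zj:
                continue  # not a zone boundary
            # Link volume is the smaller of the two boundary cells,
            # as in nl = np.amin([vol[i], vol[n]]) in the V2 code.
            nl = min(vol[c], vol[n])
            key = (min(zi, zj), max(zi, zj))
            best[key] = max(best.get(key, 0.0), nl)
    for (zi, zj), v in best.items():
        zone_ids[zi].append(zj)
        link_vols[zi].append(v)
        zone_ids[zj].append(zi)
        link_vols[zj].append(v)
    return zone_ids, link_vols
```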

@hbrincon (Contributor) commented Mar 4, 2025

The output data structures are a series of up to 5 FITS HDUs that are calculated in saveVoids(), saveZones(), and preViz(). I go over them in full in the appendix of https://arxiv.org/abs/2411.00148; here they are:
[Three images: tables listing the columns of the output HDUs, reproduced from the appendix of https://arxiv.org/abs/2411.00148]
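For anyone poking at the files, a minimal sketch of inspecting that output with astropy (the filename is a placeholder, and this assumes the extensions are binary tables as described in the appendix):

```python
from astropy.io import fits

with fits.open("V2_output.fits") as hdul:  # placeholder filename
    hdul.info()  # lists the (up to 5) HDUs saved by V2
    for hdu in hdul[1:]:
        # Assumes table HDUs: print each extension's name and columns.
        print(hdu.name, hdu.columns.names)
```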
