Avoiding reallocations when doing a cartesian product #754

lovesegfault · 2020-04-21T03:56:04Z

I have some code that looks like this:

use rayon::prelude::*;

#[derive(Debug)]
struct BigStruct {
    a: u128,
    b: u128,
    c: u128,
    d: u128,
    e: u128
}

impl BigStruct {
    fn new(x: u64, y: u64) -> Self {
        BigStruct {
            a: x.into(),
            b: x.into(),
            c: x.into(),
            d: y.into(),
            e: y.into(),
        }
    }
}



fn main() {
    const X: u64 = 5;
    const Y: u64 = 5;
    let pairs: Vec<BigStruct> = (0..X)
        .into_par_iter()
        .flat_map(|x| (0..Y).into_par_iter().map(move |y| (x, y)))
        .map(|(x, y)| BigStruct::new(x, y))
        .collect();
    
    pairs.iter().for_each(|b| println!("{:?}", b));
}

When I went to profile it, with a very large X and Y I found that most of my execution time was spent on malloc. The natural next step, then, was to try and preallocate the Vec to have the correct size, as this at least means no calls to realloc. I learned about collect_into_vec() and decided to try and use it:

playground

use rayon::prelude::*;

#[derive(Debug)]
struct BigStruct {
    a: u128,
    b: u128,
    c: u128,
    d: u128,
    e: u128,
}

impl BigStruct {
    fn new(x: u64, y: u64) -> Self {
        BigStruct {
            a: x.into(),
            b: x.into(),
            c: x.into(),
            d: y.into(),
            e: y.into(),
        }
    }
}

fn main() {
    const X: u64 = 5;
    const Y: u64 = 5;
    let mut pairs: Vec<BigStruct> = Vec::with_capacity((X * Y) as usize);
    (0..X)
        .into_par_iter()
        .flat_map(|x| (0..Y).into_par_iter().map(move |y| (x, y)))
        .map(|(x, y)| BigStruct::new(x, y))
        .collect_into_vec(&mut pairs);

    pairs.iter().for_each(|b| println!("{:?}", b));
}

This, however, fails due to there not being collect_into_vec() for ParallelIterator, and the call to flat_map() I think converts the IndexedParallelIterator into a plain ParallelIterator.

error[E0599]: no method named `collect_into_vec` found for struct `rayon::iter::map::Map<rayon::iter::flat_map::FlatMap<rayon::range::Iter<u64>, [closure@src/main.rs:32:19: 32:66]>, [closure@src/main.rs:33:14: 33:43]>` in the current scope
  --> src/main.rs:34:10
   |
34 |         .collect_into_vec(&mut pairs);
   |          ^^^^^^^^^^^^^^^^ method not found in `rayon::iter::map::Map<rayon::iter::flat_map::FlatMap<rayon::range::Iter<u64>, [closure@src/main.rs:32:19: 32:66]>, [closure@src/main.rs:33:14: 33:43]>`
   |
   = note: the method `collect_into_vec` exists but the following trait bounds were not satisfied:
           `rayon::iter::map::Map<rayon::iter::flat_map::FlatMap<rayon::range::Iter<u64>, [closure@src/main.rs:32:19: 32:66]>, [closure@src/main.rs:33:14: 33:43]> : rayon::iter::IndexedParallelIterator`

I'm kind of at a loss with how to correctly create an iterator over the cartesian product of two ranges and then collect it into a preallocated vector. The only solution I can see is to implement cartesian() for IndexedParallelIterator so that I can retain the indices instead of losing them in the flat_map(). This, of course, would have to be done inside rayon, but I am hoping there are other solutions.

What is the right way to get an IndexedParallelIterator that's the cartesian product of two IndexedParallelIterators?

The text was updated successfully, but these errors were encountered:

cuviper · 2020-04-21T23:52:04Z

the call to flat_map() I think converts the IndexedParallelIterator into a plain ParallelIterator.

Yes, since flat_map only has to return some IntoParallelIterator type, it doesn't have to be indexed, so we don't have any idea how to offset each parallel flat_map invocation. Actually, even if we did have a variant that required the inner part to be IndexedParallelIterator, we still wouldn't know each of their lengths ahead of time -- there's nothing to say they even have to be the same!

The only solution I can see is to implement cartesian() for IndexedParallelIterator so that I can retain the indices instead of losing them in the flat_map(). This, of course, would have to be done inside rayon, but I am hoping there are other solutions.

I think that's possible, and it doesn't have to be part of rayon if you do it as your own extension trait. We made iter::plumbing public exactly to allow third party iterators. It won't be trivial to handle splits at possible odd boundaries, but not insurmountable.

If we do add something like this to rayon, it would be nice to model it after Itertools::cartesian_product for feature parity.

PS - I hope you have more than simple initialization to do in parallel -- I expect once we solve pre-allocation, your example would quickly run into a barrier of memory bandwidth when you try to scale up. It might not be much better than serial code.

cuviper changed the title ~~Avoinding reallocations when doing a cartesian product~~ Avoiding reallocations when doing a cartesian product Jun 4, 2024

nascheinkman mentioned this issue Jul 20, 2024

Add cartesian_product to ParallelIterator #1182

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoiding reallocations when doing a cartesian product #754

Avoiding reallocations when doing a cartesian product #754

lovesegfault commented Apr 21, 2020

cuviper commented Apr 21, 2020

Avoiding reallocations when doing a cartesian product #754

Avoiding reallocations when doing a cartesian product #754

Comments

lovesegfault commented Apr 21, 2020

cuviper commented Apr 21, 2020