
I'm fairly new to Rust, and I'm writing a project to get my hands dirty with the language and really figure it out. I wanted to write a bioinformatics tool that uses a multiple sequence alignment file to calculate a pairwise distance metric between the sequences. My entire repository is here for reference: https://github.com/theabhirath/pairsnp-rs

But I'm running into an issue when I profile this code: compared to equivalent C++ code parallelized with MPI, one of my functions is very, very slow. Here is the function:

fn calculate_pairwise_snp_distances(
    a_snps: &[RoaringBitmap],
    c_snps: &[RoaringBitmap],
    g_snps: &[RoaringBitmap],
    t_snps: &[RoaringBitmap],
    nseqs: usize,
    seq_length: u64,
) -> Vec<Vec<u64>> {
    (0..nseqs)
        .into_par_iter()
        .map(|i| {
            (i + 1..nseqs)
                .into_par_iter()
                .map(|j| {
                    let mut res = &a_snps[i] & &a_snps[j];
                    res |= &c_snps[i] & &c_snps[j];
                    res |= &g_snps[i] & &g_snps[j];
                    res |= &t_snps[i] & &t_snps[j];
                    seq_length - res.len()
                })
                .collect()
        })
        .collect()
}

I'm using roaring-rs for fast bitmaps (Roaring bitmaps) and rayon for parallelization via into_par_iter, but profiling shows that most of the time in this function is spent waiting and extending the result vectors. Is there a more efficient way to write this sort of parallel code in Rust? Any help optimizing the performance of this function would be appreciated!

  • How do you run the Rust code? In particular, did you pass the --release option to cargo? Commented May 18 at 5:06
  • Using a parallel inner loop is usually counterproductive. I'm not sure how rayon and MPI behave in that case, but I would advise you to use a regular iterator for the inner loop. Commented May 18 at 5:09
  • My initial instinct is that your code includes a lot of bounds checks that weren't in the C++ version, just by the nature of [i]. Commented May 18 at 5:10
  • I agree with Jmb: the inner parallel iterator will typically cause nesting issues, resulting in threading overhead such as expensive fork-join synchronisation. For the outer iterator, you certainly have some work imbalance, but this is fine if rayon uses a dynamic schedule (though it may not). You can balance the work yourself to improve performance (up to 2x), but this makes the code more complex. Commented May 18 at 11:51
  • @JérômeRichard All of those are excellent suggestions, thank you so much! I will look into allocations in particular to see if that's an issue. Rayon's balancing might also be an issue – I will check the flame graph to see if I'm getting stalls there. Commented May 18 at 23:53
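The restructure suggested in the comments (parallel outer loop only, sequential inner loop) can be sketched as below. This is a minimal stand-alone illustration, not the real implementation: plain u64 words stand in for RoaringBitmap so the sketch compiles with the standard library alone, and the outer loop is written as a plain iterator where the real crate would use rayon's .into_par_iter().

```rust
// Sketch: same triangular pairwise-distance computation as the question,
// but with ONLY the outer loop intended to be parallel. u64 bitmasks are
// a stand-in for RoaringBitmap: bit k of each mask marks "this base is
// present at alignment position k".
fn pairwise_snp_distances(
    a: &[u64],
    c: &[u64],
    g: &[u64],
    t: &[u64],
    nseqs: usize,
    seq_length: u64,
) -> Vec<Vec<u64>> {
    (0..nseqs)
        // In the real crate: .into_par_iter() here, and only here.
        .map(|i| {
            // Sequential inner loop: no nested fork-join overhead, and no
            // intermediate bitmap allocations per pair.
            (i + 1..nseqs)
                .map(|j| {
                    // Positions where both sequences carry the same base;
                    // everything else counts as a SNP.
                    let same = (a[i] & a[j])
                        | (c[i] & c[j])
                        | (g[i] & g[j])
                        | (t[i] & t[j]);
                    seq_length - u64::from(same.count_ones())
                })
                .collect()
        })
        .collect()
}

fn main() {
    // Two 4-base sequences: ACGT vs ACGA (one SNP, at position 3).
    let a = [0b0001, 0b1001];
    let c = [0b0010, 0b0010];
    let g = [0b0100, 0b0100];
    let t = [0b1000, 0b0000];
    let d = pairwise_snp_distances(&a, &c, &g, &t, 2, 4);
    println!("{:?}", d); // [[1], []]
}
```

With roaring-rs itself, the same shape applies; the inner closure can also avoid building a result bitmap per pair by summing the intersection sizes of the four per-base bitmap pairs instead of materializing their union.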

