
I'm fairly new to Rust, and I'm writing a project to get my hands dirty with the language and really figure it out. I wanted to write a bioinformatics tool that uses a multiple sequence alignment file to calculate a pairwise distance metric between the sequences. My entire repository is here for reference: https://github.com/theabhirath/pairsnp-rs

But I'm running into an issue when I profile this code: compared to equivalent C++ code parallelized with MPI, one of my functions is very, very slow. Here is the function:

fn calculate_pairwise_snp_distances(
    a_snps: &[RoaringBitmap],
    c_snps: &[RoaringBitmap],
    g_snps: &[RoaringBitmap],
    t_snps: &[RoaringBitmap],
    nseqs: usize,
    seq_length: u64,
) -> Vec<Vec<u64>> {
    (0..nseqs)
        .into_par_iter()
        .map(|i| {
            (i + 1..nseqs)
                .into_par_iter()
                .map(|j| {
                    let mut res = &a_snps[i] & &a_snps[j];
                    res |= &c_snps[i] & &c_snps[j];
                    res |= &g_snps[i] & &g_snps[j];
                    res |= &t_snps[i] & &t_snps[j];
                    seq_length - res.len()
                })
                .collect()
        })
        .collect()
}

I'm using roaring-rs for fast bitmaps (Roaring bitmaps) and rayon for parallelization via into_par_iter, but profiling shows that most of the time in this function is spent waiting and extending the result vectors. Is there a more efficient way to write this sort of parallel code in Rust? Any help optimizing the performance of this function would be appreciated!

  • How do you run the Rust code? In particular, did you pass the --release option to cargo? Commented May 18 at 5:06
  • Using a parallel inner loop is usually counterproductive. I'm not sure how rayon and MPI behave in that case, but I would advise you to use a regular iterator for the inner loop. Commented May 18 at 5:09
  • My initial instinct is that your code includes a lot of bounds checks that weren't in the C++ version, just by the nature of [i]. Commented May 18 at 5:10
  • I agree with Jmb: the inner parallel iterator will typically cause nesting issues, resulting in threading overhead such as expensive fork-join synchronisation. For the outer iterator, you certainly have some work imbalance, but this is fine if rayon uses a dynamic schedule (though it may not). You can balance the work yourself to improve performance (up to 2x), but this makes the code more complex. Commented May 18 at 11:51
  • @JérômeRichard All of those are excellent suggestions, thank you so much! I will look into allocations in particular to see if that's an issue. Rayon's balancing might also be an issue – I will check the flame graph to see if I'm getting stalls there. Commented May 18 at 23:53
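The restructure suggested in the comments (parallel outer loop only, sequential inner loop) can be sketched as below. This is a minimal stand-alone illustration, not the real implementation: plain u64 words stand in for RoaringBitmap so the sketch compiles with the standard library alone, and the outer loop is written as a plain iterator where the real crate would use rayon's .into_par_iter().

```rust
// Sketch: same triangular pairwise-distance computation as the question,
// but with ONLY the outer loop intended to be parallel. u64 bitmasks are
// a stand-in for RoaringBitmap: bit k of each mask marks "this base is
// present at alignment position k".
fn pairwise_snp_distances(
    a: &[u64],
    c: &[u64],
    g: &[u64],
    t: &[u64],
    nseqs: usize,
    seq_length: u64,
) -> Vec<Vec<u64>> {
    (0..nseqs)
        // In the real crate: .into_par_iter() here, and only here.
        .map(|i| {
            // Sequential inner loop: no nested fork-join overhead, and no
            // intermediate bitmap allocations per pair.
            (i + 1..nseqs)
                .map(|j| {
                    // Positions where both sequences carry the same base;
                    // everything else counts as a SNP.
                    let same = (a[i] & a[j])
                        | (c[i] & c[j])
                        | (g[i] & g[j])
                        | (t[i] & t[j]);
                    seq_length - u64::from(same.count_ones())
                })
                .collect()
        })
        .collect()
}

fn main() {
    // Two 4-base sequences: ACGT vs ACGA (one SNP, at position 3).
    let a = [0b0001, 0b1001];
    let c = [0b0010, 0b0010];
    let g = [0b0100, 0b0100];
    let t = [0b1000, 0b0000];
    let d = pairwise_snp_distances(&a, &c, &g, &t, 2, 4);
    println!("{:?}", d); // [[1], []]
}
```

With roaring-rs itself, the same shape applies; the inner closure can also avoid building a result bitmap per pair by summing the intersection sizes of the four per-base bitmap pairs instead of materializing their union.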

