You do the computations in Mathematica's arbitrary precision with 32 decimal digits. The problem is that this is software-emulated arithmetic and thus much slower than machine precision. Using machine precision has the nice advantage that one can simply change FoldList + Table to a Do loop and compile it. So I tried:
cf = Compile[{{s, _Real}, {x, _Real}, {mkni, _Real}, {k1i, _Real}, {n1i, _Real}, {iter, _Integer}},
  Module[{r, sum},
   r = s;
   sum = s;
   Do[
    (* multiply the running product by the next ratio factor and accumulate *)
    r = r (x - i) (mkni - i)/((k1i + i) (n1i + i));
    sum += r
    , {i, 0, iter}];
   sum
   ],
  CompilationTarget -> "C",
  RuntimeOptions -> "Speed"
  ];
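Just to spell out what is being summed here (the compiled loop and OP's FoldList construction below produce the same partial-product sum):

$$\text{sum} \;=\; s + \sum_{j=0}^{\text{iter}} s \prod_{i=0}^{j} \frac{(x-i)\,(\text{mkni}-i)}{(\text{k1i}+i)\,(\text{n1i}+i)}.$$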
Here are the results of my experiments:
OP's version:
mkni = 9.577576587094`32*^14;
k1i = 1.2937885137981`32*^13;
n1i = 2.8913878463172`32*^13;
s = 1.5316416966770`32*^-25;
iter = 1249924;
x = 3.90577689449`32.*^11;
AbsoluteTiming[
xvars = Range[x, x - iter, -1];
mknis = Range[mkni, mkni - iter, -1];
k1is = Range[k1i, k1i + iter];
n1is = Range[n1i, n1i + iter];
factors = xvars*mknis/(k1is*n1is);
sList = FoldList[Times, s, factors];
c = Total[sList]
]
{2.2683, 9.9999965762182896702400893*10^-21}
My compiled version:
AbsoluteTiming[
c2 = cf[s, x, mkni, k1i, n1i, iter]
]
{0.004983`, 9.999996576218254`*^-21}
The relative error is pretty low, so I guess that machine precision will do for your application:
Abs[c - c2]/Abs[c]
1.45949*10^-14
More robust implementation
Okay, the floating point analysis in the comments below revealed that we can quickly run into underflow or overflow problems for values of x only slightly smaller or larger than OP's value of x. We can use a 64-bit signed integer to represent the exponent of the binary representation, which should considerably extend the under- and overflow thresholds. First a tiny top-level sketch of this mantissa/exponent splitting (using s as defined above, just for illustration), then my crude compiled implementation of it.
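(* split a positive number into mantissa m and integer exponent e with m*2^e == s *)
e = Round[Log2[s]];    (* -82 for OP's s *)
m = N[s] 2.^-e;        (* ≈ 0.74, comfortably within machine range *)
m 2.^e                 (* recovers 1.5316...*10^-25 *)
Here is my crude implementation of this: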
cf2 = Compile[{{s, _Real}, {x, _Real}, {mkni, _Real}, {k1i, _Real}, {n1i, _Real}, {iter, _Integer}},
  Module[{z, mr, er, msum, esum, factor, der, de, desum, esum10, msum10},
   (* split the start value s into mantissa mr and binary exponent er *)
   er = Round[Log2[s]];
   mr = s (2.^-er);
   {msum, esum} = {mr, er};
   Do[
    factor = (x - i) (mkni - i)/((k1i + i) (n1i + i));
    (* update the running product and renormalize its mantissa/exponent *)
    z = mr factor;
    der = Round[Log2[z]];
    mr = z (2.^-der);
    er = er + der;
    de = er - esum;
    (* Mantissa has 53 bits.
       Adding some further bits of tolerance for safety. *)
    If[-60 < de < 60,
     (
      z = msum + mr (2.^de);
      desum = Round[Log2[z]];
      msum = z (2.^-desum);
      esum += desum;
      )
     ,
     If[de >= 60,
      (* Summand is too big; new sum equals the summand. *)
      msum = mr;
      esum = er;
      ,
      (* de <= -60: summand is too small; discard it. *)
      msum = msum;
      esum = esum;
      ]
     ];
    , {i, 0, iter}];
   (* convert the base-2 exponent into a base-10 mantissa/exponent pair *)
   esum10 = esum Log[2.]/Log[10.];
   msum10 = msum (10.^(esum10 - Round[esum10]));
   {msum10, Round[esum10]}
   ],
  CompilationTarget -> "C",
  RuntimeOptions -> "Speed"
  ];
Here is a usage example:
AbsoluteTiming[{mc2, ec2} = cf2[s, x, mkni, k1i, n1i, iter]]
{0.052009, {1.70182, -25.}}
Read this as: the result is mc2 * 10.^ec2. It is quite a bit slower than cf, but the main reason is the use of Log2 and 2.^#&, which are quite costly functions. I use them to obtain the mantissa and the exponent of the binary representation. In C++ I could use std::frexp to access those bits directly, which is substantially less expensive. But to my knowledge, Mathematica and Compile do not provide any interface to that. =/
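If you need the value as a single number again, you can recombine the two parts outside of the compiled function; lifting the mantissa to an arbitrary-precision number first keeps extreme exponents from under- or overflowing a machine real (a small suggestion of mine, not part of the timings above):
(* recombine: arbitrary-precision mantissa times an exact power of ten *)
SetPrecision[mc2, 16] 10^Round[ec2]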