
How can I find the complexity (in terms of big-O) for different Haskell functions?

For example, what is the complexity of subsequences?

  • There's no general way to determine the complexity of a function; you'll have to look at the source code and understand the algorithm behind it. You can always experiment to determine it numerically, but that isn't always straightforward either.
  • Some packages document their complexity, though. For instance, containers is very good about this.
  • Also, a lot of the time the complexity isn't straightforward to document. For instance, the complexity of sort from Data.List is something like O(n*log k), where n is the length of the list and k is the number of elements you examine from the result.
  • As already mentioned, time complexity can be tricky to calculate in a lazy context. Okasaki's book on purely functional data structures covers many of these topics, including amortization. You can also check out his PhD thesis for free if you can't get the book.
  • Much depends on how a function is used. E.g. head $ subsequences [1..n] is O(1) (see the sketch just below).
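
To see that last point concretely, here's a minimal sketch (the input size is ours, not from the comments): subsequences produces the empty subsequence first, so demanding only the head of its result forces a constant amount of work, no matter how long the input is.

import Data.List (subsequences)

-- head forces only the first cons cell of the result, so the
-- remaining 2^n - 1 subsequences are never built.
main :: IO ()
main = print (head (subsequences [1 .. 1000000 :: Int]))  -- prints [] immediately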

1 Answer


You can only calculate the exact complexity of a function by looking at the code. However, you can estimate it using criterion.

For example, the following program estimates the complexity of subsequences as a function of the length of the input list.

module Main where

import Criterion (bench, nf)
import Criterion.Main (defaultMain)
import Data.List (subsequences)
import Control.DeepSeq (deepseq)

-- Benchmark subsequences on inputs of length 0, 2, .., 24.
main :: IO ()
main = defaultMain (map createBenchmark [0, 2 .. 24])
    where
        createBenchmark n =
            let
                xs = replicate n 'x'
            in
                -- Force the input list up front so that constructing it is not
                -- measured as part of the benchmark; nf then forces the full
                -- result so all subsequences are actually built.
                xs `deepseq` (bench (show n) $ nf subsequences xs)

If you compile it (with -O2!) and run it with

./Main -u report

(or

./Main --csv report

in newer versions of criterion), you'll get a CSV file named report with the data (mean time, variance, and other information per benchmark).

If you plot that data, you will see that the time is exponential in n, as the following gnuplot session shows.

> f(x) = a * exp(b * x)
> fit f(x) 'report' using ($0 * 2):2 every ::2 via a,b
...

Final set of parameters            Asymptotic Standard Error
=======================            ==========================

a               = 1.7153e-07       +/- 5.441e-09    (3.172%)
b               = 0.711104         +/- 0.001438     (0.2023%)


correlation matrix of the fit parameters:

               a      b      
a               1.000 
b              -1.000  1.000
> set log y
> set xlabel 'n'
> set ylabel 'mean time [s]'
> plot 'report' using ($0 * 2):2 every ::2 with lp title 'data', f(x) title 'fit'

[Plot: mean time [s] vs. n on a logarithmic y axis, showing the measured data together with the exponential fit]

a is tiny (it's just the constant factor), and b has almost no error. So it's a pretty safe guess that the complexity is O(2^n), because e^0.71 is almost exactly 2.
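
Spelling out that arithmetic (our annotation, based on the fitted values above):

T(n) ≈ a * e^(b*n) = a * (e^b)^n ≈ 1.7e-07 * 2.04^n seconds

since e^0.711104 ≈ 2.04, and a base of roughly 2 is exactly the signature of O(2^n) growth.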

Of course, this technique assumes that you're actually using everything returned by the function. If you're only accessing the first element of the returned list, the complexity will be O(1) because of lazy evaluation.
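
To make that caveat concrete, here's a comparison sketch (our addition, not part of the original answer): criterion's whnf forces the result only to its first constructor, while nf forces it completely, so the two measure wildly different costs for the same call.

import Criterion.Main (bench, defaultMain, nf, whnf)
import Data.List (subsequences)

main :: IO ()
main = defaultMain
    [ bench "first cons only" $ whnf subsequences xs  -- lazy: near-constant work
    , bench "fully forced"    $ nf   subsequences xs  -- builds all 2^20 results
    ]
  where
    xs = replicate 20 'x'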

You can probably find a way to make this program independent of the function to benchmark (at least for functions that just take a list). There are also some nice Haskell libraries for plotting data, so you don't need to rely on external programs (unfortunately, as a scientist I never used anything but gnuplot).
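
For the first point, one possible shape for such a harness (a sketch with a hypothetical name, benchSizes, assuming the benchmarked function consumes a list and produces a result that can be fully forced):

module Main where

import Control.DeepSeq (NFData, deepseq)
import Criterion.Main (Benchmark, bench, defaultMain, nf)
import Data.List (subsequences)

-- Benchmark a list-consuming function at each given input size,
-- forcing the input first so its construction isn't measured.
benchSizes :: NFData b => String -> (String -> b) -> [Int] -> [Benchmark]
benchSizes name f = map mk
  where
    mk n =
        let xs = replicate n 'x'
        in xs `deepseq` bench (name ++ "/" ++ show n) (nf f xs)

main :: IO ()
main = defaultMain (benchSizes "subsequences" subsequences [0, 2 .. 24])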


1 Comment

A log plot would be great here, for the exponential law; and in the case of a power law, a log-log plot.
