Why does Quicksort have deep function call stack with many duplicates in input?

Question

I'm reading Elements of Programming Interviews, and on page 43, there's the following line/quote:

Implemented naively, quicksort has large run times and deep function call stacks on arrays with many duplicates because the subarrays may differ greatly in size.

One solution is to reorder the array so that all emements less than the pivot appear first, followed by elements equal to the pivot, followed by elements greater than the pivot.

This is known as Dutch National Flag partitioning.

I'm having a lot of trouble visualising the problem described in the first line, and so I don't understand how the Dutch National Flag partitioning scheme helps. I was unable to find a webpage that explains this clearly. Can I please get some help explaining this visually?

Ideally, a visualisation of the call stack on a small example (even in ASCII) would be super helpful. — algon00b
– algon00b, Commented Mar 1, 2021 at 13:30
This is also called "fat pivot", where the pivot is not a single element, but a range of identical elements according to the comparator. That "fat pivot" is the white stripe in the Dutch flag. Because the pivot is already in place after partitioning, it is excluded from the recursive sorts on the left and right arrays. The more elements you can exclude, the smaller your left and right arrays. The smaller these arrays, the shallower your call stack. (But if all elements are different, your pivot will always be one element wide.) — M Oehm
– M Oehm, Commented Mar 1, 2021 at 14:21
@M Oehm Thank you for the comment - that is an intuitive explanation! — algon00b
– algon00b, Commented Mar 2, 2021 at 4:02
@rcgldr If Lomuto's scheme was a clever hack that was more complex but efficient in some cases, I'd agree, but as Wikipedia puts it "As this scheme is more compact and easy to understand, it is frequently used in introductory material, although it is less efficient than Hoare's original scheme". The "naive implementation" was almost certainly meant to imply "the simpler version that is often taught in introductory material". I think your insights into Hoare's scheme are really interesting, but you need to approach it from the position that not everybody knows about that scheme. — IMSoP
– IMSoP, Commented Mar 2, 2021 at 19:35
@IMSoP - and not everyone knows about the Lomuto scheme. Depending on a person's background, a Hoare type scheme may seem more obvious: scan from the left for a element > pivot, scan from the right for an element < pivot, swap the elements, continue the scans until the indexes meet or cross. These scans would need a check for scanning past the bounds of a partition. I would call this approach "naive". Hoare optimized this by changing the compares to >= and <=, eliminating the need to check for either index going past the bounds of a partition (as long as the last element not used for pivot). — rcgldr
– rcgldr, Commented Mar 2, 2021 at 19:43

IMSoP · Accepted Answer · 2021-03-04 12:21:40Z

4

The quick sort algorithm requires as one of its steps some algorithm to partition the data, based on a selected pivot. The ideal scenario is that each partition is as small as possible, so that it requires fewer further divisions. If more than half the elements end up in one partition, more steps will be needed to conclude the algorithm.

As an illustrative example of why duplicates cause this asymmetry, consider sorting the following list, which contains six elements out of eight with the same value:

2, 2, 1, 2, 2, 2, 2, 3

If you were asked to partition this into two lists, you might well put all elements less than the pivot into one partition, and all those greater than or equal to it in another.

This is for instance how the common "Lomuto's algorithm" works, with the pivot itself excluded from both partitions. This algorithm is often considered relatively simple to understand and implement, so may be what the author had in mind with the phrase "naive implementation".

In this scheme, the first step might partition the list as follows:

Pivot: 2
Less than pivot: 1
More than or equal to pivot: 2, 2, 2, 2, 2, 3

The second partition is then recursively partitioned:

Pivot: 2
Less than pivot: [empty list]
More than or equal to pivot: 2, 2, 2, 2, 3

Step 3:

Pivot: 2
Less than pivot: [empty list]
More than or equal to pivot: 2, 2, 2, 3

Step 4:

Pivot: 2
Less than pivot: [empty list]
More than or equal to pivot: 2, 2, 3

Step 5:

Pivot: 2
Less than pivot: [empty list]
More than or equal to pivot: 2, 3

Step 6:

Pivot: 2
Less than pivot: [empty list]
More than or equal to pivot: 3

Here recursion can finally stop. This is much worse than the 3 steps we would expect if the partitions were always of equal size (partitions of 4, 2, and then 1). The choice of pivot doesn't matter, because even if we found the correct position for the 3 sooner, we would still need one step for each 2 in the list.

A "fat pivot" or "Dutch flag" partitioning scheme extends the above by separating out all the values equal to the pivot into a third partition. Rather than just balancing the partitions, this has the result that both partitions are smaller.

In our example, the result immediately looks like this:

Pivot: 2
Equal to pivot: 2, 2, 2, 2, 2
Less than pivot: 1
More than pivot: 3

Since the values in the new partition are all equal, it doesn't need further sorting. In our example, the remaining partitions have one element each, so there is no need to perform any recursion at all. For less extreme cases, the two partitions left to sort will both be smaller, so need fewer further steps to sort.

However, the cost of that one partitioning step will be higher due to the greater complexity of the partitioning algorithm.

Other partitioning schemes have two partitions, but allow values equal to the pivot to be in either partition. This means the partitions can be evenly sized at each step, even with duplicate values (although it doesn't guarantee that they will be). The original algorithm proposed by Tony Hoare when inventing Quick Sort has this property.

In this case, the first step might give a result like this:

Left partition: 2, 2, 1, 2
Right partition: 2, 2, 2, 3

This is less efficient than a "fat pivot" for our extreme example, but much more efficient than having one very large and one very small partition.

edited Mar 4, 2021 at 12:21

answered Mar 1, 2021 at 14:29

IMSoP

99.7k18 gold badges135 silver badges183 bronze badges

Sign up to request clarification or add additional context in comments.

10 Comments

Surt Over a year ago

Hmm, try reading your first sentence again please.

algon00b Over a year ago

Really appreciate the time taken to write the steps out!

Surt Over a year ago

You write that partitions should be "as small as possible", that is not ideal, they should be ideally equal size to avoid O(N^2).

IMSoP Over a year ago

@rcgldr What was obvious to Tony Hoare six decades ago is not necessarily relevant to what would be obvious to a young programmer today. It all depends what information you're given, and what experience you've got of other algorithms. If the task is "partition the list into two parts", because you already know that's part of the Quick Sort algorithm, creating a third partition is definitely not an obvious optimisation. If you've written hundreds of foreach loops, but never a double-ended bubble sort, moving two pointers from outside inwards is not an obvious algorithm.

rcgldr Over a year ago

Moving two pointers from outside inwards is used in shaker sort, a variation of bubble sort, both of which are mostly educational, as insertion sort (1946) is faster. The earliest tape drive, the Uniservo I (1951) could read forwards and backwards. Tape based merge sorts used read backwards to avoid rewind time. Either could also have inspired the idea of scanning from both ends.

|

rcgldr · Accepted Answer · 2021-03-04 04:20:03Z

2

quicksort has large run times and deep function call stacks on arrays with many duplicates

This is only true for Lomuto type partition scheme, a variation of quicksort that first appears to be mentioned in 1984, versus the original Hoare partition scheme published in 1961. Normally Lomuto partition step results in 3 sub-partitions: elements < pivot, pivot (single element), elements >= pivot. In the case where all elements are equal, then a partition step results in 0 elements < pivot, pivot, n-1 elements >= pivot, recursing on the n-1 element partition, only removing one element per level of recursion, the worst case behavior for quicksort with time complexity O(n^2).

For Hoare partition scheme, as the number of duplicates increases, the partitioning will result in closer to equal sized sub-partitions. In the case where all elements are equal, then Hoare partition scheme results in an ideal 50% / 50% split, because the partitioning loop results in the working left (i in the wiki article) and right (j in the wiki article) indexes meeting at the middle. There will be unneeded swaps of equal elements, but the check to avoid swaps generally increases run time more than just doing the swaps for typical cases. In general, as the percentage of duplicates increases, the greater the probability that the indexes will meet near the middle of a partition.

Doing a 3 way partition as suggested in the original question helps when the pivot is one of the duplicate values. It may take quite a few levels of recursion before a pivot is one of the duplicates in a sub-array, but will eliminate the pivot and it's duplicates from further recursion. This will avoid worst case time complexity O(n^2) due to a large number of equal values, but will be slower than Hoare unless there are significant number of equal values.

https://en.wikipedia.org/wiki/Quicksort#Lomuto_partition_scheme

https://en.wikipedia.org/wiki/Quicksort#Hoare_partition_scheme

https://en.wikipedia.org/wiki/Quicksort#Repeated_elements

Although not asked for in the original question, some comments have been made asking to explain Hoare and Lomuto schemes. Both schemes use two indexes, and are optimized to reduce the number of compares (at the cost of more swaps). These explanations are for the Wiki examples for Hoare and Lomuto linked to above, and I added a Dutch National Flag example based on Lomuto.

Hoare - set pivot = middle element of array, array[floor((lo + hi)/2)]. naive implementation for inner loops: scan left to right for an element > pivot (index i), scan right to left for an element < pivot (index j), swap the elements, continue this process until the two indexes meet or cross. The inner loops need to check to make sure an index is not advanced beyond the bounds of the sub-array being partitioned. Hoare optimized this by changing the compares to >= and <=, which eliminates the need for a check for going beyond the bounds of a sub-array, at the cost of sometimes swapping equal elements. After this, elements of array[lo .. j] are <= pivot and elements of array[j+1 .. hi] are >= pivot.

Lomuto - set pivot = array[hi]. i initialized to lo. j current index to find elements < pivot. Main loop: for j = lo until j == hi increment j: if array[j] < pivot, then: swap(array[i], array[j]) and increment i. After each iteration, elements of array[lo .. i-1] < pivot, array[i .. j-1] >= pivot. Note that i == j until the first instance of array[j] >= pivot, and although swapping when i == j is needless, it's faster than checking for i==j to avoid the swap, since i != j after the first instance of array[j] >= pivot. Once the main loop is done, array[i] is swapped with array[hi] to put the pivot into place. After this, elements of array[lo .. i-1] are < pivot, array[i] == pivot, elements of array[i+1 .. hi] are >= pivot.

Lomuto based Dutch National flag scheme - After Lomuto main partition loop is done, repeat the process using similar logic on array[i+1 .. hi], (using j and k), to end up with array[lo .. i-1] < pivot, array[i .. j] == pivot, array[j+1 .. hi] > pivot.

edited Mar 4, 2021 at 4:20

answered Mar 1, 2021 at 15:41

rcgldr

29.4k3 gold badges41 silver badges70 bronze badges

6 Comments

IMSoP Over a year ago

Thanks 👍 Obviously, "easy" is subjective and can vary between people, but I'm not alone in considering Lomuto's as "simpler"; see e.g. the quote at the top of cs.stackexchange.com/a/11550/53313 You are of course right that it has a second pointer; the way I visualise it, it iterates through the list in turn, moving elements <pivot to the "first section" of the list; the second pointer just tracks where that first section ends, and ultimately where the pivot will go. Hoare's algorithm has the pointers move independently, in nested loops, with more chances for "fencepost" errors.

IMSoP Over a year ago

Regarding "first" and "second" pointers, it may have been a poor choice of words: I was saying "second" as in "the one that's not the loop counter", not as in "the one that is introduced second in the code".

rcgldr Over a year ago

@IMSoP - My fault, I went off topic. My issue with the question is the premise "naive implementation ... many duplicates". The author is either stating a "naive implementation" is Lomuto, or the author wasn't aware this isn't an issue for Hoare. Naive is different than simpler or elegant or fewer lines of code. Typically, a naive implementation of any algorithm involves more lines of code to cover corner cases, as opposed to a clever implementation that is simpler and/or involves fewer lines of code. What Hoare and Lomuto ended up with are what I consider clever implementations.

IMSoP Over a year ago

One thing to consider is that if you knew neither partitioning scheme, and tried to invent your own, you might well aim to put all the elements equal to the pivot into one of the two partitions, rather than allocating them evenly to both. So a truly naive implementation might well suffer this problem.

rcgldr Over a year ago

@IMSoP - In the case of Hoare, allowing the pivot and elements equal to the pivot to end up anywhere, including all of them in one partition, is part of the "clever" implementation to eliminate having to bounds check indexes. Part of this "cleverness" was determining that any swap involving the pivot will place it where it needs to be in order to eliminate bounds checks. A truly naive implementation might be Ducth National Flag like, where the pivot and all elements equal to the pivot end up adjacent to each other.

|

Collectives™ on Stack Overflow

Why does Quicksort have deep function call stack with many duplicates in input?

2 Answers 2

10 Comments

6 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

10 Comments

6 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related