Sorting programs executing fewer branches
Author:Jyrki Katajainen
Publication:CPH STL report 2014-1, Department of Computer Science, University of Copenhagen (2014), 51 pp.
Full text:<pdf.gif>PDF (506 kB)  
Abstract:When sorting a sequence of n elements, it is well-known that every comparison-based sorting algorithm must perform at least n lg nO(n) element comparisons. Hence, it seems unavoidable that any sorting algorithm will also execute about the same number of conditional branches. Since nothing is known about the order of the input elements, it is hard to predict the outcome of the branches related to element comparisons. On an average, every second such branch may lead to a branch misprediction.

In this report, together with an accompanying tar ball, we release the source code of the sorting programs discussed and experimented in the following papers:

  1. Amr Elmasry, Jyrki Katajainen, and Max Stenmark, Branch mispredictions don’t affect mergesort, Proceedings of the 11th International Symposium on Experimental Algorithms, Lecture Notes in Computer Science 7276, Springer-Verlag (2012), 160-171
  2. Amr Elmasry and Jyrki Katajainen, Lean programs, branch mispredictions, and sorting, Proceedings of the 6th International Conference on Fun with Algorithms, Lecture Notes in Computer Science 7288, Springer-Verlag (2012), 119-130

The key idea in the described sorting programs is to decouple element comparisons from conditional branches. Therefore, these programs will incur significantly fewer branch mispredictions than, for example, the C++ standard-library introsort (introspective median-of-three quicksort that switches to heapsort if the recursion depth gets too high).

The sorting programs given can be divided into three categories depending on how many branch mispredictions they incur during their execution.

Lean:
O(1) branch mispredictions incurred when the branch predictor used by the underlying hardware is static.
Moderately optimized:
O(n) branch mispredictions incurred under the same assumptions as above.
Unoptimized:
The number of branch mispredictions incurred is proportional to the number of element comparisons performed.

As shown in [2], all programs can be made lean. However, in practice, a moderately-optimized variant is often faster than a lean variant. This is also true for the variants of heapsort, mergesort, and quicksort studied here.

Related:<html.gif>HTML (Conference paper)  <html.gif>HTML (Conference paper)  <html.gif>HTML (Slides)  <html.gif>HTML (Slides)  <html.gif>HTML (tar archive)  
BibLATEX:
@report{Kat2014S,
  author = {Jyrki Katajainen},
  title = {Sorting programs executing fewer branches},
  type = {CPH STL report},
  number = {2014-1},
  institution = {Department of Computer Science, University of Copenhagen},
  year = {2014},
  pagetotal = {51},
}
This page was generated by Jyrki Katajainen <jyrki@di.ku.dk> on 2018-02-17.