next up previous contents
Next: The data structure of Up: Heap implementations and variations Previous: Performance of D-heaps

The External Heapsort

The memory model assumed for this discussion is a paged memory. The N elements to be sorted are stored in a memory of M pages, with room for S elements per page. Thus N=MS.

The heapsort algorithms seen so far are not very well suited for external sorting. The expected number of page accesses for the basic heapsort in a memory constrained environment (e. g. 3 pages of memory) is tex2html_wrap_inline1347 where N is the number of records and M is the number of disk blocks. We would like it to be tex2html_wrap_inline1353 which typically is orders of magnitude faster. (example: 8 byte element, 4KB page, translates into 512 elements pr page).

NOTE: Be aware that the words 'page access' does not necessarily imply that disk accesses are controlled by the virtual memory subsystem. It may be completely under program control.

One may wonder if the d-heaps described in chapter 5 can be used effeciently as an external sorting method. If we consider the same situation as above, it is seen that if we use the number of elements per page S as the fanout, and align the heap on external memory so that all siblings fit in one page, the depth of the tree will drop with a factor tex2html_wrap_inline1357 and thus the number of page accesses will drop accordingly. As observed in the chapter on d-heaps, the cost of a page access should be weighted against the cost of S comparisons needed to find the maximal child. Anyway, because the cost of a disk access is normally much higher than the time it takes to search linearly through one page of data, this idea is definitely an improvement over the traditional heap.

The drawback of the idea is twofold. Firstly it only improves the external cost with a factor tex2html_wrap_inline1357 , not the desired factor S, and secondly it increases the internal cost with a factor tex2html_wrap_inline1365 . Thus it actually makes sorting much slower in the situation where all data fits in memory. Considering that gigabyte main memories are now commonplace and that terabyte memories will eventually be affordable, it is reasonable to require a good internal performance even of external methods.




next up previous contents
Next: The data structure of Up: Heap implementations and variations Previous: Performance of D-heaps

Jesper Bojesen
Sat Apr 3 18:07:59 METDST 1999