Date: 21 Jan 2005
From: jyrki@diku.dk
Subject: PE-lab: Annual Report 2004

Performance Engineering Laboratory (PE-lab)
http://www.diku.dk/~jyrki/PE-lab/

* Annual Report 2004

The following people have been involved in PE-lab's activities in 2004:

Hervé Brönnimann (External collaborator, Polytechnic University)
Christopher Derek Curry (M.Sc. student)
Amr Elmasry (External collaborator, Alexandria University)
Andrei Bjarke Moskvitin Josephsen (M.Sc. student)
Claus Jensen (M.Sc. student)
Jyrki Katajainen (Group leader)
Tomi Armas Pasanen (External collaborator, University of Helsinki)
Fabio Vitale (External M.Sc. student, University of Insubria)

Most of the work done in the group is on empirical or theoretical algorithmics. The projects mentioned below were completed or started in 2004.

* Scientosocial service [Jyrki]

Scandinavian Algorithm Week was organized in collaboration with three Danish universities, the University of Copenhagen being the main organizer. The 9th Scandinavian Workshop on Algorithm Theory and its satellite events attracted more than 120 algorithmists and computer scientists to Denmark, and about 80 talks were given during the six days. A longer account of the events can be found at

http://www.diku.dk/~jyrki/SWAT/press-release.html
http://www.diku.dk/~jyrki/PE-lab/News/All-news/2004.09.10-13%3A13%3A57.html

I would like to use this opportunity to thank the Department of Computing for the support we received.

* The cost of iterator validity [Jyrki]

Iterators are objects that point to other objects. An iterator can be used to iterate over a collection of elements stored in a data structure, or simply to access the element pointed to. An iterator and the element pointed to live in a close symbiosis: when the element is moved, the iterator may become invalid if it is not updated accordingly.
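The phenomenon can be illustrated with the two most familiar C++ containers (a minimal sketch, not code from our study): std::list keeps iterators to existing elements valid under insertion, whereas std::vector may invalidate all iterators and pointers into it when it reallocates.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// std::list guarantees that iterators to existing elements stay valid
// when other elements are inserted or removed.
bool list_iterator_survives_insertion() {
    std::list<int> xs{1, 2, 3};
    std::list<int>::iterator it = std::next(xs.begin());  // points to 2
    xs.push_front(0);   // modify the list at both ends...
    xs.push_back(4);
    return *it == 2;    // ...the iterator is still valid
}

// With std::vector, growing past capacity() moves the elements to a
// new buffer, invalidating every iterator and pointer into the old one.
bool vector_buffer_moves_on_growth() {
    std::vector<int> xs{1, 2, 3};
    const int* old_data = xs.data();
    xs.reserve(xs.capacity() + 1);   // force a reallocation
    return xs.data() != old_data;    // old iterators/pointers are now invalid
}
```

Here the list iterator survives because each list node has a stable address; the vector moves its elements wholesale, which is exactly the kind of element move that breaks the symbiosis described above.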
A data structure is said to provide iterator validity if the iterators to its elements are kept valid at all times, independent of element moves. As to the cost of keeping iterators valid, it can be shown that for all fundamental data structures (linked lists, resizable arrays, unordered dictionaries, ordered dictionaries, and priority queues) an implementation exists that supports bidirectional iterators, keeps the iterators valid under modifications, executes all iterator operations in constant time in the worst case, and uses space linear in the number of elements stored. For random-access iterators the problem appears to be more difficult. In particular, for vectors the general modification operations insert() and erase(), which the C++ standard requires, are too strong. Efficient solutions under these strong modification operations can be provided too, though most of them are mainly of theoretical interest.

* An experimental evaluation of navigation piles [Claus, Jyrki]

A navigation pile is a priority-queue structure proposed recently by Katajainen and Vitale. It has many nice theoretical features; e.g. in many cases it can be used as a substitute for a binomial queue even though it is only a binary tree. We have evaluated the practical efficiency of navigation piles and made several interesting findings, two of which are mentioned below:

1) Pointer-based navigation piles have difficulties in matching the performance of classical heaps, which are implicit data structures. However, the difference is not that big, and when the elements being manipulated are large, piles are clearly faster than heaps.

2) Compact navigation piles that rely on bit packing can be hopelessly slow on contemporary computers. This led us to conclude that most packed data structures and algorithms relying on packing have very little practical relevance.
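To see where the slowdown of packed structures comes from, consider the generic bit-packing technique itself (a hypothetical sketch, not our navigation-pile code): storing b-bit fields consecutively in an array of words means every single access pays for shifts, masks, and possibly two word reads, overhead that plain unpacked storage avoids entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A generic packed array of n fields, each 'bits' wide (1..64),
// stored back to back in 64-bit words. Fields may straddle a
// word boundary, in which case two words must be touched.
struct PackedArray {
    std::vector<std::uint64_t> words;
    unsigned bits;

    PackedArray(std::size_t n, unsigned b)
        : words((n * b + 63) / 64, 0), bits(b) {}

    std::uint64_t get(std::size_t i) const {
        std::size_t bit = i * bits, w = bit / 64, off = bit % 64;
        std::uint64_t mask = (bits == 64) ? ~0ull : ((1ull << bits) - 1);
        std::uint64_t v = words[w] >> off;           // shift + (below) mask
        if (off + bits > 64)                         // crosses a word boundary:
            v |= words[w + 1] << (64 - off);         // second memory access
        return v & mask;
    }

    void set(std::size_t i, std::uint64_t v) {
        std::size_t bit = i * bits, w = bit / 64, off = bit % 64;
        std::uint64_t mask = (bits == 64) ? ~0ull : ((1ull << bits) - 1);
        v &= mask;
        words[w] = (words[w] & ~(mask << off)) | (v << off);
        if (off + bits > 64) {                       // spill into the next word
            std::uint64_t hi_mask = mask >> (64 - off);
            words[w + 1] = (words[w + 1] & ~hi_mask) | (v >> (64 - off));
        }
    }
};
```

Every get() and set() costs several ALU operations and up to two word accesses where an unpacked array needs a single load or store; this constant-factor penalty on every access is consistent with the poor practical performance observed for compact navigation piles.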
* The comparison complexity of priority-queue operations [Amr, Claus, Jyrki]

A priority queue is an abstract data type that should be provided as a standard library component. In spite of its practical importance, it has long been an open problem how to achieve the best possible bounds --- taking constant factors into account --- for the various priority-queue operations. We have been able to devise a realization of a priority queue which has optimal performance when the operations to be supported are insert(), find-min(), delete-min(), and delete(); and almost optimal performance if, in addition, decrease-key() is to be supported.

* A critical analysis of randomized data structures [Hervé, Tomi, Jyrki]

The main virtue of randomized algorithms is that they often offer good average-case performance without any assumptions about the elements manipulated. In spite of this, randomized data structures --- in our study, randomized search trees --- are normally analysed under the assumption that all elements are accessed with equal probability. This gives an overly optimistic picture of their performance. According to our theoretical and experimental analysis, the expected worst-case behaviour of randomized search trees is much worse than that of their best deterministic counterparts.

* Building heaps faster [Jyrki]

The construction of a heap is a classic data-structuring problem. For all practical purposes the problem has been solved satisfactorily, but as to its comparison complexity there is still a gap between the best known lower and upper bounds. Several conjectures have been made about the optimal behaviour in the worst case and in the average case. We have tried hard to disprove the stated conjectures, and have been able to make some progress in the area of average-case algorithms.
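For readers unfamiliar with the problem, the baseline against which such bounds are measured is the classic bottom-up construction attributed to Floyd, which builds a heap in place with a linear number of element comparisons. The sketch below (standard textbook code, not an algorithm from our work) builds a binary min-heap in an array.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Restore the min-heap property at position i by repeatedly swapping
// the element with its smaller child (at most two comparisons per level).
void sift_down(std::vector<int>& a, std::size_t i, std::size_t n) {
    while (2 * i + 1 < n) {
        std::size_t c = 2 * i + 1;              // left child
        if (c + 1 < n && a[c + 1] < a[c]) ++c;  // pick the smaller child
        if (a[i] <= a[c]) break;                // heap property holds
        std::swap(a[i], a[c]);
        i = c;
    }
}

// Bottom-up construction: sift down every internal node, last one first.
// The total work is linear in n, since most nodes sit near the leaves.
std::vector<int> build_min_heap(std::vector<int> a) {
    for (std::size_t i = a.size() / 2; i-- > 0; )
        sift_down(a, i, a.size());
    return a;
}
```

The open questions mentioned above concern the exact constant factor in the comparison count of this problem, not its asymptotic (linear) complexity.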