Date: 23 Jan 2007
From: jyrki@diku.dk
Subject: PE-lab: Annual Report 2006

Performance Engineering Laboratory (PE-lab)
http://www.diku.dk/~jyrki/PE-lab/

* Annual Report 2006

In 2006, the main focus of our research was to achieve stronger guarantees for generic software components. Most of the work was done under the auspices of the CPH STL project. The following people contributed actively to the research:

Hervé Brönnimann (external collaborator, Polytechnic University)
Filip Bruman (B.Sc. student)
Christopher Derek Curry (industrial partner, SolvIt Consulting)
Amr Elmasry (external collaborator, Alexandria University)
Gianni Franceschini (external collaborator, University of Pisa)
Csaba Gulyás (exchange student, Budapest University of Technology and Economics)
Torben Hagerup (external collaborator, Universität Augsburg)
Claus Jensen (M.Sc. student)
Andrei Bjarke Moskvitin Josephsen (M.Sc. student)
Jyrki Katajainen (lab leader)
Stephan Lynge Herlev Larsen (M.Sc. student)
Pat Morin (external collaborator, Carleton University)
Jacob De Fine Skibsted (M.Sc. student)
Fabio Vitale (external M.Sc. student, University of Insubria)

Most of the research articles written are available online via the CPH STL website http://cphstl.dk, and all completed theses via the PE-lab's home page http://www.diku.dk/~jyrki/PE-lab/. A more detailed description of the projects completed or started in 2006 is given below.

* Comparison complexity of priority-queue operations [Amr, Claus, and Jyrki]

The comparison complexity of priority-queue operations was studied. More precisely, the operations to be supported include find-min(), insert(), decrease(), and delete(). Two different approaches were investigated: one relying on heap-order violations (relaxed heaps) and another relying on structural violations (pruned heaps). In general, the use of heap-order violations gives a better bound for delete() than the use of structural violations.
However, pruned heaps can be modified to mimic relaxed heaps, and thereby the same bounds can be achieved with both approaches.

Research output: 2 articles

* Experimental evaluation of priority queues [Claus, Fabio, and Jyrki]

The practical efficiency of two priority-queue structures was studied: local heaps, which behave like classical binary heaps but use a different memory layout, and navigation piles, which allow a very compact memory representation. Several implementations of both structures were considered, and their efficiency was experimentally evaluated in many different test scenarios by varying the type of input as well as the type of ordering function used.

For local heaps, the research focused on the cache behaviour of various heap data structures. Our results show that some cache-aware variants are competitive with classical binary heaps (when the number of elements stored is not small). In particular, local heaps significantly reduce the number of cache misses while avoiding an increase in the number of element comparisons and element moves performed.

For navigation piles, the main goal was to analyse the efficiency of compact data structures versus classical implicit heaps. Our experiments show that, when element moves are expensive, navigation piles can be an alternative to binary heaps.

Research output: 2 technical reports

* Experimental evaluation of search trees [Hervé and Jyrki]

Several state-of-the-art implementations of red-black trees were presented and evaluated. The techniques yield implementations of C++ standard-compatible set and map associative containers that guarantee logarithmic worst-case performance, provide iterators with strict invalidation rules, and lower the storage requirements to three, sometimes two, pointers per node.
Space efficiency is important on modern platforms, because the efficiency of a data structure depends more and more on small storage requirements, and intricately on several factors not all related to instruction count (such as efficient cache utilization).

Research output: 1 technical report

* Optimally subroutine-oblivious algorithms [Gianni and Jyrki]

The concept of cache obliviousness was introduced in 1999. A cache-oblivious algorithm is expected to perform well without having any knowledge of the parameters of the underlying memory system. A new way of seeing this is to require that the algorithm performs well independently of the cost of element reads and element writes, elements being the objects manipulated. The cost of these primitives can vary because sequential access may be considerably faster than random access.

We generalized the concept of cache obliviousness to subroutine obliviousness, covering any operation whose cost is not known at development time. Such a situation occurs whenever a function takes a function parameter whose type and cost are unspecified (e.g. in generic programming). An optimally subroutine-oblivious algorithm guarantees the best possible running time without knowing the cost of the unspecified operations.

As a specific case study, we presented a subroutine-oblivious algorithm for the 0/1-sorting problem, defined as follows. Given a sequence S of elements drawn from a universe U and a characteristic function f: U -> {0,1}, the task is to rearrange the elements of S so that every element x with f(x) = 0 is placed before any element y with f(y) = 1. Moreover, this reordering should be done stably, without altering the relative order of elements having the same f-value, and space efficiently, using only O(1) words of extra space. Our algorithm is optimally cache oblivious and subroutine oblivious with respect to f (but not with respect to element moves).
Research output: 1 technical report

* Stronger guarantees for standard-library containers [Jyrki]

Several classical sorting and searching problems were studied in a new setup where the focus was on issues like iterator validity, exception safety, space efficiency, comparison complexity, and subroutine obliviousness. The work was both theoretical and empirical.

Research output: several student projects, but no articles

* Space-efficient data structures [Hervé, Jyrki, and Pat]

Consider a data structure D that stores a dynamic collection of elements and uses a linear number of words in addition to the elements stored. In this work, several data-structural transformations are described that can be used to transform D into another data structure D' that supports the same operations as D, has a considerably smaller memory overhead than D, and performs the supported operations slower than D by a small constant factor or a small additive term, depending on the data structure and operation in question. The compaction technique has been successfully applied to linked lists, dictionaries, and priority queues.

Research output: a journal submission

* Organizational design [Christopher and Jyrki]

An international edition of our book "Reengineering a university department" was released. To get your own copy of the book, please check the ordering details on the book's home page: http://www.diku.dk/~jyrki/bpr/.

Research output: 1 book

* Investigations into efficient priority queues [Claus and Fabio]

In this work, both the practical and the theoretical efficiency of known priority-queue structures (multiary heaps, run-relaxed heaps) was studied, and new priority-queue structures (navigation piles, local heaps, multipartite binomial queues, multipartite relaxed binomial queues) were developed.
A brief summary of our results is given below.

Multiary heaps: The practical efficiency of in-place multiary heaps was studied by developing a number of alternative implementations of them. The implementations were evaluated using different types of inputs and ordering functions. The results of the experimental evaluation show that no single heapifying strategy performs best for all types of inputs and ordering functions, but Floyd's bottom-up heapifying strategy performs well for most of them.

Local heaps: The cache behaviour of cache-aware heaps was studied experimentally. Some variants turned out to be competitive with classical binary heaps (when the number of elements stored is not small). When executing a sequence of priority-queue operations, local heaps significantly reduce the number of cache misses while avoiding an increase in the number of element comparisons and element moves performed.

Navigation piles: This new data structure was designed from scratch and analysed both theoretically and experimentally. The practical efficiency of several forms of navigation piles was compared to that of several forms of binary heaps for different types of inputs and ordering functions. The experiments show that, when element moves are expensive, navigation piles can be an alternative to binary heaps.

Run-relaxed heaps: This well-known data structure was thoroughly analysed, and minor improvements were proposed for the realization of its priority-queue operations.

Multipartite binomial queues: A framework for reducing the number of element comparisons performed in priority-queue operations was introduced. The application of this framework gives a priority queue that guarantees a worst-case cost of O(1) per find-min() and insert(), and a worst-case cost of O(log n) with at most log n + O(1) element comparisons per delete-min() and delete().
Furthermore, in addition to the above-mentioned operations, a priority queue that provides decrease() was given. This priority queue achieves a worst-case cost of O(1) per find-min(), insert(), and decrease(), and a worst-case cost of O(log n) with at most log n + O(log log n) element comparisons per delete-min() and delete().

Research output: 2 M.Sc. theses

* Positional sequences with less-than-comparable iterators [Andrei, Jyrki, and Torben]

In this study, which is still in its initial phase, the classical order-maintenance problem is considered. The task is to maintain a linked list that allows new elements to be inserted just before a given element, old elements to be removed, and queries to determine the relative order of two elements currently held in the list. All the key operations are to be carried out in constant time. In particular, we are interested in worst-case bounds rather than amortized bounds. To obtain constant-time bounds, it is essential that the locations of the elements operated on are given. In modern terms, the task is to implement a positional sequence providing bidirectional iterators that can be compared in constant time. The goal of this research is to design a data structure that can be manipulated by simple algorithms whose correctness proofs and running-time analyses are also simple.

Research output: 1 M.Sc. thesis

* Spam filtering [Csaba]

The phenomenon of junk e-mail is a growing problem in many organizations. To prevent employees from spending their valuable time sorting out unwanted e-mail, spam filtering is often necessary. To prevent unwanted e-mail from being distributed further, spam filtering may also be done by Internet service providers. In this work, the history of spam and the methods used in spam filtering are surveyed.
Based on a statistical analysis of the accuracy of existing spam filters, an idea for the construction of a meta spam filter was introduced, with the aim of achieving even higher accuracy and effectiveness in spam filtering.

Research output: 1 M.Sc. thesis

* Distributing usage of bandwidth for on-demand streaming [Jacob and Stephan]

The overall goal of this work was to distribute the usage of bandwidth in a client-server network providing video-on-demand streaming. To achieve this, a protocol for on-demand streaming was designed and implemented, employing methods resembling those used in peer-to-peer networks. In our implementation, the data stream is divided into pieces so that multiple clients can send parts of the data stream to a single receiver while still offering the full functionality of on-demand streaming, such as pause and skip. By letting clients distribute the data, the bandwidth usage of the central server is lowered, but an analysis of the actual savings is still open, as this would demand large-scale real-life usage of the protocol together with empirical measurements.

Research output: 1 M.Sc. thesis

* Designing a generic string class [Filip]

A sequence of symbols, or a string, is a data structure needed in many applications. In the C++ standard library, the string class is built on top of C strings, and the underlying assumption is that the symbols manipulated are characters. Acknowledging the fact that in different applications the symbol alphabet can be small (as in DNA sequences) or large (as in Unicode strings), the goal of this work was to design a generic container that would be efficient for all kinds of symbol sequences manipulated in different environments. The primary design issues considered were flexibility and modularity.
By splitting the string class in two, a core class and a utility class, the core class becomes a container supporting a small set of fundamental functions, which can be used to implement the more complex functions available in the utility class. This design allows the coexistence of several implementations of the core class, each providing different trade-offs with respect to the efficiency of the core functions. Additional design issues treated were immutability, copy-on-write, and reference counting.

Research output: 1 B.Sc. thesis