Date: 23 Jan 2007
From: jyrki@diku.dk
Subject: PE-lab: Annual Report 2006

Performance Engineering Laboratory (PE-lab)
http://www.diku.dk/~jyrki/PE-lab/

* Annual Report 2006

In 2006, the main focus of our research has been to achieve stronger
guarantees for generic software components. Most work has been done
under the auspices of the CPH STL project. The following people were
active contributors to this research:

Hervé Brönnimann (external collaborator, Polytechnic University)
Filip Bruman (B.Sc. student)
Christopher Derek Curry (industrial partner, SolvIt Consulting)
Amr Elmasry (external collaborator, Alexandria University)
Gianni Franceschini (external collaborator, University of Pisa)
Csaba Gulyás (exchange student from the Budapest University of 
Technology and Economics)
Torben Hagerup (external collaborator, Universität Augsburg)
Claus Jensen (M.Sc. student)
Andrei Bjarke Moskvitin Josephsen (M.Sc. student)
Jyrki Katajainen (Lab leader)
Stephan Lynge Herlev Larsen (M.Sc. student)
Pat Morin (external collaborator, Carleton University)
Jacob De Fine Skibsted (M.Sc. student)
Fabio Vitale (external M.Sc. student, University of Insubria).

Most research articles written are available online via the CPH STL
website http://cphstl.dk, and all theses completed via the PE-lab's
home page http://www.diku.dk/~jyrki/PE-lab/.

A more detailed description of the projects completed or started in
2006 is given below.


* Comparison complexity of priority-queue operations [Amr, Claus, and Jyrki]

The comparison complexity of priority-queue operations was studied.
More precisely, the operations to be supported include find-min(),
insert(), decrease(), and delete(). Two different approaches were
investigated, one relying on heap-order violations (relaxed heaps) and
another relying on structural violations (pruned heaps). In general,
the use of heap-order violations gives a better bound for delete()
than the use of structural violations.  However, pruned heaps could be
modified to mimic relaxed heaps, and thereby the same bounds could be
achieved by both approaches.
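
To make the operation interface concrete, the four operations can be
sketched in C++. The following toy priority queue is illustrative only:
it is built on std::multiset and does not realize the relaxed or pruned
heaps studied here, nor their comparison bounds; all names are our own,
and the operation delete() of the text is called erase(), since `delete`
is a C++ keyword.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Toy handle-based priority queue built on std::multiset (illustrative
// only; not the relaxed or pruned heaps studied above).  Every operation
// costs O(log n) here, whereas the studied structures aim at stronger
// bounds, e.g. O(1) worst-case find-min.
template <typename T>
class toy_priority_queue {
public:
    using handle = typename std::multiset<T>::iterator;

    handle insert(T const& x) { return pool.insert(x); }
    T const& find_min() const { return *pool.begin(); }  // pool must be non-empty

    // Replace the element behind h by a smaller value and update h.
    void decrease(handle& h, T const& new_value) {
        pool.erase(h);
        h = pool.insert(new_value);
    }

    void erase(handle h) { pool.erase(h); }  // delete() in the text
    std::size_t size() const { return pool.size(); }

private:
    std::multiset<T> pool;
};
```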

Research output: 2 articles


* Experimental evaluation of priority queues [Claus, Fabio, and Jyrki]

The practical efficiency of two priority-queue structures was
studied: local heaps, which are like classical binary heaps but use a
different memory layout, and navigation piles, which allow a very compact
memory representation. Several implementations of both structures were
considered and their efficiency was experimentally evaluated in many
different test scenarios by varying both the type of input and the
type of ordering function used.  For local heaps, the research
focused on the cache behaviour of various heap data structures. As for
cache-aware heaps, our results show that some variants are competitive
with classical binary heaps (when the number of elements stored is not
small). In particular, local heaps significantly reduce the number of
cache misses, and avoid an increase in the number of element
comparisons and element moves performed.  For navigation piles, the
main goal was to analyse the efficiency of compact data structures
contra classical implicit heaps.  Our experiments show that, when
element moves are expensive, navigation piles can be an alternative to
binary heaps.

Research output: 2 technical reports


* Experimental evaluation of search trees [Hervé and Jyrki]

Several state-of-the-art implementations of red-black trees were
presented and evaluated. The techniques yield implementations of C++
standard-compatible set and map associative containers, which
guarantee logarithmic worst-case performance, provide iterators with
strict invalidation rules, and lower storage requirements to three,
sometimes two, pointers per node.  Space efficiency is important on
modern platforms because the efficiency of a data structure depends
increasingly on its storage requirements, and intricately on several
factors not all related to instruction count, such as efficient cache
utilization.
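
One classic trick in this spirit, sketched below, packs a node's
red/black colour bit into the least significant bit of an aligned child
pointer, so that no separate colour field is needed. This is a
hypothetical illustration of how the storage per node can be lowered,
not necessarily the technique used in the evaluated implementations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: the colour bit lives in the low bit of a child
// pointer, which is free whenever nodes are at least 2-byte aligned
// (true for any ordinarily allocated node below).
struct node;

struct tagged_ptr {
    std::uintptr_t bits = 0;

    node* get() const { return reinterpret_cast<node*>(bits & ~std::uintptr_t(1)); }
    bool is_red() const { return (bits & 1) != 0; }
    void set(node* p, bool red) {
        bits = reinterpret_cast<std::uintptr_t>(p) | std::uintptr_t(red ? 1 : 0);
    }
};

struct node {
    int value;
    tagged_ptr left;   // child pointer plus the colour of that child
    tagged_ptr right;
};
```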

Research output: 1 technical report


* Optimally subroutine-oblivious algorithms [Gianni and Jyrki]

The concept of cache obliviousness was introduced in 1999. A
cache-oblivious algorithm is expected to perform well without having
any knowledge about the parameters of the underlying memory system.  A
new way of seeing this is to require that the algorithm performs well
independent of the cost of element reads and element writes, elements
being the objects manipulated. The cost of these primitives can vary
because sequential access may be considerably faster than random
access.  We generalized the concept of cache obliviousness to
subroutine obliviousness to cover any operation, the cost of which is
not known at development time. Such a situation occurs whenever a
function takes a function parameter, the type/cost of which is
unspecified (e.g. in generic programming).  An optimally
subroutine-oblivious algorithm guarantees the best possible running
time without knowing the cost of the unspecified operations.

As a specific case study, we presented a subroutine-oblivious
algorithm for solving the 0/1-sorting problem defined as follows.  In
the 0/1-sorting problem, given a sequence S of elements drawn from a
universe U and a characteristic function f: U -> {0,1}, the task is to
rearrange the elements in S so that every element x, for which f(x)=0,
is placed before any element y, for which f(y)=1.  Moreover, this
reordering should be done stably without altering the relative order
of elements having the same f-value, and space efficiently using only
O(1) words of extra space.  Our algorithm is optimally cache oblivious
and subroutine oblivious with respect to f (but not with respect to
element moves).
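
For reference, the intended outcome can be specified directly with the
standard library. The sketch below uses std::stable_partition, which is
stable but neither in-place in the O(1)-extra-space sense nor oblivious;
it captures the problem statement, not our algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Reference solution to the 0/1-sorting problem (specification only):
// move every x with f(x) == 0 before every y with f(y) == 1, preserving
// the relative order within each group.  std::stable_partition places
// elements satisfying the predicate first.
template <typename Sequence, typename CharacteristicFunction>
void zero_one_sort(Sequence& s, CharacteristicFunction f) {
    std::stable_partition(s.begin(), s.end(),
                          [&](auto const& x) { return f(x) == 0; });
}
```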

Research output: 1 technical report


* Stronger guarantees for standard-library containers [Jyrki]

Several classical sorting and searching problems have been studied in
a new setup where the focus has been on issues like iterator validity,
exception safety, space efficiency, comparison complexity, and
subroutine obliviousness.  The work has been both theoretical and
empirical.

Research output: several student projects, but no articles


* Space-efficient data structures [Hervé, Jyrki, and Pat]

Consider a data structure D that stores a dynamic collection of
elements. Assume that D uses a linear number of words in addition to
the elements stored.  In this work several data-structural
transformations are described that can be used to transform D into
another data structure D' that supports the same operations as D, has
a considerably smaller memory overhead than D, and performs the
supported operations only a small constant factor or a small additive
term slower than D, depending on the data structure and operation in
question. The compaction technique has been successfully applied to
linked lists, dictionaries, and priority queues.
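
To give a flavour of such a transformation (a hypothetical sketch, not
necessarily one of the transformations in the submission), an "unrolled"
linked list stores a bucket of up to B elements per node, cutting the
pointer overhead per element from one pointer to roughly 1/B pointers at
the price of a small constant-factor slowdown on updates.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical compaction sketch: an unrolled linked list with at most
// B elements per node.  Destructor and most operations are omitted for
// brevity.
template <typename T, std::size_t B = 16>
struct unrolled_list {
    struct node {
        std::vector<T> elements;  // holds at most B elements
        node* next = nullptr;
    };

    node* head = nullptr;

    void push_front(T const& x) {
        if (head == nullptr || head->elements.size() == B) {
            node* n = new node;   // start a new bucket at the front
            n->next = head;
            head = n;
        }
        head->elements.insert(head->elements.begin(), x);
    }

    std::size_t size() const {
        std::size_t n = 0;
        for (node const* p = head; p != nullptr; p = p->next)
            n += p->elements.size();
        return n;
    }
};
```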

Research output: a journal submission


* Organizational design [Christopher and Jyrki]

An international edition of our book "Reengineering a university
department" was released. To get your own copy of the book, please
check the ordering details on the book's home page:
http://www.diku.dk/~jyrki/bpr/.

Research output: 1 book


* Investigations into efficient priority queues [Claus and Fabio]

In this work, the practical and theoretical efficiency of known
priority-queue structures (multiary heaps, run-relaxed heaps) was
studied, and new priority-queue structures were developed (navigation
piles, local heaps, multipartite binomial queues, multipartite relaxed
binomial queues). A brief summary of our results is given below:

Multiary heaps: The practical efficiency of in-place multiary heaps
  was studied by developing a number of alternative implementations of
  them. The implementations were evaluated using different types of
  inputs and ordering functions. The results of the experimental
  evaluation show that no single heapifying strategy has the best
  performance for all the different types of inputs and ordering
  functions, but Floyd's bottom-up heapifying strategy has a good
  performance for most types of inputs and ordering functions.

Local heaps: The cache behaviour of cache-aware heaps was studied
  experimentally. Some variants turned out to be competitive with
  classical binary heaps (when the number of elements stored is not
  small). When executing a sequence of priority-queue operations,
  local heaps significantly reduce the number of cache misses, and
  avoid an increase in the number of element comparisons and element
  moves performed.

Navigation piles: This new data structure was designed from scratch,
  and analysed both theoretically and experimentally.  The practical
  efficiency of several forms of navigation piles was compared to that
  of several forms of binary heaps for different types of inputs and
  ordering functions. The experiments show that, when element moves
  are expensive, navigation piles can be an alternative to binary
  heaps.

Run-relaxed heaps: This well-known data structure was thoroughly
  analysed and minor improvements were proposed for the realization of
  its priority-queue operations.
 
Multipartite binomial queues: A framework for reducing the number of
  element comparisons performed in priority-queue operations was
  introduced. The application of this framework gives a priority queue
  which guarantees the worst-case cost of O(1) per find-min() and
  insert(), and the worst-case cost of O(log n) with at most log n +
  O(1) element comparisons per delete-min() and delete().
  Furthermore, in addition to the above-mentioned operations, a
  priority queue that provides decrease() was given. This priority
  queue achieves the worst-case cost of O(1) per find-min(), insert(),
  and decrease(); and the worst-case cost of O(log n) with at most log
  n + O(log log n) element comparisons per delete-min() and delete().

Research output: 2 M.Sc. theses


* Positional sequences with less-than-comparable iterators [Andrei,
  Jyrki, and Torben]

In this project, which is just in its initial phase, the classical
order-maintenance problem is studied.  In this problem the task is to
maintain a linked list that allows new elements to be inserted just
before a given element, old elements to be removed, and queries to
determine the relative order of two elements currently held in the
list. All the key operations are to be carried out in constant
time. In particular, we are interested in worst-case bounds, rather
than amortized bounds. To obtain constant time bounds, it is essential
that the locations of the elements operated on are given.  In modern
terms, the task is to implement a positional sequence providing
bidirectional iterators that can be compared in constant time. The
goal in this research is to design a data structure that can be
manipulated by simple algorithms, for which both the correctness proof
and the running-time analysis are straightforward.
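
For contrast, a naive label-based solution is sketched below
(hypothetical; not the data structure sought in this project): order
queries compare integer labels in constant time, but an insertion may
trigger a full relabelling, so insertions are efficient only in the
amortized sense, whereas the goal stated above is worst-case constant
time per operation.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <list>

// Naive order-maintenance sketch: each element carries an integer label
// kept strictly increasing along the list.  When no free label exists
// between two neighbours, all labels are spread out evenly again (O(n)).
class ordered_list {
public:
    using iterator = std::list<std::uint64_t>::iterator;

    iterator end() { return labels.end(); }

    // Insert a new element just before pos (end() appends).
    iterator insert(iterator pos) {
        std::uint64_t lo = (pos == labels.begin()) ? 0 : *std::prev(pos);
        std::uint64_t hi = (pos == labels.end()) ? max_label : *pos;
        if (hi - lo < 2) {       // no free label in between: relabel all
            relabel();
            return insert(pos);  // values changed, list iterators stay valid
        }
        return labels.insert(pos, lo + (hi - lo) / 2);
    }

    // Constant-time order query: does a precede b in the list?
    bool precedes(iterator a, iterator b) const { return *a < *b; }

private:
    static constexpr std::uint64_t max_label = ~std::uint64_t(0);

    void relabel() {             // spread labels evenly over the range
        std::uint64_t step = max_label / (labels.size() + 1);
        std::uint64_t label = step;
        for (std::uint64_t& x : labels) {
            x = label;
            label += step;
        }
    }

    std::list<std::uint64_t> labels;
};
```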

Research output: 1 M.Sc. thesis


* Spam filtering [Csaba]

The phenomenon of junk e-mail is an increasing problem in many
organizations.  To prevent employees from spending their valuable
time sorting out unwanted e-mail, spam filtering is often necessary.
To prevent unwanted e-mail from being distributed further, spam
filtering may also be done by Internet service providers.  In this
work, the
history of spam and the methods used in spam filtering are surveyed.
Based on a statistical analysis of the accuracy of existing spam
filters, an idea for the construction of a meta spam filter was
introduced in order to achieve even higher accuracy and effectiveness
in spam filtering.

Research output: 1 M.Sc. thesis


* Distributing usage of bandwidth for on-demand streaming [Jacob and 
Stephan]

The overall goal in this work was to distribute the usage of bandwidth
in a client-server network providing video-on-demand streaming. To
achieve this, a protocol for on-demand streaming was designed and
implemented by employing methods resembling those used in peer-to-peer
networks. In our implementation the data stream is divided into pieces
in order to enable multiple clients to send part of the data stream to
a single receiver and at the same time offer the full functionalities
of on-demand streaming such as pause and skip. By letting clients
distribute the data the bandwidth usage of the central server will be
lowered, but an analysis of the actual savings is still open as this
would demand a large-scale real-life usage of the protocol together
with empirical measurements.

Research output: 1 M.Sc. thesis


* Designing a generic string class [Filip]

A sequence of symbols, or a string, is a data structure needed in many
applications. In the C++ standard library, the string class is built
on top of C strings, and the underlying assumption is that the symbols
manipulated are characters.  Acknowledging the fact that in different
applications the symbol alphabet can be small (like in DNA sequences)
or large (like in Unicode strings), the goal laid out in this work was
to design a generic container that would be efficient for all kinds of
symbol sequences manipulated in different environments.  The primary
design issues considered were flexibility and modularity. By splitting
the string class into two, a core class and a utility class, the core
class becomes a container supporting a small set of fundamental
functions, which can be used to implement the more complex
functions available in the utility class.  This design allows the
co-existence of several implementations of the core class, each
providing different tradeoffs as to the efficiency of the core
functions.  Additional design issues treated were immutability,
copy-on-write, and reference counting.
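
The split can be illustrated with a hypothetical sketch (all names are
invented here, not those of the thesis): the core class exposes only
fundamental operations, and utility functions are written generically
against that small interface, so alternative core implementations
(packed DNA symbols, wide Unicode symbols, and so on) can be plugged in.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical core class: a minimal container of symbols.  Only the
// fundamental functions live here; one representation among many.
template <typename Symbol>
class core_string {
public:
    std::size_t size() const { return data_.size(); }
    Symbol const& at(std::size_t i) const { return data_.at(i); }
    void push_back(Symbol const& s) { data_.push_back(s); }

private:
    std::vector<Symbol> data_;
};

// Hypothetical utility function, layered purely on the core interface,
// so it works with any core class providing size() and at().
template <typename Core>
bool is_palindrome(Core const& s) {
    for (std::size_t i = 0, j = s.size(); i + 1 < j; ++i, --j)
        if (!(s.at(i) == s.at(j - 1)))
            return false;
    return true;
}
```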

Research output: 1 B.Sc. thesis