Data Management Systems – University of Copenhagen



Data Management Systems

The Data Management Systems group conducts computer systems research in emerging areas of data management. Projects include the design of spatial databases, scalable data streaming, actor database systems and in-memory databases, graph analysis systems, and cloud computing deployments. The group is keen on validating its work experimentally -- we love writing code, which is not to say that our love for the blackboard is in any way diminished. :-)

When conducting our work we usually resort to one or more of the following:

  • Abstractions & Languages
  • Combinatorial Optimization
  • Indexing & Data Structures
  • System Implementation & Design
  • Statistics & Prediction
  • Parallelism & Distribution

You can learn about our work in detail through our publications.

Current Projects

Actor Database Systems 

With changing architectural and application trends, we are revisiting the design of online transaction processing databases by integrating the actor model with relational database systems. We are studying the programming model of relational actors (or reactors) to achieve in-memory transaction processing with flexible programming, high-level reasoning about transaction latencies, high resource utilization, and flexibility in database architecture. To demonstrate these principles, we are building an in-memory database system called ReactDB.
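To give a flavor of the relational-actor idea, here is a minimal, hypothetical sketch: each reactor encapsulates its own logical relations and is reachable only through declared methods, and a transaction may span several reactors. All names and APIs below are illustrative assumptions, not ReactDB's actual interface, and real reactors would add concurrency control and asynchronous invocation.

```python
class Reactor:
    """A reactor owning a private relation keyed by primary key."""
    def __init__(self, name):
        self.name = name
        self.relation = {}          # pk -> row (the reactor's private state)

    def insert(self, pk, row):
        self.relation[pk] = dict(row)

    def read(self, pk):
        return self.relation.get(pk)

    def update(self, pk, **fields):
        self.relation[pk].update(fields)


def transfer(src: Reactor, dst: Reactor, pk_src, pk_dst, amount):
    """A cross-reactor transaction sketch: debit one account reactor,
    credit another. A real system would wrap this in concurrency control."""
    if src.read(pk_src)["balance"] < amount:
        return False
    src.update(pk_src, balance=src.read(pk_src)["balance"] - amount)
    dst.update(pk_dst, balance=dst.read(pk_dst)["balance"] + amount)
    return True


accounts_eu = Reactor("accounts_eu")
accounts_us = Reactor("accounts_us")
accounts_eu.insert(1, {"owner": "alice", "balance": 100})
accounts_us.insert(7, {"owner": "bob", "balance": 20})
transfer(accounts_eu, accounts_us, 1, 7, 30)
print(accounts_eu.read(1)["balance"], accounts_us.read(7)["balance"])  # 70 50
```

The point of the model is that state is partitioned by construction: each reactor can only touch its own relations, which is what makes reasoning about latencies and parallelism tractable.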

Scalable Stream Processing

Streaming data appear in many applications, including finance, sensor networks, and social media. Online filtering, aggregation, and detection of complex events over such data are crucial to these applications. We focus especially on scalability problems in stream processing, and we have developed and implemented novel algorithms for memory management, stream dissemination, query optimization, load balancing, dynamic and elastic scaling, and fault tolerance in distributed stream processing systems. These algorithms and techniques have been implemented in a prototype system called Enorm.
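As a toy illustration of two of these concerns, the sketch below combines key-based load balancing (routing each stream key to a worker by hashing) with a tumbling-window aggregation. This is an assumption-laden simplification for exposition, not Enorm's actual design.

```python
import zlib
from collections import defaultdict

NUM_WORKERS = 4

def route(key):
    """Assign a stream key to a worker by hashing (a simple load-balancing rule)."""
    return zlib.crc32(key.encode()) % NUM_WORKERS

def tumbling_counts(events, window):
    """Per-worker counts of events per key in tumbling time windows.
    `events` is an iterable of (timestamp, key) pairs."""
    counts = defaultdict(int)
    for ts, key in events:
        counts[(route(key), ts // window, key)] += 1
    return counts

events = [(0, "a"), (1, "b"), (2, "a"), (5, "a"), (6, "b")]
win = tumbling_counts(events, window=5)
# key "a" falls twice in window 0 and once in window 1
```

Static hashing is the simplest policy; the elastic-scaling work mentioned above is precisely about what to do when such a fixed key-to-worker assignment becomes skewed.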

Data Platforms for Geospatial Data 

Our group is participating in the Future Cropping project, a large collaboration in Denmark in the domain of precision agriculture. Future Cropping aims at developing a new generation of tools for improving farming practices by leveraging data from new sensing platforms, such as farm machinery, drones, and satellites, as well as open geospatial data. Our contribution to the project focuses on scalability for the project's data platform, which serves geospatial data to a set of value-added analytical services in agriculture. In addition to Future Cropping, we are collaborating with the Machine Learning group on the GANDALF project on spatial prediction of urban contamination.

Scalable Graph Analysis Systems

An increasing amount of data takes the form of complex graphs in various applications, such as social networks, linked data, telecommunication networks, chemistry, and the life sciences. The analysis of such data should focus not only on the attributes attached to nodes or edges, but also on the way the nodes are interconnected. In this project, we focus on the scalability issues of graph processing and develop algorithms and techniques to enhance the scalability of graph querying and analysis systems. In particular, we have developed a prototype system, called SemStore, to manage and query large-scale RDF data over a cluster of computers. SemStore adopts a path-based RDF data partitioning method and a highly efficient query optimizer, which enhance the system's scalability and query efficiency by minimizing the use of distributed joins and maximizing the parallelism of the query processor. SemStore has been shown to outperform state-of-the-art techniques by orders of magnitude for complex graph queries.
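The intuition behind path-based partitioning can be sketched as follows: co-locate all triples reachable on forward paths from each source vertex, so that path-shaped queries can be answered within one partition, without distributed joins. This is a simplified illustration inspired by that idea, not SemStore's actual algorithm; the partition-assignment rule here (a hash of the source vertex) is an assumption.

```python
import zlib
from collections import defaultdict

def path_partition(triples, num_parts):
    """Co-locate all triples reachable on forward paths from each source
    vertex (a vertex with no incoming edge) in one partition."""
    out = defaultdict(list)
    subjects, objects = set(), set()
    for s, p, o in triples:
        out[s].append((s, p, o))
        subjects.add(s)
        objects.add(o)
    parts = defaultdict(set)
    for src in subjects - objects:               # the source vertices
        pid = zlib.crc32(src.encode()) % num_parts
        stack, seen = [src], set()
        while stack:                             # walk all forward paths
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            for t in out.get(v, []):
                parts[pid].add(t)
                stack.append(t[2])
    return parts

triples = [("A", "knows", "B"), ("B", "knows", "C"), ("D", "knows", "C")]
parts = path_partition(triples, num_parts=4)
```

Note that triples lying on paths shared by several sources get replicated across partitions; trading such replication against join locality is exactly the design space a real partitioner has to navigate.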

Open Geodata Serving

In a collaboration with the Danish Geodata Agency, we have explored new approaches to cook and serve geodata to the public on the Web. A main challenge in cartography is that producing high-quality maps over complex shapes requires the craft of human expertise. However, given the explosion in geospatial data, the pressure for high-productivity cartography tools is increasing at a fast pace. Our work has explored how to create a new class of declarative cartography tools. Our language CVL, the Cartographic Visualization Language, can be processed entirely within a spatial DBMS, opening up exciting opportunities for automatic optimization and scalability. In a separate line of work, we have also analyzed production logs for map-serving web services. These logs reveal strong spatial and temporal concentration patterns that can be exploited for more efficient caching.
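Why do concentration patterns help? When most requests hit a small set of map tiles within a short time span, even a small recency-based cache absorbs most of the load. The sketch below is a generic LRU tile cache of that kind, purely for illustration; it is not our production setup, and the key layout (zoom, x, y) is just the conventional tile addressing scheme.

```python
from collections import OrderedDict

class TileCache:
    """A small LRU cache for rendered map tiles."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()               # (zoom, x, y) -> tile bytes
        self.hits = self.misses = 0

    def get(self, key, render):
        if key in self.store:
            self.store.move_to_end(key)          # refresh LRU position
            self.hits += 1
            return self.store[key]
        self.misses += 1
        tile = render(key)                       # fall back to the renderer
        self.store[key] = tile
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)       # evict least recently used
        return tile

cache = TileCache(capacity=2)
render = lambda key: b"tile-%d-%d-%d" % key
for key in [(10, 1, 1), (10, 1, 1), (10, 1, 2), (10, 1, 1), (10, 2, 2), (10, 1, 1)]:
    cache.get(key, render)
# the hot tile (10, 1, 1) hits three times despite a capacity of only two
```

With temporally concentrated workloads like the one simulated above, hit rates stay high even at tiny capacities, which is the effect the log analysis quantifies.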

Past projects

Behavioral Simulations and Computer Games

In collaboration with the Cornell Database Group, we have worked on a new scripting platform for games and agent-based simulations. Our recent work in this project has centered on iterated spatial join techniques optimized for main memory, as well as communication optimizations, especially for latency, in cloud environments. We have also explored techniques for automatic parallelization of large-scale behavioral simulations, as well as efficient checkpoint-recovery techniques for Massively Multiplayer Online Games (MMOs).
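A spatial join in this setting pairs up every two agents within interaction range of each other, and it must be re-run every simulation tick. A standard in-memory approach, sketched below purely for illustration (this is the textbook grid technique, not our optimized algorithms), buckets agents into cells of side equal to the radius so each agent is compared only against its 3x3 cell neighborhood.

```python
import math
from collections import defaultdict

def spatial_join(agents, radius):
    """Return pairs (i, j), i < j, of agents within `radius` of each other."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(agents):
        grid[(int(x // radius), int(y // radius))].append(i)
    pairs = set()
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):                    # scan the 3x3 neighborhood
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), []):
                    for i in members:
                        if i < j and math.dist(agents[i], agents[j]) <= radius:
                            pairs.add((i, j))
    return sorted(pairs)

agents = [(0.0, 0.0), (0.5, 0.5), (5.0, 5.0), (5.2, 5.1)]
print(spatial_join(agents, radius=1.0))  # [(0, 1), (2, 3)]
```

Because the join is iterated tick after tick over slowly moving agents, there is room to reuse work across ticks, which is one of the main-memory optimization angles mentioned above.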

Multidimensional Indexing and Large Main Memories

We have also studied index structures for either read-intensive or write-intensive workloads. For the first class of workloads, we have studied experimentally, together with collaborators from Saarland University and ETH Zurich, the performance of one specific index structure, the Dwarf index. For the second class of workloads, we have studied how to answer queries over collections of moving objects, e.g., for vehicle tracking or spatial agent-based simulations. The problem is challenging because these applications have very high update rates that result from continuous movement. Our technique, MOVIES, is based on frequently rebuilding index snapshots in main memory. Using data partitioning over multiple nodes in a small cluster, we have scaled MOVIES up to 100 million moving objects over the road network of Germany, while keeping snapshot latencies below a few seconds.
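The snapshot-rebuilding idea behind MOVIES can be conveyed with a greatly simplified, one-dimensional sketch: updates accumulate in a buffer, and every few updates a fresh sorted, read-only snapshot is built, against which range queries run via binary search. The class, parameters, and single-dimension layout below are illustrative assumptions, not MOVIES' actual design.

```python
import bisect

class SnapshotIndex:
    def __init__(self, interval):
        self.interval = interval
        self.buffer = {}               # object id -> latest 1-D position
        self.xs, self.oids = [], []    # current read-only snapshot
        self.pending = 0

    def update(self, oid, x):
        self.buffer[oid] = x
        self.pending += 1
        if self.pending >= self.interval:
            self.rebuild()             # rebuild every `interval` updates

    def rebuild(self):
        """Freeze the buffer into a fresh sorted snapshot."""
        pairs = sorted((x, oid) for oid, x in self.buffer.items())
        self.xs = [x for x, _ in pairs]
        self.oids = [oid for _, oid in pairs]
        self.pending = 0

    def range_query(self, lo, hi):
        """Objects whose snapshotted position lies in [lo, hi]."""
        i = bisect.bisect_left(self.xs, lo)
        j = bisect.bisect_right(self.xs, hi)
        return self.oids[i:j]

idx = SnapshotIndex(interval=3)
idx.update("car1", 10.0)
idx.update("car2", 42.0)
idx.update("car3", 17.0)             # third update triggers a rebuild
print(idx.range_query(5.0, 20.0))    # ['car1', 'car3']
```

Queries read a slightly stale but internally consistent snapshot, which is what keeps query latency bounded under very high update rates; the rebuild interval controls the staleness/throughput trade-off.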

Dataspaces and Personal Information Management

In early work at the ETH Zurich Systems Group, we co-designed the iMeMex Dataspace Management System, a hybrid information integration architecture that allows users to transition from search to data integration in a pay-as-you-go fashion. Unlike a traditional relational DBMS, iMeMex does not take full control of the data, but offers services over one's complex personal dataspace. We have explored several interesting themes in the design of iMeMex, such as the definition of a unified data model for personal information, a novel technique based on mapping hints (called trails) to increase the level of integration of personal information over time, and search over graphs of user data created by view definitions.

Courses at DIKU

Bachelor Courses

Master Courses

Research seminars and reading groups