Projects

My current project aims to provide a scalable, high-performance storage service for large-scale clusters of workstations running data-intensive distributed applications, such as data mining applications and web search engines. In such clusters, each workstation is often equipped with local disk storage, and the aggregate bandwidth and storage space of the disks in the cluster can easily satisfy the requirements for high-volume, high-bandwidth file access. With traditional networking technologies, however, sharing these disks is inefficient.

Emerging cluster networks provide high-bandwidth, low-overhead communication and allow the network interfaces to communicate directly with other devices on the same I/O bus. We propose to use these features to design and implement an operating system abstraction called a data storage channel. It facilitates low-overhead sharing of the distributed data storage in clusters of workstations by giving the nodes synchronised access to remote disks without incurring a large processing overhead on the nodes hosting those disks. Using these data storage channels, we will design and implement a prototype distributed file system that lets nodes in the cluster aggregate several data storage channels, both to increase data storage I/O bandwidth and to increase reliability through data replication. We will evaluate the prototype by measuring the performance of existing applications on a large-scale cluster.
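
As a rough illustration of what aggregating several data storage channels could look like, the following Python sketch stripes blocks round-robin across channels and writes each block to more than one channel. It is only a sketch under stated assumptions and not the prototype's design: the names StorageChannel and StripedStore, the round-robin placement, and the in-memory dictionaries standing in for remote disks are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StorageChannel:
    """Hypothetical stand-in for a data storage channel to one remote disk."""
    name: str
    blocks: dict = field(default_factory=dict)  # in-memory substitute for the disk
    failed: bool = False

    def write(self, block_no, data):
        if self.failed:
            raise IOError(f"{self.name} unavailable")
        self.blocks[block_no] = data

    def read(self, block_no):
        if self.failed:
            raise IOError(f"{self.name} unavailable")
        return self.blocks[block_no]


class StripedStore:
    """Stripes blocks round-robin over channels, keeping `replicas` copies of each.

    The placement policy is an assumption chosen for illustration only.
    """

    def __init__(self, channels, replicas=2):
        if not 1 <= replicas <= len(channels):
            raise ValueError("replicas must be between 1 and the number of channels")
        self.channels = channels
        self.replicas = replicas

    def placement(self, block_no):
        # Consecutive blocks start on consecutive channels, so sequential
        # access is spread over all disks; copies go to the following channels.
        n = len(self.channels)
        return [self.channels[(block_no + i) % n] for i in range(self.replicas)]

    def write(self, block_no, data):
        for channel in self.placement(block_no):
            channel.write(block_no, data)

    def read(self, block_no):
        # Try each replica in turn and return the first copy that is reachable.
        for channel in self.placement(block_no):
            try:
                return channel.read(block_no)
            except IOError:
                continue
        raise IOError(f"block {block_no} unavailable on all replicas")
```

With three channels and two replicas, block 4 would be placed on the second and third channels, so a read of that block can still succeed if the second channel's disk becomes unavailable. A real implementation would also have to handle writes while a replica is down and resynchronise it afterwards, which this sketch leaves out.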



Maintained by Jørgen Sværke Hansen / cyller@diku.dk