
Automated discovery of format support

The T-diagram notation (or Bratman diagram notation, see [2,4]) is a handy visual notation for describing the different ways in which a program can be executed, e.g. ``compiled to byte code, then interpreted'' or ``compiled to C, then compiled to machine code and executed directly''.

The goal of this project is to use T-diagrams for modelling any given server's collection of compilers, interpreters, byte machines etc. We do not restrict ourselves to ordinary programs - a compiler could also be a converter like ps2pdf, and an interpreter could be a rendering program like acroread - so this collection actually defines which file formats are supported by the server. The model of a server can be used to answer questions like ``how could I run a Lisp program on my server?'' or ``is it possible to view a Word document on my standard BSD PC?''. Automatic deduction of this kind of information is strongly needed in the world of digital archives, especially in archives of web pages, where the number of formats is often enormous. This bachelor project will develop a novel prototype tool for addressing these challenges. For more on the applications, see [3].
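To make the idea of deducing format support concrete, here is a minimal sketch in Python. It is not the project's design, only one possible reading of the model: a format counts as supported if the machine executes it natively, if a runnable interpreter accepts it, or if a runnable compiler translates it into a supported format. The class names, the format labels (``x86'', ``ps'', ``pdf'', ``c'', ``lisp'', ``doc'') and the tools lisp2c and cc are hypothetical; only ps2pdf and acroread are taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    """One T-diagram: translates `source` to `target`, and is itself written in `written_in`."""
    name: str
    source: str
    target: str
    written_in: str


@dataclass(frozen=True)
class Interpreter:
    """Executes (or renders) `source`, and is itself written in `written_in`."""
    name: str
    source: str
    written_in: str


def supported_formats(native, compilers, interpreters):
    """Return the set of formats the server can handle, computed as a fixed point."""
    supported = {native}
    changed = True
    while changed:
        changed = False
        for i in interpreters:
            # An interpreter helps only if the server can run the interpreter itself.
            if i.written_in in supported and i.source not in supported:
                supported.add(i.source)
                changed = True
        for c in compilers:
            # A compiler helps if it can be run and its output is already supported.
            if (c.written_in in supported and c.target in supported
                    and c.source not in supported):
                supported.add(c.source)
                changed = True
    return supported


# Hypothetical server configuration (all labels are assumptions for illustration).
compilers = [
    Compiler("ps2pdf", "ps", "pdf", "x86"),
    Compiler("lisp2c", "lisp", "c", "x86"),   # hypothetical Lisp-to-C compiler
    Compiler("cc", "c", "x86", "x86"),        # hypothetical C compiler
]
interpreters = [Interpreter("acroread", "pdf", "x86")]

formats = supported_formats("x86", compilers, interpreters)
print("lisp" in formats)  # True: lisp -> c -> x86
print("doc" in formats)   # False: nothing on this server accepts Word documents
```

A real tool would presumably also report the chain of programs used, not only a yes/no answer; the fixed-point loop is chosen here only because it is short and handles chains of any length.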

The project should define a grammar for a T-diagram language. At some stage, the language should also be able to describe a database of formats, compilers and interpreters as well as a server configured with a set of installed programs. On the basis of the T-diagram, the project should provide an implementation that supports some of the following functionalities:




Jakob Grue Simonsen 2005-12-16