As a PhD student, advised by Prof. Stephen M. Watt, I have studied various
aspects related to language interoperability.
The original motivation for my research was the observation that
although parametric polymorphism was already mainstream,
software component architectures of that time, such as CORBA, JNI, DCOM,
were lagging behind the advances in common programming practice and were
not supporting parametric polymorphism across languages boundaries.
In this context, I investigated how to resolve different binding times
and parametrization semantics in a range of representative languages, such
as C++, Java, Aldor, Maple, and have identified a common ground that could
be suitable mapped to different language bindings.
This work has resulted into two frameworks:
Alma [16,20] that allows interoperability between two very different
computer algebra systems (Aldor and Maple), and GIDL [9,14,15,19]
(Generic Interface Definition Language), a more systematic solution for
parametric polymorphism that is designed as a generic extension framework,
that can be easily adapted to work on top of various component architectures.
I have studied in collaboration with Prof. Alan Mycroft topics
related to dynamic extraction and optimization of parallelism.
In particular:
Thread-Level Speculation (TLS) is a parallelization technique that allows
loop iterations to be executed (optimistically) in parallel even in the
presence of cross-iteration dependencies, which are tracked and fixed at
runtime.
Related work on software TLS has studied
the case when one (overarching) TLS implementation is used to
disambiguate dependencies for the whole memory footprint of the
application. In this setting good speedup is typically achieved
when enough accesses are disambiguated statically by the compiler and only
a few other requires TLS support.
Our software-TLS work [10,12,13] has studied in some sense the dual
problem that addresses a max-min performance question:
``What speedup can TLS achieve when all accesses require TLS
disambiguation?''
In this setting different TLS implementations
disambiguate disjoint memory partitions and each TLS implementation
is tuned to take advantage of the access patterns of the variables
that inhabit the corresponding memory partition.
More information and resources related to this work, in particular the
PolyLibTLS library, can be found here.
Tracing algorithms are typically parallelized by associating
the working queue a processor-centric semantics in that
the working queue is split across processors, and each holds
elements that are to be processed by its corresponding processor.
We have proposed a parallelization technique [11], in which the
working queue has a memory-centric semantics in that
the memory is over-partitioned and a working-queue is associated
to each partition and holds addresses that belong to that partition.
This enhances locality of reference, but more importantly,
it eliminates locking from the critical path of the algorithm.
In collaboration with Fabian Gieske and Christian Igle, we have used
similar techniques [1,2] to improve locality-of-reference for a GPGPU
implementation of kd-trees, named Buffer kd-tree. This was
applied to solve a big-data nearest-neighbor problem, in the
context of classifying celestial bodies into galaxies and starts.
I have studied in collaboration with Prof. Lawrence Rauchwerger techniques
ranging from static to dynamic (and anywhere in the middle) for automatic
parallelization of Fortran loops.
Entirely static analysis has been relatively successful in parallelizing
(small) loops with ``simple'' control flow and affine subscripts, but is
too conservative when these assumptions do not hold.
On the other hand, entirely dynamic analysis of memory-reference traces,
e.g., thread-level speculation, can be applied aggressively, but incurs
significant (and often non-scalable) overheads.
I have built on Rus and Rauchwerger work in a project aimed at unifying
static and dynamic analysis into a hybrid Fortran77 compiler framework,
which succeeds in detecting and optimizing parallelism for a large
class of difficult loops at negligible (or in the worst case scalable)
runtime overheads.
The idea has been to use static, interprocedural analysis (i) to model
loop parallelism as an equation on abstract sets (of array references),
and furthermore (ii) to extract a cascade of sufficient conditions for
parallelism, of increasing time complexity, that are verified at runtime [7,8].
A systematic (automatic) evaluation of about 30 benchmarks from SPEC and
Perfect-Club suites, totaling over 1000 loops, has demonstrated the viability
of the approach and has shown that our technique outperforms commercial
compilers by a significant margin.
Working in collaboration with Troels Henriksen on designing and implementing
Futhark [3--6]: a pure-functional, map-reduce language supporting (parallel) bulk
operation on regular arrays. We target efficient execution on many-core
architectures such as GPGPU.
Supervised Students
Troels Henriksen."Exploiting functional invariants to optimise parallelism: a dataflow approach",
MSc Thesis, DIKU, University of Copenhagen, February 2014.
PDF
Refereed Technical Papers
Fabian C. Gieseke, Justin Heinermann, Cosmin E. Oancea and Christian Igel."Buffer k-d trees: processing massive nearest neighbor
queries on GPUs",
31st Int. Conf. on Machine Learning (ICML'14), Beijing, China, 2014.
PDF
Fabian C. Gieseke, Kai L. Polsterer, Cosmin E. Oancea and Christian Igel."Speedy Greedy Feature Selection: Better Redshift
Estimation via Massive Parallelism.",
22nd European Symposium on Artificial Neural Networks (ESANN),
pp. 87-92, Belgium, 2014.
Troels Henriksen, Martin Elsman, and Cosmin E. Oancea. "A Hybrid Approach to Size Inference in Futhark",
3rd ACM-SIGPLAN Workshop on Functional High-Performance Computing (FHPC),
Guthenburg, Sweden, September 2014.
PDF
Troels Henriksen and Cosmin E. Oancea. "Bounds Checking: An Instance of Hybrid Analysis",
ACM SIGPLAN Int. Workshop on Libraries, Languages and Compilers for
Array Programming (ARRAY). Edinburgh, UK, June 2014.
PDF
Troels Henriksen and Cosmin E. Oancea. "A T2 Graph-Reduction Approach To Fusion",
2nd ACM SIGPLAN Workshop on Functional High-Performance Computing.
Boston, Massachusetts. September 2013.
PDF
Cosmin E. Oancea, Christian Andreetta, Jost Berthold,
Alain Frisch, and Fritz Henglein. "Financial software on GPUs: between
Haskell and Fortran.",
1st ACM SIGPLAN workshop on Functional high-performance
computing (FHPC ‘12). Copenhagen 2012.
PDF
Cosmin E. Oancea and Lawrence Rauchwerger. "Logical Inference Techniques for Loop Parallelization",
33rd ACM-SIGPLAN Conf. on Prog. Lang. Design and Implem. (PLDI'12),
pp 509-520, June 2012.
PDF
Cosmin E. Oancea and Lawrence Rauchwerger. "A Hybrid Approach to Proving Memory Reference Monotonicity",
24th Int. Lang. and Compilers for Parallel Computing (LCPC'11),
LNCS, Vol 7146, pp 61-75, Sept 2013.
PDF
Cosmin E. Oancea and Stephen M. Watt. "An Architecture for Generic Extensions",
Science of Computer Programming Journal, Elsevier, Vol 76(4),
pp 258-277, doi:10.1016/j.scico.2009.09.008, 2011.
Cosmin E. Oancea, Alan Mycroft and Tim Harris.
"A Lightweight In-Place Implementation for
Software Thread-Level Speculation",
21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'09),
August 2009, Calgary, Canada.
PDF
Cosmin E. Oancea, Alan Mycroft and Stephen M. Watt.
"A New Approach to Parallelising Tracing Algorithms",
International Symposium on Memory Management (ISMM'09),
June 2009, Dublin, Ireland.
PDF
Cosmin E. Oancea and Alan Mycroft.
"Set-Congruence Dynamic Analysis for Software Thread-Level Speculation",
Languages and Compilers for Parallel Computing (LCPC'08) 21st Annual Workshop,
August 2008, Edmonton, Canada.
PDF
Cosmin E. Oancea and Alan Mycroft.
"Software Thread-Level Speculation: an Optimistic Library Implementation",
International Workshop on Multicore Software Engineering (IWMSE'08),
pp 23-32 (ACM Digital Library), May 2008, Leipzig, Germany.
PDF
Cosmin E. Oancea and Stephen M. Watt.
"Generic Library Extension in a Heterogeneous Environment",
Library-Centric Software Design LCSD'06,
pp. 25-35, October 2006, Portland, USA.
PDF
Cosmin E. Oancea.
"Parametric Polymorphism for Software Component Architectures and Related Optimizations",
PhD Thesis, Department of Computer Science, The University of Western Ontario,
Advisor: Stephen M. Watt, July 2005, London, ON, Canada.
PDF
Cosmin E. Oancea and Stephen M. Watt.
"Parametric Polymorphism for Software Component Architectures",
ACM Object-Oriented Programming, Systems, Languages and Applications OOPSLA'05,
pp. 147 - 166, October 2005, San Diego, USA.
PDF
Cosmin E. Oancea and Stephen M. Watt.
"Domains and Expressions: An Interface Between Two Approaches to Computer Algebra,"
ACM International Symposium on Symbolic and Algebraic Computation ISSAC'05,
pp. 261 - 269, July 2005, Beijing, China.
PDF
Cosmin E. Oancea, Jason W. Selby, Mark Giesbrecht, and Stephen M. Watt.
"Distributed Models of Thread Level Speculation",
International Conference on Parallel and Distributed Processing Techniques and
Applications PDPTA'05,
pp. 920-927, June 2005, Las Vegas, USA.
PDF
Cosmin E. Oancea, Clare So, and Stephen M. Watt.
"Generalization in Maple",
Maple Conference,
pp. 277-382, July 2005, Waterloo, Canada.
Yannis Chicha, Michael Lloyd, Cosmin E. Oancea, and Stephen M. Watt.
"Parametric Polymorphism for Computer Algebra Software Components",
6th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing SYNASC'04,
pp. 119 - 130, Sept. 2004, Timisoara, Romania.
PDF
Cosmin E. Oancea and Stephen M. Watt.
"A Framework for Using Aldor Libraries with Maple",
EACA'04 satellite conference to the ISSAC'04,
pp. 219 - 224, June 2004, Universidad de Santander, Spain.
PDF
Francois Boulier, Marc Moreno Maza, and Cosmin E. Oancea.
"A new henselian construction and its application to polynomial GCDs over direct products of fields",
EACA'04 satellite conference to the ISSAC'04,
pp. 47 - 52, June 2004, Universidad de Santander, Spain
Radu G. Belu, Cosmin E. Oancea, and A. C. Belu
"A Wavelet-based Simulation for Identification and Classification of Short-Term Power Disturbances
and Transients as a Teaching Aid",
2003 Annual ASEE Conference,
pp. 1380-4. June 2003, Nashville, TN, USA.
Radu G. Belu, Cosmin E. Oancea, and A. C. Belu
"A 2-D indoor radio propagation modeling",
FIE (Frontiers in Education) Confference,
pp. 1780-4. November 2003, Boulder, CO, USA.
Refereed Posters
Cosmin E. Oancea and Alan Mycroft.
"A Lightweight Model for Software Thread-Level Speculation",
International Conference on Parallel Architecture and Compilation Techniques (PACT 2007),
pp. 419, Brasov, Romania.
Cosmin E. Oancea and Stephen M. Watt.
"Domains and Expressions: An Interface between Two Approaches to Computer Algebra",
MITACS 6th Annual Conference,
May 2005, Calgary, Canada.
Cosmin E. Oancea and Stephen M. Watt.
"An Approach to Using Aldor Libraries with Maple",
East Coast Computer Algebra Day ECCAD'05,
May 8, 2004, Waterloo, Canada.
Additional Conference Presentations and Technical Reports
Christian Andreetta, Vivien Begot, Jost Berthold, Martin Elsman,
Troels Henriksen, Maj-Britt Nordfang and Cosmin E. Oancea
"A Financial Benchmark for GPGPU Compilation",
Technical Report No. 2015/02, ISSN 0107-8283,
Department of computer Science (DIKU),
University of Copenhagen Feb. 2015, Copenhagen, Denmark.
PDF
Cosmin E. Oancea and Stephen M. Watt.
"Domains and Expressions: An Interface between Two Approaches to Computer Algebra",
MITACS 6th Annual Conference, Presentation,
May 2005, Calgary, Canada.
Cosmin E. Oancea and Stephen M. Watt.
"Generic IDL: Parametric Polymorphism for Software Component Architectures",
CASCON'03 Workshop on Compiler Driven Performance, Presentation,
October 2003, Toronto, Canada.
Cosmin E. Oancea and Bart Truyen.
"Numerical study of interpolation techniques for use in the MPEG4 facial animation model. Development of fast algorithms for RBF interpolation",
Technical Report, , Department of Image Processing,
Vrije Universitet Brussels, 2000, Brussels, Belgium.