Searching for similar binding sites

M.Sc. Thesis by Anders Thøgersen


Abstract

In protein research it is often important to determine which other proteins a given protein may interact with. This can help define the biological role of the protein, such as the inhibition or enhancement of particular functions. The criterion that makes an interaction possible is that the two proteins contain binding sites that structurally and chemically allow the interaction to take place.

In this work a method is developed for deducing possible new protein interactions from the set of protein interactions that are already known.

A framework has been created for experimenting with superimposition of protein binding sites. The central methods employed are Particle Swarm Optimization and Iterative Closest Point. Because the same protein binding site may have variations across protein families, the removal of outliers, i.e. atoms that should be discarded to achieve a good superimposition of the binding site core, has been a central theme of this work.
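To make the idea of superimposition with outlier removal concrete, the following is a minimal sketch of a trimmed Iterative Closest Point loop in Python with NumPy. It is not the implementation from the thesis: the function names (kabsch, trimmed_icp), the keep fraction, and the toy grid of "atoms" are all assumptions made for illustration, and the Particle Swarm Optimization stage that would supply a starting pose is left out entirely. The sketch only shows how the atoms with the worst correspondences can be discarded before each rigid fit.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid fit: return R, t such that P @ R.T + t approximates Q."""
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)          # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t

def trimmed_icp(source, target, keep=0.8, iterations=20):
    """Hypothetical trimmed ICP: fit only the `keep` fraction of best-matched atoms."""
    moved = source.copy()
    n_keep = max(3, int(round(keep * len(source))))
    for _ in range(iterations):
        # Brute-force nearest neighbour in the target for every source atom.
        dists = np.linalg.norm(moved[:, None, :] - target[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        nearest_dist = dists[np.arange(len(moved)), nearest]
        # Treat the atoms with the largest distances as outliers and drop them.
        inliers = np.argsort(nearest_dist)[:n_keep]
        R, t = kabsch(moved[inliers], target[nearest[inliers]])
        moved = moved @ R.T + t
    # Recompute distances for the final pose and report the inlier RMSD.
    dists = np.linalg.norm(moved[:, None, :] - target[None, :, :], axis=2)
    final = dists.min(axis=1)
    inliers = np.argsort(final)[:n_keep]
    rmsd = float(np.sqrt(np.mean(final[inliers] ** 2)))
    return moved, inliers, rmsd

# Toy data (assumed, not from the thesis): a 3x3x3 grid of "atoms", 3 units apart.
axis = np.arange(3) * 3.0
target = np.array([[x, y, z] for x in axis for y in axis for z in axis])

# Source: the same atoms, slightly rotated and shifted, with the last 5 displaced far away.
angle = np.radians(5.0)
R0 = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0,            0.0,           1.0]])
source = target @ R0.T + np.array([0.2, -0.1, 0.3])
source[-5:] += 20.0

moved, inliers, rmsd = trimmed_icp(source, target, keep=0.8)
print(f"inlier RMSD: {rmsd:.2e}, atoms kept: {len(inliers)} of {len(source)}")
```

ICP of this kind converges only to a local optimum, so it needs a reasonable starting pose; the toy example sidesteps that by using a small initial misalignment. A fixed keep fraction is the simplest trimming rule, and a distance threshold would be an equally plausible choice.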

The developed methodology is calibrated on representative data, and the quality of the resulting superimpositions is compared with MultiBind, an existing program that solves the same problem using a fundamentally different approach.

Good superimpositions are achieved with geometrically similar structures, but biologically significant results are obtained less frequently. However, some experiments indicate that there is room for significantly improving performance on biologically significant results.


Resumé

In protein research it is often important to be able to determine which other proteins a given protein can interact with. This can help define a protein's biological role as an inhibitor or promoter of particular functions. What makes an interaction between two proteins possible is that they contain a region that structurally and chemically allows the interaction to take place.

In this work a method is developed that allows the deduction of possible new interactions between proteins by examining the set of already known protein interactions.

Software has been developed that makes it possible to experiment with the superimposition of protein binding regions. The central methods employed are Particle Swarm Optimization and Iterative Closest Point. Because the same binding region in a protein may vary across protein families, the removal of superfluous atoms in order to achieve a good superimposition of the central parts of a binding region has been a central theme of this work.

The developed method has been calibrated on representative data, and the quality of the resulting superimpositions is compared with MultiBind, an existing program that solves the same problem but with a fundamentally different method.

Good superimpositions are achieved for protein structures that resemble each other geometrically, but biologically significant results are obtained less often. However, some experiments indicate that there is room to improve performance on biologically significant results considerably.


The thesis is available for download here.