REALbasic Olympics Logo


REALbasic Programming Contest 2005:
Event Number Two

This event runs April 18, 2005 through midnight GMT May 30, 2005


On first impressions this contest might seem similar to contest 1, however while the first contest was about StyledText and Regular Expressions you will need a different set of skills to solve this problem (which I had to solve myself not so long ago).

I hope I made the instructions clear but would appreciate some feedback if you need more information - as usual the fastest correct solution wins ;-)

Contest 2: The Binding Problem

Two common methods used in biological and medical sciences are hybridization and PCR. Both have in common that a short piece of DNA (the oligo or primer) binds to a much longer piece of DNA (the template). Binding is dependend on factors like temperature and salt concentration, so the binding strength must exceed a certain limit for a given set of circumstances. The problem is to find the positions where binding occurs.

This is basically a simple matching problem (finding one string within another) with the quality of the match being calculated.
 

Background:

DNA is a simple language consisting of 4 letters: G, A, T, and C (which stand for Guanine, Adenine, Thymine and Cytosine).

An oligo is a string of DNA letters usually 12 to 60 letters long (current practical synthesis limit is about 100).

Example: CATTGCAAATGAC

A template is a long string of DNA letters (it can be up to several million letters long though for most practical purposes rarely exceeds 100,000):

GATTGCACAGTCGACACATTGCAAATGACCAATTCGCCGAGACAG…

and oligo1 CATTGCAAATGAC would perfectly bind where it matches

                CATTGCAAATGAC
                |||||||||||||
GATTGCACAGTCGACACATTGCAAATGACCAATTCGCCGAGACAG…

A and T bind with a strength of 2, G and C with a strength of 3. So oligo1

                CATTGCAAATGAC
                |||||||||||||
GATTGCACAGTCGACACATTGCAAATGACCAATTCGCCGAGACAG…

has a strength of 3+2+2+2+3+3+2+2+2+2+3+2+3 = 31

A match does not have to be perfect (it rarely is) to allow binding, it just has to be strong enough (let's assume a binding strength of >25 for the purpose of this exercise).

Oligo2 CATTGTAAATGAC does not bind perfectly but does bind:

                CATTGTAAATGAC
                ||||| |||||||
GATTGCACAGTCGACACATTGCAAATGACCAATTCGCCGAGACAG…

with a binding strength of 28 which is higher than the required 25.

As long as a certain given limit is reached (which is dependend on temperature and salt concentration to name just two) binding will occur.

 

NOTE: Binding can take place at more than one position for any given oligo.

                CATTGCAAATGAC                       CATTGTAAATGAC
                |||||||||||||                       | ||| |||||||
GATTGCACAGTCGACACATTGCAAATGACCAATTCGCCGAGACCAGTCGACACTTTGCAAATGACCAATAG…
                binding strength: 31                      binding strength: 27

 

The position of the match is the position of the last DNA letter in the template opposite the last DNA letter in the oligo:

                         TGCGGTGCATTGTAAATGACTGC
                           || |||||||||||||||
     TGGCGAGTAACCATTGCACAGTCGATGCATTGCAAATGACCAATTCGCCGAGACAG…
     1                                         43 is the "binding position"

 

Hint: Beware of boundary cases like this one:

         TGCGGTGCATTGTACATGACTC
                 |||| |||||||
                GATTGCACATGACGACACATTGCAAATGACCAATTCGCCGAGACAG…
                1             15 is the "binding position"

 

To make the problem more challenging and life-like we also need to consider the following situation: People are different from each other. While we have the "same" DNA it is slightly different from person to person. For example one person might have an A at position 387, another person might have a C at this position (resulting in blue eyes for one and brown eyes for the other).

That more than one DNA letter could occupy a given position is a phenomenon called genetic variation and is dealt with by using DNA letters which represent groups of G, A, T, and C:

A or C = M
A or G = R
A or T = W
C or G = S
C or T = Y
G or T = K
C, G or T = B
A, G or T = D
A, C or T = H
A, C or G = V
A, C, G or T = N

so a template or oligo string could look like this

GYACAGTYGACACATTGCAAKTGACCAATTCGCCGAGACAG…

(notice the Y and K in there)

C would be a match with Y (binding strength = 3) and T would be a match with Y (binding strength = 2) but A or G would not match.

NOTE:   Both oligo and template can contain these genetic variation. This requires to match "groups" with each other:

To match "M with R" would be a match of "A with A" as A is the only common letter in the two groups.

To match "M with M" could be "A with A" or "C with C" - regarding the binding problem the strongest match is chosen (e.g. "C with C").

 

The Challenge:

Beginners challenge :

Given a listbox of 500 oligos (length varies between 10-100) find the oligos which bind to a given template (length 10,000) with a given minimum strength and output their names, binding positions, and binding strengths in a second listbox. NOTE: all binding positions are required, not just the strongest one.

Advanced challenge:

Given a listbox of 500 oligos (length varies between 10-100) find the oligos which bind to a given template (length 10,000) with a given minimum strength in a window of a given size and output their names, binding positions, and binding strengths in a second listbox. NOTE: all binding positions are required, not just the strongest one

Explanation: This will bind

         AAGCTTGCATTGCAAATGACTGCCAGC
                |||||||||||||
GATTGCACAGTCGACACATTGCAAATGACCAATTCGCCGAGACAG…
                binding strength: 31

however this with the same binding strength will not

         GTGCACGCTGTTGATACTAGCTACTGC
         ||  || |  |  | |  | | | | |
GATTGCACAGTCGACACATTGCAAATGACCAATTCGCCGAGACAG…
                binding strength: 31

the reason being that stacks of binding letters reinforce each other, whereas a mismatch disturbs the stability of the structure.

The challenge for advanced programmers is therefore to include a "window" of a given size (usually 20 letters) within which the binding strength needs to be achieved.

NOTE: The window might be larger than the length of an oligo - e.g. a window might be of size 20 and the smallest oligo of length 12 might still be able to achieve a binding strength of over 25:

GGCATCGGCGGG
length: 12
binding strength: 34 for perfect match

or it might be smaller than the length of the oligo (usually the window is 20 with the oligos varying in length between 15 and 30).

You can download the contest project and sample data (70 kb) here (call your code from the pushbutton).

Good Luck!

 

Markus

Home | Store | Browse Published Issues | Customer Service | Contribute | xDevLibrary

Site Copyright 2002-2024 by xDev Magazine and DesignWrite.
Xojo is a trademark of Xojo, Inc..