2
Algorithm Running Time
Polynomial { n
n2
n10
1
1
1
10
10 2
1010
10 2
10 4
10 20
103
106
1030
2n 103 1030 10300
Exponential { n!
2
1 106 10150 10 2500
3
Algorithm Complexity
• P = solutions in polynomial deterministic time.
– e.g. dynamic programming
• NP = (non-deterministic polynomial time)
solutions checkable in deterministic polynomial time.
– e.g. RSA code breaking by factoring
• NP-complete = most complex subset of NP
– e.g. traveling all vertices with mileage < x
• NP-hard = optimization versions of above
– e.g. Minimum mileage for traveling all vertices
• Undecidable = no way even with unlimited time & space
NIST UCI
4
– e.g. program halting problem
How to deal with NP-complete
and NP-hard Problems
• Redefine the problem into Class P:
– RNA structure Tertiary => Secondary
– Alignment with arbitrary function=>constant
• Worst-case exponential time:
– Devise exhaustive search algorithms.
– Exhaustive searching + Pruning.
• Polynomial-time close-to-optimal solution:
– Exhaustive searching + Heuristics (Chess)
– Polynomial time approximation algorithms 5
What can biology do for difficult
computation problems
• DNA computing
– A molecule is a small processor,
– Parallel computing for exhaustive searching.
• Genetic algorithms
– Heuristics for finding optimal solution, adaptation
• Neural networks
– Heuristics for finding optimal solution, learning,...
6
Net2: Bio-algorithms
7
Electronic, optical & molecular
nano-computing
Steps: assembly > Input > memory > processor/math > output
10
Molecular nano-computing
11
DNA computing: Is there a Hamiltonian
path through all nodes once?
4
3
1
0 6
2 5
Reverse-Complement Node: 4
3
… 1
3: 5’ CGATAAGCAC GAATTTCGAT 3’ 0 6
2 5
Edges + Nodes => Path (2,3,4):
Edge (2,3) Edge (3,4)
GTATATCCGA GCTATTCGAGCTTAAAGCTA GGCTAGGTAC
CGATAAGCAC GAATTTCGAT
Node 2 Reverse Node 3 Reverse (3’ 5’) Node 4 Reverse
14
DNA Computing Process
•Encode graph into DNA sequences. •Oligonucleotide synthesis
16
Problems of DNA Computing
• Polynomial time but exponential volumes
• A 100 node graph needs >1030 molecules.
• Far slower than a PC.
• Experimental errors:
– mismatch hybridization
– incomplete cleavage
• (Some are non-reusable.)
17
Promises of DNA Computing
• High parallelism
• Operation costs near thermodynamic limit
– 2 vs 34x1019 ops/J (109 for conventional computers)
• Solving one NP-complete problem implies
solving many.
• Possible improvement
– Faster readout techniques (eg. DNA chips).
– Natural selection.
18
A sticker-based model for DNA computation.
Roweis et al. J Comput Biol 1998; 5:615-29 (Pub, JCB)
Unlike previous models, the stickers model has a random access memory that
requires no strand extension and uses no enzymes.
Concerns about molecular computation (Smith, 1996; Hartmanis, 1995; Linial et al.,
1995) are addressed:
1) General-purpose algorithms can be implemented by DNA-based computers
2) Only modest volumes of DNA suffice.
3) [Altering] covalent bonds is not intrinsic to DNA-based computation.
4) Means to reduce errors in the separation operation are addressed in
Karp et al., 1995; Roweis and Winfree, 1999).
19
3SAT
Given n boolean (0 / 1) variables x ( x1 , x 2 ,..., x n ),
and m 3-variable clauses c (c1 ,c2 ,...,cm ),
is c1 c2 ... cm satisfiable for some x?
c1 x1 x3 x7
c2 x1 x 2 x 4
...
cm x1 x m 1 x m
20
DNA Computing for 3SAT
x1 x2 xn
v0 v1 v2 vn
x1 x2 xn
ALGORITHMS:
1. Encode Graph G into DNA sequences.
2. Create all paths from v0 to vn.
3. For every clause
4. Select sequences that satisfy this clause.
5. Report Yes or No. 21
DNA computing on surfaces
23
Logical computation using algorithmic self-
assembly of DNA triple-crossover molecules.
A C A G C GC T A A T A G T T GGT C C G A T A A GC T T G C T C T T A A GT A T GC C T GGT A A C A A A T GA C GA T GC
• • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • •
T G T C GC GA T T A T C A A C C A G GC T A T T C GA A C G A GA A T T C A T A C GG A C C A T T G T T T A C T G
C T A C C T GT C C T G C GT T C T A GA C A T C A G GA GT C CA GA
• • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • •
GA T GG A CA GG A C GC A A GA T C T GT A GT C C T C A G GT C C
Xba I
T A C G G T GGA G C G GA T A T C GT C C A T G C T C GA G T G A GG A C GA A T C T C A GT A GT G C T G A A G
• • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • •
A T GC C A C C T C GC C T A T A G C A G G T A C GA GC T C A C T C C T G C T T A G A GT C A T C A C GA C T T C C G GA A C T
BsrB I EcoR V Xho I
tiles
25
Nanoarray microscopy readout
(vs gel assays)
27
Nano-ElectroMechanical Systems (NEMS)
750 to 1400 nm
gbiotinyl Cys
b-his tags
Ni 80 nm
Meller, et al. (2000) "Rapid nanopore discrimination between single polynucleotide molecules."
PNAS 1079-84. Akeson et al. Microsecond time-scale discrimination among polyC, polyA, and
polyU as homopolymers or as segments within single RNA molecules. Biophys J 1999;77:3227-33 29
poly(dA)100 & poly(dC)100 at 15°C
32
A synthetic SsrA 11-aa 'lite' tags reduce repressor
half-life from > 60 min to ~4 min.
oscillatory
network of
transcriptional
regulators
Insets: normalized
Continuous model Stochastic similar parameters
autocorrelation of the
first repressor
Single
cell
GFP
levels
Elowitz &Leibler, 35
Nature 2000;403:335-8
Internal state sensors
37
Genetic Algorithms (GA)
38
Genetic Operations
Mutation Crossover
…ACCGGTTACGTTGGA… …ACCGGTTTTCGTTGGA…
…CGTACGCCGTTTACCC…
…ACCGGTTGCGTTGGA…
…ACCGGTTTGTTTACCC…
…CGTACGCCTCGTTGGA…
39
SAGA: Sequence Alignment by Genetic Algorithm
[DP: O(2NLN) N sequences length L]
A one point crossover
40
C. Notredame & D. G. Higgins, 1996 (Pub)
SAGA continues
42
Net2: Bio-algorithms
43
Artificial Neural Networks
44
Neural Networks
(ANN)
45
An ORF Classification Example
Optimal Linear Separation (minimum errors)
46
Measuring Exons
Exon1 Exon2 Exon3
Intron1 Intron2
Exon Features {
Donor Site Score,
Acceptor Site Score,
In-frame 2-Codon Score,
Exon Length (log),
Intron Scores,
…… }
47
Linear Discriminate Function
and Single Layer Neural Network
Output
Exon: e=(x1 x2...xd)
y
A linear separator :
d
y ( w i xi ) w0 w0 w1 wd
i 1
Output
y
1
w0 wd f (a )
w1 1 ea
x0 x1 xd
Sigmoid Function
Inputs
d
y f ( wi xi )
i 0
49
Determining Edge Weights from
Training Sets
Given a set of n known exons/nonexons :
(e1 ,t1 ), (e2 ,t 2 ),..., (en ,t n )
Step1 Initialize w
Step2 Sum of squares error function :
n
E (w ) 1
2 { f (e k , w ) t k }
2
k 1
Step3 Updating w j
1 E ( w )
wj wj |w
w j 50
Non-linear Discrimination
x2
x1
51
The Multi-Layer Perceptron
y
3
Output y g ( wi( 2) zi )
i 1
d
Hidden z1 z2 z3
z j f ( w xi )(1)
ji
Layer i 0
Inputs x0 x1 xd
54