
Mathematical Biology

Lecture notes for MATH 4333

Jeffrey R. Chasnov

The Hong Kong University of Science and Technology
Department of Mathematics
Clear Water Bay, Kowloon
Hong Kong

Copyright © 2009-2016 by Jeffrey Robert Chasnov

This work is licensed under the Creative Commons Attribution 3.0 Hong Kong License. To view a copy of this license, visit
http://creativecommons.org/licenses/by/3.0/hk/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San
Francisco, California, 94105, USA.
Preface
What follows are my lecture notes for Math 4333: Mathematical Biology, taught at the Hong Kong University of Science and Technology. This applied mathematics course is meant primarily for final-year mathematics major and minor students. Other students are also welcome to enroll, but must have the necessary mathematical skills.

My main emphasis is on mathematical modeling, with biology the sole application area. I assume that students have no knowledge of biology, but I hope that they will learn a substantial amount during the course. Students are required to know differential equations and linear algebra, and this usually means having taken two courses in these subjects. I also touch on topics in stochastic modeling, which requires some knowledge of probability. A full course on probability, however, is not a prerequisite, though it might be helpful.

Biology, as it is usually taught, requires memorizing a wide selection of facts and remembering them for exams, sometimes forgetting them soon after. For students exposed to biology in secondary school, my course may seem like a different subject. The ability to model problems using mathematics requires almost no rote memorization, but it does require a deep understanding of basic principles and a wide range of mathematical techniques. Biology offers a rich variety of topics that are amenable to mathematical modeling, and I have chosen specific topics that I have found to be the most interesting.

If, as a UST student, you have not yet decided whether you will take my course, please browse these lecture notes to see if you are interested in these topics. Other web surfers are welcome to download these notes from
http://www.math.ust.hk/~machas/mathematical-biology.pdf
and to use them freely for teaching and learning. I welcome any comments, suggestions, or corrections sent to me by email (jeffrey.chasnov@ust.hk). Although most of the material in my notes can be found elsewhere, I hope that some of it will be considered original.

Jeffrey R. Chasnov
Hong Kong
May 2009



Contents

1 Population Dynamics
1.1 The Malthusian growth model
1.2 The logistic equation
1.3 A model of species competition
1.4 The Lotka-Volterra predator-prey model

2 Age-structured Populations
2.1 Fibonacci's rabbits
2.2 The golden ratio Φ
2.3 The Fibonacci numbers in a sunflower
2.4 Rabbits are an age-structured population
2.5 Discrete age-structured populations
2.6 Continuous age-structured populations
2.7 The brood size of a hermaphroditic worm

3 Stochastic Population Growth
3.1 A stochastic model of population growth
3.2 Asymptotics of large initial populations
3.2.1 Derivation of the deterministic model
3.2.2 Derivation of the normal probability distribution
3.3 Simulation of population growth

4 Infectious Disease Modeling
4.1 The SI model
4.2 The SIS model
4.3 The SIR epidemic disease model
4.4 Vaccination
4.5 The SIR endemic disease model
4.6 Evolution of virulence

5 Population Genetics
5.1 Haploid genetics
5.1.1 Spread of a favored allele
5.1.2 Mutation-selection balance
5.2 Diploid genetics
5.2.1 Sexual reproduction
5.2.2 Spread of a favored allele
5.2.3 Mutation-selection balance
5.2.4 Heterosis
5.3 Frequency-dependent selection
5.4 Linkage equilibrium
5.5 Random genetic drift

6 Biochemical Reactions
6.1 The law of mass action
6.2 Enzyme kinetics
6.3 Competitive inhibition
6.4 Allosteric inhibition
6.5 Cooperativity

7 Sequence Alignment
7.1 DNA
7.2 Brute force alignment
7.3 Dynamic programming
7.4 Gaps
7.5 Local alignments
7.6 Software
Chapter 1

Population Dynamics
Populations grow in size when the birth rate exceeds the death rate. Thomas Malthus, in An Essay on the Principle of Population (1798), used unchecked population growth to famously predict a global famine unless governments regulated family size, an idea later echoed by mainland China's one-child policy. In his autobiography, Charles Darwin says that reading Malthus inspired his discovery of what is now the cornerstone of modern biology: the principle of evolution by natural selection.

The Malthusian growth model is the grandfather of all population models, and we begin this chapter with a simple derivation of the famous exponential growth law. Unchecked exponential growth obviously does not occur in nature. Population growth rates may be regulated by limited food or other environmental resources, and by competition among individuals within a species or across species. We will develop models for three types of regulation. The first model is the well-known logistic equation, a model that will also appear in subsequent chapters. The second model is an extension of the logistic model to species competition. The third model is the famous Lotka-Volterra predator-prey equations. Because all these mathematical models are nonlinear differential equations, we will also develop mathematical methods to analyze such equations.

1.1 The Malthusian growth model

Let $N(t)$ be the number of individuals in a population at time $t$, and let $b$ and $d$ be the average per capita birth rate and death rate, respectively. In a short time $\Delta t$, the number of births in the population is $b\,\Delta t\,N$, and the number of deaths is $d\,\Delta t\,N$. An equation for $N$ at time $t + \Delta t$ is then determined to be

$$N(t+\Delta t) = N(t) + b\,\Delta t\,N(t) - d\,\Delta t\,N(t),$$

which can be rearranged to

$$\frac{N(t+\Delta t) - N(t)}{\Delta t} = (b-d)N(t);$$

and as $\Delta t \to 0$,

$$\frac{dN}{dt} = (b-d)N.$$

With initial population size $N_0$, and with $r = b - d$ positive, the solution for $N = N(t)$ grows exponentially:

$$N(t) = N_0 e^{rt}.$$

With population size replaced by the amount of money in a bank, the exponential growth law also describes the growth of an account under continuous compounding with interest rate $r$.
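The following MATLAB sketch is an illustration, not part of the original notes. It iterates the discrete update for $N(t + \Delta t)$ with shrinking $\Delta t$ and compares the result with $N_0 e^{rt}$. The values of $b$, $d$, $N_0$ and the final time $T$ are arbitrary assumptions.

```matlab
% Iterate N(t+dt) = N(t) + b*dt*N(t) - d*dt*N(t) and compare with N0*exp(r*T).
% b, d, N0 and T are illustrative assumptions, not values from the notes.
b = 0.3; d = 0.1; r = b - d;        % per capita birth and death rates
N0 = 100; T = 10;                   % initial population size and final time
Nexact = N0*exp(r*T);               % exponential growth law
for dt = [1 0.1 0.01 0.001]
  N = N0;
  for k = 1:round(T/dt)
    N = N + b*dt*N - d*dt*N;        % births minus deaths during dt
  end
  relerr = abs(N - Nexact)/Nexact;
  fprintf('dt = %6.3f   N(T) = %9.4f   relative error = %.2e\n', dt, N, relerr);
end
```

With these assumed values, the relative error falls roughly in proportion to $\Delta t$. This is consistent with the limit $\Delta t \to 0$ taken above.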


1.2 The logistic equation

The exponential growth law for population size is unrealistic over long times. Eventually, growth will be checked by the over-consumption of resources. We assume that the environment has an intrinsic carrying capacity $K$, and that populations larger than this size experience heightened death rates.

To model population growth with an environmental carrying capacity $K$, we look for a nonlinear equation of the form

$$\frac{dN}{dt} = rNF(N),$$

where $F(N)$ provides a model for environmental regulation. This function should satisfy three conditions:

- $F(0) = 1$: the population grows exponentially with growth rate $r$ when $N$ is small.
- $F(K) = 0$: the population stops growing at the carrying capacity.
- $F(N) < 0$ when $N > K$: the population decays when it is larger than the carrying capacity.

The simplest function $F(N)$ satisfying these conditions is linear and given by $F(N) = 1 - N/K$. The resulting model is the well-known logistic equation,

$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right), \tag{1.1}$$

an important model for many processes besides bounded population growth.


Although (1.1) is a nonlinear equation, an analytical solution can be found by separating the variables. Before we embark on this algebra, we first illustrate some basic concepts used in analyzing nonlinear differential equations.

Fixed points, also called equilibria, of a differential equation such as (1.1) are defined as the values of $N$ where $dN/dt = 0$. Here, we see that the fixed points of (1.1) are $N = 0$ and $N = K$. If the initial value of $N$ is at one of these fixed points, then $N$ will remain fixed there for all time. Fixed points, however, can be stable or unstable. A fixed point is stable if a small perturbation from the fixed point decays to zero, so that the solution returns to the fixed point. Likewise, a fixed point is unstable if a small perturbation grows exponentially, so that the solution moves away from the fixed point. Calculation of stability by means of small perturbations is called linear stability analysis. For example, consider the general one-dimensional differential equation (using the dot notation $\dot{x} = dx/dt$)

$$\dot{x} = f(x), \tag{1.2}$$

with $x_*$ a fixed point of the equation, that is, $f(x_*) = 0$. To determine analytically whether $x_*$ is a stable or unstable fixed point, we perturb the solution. Let us write our solution $x = x(t)$ in the form

$$x(t) = x_* + \epsilon(t), \tag{1.3}$$

where initially $\epsilon(0)$ is small but different from zero. Substituting (1.3) into (1.2), we obtain

$$\begin{aligned}
\dot{\epsilon} &= f(x_* + \epsilon)\\
&= f(x_*) + \epsilon f'(x_*) + \dots\\
&= \epsilon f'(x_*) + \dots,
\end{aligned}$$

where the second equality uses a Taylor series expansion of $f(x)$ about $x_*$, and the third equality uses $f(x_*) = 0$. If $f'(x_*) \neq 0$, we can neglect higher-order terms in $\epsilon$ for small times. Integrating, we have

$$\epsilon(t) = \epsilon(0)\,e^{f'(x_*)t}.$$

The perturbation $\epsilon(t)$ to the fixed point $x_*$ goes to zero as $t \to \infty$ provided $f'(x_*) < 0$. Therefore, the stability condition on $x_*$ is

$$x_* \text{ is } \begin{cases}
\text{a stable fixed point if } f'(x_*) < 0,\\
\text{an unstable fixed point if } f'(x_*) > 0.
\end{cases}$$

[Figure 1.1: Determining one-dimensional stability using a graphical approach. The plot shows $f(x)$ against $x$, with three fixed points marked unstable, stable, and unstable.]

An equivalent but sometimes simpler approach to analyzing the stability of the fixed points of a one-dimensional nonlinear equation such as (1.2) is to plot $f(x)$ versus $x$. We show a generic example in Fig. 1.1. The fixed points are the $x$-intercepts of the graph. Directional arrows on the $x$-axis can be drawn based on the sign of $f(x)$. If $f(x) < 0$, then the arrow points to the left; if $f(x) > 0$, then the arrow points to the right. The arrows show the direction of motion for a particle at position $x$ satisfying $\dot{x} = f(x)$. As illustrated in Fig. 1.1, fixed points with arrows on both sides pointing in are stable, and fixed points with arrows on both sides pointing out are unstable.
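The arrow rule can be reproduced numerically by checking the sign of $f$ just to the left and right of each fixed point. This is a minimal sketch under stated assumptions. The cubic $f(x) = x(x-1)(x-2)$ and the offset `h` are chosen only because they reproduce the unstable, stable, unstable pattern of Fig. 1.1.

```matlab
% Classify fixed points of xdot = f(x) by the direction of the arrows on
% either side, as in the graphical method. f and h are illustrative choices.
f = @(x) x.*(x - 1).*(x - 2);       % assumed example with three fixed points
xstar = [0 1 2];                     % x-intercepts of f
h = 1e-3;                            % small offset to either side
kinds = cell(size(xstar));
for k = 1:numel(xstar)
  left  = f(xstar(k) - h);           % > 0: arrow on the left points right
  right = f(xstar(k) + h);           % < 0: arrow on the right points left
  if left > 0 && right < 0
    kinds{k} = 'stable';             % both arrows point in
  elseif left < 0 && right > 0
    kinds{k} = 'unstable';           % both arrows point out
  else
    kinds{k} = 'undetermined';       % arrows point the same way
  end
  fprintf('x* = %g is %s\n', xstar(k), kinds{k});
end
```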

In the logistic equation (1.1), the fixed points are $N_* = 0$ and $N_* = K$. A sketch of $F(N) = rN(1 - N/K)$ versus $N$, with $r, K > 0$, in Fig. 1.2 immediately shows that $N_* = 0$ is an unstable fixed point and $N_* = K$ is a stable fixed point. The analytical approach computes $F'(N) = r(1 - 2N/K)$, so that $F'(0) = r > 0$ and $F'(K) = -r < 0$. Again we conclude that $N_* = 0$ is unstable and $N_* = K$ is stable.
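The analytical test can also be carried out numerically. In the sketch below, $F'(N_*)$ is estimated with a centered finite difference. The values $r = 1.5$ and $K = 100$ and the step `h` are illustrative assumptions. Because $F$ is quadratic, the centered difference reproduces $F'(0) = r$ and $F'(K) = -r$ up to rounding error.

```matlab
% Linear stability of the logistic fixed points N* = 0 and N* = K.
% r, K and the finite-difference step h are illustrative assumptions.
r = 1.5; K = 100;
F = @(N) r*N.*(1 - N/K);                 % right-hand side of (1.1)
h = 1e-4;
dF = @(N) (F(N + h) - F(N - h))/(2*h);   % centered difference for F'(N)
Nstar = [0 K];
slopes = zeros(size(Nstar));
for k = 1:numel(Nstar)
  slopes(k) = dF(Nstar(k));
  if slopes(k) < 0
    kind = 'stable';
  elseif slopes(k) > 0
    kind = 'unstable';
  else
    kind = 'undetermined (higher-order terms needed)';
  end
  fprintf('N* = %g: F''(N*) = %+.4f, %s\n', Nstar(k), slopes(k), kind);
end
```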

We now solve the logistic equation analytically. Although this relatively simple equation can be solved as is, we first nondimensionalize it. This illustrates a very important technique that will later prove to be most useful. Here one might guess the appropriate unit of time to be $1/r$ and the appropriate unit of population size to be $K$. However, we prefer to demonstrate a more general technique that can be applied to equations whose appropriate dimensionless variables are difficult to guess. We begin by nondimensionalizing time and population size:

$$\tau = t/t_*, \qquad \eta = N/N_*,$$

[Figure 1.2: Determining stability of the fixed points of the logistic equation. The plot shows $F(N)$ against $N$, with the fixed point at $0$ marked unstable and the fixed point at $K$ marked stable.]

where $t_*$ and $N_*$ are unknown dimensional units. The derivative $\dot{N}$ is computed as

$$\frac{dN}{dt} = \frac{d(N_*\eta)}{d\tau}\frac{d\tau}{dt} = \frac{N_*}{t_*}\frac{d\eta}{d\tau}.$$

Therefore, the logistic equation (1.1) becomes

$$\frac{d\eta}{d\tau} = rt_*\,\eta\left(1 - \frac{N_*\eta}{K}\right),$$

which assumes the simplest form with the choices $t_* = 1/r$ and $N_* = K$. Therefore, our dimensionless variables are

$$\tau = rt, \qquad \eta = N/K,$$

and the logistic equation, in dimensionless form, becomes

$$\frac{d\eta}{d\tau} = \eta(1 - \eta), \tag{1.4}$$

with the dimensionless initial condition $\eta(0) = \eta_0 = N_0/K$, where $N_0$ is the initial population size. Note that the dimensionless logistic equation (1.4) has no free parameters, while the dimensional form (1.1) contains $r$ and $K$. Nondimensionalization generally reduces the number of free parameters (here, two: $r$ and $K$) by the number of independent units (here, also two: time and population size). This theoretical result is known as the Buckingham Pi Theorem. Reducing the number of free parameters in a problem to the absolute minimum is especially important before proceeding to a numerical solution, because it can substantially shrink the parameter space that must be explored.
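One way to see the rescaling at work is to integrate (1.1) for two different, arbitrarily chosen pairs $(r, K)$ with the same $N_0/K$, then compare $N/K$ as a function of $rt$. The sketch below does this with ode45 and reports the largest difference between the two rescaled solutions. The parameter values are assumptions. The difference should be at the level of the integration tolerance.

```matlab
% Solutions of (1.1) for different (r, K) but equal N0/K should coincide when
% expressed as eta = N/K against tau = r*t. Parameter values are assumptions.
params = [0.5 1000; 2.0 50];          % each row is one assumed pair [r K]
eta0 = 0.1;                           % common dimensionless initial value N0/K
tau = linspace(0, 8, 81);             % dimensionless output times
opts = odeset('RelTol', 1e-9, 'AbsTol', 1e-12);
eta = zeros(numel(tau), size(params, 1));
for k = 1:size(params, 1)
  r = params(k, 1); K = params(k, 2);
  rhs = @(t, N) r*N*(1 - N/K);        % dimensional logistic equation (1.1)
  [~, N] = ode45(rhs, tau/r, eta0*K, opts);   % output at t = tau/r
  eta(:, k) = N/K;                    % rescale to eta = N/K
end
maxdiff = max(abs(eta(:, 1) - eta(:, 2)));
fprintf('max |eta_1 - eta_2| over 0 <= tau <= 8: %.2e\n', maxdiff);
```

Only the ratio $N_0/K$ enters. This is why a single curve per value of $\eta_0$ suffices in the dimensionless problem.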

The dimensionless logistic equation (1.4) can be solved by separating the variables. Separating and integrating from $0$ to $\tau$ and from $\eta_0$ to $\eta$ yields

$$\int_{\eta_0}^{\eta}\frac{d\eta}{\eta(1-\eta)} = \int_0^{\tau}d\tau.$$

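The integral on the left-hand side can be done using the method of partial fractions:

$$\frac{1}{\eta(1-\eta)} = \frac{A}{\eta} + \frac{B}{1-\eta} = \frac{A + (B-A)\eta}{\eta(1-\eta)};$$

and by equating the coefficients of the numerators proportional to $\eta^0$ and $\eta^1$, we find that $A = 1$ and $B = 1$. Therefore,

$$\begin{aligned}
\int_{\eta_0}^{\eta}\frac{d\eta}{\eta(1-\eta)} &= \int_{\eta_0}^{\eta}\frac{d\eta}{\eta} + \int_{\eta_0}^{\eta}\frac{d\eta}{1-\eta}\\
&= \ln\frac{\eta}{\eta_0} - \ln\frac{1-\eta}{1-\eta_0}\\
&= \ln\frac{\eta(1-\eta_0)}{\eta_0(1-\eta)}\\
&= \tau.
\end{aligned}$$

To solve for $\eta$, we first exponentiate both sides and then isolate $\eta$:

$$\frac{\eta(1-\eta_0)}{\eta_0(1-\eta)} = e^{\tau}, \quad\text{or}\quad \eta(1-\eta_0) = \eta_0e^{\tau} - \eta\eta_0e^{\tau},$$

$$\text{or}\quad \eta\left(1 - \eta_0 + \eta_0e^{\tau}\right) = \eta_0e^{\tau}, \quad\text{or}\quad \eta = \frac{\eta_0}{\eta_0 + (1-\eta_0)e^{-\tau}}.$$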

Returning to the dimensional variables, we finally have

$$N(t) = \frac{N_0}{N_0/K + (1 - N_0/K)e^{-rt}}. \tag{1.5}$$
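As a sanity check on the algebra, the sketch below compares (1.5) with a numerical solution of (1.1) obtained from MATLAB's ode45. The values of $r$, $K$ and $N_0$ are assumptions chosen only for illustration.

```matlab
% Compare the closed-form solution (1.5) with a numerical solution of (1.1).
% r, K and N0 are illustrative values, not taken from the notes.
r = 0.8; K = 500; N0 = 10;
Nexact = @(t) N0 ./ (N0/K + (1 - N0/K)*exp(-r*t));   % equation (1.5)
opts = odeset('RelTol', 1e-10, 'AbsTol', 1e-10);
t = linspace(0, 15, 31);                             % output times
[~, Nnum] = ode45(@(t, N) r*N*(1 - N/K), t, N0, opts);
relerr = max(abs(Nnum(:) - Nexact(t(:))) ./ Nexact(t(:)));
fprintf('max relative error between ode45 and (1.5): %.2e\n', relerr);
```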

There are several ways to write the final result given by (1.5). Presenting mathematical results requires a good aesthetic sense and is an important element of mathematical technique. When deciding how to write (1.5), I considered whether it was easy to observe the following limiting results: (1) $N(0) = N_0$; (2) $\lim_{t\to\infty}N(t) = K$; and (3) $\lim_{K\to\infty}N(t) = N_0\exp(rt)$.
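Each of the three limits follows directly from (1.5). The second assumes $r > 0$ and $N_0 > 0$:

$$N(0) = \frac{N_0}{N_0/K + (1 - N_0/K)} = N_0,$$

$$\lim_{t\to\infty}N(t) = \frac{N_0}{N_0/K + 0} = K,$$

$$\lim_{K\to\infty}N(t) = \frac{N_0}{0 + (1 - 0)e^{-rt}} = N_0e^{rt}.$$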
In Fig. 1.3, we plot the solution to the dimensionless logistic equation for initial conditions $\eta_0 = 0.02$, $0.2$, $0.5$, $0.8$, $1.0$, and $1.2$. The lowest curve has the characteristic 'S-shape' usually associated with the solution of the logistic equation. This sigmoidal curve appears in many other types of models. The MATLAB script that generates Fig. 1.3 is shown below.

```matlab
eta0=[0.02 0.2 0.5 0.8 1 1.2];   % initial conditions eta(0)
tau=linspace(0,8);               % dimensionless time
for i=1:length(eta0)
    % closed-form solution of the dimensionless logistic equation
    eta=eta0(i)./(eta0(i)+(1-eta0(i))*exp(-tau));
    plot(tau,eta); hold on
end
axis([0 8 0 1.25]);
xlabel('\tau'); ylabel('\eta'); title('Logistic Equation');
```

[Figure 1.3: Solutions of the dimensionless logistic equation (panel title 'Logistic Equation'), showing $\eta$ against $\tau$ for $0 \le \tau \le 8$.]

1.3 A model of species competition

Suppose that two species compete for the same resources. To build a model, we can start with logistic equations for both species. Different species would have different growth rates and different carrying capacities. If we let $N_1$ and $N_2$ be the number of individuals of species one and species two, then

$$\frac{dN_1}{dt} = r_1N_1(1 - N_1/K_1), \qquad \frac{dN_2}{dt} = r_2N_2(1 - N_2/K_2).$$

These equations are uncoupled, so that asymptotically $N_1 \to K_1$ and $N_2 \to K_2$.


How do we model the competition between species? If $N_1$ is much smaller than $K_1$, and $N_2$ much smaller than $K_2$, then resources are plentiful and the populations grow exponentially with growth rates $r_1$ and $r_2$. If species one and two compete, then the growth of species one reduces the resources available to species two, and vice versa. Since we do not know the impact species one and two have on each other, we introduce two additional parameters to model the competition. A reasonable modification that couples the two logistic equations is

$$\frac{dN_1}{dt} = r_1N_1\left(1 - \frac{N_1 + \alpha_{12}N_2}{K_1}\right), \tag{1.6a}$$

$$\frac{dN_2}{dt} = r_2N_2\left(1 - \frac{\alpha_{21}N_1 + N_2}{K_2}\right), \tag{1.6b}$$

where $\alpha_{12}$ and $\alpha_{21}$ are dimensionless parameters that model the consumption of species one's resources by species two, and vice versa. For example, suppose that both species eat the same food, but species two consumes twice as much as species one. Since one individual of species two consumes the equivalent of two individuals of species one, the correct model is $\alpha_{12} = 2$ and $\alpha_{21} = 1/2$.


Another example supposes that species one and two occupy the same niche and consume resources at the same rate, but may have different growth rates and carrying capacities. Can the species coexist, or does one species eventually drive the other to extinction? It is possible to answer this question without actually solving the differential equations. With $\alpha_{12} = \alpha_{21} = 1$, as appropriate for this example, the coupled logistic equations (1.6) become

$$\frac{dN_1}{dt} = r_1N_1\left(1 - \frac{N_1 + N_2}{K_1}\right), \qquad \frac{dN_2}{dt} = r_2N_2\left(1 - \frac{N_1 + N_2}{K_2}\right). \tag{1.7}$$

For the sake of argument, we assume that $K_1 > K_2$. The only fixed points other than the trivial one $(N_1, N_2) = (0, 0)$ are $(N_1, N_2) = (K_1, 0)$ and $(N_1, N_2) = (0, K_2)$. Stability can be computed analytically by a two-dimensional Taylor-series expansion, but here a simpler argument suffices.

We first consider $(N_1, N_2) = (K_1, \epsilon)$, with $\epsilon$ small. Since $K_1 > K_2$, observe from (1.7) that $\dot{N}_2 < 0$, so that species two goes extinct. Therefore $(N_1, N_2) = (K_1, 0)$ is a stable fixed point. Now consider $(N_1, N_2) = (\epsilon, K_2)$, with $\epsilon$ small. Again, since $K_1 > K_2$, observe from (1.7) that $\dot{N}_1 > 0$, so species one increases in number. Therefore $(N_1, N_2) = (0, K_2)$ is an unstable fixed point.

We have thus found that, within our coupled logistic model, species that occupy the same niche and consume resources at the same rate cannot coexist. The species with the largest carrying capacity will survive and drive the other species to extinction. This is the so-called principle of competitive exclusion, also called $K$-selection, since the species with the largest carrying capacity wins. Ecologists also talk about $r$-selection, in which the species with the largest growth rate wins. Our coupled logistic model does not model $r$-selection, which shows the potential limitations of an overly simple mathematical model.
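The competitive-exclusion argument can be illustrated with a short simulation of (1.7). In this sketch the parameter values are assumptions. As above, $K_1 > K_2$, and species two is given the larger growth rate ($r_2 > r_1$). Species two should still die out, consistent with the remark that this model does not capture $r$-selection.

```matlab
% Simulate the coupled logistic equations (1.7) with K1 > K2 and r2 > r1.
% All parameter values are illustrative assumptions.
r1 = 0.5; r2 = 2.0; K1 = 100; K2 = 80;
rhs = @(t, N) [r1*N(1)*(1 - (N(1) + N(2))/K1); ...
               r2*N(2)*(1 - (N(1) + N(2))/K2)];
opts = odeset('RelTol', 1e-8, 'AbsTol', 1e-10);
[t, N] = ode45(rhs, [0 200], [5; 5], opts);   % equal initial populations
fprintf('t = %g: N1 = %.4f, N2 = %.4e\n', t(end), N(end, 1), N(end, 2));
```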

For some values of $\alpha_{12}$ and $\alpha_{21}$, our model admits a stable equilibrium solution in which the two species coexist. Calculating the fixed points and their stability is more complicated than the calculation just done, and I present only the result. Stable coexistence of the two species within our model is possible only if $\alpha_{12}K_2 < K_1$ and $\alpha_{21}K_1 < K_2$.
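The sketch below is an illustration only and does not prove the stated condition. It picks one assumed parameter set satisfying $\alpha_{12}K_2 < K_1$ and $\alpha_{21}K_1 < K_2$. It then computes the coexistence fixed point from the linear equations $N_1 + \alpha_{12}N_2 = K_1$ and $\alpha_{21}N_1 + N_2 = K_2$, which come from setting the brackets in (1.6) to zero. Finally, it integrates (1.6) with ode45 to see whether the trajectory approaches that fixed point.

```matlab
% Coexistence in the competition model (1.6) for one illustrative parameter
% set satisfying a12*K2 < K1 and a21*K1 < K2 (all values are assumptions).
r1 = 1; r2 = 1; K1 = 100; K2 = 80; a12 = 0.5; a21 = 0.5;
% Interior fixed point: N1 + a12*N2 = K1 and a21*N1 + N2 = K2.
Nstar = [1 a12; a21 1] \ [K1; K2];
rhs = @(t, N) [r1*N(1)*(1 - (N(1) + a12*N(2))/K1); ...
               r2*N(2)*(1 - (a21*N(1) + N(2))/K2)];
opts = odeset('RelTol', 1e-8, 'AbsTol', 1e-10);
[t, N] = ode45(rhs, [0 100], [10; 10], opts);
fprintf('fixed point (%.2f, %.2f); N(%g) = (%.4f, %.4f)\n', ...
        Nstar(1), Nstar(2), t(end), N(end, 1), N(end, 2));
```

For these assumed values, the fixed point is $(N_1, N_2) = (80, 40)$ and the computed solution settles onto it.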

1.4 The Lotka-Volterra predator-prey model

Pelt-trading records of the Hudson's Bay Company (Fig. 1.4), spanning almost a century, display a near-periodic oscillation in the numbers of trapped snowshoe hares and lynxes. Assume, reasonably, that the recorded number of trapped animals is proportional to the animal population. These records then suggest that predator-prey populations, as typified by the hare and the lynx, can oscillate over time. Lotka and Volterra independently proposed a mathematical model for the population dynamics of a predator and prey in the 1920s. These Lotka-Volterra predator-prey equations have since become an iconic model of mathematical biology.

To develop these equations, suppose that a predator population feeds on a prey population. We assume that the number of prey grows exponentially in the absence of predators (there is unlimited food available to the prey). We also assume that the number of predators decays exponentially in the absence of prey (predators must eat prey or starve). Contact between predators and prey increases the number of predators and decreases the number of prey.

Let $U(t)$ and $V(t)$ be the number of prey and predators at time $t$. To develop a coupled differential equation model, we consider the population sizes at time $t + \Delta t$.

[Figure 1.4: Pelt-trading records of the Hudson's Bay Company for the snowshoe hare and its predator the lynx. From E. P. Odum, Fundamentals of Ecology, 1953.]

Exponential growth of prey in the absence of predators and exponential decay of predators in the absence of prey can be modeled by the usual linear terms. The coupling between prey and predators must be modeled with two additional parameters. We write the population sizes at time $t + \Delta t$ as

$$\begin{aligned}
U(t+\Delta t) &= U(t) + \alpha\,\Delta t\,U(t) - \gamma\,\Delta t\,U(t)V(t),\\
V(t+\Delta t) &= V(t) + e\gamma\,\Delta t\,U(t)V(t) - \beta\,\Delta t\,V(t).
\end{aligned}$$

The parameters $\alpha$ and $\beta$ are, respectively, the average per capita birth rate of the prey and the death rate of the predators, each in the absence of the other species. The coupling terms model contact between predators and prey. The parameter $\gamma$ is the fraction of prey caught per predator per unit time, so the total number of prey caught by predators during time $\Delta t$ is $\gamma\,\Delta t\,UV$. The prey eaten are then converted into newborn predators with conversion factor $e$ (view this as a conversion of biomass). Thus the number of predators increases by $e\gamma\,\Delta t\,UV$ during time $\Delta t$.

Converting these equations into differential equations by letting $\Delta t \to 0$, we obtain the well-known Lotka-Volterra predator-prey equations

$$\frac{dU}{dt} = \alpha U - \gamma UV, \qquad \frac{dV}{dt} = e\gamma UV - \beta V. \tag{1.8}$$

Before analyzing the Lotka-Volterra equations, we first review fixed point and linear stability analysis applied to what are called autonomous systems of differential equations. For simplicity, we consider a system of only two differential equations of the form

$$\dot{x} = f(x, y), \qquad \dot{y} = g(x, y), \tag{1.9}$$

though our results can be generalized to larger systems. The system (1.9) is said to be autonomous since $f$ and $g$ do not depend explicitly on the independent variable $t$. Fixed points of this system are determined by setting $\dot{x} = \dot{y} = 0$ and solving for $x$ and $y$. Suppose that one fixed point is $(x_*, y_*)$. To determine its linear stability, we consider initial conditions for $(x, y)$ near the fixed point, with small independent perturbations in both directions: $x(0) = x_* + \epsilon(0)$, $y(0) = y_* + \delta(0)$. If the initial perturbation grows in time, we say that the fixed point is unstable; if it decays, we say that the fixed point is stable. Accordingly, we let

$$x(t) = x_* + \epsilon(t), \qquad y(t) = y_* + \delta(t), \tag{1.10}$$

and substitute (1.10) into (1.9) to determine the time dependence of $\epsilon$ and $\delta$. Since $x_*$ and $y_*$ are constants, we have

$$\dot{\epsilon} = f(x_* + \epsilon, y_* + \delta), \qquad \dot{\delta} = g(x_* + \epsilon, y_* + \delta).$$

Linear stability analysis assumes that the initial perturbations $\epsilon(0)$ and $\delta(0)$ are small enough to truncate the two-dimensional Taylor-series expansions of $f$ and $g$ about $\epsilon = \delta = 0$ at first order in $\epsilon$ and $\delta$. In general, the two-dimensional Taylor series of a function $F(x, y)$ about the origin is given by

$$F(x,y) = F(0,0) + xF_x(0,0) + yF_y(0,0) + \frac{1}{2}\left[x^2F_{xx}(0,0) + 2xyF_{xy}(0,0) + y^2F_{yy}(0,0)\right] + \dots,$$

where the terms in the expansion can be remembered by requiring that all of the partial derivatives of the series agree with those of $F(x, y)$ at the origin. We now Taylor-series expand $f(x_* + \epsilon, y_* + \delta)$ and $g(x_* + \epsilon, y_* + \delta)$ about $(\epsilon, \delta) = (0, 0)$. The constant terms vanish since $(x_*, y_*)$ is a fixed point, and we neglect all terms of higher order in $\epsilon$ and $\delta$. Therefore,

$$\dot{\epsilon} = \epsilon f_x(x_*, y_*) + \delta f_y(x_*, y_*), \qquad \dot{\delta} = \epsilon g_x(x_*, y_*) + \delta g_y(x_*, y_*),$$

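which may be written in matrix form as

$$\frac{d}{dt}\begin{pmatrix}\epsilon\\ \delta\end{pmatrix} = \begin{pmatrix} f_x^* & f_y^*\\ g_x^* & g_y^* \end{pmatrix}\begin{pmatrix}\epsilon\\ \delta\end{pmatrix}, \tag{1.11}$$

where $f_x^* = f_x(x_*, y_*)$, etc. Equation (1.11) is a system of linear ODEs, and its solution proceeds by assuming the form

$$\begin{pmatrix}\epsilon\\ \delta\end{pmatrix} = e^{\lambda t}\mathbf{v}. \tag{1.12}$$

Upon substituting (1.12) into (1.11) and canceling $e^{\lambda t}$, we obtain the linear algebra eigenvalue problem

$$J_*\mathbf{v} = \lambda\mathbf{v}, \qquad \text{with} \quad J_* = \begin{pmatrix} f_x^* & f_y^*\\ g_x^* & g_y^* \end{pmatrix},$$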

where $\lambda$ is the eigenvalue, $\mathbf{v}$ the corresponding eigenvector, and $J_*$ the Jacobian matrix evaluated at the fixed point. The eigenvalues are determined from the characteristic equation

$$\det(J_* - \lambda I) = 0,$$

which for a two-by-two Jacobian matrix is a quadratic equation for $\lambda$. From the form of the solution (1.12), the fixed point is stable if $\operatorname{Re}\{\lambda\} < 0$ for all eigenvalues $\lambda$. It is unstable if $\operatorname{Re}\{\lambda\} > 0$ for at least one $\lambda$. Here $\operatorname{Re}\{\lambda\}$ denotes the real part of the (possibly) complex eigenvalue $\lambda$.
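This recipe translates directly into a few lines of MATLAB. The sketch below estimates $J_*$ by centered differences rather than by hand, then applies the eigenvalue test. The example system and its values are illustrative choices, not part of the notes. It is the competition model (1.6) with the assumed values $r_1 = r_2 = 1$, $K_1 = 100$, $K_2 = 80$ and $\alpha_{12} = \alpha_{21} = 1/2$. Its fixed point $(80, 40)$ can be checked by substitution.

```matlab
% Linear stability of a fixed point of xdot = f(x, y), ydot = g(x, y):
% estimate the Jacobian by centered differences and inspect its eigenvalues.
% The example system and parameter values are illustrative assumptions.
r1 = 1; r2 = 1; K1 = 100; K2 = 80; a12 = 0.5; a21 = 0.5;
rhs = @(z) [r1*z(1)*(1 - (z(1) + a12*z(2))/K1); ...
            r2*z(2)*(1 - (a21*z(1) + z(2))/K2)];
zstar = [80; 40];                      % fixed point of this example
assert(norm(rhs(zstar)) < 1e-12);      % confirm it really is a fixed point
h = 1e-6;
J = zeros(2);
for j = 1:2
  dz = zeros(2, 1); dz(j) = h;
  J(:, j) = (rhs(zstar + dz) - rhs(zstar - dz))/(2*h);   % column j: d/dz_j
end
lambda = eig(J);
disp(J); disp(lambda.');
if all(real(lambda) < 0)
  disp('stable: all eigenvalues have negative real part');
elseif any(real(lambda) > 0)
  disp('unstable: at least one eigenvalue has positive real part');
else
  disp('linear analysis inconclusive (eigenvalues on the imaginary axis)');
end
```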


We now return to the Lotka-Volterra equations. Fixed point solutions are found by solving $\dot{U} = \dot{V} = 0$, and from (1.8) we have

$$U(\alpha - \gamma V) = 0, \qquad V(e\gamma U - \beta) = 0.$$

Evidently, the only two possible solutions are

$$(U_*, V_*) = (0, 0) \quad \text{or} \quad \left(\frac{\beta}{e\gamma}, \frac{\alpha}{\gamma}\right).$$

The trivial fixed point $(0, 0)$ is unstable, since the prey population grows exponentially if it is initially small. To determine the stability of the second fixed point, we write the Lotka-Volterra equations in the form

$$\frac{dU}{dt} = F(U, V), \qquad \frac{dV}{dt} = G(U, V),$$

with

$$F(U, V) = \alpha U - \gamma UV, \qquad G(U, V) = e\gamma UV - \beta V.$$

The partial derivatives are then computed to be

$$\begin{aligned}
F_U &= \alpha - \gamma V, & F_V &= -\gamma U,\\
G_U &= e\gamma V, & G_V &= e\gamma U - \beta.
\end{aligned}$$

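The Jacobian at the fixed point $(U_*, V_*) = (\beta/(e\gamma), \alpha/\gamma)$ is

$$J_* = \begin{pmatrix} 0 & -\beta/e\\ e\alpha & 0 \end{pmatrix};$$

and

$$\det(J_* - \lambda I) = \begin{vmatrix} -\lambda & -\beta/e\\ e\alpha & -\lambda \end{vmatrix} = \lambda^2 + \alpha\beta = 0$$

has the solutions $\lambda_\pm = \pm i\sqrt{\alpha\beta}$, which are pure imaginary. When the eigenvalues of a two-by-two Jacobian are pure imaginary, the fixed point is called a center. The perturbation then neither grows nor decays, but oscillates. Here, the angular frequency of oscillation is $\omega = \sqrt{\alpha\beta}$, and the period of the oscillation is $2\pi/\omega$.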
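These results are easy to confirm numerically. The sketch below uses assumed parameter values. It computes the eigenvalues of $J_*$ with MATLAB's eig, then integrates (1.8) with ode45 over one period $2\pi/\omega$, starting from a small perturbation of $(U_*, V_*)$. It reports how far the solution ends from its starting point, which for a small perturbation should be very close.

```matlab
% Eigenvalues of J* at (U*, V*) = (beta/(e*gamma), alpha/gamma) for (1.8), and
% a check that a small perturbation returns after one period 2*pi/omega.
% alpha, beta, gamma and e are illustrative assumptions.
alpha = 1.0; beta = 0.5; gamma = 0.02; e = 0.1;
Ustar = beta/(e*gamma); Vstar = alpha/gamma;    % nontrivial fixed point
Jstar = [0, -beta/e; e*alpha, 0];               % Jacobian derived above
lambda = eig(Jstar);                            % expect +/- i*sqrt(alpha*beta)
omega = sqrt(alpha*beta); T = 2*pi/omega;
fprintf('eigenvalue: %.3g %+.6fi\n', [real(lambda) imag(lambda)].');
fprintf('omega = %.6f, period = %.6f\n', omega, T);
rhs = @(t, z) [alpha*z(1) - gamma*z(1)*z(2); e*gamma*z(1)*z(2) - beta*z(2)];
z0 = [Ustar*(1 + 1e-3); Vstar];                 % small perturbation of the prey
opts = odeset('RelTol', 1e-10, 'AbsTol', 1e-10);
[~, z] = ode45(rhs, [0 T], z0, opts);
reldist = norm(z(end, :)' - z0)/norm(z0);
fprintf('relative distance from start after one period: %.2e\n', reldist);
```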

To see how the solutions behave, we plot $U$ and $V$ versus $t$ (time-series plot) and $V$ versus $U$ (phase-space diagram). For a nonlinear system of equations such as (1.8), a numerical solution is required.

The Lotka-Volterra equations have four free parameters: $\alpha$, $\beta$, $\gamma$ and $e$. The relevant units here are time, the number of prey, and the number of predators. The Buckingham Pi Theorem predicts that nondimensionalizing the equations can reduce the number of free parameters by three, leaving a single, manageable dimensionless grouping of parameters. We nondimensionalize time using the angular frequency of oscillation, and the numbers of prey and predators using their fixed point values. With carets denoting the dimensionless variables, we let

$$\hat{t} = \sqrt{\alpha\beta}\,t, \qquad \hat{U} = U/U_* = \frac{e\gamma}{\beta}U, \qquad \hat{V} = V/V_* = \frac{\gamma}{\alpha}V. \tag{1.13}$$


Substituting (1.13) into the Lotka-Volterra equations (1.8) results in the dimensionless equations

$$\frac{d\hat{U}}{d\hat{t}} = r(\hat{U} - \hat{U}\hat{V}), \qquad \frac{d\hat{V}}{d\hat{t}} = \frac{1}{r}(\hat{U}\hat{V} - \hat{V}),$$

with the single dimensionless grouping $r = \sqrt{\alpha/\beta}$. Specifying $r$ together with the initial conditions completely determines the solution. Note that the long-time solutions of the Lotka-Volterra equations depend on the initial conditions. This asymptotic dependence on the initial conditions is usually considered a flaw of the model.
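The substitution behind these equations is short enough to write out. From (1.13), $U = (\beta/e\gamma)\hat{U}$, $V = (\alpha/\gamma)\hat{V}$ and $d/dt = \sqrt{\alpha\beta}\,d/d\hat{t}$. Each side of (1.8) then becomes

$$\frac{dU}{dt} = \frac{\beta\sqrt{\alpha\beta}}{e\gamma}\frac{d\hat{U}}{d\hat{t}}, \qquad \alpha U - \gamma UV = \frac{\alpha\beta}{e\gamma}\left(\hat{U} - \hat{U}\hat{V}\right),$$

$$\frac{dV}{dt} = \frac{\alpha\sqrt{\alpha\beta}}{\gamma}\frac{d\hat{V}}{d\hat{t}}, \qquad e\gamma UV - \beta V = \frac{\alpha\beta}{\gamma}\left(\hat{U}\hat{V} - \hat{V}\right).$$

Dividing gives the prefactors $\alpha\beta/(\beta\sqrt{\alpha\beta}) = \sqrt{\alpha/\beta} = r$ and $\alpha\beta/(\alpha\sqrt{\alpha\beta}) = \sqrt{\beta/\alpha} = 1/r$.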

A numerical solution uses MATLAB's built-in function ode45.m to integrate the differential equations. The code below produces Fig. 1.5. Note how the predator population lags the prey population. An increase in the number of prey results in a delayed increase in the number of predators as the predators eat more prey. The phase-space diagrams clearly show the periodicity of the oscillation. Note that the curves move counterclockwise: the number of prey increases when the number of predators is minimal, and the number of prey decreases when the number of predators is maximal.


```matlab
function lotka_volterra
% plots time series and phase space diagrams
close all;
t0=0; tf=6*pi; eps=0.1; delta=0;
r=[1/2, 1, 2];
options = odeset('RelTol',1e-6,'AbsTol',1e-9);
% time series plots
figure(1);
for i=1:length(r)
    [t,UV]=ode45(@(t,UV) lv_eq(t,UV,r(i)),[t0,tf],[1+eps 1+delta],options);
    U=UV(:,1); V=UV(:,2);
    subplot(3,1,i); plot(t,U,t,V,'--');
    axis([0 6*pi 0.8 1.25]); ylabel('predator, prey');
    text(3,1.15,['r=',num2str(r(i))]);
end
xlabel('t');
subplot(3,1,1); legend('prey','predator');
% phase space plots
xpos=[2.5 2.5 2.5]; ypos=[3.5 3.5 3.5]; % positions for annotating the graphs
figure(2);
for i=1:length(r)
    subplot(1,3,i); hold on;
    for eps=0.1:0.1:1.0
        [t,UV]=ode45(@(t,UV) lv_eq(t,UV,r(i)),[t0,tf],[1+eps 1+delta],options);
        U=UV(:,1); V=UV(:,2);
        plot(U,V);
    end
    axis equal; axis([0 4 0 4]);
    text(xpos(i),ypos(i),['r=',num2str(r(i))]);
    if i==1; ylabel('predator'); end
    xlabel('prey');
end

function dUV=lv_eq(t,UV,r)
% right-hand side of the dimensionless Lotka-Volterra equations
dUV=zeros(2,1);
dUV(1) = r*(UV(1)-UV(1)*UV(2));
dUV(2) = (1/r)*(UV(1)*UV(2)-UV(2));
```


[Figure 1.5. Top panels (r = 0.5, 1, 2): prey and predator versus t. Bottom panels (r = 0.5, 1, 2): predator versus prey.]

Figure 1.5: Solutions of the dimensionless Lotka-Volterra equations. Upper plots: time-series solutions; lower plots: phase space diagrams.



Chapter 2

Age-structured populations

Determining the age structure of a population helps governments plan economic development. Age-structure theory can also help evolutionary biologists better understand the life history of a species. A population is age-structured because offspring are born to mothers of different ages. If the average per capita birth and death rates at different ages are constant, then a stable age structure emerges. However, rapid changes in birth or death rates can shift the age-structure distribution.

In this chapter, we develop the theory of age-structured populations using both discrete- and continuous-time models. We also present two interesting applications:

(1) modeling the changing age structure of China and other countries as their populations age; and
(2) modeling the life cycle of a hermaphroditic worm.

We begin this chapter, however, with one of the oldest problems in mathematical biology: Fibonacci's rabbits. This leads to a brief digression on the golden mean, rational approximations and flower development, before we return to our main topic.

2.1 Fibonacci's rabbits

In 1202, Fibonacci proposed the following puzzle, which we paraphrase here:

A man puts a newborn male-female pair of rabbits in a field. Rabbits take one month to mature before mating. One month after mating, females give birth to one male-female pair and then mate again. No rabbits die. How many rabbit pairs are there after one year?

The growth of Fibonacci's rabbit population is presented in Table 2.1, which shows the numbers of juvenile, adult, and total rabbit pairs at the start of each month. At the start of January, one pair of juvenile rabbits is introduced into the population. At the start of February, this pair has matured and mates. At the start of March, the original pair gives birth to a new pair of juvenile rabbits. And so on.

If we let $F_n$ be the total number of rabbit pairs at the start of the $n$th month, then the number of rabbit pairs at the start of the 13th month is the solution to Fibonacci's puzzle. Examining the total numbers of rabbit pairs in Table 2.1, it is evident that

$$F_{n+1} = F_n + F_{n-1}. \tag{2.1}$$

This second-order linear difference equation requires two initial conditions, which are given by $F_1 = F_2 = 1$. The first thirteen Fibonacci numbers, read from the table,

month        J    F    M    A    M    J    J    A    S    O    N    D    J
juvenile     1    0    1    1    2    3    5    8   13   21   34   55   89
adult        0    1    1    2    3    5    8   13   21   34   55   89  144
total        1    1    2    3    5    8   13   21   34   55   89  144  233

Table 2.1: Fibonacci's rabbit population (number of pairs at the start of each month).

2.2. THE GOLDEN RATIO Φ

are given by

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, ...,

where $F_{13} = 233$ is the solution to Fibonacci's puzzle.
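Iterating the recurrence (2.1) directly reproduces these numbers. The minimal Python sketch below reproduces the thirteen numbers above and the puzzle's answer $F_{13} = 233$. The function name `fibonacci_pairs` is our own.

```python
def fibonacci_pairs(n_months):
    """Return [F_1, ..., F_n], built from F_{n+1} = F_n + F_{n-1} with F_1 = F_2 = 1."""
    f = [1, 1]
    while len(f) < n_months:
        f.append(f[-1] + f[-2])
    return f[:n_months]


print(fibonacci_pairs(13))
# [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233]
```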


Let us solve (2.1) for all the $F_n$'s. To solve this equation, we look for a solution of the form $F_n = \lambda^n$. Substitution into (2.1) yields

$$\lambda^{n+1} = \lambda^n + \lambda^{n-1},$$

or, after division by $\lambda^{n-1}$,

$$\lambda^2 - \lambda - 1 = 0,$$

with solutions

$$\lambda_\pm = \frac{1 \pm \sqrt{5}}{2}.$$

Define

$$\Phi = \frac{1 + \sqrt{5}}{2} = 1.61803\ldots \qquad \text{and} \qquad \varphi = \frac{\sqrt{5} - 1}{2} = \Phi - 1 = 0.61803\ldots.$$

Then $\lambda_+ = \Phi$ and $\lambda_- = -\varphi$. Also, note that since $\Phi^2 - \Phi - 1 = 0$, division by $\Phi$ yields $1/\Phi = \Phi - 1$, so that

$$\varphi = \frac{1}{\Phi}.$$

As in the solution of linear homogeneous differential equations, the two values of $\lambda$ can be used to construct a general solution to the linear difference equation using the principle of linear superposition:

$$F_n = c_1 \Phi^n + c_2 (-\varphi)^n.$$

Extending the Fibonacci sequence to $F_0 = 0$ (since $F_0 = F_2 - F_1$), we impose the conditions $F_0 = 0$ and $F_1 = 1$:

$$c_1 + c_2 = 0, \qquad c_1 \Phi - c_2 \varphi = 1.$$

Therefore $c_2 = -c_1$ and $c_1(\Phi + \varphi) = 1$. Since $\Phi + \varphi = \sqrt{5}$, this gives $c_1 = 1/\sqrt{5}$ and $c_2 = -1/\sqrt{5}$. We can rewrite the solution as

$$F_n = \frac{\Phi^n - (-\varphi)^n}{\sqrt{5}}. \tag{2.2}$$

Since $\varphi^n \to 0$ as $n \to \infty$, we see that $F_n \approx \Phi^n/\sqrt{5}$ for large $n$, and $F_{n+1}/F_n \to \Phi$.
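The closed form (2.2) can be checked against the recurrence. The small Python sketch below uses only the standard library; `binet` and `fib` are our own illustrative names. It compares (2.2) with direct iteration, checks that $\varphi = 1/\Phi$ up to round-off, and shows how close $F_{31}/F_{30}$ already is to $\Phi$. The closed form is rounded to the nearest integer to absorb floating-point error.

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # positive root of lambda^2 - lambda - 1 = 0
phi = PHI - 1                  # minus the negative root; equals 1/PHI


def binet(n):
    """Closed form (2.2), rounded to absorb floating-point error."""
    return round((PHI ** n - (-phi) ** n) / math.sqrt(5))


def fib(n):
    """F_n from the recurrence (2.1), extended with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


for n in range(41):
    assert binet(n) == fib(n)
print(abs(phi - 1 / PHI))        # round-off level: phi = 1/PHI
print(fib(31) / fib(30), PHI)    # ratio of consecutive terms approaches PHI
```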

2.2 The golden ratio Φ

The number $\Phi$ is known as the golden ratio. Two positive numbers $x$ and $y$, with $x > y$, are said to be in the golden ratio if the ratio of their sum to the larger number equals the ratio of the larger number to the smaller; that is,

$$\frac{x + y}{x} = \frac{x}{y}. \tag{2.3}$$

16 CHAPTER 2. AGE-STRUCTURED POPULATIONS



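Solving (2.3) yields $x/y = \Phi$: with $t = x/y$, (2.3) becomes $1 + 1/t = t$, or $t^2 - t - 1 = 0$, whose positive root is $\Phi$. In a certain well-defined sense, $\Phi$ can also be called the most irrational of the irrational numbers.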

To understand why $\Phi$ earns this distinction, we first need to understand continued fractions. Recall that a rational number is any number that can be expressed as the quotient of two integers, and an irrational number is a number that is not rational. Rational numbers have finite continued fractions; irrational numbers have infinite continued fractions.

A finite continued fraction represents a rational number $x$ as

$$x = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_n}}}}, \tag{2.4}$$

where $a_1, a_2, \ldots, a_n$ are positive integers and $a_0$ is any integer. A convenient shorthand for (2.4) is

$$x = [a_0; a_1, a_2, \ldots, a_n].$$

If $x$ is irrational, then $n \to \infty$.

Now for some examples. To construct the continued fraction of the rational number $x = 3/5$, we can write

$$\frac{3}{5} = \cfrac{1}{5/3} = \cfrac{1}{1 + 2/3} = \cfrac{1}{1 + \cfrac{1}{3/2}} = \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{2}}},$$

which is in the form (2.4), so that $3/5 = [0; 1, 1, 2]$.
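The procedure used for $3/5$ is mechanical: take the integer part, invert the remainder, and repeat. For rational numbers it is the Euclidean algorithm in disguise, so it always terminates. The minimal Python sketch below uses exact arithmetic from the standard `fractions` module. The function name `continued_fraction` is our own.

```python
from fractions import Fraction


def continued_fraction(x):
    """Terms [a0, a1, ..., an] of the finite continued fraction of a rational x.

    Repeatedly split off the integer part (floor) and invert the remainder.
    """
    x = Fraction(x)
    terms = []
    while True:
        a = x.numerator // x.denominator   # floor, also correct for negative x
        terms.append(a)
        rest = x - a
        if rest == 0:
            return terms
        x = 1 / rest


print(continued_fraction(Fraction(3, 5)))    # [0, 1, 1, 2]
print(continued_fraction(Fraction(22, 7)))   # [3, 7]
```

Exact rational arithmetic matters here. With floating-point input, the remainder may never become exactly zero.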

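To construct the continued fraction of the irrational number $\sqrt{2}$, we can use a trick and write

$$\sqrt{2} = 1 + (\sqrt{2} - 1) = 1 + \frac{1}{1 + \sqrt{2}},$$

where we have used $(\sqrt{2} - 1)(\sqrt{2} + 1) = 1$. We now have a recursive definition that can be continued as

$$\sqrt{2} = 1 + \cfrac{1}{1 + \left(1 + \cfrac{1}{1 + \sqrt{2}}\right)} = 1 + \cfrac{1}{2 + \cfrac{1}{1 + \sqrt{2}}},$$

and so on, which results in the infinite continued fraction

$$\sqrt{2} = [1; \overline{2}],$$

where the overline denotes a term repeated indefinitely.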

Another example we will use later is the continued fraction for $\pi$, whose first few terms can be computed from

$$\pi = 3 + 0.14159\ldots = 3 + \frac{1}{7.06251\ldots} = 3 + \cfrac{1}{7 + \cfrac{1}{15.99659\ldots}},$$

and so on, yielding the sequence beginning $\pi = [3; 7, 15, \ldots]$. The historically important first-order approximation is $\pi \approx [3; 7] = 22/7 = 3.142857\ldots$, which was already known to Archimedes in antiquity.
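The same split-and-invert loop can be run on a floating-point approximation of an irrational number to obtain its leading terms. The Python sketch below recovers the first terms of $\pi$ and $\sqrt{2}$ quoted above. The function name `cf_terms` and its early-stopping threshold are our own choices. Each inversion amplifies round-off error, so only the first handful of terms should be trusted.

```python
import math


def cf_terms(x, n_terms):
    """First n_terms of the continued fraction of a real x, in floating point."""
    terms = []
    for _ in range(n_terms):
        a = math.floor(x)
        terms.append(a)
        frac = x - a
        if frac < 1e-12:      # x is (numerically) an integer: stop early
            break
        x = 1 / frac
    return terms


print(cf_terms(math.pi, 4))        # [3, 7, 15, 1]
print(cf_terms(math.sqrt(2), 6))   # [1, 2, 2, 2, 2, 2]
```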

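Finally, to determine the continued fraction for the golden ratio $\Phi$, we can write

$$\Phi = 1 + \frac{1}{\Phi},$$

which follows from $\Phi^2 - \Phi - 1 = 0$ and is another recursive definition that can be continued as

$$\Phi = 1 + \cfrac{1}{1 + \cfrac{1}{\Phi}},$$

and so on, yielding the especially simple form

$$\Phi = [1; \overline{1}].$$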

Because the trailing $a_i$'s are all equal to one, the continued fraction for the golden ratio converges especially slowly, and so does that of any other number with the same tail. Furthermore, the successive rational approximations to the golden ratio are just ratios of consecutive Fibonacci numbers, that is, $1/1$, $2/1$, $3/2$, $5/3$, etc. Because this sequence converges so slowly, we say that the golden ratio is the number most difficult to approximate by a rational number. More poetically, the golden ratio has been called the most irrational of the irrational numbers.
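The slow convergence can be made quantitative. One can compute the successive convergents $p_k/q_k$ of a continued fraction with the standard recurrences $p_k = a_k p_{k-1} + p_{k-2}$ and $q_k = a_k q_{k-1} + q_{k-2}$, and then measure the scaled error $q_k^2\,|x - p_k/q_k|$.

The Python sketch below does this; its function names are our own. It shows that the convergents of $[1; \overline{1}]$ are ratios of consecutive Fibonacci numbers, and that their scaled error stays near $1/\sqrt{5} \approx 0.447$. By contrast, the convergent $355/113$ of $\pi$ has a scaled error of only about $0.0034$.

```python
import math
from fractions import Fraction


def convergents(terms):
    """Convergents p_k/q_k of [a_0; a_1, a_2, ...] via the standard recurrences
    p_k = a_k p_{k-1} + p_{k-2} and q_k = a_k q_{k-1} + q_{k-2},
    started from p_{-2} = 0, p_{-1} = 1, q_{-2} = 1, q_{-1} = 0."""
    p_prev, p = 0, 1
    q_prev, q = 1, 0
    result = []
    for a in terms:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append(Fraction(p, q))
    return result


def scaled_error(x, c):
    """q^2 |x - p/q| for a convergent c = p/q; larger means harder to approximate."""
    return c.denominator ** 2 * abs(x - c.numerator / c.denominator)


PHI = (1 + math.sqrt(5)) / 2
golden = convergents([1] * 8)                 # [1; 1, 1, ...] truncated
print([str(c) for c in golden])
# ['1', '2', '3/2', '5/3', '8/5', '13/8', '21/13', '34/21']
print([round(scaled_error(PHI, c), 3) for c in golden[3:]])   # all close to 0.447

pi_conv = convergents([3, 7, 15, 1])
print([str(c) for c in pi_conv])              # ['3', '22/7', '333/106', '355/113']
print(round(scaled_error(math.pi, pi_conv[-1]), 4))           # 0.0034
```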

Because the golden ratio is the most irrational number, it has a way of appearing unexpectedly in nature. One well-known example is the arrangement of the florets in the head of a sunflower, which we discuss in the next section.

2.3 Fibonacci numbers in the sunflower

Consider the photograph of a sunflower shown in Fig. 2.1, and notice the evident spirals of florets radiating out from the center to the edge. These spirals appear to turn both clockwise and counterclockwise. Counting them, one finds 34 clockwise spirals and 21 counterclockwise spirals. The numbers 21 and 34 are notable because they are consecutive numbers in the Fibonacci sequence.

Why do Fibonacci numbers appear in the sunflower head? To answer this question, we construct a very simple model of the way the florets develop. Suppose that during development, florets first appear close to the center of the head and then move radially outward with constant speed as the sunflower head grows. To fill the circular sunflower head, we assume that each new floret created at the center is rotated through a constant angle before moving radially. We further assume that this rotation angle is optimal in the sense that the resulting sunflower head consists of well-spaced florets.


Figure 2.1: The flowering head of a sunflower.

Let us denote the rotation angle by $2\pi\alpha$. We first consider the possibility that $\alpha$ is a rational number, say $n/m$, where $n$ and $m$ are positive integers with no common factors and $n < m$. Because $m\alpha = n$ is an integer, after $m$ rotations the florets return to the radial line on which they started. The resulting sunflower head therefore consists of florets lying along $m$ straight lines. Such a sunflower head for $\alpha = 1/7$ is shown in Fig. 2.2a, where one observes seven straight lines. Evidently, rational values of $\alpha$ do not produce well-spaced florets. This figure and the following ones were produced using MATLAB, and the simulation code is shown at the end of this section.

What about irrational values? No matter how many rotations occur, the florets never return to their starting radial line. Nevertheless, the resulting sunflower head may still lack well-spaced florets. For example, if $\alpha = \pi - 3$, the resulting sunflower head looks like Fig. 2.2b, where one can see seven counterclockwise spirals. Recall that a good rational approximation to $\pi$ is $22/7$, which is slightly larger than $\pi$. On every seventh rotation, therefore, the new floret falls just short of the radial line of the floret from seven rotations earlier.

The irrational numbers most likely to produce a sunflower head with well-spaced florets are those that cannot be well approximated by rational numbers. Here, we choose the so-called golden angle, taking $\alpha = 1 - \varphi$, so that $2\pi(1 - \varphi) \approx 137.5^\circ$, and perform clockwise rotations. The rational approximations to $1 - \varphi$ are given by $F_n/F_{n+2}$, so the numbers of spirals observed will correspond to Fibonacci numbers.
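The numbers in this paragraph are quick to reproduce. The Python sketch below uses only the standard library, and its variable names are our own. It computes the golden angle $2\pi(1 - \varphi)$ in degrees and lists the approximations $F_n/F_{n+2}$ of $1 - \varphi$, whose denominators are the Fibonacci numbers that show up as spiral counts. These ratios approach $1/\Phi^2$, which equals $1 - \varphi$ because $\varphi^2 + \varphi - 1 = 0$.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
phi = PHI - 1
alpha = 1 - phi                     # fraction of a full turn between successive florets
golden_angle_deg = 360 * alpha
print(f"golden angle = {golden_angle_deg:.2f} degrees")   # 137.51

fib = [1, 1]                        # fib[n-1] holds F_n
while len(fib) < 12:
    fib.append(fib[-1] + fib[-2])

# Rational approximations F_n / F_{n+2} of 1 - phi.
for n in range(1, len(fib) - 1):
    approx = fib[n - 1] / fib[n + 1]
    print(f"F_{n}/F_{n + 2} = {fib[n - 1]}/{fib[n + 1]} = {approx:.6f}, "
          f"error {approx - alpha:+.1e}")
```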

Two simulations of sunflower heads with $\alpha = 1 - \varphi$ are shown in Fig. 2.3. The simulations differ only in the choice of radial velocity. In Fig. 2.3a, one can count 13 clockwise spirals and 21 counterclockwise spirals. In Fig. 2.3b, there are 21 counterclockwise spirals and 34 clockwise spirals, like the sunflower head shown in Fig. 2.1.


Figure 2.2: Simulated sunflower heads for (a) $\alpha = 1/7$; (b) $\alpha = \pi - 3$.

Figure 2.3: Simulated sunflower heads for $\alpha = 1 - \varphi$: (a) $v_0 = 1/2$; (b) $v_0 = 1/4$.



2.4. RABBITS ARE AN AGE-STRUCTURED POPULATION

function flower(alpha)
% Draws points moving radially outward, each new point rotated by an
% angle 2*pi*alpha relative to the previous one.
angle = 2*pi*alpha; r0 = 0;
v0 = 1.0; % for higher Fibonacci numbers try v0 = 0.5, 0.25, 0.1
npts = round(100/v0);
theta = (0:npts-1)*angle; % rotation angles of all npts points
r = r0*ones(1,npts);      % starting radii of all npts points
x = zeros(1,npts); y = zeros(1,npts);
for i = 1:npts % i is the number of points plotted so far (also the time)
    for j = 1:i % update coordinates of all i points; j = i is the newest
        r(j) = r(j) + v0; %*(i-j);
        x(j) = r(j)*cos(theta(j)); y(j) = r(j)*sin(theta(j));
    end
    plot(x(1:i), y(1:i), '.', 'MarkerSize', 5.0); axis equal;
    axis([-npts*v0-r0 npts*v0+r0 -npts*v0-r0 npts*v0+r0]); pause(0.1);
end
% draw a circle at the center
hold on; phi = linspace(0,2*pi); x = r0*cos(phi); y = r0*sin(phi); fill(x,y,'r')

2.4 Rabbits are an age-structured population

Fibonacci's rabbits form an age-structured population, and we can use this simple case to illustrate a more general approach. Fibonacci's rabbits fall into two meaningful age classes: juveniles and adults. Here, juveniles are newborn rabbits that cannot yet mate, and adults are rabbits at least one month old.

Beginning with a newborn pair at the start of the first month, we census the population at the start of each subsequent month, after mated females have given birth. At the start of the $n$th month, let $u_{1,n}$ be the number of newborn rabbit pairs, and let $u_{2,n}$ be the number of rabbit pairs at least one month old.

Each adult pair gives birth to one juvenile pair, so the number of juvenile pairs at the start of the $(n+1)$st month equals the number of adult pairs at the start of the $n$th month. The number of adult pairs at the start of the $(n+1)$st month equals the sum of the adult and juvenile pairs at the start of the $n$th month. Hence

$$u_{1,n+1} = u_{2,n}, \qquad u_{2,n+1} = u_{1,n} + u_{2,n};$$

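or, written in matrix form,

$$\begin{pmatrix} u_{1,n+1} \\ u_{2,n+1} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} u_{1,n} \\ u_{2,n} \end{pmatrix}. \tag{2.5}$$

Rewritten in vector form, we have

$$\mathbf{u}_{n+1} = \mathbf{L}\mathbf{u}_n, \tag{2.6}$$

where the definitions of the vector $\mathbf{u}_n$ and the matrix $\mathbf{L}$ are evident. The initial condition, with one juvenile pair and no adults, is given by

$$\begin{pmatrix} u_{1,1} \\ u_{2,1} \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$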



2.5. DISCRETE AGE-STRUCTURED POPULATIONS

The system (2.6) of coupled, first-order, linear difference equations is solved in the same way as a system of coupled, first-order, linear differential equations. With the ansatz $\mathbf{u}_n = \lambda^n \mathbf{v}$, substitution into (2.6) gives the eigenvalue problem

$$\mathbf{L}\mathbf{v} = \lambda\mathbf{v},$$

whose solution yields two eigenvalues $\lambda_1$ and $\lambda_2$, with corresponding eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$. The general solution to (2.6) is then

$$\mathbf{u}_n = c_1 \lambda_1^n \mathbf{v}_1 + c_2 \lambda_2^n \mathbf{v}_2, \tag{2.7}$$

with $c_1$ and $c_2$ determined from the initial conditions. Now suppose that $|\lambda_1| > |\lambda_2|$. If we rewrite (2.7) in the form

$$\mathbf{u}_n = \lambda_1^n \left( c_1 \mathbf{v}_1 + c_2 \left( \frac{\lambda_2}{\lambda_1} \right)^n \mathbf{v}_2 \right),$$

then because $|\lambda_2/\lambda_1| < 1$, $\mathbf{u}_n \sim c_1 \lambda_1^n \mathbf{v}_1$ as $n \to \infty$. The long-time asymptotics of the population therefore depend only on $\lambda_1$ and the corresponding eigenvector $\mathbf{v}_1$.
For our Fibonacci rabbits, the eigenvalues are obtained by solving $\det(\mathbf{L} - \lambda\mathbf{I}) = 0$, and we find

$$\det\begin{pmatrix} -\lambda & 1 \\ 1 & 1 - \lambda \end{pmatrix} = -\lambda(1 - \lambda) - 1 = 0,$$

or $\lambda^2 - \lambda - 1 = 0$, with solutions $\Phi$ and $-\varphi$. Since $\Phi > \varphi$, the eigenvalue $\Phi$ and its corresponding eigenvector determine the long-time asymptotic age structure of the population. The eigenvector can be found by solving

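$$(\mathbf{L} - \Phi\mathbf{I})\mathbf{v}_1 = 0,$$

or

$$\begin{pmatrix} -\Phi & 1 \\ 1 & 1 - \Phi \end{pmatrix} \begin{pmatrix} v_{11} \\ v_{12} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

The first equation is just $-\Phi$ times the second equation (use $\Phi^2 - \Phi - 1 = 0$), so that $v_{12} = \Phi v_{11}$. Taking $v_{11} = 1$, we have

$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ \Phi \end{pmatrix}.$$

The asymptotic age structure obtained from $\mathbf{v}_1$ shows that the ratio of adults to juveniles approaches the golden mean; that is,

$$\lim_{n \to \infty} \frac{u_{2,n}}{u_{1,n}} = \frac{v_{12}}{v_{11}} = \Phi.$$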
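A direct iteration of (2.6) confirms both Table 2.1 and the asymptotic age structure. The Python sketch below uses only the standard library; `step` and `history` are our own names. It starts from one juvenile pair, advances twelve months, and checks that $\mathbf{v}_1 = (1, \Phi)$ is an eigenvector with eigenvalue $\Phi$.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
L = [[0, 1],
     [1, 1]]          # the matrix in (2.5)


def step(L, u):
    """One census step u_{n+1} = L u_n for a 2x2 matrix L."""
    return [L[0][0] * u[0] + L[0][1] * u[1],
            L[1][0] * u[0] + L[1][1] * u[1]]


u = [1, 0]            # month 1: one juvenile pair, no adults
history = [u]
for month in range(2, 14):   # months 2, ..., 13
    u = step(L, u)
    history.append(u)

print(history[-1])                       # [89, 144]
print(history[-1][1] / history[-1][0])   # 1.6179..., close to PHI

# L v1 should equal PHI * v1 for v1 = (1, PHI).
v1 = [1.0, PHI]
print(step(L, v1), [PHI * x for x in v1])
```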

2.5 Discrete age-structured populations

In the discrete model, population censuses occur at discrete times, and individuals are assigned to age classes, each spanning a range of ages. For simplicity, we assume that the time between censuses equals the age span of every age class.


$u_{i,n}$                  number of females in age class $i$ at census $n$
$s_i$                      fraction of females surviving from age class $i - 1$ to age class $i$
$m_i$                      expected number of female offspring from a female in age class $i$
$l_i = s_1 \cdots s_i$     fraction of females surviving from birth to age class $i$
$f_i = m_i l_i$            expected number of female offspring from a newborn female while in age class $i$
$R_0 = \sum_i f_i$         basic reproductive ratio

Table 2.2: Definitions needed in the discrete-time model of an age-structured population.

An example is a country that takes a population census every five years and assigns individuals to age classes spanning five years (e.g., 0-4 years old, 5-9 years old, etc.). Although national censuses commonly count females and males separately, we will count only females and ignore males.

This section introduces several new definitions, which are collected in Table 2.2 for easy reference.

- We define $u_{i,n}$ to be the number of females in age class $i$ at census $n$. We assume that $i = 1$ represents the first age class and $i = \omega$ the last, and that no female survives past the last age class. We also assume that the first census occurs when $n = 1$.
- We define $s_i$ as the fraction of females that survive from age class $i - 1$ to age class $i$, with $s_1$ the fraction of newborns that survive to their first census.
- We define $m_i$ as the expected number of female births per female in age class $i$.

We construct difference equations for $\{u_{i,n+1}\}$ in terms of $\{u_{i,n}\}$.

First, the newborns counted at census $n + 1$ were born between censuses $n$ and $n + 1$ to females of different ages, who have different fertilities. Also, only a fraction of these newborns survive to their first census.

Second, only a fraction of the females in age class $i$ counted at census $n$ survive to be counted in age class $i + 1$ at census $n + 1$.

Putting these two ideas together with the appropriately defined parameters, the difference equations for $\{u_{i,n+1}\}$ are

$$\begin{aligned}
u_{1,n+1} &= s_1 \left( m_1 u_{1,n} + m_2 u_{2,n} + \cdots + m_\omega u_{\omega,n} \right), \\
u_{2,n+1} &= s_2 u_{1,n}, \\
u_{3,n+1} &= s_3 u_{2,n}, \\
&\;\;\vdots \\
u_{\omega,n+1} &= s_\omega u_{\omega-1,n},
\end{aligned}$$

which can be rewritten as the matrix equation

$$\begin{pmatrix} u_{1,n+1} \\ u_{2,n+1} \\ u_{3,n+1} \\ \vdots \\ u_{\omega,n+1} \end{pmatrix}
=
\begin{pmatrix}
s_1 m_1 & s_1 m_2 & \cdots & s_1 m_{\omega-1} & s_1 m_\omega \\
s_2 & 0 & \cdots & 0 & 0 \\
0 & s_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & s_\omega & 0
\end{pmatrix}
\begin{pmatrix} u_{1,n} \\ u_{2,n} \\ u_{3,n} \\ \vdots \\ u_{\omega,n} \end{pmatrix};$$

or in compact vector form

$$\mathbf{u}_{n+1} = \mathbf{L}\mathbf{u}_n, \tag{2.8}$$

where $\mathbf{L}$ is called the Leslie matrix.


This system of linear equations can be solved by determining the eigenvalues and associated eigenvectors of the Leslie matrix. There are two approaches. One can solve the characteristic equation, $\det(\mathbf{L} - \lambda\mathbf{I}) = 0$, directly. Alternatively, one can reduce the system of first-order difference equations (2.8) to a single high-order equation for the number of females in the first age class. Following the latter approach, and starting with the second row of (2.8), we have

$$\begin{aligned}
u_{2,n+1} &= s_2 u_{1,n}, \\
u_{3,n+1} &= s_3 u_{2,n} = s_3 s_2 u_{1,n-1}, \\
&\;\;\vdots \\
u_{\omega,n+1} &= s_\omega u_{\omega-1,n} = s_\omega s_{\omega-1} u_{\omega-2,n-1} = \cdots = s_\omega s_{\omega-1} \cdots s_2 u_{1,n-\omega+2}.
\end{aligned}$$

We now define two quantities:

- $l_i = s_1 s_2 \cdots s_i$, the fraction of females that survive from birth to age class $i$;
- $f_i = m_i l_i$, the expected number of female offspring from a newborn female upon reaching age class $i$. This accounts for the possibility that she does not survive to age class $i$.

With these definitions, the first row of (2.8) becomes

$$u_{1,n+1} = f_1 u_{1,n} + f_2 u_{1,n-1} + f_3 u_{1,n-2} + \cdots + f_\omega u_{1,n-\omega+1}. \tag{2.9}$$

Here, we have made the simplifying assumption that $n \ge \omega$, so that all the females counted in the $(n+1)$st census were born after the first census.
The high-order linear difference equation (2.9) can be solved using the ansatz $u_{1,n} = \lambda^n$. Direct substitution and division by $\lambda^{n+1}$ results in the discrete Euler-Lotka equation

$$\sum_{j=1}^{\omega} f_j \lambda^{-j} = 1, \tag{2.10}$$

which may have both real and complex-conjugate roots.
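When all $f_j \ge 0$ and at least one is positive, the left-hand side of (2.10) decreases strictly from $+\infty$ to $0$ as $\lambda$ runs over $(0, \infty)$. So there is exactly one positive real root, and it can be found by bracketing and bisection.

The Python sketch below finds only this positive root, not the complex ones. The function name and tolerance are our own. As a check, the juvenile counts in Table 2.1 satisfy $u_{1,n+1} = u_{1,n} + u_{1,n-1}$, i.e. $f_1 = f_2 = 1$, so the root should be $\Phi$. The second set of $f_j$ values is hypothetical.

```python
def euler_lotka_discrete(f, rel_tol=1e-12):
    """Unique positive root lambda of sum_j f_j lambda^(-j) = 1, eq. (2.10).

    f[j-1] holds f_j >= 0, not all zero.  The left-hand side then decreases
    strictly from +infinity to 0 on (0, infinity), so bisection is safe.
    Complex roots of (2.10) are not computed.
    """
    def excess(lam):
        return sum(fj * lam ** -(j + 1) for j, fj in enumerate(f)) - 1.0

    lo, hi = 1.0, 1.0
    while excess(lo) < 0:     # sum too small: the root lies below lo
        lo /= 2
    while excess(hi) > 0:     # sum too large: the root lies above hi
        hi *= 2
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(euler_lotka_discrete([1.0, 1.0]))        # 1.6180339887..., the golden ratio
print(euler_lotka_discrete([0.0, 0.8, 0.6]))   # hypothetical f_j with R_0 = 1.4
```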


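Once an eigenvalue $\lambda$ has been determined from (2.10), the corresponding eigenvector $\mathbf{v}$ can be computed using the Leslie matrix. We have

$$\begin{pmatrix}
s_1 m_1 - \lambda & s_1 m_2 & \cdots & s_1 m_{\omega-1} & s_1 m_\omega \\
s_2 & -\lambda & \cdots & 0 & 0 \\
0 & s_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & s_\omega & -\lambda
\end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_\omega \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

For $i \ge 2$, row $i$ reads $s_i v_{i-1} = \lambda v_i$, and $l_i = s_i l_{i-1}$. Taking $v_\omega = l_\omega/\lambda^\omega$, and starting with the last row and working backwards, we have

$$\begin{aligned}
v_{\omega-1} &= l_{\omega-1}/\lambda^{\omega-1}, \\
v_{\omega-2} &= l_{\omega-2}/\lambda^{\omega-2}, \\
&\;\;\vdots \\
v_1 &= l_1/\lambda,
\end{aligned}$$

so that

$$v_i = l_i/\lambda^i, \qquad i = 1, 2, \ldots, \omega.$$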
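The eigenvector formula can be checked numerically. In the Python sketch below, the three-class survival fractions and maternities are hypothetical values chosen only for illustration, and the helper names are our own. The sketch does three things:

1. It builds the Leslie matrix of (2.8).
2. It estimates the dominant eigenvalue by power iteration, an independent method that assumes a unique dominant eigenvalue.
3. It verifies that $v_i = l_i/\lambda^i$ satisfies $\mathbf{L}\mathbf{v} = \lambda\mathbf{v}$ and that $\lambda$ solves (2.10).

```python
def leslie_matrix(s, m):
    """Leslie matrix of (2.8): first row s_1 m_i, subdiagonal s_2, ..., s_omega."""
    omega = len(s)
    L = [[0.0] * omega for _ in range(omega)]
    for i in range(omega):
        L[0][i] = s[0] * m[i]
    for i in range(1, omega):
        L[i][i - 1] = s[i]
    return L


def survivorship(s):
    """l_i = s_1 s_2 ... s_i for i = 1, ..., omega."""
    l, prod = [], 1.0
    for si in s:
        prod *= si
        l.append(prod)
    return l


def mat_vec(L, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in L]


def dominant_eigenvalue(L, n_iter=500):
    """Power iteration; assumes a unique, real, positive dominant eigenvalue."""
    v = [1.0] * len(L)
    lam = 0.0
    for _ in range(n_iter):
        w = mat_vec(L, v)
        lam = sum(w) / sum(v)
        total = sum(w)
        v = [wi / total for wi in w]
    return lam


# Hypothetical three-class life table (illustration only).
s = [0.5, 0.8, 0.5]   # s_1, s_2, s_3
m = [0.0, 2.0, 3.0]   # m_1, m_2, m_3

L = leslie_matrix(s, m)
l = survivorship(s)
lam = dominant_eigenvalue(L)

# Eigenvector predicted by the text: v_i = l_i / lambda^i.
v = [li / lam ** (i + 1) for i, li in enumerate(l)]
residual = max(abs(a - lam * b) for a, b in zip(mat_vec(L, v), v))

# lambda should also satisfy the discrete Euler-Lotka equation (2.10).
f = [mi * li for mi, li in zip(m, l)]
euler_lotka_sum = sum(fi * lam ** -(i + 1) for i, fi in enumerate(f))

print(lam, residual, euler_lotka_sum)
```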



2.6. CONTINUOUS AGE-STRUCTURED POPULATIONS

We can obtain an interesting implication from this result by forming the ratio of two consecutive age classes. If $\lambda$ is the dominant eigenvalue (and is real and positive, as is the case for human populations), then asymptotically,

$$\frac{u_{i+1,n}}{u_{i,n}} \sim \frac{v_{i+1}}{v_i} = \frac{s_{i+1}}{\lambda}.$$

With the survival fractions $\{s_i\}$ fixed, increasing $\lambda$ decreases these ratios: a faster-growing population has relatively more young people than a slower-growing one. Indeed, we are now living through a time when developed countries have substantially lowered their population growth rates and are seeing the average age of their citizens increase. Examples include Japan and the countries of Western Europe, as well as Hong Kong and Singapore.

If we only want to determine whether a population grows or decays, we can calculate the basic reproductive ratio $R_0$, defined as the net expected number of female offspring of a newborn female. Stasis is obtained if a female exactly replaces herself before dying. If $R_0 > 1$, the population grows, and if $R_0 < 1$, the population decays. $R_0$ equals the number of female offspring a newborn is expected to have while in age class $i$, summed over all age classes:

$$R_0 = \sum_{i=1}^{\omega} f_i.$$

This agrees with (2.10). Its left-hand side equals $R_0$ at $\lambda = 1$ and decreases as $\lambda$ increases, so the positive root satisfies $\lambda > 1$ exactly when $R_0 > 1$.

For a population with approximately equal numbers of males and females, $R_0 = 1$ means that a newborn female must produce on average two children over her lifetime. News stories in the Western media often state that women need to have 2.1 children for zero population growth. The word "women" in these stories presumably means women of child-bearing age. Since girls who die young have no children, the figure of 2.1 children implies that $0.1/2.1$, or about 5%, of children die before reaching adulthood.
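The $R_0$ criterion can be tested against direct iteration of (2.8). The Python sketch below uses two hypothetical three-class life tables (not data from the text) and hypothetical helper names. The first table has $R_0 = 1.4$ and the second $R_0 = 0.6$. Iterating from a single female in the first age class shows growth in the first case and decay in the second.

```python
def basic_reproductive_ratio(s, m):
    """R_0 = sum_i f_i, with f_i = m_i l_i and l_i = s_1 ... s_i."""
    r0, l = 0.0, 1.0
    for si, mi in zip(s, m):
        l *= si
        r0 += mi * l
    return r0


def leslie_step(s, m, u):
    """One census step of (2.8)."""
    births = s[0] * sum(mi * ui for mi, ui in zip(m, u))
    return [births] + [s[i] * u[i - 1] for i in range(1, len(u))]


def total_after(s, m, n_censuses=200):
    """Total females after n_censuses steps, starting from one female in class 1."""
    u = [1.0] + [0.0] * (len(s) - 1)
    for _ in range(n_censuses):
        u = leslie_step(s, m, u)
    return sum(u)


s = [0.5, 0.8, 0.5]                              # hypothetical survival fractions
for m in ([0.0, 2.0, 3.0], [0.0, 1.0, 1.0]):     # hypothetical maternities
    print(m, basic_reproductive_ratio(s, m), total_after(s, m))
```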

A useful application of the mathematical model developed in this section is predicting the future age structure of various countries. This can be important for economic planning, for instance, to determine the tax revenues available to pay for the rising costs of health care as a population ages. Accurate predictions of a country's future age structure also require modeling immigration and emigration. An interesting website to browse is

http://www.census.gov/ipc/www/idb.

This website, created by the US Census Bureau, provides access to the International Data Base (IDB), a computerized source of demographic and socioeconomic statistics for 227 countries and areas of the world. In class, we will look at and discuss the dynamic output of some population pyramids, including those for Hong Kong and China.

2.6 Continuous age-structured populations

We can obtain a continuous-time model by taking the discrete model in the limit as the age span $\Delta a$ of each age class goes to zero. This age span also equals the time between censuses. For $n > \omega$, (2.9) can be rewritten as

$$u_{1,n} = \sum_{i=1}^{\omega} f_i u_{1,n-i}. \tag{2.11}$$


The first age class in the discrete model consists of females born between two consecutive censuses. The corresponding function in the continuous model is the female birth rate of the population as a whole, $B(t)$, satisfying

$$u_{1,n} = B(t_n)\,\Delta a.$$

If we assume that the $n$th census takes place at time $t_n = n\,\Delta a$, we also have

$$u_{1,n-i} = B(t_{n-i})\,\Delta a = B(t_n - t_i)\,\Delta a.$$

To determine the continuous analogue of the parameter $f_i = m_i l_i$, we define two age-specific functions:

- the survival function $l(a)$, the fraction of newborn females that survive to age $a$;
- the maternity function $m(a)$, which, multiplied by $\Delta a$, is the average number of females born to a female between the ages of $a$ and $a + \Delta a$.

With the age-specific net maternity function $f(a) = m(a)\,l(a)$ and $a_i = i\,\Delta a$, we have

$$f_i = f(a_i)\,\Delta a.$$

With these new definitions, (2.11) becomes

$$B(t_n)\,\Delta a = \sum_{i=1}^{\omega} f(a_i)\,B(t_n - t_i)\,(\Delta a)^2.$$

Canceling one factor of $\Delta a$ and using $t_i = a_i$, the right-hand side becomes a Riemann sum. Take $t_n = t$, and set $f(a) = 0$ when $a$ exceeds the maximum age of female fertility. The limit $\Delta a \to 0$ then transforms (2.11) into

$$B(t) = \int_0^\infty B(t - a)\,f(a)\,da. \tag{2.12}$$

Equation (2.12) states that the population-wide female birth rate at time $t$ has contributions from females of all ages. The contribution from females aged between $a$ and $a + da$ is the product of three factors:

- the population-wide female birth rate at the earlier time $t - a$;
- the fraction of females that survive to age $a$;
- the number of female births to females aged between $a$ and $a + da$.

Equation (2.12) is a homogeneous linear integral equation, valid for $t$ greater than the maximum age of female fertility. A more complete but inhomogeneous equation, valid for smaller $t$, can also be derived.

Equation (2.12) can be solved with the ansatz $B(t) = e^{rt}$. Direct substitution yields

$$e^{rt} = \int_0^\infty f(a)\,e^{r(t - a)}\,da,$$

which, after canceling $e^{rt}$, results in the continuous Euler-Lotka equation

$$\int_0^\infty f(a)\,e^{-ra}\,da = 1. \tag{2.13}$$

Equation (2.13) is an integral equation for $r$, given the age-specific net maternity function $f(a)$. It can be proved that for a continuous, non-negative $f(a)$ (not identically zero), (2.13) has exactly one real root $r^*$. The population then grows ($r^* > 0$) or decays ($r^* < 0$) asymptotically as $e^{r^* t}$. The population growth rate $r^*$ has been called the intrinsic rate of increase, the intrinsic growth rate, or the Malthusian parameter.


Typically, (2.13) is solved numerically for $r$ using a root-finding algorithm such as Newton's method.
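A minimal Python sketch of that procedure follows. It assumes that $f(a)$ vanishes outside a known interval $[a_{\min}, a_{\max}]$ and is smooth inside it, so that composite Simpson's rule is accurate there. The function names, the quadrature resolution and the example maternity functions are our own hypothetical choices.

Newton's method uses

$$F(r) = \int f(a)e^{-ra}\,da - 1, \qquad F'(r) = -\int a\,f(a)e^{-ra}\,da.$$

Because $F$ is decreasing and convex, the iteration converges from the starting guess $r = 0$.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def intrinsic_rate(f, a_min, a_max, r=0.0, tol=1e-12, max_iter=50):
    """Newton's method for the root r* of (2.13): int f(a) e^{-ra} da = 1.

    f is assumed non-negative, smooth on [a_min, a_max] and zero elsewhere.
    """
    for _ in range(max_iter):
        F = simpson(lambda a: f(a) * math.exp(-r * a), a_min, a_max) - 1.0
        dF = -simpson(lambda a: a * f(a) * math.exp(-r * a), a_min, a_max)
        step = F / dF
        r -= step
        if abs(step) < tol:
            return r
    raise RuntimeError("Newton's method did not converge")


# Hypothetical net maternity functions, nonzero only for ages 1 <= a <= 2.
r_up = intrinsic_rate(lambda a: 2.0, 1.0, 2.0)     # R_0 = int f da = 2 > 1
r_down = intrinsic_rate(lambda a: 0.5, 1.0, 2.0)   # R_0 = int f da = 0.5 < 1
print(r_up, r_down)
```

For constant $f = c$ on $[1, 2]$ the integral is $c\,(e^{-r} - e^{-2r})/r$ in closed form, which gives an easy check of the returned roots. As expected, $r^* > 0$ when $R_0 > 1$ and $r^* < 0$ when $R_0 < 1$.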

Once the population has asymptotically reached a stable age structure, it grows like $e^{r^* t}$. Our previous discussion of the Malthusian growth model suggests that $r^*$ can then be found from the constant per capita birth rate $b$ and death rate $d$. By determining expressions for $b$ and $d$, we will indeed show that $r^* = b - d$.

The females that survive to age $a$ at time $t$ were born earlier, at time $t - a$. The quantity $B(t - a)\,l(a)\,da$ therefore represents the number of females at time $t$ whose ages lie between $a$ and $a + da$. The total number of females $N(t)$ at time $t$ is therefore given by

$$N(t) = \int_0^\infty B(t - a)\,l(a)\,da. \tag{2.14}$$

The per capita birth rate $b(t)$ equals the population-wide birth rate $B(t)$ divided by the population size $N(t)$. Using (2.12) and (2.14),

$$b(t) = \frac{B(t)}{N(t)} = \frac{\int_0^\infty B(t - a)\,f(a)\,da}{\int_0^\infty B(t - a)\,l(a)\,da}.$$

Similarly, the per capita death rate $d(t)$ equals the population-wide death rate $D(t)$ divided by $N(t)$. To derive $D(t)$, we first define the age-specific mortality function $\mu(a)$. Multiplied by $\Delta a$, it gives the fraction of females of age $a$ that die before reaching age $a + \Delta a$.

The relation between $\mu(a)$ and the age-specific survival function $l(a)$ follows from computing the fraction of females that survive to age $a + \Delta a$. This fraction equals the fraction that survive to age $a$, times the fraction $1 - \mu(a)\,\Delta a$ that do not die in the next small time interval $\Delta a$; that is,

$$l(a + \Delta a) = l(a)\left(1 - \mu(a)\,\Delta a\right),$$

or

$$\frac{l(a + \Delta a) - l(a)}{\Delta a} = -\mu(a)\,l(a);$$

and as $\Delta a \to 0$,

$$l'(a) = -\mu(a)\,l(a). \tag{2.15}$$

The age-specific mortality function $\mu(a)$ is analogous to the age-specific maternity function $m(a)$. Accordingly, we define the age-specific net mortality function $g(a) = \mu(a)\,l(a)$, in analogy to the age-specific net maternity function $f(a) = m(a)\,l(a)$.

The population-wide birth rate $B(t)$ is determined from $f(a)$ using (2.12). In the same way, the population-wide death rate $D(t)$ is determined from $g(a)$ using

$$D(t) = \int_0^\infty B(t - a)\,g(a)\,da, \tag{2.16}$$

where the integrand represents the contribution to the death rate from females who die between the ages of $a$ and $a + da$. The per capita death rate is therefore

$$d(t) = \frac{D(t)}{N(t)} = \frac{\int_0^\infty B(t - a)\,g(a)\,da}{\int_0^\infty B(t - a)\,l(a)\,da};$$



2.7. BROOD SIZE OF A HERMAPHRODITIC WORM

Figure 2.4: Caenorhabditis elegans, a nematode worm used by biologists as a simple animal model of a multicellular organism. (Photo by Amy Pasquinelli.)

and the difference between the per capita birth and death rates is

$$b(t) - d(t) = \frac{\int_0^\infty B(t - a)\,[f(a) - g(a)]\,da}{\int_0^\infty B(t - a)\,l(a)\,da}. \tag{2.17}$$

Asymptotically, a stable age structure is established and the population-wide birth rate grows as $B(t) \sim e^{r^* t}$. Substituting this expression for $B(t)$ into (2.17) and canceling $e^{r^* t}$ gives

$$b - d = \frac{\int_0^\infty [f(a) - g(a)]\,e^{-r^* a}\,da}{\int_0^\infty l(a)\,e^{-r^* a}\,da} = \frac{1 + \int_0^\infty l'(a)\,e^{-r^* a}\,da}{\int_0^\infty l(a)\,e^{-r^* a}\,da},$$

where we have used (2.13) and (2.15). We simplify the numerator by integration by parts, together with $l(0) = 1$ and $l(a)\,e^{-r^* a} \to 0$ as $a \to \infty$:

$$\int_0^\infty l'(a)\,e^{-r^* a}\,da = l(a)\,e^{-r^* a}\Big|_0^\infty + r^* \int_0^\infty l(a)\,e^{-r^* a}\,da = -1 + r^* \int_0^\infty l(a)\,e^{-r^* a}\,da.$$

This yields the desired result,

$$r^* = b - d.$$
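A numerical spot-check of $r^* = b - d$ may help. In the Python sketch below, the life history is entirely hypothetical:

- mortality $\mu(a) = 0.05 + 0.01a$, hence $l(a) = \exp(-0.05a - 0.005a^2)$ by (2.15);
- maternity $m(a) = 1$ for $1 \le a \le 3$ and zero otherwise.

The script solves (2.13) for $r^*$ by bisection, evaluates the asymptotic $b$ and $d$ from the ratios above with $B(t) \propto e^{r^* t}$, and compares $b - d$ with $r^*$. Integrals are truncated at age 60, beyond which $l(a)$ is negligible. Names such as `euler_lotka` are our own.

```python
import math


def simpson(g, a, b, n=4000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


# Hypothetical life history, for illustration only.
MU0, MU1 = 0.05, 0.01      # mortality mu(a) = MU0 + MU1 * a
FERTILE = (1.0, 3.0)       # m(a) = 1 on this age interval, 0 elsewhere
A_MAX = 60.0               # truncation age: l(A_MAX) is about e^{-21}


def mu(a):
    return MU0 + MU1 * a


def l(a):
    """Survival function solving (2.15), l' = -mu l, with l(0) = 1."""
    return math.exp(-(MU0 * a + 0.5 * MU1 * a * a))


def euler_lotka(r):
    """Left-hand side of (2.13) with f(a) = m(a) l(a)."""
    return simpson(lambda a: l(a) * math.exp(-r * a), *FERTILE)


# euler_lotka decreases in r; here euler_lotka(0) = R_0 > 1 and euler_lotka(2) < 1.
lo, hi = 0.0, 2.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if euler_lotka(mid) > 1:
        lo = mid
    else:
        hi = mid
r_star = 0.5 * (lo + hi)

# With B(t) ~ e^{r* t}, the per capita rates reduce to ratios of age integrals.
N = simpson(lambda a: l(a) * math.exp(-r_star * a), 0.0, A_MAX, n=6000)
b = euler_lotka(r_star) / N
d = simpson(lambda a: mu(a) * l(a) * math.exp(-r_star * a), 0.0, A_MAX, n=6000) / N
print(f"r* = {r_star:.10f}, b - d = {b - d:.10f}")
```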

It is usually supposed that evolution by natural selection produces populations with the largest value of the Malthusian parameter $r^*$, by favoring the females that make up such populations. In the next section, we exploit this idea to compute the brood size of the self-fertilizing hermaphroditic worm Caenorhabditis elegans.

2.7 Brood size of a hermaphroditic worm

Caenorhabditis elegans, a soil-dwelling nematode worm about 1 mm long, is a widely studied model organism in biology. With a body of about 1000 cells, it is one of the simplest multicellular organisms under study. Advances in understanding the development of this multicellular organism led to the award of the 2002 Nobel Prize in Physiology or Medicine to three C. elegans biologists: Sydney Brenner, H. Robert Horvitz and John E. Sulston.

[Figure 2.5: timeline with marks at 0, g, g+s and g+s+e. The worm is a juvenile from 0 to g and an adult thereafter. Sperm are produced from g to g+s, and eggs from g+s to g+s+e.]

Figure 2.5: A simplified timeline of a hermaphrodite's life.

The worm C. elegans has two sexes: hermaphrodites, which are essentially females that can produce sperm internally and self-fertilize their own eggs, and males, which must mate with hermaphrodites to produce offspring. In laboratory cultures males are rare, and the worms generally propagate by self-fertilization. Typically, a hermaphrodite lays about 250 to 350 self-fertilized eggs before becoming infertile. It is reasonable to assume that the forces of natural selection have shaped the life history of C. elegans, and that the number of offspring produced by a selfing hermaphrodite should be optimal in some sense. Here, we show how an age-structured model applied to C. elegans yields theoretical insight into the brood size of the selfing hermaphrodite.

To develop a mathematical model for C. elegans, we need to know some details of its life history. As a first approximation (Barker, 1992), a simplified timeline of the hermaphrodite's life is shown in Fig. 2.5. A fertilized egg is laid at time $t = 0$. During the juvenile growth period, the worm develops to adulthood through four larval stages (L1–L4). Toward the end of L4, and for a while after its final molt to adulthood, the hermaphrodite produces sperm, which is then stored for later use. The hermaphrodite then produces eggs, self-fertilizes them using its internally stored sperm, and lays them. In the absence of males, egg production ceases once all the sperm have been used. We assume that juvenile growth occurs during $0 < t < g$, spermatogenesis during $g < t < g + s$, and egg production, self-fertilization and egg-laying during $g + s < t < g + s + e$.

Here, we want to understand why hermaphrodites limit their sperm production. Biologists define males and females by the size and metabolic cost of their gametes: sperm are small and cheap, whereas eggs are large and expensive. So at first glance it is puzzling that the number of offspring produced by a hermaphrodite is limited by the number of sperm it produces rather than by the number of eggs. There must be a hidden cost, other than a metabolic one, to a hermaphrodite producing additional sperm. To understand the underlying biology, it is instructive to consider two limiting cases: (1) no sperm production; (2) infinite sperm production. In both cases the hermaphrodite produces no offspring: in the first case because there is no sperm, and in the second case because there are no eggs. The number of sperm a hermaphrodite produces before egg-laying therefore represents a compromise. More sperm means more offspring, but it also means delayed egg production.

Our main theoretical assumption is that natural selection will favor worms able to establish populations with the largest Malthusian parameter $r$. Worms carrying a genetic mutation that yields a larger value of $r$ will eventually outnumber all other worms.

g   growth period              72 h
s   sperm production period    11.9 h
e   egg production period      65 h
p   sperm production rate      24 h⁻¹
m   egg production rate        4.4 h⁻¹
B   brood size                 286

Table 2.3: Parameters in the C. elegans life-history model, with experimental estimates.

The parameters we need for our mathematical model are listed in Table 2.3, together with estimated experimental values (Cutter, 2004). In addition to the growth period $g$, the sperm production period $s$ and the egg production period $e$ (all in units of hours), we need the sperm production rate $p$ and the egg production rate $m$ (both in units of inverse hours). We also define the brood size $B$ as the total number of fertilized eggs laid by a selfing hermaphrodite. The brood size equals the number of sperm produced, and it also equals the number of eggs laid, so that

$$ B = ps = me. \tag{2.18} $$

We can use (2.18) to eliminate $s$ and $e$ in favor of $B$:

$$ s = B/p, \qquad e = B/m. \tag{2.19} $$
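A quick arithmetic check of (2.18) and (2.19) against Table 2.3: the sketch recomputes $s = B/p$ and $e = B/m$ from the tabulated $B$, $p$ and $m$, then compares them with the tabulated periods.

```python
# Parameter estimates from Table 2.3 (hours and inverse hours).
s_table, e_table = 11.9, 65.0
p, m, B = 24.0, 4.4, 286.0

# Equation (2.19): periods implied by the brood size and the production rates.
s = B / p
e = B / m

# Equation (2.18): brood size implied by each (rate, period) pair in the table.
print(f"s = B/p = {s:.2f} h (table: {s_table} h)")
print(f"e = B/m = {e:.2f} h (table: {e_table} h)")
print(f"p*s = {p * s_table:.1f}, m*e = {m * e_table:.1f} (table B: {B:.0f})")
```

The small mismatch between $ps$ and $B$ comes from rounding $s$ to 11.9 h in the table.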

The continuous Euler–Lotka equation (2.13) for $r$ requires a model for $f(a) = m(a)l(a)$, where $m(a)$ is the age-specific maternity function and $l(a)$ is the age-specific survival function. The function $l(a)$ satisfies the differential equation (2.15). Here we make the simplifying assumption that the age-specific mortality function is $\mu(a) = d$, where $d$ is an age-independent per capita death rate. Implicitly, we assume that worms do not die of old age during egg-laying, but instead die of predation, starvation, disease or some other age-independent cause. The assumption is reasonable because worms can live in the laboratory for several weeks after sperm depletion. Solving (2.15) with the initial condition $l(0) = 1$ results in

$$ l(a) = \exp(-d\,a). \tag{2.20} $$

The age-specific maternity function $m(a)$ is defined so that $m(a)\,\Delta a$ is the expected number of offspring produced during the age interval $\Delta a$. We assume that the hermaphrodite lays eggs at a constant rate $m$ during the ages $g + s < a < g + s + e$; therefore,

$$ m(a) = \begin{cases} m & \text{for } g + s < a < g + s + e, \\ 0 & \text{otherwise.} \end{cases} \tag{2.21} $$

Using (2.19), (2.20) and (2.21), the continuous Euler–Lotka equation (2.13) for the Malthusian parameter $r$ becomes

$$ \int_{g+B/p}^{g+B/p+B/m} m \exp[-(r+d)a]\,da = 1. \tag{2.22} $$


Integrating,

$$ \begin{aligned} 1 &= \int_{g+B/p}^{g+B/p+B/m} m \exp[-(r+d)a]\,da \\ &= m\,\frac{\exp[-(g+B/p)(r+d)] - \exp[-(g+B/p+B/m)(r+d)]}{r+d} \\ &= \frac{m}{r+d}\exp[-(g+B/p)(r+d)]\,\{1 - \exp[-(B/m)(r+d)]\}, \end{aligned} $$

which can be rewritten as

$$ (r+d)\exp[(g+B/p)(r+d)] = m\{1 - \exp[-(B/m)(r+d)]\}. \tag{2.23} $$

With the parameters $d$, $g$, $p$ and $m$ fixed, the integrated Euler–Lotka equation (2.23) is an implicit equation for $r = r(B)$. To show that $r = r(B)$ has a maximum at some value of $B$, we solve (2.23) numerically for $r$ using the values of $g$, $p$ and $m$ from Table 2.3. Since $r + d$ is maximum at the same value of $B$ at which $r$ is maximum, and $d$ enters (2.23) only in the form $r + d$, we can take $d = 0$ without loss of generality. To solve (2.23), it is best to use Newton's method. We let

$$ F(r) = (r+d)\exp[(g+B/p)(r+d)] - m\{1 - \exp[-(B/m)(r+d)]\}, $$

and differentiate with respect to $r$ to obtain

$$ F'(r) = [1 + (g+B/p)(r+d)]\exp[(g+B/p)(r+d)] - B\exp[-(B/m)(r+d)]. $$

For a given $B$, we then solve $F(r) = 0$ by iterating

$$ r_{n+1} = r_n - \frac{F(r_n)}{F'(r_n)}. $$
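The Newton iteration can be sketched as follows, with $d = 0$ and $g$, $p$, $m$ from Table 2.3. Because $r = 0$ always solves $F(r) = 0$ trivially, the starting value matters. This sketch assumes $r_0 = 0.1\ \mathrm{h}^{-1}$. $F$ is convex for $r \geq 0$, and $F(r_0) > 0$ for the brood sizes scanned here, so from this start the iteration descends monotonically onto the nontrivial root. The scan over integer $B$ is only a crude way to locate the maximum of $r(B)$; it is an illustrative choice.

```python
import math

# Parameters from Table 2.3; d = 0 without loss of generality (see text).
G, P, M, D = 72.0, 24.0, 4.4, 0.0

def F(r, B):
    # F(r) = (r+d) exp[(g+B/p)(r+d)] - m {1 - exp[-(B/m)(r+d)]}
    u = r + D
    return u * math.exp((G + B / P) * u) - M * (1.0 - math.exp(-(B / M) * u))

def dF(r, B):
    # F'(r) = [1 + (g+B/p)(r+d)] exp[(g+B/p)(r+d)] - B exp[-(B/m)(r+d)]
    u = r + D
    return (1.0 + (G + B / P) * u) * math.exp((G + B / P) * u) - B * math.exp(-(B / M) * u)

def growth_rate(B, r0=0.1, tol=1e-12, max_iter=100):
    # Newton iteration r_{n+1} = r_n - F(r_n)/F'(r_n), started to the right
    # of the nontrivial root so that it does not converge to r = 0.
    r = r0
    for _ in range(max_iter):
        step = F(r, B) / dF(r, B)
        r -= step
        if abs(step) < tol:
            return r
    raise RuntimeError(f"Newton's method did not converge for B = {B}")

# Scan integer brood sizes and locate the maximum of r(B).
rates = {B: growth_rate(B) for B in range(50, 501)}
B_best = max(rates, key=rates.get)
print(f"r is maximum near B = {B_best}, r = {rates[B_best]:.4f} per hour")
```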

Using an appropriate initial value for $r$, the function $r = r(B)$ can be computed; it is shown in Fig. 2.6. Evidently, $r$ is maximum near $B = 152$, which is 53% of the experimental value of $B$ given in Table 2.3. We can also directly determine a single equation for the value of $B$ at which $r$ is maximum. We implicitly differentiate (2.23) with respect to $B$, with $r$ the only parameter that depends on $B$, and apply the condition $dr/dB = 0$. The derivative of the left-hand side is then $[(r+d)^2/p]\exp[(g+B/p)(r+d)]$ and that of the right-hand side is $(r+d)\exp[-(B/m)(r+d)]$. Equating these and cancelling a factor of $r + d$, we find

$$ (r+d)\exp[(g+B/p)(r+d)] = p\exp[-(B/m)(r+d)]. \tag{2.24} $$

Taking the ratio of (2.23) to (2.24) results in

$$ 1 = \frac{m}{p}\{\exp[(B/m)(r+d)] - 1\}, \tag{2.25} $$

from which we can find

$$ r + d = \frac{m}{B}\ln\left(1 + \frac{p}{m}\right). \tag{2.26} $$

Substituting (2.26) back into either (2.23) or (2.24) results in

$$ \frac{m}{B}\ln\left(1 + \frac{p}{m}\right)\left(1 + \frac{p}{m}\right)^{\frac{m}{p} + \frac{mg}{B}} = \frac{pm}{p + m}. \tag{2.27} $$
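Equation (2.27) can also be solved directly for the optimal brood size. The sketch uses bisection, a swapped-in choice since the notes do not prescribe a root finder for (2.27). Bisection works here because both factors on the left-hand side of (2.27) decrease monotonically in $B$. The sketch then evaluates the maximal growth rate from (2.26) with $d = 0$ and $g$, $p$, $m$ from Table 2.3.

```python
import math

# Parameters from Table 2.3; d = 0.
g, p, m = 72.0, 24.0, 4.4

def residual(B):
    # Left-hand side minus right-hand side of (2.27).
    base = 1.0 + p / m
    lhs = (m / B) * math.log(base) * base ** (m / p + m * g / B)
    return lhs - p * m / (p + m)

# The left side of (2.27) decreases monotonically in B, so bisect on [50, 500].
lo, hi = 50.0, 500.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if residual(mid) > 0.0:
        lo = mid
    else:
        hi = mid
B_opt = 0.5 * (lo + hi)

# Equation (2.26) with d = 0 gives the maximal Malthusian growth rate.
r_max = (m / B_opt) * math.log(1.0 + p / m)
print(f"B_opt = {B_opt:.1f}, r_max = {r_max:.4f} per hour, B_opt/286 = {B_opt / 286:.2f}")
```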


Figure 2.6: A plot of $r = r(B)$ for $0 \le B \le 500$, showing that the Malthusian growth rate $r$ is maximum near a brood size of $B = 152$.

Equation (2.27) contains four parameters, $p$, $m$, $g$ and $B$, which dimensional analysis reduces to three. The brood size $B$ is already dimensionless. The rate $m$ at which eggs are produced and laid can be multiplied by the sperm production period $s = B/p$ to form the dimensionless parameter $x = mB/p$. The parameter $x$ represents the number of eggs forgone because of the adult sperm production period, and so it measures the cost of sperm production. Similarly, $m$ can be multiplied by the larval growth period $g$ to form the dimensionless parameter $y = mg$. The parameter $y$ represents the number of eggs forgone because of the juvenile growth period, and so it measures the cost of development. With $B$, $x$ and $y$ as our three dimensionless parameters, (2.27) becomes

$$ \frac{1}{B}\ln\left(1 + \frac{B}{x}\right)\left(1 + \frac{B}{x}\right)^{\frac{x+y}{B}} = \frac{B}{B + x}. \tag{2.28} $$

Given values for two of the three dimensionless parameters $x$, $y$ and $B$, (2.28) can be solved for the remaining parameter, either explicitly in the case $y = y(x, B)$ or by Newton's method.

The values of $x$ and $y$ obtained from Table 2.3 are $x = 52.5$ and $y = 317$. With $B = 286$, the solution $y = y(x)$ is shown in Fig. 2.7, with the experimental value $(x, y)$ plotted as a cross.
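Taking logarithms of both sides of (2.28) and isolating $y$ gives the explicit solution

$$ y = \frac{B\,\ln\!\left[\dfrac{B^2}{(B+x)\ln(1+B/x)}\right]}{\ln(1+B/x)} - x. $$

The sketch below evaluates this expression at $B = 286$. It shows how far the model value at the experimental $x = 52.5$ lies from the experimental $y = 317$, and it tabulates a few points along the curve of Fig. 2.7.

```python
import math

def y_of_x(x, B):
    # Explicit solution of (2.28) for y = y(x, B).
    L = math.log(1.0 + B / x)
    return B * math.log(B * B / ((B + x) * L)) / L - x

B = 286.0
x_exp, y_exp = 52.5, 317.0   # experimental values derived from Table 2.3
print(f"model: y({x_exp}) = {y_of_x(x_exp, B):.0f}; experiment: y = {y_exp:.0f}")
for x in (5, 15, 25, 35, 45, 55):
    print(f"x = {x:2d}  y = {y_of_x(x, B):6.1f}")
```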
The seemingly large disagreement between the theoretical result and the experimental data leads us to question the assumptions underlying the model. Indeed, Cutter (2004) first suggested that sperm produced precociously by juveniles does not delay egg production and should be considered cost-free. One possibility is to fix the absolute number of sperm produced precociously and to optimize the number of sperm produced as an adult. Another possibility is to fix the fraction of sperm produced precociously and to optimize the total number of sperm produced. Cutter (2004) made this latter assumption, and it appears to improve the model's agreement with the experimental data the most.

Figure 2.7: Solution curve $y$ versus $x$ with brood size $B = 286$, obtained by solving (2.28). Values for $B$, $m$, $p$ and $g$ are taken from Table 2.3. The cross, crossed open circle and open circle correspond to $y = mg$ and $x = mfB/p$ with $f = 1$, $1/3$ and $1/8$, respectively.

Figure 2.8: A more refined timeline of the hermaphrodite's life: juvenile sperm production during $g - s_J < t < g$, adult sperm production during $g < t < g + s_A$, and egg production during $g + s_A < t < g + s_A + e$.

We therefore divide the total sperm production period $s$ into juvenile and adult sperm production periods $s_J$ and $s_A$, with $s = s_J + s_A$. The revised timeline of the hermaphrodite's life is shown in Fig. 2.8. Denoting the fraction of sperm produced as an adult by $f$, and the fraction produced as a juvenile by $1 - f$, we have

$$ s_J = (1 - f)s, \qquad s_A = fs. \tag{2.29} $$

Equation (2.21) for the age-specific maternity function becomes

$$ m(a) = \begin{cases} m & \text{for } g + s_A < a < g + s_A + e, \\ 0 & \text{otherwise,} \end{cases} \tag{2.30} $$

with

$$ s_A = fB/p. \tag{2.31} $$

The Euler–Lotka equation (2.22) is then modified by the substitution $p \to p/f$. Following this substitution through to the final result (2.28) shows that this equation still holds (it is in fact equivalent to equation (12) of Chasnov (2011)), but with the changed definition

$$ x = mfB/p. \tag{2.32} $$

A close examination of the results in Fig. 2.7 shows that the theoretical model and the experimental data agree almost perfectly if $f = 1/8$ (shown as the open circle in Fig. 2.7). Cutter (2004) suggested the value $f = 1/3$. This result is shown as the crossed open circle in Fig. 2.7, and it still agrees much better with the experimental data than the cross corresponding to $f = 1$. Additional modeling of precocious sperm production through the parameter $f$ thus appears to make the model a more faithful representation of the underlying biology.
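To see the effect of $f$ numerically, the sketch below evaluates the explicit solution of (2.28) for $y$, obtained by taking logarithms of both sides and isolating $y$. It does so at the three abscissae $x = mfB/p$ from (2.32), with $f = 1$, $1/3$ and $1/8$, and compares each result with the experimental $y = mg$. All parameter values are from Table 2.3.

```python
import math

def y_of_x(x, B):
    # Solve (2.28) for y: take logs of both sides and isolate y.
    L = math.log(1.0 + B / x)
    return B * math.log(B * B / ((B + x) * L)) / L - x

# Table 2.3 values.
g, p, m, B = 72.0, 24.0, 4.4, 286.0
y_exp = m * g                      # experimental y = mg (about 317)

results = {}
for label, f in (("1", 1.0), ("1/3", 1.0 / 3.0), ("1/8", 1.0 / 8.0)):
    x = m * f * B / p              # equation (2.32)
    results[label] = y_of_x(x, B)
    print(f"f = {label:>3}: x = {x:5.2f}, model y = {results[label]:6.1f}, "
          f"experiment y = {y_exp:.1f}")
```

With $f = 1/8$ the model $y$ lies within a few units of the experimental value. With $f = 1$ it overshoots by several hundred.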

References

Barker, D.M. Evolution of sperm shortage in a selfing hermaphrodite. Evolution (1992) 46, 1951–1955.

Cutter, A.D. Sperm-limited fecundity in nematodes: how many sperm are enough? Evolution (2004) 58, 651–655.

Chasnov, J.R. Evolution of increased self-sperm production in postdauer hermaphroditic nematodes. Evolution (2011) 65, 2117–2122.

Hodgkin, J. & Barnes, T.M. More is not better: brood size and population growth in a self-fertilizing nematode. Proc. R. Soc. Lond. B (1991) 246, 19–24.

Charlesworth, B. Evolution in age-structured populations. Cambridge University Press (1980).



Chapter 3

Stochastic Modeling of Population Growth

Our derivation of the Malthusian growth model implicitly assumed a large population size. Smaller populations exhibit stochastic effects, and these can considerably complicate the modeling. Modeling stochastic processes in biology is, in general, an important yet difficult topic, so we will spend some time here analyzing the simplest model of births in finite populations.

3.1 A stochastic model of population growth

The population size $N$ is now considered to be a discrete random variable. We define the time-dependent probability mass function $p_N(t)$ of $N$ to be the probability that the population is of size $N$ at time $t$. Since $N$ must take one of the values from zero to infinity, we have

$$ \sum_{N=0}^{\infty} p_N(t) = 1 $$

for all $t \geq 0$. Again, let $b$ be the average per capita birth rate. We make two simplifying approximations: all births are singlets, and the probability of an individual giving birth is independent of its past birthing history. We can then interpret $b$ probabilistically by supposing that, as $\Delta t \to 0$, the probability that an individual gives birth during the time $\Delta t$ is given by $b\,\Delta t$. For example, if the average per capita birth rate is one offspring every 365-day year, then the probability that a given individual gives birth on a given day is $1/365$. Since we will consider the limit $\Delta t \to 0$, we neglect the probability of more than one birth in the population during the time interval $\Delta t$, because such probabilities are of order $(\Delta t)^2$ or higher. Furthermore, we suppose that at $t = 0$ the population size is known to be $N_0$, so that $p_{N_0}(0) = 1$, with all other $p_N$'s at $t = 0$ equal to zero.

We can determine a system of differential equations for the probability mass function $p_N(t)$ as follows. For a population to be of size $N > 0$ at time $t + \Delta t$, either it was of size $N - 1$ at time $t$ and one birth occurred, or it was of size $N$ at time $t$ and there were no births; that is,

$$ p_N(t + \Delta t) = p_{N-1}(t)\,b(N-1)\,\Delta t + p_N(t)\,(1 - bN\,\Delta t). $$

Subtracting $p_N(t)$ from both sides, dividing by $\Delta t$ and taking the limit $\Delta t \to 0$ results in the forward Kolmogorov differential equations,

$$ \frac{dp_N}{dt} = b\left[(N-1)p_{N-1} - Np_N\right], \qquad N = 1, 2, \ldots, \tag{3.1} $$

where $p_0(t) = p_0(0)$ since a population of size zero remains zero. This system of coupled, first-order, linear differential equations can be solved iteratively.
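One way to see (3.1) in action is to integrate a truncated version of the system numerically. The sketch keeps only the states $N_0 \le N \le N_{\max}$. The cutoff $N_{\max}$, the values of $b$, $N_0$ and $T$, and the classical fourth-order Runge–Kutta stepping are all illustrative assumptions, not taken from the notes. The sketch checks that the total probability stays close to one, that $p_{N_0}(t) = e^{-bN_0t}$ (which follows from (3.1) because $p_{N_0-1} = 0$), and that the mean grows like $N_0e^{bt}$.

```python
import math

b, N0, N_MAX = 1.0, 5, 80   # illustrative birth rate, initial size, truncation
T, DT = 1.0, 1e-3           # final time and RK4 time step

def rhs(p):
    # Right-hand side of (3.1) for N = N0, ..., N_MAX, with p[k] = p_{N0+k}.
    # p_{N0-1} = 0, so the first state only loses probability. Probability
    # flowing out of N_MAX is dropped; it is negligible for these values.
    dp = []
    for k, pk in enumerate(p):
        N = N0 + k
        inflow = b * (N - 1) * p[k - 1] if k > 0 else 0.0
        dp.append(inflow - b * N * pk)
    return dp

def rk4_step(p, dt):
    # One classical fourth-order Runge-Kutta step for the linear system.
    k1 = rhs(p)
    k2 = rhs([x + 0.5 * dt * k for x, k in zip(p, k1)])
    k3 = rhs([x + 0.5 * dt * k for x, k in zip(p, k2)])
    k4 = rhs([x + dt * k for x, k in zip(p, k3)])
    return [x + dt / 6.0 * (a1 + 2.0 * a2 + 2.0 * a3 + a4)
            for x, a1, a2, a3, a4 in zip(p, k1, k2, k3, k4)]

p = [1.0] + [0.0] * (N_MAX - N0)   # p_N(0) = delta_{N, N0}
for _ in range(round(T / DT)):
    p = rk4_step(p, DT)

total = sum(p)
mean = sum((N0 + k) * pk for k, pk in enumerate(p))
print(f"total probability = {total:.10f}")
print(f"p_N0(T) = {p[0]:.10f}, exp(-b N0 T) = {math.exp(-b * N0 * T):.10f}")
print(f"mean = {mean:.6f}, N0 exp(bT) = {N0 * math.exp(b * T):.6f}")
```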


We first review how to solve a first-order linear differential equation of the form

$$ \frac{dy}{dt} + ay = g(t), \qquad y(0) = y_0, \tag{3.2} $$

where $y = y(t)$ and $a$ is a constant. First, we look for an integrating factor $\mu$ such that

$$ \frac{d}{dt}(\mu y) = \mu\left(\frac{dy}{dt} + ay\right). $$

Differentiating the left-hand side and multiplying out the right-hand side results in

$$ \frac{d\mu}{dt}\,y + \mu\frac{dy}{dt} = \mu\frac{dy}{dt} + a\mu y; $$

and cancelling terms yields

$$ \frac{d\mu}{dt} = a\mu. $$

We may integrate this equation with an arbitrary initial condition, so for simplicity we take $\mu(0) = 1$. Therefore, $\mu(t) = e^{at}$. Hence,

$$ \frac{d}{dt}\left(e^{at}y\right) = e^{at}g(t). $$

Integrating this equation from $0$ to $t$ yields

$$ e^{at}y(t) - y(0) = \int_0^t e^{as}g(s)\,ds. $$

Therefore, the solution is

$$ y(t) = e^{-at}\left(y(0) + \int_0^t e^{as}g(s)\,ds\right). \tag{3.3} $$
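A quick way to check the integrating-factor formula (3.3) is to compare it with a direct numerical integration of (3.2). The sketch assumes the illustrative forcing $g(t) = \cos t$ and the constants $a = 2$, $y_0 = 1$, none of which come from the notes. It evaluates the integral in (3.3) with the trapezoidal rule and integrates (3.2) directly with a fourth-order Runge–Kutta scheme.

```python
import math

a, y0 = 2.0, 1.0                  # illustrative constants

def g(t):
    return math.cos(t)            # illustrative forcing term

def y_formula(t, n=2000):
    # Equation (3.3): y(t) = e^{-at} [ y(0) + int_0^t e^{as} g(s) ds ],
    # with the integral evaluated by the composite trapezoidal rule.
    h = t / n
    integral = 0.5 * (g(0.0) + math.exp(a * t) * g(t))
    for i in range(1, n):
        s = i * h
        integral += math.exp(a * s) * g(s)
    integral *= h
    return math.exp(-a * t) * (y0 + integral)

def y_rk4(t, n=2000):
    # Direct RK4 integration of dy/dt = g(t) - a y with y(0) = y0.
    f = lambda s, y: g(s) - a * y
    h, s, y = t / n, 0.0, y0
    for _ in range(n):
        k1 = f(s, y)
        k2 = f(s + h / 2, y + h * k1 / 2)
        k3 = f(s + h / 2, y + h * k2 / 2)
        k4 = f(s + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        s += h
    return y

for t in (0.5, 1.0, 3.0):
    print(f"t = {t}: formula {y_formula(t):.6f}, RK4 {y_rk4(t):.6f}")
```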

The forward Kolmogorov differential equation (3.1) is of the form (3.2) with $a = bN$ and $g(t) = b(N-1)p_{N-1}$. With the population size known to be $N_0$ at $t = 0$, the initial condition can be written succinctly as $p_N(0) = \delta_{N,N_0}$, where $\delta_{ij}$ is the Kronecker delta, defined as

$$ \delta_{ij} = \begin{cases} 0, & \text{if } i \neq j; \\ 1, & \text{if } i = j. \end{cases} $$

Therefore, formal integration of (3.1) using (3.3) results in

$$ p_N(t) = e^{-bNt}\left[\delta_{N,N_0} + b(N-1)\int_0^t e^{bNs}p_{N-1}(s)\,ds\right]. \tag{3.4} $$

The first few solutions of (3.4) can now be obtained by successive integrations:

$$ p_N(t) = \begin{cases} 0, & \text{if } N < N_0; \\ e^{-bN_0t}, & \text{if } N = N_0; \\ N_0e^{-bN_0t}\left[1 - e^{-bt}\right], & \text{if } N = N_0 + 1; \\ \tfrac{1}{2}N_0(N_0+1)e^{-bN_0t}\left[1 - e^{-bt}\right]^2, & \text{if } N = N_0 + 2; \\ \ldots, & \text{if } \ldots. \end{cases} $$


Although we will not need it, for completeness I give the complete solution. Define the binomial coefficient as the number of ways to select $k$ objects from a set of $n$ distinct objects when the order of selection is immaterial. Then

$$ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} $$

(read as "$n$ choose $k$"). The general solution for $p_N(t)$, $N \geq N_0$, is known to be

$$ p_N(t) = \binom{N-1}{N_0-1}e^{-bN_0t}\left[1 - e^{-bt}\right]^{N-N_0}, $$

which statisticians call a shifted negative binomial distribution. Determining the time evolution of the probability mass function of $N$ completely solves this stochastic problem.
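The closed-form solution can be checked numerically. The sketch uses illustrative values of $b$, $t$ and $N_0$ that are not from the notes. It evaluates the shifted negative binomial $p_N(t)$ through log-gamma functions to avoid overflow. It then confirms that $p_N(t)$ sums to one and that its mean and variance agree with the expressions derived in (3.9).

```python
import math

b, t, N0 = 1.0, 1.5, 10          # illustrative values
q = math.exp(-b * t)             # e^{-bt}

def p_N(N):
    # Shifted negative binomial: C(N-1, N0-1) e^{-b N0 t} (1 - e^{-bt})^(N - N0),
    # with the binomial coefficient computed in log space via lgamma.
    if N < N0:
        return 0.0
    log_binom = math.lgamma(N) - math.lgamma(N0) - math.lgamma(N - N0 + 1)
    return math.exp(log_binom + N0 * math.log(q) + (N - N0) * math.log1p(-q))

Ns = range(N0, 2000)             # truncate the infinite sum far in the tail
total = sum(p_N(N) for N in Ns)
mean = sum(N * p_N(N) for N in Ns)
var = sum((N - mean) ** 2 * p_N(N) for N in Ns)
print(f"sum = {total:.12f}")
print(f"mean = {mean:.6f}, N0 e^(bt) = {N0 * math.exp(b * t):.6f}")
print(f"variance = {var:.6f}, N0 e^(2bt)(1 - e^(-bt)) = "
      f"{N0 * math.exp(2 * b * t) * (1 - q):.6f}")
```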

The quantities of main interest are usually the mean and variance of the population size. Both could in principle be computed from the probability mass function, but we will compute them directly from the differential equations for $p_N$. The definitions of the mean population size $\langle N\rangle$ and its variance $\sigma^2$ are

$$ \langle N\rangle = \sum_{N=0}^{\infty} Np_N, \qquad \sigma^2 = \sum_{N=0}^{\infty}(N - \langle N\rangle)^2 p_N, \tag{3.5} $$

and we will make use of the equality

$$ \sigma^2 = \langle N^2\rangle - \langle N\rangle^2. \tag{3.6} $$

Multiplying the differential equation (3.1) by the constant $N$, summing over $N$, and using $p_N = 0$ for $N < N_0$, we obtain

$$ \frac{d\langle N\rangle}{dt} = b\left[\sum_{N=N_0+1}^{\infty}N(N-1)p_{N-1} - \sum_{N=N_0}^{\infty}N^2p_N\right]. $$

Now write $N(N-1) = (N-1)(N-1+1) = (N-1)^2 + (N-1)$, so that the first term on the right-hand side is

$$ \begin{aligned} \sum_{N=N_0+1}^{\infty}N(N-1)p_{N-1} &= \sum_{N=N_0+1}^{\infty}(N-1)^2p_{N-1} + \sum_{N=N_0+1}^{\infty}(N-1)p_{N-1} \\ &= \sum_{N=N_0}^{\infty}N^2p_N + \sum_{N=N_0}^{\infty}Np_N, \end{aligned} $$

where the second equality is obtained by shifting the summation index down by one. Therefore, we find the familiar Malthusian growth equation

$$ \frac{d\langle N\rangle}{dt} = b\langle N\rangle. $$

Together with the initial condition $\langle N\rangle(0) = N_0$, this gives the solution

$$ \langle N\rangle(t) = N_0e^{bt}. \tag{3.7} $$

We proceed similarly to find $\sigma^2$, by first determining the differential equation for $\langle N^2\rangle$. Multiplying the differential equation for $p_N$, (3.1), by $N^2$ and summing over $N$ results in

$$ \frac{d\langle N^2\rangle}{dt} = b\left[\sum_{N=N_0+1}^{\infty}N^2(N-1)p_{N-1} - \sum_{N=N_0}^{\infty}N^3p_N\right]. $$


Here, we write $N^2(N-1) = (N-1)(N-1+1)^2 = (N-1)^3 + 2(N-1)^2 + (N-1)$. Proceeding as above and shifting the index down, we obtain

$$ \frac{d\langle N^2\rangle}{dt} - 2b\langle N^2\rangle = b\langle N\rangle. \tag{3.8} $$

Since $\langle N\rangle$ is known, (3.8) is a first-order, linear, inhomogeneous equation for $\langle N^2\rangle$, which can be solved using an integrating factor. Using (3.3), the solution is

$$ \langle N^2\rangle = e^{2bt}\left(N_0^2 + b\int_0^t e^{-2bs}\langle N\rangle(s)\,ds\right), $$

with $\langle N\rangle$ given by (3.7). Performing the integration, we obtain

$$ \langle N^2\rangle = e^{2bt}\left[N_0^2 + N_0\left(1 - e^{-bt}\right)\right]. $$

Finally, using $\sigma^2 = \langle N^2\rangle - \langle N\rangle^2$, we obtain the variance. We thus arrive at our final result for the population mean and variance:

$$ \langle N\rangle = N_0e^{bt}, \qquad \sigma^2 = N_0e^{2bt}\left(1 - e^{-bt}\right). \tag{3.9} $$
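As a cross-check of (3.9), the sketch below integrates the moment equations $d\langle N\rangle/dt = b\langle N\rangle$ and (3.8) numerically with a fourth-order Runge–Kutta scheme. It then compares the resulting mean and variance with the closed forms. The scheme and the values of $b$, $N_0$ and $T$ are illustrative choices.

```python
import math

b, N0, T, STEPS = 0.7, 20.0, 3.0, 3000   # illustrative values

def rhs(state):
    m1, m2 = state                        # <N> and <N^2>
    return (b * m1, 2.0 * b * m2 + b * m1)  # d<N>/dt = b<N>; equation (3.8)

state = (N0, N0 * N0)                     # N(0) = N0 with certainty
h = T / STEPS
for _ in range(STEPS):
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
    state = tuple(s + h / 6.0 * (a1 + 2 * a2 + 2 * a3 + a4)
                  for s, a1, a2, a3, a4 in zip(state, k1, k2, k3, k4))

mean_num, var_num = state[0], state[1] - state[0] ** 2   # variance via (3.6)
mean_exact = N0 * math.exp(b * T)
var_exact = N0 * math.exp(2 * b * T) * (1 - math.exp(-b * T))
print(f"mean: {mean_num:.6f} vs {mean_exact:.6f}")
print(f"variance: {var_num:.6f} vs {var_exact:.6f}")
```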

The coefficient of variation $c_v$ measures the standard deviation relative to the mean, and is here given by

$$ c_v = \sigma/\langle N\rangle = \sqrt{\frac{1 - e^{-bt}}{N_0}}. $$

For large $t$, the coefficient of variation therefore goes like $1/\sqrt{N_0}$, and it is small when $N_0$ is large. In the next section, we will determine the limiting form of the probability distribution for large $N_0$, recovering both the deterministic model and the Gaussian approximation model.

3.2 Asymptotics of large initial populations

Our goal here is to solve for the distribution as an expansion in powers of $1/N_0$, to leading order; note that $1/N_0$ is small if $N_0$ is large. To zeroth order, that is, in the limit $N_0 \to \infty$, we will show that the deterministic model of population growth is recovered. To first order in $1/N_0$, we will show that the probability distribution is normal. The latter result will be seen to be a consequence of the well-known Central Limit Theorem of probability theory.

We develop our expansion by working directly with the differential equations for $p_N(t)$. Because the population size $N$ is a discrete random variable (taking only the nonnegative integer values $0, 1, 2, \ldots$), $p_N(t)$ is the probability mass function for $N$. If $N_0$ is large, then the discrete nature of $N$ is inconsequential, and it is preferable to work with a continuous random variable and its probability density function. Accordingly, we define the random variable $x = N/N_0$ and treat $x$ as a continuous random variable, with $0 \leq x < \infty$. Now, $p_N(t)$ is the probability that the population is of size $N$ at time $t$, and the probability density function of $x$, $P(x,t)$, is defined so that $\int_a^b P(x,t)\,dx$ is the probability that $a \leq x \leq b$. The relationship between $p$ and $P$ can be determined by considering how to approximate a discrete probability distribution by a continuous one, that is, by defining $P$ such that

$$ p_N(t) = \int_{(N-\frac{1}{2})/N_0}^{(N+\frac{1}{2})/N_0} P(x,t)\,dx = P(N/N_0, t)/N_0, $$

where the last equality becomes exact as $N_0 \to \infty$. Therefore, the appropriate definition for $P(x,t)$ is given by

$$ P(x,t) = N_0\,p_N(t), \qquad x = N/N_0, \tag{3.10} $$

which satisfies

$$ \int_0^\infty P(x,t)\,dx = \sum_{N=0}^{\infty}P(N/N_0,t)\,(1/N_0) = \sum_{N=0}^{\infty}p_N(t) = 1, $$

the first equality (exact only as $N_0 \to \infty$) being a Riemann sum approximation of the integral.

We now transform the infinite set of ordinary differential equations (3.1) for $p_N(t)$ into a single partial differential equation for $P(x,t)$. We multiply (3.1) by $N_0$ and substitute $N = N_0x$, $p_N(t) = P(x,t)/N_0$ and $p_{N-1}(t) = P(x - \frac{1}{N_0}, t)/N_0$ to obtain

$$ \frac{\partial P(x,t)}{\partial t} = b\left[(N_0x - 1)\,P\!\left(x - \tfrac{1}{N_0}, t\right) - N_0x\,P(x,t)\right]. \tag{3.11} $$

We next Taylor-series expand $P(x - 1/N_0, t)$ about $x$, treating $1/N_0$ as a small parameter. That is, we use

$$ P\!\left(x - \tfrac{1}{N_0}, t\right) = P(x,t) - \frac{1}{N_0}P_x(x,t) + \frac{1}{2N_0^2}P_{xx}(x,t) - \ldots = \sum_{n=0}^{\infty}\frac{(-1)^n}{n!\,N_0^n}\frac{\partial^nP}{\partial x^n}. $$

The two leading terms proportional to $N_0$ on the right-hand side of (3.11) cancel exactly. Grouping the remaining terms in powers of $1/N_0$, we obtain the first three leading terms of the expansion:

$$ \begin{aligned} P_t &= -b\left[(xP_x + P) - \frac{1}{N_0}\left(\frac{xP_{xx}}{2!} + \frac{P_x}{1!}\right) + \frac{1}{N_0^2}\left(\frac{xP_{xxx}}{3!} + \frac{P_{xx}}{2!}\right) - \ldots\right] \\ &= -b\left[(xP)_x - \frac{1}{2!\,N_0}(xP)_{xx} + \frac{1}{3!\,N_0^2}(xP)_{xxx} - \ldots\right]; \end{aligned} \tag{3.12} $$

Higher-order terms follow the evident pattern.
Equation (3.12) can be analyzed further by a perturbation expansion of the probability density function in powers of $1/N_0$:

$$ P(x,t) = P^{(0)}(x,t) + \frac{1}{N_0}P^{(1)}(x,t) + \frac{1}{N_0^2}P^{(2)}(x,t) + \ldots. \tag{3.13} $$


Here, the unknown functions $P^{(0)}(x,t)$, $P^{(1)}(x,t)$, $P^{(2)}(x,t)$, etc., are to be determined by substituting the expansion (3.13) into (3.12) and equating coefficients of powers of $1/N_0$. For the coefficients of $(1/N_0)^0$ and $(1/N_0)^1$, we thus obtain

$$ P^{(0)}_t = -b\left(xP^{(0)}\right)_x, \tag{3.14} $$

$$ P^{(1)}_t = -b\left[\left(xP^{(1)}\right)_x - \frac{1}{2}\left(xP^{(0)}\right)_{xx}\right]. \tag{3.15} $$

3.2.1 Derivation of the deterministic model

The zeroth-order term in the perturbation expansion (3.13),

$$ P^{(0)}(x,t) = \lim_{N_0\to\infty}P(x,t), $$

satisfies (3.14). Equation (3.14) is a first-order linear partial differential equation with a variable coefficient. One way to solve it is to try the ansatz

$$ P^{(0)}(x,t) = h(t)\,f(r), \qquad r = r(x,t), \tag{3.16} $$

together with the initial condition

$$ P^{(0)}(x,0) = f(x), $$

since at $t = 0$ the probability distribution is assumed to be a known function. This initial condition imposes further useful constraints on the functions $h(t)$ and $r(x,t)$:

$$ h(0) = 1, \qquad r(x,0) = x. $$

The partial derivatives of (3.16) are

$$ P^{(0)}_t = h'f + hr_tf', \qquad P^{(0)}_x = hr_xf', $$

which after substitution into (3.14) results in

$$ (h' + bh)f + (r_t + bxr_x)hf' = 0. $$

This equation can be satisfied for any $f$ provided that $h(t)$ and $r(x,t)$ satisfy

$$ h' + bh = 0, \qquad r_t + bxr_x = 0. \tag{3.17} $$

The first equation for $h(t)$, together with the initial condition $h(0) = 1$, is easily solved to yield

$$ h(t) = e^{-bt}. \tag{3.18} $$

To determine a solution for $r(x,t)$, we try the technique of separation of variables. We write $r(x,t) = X(x)T(t)$, and upon substitution into the differential equation for $r(x,t)$, we obtain

$$ XT' + bxX'T = 0; $$

and division by $XT$ and separation yields

$$ \frac{T'}{T} = -bx\frac{X'}{X}. \tag{3.19} $$


Since the left-hand side is independent of $x$ and the right-hand side is independent of $t$, both sides must equal a constant, independent of both $x$ and $t$. Now, our initial condition is $r(x,0) = x$, so that $X(x)T(0) = x$. Without loss of generality, we can take $T(0) = 1$, so that $X(x) = x$. The right-hand side of (3.19) is therefore equal to the constant $-b$, and we obtain the differential equation and initial condition

$$ T' + bT = 0, \qquad T(0) = 1, $$

which we can solve to yield $T(t) = e^{-bt}$. Therefore, our solution for $r(x,t)$ is

$$ r(x,t) = xe^{-bt}. \tag{3.20} $$

Putting our solutions (3.18) and (3.20) together into our ansatz (3.16), we obtain the general solution of the PDE:

$$ P^{(0)}(x,t) = e^{-bt}f\left(xe^{-bt}\right). $$

To determine $f$, we apply the initial condition on the probability mass function, $p_N(0) = \delta_{N,N_0}$. From (3.10), the corresponding initial condition on the probability density function is

$$ P(x,0) = \begin{cases} N_0 & \text{if } 1 - \frac{1}{2N_0} \leq x \leq 1 + \frac{1}{2N_0}, \\ 0 & \text{otherwise.} \end{cases} $$

In the limit $N_0 \to \infty$, $P(x,0) \to P^{(0)}(x,0) = \delta(x - 1)$, where $\delta(x - 1)$ is the Dirac delta-function centered at 1. The delta-function is widely used in quantum physics, and Dirac introduced it for that purpose. It now has many uses in applied mathematics. It can be defined by requiring that, for any function $g(x)$,

$$ \int_{-\infty}^{+\infty}g(x)\,\delta(x)\,dx = g(0). $$

The usual view of the delta-function $\delta(x - a)$ is that it is zero everywhere except at $x = a$, where it is infinite, and that its integral is one. It is not really a function, but rather what mathematicians call a distribution.

Now, since $P^{(0)}(x,0) = f(x) = \delta(x - 1)$, our solution becomes

$$ P^{(0)}(x,t) = e^{-bt}\,\delta\left(xe^{-bt} - 1\right). \tag{3.21} $$

This can be rewritten by noting that, for $a > 0$ (letting $y = ax - c$),

$$ \int_{-\infty}^{+\infty}g(x)\,\delta(ax - c)\,dx = \frac{1}{a}\int_{-\infty}^{+\infty}g\left(\frac{y+c}{a}\right)\delta(y)\,dy = \frac{1}{a}\,g(c/a), $$

yielding the identity

$$ \delta(ax - c) = \frac{1}{a}\,\delta\!\left(x - \frac{c}{a}\right). $$

From this, we can rewrite our solution (3.21) in the more intuitive form

$$ P^{(0)}(x,t) = \delta\left(x - e^{bt}\right). \tag{3.22} $$


Using (3.22), the zeroth-order expected value of $x$ is

$$ \langle x_0\rangle = \int_0^\infty xP^{(0)}(x,t)\,dx = \int_0^\infty x\,\delta\left(x - e^{bt}\right)dx = e^{bt}; $$

while the zeroth-order variance is

$$ \begin{aligned} \sigma_{x_0}^2 &= \langle x_0^2\rangle - \langle x_0\rangle^2 \\ &= \int_0^\infty x^2P^{(0)}(x,t)\,dx - e^{2bt} \\ &= \int_0^\infty x^2\,\delta\left(x - e^{bt}\right)dx - e^{2bt} \\ &= e^{2bt} - e^{2bt} = 0. \end{aligned} $$

Thus, in the infinite-population limit, the random variable $x$ has zero variance. It is therefore no longer random, but follows $x = e^{bt}$ deterministically. We say that the probability distribution of $x$ becomes sharp in the limit of large population size. This illustrates a general principle: modeling large populations deterministically can simplify mathematical models when stochastic effects are unimportant.

3.2.2 Derivation of the normal probability distribution

We now consider the first-order term in the perturbation expansion (3.13), which satisfies (3.15). We do not know how to solve (3.15) directly, so we will look for a solution by a more indirect route. First, we compute the moments of the probability distribution. We have

$$ \begin{aligned} \langle x^n\rangle &= \int_0^\infty x^nP(x,t)\,dx \\ &= \int_0^\infty x^nP^{(0)}(x,t)\,dx + \frac{1}{N_0}\int_0^\infty x^nP^{(1)}(x,t)\,dx + \ldots \\ &= \langle x_0^n\rangle + \frac{1}{N_0}\langle x_1^n\rangle + \ldots, \end{aligned} $$

where the last equality defines $\langle x_0^n\rangle$, etc. Now, using (3.22),

$$ \langle x_0^n\rangle = \int_0^\infty x^nP^{(0)}(x,t)\,dx = \int_0^\infty x^n\,\delta\left(x - e^{bt}\right)dx = e^{nbt}. \tag{3.23} $$

To determine $\langle x_1^n\rangle$, we use (3.15). Multiplying by $x^n$ and integrating, we have

$$ \frac{d\langle x_1^n\rangle}{dt} = -b\left[\int_0^\infty x^n\left(xP^{(1)}\right)_x dx - \frac{1}{2}\int_0^\infty x^n\left(xP^{(0)}\right)_{xx}dx\right]. \tag{3.24} $$


We integrate by parts to remove the derivatives of $xP$. We assume that $xP$ and all its derivatives vanish on the boundaries of integration, where $x$ equals zero or infinity. We have

$$ \int_0^\infty x^n\left(xP^{(1)}\right)_x dx = -\int_0^\infty nx^nP^{(1)}\,dx = -n\langle x_1^n\rangle, $$

and

$$ \frac{1}{2}\int_0^\infty x^n\left(xP^{(0)}\right)_{xx}dx = -\frac{n}{2}\int_0^\infty x^{n-1}\left(xP^{(0)}\right)_x dx = \frac{n(n-1)}{2}\int_0^\infty x^{n-1}P^{(0)}\,dx = \frac{n(n-1)}{2}\langle x_0^{n-1}\rangle. $$

Therefore, after integration by parts, (3.24) becomes

$$ \frac{d\langle x_1^n\rangle}{dt} = b\left[n\langle x_1^n\rangle + \frac{n(n-1)}{2}\langle x_0^{n-1}\rangle\right]. \tag{3.25} $$

Equation (3.25) is a first-order linear inhomogeneous differential equation and can be solved using an integrating factor (see (3.3) and the preceding discussion). Solving this differential equation using (3.23) and the initial condition $\langle x_1^n\rangle(0) = 0$, we obtain

$$ \langle x_1^n\rangle = \frac{n(n-1)}{2}e^{nbt}\left(1 - e^{-bt}\right). \tag{3.26} $$
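The first-order moments can be compared against the exact distribution. The sketch uses illustrative values of $b$, $t$ and $N_0$. It computes $\langle x^n\rangle$ for $x = N/N_0$ directly from the shifted negative binomial $p_N(t)$ given in Section 3.1. It then compares the result with the zeroth-order approximation (3.23) and with the first-order approximation $\langle x_0^n\rangle + \langle x_1^n\rangle/N_0$ from (3.26). The first-order error should be much smaller than the zeroth-order one.

```python
import math

b, t, N0 = 1.0, 0.5, 1000           # illustrative values
q = math.exp(-b * t)                # e^{-bt}

def p_N(N):
    # Exact shifted negative binomial p_N(t) for N >= N0, evaluated in log space.
    log_binom = math.lgamma(N) - math.lgamma(N0) - math.lgamma(N - N0 + 1)
    return math.exp(log_binom + N0 * math.log(q) + (N - N0) * math.log1p(-q))

# N has mean N0 e^{bt} (about 1649) and standard deviation about 33 here,
# so truncating the sum at 4 N0 loses nothing measurable.
probs = [(N, p_N(N)) for N in range(N0, 4 * N0)]

errors = {}
for n in (1, 2, 3, 4):
    exact = sum((N / N0) ** n * pN for N, pN in probs)
    zeroth = math.exp(n * b * t)                                # (3.23)
    first = zeroth + n * (n - 1) / 2 * zeroth * (1 - q) / N0    # (3.23) + (3.26)/N0
    errors[n] = (exact - zeroth, exact - first)
    print(f"n = {n}: zeroth-order error {errors[n][0]: .2e}, "
          f"first-order error {errors[n][1]: .2e}")
```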

The probability distribution function, accurate to order $1/N_0$, can be obtained from the so-called moment generating function $\Psi(s)$, defined as

$$ \Psi(s) = \langle e^{sx}\rangle = 1 + s\langle x\rangle + \frac{s^2}{2!}\langle x^2\rangle + \frac{s^3}{3!}\langle x^3\rangle + \ldots = \sum_{n=0}^{\infty}\frac{s^n\langle x^n\rangle}{n!}. $$

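To order $1/N_0$, we have

$$ \Psi(s) = \sum_{n=0}^{\infty}\frac{s^n\langle x_0^n\rangle}{n!} + \frac{1}{N_0}\sum_{n=0}^{\infty}\frac{s^n\langle x_1^n\rangle}{n!} + O(1/N_0^2). \tag{3.27} $$

Now, using (3.23),

$$ \sum_{n=0}^{\infty}\frac{s^n\langle x_0^n\rangle}{n!} = \sum_{n=0}^{\infty}\frac{(se^{bt})^n}{n!} = e^{se^{bt}}, $$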


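and using (3.26),

$$ \begin{aligned} \sum_{n=0}^{\infty}\frac{s^n\langle x_1^n\rangle}{n!} &= \frac{1}{2}\left(1 - e^{-bt}\right)\sum_{n=0}^{\infty}\frac{n(n-1)}{n!}\left(se^{bt}\right)^n \\ &= \frac{1}{2}\left(1 - e^{-bt}\right)s^2e^{2bt}\sum_{n=0}^{\infty}\frac{n(n-1)}{n!}\left(se^{bt}\right)^{n-2} \\ &= \frac{1}{2}\left(1 - e^{-bt}\right)s^2\frac{\partial^2}{\partial s^2}\sum_{n=0}^{\infty}\frac{(se^{bt})^n}{n!} \\ &= \frac{1}{2}\left(1 - e^{-bt}\right)s^2\frac{\partial^2}{\partial s^2}\left(e^{se^{bt}}\right) \\ &= \frac{1}{2}\left(1 - e^{-bt}\right)s^2e^{2bt}e^{se^{bt}}. \end{aligned} $$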

Therefore,

$$ \Psi(s) = e^{se^{bt}}\left(1 + \frac{1}{2N_0}\left(1 - e^{-bt}\right)s^2e^{2bt} + \ldots\right). \tag{3.28} $$

We can recognize the parenthetical term of (3.28) as the Taylor-series expansion of an exponential function truncated to first order, i.e.,

$$ \exp\left(\frac{1}{2N_0}\left(1 - e^{-bt}\right)s^2e^{2bt}\right) = 1 + \frac{1}{2N_0}\left(1 - e^{-bt}\right)s^2e^{2bt} + O(1/N_0^2). $$

Therefore, to first order in $1/N_0$, we have

$$ \Psi(s) = \exp\left(se^{bt} + \frac{1}{2N_0}\left(1 - e^{-bt}\right)s^2e^{2bt}\right) + O(1/N_0^2). \tag{3.29} $$

Standard books on probability theory (e.g., A First Course in Probability by Sheldon Ross, pg. 365) detail the derivation of the moment generating function of a normal random variable:

$$\Psi(s) = \exp\left(\langle x\rangle s + \frac{1}{2}\sigma^2s^2\right), \quad \text{for a normal random variable;} \tag{3.30}$$

and comparing (3.30) with (3.29) shows us that the probability distribution $P(x,t)$ to first order in $1/N_0$ is normal with the mean and variance given by

$$\langle x\rangle = e^{bt}, \qquad \sigma_x^2 = \frac{1}{N_0}e^{2bt}\left(1 - e^{-bt}\right). \tag{3.31}$$

The mean and variance of $x = N/N_0$ are consistent with those derived for $N$ in (3.9), but now we learn that $N$ is approximately normal for large populations.
The appearance of a normal probability distribution (also called a Gaussian probability distribution) in a first-order expansion is in fact a particular case of the Central Limit Theorem, one of the most important and useful theorems in probability and statistics. We state here a simple version of this theorem without proof:

Central Limit Theorem: Suppose that $X_1, X_2, \ldots, X_n$ are independent and identically distributed (iid) random variables with mean $\langle X\rangle$ and variance $\sigma_X^2$. Then for sufficiently large $n$, the probability distribution of the average of the $X_i$'s, denoted as the random variable $Z = \frac{1}{n}\sum_{i=1}^{n}X_i$, is well approximated by a Gaussian with mean $\langle X\rangle$ and variance $\sigma^2 = \sigma_X^2/n$.
The central limit theorem can be applied directly to our problem. Consider that our population consists of $N_0$ founders. If $m_i(t)$ denotes the number of individuals descended from founder $i$ at time $t$ (including the still-living founder), then the total number of individuals at time $t$ is $N(t) = \sum_{i=1}^{N_0}m_i(t)$, and the average number of descendants of a single founder is $x(t) = N(t)/N_0$. If the mean number of descendants of a single founder is $\langle m\rangle$, with variance $\sigma_m^2$, then by applying the central limit theorem for large $N_0$, the probability distribution function of $x$ is well approximated by a Gaussian with mean $\langle x\rangle = \langle m\rangle$ and variance $\sigma_x^2 = \sigma_m^2/N_0$. Comparing with our results (3.31), we find $\langle m\rangle = e^{bt}$ and $\sigma_m^2 = e^{2bt}(1 - e^{-bt})$.
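The founder argument can be checked directly with Monte Carlo. In the Python sketch below, the function names and parameter values are illustrative assumptions. Each founder's family grows as a pure-birth process, simulated event by event with exponential waiting times of rate $b\,m$ for a family of size $m$ (the interevent-time result derived in §3.3). The quantity $x$ is the average family size over $N_0$ founders. Its sample mean and variance should be close to $e^{bt}$ and $e^{2bt}(1-e^{-bt})/N_0$.

```python
import math
import random


def family_size(b, t_end, rng):
    """Descendants (founder included) of one founder at time t_end in a pure-birth process."""
    t, m = 0.0, 1
    while True:
        t += rng.expovariate(b * m)  # waiting time to the next birth when the family has m members
        if t > t_end:
            return m
        m += 1


def x_realization(b, t_end, n0, rng):
    """x = N(t_end)/N0, computed as the average family size over N0 independent founders."""
    return sum(family_size(b, t_end, rng) for _ in range(n0)) / n0


def x_mean_and_variance(b, t_end, n0, nreal, seed=1):
    """Sample mean and variance of x over nreal realizations."""
    rng = random.Random(seed)
    xs = [x_realization(b, t_end, n0, rng) for _ in range(nreal)]
    mean = sum(xs) / nreal
    var = sum((x - mean) ** 2 for x in xs) / (nreal - 1)
    return mean, var


if __name__ == "__main__":
    b, t_end, n0 = 1.0, 1.0, 50
    mean, var = x_mean_and_variance(b, t_end, n0, nreal=2000)
    print("mean", mean, "vs", math.exp(b * t_end))
    print("variance", var, "vs", math.exp(2 * b * t_end) * (1 - math.exp(-b * t_end)) / n0)
```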

3.3 Simulation of population growth

As we have seen, stochastic modeling is significantly more complicated than deterministic modeling. As the modeling becomes more sophisticated, a numerical simulation becomes necessary. Here, for illustration, we show how to simulate individual realizations of population growth.

A naive approach would make use of the birth rate $b$ directly. During the short time interval $\Delta t$, each individual has probability $b\,\Delta t$ of giving birth. We can decide if an individual gives birth by generating a random deviate (a pseudo-random number between zero and one): if the random deviate is less than $b\,\Delta t$, then the individual gives birth; if larger than $b\,\Delta t$, then the individual does not. With $N$ individuals at time $t$, we then simply compute $N$ random deviates. Counting the number of random deviates less than $b\,\Delta t$ allows us to update the population size to the time $t + \Delta t$. For accuracy, $\Delta t$ must be small, making this a computationally slow method.
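The naive fixed-step method can be sketched in Python as follows. The function name and its arguments are hypothetical. The sketch draws one random deviate per individual per time step, so each step costs $O(N)$ random numbers. It also allows at most one birth per individual per step, which is acceptable only when $b\,\Delta t \ll 1$. Together these explain why the method is slow.

```python
import random


def naive_growth(b, n0, t_end, dt, seed=0):
    """Fixed-step simulation: each individual gives birth with probability b*dt in each step of length dt.

    Returns lists of times and population sizes. Assumes b*dt << 1, so that at most one
    birth per individual per step is a reasonable approximation.
    """
    rng = random.Random(seed)
    n, t = n0, 0.0
    times, sizes = [t], [n]
    for _ in range(int(round(t_end / dt))):
        births = sum(1 for _ in range(n) if rng.random() < b * dt)  # one random deviate per individual
        n += births
        t += dt
        times.append(t)
        sizes.append(n)
    return times, sizes


if __name__ == "__main__":
    times, sizes = naive_growth(b=1.0, n0=10, t_end=1.0, dt=0.001)
    print(times[-1], sizes[-1])
```

Halving $\Delta t$ doubles the number of steps and therefore the number of random deviates, which is what motivates the interevent-time method described next.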

There is, however, a much more efficient way to simulate population growth. Define a random variable $\tau = \tau(N)$ to be the time it takes for a population to grow from size $N$ to size $N + 1$ because of a single birth. The random variable $\tau$ is called the interevent time and represents the elapsed time between births. A simulation from population size $N_0$ to size $N_f$ would then simply require computing $N_f - N_0$ different random values of $\tau$, a relatively easy and quick computation if we know the probability density function (pdf) of $\tau$.
Accordingly, we define $P(\tau)$ to be the pdf of $\tau$ for a population of size $N$. The cumulative distribution function (cdf), $F(\tau)$, defined as the probability that the interevent time is less than $\tau$, is given by

$$F(\tau) = \int_0^\tau P(\tau')\,d\tau',$$

where $P(\tau) = F'(\tau)$. The complementary cumulative distribution function (ccdf), $G(\tau)$, defined as the probability that the interevent time is greater than $\tau$, is given by $G(\tau) = 1 - F(\tau)$.

Now the probability that the interevent time is greater than $\tau + \Delta\tau$, with $\Delta\tau$ small, is given by the probability that it is greater than $\tau$ times the probability that there are no births in the time interval $\Delta\tau$. Therefore, $G(\tau + \Delta\tau)$ satisfies

$$G(\tau + \Delta\tau) = G(\tau)(1 - bN\Delta\tau).$$


Differencing $G$ and taking the limit $\Delta\tau \to 0$ yields the differential equation

$$\frac{dG}{d\tau} = -bNG,$$

which can be integrated using the initial condition $G(0) = 1$ to obtain

$$G(\tau) = e^{-bN\tau}.$$

From $G(\tau)$, we can find

$$F(\tau) = 1 - e^{-bN\tau}, \qquad P(\tau) = bNe^{-bN\tau}. \tag{3.32}$$
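The limit that turns the difference relation for $G$ into $G(\tau) = e^{-bN\tau}$ can also be seen numerically. The Python sketch below (names and parameter values are illustrative) iterates $G(\tau+\Delta\tau) = G(\tau)(1 - bN\Delta\tau)$ from $G(0) = 1$. As $\Delta\tau$ shrinks, the result approaches the exponential, with an error roughly proportional to $\Delta\tau$.

```python
import math


def ccdf_from_recursion(b, n, tau, dtau):
    """Iterate G(tau + dtau) = G(tau) * (1 - b*N*dtau) from G(0) = 1 up to time tau."""
    g = 1.0
    for _ in range(int(round(tau / dtau))):
        g *= 1.0 - b * n * dtau
    return g


def ccdf_exact(b, n, tau):
    """The limit dtau -> 0: G(tau) = exp(-b*N*tau)."""
    return math.exp(-b * n * tau)


if __name__ == "__main__":
    for dtau in (1e-2, 1e-3, 1e-4):
        approx = ccdf_from_recursion(1.0, 5, 0.4, dtau)
        print(dtau, approx, abs(approx - ccdf_exact(1.0, 5, 0.4)))
```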

The pdf, $P(\tau)$, has the form of an exponential distribution with parameter $bN$. Here, we make use of a well-known result from probability theory that enables us to compute $\tau$ using random deviates. With $y$ a random deviate, $\tau$ can be computed from $\tau = F^{-1}(y)$, where $F^{-1}$ is the inverse function of $F$. For the exponential distribution, solving $y = 1 - e^{-bN\tau}$ for $\tau$ gives

$$\tau = -\frac{\ln(1 - y)}{bN}. \tag{3.33}$$
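Formula (3.33) is an instance of inverse-transform sampling. The Python check below uses illustrative names. It relies on `random.random()`, which returns deviates in $[0, 1)$, so $1 - y$ is never zero. The check confirms that the sampled interevent times have mean $1/(bN)$ and an empirical cdf close to $F(\tau) = 1 - e^{-bN\tau}$. Because $1 - y$ is itself uniformly distributed, $-\ln y/(bN)$ would be an equivalent choice, provided $y = 0$ is excluded.

```python
import math
import random


def interevent_time(b, n, y):
    """Inverse-transform formula (3.33): tau = -ln(1 - y)/(bN) for a uniform deviate y in [0, 1)."""
    return -math.log(1.0 - y) / (b * n)


def sample_interevent_times(b, n, count, seed=0):
    """Draw `count` interevent times for a population of fixed size n."""
    rng = random.Random(seed)
    return [interevent_time(b, n, rng.random()) for _ in range(count)]


if __name__ == "__main__":
    taus = sample_interevent_times(b=2.0, n=5, count=20000)
    print("sample mean:", sum(taus) / len(taus), "expected:", 1 / (2.0 * 5))
```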

To simulate a population growing from $N_0$ to $N_f$, we compute $N_f - N_0$ random deviates $y$, and then compute the corresponding interevent times using (3.33), taking care to adjust the population size $N$ as the population grows.

Below, we illustrate a simple MATLAB function that simulates one realization of population growth from initial size $N_0$ to final size $N_f$, with birth rate $b$.

function [t, N] = population_growth_simulation(b, N0, Nf)
% simulates population growth from N0 to Nf with birth rate b
N = N0:Nf;
y = rand(1, Nf-N0);                 % random deviates
tau = -log(1-y)./(b*N(1:Nf-N0));    % interevent times
t = [0 cumsum(tau)];                % cumulative sum of interevent times

The function population_growth_simulation.m can be driven by a MATLAB script to compute realizations of population growth. For instance, the following script computes 25 realizations for a population growth from 10 to 100 with $b = 1$ and plots all the realizations:

% calculate nreal realizations and plot
b = 1; N0 = 10; Nf = 100;
nreal = 25;
for i = 1:nreal
    [t, N] = population_growth_simulation(b, N0, Nf);
    plot(t, N); hold on;
end
xlabel('t'); ylabel('N');

Figure 3.1 presents three graphs, showing 25 realizations of population growth starting with population sizes of 10, 100, and 1000, and ending with population sizes a factor of 10 larger. Observe that the variance, relative to the initial population size, decreases as the initial population size increases, following our analytical result (3.31).
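The trend in Fig. 3.1 can be quantified. The Python sketch below simulates realizations event by event with interevent times from (3.33). It records $x = N(t)/N_0$ at a fixed time $t$ and compares the sample variance of $x$ with the prediction $e^{2bt}(1-e^{-bt})/N_0$ from (3.31). The names, the observation time $t = 0.5$, and the 400 realizations are illustrative assumptions.

```python
import math
import random


def size_at_time(b, n0, t_end, rng):
    """Event-driven realization: advance by interevent times (3.33) until t_end; return N(t_end)."""
    t, n = 0.0, n0
    while True:
        t += -math.log(1.0 - rng.random()) / (b * n)
        if t > t_end:
            return n
        n += 1


def sample_variance_of_x(b, n0, t_end, nreal, seed=0):
    """Sample variance of x = N(t_end)/N0 over nreal independent realizations."""
    rng = random.Random(seed)
    xs = [size_at_time(b, n0, t_end, rng) / n0 for _ in range(nreal)]
    mean = sum(xs) / nreal
    return sum((x - mean) ** 2 for x in xs) / (nreal - 1)


def predicted_variance_of_x(b, n0, t_end):
    """Variance of x predicted by (3.31)."""
    return math.exp(2 * b * t_end) * (1.0 - math.exp(-b * t_end)) / n0


if __name__ == "__main__":
    for n0 in (10, 100, 1000):
        print(n0, sample_variance_of_x(1.0, n0, 0.5, 400), predicted_variance_of_x(1.0, n0, 0.5))
```

Each tenfold increase in $N_0$ should reduce the variance of $x$ by roughly a factor of ten.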

[Figure 3.1: three panels (a), (b), (c), each plotting N against t for 0 ≤ t ≤ 3, with N axes extending to 100, 1000, and 10000, respectively.]

Figure 3.1: Twenty-five realizations of population growth with initial population sizes of 10, 100, and 1000, in (a), (b), and (c), respectively.



Chapter 4

Infectious Disease Modeling


In the late 1320s, an outbreak of the bubonic plague occurred in China. This disease is caused by the bacterium Yersinia pestis and is transmitted from rats to humans by fleas. The outbreak in China spread west, and the first major outbreak in Europe occurred in 1347. During a five-year period, 25 million people in Europe, approximately 1/3 of the population, died of the Black Death. Other more recent epidemics include the influenza pandemic known as the Spanish flu, which killed 50-100 million people worldwide during the years 1918-1919, and the present AIDS epidemic, which originated in Africa, was first recognized in the USA in 1981, and has killed more than 25 million people. For comparison, the SARS epidemic for which Hong Kong was the global epicenter in the year 2003 resulted in 8096 known SARS cases and 774 deaths. Yet, we know well that this relatively small epidemic caused local social and economic turmoil.

Here, we introduce the most basic mathematical models of infectious disease epidemics and endemics. These
models form the basis of the necessarily more detailed models currently used by world health organizations, both to
predict the future spread of a disease and to develop strategies for containment and eradication.

4.1 The SI model

The simplest model of an infectious disease categorizes people as either susceptible or infective (SI). One can imagine that susceptible people are healthy and infective people are sick. A susceptible person can become infective by contact with an infective. Here, and in all subsequent models, we assume that the population under study is well mixed so that every person has equal probability of coming into contact with every other person. This is a major approximation. For example, while the population of Amoy Gardens could be considered well mixed during the SARS epidemic because of shared water pipes and elevators, the population of Hong Kong as a whole could not because of the larger geographical distances, and the limited travel of many people outside the neighborhoods where they live.

We derive the governing differential equation for the SI model by considering the number of people that become infective during time $\Delta t$. Let $\beta\Delta t$ be the probability that a random infective person infects a random susceptible person during time $\Delta t$. Then with $S$ susceptible and $I$ infective people, the expected number of newly infected people in the total population during time $\Delta t$ is $\beta\Delta t\,SI$. Thus,

$$I(t + \Delta t) = I(t) + \beta\Delta t\,S(t)I(t),$$

and in the limit $\Delta t \to 0$,

$$\frac{dI}{dt} = \beta SI. \tag{4.1}$$

We diagram (4.1) as

$$S \xrightarrow{\ \beta SI\ } I.$$

Later, diagrams will make it easier to construct more complicated systems of equations. We now assume a constant population size $N$, neglecting births and deaths, so that $S + I = N$. We can eliminate $S$ from (4.1) and rewrite the equation as

$$\frac{dI}{dt} = \beta NI\left(1 - \frac{I}{N}\right),$$

which can be recognized as a logistic equation, with growth rate $\beta N$ and carrying capacity $N$. Therefore $I \to N$ as $t \to \infty$ and the entire population will become infective.
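Because (4.1) with $S = N - I$ is logistic, its solution for $I(0) = I_0$ is $I(t) = N/\left[1 + (N/I_0 - 1)e^{-\beta Nt}\right]$. The Python sketch below (illustrative names and values) checks this closed form against a direct RK4 integration of $dI/dt = \beta SI$ and shows that $I \to N$.

```python
import math


def si_exact(beta, n, i0, t):
    """Logistic solution of dI/dt = beta*N*I*(1 - I/N) with I(0) = i0."""
    return n / (1.0 + (n / i0 - 1.0) * math.exp(-beta * n * t))


def si_rk4(beta, n, i0, t_end, steps=1000):
    """Integrate dI/dt = beta*S*I with S = N - I by the classical RK4 method."""
    def rate(i):
        return beta * (n - i) * i

    h = t_end / steps
    i = i0
    for _ in range(steps):
        k1 = rate(i)
        k2 = rate(i + h * k1 / 2)
        k3 = rate(i + h * k2 / 2)
        k4 = rate(i + h * k3)
        i += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return i


if __name__ == "__main__":
    print(si_rk4(0.001, 1000, 1.0, 10.0), si_exact(0.001, 1000, 1.0, 10.0))
```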

4.2 The SIS model

The SI model may be extended to the SIS model, where an infective can recover and become susceptible again. We assume that the probability that an infective recovers during time $\Delta t$ is given by $\gamma\Delta t$. Then the total number of infective people that recover during time $\Delta t$ is given by $I \times \gamma\Delta t$, and

$$I(t + \Delta t) = I(t) + \beta\Delta t\,S(t)I(t) - \gamma\Delta t\,I(t),$$

or as $\Delta t \to 0$,

$$\frac{dI}{dt} = \beta SI - \gamma I, \tag{4.2}$$

which we diagram as

$$S \underset{\gamma I}{\overset{\beta SI}{\rightleftharpoons}} I.$$

Using $S + I = N$, we eliminate $S$ from (4.2) to obtain

$$\frac{dI}{dt} = (\beta N - \gamma)I\left(1 - \frac{\beta I}{\beta N - \gamma}\right), \tag{4.3}$$

which is again a logistic equation, but now with growth rate $\beta N - \gamma$ and carrying capacity $N - \gamma/\beta$. In the SIS model, an epidemic will occur if $\beta N > \gamma$. And if an epidemic does occur, then the disease becomes endemic with the number of infectives at equilibrium given by $I^* = N - \gamma/\beta$, and the number of susceptibles given by $S^* = \gamma/\beta$.
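Here is a short numerical check of the SIS threshold and endemic level, in Python with illustrative names and parameter values. Integrating (4.2) with $S = N - I$ drives $I$ to $N - \gamma/\beta$ when $\beta N > \gamma$ and to zero otherwise.

```python
def sis_rk4(beta, gamma, n, i0, t_end, steps=4000):
    """RK4 integration of dI/dt = beta*S*I - gamma*I with S = N - I, equation (4.2)."""
    def rate(i):
        return beta * (n - i) * i - gamma * i

    h = t_end / steps
    i = i0
    for _ in range(steps):
        k1 = rate(i)
        k2 = rate(i + h * k1 / 2)
        k3 = rate(i + h * k2 / 2)
        k4 = rate(i + h * k3)
        i += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return i


if __name__ == "__main__":
    beta, gamma, n = 0.002, 1.0, 1000
    i_end = sis_rk4(beta, gamma, n, i0=1.0, t_end=40.0)
    print("I(40) =", i_end, " predicted I* =", n - gamma / beta, " S =", n - i_end)
```

With these values $\beta N = 2 > \gamma = 1$, so the sketch should settle near $I^* = 500$ and $S^* = \gamma/\beta = 500$.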

In general, an important metric for whether or not an epidemic will occur is called the basic reproductive ratio. The basic reproductive ratio is defined as the expected number of people that a single infective will infect in an otherwise susceptible population. To compute the basic reproductive ratio, define $l(t)$ to be the probability that an individual initially infected at $t = 0$ is still infective at time $t$.

Since the probability of being infective at time $t + \Delta t$ is equal to the probability of being infective at time $t$ multiplied by the probability of not recovering during time $\Delta t$, we have

$$l(t + \Delta t) = l(t)(1 - \gamma\Delta t),$$

or as $\Delta t \to 0$,

$$\frac{dl}{dt} = -\gamma l.$$

With initial condition $l(0) = 1$,

$$l(t) = e^{-\gamma t}. \tag{4.4}$$

Now, the expected number of secondary infections produced by a single primary infective over the time period $(t, t + \Delta t)$ is given by the probability that the primary infective is still infectious at time $t$ multiplied by the expected number of secondary infections produced by a single infective during time $\Delta t$; that is, $l(t) \times S(t)\beta\Delta t$. Here, the definition of the basic reproductive ratio assumes that the entire population is susceptible so that $S(t) = N$. Therefore, the expected number of secondary infectives produced by a single primary infective in a completely susceptible population is

$$\int_0^\infty \beta l(t)N\,dt = \beta N\int_0^\infty e^{-\gamma t}\,dt = \frac{\beta N}{\gamma}.$$

The basic reproductive ratio, written as $\mathcal{R}_0$, is therefore defined as

$$\mathcal{R}_0 = \frac{\beta N}{\gamma},$$

and from (4.3), we can see that in the SIS model an epidemic will occur if $\mathcal{R}_0 > 1$. In other words, an epidemic can occur if an infected individual in an otherwise susceptible population will on average infect more than one other individual.
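The integral that defines $\mathcal{R}_0$ can be checked by quadrature. This Python sketch uses illustrative names and values. It applies the trapezoidal rule to $\int_0^\infty \beta N l(t)\,dt$ with $l(t) = e^{-\gamma t}$ and recovers $\beta N/\gamma$. The infinite upper limit is truncated at $40/\gamma$, an assumption that is harmless because $e^{-\gamma t}$ is negligible there.

```python
import math


def r0_by_quadrature(beta, gamma, n, steps=100_000):
    """Trapezoidal estimate of the integral of beta*N*l(t) dt with l(t) = exp(-gamma*t).

    The infinite upper limit is truncated at 40/gamma, where exp(-gamma*t) is about 4e-18.
    """
    t_max = 40.0 / gamma
    h = t_max / steps
    total = 0.5 * (1.0 + math.exp(-gamma * t_max))
    for k in range(1, steps):
        total += math.exp(-gamma * k * h)
    return beta * n * h * total


if __name__ == "__main__":
    beta, gamma, n = 0.0003, 0.1, 1000
    print(r0_by_quadrature(beta, gamma, n), beta * n / gamma)
```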
We have also seen an analogous definition of the basic reproductive ratio in our previous discussion of age-structured populations (§2.5). There, the basic reproductive ratio was the number of female offspring expected from a newborn female over her lifetime; the population size would grow if this value was greater than unity.

In the SIS model, after an epidemic occurs the population reaches an equilibrium between susceptible and infective individuals. The effective basic reproductive ratio of this steady-state population can be defined as $\beta S^*/\gamma$, and with $S^* = \gamma/\beta$ this ratio is evidently unity. Clearly, for a population to be in equilibrium, an infective individual must infect on average one other individual before he or she recovers.

4.3 The SIR epidemic disease model

The SIR model, first published by Kermack and McKendrick in 1927, is undoubtedly the most famous mathematical model for the spread of an infectious disease. Here, people are classified into three classes: susceptible $S$, infective $I$ and removed $R$. Removed individuals are neither susceptible nor infective for whatever reason; for example, they have recovered from the disease and are now immune, or they have been vaccinated, or they have been isolated from the rest of the population, or perhaps they have died from the disease. As in the SIS model, we assume that infectives leave the $I$ class with constant rate $\gamma$, but in the SIR model they move directly into the $R$ class. The model may be diagrammed as

$$S \xrightarrow{\ \beta SI\ } I \xrightarrow{\ \gamma I\ } R,$$

and the corresponding coupled differential equations are

$$\frac{dS}{dt} = -\beta SI, \qquad \frac{dI}{dt} = \beta SI - \gamma I, \qquad \frac{dR}{dt} = \gamma I, \tag{4.5}$$

with the constant population constraint $S + I + R = N$. For convenience, we nondimensionalize (4.5) using $N$ for population size and $\gamma^{-1}$ for time; that is, let

$$\hat S = S/N, \quad \hat I = I/N, \quad \hat R = R/N, \quad \hat t = \gamma t,$$

and define the dimensionless basic reproductive ratio as

$$\mathcal{R}_0 = \frac{\beta N}{\gamma}. \tag{4.6}$$

The dimensionless SIR equations are then given by

$$\frac{d\hat S}{d\hat t} = -\mathcal{R}_0\hat S\hat I, \qquad \frac{d\hat I}{d\hat t} = \mathcal{R}_0\hat S\hat I - \hat I, \qquad \frac{d\hat R}{d\hat t} = \hat I, \tag{4.7}$$

with dimensionless constraint $\hat S + \hat I + \hat R = 1$.
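The dimensionless system (4.7) is straightforward to integrate numerically. The Python sketch below assumes that a fixed-step RK4 integrator is adequate, and its names are illustrative. It can be used to check three things: that $\hat S + \hat I + \hat R$ stays equal to one, that infectives decline when $\mathcal{R}_0\hat S_0 < 1$, and that for $\mathcal{R}_0 = 2$ the final removed fraction is close to 0.797, the root of (4.9) obtained below.

```python
def sir_rk4(r0, s0, i0, t_end, steps=20000):
    """Integrate the dimensionless SIR equations (4.7) with classical RK4.

    Starts from (S, I, R) = (s0, i0, 1 - s0 - i0) and returns the state at time t_end.
    """
    def rhs(state):
        s, i, _ = state
        return (-r0 * s * i, r0 * s * i - i, i)

    def shift(state, k, scale):
        return tuple(x + scale * dx for x, dx in zip(state, k))

    h = t_end / steps
    state = (s0, i0, 1.0 - s0 - i0)
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(shift(state, k1, h / 2))
        k3 = rhs(shift(state, k2, h / 2))
        k4 = rhs(shift(state, k3, h))
        state = tuple(
            x + h / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state


if __name__ == "__main__":
    s, i, r = sir_rk4(r0=2.0, s0=1 - 1e-6, i0=1e-6, t_end=100.0)
    print("final removed fraction:", r)
```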


We will use the SIR model to address two fundamental questions: (1) Under what condition does an epidemic
occur? (2) If an epidemic occurs, what fraction of a well-mixed population gets sick?

Let $(\hat S^*, \hat I^*, \hat R^*)$ be the fixed points of (4.7). Setting $d\hat S/d\hat t = d\hat I/d\hat t = d\hat R/d\hat t = 0$, we immediately observe from the equation for $d\hat R/d\hat t$ that $\hat I = 0$, and this value forces all the time derivatives to vanish for any $\hat S$ and $\hat R$. Since with $\hat I = 0$ we have $\hat R = 1 - \hat S$, evidently all the fixed points of (4.7) are given by the one-parameter family $(\hat S^*, \hat I^*, \hat R^*) = (\hat S^*, 0, 1 - \hat S^*)$.
An epidemic occurs when a small number of infectives introduced into a susceptible population results in an increasing number of infectives. We can assume an initial population at a fixed point of (4.7), perturb this fixed point by introducing a small number of infectives, and determine the fixed point's stability. An epidemic occurs when the fixed point is unstable. The linear stability problem may be solved by considering only the equation for $d\hat I/d\hat t$ in (4.7). With $\hat I \ll 1$ and $\hat S \approx \hat S_0$, we have

$$\frac{d\hat I}{d\hat t} = \left(\mathcal{R}_0\hat S_0 - 1\right)\hat I,$$

so that an epidemic occurs if $\mathcal{R}_0\hat S_0 - 1 > 0$. With the basic reproductive ratio given by (4.6), and $\hat S_0 = S_0/N$, where $S_0$ is the number of initial susceptible individuals, an epidemic occurs if

$$\mathcal{R}_0\hat S_0 = \frac{\beta S_0}{\gamma} > 1, \tag{4.8}$$

which could have been guessed. An epidemic occurs if an infective individual introduced into a population of $S_0$ susceptible individuals infects on average more than one other person. If an epidemic occurs, then initially the number of infective individuals increases exponentially with growth rate $\beta S_0 - \gamma$.

We now address the second question: If an epidemic occurs, what fraction of the population gets sick? For simplicity, we assume that the entire initial population is susceptible to the disease, so that $\hat S_0 = 1$. We expect the solution of the governing equations (4.7) to approach a fixed point asymptotically in time (so that the final number of infectives will be zero), and we define this fixed point to be $(\hat S, \hat I, \hat R) = (1 - \hat R_\infty, 0, \hat R_\infty)$, with $\hat R_\infty$ equal to the fraction of the population that gets sick. To compute $\hat R_\infty$, it is simpler to work with a transformed version of (4.7). By the chain rule, $d\hat S/d\hat t = (d\hat S/d\hat R)(d\hat R/d\hat t)$, so that

$$\frac{d\hat S}{d\hat R} = \frac{d\hat S/d\hat t}{d\hat R/d\hat t} = -\mathcal{R}_0\hat S,$$

which is separable. Separating and integrating from the initial to final conditions,

$$\int_1^{1 - \hat R_\infty}\frac{d\hat S}{\hat S} = -\mathcal{R}_0\int_0^{\hat R_\infty}d\hat R,$$

[Figure 4.1: plot titled "fraction of population that get sick", showing $\hat R_\infty$ against $\mathcal{R}_0$ for $0 \le \mathcal{R}_0 \le 2$.]

Figure 4.1: The fraction of the population that gets sick in the SIR model as a function of the basic reproduction ratio $\mathcal{R}_0$.

which upon integration and simplification, results in the following transcendental equation for $\hat R_\infty$:

$$1 - \hat R_\infty - e^{-\mathcal{R}_0\hat R_\infty} = 0, \tag{4.9}$$

an equation that can be solved numerically using Newton's method. We have

$$F(\hat R_\infty) = 1 - \hat R_\infty - e^{-\mathcal{R}_0\hat R_\infty}, \qquad F'(\hat R_\infty) = -1 + \mathcal{R}_0e^{-\mathcal{R}_0\hat R_\infty};$$

and Newton's method for solving $F(\hat R_\infty) = 0$ iterates

$$\hat R_\infty^{(n+1)} = \hat R_\infty^{(n)} - \frac{F(\hat R_\infty^{(n)})}{F'(\hat R_\infty^{(n)})}$$

for fixed $\mathcal{R}_0$ and a suitable initial condition for $\hat R_\infty^{(0)}$, which we take to be unity. My code for computing $\hat R_\infty$ as a function of $\mathcal{R}_0$ is given below, and the result is shown in Fig. 4.1. There is an explosion in the number of infections as $\mathcal{R}_0$ increases from unity, and this rapid increase is a classic example of what is known more generally as a threshold phenomenon.

function [R0, R_inf] = sir_rinf
% computes solution of R_inf using Newton's method from SIR model
nmax = 10; numpts = 1000;
R0 = linspace(0, 2, numpts); R_inf = ones(1, numpts);
for i = 1:nmax
    R_inf = R_inf - F(R_inf, R0)./Fp(R_inf, R0);
end
plot(R0, R_inf); axis([0 2 -0.02 0.8])
xlabel('$\mathcal{R}_0$', 'Interpreter', 'latex', 'FontSize', 16)
ylabel('$\hat R_\infty$', 'Interpreter', 'latex', 'FontSize', 16);
title('fraction of population that get sick')

%subfunctions
function y = F(R_inf, R0)
y = 1 - R_inf - exp(-R0.*R_inf);

function y = Fp(R_inf, R0)
y = -1 + R0.*exp(-R0.*R_inf);
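For readers working outside MATLAB, here is a minimal Python sketch of the same Newton iteration for (4.9), started from $\hat R_\infty^{(0)} = 1$ as in the text. It is an illustrative translation, not the author's code: a stopping tolerance replaces the fixed `nmax` loop, and the names are hypothetical. Starting at unity matters. $\hat R_\infty = 0$ is always a root of (4.9), and because $F$ is concave, for $\mathcal{R}_0 > 1$ the iterates from one decrease monotonically to the nontrivial root.

```python
import math


def r_inf_newton(r0, start=1.0, tol=1e-14, max_iter=100):
    """Solve F(R) = 1 - R - exp(-r0*R) = 0, equation (4.9), by Newton's method starting from `start`."""
    r = start
    for _ in range(max_iter):
        f = 1.0 - r - math.exp(-r0 * r)
        fp = -1.0 + r0 * math.exp(-r0 * r)
        if fp == 0.0:  # flat tangent (at the maximum of F): the Newton step is undefined, so stop
            break
        step = f / fp
        r -= step
        if abs(step) < tol:
            break
    return r


if __name__ == "__main__":
    for r0 in (0.5, 1.5, 2.0, 3.0):
        print(r0, r_inf_newton(r0))
```

Near $\mathcal{R}_0 = 1$ the root at zero becomes a double root and Newton's method converges only linearly, which is worth keeping in mind when reading the fixed ten-iteration MATLAB loop.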

4.4 Vaccination

Table 4.1 lists the diseases for which vaccines exist and are widely administered to children. Health care authorities
must determine the fraction of a population that must be vaccinated to prevent epidemics.

We address this problem within the SIR epidemic disease model. Let $p$ be the fraction of the population that is vaccinated and $p^*$ the minimum fraction required to prevent an epidemic. When $p > p^*$, an epidemic cannot occur. Since even non-vaccinated people are protected by the absence of epidemics, we say that the population has acquired herd immunity.

We assume that individuals are susceptible unless vaccinated, and vaccinated individuals are in the removed class. The initial population is then modeled as $(\hat S, \hat I, \hat R) = (1 - p, 0, p)$. We have already determined the stability of this fixed point to perturbation by a small number of infectives. The condition for an epidemic to occur is given by (4.8), and with $\hat S_0 = 1 - p$, an epidemic occurs if

$$\mathcal{R}_0(1 - p) > 1.$$

Therefore, the minimum fraction of the population that must be vaccinated to prevent an epidemic is

$$p^* = 1 - \frac{1}{\mathcal{R}_0}.$$

Diseases with smaller values of $\mathcal{R}_0$ are easier to eradicate than diseases with larger values of $\mathcal{R}_0$ since a population can acquire herd immunity with a smaller fraction of the population vaccinated. For example, smallpox with $\mathcal{R}_0 \approx 4$ has been eradicated throughout the world whereas measles with $\mathcal{R}_0 \approx 17$ still has occasional outbreaks.
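The herd-immunity threshold is a one-line computation. The Python sketch below uses an illustrative name. It also clamps $p^*$ at zero for $\mathcal{R}_0 \le 1$, an added convention since no vaccination is needed there. It reproduces the thresholds implied by the smallpox and measles values quoted above.

```python
def critical_vaccination_fraction(r0):
    """Minimum vaccinated fraction p* = 1 - 1/R0; clamped to 0 when R0 <= 1 (no epidemic anyway)."""
    return max(0.0, 1.0 - 1.0 / r0)


if __name__ == "__main__":
    for disease, r0 in (("smallpox", 4.0), ("measles", 17.0)):
        print(disease, round(critical_vaccination_fraction(r0), 3))
    # smallpox 0.75
    # measles 0.941
```

So roughly 75% coverage suffices for smallpox, while measles requires about 94%.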

4.5 The SIR endemic disease model

A disease that is constantly present in a population is said to be endemic. For example, malaria is endemic to Sub-Saharan Africa, where about 90% of malaria-related deaths occur. Endemic diseases prevail over long time scales: babies are born, old people die. Let $b$ be the birth rate and $d$ the disease-unrelated death rate. We separately define $c$ to be the disease-related death rate; $R$ is now the immune class. We may diagram an SIR model of an endemic disease as

| Disease | Description | Symptoms | Complications |
|---|---|---|---|
| Diphtheria | A bacterial respiratory disease | Sore throat and low-grade fever | Airway obstruction, coma, and death |
| Haemophilus influenzae type b (Hib) | A bacterial infection occurring primarily in infants | Skin and throat infections, meningitis, pneumonia, sepsis, and arthritis | Death in one out of 20 children, and permanent brain damage in 10%-30% of the survivors |
| Hepatitis A | A viral liver disease | Potentially none; yellow skin or eyes, tiredness, stomach ache, loss of appetite, or nausea | Usually none |
| Hepatitis B | Same as Hepatitis A | Same as Hepatitis A | Life-long liver problems, such as scarring of the liver and liver cancer |
| Measles | A viral respiratory disease | Rash, high fever, cough, runny nose, and red, watery eyes | Diarrhea, ear infections, pneumonia, encephalitis, seizures, and death |
| Mumps | A viral lymph node disease | Fever, headache, muscle ache, and swelling of the lymph nodes close to the jaw | Meningitis, inflammation of the testicles or ovaries, inflammation of the pancreas and deafness |
| Pertussis (whooping cough) | A bacterial respiratory disease | Severe spasms of coughing | Pneumonia, encephalitis, and death, especially in infants |
| Pneumococcal disease | A bacterial disease | High fever, cough, and stabbing chest pains | Bacteremia, meningitis, and death |
| Polio | A viral lymphatic and nervous system disease | Fever, sore throat, nausea, headaches, stomach aches, stiffness in the neck, back, and legs | Paralysis that can lead to permanent disability and death |
| Rubella (German measles) | A viral respiratory disease | Rash and fever for two to three days | Birth defects if acquired by a pregnant woman |
| Tetanus (lockjaw) | A bacterial nervous system disease | Lockjaw, stiffness in the neck and abdomen, and difficulty swallowing | Death in one third of the cases, especially people over age 50 |
| Varicella (chickenpox) | A viral disease in the Herpes family | A skin rash of blister-like lesions | Bacterial infection of the skin, swelling of the brain, and pneumonia |
| Human papillomavirus | A viral skin and mucous membrane disease | Warts, cervical cancer | The 5-year survival rate from all diagnoses of cervical cancer is 72% |

Table 4.1: Previously common diseases for which vaccines have been developed.

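$$\xrightarrow{\ bN\ } S \xrightarrow{\ \beta SI\ } I \xrightarrow{\ \gamma I\ } R,$$

with outflows $dS$, $dI$, and $dR$ (disease-unrelated deaths) from $S$, $I$, and $R$, respectively, and an additional outflow $cI$ (disease-related deaths) from $I$,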

and the governing differential equations are

$$\frac{dS}{dt} = bN - \beta SI - dS, \qquad \frac{dI}{dt} = \beta SI - (d + c + \gamma)I, \qquad \frac{dR}{dt} = \gamma I - dR, \tag{4.10}$$

with $N = S + I + R$. In our endemic disease model, $N$ separately satisfies the differential equation

$$\frac{dN}{dt} = (b - d)N - cI, \tag{4.11}$$

and is not necessarily constant.


A disease can become endemic in a population if $dI/dt$ stays nonnegative; that is,

$$\frac{\beta S(t)}{d + c + \gamma} \ge 1.$$

For a disease to become endemic, newborns must introduce an endless supply of new susceptibles into a
population.
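Two quick checks on this model are sketched below in Python, with function names and parameter values that are illustrative rather than taken from the notes. First, summing the three right-hand sides of (4.10) reproduces (4.11). Second, the endemic condition $\beta S/(d + c + \gamma) \ge 1$ agrees with the sign of $dI/dt$ whenever $I > 0$.

```python
def endemic_rhs(s, i, r, b, d, c, beta, gamma):
    """Right-hand sides (dS/dt, dI/dt, dR/dt) of the SIR endemic disease model (4.10), N = S + I + R."""
    n = s + i + r
    ds = b * n - beta * s * i - d * s
    di = beta * s * i - (d + c + gamma) * i
    dr = gamma * i - d * r
    return ds, di, dr


def total_population_rate(s, i, r, b, d, c):
    """Right-hand side of (4.11): dN/dt = (b - d) N - c I."""
    return (b - d) * (s + i + r) - c * i


def infectives_nondecreasing(s, d, c, beta, gamma):
    """Endemic condition from the text: beta*S/(d + c + gamma) >= 1, i.e. dI/dt >= 0 whenever I > 0."""
    return beta * s / (d + c + gamma) >= 1.0


if __name__ == "__main__":
    state = (900.0, 50.0, 50.0)
    print(sum(endemic_rhs(*state, b=0.02, d=0.01, c=0.05, beta=0.001, gamma=0.1)))
    print(total_population_rate(*state, b=0.02, d=0.01, c=0.05))
```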

4.6 Evolution of virulence

Microorganisms continuously evolve due to selection pressures in their environments. Antibiotics are a common source of selection pressure on pathogenic bacteria, and the development of antibiotic-resistant strains presents a major health challenge to medical science. Bacteria and viruses also compete directly with each other for reproductive success resulting in the evolution of virulence. Here, using the SIR endemic disease model, we study how virulence may evolve.

For the sake of argument, we will assume that a population is initially in equilibrium with an endemic disease caused by a wildtype virus; that is, $S$, $I$ and $R$ are assumed to be nonzero and at equilibrium values. Now suppose that some virus particles mutate by a random, undirected process that occurs naturally. We want to determine the conditions under which the mutant virus will replace the wildtype virus in the population. In mathematical terms, we want to determine the linear stability of the endemic disease equilibrium to the introduction of a mutant viral strain.

We assume that the original wildtype virus has infection rate $\beta$, removal rate $\gamma$, and disease-related death rate $c$, and that the mutant virus has corresponding rates $\beta'$, $\gamma'$ and $c'$. We further assume that an individual infected with either a wildtype or mutant virus gains immunity to subsequent infection from both wildtype and mutant viral forms. Our model thus has a single susceptible class $S$, two distinct infective classes $I$ and $I'$ depending on which virus causes the infection, and a single recovered class $R$. The appropriate diagram is

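$$\xrightarrow{\ bN\ } S, \qquad S \xrightarrow{\ \beta SI\ } I \xrightarrow{\ \gamma I\ } R, \qquad S \xrightarrow{\ \beta' SI'\ } I' \xrightarrow{\ \gamma' I'\ } R,$$

with death outflows $dS$ from $S$, $(d + c)I$ from $I$, $(d + c')I'$ from $I'$, and $dR$ from $R$,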

with corresponding differential equations

$$\frac{dS}{dt} = bN - dS - S(\beta I + \beta' I'), \tag{4.12}$$

$$\frac{dI}{dt} = \beta SI - (d + c + \gamma)I, \tag{4.13}$$

$$\frac{dI'}{dt} = \beta' SI' - (d + c' + \gamma')I', \tag{4.14}$$

$$\frac{dR}{dt} = \gamma I + \gamma' I' - dR. \tag{4.15}$$

If the population is initially in equilibrium with the wildtype virus, then we have $\dot I = 0$ with $I \neq 0$, and the equilibrium value for $S$ is determined from (4.13) to be

$$S^* = \frac{d + c + \gamma}{\beta}, \tag{4.16}$$

which corresponds to a basic reproductive ratio $\beta S^*/(d + c + \gamma)$ of unity.


We perturb this endemic disease equilibrium by introducing a small number of infectives carrying the mutated virus, that is, by letting $I'$ be small. Rather than solve the stability problem by means of a Jacobian analysis, we can directly examine the equation for $dI'/dt$ given by (4.14). Here, with $S = S^*$ given by (4.16), we have

$$\frac{dI'}{dt} = \left[\frac{\beta'(d + c + \gamma)}{\beta} - (d + c' + \gamma')\right]I';$$

and $I'$ increases exponentially if

$$\frac{\beta'(d + c + \gamma)}{\beta} - (d + c' + \gamma') > 0,$$

or after some elementary algebra,

$$\frac{\beta'}{d + c' + \gamma'} > \frac{\beta}{d + c + \gamma}. \tag{4.17}$$

Our result (4.17) suggests that endemic viruses (or other microorganisms) will tend to evolve (i) to be more easily transmitted between people ($\beta' > \beta$); (ii) to make people sick longer ($\gamma' < \gamma$); and (iii) to be less deadly ($c' < c$). In other words, viruses evolve to increase their basic reproductive ratios. For instance, our model suggests that viruses evolve to be less deadly because the dead do not spread disease. Our result would not be applicable, however, if the dead in fact did spread disease, a possibility if disposal of the dead was not done with sufficient care, perhaps because of certain cultural traditions such as family washing of the dead body.
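The invasion argument can be expressed as two small functions, sketched in Python with hypothetical names and argument order. The first computes the growth rate of $I'$ at the wildtype equilibrium $S^*$ from (4.14) and (4.16). The second evaluates criterion (4.17). The two should always agree on whether a mutant invades.

```python
def mutant_growth_rate(beta, gamma, c, beta_m, gamma_m, c_m, d):
    """Per-capita growth rate of I' at the wildtype equilibrium S* = (d + c + gamma)/beta, from (4.14)."""
    s_star = (d + c + gamma) / beta
    return beta_m * s_star - (d + c_m + gamma_m)


def mutant_invades(beta, gamma, c, beta_m, gamma_m, c_m, d):
    """Criterion (4.17): beta'/(d + c' + gamma') > beta/(d + c + gamma)."""
    return beta_m / (d + c_m + gamma_m) > beta / (d + c + gamma)


if __name__ == "__main__":
    # Hypothetical wildtype (beta=1, gamma=0.5, c=0.3) and a less deadly mutant (c'=0.1), with d = 0.1.
    print(mutant_growth_rate(1.0, 0.5, 0.3, 1.0, 0.5, 0.1, 0.1))
    print(mutant_invades(1.0, 0.5, 0.3, 1.0, 0.5, 0.1, 0.1))
```

In the example, the less deadly mutant has a positive growth rate and invades, in line with conclusion (iii).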



Chapter 5

Population Genetics
Deoxyribonucleic acid, or DNA, a large double-stranded, helical molecule with rungs made from pairs of the four bases adenine (A), cytosine (C), thymine (T) and guanine (G), carries inherited genetic information. The ordering of the bases A, C, T and G determines the DNA sequence. A gene is a particular DNA sequence that is the fundamental unit of heredity for a particular trait. Some species develop as diploids, carrying two copies of every gene, one from each parent, and some species develop as haploids with only one copy. There are even species that develop as both diploids and haploids.

Consider the pea plant, which develops as a diploid. When we say there is a gene for pea color, say, we mean there is a particular DNA sequence that may vary in a pea plant population, and that there are at least two subtypes, called alleles, where plants with two copies of the yellow-color allele have yellow peas, those with two copies of the green-color allele, green peas. A plant with two copies of the same allele is homozygous for that particular gene (or a homozygote), while a plant carrying two different alleles is heterozygous (or a heterozygote). For the pea color gene, a plant carrying both a yellow- and green-color allele has yellow peas. We say that the green color is a recessive trait (or the green-color allele is recessive), and the yellow color is a dominant trait (or the yellow-color allele is dominant). The combination of alleles carried by the plant is called its genotype, while the actual trait (green or yellow peas) is called its phenotype. A gene that has more than one allele in a population is called polymorphic, and we say the population has a polymorphism for that particular gene.

Population genetics can be defined as the mathematical modeling of the evolution and maintenance of polymorphism in a population. Population genetics together with Charles Darwin's theory of evolution by natural selection and Gregor Mendel's theory of biological inheritance forms the modern evolutionary synthesis (sometimes called the modern synthesis, the evolutionary synthesis, the neo-Darwinian synthesis, or neo-Darwinism). The primary founders in the early twentieth century of population genetics were Sewall Wright, J. B. S. Haldane and Ronald Fisher.

Allele frequencies in a population can change due to the influence of four primary evolutionary forces: natural selection, genetic drift, mutation, and migration. Here, we mainly focus on natural selection and mutation. Genetic drift is the study of stochastic effects, and it is important in small populations. Migration typically requires consideration of the spatial distribution of a population, and it is usually modeled mathematically by partial differential equations.

The simplified models we will consider assume infinite population sizes (neglecting stochastic effects except in §5.5), well-mixed populations (neglecting any spatial distribution), and discrete generations (neglecting any age-structure). Our main purpose is to illustrate the fundamental ways that a genetic polymorphism can be maintained in a population.

| genotype | A | a |
|---|---|---|
| number | $n_A$ | $n_a$ |
| viability fitness | $g_A$ | $g_a$ |
| fertility fitness | $f_A$ | $f_a$ |

Table 5.1: Haploid genetics using population size, absolute viability, and fertility fitnesses.

5.1 Haploid genetics

We first consider the modeling of selection in a population of haploid organisms. Selection is modeled by fitness coefficients, with different genotypes having different fitnesses. We begin with a simple model that counts the number of individuals in the next generation, and then show how this model can be reformulated in terms of allele frequencies and relative fitness coefficients.

Table 5.1 formulates the basic model. We assume that there are two alleles A and a for a particular haploid gene. These alleles are carried in the population by $n_A$ and $n_a$ individuals, respectively. A fraction $g_A$ ($g_a$) of individuals carrying allele A (a) is assumed to survive to reproduction age, and those that survive contribute $f_A$ ($f_a$) offspring to the next generation. These are of course average values, but under the assumption of an (almost) infinite population, our model is deterministic. Accordingly, with $n_A^{(i)}$ ($n_a^{(i)}$) representing the number of individuals carrying allele A (a) in the $i$th generation, and formulating a discrete generation model, we have

$$n_A^{(i+1)} = f_Ag_An_A^{(i)}, \qquad n_a^{(i+1)} = f_ag_an_a^{(i)}. \tag{5.1}$$

It is mathematically easier and more transparent to work with allele frequencies rather than individual numbers. We denote the frequency (or more accurately, proportion) of allele A (a) in the $i$th generation by $p_i$ ($q_i$); that is,

$$p_i = \frac{n_A^{(i)}}{n_A^{(i)} + n_a^{(i)}}, \qquad q_i = \frac{n_a^{(i)}}{n_A^{(i)} + n_a^{(i)}},$$

where evidently $p_i + q_i = 1$. Now, from (5.1),

$$n_A^{(i+1)} + n_a^{(i+1)} = f_A g_A n_A^{(i)} + f_a g_a n_a^{(i)}, \tag{5.2}$$

so that dividing the first equation in (5.1) by (5.2) yields

$$p_{i+1} = \frac{f_A g_A n_A^{(i)}}{f_A g_A n_A^{(i)} + f_a g_a n_a^{(i)}} = \frac{f_A g_A p_i}{f_A g_A p_i + f_a g_a q_i} = \frac{\left(\dfrac{f_A g_A}{f_a g_a}\right) p_i}{\left(\dfrac{f_A g_A}{f_a g_a}\right) p_i + q_i}, \tag{5.3}$$

where the second equality comes from dividing the numerator and denominator by $n_A^{(i)} + n_a^{(i)}$, and the third equality from dividing the numerator and denominator by

60 CHAPTER 5. POPULATION GENETICS



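$$\begin{array}{l|cc}
\text{genotype} & A & a \\ \hline
\text{freq. of gamete} & p & q \\
\text{relative fitness} & 1+s & 1 \\
\text{freq. after selection} & (1+s)p/w & q/w \\
\text{normalization} & w = (1+s)p + q &
\end{array}$$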

Table 5.2: Haploid genetic model of the spread of a favored allele.

$f_a g_a$. Similarly,

$$q_{i+1} = \frac{q_i}{\left(\dfrac{f_A g_A}{f_a g_a}\right) p_i + q_i}, \tag{5.4}$$

which could also be derived using $q_{i+1} = 1 - p_{i+1}$. We observe from the evolution equations for the allele frequencies, (5.3) and (5.4), that only the relative fitness $f_A g_A / f_a g_a$ of the alleles matters. Accordingly, in our models, we will consider only relative fitnesses, and we will arbitrarily set one fitness to unity to simplify the algebra and make the final result more transparent.
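The claim that only the ratio $f_A g_A / f_a g_a$ matters can be checked numerically. The Python sketch below is an illustrative addition, not part of the original notes: it iterates the count model (5.1) and the frequency recursion (5.3) side by side. The function names and the fitness values are hypothetical.

```python
def iterate_counts(n_A, n_a, f_A, g_A, f_a, g_a, generations):
    """Iterate the count model (5.1); return the allele-A frequency in each generation."""
    freqs = [n_A / (n_A + n_a)]
    for _ in range(generations):
        # Each allele class is multiplied by its own viability x fertility factor.
        n_A, n_a = f_A * g_A * n_A, f_a * g_a * n_a
        freqs.append(n_A / (n_A + n_a))
    return freqs


def iterate_frequency(p, ratio, generations):
    """Iterate (5.3), which depends only on ratio = (f_A g_A) / (f_a g_a)."""
    freqs = [p]
    for _ in range(generations):
        p = ratio * p / (ratio * p + (1 - p))
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    # Hypothetical fitness values; only the ratio (3.0 * 0.5) / (2.0 * 0.6) = 1.25 matters.
    counts = iterate_counts(10, 990, f_A=3.0, g_A=0.5, f_a=2.0, g_a=0.6, generations=20)
    doubled = iterate_counts(10, 990, f_A=6.0, g_A=0.5, f_a=4.0, g_a=0.6, generations=20)
    freqs = iterate_frequency(0.01, (3.0 * 0.5) / (2.0 * 0.6), generations=20)
    # Both differences should be at floating-point round-off level.
    print(max(abs(x - y) for x, y in zip(counts, freqs)))
    print(max(abs(x - y) for x, y in zip(counts, doubled)))
```

Doubling both fertilities changes the counts but not the frequencies, which is why one fitness can be set to unity without loss of generality.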

5.1.1 Spread of a favored allele

We consider a simple model for the spread of a favored allele in Table 5.2, with $s > 0$. Denoting by $p'$ the frequency of A in the next generation (not (!) the derivative of $p$), the evolution equation is given by

$$p' = \frac{(1+s)p}{w} = \frac{(1+s)p}{1 + sp}, \tag{5.5}$$

where we have used $(1+s)p + q = 1 + sp$, since $p + q = 1$. Note that (5.5) is the same as (5.3) with $p' = p_{i+1}$, $p = p_i$, and $f_A g_A / f_a g_a = 1 + s$. Fixed points of (5.5) are determined from $p' = p$. We find two fixed points: $p_* = 0$, corresponding to a population in which allele A is absent; and $p_* = 1$, corresponding to a population in which allele A is fixed. Intuitively, $p_* = 0$ is unstable while $p_* = 1$ is stable.

To illustrate how a stability analysis is performed analytically for a difference equation (instead of a differential equation), consider the general difference equation

$$p' = f(p). \tag{5.6}$$

With $p = p_*$ a fixed point such that $p_* = f(p_*)$, we write $p = p_* + \epsilon$, so that (5.6) becomes

$$p_* + \epsilon' = f(p_* + \epsilon) = f(p_*) + \epsilon f'(p_*) + \dots = p_* + \epsilon f'(p_*) + \dots,$$

where $f'(p_*)$ denotes the derivative of $f$ evaluated at $p_*$. Therefore, to leading order in $\epsilon$,

$$\left|\frac{\epsilon'}{\epsilon}\right| = \left|f'(p_*)\right|,$$

and the fixed point is stable provided that $|f'(p_*)| < 1$. For our haploid model,

$$f(p) = \frac{(1+s)p}{1+sp}, \qquad f'(p) = \frac{1+s}{(1+sp)^2},$$

so that $f'(p_* = 0) = 1 + s > 1$ and $f'(p_* = 1) = 1/(1+s) < 1$, confirming that $p_* = 0$ is unstable and $p_* = 1$ is stable.

$$\begin{array}{l|cc}
\text{genotype} & A & a \\ \hline
\text{freq. of gamete} & p & q \\
\text{relative fitness} & 1 & 1-s \\
\text{freq. after selection} & p/w & (1-s)q/w \\
\text{freq. after mutation} & (1-u)p/w & \left[(1-s)q + up\right]/w \\
\text{normalization} & w = p + (1-s)q &
\end{array}$$

Table 5.3: A haploid genetic model of mutation-selection balance.
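The stability argument for (5.5) can be cross-checked numerically. This minimal Python sketch (illustrative; the value $s = 0.1$ is a hypothetical choice) compares the analytic derivative $f'(p)$ with a central finite difference at both fixed points, then iterates (5.5) from a rare allele. It also uses the fact, easily verified from (5.5), that the odds $p/(1-p)$ are multiplied by exactly $1 + s$ each generation.

```python
def f(p, s):
    """Right-hand side of (5.5): frequency of A after one generation of selection."""
    return (1 + s) * p / (1 + s * p)


def f_prime(p, s):
    """Analytic derivative f'(p) = (1 + s) / (1 + s p)^2."""
    return (1 + s) / (1 + s * p) ** 2


def central_difference(func, x, step=1e-6):
    """Finite-difference derivative, used only to cross-check f_prime."""
    return (func(x + step) - func(x - step)) / (2 * step)


def odds(p):
    return p / (1 - p)


if __name__ == "__main__":
    s = 0.1  # hypothetical selection coefficient
    for p_star in (0.0, 1.0):
        # |f'(p*)| > 1 means unstable, |f'(p*)| < 1 means stable.
        print(p_star, f_prime(p_star, s), central_difference(lambda p: f(p, s), p_star))
    p = 0.01
    for _ in range(200):
        p = f(p, s)
    print(p)  # close to the stable fixed point p* = 1
```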
If the selection coefficient $s$ is small, the model equation (5.5) simplifies further. We have

$$p' = \frac{(1+s)p}{1+sp} = (1+s)p\left(1 - sp + O(s^2)\right) = p + (p - p^2)s + O(s^2),$$

so that to leading order in $s$,

$$p' - p = sp(1-p).$$

If $p' - p \ll 1$, which is valid for $s \ll 1$, we can approximate this difference equation by the differential equation

$$\frac{dp}{dn} = sp(1-p)$$

(with $n$ the generation number), which shows that the frequency of allele A satisfies the now very familiar logistic equation.
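To see how good the logistic approximation is, one can compare the exact recursion (5.5) with the logistic solution $p(n) = p_0 e^{sn}/(1 - p_0 + p_0 e^{sn})$. The sketch below is a hypothetical illustration with arbitrarily chosen $p_0$ and $s$; the largest gap between the two trajectories should shrink as $s$ decreases.

```python
import math


def discrete_trajectory(p0, s, generations):
    """Iterate the exact recursion (5.5) and return p_0, p_1, ..., p_generations."""
    p, traj = p0, [p0]
    for _ in range(generations):
        p = (1 + s) * p / (1 + s * p)
        traj.append(p)
    return traj


def logistic(p0, s, n):
    """Solution of dp/dn = s p (1 - p) with p(0) = p0."""
    growth = math.exp(s * n)
    return p0 * growth / (1 - p0 + p0 * growth)


def max_gap(p0, s, generations):
    """Largest difference between the discrete model and its logistic approximation."""
    traj = discrete_trajectory(p0, s, generations)
    return max(abs(p - logistic(p0, s, n)) for n, p in enumerate(traj))


if __name__ == "__main__":
    # Hypothetical parameters: weak selection tracks the logistic curve more closely.
    print(max_gap(0.01, 0.01, 1000))
    print(max_gap(0.01, 0.1, 100))
```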

Although a polymorphism for this gene exists in the population as the new allele spreads, eventually A becomes
fixed in the population and the polymorphism is lost. In the next section, we consider how a polymorphism can be
maintained in a haploid population by a balance between mutation and selection.

5.1.2 Mutation-selection balance

We consider a gene with two alleles: a wildtype allele A and a mutant allele a. We view the mutant allele as a defective genotype, which confers on the carrier a lowered fitness $1 - s$ relative to the wildtype. Although not all mutant alleles need have identical DNA sequences, we assume that they share the same phenotype of reduced fitness. We model the opposing effects of two evolutionary forces: natural selection, which favors the wildtype allele A over the mutant allele a, and mutation, which confers a small probability $u$ that allele A mutates to allele a in each newborn individual. Schematically,

$$A \underset{s}{\overset{u}{\rightleftharpoons}} a,$$

where $u$ represents mutation and $s$ represents selection. The model is shown in Table 5.3. The equations for $p$ and $q$ in the next generation are

$$p' = \frac{(1-u)p}{w} = \frac{(1-u)p}{1 - s(1-p)}, \tag{5.7}$$



5.2. DIPLOID GENETICS

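$$\begin{array}{l|ccc}
\text{genotype} & AA & Aa & aa \\ \hline
\text{referred to as} & \text{wildtype homozygote} & \text{heterozygote} & \text{mutant homozygote} \\
\text{frequency} & P & Q & R
\end{array}$$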

Table 5.4: The terminology of diploidy.

and

$$q' = \frac{(1-s)q + up}{w} = \frac{(1-s-u)q + u}{1 - sq}, \tag{5.8}$$

where we have used $p + q = 1$ to eliminate $q$ from the equation for $p'$ and $p$ from the equation for $q'$. The equations for $p'$ and $q'$ are linearly dependent since $p' + q' = 1$, and we need to solve only one of them.

Considering (5.7), the fixed points determined from $p' = p$ are $p_* = 0$, for which the mutant allele a is fixed in the population and there is no polymorphism, and the solution to

$$1 - s(1 - p_*) = 1 - u,$$

which is $p_* = 1 - u/s$, and there is a polymorphism. The stabilities of these two fixed points are determined by considering $p' = f(p)$, with $f(p)$ given by the right-hand side of (5.7). Taking the derivative of $f$,

$$f'(p) = \frac{(1-u)(1-s)}{\left[1 - s(1-p)\right]^2},$$

so that

$$f'(p_* = 0) = \frac{1-u}{1-s}, \qquad f'(p_* = 1 - u/s) = \frac{1-s}{1-u}.$$

Applying the criterion $|f'(p_*)| < 1$ for stability, $p_* = 0$ is stable for $s < u$ and $p_* = 1 - u/s$ is stable for $s > u$. A polymorphism is therefore possible under mutation-selection balance when $s > u > 0$.
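The stability conclusions can be illustrated by iterating (5.7) directly. In this sketch (hypothetical parameter values, not from the notes), the frequency of A settles at $1 - u/s$ when $s > u$ and decays to zero when $s < u$.

```python
def next_p(p, s, u):
    """One generation of (5.7): selection against a, then mutation A -> a at rate u."""
    return (1 - u) * p / (1 - s * (1 - p))


def iterate(p, s, u, generations):
    for _ in range(generations):
        p = next_p(p, s, u)
    return p


def predicted_equilibrium(s, u):
    """Stable fixed point from the analysis: 1 - u/s if s > u, otherwise 0."""
    return 1 - u / s if s > u else 0.0


if __name__ == "__main__":
    # Hypothetical values: weak mutation, stronger selection (s > u).
    print(iterate(0.5, s=0.1, u=0.001, generations=2000), predicted_equilibrium(0.1, 0.001))
    # Selection weaker than mutation (s < u): allele A is lost.
    print(iterate(0.5, s=0.001, u=0.01, generations=5000), predicted_equilibrium(0.001, 0.01))
```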

5.2 Diploid genetics

Most sexually reproducing species are diploid. In particular, our species Homo sapiens is diploid with two
exceptions: we are haploid at the gamete stage (sperm and unfertilized egg); and males are haploid for most genes
on the unmatched X and Y sex chromosomes (females are XX and diploid). This latter seemingly innocent fact is of
great significance to males suffering from genetic diseases due to an X-linked recessive mutation inherited from
their mother. Females inheriting this mutation are most probably disease-free because of the functional gene
inherited from their father.

A polymorphic gene with alleles A and a can appear in a diploid as three distinct genotypes: AA, Aa and aa. Conventionally, we take A to be the wildtype allele and a the mutant allele. Table 5.4 presents the terminology of diploidy.
As for haploid genetics, we will determine evolution equations for allele and/or genotype frequencies. To develop the appropriate definitions and relations, we initially assume a population of size $N$ (which we will later take to be infinite), and assume that the numbers of individuals with genotypes AA, Aa and aa are $N_{AA}$, $N_{Aa}$ and $N_{aa}$. Now, $N = N_{AA} + N_{Aa} + N_{aa}$. Define genotype frequencies $P$, $Q$ and $R$ as

$$P = \frac{N_{AA}}{N}, \qquad Q = \frac{N_{Aa}}{N}, \qquad R = \frac{N_{aa}}{N},$$

so that $P + Q + R = 1$. It will also be useful to define allele frequencies. Let $n_A$ and $n_a$ be the numbers of alleles A and a in the population, with $n = n_A + n_a$ the total number of alleles. Since the population is of size $N$ and diploid, $n = 2N$; and since each homozygote contains two identical alleles, and each heterozygote contains one of each allele, $n_A = 2N_{AA} + N_{Aa}$ and $n_a = 2N_{aa} + N_{Aa}$. Defining the allele frequencies $p$ and $q$ as previously,

$$p = \frac{n_A}{n} = \frac{2N_{AA} + N_{Aa}}{2N} = P + \tfrac{1}{2}Q;$$

and similarly,

$$q = \frac{n_a}{n} = \frac{2N_{aa} + N_{Aa}}{2N} = R + \tfrac{1}{2}Q.$$

With five frequencies, $P$, $Q$, $R$, $p$, $q$, and four constraints $P + Q + R = 1$, $p + q = 1$, $p = P + Q/2$, $q = R + Q/2$, how many independent frequencies are there? In fact, there are two, because one of the four constraints is linearly dependent on the other three. We may choose any two frequencies other than the choice $\{p, q\}$ as our linearly independent set. For instance, one choice is $\{P, p\}$; then,

$$q = 1 - p, \qquad Q = 2(p - P), \qquad R = 1 + P - 2p.$$

Similarly, another choice is $\{P, Q\}$; then

$$R = 1 - P - Q, \qquad p = P + \tfrac{1}{2}Q, \qquad q = 1 - P - \tfrac{1}{2}Q.$$
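As a small worked example of these definitions, the sketch below computes $P, Q, R, p, q$ from genotype counts using exact rational arithmetic (Python's `fractions` module) and recovers $q, Q, R$ from the independent pair $\{P, p\}$. The counts and function names are hypothetical, chosen for illustration only.

```python
from fractions import Fraction


def frequencies_from_counts(N_AA, N_Aa, N_aa):
    """Genotype frequencies P, Q, R and allele frequencies p, q of a diploid population."""
    N = N_AA + N_Aa + N_aa
    P, Q, R = Fraction(N_AA, N), Fraction(N_Aa, N), Fraction(N_aa, N)
    p = Fraction(2 * N_AA + N_Aa, 2 * N)  # n_A / n, with n = 2N alleles in total
    q = Fraction(2 * N_aa + N_Aa, 2 * N)  # n_a / n
    return P, Q, R, p, q


def from_P_and_p(P, p):
    """Recover (q, Q, R) from the independent pair {P, p}."""
    return 1 - p, 2 * (p - P), 1 + P - 2 * p


if __name__ == "__main__":
    P, Q, R, p, q = frequencies_from_counts(30, 50, 20)  # hypothetical counts
    print(P, Q, R, p, q)
    print(from_P_and_p(P, p))
```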

5.2.1 Sexual reproduction

Diploid reproduction may be sexual or asexual, and sexual reproduction may be of varying types (e.g., random
mating, selfing, brother-sister mating, and various other types of assortative mating). The two simplest types to
model exactly are random mating and selfing. These mating systems are useful for contrasting the biology of outbreeding and inbreeding.

Random mating

Random mating is perhaps the simplest mating system to model. Here, we assume a well-mixed population of
individuals that have equal probability of mating with every other individual. We will determine the genotype
frequencies of the zygotes (fertilized eggs) in terms of the allele frequencies using two approaches: (1) the gene
pool approach, and (2) the mating table approach.


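$$\begin{array}{ll|ccc}
 & & & \text{progeny frequency} & \\
\text{mating} & \text{frequency} & AA & Aa & aa \\ \hline
AA \times AA & P^2 & P^2 & 0 & 0 \\
AA \times Aa & 2PQ & PQ & PQ & 0 \\
AA \times aa & 2PR & 0 & 2PR & 0 \\
Aa \times Aa & Q^2 & \tfrac{1}{4}Q^2 & \tfrac{1}{2}Q^2 & \tfrac{1}{4}Q^2 \\
Aa \times aa & 2QR & 0 & QR & QR \\
aa \times aa & R^2 & 0 & 0 & R^2 \\ \hline
\text{Totals} & (P+Q+R)^2 = 1 & (P + \tfrac{1}{2}Q)^2 = p^2 & 2(P + \tfrac{1}{2}Q)(R + \tfrac{1}{2}Q) = 2pq & (R + \tfrac{1}{2}Q)^2 = q^2
\end{array}$$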

Table 5.5: Random mating table.

The gene pool approach models sexual reproduction by assuming that males and females release their gametes into pools. Offspring genotypes are determined by randomly combining one gamete from the male pool and one gamete from the female pool. As the probability of a random gamete containing allele A or a is equal to the allele’s population frequency $p$ or $q$, respectively, the probability of an offspring being AA is $p^2$, of being Aa is $2pq$ (male A with female a, plus female A with male a), and of being aa is $q^2$. Therefore, after a single generation of random mating, the genotype frequencies can be given in terms of the allele frequencies by

$$P = p^2, \qquad Q = 2pq, \qquad R = q^2.$$

This is the celebrated Hardy-Weinberg law. Notice that under the assumption of random mating, there is now only a single independent frequency, greatly simplifying the mathematical modeling. For example, if $p$ is taken as the independent frequency, then

$$q = 1 - p, \qquad P = p^2, \qquad Q = 2p(1-p), \qquad R = (1-p)^2.$$

Most modeling is done assuming random mating unless the biology under study is influenced by inbreeding.

The second approach uses a mating table (see Table 5.5). This approach to modeling sexual reproduction is more general and can be applied to other mating systems. We explain this approach by considering the mating AA × Aa. The genotypes AA and Aa have frequencies $P$ and $Q$, respectively. The frequency of AA males mating with Aa females is $PQ$, the same as that of AA females mating with Aa males, so the sum is $2PQ$. Half of the offspring will be AA and half Aa, and the frequencies $PQ$ are entered under progeny frequency. The sums of all the progeny frequencies are given in the Totals row, and the random mating results are recovered upon use of the relationship between the genotype and allele frequencies.
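The mating-table bookkeeping can be automated. The sketch below is an illustrative construction, not the author's code: it assigns each genotype its Mendelian gamete distribution, sums progeny over all ordered matings weighted by $P$, $Q$, $R$, and confirms the Totals row of Table 5.5, namely that the progeny are in Hardy-Weinberg proportions $p^2$, $2pq$, $q^2$. The genotype frequencies used in the usage lines are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Gamete distribution of each diploid genotype under Mendelian segregation.
GAMETES = {
    "AA": {"A": Fraction(1)},
    "Aa": {"A": Fraction(1, 2), "a": Fraction(1, 2)},
    "aa": {"a": Fraction(1)},
}


def offspring(mother, father):
    """Progeny genotype distribution of a single mating."""
    dist = {"AA": Fraction(0), "Aa": Fraction(0), "aa": Fraction(0)}
    for (g1, x1), (g2, x2) in product(GAMETES[mother].items(), GAMETES[father].items()):
        child = "".join(sorted(g1 + g2))  # 'A' sorts before 'a', so "aA" becomes "Aa"
        dist[child] += x1 * x2
    return dist


def random_mating(P, Q, R):
    """Sum the progeny columns of the random mating table over all ordered matings."""
    freq = {"AA": P, "Aa": Q, "aa": R}
    totals = {"AA": Fraction(0), "Aa": Fraction(0), "aa": Fraction(0)}
    for mother, father in product(freq, repeat=2):
        for child, x in offspring(mother, father).items():
            totals[child] += freq[mother] * freq[father] * x
    return totals


if __name__ == "__main__":
    P, Q, R = Fraction(3, 10), Fraction(1, 2), Fraction(1, 5)  # hypothetical frequencies
    p, q = P + Q / 2, R + Q / 2
    print(random_mating(P, Q, R))
    print(p * p, 2 * p * q, q * q)
```

Summing over ordered (mother, father) pairs automatically produces the factor of two for matings such as AA × Aa.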

Selfing

Perhaps the next simplest type of mating system is self-fertilization, or selfing. Here, an individual reproduces sexually (passing through a haploid gamete stage in its life cycle), but provides both of the gametes. For example, the nematode worm C. elegans can reproduce by selfing. The mating table for selfing is given in Table 5.6. The selfing frequency of a particular genotype is just the frequency of the genotype itself. For a selfing population, disregarding selection or any other evolutionary forces, the genotype frequencies evolve as

$$P' = P + \tfrac{1}{4}Q, \qquad Q' = \tfrac{1}{2}Q, \qquad R' = R + \tfrac{1}{4}Q. \tag{5.9}$$

$$\begin{array}{ll|ccc}
 & & & \text{progeny frequency} & \\
\text{mating} & \text{frequency} & AA & Aa & aa \\ \hline
AA \otimes & P & P & 0 & 0 \\
Aa \otimes & Q & \tfrac{1}{4}Q & \tfrac{1}{2}Q & \tfrac{1}{4}Q \\
aa \otimes & R & 0 & 0 & R \\ \hline
\text{Totals} & 1 & P + \tfrac{1}{4}Q & \tfrac{1}{2}Q & R + \tfrac{1}{4}Q
\end{array}$$

Table 5.6: Selfing mating table.

Assuming an initially heterozygous population, we solve (5.9) with the initial conditions $Q_0 = 1$ and $P_0 = R_0 = 0$. In the worm lab, this type of initial population is commonly created by crossing wildtype homozygous C. elegans males with mutant homozygous C. elegans hermaphrodites, where the mutant allele is recessive. Wildtype hermaphrodite offspring, which are necessarily heterozygous, are then picked to separate worm plates and allowed to self-fertilize. (Do you see why the experiment is not done with wildtype hermaphrodites and mutant males?) From the equation for $Q'$ in (5.9), we have $Q_n = (1/2)^n$, and from symmetry, $P_n = R_n$. Then, since $P_n + Q_n + R_n = 1$, we obtain the complete solution

$$P_n = \frac{1}{2}\left(1 - \left(\frac{1}{2}\right)^n\right), \qquad Q_n = \left(\frac{1}{2}\right)^n, \qquad R_n = \frac{1}{2}\left(1 - \left(\frac{1}{2}\right)^n\right).$$

The main result to be emphasized here is that the heterozygosity of the population decreases by a factor of two in
each generation. Selfing populations rapidly become homozygous.
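A short exact-arithmetic check of the selfing solution: iterating (5.9) from $Q_0 = 1$ reproduces the closed form, and the allele frequency $p = P + Q/2$ stays at $1/2$ throughout. This is a sketch for illustration; the function names are hypothetical.

```python
from fractions import Fraction


def selfing_step(P, Q, R):
    """One generation of selfing, equation (5.9)."""
    return P + Q / 4, Q / 2, R + Q / 4


def selfing_closed_form(n):
    """Closed-form solution for the initial condition P0 = R0 = 0, Q0 = 1."""
    half_n = Fraction(1, 2) ** n
    return (1 - half_n) / 2, half_n, (1 - half_n) / 2


def trajectory(generations):
    """Genotype frequencies (P_n, Q_n, R_n) for n = 0, ..., generations."""
    P, Q, R = Fraction(0), Fraction(1), Fraction(0)
    out = [(P, Q, R)]
    for _ in range(generations):
        P, Q, R = selfing_step(P, Q, R)
        out.append((P, Q, R))
    return out


if __name__ == "__main__":
    for n, (P, Q, R) in enumerate(trajectory(5)):
        print(n, P, Q, R, P + Q / 2)  # heterozygosity halves; p stays at 1/2
```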

Constancy of allele frequencies

We can show that neither random mating nor selfing by itself changes the allele frequencies of a population; each only reshuffles alleles into different genotypes. For random mating,

$$p' = P' + \tfrac{1}{2}Q' = p^2 + \tfrac{1}{2}(2pq) = p(p+q) = p;$$

and for selfing,

$$p' = P' + \tfrac{1}{2}Q' = \left(P + \tfrac{1}{4}Q\right) + \tfrac{1}{2}\left(\tfrac{1}{2}Q\right) = P + \tfrac{1}{2}Q = p.$$


Figure 5.1: Evolution of peppered moths in industrializing England (from H. B. D. Kettlewell).

These results are not general, however, and it is possible to construct other mating systems for which allele frequencies do change.

Nevertheless, the conservation of allele frequencies by random mating is an important element of neo-Darwinism. In Darwin’s time, most biologists believed in blending inheritance, where the genetic material from parents with different traits actually blended in their offspring, rather like the mixing of paints of different colors. If blending inheritance occurred, then genetic variation, or polymorphism, would eventually be lost over several generations as the “genetic paints” became well-mixed. Mendel’s work on peas, published in 1866, suggested a particulate theory of inheritance, where the genetic material, later called genes, maintains its integrity across generations. Sadly, Mendel’s paper was not read by Darwin (who published The Origin of Species in 1859 and died in 1882) or by other influential biologists during Mendel’s lifetime (Mendel died in 1884). After being rediscovered in 1900, Mendel and his work eventually became widely celebrated.

5.2.2 Spread of a favored allele

We consider the spread of a favored allele in a diploid population. The classic example – widely repeated in biology
textbooks as a modern example of natural selection – is the change in the frequencies of the dark and light
phenotypes of the peppered moth during England’s industrial revolution. The evolutionary story begins with the observation that pollution killed the light-colored lichen on trees during industrialization of the cities. On the one hand, light-colored peppered moths camouflage well on light-colored lichens, but are exposed to birds on plain tree bark. On the other hand, dark-colored peppered moths camouflage well on plain tree bark, but are exposed on light-colored lichens (see Fig. 5.1). Natural selection therefore favored the light-colored allele in preindustrial England and the dark-colored allele during industrialization. It is believed that the dark-colored allele increased rapidly under natural selection in industrializing England.

We present our model in Table 5.7. Here, we consider aa as the wildtype genotype and normalize its fitness to unity. The allele A is the mutant whose frequency increases in the population. In our example of the peppered moth, the aa phenotype is light colored and the AA phenotype is dark colored. The color of the Aa phenotype depends on the relative dominance of A and a. Usually, no pigment results in light color and is a consequence of nonfunctioning pigment-producing genes. One functioning pigment-producing allele is usually sufficient to result in a dark-colored moth. With A a functioning pigment-producing allele and a the mutated nonfunctioning allele, a is most likely recessive, A is most likely dominant, and the phenotype of Aa is most likely dark, so $h \approx 1$. For the moment, though, we leave $h$ as a free parameter.

$$\begin{array}{l|ccc}
\text{genotype} & AA & Aa & aa \\ \hline
\text{freq. of zygote} & p^2 & 2pq & q^2 \\
\text{relative fitness} & 1+s & 1+sh & 1 \\
\text{freq. after selection} & (1+s)p^2/w & 2(1+sh)pq/w & q^2/w \\
\text{normalization} & w = (1+s)p^2 + 2(1+sh)pq + q^2 & &
\end{array}$$

Table 5.7: A diploid genetic model of the spread of a favored allele assuming random mating.

We assume random mating, and this simplification is used to write the genotype frequencies as $P = p^2$, $Q = 2pq$, and $R = q^2$. Since $q = 1 - p$, we reduce our problem to determining an equation for $p'$ in terms of $p$. Using $p' = P_s + \tfrac{1}{2}Q_s$, where $p'$ is the A allele frequency in the next generation’s zygotes, and $P_s$ and $Q_s$ are the AA and Aa genotype frequencies, respectively, in the present generation after selection,

$$p' = \frac{(1+s)p^2 + (1+sh)pq}{w},$$

where $q = 1 - p$, and

$$w = (1+s)p^2 + 2(1+sh)pq + q^2 = 1 + s(p^2 + 2hpq).$$

After some algebra, the final evolution equation written solely in terms of $p$ is

$$p' = \frac{(1+sh)p + s(1-h)p^2}{1 + 2shp + s(1-2h)p^2}. \tag{5.10}$$

The expected fixed points of this equation are $p_* = 0$ (unstable) and $p_* = 1$ (stable), where our assignment of stability assumes positive selection coefficients.
The evolution equation (5.10) in this form is not particularly illuminating. In general, a numerical solution would require specifying numerical values for $s$ and $h$, as well as an initial value for $p$. Here, to determine how the spread of A depends on the dominance coefficient $h$, we investigate analytically the increase of A assuming $s \ll 1$. We Taylor-series expand the right-hand side of (5.10) in powers of $s$, keeping terms to order $s$:

$$\begin{aligned}
p' &= \frac{(1+sh)p + s(1-h)p^2}{1 + 2shp + s(1-2h)p^2} \\
&= \frac{p + s\left(hp + (1-h)p^2\right)}{1 + s\left(2hp + (1-2h)p^2\right)} \\
&= \left(p + s\left(hp + (1-h)p^2\right)\right)\left(1 - s\left(2hp + (1-2h)p^2\right) + O(s^2)\right) \\
&= p + sp\left(h + (1-3h)p - (1-2h)p^2\right) + O(s^2).
\end{aligned} \tag{5.11}$$
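Since the neglected terms in (5.11) are $O(s^2)$, halving $s$ should reduce the gap between the exact recursion (5.10) and its first-order truncation by roughly a factor of four. The sketch below checks this numerically; the sample values of $p$, $h$ and $s$ are arbitrary choices for illustration.

```python
def exact_step(p, s, h):
    """Right-hand side of (5.10)."""
    num = (1 + s * h) * p + s * (1 - h) * p ** 2
    den = 1 + 2 * s * h * p + s * (1 - 2 * h) * p ** 2
    return num / den


def first_order_step(p, s, h):
    """Truncation of (5.11) after the O(s) term."""
    return p + s * p * (h + (1 - 3 * h) * p - (1 - 2 * h) * p ** 2)


def truncation_error(p, s, h):
    return abs(exact_step(p, s, h) - first_order_step(p, s, h))


if __name__ == "__main__":
    for p, h in [(0.3, 0.5), (0.1, 1.0), (0.6, 0.0)]:  # arbitrary sample points
        ratio = truncation_error(p, 1e-3, h) / truncation_error(p, 5e-4, h)
        print(p, h, ratio)  # expected to be close to 4
```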


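$$\begin{array}{lll}
\text{disease} & \text{mutation} & \text{symptoms} \\ \hline
\text{Thalassemia} & \text{hemoglobin} & \text{anemia} \\
\text{Sickle cell anemia} & \text{hemoglobin} & \text{anemia} \\
\text{Hemophilia} & \text{blood clotting factor} & \text{uncontrolled bleeding} \\
\text{Cystic fibrosis} & \text{chloride ion channel} & \text{thick lung mucus} \\
\text{Tay-Sachs disease} & \text{Hexosaminidase A enzyme} & \text{nerve cell damage} \\
\text{Fragile X syndrome} & \text{FMR1 gene} & \text{mental retardation} \\
\text{Huntington's disease} & \text{HD gene} & \text{brain degeneration}
\end{array}$$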

Table 5.8: Seven common monogenic diseases.

If $s \ll 1$, we expect a small change in allele frequency in each generation, so we can approximate $p' - p \approx dp/dn$, where $n$ denotes the generation number and $p = p(n)$. The approximate differential equation obtained from (5.11) is

$$\frac{dp}{dn} = sp\left(h + (1-3h)p - (1-2h)p^2\right). \tag{5.12}$$

If A is partially dominant so that $h \neq 0$ (e.g., the heterozygous moth is darker than the light-colored aa homozygote), then the solution to (5.12) behaves similarly to the solution of a logistic equation: $p$ initially grows exponentially as $p(n) = p_0 \exp(shn)$, and asymptotes to one for large $n$. If A is recessive so that $h = 0$ (e.g., the heterozygous moth is as light-colored as the aa homozygote), then (5.12) reduces to

$$\frac{dp}{dn} = sp^2(1-p), \qquad \text{for } h = 0. \tag{5.13}$$

Of main interest is the initial growth of $p$ when $p(0) = p_0 \ll 1$, so that $dp/dn \approx sp^2$. This differential equation may be integrated by separating variables to yield

$$p(n) = \frac{p_0}{1 - sp_0 n} \approx p_0(1 + sp_0 n), \qquad \text{for } sp_0 n \ll 1.$$

The frequency of a recessive favored allele increases only linearly across generations, a consequence of the heterozygote being hidden from natural selection. Most likely, the peppered-moth heterozygote is significantly darker than the light-colored homozygote since the dark-colored moth rapidly increased in frequency over a short period of time.
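The contrast between dominant and recessive favored alleles can be seen by iterating (5.10) directly. In this hypothetical example with $s = 0.1$ and $p_0 = 0.001$, the sketch counts the generations needed for A to reach frequency 1/2 with $h = 1$, $h = 1/2$ and $h = 0$, and checks the early linear growth $p_0(1 + sp_0 n)$ when $h = 0$. With these values the recessive case should come out roughly two orders of magnitude slower than the dominant one.

```python
def next_p(p, s, h):
    """One generation of (5.10): random mating, then selection with dominance h."""
    num = (1 + s * h) * p + s * (1 - h) * p ** 2
    den = 1 + 2 * s * h * p + s * (1 - 2 * h) * p ** 2
    return num / den


def generations_to_reach(p0, s, h, target=0.5, max_generations=1_000_000):
    """Number of generations until the A frequency first reaches `target`."""
    p, n = p0, 0
    while p < target and n < max_generations:
        p = next_p(p, s, h)
        n += 1
    return n


def iterate(p0, s, h, generations):
    p = p0
    for _ in range(generations):
        p = next_p(p, s, h)
    return p


if __name__ == "__main__":
    s, p0 = 0.1, 0.001  # hypothetical values
    for h in (1.0, 0.5, 0.0):
        print(h, generations_to_reach(p0, s, h))
```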

As a final comment, linear growth in the frequency of A when h = 0 is sensitive to our assumption of random
mating. If selfing occurred, or another type of close family mating, then a recessive favored allele may still increase
exponentially. In this circumstance, the production of homozygous offspring from more frequent heterozygote
pairings allows selection to act more effectively.

5.2.3 Mutation-selection balance

By virtue of self-knowledge, the species with the most known mutant phenotypes is Homo sapiens. There are
thousands of known genetic diseases in humans, many of them caused by mutation of a single gene (called a
monogenic disease). For an easy-to-read overview of genetic disease in humans, see the website

http://www.who.int/genomics/public/geneticdiseases.


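$$\begin{array}{l|ccc}
\text{genotype} & AA & Aa & aa \\ \hline
\text{freq. of zygote} & p^2 & 2pq & q^2 \\
\text{relative fitness} & 1 & 1-sh & 1-s \\
\text{freq. after selection} & p^2/w & 2(1-sh)pq/w & (1-s)q^2/w \\
\text{normalization} & w = p^2 + 2(1-sh)pq + (1-s)q^2 & &
\end{array}$$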

Table 5.9: A diploid genetic model of mutation-selection balance assuming random mating.

Table 5.8 lists seven common monogenic diseases. The first two diseases are maintained at significant frequencies in some human populations by heterosis. We will discuss in §5.2.4 the maintenance of a polymorphism by heterosis, for which the heterozygote has higher fitness than either homozygote. It is postulated that Tay-Sachs disease, prevalent among descendants of Eastern European Jews, and cystic fibrosis may also have been maintained by heterosis acting in the past. (Note that the cystic fibrosis gene was identified in 1989 by a Toronto group led by Lap Chee Tsui, who later became President of the University of Hong Kong.) The other disease genes listed may be maintained by mutation-selection balance.

Our model for diploid mutation-selection balance is given in Table 5.9. We further assume that mutations of type A → a occur in gamete production with frequency $u$. Back-mutation is neglected. The gametic frequencies of A and a after selection but before mutation are given by $\hat{p} = P_s + Q_s/2$ and $\hat{q} = R_s + Q_s/2$, and the gametic frequency of a after mutation is given by $q' = u\hat{p} + \hat{q}$. Therefore,

$$q' = \frac{u\left(p^2 + (1-sh)pq\right) + (1-s)q^2 + (1-sh)pq}{w},$$

where

$$w = p^2 + 2(1-sh)pq + (1-s)q^2 = 1 - sq(2hp + q).$$

Using $p = 1 - q$, we write the evolution equation for $q'$ in terms of $q$ alone. After some algebra that could be facilitated using computer algebra software such as Mathematica, we obtain

$$q' = \frac{u + \left(1 - u - sh(1+u)\right)q - s\left(1 - h(1+u)\right)q^2}{1 - 2shq - s(1-2h)q^2}. \tag{5.14}$$

To determine the equilibrium solutions of (5.14), we set $q_* \equiv q' = q$ to obtain a cubic equation for $q_*$. Because of the neglect of back mutation in our model, one solution readily found is $q_* = 1$, in which all the A alleles have mutated to a. The $q_* = 1$ solution may be factored out of the cubic equation, resulting in a quadratic equation with two solutions. Rather than show the exact result here, we determine equilibrium solutions under two approximations: (i) $0 < u \ll h, s$; and (ii) $0 = h < u < s$.
First, when $0 < u \ll h, s$, we look for a solution of the form $q_* = au + O(u^2)$, with $a$ constant, and Taylor-series expand in $u$ (assuming $s, h = O(u^0)$). If such a solution exists, then (5.14) will determine the unknown coefficient $a$. We have

$$au + O(u^2) = \frac{u + (1-sh)au + O(u^2)}{1 - 2shau + O(u^2)} = (1 + a - sha)u + O(u^2);$$

and equating powers of $u$, we find $a = 1 + a - sha$, or $a = 1/(sh)$. Therefore,

$$q_* = \frac{u}{sh} + O(u^2), \qquad \text{for } 0 < u \ll h, s.$$

$$\begin{array}{l|ccc}
\text{genotype} & AA & Aa & aa \\ \hline
\text{freq.: } 0 < u \ll s, h & 1 + O(u) & 2u/(sh) + O(u^2) & u^2/(sh)^2 + O(u^3) \\
\text{freq.: } 0 = h < u < s & 1 + O(\sqrt{u}) & 2\sqrt{u/s} + O(u) & u/s
\end{array}$$

Table 5.10: Equilibrium frequencies of the genotypes at the diploid mutation-selection balance.

Second, when $0 = h < u < s$, we substitute $h = 0$ directly into (5.14),

$$q_* = \frac{u + (1-u)q_* - sq_*^2}{1 - sq_*^2},$$

which we then write as a cubic equation for $q_*$,

$$q_*^3 - q_*^2 - \frac{u}{s}q_* + \frac{u}{s} = 0.$$

By factoring this cubic equation, we find

$$(q_* - 1)\left(q_*^2 - \frac{u}{s}\right) = 0;$$

and the polymorphic equilibrium solution is

$$q_* = \sqrt{u/s}, \qquad \text{for } 0 = h < u < s.$$

Because $q_* < 1$ only if $s > u$, this solution does not exist if $s < u$.
Table 5.10 summarizes our results for the equilibrium frequencies of the genotypes at mutation-selection balance. The first row of frequencies, $0 < u \ll s, h$, corresponds to a dominant ($h = 1$) or partially dominant ($u \ll h < 1$) mutation, where the heterozygote is of reduced fitness and shows symptoms of the genetic disease. The second row of frequencies, $0 = h < u < s$, corresponds to a recessive mutation, where the heterozygote is symptom-free. Notice that individuals carrying a dominant mutation are twice as prevalent in the population as individuals homozygous for a recessive mutation (with the same $u$ and $s$): the heterozygote frequency $2u/s$ in the first row is twice the mutant-homozygote frequency $u/s$ in the second.
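The leading-order entries of Table 5.10 can be checked by iterating (5.14) to equilibrium, starting from $q = 0$. The sketch below uses hypothetical values $u = 10^{-6}$ and $s = 0.5$; for $h = 1$ it should land near $u/(sh)$ and for $h = 0$ near $\sqrt{u/s}$, and it checks the factor of two between dominant-mutation heterozygotes and recessive-mutation homozygotes.

```python
import math


def next_q(q, s, h, u):
    """One generation of (5.14): frequency of the mutant allele a."""
    num = u + (1 - u - s * h * (1 + u)) * q - s * (1 - h * (1 + u)) * q ** 2
    den = 1 - 2 * s * h * q - s * (1 - 2 * h) * q ** 2
    return num / den


def iterate_to_equilibrium(s, h, u, generations=100_000):
    """Iterate (5.14) starting from q = 0 (no mutant alleles present)."""
    q = 0.0
    for _ in range(generations):
        q = next_q(q, s, h, u)
    return q


def approximate_equilibrium(s, h, u):
    """Leading-order results: u/(sh) when u << h, s; sqrt(u/s) when h = 0."""
    return math.sqrt(u / s) if h == 0 else u / (s * h)


if __name__ == "__main__":
    s, u = 0.5, 1e-6  # hypothetical parameter values
    for h in (1.0, 0.0):
        print(h, iterate_to_equilibrium(s, h, u), approximate_equilibrium(s, h, u))
```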

A heterozygote carrying a dominant mutation most commonly arises either de novo (by direct mutation of allele A) or by the mating of a heterozygote with a wildtype. The latter is more common for $s \ll 1$, while the former must occur for $s = 1$ (a heterozygote with an $s = h = 1$ mutation by definition does not reproduce). One of the most common autosomal dominant genetic diseases is Huntington’s disease, resulting in brain deterioration during middle age. Because individuals with Huntington’s disease have children before disease symptoms appear, $s$ is small and the disease is usually passed to offspring by the mating of a heterozygote with a wildtype homozygote. For a recessive mutation, a mutant homozygote usually occurs by the mating of two heterozygotes. If both parents carry a single recessive disease allele, then their child has a 1/4 chance of getting the disease.

5.2.4 Heterosis

Heterosis, also called overdominance or heterozygote advantage, occurs when the heterozygote has higher fitness than either homozygote. The best-known examples



5.3. FREQUENCY-DEPENDENT SELECTION

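$$\begin{array}{l|ccc}
\text{genotype} & AA & Aa & aa \\ \hline
\text{freq. of zygote} & p^2 & 2pq & q^2 \\
\text{relative fitness} & 1-s & 1 & 1-t \\
\text{freq. after selection} & (1-s)p^2/w & 2pq/w & (1-t)q^2/w \\
\text{normalization} & w = (1-s)p^2 + 2pq + (1-t)q^2 & &
\end{array}$$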

Table 5.11: A diploid genetic model of heterosis assuming random mating.

are sickle-cell anemia and thalassemia, diseases that both affect hemoglobin, the oxygen-carrier protein of red blood cells. The sickle-cell mutations are most common in people of West African descent, while the thalassemia mutations are most common in people from the Mediterranean and Asia. In Hong Kong, the television stations occasionally play public service announcements concerning thalassemia. The heterozygote carrier of the sickle-cell or thalassemia gene is healthy and resistant to malaria; the wildtype homozygote is healthy, but susceptible to malaria; the mutant homozygote is sick with anemia. In class, we will watch the short video, A Mutation Story, about the sickle cell gene.

Table 5.11 presents our model of heterosis. Both homozygotes are of lower fitness than the heterozygote, whose relative fitness we arbitrarily set to unity. Writing the equation for $p'$, we have

$$p' = \frac{(1-s)p^2 + pq}{1 - sp^2 - tq^2} = \frac{p - sp^2}{1 - t + 2tp - (s+t)p^2}.$$

At equilibrium, $p_* \equiv p' = p$, and we obtain a cubic equation for $p_*$:

$$(s+t)p_*^3 - (s+2t)p_*^2 + tp_* = 0. \tag{5.15}$$

Evidently, $p_* = 0$ and $p_* = 1$ are fixed points, and (5.15) can be factored as

$$p_*(1 - p_*)\left(t - (s+t)p_*\right) = 0.$$

The polymorphic solution is therefore

$$p_* = \frac{t}{s+t}, \qquad q_* = \frac{s}{s+t},$$

valid when $s, t > 0$. Since the value of $q_*$ can be large, recessive mutations that cause disease, yet are highly prevalent in a population, are suspected to provide some benefit to the heterozygote. However, only a few genes are unequivocally known to exhibit heterosis.
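Iterating the heterosis recursion from either a rare A or a rare a allele should approach the polymorphic equilibrium $p_* = t/(s+t)$. The sketch below uses hypothetical selection coefficients $s = 0.1$ and $t = 0.8$ (a mutant homozygote much less fit than the wildtype homozygote), for which $p_* = 8/9$.

```python
def next_p(p, s, t):
    """One generation of selection with heterozygote advantage (Table 5.11)."""
    return (p - s * p ** 2) / (1 - t + 2 * t * p - (s + t) * p ** 2)


def iterate(p, s, t, generations):
    for _ in range(generations):
        p = next_p(p, s, t)
    return p


if __name__ == "__main__":
    s, t = 0.1, 0.8  # hypothetical: aa is much less fit than AA
    for p0 in (0.01, 0.5, 0.99):
        print(p0, iterate(p0, s, t, 2000), t / (s + t))
```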

5.3 Frequency-dependent selection

A polymorphism may also result from frequency-dependent selection. A well-known model of frequency-dependent selection is the Hawk-Dove game. Most commonly, frequency-dependent selection is studied using game theory, and following John Maynard Smith, one looks for an evolutionarily stable strategy (ESS).

We consider two phenotypes, Hawk and Dove, with no mating between different phenotypes (for example, different phenotypes may correspond to different species, such as hawks and doves). We describe the Hawk-Dove game as follows: (i) when Hawk meets Dove, Hawk gets the resource and Dove retreats before injury; (ii) when two Hawks meet, they engage in an escalating fight, seriously risking injury; and (iii) when two Doves meet, they share the resource.

$$\begin{array}{c|cc}
\text{player} \setminus \text{opponent} & H & D \\ \hline
H & E_{HH} = -2 & E_{HD} = 2 \\
D & E_{DH} = 0 & E_{DD} = 1
\end{array}$$

Table 5.12: General payoff matrix for the Hawk-Dove game, and the usually assumed values. The payoffs are paid to the player (first column) when playing against the opponent (first row).

The Hawk-Dove game is modeled by a payoff matrix, as shown in Table 5.12. The player in the first column receives the payoff when playing the opponent in the first row. For instance, Hawk playing Dove gets the payoff $E_{HD}$. The numerical values are commonly chosen such that $E_{HH} < E_{DH} < E_{DD} < E_{HD}$; that is, Hawk playing Dove does better than Dove playing Dove, which does better than Dove playing Hawk, which does better than Hawk playing Hawk.

Frequency-dependent selection occurs because the expected payoff to a Hawk or a Dove depends on the
frequency of Hawks and Doves in the population. For example, a Hawk in a population of Doves does well, but a
Hawk in a population of Hawks does poorly.

A population of all Doves is unstable to invasion by Hawks (because Hawk playing against Dove does better
than Dove playing against Dove), and similarly a population of all Hawks is unstable to invasion by Doves. These
two possible equilibria are therefore unstable, and the stable equilibrium consists of a mixed population of Hawks
and Doves. In game theory, this mixed equilibrium is called a mixed Nash equilibrium, and is determined by
assuming that the expected payoff to a Hawk in a mixed population of Hawks and Doves is the same as the
expected payoff to a Dove.

With $p$ the frequency of Hawks and $q$ the frequency of Doves, the expected payoff to a Hawk is $pE_{HH} + qE_{HD}$, and the expected payoff to a Dove is $pE_{DH} + qE_{DD}$, so that the mixed Nash equilibrium satisfies

$$pE_{HH} + qE_{HD} = pE_{DH} + qE_{DD}.$$

Substituting in $q = 1 - p$ and solving for $p$, we obtain

$$p = \frac{E_{HD} - E_{DD}}{(E_{HD} - E_{DD}) + (E_{DH} - E_{HH})},$$

and with the numerical values in Table 5.12,

$$p_* = \frac{2 - 1}{(2 - 1) + (0 + 2)} = \frac{1}{3}.$$

Thus the stable polymorphic population maintained by frequency-dependent selection consists of 1/3 Hawks and 2/3 Doves.
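The equilibrium computation is easy to reproduce with exact fractions. This sketch (function names hypothetical) evaluates the formula for $p$ with the payoffs of Table 5.12, checks that Hawks and Doves then earn the same expected payoff, and checks the invasion arguments given above: a rare Hawk outscores Doves in an all-Dove population, and a rare Dove outscores Hawks in an all-Hawk population.

```python
from fractions import Fraction


def mixed_equilibrium(E_HH, E_HD, E_DH, E_DD):
    """Hawk frequency p at which a Hawk and a Dove have the same expected payoff."""
    return Fraction(E_HD - E_DD, (E_HD - E_DD) + (E_DH - E_HH))


def expected_payoffs(p, E_HH, E_HD, E_DH, E_DD):
    """Expected payoffs (Hawk, Dove) in a population with Hawk frequency p."""
    q = 1 - p
    return p * E_HH + q * E_HD, p * E_DH + q * E_DD


PAYOFFS = dict(E_HH=-2, E_HD=2, E_DH=0, E_DD=1)  # values from Table 5.12

if __name__ == "__main__":
    p = mixed_equilibrium(**PAYOFFS)
    print(p, expected_payoffs(p, **PAYOFFS))
```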



5.4. LINKAGE EQUILIBRIUM

5.4 Recombination and the approach to linkage equilibrium

When considering a polymorphism at a single genetic locus, we assumed two distinct alleles, A and a. The diploid then occurs as one of three types: AA, Aa and aa. We now consider a polymorphism at two genetic loci, each with two distinct alleles. If the alleles at the first genetic locus are A and a, and those at the second are B and b, then four distinct haploid gametes (haplotypes) are possible, namely AB, Ab, aB and ab. Ten distinct diplotypes are possible, obtained by forming pairs of all possible haplotypes. We can write these ten diplotypes as AB/AB, AB/Ab, AB/aB, AB/ab, Ab/Ab, Ab/aB, Ab/ab, aB/aB, aB/ab, and ab/ab, where the numerator represents the haplotype from one parent and the denominator represents the haplotype from the other parent. We do not distinguish here which haplotype came from which parent.

To proceed further, we define the allelic and gametic frequencies for our two-locus problem in Table 5.13. If the probability that a gamete contains allele A or a does not depend on whether the gamete contains allele B or b, then the two loci are said to be independent. Under the assumption of independence, the gametic frequencies are the products of the allelic frequencies, i.e., $p_{AB} = p_A p_B$, $p_{Ab} = p_A p_b$, etc.

Often, the two loci are not independent. This can be due to epistatic selection, or epistasis. As an example, suppose that two loci in humans influence height, and that the fittest genotype is the one resulting in an average height. Selection that favors the average population value of a trait is called normalizing or stabilizing. Suppose that A and B are hypothetical tall alleles, a and b are short alleles, and a person with two tall and two short alleles attains average height. Then selection may favor the specific genotypes AB/ab, Ab/Ab, Ab/aB, and aB/aB. Selection may act against both the genotypes yielding above-average heights, AB/AB, AB/Ab, and AB/aB, and those yielding below-average heights, Ab/ab, aB/ab and ab/ab. Epistatic selection occurs because the fitness of the alleles at the A, a locus depends on which alleles are present at the B, b locus. Here, A has higher fitness when paired with b than when paired with B.
The two loci may also not be independent because of a finite population size (i.e., stochastic effects). For instance, suppose a mutation a → A occurs only once in a finite population (in an infinite population, any possible mutation occurs an infinite number of times), and that A is strongly favored by natural selection. The frequency of A may then increase. If a nearby polymorphic locus on the same chromosome as A happens to carry allele B (with allele b also present in the population), then AB gametes may substantially increase in frequency, with Ab absent. We say that the allele B hitchhikes with the favored allele A.

When the two loci are not independent, we say that the loci are in gametic phase disequilibrium, or more
commonly linkage disequilibrium, sometimes abbreviated as LD. When the loci are independent, we say they are in linkage
equilibrium. Here, we will model how two loci, initially in linkage disequilibrium, approach linkage equilibrium through
the process of recombination.

To begin, we need a rudimentary understanding of meiosis. During meiosis, a diploid cell's DNA, arranged in very long molecules called chromosomes, is replicated once and separated twice, producing four haploid cells, each containing half of the original cell's chromosomes. Sexual reproduction results in syngamy, the fusing of a haploid egg and sperm cell to form a diploid zygote cell.

allele or gamete genotype   A      a      B      b      AB        Ab        aB        ab
frequency                   $p_A$  $p_a$  $p_B$  $p_b$  $p_{AB}$  $p_{Ab}$  $p_{aB}$  $p_{ab}$

Table 5.13: Definitions of allelic and gametic frequencies for two genetic loci each with two alleles.

Figure 5.2: A schematic of crossing-over and recombination during meiosis (figure from Access Excellence @ the National Health Museum)

Fig. 5.2 presents a schematic of meiosis and the process of crossing-over resulting in recombination. In a diploid, each chromosome has a corresponding homologous chromosome, one chromosome originating from the egg, one from the sperm. These homologous chromosomes have the same genes, but possibly different alleles. In Fig. 5.2, we schematically show the alleles a, b, c on the light chromosome, and the alleles A, B, C on its homologous dark chromosome. In the first step of meiosis, each chromosome replicates itself exactly. In the second step, homologous chromosomes exchange genetic material by the process of crossing-over. All four chromosomes then separate into haploid cells. Notice from the schematic that the process of crossing-over can result in genetic recombination. Suppose that the schematic of Fig. 5.2 represents the production of sperm by a male. If the chromosome from the male's father contains the alleles ABC and that from the male's mother abc, recombination can result in the sperm containing a chromosome with alleles ABc (the third gamete in Fig. 5.2). We say this chromosome is a recombinant; it contains alleles from both its paternal grandfather and paternal grandmother. It is likely that the precise combination of alleles on this recombinant chromosome has never existed before in a single person. Recombination is the reason why everybody, with the exception of identical twins, is genetically unique.

Genes that occur on the same chromosome are said to be linked. The closer the genes are to each other on the chromosome, the tighter the linkage, and the less likely recombination will separate them. Tightly linked genes are likely to be inherited from the same grandparent. Genes on different chromosomes are by definition unlinked; because of the independent assortment of chromosomes, a gamete has a 50% chance of carrying the alleles at two unlinked loci from the same grandparent and a 50% chance of carrying them from different grandparents.


To define and model the evolution of linkage disequilibrium, we first obtain allele frequencies from gametic frequencies by

$$p_A = p_{AB} + p_{Ab}, \quad p_a = p_{aB} + p_{ab}, \quad p_B = p_{AB} + p_{aB}, \quad p_b = p_{Ab} + p_{ab}. \tag{5.16}$$

Since the frequencies sum to unity,

$$p_A + p_a = 1, \quad p_B + p_b = 1, \quad p_{AB} + p_{Ab} + p_{aB} + p_{ab} = 1. \tag{5.17}$$

There are three independent gametic frequencies and only two independent allelic frequencies, so in general it is not possible to obtain the gametic frequencies from the allelic frequencies without assuming an additional constraint such as linkage equilibrium. We can, however, introduce an additional variable $D$, called the coefficient of linkage disequilibrium, and define $D$ to be the difference between the gametic frequency $p_{AB}$ and what this gametic frequency would be if the loci were in linkage equilibrium:

$$p_{AB} = p_A p_B + D. \tag{5.18a}$$

Using $p_{AB} + p_{Ab} = p_A$ to eliminate $p_{AB}$ in (5.18a), we obtain

$$p_{Ab} = p_A p_b - D. \tag{5.18b}$$

Likewise, using $p_{AB} + p_{aB} = p_B$,

$$p_{aB} = p_a p_B - D; \tag{5.18c}$$

and using $p_{aB} + p_{ab} = p_a$,

$$p_{ab} = p_a p_b + D. \tag{5.18d}$$

With our definition, positive linkage disequilibrium ($D > 0$) implies an excess of AB and ab gametes and a deficit of Ab and aB gametes; negative linkage disequilibrium ($D < 0$) implies the opposite. $D$ attains its maximum value of $1/4$ when $p_{AB} = p_{ab} = 1/2$, and attains its minimum value of $-1/4$ when $p_{Ab} = p_{aB} = 1/2$. An equality obtainable from (5.18) that we will later find useful is

$$\begin{aligned} p_{AB}\,p_{ab} - p_{Ab}\,p_{aB} &= (p_A p_B + D)(p_a p_b + D) - (p_A p_b - D)(p_a p_B - D)\\ &= D(p_A p_B + p_a p_b + p_A p_b + p_a p_B)\\ &= D, \end{aligned} \tag{5.19}$$

where the final equality holds because the sum in parentheses factors as $(p_A + p_a)(p_B + p_b) = 1$.
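The definitions (5.16) and (5.18a) and the identity (5.19) can be checked numerically. In the Python sketch below, the gametic frequencies are arbitrary illustrative values (not from the text), the function name `linkage_disequilibrium` is hypothetical, and exact fractions are used so that the normalization check (5.17) holds without rounding.

```python
from fractions import Fraction

def linkage_disequilibrium(p_AB, p_Ab, p_aB, p_ab):
    """Return D defined by (5.18a): p_AB = p_A p_B + D."""
    assert p_AB + p_Ab + p_aB + p_ab == 1      # normalization (5.17)
    p_A = p_AB + p_Ab                          # allele frequencies from (5.16)
    p_B = p_AB + p_aB
    return p_AB - p_A * p_B

# Illustrative gametic frequencies.
g = dict(p_AB=Fraction(2, 5), p_Ab=Fraction(1, 10), p_aB=Fraction(1, 5), p_ab=Fraction(3, 10))
D = linkage_disequilibrium(**g)
print(D)                                                # 1/10
print(g["p_AB"] * g["p_ab"] - g["p_Ab"] * g["p_aB"])    # 1/10, identity (5.19)

# Extreme values quoted in the text.
print(linkage_disequilibrium(Fraction(1, 2), 0, 0, Fraction(1, 2)))   # 1/4
print(linkage_disequilibrium(0, Fraction(1, 2), Fraction(1, 2), 0))   # -1/4
```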

Without selection and mutation, $D$ evolves only because of recombination. With primes representing the values in the next generation, and using $p'_A = p_A$ and $p'_B = p_B$ because sexual reproduction by itself does not change allele frequencies,

$$\begin{aligned} D' &= p'_{AB} - p'_A p'_B\\ &= p'_{AB} - p_A p_B\\ &= p'_{AB} - (p_{AB} - D)\\ &= D + (p'_{AB} - p_{AB}), \end{aligned}$$

where we have used (5.18a) to obtain the third equality. The change in $D$ is therefore equal to the change in frequency of the AB gametes,

$$D' - D = p'_{AB} - p_{AB}. \tag{5.20}$$


diploid   diploid freq       gamete freq / diploid freq
                             AB         Ab         aB         ab
AB/AB     $p_{AB}^2$         1          0          0          0
AB/Ab     $2p_{AB}p_{Ab}$    1/2        1/2        0          0
AB/aB     $2p_{AB}p_{aB}$    1/2        0          1/2        0
AB/ab     $2p_{AB}p_{ab}$    (1 − r)/2  r/2        r/2        (1 − r)/2
Ab/Ab     $p_{Ab}^2$         0          1          0          0
Ab/aB     $2p_{Ab}p_{aB}$    r/2        (1 − r)/2  (1 − r)/2  r/2
Ab/ab     $2p_{Ab}p_{ab}$    0          1/2        0          1/2
aB/aB     $p_{aB}^2$         0          0          1          0
aB/ab     $2p_{aB}p_{ab}$    0          0          1/2        1/2
ab/ab     $p_{ab}^2$         0          0          0          1

Table 5.14: Computation of gamete frequencies.

To understand why gametic frequencies change across generations, we should first recognize when they do not change. Without genetic recombination, chromosomes maintain their exact identity across generations. Chromosome frequencies without recombination are therefore constant, and for genetic loci on the same chromosome with alleles A, a and B, b, say, $p'_{AB} = p_{AB}$. In an infinite population without selection or mutation, gametic frequencies change only for genetic loci in linkage disequilibrium on different chromosomes, or for genetic loci in linkage disequilibrium on the same chromosome subjected to genetic recombination.

We will compute the frequency $p'_{AB}$ of AB gametes in the next generation, given the frequency $p_{AB}$ of AB gametes in the present generation, using two different methods. The first method uses a mating table. The second method makes a direct probability argument.

The mating table is shown in Table 5.14. The first column is the parent diplotype before meiosis. The second column is the diplotype frequency assuming random mating. The next four columns are the haploid genotype frequencies (normalized by the corresponding diploid frequencies to simplify the table presentation). Here, we define $r$ to be the frequency at which the gamete arises from a combination of grandmother and grandfather genes. If the A, a and B, b loci occur on the same chromosome, then $r$ is the recombination frequency due to crossing-over. If the A, a and B, b loci occur on different chromosomes, then because of the independent assortment of chromosomes the gamete is equally likely to carry both alleles from the same grandparent or one allele from each, so that $r = 1/2$. Notice that crossing-over or independent assortment matters only for those diplotypes in which the grandfather's and grandmother's contributions share no common alleles (i.e., the AB/ab and Ab/aB diplotypes). The frequency $p'_{AB}$ in the next generation is given by the sum of the AB column after multiplication by the diploid frequencies. Therefore,

$$\begin{aligned} p'_{AB} &= p_{AB}^2 + p_{AB}p_{Ab} + p_{AB}p_{aB} + (1 - r)p_{AB}p_{ab} + r\,p_{Ab}p_{aB}\\ &= p_{AB}(p_{AB} + p_{Ab} + p_{aB} + p_{ab}) + r(p_{Ab}p_{aB} - p_{AB}p_{ab})\\ &= p_{AB} - rD, \end{aligned} \tag{5.21}$$

where the final equality makes use of (5.17) and (5.19). The second method for computing $p'_{AB}$ is more direct. An AB haplotype can arise from a diploid of general type AB/XX without recombination, or a diploid of type AX/XB with recombination. In a recombinant gamete the A allele and the B allele come from different, independently drawn parental haplotypes, so this occurs with probability $p_A p_B$. Therefore,

$$p'_{AB} = (1 - r)p_{AB} + r\,p_A p_B,$$

where the first term is from nonrecombinants and the second term from recombinants. With $p_A p_B = p_{AB} - D$, we have

$$p'_{AB} = (1 - r)p_{AB} + r(p_{AB} - D) = p_{AB} - rD,$$

the same result as (5.21). Using (5.20) and (5.21), we derive

$$D' = (1 - r)D,$$

with the solution

$$D_n = D_0(1 - r)^n.$$
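The mating-table computation can also be carried out mechanically. The Python sketch below generates each row of Table 5.14 from a simple rule (a diplotype h1/h2 yields each parental haplotype with probability $(1 - r)/2$ and each of the two recombinant haplotypes with probability $r/2$), sums the gamete columns weighted by random-mating diplotype frequencies, and checks $p'_{AB} = p_{AB} - rD$ and $D' = (1 - r)D$ exactly. The helper names (`gamete_yield`, `next_generation`, `D_of`) and the starting frequencies are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

HAPLOTYPES = ("AB", "Ab", "aB", "ab")

def gamete_yield(h1, h2, r):
    """Gamete frequencies produced by diplotype h1/h2 (one row of Table 5.14)."""
    out = dict.fromkeys(HAPLOTYPES, Fraction(0))
    out[h1] += (1 - r) / 2            # nonrecombinant gametes
    out[h2] += (1 - r) / 2
    out[h1[0] + h2[1]] += r / 2       # recombinant gametes
    out[h2[0] + h1[1]] += r / 2
    return out

def next_generation(p, r):
    """Gametic frequencies after one generation of random mating."""
    new = dict.fromkeys(HAPLOTYPES, Fraction(0))
    # Ordered pairs (h1, h2) and (h2, h1) together supply the factor 2 in Table 5.14.
    for h1, h2 in product(HAPLOTYPES, repeat=2):
        for gamete, f in gamete_yield(h1, h2, r).items():
            new[gamete] += p[h1] * p[h2] * f
    return new

def D_of(p):
    """Linkage disequilibrium via identity (5.19)."""
    return p["AB"] * p["ab"] - p["Ab"] * p["aB"]

p = {"AB": Fraction(1, 2), "Ab": Fraction(0), "aB": Fraction(0), "ab": Fraction(1, 2)}
r = Fraction(1, 10)
for n in range(4):
    print(n, D_of(p))
    p_next = next_generation(p, r)
    assert p_next["AB"] == p["AB"] - r * D_of(p)     # (5.21)
    assert D_of(p_next) == (1 - r) * D_of(p)         # D' = (1 - r) D
    p = p_next
```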

Recombination thus reduces linkage disequilibrium by a factor of $(1 - r)$ in each generation. Tightly linked genes on the same chromosome have small values of $r$; unlinked genes on different chromosomes have $r = 1/2$. For unlinked genes, linkage disequilibrium decreases by a factor of two in each generation. We conclude that very strong selection is required to maintain linkage disequilibrium for genes on different chromosomes, while weak selection can maintain linkage disequilibrium for tightly linked genes.
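To put numbers on this comparison, the Python sketch below counts the generations needed for $D_n = D_0(1 - r)^n$ to fall to half its initial value. The recombination frequencies are illustrative values, not taken from the text, and the function name is hypothetical; a simple loop is used rather than a logarithm formula so that the exact case $r = 1/2$ is not affected by rounding.

```python
def generations_to_halve(r):
    """Smallest n with (1 - r)**n <= 1/2, i.e. generations for D_n = D_0 (1 - r)**n to halve."""
    d, n = 1.0, 0
    while d > 0.5:
        d *= 1.0 - r      # one generation of recombination: D' = (1 - r) D
        n += 1
    return n

for r in (0.5, 0.1, 0.01, 0.001):
    print(f"r = {r}: D falls to half its initial value after {generations_to_halve(r)} generation(s)")
```

For unlinked loci ($r = 1/2$) the answer is a single generation, while for $r = 0.001$ it is several hundred generations, which is why weak selection suffices for tightly linked genes.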

5.5 Random genetic drift

Up to now, our simplified genetic models have all assumed an infinite population, neglecting stochastic effects. Here, we consider a finite population: the resulting stochastic change in the allelic frequencies is called random genetic drift. The simplest genetic model incorporating random genetic drift assumes a fixed-size population of $N$ individuals, and models the evolution of a diallelic haploid genetic locus.

There are two widely used genetic models for finite populations: the Wright-Fisher model and the Moran model. The Wright-Fisher model is most similar to our infinite-population discrete-generation model. In the Wright-Fisher model, $N$ adult individuals release a very large number of gametes into a gene pool, and the next generation is formed from $N$ random gametes independently chosen from the gene pool. The Moran model takes a different approach. In the Moran model, a single evolution step consists of one random individual in the population reproducing, and another random individual dying, with the population always maintained at the constant size $N$. Because two random events occur at every evolution step in the Moran model, and $N$ random events occur every generation in the Wright-Fisher model, a total of $N/2$ evolution steps in the Moran model is comparable, but not exactly identical, to a single discrete generation in the Wright-Fisher model. It has been shown, however, that the two models become identical in the limit of large $N$.

For our purposes, the Moran model is mathematically more tractable and we adopt it here.

We develop our model analogously to the stochastic population growth model derived in § 3.1. We let $n$ denote the number of A-alleles in the population, and $N - n$ the number of a-alleles. With $n = 0, 1, 2, \ldots, N$ a discrete random variable, the probability mass function $p_n(g)$ denotes the probability of having $n$ A-alleles at evolution step $g$. With $n$ A-alleles in a population of size $N$, the probability that the individual chosen to reproduce, or the individual chosen to die, carries A is given by $s_n = n/N$; the corresponding probability for a is $1 - s_n$. There are three ways to obtain a population of $n$ A-alleles at evolution step $g + 1$. First, there were $n$ A-alleles at evolution step $g$, and the individual reproducing carried the same allele as the individual dying. Second, there were $n - 1$ A-alleles at evolution step $g$, and the individual reproducing has A and the individual dying has a. And third, there were $n + 1$ A-alleles at evolution step $g$, and the individual reproducing has a and the individual dying has A. Multiplying the probabilities and summing the three cases results in

$$p_n(g + 1) = \left(s_n^2 + (1 - s_n)^2\right)p_n(g) + s_{n-1}(1 - s_{n-1})p_{n-1}(g) + s_{n+1}(1 - s_{n+1})p_{n+1}(g). \tag{5.22}$$

Note that this equation is valid for $0 < n < N$, and that the equations at the boundaries, representing the probabilities that one of the alleles is fixed, are

$$p_0(g + 1) = p_0(g) + s_1(1 - s_1)p_1(g), \tag{5.23a}$$

$$p_N(g + 1) = p_N(g) + s_{N-1}(1 - s_{N-1})p_{N-1}(g). \tag{5.23b}$$

The boundaries are called absorbing, and the probability of fixation of an allele increases monotonically with each birth and death. Once an allele has become fixed, there are no further changes in allele frequencies.
We illustrate the solution of (5.22)-(5.23) in Fig. 5.3 for a small population of size $N = 20$, where the number of A-alleles is precisely known in the founding generation, with either (a) $p_{10}(0) = 1$, or (b) $p_{13}(0) = 1$. We plot the probability mass function of the number of A-alleles every $N$ evolution steps, up to $7N$ steps, corresponding to approximately fourteen discrete generations of evolution in the Wright-Fisher model. Notice how the probability distribution diffuses away from its initial value, and how the probabilities eventually concentrate on the boundaries, with both $p_0$ and $p_{20}$ monotonically increasing. In fact, after a large number of generations, $p_0$ approaches the initial frequency of the a-allele and $p_{20}$ approaches the initial frequency of the A-allele (not shown in figures).
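The recursion (5.22)-(5.23) is straightforward to iterate numerically. The Python sketch below is one way to do it; it is not claimed to be the code used to produce Fig. 5.3, and the function names are hypothetical. Each state $n$ sends probability $s_n(1 - s_n)$ to each of $n \pm 1$ and keeps the rest, which reproduces both the interior equation and the absorbing boundary equations, since $s_0(1 - s_0) = s_N(1 - s_N) = 0$.

```python
def moran_step(p):
    """Advance the pmf p[0..N] by one evolution step of the neutral Moran model."""
    N = len(p) - 1
    new = [0.0] * (N + 1)
    for n, prob in enumerate(p):
        s = n / N                                   # s_n = n/N
        w = s * (1.0 - s)                           # P(n -> n+1) = P(n -> n-1)
        new[n] += (s * s + (1.0 - s) ** 2) * prob   # same allele reproduces and dies
        if n < N:
            new[n + 1] += w * prob                  # A reproduces, a dies
        if n > 0:
            new[n - 1] += w * prob                  # a reproduces, A dies
    return new

def evolve(N, n0, steps):
    """pmf of the number of A-alleles after `steps` evolution steps, starting from n0."""
    p = [0.0] * (N + 1)
    p[n0] = 1.0
    for _ in range(steps):
        p = moran_step(p)
    return p

if __name__ == "__main__":
    N = 20
    for n0 in (10, 13):                       # initial conditions of Fig. 5.3 (a) and (b)
        p = evolve(N, n0, 7 * N)
        print(f"n0={n0}: after 7N steps p_0={p[0]:.3f}, p_N={p[N]:.3f}")
        p = evolve(N, n0, 10 * N * N)         # long times: p_N -> n0/N
        print(f"n0={n0}: long time    p_0={p[0]:.4f}, p_N={p[N]:.4f}")
```

Because the expected change in $n$ at each step is zero, the mean number of A-alleles is exactly conserved by this recursion, which is why $p_N$ tends to $n_0/N$.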

To better understand this numerical solution, we consider the limit of large (but not infinite) populations by expanding (5.22) in powers of $1/N$. Using $s_n^2 + (1 - s_n)^2 = 1 - 2s_n(1 - s_n)$, we first rewrite (5.22) as

$$p_n(g + 1) - p_n(g) = s_{n+1}(1 - s_{n+1})p_{n+1}(g) - 2s_n(1 - s_n)p_n(g) + s_{n-1}(1 - s_{n-1})p_{n-1}(g). \tag{5.24}$$

Figure 5.3: $p_n$ versus $n$ with $N = 20$. Evolution steps plotted correspond to $g = 0, N, 2N, \ldots, 7N$. (a) Initial number of A individuals is $n = 10$. (b) Initial number of A individuals is $n = 13$. [Plot not reproduced: each panel shows eight subplots of $p_n$, on a vertical scale from 0 to 0.2, against $n$ from 0 to 20.]

We then introduce the continuous random variable $x = n/N$, with $0 \le x \le 1$, and the continuous time $t = g/(N/2)$. The variable $x$ corresponds to the frequency of the A-allele in the population, and a unit of time corresponds to approximately a single discrete generation in the Wright-Fisher model. The probability density function is defined by

$$P(x, t) = N p_n(g), \quad \text{with } x = n/N,\ t = 2g/N.$$

Furthermore, we define $S(x) = s_n$, and note that $S(x) = x$. Similarly, we have $s_{n+1} = S(x + \Delta x)$ and $s_{n-1} = S(x - \Delta x)$, where $\Delta x = 1/N$. Then, with $\Delta t = 2/N$, (5.24) transforms into

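$$\begin{aligned} P(x, t + \Delta t) - P(x, t) = {} & S(x + \Delta x)\left(1 - S(x + \Delta x)\right)P(x + \Delta x, t) - 2S(x)\left(1 - S(x)\right)P(x, t)\\ & + S(x - \Delta x)\left(1 - S(x - \Delta x)\right)P(x - \Delta x, t). \end{aligned} \tag{5.25}$$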

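To simplify further, we use the well-known central-difference approximation to the second derivative of a function $f(x)$,

$$f''(x) = \frac{f(x + \Delta x) - 2f(x) + f(x - \Delta x)}{\Delta x^2} + O(\Delta x^2),$$

and recognize the right-hand side of (5.25) to be the numerator of a second derivative of $S(x)\left(1 - S(x)\right)P(x, t)$, while the left-hand side is approximately $\Delta t\,\partial P/\partial t$. Dividing both sides by $\Delta t$ and using $\Delta x^2/\Delta t = 1/(2N)$, with $\Delta x = \Delta t/2 = 1/N \to 0$ and $S(x) = x$, we derive to leading order in $1/N$ the partial differential equation

$$\frac{\partial P(x, t)}{\partial t} = \frac{1}{2}\frac{\partial^2}{\partial x^2}\left(V(x)P(x, t)\right), \tag{5.26}$$

with

$$V(x) = \frac{x(1 - x)}{N}. \tag{5.27}$$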

To interpret the meaning of the function $V(x)$, we will use the following result from probability theory. For $n$ independent trials, each with probability of success $p$ and probability of failure $1 - p$, the number of successes, denoted by $X$, is a binomial random variable with parameters $(n, p)$. Well-known results are $E[X] = np$ and $\mathrm{Var}[X] = np(1 - p)$, where $E[\ldots]$ is the expected value and $\mathrm{Var}[\ldots]$ is the variance. Now, the number of A-alleles chosen when forming the next Wright-Fisher generation is a binomial random variable $n'$ with parameters $(N, n/N)$. Therefore, $E[n'] = n$ and $\mathrm{Var}[n'] = n(1 - n/N)$. With $x' = n'/N$ and $x = n/N$, we have $E[x'] = x$ and $\mathrm{Var}[x'] = x(1 - x)/N$. The function $V(x)$ can therefore be interpreted as the variance of $x$ over a single Wright-Fisher generation.
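These binomial moments can be confirmed by direct summation over the probability mass function. The Python sketch below does this for one Wright-Fisher generation; the choice $N = 20$, $n = 13$ is illustrative, and the function name is hypothetical.

```python
from math import comb

def wright_fisher_moments(N, n):
    """Mean and variance of x' = n'/N when n' ~ Binomial(N, n/N)."""
    x = n / N
    pmf = [comb(N, k) * x**k * (1 - x) ** (N - k) for k in range(N + 1)]
    mean = sum(k / N * q for k, q in enumerate(pmf))
    var = sum((k / N - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, var

N, n = 20, 13
mean, var = wright_fisher_moments(N, n)
x = n / N
print(mean, x)                  # E[x'] = x
print(var, x * (1 - x) / N)     # Var[x'] = V(x) = x(1 - x)/N
```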
Although (5.26) and (5.27) depend explicitly on the population size $N$, the population size can be eliminated by a simple change of variables. If we let

$$\tau = t/N, \tag{5.28}$$

then the differential equation (5.26) transforms to

$$\frac{\partial P(x, \tau)}{\partial \tau} = \frac{1}{2}\frac{\partial^2}{\partial x^2}\left(x(1 - x)P(x, \tau)\right), \tag{5.29}$$

independent of $N$. The change of variables given by (5.28) simply states that a doubling of the population size will lengthen the time scale of evolution by a corresponding factor of two. Remember that here we are already working under the assumption that population sizes are large.

Equation (5.29) is a diffusion-like equation for the probability distribution function $P(x, \tau)$. A diffusion approximation for studying genetic drift was first introduced into population genetics by its founders Fisher and Wright, and was later extensively developed in the 1950s by the Japanese biologist Motoo Kimura. Here, our analysis of this equation relies on a more recent paper by McKane and Waxman (2007), who showed how to construct the analytical solution of (5.29) at the boundaries of $x$.

For $0 < x < 1$, the numerical solutions shown in Fig. 5.3 suggest that the probability distribution might asymptotically become independent of $x$. Accordingly, we look for an asymptotic solution to (5.29) at the interior values of $x$ satisfying $P(x, \tau) = P(\tau)$. Equation (5.29) becomes

$$\frac{dP(\tau)}{d\tau} = \frac{1}{2}\frac{d^2}{dx^2}\left(x(1 - x)\right)P(\tau) = -P(\tau),$$

with asymptotic solution

$$P(\tau) = ce^{-\tau}, \tag{5.30}$$

and we observe that the total probability over the interior values of $x$ decays exponentially.
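The exponential decay (5.30) has an exact discrete counterpart in the Moran model. If the interior probabilities $p_1, \ldots, p_{N-1}$ are all equal, one step of (5.22)-(5.23) multiplies each of them by $1 - 2/N^2$, because the second difference of $s_n(1 - s_n) = n(N - n)/N^2$ is $-2/N^2$. With $\tau = t/N = 2g/N^2$, this factor is approximately $e^{-\Delta\tau}$ per step. The Python sketch below checks this numerically for an illustrative $N = 50$; the function name is hypothetical.

```python
import math

def moran_step(p):
    """One Moran step, eqs. (5.22)-(5.23), written in the 'gain from neighbours' form."""
    N = len(p) - 1
    w = [(n / N) * (1 - n / N) for n in range(N + 1)]   # s_n (1 - s_n)
    new = [0.0] * (N + 1)
    for n in range(N + 1):
        new[n] = (1 - 2 * w[n]) * p[n]                  # 1 - 2w = s^2 + (1 - s)^2
        if n > 0:
            new[n] += w[n - 1] * p[n - 1]
        if n < N:
            new[n] += w[n + 1] * p[n + 1]
    return new

N = 50
p = [0.0] + [1.0 / (N - 1)] * (N - 1) + [0.0]   # uniform over the interior
g = N * N // 2                                   # g steps correspond to tau = 2g/N^2 = 1
for _ in range(g):
    p = moran_step(p)

interior = sum(p[1:N])
print(interior, (1 - 2 / N**2) ** g, math.exp(-1.0))   # interior mass vs e^{-tau}
print(p[0], p[N], (1 - math.exp(-1.0)) / 2)             # boundary masses
```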

To understand how the solution behaves at the boundaries of $x$, we require boundary conditions for $P(x, \tau)$. In fact, because $P(x, \tau)$ is singular at the boundaries of $x$, appropriate boundary conditions cannot be obtained directly from the difference equations given by (5.23). Rather, boundary conditions are most easily obtained by first recasting (5.29) into the form of a continuity equation. We let $j(x, \tau)$ denote the so-called probability current.

In a small region of size $\Delta x$ lying in the interval $(x, x + \Delta x)$, the time rate of change of the probability $P(x, \tau)\Delta x$ is due to the flow of probability into and out of this region. With an appropriate definition of the probability current, we have in general

$$\frac{\partial}{\partial \tau}\left(P(x, \tau)\Delta x\right) = j(x, \tau) - j(x + \Delta x, \tau),$$

or as $\Delta x \to 0$,

$$\frac{\partial P(x, \tau)}{\partial \tau} + \frac{\partial j(x, \tau)}{\partial x} = 0, \tag{5.31}$$

which is the usual form of a continuity equation. Identification of (5.31) with (5.29) shows that the probability current of our problem is given by

$$j(x, \tau) = -\frac{1}{2}\frac{\partial}{\partial x}\left(x(1 - x)P(x, \tau)\right). \tag{5.32}$$

Now, since the total probability is unity, that is,

$$\int_0^1 P(x, \tau)\,dx = 1, \tag{5.33}$$

probability cannot flow in or out of the boundaries of $x$, and we must therefore have

$$j(0, \tau) = 0, \quad j(1, \tau) = 0, \tag{5.34}$$

which are the appropriate boundary conditions for (5.29). We can look for stationary solutions of (5.29). Use of the continuity equation (5.31) together with the boundary conditions (5.34) shows that the stationary solution has zero probability current; that is, $j(x) = 0$. Integration of $j(x)$ using (5.32) results in

$$x(1 - x)P(x) = c_1,$$
where $c_1$ is an integration constant. The readily apparent solution is $P(x) = c_1/[x(1 - x)]$, but there are also two less obvious solutions. Probability distribution functions are allowed to contain singular solutions corresponding to Dirac delta functions, and here we consider the possibility of there being Dirac delta functions at the boundaries. By noticing that both $x\delta(x) = 0$ and $(1 - x)\delta(1 - x) = 0$, we can see that the general solution for $P(x)$ can be written as

$$P(x) = \frac{c_1}{x(1 - x)} + c_2\delta(x) + c_3\delta(1 - x).$$

Requiring $P(x)$ to satisfy (5.33) results in $c_1 = 0$ (since $1/[x(1 - x)]$ is not integrable on $[0, 1]$) and $c_2 + c_3 = 1$. To determine the remaining free constant, we can compute the mean frequency of the A-allele in the population. From the continuity equation (5.31), we have

$$\frac{\partial}{\partial \tau}\int_0^1 xP(x, \tau)\,dx = -\int_0^1 x\frac{\partial j(x, \tau)}{\partial x}\,dx = \int_0^1 j(x, \tau)\,dx = 0,$$

where the first integral on the right-hand side was done by parts using the boundary conditions given by (5.34), and the second integral was done using (5.32) and the vanishing of $x(1 - x)P(x)$ on the boundaries. The mean frequency of the A-allele is therefore a constant, as one would expect for nondirectional random genetic drift, and we can assume that its initial value is $p$. We therefore obtain for our stationary solution

$$P(x) = (1 - p)\delta(x) + p\,\delta(1 - x).$$

The eventual probability of fixing the A-allele is therefore simply equal to its initial frequency. For example, suppose that within a population of $N$ individuals homogeneous for a, a single neutral mutation occurs so that one individual now carries the A-allele. What is the probability that the A-allele eventually becomes fixed? Our result yields the probability $1/N$, which is the initial frequency of the A-allele. Intuitively, after a sufficient number of generations has passed, all living individuals should be descended from a single ancestral individual living at the time the single mutation occurred. The probability that this single individual carried the A-allele is just $1/N$.
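The fixation probability $1/N$ can also be checked by simulation. In the neutral Moran model, whenever the number of A-alleles changes it goes up or down by one with equal probability, since both transitions have probability $s_n(1 - s_n)$; the steps in which nothing changes do not affect which boundary is reached. The Python sketch below therefore simulates only the changes, as a symmetric random walk absorbed at $0$ and $N$. The function name, random seed, and number of runs are illustrative assumptions, and the estimate carries Monte Carlo error.

```python
import random

def fixation_probability(N, n0, runs, seed=2024):
    """Monte Carlo estimate of the probability that A fixes, starting from n0 copies."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(runs):
        n = n0
        while 0 < n < N:
            n += 1 if rng.random() < 0.5 else -1   # up and down are equally likely
        fixed += (n == N)
    return fixed / runs

N = 20
estimate = fixation_probability(N, 1, runs=20_000)
print(estimate, 1 / N)   # the estimate should be close to 1/N = 0.05
```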

We further note that Kimura was the first to find an analytical solution of the diffusion equation (5.29). A solution method using Fourier transforms can be found in the appendix of McKane and Waxman (2007). In addition to making use of Dirac delta functions, these authors require Heaviside step functions, Bessel functions, spherical Bessel functions, hypergeometric functions, Legendre polynomials, and Gegenbauer polynomials. The resulting solution is of the form

$$P(x, \tau) = \Pi_0(\tau)\delta(x) + \Pi_1(\tau)\delta(1 - x) + f(x, \tau). \tag{5.35}$$

If the number of A-alleles is known at the initial instant, and the frequency is $p$, then

$$P(x, 0) = \delta(x - p),$$

and $\Pi_0(0) = \Pi_1(0) = 0$, $f(x, 0) = \delta(x - p)$. As $\tau \to \infty$, we have $f(x, \tau) \to 0$, $\Pi_0(\tau) \to 1 - p$ and $\Pi_1(\tau) \to p$.
We can easily demonstrate at least one exact solution of the form (5.35). For an initial uniform probability distribution given by

$$P(x, 0) = 1,$$

the probability distribution at the interior points remains uniform and is given by (5.30) with $c = 1$. If we assume the form of solution given by (5.35), together with the requirement of unit probability given by (5.33) and the symmetry of the initial condition about $x = 1/2$ (so that $\Pi_0 = \Pi_1$), we can obtain the exact result

$$P(x, \tau) = \frac{1}{2}\left(1 - e^{-\tau}\right)\left(\delta(x) + \delta(1 - x)\right) + e^{-\tau}.$$
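As a consistency check, this exact result satisfies the unit-probability condition (5.33) and conserves the mean frequency of the A-allele, whose initial value for the uniform distribution is $p = 1/2$ (each boundary delta function is counted fully in the integral, as in the stationary solution above):

$$\int_0^1 P(x, \tau)\,dx = \frac{1}{2}\left(1 - e^{-\tau}\right)(1 + 1) + e^{-\tau} = 1,$$

$$\int_0^1 xP(x, \tau)\,dx = \frac{1}{2}\left(1 - e^{-\tau}\right)(0 + 1) + \frac{1}{2}e^{-\tau} = \frac{1}{2}.$$

As $\tau \to \infty$, the solution approaches the stationary solution $P(x) = (1 - p)\delta(x) + p\,\delta(1 - x)$ with $p = 1/2$.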

References

Kimura, M. Solution of a process of random genetic drift with a continuous model. Proceedings of the National Academy of Sciences, USA (1955) 41, 144-150.

McKane, A.J. & Waxman, D. Singular solutions of the diffusion equation of population genetics. Journal of Theoretical Biology (2007) 247, 849-858.



Chapter 6

Biochemical Reactions
Biochemistry is the study of the chemistry of life. It can be considered a branch of molecular biology, perhaps more focused on specific molecules and their reactions, or a branch of chemistry focused on the complex chemical reactions occurring in living organisms. One can guess that the first application of biochemistry happened about 5000 years ago when bread was made using yeast.

Modern biochemistry, however, had a relatively slow start among the sciences, as did modern biology. Isaac Newton's publication of Principia Mathematica in 1687 preceded Darwin's Origin of Species in 1859 by more than 170 years. I find this amazing because the ideas of Darwin are in many ways simpler and easier to understand than the mathematical theory of Newton. Most of the delay must be attributed to a fundamental conflict between science and religion. The physical sciences experienced this conflict early (witness the famous prosecution of Galileo by the Catholic Church in 1633, during which Galileo was forced to recant his heliocentric view), but the conflict of religion with evolutionary biology continues even to this day. Advances in biochemistry were initially delayed because it was long believed that life was not subject to the laws of science the way non-life was, and that only living things could produce the molecules of life. Certainly, this was more a religious conviction than a scientific one. Then Friedrich Wöhler in 1828 published his landmark paper on the synthesis of urea (a waste product neutralizing toxic ammonia before excretion in the urine), demonstrating for the first time that organic compounds can be created artificially.

Here, we present mathematical models for some important biochemical reactions. We begin by introducing a useful model for a chemical reaction: the law of mass action. We then model what may be the most important biochemical reactions, namely those catalyzed by enzymes. Using the mathematical model of enzyme kinetics, we consider three fundamental enzymatic properties: competitive inhibition, allosteric inhibition, and cooperativity.

6.1 The law of mass action

The law of mass action describes the rate at which chemicals interact in reactions. It is assumed that different chemical molecules come into contact by collision before reacting, and that the collision rate is directly proportional to the number of molecules of each reacting species. Suppose that two chemicals A and B react to form a product chemical C, written as

$$A + B \xrightarrow{k} C,$$

with $k$ the rate constant of the reaction. For simplicity, we will use the same symbol C, say, to refer to both the chemical C and its concentration. The law of mass action says that $dC/dt$ is proportional to the product of the concentrations A and B, with proportionality constant $k$. That is,

$$\frac{dC}{dt} = kAB. \tag{6.1}$$

Similarly, the law of mass action enables us to write equations for the time derivatives of the reactant concentrations A and B:

$$\frac{dA}{dt} = -kAB, \quad \frac{dB}{dt} = -kAB. \tag{6.2}$$

Notice that when using the law of mass action to find the rate of change of a concentration, the chemical that the arrow points towards is increasing in concentration (positive sign), and the chemical that the arrow points away from is decreasing in concentration (negative sign). The product of concentrations on the right-hand side is always that of the reactants from which the arrow points away, multiplied by the rate constant that is on top of the arrow.
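The bookkeeping rule just described is mechanical enough to automate. The Python sketch below builds mass-action rate equations from a list of reactions, each given as reactant and product stoichiometries plus a rate constant. The data layout and the function name `mass_action_rates` are illustrative assumptions, not a standard API; stoichiometric coefficients on the reactant side enter as powers of the concentrations.

```python
def mass_action_rates(reactions, conc):
    """Time derivatives of all concentrations under the law of mass action.

    reactions: list of (reactants, products, k), where reactants and products
               map species names to stoichiometric coefficients.
    conc:      dict mapping every species name to its concentration.
    """
    rates = {species: 0.0 for species in conc}
    for reactants, products, k in reactions:
        # Rate constant times the product of the concentrations of the species
        # the arrow points away from.
        flux = k
        for species, nu in reactants.items():
            flux *= conc[species] ** nu
        for species, nu in reactants.items():
            rates[species] -= nu * flux     # consumed: negative sign
        for species, nu in products.items():
            rates[species] += nu * flux     # produced: positive sign
    return rates

# A + B -> C with rate constant k, as in (6.1)-(6.2).
k = 2.0
print(mass_action_rates([({"A": 1, "B": 1}, {"C": 1}, k)], {"A": 0.5, "B": 3.0, "C": 0.0}))
# dA/dt = dB/dt = -kAB = -3.0 and dC/dt = kAB = 3.0
```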

Equation (6.1) can be solved analytically using conservation laws. Each reactant, original and converted to product, is conserved since one molecule of each reactant gets converted into one molecule of product. Therefore,

$$\frac{d}{dt}(A + C) = 0 \implies A + C = A_0, \qquad \frac{d}{dt}(B + C) = 0 \implies B + C = B_0,$$

where $A_0$ and $B_0$ are the initial concentrations of the reactants, and no product is present initially. Using the conservation laws, (6.1) becomes

$$\frac{dC}{dt} = k(A_0 - C)(B_0 - C), \quad \text{with } C(0) = 0,$$

which may be integrated by separating variables. After some algebra, the solution is determined to be

$$C(t) = A_0B_0\,\frac{e^{(B_0 - A_0)kt} - 1}{B_0e^{(B_0 - A_0)kt} - A_0},$$

which is a complicated expression with the simple limits

$$\lim_{t \to \infty} C(t) = \begin{cases} A_0 & \text{if } A_0 < B_0,\\ B_0 & \text{if } B_0 < A_0. \end{cases} \tag{6.3}$$

The reaction stops after one of the reactants is depleted, and the final concentration of the product is equal to the initial concentration of the depleted reactant.
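The closed-form solution and the limits (6.3) can be cross-checked by integrating $dC/dt = k(A_0 - C)(B_0 - C)$ numerically. The Python sketch below uses a classical fourth-order Runge-Kutta step, a standard choice that the text does not prescribe; the parameter values and function names are illustrative.

```python
import math

def exact_C(t, k, A0, B0):
    """Closed-form C(t) for A + B -> C with C(0) = 0 (requires A0 != B0)."""
    e = math.exp((B0 - A0) * k * t)
    return A0 * B0 * (e - 1) / (B0 * e - A0)

def rk4_C(t_end, k, A0, B0, dt=1e-3):
    """Integrate dC/dt = k (A0 - C)(B0 - C) from C(0) = 0 with classical RK4."""
    f = lambda C: k * (A0 - C) * (B0 - C)
    C = 0.0
    for _ in range(round(t_end / dt)):
        k1 = f(C)
        k2 = f(C + 0.5 * dt * k1)
        k3 = f(C + 0.5 * dt * k2)
        k4 = f(C + dt * k3)
        C += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return C

k, A0, B0 = 1.0, 1.0, 2.0
for t in (0.5, 1.0, 5.0):
    print(t, rk4_C(t, k, A0, B0), exact_C(t, k, A0, B0))
print(exact_C(30.0, k, A0, B0))   # approaches min(A0, B0) = A0 = 1, as in (6.3)
```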
If we also include the reverse reaction,

$$A + B \underset{k_-}{\overset{k_+}{\rightleftharpoons}} C,$$

then the time derivative of the product is given by

$$\frac{dC}{dt} = k_+AB - k_-C.$$

Notice that $k_+$ and $k_-$ have different units. At equilibrium, $\dot{C} = 0$, and using the conservation laws $A + C = A_0$, $B + C = B_0$, we obtain

$$(A_0 - C)(B_0 - C) - \frac{k_-}{k_+}C = 0,$$


from which we define the equilibrium constant $K_{eq}$ by

$$K_{eq} = k_-/k_+,$$

which has units of concentration. Therefore, at equilibrium, the concentration of the product is given by the solution of the quadratic equation

$$C^2 - (A_0 + B_0 + K_{eq})C + A_0B_0 = 0,$$

with the extra condition that $0 < C < \min(A_0, B_0)$. For instance, if $A_0 = B_0 \equiv R_0$, then at equilibrium,

$$C = R_0 - \frac{1}{2}K_{eq}\left(\sqrt{1 + 4R_0/K_{eq}} - 1\right).$$

If $K_{eq} \ll R_0$, then A and B have a high affinity, and the reaction proceeds mainly to C, with $C \to R_0$.
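Numerically, the physically relevant root is the smaller root of the quadratic. The Python sketch below computes it as $C = 2A_0B_0/\left(b + \sqrt{b^2 - 4A_0B_0}\right)$ with $b = A_0 + B_0 + K_{eq}$, which is algebraically equal to $\left(b - \sqrt{b^2 - 4A_0B_0}\right)/2$ but avoids cancellation when $K_{eq}$ is small. The function name and parameter values are illustrative.

```python
import math

def equilibrium_product(A0, B0, Keq):
    """Equilibrium C for A + B <=> C: the root of C^2 - (A0 + B0 + Keq) C + A0 B0 = 0
    lying in 0 < C < min(A0, B0)."""
    b = A0 + B0 + Keq
    return 2 * A0 * B0 / (b + math.sqrt(b * b - 4 * A0 * B0))

R0 = 1.0
for Keq in (10.0, 1.0, 1e-2, 1e-6):
    C = equilibrium_product(R0, R0, Keq)
    formula = R0 - 0.5 * Keq * (math.sqrt(1 + 4 * R0 / Keq) - 1)   # A0 = B0 case
    residual = (R0 - C) ** 2 - Keq * C                            # equilibrium condition
    print(f"Keq={Keq:g}: C={C:.6f}, formula={formula:.6f}, residual={residual:.1e}")
```

As $K_{eq}$ decreases, $C$ approaches $R_0$, in line with the high-affinity limit described above.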
Below are two interesting reactions. In reaction (ii), A is assumed to be held at a constant concentration.

(i)

$$A + X \underset{k_-}{\overset{k_+}{\rightleftharpoons}} 2X$$

(ii)

$$A + X \xrightarrow{k_1} 2X, \quad X + Y \xrightarrow{k_2} 2Y, \quad Y \xrightarrow{k_3} B$$

Can you write down the equations for $\dot{X}$ in reaction (i), and $\dot{X}$ and $\dot{Y}$ in reaction (ii)? When normalized properly, the equations from reaction (ii) reduce to the Lotka-Volterra predator-prey equations introduced in § 1.4. The chemical concentrations X and Y, therefore, oscillate in time like predators and their prey.
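For readers who want to check their answer for reaction (ii): applying the mass-action rules above, with A held constant, gives $\dot{X} = k_1AX - k_2XY$ and $\dot{Y} = k_2XY - k_3Y$, which have the Lotka-Volterra form. The Python sketch below integrates these equations with a classical Runge-Kutta step and checks that X oscillates while the Lotka-Volterra invariant $H = k_2X - k_3\ln X + k_2Y - k_1A\ln Y$ stays constant. The parameter values, initial condition, and function names are illustrative assumptions.

```python
import math

def rates(X, Y, k1A, k2, k3):
    """Mass-action equations for reaction (ii) with A held constant (k1A = k1*A)."""
    return k1A * X - k2 * X * Y, k2 * X * Y - k3 * Y

def invariant(X, Y, k1A, k2, k3):
    """Conserved quantity of the Lotka-Volterra equations."""
    return k2 * X - k3 * math.log(X) + k2 * Y - k1A * math.log(Y)

def simulate(X, Y, k1A, k2, k3, dt=1e-3, steps=20_000):
    """Classical RK4 integration; returns the X trajectory and the final (X, Y)."""
    xs = [X]
    for _ in range(steps):
        a1, b1 = rates(X, Y, k1A, k2, k3)
        a2, b2 = rates(X + 0.5 * dt * a1, Y + 0.5 * dt * b1, k1A, k2, k3)
        a3, b3 = rates(X + 0.5 * dt * a2, Y + 0.5 * dt * b2, k1A, k2, k3)
        a4, b4 = rates(X + dt * a3, Y + dt * b3, k1A, k2, k3)
        X += dt * (a1 + 2 * a2 + 2 * a3 + a4) / 6
        Y += dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6
        xs.append(X)
    return xs, X, Y

params = (1.0, 1.0, 1.0)                       # k1*A, k2, k3 (illustrative)
H0 = invariant(2.0, 0.5, *params)
xs, X, Y = simulate(2.0, 0.5, *params)
print(min(xs), max(xs))                        # X rises and falls
print(H0, invariant(X, Y, *params))            # invariant is (numerically) conserved
```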

6.2 Enzyme kinetics

Enzymes are catalysts, usually proteins, that help convert other molecules called substrates into products, but are themselves unchanged by the reaction. Each enzyme has high specificity for at least one reaction, and it can accelerate this reaction by millions of times. Without enzymes, most biochemical reactions are too slow for life to be possible. Enzymes are so important to our lives that a single amino acid mutation in one enzyme out of the more than 2000 enzymes in our bodies can result in a severe or lethal genetic disease.

Enzymes do not follow the law of mass action directly: with S substrate, P
product, and E enzyme, the reaction

S+Ek → P + E,

is a poor model since the reaction velocity dP/dt is known to attain a finite limit with increasing substrate
concentration. Rather, Michaelis and Menten (1913) pro- posed the following reaction scheme with an intermediate
molecule:

k1
- k2
S+E C
- P + E,
k−1


Figure 6.1: A Michaelis-Menten reaction of two substrates converting to one product. (Drawn by User:IMeowbot,
released under the GNU Free Documentation License.)

where $C$ is a complex formed by the enzyme and the substrate. A cartoon of the Michaelis-Menten reaction with an
enzyme catalyzing a reaction between two substrates is shown in Fig. 6.1. Commonly, substrate is continuously
provided to the reaction and product is continuously removed. The removal of product has been modeled by
neglecting the reverse reaction $P + E \to C$. A continuous provision of substrate allows us to assume that $S$ is held at
an approximately constant concentration.

The differential equations for $C$ and $P$ can be obtained from the law of mass action:

$$\frac{dC}{dt} = k_1 S E - (k_{-1} + k_2) C, \qquad \frac{dP}{dt} = k_2 C.$$

Biochemists usually want to determine the reaction velocity $dP/dt$ in terms of the substrate concentration $S$ and the
total enzyme concentration $E_0$. We can eliminate
$E$ in favor of $E_0$ from the conservation law that the enzyme, free and bound, is conserved; that is,

$$\frac{d(E + C)}{dt} = 0 \implies E + C = E_0 \implies E = E_0 - C;$$

and we can rewrite the equation for $dC/dt$ eliminating $E$:

$$\begin{aligned}
\frac{dC}{dt} &= k_1 S (E_0 - C) - (k_{-1} + k_2) C \\
&= k_1 E_0 S - (k_{-1} + k_2 + k_1 S) C.
\end{aligned} \tag{6.4}$$

Because $S$ is assumed to be held constant, the complex $C$ is expected to be in equilibrium, with the rate of
formation equal to the rate of dissociation. With this so-called quasi-steady-state approximation, we may assume
that $\dot C = 0$ in (6.4), and we have

$$C = \frac{k_1 E_0 S}{k_{-1} + k_2 + k_1 S}.$$


The reaction velocity is then given by

$$\begin{aligned}
\frac{dP}{dt} &= k_2 C \\
&= \frac{k_1 k_2 E_0 S}{k_{-1} + k_2 + k_1 S} \\
&= \frac{V_m S}{K_m + S},
\end{aligned} \tag{6.5}$$

where two fundamental constants are defined:

$$K_m = (k_{-1} + k_2)/k_1, \qquad V_m = k_2 E_0. \tag{6.6}$$

The Michaelis-Menten constant or the Michaelis constant $K_m$ has units of concentration, and the maximum reaction
velocity $V_m$ has units of concentration divided by time. The interpretation of these constants is obtained by
considering the following limits:

$$\text{as } S \to \infty, \quad C \to E_0 \text{ and } dP/dt \to V_m;$$

$$\text{if } S = K_m, \quad C = \tfrac{1}{2} E_0 \text{ and } dP/dt = \tfrac{1}{2} V_m.$$

Therefore, $V_m$ is the limiting reaction velocity obtained by saturating the reaction with substrate so that every
enzyme is bound, and $K_m$ is the concentration of $S$ at which only one-half of the enzymes are bound and the
reaction proceeds at one-half maximum velocity.
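With $S$ held fixed, (6.4) is a linear equation for $C$ with constant coefficients. Assuming $C(0) = 0$, it can be solved exactly as $C(t) = C^*(1 - e^{-\lambda t})$, where $\lambda = k_{-1} + k_2 + k_1 S$ and $C^*$ is the quasi-steady value above. The Python sketch below uses hypothetical rate constants. It shows $C$ relaxing to $C^*$ and checks the two limits just discussed.

```python
import math

def mm_velocity(S, Vm, Km):
    """Michaelis-Menten reaction velocity (6.5): dP/dt = Vm S / (Km + S)."""
    return Vm * S / (Km + S)

def complex_relaxation(t, S, E0, k1, km1, k2):
    """Exact solution of (6.4) with C(0) = 0 and S held constant.

    dC/dt = k1 E0 S - lam C with lam = k_{-1} + k2 + k1 S, so
    C(t) = C_star (1 - exp(-lam t)), where C_star is the quasi-steady value.
    """
    lam = km1 + k2 + k1 * S
    C_star = k1 * E0 * S / lam
    return C_star * (1.0 - math.exp(-lam * t))

# Hypothetical rate constants chosen only for illustration.
k1, km1, k2, E0 = 10.0, 1.0, 2.0, 0.5
Km, Vm = (km1 + k2) / k1, k2 * E0
S = Km
late = complex_relaxation(10.0, S, E0, k1, km1, k2)
print(late / E0)                          # one-half of the enzyme is bound
print(k2 * late, mm_velocity(S, Vm, Km))  # both equal Vm / 2
print(mm_velocity(1e9, Vm, Km))           # saturating substrate: close to Vm
```

The relaxation time $1/\lambda$ shows when the quasi-steady-state assumption is reasonable. It applies to times long compared with $1/\lambda$.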

6.3 Competitive inhibition

Competitive inhibition occurs when inhibitor molecules compete with substrate molecules for binding to the same
enzyme’s active site. When an inhibitor is bound to the enzyme, no product is produced, so competitive inhibition
reduces the velocity of the reaction. A cartoon of this process is shown in Fig. 6.2. To model competitive
inhibition, we introduce an additional reaction associated with the inhibitor-enzyme binding:

$$S + E \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} C_1 \xrightarrow{k_2} P + E,$$

$$I + E \underset{k_{-3}}{\overset{k_3}{\rightleftharpoons}} C_2.$$

With more complicated enzymatic reactions, the reaction schematic becomes difficult to interpret. Perhaps an
easier way to visualize the reaction is from the following redrawn schematic:


Figure 6.2: Competitive inhibition. (Drawn by G. Andruk, released under the GNU Free Documentation License.)

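$$\begin{array}{ccc}
E & \underset{k_{-1}}{\overset{k_1 S}{\rightleftharpoons}} & C_1 \xrightarrow{k_2} P + E \\
{\scriptstyle k_3 I}\,\downarrow\uparrow\,{\scriptstyle k_{-3}} & & \\
C_2 & &
\end{array}$$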

Here, the substrate $S$ and inhibitor $I$ are combined with the relevant rate constants, rather than treated separately.
It is immediately obvious from this redrawn schematic that inhibition is accomplished by sequestering enzyme in the
form of $C_2$
and preventing its participation in the catalysis of $S$ to $P$.

Our goal is to determine the reaction velocity $\dot P$ in terms of the substrate and inhibitor concentrations, and the total concentration of the enzyme (free and bound). The law of mass action applied
to the two complexes and the product results in

$$\frac{dC_1}{dt} = k_1 S E - (k_{-1} + k_2) C_1, \qquad \frac{dC_2}{dt} = k_3 I E - k_{-3} C_2, \qquad \frac{dP}{dt} = k_2 C_1.$$

The enzyme, free and bound, is conserved, so that

$$\frac{d}{dt}(E + C_1 + C_2) = 0 \implies E + C_1 + C_2 = E_0 \implies E = E_0 - C_1 - C_2.$$

Under the quasi-equilibrium approximation, $\dot C_1 = \dot C_2 = 0$, so that

$$\begin{aligned}
k_1 S (E_0 - C_1 - C_2) - (k_{-1} + k_2) C_1 &= 0, \\
k_3 I (E_0 - C_1 - C_2) - k_{-3} C_2 &= 0,
\end{aligned}$$


which results in the following system of two linear equations and two unknowns ($C_1$ and $C_2$):

$$(k_{-1} + k_2 + k_1 S) C_1 + k_1 S C_2 = k_1 E_0 S, \tag{6.7}$$

$$k_3 I C_1 + (k_{-3} + k_3 I) C_2 = k_3 E_0 I. \tag{6.8}$$

We define the Michaelis-Menten constant $K_m$ as before, and an additional constant
$K_i$ associated with the inhibitor reaction:

$$K_m = \frac{k_{-1} + k_2}{k_1}, \qquad K_i = \frac{k_{-3}}{k_3}.$$

Dividing (6.7) by $k_1$ and (6.8) by $k_3$ yields

$$(K_m + S) C_1 + S C_2 = E_0 S, \tag{6.9}$$

$$I C_1 + (K_i + I) C_2 = E_0 I. \tag{6.10}$$

Since our goal is to obtain the velocity of the reaction, which requires determining
$C_1$, we multiply (6.9) by $(K_i + I)$ and (6.10) by $S$,

$$(K_m + S)(K_i + I) C_1 + S(K_i + I) C_2 = E_0 (K_i + I) S,$$

$$S I C_1 + S(K_i + I) C_2 = E_0 S I,$$

and subtract the second equation from the first:

$$\big((K_m + S)(K_i + I) - S I\big) C_1 = K_i E_0 S;$$

or after cancellation and rearrangement

$$C_1 = \frac{K_i E_0 S}{K_m K_i + K_i S + K_m I} = \frac{E_0 S}{K_m(1 + I/K_i) + S}.$$

Therefore, the reaction velocity is given by

$$\frac{dP}{dt} = \frac{(k_2 E_0)\, S}{K_m(1 + I/K_i) + S} = \frac{V_m S}{K_m' + S}, \tag{6.11}$$

where

$$V_m = k_2 E_0, \qquad K_m' = K_m(1 + I/K_i). \tag{6.12}$$

By comparing the inhibited reaction velocity (6.11) and (6.12) with the uninhibited reaction velocity (6.5) and (6.6),
we observe that inhibition increases the Michaelis-Menten constant of the reaction, but leaves unchanged the
maximum reaction velocity. Since the Michaelis-Menten constant is defined as the substrate concentration
required to attain one-half of the maximum reaction velocity, adding an inhibitor at a fixed substrate
concentration decreases the reaction velocity. However, a reaction saturated with substrate still attains the
uninhibited maximum reaction velocity.
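To check the elimination step, the linear system (6.9)-(6.10) can be solved directly for $C_1$ and the result compared with the closed form (6.11)-(6.12). This Python sketch uses Cramer's rule, and its rate constants are hypothetical.

```python
def competitive_c1(S, I, E0, Km, Ki):
    """Solve the linear system (6.9)-(6.10) for C1 by Cramer's rule."""
    a11, a12, b1 = Km + S, S, E0 * S   # (Km + S) C1 + S C2 = E0 S
    a21, a22, b2 = I, Ki + I, E0 * I   # I C1 + (Ki + I) C2 = E0 I
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det

def competitive_velocity(S, I, Vm, Km, Ki):
    """Closed form (6.11)-(6.12): Vm S / (Km (1 + I/Ki) + S)."""
    return Vm * S / (Km * (1.0 + I / Ki) + S)

# Hypothetical constants for illustration only.
k2, E0, Km, Ki = 2.0, 0.5, 0.3, 0.1
Vm = k2 * E0
for I in (0.0, 0.1, 0.5):
    S = 1.0
    print(I, k2 * competitive_c1(S, I, E0, Km, Ki), competitive_velocity(S, I, Vm, Km, Ki))
print(competitive_velocity(1e12, 0.5, Vm, Km, Ki))  # saturated: close to Vm
```

At fixed $S$, the velocity falls as $I$ grows. With saturating substrate, the inhibited velocity still approaches $V_m$.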


Figure 6.3: Allosteric inhibition. (Unknown artist, released under the GNU Free Documentation License.)

6.4 Allosteric inhibition

The term allostery comes from the Greek words allos, meaning different, and stereos,
meaning solid. It refers to an enzyme with a regulatory binding site separate from its active binding site. In our
model of allosteric inhibition, an inhibitor molecule is assumed to bind to its own regulatory site on the enzyme,
resulting in either a lowered binding affinity of the substrate to the enzyme, or a lowered conversion rate of
substrate to product. A cartoon of allosteric inhibition due to a lowered binding affinity is shown in Fig. 6.3. In
general, we need to define three complexes: $C_1$ is the complex formed from substrate and enzyme, $C_2$ from
inhibitor and enzyme, and $C_3$ from substrate, inhibitor, and enzyme. We write the chemical reactions as follows:

$$\begin{array}{ccccc}
E & \underset{k_{-1}}{\overset{k_1 S}{\rightleftharpoons}} & C_1 & \xrightarrow{k_2} & P + E \\
{\scriptstyle k_3 I}\,\downarrow\uparrow\,{\scriptstyle k_{-3}} & & {\scriptstyle k_3' I}\,\downarrow\uparrow\,{\scriptstyle k_{-3}'} & & \\
C_2 & \underset{k_{-1}'}{\overset{k_1' S}{\rightleftharpoons}} & C_3 & \xrightarrow{k_2'} & P + C_2
\end{array}$$

The general model for allosteric inhibition with ten independent rate constants appears too complicated to
analyze. We will simplify this general model to one with fewer rate constants that still exhibits the unique features of
allosteric inhibition. One possible but uninteresting simplification assumes that if $I$ binds to $E$, then $S$ does not. However, this reduces allosteric inhibition to competitive inhibition and loses the essence of allostery.
Instead, we simplify by allowing both $I$ and $S$ to simultaneously bind to $E$, but we assume that the binding of $I$ prevents
substrate conversion to product. With this simplification, $k_2' = 0$. To further reduce the number of independent rate
constants, we assume that the binding of $S$ to $E$ is unaffected by the bound presence of $I$, and the binding of $I$ to $E$ is unaffected by the bound presence of $S$. These
approximations imply that all the remaining primed rate constants equal the corresponding unprimed rate constants, e.g., $k_1' = k_1$,
etc. With these simplifications, the schematic of the chemical reaction simplifies to

$$\begin{array}{ccccc}
E & \underset{k_{-1}}{\overset{k_1 S}{\rightleftharpoons}} & C_1 & \xrightarrow{k_2} & P + E \\
{\scriptstyle k_3 I}\,\downarrow\uparrow\,{\scriptstyle k_{-3}} & & {\scriptstyle k_3 I}\,\downarrow\uparrow\,{\scriptstyle k_{-3}} & & \\
C_2 & \underset{k_{-1}}{\overset{k_1 S}{\rightleftharpoons}} & C_3 & &
\end{array}$$

and now there are only five independent rate constants. We write the equations for the complexes using the law of
mass action:

$$\frac{dC_1}{dt} = k_1 S E + k_{-3} C_3 - (k_{-1} + k_2 + k_3 I) C_1, \tag{6.13}$$

$$\frac{dC_2}{dt} = k_3 I E + k_{-1} C_3 - (k_{-3} + k_1 S) C_2, \tag{6.14}$$

$$\frac{dC_3}{dt} = k_3 I C_1 + k_1 S C_2 - (k_{-1} + k_{-3}) C_3, \tag{6.15}$$

while the reaction velocity is given by

$$\frac{dP}{dt} = k_2 C_1. \tag{6.16}$$

Again, the total enzyme, free and bound, is conserved, so that $E = E_0 - C_1 - C_2 - C_3$.

With the quasi-equilibrium approximation $\dot C_1 = \dot C_2 = \dot C_3 = 0$, we obtain a system
of three equations and three unknowns: $C_1$, $C_2$ and $C_3$. Despite our simplifications, the analytical solution for the
reaction velocity remains messy (see Keener & Sneyd, referenced at the chapter’s end) and not especially
illuminating. We omit the complete analytical result here and determine only the maximum reaction velocity.

The maximum reaction velocity $V_m'$ for the allosteric-inhibited reaction is defined
as the time-derivative of the product concentration when the reaction is saturated with substrate; that is,

$$V_m' = \lim_{S \to \infty} \frac{dP}{dt} = k_2 \lim_{S \to \infty} C_1.$$

With substrate saturation, every enzyme will have its substrate binding site occupied. Enzymes are either bound
with only substrate in the complex $C_1$, or bound together with substrate and inhibitor in the complex $C_3$. Accordingly,
the schematic of the chemical reaction with substrate saturation simplifies to


$$\begin{array}{ccc}
C_1 & \xrightarrow{k_2} & P + C_1 \\
{\scriptstyle k_3 I}\,\downarrow\uparrow\,{\scriptstyle k_{-3}} & & \\
C_3 & &
\end{array}$$

The equations for $C_1$ and $C_3$ with substrate saturation are thus given by

$$\frac{dC_1}{dt} = k_{-3} C_3 - k_3 I C_1, \tag{6.17}$$

$$\frac{dC_3}{dt} = k_3 I C_1 - k_{-3} C_3, \tag{6.18}$$

and the quasi-equilibrium approximation yields the single independent equation

$$C_3 = (k_3/k_{-3})\, I C_1 = (I/K_i)\, C_1, \tag{6.19}$$

with $K_i = k_{-3}/k_3$ as before. The equation expressing the conservation of enzyme is given by $E_0 = C_1 + C_3$. This
conservation law, together with (6.19), permits us to solve for $C_1$:

$$C_1 = \frac{E_0}{1 + I/K_i}.$$

Therefore, the maximum reaction velocity for the allosteric-inhibited reaction is given by

$$V_m' = \frac{k_2 E_0}{1 + I/K_i} = \frac{V_m}{1 + I/K_i},$$

where $V_m$ is the maximum reaction velocity of both the uninhibited and the competitively inhibited reaction. The
allosteric inhibitor is thus seen to reduce the maximum velocity of the uninhibited reaction by the factor $(1 + I/K_i)$, which
may be large if the concentration of allosteric inhibitor is substantial.
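The saturation argument can be checked without the reduced schematic. The sketch below solves the full quasi-steady system (6.13)-(6.15), with $E = E_0 - C_1 - C_2 - C_3$, at increasingly large $S$. It then compares $k_2 C_1$ with $V_m/(1 + I/K_i)$. The small Gaussian-elimination helper and the rate constants are illustrative assumptions. This is not the analytical result of Keener & Sneyd.

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def allosteric_velocity(S, I, E0, k1, km1, k2, k3, km3):
    """Quasi-steady k2*C1 from (6.13)-(6.15) with E = E0 - C1 - C2 - C3."""
    A = [
        [k1 * S + km1 + k2 + k3 * I, k1 * S, k1 * S - km3],  # (6.13) = 0
        [k3 * I, k3 * I + km3 + k1 * S, k3 * I - km1],       # (6.14) = 0
        [k3 * I, k1 * S, -(km1 + km3)],                      # (6.15) = 0
    ]
    b = [k1 * S * E0, k3 * I * E0, 0.0]
    C1, _, _ = solve_linear(A, b)
    return k2 * C1

# Hypothetical rate constants and concentrations.
k1, km1, k2, k3, km3 = 1.0, 1.0, 2.0, 1.0, 0.5
E0, I = 1.0, 1.0
Vm, Ki = k2 * E0, km3 / k3
for S in (1.0, 1e2, 1e4, 1e6):
    print(S, allosteric_velocity(S, I, E0, k1, km1, k2, k3, km3))
print("predicted V'_m =", Vm / (1.0 + I / Ki))
```

With $I = 0$, the same function reduces to the Michaelis-Menten velocity (6.5), which is a useful sanity check.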

6.5 Cooperativity

Enzymes and other protein complexes may have multiple binding sites, and when a substrate binds to one of these
sites, the other sites may become more active. A well-studied example is the binding of the oxygen molecule to the
hemoglobin protein. Hemoglobin can bind four molecules of O$_2$, and when three molecules are bound, the fourth
molecule has an increased affinity for binding. We call this cooperativity.

Figure 6.4: Cooperativity.

We will model cooperativity by assuming that an enzyme has two separated but indistinguishable binding sites
for a substrate $S$. For example, the enzyme may be a protein dimer, composed of two identical sub-proteins with identical binding sites for $S$. A cartoon of this enzyme
is shown in Fig. 6.4. Because the two binding sites are indistinguishable, we need consider only two complexes: $C_1$ and
$C_2$, with enzyme bound to one or two substrate molecules, respectively. When the enzyme exhibits cooperativity, the
binding of the second substrate molecule has a greater rate constant than the binding of the first. We therefore
consider the following reaction:

$$\begin{array}{ccccc}
E & \underset{k_{-1}}{\overset{k_1 S}{\rightleftharpoons}} & C_1 & \xrightarrow{k_2} & P + E \\
 & & {\scriptstyle k_3 S}\,\downarrow\uparrow\,{\scriptstyle k_{-3}} & & \\
 & & C_2 & \xrightarrow{k_4} & P + C_1
\end{array}$$

where cooperativity supposes that $k_1 \ll k_3$. Application of the law of mass action
results in

$$\frac{dC_1}{dt} = k_1 S E + (k_{-3} + k_4) C_2 - (k_{-1} + k_2 + k_3 S) C_1,$$

$$\frac{dC_2}{dt} = k_3 S C_1 - (k_{-3} + k_4) C_2.$$

Applying the quasi-equilibrium approximation $\dot C_1 = \dot C_2 = 0$ and the conservation
law $E_0 = E + C_1 + C_2$ results in the following system of two equations and two unknowns:

$$\big(k_{-1} + k_2 + (k_1 + k_3) S\big) C_1 - (k_{-3} + k_4 - k_1 S) C_2 = k_1 E_0 S, \tag{6.20}$$

$$k_3 S C_1 - (k_{-3} + k_4) C_2 = 0. \tag{6.21}$$


We divide both (6.20) and (6.21) by $k_3$ and define

$$K_1 = \frac{k_{-1} + k_2}{k_1}, \qquad K_2 = \frac{k_{-3} + k_4}{k_3}, \qquad \epsilon = \frac{k_1}{k_3}$$

to obtain

$$\big(\epsilon K_1 + (1 + \epsilon) S\big) C_1 - (K_2 - \epsilon S) C_2 = \epsilon E_0 S, \tag{6.22}$$

$$S C_1 - K_2 C_2 = 0. \tag{6.23}$$

We can subtract (6.23) from (6.22) and cancel $\epsilon$ to obtain

$$(K_1 + S) C_1 + S C_2 = E_0 S. \tag{6.24}$$

Equations (6.23) and (6.24) can be solved for $C_1$ and $C_2$:

$$C_1 = \frac{K_2 E_0 S}{K_1 K_2 + K_2 S + S^2}, \tag{6.25}$$

$$C_2 = \frac{E_0 S^2}{K_1 K_2 + K_2 S + S^2}, \tag{6.26}$$

so that the reaction velocity is given by

$$\frac{dP}{dt} = k_2 C_1 + k_4 C_2 = \frac{(k_2 K_2 + k_4 S)\, E_0 S}{K_1 K_2 + K_2 S + S^2}. \tag{6.27}$$
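To check the algebra leading to (6.25)-(6.27), we can solve the quasi-steady system (6.20)-(6.21) numerically, directly in terms of the rate constants, and compare the result with the closed form. The following minimal Python sketch uses arbitrary (assumed) rate constants.

```python
def coop_velocity_direct(S, E0, k1, km1, k2, k3, km3, k4):
    """Solve (6.20)-(6.21) by Cramer's rule and return k2 C1 + k4 C2."""
    a11 = km1 + k2 + (k1 + k3) * S
    a12 = -(km3 + k4 - k1 * S)
    a21 = k3 * S
    a22 = -(km3 + k4)
    b1 = k1 * E0 * S  # the right-hand side of (6.21) is zero
    det = a11 * a22 - a12 * a21
    C1 = b1 * a22 / det
    C2 = -a21 * b1 / det
    return k2 * C1 + k4 * C2

def coop_velocity_formula(S, E0, k1, km1, k2, k3, km3, k4):
    """Closed form (6.27) with K1 = (k_{-1}+k2)/k1 and K2 = (k_{-3}+k4)/k3."""
    K1 = (km1 + k2) / k1
    K2 = (km3 + k4) / k3
    return (k2 * K2 + k4 * S) * E0 * S / (K1 * K2 + K2 * S + S * S)

# Hypothetical rate constants for illustration.
rates = dict(k1=1.0, km1=1.0, k2=1.0, k3=4.0, km3=2.0, k4=2.0)
E0 = 1.0
for S in (0.1, 1.0, 10.0):
    print(S, coop_velocity_direct(S, E0, **rates), coop_velocity_formula(S, E0, **rates))
```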

To illuminate this result, we consider two limiting cases: (i) no cooperativity, where the active sites act
independently so that each protein dimer, say, can be considered as two independent protein monomers; (ii)
strong cooperativity, where the binding of the second substrate has a much greater rate constant than the binding of
the first.

Independent active sites

The free enzyme $E$ has two independent binding sites, while $C_1$ has only a single binding site. Consulting the
reaction schematic: $k_1$ is the rate constant for the binding of $S$ to two independent binding sites; $k_{-1}$ and $k_2$ are the
rate constants for the dissociation and conversion of a single $S$ from the enzyme; $k_3$ is the rate constant for the
binding of $S$ to a single free binding site; and $k_{-3}$ and $k_4$ are the rate constants for the dissociation and conversion of
one of two independently bound $S$ molecules from the enzyme. Accounting for these factors of two and assuming independence of
active sites, we have

$$k_1 = 2k_3, \qquad k_{-3} = 2k_{-1}, \qquad k_4 = 2k_2.$$

We define the Michaelis-Menten constant $K_m$ that is representative of the protein monomer with one binding site;
that is,

$$K_m = \frac{k_{-1} + k_2}{k_1/2} = 2K_1 = \frac{1}{2} K_2.$$


Figure 6.5: The reaction velocity $dP/dt$ as a function of the substrate $S$. Shown are the solutions to the Hill equation
with $V_m = 1$, $K_m = 1$, and for $n = 1, 2$.

Therefore, for independent active sites, the reaction velocity becomes

$$\frac{dP}{dt} = \frac{(2k_2 K_m + 2k_2 S)\, E_0 S}{K_m^2 + 2K_m S + S^2} = \frac{2k_2 E_0 S}{K_m + S}.$$

The reaction velocity for a dimer protein enzyme composed of independent identical monomers is simply double
that of a monomer protein enzyme, an intuitively obvious result.
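The reduction can also be checked numerically. In the sketch below, hypothetical monomer constants (`k_on`, `k_off`, `k_cat`) are mapped onto the dimer rate constants through the independence relations $k_1 = 2k_3$, $k_{-3} = 2k_{-1}$, $k_4 = 2k_2$. The sketch then evaluates (6.27) and compares it with $2k_2E_0S/(K_m + S)$.

```python
def coop_velocity(S, E0, k1, km1, k2, k3, km3, k4):
    """Reaction velocity (6.27) for the two-site enzyme."""
    K1 = (km1 + k2) / k1
    K2 = (km3 + k4) / k3
    return (k2 * K2 + k4 * S) * E0 * S / (K1 * K2 + K2 * S + S * S)

# Hypothetical single-site constants for one monomer.
k_on, k_off, k_cat, E0 = 3.0, 1.0, 2.0, 0.5
Km = (k_off + k_cat) / k_on  # monomer Michaelis constant, (k_{-1} + k2)/(k1/2)
# Independence: k1 = 2 k3, k_{-3} = 2 k_{-1}, k4 = 2 k2, with k3 = k_on.
rates = dict(k1=2 * k_on, km1=k_off, k2=k_cat, k3=k_on, km3=2 * k_off, k4=2 * k_cat)
for S in (0.1, 1.0, 10.0):
    print(S, coop_velocity(S, E0, **rates), 2 * k_cat * E0 * S / (Km + S))
```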

Strong cooperativity

We now assume that after the first substrate binds to the enzyme, the second substrate binds much more easily,
so that $k_1 \ll k_3$. The number of enzymes bound
to a single substrate molecule should consequently be much less than the number bound to two substrate
molecules, resulting in $C_1 \ll C_2$. Dividing (6.25) by (6.26),
this inequality becomes

$$\frac{C_1}{C_2} = \frac{K_2}{S} \ll 1.$$

Dividing the numerator and denominator of (6.27) by $S^2$, we have

$$\frac{dP}{dt} = \frac{(k_2 K_2/S + k_4)\, E_0}{(K_1/S)(K_2/S) + (K_2/S) + 1}.$$

To take the limit of this expression as $K_2/S \to 0$, we set $K_2/S = 0$ everywhere except in the first term in the
denominator, since $K_1/S$ is inversely proportional to $k_1$ and may go to infinity in this limit. Taking the limit and
multiplying the numerator and denominator by $S^2$,

$$\frac{dP}{dt} = \frac{k_4 E_0 S^2}{K_1 K_2 + S^2}.$$

Here, the maximum reaction velocity is $V_m = k_4 E_0$, and the modified Michaelis-Menten constant is $K_m = \sqrt{K_1 K_2}$, so that

$$\frac{dP}{dt} = \frac{V_m S^2}{K_m^2 + S^2}.$$
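The limit can be illustrated numerically. Because $K_1 \propto 1/k_1$ and $K_2 \propto 1/k_3$, one way to mimic $k_1 \ll k_3$ is to send $K_2 \to 0$ while holding $K_1 K_2 = K_m^2$ fixed. This particular path to the limit is an assumption made for the sketch. All constants are hypothetical.

```python
def coop_velocity(S, E0, k2, k4, K1, K2):
    """Reaction velocity (6.27) written in terms of K1 and K2."""
    return (k2 * K2 + k4 * S) * E0 * S / (K1 * K2 + K2 * S + S * S)

def hill2(S, Vm, Km):
    """Strong-cooperativity limit: Vm S^2 / (Km^2 + S^2)."""
    return Vm * S**2 / (Km**2 + S**2)

# Hypothetical constants; Km = sqrt(K1 K2) is held fixed while K2 -> 0.
E0, k2, k4, Km = 1.0, 1.0, 1.0, 1.0
errors = []
for K2 in (1e-1, 1e-3, 1e-5):
    K1 = Km**2 / K2
    err = max(abs(coop_velocity(S, E0, k2, k4, K1, K2) - hill2(S, k4 * E0, Km))
              for S in (0.5, 1.0, 2.0, 5.0))
    errors.append(err)
    print(f"K2 = {K2:g}: max deviation from the n = 2 form = {err:.1e}")
```

The deviation shrinks roughly in proportion to $K_2$, consistent with dropping the $K_2/S$ terms.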

In biochemistry, this strong-cooperativity reaction velocity is generalized to

$$\frac{dP}{dt} = \frac{V_m S^n}{K_m^n + S^n},$$

known as the Hill equation, which is used to fit experimental data by varying $n$.

In Fig. 6.5, we have plotted the reaction velocity $dP/dt$ versus $S$ as obtained from the Hill equation with $n = 1$ or
$2$. In drawing the figure, we have taken both $V_m$ and $K_m$ equal to unity. It is evident that with increasing $n$ the reaction velocity more rapidly saturates to its maximum
value.
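Since Fig. 6.5 is not reproduced here, a few sampled values of the two curves can stand in for it. The sketch below evaluates the Hill equation with $V_m = K_m = 1$, as in the figure. The function name is our own.

```python
def hill(S, Vm, Km, n):
    """Hill equation: dP/dt = Vm S^n / (Km^n + S^n)."""
    return Vm * S**n / (Km**n + S**n)

# Parameters of Figure 6.5: Vm = Km = 1.
for S in (0.25, 0.5, 1.0, 2.0, 4.0, 10.0):
    print(f"S = {S:5.2f}   n=1: {hill(S, 1.0, 1.0, 1):.3f}   n=2: {hill(S, 1.0, 1.0, 2):.3f}")
```

Both curves pass through $1/2$ at $S = K_m$. For $S > K_m$, the $n = 2$ curve lies closer to $V_m$. For $S < K_m$, it lies below the $n = 1$ curve, which gives the sigmoidal shape.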

References

Keener, J. & Sneyd, J. Mathematical Physiology. Springer-Verlag, New York (1998). Pg. 30, Exercise 2.



Chapter 7

Sequence Alignment
The software program BLAST (Basic Local Alignment Search Tool) uses sequence alignment algorithms to
compare a query sequence against a database to identify other known sequences similar to the query sequence.
Often, the annotations attached to the already known sequences yield important biological information about the
query sequence. Almost all biologists use BLAST, making sequence alignment one of the most important algorithms
of bioinformatics.

The sequence under study can be composed of nucleotides (from the nucleic acids DNA or RNA) or amino
acids (from proteins). Nucleic acids chain together four different nucleotides: A, C, T, G for DNA and A, C, U, G for
RNA; proteins chain together twenty different amino acids. The sequence of a DNA molecule or of a protein is the
linear order of nucleotides or amino acids in a specified direction, defined by the chemistry of the molecule. There is
no need for us to know the exact details of the chemistry; it is sufficient to know that a protein has distinguishable
ends called the N-terminus and the C-terminus, and that the usual convention is to read the amino acid sequence
from the N-terminus to the C-terminus. Specification of the direction is more complicated for a DNA molecule than for
a protein molecule because of the double helix structure of DNA, and this will be explained in Section 7.1.

The basic sequence alignment algorithm aligns two or more sequences to highlight their similarity, inserting a
small number of gaps into each sequence (usually denoted by dashes) to align wherever possible identical or similar
characters. For instance, Fig. 7.1 presents an alignment using the software tool ClustalW of the hemoglobin
beta-chain from a human, a chimpanzee, a rat, and a zebrafish. The human and chimpanzee sequences are
identical, a consequence of our very close evolutionary relationship. The rat sequence differs from
human/chimpanzee at only 27 out of 146 amino acids; we are all mammals. The zebrafish sequence, though clearly
related, diverges significantly. Notice the insertion of a gap in each of the mammal sequences at zebrafish
amino acid position 122. This permits the subsequent zebrafish sequence to better align with the mammal
sequences, and implies either an insertion of a new amino acid in fish, or a deletion of an amino acid in mammals.
The insertion or deletion of a character in a sequence is called an indel. Mismatches in sequence, such as those
occurring between zebrafish and mammals at amino acid positions 2 and 3, are called mutations. ClustalW places a ‘*’
on the last line to denote exact amino acid matches across all sequences, and a ‘:’ and ‘.’ to denote chemically
similar amino acids across all sequences (each amino acid has characteristic chemical properties, and amino acids
can be grouped according to similar properties). In this chapter, we detail the algorithms used to align sequences.

7.1 The minimum you need to know about DNA chemistry and the genetic code

In one of the most important scientific papers ever published, James Watson and Francis Crick, pictured in Fig. 7.2,
determined the structure of DNA using a three-dimensional molecular model that makes plain the chemical basis of heredity.

CLUSTAL W (1.83) multiple sequence alignment

Human VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKV 60
Chimpanzee VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKV 60
Rat VHLTDAEKAAVNGLWGKVNPDDVGGEALGRLLVVYPWTQRYFDSFGDLSSASAIMGNPKV 60
Zebrafish VEWTDAERTAILGLWGKLNIDEIGPQALSRCLIVYPWTQRYFATFGNLSSPAAIMGNPKV 60
*. * * ::*: .****:* *::* :**.* *:*******:* :**:**:. *:******

Human KAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGK 120
Chimpanzee KAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGK 120
Rat KAHGKKVINAFNDGLKHLDNLKGTFAHLSELHCDKLHVDPENFRLLGNMIVIVLGHHLGK 120
Zebrafish AAHGRTVMGGLERAIKNMDNVKNTYAALSVMHSEKLHVDPDNFRLLADCITVCAAMKFGQ 120
* * * :.*:..:. .: ::**:*.*:* ** :*.:******:*****.: :. . ::*:

Human E-FTPPVQAAYQKVVAGVANALAHKYH 146
Chimpanzee E-FTPPVQAAYQKVVAGVANALAHKYH 146
Rat E-FTPCAQAAFQKVVAGVASALAHKYH 146
Zebrafish AGFNADVQEAWQKFLAVVVSALCRQYH 147
*. . . * * :**.:* *..**.::**

Figure 7.1: Multiple alignment of the hemoglobin beta-chain for Human, Chimpanzee, Rat and Zebrafish, obtained
using ClustalW.

The DNA molecule consists of two
strands wound around each other to form the now famous double helix. Arbitrarily, one strand is labeled by the
sequencing group to be the positive strand, and the other the negative strand. The two strands of the DNA molecule
bind to each other by base pairing: the bases of one strand pair with the bases of the other strand. Adenine (A)
always pairs with thymine (T), and guanine (G) always pairs with cytosine (C): A with T, G with C. For RNA, T is
replaced by uracil (U). When reading the sequence of nucleotides from a single strand, the direction of reading must
be specified, and this is possible by referring to the chemical bonds of the DNA backbone. There are of course only
two possible directions to read a linear sequence of bases, and these are denoted as 5’-to-3’ and 3’-to-5’.
Importantly, the two separate strands of the DNA molecule are oriented in opposite directions. Below is the
beginning of the DNA coding sequence for the human hemoglobin beta chain protein discussed earlier:

5’-GTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTG-3’
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3’-CACGTGGACTGAGGACTCCTCTTCAGACGGCAATGACGGGACACCCCGTTCCACTTGCAC-5’

It is important to realize that there are two unique DNA sequences here, and either one, or even both, can be
coding. Reading from 5’-to-3’, the upper sequence begins ‘GTGCACCTG...’, while the lower sequence ends
‘...CAGGTGCAC’. Here, only the upper sequence codes for the human hemoglobin beta chain, and the lower sequence is non-coding.

How is the DNA code read? Enzymes separate the two strands of DNA, and
transcription occurs as the DNA sequence is copied into messenger RNA (mRNA). If the upper strand is the coding
sequence, then the complementary lower strand serves as the template for constructing the mRNA sequence. The
ACUG nucleotides of the mRNA bind to the lower sequence and construct a single-stranded mRNA molecule
containing the sequence ‘GUGCACCUG...’, which exactly matches the sequence of the upper, coding strand, but
with T replaced by U. This mRNA is subsequently translated in the ribosome of the cell, where each nucleotide
triplet codes for a single amino acid. The triplet coding of nucleotides for amino acids is the famous genetic code, shown in
Fig. 7.3. Here, the translation to amino acid sequence is ‘VHL...’, where we have used the genetic code ‘GUG’ = V,
‘CAC’ = H, ‘CUG’ = L. The three amino acids (out of twenty) that appear here are V = valine, H = histidine, and L =
leucine.

Figure 7.2: James Watson and Francis Crick posing in front of their DNA model. The original photograph was taken
in 1953, the year of discovery, and was recreated in 2003, fifty years later. Francis Crick, the man on the right, died
in 2004.

Figure 7.3: The genetic code.
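The base-pairing, transcription, and translation steps described above can be sketched in a few lines of Python. This is an illustration only, and the helper names are hypothetical. The codon table deliberately contains just the three codons quoted in the text (GUG, CAC, CUG) rather than the full genetic code of Fig. 7.3, so translation stops at the first codon it does not know.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Only the three codons discussed in the text; the full genetic code has 64 codons.
PARTIAL_CODON_TABLE = {"GUG": "V", "CAC": "H", "CUG": "L"}

def reverse_complement(dna):
    """The other strand of a DNA sequence, read 5'-to-3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def transcribe(coding_strand):
    """mRNA has the coding-strand sequence with T replaced by U."""
    return coding_strand.replace("T", "U")

def translate(mrna, table):
    """Translate consecutive codons until one is missing from the table."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in table:
            break
        protein.append(table[codon])
    return "".join(protein)

upper = "GTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTG"
print(reverse_complement(upper)[-9:])                     # CAGGTGCAC
print(transcribe(upper)[:9])                              # GUGCACCUG
print(translate(transcribe(upper), PARTIAL_CODON_TABLE))  # VHL
```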

7.2 Sequence alignment by brute force

One (bad) approach to sequence alignment is to align the two sequences in all possible ways, score the
alignments with an assumed scoring system, and determine the highest scoring alignment. The problem with this
brute-force approach is that the number of possible alignments grows exponentially with sequence length, and for
sequences of reasonable length, the computation is already impossible. For example, the number of ways to align
two sequences of 50 characters each (a rather small alignment problem) is about $1.5 \times 10^{37}$, already an
astonishingly large number. It is informative to count the number of possible alignments between two sequences,
since a similar algorithm is used for sequence alignment.

Suppose we want to align two sequences. Gaps in either sequence are allowed, but a gap cannot be aligned
with a gap. By way of illustration, we demonstrate the three ways that the first character of the upper-case alphabet
and the first character of the lower-case alphabet may align:

A    -A    A-
|    ||    ||
a    a-    -a

and the five ways in which the first two characters of the upper-case alphabet can align with the first character of the
lower-case alphabet:

AB    AB    AB-    A-B    -AB
||    ||    |||    |||    |||
-a    a-    --a    -a-    a--

A recursion relation for the total number of possible alignments of a sequence of $i$ characters with a sequence
of $j$ characters may be derived by considering the alignment of the last character. There are three possibilities,
which we illustrate by assuming the $i$th character is ‘F’ and the $j$th character is ‘d’: (1) $i - 1$ characters of the first sequence
are already aligned with $j - 1$ characters of the second sequence, and the $i$th character of the first sequence aligns
exactly with the $j$th character of the second sequence:

...F
||||
...d

(2) $i - 1$ characters of the first sequence are aligned with $j$ characters of the second sequence, and the $i$th character
of the first sequence aligns with a gap in the second sequence:

...F
||||
...-

(3) $i$ characters of the first sequence are aligned with $j - 1$ characters of the second sequence, and a gap in the first
sequence aligns with the $j$th character of the second sequence:

...-
||||
...d

If $C(i, j)$ is the number of ways to align an $i$ character sequence with a $j$ character sequence, then, from our counting,

$$C(i, j) = C(i - 1, j - 1) + C(i - 1, j) + C(i, j - 1). \tag{7.1}$$

This recursion relation requires boundary conditions. Because there is only one way to align an $i > 0$ character
sequence against a zero character sequence (i.e., $i$ characters against $i$ gaps), the boundary conditions are $C(0, j) = C(i, 0) = 1$ for all
$i, j > 0$. We may also add the additional boundary condition $C(0, 0) = 1$, obtained from the known result $C(1, 1) = 3$.
Using the recursion relation (7.1), we can construct the following dynamic matrix
to count the number of ways to align the two five-character sequences $a_1 a_2 a_3 a_4 a_5$
and $b_1 b_2 b_3 b_4 b_5$:

$$\begin{array}{c|cccccc}
 & - & b_1 & b_2 & b_3 & b_4 & b_5 \\ \hline
- & 1 & 1 & 1 & 1 & 1 & 1 \\
a_1 & 1 & 3 & 5 & 7 & 9 & 11 \\
a_2 & 1 & 5 & 13 & 25 & 41 & 61 \\
a_3 & 1 & 7 & 25 & 63 & 129 & 231 \\
a_4 & 1 & 9 & 41 & 129 & 321 & 681 \\
a_5 & 1 & 11 & 61 & 231 & 681 & 1683
\end{array}$$

The size of this dynamic matrix is 6 × 6, and for convenience we label the rows and columns starting from zero (i.e.,
row 0, row 1, ..., row 5). This matrix was constructed by first writing $-a_1 a_2 a_3 a_4 a_5$ to the left of the matrix and $-b_1 b_2 b_3 b_4 b_5$
above the matrix, then filling in ones across the zeroth row and down the zeroth column to satisfy the boundary
conditions, and finally applying the recursion relation directly by going across the first row from left to right, the
second row from left to right, etc. To demonstrate the filling in of the matrix, we have across the first row: 1 + 1 + 1 = 3,
1 + 1 + 3 = 5, 1 + 1 + 5 = 7, etc., and across the second row: 1 + 3 + 1 = 5, 3 + 5 + 5 = 13, 5 + 7 + 13 = 25, etc. Finally,
the last element entered gives the number of ways to align two five-character sequences: 1683, already a
remarkably large number.

It is possible to solve the recursion relation (7.1) for $C(i, j)$ analytically using generating functions. Although the
solution method is interesting (and in fact was shown to me by a student), the final analytical result is messy and
we omit it here. In general, computation of $C(i, j)$ is best done numerically by constructing the dynamic matrix.
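The dynamic matrix can be generated mechanically. The following minimal Python sketch (the function name `count_alignments` is hypothetical) fills the matrix row by row with the recursion (7.1) and boundary values of one. Python integers have arbitrary precision, so the count for two 50-character sequences is exact rather than rounded.

```python
def count_alignments(i, j):
    """Fill the dynamic matrix from recursion (7.1) and return C(i, j) exactly."""
    C = [[1] * (j + 1) for _ in range(i + 1)]  # boundary: C(0, *) = C(*, 0) = 1
    for r in range(1, i + 1):
        for c in range(1, j + 1):
            C[r][c] = C[r - 1][c - 1] + C[r - 1][c] + C[r][c - 1]
    return C[i][j]

print(count_alignments(5, 5))             # 1683
print(f"{count_alignments(50, 50):.2e}")  # about 1.5e37
```

The cost of filling the matrix grows only as the product of the two sequence lengths, even though the count itself grows exponentially.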

7.3 Sequence alignment by dynamic programming

Two reasonably sized sequences cannot be aligned by brute force. Luckily, there is another algorithm borrowed
from computer science, dynamic programming, that makes use of a dynamic matrix.

What is needed is a scoring system to judge the quality of an alignment. The goal is to find the alignment that
has the maximum score. We assume that the alignment of character a i with character b j has the score S(a i, b j). For
example,

CHAPTER 7. SEQUENCE ALIGNMENT 103


7.3. DYNAMIC PROGRAMMING

when aligning two DNA sequences, a match (A-A, C-C, T-T, G-G) may be scored as +2, and a mismatch (A-C, A-T,
A-G, etc.) scored as − 1. We also assume that an indel (a nucleotide aligned with a gap) is scored as g, with a typical
value for DNA alignment being g = − 2. In the next section, we develop a better and more widely used model for indel
scoring that distinguishes gap openings from gap extensions.

Now, let T(i, j) denote the maximum score for aligning a sequence of length
i with a sequence of length j. We can compute T(i, j) provided we know T(i −
1, j − 1), T(i − 1, j) and T(i, j − 1). Indeed, our logic is similar to that used when counting the total number of
alignments. There are again three ways to compute
T(i, j): ( 1) i − 1 characters of the first sequence are aligned with j − 1 characters of the second sequence with
maximum score T(i − 1, j − 1), and the i th character of the first sequence aligns with the j th character of the second
sequence with updated maximum score T(i − 1, j − 1) + S(a i, b j); ( 2) i − 1 characters of the first sequence are aligned
with j characters of the second sequence with maximum score T(i −

1, j), and the i th character of the first sequence aligns with a gap in the second sequence with updated maximum
score T(i − 1, j)+ g, or; (3) i characters of the first sequence are aligned with j − 1 characters of the second sequence
with maximum score T(i, j − 1), and a gap in the first sequence aligns with the j th character of the second sequence
with updated maximum score T(i, j − 1) + g. We then compare these three scores and assign T(i, j) to be the maximum;
that is,


•• T(i − 1, j − 1) + S(a i, b j),

T(i, j) = max T(i − 1, j) + g, T(i, j − 1) (7.2)


••
+ g.

Boundary conditions give the score of aligning a sequence with a null sequence of gaps, so that

T(i, 0) = T( 0, i) = ig, i > 0, (7.3)

with T( 0, 0) = 0. The recursion ( 7.2 ), together with the boundary conditions ( 7.3 ), can be used

to construct a dynamic matrix. The score of the best alignment is then given by the last filled-in element of the
matrix, which for aligning a sequence of length n
with a sequence of length m is T(n, m). Besides this score, however, we also want to determine the alignment itself.
The alignment can be obtained by tracing back the path in the dynamic matrix that was followed to compute each
matrix element
T(i, j). There could be more than one path, so that the best alignment may be degenerate.

In practice, sequence alignment is always done computationally, and excellent software tools are freely available on the web (see §7.6). To illustrate the dynamic programming algorithm, we compute by hand the dynamic matrix for aligning the two short DNA sequences GGAT and GAATT, scoring a match as +2, a mismatch as −1 and an indel as −2:

       -    G    A    A    T    T
  -    0   -2   -4   -6   -8  -10
  G   -2    2    0   -2   -4   -6
  G   -4    0    1   -1   -3   -5
  A   -6   -2    2    3    1   -1
  T   -8   -4    0    1    5    3

In our hand calculation, the two sequences to be aligned are written to the left of and above the dynamic matrix, each preceded by a gap character ‘-’. Row 0 and column 0 are then filled in with the boundary conditions, starting with 0 in position (0, 0) and incrementing by the gap penalty −2 across row 0 and down column 0. The recursion relation (7.2) is then used to fill in the dynamic matrix one row at a time, moving from left to right and top to bottom. To determine the (i, j) matrix element, three numbers must be compared and the maximum taken: (1) inspect the nucleotides to the left of row i and above column j, and add +2 for a match or −1 for a mismatch to the (i − 1, j − 1) matrix element; (2) add −2 to the (i − 1, j) matrix element; (3) add −2 to the (i, j − 1) matrix element. For example, the first computed matrix element, 2 at position (1, 1), was determined by taking the maximum of (1) 0 + 2 = 2, since G-G is a match; (2) −2 − 2 = −4; (3) −2 − 2 = −4. You can test your understanding of dynamic programming by computing the other matrix elements.
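The fill procedure just described can be checked mechanically. The following is a minimal Python sketch of recursion (7.2) with boundary conditions (7.3), assuming a constant gap penalty and a simple match/mismatch score; the function name nw_matrix and its default parameters are illustrative choices, not part of any library.

```python
def nw_matrix(a, b, match=2, mismatch=-1, gap=-2):
    """Fill the global-alignment dynamic matrix T, with a on the rows and b on the columns.

    Implements recursion (7.2) with boundary conditions (7.3):
    T(i, 0) = i*gap, T(0, j) = j*gap, T(0, 0) = 0.
    """
    n, m = len(a), len(b)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    # Boundary conditions: a prefix aligned with a run of gaps.
    for i in range(1, n + 1):
        T[i][0] = i * gap
    for j in range(1, m + 1):
        T[0][j] = j * gap
    # Fill one row at a time, left to right, top to bottom.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            T[i][j] = max(
                T[i - 1][j - 1] + s,  # (1) a_i aligned with b_j
                T[i - 1][j] + gap,    # (2) a_i aligned with a gap
                T[i][j - 1] + gap,    # (3) a gap aligned with b_j
            )
    return T


T = nw_matrix("GGAT", "GAATT")
for row in T:
    print(" ".join(f"{x:4d}" for x in row))
print("best global score:", T[-1][-1])
```

Running this prints the same five rows as the hand calculation, with T(4, 5) = 3 in the bottom-right corner.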

After the matrix is constructed, the traceback algorithm that finds the best alignment starts at the bottom-right element of the matrix, here the (4, 5) matrix element with entry 3. The matrix element used to compute 3 was either at (4, 4) (horizontal move) or at (3, 4) (diagonal move). Having two possibilities implies that the best alignment is degenerate. For now, we arbitrarily choose the diagonal move. We build the alignment from end to beginning with GGAT on top and GAATT on bottom:

T
|
T

We illustrate our current position in the dynamic matrix by eliminating all the elements that are not on the traceback path and are no longer accessible:

       -    G    A    A    T    T
  -    0   -2   -4   -6   -8
  G   -2    2    0   -2   -4
  G   -4    0    1   -1   -3
  A   -6   -2    2    3    1
  T                             3

We start again from the 1 entry at (3, 4). This value came from the 3 entry at (3, 3) by a horizontal move. Therefore,
the alignment is extended to

-T
||
TT

where a gap is inserted in the top sequence for a horizontal move. (A gap is inserted in the bottom sequence for a
vertical move.) The dynamic matrix now looks like

       -    G    A    A    T    T
  -    0   -2   -4   -6
  G   -2    2    0   -2
  G   -4    0    1   -1
  A   -6   -2    2    3    1
  T                             3

Starting again from the 3 entry at (3, 3), this value came from the 1 entry at (2, 2) in a diagonal move, extending the
alignment to

A-T
|||
ATT

The dynamic matrix now looks like

       -    G    A    A    T    T
  -    0   -2   -4
  G   -2    2    0
  G   -4    0    1
  A                   3    1
  T                             3

Continuing in this fashion (try to do this), the final alignment is

GGA-T
: : :
GAATT

where it is customary to represent a matching character with a colon ‘:’. The traceback path in the dynamic matrix is

       -    G    A    A    T    T
  -    0
  G         2
  G              1
  A                   3    1
  T                             3

If the other degenerate path was initially taken, the final alignment would be

GGAT-
: ::
GAATT

and the traceback path would be

       -    G    A    A    T    T
  -    0
  G         2
  G              1
  A                   3
  T                        5    3
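The two degenerate tracebacks can be found systematically by following every move that is consistent with recursion (7.2). The Python sketch below (function names hypothetical) rebuilds the dynamic matrix and then explores all traceback paths from the bottom-right corner, placing the first sequence on top as in the text: a diagonal move pairs two characters, a vertical move puts a gap in the bottom sequence, and a horizontal move puts a gap in the top sequence.

```python
def nw_matrix(a, b, match, mismatch, gap):
    """Global-alignment dynamic matrix from recursion (7.2) and boundary conditions (7.3)."""
    n, m = len(a), len(b)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        T[i][0] = i * gap
    for j in range(1, m + 1):
        T[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            T[i][j] = max(T[i - 1][j - 1] + s, T[i - 1][j] + gap, T[i][j - 1] + gap)
    return T


def all_optimal_alignments(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best global score and every alignment reachable by traceback.

    Sequence a is written on top and sequence b on the bottom.
    """
    T = nw_matrix(a, b, match, mismatch, gap)
    found = []

    def walk(i, j, top, bottom):
        if i == 0 and j == 0:
            # The alignment was built from end to beginning, so reverse it.
            found.append(("".join(reversed(top)), "".join(reversed(bottom))))
            return
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if T[i][j] == T[i - 1][j - 1] + s:  # diagonal move: a_i over b_j
                walk(i - 1, j - 1, top + [a[i - 1]], bottom + [b[j - 1]])
        if i > 0 and T[i][j] == T[i - 1][j] + gap:  # vertical move: gap in bottom
            walk(i - 1, j, top + [a[i - 1]], bottom + ["-"])
        if j > 0 and T[i][j] == T[i][j - 1] + gap:  # horizontal move: gap in top
            walk(i, j - 1, top + ["-"], bottom + [b[j - 1]])

    walk(len(a), len(b), [], [])
    return T[-1][-1], found


score, alignments = all_optimal_alignments("GGAT", "GAATT")
print("score:", score)
for top, bottom in alignments:
    print(top)
    print(bottom)
    print()
```

For GGAT and GAATT this reports score 3 and exactly the two alignments found by hand. Exhaustive enumeration is fine for short sequences like these, but the number of co-optimal paths can grow very quickly for longer or repetitive sequences, which is why software typically reports a single alignment.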

The scores of both alignments are easily recalculated and are the same: 2 − 1 + 2 − 2 + 2 = 3 and 2 − 1 + 2 + 2 − 2 = 3.
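The hand recalculation of the two scores, and the colon convention for matching characters, can be expressed as two small helpers. This is a sketch only: alignment_score and match_line are hypothetical names, and the default scores are the +2/−1/−2 scheme of this example.

```python
def alignment_score(top, bottom, match=2, mismatch=-1, gap=-2):
    """Score a gapped alignment column by column with a constant gap penalty."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap        # indel
        elif x == y:
            score += match      # matching characters
        else:
            score += mismatch   # mismatching characters
    return score


def match_line(top, bottom):
    """Mark each matching column with ':' and every other column with a space."""
    return "".join(":" if x == y and x != "-" else " " for x, y in zip(top, bottom))


bottom = "GAATT"
for top in ("GGA-T", "GGAT-"):
    print(top)
    print(match_line(top, bottom))
    print(bottom)
    print("score:", alignment_score(top, bottom))
    print()
```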

The algorithm for aligning two proteins is similar, except that the match and mismatch scores depend on the pair of aligning amino acids. With twenty different amino acids found in proteins, the score is represented by a 20 × 20 substitution matrix. The most commonly used matrices are the PAM and BLOSUM series, with BLOSUM62 being the commonly used default.

7.4 Gap opening and gap extension penalties

Empirical evidence suggests that gaps cluster, in both nucleotide and protein sequences. Clustering is usually modeled by different penalties for gap opening (g_o) and gap extension (g_e), with g_o < g_e < 0, so that opening a new gap costs more than extending an existing one. For example, the default scoring scheme for the widely used BLASTN software is +1 for a nucleotide match, −3 for a nucleotide mismatch, −5 for a gap opening, and −2 for a gap extension.
Having two types of gaps (opening and extension) complicates the dynamic programming algorithm. When an indel is added to an existing alignment, the scoring increment depends on whether the indel is a gap opening or a gap extension. For example, extending the alignment

AB          AB-
||    to    |||
ab          abc

adds a gap opening penalty g_o to the score, whereas extending

A-          A--
||    to    |||
ab          abc

adds a gap extension penalty g_e to the score. The score increment depends not only on the current aligning pair, but also on the previously aligned pair.
The final aligning pair of a sequence of length i with a sequence of length j can be one of three possibilities (top:bottom): (1) a_i : b_j; (2) a_i : −; (3) − : b_j. Only for (1) is the score increment S(a_i, b_j) unambiguous. For (2) or (3), the score increment depends on the presence or absence of indels in the previously aligned characters. For instance, for alignments ending with a_i : −, the previously aligned character pair could be one of (i) a_{i−1} : b_j, (ii) − : b_j, (iii) a_{i−1} : −. If the previous aligned character pair was (i) or (ii), the score increment would be the gap opening penalty g_o; if it was (iii), the score increment would be the gap extension penalty g_e.
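The rule just stated, that a gap costs g_o unless the preceding column has a gap in the same sequence, can be sketched as a column-by-column scorer for a given alignment. The penalties below are the BLASTN defaults quoted above, applied under the convention of this text (the first gap character of a run costs g_o, each further one g_e); this illustrates the bookkeeping behind the recursions, not how BLAST itself tallies gap costs. The function names are hypothetical.

```python
def affine_alignment_score(top, bottom, s, g_o, g_e):
    """Score a gapped alignment with separate gap-opening and gap-extension penalties.

    A gap character costs g_e only if the previous column had a gap in the
    same sequence; otherwise it opens a new gap and costs g_o. In particular,
    a gap in the top sequence right after a gap in the bottom one (case (ii))
    is a new opening.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    prev = None  # kind of the previous column: "pair", "gap_bottom" or "gap_top"
    for x, y in zip(top, bottom):
        if x != "-" and y != "-":
            kind = "pair"
            score += s(x, y)
        else:
            kind = "gap_bottom" if y == "-" else "gap_top"
            score += g_e if prev == kind else g_o
        prev = kind
    return score


def blastn_like(x, y):
    """Match/mismatch scores quoted in the text: +1 for a match, -3 for a mismatch."""
    return 1 if x == y else -3


# One gap opened at the end of the top sequence.
print(affine_alignment_score("ACG-", "ACGT", blastn_like, g_o=-5, g_e=-2))
# A gap opened and then extended by one character.
print(affine_alignment_score("AC--", "ACGT", blastn_like, g_o=-5, g_e=-2))
# A gap in the bottom sequence followed by a gap in the top: two openings.
print(affine_alignment_score("AC-", "A-G", blastn_like, g_o=-5, g_e=-2))
```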
To remove the ambiguity that occurs with a single dynamic matrix, we need to compute three dynamic matrices simultaneously, with matrix elements denoted by T(i, j), T_−(i, j) and T^−(i, j), corresponding to alignments ending in the three types of aligning pair a_i : b_j, a_i : − and − : b_j, respectively. The recursion relations are

(1) a_i : b_j

$$
T(i,j) = \max\begin{cases}
T(i-1,j-1) + S(a_i,b_j),\\
T_-(i-1,j-1) + S(a_i,b_j),\\
T^-(i-1,j-1) + S(a_i,b_j);
\end{cases}
\tag{7.4}
$$

(2) a_i : −

$$
T_-(i,j) = \max\begin{cases}
T(i-1,j) + g_o,\\
T_-(i-1,j) + g_e,\\
T^-(i-1,j) + g_o;
\end{cases}
\tag{7.5}
$$

(3) − : b_j

$$
T^-(i,j) = \max\begin{cases}
T(i,j-1) + g_o,\\
T_-(i,j-1) + g_o,\\
T^-(i,j-1) + g_e.
\end{cases}
\tag{7.6}
$$

To align a sequence of length n with a sequence of length m, the best alignment score is the maximum of the scores obtained from the three dynamic matrices:

$$
T_{\mathrm{opt}}(n,m) = \max\bigl\{T(n,m),\ T_-(n,m),\ T^-(n,m)\bigr\}. \tag{7.7}
$$

The traceback algorithm to find the best alignment proceeds as before by starting with the matrix element corresponding to the best alignment score, T_opt(n, m), and tracing back to the matrix element that determined this score. The optimum alignment is then built up from last to first as before, but now the traceback may switch between the three dynamic matrices.
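Recursions (7.4) to (7.7) translate almost line by line into code. The Python sketch below computes only the optimal score (no traceback). It uses the text's convention that a gap run of length k costs g_o + (k − 1)g_e. The boundary values, which the text does not spell out, are this sketch's own assumption: T(0, 0) = 0, a single gap run along row 0 and column 0, and −∞ wherever a matrix's final pair is impossible. The names M, X and Y stand in for T, T_− and T^−.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, s, g_o, g_e):
    """Best global alignment score from the three-matrix recursions (7.4)-(7.7).

    M[i][j] plays the role of T(i, j):   alignments ending in a_i : b_j
    X[i][j] plays the role of T_-(i, j): alignments ending in a_i : -
    Y[i][j] plays the role of T^-(i, j): alignments ending in - : b_j
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0  # the empty alignment (assumed boundary value)
    for i in range(1, n + 1):
        X[i][0] = g_o + (i - 1) * g_e  # a_1..a_i against one run of gaps
    for j in range(1, m + 1):
        Y[0][j] = g_o + (j - 1) * g_e  # one run of gaps against b_1..b_j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # (7.4): a_i : b_j may follow any of the three kinds of pair.
            M[i][j] = s(a[i - 1], b[j - 1]) + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # (7.5): a_i : - extends only a previous a_{i-1} : -.
            X[i][j] = max(M[i - 1][j] + g_o, X[i - 1][j] + g_e, Y[i - 1][j] + g_o)
            # (7.6): - : b_j extends only a previous - : b_{j-1}.
            Y[i][j] = max(M[i][j - 1] + g_o, X[i][j - 1] + g_o, Y[i][j - 1] + g_e)
    # (7.7): the best of the three matrices at (n, m).
    return max(M[n][m], X[n][m], Y[n][m])


def dna_score(x, y):
    """+2 for a match, -1 for a mismatch, as in the worked example of Section 7.3."""
    return 2 if x == y else -1


def blastn_like(x, y):
    """+1 for a match, -3 for a mismatch (the BLASTN defaults quoted above)."""
    return 1 if x == y else -3


# With g_o = g_e the gaps are scored as a constant penalty, so this reproduces 3.
print(affine_global_score("GGAT", "GAATT", dna_score, g_o=-2, g_e=-2))
# One run of two gaps (AAAA over AA--) beats two separate single gaps.
print(affine_global_score("AAAA", "AA", blastn_like, g_o=-5, g_e=-2))
```

Setting g_o = g_e is a useful sanity check, since the three-matrix result must then agree with the constant-gap score of 3 computed earlier for GGAT and GAATT.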

7.5 Local alignments

We have so far discussed how to align two sequences over their entire length, called a global alignment. Often,
however, it is more useful to align two sequences over only part of their lengths, called a local alignment. In
bioinformatics, the algorithm for global alignment is called “Needleman-Wunsch,” and that for local alignment
“Smith-Waterman.” Local alignments are useful, for instance, when searching a long genome sequence for
alignments to a short DNA segment. They are also useful when aligning two protein sequences since proteins can
consist of multiple domains, and only a single domain may align.

If for simplicity we consider a constant gap penalty g, then a local alignment can be obtained using the rule


$$
T(i,j) = \max\begin{cases}
0,\\
T(i-1,j-1) + S(a_i,b_j),\\
T(i-1,j) + g,\\
T(i,j-1) + g.
\end{cases}
\tag{7.8}
$$

After the dynamic matrix is computed using ( 7.8 ), the traceback algorithm starts at the matrix element with the
highest score, and stops at the first encountered zero score.

If we apply the Smith-Waterman algorithm to locally align the two sequences GGAT and GAATT considered previously, with a match scored as +2, a mismatch as −1 and an indel as −2, the dynamic matrix is

       -    G    A    A    T    T
  -    0    0    0    0    0    0
  G    0    2    0    0    0    0
  G    0    2    1    0    0    0
  A    0    0    4    3    1    0
  T    0    0    2    3    5    3

The traceback algorithm starts at the highest score, here the 5 in matrix element (4, 4), and ends at the 0 in matrix element (0, 0). The resulting local alignment is

GGAT
: ::
GAAT

which has a score of five, larger than the previous global alignment score of three.
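A compact Python sketch of the Smith-Waterman rule (7.8), together with the traceback just described, assuming a constant gap penalty and simple match/mismatch scores. When several predecessors tie, it prefers the diagonal move, then the vertical, then the horizontal; this tie-breaking order is an arbitrary choice, and other orders can return different alignments with the same score. The function name smith_waterman is illustrative only.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment by rule (7.8); returns (score, top, bottom)."""
    n, m = len(a), len(b)
    T = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            T[i][j] = max(0, T[i - 1][j - 1] + s, T[i - 1][j] + gap, T[i][j - 1] + gap)
            if T[i][j] > best:  # remember the first highest-scoring element
                best, bi, bj = T[i][j], i, j

    # Trace back from the highest score and stop at the first zero.
    top, bottom = [], []
    i, j = bi, bj
    while T[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if T[i][j] == T[i - 1][j - 1] + s:      # diagonal: a_i over b_j
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif T[i][j] == T[i - 1][j] + gap:      # vertical: gap in bottom
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:                                   # horizontal: gap in top
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = smith_waterman("GGAT", "GAATT")
print("score:", score)
print(top)
print(bottom)
```

For GGAT and GAATT the sketch reproduces the hand result: a local score of 5 and the alignment GGAT over GAAT.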

7.6 Software

If you have in hand two or more sequences that you would like to align, there is a choice of software tools available.
For relatively short sequences, you can use the LALIGN program for global or local alignments:

http://embnet.vital-it.ch/software/LALIGN_form.html

For longer sequences, the BLAST software has a flavor that permits local alignment of two sequences:

http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi

Another useful tool for global alignment of two or more long DNA sequences is PipMaker:

http://pipmaker.bx.psu.edu/pipmaker/

Multiple global alignments of protein sequences can be computed with ClustalW or T-Coffee:

http://www.clustal.org/
http://tcoffee.crg.cat/

Most users of sequence alignment software want to compare a given sequence against a database of sequences.
The BLAST software is most widely used, and comes in several versions depending on the type of sequence and
database search one is performing:

http://www.ncbi.nlm.nih.gov/BLAST/
