
Manual Arlequin ver 3.5

1 ARLEQUIN VER 3.5.1.3 USER MANUAL



An Integrated Software Package for
Population Genetics Data Analysis


Authors:
Laurent Excoffier and Heidi Lischer

Computational and Molecular
Population Genetics Lab (CMPG)
Institute of Ecology and Evolution
University of Berne
Baltzerstrasse 6
3012 Bern
Switzerland



Swiss Institute of Bioinformatics


E-mail: laurent.excoffier@iee.unibe.ch
URL: http://cmpg.unibe.ch/software/arlequin3

September 2011



1.1 Table of contents
1 ARLEQUIN ver 3.5 user manual
1.1 Table of contents
2 Introduction
2.1 Why Arlequin?
2.2 Arlequin philosophy
2.3 About this manual
2.4 Data types handled by Arlequin
2.4.1 DNA sequences
2.4.2 RFLP Data
2.4.3 Microsatellite data
2.4.4 Standard data
2.4.5 Allele frequency data
2.5 Methods implemented in Arlequin
2.6 System requirements
2.7 Installing and uninstalling Arlequin
2.7.1 Installation
2.7.1.1 Arlequin 3.5 installation
2.7.1.2 Arlequin 3.5 uninstallation
2.8 List of files included in the Arlequin package
2.9 Arlequin computing limitations
2.10 How to cite Arlequin
2.11 Acknowledgements
2.12 How to get the last version of the Arlequin software?
2.13 What's new in version 3.5
2.13.1 Changes introduced in previous releases
2.13.1.1 Version 3.11 compared to version 3.1
2.13.1.2 Version 3.1 compared to version 3.01
2.13.1.3 Version 3.01 compared to version 3.0
2.13.1.4 Version 3.0 compared to version 2
2.14 Reporting bugs and comments
3 Getting started
3.1 Arlequin configuration
3.2 Preparing input files
3.2.1 Defining the Genetic Structure to be tested
3.3 Loading project files into Arlequin
3.4 Selecting analyses to be performed on your data
3.5 Creating and using Setting Files
3.6 Performing the analyses
3.7 Interrupting the computations
3.8 Checking the results
4 Input files
4.1 Format of Arlequin input files
4.2 Project file structure


4.2.1 Profile section
4.2.2 Data section
4.2.2.1 Haplotype list (optional)
4.2.2.2 Distance matrix (optional)
4.2.2.3 Samples
4.2.2.4 Genetic structure
4.2.2.5 Mantel test settings
4.3 Example of an input file
4.4 Automatically creating the outline of a project file
4.5 Conversion of data files
4.6 Arlequin batch files
5 Examples of input files
5.1 Example of allele frequency data
5.2 Example of standard data (Genotypic data, unknown gametic phase, recessive alleles)
5.3 Example of DNA sequence data (Haplotypic)
5.4 Example of microsatellite data (Genotypic)
5.5 Example of RFLP data (Haplotypic)
5.6 Example of standard data (Genotypic data, known gametic phase)
6 Arlequin interface
6.1 Menus
6.1.1 File Menu
6.1.2 View Menu
6.1.3 Options Menu
6.1.4 Help Menu
6.2 Toolbar
6.3 Tab dialogs
6.3.1 Open project
6.3.2 Handling of unphased genotypic data
6.3.3 Arlequin Configuration
6.3.4 Project Wizard
6.3.5 Import data
6.3.6 Loaded Project
6.3.7 Batch files
6.3.8 Calculation Settings
6.3.8.1 General Settings
6.3.8.2 Diversity indices
6.3.8.3 Mismatch distribution
6.3.8.4 Haplotype inference
6.3.8.4.1 Haplotypic data, or genotypic (diploid) data with known gametic phase
6.3.8.4.2 Genotypic data with unknown gametic phase
6.3.8.5 Linkage disequilibrium
6.3.8.5.1 Linkage disequilibrium between pairs of loci
6.3.8.5.2 Hardy-Weinberg equilibrium
6.3.8.6 Neutrality tests
6.3.8.7 Genetic structure


6.3.8.7.1 AMOVA
6.3.8.7.2 Detection of loci under selection
6.3.8.7.3 Population comparison
6.3.8.7.4 Population differentiation
6.3.8.8 Genotype assignment
6.3.8.9 Mantel test
7 Output files
7.1 Result file
7.2 Arlequin log file
7.3 Linkage disequilibrium result file
7.4 Allele frequencies
7.5 View results in your HTML browser
7.6 XML output file
7.6.1 Potential XML formatting problem with Firefox ver 3.x
7.6.2 Include graphics into the xml output file
7.6.3 Why use R to make graphs?
7.6.4 Example of R-lequin graphical outputs
7.6.4.1 Genetic diversity
7.6.4.1.1 Number of alleles per locus
7.6.4.1.2 Expected heterozygosity
7.6.4.1.3 Theta values
7.6.4.1.4 Theta(H) for microsatellite data
7.6.4.1.5 Allele size range at different loci (microsatellite data)
7.6.4.1.6 Garza-Williamson index (microsatellite data)
7.6.4.1.7 Modified Garza-Williamson index (microsatellite data)
7.6.4.2 Genetic distances between populations
7.6.4.2.1 Matrix of pairwise FSTs
7.6.4.2.2 Matrix of Reynolds coancestry coefficient
7.6.4.2.3 Slatkin's linearized FSTs
7.6.4.2.4 Average number of pairwise differences within and between populations
7.6.4.2.5 Model of population divergence allowing for unequal derived population size
7.6.4.3 Matrix of molecular distance between haplotypes
7.6.4.4 Matrix of molecular distances between gene copies within and between populations (phase known only)
7.6.4.5 Matrix of molecular distances between haplotypes within populations
7.6.4.6 Haplotype frequencies within population
7.6.4.7 Haplotype frequencies in populations
7.6.4.8 Mismatch distribution
7.6.4.8.1 Demographic expansion
7.6.4.8.2 Spatial expansion
7.6.4.9 Population assignment test
7.6.4.10 Detection of loci under selection
8 Methodological outlines
8.1 Intra-population level methods
8.1.1 Standard diversity indices


8.1.1.1 Gene diversity
8.1.1.2 Expected heterozygosity per locus
8.1.1.3 Number of usable loci
8.1.1.4 Number of polymorphic sites (S)
8.1.1.5 Allelic range (R)
8.1.1.6 Garza-Williamson index (G-W)
8.1.2 Molecular indices
8.1.2.1 Mean number of pairwise differences (π)
8.1.2.2 Nucleotide diversity or average gene diversity over L loci
8.1.2.3 Theta estimators
8.1.2.3.1 Theta(Hom)
8.1.2.3.2 Theta(S)
8.1.2.3.3 Theta(k)
8.1.2.3.4 Theta(π)
8.1.2.4 Mismatch distribution
8.1.2.4.1 Pure demographic expansion
8.1.2.4.2 Spatial expansion
8.1.2.5 Estimation of genetic distances between DNA sequences
8.1.2.5.1 Pairwise difference
8.1.2.5.2 Percentage difference
8.1.2.5.3 Jukes and Cantor
8.1.2.5.4 Kimura 2-parameters
8.1.2.5.5 Tamura
8.1.2.5.6 Tajima and Nei
8.1.2.5.7 Tamura and Nei
8.1.2.6 Estimation of genetic distances between RFLP haplotypes
8.1.2.6.1 Number of pairwise difference
8.1.2.6.2 Proportion of difference
8.1.2.7 Estimation of distances between Microsatellite haplotypes
8.1.2.7.1 No. of different alleles
8.1.2.7.2 Sum of squared size difference
8.1.2.8 Estimation of distances between Standard haplotypes
8.1.2.8.1 Number of pairwise differences
8.1.2.9 Minimum Spanning Network among haplotypes
8.1.3 Haplotype inference
8.1.3.1 Haplotypic data or Genotypic data with known Gametic phase
8.1.3.2 Genotypic data with unknown Gametic phase
8.1.3.2.1 EM algorithm
8.1.3.2.2 EM zipper algorithm
8.1.3.2.3 ELB algorithm
8.1.4 Linkage disequilibrium between pairs of loci
8.1.4.1 Exact test of linkage disequilibrium (haplotypic data)
8.1.4.2 Likelihood ratio test of linkage disequilibrium (genotypic data, gametic phase unknown)
8.1.4.3 Measures of gametic disequilibrium (haplotypic data)
8.1.5 Hardy-Weinberg equilibrium
8.1.6 Neutrality tests


8.1.6.1 Ewens-Watterson homozygosity test
8.1.6.2 Ewens-Watterson-Slatkin exact test
8.1.6.3 Chakraborty's test of population amalgamation
8.1.6.4 Tajima's test of selective neutrality
8.1.6.5 Fu's FS test of selective neutrality
8.2 Inter-population level methods
8.2.1 Population genetic structure inferred by analysis of variance (AMOVA)
8.2.1.1 Haplotypic data, one group of populations
8.2.1.2 Haplotypic data, several groups of populations
8.2.1.3 Genotypic data, one group of populations, no within-individual level
8.2.1.4 Genotypic data, several groups of populations, no within-individual level
8.2.1.5 Genotypic data, one population, within-individual level
8.2.1.6 Genotypic data, one group of populations, within-individual level
8.2.1.7 Genotypic data, several groups of populations, within-individual level
8.2.2 Minimum Spanning Network (MSN) among haplotypes
8.2.3 Locus-by-locus AMOVA
8.2.4 Population pairwise genetic distances
8.2.4.1 Reynolds' distance (Reynolds et al. 1983)
8.2.4.2 Slatkin's linearized FSTs (Slatkin 1995)
8.2.4.3 M values (M = Nm for haploid populations, M = 2Nm for diploid populations)
8.2.4.4 Nei's average number of differences between populations
8.2.4.5 Genetic distance (δμ)² (microsatellite data only)
8.2.4.6 Relative population sizes - Divergence between populations of unequal sizes
8.2.5 Exact tests of population differentiation
8.2.6 Assignment of individual genotypes to populations
8.2.7 Mantel test
8.2.8 Detection of loci under selection from F-statistics
8.2.8.1 Island model (FDIST approach)
8.2.8.2 Hierarchical island model
9 References
10 Appendix
10.1 Overview of input file keywords



2 INTRODUCTION
2.1 Why Arlequin?
Arlequin is the French translation of "Arlecchino", a famous character of the Italian
"Commedia dell'Arte". As a character he has many aspects, but he has the ability to
switch among them very easily according to his needs and to necessities. This
polymorphic ability is symbolized by his colorful costume, from which the Arlequin icon
was designed.
2.2 Arlequin philosophy
The goal of Arlequin is to provide the average user in population genetics with quite a
large set of basic methods and statistical tests, in order to extract information on genetic
and demographic features of a collection of population samples.
The graphical interface is designed to allow users to rapidly select the different analyses
they want to perform on their data. We felt it important to be able to explore the data and
to analyze the same data set several times from different perspectives, with different
selected options.
The statistical tests implemented in Arlequin have been chosen so as to minimize
hidden assumptions and to be as powerful as possible. Thus, they often take the form of
either permutation tests or exact tests, with some exceptions.
Finally, we wanted Arlequin to be able to handle genetic data under many different
forms, and to try to carry out the same types of analyses irrespective of the format of
the data.
Because Arlequin has a rich set of features and many options, the user has
to spend some time learning them. However, we hope that the learning curve will not
be that steep.
Arlequin is made available free of charge, as long as we have enough local resources to
support the development of the program.
2.3 About this manual
The main purpose of this manual is to allow you to use Arlequin on your own, in order
to limit as far as possible e-mail exchange with us.
In this manual, we have tried to provide a description of
1) The data types handled by Arlequin
2) The way these data should be formatted before the analyses
3) The graphical interface
4) Output files
5) The impact of different options on the computations
6) Methodological outlines describing which computations are actually performed by
Arlequin.
Even though this manual contains the description of some theoretical aspects, it should
not be considered as a textbook in basic population genetics. We strongly recommend
that you consult the original references provided with the description of a given
method if you are in doubt about any aspect of the analysis.
2.4 Data types handled by Arlequin
Arlequin can handle several types of data either in haplotypic or genotypic form. The
basic data types are:
DNA sequences
RFLP data
Microsatellite data
Standard data
Allele frequency data
By haplotypic form we mean that genetic data can be presented under the form of
haplotypes (i.e. a combination of alleles at one or more loci). This haplotypic form can
result from the analyses of haploid genomes (mtDNA, Y chromosome, prokaryotes), or
from diploid genomes where the gametic phase could be inferred by one way or another.
Note that allelic data are treated here as a single locus haplotype.

Ex 1: Haplotypic RFLP data : 100110100101001010
Ex 2: Haplotypic standard HLA data : DRB1*0101 DQB1*0102 DPB1*0201

By genotypic form, we mean that genetic data is presented under the form of diploid
genotypes (i.e. a combination of pairs of alleles at one or more loci). Each genotype is
entered on two separate lines, with the two alleles of each locus being on a different line.

Ex 1: Genotypic DNA sequence data:
ACGGCATTTAAGCATGACATACGGATTGACA
ACGGGATTTTAGCATGACATTCGGATAGACA

Ex 2: Genotypic Microsatellite data:
63 24 32
62 24 30


The gametic phase of a multi-locus genotype may be either known or unknown. If the
gametic phase is known, the genotype can be considered as made up of two well-defined
haplotypes. For genotypic data with unknown gametic phase, you can consider the two
alleles present at each locus as codominant, or you can allow for the presence of a
recessive allele. This finally gives four possible forms of genetic data:
Haplotypic data,
Genotypic data with known gametic phase,
Genotypic data with unknown gametic phase (no recessive alleles),
Genotypic data with unknown gametic phase (recessive alleles).
2.4.1 DNA sequences
Arlequin can accommodate DNA sequences of arbitrary length. Each nucleotide is
considered as a distinct locus. The four nucleotides "C", "T", "A", "G" are considered as
unambiguous alleles for each locus, and the "-" is used to indicate a deleted nucleotide.
Usually the question mark "?" codes for an unknown nucleotide. The following notations
for ambiguous nucleotides are also recognized:
R: A/G (purine)
Y: C/T (pyrimidine)
M: A/C
W: A/T
S: C/G
K: G/T
B: C/G/T
D: A/G/T
H: A/C/T
V: A/C/G
N: A/C/G/T
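
For readers who pre-process their own sequences, the code table above can be written as a small Python lookup. This is only a sketch of the notation: the names `IUPAC`, `possible_bases`, and `compatible` are hypothetical, the "-" and "?" symbols are deliberately left out, and nothing here describes how Arlequin itself treats ambiguous nucleotides in its computations.

```python
# Nucleotide symbols and the bases they may stand for, as listed above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "W": "AT", "S": "CG", "K": "GT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def possible_bases(symbol):
    """Return the set of nucleotides a symbol may represent."""
    return set(IUPAC[symbol.upper()])


def compatible(symbol_1, symbol_2):
    """True if the two symbols share at least one possible nucleotide."""
    return bool(possible_bases(symbol_1) & possible_bases(symbol_2))
```

For instance, `compatible("R", "A")` is true because a purine may be an A, whereas `compatible("R", "Y")` is false because purines and pyrimidines share no nucleotide.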
2.4.2 RFLP Data
Arlequin can handle RFLP haplotypes of arbitrary length. Each restriction site is
considered as a distinct locus. The presence of a restriction site should be coded as a "1",
and its absence as a "0". The "-" character should be used to denote the deletion of a
site, not its absence due to a point mutation.
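
The coding rule above can be checked before a file is loaded. The following Python sketch is an illustration only: `summarize_rflp` is a hypothetical helper, and restricting the alphabet to "1", "0" and "-" is an assumption made here (a missing-data symbol, if used, would need to be added).

```python
def summarize_rflp(haplotype):
    """Validate an RFLP haplotype string and count its site states."""
    unexpected = sorted(set(haplotype) - set("01-"))
    if unexpected:
        raise ValueError(f"unexpected characters: {unexpected}")
    return {
        "sites": len(haplotype),          # one locus per restriction site
        "present": haplotype.count("1"),  # site present
        "absent": haplotype.count("0"),   # site absent (point mutation)
        "deleted": haplotype.count("-"),  # site deleted
    }


# The haplotypic RFLP example given earlier in this section.
summary = summarize_rflp("100110100101001010")
```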
2.4.3 Microsatellite data
The raw data consist here of the allelic state of one or an arbitrary number of
microsatellite loci. For each locus, one should provide the number of repeats of the
microsatellite motif as the allelic definition, if one wants the data to be analyzed
according to the step-wise mutation model (for the analysis of genetic structure). It may
occur that the absolute number of repeats is unknown. If the difference in length
between amplified products is the direct consequence of changes in repeat numbers,
then the minimum length of the amplified product could serve as a reference, allowing one to
code the other alleles in terms of additional repeats as compared to this reference. If this
strategy is impossible, then any other number could be used as an allelic code, but the
stepwise mutation model could not be assumed for these data.
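
The reference-length strategy described above can be sketched in Python as follows. The function name `lengths_to_repeat_codes`, the example fragment lengths, and the choice of coding the reference allele as 0 are all assumptions made for illustration; any constant could be added to the codes, since only differences in repeat number are preserved by this coding.

```python
def lengths_to_repeat_codes(lengths_bp, motif_bp):
    """Code alleles as additional repeats relative to the shortest product."""
    reference = min(lengths_bp)
    codes = []
    for length in lengths_bp:
        extra = length - reference
        if extra % motif_bp != 0:
            # The length difference is not a whole number of repeats, so the
            # stepwise coding described in the text does not apply.
            raise ValueError(f"{length} bp is not a whole number of repeats "
                             f"away from the {reference} bp reference")
        codes.append(extra // motif_bp)
    return codes


# Hypothetical amplified product lengths (bp) for a tetranucleotide motif.
codes = lengths_to_repeat_codes([120, 124, 132, 120], motif_bp=4)
```

When the function raises an error, the lengths cannot be expressed as repeat differences, which corresponds to the case where the stepwise mutation model should not be assumed.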
2.4.4 Standard data
Standard data are data for which the molecular basis of the polymorphism is not particularly
defined, or for which different alleles are considered as mutationally equidistant from each other.
Standard data haplotypes are thus compared for their content at each locus, without
taking special care about the nature of the alleles, which can be either similar or
different. For instance, HLA data (human MHC) enters the category of standard data.
2.4.5 Allele frequency data
The raw data consist of only allele frequencies (single-locus treatment only), so that
no haplotypic information is needed for such data. Population samples are then only
compared for their allelic frequencies.


2.5 Methods implemented in Arlequin
The analyses Arlequin can perform on the data fall into two main categories:
intra-population and inter-population methods. In the first category statistical information is
extracted independently from each population, whereas in the second category, samples
are compared to each other.
Intra-population methods:             Short description:
Standard indices                      Some diversity measures like the number of
                                      polymorphic sites, gene diversity.
Molecular diversity                   Calculates several diversity indices like
                                      nucleotide diversity, different estimators of the
                                      population parameter θ.
Mismatch distribution                 The distribution of the number of pairwise
                                      differences between haplotypes, from which
                                      parameters of a demographic (NEW) or spatial
                                      population expansion can be estimated.
Haplotype frequency estimation        Estimates the frequency of haplotypes present
                                      in the population by maximum likelihood
                                      methods.
Gametic phase estimation (NEW)        Estimates the most likely gametic phase of
                                      multi-locus genotypes using a pseudo-Bayesian
                                      approach (ELB algorithm).
Linkage disequilibrium                Test of non-random association of alleles at
                                      different loci.
Hardy-Weinberg equilibrium            Test of non-random association of alleles
                                      within diploid individuals.
Tajima's neutrality test              Test of the selective neutrality of a random
(infinite site model)                 sample of DNA sequences or RFLP haplotypes
                                      under the infinite site model.
Fu's FS neutrality test               Test of the selective neutrality of a random
(infinite site model)                 sample of DNA sequences or RFLP haplotypes
                                      under the infinite site model.
Ewens-Watterson neutrality test       Tests of selective neutrality based on Ewens
(infinite allele model)               sampling theory under the infinite alleles
                                      model.
Chakraborty's amalgamation test       A test of selective neutrality and population
(infinite allele model)               homogeneity. This test can be used when
                                      sample heterogeneity is suspected.
Minimum Spanning Network (MSN)        Computes a Minimum Spanning Tree (MST)
                                      and Network (MSN) among haplotypes. This
                                      tree can also be computed for all the
                                      haplotypes found in different populations if
                                      activated under the AMOVA section.




Inter-population methods:             Short description:
Search for shared haplotypes          Comparison of population samples for their
between populations                   haplotypic content. All the results are then
                                      summarized in a table.
AMOVA                                 Different hierarchical Analyses of Molecular
                                      Variance to evaluate the amount of population
                                      genetic structure.
Pairwise genetic distances            FST based genetic distances for short
                                      divergence time.
Exact test of population              Test of non-random distribution of haplotypes
differentiation                       into population samples under the hypothesis
                                      of panmixia.
Assignment test of genotypes          Assignment of individual genotypes to
                                      particular populations according to estimated
                                      allele frequencies.
Detection of loci under selection     Detection of loci under selection by the
from F-statistics                     examination of the joint distribution of FST
                                      and heterozygosity under a hierarchical island
                                      model.

Mantel test:                          Short description:
Correlations or partial correlations  Can be used to test for the presence of
between a set of 2 or 3 matrices      isolation-by-distance.
2.6 System requirements
Windows XP/Vista/7.
A minimum of 256 MB RAM, and more to avoid swapping.
At least 50 MB free hard disk space.
2.7 Installing and uninstalling Arlequin
2.7.1 Installation
2.7.1.1 Arlequin 3.5 installation
1) Download Arlequin35.zip to any temporary directory.
2) Extract all files contained in Arlequin35.zip in the directory of your choice.
3) Start Arlequin by double-clicking on the file WinArl35.exe, which is the main
executable file.


2.7.1.2 Arlequin 3.5 uninstallation
Simply delete the directory where you installed Arlequin. The Windows registry is not
modified by the installation of Arlequin.
2.8 List of files included in the Arlequin package

Files                     Description

Arlequin files

WinArl35.exe              Arlequin main application file including
                          graphical interface and computational
                          routines.

Arlequin.ini              A file containing the description of the last
                          custom settings defined by the user. (NOT TO
                          BE MODIFIED BY HAND)

Arl_run.ars               A file containing all the computation settings
                          selected by the user to perform some
                          calculation with Arlequin. (NOT TO BE
                          MODIFIED BY HAND)

Arl_run.txt               A file containing information about the Arlequin
                          working directory and the path to the working project
                          file. (NOT TO BE MODIFIED BY HAND)

recent_pro.txt            A file containing the list of up to the last ten
                          projects loaded into Arlequin. (NOT TO BE
                          MODIFIED BY HAND)

ua.js and ftiens4.js      ua.js and ftiens4.js contain the JavaScript code
                          that allows the browsing of the result HTML
                          files. These scripts need the gif files.

14 gif files              These gif files are used by the JavaScript code for
                          graphical display in the main result html file.

Qtinf.dll                 A dynamic link library necessary for the
                          display of graphical components of the
                          application.

ArlequinStyleSheet.xsl    Extensible Stylesheet for the formatting of the
                          Arlequin xml result file.

Arlequin35.pdf            Arlequin 3.5 user manual in PDF format.

Various Arlequin example files are found in subdirectory Example Files
R scripts to produce graphics in output files are found in subdirectory Rfunctions



2.9 Arlequin computing limitations
The amount of data that Arlequin can handle mostly depends on the memory available on
your computer. However, a few parameters are limited to values within the range shown
below.

Portions of Arlequin concerned      Limited parameter       Maximum value
by the limitations

Ewens-Watterson and Chakraborty's   Sample size             2,000
neutrality tests
Ewens-Watterson and Chakraborty's   Number of haplotypes    1,000
neutrality tests
DNA sequence                        Maximum length          249,000

Other limitations:
Line length in input file is limited to 250,000 characters.
Interleaved format is not supported in Arlequin. This concerns haplotype
definition, multilocus genotypes, and distance matrices.
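
The line-length limit can be checked before a project is loaded. The Python sketch below is not an Arlequin feature: the names `MAX_LINE_LENGTH` and `find_overlong_lines` are hypothetical, and it assumes that the line terminator does not count towards the 250,000 characters.

```python
MAX_LINE_LENGTH = 250_000  # limit stated above


def find_overlong_lines(lines, limit=MAX_LINE_LENGTH):
    """Return the 1-based numbers of the lines longer than the limit."""
    overlong = []
    for number, line in enumerate(lines, start=1):
        # Ignore the trailing newline characters when measuring the line.
        if len(line.rstrip("\r\n")) > limit:
            overlong.append(number)
    return overlong


# A toy input: only the second line exceeds the limit.
example_lines = ["ACGT\n", "A" * 250_001 + "\n", "A" * 250_000]
too_long = find_overlong_lines(example_lines)
```

The function accepts any iterable of lines, so an open text file can be passed to it directly.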
2.10 How to cite Arlequin
A manuscript describing the new functionalities of Arlequin ver 3.5 is in preparation. Until
it is out, please cite:
Excoffier, L., G. Laval, and S. Schneider (2005) Arlequin ver. 3.0: An integrated software
package for population genetics data analysis. Evolutionary Bioinformatics Online 1:47-50.
2.11 Acknowledgements
This program has been made possible by Swiss NSF grants No. 32-37821-93,
32.047053.96, and 31-56755.99.
Stefan Schneider, David Roessli, and Jean-Marc Kuffer have been involved in the
development of versions 1 and 2, contributing very significantly to several of its
components.
The following people of the CMPG lab have also detected many bugs in development and
released versions: Nicolas Ray, Samuel Neuenschwander, Daniel Wegmann, Carlo
Largiadèr, Pierre Berthier, Mathias Currat, Guillaume Laval, Isabelle Dupanloup, Tamara
Hofer, Martin Fisher, Gerald Heckel, and Benjamin Peter.
Friends and colleagues have also provided useful comments and suggestions. We would
like to thank Yannis Michalakis, Montgomery Slatkin, David Balding, Peter Smouse, Oscar
Gaggiotti, Giorgio Bertorelle, Guido Barbujani, Michele Belledi, Evelyne Heyer, Philippe
Jarne, Manuel Ruedi, Peter de Knijff, Peter Beerli, Matthew Hurles, Mark Stoneking,
Rosalind Harding, Steve Carr, John Novembre, Nelson Fagundes, Eric Minch, Pierre Darlu,
Jérôme Goudet, François Balloux, Eric Petit, Ettore Randi, and Sergey Gavrilets.
Finally, we would like to thank all the other beta-testers and users of Arlequin who have
sent us their comments and detected sometimes serious bugs.
2.12 How to get the last version of the Arlequin software?
Arlequin will be updated regularly and can be freely retrieved on
http://cmpg.unibe.ch/software/arlequin3
2.13 What's new in version 3.5
Compared to version 3.11, Arlequin 3.5 includes several bug corrections, additions of new
computations, and several significant improvements. The main improvement is its
interfacing with the R statistical package, allowing one to produce high-quality graphs of
many results found in the result files. We also introduce new console versions of Arlequin
for both Windows and Linux.
Additions:
New procedure to detect loci under selection from hierarchical F-statistics, as
  implemented in Excoffier et al. (2009).
Computation of allele frequencies at all loci for all populations, which are output in
  locus-specific files.
Computation of the genetic distance (δμ)² for microsatellite data.
Possibility to output results as an XML file with a dedicated style sheet.
R-lequin:
  o Development of R functions to parse the XML output file and produce
    publication-quality graphics.
  o Graphics can be directly embedded into the XML result file below result
    tables.
  o R functions can be modified by the user to customize graphics.
Console version of Arlequin, arlecore, for Windows and Linux, allowing the
  analysis of a large number of files with bash scripts.
Modified console version of Arlequin, called arlsumstat, for Windows and Linux, to
  compute specific summary statistics for each project.
Modifications:
All computations can now be performed at the group level, by automatically
  pooling all population samples from a given group defined in the [STRUCTURE]
  section into a single artificial population.
Maximum number of characters in an input line is now 250,000, which limits the
  maximum size of, say, DNA sequences that can be read.
Removed the computation of population-specific FSTs.
Changed the order of the presentation of the results. The output now begins with the
  intra-population computations, followed by the inter-population computations.
Individuals with partially missing data at a given locus are now excluded in the
  locus-by-locus AMOVA analysis when taking the individual level into account.
Bug corrections:
In the summary statistics, the reported mean number of alleles was zero when
  there was a single monomorphic locus.
LocusSeparator = None was not recognized (NONE was needed).
Chakraborty's neutrality test: there was an overflow when the number of alleles
  was larger than 265. Larger numbers of alleles are now possible.
In the molecular diversity summary table, the number of sites with transversions
  was incorrectly reported as the number of sites with transitions.
The total number of polymorphic sites reported in the summary statistics table was not
  really the total number of polymorphic sites that would be computed on the
  pooled populations. It was rather the total number of sites that were found
  polymorphic within populations.
Errors when computing average summary statistics within samples, if some loci
  were monomorphic in some populations.
Wrong computations of standard deviations of some summary statistics
  (Garza-Williamson, modified Garza-Williamson, total range) and Theta(H) for
  microsatellite data.
Option to use associated settings did not work anymore.
Error when computing statistics within groups and within samples when DNA
  sequences contained white spaces and LocusSeparator was set to Whitespace.
No message was issued when a population contained only missing data at a given
  locus and one was attempting to perform a locus-by-locus analysis. The locus was
  just not listed in the locus-by-locus AMOVA. Now, a warning message is issued.
Bad handling of diploid individuals having partially missing data (on one
  chromosome only) when one attempts to compute locus-by-locus AMOVA with
  individual level (FIS and FIT).
Setting file (arl_run.ars) was always saved in the Arlequin directory instead of the
  directory chosen in the dialog box.
It was impossible to compute the expected mismatch distribution under the
  demographic and the range expansion models at the same time.
Mantel test was not performed when a custom Ymatrix was provided.
When there was a single polymorphic microsatellite locus, the reported average
  Garza-Williamson statistic was the number of loci.
Since version 3.0, the name of external files containing information on Distance
  matrix, Haplotype List, or Sample Data, could not contain an absolute path. This
  is now possible again.
2.13.1 Changes introduced in previous releases
2.13.1.1 Version 3.11 compared to version 3.1
Compared to version 3.1, Arlequin 3.11 is mainly an update of ver 3.1, and there was no
new manual.
Bug corrections:
1. Significance level of FSC and Var(b). The p-value associated with the variance
component due to differences between populations within groups was erroneously
computed when the number of samples in the genetic structure to test was
identical to the total number of samples defined in the Samples section but the
order of the samples in the Genetic Structure section was different from that in
the Samples section. This bug has been around since the first release of Arlequin
2.0... Thanks to Romina Piccinali for finding it.
2. The expected homozygosity reported in the Ewens-Watterson test in the samples
summary section was that of the last simulated sample. The correct value was
reported only if no permutations were done.
3. The total number of alleles reported in the statistics summary section also included
the missing data allele.
4. The population labels were incorrectly reported when computing
population-specific FIS statistics. The reported order corresponded to that of the last
permutation. The population labels were only correct when the significance of the
global FIS statistic was not tested. Thanks to Jeff Lozier for finding this bad bug.
Modifications:
Mean expected heterozygosity and mean allele number are reported over
polymorphic sites in the Sample section, while they are reported over all loci in
the statistics summaries at the end of the result file.
Additions:
Sample allele frequencies can now be output in locus-specific files, if this option is
selected in the Molecular Diversity tab. Locus-specific files are output in the
Arlequin project result directory.
2.13.1.2 Version 3.1 compared to version 3.01
Arlequin 3.1 includes some bug corrections, some improvements and additional features:
Improvements:
Locus-by-locus AMOVA can now be performed independently from conventional AMOVA. This can lead to faster computations for large sample sizes and large numbers of population samples.
Faster routines to handle long DNA sequences or large numbers of microsatellites.
Faster reading of input files.
Faster computation of demographic parameters from the mismatch distribution.
Improved convergence of the least-squares fitting algorithm.
Additions:
Computation of population-specific inbreeding coefficients and of their significance level.
Computation of the number of alleles as well as observed and expected heterozygosity per locus.
Computation of the Garza-Williamson statistic for MICROSAT data.
In batch mode, the summary file (*.sum) now reports the name of the analyzed file as well as the name of the analyzed population sample.
When saving current settings, users are now asked to choose a file name. Default is "project file name".ars.
New sections are provided at the end of the result file, in order to report summary statistics computed over all populations:
   o Basic properties of the samples (size, no. of loci, etc.)
   o Heterozygosity per locus
   o Number of alleles + total no. of alleles over all pops
   o Allelic range + total allelic range over all pops (for microsatellite data)
   o Garza-Williamson index (for microsatellite data)
   o Number of segregating sites + total over all pops
   o Molecular diversity indices (theta values)
   o Neutrality tests summary statistics and p-values
   o Demographic parameters estimated from the mismatch distribution and p-values.
New shortcuts are provided in the left pane of the html result file for F-statistics bootstrap confidence intervals, population-specific FIS, and summary of intra-population statistics.
2.13.1.3 Version 3.01 compared to version 3.0
Arlequin 3.01 includes some bug corrections and some additional features:
Additions:
New editor of genetic structure allowing one to modify the current Genetic
Structure directly in the graphical interface (see section 3.2.1, Defining the Genetic
Structure to be tested).
Computation of population-specific FST indices, when a single group is defined
in the Genetic Structure. This may be useful to recognize populations contributing
particularly to the global FST measure. This is also available in the locus-by-locus
AMOVA section (discontinued in ver 3.5).
2.13.1.4 Version 3.0 compared to version 2
Arlequin version 3 now integrates the core computational routines and the interface in a
single program written in C++. Therefore Arlequin does not rely on Java anymore. This
has two consequences: the new graphical interface is nicer and faster, but it is less
portable than before. At the moment we release a Windows version (2000, XP, and
above) and we shall probably release a Linux version later. Support for the Mac has been
discontinued.
Other main changes include:
1. Correction of many small bugs.
2. Incorporation of two new methods to estimate gametic phase and haplotype
frequencies:
   a. EM zipper algorithm: an extension of the EM algorithm allowing one to
   handle a larger number of polymorphic sites than the plain EM algorithm.
   b. ELB algorithm: a pseudo-Bayesian approach to specifically estimate
   gametic phase in recombining sequences.
3. Incorporation of a least-squares approach to estimate the parameters of an
instantaneous spatial expansion from DNA sequence diversity within samples, and
computation of bootstrap confidence intervals using coalescent simulations.
4. Estimation of confidence intervals for F-statistics, using a bootstrap approach
when genetic data on more than 8 loci are available.
5. Update of the JavaScript routines in the output html files, making them fully
compatible with Firefox 1.X.
6. A completely rewritten and more robust input file parsing procedure, giving more
precise information on the location of potential syntax and format mistakes.
7. Use of the ELB algorithm described above to generate samples of phased
multi-locus genotypes, which allows one to analyse unphased multi-locus genotype data
as if the phase was known. The phased data sets are output in Arlequin projects
that can be analysed in batch mode to obtain the distribution of statistics taking
phase uncertainty into account.
8. No need to define a web browser for consulting the results. Arlequin will
automatically present the results in your default web browser (we recommend the
use of Firefox, freely available on
http://www.mozilla.org/products/firefox/central.html).
2.14 Reporting bugs and comments
Problems about Arlequin computations and interface can be reported to
laurent.excoffier@iee.unibe.ch. Problems concerning graphical outputs (R-lequin) can be
reported to heidi.lischer@iee.unibe.ch.


3 GETTING STARTED
The first thing to do before running Arlequin for the first time is certainly to read the
present manual. It will provide you with most of the information you are looking for.
So, take some time to read it before you seriously start analyzing your data.
3.1 Arlequin configuration

Before a first use of Arlequin, you need to specify which text editor will be used by
Arlequin to edit project files or view the log file. We recommend the use of a powerful
text editor like TextPad, freely available on http://www.textpad.com.
3.2 Preparing input files
The first step for the analysis of your data is to prepare an input data file for Arlequin.
This input file is called here a project file. As Arlequin is quite a versatile program able to
analyze several data types, you have to include some information about the properties of
your data in the project file together with the raw data.

There are two ways to create Arlequin projects:
1) You can start from scratch and use a text editor to define your data using
reserved keywords.
2) You can let Arlequin create the outline of a project by selecting the tab panel
Project Wizard (see section 6.3.4, Project Wizard).

The controls on this tab panel allow you to specify the type of project outline that should
be built. Use the Browse button to choose a name and a hard disk location for the
project. Once all the settings have been chosen, the project outline is created by pressing
the "Create Project" button. Note that it is not automatically loaded into Arlequin. The
name of the data file should have a "*.arp" extension (for ARlequin Project). You can
then edit the project by pressing the Edit Project button.
Note that this wizard only creates an outline and that you need to fill in the
data manually and specify your genetic structure.

3.2.1 Defining the Genetic Structure to be tested

A new Genetic Structure Editor has been implemented in version 3.01. In the left
pane, all population samples found in the opened project are listed in the right column,
with a corresponding group identifier in the left column. If no Genetic Structure is
defined, the "0" identifier will be listed. In the right pane, the resulting structure is
shown.
Population samples can be assigned to different groups by giving them a new group
identifier, like:


By pressing the "Update Project" button, this new Structure will be added to the
project file, a backup copy of the old project will be created (with the extension
*.arp.bak), and the new revised project will be reloaded into Arlequin.
3.3 Loading project files into Arlequin
Once the project file is built, you must load it into Arlequin. You can do this either by
activating the menu File | Open project, by clicking on the Open project button on the
toolbar, or by activating the File | Recent projects menu.

A dialog box should open to allow the selection of an existing project you want to work
on, like











The Arlequin project files must have the *.arp extension. If your project file is valid, its
main properties will be visible in the Project tab, as shown below:



3.4 Selecting analyses to be performed on your data
Different analyses can be selected and their parameters tuned in the Settings tab.

You can navigate in the tree on the left side to select the different types of computations you
wish to set up. Depending on your selection, the right part of the tab dialog will
show you different parameters to set up.
3.5 Creating and using Setting Files
By settings we mean any alternative choice of analyses and their parameters that can be
set up in Arlequin. As you can choose different types of analyses, as well as different
options for each of these analyses, all these choices can be saved into setting files. These
files generally take the same name as the project files, but with the extension *.ars.
Setting files can be created at any time of your work by clicking on the Save button on
top of the settings tree. Alternatively, if you activate the Use associated settings option in the
Arlequin configuration pane (see section 3.1, Arlequin configuration), the settings last
used on this project will be automatically saved when you close the project and
reloaded when you open it again later. The settings are stored in a file having the same name
as the project file, and the .ars extension. These setting files are convenient when you
want to repeat some analyses done previously, or when you want to make different types
of computations on several projects, as is possible using batch files (see section 4.6,
Batch files). This gives you considerable flexibility on the analyses you can perform, and
avoids tedious and repetitive mouse clicks.
3.6 Performing the analyses
The selected analyses can be performed by clicking on the Start button.

If an error occurs during the execution, Arlequin will write diagnostic information in a
log file. If the error is not too severe, Arlequin will open the web browser where you can
consult the log file. If there is a memory error, Arlequin will shut itself down. In the
latter case, you should consult the Arlequin log file before launching a new analysis in
order to get some information on where or at which stage of the execution the problem
occurred. To do that, just reopen your last project, and press the View Log File
button on the toolbar above. In any case, the file Arlequin_log.txt is located in the
project results directory.
3.7 Interrupting the computations
The computations can be stopped at any time by pressing either the Pause or the Stop
button on the toolbar.

After pressing the Pause button, computations can be resumed by pressing the
Resume button.

Note that by pressing the Stop button you have no guarantee that the current
computations give correct results. For very large project files, you may have to wait
for a few seconds before the calculations are stopped.
3.8 Checking the results
When the calculations are over, Arlequin will create a result directory, which has the
same name as the project file, but with the *.res extension. This directory contains all
the result files, particularly the main result file with the same name as the project file,
but with the *.htm or *.xml extension depending on the option defined in the Option
menu. After the computations, the result file [project name]_main.html is automatically
loaded in the default html browser. You can also view your results at any time by
clicking on the View results button.


4 INPUT FILES
4.1 Format of Arlequin input files
Arlequin input files are also called project files. The project files contain the description of
the properties of the data, as well as the raw data themselves. The project file may also
refer to one or more external data files.
Note that comments beginning with a "#" character can be put anywhere in the Arlequin
project files. Everything that follows the "#" character on a line will be ignored by
Arlequin.
Also note that Arlequin does not support interleaved data, implying that haplotypes,
multi-locus genotypes, as well as entire rows of distance matrices must be entered on a
single line. A maximum of 100,000 characters can be entered on each line.
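
The two rules above (comments start at "#", and a line holds at most 100,000 characters) can be checked before loading a project. The following Python sketch is only an illustration and is not part of Arlequin: the function names strip_comment and check_project_lines are hypothetical, and the sketch assumes that "#" always starts a comment, as stated above.

```python
MAX_LINE_LENGTH = 100_000  # per-line limit stated in this section


def strip_comment(line):
    """Drop everything from the first '#' onwards, plus trailing blanks."""
    return line.split("#", 1)[0].rstrip()


def check_project_lines(lines):
    """Return (cleaned_lines, problems) for an iterable of raw project lines.

    cleaned_lines holds the non-empty lines without comments;
    problems holds (line_number, message) pairs for over-long lines.
    """
    cleaned, problems = [], []
    for number, line in enumerate(lines, start=1):
        if len(line.rstrip("\n")) > MAX_LINE_LENGTH:
            problems.append((number, "line exceeds 100,000 characters"))
        content = strip_comment(line)
        if content:
            cleaned.append(content)
    return cleaned, problems
```

Running it on the lines of a project file before opening the file in Arlequin gives the line numbers of any over-long line, which may point to a haplotype or matrix row that needs to be checked.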
4.2 Project file structure
Input files are structured into two main sections with additional subsections that must
appear in the following order:
1) Profile section (mandatory)
2) Data section (mandatory)
   2a) Haplotype list (optional)
   2b) Distance matrices (optional)
   2c) Samples (mandatory)
   2d) Genetic structure (optional)
   2e) Mantel tests (optional)

We now describe the content of each (sub-)section in more detail.
4.2.1 Profile section
The properties of the data must be described in this section. The beginning of the profile
section is indicated by the keyword [Profile] (within brackets).
One must also specify:
The title of the current project (used to describe the current analysis)
   Notation: Title=
   Possible value: Any string of characters within double quotes
   Example: Title="An analysis of haplotype frequencies in 2 populations"
The number of samples or populations present in the current project
   Notation: NbSamples =
   Possible values: Any integer number between 1 and 1000.
   Example: NbSamples =3
The type of data to be analyzed. Only one type of data is allowed per project
   Notation: DataType =
   Possible values: DNA, RFLP, MICROSAT, STANDARD and FREQUENCY
   Example: DataType = DNA
If the current project deals with haplotypic or genotypic data
   Notation: GenotypicData =
   Possible values: 0 (haplotypic data), 1 (genotypic data)
   Example: GenotypicData = 0
One can also optionally specify:
The character used to separate the alleles at different loci (the locus separator)
   Notation: LocusSeparator =
   Possible values: WHITESPACE, TAB, NONE, or any character other than "#" or the
   character specifying missing data.
   Example: LocusSeparator = TAB
   Default value: WHITESPACE
If the gametic phase of genotypes is known
   Notation: GameticPhase =
   Possible values: 0 (gametic phase not known), 1 (known gametic phase)
   Example: GameticPhase = 1
   Default value: 1
If the genotypic data present a recessive allele
   Notation: RecessiveData =
   Possible values: 0 (co-dominant data), 1 (recessive data)
   Example: RecessiveData =1
   Default value: 0
The code for the recessive allele
   Notation: RecessiveAllele =
   Possible values: Any string of characters within double quotes. This string can be
   explicitly used in the input file to indicate the occurrence of a
   recessive homozygote at one or several loci.
   Example: RecessiveAllele ="xxx"
   Default value: "null"
The character used to code for missing data
   Notation: MissingData =
   Possible values: A character used to specify the code for missing data, entered
   between single or double quotes.
   Example: MissingData ='$'
   Default value: '?'
If haplotype or phenotype frequencies are entered as absolute or relative values
   Notation: Frequency =
   Possible values: ABS (absolute values), REL (relative values: absolute values will
   be found by multiplying the relative frequencies by the sample sizes)
   Example: Frequency = ABS
   Default value: ABS
The number of significant digits for haplotype frequency outputs
   Notation: FrequencyThreshold =
   Possible values: A real number between 1e-2 and 1e-7
   Example: FrequencyThreshold = 0.00001
   Default value: 1e-5
The convergence criterion for the EM algorithm used to estimate haplotype
frequencies and linkage disequilibrium from genotypic data
   Notation: EpsilonValue =
   Possible values: A real number between 1e-7 and 1e-12.
   Example: EpsilonValue = 1e-10
   Default value: 1e-7
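
As an illustration of how the keywords listed above fit together, here is a minimal Python sketch that writes a [Profile] section and rejects values outside the documented ranges. It is not an Arlequin tool: the function build_profile and its parameter names are hypothetical, only a subset of the keywords is covered, and the keyword order simply follows the order used in this section.

```python
ALLOWED_DATA_TYPES = {"DNA", "RFLP", "MICROSAT", "STANDARD", "FREQUENCY"}


def build_profile(title, nb_samples, data_type, genotypic_data,
                  locus_separator="WHITESPACE", gametic_phase=1,
                  missing_data="?"):
    """Return the text of a [Profile] section after basic validation."""
    if '"' in title:
        raise ValueError("the title is written within double quotes")
    if not 1 <= nb_samples <= 1000:
        raise ValueError("NbSamples must be between 1 and 1000")
    if data_type not in ALLOWED_DATA_TYPES:
        raise ValueError(f"unknown DataType: {data_type!r}")
    if genotypic_data not in (0, 1) or gametic_phase not in (0, 1):
        raise ValueError("GenotypicData and GameticPhase must be 0 or 1")
    if len(missing_data) != 1:
        raise ValueError("MissingData must be a single character")
    if locus_separator in ("#", missing_data):
        raise ValueError("LocusSeparator cannot be '#' or the missing data character")
    return "\n".join([
        "[Profile]",
        f'Title="{title}"',
        f"NbSamples={nb_samples}",
        f"DataType={data_type}",
        f"GenotypicData={genotypic_data}",
        f"LocusSeparator={locus_separator}",
        f"GameticPhase={gametic_phase}",
        f"MissingData='{missing_data}'",
    ])
```

For example, build_profile("Demo project", 3, "DNA", 0) returns a profile for three haplotypic DNA samples with the default separator and missing data code.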
4.2.2 Data section
This section contains the raw data to be analyzed. The beginning of the data section is
indicated by the keyword [Data] (within brackets).

It contains several sub-sections:
4.2.2.1 Haplotype list (optional)
In this sub-section, one can define a list of the haplotypes that are used for all samples.
This section is most useful in order to avoid repeating the allelic content of the
haplotypes present in the samples. For instance, it can be tedious to write a full
sequence of several hundreds of nucleotides next to each haplotype in each sample. It
is much easier to assign an identifier to a given DNA sequence in the haplotype list, and
then use this identifier in the sample data section. This way Arlequin will know exactly
the DNA sequences associated with each haplotype.

However, this section is optional. The haplotypes can be fully defined in the sample data
section.
An identifier and a combination of alleles at different loci (one or more) describe a given
haplotype. The locus separator defined in the profile section must separate each
adjacent allele from each other.
It is also possible to have the definition of the haplotypes in an external file. Use the
keyword EXTERN followed by the name of the file containing the definition of the
haplotypes. Read Example 2 to see how to proceed. If the file "hapl_file.hap" contains
exactly what is between the braces of Example 1, the two haplotype lists are
equivalent.
Example 1:
[[HaplotypeDefinition]] #start the section of Haplotype definition
HaplListName="list1" #give any name you wish to this list
HaplList={
h1 A T #on each line, the name of the haplotype is
h2 G C # followed by its definition.
h3 A G
h4 A A
h5 G G
}
Example 2:
[[HaplotypeDefinition]] #start the section of Haplotype definition
HaplListName="list1" #give any name you wish to this list
HaplList = EXTERN "hapl_file.hap"
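
To make the layout of a haplotype list concrete, the following Python sketch reads the lines found between the braces of Example 1 (or in an external file such as hapl_file.hap) into a dictionary. It is a sketch under stated assumptions, not Arlequin's own parser: the function name parse_haplotype_list is hypothetical, the default separator is whitespace, and rejecting conflicting redefinitions is a choice made for this sketch.

```python
def parse_haplotype_list(block_lines, separator=None):
    """Map each haplotype identifier to its list of alleles.

    block_lines: the lines between the braces of a HaplList definition.
    separator:   None splits on any whitespace (the WHITESPACE default);
                 pass "\t" for LocusSeparator = TAB.
    """
    haplotypes = {}
    for raw in block_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split(separator)
        name, alleles = fields[0], fields[1:]
        if name in haplotypes and haplotypes[name] != alleles:
            raise ValueError(f"conflicting definitions for haplotype {name!r}")
        haplotypes[name] = alleles
    return haplotypes
```

Applied to the five lines of Example 1, the sketch returns a dictionary with the keys h1 to h5, where h1 maps to the alleles A and T.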
4.2.2.2 Distance matrix (optional)
Here, a matrix of genetic distances between haplotypes can be specified. This section is
here to provide some compatibility with earlier WINAMOVA files. The distance matrix
must be a lower diagonal with zeroes on the diagonal. This distance matrix will be used
to compute the genetic structure specified in the genetic structure section. As specified
in AMOVA, the elements of the matrix should be squared Euclidean distances. In
practice, they are an evaluation of the number of mutational steps between pairs of
haplotypes.
One also has to provide the labels of the haplotypes for which the distances are
computed. The order of these labels must correspond to the order of rows and columns
of the distance matrix. If a haplotype list is also provided in the project, the labels and
their order should be the same as those given for the haplotype list.
Usually, it will be much more convenient to let Arlequin compute the distance matrix by
itself.
It is also possible to have the definition of the distance matrix given in an external file.
Use the keyword EXTERN followed by the name of the file containing the definition of
the matrix. Read Example 2 to see how to proceed.

Example 1:
[[DistanceMatrix]] #start the distance matrix definition section
MatrixName= "none" # name of the distance matrix
MatrixSize= 4 # size = number of lines of the distance matrix
MatrixData={
h1 h2 h3 h4 # labels of the distance matrix (identifier of the
0.00000 # haplotypes)
2.00000 0.00000
1.00000 2.00000 0.00000
1.00000 2.00000 1.00000 0.00000
}

Example 2:
[[DistanceMatrix]] #start the distance matrix definition section
MatrixName= "none" # name of the distance matrix
MatrixSize= 4 # size = number of lines of the distance matrix
MatrixData= EXTERN "mat_file.dis"
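
The lower-diagonal layout described above can be produced from a full symmetric matrix. The Python sketch below is only an illustration: the function format_distance_matrix is hypothetical, and the five-decimal formatting merely mimics Example 1 rather than a documented requirement.

```python
def format_distance_matrix(labels, distances, name="none"):
    """Write a [[DistanceMatrix]] sub-section from a full square matrix.

    labels:    haplotype identifiers, in row/column order
    distances: list of rows; must be symmetric with zeroes on the diagonal
    """
    n = len(labels)
    if len(distances) != n or any(len(row) != n for row in distances):
        raise ValueError("the matrix must be square and match the labels")
    for i in range(n):
        if distances[i][i] != 0:
            raise ValueError("the diagonal must contain zeroes")
        for j in range(i):
            if distances[i][j] != distances[j][i]:
                raise ValueError("the matrix must be symmetric")
    lines = [
        "[[DistanceMatrix]]",
        f'MatrixName= "{name}"',
        f"MatrixSize= {n}",
        "MatrixData={",
        " ".join(labels),
    ]
    for i in range(n):
        # Only the lower triangle, diagonal included, is written.
        lines.append(" ".join(f"{distances[i][j]:.5f}" for j in range(i + 1)))
    lines.append("}")
    return "\n".join(lines)
```

Called with the labels h1 to h4 and the full symmetric version of the matrix in Example 1, it reproduces the four data rows of that example.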
4.2.2.3 Samples
In this obligatory sub-section, one defines the haplotypic or genotypic content of the
different samples to be analyzed.
Each sample definition begins with the keyword SampleName and ends after a
SampleData has been defined.
One must specify:
A name for each sample
   Notation: SampleName =
   Possible values: Any string of characters within quotes.
   Example: SampleName= "A first example of a sample name"
   Note: This name will be used in the Structure sub-section to identify the different
   samples, which are part of a given genetic structure to test.
The size of the sample
   Notation: SampleSize =
   Possible values: Any integer value.
   Example: SampleSize=732
   Note: For haplotypic data, the sample size is equal to the haploid sample size. For
   genotypic data, the sample size should be equal to the number of diploid
   individuals present in the sample. When absolute frequencies are entered,
   the size of each sample will be checked against the sum of all haplotypic
   frequencies. If a discrepancy is found, a warning message is
   issued in the log file, and the sample size is set to the sum of haplotype
   frequencies. When relative frequencies are specified, no such check is
   possible, and the sample size is used to convert relative frequencies to
   absolute frequencies.
The data itself
   Notation: SampleData =
   Possible values: A list of haplotypes or genotypes and their frequencies as found in
   the sample, entered within braces
   Example:
SampleData={
id1 1 ACGGTGTCGA
id2 2 ACGGTGTCAG
id3 8 ACGGTGCCAA
id4 10 ACAGTGTCAA
id5 1 GCGGTGTCAA
}
   Note: The last closing brace marks the end of the sample definition. A new sample
   definition begins with another keyword SampleName.
FREQUENCY data type:
If the data type is set to FREQUENCY, one must only specify for each haplotype its
identifier (a string of characters without blanks) and its sample frequency (either
relative or absolute). In this case the haplotype should not be defined.
Example:
SampleData={
id1 1
id2 2
id3 8
id4 10
id5 1
}
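
The relation between SampleSize and the frequencies listed in SampleData, as described in the note on the sample size, can be sketched in Python as follows. This is an illustration only: the function reconcile_sample_size is hypothetical, it mirrors the behaviour described in the text (warning and reset for absolute frequencies, multiplication by the sample size for relative ones), and it does not round the converted values, which Arlequin may handle differently.

```python
def reconcile_sample_size(sample_size, frequencies, mode="ABS"):
    """Return (absolute_frequencies, effective_sample_size, warning).

    frequencies: dict mapping haplotype identifier -> frequency
    mode:        "ABS" for absolute counts, "REL" for relative frequencies
    """
    if mode == "ABS":
        total = sum(frequencies.values())
        warning = None
        if total != sample_size:
            warning = (f"SampleSize {sample_size} differs from the sum of "
                       f"frequencies {total}; using {total}")
        return dict(frequencies), total, warning
    if mode == "REL":
        # No consistency check is possible: the sample size is trusted.
        absolute = {name: freq * sample_size
                    for name, freq in frequencies.items()}
        return absolute, sample_size, None
    raise ValueError("mode must be 'ABS' or 'REL'")
```

With the five haplotypes of the example above (frequencies 1, 2, 8, 10 and 1), the sum is 22, so a declared SampleSize of 20 would produce a warning and an effective sample size of 22.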

Haplotypic data
For all data types except FREQUENCY, one must specify for each haplotype its identifier
and its sample frequency. If no haplotype list has been defined earlier, one must also
define here the allelic content of the haplotype. The haplotype identifier is used to
establish a link between the haplotype and its allelic content maintained in a local
database.
Once a haplotype has been defined, it need not be defined again. However, the allelic
content of the same haplotype can also be defined several times. The different
definitions of haplotypes with the same identifier are checked for equality. If they are found
identical, a warning is issued in the log file. If they are found to be different at some
loci, an error is issued and the program stops, asking you to correct the error.

For complex haplotypes like very long DNA sequences, one can perfectly well assign
different identifiers to all sequences (each thus having an absolute frequency of 1),
even if some sequences turn out to be similar to each other. If the option Infer
Haplotypes from Distance Matrix is checked in the General Settings dialog box, Arlequin
will check whether haplotypes are effectively different or not. This is a good precaution
when one tests the selective neutrality of the sample using Ewens-Watterson or
Chakraborty's tests, because these tests are based on the observed number of
effectively different haplotypes.

Genotypic data
For each genotype, one must specify its identifier, its sample frequency, and its allelic
content. Genotypic data can be entered either as a list of individuals, all having an
absolute frequency of 1, or as a list of genotypes with different sample frequencies.
During the computations, Arlequin will compare all genotypes to all others and
recompute the genotype frequencies.
The allelic content of a genotype is entered on two separate lines in the form of two
pseudo-haplotypes.
Examples:
1)
Id1 2 ACTCGGGTTCGCGCGC # the first pseudo-haplotype
      ACTCGGGCTCACGCGC # the second pseudo-haplotype
2)
my_id 4 0 0 1 1 0 1
0 1 0 0 1 1
If the gametic phase is supposed to be known, the pseudo-haplotypes are
treated as truly defined haplotypes.
If the gametic phase is not supposed to be known, only the allelic content of
each locus is supposed to be known. In this case an equivalent definition of the
phenotype above would have been:
my_id 4 0 1 1 0 0 1
0 0 0 1 1 1
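
The equivalence between the two definitions of my_id can be checked locus by locus. The Python sketch below is an illustration of the idea, not Arlequin's comparison routine: the function same_genotype is hypothetical, and treating a phased genotype as an unordered pair of haplotypes is an assumption of this sketch.

```python
def same_genotype(genotype_a, genotype_b, phase_known):
    """Compare two diploid genotypes, each given as two pseudo-haplotypes.

    Each pseudo-haplotype is a sequence of alleles, one per locus.
    phase_known=True  compares the unordered pair of haplotypes;
    phase_known=False compares the unordered pair of alleles at each locus.
    """
    first_a, second_a = (tuple(h) for h in genotype_a)
    first_b, second_b = (tuple(h) for h in genotype_b)
    lengths = {len(first_a), len(second_a), len(first_b), len(second_b)}
    if len(lengths) != 1:
        raise ValueError("all pseudo-haplotypes must have the same number of loci")
    if phase_known:
        return sorted([first_a, second_a]) == sorted([first_b, second_b])
    loci_a = [sorted(pair) for pair in zip(first_a, second_a)]
    loci_b = [sorted(pair) for pair in zip(first_b, second_b)]
    return loci_a == loci_b
```

With the two definitions of my_id given above, the function returns True when the phase is unknown and False when the phase is known, since the pseudo-haplotypes themselves differ.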
4.2. 2.4 Genet ic st ruct ure
The hi erar chi cal genet i c st ruct ur e of t he sampl es i s speci fi ed i n t hi s opt ional sub-
sect i on. I t i s possi bl e t o defi ne gr oups of popul at i ons. Thi s subsect i on st art s wi t h t he
keyword [ [ St ruct ur e] ] . The defi ni t i on of a genet i c st ruct ure i s onl y requi red for AMOVA
anal yses.
One must speci fy:
A name for t he genet ic st ruct ur e

Manual Arlequin ver 3.5 I nput files 35

Notation: StructureName =
Possible values: Any string of characters within quotes.
Example: StructureName= "A first example of a genetic structure"
Note: This name will be used to refer to the tested structure in the output files.
The number of groups defined in the structure
Notation: NbGroups =
Possible values: Any integer value.
Example: NbGroups = 5
Note: If this value does not correspond to the number of defined groups, then
calculations will not be possible, and an error message will be displayed.
The group definitions
Notation: Group =
Possible values: A list containing the names of the samples belonging to the
group, entered within braces. Repeat this for as many groups as you
have in your structure. The same population may not be placed in
different groups. Also note that a comment sign (#) is
not allowed after the opening brace and would lead to an error
message. Comments about the group should therefore be placed
before the definition of the group.
Example (NbGroups=2):
Group ={
population1
population2
population3
}
Group ={
population4
population5
}

A new genetic Structure Editor is now available to help you with the process of defining
the genetic structure to be tested (see section 3.2.1, Defining the Genetic Structure to be
tested).
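The two constraints stated for the [[Structure]] subsection (NbGroups must match the number of Group blocks, and a population may not appear in more than one group) can be checked before loading a project. The sketch below is only an illustration in Python; check_structure is a hypothetical helper, and the messages are not those that Arlequin displays.

```python
def check_structure(nb_groups, groups):
    """Return a list of problems found in a genetic structure definition."""
    problems = []
    if nb_groups != len(groups):
        problems.append(
            f"NbGroups={nb_groups} but {len(groups)} groups are defined"
        )
    seen = {}  # population name -> index of the first group listing it
    for index, group in enumerate(groups, start=1):
        for name in group:
            if name in seen:
                problems.append(
                    f"{name!r} appears in groups {seen[name]} and {index}"
                )
            else:
                seen[name] = index
    return problems


# The two groups of the NbGroups=2 example.
groups = [
    ["population1", "population2", "population3"],
    ["population4", "population5"],
]
print(check_structure(2, groups))  # []
```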
4.2.2.5 Mantel test settings
This subsection allows you to specify some distance matrices (YMatrix, X1 and X2). The goal
is to compute a correlation between the YMatrix and X1, or a partial correlation between
the YMatrix, X1 and X2. The YMatrix can be either a pairwise population FST matrix or a
custom matrix entered into the project by the user. X1 (and X2) have to be defined in
the project.


This subsection starts with the keyword [[Mantel]]. The matrices, which are used to
test the correlation between genetic distances and one or two other distance matrices, are
defined in this section.
One must specify:
The size of the matrices used for the Mantel test.
Notation: MatrixSize=
Possible values: Any positive integer value.
Example: MatrixSize= 5
The number of matrices among which we compute the correlations. If this number
is 2, the correlation coefficient between the YMatrix (see next keyword) and the
matrix defined after the DistMatMantel keyword is computed. If this number is 3, the partial
correlations between the YMatrix (see next keyword) and the two other matrices
are computed. In this case the Mantel section should contain two DistMatMantel
keywords, each followed by the definition of a distance matrix.
Notation: MatrixNumber=
Example: MatrixNumber= 2
The matrix that is used as genetic distance. If the value is fst, then the
correlation between the population pairwise FST matrix and another matrix is
computed. If the value is custom, then the correlation between a project-defined
matrix and another matrix is computed.
Notation: YMatrix =

Possible values          Corresponding YMatrix
"fst"                    Y = Fst
"log_fst"                Y = log(Fst)
"slatkinlinearfst"       Y = Fst/(1-Fst)
"log_slatkinlinearfst"   Y = log(Fst/(1-Fst))
"nm"                     Y = (1-Fst)/(2 Fst)
"custom"                 Y = user-specified in the project

Example: YMatrix = fst
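The transformations listed for the YMatrix keyword can be written out as simple functions of a pairwise Fst value. The Python sketch below does this under two assumptions that the manual does not state here: that "log" means the natural logarithm, and that the input is restricted to 0 < Fst < 1 so that every transformation is defined. The names Y_TRANSFORMS and transform_fst are hypothetical.

```python
import math

# One function per non-custom YMatrix value from the table.
Y_TRANSFORMS = {
    "fst": lambda fst: fst,
    "log_fst": lambda fst: math.log(fst),  # natural log (assumption)
    "slatkinlinearfst": lambda fst: fst / (1 - fst),
    "log_slatkinlinearfst": lambda fst: math.log(fst / (1 - fst)),
    "nm": lambda fst: (1 - fst) / (2 * fst),
}


def transform_fst(kind, fst):
    """Apply the transformation named by a YMatrix value to one Fst."""
    if not 0 < fst < 1:
        raise ValueError("this sketch only handles 0 < Fst < 1")
    return Y_TRANSFORMS[kind](fst)


for kind in Y_TRANSFORMS:
    print(kind, transform_fst(kind, 0.2))
```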
Labels that identify the columns of the YMatrix. In case of YMatrix = fst, the
labels should be the names of the populations from which we use the pairwise FST
distances. In case of YMatrix = custom, the labels can be chosen by the user.
These labels will be used to select the sub-matrices on which the correlation (or partial
correlation) is computed.
Notation: YMatrixLabels =


Possible values: A list containing the names of the labels, entered within braces.
Example: YMatrixLabels = {
"Population1 " "Population4" "Population2"
"Population8" "Population5"
}
A keyword that allows you to define a matrix whose correlation with the
YMatrix is computed.
Notation: DistMatMantel =
Example: DistMatMantel={
0.00
3.20 0.00
0.47 0.76 0.00
0.00 1.23 0.37 0.00
0.22 0.37 0.21 0.38 0.00
}
Labels defining the sub-matrix on which the correlation is computed.
Notation: UsedYMatrixLabels=
Possible values: A list containing the names of the labels to be used, entered
within braces.
Example: UsedYMatrixLabels={
"Population1 "
"Population5"
"Population8"
}
Note: If you want to compute the correlation between entirely user-specified matrices,
you need to list a dummy population sample in the [[Samples]] section, in order to
allow for a proper reading of the Arlequin project. We hope to remove this odd
limitation, but it is the way it works for now!
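To make the roles of DistMatMantel and UsedYMatrixLabels concrete, the Python sketch below expands a lower-triangular block into a full symmetric matrix and then extracts the sub-matrix for a list of labels. It assumes that the rows of the matrix follow the order of YMatrixLabels; the helper names are hypothetical, and the label strings are written without the trailing space that appears in one of the manual's examples.

```python
def read_lower_triangle(text):
    """Turn a lower-triangular block of numbers into a full symmetric matrix."""
    rows = [[float(v) for v in line.split()]
            for line in text.strip().splitlines()]
    for i, row in enumerate(rows):
        if len(row) != i + 1:
            raise ValueError(f"row {i + 1} should hold {i + 1} values")
    size = len(rows)
    return [[rows[max(i, j)][min(i, j)] for j in range(size)]
            for i in range(size)]


def sub_matrix(matrix, labels, used_labels):
    """Keep only the rows and columns named in used_labels, in that order."""
    index = [labels.index(name) for name in used_labels]
    return [[matrix[i][j] for j in index] for i in index]


block = """
0.00
3.20 0.00
0.47 0.76 0.00
0.00 1.23 0.37 0.00
0.22 0.37 0.21 0.38 0.00
"""
labels = ["Population1", "Population4", "Population2",
          "Population8", "Population5"]
used = ["Population1", "Population5", "Population8"]

full = read_lower_triangle(block)
for row in sub_matrix(full, labels, used):
    print(row)
```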
Two complete examples:
Example 1: We compute the partial correlations between the YMatrix and two
other matrices X1 and X2. The YMatrix will be the pairwise FST matrix between the
populations listed after YMatrixLabels. The partial correlations will be based on the
3 by 3 matrix whose labels are listed after UsedYMatrixLabels.



[[Mantel]]
#size of the distance matrix:
MatrixSize= 5
#number of declared matrixes:
MatrixNumber=3
#what to be taken as the YMatrix
YMatrix="Fst"
#Labels to identify matrix entry and Population
YMatrixLabels ={
"pop 1"
"pop 2"
"pop 3"
"pop 4"
"pop 5"
}
# distance matrix: X1
DistMatMantel={
0.00
1.20 0.00
0.17 0.84 0.00
0.00 1.23 0.23 0.00
0.12 0.44 0.21 0.12 0.00
}

# distance matrix: X2
DistMatMantel={
0.00
3.20 0.00
0.47 0.76 0.00
0.00 1.23 0.37 0.00
0.22 0.37 0.21 0.38 0.00
}

UsedYMatrixLabels ={
"pop 1"
"pop 3"
"pop 4"
}

Example 2: We compute the correlation between the YMatrix and another matrix
X1. The YMatrix is declared as custom after the keyword YMatrix and is given by the
first DistMatMantel block. The correlation will be based on the sub-matrix whose labels
are listed after UsedYMatrixLabels (here, all five labels).

[[Mantel]]
#size of the distance matrix:
MatrixSize= 5
#number of declared matrixes: 1 or 2
MatrixNumber=2
#what to be taken as YMatrix
YMatrix="Custom"
#Labels to identify matrix entry and Population
YMatrixLabels ={


"1" "2" "3"
"4" "5"
}
#This will be the Ymatrix
DistMatMantel={
0.00
1.20 0.00
1.17 0.84 0.00
1.00 1.23 0.23 0.00
2.12 0.44 0.21 0.12 0.00
}
#This will be X1
DistMatMantel={
0.00
3.20 0.00
2.23 1.73 0.00
2.55 2.23 0.35 0.00
2.23 1.62 1.54 2.32 0.00
}
UsedYMatrixLabels ={
"1" "2"
"3"
"4" "5"
}
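For readers who want to see what a correlation between two distance matrices with a permutation test looks like, here is a generic Mantel-type sketch in Python using the two matrices of Example 2. It is not Arlequin's implementation: the one-sided permutation p-value, the number of permutations, and all function names are assumptions made for illustration.

```python
import random


def symmetric(lower_rows):
    """Expand lower-triangular rows into a full symmetric matrix."""
    n = len(lower_rows)
    return [[lower_rows[max(i, j)][min(i, j)] for j in range(n)]
            for i in range(n)]


def lower_values(matrix):
    """Off-diagonal values below the diagonal, row by row."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(y, x, permutations=999, seed=0):
    """Return (correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = pearson(lower_values(y), lower_values(x))
    n = len(y)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of y together.
        permuted = [[y[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(lower_values(permuted), lower_values(x)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


Y = symmetric([[0.00], [1.20, 0.00], [1.17, 0.84, 0.00],
               [1.00, 1.23, 0.23, 0.00], [2.12, 0.44, 0.21, 0.12, 0.00]])
X1 = symmetric([[0.00], [3.20, 0.00], [2.23, 1.73, 0.00],
                [2.55, 2.23, 0.35, 0.00], [2.23, 1.62, 1.54, 2.32, 0.00]])
r, p = mantel(Y, X1)
print(f"r = {r:.3f}, p = {p:.3f}")
```

With only five labels the number of distinct permutations is small, so the p-value from such a sketch is coarse.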
4.3 Example of an input file
The following small example is a project file containing four populations. The data type is
STANDARD genotypic data with unknown gametic phase.

[Profile]
Title="Fake HLA data"
NbSamples=4
GenotypicData=1
GameticPhase=0
DataType=STANDARD
LocusSeparator=WHITESPACE
MissingData='?'

[Data]

[[Samples]]
SampleName="A sample of 6 Algerians"
SampleSize=6
SampleData={
1 1 1104 0200
0700 0301
3 3 0302 0200
1310 0402
4 2 0402 0602
1502 0602
}
SampleName="A sample of 11 Bulgarians"
SampleSize=11
SampleData={


1 1 1103 0301
0301 0200
2 4 1101 0301
0700 0200
3 1 1500 0502
0301 0200
4 1 1103 0301
1202 0301
5 1 0301 0200
1500 0601
6 3 1600 0502
1301 0603
}
SampleName="A sample of 12 Egyptians"
SampleSize=12
SampleData={
1 2 1104 0301
1600 0502
3 1 1303 0301
1101 0502
4 3 1502 0601
1500 0602
6 1 1101 0301
1101 0301
8 4 1302 0502
1101 0609
9 1 1500 0302
0402 0602
}
SampleName="A sample of 8 French"
SampleSize=8
SampleData={
219 1 0301 0200
0101 0501
239 2 0301 0200
0301 0200
249 1 1302 0604
1500 0602
250 3 1401 0503
1301 0603
254 1 1302 0604
}

[[Structure]]

StructureName="My population structure"
NbGroups=2
Group={
"A sample of 6 Algerians"
"A sample of 12 Egyptians"
}
Group={
"A sample of 11 Bulgarians"
"A sample of 8 French"
}
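In the example file, the genotype counts listed in each SampleData block add up to the declared SampleSize. The Python sketch below shows one way to check this for a two-locus genotypic sample; genotype_counts is a hypothetical helper, and whether Arlequin itself enforces this equality is not stated in this section.

```python
def genotype_counts(sample_data, n_loci):
    """Return [(identifier, count), ...] for a genotypic SampleData block.

    Each genotype takes two lines: identifier, count and n_loci alleles,
    then the n_loci alleles of the second pseudo-haplotype.
    """
    lines = [line.split() for line in sample_data.strip().splitlines()
             if line.strip()]
    if len(lines) % 2:
        raise ValueError("each genotype needs two lines")
    counts = []
    for first, second in zip(lines[0::2], lines[1::2]):
        if len(first) != n_loci + 2 or len(second) != n_loci:
            raise ValueError(f"malformed genotype near {' '.join(first)!r}")
        counts.append((first[0], int(first[1])))
    return counts


# The "A sample of 6 Algerians" block from the example file.
algerians = """
1 1 1104 0200
    0700 0301
3 3 0302 0200
    1310 0402
4 2 0402 0602
    1502 0602
"""
counts = genotype_counts(algerians, n_loci=2)
print(sum(n for _, n in counts))  # 6, equal to SampleSize
```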


4.4 Automatically creating the outline of a project file
To help you quickly set up a project file, Arlequin can create the outline of a
project file for you.
To do this, use the Project wizard tab.

See section Project Wizard (6.3.4) for more information on how to set up the different
parameters.
4.5 Conversion of data files
Selecting the Import Data tab opens a tab for the conversion of data files from one
format to another.
This might be useful for users who already have data files set up for other software
packages. It is also possible to convert Arlequin data files into other formats.
The currently recognized data formats are:
Arlequin


GenePop ver. 3.0
Biosys ver. 1.0
Phylip ver. 3.5
Mega ver. 1.0
WinAmova ver. 1.55

The translation procedure is more fully described in the Import data section (6.3.5).
These conversion routines were written on the basis of the description of the input file
format found in the user manuals of each of the aforementioned programs. The tests done
with the example files distributed with these programs worked fine. However, the original
reading procedures of the other software packages may be more tolerant than our own,
and some data may be impossible to convert. Thus, some small corrections may need to
be done by hand, and we apologize for that.
4.6 Arlequin batch files
A batch file (with the .arb extension) is simply a text file having on each line the name of
a project file that should be analyzed by Arlequin. The number of data files to be
analyzed can be arbitrarily large.
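Since a batch file is just a list of project file names, one per line, it can be generated with a few lines of script. The Python sketch below writes such a file; write_batch_file and the project names are hypothetical, and writing bare file names (rather than full paths) is an assumption based on the manual's advice to keep the batch file and project files in the same folder.

```python
import tempfile
from pathlib import Path


def write_batch_file(batch_path, project_files):
    """Write one project file name per line into a .arb batch file."""
    batch_path = Path(batch_path)
    if batch_path.suffix.lower() != ".arb":
        raise ValueError("an Arlequin batch file should end in .arb")
    names = [Path(p).name for p in project_files]
    batch_path.write_text("\n".join(names) + "\n")
    return names


# Usage with made-up project names, written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    batch = Path(folder) / "my_run.arb"
    write_batch_file(batch, ["pop_a.arp", "pop_b.arp", "pop_c.arp"])
    print(batch.read_text())
```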
If the project you open is of Batch file type, the Batch file tab panel opens up
automatically and allows you to tune the settings of your batch run.



On the left tree pane you can see the project files listed in the batch file.
Settings choice:
You can either use the same options for all project files by selecting Use interface
settings, or use the settings file associated with each project file by selecting Use
associated settings. In the first case, the same analyses will be performed on all
project files listed in the batch file. In the second case, you can perform different
computations on each project file listed in the batch file, giving you much more
flexibility on what should be done. However, it implies that settings files have been
prepared previously, recording the analyses to be performed on the data, as
well as the options of these analyses.
Results to summarize:
Some results can be collected from the analysis of each project file in the batch, and put
into summary files. See section Batch files (6.3.7) for additional information.
If the associated settings file does not exist, the current settings are used.
Note that the batch file, the project files, and the settings files should all be in the same
folder.



5 EXAMPLES OF INPUT FILES
5.1 Example of allele frequency data
The following example is a file containing FREQUENCY data. The allelic composition of the
individuals is not specified. The only information we have is the frequencies of the
alleles.
[Profile]
Title="Frequency data"
NbSamples=2
GenotypicData=0
DataType=FREQUENCY
[Data]
[[Samples]]
SampleName="Population 1"
SampleSize=16
SampleData= {
000 1
001 3
002 1
003 7
004 4
}
SampleName="Population 2"
SampleSize=23
SampleData= {
000 3
001 6
002 2
003 8
004 4
}
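The FREQUENCY samples list absolute allele counts. Converting them to relative frequencies is a one-line computation, sketched below in Python for "Population 1". The helper relative_frequencies is hypothetical, and the check that the counts add up to SampleSize reflects what holds in this example rather than a documented Arlequin rule.

```python
def relative_frequencies(sample_size, counts):
    """Convert absolute allele counts into relative frequencies."""
    total = sum(counts.values())
    if total != sample_size:
        raise ValueError(f"counts sum to {total}, not {sample_size}")
    return {allele: n / total for allele, n in counts.items()}


# "Population 1" from the example file (SampleSize=16).
population_1 = {"000": 1, "001": 3, "002": 1, "003": 7, "004": 4}
freqs = relative_frequencies(16, population_1)
print(freqs["003"])  # 0.4375
```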
5.2 Example of standard data (Genotypic data, unknown gametic phase, recessive alleles)
In this example, the individual genotypes for 5 HLA loci are output on two separate lines.
We specify that the gametic phase between loci is unknown, and that the data has a
recessive allele. We explicitly define it to be "xxx". Note that with recessive data, all
single-locus homozygotes are also considered as potential heterozygotes with a null
allele. We also provide Arlequin with the minimum frequency for the estimated
haplotypes to be listed (0.00001), and we define the minimum epsilon value (sum of
haplotype frequency differences between two steps of the EM algorithm) to be reached
for the EM algorithm to stop when estimating haplotype frequencies.

[Profile]
Title="Genotypic Data, Phase Unknown, 5 HLA loci"
NbSamples=1
GenotypicData=1


DataType=STANDARD
LocusSeparator=WHITESPACE
MissingData='?'
GameticPhase=0
RecessiveData=1
RecessiveAllele="xxx"
[Data]
[[Samples]]
SampleName="Population 1"
SampleSize=63
SampleData={
MAN0102 12 A33 Cw10 B70 DR1304 DQ0301
A33 Cw10 B7801 DR1304 DQ0302
MAN0103 22 A33 Cw10 B70 DR1301 DQ0301
A33 Cw10 B7801 DR1302 DQ0501
MAN0108 23 A23 Cw6 B35 DR1102 DQ0301
A29 Cw7 B57 DR1104 DQ0602
MAN0109 6 A30 Cw4 B35 DR0801 xxx
A68 Cw4 B35 DR0801 xxx
}
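The epsilon stopping rule mentioned for the EM algorithm (sum of haplotype frequency differences between two steps) can be illustrated as follows. This Python sketch assumes that the differences are taken in absolute value and that the algorithm stops once the sum falls below epsilon; em_has_converged and the haplotype labels are hypothetical, and the sketch does not reproduce Arlequin's EM implementation.

```python
def em_has_converged(previous, current, epsilon):
    """Compare haplotype frequencies from two successive EM steps.

    Both arguments map haplotype labels to frequencies; a haplotype
    missing from one step is treated as having frequency 0.
    """
    haplotypes = set(previous) | set(current)
    change = sum(abs(current.get(h, 0.0) - previous.get(h, 0.0))
                 for h in haplotypes)
    return change < epsilon


step_1 = {"A33-Cw10": 0.50, "A23-Cw6": 0.50}
step_2 = {"A33-Cw10": 0.60, "A23-Cw6": 0.40}
print(em_has_converged(step_1, step_2, epsilon=1e-7))  # False
print(em_has_converged(step_2, step_2, epsilon=1e-7))  # True
```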
5.3 Example of DNA sequence data (Haplotypic)
Here, we define 3 population samples of haplotypic DNA sequences. A simple genetic
structure is defined that just incorporates the three population samples into a single
group of populations.

[Profile]
Title="An example of DNA sequence data"
NbSamples=3
GenotypicData=0
DataType=DNA
LocusSeparator=NONE
[Data]
[[Samples]]
SampleName="Population 1"
SampleSize=6
SampleData= {
000 3 GACTCTCTACGTAGCATCCGATGACGATA
001 1 GACTGTCTGCGTAGCATACGACGACGATA
002 2 GCCTGTCTGCGTAGCATAGGATGACGATA
}
SampleName="Population 2"
SampleSize=8
SampleData= {
000 1 GACTCTCTACGTAGCATCCGATGACGATA
001 1 GACTGTCTGCGTAGCATACGACGACGATA
002 1 GCCTGTCTGCGTAGCATAGGATGACGATA
003 1 GCCTGTCTGCCTAGCATACGATCACGATA
004 1 GCCTGTCTGCGTACCATACGATGACGATA
005 1 GCCTGTCCGCGTAGCGTACGATGACGATA
006 1 GCCCGTGTGCGTAGCATACGATGGCGATA
007 1 GCCTGTCTGCGTAGCATGCGACGACGATA
}
SampleName="Population 3"
SampleSize=6
SampleData= {
023 1 GCCTGTCTGCGTAGCATACGATGACGGTA


024 1 GCCTGTCTGCGTAGCGTACGATGACGATA
025 1 GCCTGTCTGCGTAGCATACGATGACGATA
026 1 GCCTGTCCGCGTAGCATACGGTGACGGTA
027 1 GCCTGTCTGCGTGGCATACGATGACGATG
028 1 GCCTGTCTGCGTAGCATACGATGACGATA
}
[[Structure]]
StructureName="A group of 3 populations analyzed for DNA"
NbGroups=1
Group= {
"Population 1"
"Population 2"
"Population 3"
}
5.4 Example of microsatellite data (Genotypic)
In this example, we show how to prepare a project file consisting of microsatellite data.
Four population samples are defined. Only three microsatellite loci have been analyzed in
diploid individuals. The different genotypes are output on two separate lines. The
frequencies of the different genotypes are listed in the second column of the first line of
each genotype. Alternatively, one could just output the genotype of each individual, and
simply set its frequency to 1. One should however be careful to use different identifiers
for each individual. It does not matter if different genotype labels refer to the same
genotype content. Here, only a few different genotypes have been found in each of the
populations (which should not correspond to most real situations, but we wanted to save
space). The genotypes consist of the number of repeats found at each locus. The genetic
structure to be analyzed consists of 2 groups, each made up of 2 populations.
To make things clear, the genotype "Genot1" in the first population has been observed
27 times. For the first locus, 12 and 13 repeats were observed, 22 and 23 repeats were
observed for the second locus, and finally 16 and 17 repeats were found at the third
locus.

[Profile]
Title="A small example of microsatellite data"
NbSamples=4
GenotypicData=1
#Unknown gametic phase between the 3 loci
GameticPhase=0
DataType=MICROSAT
LocusSeparator=WHITESPACE
[Data]
[[Samples]]
SampleName="MICR1"
SampleSize=28
SampleData= {
Genot1 27 12 23 17
13 22 16
Genot2 1 15 22 16
13 22 16


}
SampleName="MICR2"
SampleSize=59
SampleData= {
Genot3 37 12 24 18
12 22 16
Genot4 1 15 20 18
13 22 18
Genot5 21 14 22 16
14 23 16
}
SampleName="MICR3"
SampleSize=30
SampleData= {
Genot6 17 12 21 16
13 22 15
Genot7 1 12 20 16
13 23 16
Genot8 12 10 22 15
12 22 15
}
SampleName="MICR4"
SampleSize=16
SampleData= {
Genot9 15 13 24 16
13 23 17
Genot10 1 12 24 16
13 23 16
}
[[Structure]]
StructureName="Test microsat structure"
NbGroups=2
#The first group is made up of the first 2 samples
Group={
"MICR1"
"MICR2"
}
#The last 2 samples will be put into the second group
Group={
"MICR3"
"MICR4"
}
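The reading of "Genot1" described before this example (27 copies; 12 and 13 repeats at the first locus, and so on) amounts to pairing the two lines column by column. A short Python sketch of that pairing follows; locus_pairs is a hypothetical helper written for illustration.

```python
def locus_pairs(first_line, second_line):
    """Split a two-line genotype into (identifier, count, per-locus pairs)."""
    identifier, count, *first = first_line.split()
    second = second_line.split()
    if len(first) != len(second):
        raise ValueError("both lines must list the same number of loci")
    pairs = [(int(a), int(b)) for a, b in zip(first, second)]
    return identifier, int(count), pairs


print(locus_pairs("Genot1 27 12 23 17", "13 22 16"))
# ('Genot1', 27, [(12, 13), (23, 22), (17, 16)])
```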
5.5 Example of RFLP data (Haplotypic)
In this example, we show how to use a definition list of RFLP haplotypes. Different RFLP
haplotypes are first defined in the [[HaplotypeDefinition]] section. The allelic content
of each haplotype is defined after a given identifier. The identifier is then used at
the population sample level. Note that the list of haplotypes can include haplotypes that
are not listed in the population samples. The genetic diversity of the samples is then
simply described as a list of haplotypes found in each population as well as their sample
frequencies.

[Profile]
Title="A small example of RFLP data: 3 populations"
NbSamples=3


GenotypicData=0
DataType=RFLP
LocusSeparator=WHITESPACE
#We tell Arlequin to compute Euclidian square distances between
#the haplotypes listed below
MissingData='?'
[Data]
[[HaplotypeDefinition]]
HaplListName="A fictive list of RFLP haplotypes"
HaplList= {
1 000011100111010011011001001011001101110100101101100
2 100011100111010011011001001011001101110100101100100
6 000011100111010010011001001011001101110100101101100
7 100011100111010011011001001011001101110100101101100
8 000011100111010011011001001001001101110100101101100
11 000001100111011011011001001011001101110100101111100
12 000011100111010011011001101011001101110100101101100
17 000011100111010011011001001011001100110100101101100
22 000011100111011011011001001011001101110100101100100
36 000011100111010011011001001010001100110100101101100
37 000011100111011011011001001111001101110100101100100
38 000111100111010011011001001011001101110100101101100
40 000011100111000011011001001011001101110100101101100
47 000011100111010011011001001011001101110100101100100
139 000011100111010011011001001011001111110100101001110
140 000011100111010011011001001011001101110100101100101
141 000011100111010010011001000011001101110100101100100
}
[[Samples]]
#1
SampleName="pop 1"
SampleSize=28
SampleData= {
1 27
40 1
}
#2
SampleName="pop 2"
SampleSize=75
SampleData= {
1 37
17 1
6 21
7 1
2 1
22 5
11 2
36 1
139 1
47 1
140 1
141 1
37 1
38 1
}
#3
SampleName="pop 3"
SampleSize=48
SampleData= {
1 46
8 1


12 1
}
[[Structure]]
StructureName="A single group of 3 samples"
NbGroups=1
Group={
"pop 1"
"pop 2"
"pop 3"
}
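When samples refer to haplotypes by identifier, it is useful to cross-check the identifiers against the definition list. The Python sketch below reports identifiers that are used but not defined, and identifiers that are defined but unused (which the text notes is allowed). check_haplotype_ids is a hypothetical helper, and the data are copied from the RFLP example.

```python
def check_haplotype_ids(defined_ids, samples):
    """Return (undefined, unused) haplotype identifiers, sorted numerically."""
    used = {hap for sample in samples.values() for hap in sample}
    defined = set(defined_ids)
    undefined = sorted(used - defined, key=int)
    unused = sorted(defined - used, key=int)
    return undefined, unused


defined_ids = ["1", "2", "6", "7", "8", "11", "12", "17", "22", "36",
               "37", "38", "40", "47", "139", "140", "141"]
samples = {
    "pop 1": {"1": 27, "40": 1},
    "pop 2": {"1": 37, "17": 1, "6": 21, "7": 1, "2": 1, "22": 5,
              "11": 2, "36": 1, "139": 1, "47": 1, "140": 1, "141": 1,
              "37": 1, "38": 1},
    "pop 3": {"1": 46, "8": 1, "12": 1},
}
print(check_haplotype_ids(defined_ids, samples))  # ([], [])
```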
5.6 Example of standard data (Genotypic data, known gametic phase)
In this example, we have defined 3 samples consisting of standard multi-locus data with
known gametic phase. It means that the alleles listed on the same line constitute a
haplotype on a given chromosome. For instance, the genotype G1 is made up of the two
following haplotypes: AD on one chromosome and BC on the second, A and B being two
alleles at the first locus, and C and D being two alleles at the second locus. Note that the
same allele identifier can be used in different loci. This is obviously true for DNA
sequences, but it also holds for all other data types.

[Profile]
Title="An example of genotypic data with known gametic phase"
NbSamples=3
GenotypicData=1
GameticPhase=1
#There is no recessive allele
RecessiveData=0
DataType=STANDARD
LocusSeparator=WHITESPACE
[Data]
[[Samples]]
SampleName="standard_pop1"
SampleSize=20
SampleData= {
G1 4 A D
B C
G2 5 A B
A A
G3 3 B B
B A
G4 8 D C
D C
}
SampleName="standard_pop2"
SampleSize=10
SampleData= {
G5 5 A C
C B
G6 5 B C
D B
}
SampleName="standard_pop3"
SampleSize=15


SampleData= {
G7 3 A D
C A
G8 12 A C
B B
}

[[Structure]]
StructureName="Two groups"
NbGroups=2
Group={
"standard_pop1"
}
Group={
"standard_pop2"
"standard_pop3"
}




6 ARLEQUIN INTERFACE
The interface of Arlequin (since ver. 3.0) is written in C++ and looks like:

The graphical interface is made up of a series of tabbed dialog boxes, whose content varies
dynamically depending on the type of data currently analyzed.
6.1 Menus
6.1.1 File Menu

New project...        Opens the Project Wizard dialog box
Open project...       Opens a dialog box to locate an existing project
Close project         Closes the current project
Recent projects       Opens a submenu with the 10 most recently opened projects
Load settings...      Loads previously saved computation settings
Save settings         Saves the current computation settings
Save settings as...   Saves the current computation settings under a specific name
Exit                  Exits Arlequin and closes the current project




6.1.2 View Menu

Project information   Opens a tab dialog with information on the current project
Settings              Opens specific tab dialogs to activate some computations and
                      choose their associated settings
View Project          Views the current project in the text editor
View Results          Views the computation results in the default web browser
View Log file         Views the log file in the text editor
Show button text      Toggles the presence/absence of text associated with
                      toolbar buttons
6.1.3 Options Menu

XML Output
    Check this menu item if you want Arlequin to generate output files in XML format,
    allowing for more flexibility in the formatting of the output, and the inclusion of
    graphics generated by an R script (see section 7.6). If this menu item is unchecked,
    conventional HTML files are generated, and graphs cannot be incorporated into
    output files.
Append results
    If checked, the results of a new analysis are added at the end of the current result
    file. Otherwise, previous results are deleted before adding the new results.
Use associated settings
    Check this box if you want Arlequin to automatically load the settings associated
    with each project. If this box is unchecked, the same settings will be used for
    different projects (see section 6.3.3).
Keep Amova null distributions
    If checked, the null distributions of variance components are written to specific
    files (see section 6.3.3).
Prompt for handling unphased multi-locus data
    If checked, you will have the option of estimating the gametic phase of unphased
    genotype data with the ELB algorithm (see section 6.3.8.4.2.1).
6.1.4 Help Menu

The menu to get access to the Help File System.
Arlequin PDF Help file
    Opens the Arlequin help file. It actually tries to open the file "arlequin.pdf". You
    thus need to have the Adobe Acrobat extensions installed in your web browser.
Arlequin web site
    Link to the Arlequin web site http://cmpg.unibe.ch/software/arlequin3
R project
    Link to the R project web page http://www.r-project.org/. R is a language and
    environment for statistical computing and graphics, which needs to be installed on
    your computer to include graphs in output files.
About Arlequin
    Some information about Arlequin, its authors, contact address and the Swiss NSF
    grants that supported its development.
6.2 Toolbar
Arlequin's toolbar contains icons that are shortcuts to some commonly used menu items
as shown below. Clicking on one of these icons is equivalent to activating the
corresponding menu item.



The individual buttons perform the following actions:

Opens a dialog box to choose an Arlequin project to open
(see section 6.3.1)

Calls the text editor specified in the Arlequin Configuration
tab (see section 6.3.3) and loads the current Arlequin
project, allowing you to edit it.

Views the results in your web browser

Views the Arlequin log file in the selected text editor

Closes the current project

Calls R routines to integrate graphics of results into the
XML output file

Starts the currently selected computations

Pauses the current computations

Stops the current computations
6.3 Tab dialogs
Most of the methods implemented in Arlequin can be computed irrespective of the data
type. Nevertheless, the testing procedure used for a given task (e.g. linkage
disequilibrium test) may depend on the data type. The aim of this section is to give an
overview of the numerous options which can be set up for the different analyses.


The items that appear grayed out in Arlequin's dialog boxes indicate that a given task is
impossible in the current situation. For example, if you open a project containing
haplotypic data, it is not possible to test for Hardy-Weinberg equilibrium, and for
STANDARD data it is not possible to set up the transversion or transition weights, which
can only be set up for DNA data.
Arlequin's interface usually prevents the user from selecting tasks impossible to perform,
or from setting up parameters that are not taken into account in the analyses.
When describing the different dialog boxes accessible in Arlequin, we have sometimes
used the following symbols to specify which types of user input were expected:
[f]: parameter to be set in the dialog box as a floating-point number.
[i]: parameter to be set in the dialog box as an integer.
[b]: check box (two states: checked or unchecked).
[m]: multiple selection radio buttons.
[l]: list box, allowing the selection of an item in a downward scrolling list.
[r]: read-only setting, cannot be changed by the user.
6.3.1 Open project

In this dialog box, you can locate an existing Arlequin project on your hard disk.
Alternatively you can use the File | Recent Projects menu to reload one of the last 10
projects on which you worked.

6.3.2 Handling of unphased genotypic data


If the menu item "Prompt for handling unphased multi-locus data" is checked in the Options
menu (see section 6.1.3), this dialog box will appear when projects containing genotypic
data with unknown phase are loaded. The two options appearing in the dialog box are
self-explanatory, and the settings for the ELB algorithm are described in the Settings for
the ELB algorithm and ELB algorithm sections (6.3.8.4.2.1 and 8.1.3.2.3).
If you choose to estimate the gametic phase with the ELB algorithm, then Arlequin
project files (as many as the variable No. of files to generate in the distribution defined
above) are written in a subdirectory of the result directory called PhaseDistribution. They
have the name ELB_EstimatedPhase#<Sample number>.arp. Arlequin also outputs a file
called ELB_Best_Phases.arp containing for each individual the gametic phases estimated
with the ELB algorithm, as well as a batch file ELB_PhaseDistribution.arb listing all
aforementioned project files.
The file ELB_Best_Phases.arp can then be analyzed as if gametic phases were known for
the different samples.
Keep however in mind that the gametic phases are not necessarily correct, and that
analyses assuming that the gametic phase is known will not take into account possible
gametic phase estimation errors.


6.3.3 Arlequin Configuration

Different options can be specified in this tab dialog.
Use associated settings: By checking the Use associated settings checkbox, the
settings and options last specified for your project will be used when opening a
project file. When closing a project file, Arlequin automatically saves the current
calculation settings for that particular project. If this box is unchecked,
the same settings will be used for different projects.
Append results: If the option Append Results is checked, the results of the current
computations are appended to those of previous analyses. Otherwise, only the results
of the last analysis are written to the result file, and previous results are erased.
Keep AMOVA null distributions: If this option is checked, the null distributions of
$\sigma^2_a$, $\sigma^2_b$, $\sigma^2_c$, and $\sigma^2_d$ generated by an AMOVA analysis
are written in files having the same name as the project file, but with the extensions
.va, .vb, .vc, and .vd, respectively.
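The naming rule for the AMOVA null distribution files (same name as the project file, with the extensions .va, .vb, .vc, and .vd) can be expressed with pathlib. In this Python sketch, null_distribution_files and the project name are hypothetical, and only file names are returned because the folder in which Arlequin writes these files is not stated here.

```python
from pathlib import Path


def null_distribution_files(project_file):
    """Names of the four null distribution files for a project file."""
    project = Path(project_file)
    return [project.with_suffix(ext).name
            for ext in (".va", ".vb", ".vc", ".vd")]


print(null_distribution_files("my_project.arp"))
# ['my_project.va', 'my_project.vb', 'my_project.vc', 'my_project.vd']
```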
XML Output: This option has the same effect as activating the menu item XML output in
the Options menu (see section 6.1.3). Check this box if you want Arlequin to
generate output files in XML format, allowing for more flexibility in the formatting of
the output, and the inclusion of graphics generated by an R script (see section 7.6).
If this box is unchecked, conventional HTML files are generated, and graphs cannot
be incorporated into output files.

Helper programs:
Text editor: press the Browse button to locate the text editor you want to use
to edit or view your project file and to view the Arlequin log file.
Rcmd: Rcmd is a console version of the R statistical package, which needs to be
installed on your computer. The R environment for statistical computing and
graphics is a GNU project, which can be downloaded from the R project page
http://www.r-project.org/.
After installation of the R package on your computer, you need to tell Arlequin
where the Rcmd executable file is located (usually in {R version directory}/bin). This
program is called when you press the Rcmd button located on Arlequin's toolbar.




6.3.4 Project Wizard

To help you quickly set up a project file, Arlequin can create the outline of a
project file for you. This tab dialog should allow you to quickly define which type of data
you have and some of its properties.
Browse button
It allows you to specify the name and the directory location of the new project file.
Pressing that button opens a File dialog box. The project file should have the
extension .arp.
Create project button
Press that button once you have specified all other properties of the project.
Edit project button
This button becomes active once you have created an outline, and allows you to
begin editing the outline and fill in some data.
Data type

Manual Arlequin ver 3.5 Arlequin int erface 61

Speci fy whi ch t y pe of dat a you want t o anal yze ( DNA, RFLP, Mi cr osat , St andard,
or Fr equency) .
Speci fy i f t he dat a i s under gen ot y pi c or hapl ot y pi c form.
Speci fy i f t he gamet i c phase i s known ( for genot ypi c dat a onl y) .
Speci fy i f t her e are r ecessi v e al l el es ( for genot ypi c dat a onl y)
Cont r ol s
Speci fy t he number of popu l at i on sampl es defi ned i n t he proj ect
Choose a l ocus separ at or
Speci fy t he charact er codi ng for mi ssi ng dat a
Opt i on al sect i ons
Speci fy i f you want t o i ncl ude a gl obal l i st of h apl ot y pes
Speci fy i f you want t o i ncl ude a predefi ned di st ance mat r i x
Speci fy i f you want t o i ncl ude a gr oup st r u ct ur e
6.3.5 Import data

With this dialog box you can quickly translate data into several other file formats often used
in population genetics analyses. The currently supported formats are:
Arlequin            GenePop ver. 1.0    Phylip ver. 3.5
Mega ver. 1.0       Biosys ver. 1.0     WinAmova ver. 1.55

The translation procedure is as follows:
1) Select the source file with the upper left Browse button.
2) Select the format of the source data file, as well as that of the target file.
3) A default extension depending on the data format is automatically given to the
target file.
4) The file conversion is launched by pressing the Translate button.
5) In some cases, you might be asked for some additional information, for instance if
input data is split into several input files (as in WinAmova).
6) If you have selected the translation of a data file into the Arlequin file format,
you'll have the option to load the newly created project file into the Arlequin Java
Interface.
6.3.6 Loaded Project

Once a project has been loaded, the Project tab dialog becomes active. It shows a brief
outline of the project in an explorable tree pane, and some information on the data type.
The project can be edited by pressing the View Project button on the toolbar, which will
launch the text editor currently specified in the Arlequin Configuration tab. All the
information shown under the project profile section is read-only. To modify it,
you need to edit the project file with your text editor and reload the project with the
File | Recent projects menu.
File name [r]: The location and the name of the current project.
Project title [r]: The title of the project as entered in the input file.
Ploidy [r]: Specifies whether input data consist of diploid genotypic data or
haplotypic data. For genotypic data, the diploid information of each genotype is
entered on separate lines in the input file.
Gametic phase [r]: Specifies whether the gametic phase is known or unknown
when the input file is made up of genotypic data. If the gametic phase is known,
then the treatment of the data will be essentially similar to that of haplotypic
data.
Data type [r]: Data type specified in the input file.
Dominance [r]: Specifies if the data consists of only co-dominant data or if
some recessive alleles can occur.
Recessive allele [r]: Specifies the identifier of the recessive allele.
Locus separator [r]: The character used to separate allelic information at
adjacent loci.
Missing data [r]: The character used to represent missing data at any locus. By
default, a question mark (?) is used for unknown alleles.

6.3.7 Batch files

The project files found in the selected batch file appear listed in the left pane window.
Use associated settings [b]: Use this button if you have prepared settings files
associated with each project.
Use interface settings [b]: Use this button if you want to use the same
predefined calculation settings for all project files.
Results to summarize: This option allows you to collect a summary of the results
for each file found in the batch list. These results are written in different files,
having the extension *.sum. These summary files will be placed into the same
directory as the batch file.



List of summary files created by activating different checkboxes

Check box                    Summary file            Description
Gene diversity               gen_div.sum             Gene diversity of each sample
Nucleotide composition       nucl_comp.sum           Nucleotide composition of each sample
Molecular diversity          mold_div.sum            Molecular diversity indexes of each sample
Mismatch distribution        mismatch.sum            Mismatch distribution for each sample
Theta values                 theta.sum               Different theta values for each sample
Linkage disequilibrium       l_d_pro.sum             Significance level of linkage disequilibrium for
                                                     each pair of loci
                             link_dis.sum            Number of significantly linked loci per locus
Hardy Weinberg               hw.sum                  Test of departure from Hardy-Weinberg
                                                     equilibrium
Tajima's test                tajima.sum              Tajima's test of selective neutrality
Fu's Fs test                 fu_fs.sum               Fu's FS test of selective neutrality
Ewens Watterson              ewens.sum               Ewens-Watterson tests of selective neutrality
Chakraborty's test           chakra.sum              Chakraborty's test of population amalgamation
Population comparisons       coanst_c.sum            Matrix of Reynolds' genetic distances (in linear
                                                     form)
                             NM_value.sum            Matrix of Nm values between pairs of
                                                     populations (in linear form)
                             slatkin.sum             Matrix of Slatkin's genetic distance (in linear
                                                     form)
                             tau_uneq.sum            Matrix of divergence times between
                                                     populations, taking into account unequal
                                                     population sizes (in linear form)
                             pairdiff.sum            Matrix of mean number of pairwise differences
                                                     between pairs of samples (in linear form)
                             pairdist.sum            Different genetic distances for each pair of
                                                     populations (only clearly readable if there are
                                                     2 samples in the project)
Allele frequencies           allele_freqs.sum        Lists allele frequencies for all populations in
                                                     turn. It becomes difficult to read when more
                                                     than a single population is present in the
                                                     project file.
Detect loci under selection  selectionDetection.sum  Reports just the fraction of loci that have
                                                     significant FSTs or FCTs for different
                                                     significance levels.
                                                     Locus-specific F-statistics, heterozygosities,
                                                     and p-values can be found in the file
                                                     "fdist2_ObsOut.txt" located in the result
                                                     directory of each arp file.



6.3.8 Calculation Settings

The Settings tab is divided into two zones:
On the left, a tree structure allows the user to quickly select which task to perform. The
options for those tasks (settings) appear in the right pane of the tab dialog.
If you select the first Arlequin settings node of the tree, a list of the different tasks that
can be set up appears in the right pane. Clicking on these underlined blue links will lead
you to the appropriate settings panes.
If a particular task has been selected, this is reflected by a red dot on the left side of
the task in the tree structure.
Settings management
Three buttons are also shown on the upper left of the tab dialog:
Reset: Resets all settings to default values and unchecks all tasks.
Load: Loads a particular set of settings previously saved into a settings file
(extension ".ars").
Save: Saves the current settings into a given settings file (extension ".ars").

6.3.8.1 General Settings

Project file [r]: The name of the project file containing the data to be analyzed
(it usually has the ".arp" extension).
Result files: The HTML file containing the results of the analyses generated by
Arlequin (it has the same name as the project file, but the ".htm" extension).
Polymorphism control:
Allowed missing level per site [f]: Specify the fraction of missing data
allowed for any locus to be taken into account in the analyses. For instance, a
level of 0.05 means that a locus with more than 5% of missing data will not
be considered in any analysis. This option is especially useful when dealing
with DNA data where different individuals have been sequenced for slightly
different fragments. Setting a level of zero will force the analysis to consider
only those sites that have been sequenced in all individuals. Alternatively,
choosing a level of one means that all sites will be considered in the analyses,
even if they have not been sequenced in any individual (not a very smart
choice, however).
Transversion weight [f]: The weight given to transversions when comparing
DNA sequences.
Transition weight [f]: The weight given to transitions when comparing DNA
sequences.
Deletion weight [f]: The weight given to deletions when comparing DNA or
RFLP sequences.
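
As a rough illustration of how the polymorphism control settings interact, the Python sketch below first discards sites whose fraction of missing data exceeds the allowed level, and then counts weighted differences between two aligned DNA sequences. The function names, the use of "-" to code a deletion, and the way the three weights are combined are assumptions made for this example only; they are not taken from the Arlequin source code.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def usable_sites(seqs, allowed_missing, missing="?"):
    """Indices of sites whose fraction of missing data is <= allowed_missing."""
    n = len(seqs)
    keep = []
    for site in range(len(seqs[0])):
        n_missing = sum(1 for s in seqs if s[site] == missing)
        if n_missing / n <= allowed_missing:
            keep.append(site)
    return keep

def weighted_differences(a, b, sites, ts_weight=1.0, tv_weight=1.0,
                         del_weight=1.0, missing="?", gap="-"):
    """Weighted number of differences between two aligned sequences (assumed scheme)."""
    total = 0.0
    for i in sites:
        x, y = a[i], b[i]
        if x == y or missing in (x, y):
            continue                      # identical or not comparable
        if gap in (x, y):
            total += del_weight           # deletion against a nucleotide
        elif {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            total += ts_weight            # transition (A<->G or C<->T)
        else:
            total += tv_weight            # transversion
    return total

# Hypothetical aligned sample; site 4 is missing in 2 of 4 sequences.
seqs = ["ACGT?A", "ACAT-A", "GCGTTA", "ACGT?C"]
sites = usable_sites(seqs, allowed_missing=0.05)
print(sites)
print(weighted_differences(seqs[0], seqs[3], sites, ts_weight=1.0, tv_weight=2.0))
```

With an allowed missing level of 0.05, site 4 is dropped because half of the sequences are missing at that position; with a level of one, it would be kept.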
Haplotype definition
Use original definition [m]: Haplotypes are identified according to their
original identifier, without considering the fact that their molecular definition
could be identical.
Infer from distance matrix [m]: Similar haplotypes will be identified by computing
a distance matrix based on the settings chosen above. When this option is activated,
a search for shared haplotypes is automatically performed at the beginning of
each run, and new haplotype definitions and frequencies are
computed for each population.

6.3.8.2 Diversity indices

Standard diversity indices [b]: Compute several common indices of diversity,
like the number of alleles, the number of segregating loci, the heterozygosity
level, etc. (see section 8.1.1).
Output sample allele frequencies for all loci [b]: Output allele frequencies
at all loci for all populations. Creates a separate file for each locus in the
result directory. File names are "AllFreqLocus_XXX.txt", where XXX is the
locus number. On each row, the frequencies of an allele are listed for all
sampled populations. The names of the populations are listed in a separate
file, called "PopNames.txt".
Molecular diversity indices [b]: Check box for computing several indices of
diversity at the molecular level.
Compute minimum spanning tree among haplotypes [b]: Computes a
minimum spanning tree and a minimum spanning network among the
haplotypes found in each population sample (see section 8.1.2.9). This option
is only valid for haplotypic data.
Molecular distance [l]: Choose the type of distance used when comparing
haplotypes (see section 8.1.2 and below).
o Gamma a value [f]: Set the value for the shape parameter of the
gamma function, when selecting a distance allowing for unequal mutation
rates among sites. This option is only valid for some distances computed
between DNA sequences. Note that a value of zero deactivates here the
Gamma correction of these distances, whereas in reality, a value of
infinity would deactivate the Gamma correction procedure. This option is
only valid for DNA data.
Print distance matrix between haplotypes [b]: If checked, the
inter-haplotypic distance matrix used to evaluate the molecular diversity is printed
in the result file.
Theta(Hom) [b]: An estimation of θ obtained from the observed
homozygosity H (see section 8.1.2.3.1).
Theta(S) [b]: An estimation of θ obtained from the observed number of
segregating sites S (see section 8.1.2.3.2).
Theta(k) [b]: An estimation of θ obtained from the observed number of
alleles k (see section 8.1.2.3.3).
Theta(π) [b]: An estimation of θ obtained from the mean number of
pairwise differences π (see section 8.1.2.3.4).
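
To make two of these estimators concrete, here is a minimal Python sketch that computes Theta(S) and Theta(π) from a small set of aligned haplotypes. It uses the usual textbook forms (S divided by the sum of 1/i for i from 1 to n-1, and the mean number of pairwise differences); the exact formulas used by Arlequin are those given in section 8.1.2.3, and the function names and sample data are hypothetical.

```python
from itertools import combinations

def theta_s(seqs):
    """Theta estimated from the number of segregating sites S (textbook form)."""
    n = len(seqs)
    # A site is segregating if more than one state is observed in the sample.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    a_n = sum(1.0 / i for i in range(1, n))
    return s / a_n

def theta_pi(seqs):
    """Theta estimated as the mean number of pairwise differences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / len(pairs)

# Hypothetical sample of four aligned haplotypes.
sample = ["AAAA", "AAAT", "AATT", "ATTT"]
print(theta_s(sample))   # 3 segregating sites divided by (1 + 1/2 + 1/3)
print(theta_pi(sample))  # 10 differences over 6 pairs
```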
6.3.8.3 Mismatch distribution
Compute the distribution of the observed number of differences between pairs of
haplotypes in the sample (see section 8.1.2.4). It also estimates parameters of a sudden
demographic (or spatial) expansion using a generalized least-square approach, as
described in Schneider and Excoffier (1999) (see section 8.1.2.4).
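
The observed mismatch distribution itself is simple to tabulate. The following Python sketch counts, for every pair of haplotypes in a sample, the number of sites at which they differ. It only illustrates the observed distribution; the least-square fitting of expansion parameters is not attempted here, and the function name and data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def mismatch_distribution(seqs):
    """Return a list where entry d is the number of haplotype pairs differing at d sites."""
    counts = Counter(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)
    )
    max_d = max(counts)
    return [counts.get(d, 0) for d in range(max_d + 1)]

# Hypothetical sample of four aligned haplotypes.
sample = ["AAAA", "AAAT", "AATT", "ATTT"]
print(mismatch_distribution(sample))  # [0, 3, 2, 1]
```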

Estimate parameters of demographic expansion [b]: The parameters of an
instantaneous demographic expansion are estimated from the mismatch
distribution (see section 8.1.2.4) using a generalized least-square approach, as
described in Schneider and Excoffier (1999) (see section 8.1.2.4.1).
Estimate parameters of spatial expansion [b]: Estimate the specific
parameters of spatial expansion, following Excoffier (2004) (see section
8.1.2.4.2).
Molecular distance [l]: Here we only allow one genetic distance: the mere
number of observed differences between haplotypes.
Number of bootstrap replicates [l]: The number of coalescent simulations
performed using the estimated parameters of the demographic or spatial
expansion. These parameters will be re-estimated for each simulation in order to
obtain their empirical confidence intervals, and the empirical distribution of the
output statistics such as the sum of squared deviations between the observed
and the expected mismatch, the raggedness index, or percentile values for each
point of the expected mismatch (see section 8.1.2.4). Hundreds to thousands of
simulations are necessary to obtain meaningful estimates.
6.3.8.4 Haplotype inference
Depending on the data type, different methods are used to estimate the haplotype
frequencies.
6.3.8.4.1 Haplotypic data, or genotypic (diploid) data with known gametic phase

Search for shared haplotypes [b]: Look for haplotypes that are effectively
similar after computing pairwise genetic distances according to the distance
calculation settings in the General Settings section. For each pair of populations,
the shared haplotypes will be printed out, followed by a table that contains,
for every group of identified haplotypes, its absolute and relative frequency in
each population. This task is only possible for haplotypic data or genotypic data
with known gametic phase.
Haplotype definition:
Use original definition [m]: Haplotypes are identified according to their
original identifier, without considering the fact that their molecular definition
could be identical.
Infer from distance matrix [m]: Similar haplotypes will be identified by
computing a molecular distance matrix between haplotypes.
Haplotype frequency estimation:
Estimate haplotype frequencies by mere counting [b]: Estimate the
maximum-likelihood haplotype frequencies from the observed data using a
mere gene counting procedure.
Estimate allele frequencies at all loci: Estimate allele frequencies at all loci
separately.
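
When haplotypes are directly observed, the gene counting estimate is just the relative frequency of each haplotype. A minimal Python sketch, assuming phase-known diploid genotypes are given as pairs of haplotype identifiers (a hypothetical representation, not the Arlequin project format):

```python
from collections import Counter

def haplotype_frequencies(genotypes):
    """Gene counting: relative frequency of each haplotype among all gene copies."""
    copies = [hap for genotype in genotypes for hap in genotype]
    counts = Counter(copies)
    return {hap: c / len(copies) for hap, c in counts.items()}

# Hypothetical sample of three diploid individuals with known gametic phase.
genotypes = [("AB", "AB"), ("AB", "ab"), ("Ab", "ab")]
print(haplotype_frequencies(genotypes))
```

For haplotypic data, each individual would simply contribute a single haplotype instead of a pair.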
6.3.8.4.2 Genotypic data with unknown gametic phase
When gametic phase is unknown, two methods can be used to infer haplotypes: the
(maximum-likelihood) EM algorithm or the (Bayesian) ELB algorithm.


6.3.8.4.2.1 Settings for the ELB algorithm
The ELB algorithm has been described in Excoffier et al. (2003).

Use ELB algorithm to estimate gametic phase [b]: Check this box if you want to
estimate the gametic phase of multi-locus genotypes with the ELB algorithm. See the
methodological section on the ELB algorithm (8.1.3.2.3) for a description of the
algorithm.
Dirichlet prior alpha value [f]: Value of the alpha parameter of the prior
Dirichlet distribution of haplotype frequencies. Recommended value: a small
value like 0.01 for all data types has been found to work well (Excoffier et al.
2003) (see section 8.1.3.2.3 for details).
Epsilon value [f]: Value of the parameter controlling how much haplotypes
differing by a single mutation from potentially present haplotypes are weighted.
Recommended values: 0.1 for microsatellite data, and 0.01 for other data types
(see section 8.1.3.2.3 for details).
Heterozygote site influence zone [i]: Defines the number of sites adjacent to
heterozygote sites that need to be taken into account when computing haplotype
frequencies in the Gibbs chain. A value of zero implies that gametic phase will be
estimated only on the basis of heterozygote sites. A negative value indicates
that all sites (homozygotes and heterozygotes) will be used. This parameter is
mostly useful for inferring the gametic phase of DNA sequences where there are only
a few heterozygote sites among long stretches of homozygous sites (see
section 8.1.3.2.3 for details).
Gamma value [f]: This parameter prevents the adaptive windows in which gametic
phase is estimated from growing too much. It can be set to zero for microsatellite
data, and to a small value for other data sets, like 0.01 (see section 8.1.3.2.3
for details).
Sampling interval [i]: The number of steps in the Gibbs chain between two
consecutive samples of gametic phases.
Number of samples [i]: The number of samples of gametic
phases one wants to draw in the Gibbs chain to get the posterior distribution of
gametic phases (and haplotype frequencies) for each individual (see section
8.1.3.2.3 for details).
Burnin steps [i]: The number of steps to perform in the Gibbs chain before
sampling gametic phases. The total number of steps in the chain will thus be:
Burnin steps + (Number of samples × Sampling interval) (see section 8.1.3.2.3
for details).
Recombination steps [i]: The proportion of steps in the Gibbs chain
consisting in implementing a pseudo-recombination phase update instead of a
simple phase switch (corresponding to a double recombination around a
heterozygous site) (see section 8.1.3.2.3 for details).
Output phase distribution files [b]: Controls whether Arlequin outputs
files with the gametic phase of each sample in the Gibbs chain. The Arlequin files
(as many as the Number of samples defined above) are written in a
subdirectory of the result directory called PhaseDistribution. They have the name
ELB_EstimatedPhase#<Sample number>.arp. Arlequin also outputs a file called
ELB_Best_Phases.arp containing for each individual the gametic phases
estimated with the ELB algorithm, as well as the batch file ELB_PhaseDistribution.arb
listing all aforementioned project files.
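
The relationship between burn-in, number of samples and sampling interval can be checked with a few lines of Python. The settings used below are arbitrary illustration values, not Arlequin defaults, and the function names are hypothetical.

```python
def elb_total_steps(burnin_steps, n_samples, sampling_interval):
    """Total length of the Gibbs chain: burn-in plus all sampling intervals."""
    return burnin_steps + n_samples * sampling_interval

def sampled_steps(burnin_steps, n_samples, sampling_interval):
    """Step indices at which a sample of gametic phases would be drawn."""
    return [burnin_steps + k * sampling_interval for k in range(1, n_samples + 1)]

# Arbitrary illustration values (not Arlequin defaults).
print(elb_total_steps(100000, 2000, 500))   # 1100000
print(sampled_steps(100000, 3, 500))        # [100500, 101000, 101500]
```

Doubling the sampling interval therefore roughly doubles the computing time once the burn-in is small compared with the sampling phase.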

6.3.8.4.2.2 Settings for the EM algorithm

Use EM algorithm to estimate ML haplotype frequencies [b]: We estimate the
maximum-likelihood (ML) haplotype frequencies from the observed data using an
Expectation-Maximization (EM) algorithm for multi-locus genotypic data when the
gametic phase is not known, or when recessive alleles are present (see section
8.1.3.2).
Perform EM algorithm at the:
Haplotype level [m]: Estimate haplotype frequencies for haplotypes defined by
alleles at all loci.
Locus level [m]: Estimate allele frequencies for each locus.
Haplotype and locus levels [m]: The two previous options are performed one
after the other.
Epsilon value [l]: Threshold for stopping the EM algorithm. After each iteration,
Arlequin checks if the current haplotype frequencies are different from those at the
previous iteration. If the sum of differences is smaller than epsilon, the algorithm
stops.
Significant digits for output [l]: Precision required for output of haplotype
frequencies. Haplotypes having a zero frequency given the required precision are not
output in the result file.
Number of starting points for EM algorithm [i]: Set the number of random
initial conditions from which the EM algorithm is started to repeatedly estimate
haplotype frequencies. The haplotype frequencies globally maximizing the likelihood
of the sample are the ones kept in the end. Figures of 50 or more are usually in order.
Maximum no. of iterations [i]: Set the maximum number of iterations allowed in
the EM algorithm. The iterative process will have at most this number of iterations,
but may stop earlier if convergence has been reached. Here, convergence is
reached when the sum of the differences between haplotype frequencies at
two successive iterations is smaller than the epsilon value defined above.
Use Zipper version of EM [b]: Use the zipper version of the EM algorithm,
which consists in building haplotypes progressively by adding one locus at a time (see
section 8.1.3.2.2).
No. of loci orders [l]: Defines how many random loci orders should be used in
the zipper version of the EM algorithm. The haplotype frequencies
obtained for the locus order leading to the best likelihood are shown in the result
file.
Recessive data [b]: Specify whether a recessive allele is present. This option
applies to all loci. The code for the recessive allele can be specified in the project
file (see section 4.2.1).
Estimate standard deviation through bootstrap [b]: Uses a bootstrap
approach to estimate the standard deviation of haplotype frequencies.
No. of bootstrap to perform [i]: Set the number of parametric bootstrap
replicates of the EM estimation process on random samples generated from a
fictive population having haplotype frequencies equal to previously estimated ML
frequencies. This procedure is used to generate the standard deviation of
haplotype frequencies. When set to zero, the standard deviations are not
estimated.
No. of starting points for s.d. estimation [i]: Set the number of initial
conditions for the bootstrap procedure. It may be smaller than the number of
initial conditions set when estimating the haplotype frequencies, because the
bootstrap replicates are quite time-consuming. Setting this number to small
values is conservative, in the sense that it usually inflates the standard
deviations.
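
The roles of the epsilon value, the maximum number of iterations and the number of starting points can be seen in the following Python sketch of a basic EM estimation of haplotype frequencies from unphased genotypes. It is a simplified illustration under stated assumptions: genotypes are tuples of allele pairs (one pair per locus), all compatible haplotype pairs are enumerated explicitly, and neither the zipper version, recessive alleles, nor the bootstrap are handled. All names are hypothetical and this is not Arlequin's implementation.

```python
import math
import random
from itertools import product

def compatible_pairs(genotype):
    """Distinct unordered haplotype pairs compatible with an unphased genotype."""
    pairs = set()
    for flips in product((False, True), repeat=len(genotype)):
        h1 = tuple(b if f else a for (a, b), f in zip(genotype, flips))
        h2 = tuple(a if f else b for (a, b), f in zip(genotype, flips))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

def pair_probability(h1, h2, freq):
    """Probability of an unordered haplotype pair under random union of gametes."""
    return (1 if h1 == h2 else 2) * freq[h1] * freq[h2]

def em_once(resolutions, haps, epsilon, max_iter, rng):
    """One EM run from a random starting point; returns (frequencies, log-likelihood)."""
    weights = [rng.random() + 0.01 for _ in haps]
    freq = {h: w / sum(weights) for h, w in zip(haps, weights)}
    for _ in range(max_iter):
        counts = dict.fromkeys(haps, 0.0)
        for res in resolutions:
            # E-step: posterior weight of each compatible haplotype pair.
            probs = [pair_probability(h1, h2, freq) for h1, h2 in res]
            norm = sum(probs)
            for (h1, h2), p in zip(res, probs):
                counts[h1] += p / norm
                counts[h2] += p / norm
        # M-step: expected gene counts become the new frequencies.
        new_freq = {h: c / (2 * len(resolutions)) for h, c in counts.items()}
        change = sum(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if change < epsilon:      # convergence criterion described above
            break
    loglik = sum(
        math.log(sum(pair_probability(h1, h2, freq) for h1, h2 in res))
        for res in resolutions
    )
    return freq, loglik

def em_haplotype_frequencies(genotypes, n_starts=50, epsilon=1e-7,
                             max_iter=1000, seed=0):
    """Keep the run with the highest likelihood among n_starts random starts."""
    rng = random.Random(seed)
    resolutions = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for res in resolutions for pair in res for h in pair})
    runs = [em_once(resolutions, haps, epsilon, max_iter, rng) for _ in range(n_starts)]
    return max(runs, key=lambda run: run[1])

# Hypothetical two-locus sample: 4 AB/AB, 4 ab/ab and 2 double heterozygotes.
genotypes = ([(("A", "A"), ("B", "B"))] * 4 + [(("a", "a"), ("b", "b"))] * 4
             + [(("A", "a"), ("B", "b"))] * 2)
freq, loglik = em_haplotype_frequencies(genotypes, n_starts=5)
print(freq, loglik)
```

In this small example only the double heterozygotes are ambiguous, and the EM runs resolve them as AB/ab because these two haplotypes are already frequent in the unambiguous individuals.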


6.3.8.5 Linkage disequilibrium
6.3.8.5.1 Linkage disequilibrium between pairs of loci
6.3.8.5.1.1 Gametic phase known

Linkage disequilibrium between all pairs of loci [b]: Test for the presence of
significant association between pairs of loci, based on an exact test of linkage
disequilibrium. This test can be done with all data types except the FREQUENCY data
type. The number of loci can be arbitrary, but if there are fewer than two polymorphic
loci, there is no point in performing this test. The test procedure is analogous to
Fisher's exact test on a two-by-two contingency table but extended to a contingency
table of arbitrary size (see section 8.1.4.1).
No. of steps in Markov chain [i]: The maximum number of alternative tables to
explore. Figures of 100,000 or more are in order. Larger values will lead to a
better precision of the P-value as well as of its estimated standard deviation.
No. of dememorization steps [i]: The number of steps to perform before
beginning to compare the alternative table probabilities to that of the observed
table. It corresponds to a burn-in. A few thousand steps are necessary to reach a
random starting point corresponding to a table independent from the observed
table.
LD coefficients between pairs of alleles at different loci
Compute D, D' and $r^2$ coefficients [b] (between all pairs of alleles at different
loci):
See section 8.1.4.3
1) D: The classical linkage disequilibrium coefficient measuring deviation from
random association between alleles at different loci (Lewontin and Kojima,
1960), expressed as $D_{ij} = p_{ij} - p_i p_j$.
2) D': The linkage disequilibrium coefficient D standardized by the maximum
value it can take ($D_{\max}$), given the allele frequencies (Lewontin 1964).
3) $r^2$: Another way to standardize the simple measure of linkage disequilibrium
D, as $r^2 = \frac{D^2}{p_i (1 - p_i) \, p_j (1 - p_j)}$.
o Generate histogram and table [b]: Generates a histogram of the number
of loci with which each locus is in disequilibrium, and an s by s table (s
being the number of polymorphic loci) summarizing the significant
associations between pairs of loci. This table is generated for different levels
of polymorphism, controlled by the value y: a locus is declared polymorphic
if there are at least 2 alleles with y copies in the sample (Slatkin, 1994a).
This is done because the exact test is more powerful at detecting departure
from equilibrium for higher values of y (Slatkin 1994a). The results are
output in a file called ld_dis.xl.
Significance level [f]: The level at which the test of linkage
disequilibrium is considered significant for the output table.
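
The three coefficients listed above can be computed from one haplotype frequency and two allele frequencies, as in the Python sketch below. The expression used for the maximum value of D (which depends on the sign of D) is the usual one and is an assumption here; section 8.1.4.3 gives the definitions used by Arlequin. The function name and the example frequencies are hypothetical.

```python
def ld_coefficients(p_ij, p_i, p_j):
    """Return (D, D', r^2) for allele i at one locus and allele j at another.

    p_ij: frequency of the haplotype carrying both alleles
    p_i, p_j: frequencies of the two alleles
    """
    d = p_ij - p_i * p_j
    # Largest absolute value D can take given the allele frequencies (assumed form).
    if d >= 0:
        d_max = min(p_i * (1 - p_j), (1 - p_i) * p_j)
    else:
        d_max = min(p_i * p_j, (1 - p_i) * (1 - p_j))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Hypothetical frequencies: both alleles at 0.5, joint haplotype at 0.4.
d, d_prime, r2 = ld_coefficients(0.4, 0.5, 0.5)
print(round(d, 4), round(d_prime, 4), round(r2, 4))  # 0.15 0.6 0.36
```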
6.3.8.5.1.2 Gametic phase unknown
When the gametic phase is not known, we use a different procedure for testing the
significance of the association between pairs of loci (see section 8.1.4.2). It is based on
a likelihood ratio test, where the likelihood of the sample evaluated under the
hypothesis of no association between loci (linkage equilibrium) is compared to the
likelihood of the sample when association is allowed (see Slatkin and Excoffier, 1996).
The significance of the observed likelihood ratio is found by computing the null
distribution of this ratio under the hypothesis of linkage equilibrium, using a
permutation procedure.

Linkage disequilibrium between all pairs of loci [b]: Perform the likelihood-ratio
test (see section 8.1.4.2).
No. of permutations [i]: Number of random permuted samples to generate.
Figures of several thousands are in order, and 16,000 permutations guarantee
less than 1% difference with the exact probability in 99% of the cases (Guo
and Thomson, 1992). A standard error for the estimated P-value is estimated
using a system of batches (Guo and Thomson, 1992).
No. of initial conditions for EM [i]: Sets the number of random initial conditions
from which the EM is started to repeatedly estimate the sample likelihood. The
haplotype frequencies globally maximizing the sample likelihood are the ones kept
in the end. Figures of 3 or more are in order.
Generate histogram and table [b]: Generates a histogram of the number of
loci with which each locus is in disequilibrium, and an s by s table (s being the
number of polymorphic loci) summarizing the significant associations between
pairs of loci. This table is generated for different levels of polymorphism, controlled
by the value y: a locus is declared polymorphic if there are at least 2 alleles with y
copies in the sample (Slatkin, 1994a). This is done because the exact test is more
powerful at detecting departure from equilibrium for higher values of y (see
Slatkin 1994a). The results are output in a file called ld_dis.xl.
Significance level [f]: The level at which the test of linkage disequilibrium is
considered significant for the output table.
6.3.8.5. 2 Har dy - Wei nber g equ i l i br i um

Per f or m ex act t est of Har dy - Wei nber g equ i l i br i um [ b] : Test of t he hypot hesi s
t hat t he obser ved di pl oid genot ypes ar e t he pr oduct of a random uni on of gamet es.
Thi s t est i s onl y possi ble for genot ypi c dat a. Separat e t est s ar e car ri ed out at each
l ocus.
Thi s t est i s anal ogous t o Fi shers exact t est on a t wo- by- t wo cont i ngency t abl e but
ext ended t o a cont i ngency t abl e of arbi t rar y si ze ( see sect i on 8.1.5) . I f t he gamet i c
phase i s unknown t he t est i s onl y possi bl e l ocus by l ocus. For dat a wi t h known
gamet i c phase, i t i s al so possi bl e t o t est t he associ at i on at t he hapl ot ypic l evel
wi t hin i ndi vi dual s.

Manual Arlequin ver 3.5 Arlequin int erface 82

No. of steps in Markov chain [i]: The maximum number of alternative
tables to explore. Figures of 100,000 or more are in order.
No. of dememorisation steps [i]: The number of steps to perform before
beginning to compare the alternative table probabilities to that of the observed
table. A few thousand steps are necessary to reach a random starting point
corresponding to a table independent from the observed table.
HWE test type
o Locus by locus [m]: Perform a separate HWE test for each locus.
o Whole haplotype [m]: Perform a HWE test at the haplotype level (if
gametic phase is available).
o Locus by locus and whole haplotype [m]: Perform both kinds of tests
(if gametic phase is available).
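
To make the null hypothesis of this section concrete, the short sketch below computes the genotype counts expected at one locus under a random union of gametes. It is only an illustration and is not the exact test performed by Arlequin (no Markov chain is involved). The function name and the input layout (a list of allele pairs) are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_expected_counts(genotypes):
    """Expected genotype counts at one locus under random union of gametes.

    genotypes: list of (allele, allele) tuples, one per diploid individual
    (hypothetical input layout chosen for this sketch).
    Returns a dict mapping sorted allele pairs to expected counts.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_genes = 2 * n
    freq = {a: c / total_genes for a, c in allele_counts.items()}

    expected = {}
    for a, b in combinations_with_replacement(sorted(freq), 2):
        # Homozygotes: p^2, heterozygotes: 2pq
        prob = freq[a] * freq[b] if a == b else 2 * freq[a] * freq[b]
        expected[(a, b)] = n * prob
    return expected
```

Comparing these expected counts with the observed ones gives a first impression of a heterozygote deficit or excess before running the exact test.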
6.3.8.6 Neutrality tests

Tests of selective neutrality, based either on the infinite-allele model or on the
infinite-site model (see section 8.1.6).

Infinite allele model
Ewens-Watterson neutrality test [b]: Performs tests of selective neutrality
based on Ewens' sampling theory in a population at equilibrium (Ewens 1972).
These tests are currently limited to sample sizes of 2000 genes or less and 1000
different alleles (haplotypes) or less.
Ewens-Watterson homozygosity test: This test, devised by Watterson (1978,
1986), is based on Ewens' sampling theory, but uses as a statistic the
quantity F equal to the sum of squared allele frequencies, equivalent to the
sample homozygosity in diploids (see section 8.1.6.1).
Exact test based on Ewens' sampling theory: In this test, devised by Slatkin
(1994b, 1996), the probability of the observed sample is compared to that of
a random neutral sample with the same number of alleles and identical size. The
probability of selective neutrality of the sample is obtained as the proportion of
random samples that are less probable than, or as probable as, the observed
sample.
No. of simulated samples [i]: Number of random samples to be generated
for the two neutrality tests mentioned above. Values of several thousands are
in order, and 16,000 samples guarantee less than 1% difference
with the exact probability in 99% of the cases (see Guo and Thompson 1992).
Chakraborty's test of population amalgamation [b]: A test of selective
neutrality and population homogeneity and equilibrium (Chakraborty, 1990). This
test can be used when sample heterogeneity is suspected. It uses the observed
homozygosity to estimate the population mutation parameter θHom. The estimated
value of this parameter is then used to compute the probability of observing k
alleles or more in a neutral sample drawn from a stationary population. This test is
based on Chakraborty's observation that the observed homozygosity is not very
sensitive to population amalgamation or sample heterogeneity, whereas the
number of observed (low frequency) alleles is more affected by this phenomenon.
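
The statistic F used in the Ewens-Watterson homozygosity test, and the number of alleles k used in Chakraborty's test, can both be read directly from the allele counts of a sample. The following minimal sketch shows the arithmetic only; the function name and the input (a list of allele counts) are assumptions for illustration, and the sketch does not perform the tests themselves.

```python
def homozygosity_and_k(allele_counts):
    """Return (F, k) for a sample described by its allele counts.

    F is the sum of squared allele frequencies (sample homozygosity),
    k is the number of distinct alleles observed in the sample.
    allele_counts: list of positive integers, one per allele (assumed layout).
    """
    total = sum(allele_counts)
    f_stat = sum((count / total) ** 2 for count in allele_counts)
    k = len(allele_counts)
    return f_stat, k
```

For example, a sample of 10 genes with two alleles at equal frequency gives F = 0.5 and k = 2, while a monomorphic sample gives F = 1 and k = 1.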
Infinite site model
Tajima's D [b]: This test described by Tajima (1989a, 1989b, 1993) compares two
estimators of the population parameter θ, one being based on the number of
segregating sites in the sample, and the other being based on the mean number of
pairwise differences between haplotypes. Under the infinite-site model, both
estimators should estimate the same quantity, but differences can arise under
selection, population non-stationarity, or heterogeneity of mutation rates among
sites (see section 8.1.6.4).

Fu's FS [b]: This test described by Fu (1997) is based on the probability of
observing k or more alleles in a sample of a given size, conditioned on the observed
average number of pairwise differences. The distribution of the statistic is obtained
by simulating samples according to a given θ value taken as the average number of
pairwise differences. This test has been shown to be especially sensitive to
departure from population equilibrium as in case of a population expansion (see
section 8.1.6.4).
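
The comparison made by Tajima's D can be sketched as follows: one estimate of θ is obtained from the number of segregating sites, the other from the mean number of pairwise differences, and their difference is scaled by its estimated standard deviation. The code below uses the constants commonly given for Tajima's (1989) statistic; it is an illustrative sketch, not Arlequin's own code, and the function name and the input layout (aligned sequences of equal length, without missing data) are assumptions.

```python
from itertools import combinations
from math import sqrt


def tajima_d(sequences):
    """Return (theta_S, theta_pi, D) for a list of aligned sequences.

    D is returned as None when it is undefined (no segregating site or
    zero estimated variance, e.g. for very small samples).
    """
    n = len(sequences)
    # Number of segregating sites: columns with more than one state
    s = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    # Mean number of pairwise differences between sequences
    pairs = list(combinations(sequences, 2))
    theta_pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_s = s / a1

    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if s == 0 or variance <= 0:
        return theta_s, theta_pi, None
    return theta_s, theta_pi, (theta_pi - theta_s) / sqrt(variance)
```

A negative D indicates an excess of low-frequency variants relative to the neutral equilibrium expectation, and a positive D an excess of intermediate-frequency variants.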
Haplotype definition
The way haplotypes are defined is important here since some tests are based on
the number of alleles in the samples, and therefore it is better to re-evaluate this
quantity before doing these tests (Chakraborty's test, Ewens-Watterson, and Fu's
Fs).
Use original definition [m]: Haplotypes are identified according to their
original identifier, without considering the fact that their molecular definition
could be identical.
Infer from distance matrix [m]:
Similar haplotypes will be identified by computing a distance matrix based on the
settings chosen above. When this option is activated, a search for shared
haplotypes is automatically performed at the beginning of each run, and
new haplotype definitions and frequencies are computed for each
population.

6.3.8.7 Genetic structure
6.3.8.7.1 AMOVA
6.3.8.7.1.1 AMOVA with haplotypic data

Standard AMOVA [b]: Analysis of MOlecular VAriance framework and
computation of a Minimum Spanning Network among haplotypes. Estimates genetic
structure indices using information on the allelic content of haplotypes, as well as
their frequencies (Excoffier et al. 1992). The information on the differences in
allelic content between haplotypes is entered as a matrix of Euclidean squared
distances. The significance of the covariance components associated with the
different possible levels of genetic structure (within individuals, within populations,
within groups of populations, among groups) is tested using non-parametric
permutation procedures (Excoffier et al. 1992). The type of permutations is
different for each covariance component (see section 8.2).
The minimum spanning tree and network are computed among all haplotypes
defined in the samples included in the genetic structure to test (see section 8.2.2).
The number of hierarchical levels of the variance analysis and the kind of
permutations that are done depend on the kind of data, the genetic structure that
is tested, and the options the user might choose. All details are given in section
8.2.
Locus by locus AMOVA [b]: A separate AMOVA can be performed for each locus.
For this purpose, we use the same number of permutations as in the
global AMOVA. This procedure should be favored when there is some
missing data. Note that diploid individuals that are found with missing data for
one of their two alleles at a given locus are removed from the analysis for that
locus.
Compute Population Specific FST's [b]: Population specific FST indices will
be computed for all loci, and for each locus separately if the Locus by locus AMOVA
option is checked. Note that this option is only available if a single group is defined in
the [[Structure]] section. No test of these coefficients is performed as they
are only provided for exploratory purposes.
No. of permutations [i]: Enter the number of permutations used to test the
significance of covariance components and fixation indices. With a value of zero,
no testing procedure is performed. Values of several thousands are in order for
a proper testing scheme, and 16,000 permutations guarantee less
than 1% difference with the exact probability in 99% of the cases (Guo and
Thompson 1992).
The number of permutations used by the program might be slightly larger.
This is the consequence of the subdivision of the total number of permutations into
batches for estimating the standard error of the P-value.
Note that if several covariance components need to be tested, the probability
of each covariance component will be estimated with this number of
permutations. The distribution of the covariance components is output into a
tabulated text file called amo_hist.xl, which can be directly read into
MS-EXCEL.
Compute Minimum Spanning Network (MSN) among haplotypes: A
Minimum Spanning Tree and a Minimum Spanning Network are computed
from the distance matrix used to perform the AMOVA calculations.
Choice of Euclidean square distances [m]:
o Use project distance matrix [m]: Use the distance matrix defined in the
project file (if available).
o Compute distance matrix [m]: Compute a given distance matrix based
on a method defined below. With this setting selected, the distance matrix
potentially defined in the project file will be ignored. This matrix can be
generated either for haplotypic data or genotypic data (Michalakis and
Excoffier, 1996).
o Use conventional F-statistics [m]: With this setting activated, we will
use a lower diagonal distance matrix, with zeroes on the diagonal and ones
as off-diagonal elements. It means that all distances between non-identical
haplotypes will be considered as identical, implying that one will base the
analysis of genetic structure only on allele frequencies.
Distance between haplotypes [m]: Select a distance method to compute
the distances between haplotypes. Different square Euclidean distances can
be used depending on the type of data analyzed.
o Gamma a value [f]: Set the value for the shape parameter a of the
gamma function, when selecting a distance allowing for unequal mutation
rates among sites. See the Molecular diversity section 6.3.8.2.
Print distance matrix [b]: If checked, the inter-haplotypic distance matrix
used to evaluate the molecular diversity is printed in the result file.
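
The difference between a computed distance matrix and the conventional F-statistics setting can be illustrated with a small sketch. The number of pairwise differences is used here only as one example of a computed distance; the function names and the input (a list of distinct haplotypes given as strings of equal length) are assumptions for illustration. Full symmetric matrices are returned for readability, whereas the text above refers to a lower diagonal matrix.

```python
def pairwise_difference_matrix(haplotypes):
    """Distance = number of positions at which two haplotypes differ."""
    return [
        [sum(x != y for x, y in zip(a, b)) for b in haplotypes]
        for a in haplotypes
    ]


def conventional_f_matrix(haplotypes):
    """Zeroes on the diagonal and ones elsewhere: all non-identical
    haplotypes are treated as equally distant."""
    size = len(haplotypes)
    return [[0 if i == j else 1 for j in range(size)] for i in range(size)]
```

With haplotypes AAT, ATT and TTT, the first matrix distinguishes distances of 1 and 2, while the second one only records whether two haplotypes are different, so that the analysis depends on haplotype frequencies alone.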
6.3.8.7.1.2 AMOVA with genotypic data

Compared to haplotypic data, it becomes possible to compute the average inbreeding
coefficient FIS with diploid genotypic data.
Include individual level for genotype data [b]: Include the intra-individual
covariance component of genetic diversity, and its associated inbreeding
coefficients (FIS and FIT). It thus takes into account the differences between
genes found within individuals. This is another way to test for global departure
from Hardy-Weinberg equilibrium.
Compute population specific FIS's [b]: Compute inbreeding coefficients (FIS)
separately for each population and test them by permutation of gene copies between
individuals within population. The checkbox Include individual level must be
checked to enable this option.
6.3.8.7.2 Detection of loci under selection

Detecting loci under selection from genetic structure analysis [b]: Uses
coalescent simulations to get the p-values of locus-specific F-statistics conditioned
on observed levels of heterozygosities. See Excoffier et al. (2009) and section
8.2.8 for methodological details about these computations.

Use hierarchical island model [b]: If checked, Arlequin uses a hierarchical
island model (Slatkin and Voelm, 1991) to perform coalescent simulations
leading to the joint null distributions of hierarchical F-statistics (FSC, FCT, and
FST) and heterozygosities, from which locus-specific p-values are estimated.
The hierarchical population structure used is that defined in the [Structure]
section of your Arlequin input file. Note that a hierarchical structure can only
be used if more than one group is defined in the Arlequin genetic structure. If
unchecked, a non-hierarchical finite island model is used instead. In that case,
all populations found in different groups of the [Structure] section are pooled
into a single group for this analysis. Populations present in the Arlequin input
file that are not listed as belonging to a group in the [Structure] section are
discarded from the analysis.
o Number of simulations [i]: Number of coalescent simulations to
perform to get the null distribution of F-statistics. A large number
(10,000-50,000) is needed to get correct estimates.
o Number of demes to simulate [i]: Number of simulated demes per
group. If no hierarchical structure is assumed, this is the total number
of simulated demes. A value of 100 is adequate in most situations.
o Number of groups to simulate [i]: Number of simulated groups in
the hierarchical genetic structure. This number should be equal to or
larger than the number of groups defined in the Arlequin genetic
structure. If this number is smaller than the number of defined
groups, then simulations are done with the number of groups in the
Arlequin structure. A value larger than the number of groups in the
structure is in order, as this does not have too much influence on the
resulting p-values (see Excoffier et al. 2009).
o Min. exp. Heterozygosity [f]: Minimum value of simulated target
heterozygosity. This setting combined with the next one can be useful
to make simulations around the heterozygosities of the tested loci.
This is because the locus-specific p-values are computed by using
simulations that have matching heterozygosities. It is therefore
useless to simulate very low heterozygosities if all tested loci have
high heterozygosities, because these simulations won't be used to
compute p-values.
o Max. exp. Heterozygosity [f]: Maximum value of simulated target
heterozygosity. See above for an explanation of the usefulness of
this setting.
o Distance method for AMOVA computations [m]: Select a distance
method to compute the distances between haplotypes. Different
squared Euclidean distances can be used depending on the type of
data analyzed. See sections 0 for DNA, 8.1.2.6 for RFLP, 8.1.2.7 for
microsatellite, and 8.1.2.8 for standard data.
o Min DAF frequency [f]: For DNA sequences only. One can specify a
minimum value for the simulated Derived Allele Frequencies (DAF) of
individual SNPs that are simulated. It can be useful if all observed
SNPs have some minimum frequency, e.g. due to some ascertainment
bias procedure.


6.3.8.7.3 Population comparison

Compute pairwise FST [b]: Computes pairwise FSTs for all pairs of populations,
as well as different indexes of dissimilarities (genetic distances) between pairs of
populations, like transformed pairwise FSTs that can be used as short term
genetic distances between populations (Reynolds et al. 1983; Slatkin, 1995), but
also Nei's mean number of pairwise differences within and between pairs of
populations.
The significance of the genetic distances is tested by permuting the haplotypes or
individuals between the populations. See section 8.2.3 for more details on the
output results (genetic distances and migration rates estimates between
populations).
Slatkin's distance [b]: Computes Slatkin's (1995) genetic distance derived
from pairwise FST (see section 8.2.4.2).
Reynolds's distance [b]: Computes Reynolds et al. (1983) linearized FST
for short divergence time (see section 8.2.4.1).

Compute pairwise differences [b]: Computes Nei's average number of
pairwise differences within and between populations (Nei and Li, 1979) (see
section 8.2.4.4).
o Estimate relative population sizes [b]: Computes relative population
sizes for all pairs of populations, as well as divergence times between
populations taking into account these potential differences between
population sizes (Gaggiotti and Excoffier 2000) (see section 8.2.4.5).
Compute pairwise (δμ)² [b]: For microsatellite data only. Computes a genetic
distance between all pairs of populations that should be linearly related to
divergence time (see section 8.2.4.5).
No. of permutations [i]: Enter the required number of permutations to test
the significance of the derived genetic distances. If this number is set to zero,
no testing procedure will be performed. Note that this procedure is quite time
consuming when the number of populations is large.
Significance level [f]: The level at which the test of differentiation is
considered significant for the output table. If the P-value is smaller than the
Significance level, then the two populations are considered as significantly
different.
Choice of Euclidean distance [m]: Select a distance method to compute the
distances between haplotypes. Different square Euclidean distances can be used
depending on the type of data analyzed.
o Use project distance matrix [m]: Use the distance matrix defined in
the project file (if available).
o Compute distance matrix [m]: Compute a given distance matrix based
on a method defined below. With this setting selected, the distance matrix
potentially defined in the project file will be ignored. This matrix can be
generated either for haplotypic data or genotypic data (Michalakis and
Excoffier, 1996).
o Gamma a value [f]: Set the value for the shape parameter a of the
gamma function, when selecting a distance allowing for unequal
mutation rates among sites. See the Molecular diversity section 0.
This parameter only applies to DNA data.
o Use conventional F-statistics [m]: With this setting activated, we will
use a lower diagonal distance matrix, with zeroes on the diagonal and
ones as off-diagonal elements. It means that all distances between
non-identical haplotypes will be considered as identical, implying that one will
base the analysis of genetic structure only on allele frequencies.
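
As an illustration of how pairwise FST values are transformed into the distances listed in this section, the sketch below applies the forms commonly given for Slatkin's linearized FST, FST / (1 - FST), and for Reynolds' distance, -ln(1 - FST). These formulas are stated here as an assumption about the usual definitions; sections 8.2.4.1 and 8.2.4.2 remain the reference for what Arlequin computes. The function names are hypothetical.

```python
from math import log


def slatkin_distance(fst):
    """Slatkin's linearized FST, assumed form: FST / (1 - FST)."""
    return fst / (1 - fst)


def reynolds_distance(fst):
    """Reynolds' distance, assumed form: -ln(1 - FST)."""
    return -log(1 - fst)
```

Both transformations are close to FST itself when FST is small and grow without bound as FST approaches 1, which is why they are better suited than raw FST as measures related to divergence time.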

6.3.8.7.4 Population differentiation

Exact test of population differentiation [b]: We test the hypothesis of random
distribution of the individuals between pairs of populations as described in
Raymond and Rousset (1995) and Goudet et al. (1996). This test is analogous to
Fisher's exact test on a two-by-two contingency table, but extended to a
contingency table of size two by (no. of haplotypes). We also perform an exact
differentiation test for all populations defined in the project by constructing a
table of size (no. of populations) by (no. of haplotypes) (Raymond and Rousset,
1995).
No. of steps in Markov chain [i]: The maximum number of alternative tables
to explore. Figures of 100,000 or more are in order. Larger values of the step
number increase the precision of the P-value as well as that of its estimated standard
deviation.
No. of dememorisation steps [i]: The number of steps to perform before
beginning to compare the alternative table probabilities to that of the observed
table. This corresponds to a burn-in. A few thousand steps are necessary to reach a
random starting point corresponding to a table independent from the observed
table.
Generate histogram and table [b]: Generates a histogram of the number of
populations which are significantly different from a given population, and a P × P
table (P being the number of populations) summarizing the significant
associations between pairs of populations. An association between two
populations is considered as significant or not depending on the significance level
specified below.
Significance level [f]: The level at which the test of differentiation is
considered significant for the output table. If the P-value is smaller than the
Significance level, then the two populations are considered as significantly
different.
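
The contingency table on which the exact test of differentiation operates has one row per population and one column per haplotype. The sketch below only builds this table of counts from sample data; it does not run the Markov chain. The function name and the input layout (a dictionary mapping population names to lists of haplotype identifiers) are assumptions for illustration.

```python
from collections import Counter


def contingency_table(samples):
    """Build a (no. of populations) by (no. of haplotypes) table of counts.

    samples: dict mapping a population name to the list of haplotype
    identifiers observed in that population (hypothetical layout).
    Returns the sorted haplotype list and a dict of count rows.
    """
    haplotypes = sorted({h for haps in samples.values() for h in haps})
    table = {}
    for population, haps in samples.items():
        counts = Counter(haps)
        table[population] = [counts[h] for h in haplotypes]
    return haplotypes, table
```

Restricting the dictionary to two populations gives the table of size two by (no. of haplotypes) used for the pairwise tests.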
6.3.8.8 Genotype assignment

Perform genotype assignment for all pairs of populations: Computes the
log likelihood of the genotype of each individual in every sample, as if it were
drawn from a population sample having allele frequencies equal to those
estimated for each sample (Paetkau et al. 1997; Waser and Strobeck, 1998).
Multi-locus genotype likelihoods are computed as the product of each locus
likelihood, thus assuming that the loci are independent. The output result file
lists, for each population, a table of the log-likelihood of each individual
genotype in all populations (see section 8.2.6).
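
The multi-locus computation described above can be sketched as a sum of per-locus log likelihoods, which is the logarithm of the product over independent loci. The sketch assumes Hardy-Weinberg proportions within each locus and returns minus infinity when an allele has zero frequency in the reference sample; both choices, as well as the function name and data layout, are assumptions made for illustration and may differ from the procedure of section 8.2.6.

```python
from math import log


def genotype_log_likelihood(genotype, allele_freqs):
    """Log likelihood of a multi-locus diploid genotype in one population.

    genotype: list of (allele, allele) tuples, one per locus.
    allele_freqs: list of dicts (allele -> frequency), one per locus,
    estimated in the population considered (hypothetical layout).
    """
    total = 0.0
    for (a, b), freqs in zip(genotype, allele_freqs):
        pa, pb = freqs.get(a, 0.0), freqs.get(b, 0.0)
        # Homozygote: p^2, heterozygote: 2pq (assumed HW proportions)
        prob = pa * pa if a == b else 2 * pa * pb
        if prob == 0:
            return float("-inf")
        total += log(prob)  # product over loci becomes a sum of logs
    return total
```

Computing this value for one individual against the allele frequencies of every population gives one row of the table listed in the output file.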
6.3.8.9 Mantel test

Compute correlation between distance matrices: Test the correlation or the
partial correlations between 2 or 3 matrices by a permutation procedure (Mantel,
1967; Smouse et al. 1986).
Number of permutations: Sets the number of permutations for the Mantel test.

7 OUTPUT FILES
The result files are all output in a special sub-directory, having the same name as your
project, but with the ".res" extension. This has been done to structure your result files
according to different projects. For instance, if your project file is called my_file.arp, then
the result files will be in a sub-directory called [my_file.res].
7.1 Result file
The file containing all the results of the analyses just performed. You can choose to have
results shown in an html file, as with previous versions (before 3.5), or to output
results into an xml file (see options in section 6.1.3). By default, these result files
have the same name as the Arlequin input file, with the extension .htm or .xml. The
result file is opened in the right frame of the html browser at the end of each run.
If the option Append Results of the Configuration Arlequin tab is checked, the results of
the current computations are appended to those of previous calculations; otherwise the
results of previous analyses are erased, and only the last results are output in the result
file.
7.2 Arlequin log file
A file where run-time WARNINGS and ERRORS encountered during any phase of the
current Arlequin session are issued. The file has the name Arlequin_log.txt and is
located in the result directory of the opened project. You should consult this file if
you observe any warning or error message in your result file. If Arlequin has crashed,
then consult Arlequin_log.txt before running Arlequin again. It will probably help you in
finding where the problem was located. A reference to the log file is provided in the left
pane of the html result file and can be activated in your web browser. The log file of the
current project can also be viewed by pressing the View Log File button on the Toolbar.
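
The naming rules given so far (a result sub-directory named after the project with the ".res" extension, containing Arlequin_log.txt) can be expressed in a few lines. This is a sketch for scripting around Arlequin outputs; the function name is hypothetical, and it assumes the result sub-directory sits next to the project file.

```python
from pathlib import PurePosixPath


def result_paths(project_file):
    """Return (result directory, log file path) for an Arlequin project file.

    Assumes the ".res" sub-directory is created beside the project file.
    """
    project = PurePosixPath(project_file)
    result_dir = project.with_suffix(".res")
    return result_dir, result_dir / "Arlequin_log.txt"
```

For a project file data/my_file.arp, this gives the directory data/my_file.res and the log file data/my_file.res/Arlequin_log.txt.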
7.3 Linkage disequilibrium result file
This file contains the results of pairwise linkage disequilibrium tests between all pairs of
loci. By default, it has the name LD_DIS.XL. As suggested by its extension, this file can
be read with MS-Excel without modification. The format of the file is tab separated.
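
Because the file is tab separated, it can also be read outside MS-Excel with any tool that handles delimited text. The sketch below reads such a file into a list of rows; the function name is hypothetical, and no assumption is made about the meaning of the columns, which is not described here.

```python
import csv


def read_tab_separated(path):
    """Read a tab-separated text file into a list of rows (lists of strings)."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t")]
```

The same function can be applied to other tabulated outputs such as amo_hist.xl.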
7.4 Allele frequencies
By checking the option "Output sample allele frequencies for all loci" in the Molecular
diversity indices tab, it is possible to output allele frequencies at all loci for all populations
in a series of files called "AllFreqLocus_XXX.txt", where XXX is the locus number. On each
row, the frequencies of an allele are listed for all sampled populations. The names of the
populations are listed in a separate file, called "PopNames.txt".
7.5 View results in your HTML browser
For very large result files or result files containing the product of several analyses, it may
be of practical interest to view the results in an HTML browser. This can be simply done
by activating the button Browse results of the project tab panel, which will then load the
result files into your default web browser.
If the XML output is inactivated, Arlequin will produce a conventional html file, which
looks like this in your web browser:

1) The left pane contains a tree where each first level branch corresponds to a run. For
each run we have several entries corresponding to the settings used for the
calculation, the inter-population analyses (Genetic structure, shared haplotypes,
etc.) and finally all intra-population analyses with one entry per population sample.
The description of this tree is stored in [project name]_tree.html. At this point it is
important to notice that this tree uses the java script files ftiens4.js and ua.js located
in Arlequin's installation directory. If you move Arlequin to another location, or
uninstall Arlequin, the left pane will not work anymore.
2) The right pane shows the results concerning the selected item in the left pane. The
HTML code of this pane is in the main result file. This file is located in the result
sub-directory of your project and is named [project name].htm or [project name].xml,
depending on your choice of output in the Option menu (see section 6.1.3).
7.6 XML output file
In Arlequin ver 3.5, the user has the choice to produce result files in conventional html
format or in the Extensible Markup Language (http://www.w3.org/XML/), by checking the
XML Output menu in the Options menu. Output files include additional formatting options
in the xml versions, as well as the possibility to extract data from tables and use it to
produce graphics that are integrated directly into the xml file.
7.6.1 Potential XML formatting problem with Firefox ver 3.x
In Firefox ver 3.x, your xml may appear as unformatted, which is because the XSLT style
sheet is outside the current XML file's path.
A workaround is to edit Firefox settings as follows:
1) Type about:config in the Firefox address bar
2) Change security.fileuri.strict_origin_policy to false
7.6.2 Include graphics into the xml output file
If results have been generated in an XML result file, it is then possible to create graphs
from specific tables found in the xml output file. Graphics are generated automatically by
a series of R scripts triggered by the Rcmd button on the toolbar (see also section 6.2).

The Rcmd command is active if the path to the Rcmd program has been specified in the
Arlequin configuration tab (see section 6.3.3), implying that the R package has been
installed on your computer. This package can be freely downloaded from
http://www.r-project.org/. Note again that html outputs cannot be used to produce graphs.
For instance, a graph conveniently representing pairwise FSTs between populations
has been added below the FST distance matrix in the xml result file.

7.6.3 Why use R to make graphs?
R is a language and environment for statistical computing and graphics production. It
provides a wide variety of statistical and graphical techniques and is highly extensible. R
is available as a free package and it compiles and runs on Linux, Windows and MacOS X.
Therefore R functions are portable across many computer systems. R also provides very
powerful graphic facilities for the production of many different diagrams and plots.
R also includes an XML package, which contains tools for parsing and generating XML
within R. These tools allow one to get an R structure representing the XML file, to access
tags of interest with R and to get their attribute values and the data they contain. It also
allows one to manipulate the XML structure, e.g. to add additional tags or attributes.
After this manipulation it is possible to save the R structure back into the xml file.
The possibility to parse XML files, the ability to produce complex graphics and the
portability of R make it an ideal tool to parse the XML output of ARLEQUIN and to
generate graphics based on the extracted data.
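
The idea of accessing the data enclosed in a tag of interest is not specific to R. As an illustration only, the sketch below does it in Python with the standard library, assuming a well-formed XML document in which the data of interest is the text content of a named tag. The function name is hypothetical, and the internal layout of the Arlequin xml file is not described here, so a real result file may need additional handling.

```python
import xml.etree.ElementTree as ET


def extract_tag_text(xml_text, tag):
    """Return the text content of every element named `tag` in an XML string."""
    root = ET.fromstring(xml_text)
    return [(element.text or "").strip() for element in root.iter(tag)]
```

The R scripts distributed with Arlequin remain the supported way to produce the graphs; a sketch like this one is only useful for pulling numbers out of a result file for other purposes.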

7.6.4 Example of R-lequin graphical outputs
We report below some examples of the graphs produced by R-lequin for different
summary statistics, and the name of the R function used to produce each of them. These
R functions can be found in the Rfunctions directory located at the root of the Arlequin
directory. Users can modify them to customize their graphs at will. We also report, for
each graph, the XML tag surrounding the data used to generate it, as well as which
Arlequin computations produce the results used to generate it.
7.6. 4.1 Genet ic diversit y
7.6.4.1. 1 Number of al l el es per l ocus

XML t ag i n out put fil e: sumNumAll el es
R- funct i on: sumNumAll el esFunct i on.r
Arl equin comput at i on: St andard di versi t y i ndi ces ( see sect i on 6.3.8.2) .

When t he number of l oci i s l arge, we use anot her r epr esent at i on, li ke bel ow for DNA
sequences:

7.6.4.1. 2 Ex pect ed het er ozy gosi t y

XML t ag i n out put fil e: sumExpHet er ozygosi t y


R-function: sumExpectedHeterozygosity.r
Arlequin computation: Standard diversity indices (see section 6.3.8.2).

Again, when the number of loci is large, we use another representation, like below for
DNA sequences:

7.6.4.1.3 Theta values
We plot theta values estimated from different aspects of the data for all populations.
Note that the comparison of different theta values is the basis of several neutrality tests,
like that based on Tajima's D (see section 8.1.6.4).

XML tag in output file: sumMolecDivIndexes
R-function: sumMolecularDivIndexes.r
Arlequin computation: Molecular distance (see section 6.3.8.2).


7.6.4.1.4 Theta(H) for microsatellite data

XML tag in output file: sumThetaH
R-function: sumThetaHFunction.r
Arlequin computation: Molecular distance (see section 6.3.8.2).

7.6.4.1.5 Allele size range at different loci (microsatellite data)

XML tag in output file: sumAllelicSizeRange
R-function: sumAllelicSizeRangeFunction.r
Arlequin computation: Molecular distance (see section 6.3.8.2).

7.6.4.1.6 Garza-Williamson index (microsatellite data)

XML tag in output file: sumGWIndex
R-function: sumGWIndexFunction.r
Arlequin computation: Molecular distance (see section 6.3.8.2).




7.6.4.1.7 Modified Garza-Williamson index (microsatellite data)

XML tag in output file: sumModGWIndex
R-function: sumModGWIndexFunction.r
Arlequin computation: Molecular distance (see section 6.3.8.2).

7.6.4.2 Genetic distances between populations
7.6.4.2.1 Matrix of pairwise FSTs
We make a simple graphic representation of the population relationships as described by
FST computed between pairs of populations. As an example, we show pairwise distances
between Myotis myotis bat populations from Europe as inferred from the mtDNA control
region. This representation allows one to quickly perceive genetic affinities between
populations.

XML tag in output file: pairwiseDifferenceMatrix
R-function: pairFstMatrix.r
Arlequin computation: Compute pairwise FST (see section 6.3.8.7.3)



7.6.4.2.2 Matrix of Reynolds' coancestry coefficient

XML tag in output file: coancestryCoefficients
R-function: coancestryCoeff.r
Arlequin computation: Compute pairwise FST and Reynolds' distance (see section
6.3.8.7.3)

7.6.4.2.3 Slatkin's linearized FSTs

XML tag in output file: slatkinFst
R-function: slatkinFstFunction.r
Arlequin computation: Compute pairwise FST and Slatkin's distances (see section
6.3.8.7.3)


7.6.4.2.4 Average number of pairwise differences within and between
populations
In this graph we represent on three different colour scales the average number of
pairwise differences ($\pi$) between sampled populations. Orange on diagonal: within
populations; Green above diagonal: $\pi_{xy}$ between pairs of populations; Blue below diagonal:
net number of nucleotide differences between populations ($D_A$, see section 8.2.4.4).

XML tag in output file: pairwiseDiffMatrix
R-function: pairwiseDiffMatrix.r
Arlequin computation: Compute pairwise differences (see section 6.3.8.7.3)




7.6.4.2.5 Model of population divergence allowing for unequal derived
population size
7.6.4.2.5.1 Divergence time between populations assuming different derived population
sizes

XML tag in output file: tauMatrix
R-function: tauMatrixFunction.r
Arlequin computation: Compute pairwise differences and Estimate relative population
sizes (see section 6.3.8.7.3)

7.6.4.2.5.2 Ancestral population sizes

XML tag in output file: ancestralPopSize
R-function: ancestralPopulationSize.r
Arlequin computation: Compute pairwise differences and Estimate relative population
sizes (see section 6.3.8.7.3)


7.6.4.3 Matrix of molecular distance between haplotypes
This graph represents the number of molecular differences between all different
haplotypes found in the project. It is output when one selects the option "Print distance
matrix" in the AMOVA tab (see section 6.3.8.7.1).

XML tag in output file: hapDistMatrix
R-function: haplotypeDistMatrix.r
Arlequin computation: Print distance (see AMOVA section 6.3.8.7.1.1)

7.6.4.4 Matrix of molecular distances between gene copies within and between
populations (phase known only)
This graphic is produced by combining information from the table of haplotype
frequencies and the inter-haplotype distance matrix. This implies that both tables must be
computed in the same run (see below on how to activate these computations). The
graphic is displayed after the haplotype distance matrix graphic in the XML output file.



The dashed lines separate population samples. Like the graph of the matrix of pairwise
FSTs (see section 7.6.4.2.1), this graph allows one to quickly visualize population genetic
affinities, but provides a more detailed view, at the individual haplotype (sequence) level.
XML tag in output file: hapDistMatrix
R-function: hapDistMatrix_withinBetweenComplete.r
Arlequin computations: Search for shared haplotypes and Print distance (see sections
6.3.8.4.1 and 6.3.8.7.1.1)



7.6.4.5 Matrix of molecular distances between haplotypes within populations
This graph represents the matrix of the number of pairwise distances between all
haplotypes found in a given population. Note that it does not use information on
haplotype frequencies.

XML tag in output file: interHapDistMatrix
R-function: interHaplotypeDistMatrix.r
Arlequin computation: Print distance matrix between haplotypes (see section 6.3.8.2)
7.6.4.6 Haplotype frequencies within population
For each population, we produce graphs of the ordered distribution of allele frequencies
and its neutral expectation as computed in the Ewens-Watterson neutrality test (see
section 8.1.6.1)

XML tag in output file: expHapFreq
R-function: expectedHapFreq.r
Arlequin computation: Ewens-Watterson neutrality test (see section 6.3.8.6)


7.6.4.7 Haplotype frequencies in populations
The frequencies of all haplotypes (known phase is assumed) are plotted for each
population.

XML tag in output file: relHapFreq
R-function: relativeHapFreq.r
Arlequin computation: Search for shared haplotypes (see section 6.3.8.4.1)

7.6.4.8 Mismatch distribution
7.6.4.8.1 Demographic expansion

XML tag in output file: mismatchDemogExp
R-function: mismatch.r
Arlequin computation: Estimate parameters of demographic expansion (see section
6.3.8.3)


7.6.4.8.2 Spatial expansion
We produce plots of the observed mismatch distribution and its confidence interval.

XML tag in output file: mismatchSpatialExp
R-function: mismatch.r
Arlequin computation: Estimate parameters of spatial expansion (see section 6.3.8.3)



7.6.4.9 Population assignment test
For each sample, we produce a graph of the genotype likelihoods in the sampled population
vs. those in all other populations.

XML tag in output file: genotypeLikelihoodMatrix
R-function: genotLikelihoodMatrix.r
Arlequin computation: Perform genotype assignment for all pairs of populations (see
section 6.3.8.8)



7.6.4.10 Detection of loci under selection
In this graph, we plot the joint distribution of FST and (heterozygosity within
populations)/(1-FST) for the observed loci (small circles), as well as one-sided confidence
interval limits obtained from simulated data (see section 8.2.8 for details) as dashed
lines. Loci significant at the 5% level are shown as filled blue circles, while loci significant
at the 1% level are shown as filled red circles. We also give the number of the loci under
selection at the 1% level. If a hierarchical island model is used to detect loci under
selection, we also output a plot for the joint distribution of FCT and heterozygosity.

XML tag in output file: detSel_FStat_Pval & detSel_FST_CI
R-function: lociSelection.r
Arlequin computation: Detecting loci under selection from genetic structure analysis
(see section 6.3.8.7.2)



8 METHODOLOGICAL OUTLINES
The following table gives a rapid overview of the methods implemented in Arlequin. A
mark in a cell indicates that the task corresponding to the table entry is possible. Some tasks are only
possible or meaningful if there is no recessive data, and those cases carry a separate mark.

Data types (columns):
  DNA & RFLP:  G+  G-  H
  Microsat:    G+  G-  H
  Standard:    G+  G-  H
  Frequency

Types of computations (rows):
  Standard indices
  Molecular diversity
  Mismatch distribution
  Haplotype (or allele) frequency estimation
  Linkage disequilibrium
  Hardy-Weinberg equilibrium
  Tajima's neutrality test
  Fu's neutrality test
  Ewens-Watterson neutrality tests
  Chakraborty's amalgamation test
  Search for shared haplotypes between samples
  AMOVA
  Detection of selected loci
  Minimum Spanning Network (1)
  Pairwise genetic distances
  Exact test of population differentiation
  Individual assignment test
  Mantel test

G+ : Genotypic data, gametic phase known
G- : Genotypic data, gametic phase unknown
H  : Haplotypic data


(1) Computation of a minimum spanning network between haplotypes is only possible if a
distance matrix is provided or if it can be computed from the data.
8.1 Intra-population level methods
8.1.1 Standard diversity indices
8.1.1.1 Gene diversity
Equivalent to the expected heterozygosity for diploid data. It is defined as the
probability that two randomly chosen haplotypes are different in the sample. Gene
diversity and its sampling variance are estimated as
$$\hat{H} = \frac{n}{n-1}\left(1 - \sum_{i=1}^{k} p_i^2\right)$$
$$V(\hat{H}) = \frac{2}{n(n-1)}\left\{ 2(n-2)\left[\sum_{i=1}^{k} p_i^3 - \left(\sum_{i=1}^{k} p_i^2\right)^2\right] + \sum_{i=1}^{k} p_i^2 - \left(\sum_{i=1}^{k} p_i^2\right)^2 \right\},$$
where n is the number of gene copies in the sample, k is the number of haplotypes, and
$p_i$ is the sample frequency of the i-th haplotype.
Note that Arlequin outputs the standard deviation of the heterozygosity, computed as
$$s.d.(\hat{H}) = \sqrt{V(\hat{H})}.$$
Reference:
Nei, 1987, p. 180.
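As an illustration of the two estimators above, here is a minimal Python sketch. It is not Arlequin's own code: the function name `gene_diversity` and the choice of passing haplotype counts as a list are assumptions made for this example.

```python
from math import sqrt

def gene_diversity(counts):
    """Return (H, V(H), s.d.(H)) from haplotype counts in one sample.

    `counts` is a hypothetical input format: one integer count per haplotype.
    """
    n = sum(counts)  # number of gene copies
    if n < 2:
        raise ValueError("need at least two gene copies")
    p = [c / n for c in counts]      # sample haplotype frequencies
    s2 = sum(x ** 2 for x in p)      # sum of p_i^2
    s3 = sum(x ** 3 for x in p)      # sum of p_i^3
    h = n / (n - 1) * (1.0 - s2)
    v = 2.0 / (n * (n - 1)) * (2 * (n - 2) * (s3 - s2 ** 2) + s2 - s2 ** 2)
    # max() only guards against tiny negative values from rounding
    return h, v, sqrt(max(v, 0.0))
```

For example, `gene_diversity([5, 5])` describes a sample of 10 gene copies with two equally frequent haplotypes.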
8.1.1.2 Expected heterozygosity per locus
For each locus, Arlequin provides an estimation of the expected heterozygosity simply as
$$\hat{H} = \frac{n}{n-1}\left(1 - \sum_{i=1}^{k} p_i^2\right)$$
8.1.1.3 Number of usable loci
Number of loci that show less than a specified amount of missing data. The maximum
amount of missing data must be specified in the General Settings tab dialog.
8.1.1.4 Number of polymorphic sites (S)
Number of usable loci that show more than one allele per locus.
8.1.1.5 Allelic range (R)
For MICROSAT data, it is the difference between the maximum and the minimum number
of repeats.


8.1.1.6 Garza-Williamson index (G-W)
Following Garza and Williamson (2001), the G-W statistic is given as
$$G\text{-}W = \frac{k}{R+1},$$
where k is the number of alleles at a given locus in a population sample, and R is the allelic
range. Originally, the denominator was defined as just R in Garza and Williamson
(2001), but this could lead to a division by zero if a sample is monomorphic. This
adjustment was introduced in Excoffier et al. (2005).
This statistic was shown to be sensitive to population bottlenecks, because the number of
alleles is usually more reduced than the range by a recent reduction in population size,
such that the distribution of allele lengths will show "vacant" positions. Therefore the G-W
statistic is expected to be very small in populations having been through a bottleneck
and close to one in stationary populations.
Here we just report the statistic but do not provide any test.
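The statistic above can be sketched in a few lines of Python. The function name `garza_williamson` and the input format (one allele size, in number of repeats, per gene copy at a single locus) are assumptions of this example, not part of Arlequin.

```python
def garza_williamson(repeat_numbers):
    """G-W = k / (R + 1) for one microsatellite locus.

    `repeat_numbers` is a hypothetical input: the allele size (number of
    repeats) of every gene copy sampled at the locus.
    """
    alleles = set(repeat_numbers)
    if not alleles:
        raise ValueError("empty sample")
    k = len(alleles)                  # number of distinct alleles
    r = max(alleles) - min(alleles)   # allelic range R
    return k / (r + 1)
```

With alleles of 10, 12 and 15 repeats, three of the six possible sizes in the range are occupied, so the index is 0.5; a monomorphic sample gives 1 rather than a division by zero.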
8.1.2 Molecular indices
8.1.2.1 Mean number of pairwise differences ($\pi$)
Mean number of differences between all pairs of haplotypes in the sample. It is given by
$$\hat{\pi} = \frac{n}{n-1}\sum_{i=1}^{k}\sum_{j=1}^{k} p_i p_j \hat{d}_{ij},$$
where $\hat{d}_{ij}$ is an estimate of the number of mutations having occurred since the
divergence of haplotypes i and j, k is the number of haplotypes, $p_i$ is the frequency of
haplotype i, and n is the sample size. The total variance (over the stochastic and the
sampling process), assuming no recombination between sites and selective neutrality, is
obtained as
$$V(\hat{\pi}) = \frac{3n(n+1)\hat{\pi} + 2(n^2+n+3)\hat{\pi}^2}{11(n^2-7n+6)}.$$ (Tajima, 1993)
Note that similar formulas are also used for Microsat and Standard data, even though
the underlying assumptions of the model may be violated. Note also that Arlequin
outputs the standard deviation computed as $s.d.(\hat{\pi}) = \sqrt{V(\hat{\pi})}$.
References:
Tajima, 1983
Tajima, 1993
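The following Python sketch illustrates the estimator of the mean number of pairwise differences and the variance formula quoted above. The function name, the input layout (haplotype counts plus a symmetric distance matrix) and the decision to return `None` for the variance when the denominator of the formula is not positive are all assumptions of this example.

```python
def mean_pairwise_differences(counts, dist):
    """Return (pi, V(pi)) for one sample.

    counts : hypothetical input, one integer count per haplotype
    dist   : hypothetical input, k x k symmetric matrix of distances d_ij
    """
    n = sum(counts)
    k = len(counts)
    if n < 2:
        raise ValueError("need at least two gene copies")
    p = [c / n for c in counts]
    # double sum over all ordered pairs (i, j); d_ii is expected to be 0
    pi = n / (n - 1) * sum(p[i] * p[j] * dist[i][j]
                           for i in range(k) for j in range(k))
    denom = 11 * (n * n - 7 * n + 6)
    # n^2 - 7n + 6 = (n - 1)(n - 6) is not positive for n <= 6;
    # returning None in that case is a choice made for this sketch only
    var = None
    if denom > 0:
        var = (3 * n * (n + 1) * pi + 2 * (n * n + n + 3) * pi ** 2) / denom
    return pi, var
```

For two haplotypes at distance 3, each sampled 5 times, 25 of the 45 pairs of gene copies differ, so the estimate is 75/45 = 5/3.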


8.1.2.2 Nucleotide diversity or average gene diversity over L loci
It is computed here as the probability that two randomly chosen homologous
(nucleotide or RFLP) sites are different. It is equivalent to the gene diversity at the
nucleotide level for DNA data.
$$\hat{\pi}_n = \frac{\sum_{i=1}^{k}\sum_{j<i} p_i p_j \hat{d}_{ij}}{L}$$
$$V(\hat{\pi}_n) = \frac{n+1}{3(n-1)L}\hat{\pi}_n + \frac{2(n^2+n+3)}{9n(n-1)}\hat{\pi}_n^2$$
Note that similar formulas are used for computing the average gene diversity over L
loci for Microsat and Standard data, assuming no recombination and selective
neutrality. As above, one should be aware that these assumptions may not hold for
these data types. Note also that Arlequin outputs the standard deviation computed as
$s.d.(\hat{\pi}_n) = \sqrt{V(\hat{\pi}_n)}$.
Note that for RFLP data this measure should be considered as the average
heterozygosity per RFLP site, which is different from the true diversity at the nucleotide
level, for which one would need to know the base composition of the restriction sites.
References:
Tajima, 1983
Nei, 1987, p. 257
8.1.2.3 Theta estimators
Several methods are used to estimate the population parameter $\theta = 2Mu$, where M is
equal to 2N for diploid populations of size N, or equal to N for haploid populations, and
u is the overall mutation rate at the haplotype level.
8.1.2.3.1 Theta(Hom)
The expected homozygosity in a population at equilibrium between drift and mutation is
usually given by
$$H = \frac{1}{\theta + 1}.$$
However, Zouros (1979) has shown that this estimator was an overestimate when
estimated from a single or a few loci. Although he gave no closed form solution,
Chakraborty and Weiss (1991) proposed to iteratively solve the following relationship
between the expectation of $\hat{\theta}_H$ and the unknown parameter $\theta$
$$E(\hat{\theta}_H) = \theta\left(1 + \frac{2(1+\theta)}{(2+\theta)(3+\theta)}\right)$$ (Zouros, 1979)


starting with a first estimate of $\hat{\theta}_H$ of $(1-H)/H$, and equating it to its expectation.
Chakraborty and Weiss (1991) give an approximate formula for the standard error of
$\hat{\theta}_H$ as
$$s.d.(\hat{\theta}_H) \approx \frac{(2+\theta)^2(3+\theta)^2 \; s.d.(H)}{H^2(1+\theta)\left[(2+\theta)(3+\theta)(4+\theta) + 10(2+\theta) + 4\right]},$$
where s.d.(H) is the standard error of H given in section 8.1.1.1.
For MICROSAT data, Ohta and Kimura (1973) have shown that the expected
homozygosity in stationary populations under a pure stepwise mutation model was
equal to:
$$E(Hom) = \frac{1}{\sqrt{1 + 2\theta}},$$
where $\theta = 4N_e u$ for diploids and $\theta = 2N_e u$ for haploid systems. It follows that an
estimator of $\theta$ can be obtained for microsatellite data as
$$\hat{\theta}_H = \frac{1}{2}\left[\frac{1}{(1-\hat{H})^2} - 1\right],$$
where $\hat{H}$ is the expected heterozygosity estimated as in section 8.1.1.2.


8.1.2.3.2 Theta(S)
$\hat{\theta}_S$ is estimated from the infinite-site equilibrium relationship (Watterson, 1975)
between the number of segregating sites (S), the sample size (n) and $\theta$ for a sample of
non-recombining DNA:
$$\hat{\theta}_S = \frac{S}{a_1},$$
where
$$a_1 = \sum_{i=1}^{n-1} \frac{1}{i}.$$
The variance of $\hat{\theta}_S$ is obtained as
$$V(\hat{\theta}_S) = \frac{a_1^2 S + a_2 S^2}{a_1^2 (a_1^2 + a_2)},$$ (Tajima, 1989)
where
$$a_2 = \sum_{i=1}^{n-1} \frac{1}{i^2}.$$
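A minimal Python sketch of the Theta(S) estimator and its variance, as given above, could look as follows. The function name `theta_s` and its argument names are assumptions of this example.

```python
def theta_s(num_segregating_sites, n):
    """Return (theta_S, V(theta_S)) from S segregating sites in n sequences."""
    if n < 2:
        raise ValueError("need at least two sequences")
    a1 = sum(1.0 / i for i in range(1, n))       # a_1 = sum_{i=1}^{n-1} 1/i
    a2 = sum(1.0 / i ** 2 for i in range(1, n))  # a_2 = sum_{i=1}^{n-1} 1/i^2
    s = num_segregating_sites
    theta = s / a1
    var = (a1 ** 2 * s + a2 * s ** 2) / (a1 ** 2 * (a1 ** 2 + a2))
    return theta, var
```

With n = 4 sequences, a_1 = 1 + 1/2 + 1/3 = 11/6, so 11 segregating sites give an estimate of 6.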


8.1.2.3.3 Theta(k)
$\hat{\theta}_k$ is estimated from the infinite-allele equilibrium relationship (Ewens, 1972) between
the expected number of alleles (k), the sample size (n) and $\theta$:
$$E(k) = \theta \sum_{i=0}^{n-1} \frac{1}{\theta + i}$$
Instead of the variance of $\hat{\theta}_k$, we give the limits ($\theta_0$ and $\theta_1$) of a 95% confidence
interval around $\hat{\theta}_k$, obtained from Ewens (1972)
$$\Pr(\text{less than } k \text{ alleles} \mid \theta = \theta_0) = 0.025$$
$$\Pr(\text{more than } k \text{ alleles} \mid \theta = \theta_1) = 0.025.$$
These probabilities are obtained by summing up the probabilities of observing k' alleles
(k' = 0, ..., k), obtained as (Ewens, 1972)
$$\Pr(K = k \mid \theta) = \frac{S_n^k \, \theta^k}{S_n(\theta)},$$
where $S_n^k$ is a Stirling number of the first kind (see Abramowitz and Stegun, 1970),
and $S_n(\theta)$ is defined as $\theta(\theta+1)(\theta+2)\cdots(\theta+n-1)$.
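The relationship for E(k) has no closed-form inverse, so it has to be solved numerically. The sketch below uses a simple bisection, under the assumption that the observed number of alleles is equated to its expectation; the function names and the bisection strategy are choices of this example, and the numerical method actually used by Arlequin is not stated here.

```python
def expected_num_alleles(theta, n):
    """E(k) = theta * sum_{i=0}^{n-1} 1 / (theta + i), for theta > 0."""
    return sum(theta / (theta + i) for i in range(n))

def theta_k(k, n, tol=1e-10):
    """Solve E(k) = k for theta by bisection (hypothetical helper).

    E(k) increases monotonically from 1 to n with theta, so a solution
    exists only for 1 < k < n.
    """
    if not 1 < k < n:
        raise ValueError("k must lie strictly between 1 and n")
    lo, hi = 1e-12, 1.0
    while expected_num_alleles(hi, n) < k:   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if expected_num_alleles(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For n = 2 the relationship reduces to E(k) = 1 + theta/(theta + 1), so an expected 1.5 alleles corresponds to theta = 1, which gives a quick check of the solver.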
8.1.2.3.4 Theta($\pi$)
$\hat{\theta}_\pi$ is estimated from the infinite-site equilibrium relationship between the mean
number of pairwise differences ($\hat{\pi}$) and theta ($\theta$):
$$E(\hat{\pi}) = \theta,$$ (Tajima, 1983)
and its variance $V(\hat{\pi})$ is given in section 8.1.2.1.



8.1.2.4 Mismatch distribution
It is the distribution of the observed number of differences between pairs of haplotypes.
This distribution is usually multimodal in samples drawn from populations at
demographic equilibrium, as it reflects the highly stochastic shape of gene trees, but it
is usually unimodal in populations having passed through a recent demographic
expansion (Rogers and Harpending, 1992; Hudson and Slatkin, 1991) or through a range
expansion with high levels of migration between neighboring demes (Ray et al. 2003,
Excoffier 2004).
8.1.2.4.1 Pure demographic expansion
If one assumes that a stationary haploid population at equilibrium has suddenly passed
t generations ago from a population size of $N_0$ to $N_1$, then the probability of
observing S differences between two randomly chosen non-recombining haplotypes is
given by
$$F_S(\tau, \theta_0, \theta_1) = F_S(\theta_1) + \exp\left(-\tau\,\frac{\theta_1 + 1}{\theta_1}\right) \sum_{j=0}^{S} \frac{\tau^j}{j!}\left[F_{S-j}(\theta_0) - F_{S-j}(\theta_1)\right],$$ (Li, 1977)
where
$$F_S(\theta) = \frac{\theta^S}{(\theta + 1)^{S+1}}$$
is the probability of observing two random haplotypes with S
differences in a stationary population (Watterson, 1975), $\theta_0 = 2uN_0$, $\theta_1 = 2uN_1$,
$\tau = 2ut$, and u is the mutation rate for the whole haplotype.
Rogers (1995) has simplified the above equation, by assuming that $\theta_1 \to \infty$, implying
there are no coalescent events after the expansion, which is only reasonable if the
expansion size is large. With this simplifying assumption, it is possible to derive the
moment estimators of the time to the expansion ($\tau$) and the mutation parameter $\theta_0$,
as
$$\hat{\theta}_0 = \sqrt{v - m}, \qquad \hat{\tau} = m - \hat{\theta}_0,$$ (Rogers, 1995)
where m and v are the mean and the variance of the observed mismatch distribution,
respectively. These estimators can then be used to plot $F_S(\hat{\tau}, \hat{\theta}_0, \infty)$ values. Note,
however, that this estimation cannot be done if the variance of the mismatch is smaller
than the mean.
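The moment estimators of Rogers (1995) quoted above can be sketched as follows in Python. The function name, the input layout (number of haplotype pairs observed in each mismatch class, indexed by the number of differences) and the use of the population variance (division by the number of pairs rather than by that number minus one) are assumptions of this example.

```python
from math import sqrt

def rogers_moment_estimators(mismatch_counts):
    """Return (tau, theta0), or None when the variance is below the mean.

    mismatch_counts[s] is the (hypothetical) number of haplotype pairs
    differing at exactly s sites.
    """
    total = sum(mismatch_counts)
    if total == 0:
        raise ValueError("empty mismatch distribution")
    m = sum(s * c for s, c in enumerate(mismatch_counts)) / total
    v = sum(c * (s - m) ** 2 for s, c in enumerate(mismatch_counts)) / total
    if v < m:
        return None              # estimation impossible, as noted in the text
    theta0 = sqrt(v - m)
    tau = m - theta0
    return tau, theta0
```

A distribution with mean 4 and variance 8 yields theta0 = 2 and tau = 2, while a distribution concentrated on a single class (variance 0) cannot be fitted this way.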
However, Schneider and Excoffier (1999) find that this moment estimator often leads to
an underestimation of the age of the expansion ($\tau$). They rather propose to estimate the


parameters of the demographic expansion by a generalized non-linear least-square
approach. This is the method we now use to estimate the parameters of the
demographic expansion $\tau$, $\theta_0$, and $\theta_1$.
Approximate confidence intervals for those parameters are obtained by a parametric
bootstrap approach (Schneider and Excoffier, 1999) generating percentile confidence
intervals (see e.g. Efron, 1993, p. 53 and chap. 13). The principle is the following:
We generate a large number (B) of random samples according to the estimated
demography, using a coalescent algorithm modified from Hudson (1990).
For each of the B simulated data sets, we re-estimate $\tau$, $\theta_0$, and $\theta_1$ to get B
bootstrapped values $\tau^*$, $\theta_0^*$, and $\theta_1^*$.
For a given confidence level $\alpha$, the approximate limits of the confidence interval
are obtained as the $\alpha/2$ and $1-\alpha/2$ percentile values (Efron, 1993, p. 168).
It is important to underline that this form of parametric bootstrap assumes that the
data are distributed according to the sudden expansion model. In Schneider and Excoffier
(1999), we showed by simulation that only the confidence interval (CI) for $\tau$ has a good
coverage (i.e. that the true value of the parameter is included in a $100 \times (1-\alpha)$% CI with
a probability very close to $1-\alpha$). The CIs of the other two parameters are overly large
(the true value of the parameter was almost always included in the CI), and thus too
conservative.
The validity of the estimated stepwise expansion model is tested using the same
parametric bootstrap approach as described above. We use here the sum of squared
deviations (SSD) between the observed and the expected mismatch as a test statistic.
We obtain its distribution under the hypothesis that the estimated parameters are the
true ones, by simulating B samples around the estimated parameters. As before, we
re-estimate each time new parameters $\tau^*$, $\theta_0^*$, and $\theta_1^*$, and compute their associated sums
of squares $SSD_{sim}$. The P-value of the test is therefore approximated by
$$P = \frac{\text{number of } SSD_{sim} \text{ larger or equal to } SSD_{obs}}{B}.$$
For convenience, we also compute the raggedness index of the observed distribution
defined by Harpending (1994) as


$$r = \sum_{i=1}^{d+1} (x_i - x_{i-1})^2,$$
where d is the maximum number of observed differences between haplotypes, and the
x's are the observed relative frequencies of the mismatch classes. This index takes
larger values for multimodal distributions commonly found in a stationary population
than for unimodal and smoother distributions typical of expanding populations. Its
significance is tested similarly to that of SSD.
References:
Rogers and Harpending (1992)
Rogers (1995)
Schneider and Excoffier (1999)
Excoffier (2004)
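The two test quantities described above, the raggedness index and the bootstrap P-value, are simple to compute once the observed and simulated values are available. The sketch below is an illustration only: the function names are hypothetical, the simulated statistics are assumed to have been produced elsewhere, and the class d+1 (which is empty by definition of d) is given a frequency of zero.

```python
def raggedness(freqs):
    """Harpending's raggedness index r = sum_{i=1}^{d+1} (x_i - x_{i-1})^2.

    freqs[i] is the relative frequency of the mismatch class with i
    differences, for i = 0..d.
    """
    x = list(freqs) + [0.0]      # x_{d+1} = 0: no pair differs at d+1 sites
    return sum((x[i] - x[i - 1]) ** 2 for i in range(1, len(x)))

def bootstrap_p_value(stat_obs, stats_sim):
    """Proportion of simulated statistics larger than or equal to the observed one."""
    if not stats_sim:
        raise ValueError("no simulated values")
    return sum(1 for s in stats_sim if s >= stat_obs) / len(stats_sim)
```

The same `bootstrap_p_value` helper applies to SSD and to the raggedness index, since both are tested against their simulated distribution in the same way.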
8.1.2.4.2 Spatial expansion
A population spatial expansion generally occurs if the range of a population is initially
restricted to a very small area, and then the range of the population increases over time
and over space. The resulting population generally becomes subdivided in the sense that
individuals will tend to mate with geographically close individuals rather than remote
individuals.
Based on simulations, Ray et al. (2003) have shown that a large spatial expansion can
lead to the same signal in the mismatch distribution as a pure demographic expansion
in a panmictic population, but only if neighboring sub-populations (demes) exchange
many migrants (50 or more). The simulations in Ray et al. (2003) were
performed in a two-dimensional stepping-stone model. T generations ago, a haploid
population restricted to a single deme of size N began to send migrants to neighboring
demes at rate m, progressively colonizing the whole world. During the expansion, the
size of each deme followed a logistic regulation with carrying capacity K and intrinsic
rate of growth r. During the whole process neighboring demes continue to exchange a
fraction m of migrants.
While this model is difficult to describe analytically, Excoffier (2004) derived the expected
mismatch distribution under a simpler model of spatial expansion. He assumed that one
has sampled genes from a single deme belonging to a population subdivided into an
infinite number of demes, each of size N, which would exchange a fraction m of migrants
with other demes. This infinite-island model is actually equivalent to a continent-island
model, where the sampled deme would exchange migrants at rate m with a unique
population of infinite size. Some T generations in the past, the continent-island system
would be reduced to a single deme of size $N_0$, like:



[Figure: Continent-island model. Panels: "Before the expansion" and "After the expansion".]

Under this simple model, the probability that two genes currently sampled in the small
deme of size N differ at S sites is given by
$$F(S; M, \theta_0, \theta_1, \tau) = \frac{\theta_1^{S}}{A^{S+1}} + \sum_{j=0}^{S}\left(\frac{\tau^{S-j}\,\theta_0^{j}\,(M e^{-\tau} + C)}{(\theta_0+1)^{j+1}\,(M+1)\,(S-j)!} - \frac{C\,\tau^{j}\,\theta_1^{S-j}}{j!\,A^{S-j+1}}\right),$$ Excoffier (2004)
where $\theta_0 = 2N_0u$, $\theta_1 = 2N_1u$, $\tau = 2Tu$, $A = \theta_1 + M + 1$, and
$C = e^{-\tau A/\theta_1}$.
In Arlequin, we estimate the three parameters of a spatial expansion, $\tau$, $\theta = \theta_0 = \theta_1$ (here
we assume that $N = N_0$), and $M = 2Nm$, using the same least-square method as described
in the case of the estimation of the parameters of a demographic expansion (see section
8.1.2.4.1). As for the demographic expansion, we also provide the expected mismatch
distribution and test the fit to the model by coalescent simulations of an instantaneous
expansion under the continent-island model defined above.
References:
Ray et al. (2003)
Excoffier (2004)



8.1.2.5 Estimation of genetic distances between DNA sequences
Definitions:
L: Number of loci
Gamma correction: This correction is proposed when the mutation rates cannot be
    assumed to be uniform for all sites. It was originally
    proposed for mutation rates among amino acids (Uzzell and
    Corbin, 1971), but non-uniform rates also seem to apply to the control
    region of human mtDNA (Wakeley, 1993). In such a case, a
    Gamma distribution of mutation rates is often assumed. The
    shape of this distribution (the unevenness of the mutation
    rates) is mainly controlled by a parameter a, which is the
    inverse of the squared coefficient of variation of the mutation rate.
    The smaller the a coefficient, the more uneven the mutation
    rates. A uniform mutation rate corresponds to the case where
    a is equal to infinity.
$n_d$: Number of observed substitutions between two DNA sequences
$n_s$: Number of observed transitions between two DNA sequences
$n_v$: Number of observed transversions between two DNA sequences
$\omega$: G+C ratio, computed on all the DNA sequences of a given sample
8.1.2.5.1 Pairwise difference
Outputs the number of loci for which two haplotypes are different
$$\hat{d} = n_d$$
$$V(\hat{d}) = \hat{d}\,(L - \hat{d})/L$$
8.1.2.5.2 Percentage difference
Outputs the percentage of loci for which two haplotypes are different
$$\hat{d} = n_d / L$$
$$V(\hat{d}) = \hat{d}\,(1 - \hat{d})/L$$


8.1.2.5.3 Jukes and Cantor
Outputs a corrected percentage of nucleotides for which two haplotypes are different.
The correction allows for multiple substitutions per site since the most recent common
ancestor of the two DNA sequences. The correction also assumes that the rate of
nucleotide substitution is identical for all 4 nucleotides A, C, G and T.
$$\hat{p} = n_d / L$$
$$\hat{d} = -\frac{3}{4}\log\left(1 - \frac{4}{3}\hat{p}\right)$$
$$V(\hat{d}) = \frac{\hat{p}(1-\hat{p})}{\left(1 - \frac{4}{3}\hat{p}\right)^2 L}$$
Gamma correction:
$$\hat{d} = \frac{3}{4}\,a\left[\left(1 - \frac{4}{3}\hat{p}\right)^{-1/a} - 1\right]$$
$$V(\hat{d}) = \hat{p}(1-\hat{p})\left[\left(1 - \frac{4}{3}\hat{p}\right)^{-2(1/a+1)}\right] / L$$
References:
Jukes and Cantor 1969
Jin and Nei 1990
Kumar et al. 1993
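As a worked illustration of the Jukes and Cantor formulas above, with and without the Gamma correction, here is a short Python sketch. The function name `jukes_cantor`, the `gamma_a` argument and the assumption that the two sequences are aligned, of equal length and free of missing data are choices of this example.

```python
from math import log

def jukes_cantor(seq1, seq2, gamma_a=None):
    """Return (d, V(d)) between two aligned sequences of equal length.

    gamma_a is the shape parameter a of the Gamma correction; None means
    no Gamma correction (uniform rates). Missing data are not handled.
    """
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    L = len(seq1)
    n_d = sum(1 for x, y in zip(seq1, seq2) if x != y)  # observed substitutions
    p = n_d / L
    w = 1.0 - 4.0 * p / 3.0
    if w <= 0:
        raise ValueError("sequences too divergent for this correction")
    if gamma_a is None:
        d = -0.75 * log(w)
        var = p * (1 - p) / (w ** 2 * L)
    else:
        d = 0.75 * gamma_a * (w ** (-1.0 / gamma_a) - 1.0)
        var = p * (1 - p) * w ** (-2.0 * (1.0 / gamma_a + 1.0)) / L
    return d, var
```

With one difference over ten sites (p = 0.1), the corrected distance is slightly larger than p, and a very large value of `gamma_a` gives nearly the same result as no Gamma correction, in line with the statement that uniform rates correspond to an infinite a.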
8.1.2.5.4 Kimura 2-parameters
Outputs a corrected percentage of nucleotides for which two haplotypes are different.
The correction also allows for multiple substitutions per site, but takes into account
different substitution rates between transitions and transversions. The transition-transversion
ratio is estimated from the data.
$$\hat{P} = \frac{n_s}{L}, \qquad \hat{Q} = \frac{n_v}{L}$$
$$c_1 = \frac{1}{1 - 2\hat{P} - \hat{Q}}, \qquad c_2 = \frac{1}{1 - 2\hat{Q}}, \qquad c_3 = \frac{c_1 + c_2}{2}$$
$$\hat{d} = -\frac{1}{2}\log(1 - 2\hat{P} - \hat{Q}) - \frac{1}{4}\log(1 - 2\hat{Q})$$
$$V(\hat{d}) = \frac{c_1^2\hat{P} + c_3^2\hat{Q} - (c_1\hat{P} + c_3\hat{Q})^2}{L}$$
Gamma correction:
$$c_1 = (1 - 2\hat{P} - \hat{Q})^{-(1/a+1)}, \qquad c_2 = (1 - 2\hat{Q})^{-(1/a+1)}, \qquad c_3 = \frac{c_1 + c_2}{2}$$


$$\hat{d} = \frac{a}{2}\left[(1 - 2\hat{P} - \hat{Q})^{-1/a} + \frac{1}{2}(1 - 2\hat{Q})^{-1/a} - \frac{3}{2}\right]$$
$$V(\hat{d}) = \frac{c_1^2\hat{P} + c_3^2\hat{Q} - (c_1\hat{P} + c_3\hat{Q})^2}{L}$$
References:
Kimura (1980)
Jin and Nei (1990)
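The Kimura 2-parameters distance without Gamma correction can be sketched in Python as below. The function name `kimura_2p` is hypothetical, and the sketch assumes aligned sequences of equal length containing only the letters A, C, G and T (any other mismatching character would be counted as a transversion).

```python
from math import log

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def kimura_2p(seq1, seq2):
    """Return (d, V(d)) under the Kimura 2-parameters model, no Gamma correction."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    L = len(seq1)
    n_s = n_v = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            n_s += 1          # transition: A<->G or C<->T
        else:
            n_v += 1          # transversion: purine <-> pyrimidine
    P, Q = n_s / L, n_v / L
    w1, w2 = 1 - 2 * P - Q, 1 - 2 * Q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for this correction")
    c1, c2 = 1 / w1, 1 / w2
    c3 = (c1 + c2) / 2
    d = -0.5 * log(w1) - 0.25 * log(w2)
    var = (c1 ** 2 * P + c3 ** 2 * Q - (c1 * P + c3 * Q) ** 2) / L
    return d, var
```

For two sequences of ten sites differing by one transition and one transversion, P = Q = 0.1 and the distance is about 0.234, compared with an uncorrected proportion of 0.2.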
8.1.2.5.5 Tamura
Outputs a corrected percentage of nucleotides for which two haplotypes are different.
The correction is an extension of Kimura 2-parameters method, allowing for unequal
nucleotide frequencies. The transition-transversion ratio, as well as the overall
nucleotide frequencies, are computed from the original data.
$$\hat{P} = \frac{n_s}{L}, \qquad \hat{Q} = \frac{n_v}{L}$$
$$c_1 = \frac{1}{1 - \dfrac{\hat{P}}{2\omega(1-\omega)} - \hat{Q}}, \qquad c_2 = \frac{1}{1 - 2\hat{Q}}, \qquad c_3 = 2\omega(1-\omega)(c_1 - c_2) + c_2$$
$$\hat{d} = -2\omega(1-\omega)\log\left(1 - \frac{\hat{P}}{2\omega(1-\omega)} - \hat{Q}\right) - \frac{1}{2}\bigl(1 - 2\omega(1-\omega)\bigr)\log(1 - 2\hat{Q})$$
$$V(\hat{d}) = \frac{c_1^2\hat{P} + c_3^2\hat{Q} - (c_1\hat{P} + c_3\hat{Q})^2}{L}$$
where $\omega$ is the G+C ratio computed on all the DNA sequences of the sample.
References:
Tamura, 1992
Kumar et al. 1993
8.1.2.5.6 Tajima and Nei
Outputs a corrected percentage of nucleotides for which two haplotypes are different. The correction is an extension of Jukes and Cantor method, allowing for unequal nucleotide frequencies. The overall nucleotide frequencies are computed from the data.
$$\hat{p} = \frac{n_d}{L}, \qquad b = \frac{1}{2}\left(1-\sum_{i=1}^{4} g_i^2 + \frac{\hat{p}^2}{c}\right), \qquad c = \sum_{i=1}^{3}\sum_{j=i+1}^{4}\frac{x_{ij}^2}{2 g_i g_j},$$
where the g's are the four nucleotide frequencies, and $x_{ij}$ is the relative frequency of the nucleotide pair i and j.
$$\hat{d} = -b\log\left(1-\frac{\hat{p}}{b}\right)$$
$$V(\hat{d}) = \frac{\hat{p}\,(1-\hat{p})}{\left(1-\frac{\hat{p}}{b}\right)^{2} L}$$
References:
Tajima and Nei, 1984
Kumar et al. 1993
8.1.2.5.7 Tamura and Nei
Outputs a corrected percentage of nucleotides for which two haplotypes are different. Like Kimura 2-parameters, and Tajima and Nei distances, the correction allows for different transversion and transition rates, but a distinction is also made between transition rates between purines and between pyrimidines.
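In the formulas below, $g_A$, $g_C$, $g_G$ and $g_T$ are the four nucleotide frequencies, with $g_R = g_A + g_G$ (purines) and $g_Y = g_C + g_T$ (pyrimidines).
$$c_1 = \frac{2 g_A g_G}{g_R}, \qquad c_2 = \frac{2 g_T g_C}{g_Y}, \qquad c_3 = \frac{2 g_A g_G g_R}{2 g_A g_G g_R - g_R^2 \hat{P}_1 - g_A g_G \hat{Q}}$$
$$c_4 = \frac{2 g_T g_C g_Y}{2 g_T g_C g_Y - g_Y^2 \hat{P}_2 - g_T g_C \hat{Q}}$$
$$c_5 = \frac{2 g_A^2 g_G^2}{g_R\,(2 g_A g_G g_R - g_R^2 \hat{P}_1 - g_A g_G \hat{Q})} + \frac{2 g_T^2 g_C^2}{g_Y\,(2 g_T g_C g_Y - g_Y^2 \hat{P}_2 - g_T g_C \hat{Q})} + \frac{g_R^2 (g_T^2 + g_C^2) + g_Y^2 (g_A^2 + g_G^2)}{2 g_R^2 g_Y^2 - g_R g_Y \hat{Q}}$$
$$\hat{P}_1 = \frac{n_s(A \leftrightarrow G)}{L}, \qquad \hat{P}_2 = \frac{n_s(C \leftrightarrow T)}{L}, \qquad \hat{Q} = \frac{n_v}{L}$$
$$\hat{d} = -c_1\log\left(1-\frac{\hat{P}_1}{c_1}-\frac{\hat{Q}}{2 g_R}\right) - c_2\log\left(1-\frac{\hat{P}_2}{c_2}-\frac{\hat{Q}}{2 g_Y}\right) - 2\left(g_R g_Y - \frac{c_1 g_Y}{2} - \frac{c_2 g_R}{2}\right)\log\left(1-\frac{\hat{Q}}{2 g_R g_Y}\right)$$
$$V(\hat{d}) = \frac{c_3^2\hat{P}_1 + c_4^2\hat{P}_2 + c_5^2\hat{Q} - (c_3\hat{P}_1 + c_4\hat{P}_2 + c_5\hat{Q})^2}{L}$$
Gamma correction:
$$\hat{d} = 2a\left[\frac{c_1}{2}\left(1-\frac{\hat{P}_1}{c_1}-\frac{\hat{Q}}{2 g_R}\right)^{-1/a} + \frac{c_2}{2}\left(1-\frac{\hat{P}_2}{c_2}-\frac{\hat{Q}}{2 g_Y}\right)^{-1/a} + \left(g_R g_Y - \frac{c_1 g_Y}{2} - \frac{c_2 g_R}{2}\right)\left(1-\frac{\hat{Q}}{2 g_R g_Y}\right)^{-1/a} - g_A g_G - g_T g_C - g_R g_Y\right]$$
$$V(\hat{d}) = \frac{c_3^2\hat{P}_1 + c_4^2\hat{P}_2 + c_5^2\hat{Q} - (c_3\hat{P}_1 + c_4\hat{P}_2 + c_5\hat{Q})^2}{L}$$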
References:
Tamura and Nei, 1994
Kumar et al. 1993
8.1.2.6 Estimation of genetic distances between RFLP haplotypes
8.1.2.6.1 Number of pairwise differences
We simply count the number of different alleles between two RFLP haplotypes.
$$\hat{d}_{xy} = \sum_{i=1}^{L} \delta_{xy}(i),$$
where $\delta_{xy}(i)$ is an indicator function, equal to 1 if the alleles of the i-th locus differ between the two haplotypes, and equal to 0 otherwise.
When estimating genetic structure indices, this choice amounts to estimating weighted F_ST statistics over all loci (Weir and Cockerham, 1984; Michalakis and Excoffier, 1996).
8.1.2.6.2 Proportion of differences
We simply count the proportion of loci that are different between two RFLP haplotypes.
$$\hat{d}_{xy} = \frac{1}{L}\sum_{i=1}^{L} \delta_{xy}(i),$$
where $\delta_{xy}(i)$ is an indicator function, equal to 1 if the alleles of the i-th locus differ between the two haplotypes, and equal to 0 otherwise.
When estimating genetic structure indices, this choice will lead to exactly the same results as the number of pairwise differences.
8.1.2.7 Estimation of distances between Microsatellite haplotypes
8.1.2.7.1 No. of different alleles
We simply count the number of different alleles between two haplotypes.
$$\hat{d}_{xy} = \sum_{i=1}^{L} \delta_{xy}(i),$$
where $\delta_{xy}(i)$ is an indicator function, equal to 1 if the alleles of the i-th locus differ between the two haplotypes, and equal to 0 otherwise.
When estimating genetic structure indices, this choice amounts to estimating weighted F_ST statistics over all loci (Weir and Cockerham, 1984; Michalakis and Excoffier, 1996).
8.1.2.7.2 Sum of squared size differences
Counts the sum of the squared differences in number of repeats between two haplotypes (Slatkin, 1995).
$$\hat{d}_{xy} = \sum_{i=1}^{L} (a_{xi} - a_{yi})^2,$$
where $a_{xi}$ is the number of repeats of the microsatellite for the i-th locus.
When estimating genetic structure indices, this choice amounts to estimating an analog of Slatkin's R_ST (1995) (see Michalakis and Excoffier, 1996, as well as Rousset, 1996, for details on the relationship between F_ST and R_ST).
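The two microsatellite distances can be illustrated with a short sketch. The function names are hypothetical, and each haplotype is assumed to be a list of repeat numbers, one per locus, with no missing data.

```python
def n_different_alleles(hap_x, hap_y):
    """Number of loci at which the two haplotypes carry different alleles."""
    return sum(1 for a, b in zip(hap_x, hap_y) if a != b)

def sum_squared_size_difference(hap_x, hap_y):
    """Sum over loci of the squared difference in number of repeats."""
    return sum((a - b) ** 2 for a, b in zip(hap_x, hap_y))

# Two three-locus haplotypes coded as repeat numbers (made-up values).
x = [12, 15, 9]
y = [12, 13, 10]
print(n_different_alleles(x, y), sum_squared_size_difference(x, y))
```

The first distance treats all allelic differences alike, whereas the second gives more weight to alleles that differ by many repeats.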
8.1.2.8 Estimation of distances between Standard haplotypes
8.1.2.8.1 Number of pairwise differences
Simply counts the number of different alleles between two haplotypes.
$$\hat{d}_{xy} = \sum_{i=1}^{L} \delta_{xy}(i),$$
where $\delta_{xy}(i)$ is an indicator function, equal to 1 if the alleles of the i-th locus differ between the two haplotypes, and equal to 0 otherwise.
When estimating genetic structure indices, this choice amounts to estimating weighted F_ST statistics over all loci (Weir and Cockerham, 1984; Michalakis and Excoffier, 1996).
8.1.2.9 Minimum Spanning Network among haplotypes
We have implemented the computation of a Minimum Spanning Tree (MST) (Kruskal, 1956; Prim, 1957) between OTUs (Operational Taxonomic Units). The MST is computed from the matrix of pairwise distances calculated between all pairs of haplotypes using a modification of the algorithm described in Rohlf (1973). The Minimum Spanning Network embedding all MSTs (see Excoffier and Smouse 1994) is also provided. This implementation is the translation of a standalone program written in Pascal called MINSPNET.EXE running under DOS, formerly available on http://anthropologie.unige.ch/LGB/software/win/min-span-net/.
8.1.3 Haplotype inference
8.1.3.1 Haplotypic data or Genotypic data with known Gametic phase
If haplotype i is observed $x_i$ times in a sample containing n gene copies, then its estimated frequency ($\hat{p}_i$) is given by
$$\hat{p}_i = \frac{x_i}{n},$$
whereas an unbiased estimate of its sampling variance is given by
$$V(\hat{p}_i) = \frac{\hat{p}_i\,(1-\hat{p}_i)}{n-1}.$$
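When the gametic phase is known, the estimation reduces to gene counting. A minimal sketch, assuming the sample is given as a list with one haplotype label per gene copy (the function name `haplotype_frequencies` is hypothetical):

```python
from collections import Counter

def haplotype_frequencies(sample):
    """Map each haplotype to (estimated frequency, sampling variance)."""
    n = len(sample)  # number of gene copies
    if n < 2:
        raise ValueError("at least two gene copies are needed for the variance")
    result = {}
    for hap, x in Counter(sample).items():
        p = x / n
        result[hap] = (p, p * (1.0 - p) / (n - 1))
    return result
```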
8.1.3.2 Genotypic data with unknown Gametic phase
8.1.3.2.1 EM algorithm
Maximum-likelihood haplotype frequencies can be estimated using an Expectation-Maximization (EM) algorithm (see e.g. Dempster et al. 1977; Excoffier and Slatkin, 1995; Lange, 1997; Weir, 1996). This procedure is an iterative process aiming at obtaining maximum-likelihood estimates of haplotype frequencies from multi-locus genotype data when the gametic phase is unknown (phenotypic data). In this case, simple gene counting is not possible because several genotypes are possible for individuals heterozygous at more than one locus. Therefore, a slightly more elaborate procedure is needed.
The likelihood of the sample (the probability of the observed data D, given the haplotype frequencies p) is given by
$$L(D \mid \mathbf{p}) = \prod_{i=1}^{n}\sum_{j=1}^{g_i} G_{ij},$$
where the product is over all n individuals of the sample, and the sum is over all $g_i$ possible genotypes of individual i. $G_{ij}$ is the probability of the j-th possible genotype of individual i: it is equal to $2 p_k p_l$ if this genotype is made up of two different haplotypes k and l, or to $p_k^2$ if it is made up of two copies of the same haplotype k.
The principle of the EM algorithm is the following:
1) Start with arbitrary (random) estimates of haplotype frequencies.
2) Use these estimates to compute expected genotype frequencies for each phenotype, assuming Hardy-Weinberg equilibrium (the E-step).
3) The relative genotype frequencies are used as weights for their two constituting haplotypes in a gene counting procedure leading to new estimates of haplotype frequencies (the M-step).
4) Repeat steps 2-3 until the haplotype frequencies reach equilibrium (do not change by more than a predefined epsilon value).
Dempster et al. (1977) have shown that the likelihood of the sample can only grow after each step of the EM algorithm. However, there is no guarantee that the resulting haplotype frequencies are maximum-likelihood estimates. They can be just local optima. In fact, there is no obvious way to be sure that the resulting frequencies are those that globally maximize the likelihood of the data. This would need a complete evaluation of the likelihood for all possible genotype configurations of the sample. In order to check that the final frequencies are putative maximum-likelihood estimates, one generally has to repeat the EM algorithm from many different starting points (many different initial haplotype frequencies). Several runs may give different final frequencies, suggesting the presence of several "peaks" in the likelihood surface, and one has to choose the solution that has the largest likelihood. It may also happen that several distinct peaks have the same likelihood, meaning that different haplotypic compositions explain the observed data equally well. In that case, there is no way to choose among the alternative solutions from a likelihood point of view. Some external information should be provided to make a decision.
Standard deviations of the haplotype frequencies are estimated by a parametric bootstrap procedure (see e.g. Rice, 1995), generating random samples from a population assumed to have haplotype frequencies equal to their maximum-likelihood values. For each bootstrap replicate, we apply the EM algorithm to get new maximum-likelihood haplotype frequencies. The standard deviation of each haplotype frequency is then estimated from the resulting distribution of haplotype frequencies. Note however that this procedure is quite computer intensive.
Reference:
Excoffier and Slatkin (1995)
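The four steps of the EM algorithm can be sketched as follows. This is a minimal illustration and not Arlequin's implementation: the function names are hypothetical, each unphased genotype is assumed to be a tuple of loci with one pair of alleles per locus, there is no missing data, and the iterations start from uniform frequencies rather than from random ones.

```python
from collections import defaultdict
from itertools import product

def compatible_pairs(genotype):
    """List the haplotype pairs compatible with one unphased genotype."""
    het = [k for k, (a, b) in enumerate(genotype) if a != b]
    if not het:
        hap = tuple(a for a, _ in genotype)
        return [(hap, hap)]
    pairs = []
    # The first heterozygous locus is kept fixed so each pair is listed once.
    for flips in product((False, True), repeat=len(het) - 1):
        flip_at = dict(zip(het[1:], flips))
        h1, h2 = [], []
        for k, (a, b) in enumerate(genotype):
            if flip_at.get(k, False):
                a, b = b, a
            h1.append(a)
            h2.append(b)
        pairs.append((tuple(h1), tuple(h2)))
    return pairs

def em_haplotype_frequencies(genotypes, eps=1e-8, max_iter=1000):
    """Estimate haplotype frequencies from unphased multi-locus genotypes."""
    candidates = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in candidates for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # assumption: uniform start
    n_genes = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: Hardy-Weinberg probability of each compatible genotype.
            weights = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2)
                       for h1, h2 in pairs]
            total = sum(weights)
            # M-step (gene counting): weight the two constituting haplotypes.
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        new_freq = {h: counts[h] / n_genes for h in haps}
        change = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if change < eps:
            break
    return freq
```

Because the result may be a local optimum, a fuller version would restart the iterations from several random initial frequencies and keep the solution with the largest likelihood, as described above.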
8.1.3.2.2 EM zipper algorithm
The EM zipper is a simple extension of the EM algorithm, aiming at speeding up the estimation process and allowing the handling of a much larger number of heterozygous sites per individual. The EM algorithm indeed becomes extremely slow when there are more than 20 heterozygous sites per individual, and it is therefore not suited for the analysis of long stretches of DNA with hundreds of polymorphic sites.
The EM zipper therefore begins by estimating frequencies of two-locus haplotypes, and then adds another locus to estimate 3-locus haplotype frequencies, and then adds another locus to get 4-locus haplotype frequencies, and so on until all loci have been added. At each stage, any n-locus genotype which incorporates an n-locus haplotype with estimated frequency equal to zero is prevented from being extended to n+1 loci, because it is likely that the frequency of an extended (n+1)-locus haplotype would also have been equal to zero. With this method, Arlequin does not need to build all possible genotypes for each individual, but only considers the genotypes whose sub-haplotypes have non-null frequencies, and one can thus handle a much larger number of polymorphic sites than with the conventional EM algorithm.
In Arlequin's tab dialog (see section 6.3.8.4.2.2), one can specify whether the loci should be added in random order or not, and how many random orders to implement. After multiple trials, Arlequin outputs the locus order having led to the largest likelihood.
This version of the EM algorithm is equivalent to that implemented in the SNPHAP program (http://www-gene.cimr.cam.ac.uk/clayton/software/snphap.txt) by David Clayton.
8.1.3.2.3 ELB algorithm
Contrary to the EM algorithm, which aims at estimating haplotype frequencies, the ELB algorithm attempts to reconstruct the (unknown) gametic phase of multi-locus genotypes. Phase updates are made on the basis of a window of neighbouring loci, and the window size varies according to the local level of linkage disequilibrium.
Suppose that we have a sample of n individuals drawn from some population and genotyped at S loci whose chromosomal order is assumed known. Adjacent pairs of loci are assumed to be tightly linked, but S may be large so that the two external loci are effectively unlinked. In this case, reconstructing the gametic phase in one step can be inefficient, because recombination may have created too many distinct haplotypes for their frequencies to be well estimated. Locally, however, recombination may be rare, and to exploit this situation the updates in ELB of the phase at a heterozygous locus are based on windows of neighbouring loci. The algorithm adjusts the window sizes and locations in order to maximize the information for the phase updates.
ELB starts with an arbitrary phase assignment for all individuals in the sample. Associated with each heterozygous locus is a window containing the locus itself and neighbouring loci.
At each iteration of the algorithm, an individual is chosen at random and its heterozygous loci are successively visited in random order. At each locus visit, two attempts are then made to update that window, by proposing, and then accepting or rejecting, (i) the addition of a locus at one end of the window, and (ii) the removal of a locus at the other end. The locus being visited is never removed from the window, and each window always includes at least one other heterozygous locus. The two update proposals are made sequentially so that the window can either grow by one locus, shrink by one locus, or, if both changes are accepted, slide by one locus either to the right or to the left. If both proposals are rejected, the window remains unchanged. Next, the phase at the locus being visited is updated based on the current haplotype pairs, within the chosen window, of the other individuals in the sample.
8.1.3.2.3.1 Phase updates
Let $h_{11}$ and $h_{22}$ denote the two haplotypes within the window given the current phase assignment, and let $h_{12}$ and $h_{21}$ denote the haplotypes which would result from the alternative phase assignment at the locus being visited. Ideally, we would wish to choose between the two haplotype assignments, $h_{11}/h_{22}$ and $h_{12}/h_{21}$, with probabilities proportional to their (joint) population frequencies. These are unknown, and in practice they are too small for direct estimation to be feasible. To overcome the latter problem we assume HWE, so that we now seek to choose between $h_{11}/h_{22}$ and $h_{12}/h_{21}$ with probabilities proportional to $p_{11}p_{22}$ and $p_{12}p_{21}$, where $p_{ij}$, i,j = 1,2, denotes the population frequency of $h_{ij}$. Although the $p_{ij}$ are also unknown, we can estimate them using the $n_{ij}$, the haplotype counts among the other n-1 individuals in the sample, given their current phase assignments within the window.
We adopt a Bayesian posterior mean estimate of the products of the $p_{ij}$, based on a symmetric Dirichlet prior distribution for the $p_{ij}$ with parameter $\alpha > 0$, and hence we propose
$$\Pr\left(h_{11}/h_{22} \mid \{n_{ij}\}\right) = \frac{(n_{11}+\alpha)(n_{22}+\alpha)}{(n_{11}+\alpha)(n_{22}+\alpha) + (n_{12}+\alpha)(n_{21}+\alpha)}. \qquad (1)$$
Larger values of $\alpha$ imply a greater chance of choosing a haplotype pair that includes an unobserved haplotype. A small value of $\alpha$ = 0.01 has been shown by simulation to perform well in most circumstances.
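Equation (1) is simple enough to be checked numerically. The sketch below uses a hypothetical function name and takes the four haplotype counts observed among the other individuals as plain numbers; the default value of alpha is the one quoted in the text.

```python
def keep_phase_probability(n11, n22, n12, n21, alpha=0.01):
    """Probability of choosing h11/h22 rather than h12/h21, following (1)."""
    keep = (n11 + alpha) * (n22 + alpha)
    switch = (n12 + alpha) * (n21 + alpha)
    return keep / (keep + switch)

# Both current haplotypes are seen in other individuals and the alternative
# ones are not, so the current phase is almost always kept.
print(keep_phase_probability(5, 3, 0, 0))
# No haplotype has been observed elsewhere: both phases are equally likely.
print(keep_phase_probability(0, 0, 0, 0))
```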

[Figure: switch phase update. Current phase in selected window: ACCTCGCCT / GCTATCTAG. After the switch phase update: ACCTTGCCT / GCTACCTAG.]

8.1.3.2.3.2 Recombination update
Instead of performing a switch update as before, we can also update the phase using a recombination update. In that case, we choose to change the phase of all sites located either on the right or on the left of the focal site. The proportion of updates being recombination steps can be set up in the ELB tab dialog as shown in section 6.3.8.4.2.1. A small value is in order (less than 5%) since it implies a large change which may often be rejected, and cause the chain not to mix properly. The rationale for this kind of update (initially not described in Excoffier et al. (2003)) is to explore more widely the set of possible gametic phases by provoking a radical change from time to time.
8.1.3.2.3.3 Handling mutations
Increasing $\alpha$ thus allows more flexibility to choose new haplotypes, but this is a noisy solution: all unobserved haplotypes are treated the same. However, a recent mutation event can create haplotypes that are rare, but similar to a more common haplotype, whereas haplotypes that are very dissimilar to all observed haplotypes are highly implausible. This phenomenon is particularly prevalent for STR loci, with their relatively high mutation rates.
To encapsulate the effect of mutation, when making a phase assignment we give additional weight to an unobserved haplotype for each observed haplotype that is close to it. Here, we define "close to" to mean "differs at one locus", and in the phase update we choose $h_{11}/h_{22}$ rather than $h_{12}/h_{21}$ with probability
$$\Pr\left(h_{11}/h_{22} \mid \{n_{ij}\}, \{n_{ij\_1}\}\right) = \frac{(n_{11}+\alpha+\varepsilon n_{11\_1})(n_{22}+\alpha+\varepsilon n_{22\_1})}{(n_{11}+\alpha+\varepsilon n_{11\_1})(n_{22}+\alpha+\varepsilon n_{22\_1}) + (n_{12}+\alpha+\varepsilon n_{12\_1})(n_{21}+\alpha+\varepsilon n_{21\_1})}, \qquad (2)$$
where $n_{ij\_1}$ is the sample count of haplotypes that are close to $h_{ij}$ within the current window. Since $\varepsilon$ is a parameter reflecting the effect of mutation, it should for example be larger for STR than for SNP or DNA data. By simulation we have found that a value of $\varepsilon$ = 0.1 gave good results for STR (microsatellite) data, and that a value of $\varepsilon$ = 0.01 worked well for other data types.
[Figure: right recombination phase update. Current phase in selected window: ACCTCGCCT / GCTATCTAG. After the right recombination phase update: ACCTTCTAG / GCTACGCCT.]

8.1.3.2.3.4 Sliding window size updates
The value of $R = \max\{r, 1/r\}$, where $r = p_{11}p_{22}/(p_{12}p_{21})$, gives a measure of linkage disequilibrium (LD) within the window. Broadly speaking, at each choice between two windows, we would generally prefer the window that gives the largest value to R. Based on (2), a natural estimate of r is
$$\frac{(n_{11}+\alpha+\varepsilon n_{11\_1})(n_{22}+\alpha+\varepsilon n_{22\_1})}{(n_{12}+\alpha+\varepsilon n_{12\_1})(n_{21}+\alpha+\varepsilon n_{21\_1})},$$
but this estimate leads to difficulties, since larger windows tend to have smaller counts and hence more extreme estimates, amounting to a bias towards larger windows. This bias could be counteracted by increasing $\alpha$, but we prefer to adjust $\alpha$ to optimize the phase update probability (2). Instead, we add a constant ($\gamma$) to both numerator and denominator, leading to:
$$\hat{r} = \frac{(n_{11}+\alpha+\varepsilon n_{11\_1})(n_{22}+\alpha+\varepsilon n_{22\_1}) + \gamma}{(n_{12}+\alpha+\varepsilon n_{12\_1})(n_{21}+\alpha+\varepsilon n_{21\_1}) + \gamma} \qquad (3)$$
Thus, at each attempt to update the length of a window, we choose between windows according to their $\hat{R} = \max\{\hat{r}, 1/\hat{r}\}$ values: window 2 replaces window 1 with probability
$$\frac{\hat{R}_2}{\hat{R}_1 + \hat{R}_2}. \qquad (4)$$
Even a large value for $\gamma$ can fail to prevent a window from growing too large when two consecutive heterozygous loci in an individual are separated by many homozygous loci. The window must then be large in order to contain the necessary minimum of two heterozygous loci. To circumvent the problem of small haplotype counts which may then result, when updating an individual's phase allocation, we can ignore homozygous loci that are separated from the nearest heterozygous locus by more than a given number of intervening homozygous loci. This is the parameter called "Heterozygous site influence zone" to be chosen in the ELB tab dialog in section 6.3.8.4.2.1.
8.1.3.2.3.5 Handling missing data
In handling missing data, the philosophy underpinning ELB is to ignore the affected loci rather than to impute missing data or to augment the space of possible genotypes. In the presence of missing data, the haplotype counts $n_{ij}$ and $n_{ij\_1}$ are not necessarily integers: individuals with missing data at m loci within a current window of length L contribute 1-m/L to $n_{ij}$ (or $n_{ij\_1}$) for each haplotype at which the remaining L-m loci match $h_{ij}$ exactly (or with one mismatch).
Reference:
Excoffier et al. (2003)
8.1.4 Linkage disequilibrium between pairs of loci
Depending on whether the haplotypic composition of the sample is known or not, we have implemented two different ways to test for the presence of pairwise linkage disequilibrium between loci.
We describe in detail below how the two tests are done.
8.1.4.1 Exact test of linkage disequilibrium (haplotypic data)
This test is an extension of Fisher's exact probability test on contingency tables (Slatkin, 1994a). A contingency table is first built. The $k_1 \times k_2$ entries of the table are the observed haplotype frequencies (absolute values), with $k_1$ and $k_2$ being the number of alleles at locus 1 and 2, respectively. The test consists in obtaining the probability of finding a table with the same marginal totals and which has a probability equal to or less than that of the observed table. Under the null hypothesis of no association between the two tested loci, the probability of the observed table is
$$L_0 = \frac{n!}{\prod_{i,j} n_{ij}!}\,\prod_{i}\left(n_{i*}/n\right)^{n_{i*}}\prod_{j}\left(n_{*j}/n\right)^{n_{*j}},$$
where the $n_{ij}$'s denote the count of the haplotypes that have the i-th allele at the first locus and the j-th allele at the second locus, $n_{i*}$ is the overall count of the i-th allele at the first locus (i = 1,...,$k_1$) and $n_{*j}$ is the count of the j-th allele at the second locus (j = 1,...,$k_2$).
Instead of enumerating all possible contingency tables, a Markov chain is used to efficiently explore the space of all possible tables. This Markov chain consists in a random walk in the space of all contingency tables. It is done in such a way that the probability to visit a particular table corresponds to its actual probability under the null hypothesis of linkage equilibrium. A particular table is modified according to the following rules (see also Guo and Thompson, 1992; or Raymond and Rousset, 1995):
1) We select in the table two distinct lines $i_1$, $i_2$ and two distinct columns $j_1$, $j_2$ at random.
2) The new table is obtained by decreasing the counts of the cells ($i_1$, $j_1$) and ($i_2$, $j_2$) and increasing the counts of the cells ($i_1$, $j_2$) and ($i_2$, $j_1$) by one unit. This leaves the marginal allele counts $n_{i*}$ and $n_{*j}$ unchanged.
3) The switch to the new table is accepted with a probability equal to
$$R = \frac{L_{n+1}}{L_n} = \frac{n_{i_1,j_1}\,n_{i_2,j_2}}{(n_{i_1,j_2}+1)(n_{i_2,j_1}+1)},$$
where R is just the ratio of the probabilities of the two tables.
Steps 1-3 are repeated a large number of times to explore a large part of the space of all possible contingency tables having identical marginal counts. In order to start from a random initial position in the Markov chain, the chain is explored for a pre-defined number of steps (the dememorization phase) before the probabilities of the switched tables are compared to that of the initial table. The number of dememorization steps should be large enough (some thousands) to allow the Markov chain to "forget" its initial state and become independent from its starting point. The P-value of the test is then taken as the proportion of the visited tables having a probability smaller than or equal to that of the observed contingency table.
A standard error on P is estimated by subdividing the total number of required steps into B batches (see Guo and Thompson, 1992, p. 367). A P-value is calculated separately for each batch. Let us denote it by $P_i$ (i = 1,...,B). The estimated standard error is then calculated as
$$\mathrm{s.d.}(P) = \sqrt{\frac{\sum_{i=1}^{B}(P - P_i)^2}{B(B-1)}}.$$
The process is stopped as soon as the estimated standard deviation is smaller than a pre-defined value specified by the user.
Reference:
Raymond and Rousset (1995)
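A single switch of the Markov chain and the batch-based standard error can be sketched as below. The function names are hypothetical, the contingency table is assumed to be a list of rows of integer counts, and P in the standard error formula is taken here to be the mean of the batch P-values, which is an assumption of this sketch.

```python
import math

def propose_switch(table, i1, i2, j1, j2):
    """Return (new_table, R) for one switch, or None if it is impossible.

    R is the ratio of the probabilities of the new and current tables.
    """
    if i1 == i2 or j1 == j2:
        raise ValueError("lines and columns must be distinct")
    if table[i1][j1] == 0 or table[i2][j2] == 0:
        return None  # a count would become negative
    ratio = (table[i1][j1] * table[i2][j2]) / (
        (table[i1][j2] + 1) * (table[i2][j1] + 1))
    new_table = [row[:] for row in table]
    new_table[i1][j1] -= 1
    new_table[i2][j2] -= 1
    new_table[i1][j2] += 1
    new_table[i2][j1] += 1
    return new_table, ratio

def batch_standard_error(batch_p_values):
    """Standard error of P estimated from the P-values of B batches."""
    B = len(batch_p_values)
    mean_p = sum(batch_p_values) / B
    return math.sqrt(sum((mean_p - p) ** 2 for p in batch_p_values)
                     / (B * (B - 1)))
```

Applying `propose_switch` to any table leaves all line and column totals unchanged, which is the property that keeps the random walk within the set of tables sharing the observed marginal counts.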
8.1.4.2 Likelihood ratio test of linkage disequilibrium (genotypic data, gametic phase unknown)
For genotypic data where the haplotypic phase is unknown, the test based on the Markov chain described above is not possible because the haplotypic composition of the sample is unknown, and is just estimated. Therefore, linkage disequilibrium between a pair of loci is tested for genotypic data using a likelihood-ratio test, whose empirical distribution is obtained by a permutation procedure (Slatkin and Excoffier, 1996). The likelihood of the data assuming linkage equilibrium ($L_{H^*}$) is computed by using the fact that, under this hypothesis, the haplotype frequencies are obtained as the product of the allele frequencies. The likelihood of the data not assuming linkage equilibrium ($L_H$) is obtained by applying the EM algorithm to estimate haplotype frequencies. The likelihood-ratio statistic given by
$$S = -2\log\left(\frac{L_{H^*}}{L_H}\right)$$
should in principle follow a Chi-square distribution with ($k_1$-1)($k_2$-1) degrees of freedom, but this is not always the case in small samples with a large number of alleles per locus. In order to better approximate the underlying distribution of the likelihood-ratio statistic under the null hypothesis of linkage equilibrium, we use the following permutation procedure:
1) Permute the alleles between individuals at one locus only.
2) Re-estimate the likelihood of the data $L_{H'}$ by the EM algorithm. Note that $L_{H^*}$ is unaffected by the permutation procedure.
3) Repeat steps 1-2 a large number of times to get the null distribution of $L_H$, and therefore the null distribution of S.
Note that this test of linkage disequilibrium assumes Hardy-Weinberg proportions of genotypes, and the rejection of the test could also be due to departure from Hardy-Weinberg equilibrium (see Excoffier and Slatkin, 1998).
Reference:
Excoffier and Slatkin (1998)

8.1.4.3 Measures of gametic disequilibrium (haplotypic data)
D, D', and $r^2$ coefficients:
Note that these coefficients are computed between all pairs of alleles at different loci, and that their computation assumes that the gametic phase between alleles at different loci is known.
1) D: The classical linkage disequilibrium coefficient measuring deviation from random association between alleles at different loci (Lewontin and Kojima, 1960) is expressed as
$$D_{ij} = p_{ij} - p_i\,p_j,$$
where $p_{ij}$ is the frequency of the haplotype having allele i at the first locus and allele j at the second locus, and $p_i$ and $p_j$ are the frequencies of alleles i and j, respectively.
2) $D'_{ij}$: The linkage disequilibrium coefficient $D_{ij}$ standardized by the maximum value it can take ($D_{ij,\max}$), given the allele frequencies (Lewontin 1964), as
$$D'_{ij} = \frac{D_{ij}}{D_{ij,\max}},$$
where $D_{ij,\max}$ takes one of the following values:
$$\min\bigl(p_i\,p_j,\ (1-p_i)(1-p_j)\bigr) \quad \text{if } D_{ij} < 0$$
$$\min\bigl((1-p_i)\,p_j,\ p_i\,(1-p_j)\bigr) \quad \text{if } D_{ij} > 0$$
3) $r^2$: Another conventional measure of linkage disequilibrium between pairs of alleles at two loci is the square of the correlation coefficient between allele frequencies, which can be expressed as a function of the linkage disequilibrium measure D as
$$r^2 = \frac{D^2}{p_i\,(1-p_i)\,p_j\,(1-p_j)}.$$
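The three coefficients can be computed together for one pair of alleles. In this sketch the function name `ld_coefficients` is hypothetical, the inputs are the haplotype frequency and the two allele frequencies, and D' is set to 0 when D is exactly 0, which is a convention chosen here and not stated in the text. Both alleles are assumed to be polymorphic (frequencies strictly between 0 and 1).

```python
def ld_coefficients(p_ij, p_i, p_j):
    """Return (D, D', r2) for allele i at locus 1 and allele j at locus 2."""
    D = p_ij - p_i * p_j
    if D < 0:
        d_max = min(p_i * p_j, (1 - p_i) * (1 - p_j))
    elif D > 0:
        d_max = min((1 - p_i) * p_j, p_i * (1 - p_j))
    else:
        d_max = None  # D' is not defined by the formulas when D = 0
    d_prime = D / d_max if d_max else 0.0
    r2 = D ** 2 / (p_i * (1 - p_i) * p_j * (1 - p_j))
    return D, d_prime, r2

# Complete association: allele i is always found with allele j.
print(ld_coefficients(0.5, 0.5, 0.5))
```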
8.1.5 Hardy-Weinberg equilibrium
To detect significant departure from Hardy-Weinberg equilibrium, we follow the procedure described in Guo and Thompson (1992) using a test analogous to Fisher's exact test on a two-by-two contingency table, but extended to a triangular contingency table of arbitrary size. The test is done using a modified version of the Markov-chain random walk algorithm described by Guo and Thompson (1992). The modified version gives the same results as the original one, but is more efficient from a computational point of view.
This test is obviously only possible for genotypic data. If the gametic phase is unknown, the test is only possible for each locus separately. For data with known gametic phase, it is also possible to test for the non-random association of haplotypes into individuals. Note that this test assumes that the allele frequencies are given. Therefore, this test is not possible for data with recessive alleles, as in this case the allele frequencies need to be estimated.
A contingency table is first built. The $k \times k$ entries of the table are the observed genotype counts and k is the number of alleles. Using the same notations as in section 8.2.2, the probability to observe the table under the null hypothesis of no association is given by Levene (1949)
$$L_0 = \frac{n!\,\prod_{i=1}^{k} n_{i*}!}{(2n)!\,\prod_{i=1}^{k}\prod_{j=1}^{i} n_{ij}!}\;2^{H},$$
where H is the number of heterozygote individuals.
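Levene's probability can be evaluated exactly with integer arithmetic. The sketch below uses hypothetical names and assumes the genotype counts of one locus are stored in a dictionary keyed by pairs of allele labels, with n individuals in total; `math.prod` requires Python 3.8 or later.

```python
from collections import defaultdict
from fractions import Fraction
from math import factorial, prod

def levene_probability(genotype_counts):
    """Exact probability of a genotype table given its allele counts.

    genotype_counts maps an allele pair, e.g. ("A", "B"), to the number of
    individuals carrying that genotype; each genotype appears under one key.
    """
    n = sum(genotype_counts.values())  # number of individuals
    allele_counts = defaultdict(int)
    H = 0                              # number of heterozygotes
    for (a, b), count in genotype_counts.items():
        allele_counts[a] += count
        allele_counts[b] += count      # homozygotes add two copies
        if a != b:
            H += count
    numerator = (factorial(n)
                 * prod(factorial(c) for c in allele_counts.values())
                 * 2 ** H)
    denominator = (factorial(2 * n)
                   * prod(factorial(c) for c in genotype_counts.values()))
    return Fraction(numerator, denominator)
```

For two individuals carrying two copies of each of two alleles, the only possible tables are two heterozygotes or one homozygote of each kind, and their probabilities (2/3 and 1/3) sum to one.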
Much as was done for the test of linkage disequilibrium, we explore alternative contingency tables having the same marginal counts. In order to create a new contingency table from an existing one, we select two distinct lines $i_1$, $i_2$ and two distinct columns $j_1$, $j_2$ at random. The new table is obtained by decreasing the counts of the cells $(i_1, j_1)$ and $(i_2, j_2)$ and increasing the counts of the cells $(i_1, j_2)$ and $(i_2, j_1)$ by one unit. This leaves the allele counts $n_i$ unchanged. The switch to the new table is accepted with a probability R equal to:

1) $$ R = \frac{L_{n+1}}{L_n} = \frac{n_{i_1 j_1}\, n_{i_2 j_2}}{(n_{i_1 j_2} + 1)(n_{i_2 j_1} + 1)} \cdot \frac{(1 + \delta_{i_1 j_1})(1 + \delta_{i_2 j_2})}{(1 + \delta_{i_1 j_2})(1 + \delta_{i_2 j_1})} , \quad \text{if } i_1 \neq j_1 \text{ or } i_2 \neq j_2 $$

2) $$ R = \frac{L_{n+1}}{L_n} = \frac{n_{i_1 j_1}\, n_{i_2 j_2}}{(n_{i_1 j_2} + 1)(n_{i_2 j_1} + 2)} \cdot \frac{4}{1} , \quad \text{if } i_1 = j_1 \text{ and } i_2 = j_2 $$

3) $$ R = \frac{L_{n+1}}{L_n} = \frac{n_{i_1 j_1}\, (n_{i_2 j_2} - 1)}{(n_{i_1 j_2} + 1)(n_{i_2 j_1} + 1)} \cdot \frac{1}{4} , \quad \text{if } i_1 = j_2 \text{ and } i_2 = j_1 . $$
As usual, $\delta$ denotes the Kronecker function. R is just the ratio of the probabilities of the two tables. The switch to the new table is always accepted if R is larger than 1.
The P-value of the test is the proportion of the visited tables having a probability smaller than or equal to that of the observed (initial) contingency table. The standard error on the P-value is estimated as in the case of linkage disequilibrium, using a system of batches (see section 8.1.4.1).
Reference:
Guo and Thompson (1992)
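The probability of a genotype table under Hardy-Weinberg equilibrium, and the ratio R between two neighbouring tables, can be checked numerically. The sketch below is not Arlequin's implementation: the lower-triangular list layout and the function name are assumptions made for illustration, and log-factorials are used to avoid overflow.

```python
from math import exp, lgamma, log


def log_levene(table):
    """Log-probability of a triangular genotype table (Levene 1949).

    table[i][j] (j <= i) is the number of individuals with genotype (i, j);
    row i therefore holds i + 1 counts.
    """
    k = len(table)
    allele_counts = [0] * k
    n = 0    # number of individuals
    het = 0  # number of heterozygotes, H
    for i in range(k):
        for j in range(i + 1):
            c = table[i][j]
            n += c
            allele_counts[i] += c
            allele_counts[j] += c  # a homozygote adds 2 copies to allele i
            if i != j:
                het += c
    logp = lgamma(n + 1) - lgamma(2 * n + 1) + het * log(2.0)
    logp += sum(lgamma(a + 1) for a in allele_counts)
    logp -= sum(lgamma(table[i][j] + 1) for i in range(k) for j in range(i + 1))
    return logp


# Two alleles, two individuals: one AA plus one aa, versus two Aa.
old = [[1], [0, 1]]
new = [[0], [2, 0]]
ratio = exp(log_levene(new) - log_levene(old))
print(ratio)  # ratio of the two table probabilities (case 2 of the switch)
```

For this pair of tables the ratio of probabilities agrees with case 2 above: 1 x 1 / ((0 + 1)(0 + 2)) x 4 = 2.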
8.1.6 Neutrality tests.
8.1.6.1 Ewens-Watterson homozygosity test
This test is based on Ewens' (1972) sampling theory of neutral alleles. Watterson (1978) has shown that the distribution of selectively neutral haplotype frequencies could be conveniently summarized by the sum of squared haplotype (allele) frequencies (F), equivalent to the expected homozygosity for diploids. This test can be performed equally well on diploid or haploid data, as the test statistic is not used for its biological meaning, but just as a way to summarize the allelic frequency distribution. The null distribution of F is generated by simulating random neutral samples having the same number of genes and the same number of haplotypes, using the algorithm of Stewart (1977). The probability $p = \Pr(F_{sim} \le F_{obs})$ of observing random samples with F values identical to or smaller than that of the original sample is recorded and output. Note that this probability is not a p-value, as observed data having a very large F value will be associated with a high p but can still be considered as significant (i.e. if p > 0.95). This test is currently limited to sample sizes of 2000 genes or less and 1000 different alleles (haplotypes) or less. It can be used to test the hypothesis of selective neutrality and population equilibrium against either balancing selection or the presence of advantageous alleles.
References:
Ewens (1972)
Watterson (1978)
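The observed statistic F is simple to compute from allele counts. The sketch below covers only that step; the function name is hypothetical, and the simulation of the null distribution with Stewart's (1977) algorithm is not reproduced here.

```python
def homozygosity(counts):
    """Observed F: sum of squared allele (haplotype) frequencies."""
    n = sum(counts)
    if n == 0:
        raise ValueError("empty sample")
    return sum((c / n) ** 2 for c in counts)


# Hypothetical sample of 10 genes carrying three haplotypes.
print(homozygosity([5, 3, 2]))
```

A sample dominated by one haplotype gives F close to 1, while k equally frequent haplotypes give the minimum value 1/k.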
8.1.6.2 Ewens-Watterson-Slatkin exact test
This test is essentially similar to Watterson's (1978) test, but instead of using F as a summary statistic, it compares the probabilities of the random samples to that of the observed sample (Slatkin 1994b, 1996). The probability of obtaining a random sample having a probability smaller than or equal to that of the observed sample is recorded. The results are in general very close to those of Watterson's homozygosity test. Note that the random samples are generated as explained for the Ewens-Watterson homozygosity test.
References:
Ewens (1972)
Slatkin (1994b, 1996)
8.1.6.3 Chakraborty's test of population amalgamation
This test is also based on the infinite-allele model, and on Ewens' (1972) sampling theory of neutral alleles. By simulation, Chakraborty (1990) has noticed that the number of alleles in a heterogeneous sample (drawn from a population resulting from the amalgamation of previously isolated populations) was larger than the number of alleles expected in a homogeneous neutral sample. He also noticed that the homozygosity of the sample was less sensitive to the amalgamation and therefore proposed to use the mutation parameter inferred from the homozygosity ($\theta_{Hom}$) (see section 8.1.2.3.1) to compute the probability of observing a random neutral sample with a number of alleles similar to or larger than the observed value, $\Pr(K \ge k_{obs})$ (see section 8.1.2.3.3 to see how this probability can be computed). It is an approximation of the conditional probability of observing some number of alleles given the observed homozygosity.
References:
Ewens (1972)
Chakraborty (1990)
8.1.6.4 Tajima's test of selective neutrality
Tajima's (1989a) test is based on the infinite-site model without recombination, appropriate for short DNA sequences or RFLP haplotypes. It compares two estimators of the mutation parameter theta ($\theta = 2Mu$, with M = 2N in diploid populations or M = N in haploid populations of effective size N). The test statistic D is then defined as
$$ D = \frac{\hat{\theta}_{\pi} - \hat{\theta}_{S}}{\sqrt{\mathrm{Var}(\hat{\theta}_{\pi} - \hat{\theta}_{S})}} , $$
where $\hat{\theta}_{\pi} = \hat{\pi}$ (the mean number of pairwise differences) and $\hat{\theta}_{S} = S / \sum_{i=1}^{n-1} (1/i)$, and S is the number of segregating sites in the sample. The limits of confidence intervals around D may be found in Table 2 of Tajima's paper (Tajima 1989a) for different sample sizes.
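For readers who want to check a D value by hand, the sketch below computes D from the sample size n, the number of segregating sites S, and the mean number of pairwise differences. The variance constants are the standard ones from Tajima (1989a), which this manual does not spell out, so they are an addition here; the function name and the example values are hypothetical.

```python
from math import sqrt


def tajima_d(n, s, pi_hat):
    """Tajima's D from sample size n, segregating sites S and mean pairwise differences."""
    if n < 2 or s < 1:
        raise ValueError("need at least 2 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1.0) / (3.0 * (n - 1.0))
    b2 = 2.0 * (n * n + n + 3.0) / (9.0 * n * (n - 1.0))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2.0) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    theta_s = s / a1                         # Watterson's estimator
    variance = e1 * s + e2 * s * (s - 1.0)   # estimated Var(theta_pi - theta_S)
    return (pi_hat - theta_s) / sqrt(variance)


# Hypothetical sample: 10 sequences, 16 segregating sites, pi = 3.888889.
print(tajima_d(10, 16, 3.888889))
```

A negative D indicates an excess of low-frequency variants relative to the neutral equilibrium expectation, a positive D an excess of intermediate-frequency variants.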


The significance of the D statistic is tested by generating random samples under the hypothesis of selective neutrality and population equilibrium, using a coalescent simulation algorithm adapted from Hudson (1990). The P-value of the D statistic is then obtained as the proportion of random D statistics less than or equal to the observation. We also provide a parametric approximation of the P-value assuming a beta-distribution limited by the minimum and maximum possible D values (see Tajima 1989a, p.589). Note that significant D values can be due to factors other than selective effects, like population expansion, bottleneck, or heterogeneity of mutation rates (see Tajima, 1993; Aris-Brosou and Excoffier, 1996; or Tajima 1996, for further details).
References:
Tajima (1993)
Aris-Brosou and Excoffier (1996)
Tajima (1996)
8.1.6.5 Fu's $F_S$ test of selective neutrality
Like Tajima's (1989a) test, Fu's test (Fu, 1997) is based on the infinite-site model without recombination, and thus appropriate for short DNA sequences or RFLP haplotypes. The principle of the test is very similar to that of Chakraborty described above. Here, we evaluate the probability of observing a random neutral sample with a number of alleles equal to or larger than the observed value (see section 8.1.2.3.3 to see how this probability can be computed) given the observed number of pairwise differences, taken as an estimator of $\theta$. In more detail, Fu first calls this probability $S' = \Pr(K \ge k_{obs} \mid \theta = \hat{\theta}_{\pi})$ and defines the $F_S$ statistic as the logit of S'
$$ F_S = \ln\left(\frac{S'}{1 - S'}\right) \quad \text{(Fu, 1997)} $$
Fu (1997) has noticed that the $F_S$ statistic was very sensitive to population demographic expansion, which generally leads to large negative $F_S$ values.
The significance of the $F_S$ statistic is tested by generating random samples under the hypothesis of selective neutrality and population equilibrium, using a coalescent simulation algorithm adapted from Hudson (1990). The P-value of the $F_S$ statistic is then obtained as the proportion of random $F_S$ statistics less than or equal to the observation. Using simulations, Fu noticed that the 2% percentile of the distribution corresponded to the 5% cutoff value (i.e. the critical value of the test at the 5% significance level). We indeed confirmed this behavior by our own simulations. Even though this property is not fully understood, it means that an $F_S$ statistic should be considered as significant at the 5% level if its P-value is below 0.02, and not below 0.05.
Reference:
Fu (1997)
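The logit step can be illustrated with a short sketch. It follows Fu's (1997) definition S' = Pr(K >= k_obs | theta = theta_pi), and it assumes that the distribution of the number of alleles K is the one implied by Ewens' sampling formula, built here by adding genes one at a time. Function names are hypothetical and this is not Arlequin's code.

```python
from math import log


def ewens_k_distribution(n, theta):
    """Pr(K = k) for k = 0..n in a neutral sample of n genes (Ewens 1972)."""
    probs = [1.0]  # empty sample: zero alleles with probability 1
    for m in range(n):
        p_new = theta / (theta + m)  # the (m+1)-th gene is a new allele
        nxt = [0.0] * (len(probs) + 1)
        for k, p in enumerate(probs):
            nxt[k] += p * (1.0 - p_new)
            nxt[k + 1] += p * p_new
        probs = nxt
    return probs


def fu_fs(n, k_obs, theta_pi):
    """F_S = ln(S' / (1 - S')), with S' = Pr(K >= k_obs | theta = theta_pi)."""
    probs = ewens_k_distribution(n, theta_pi)
    s_prime = sum(probs[k_obs:])
    if not 0.0 < s_prime < 1.0:
        raise ValueError("F_S is undefined when S' is 0 or 1")
    return log(s_prime / (1.0 - s_prime))


# Hypothetical sample: 10 genes, 8 distinct haplotypes, theta_pi = 1.0.
print(fu_fs(10, 8, 1.0))
```

An excess of alleles relative to the diversity measured by pairwise differences makes S' small, and hence F_S strongly negative, which is the pattern described above for expanding populations.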
8.2 Inter-population level methods
8.2.1 Population genetic structure inferred by analysis of variance (AMOVA)
The genetic structure of population is investigated here by an analysis of variance framework, as initially defined by Cockerham (1969, 1973), and extended by others (see e.g. Weir and Cockerham, 1984; Long 1986). The Analysis of Molecular Variance approach used in Arlequin (AMOVA, Excoffier et al. 1992) is essentially similar to other approaches based on analyses of variance of gene frequencies, but it takes into account the number of mutations between molecular haplotypes (which first need to be evaluated).
By defining groups of populations, the user defines a particular genetic structure that will be tested (see the input file notations for more details). A hierarchical analysis of variance partitions the total variance into covariance components due to intra-individual differences, inter-individual differences, and/or inter-population differences. See also Weir (1996), for detailed treatments of hierarchical analyses, and Excoffier (2000) as well as Rousset (2000) for an explanation why these are covariance components rather than variance components. The covariance components ($\sigma_i^2$'s) are used to compute fixation indices, as originally defined by Wright (1951, 1965), in terms of inbreeding coefficients, or later in terms of coalescent times by Slatkin (1991).
Formally, in the haploid case, we assume that the i-th haplotype frequency vector from the j-th population in the k-th group is a linear equation of the form
$$ x_{ijk} = x + a_k + b_{jk} + c_{ijk} . $$
The vector x is the unknown expectation of $x_{ijk}$, averaged over the whole study. The effects are a for group, b for population, and c for haplotypes within a population within a group, assumed to be additive, random, independent, and to have the associated covariance components $\sigma_a^2$, $\sigma_b^2$, and $\sigma_c^2$, respectively. The total molecular variance ($\sigma^2$) is the sum of the covariance component due to differences among haplotypes within a population ($\sigma_c^2$), the covariance component due to differences among haplotypes in different populations within a group ($\sigma_b^2$), and the covariance component due to differences among the G groups of populations ($\sigma_a^2$). The same framework could be extended to additional hierarchical levels, such as to accommodate, for instance, the covariance component due to differences between haplotypes within diploid individuals.
Note that in the case of a simple hierarchical genetic structure consisting of haploid individuals in populations, the implemented form of the algorithm leads to a fixation index $F_{ST}$ which is absolutely identical to the weighted average F-statistic over loci, $\hat{\theta}_w$, defined by Weir and Cockerham (1984) (see Michalakis and Excoffier 1996 for a formal proof). In terms of inbreeding coefficients and coalescence times, this $F_{ST}$ can be expressed as
$$ F_{ST} = \frac{f_0 - f_1}{1 - f_1} = \frac{\bar{t}_1 - \bar{t}_0}{\bar{t}_1} , \quad \text{(Slatkin, 1991)} $$
where $f_0$ is the probability of identity by descent of two different genes drawn from the same population, $f_1$ is the probability of identity by descent of two genes drawn from two different populations, $\bar{t}_1$ is the mean coalescence time of two genes drawn from two different populations, and $\bar{t}_0$ is the mean coalescence time of two genes drawn from the same population.
The significance of the fixation indices is tested using a non-parametric permutation approach described in Excoffier et al. (1992), consisting in permuting haplotypes, individuals, or populations, among individuals, populations, or groups of populations. After each permutation round, we recompute all statistics to get their null distribution. Depending on the tested statistic and the given hierarchical design, different types of permutations are performed. Under this procedure, the normality assumption usual in analysis of variance tests is no longer necessary, nor is it necessary to assume equality of variance among populations or groups of populations. A large number of permutations (1,000 or more) is necessary to obtain some accuracy on the final probability. A system of batches similar to those used in the exact test of linkage disequilibrium (see end of section 8.1.4.1) has been implemented to get an idea of the standard deviation of the P values.
We have implemented here 6 different types of hierarchical AMOVA. The number of hierarchical levels varies from two to four. In each of the situations, we describe the way the total sum of squares is partitioned, how the covariance components and the associated F-statistics are obtained, and which permutation schemes are used for the significance test.
Before enumerating all the possible situations, we introduce some notations:


SSD(T) : Total sum of squared deviations.
SSD(AG) : Sum of squared deviations Among Groups of populations.
SSD(AP) : Sum of squared deviations Among Populations.
SSD(AI) : Sum of squared deviations Among Individuals.
SSD(WP) : Sum of squared deviations Within Populations.
SSD(WI) : Sum of squared deviations Within Individuals.
SSD(AP/WG) : Sum of squared deviations Among Populations, Within Groups.
SSD(AI/WP) : Sum of squared deviations Among Individuals, Within Populations.
G : Number of groups in the structure.
P : Total number of populations.
N : Total number of individuals for genotypic data or total number of gene copies for haplotypic data.
$N_p$ : Number of individuals in population p for genotypic data or total number of gene copies in population p for haplotypic data.
$N_g$ : Number of individuals in group g for genotypic data or total number of gene copies in group g for haplotypic data.


8.2.1.1 Haplotypic data, one group of populations

Source of variation    Degrees of freedom    Sum of squares (SSD)    Expected mean squares
Among Populations      P - 1                 SSD(AP)                 $n\sigma_a^2 + \sigma_b^2$
Within Populations     N - P                 SSD(WP)                 $\sigma_b^2$
Total                  N - 1                 SSD(T)                  $\sigma_T^2$

Where n and $F_{ST}$ are defined by
$$ n = \frac{N - \sum_{p \in P} \frac{N_p^2}{N}}{P - 1} , \qquad F_{ST} = \frac{\sigma_a^2}{\sigma_T^2} . $$
We test $\sigma_a^2$ and $F_{ST}$ by permuting haplotypes among populations.
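The one-group haplotypic design can be sketched in a few lines. This is an illustration under stated assumptions, not Arlequin's implementation: the sums of squared deviations are computed from a matrix of squared distances between gene copies as in Excoffier et al. (1992), the total variance is taken as the sum of the two components, and the P-value is taken as the proportion of permutations giving an F_ST at least as large as the observed one. All function names are hypothetical.

```python
import random


def ssd(dist2, members):
    """Sum of squared deviations of a set: sum over pairs of d^2, divided by its size."""
    total = sum(dist2[a][b] for x, a in enumerate(members) for b in members[x + 1:])
    return total / len(members)


def amova_one_group(dist2, pops):
    """pops: list of populations, each a list of gene-copy indices into dist2."""
    everyone = [i for p in pops for i in p]
    n_total, n_pops = len(everyone), len(pops)
    ssd_wp = sum(ssd(dist2, p) for p in pops)
    ssd_ap = ssd(dist2, everyone) - ssd_wp
    n = (n_total - sum(len(p) ** 2 for p in pops) / n_total) / (n_pops - 1)
    sigma2_b = ssd_wp / (n_total - n_pops)
    sigma2_a = (ssd_ap / (n_pops - 1) - sigma2_b) / n
    return sigma2_a, sigma2_b, sigma2_a / (sigma2_a + sigma2_b)


def permutation_p_value(dist2, pops, n_perm=1000, seed=0):
    """Permute gene copies among populations, keeping the sample sizes fixed."""
    rng = random.Random(seed)
    observed = amova_one_group(dist2, pops)[2]
    pool = [i for p in pops for i in p]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        shuffled, start = [], 0
        for p in pops:
            shuffled.append(pool[start:start + len(p)])
            start += len(p)
        if amova_one_group(dist2, shuffled)[2] >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical data: two populations of two gene copies, fixed for different haplotypes.
dist2 = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
pops = [[0, 1], [2, 3]]
print(amova_one_group(dist2, pops))
```

With such a tiny sample the permutation P-value cannot be small, which illustrates why a large number of gene copies and of permutations is needed for a meaningful test.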
8.2.1.2 Haplotypic data, several groups of populations

Source of variation                   Degrees of freedom    Sum of squares (SSD)    Expected mean squares
Among Groups                          G - 1                 SSD(AG)                 $n''\sigma_a^2 + n'\sigma_b^2 + \sigma_c^2$
Among Populations / Within Groups     P - G                 SSD(AP/WG)              $n\sigma_b^2 + \sigma_c^2$
Within Populations                    N - P                 SSD(WP)                 $\sigma_c^2$
Total                                 N - 1                 SSD(T)                  $\sigma_T^2$

Where the n's and the F-statistics are defined by:
$$ S_G = \sum_{g \in G} \sum_{p \in g} \frac{N_p^2}{N_g} , \qquad n = \frac{N - S_G}{P - G} , \qquad n' = \frac{S_G - \sum_{p \in P} \frac{N_p^2}{N}}{G - 1} , \qquad n'' = \frac{N - \sum_{g \in G} \frac{N_g^2}{N}}{G - 1} , $$
$$ F_{CT} = \frac{\sigma_a^2}{\sigma_T^2} , \qquad F_{SC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_c^2} , \qquad \text{and} \qquad F_{ST} = \frac{\sigma_a^2 + \sigma_b^2}{\sigma_T^2} . $$
We test $\sigma_c^2$ and $F_{ST}$ by permuting haplotypes among populations among groups.
We test $\sigma_b^2$ and $F_{SC}$ by permuting haplotypes among populations within groups.
We test $\sigma_a^2$ and $F_{CT}$ by permuting populations among groups.
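The coefficients n, n' and n'' depend only on the sample sizes, so they are easy to check separately. The sketch below computes them for the haplotypic case with several groups, together with the three F-statistics from given covariance components. Function names and input layout are hypothetical, and the total variance is assumed to be the sum of the three components.

```python
def amova_n_coefficients(groups):
    """groups: list of groups, each a list of population sample sizes N_p."""
    n_groups = len(groups)
    n_pops = sum(len(g) for g in groups)
    n_total = sum(sum(g) for g in groups)
    s_g = sum(sum(size ** 2 for size in g) / sum(g) for g in groups)
    n = (n_total - s_g) / (n_pops - n_groups)
    n_prime = (s_g - sum(size ** 2 for g in groups for size in g) / n_total) / (n_groups - 1)
    n_second = (n_total - sum(sum(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    return n, n_prime, n_second


def f_statistics(sigma2_a, sigma2_b, sigma2_c):
    """F_CT, F_SC and F_ST from the three covariance components."""
    total = sigma2_a + sigma2_b + sigma2_c  # assumed definition of sigma_T^2
    return {
        "F_CT": sigma2_a / total,
        "F_SC": sigma2_b / (sigma2_b + sigma2_c),
        "F_ST": (sigma2_a + sigma2_b) / total,
    }


# Hypothetical balanced design: 2 groups of 2 populations with 2 gene copies each.
print(amova_n_coefficients([[2, 2], [2, 2]]))
print(f_statistics(1.0, 1.0, 2.0))
```

In a balanced design n and n' reduce to the population sample size and n'' to the group size, which is a convenient sanity check.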
8.2.1.3 Genotypic data, one group of populations, no within-individual level

Source of variation    Degrees of freedom    Sum of squares (SSD)    Expected mean squares
Among Populations      P - 1                 SSD(AP)                 $n\sigma_a^2 + \sigma_b^2$
Within Populations     2N - P                SSD(WP)                 $\sigma_b^2$
Total                  2N - 1                SSD(T)                  $\sigma_T^2$

Where n and $F_{ST}$ are defined by
$$ n = \frac{2N - \sum_{p \in P} \frac{2N_p^2}{N}}{P - 1} , \qquad F_{ST} = \frac{\sigma_a^2}{\sigma_T^2} . $$
If the gametic phase is known:
We test $\sigma_a^2$ and $F_{ST}$ by permuting haplotypes among populations.
If the gametic phase is unknown:
We test $\sigma_a^2$ and $F_{ST}$ by permuting individual genotypes among populations.
8.2.1.4 Genotypic data, several groups of populations, no within-individual level

Source of variation                   Degrees of freedom    Sum of squares (SSD)    Expected mean squares
Among Groups                          G - 1                 SSD(AG)                 $n''\sigma_a^2 + n'\sigma_b^2 + \sigma_c^2$
Among Populations / Within Groups     P - G                 SSD(AP/WG)              $n\sigma_b^2 + \sigma_c^2$
Within Populations                    2N - P                SSD(WP)                 $\sigma_c^2$
Total                                 2N - 1                SSD(T)                  $\sigma_T^2$

Where the n's and the F-statistics are defined by:
$$ S_G = \sum_{g \in G} \sum_{p \in g} \frac{2N_p^2}{N_g} , \qquad n = \frac{2N - S_G}{P - G} , $$
$$ n' = \frac{S_G - \sum_{p \in P} \frac{2N_p^2}{N}}{G - 1} , \qquad n'' = \frac{2N - \sum_{g \in G} \frac{2N_g^2}{N}}{G - 1} , $$
$$ F_{CT} = \frac{\sigma_a^2}{\sigma_T^2} , \qquad F_{ST} = \frac{\sigma_a^2 + \sigma_b^2}{\sigma_T^2} , \qquad \text{and} \qquad F_{SC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_c^2} . $$
If the gametic phase is known:
We test $\sigma_c^2$ and $F_{ST}$ by permuting haplotypes among populations and among groups.
We test $\sigma_b^2$ and $F_{SC}$ by permuting haplotypes among populations but within groups.
If the gametic phase is not known:
We test $\sigma_c^2$ and $F_{ST}$ by permuting individual genotypes among populations and among groups.
We test $\sigma_b^2$ and $F_{SC}$ by permuting individual genotypes among populations but within groups.
In all cases:
We test $\sigma_a^2$ and $F_{CT}$ by permuting whole populations among groups.
8.2.1.5 Genotypic data, one population, within-individual level

Source of variation    Degrees of freedom    Sum of squares (SSD)    Expected mean squares
Among Individuals      N - 1                 SSD(AI)                 $2\sigma_a^2 + \sigma_b^2$
Within Individuals     N                     SSD(WI)                 $\sigma_b^2$
Total                  2N - 1                SSD(T)                  $\sigma_T^2$

Where $F_{IS}$ is defined as:
$$ F_{IS} = \frac{\sigma_a^2}{\sigma_T^2} . $$
We test $\sigma_a^2$ and $F_{IS}$ by permuting haplotypes among individuals.
8.2.1.6 Genotypic data, one group of populations, within-individual level

Source of variation                        Degrees of freedom    Sum of squares (SSD)    Expected mean squares
Among Populations                          P - 1                 SSD(AP)                 $n\sigma_a^2 + 2\sigma_b^2 + \sigma_c^2$
Among Individuals / Within Populations     N - P                 SSD(AI/WP)              $2\sigma_b^2 + \sigma_c^2$
Within Individuals                         N                     SSD(WI)                 $\sigma_c^2$
Total                                      2N - 1                SSD(T)                  $\sigma_T^2$

Where n and the F-statistics are defined by:
$$ n = \frac{2N - \sum_{p \in P} \frac{2N_p^2}{N}}{P - 1} , $$
$$ F_{ST} = \frac{\sigma_a^2}{\sigma_T^2} , \qquad F_{IT} = \frac{\sigma_a^2 + \sigma_b^2}{\sigma_T^2} , \qquad \text{and} \qquad F_{IS} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_c^2} . $$
We test $\sigma_c^2$ and $F_{IT}$ by permuting haplotypes among individuals among populations.
We test $\sigma_b^2$ and $F_{IS}$ by permuting haplotypes among individuals within populations.
We test $\sigma_a^2$ and $F_{ST}$ by permuting individual genotypes among populations.
8.2.1.7 Genotypic data, several groups of populations, within-individual level

Source of variation                        Degrees of freedom    Sum of squares (SSD)    Expected mean squares
Among Groups                               G - 1                 SSD(AG)                 $n''\sigma_a^2 + n'\sigma_b^2 + 2\sigma_c^2 + \sigma_d^2$
Among Populations / Within Groups          P - G                 SSD(AP/WG)              $n\sigma_b^2 + 2\sigma_c^2 + \sigma_d^2$
Among Individuals / Within Populations     N - P                 SSD(AI/WP)              $2\sigma_c^2 + \sigma_d^2$
Within Individuals                         N                     SSD(WI)                 $\sigma_d^2$
Total                                      2N - 1                SSD(T)                  $\sigma_T^2$

Where the n's and the F-statistics are defined by:
$$ n = \frac{2N - \sum_{g \in G} \sum_{p \in g} \frac{2N_p^2}{N_g}}{P - G} , \qquad n' = \frac{\sum_{g \in G} \left[ \frac{N - N_g}{N_g} \sum_{p \in g} 2N_p^2 \right]}{N (G - 1)} , \qquad n'' = \frac{2N - \sum_{g \in G} \frac{2N_g^2}{N}}{G - 1} , $$
$$ F_{CT} = \frac{\sigma_a^2}{\sigma_T^2} , \qquad F_{IT} = \frac{\sigma_a^2 + \sigma_b^2 + \sigma_c^2}{\sigma_T^2} , \qquad F_{IS} = \frac{\sigma_c^2}{\sigma_c^2 + \sigma_d^2} , \qquad \text{and} \qquad F_{SC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_c^2 + \sigma_d^2} . $$
We test $\sigma_d^2$ and $F_{IT}$ by permuting haplotypes among populations and among groups.
We test $\sigma_c^2$ and $F_{IS}$ by permuting haplotypes among individuals within populations.
We test $\sigma_b^2$ and $F_{SC}$ by permuting individual genotypes among populations but within groups.
We test $\sigma_a^2$ and $F_{CT}$ by permuting populations among groups.
8.2.2 Minimum Spanning Network (MSN) among haplotypes
It is possible to compute the Minimum Spanning Tree (MST) and Minimum Spanning Network (MSN) from the squared distance matrix among haplotypes used for the calculation of F-statistics in the AMOVA procedure. See section 8.1.2.9 for a brief description of the method and references.
8.2.3 Locus-by-locus AMOVA
AMOVA analyses can now be performed for each locus separately in the same way it was performed at the haplotype level. Variance components and F-statistics are estimated for each locus separately and listed into a global table. The different variance components from different levels are combined to produce synthetic estimators of F-statistics, by summing variance components estimated at a given level in the hierarchy in the numerator and denominator to produce F-statistics as variance component ratios. Therefore the global F-statistics are not obtained as an arithmetic average of each locus F-statistics (see e.g. Weir and Cockerham 1984, or Weir 1996).
If there is no missing data, the locus-by-locus and the haplotype analyses should lead to identical sums of squares, variance components, and F-statistics. If there are missing data, the global variance components should be different, because the degrees of freedom will vary from locus to locus, and therefore the estimators of F-statistics will also vary.
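The difference between a ratio of summed variance components and an average of per-locus ratios can be seen on a toy example. The sketch below assumes a two-level design with one among-population and one within-population component per locus; function names and values are hypothetical.

```python
def global_fst(components):
    """components: one (sigma2_a, sigma2_b) pair per locus; ratio of the sums."""
    numerator = sum(a for a, _ in components)
    denominator = sum(a + b for a, b in components)
    return numerator / denominator


def mean_of_locus_fst(components):
    """Arithmetic average of per-locus F_ST values, shown for comparison only."""
    return sum(a / (a + b) for a, b in components) / len(components)


# Hypothetical variance components at two loci.
loci = [(0.1, 0.9), (0.3, 0.2)]
print(global_fst(loci), mean_of_locus_fst(loci))
```

The ratio of sums gives more weight to the loci carrying more variance, so the two values generally differ.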


8.2.4 Population pairwise genetic distances
The pairwise $F_{ST}$'s can be used as short-term genetic distances between populations, with the application of a slight transformation to linearize the distance with population divergence time (Reynolds et al. 1983; Slatkin, 1995).
The pairwise $F_{ST}$ values are given in the form of a matrix.
The null distribution of pairwise $F_{ST}$ values under the hypothesis of no difference between the populations is obtained by permuting haplotypes between populations. The P-value of the test is the proportion of permutations leading to an $F_{ST}$ value larger than or equal to the observed one. The P-values are also given in matrix form.
Three other matrices are computed from the $F_{ST}$ values:
8.2.4.1 Reynolds' distance (Reynolds et al. 1983):
Since $F_{ST}$ between pairs of stationary haploid populations of size N having diverged t generations ago varies approximately as
$$ F_{ST} = 1 - \left(1 - \frac{1}{N}\right)^{t} \approx 1 - e^{-t/N} , $$
the genetic distance $D = -\log(1 - F_{ST})$ is thus approximately proportional to t/N for short divergence times.
8.2.4.2 Slatkin's linearized $F_{ST}$'s (Slatkin 1995):
Slatkin considers a simple demographic model where two haploid populations of size N have diverged $\tau$ generations ago from a population of identical size. These two populations have remained isolated ever since, without exchanging any migrants. Under such conditions, $F_{ST}$ can be expressed in terms of the coalescence times $\bar{t}_1$, which is the mean coalescence time of two genes drawn from two different populations, and $\bar{t}_0$, which is the mean coalescence time of two genes drawn from the same population. Using the analysis of variance approach, the $F_{ST}$'s are expressed as
$$ F_{ST} = \frac{\bar{t}_1 - \bar{t}_0}{\bar{t}_1} \quad \text{(Slatkin, 1991, 1995)} $$
Because $\bar{t}_0$ is equal to N generations (see e.g. Hudson, 1990), and $\bar{t}_1$ is equal to $\tau + N$ generations, the above expression reduces to
$$ F_{ST} = \frac{\tau}{\tau + N} . $$
Therefore, the ratio $D = F_{ST} / (1 - F_{ST})$ is equal to $\tau / N$, and is therefore proportional to the divergence time between the two populations.


8.2.4.3 M values (M = Nm for haploid populations, M = 2Nm for diploid populations).
This matrix is computed under very different assumptions than the two previous matrices. Assume that two populations of size N drawn from a large pool of populations exchange a fraction m of migrants each generation, and that the mutation rate u is negligible as compared to the migration rate m. In this case, we have the following simple relationship at equilibrium between migration and drift,
$$ F_{ST} = \frac{1}{2M + 1} . $$
Therefore, M, which is the absolute number of migrants exchanged between the two populations, can be estimated by
$$ M = \frac{1 - F_{ST}}{2 F_{ST}} . $$
If one was to consider that the two populations only exchange with each other and with no other populations, then one should divide the quantity M by a factor 2 to obtain an estimator M' = Nm for haploid populations, or M' = 2Nm for diploid populations. This is because the expectation of $F_{ST}$ is indeed given by
$$ F_{ST} = \frac{1}{\frac{4Nmd}{d - 1} + 1} \quad \text{(e.g. Slatkin 1991)} $$
where d is the number of demes exchanging genes. When d is large this tends towards the classical value 1/(4Nm + 1), but when d = 2, then the expectation of $F_{ST}$ is 1/(8Nm + 1).
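The three transformations of a pairwise F_ST described in sections 8.2.4.1 to 8.2.4.3 are one-liners. The sketch below applies them to a single value; function names are hypothetical, and the guards simply reject values for which the expressions are undefined.

```python
from math import log


def reynolds_distance(fst):
    """D = -log(1 - F_ST), roughly proportional to t/N for short divergence times."""
    if fst >= 1.0:
        raise ValueError("undefined for F_ST >= 1")
    return -log(1.0 - fst)


def slatkin_linearized(fst):
    """D = F_ST / (1 - F_ST), equal to tau/N under the pure isolation model."""
    if fst >= 1.0:
        raise ValueError("undefined for F_ST >= 1")
    return fst / (1.0 - fst)


def m_value(fst):
    """M = (1 - F_ST) / (2 F_ST), assuming migration-drift equilibrium."""
    if fst <= 0.0:
        raise ValueError("undefined for F_ST <= 0")
    return (1.0 - fst) / (2.0 * fst)


fst = 0.2  # hypothetical pairwise value
print(reynolds_distance(fst), slatkin_linearized(fst), m_value(fst))
```

The first two distances interpret the same F_ST as the result of divergence in isolation, while M interprets it as the result of ongoing migration, so they should not be read as alternative estimates of one quantity.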
8.2.4.4 Nei's average number of differences between populations
As additional genetic distances between populations, we also provide Nei's raw (D) and net ($D_A$) number of nucleotide differences between populations (Nei and Li, 1979). D and net $D_A$ are respectively computed between populations 1 and 2 as
$$ D = \hat{\pi}_{12} = \sum_{i=1}^{k} \sum_{j=1}^{k'} x_{1i}\, x_{2j}\, \delta_{ij} , \qquad \text{and} \qquad D_A = \hat{\pi}_{12} - \frac{\hat{\pi}_1 + \hat{\pi}_2}{2} , $$
where k and k' are the number of distinct haplotypes in populations 1 and 2 respectively, $x_{1i}$ is the frequency of the i-th haplotype in population 1, and $\delta_{ij}$ is the number of differences between haplotype i and haplotype j.
Under the same notation concerning coalescence times as described above, the expectation of $D_A$ is
$$ D_A = 2u(\bar{t}_1 - \bar{t}_0) = 2u\tau , $$
where u is the average mutation rate per nucleotide, and $\tau$ is the divergence time between the two populations. Thus $D_A$ is also expected to increase linearly with divergence times between the populations.
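A minimal sketch of the two distances follows. It makes two simplifying assumptions that are not taken from this manual: both frequency vectors are indexed on one common list of haplotypes (with frequency 0 for absent haplotypes), and the within-population diversities are computed as plain frequency-weighted averages without any sample-size correction. Function names are hypothetical.

```python
def mean_pairwise(x, y, delta):
    """Frequency-weighted mean number of differences between two sets of haplotypes."""
    return sum(xi * yj * delta[i][j] for i, xi in enumerate(x) for j, yj in enumerate(y))


def nei_distances(x1, x2, delta):
    """Return (D, D_A): raw and net average number of differences."""
    pi_12 = mean_pairwise(x1, x2, delta)
    pi_1 = mean_pairwise(x1, x1, delta)  # uncorrected within-population diversity
    pi_2 = mean_pairwise(x2, x2, delta)
    return pi_12, pi_12 - (pi_1 + pi_2) / 2.0


# Two hypothetical haplotypes differing at 3 sites, fixed in different populations.
delta = [[0, 3], [3, 0]]
print(nei_distances([1.0, 0.0], [0.0, 1.0], delta))
```

When the two populations share identical haplotype frequencies the raw distance D stays positive, but the net distance is zero, which is why D_A is the quantity expected to scale with divergence time.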
8.2.4.5 Genetic distance $(\delta\mu)^2$ (microsatellite data only)
For microsatellite data, Goldstein et al. (1995) have introduced $(\delta\mu)^2$, a measure of genetic distance between pairs of populations based on the Stepwise Mutation Model (SMM). The distance between populations A and B is simply defined as:
$$ (\delta\mu)^2 = (\mu_A - \mu_B)^2 , $$
where $\mu_A$ and $\mu_B$ are the average number of allelic size differences within populations A and B respectively, computed over all loci. Of course, the computation of these distances assumes that alleles are coded as a measure that is proportional to the number of motif repeats in the microsatellite.
Goldstein et al. (1995) have shown that, if one can assume that the two populations having diverged T generations ago are now at mutation-drift equilibrium, and that they had the same mean repeat size at the time of their divergence, then the expected value of $(\delta\mu)^2$ is equal to 2uT, where u is the mutation rate per generation. $(\delta\mu)^2$ thus increases linearly with divergence time. Note however, that non-linearities arise in case of recent population size change.
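A short sketch of this distance is given below. It assumes that alleles are coded as repeat counts, that the population mean allele size is taken per locus, and that the squared differences are averaged over loci; the way Arlequin combines loci is not detailed here, so that averaging is an assumption. The function name and data are hypothetical.

```python
def delta_mu_squared(pop_a, pop_b):
    """pop_a[l] and pop_b[l]: allele sizes (repeat counts) observed at locus l."""
    if len(pop_a) != len(pop_b) or not pop_a:
        raise ValueError("both populations need the same, non-empty set of loci")
    total = 0.0
    for sizes_a, sizes_b in zip(pop_a, pop_b):
        mu_a = sum(sizes_a) / len(sizes_a)
        mu_b = sum(sizes_b) / len(sizes_b)
        total += (mu_a - mu_b) ** 2
    return total / len(pop_a)  # assumed: average of squared differences over loci


# Hypothetical repeat counts at two loci in populations A and B.
pop_a = [[10, 12], [5, 5]]
pop_b = [[14, 14], [5, 7]]
print(delta_mu_squared(pop_a, pop_b))
```

Because the distance is built on differences in mean repeat number, it is only meaningful when allele codes are proportional to the number of repeats.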
8.2.4.6 Relative population sizes - Divergence between populations of unequal sizes
We have implemented a method to estimate divergence time between populations of unequal sizes (Gaggiotti and Excoffier 2000). The model assumes that two populations have diverged from an ancestral population of size $N_0$ some T generations in the past, and have remained isolated from each other ever since. The sizes of the two daughter populations can be different, but their sum adds up to the size of the ancestral population.
From the average number of pairwise differences between and within populations, we try to estimate the divergence time scaled by the mutation rate ($\tau = 2Tu$), the size of the ancestral population scaled by the mutation rate ($\theta_0 = 2N_0 u$ for haploid populations and $\theta_0 = 4N_0 u$ for diploid populations), as well as the relative sizes (k and [1-k]) of the two daughter populations.
The estimated parameters result from the numerical resolution of a system of three non-linear equations with three unknowns, based on the Broyden method (Press et al. 1992, p.389).
The significance of the parameters is tested by a permutation procedure similar to that used in AMOVA. Under the hypothesis that the two populations are undifferentiated, we permute individuals between samples, and re-estimate the three parameters, in order to obtain their empirical null distribution. The percentile value of the three statistics is obtained by the proportion of permuted cases that produce statistics larger than or equal to those observed. It thus provides a percentile value of the three statistics under the null hypothesis of no differentiation.
The values of the estimated parameters should be interpreted with caution. The procedure we have implemented is based on the comparison of intra- and inter-population diversities ($\pi$'s), which have a large variance. This means that for short divergence times, the average diversity found within populations could be larger than that observed between populations. This situation could lead to negative divergence times and to daughter population relative sizes larger than one or smaller than zero (negative values). Large departures from the assumed pure-fission model could also lead to observed diversities that would lead to aberrant estimators of divergence time and relative population sizes. One should thus make those computations only if the assumptions of a pure-fission model are met and if the divergence time is relatively old. Simulation results have shown that this procedure leads to better results than other methods that do not take unequal population sizes into account when the relative sizes of daughter populations are indeed unequal. According to our simulations (Table 4 in Gaggiotti and Excoffier 2000), conventional methods such as described above lead to better results for equal population sizes (k = 0.5) and short divergence times ($T/N_0 < 0.5$). However, the fact that the present method leads to clearly aberrant results in some cases is not necessarily a drawback. It has the advantage of drawing the user's attention to the fact that some care has to be taken with the interpretation of the results. Some other estimators that would be grossly biased but whose values would be kept within reasonable bounds would often lead to misinterpretations.
Note that the numerical method we have used to solve the system of equations may sometimes fail to converge. An asterisk will indicate those cases in the result file that should be discarded because of convergence failure.
8.2.5 Exact tests of population differentiation
We test the hypothesis of a random distribution of k different haplotypes or genotypes
among r populations, as described in Raymond and Rousset (1995). This test is analogous
to Fisher's exact test on a 2x2 contingency table, extended to an r x k contingency table.
All potential states of the contingency table are explored with a Markov chain similar to
that described for the linkage disequilibrium test (section 8.1.4.1). During this random
walk between the states of the Markov chain, we estimate the probability of observing a
table less or equally likely than the observed sample configuration under the null
hypothesis of panmixia.
For haplotypic data, the table is built using sample haplotype frequencies (Raymond and
Rousset 1995).
For genotypic data with unknown gametic phase, the contingency table is built from sample
genotype frequencies (Goudet et al. 1996).
As before, the error on the P-value is estimated by partitioning the total number of steps
into a given number of batches (see section 8.1.4.1).
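The quantity estimated by this test can be illustrated with a short Python sketch. It is
not the Markov chain used by Arlequin: as a simplifying assumption, it draws independent
random tables with the same row and column totals by shuffling haplotype labels among
gene copies, and counts how often a random table is less or equally likely than the
observed one. The function names, the number of samples and the toy tables are all
hypothetical.

```python
import math
import random


def log_weight(table):
    """Log-probability of a table up to a constant, for fixed margins.

    With row and column totals fixed, P(table) is proportional to
    1 / prod(n_ij!), so -sum(log n_ij!) is enough to rank tables.
    """
    return -sum(math.lgamma(n + 1) for row in table for n in row)


def exact_test_monte_carlo(table, n_samples=5000, seed=1):
    """Estimate P(table less or equally likely than observed | margins)."""
    rng = random.Random(seed)
    n_pops, n_types = len(table), len(table[0])
    # Expand the table into one (population, haplotype) label per gene copy.
    pops, types = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            pops.extend([i] * count)
            types.extend([j] * count)
    observed = log_weight(table)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(types)  # random reassignment keeps both margins fixed
        sampled = [[0] * n_types for _ in range(n_pops)]
        for i, j in zip(pops, types):
            sampled[i][j] += 1
        if log_weight(sampled) <= observed + 1e-9:
            hits += 1
    return hits / n_samples


# Rows are populations, columns are haplotypes (toy counts).
p_same = exact_test_monte_carlo([[10, 10], [10, 10]])  # identical samples
p_diff = exact_test_monte_carlo([[20, 0], [0, 20]])    # no shared haplotype
```

In this toy case the first table is the most probable one given its margins, so every
sampled table counts and the estimate is 1, whereas the second table should give an
estimate very close to zero.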
8.2.6 Assignment of individual genotypes to populations
It can be of interest to try to determine the origin of particular individuals, knowing a
list of potential source populations (e.g. Rannala and Mountain 1997; Waser and Strobeck
1998; Davies et al. 1999). The method we have implemented here is the simplest one, as it
consists in determining the log-likelihood of each individual multi-locus genotype in each
population sample, assuming that the individual comes from that population. For computing
the likelihood, we simply use the allele frequencies estimated in each sample from the
original constitution of the samples. We also assume that all loci are independent, such
that the global individual likelihood is obtained as the product of the likelihoods at
each locus. The method we have implemented is inspired from that described in Paetkau et
al. (1995, 1997) and Waser and Strobeck (1998). The resulting output tables can be used to
draw log-log plots of genotype likelihoods for pairs of populations (see Paetkau et al.
1997 and Waser and Strobeck 1998), and to identify those genotypes that seem better
explained by belonging to a population other than the one in which they were sampled.



For instance, we have plotted on this graph the log-likelihood of individuals sampled in
Algeria (white circles) for two HLA class II loci versus those of Senegalese Mandenka
individuals (black diamonds). The overlap of the two distributions suggests that two loci
are not enough to provide a clear-cut separation between these two populations. One also
sees that there is at least one Mandenka individual whose genotype would be much better
explained if it came from the Algerian population than if it came from Eastern Senegal.
Note that interpreting these results in terms of gene flow is difficult and hazardous.
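The log-likelihood computation described in this section can be sketched in Python as
follows. This is only an illustration under stated assumptions: genotype probabilities at
each locus are taken to follow Hardy-Weinberg proportions, a genotype carrying an allele
absent from a sample is given a log-likelihood of minus infinity, and all names and
allele frequencies are hypothetical rather than taken from Arlequin.

```python
import math


def locus_likelihood(genotype, freqs):
    """Probability of one diploid genotype given allele frequencies.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch).
    """
    a, b = genotype
    p, q = freqs.get(a, 0.0), freqs.get(b, 0.0)
    return p * p if a == b else 2.0 * p * q


def log_likelihood(multilocus_genotype, pop_freqs):
    """Sum of per-locus log-likelihoods (loci assumed independent)."""
    total = 0.0
    for genotype, freqs in zip(multilocus_genotype, pop_freqs):
        lik = locus_likelihood(genotype, freqs)
        if lik == 0.0:
            return float("-inf")  # an allele is absent from this sample
        total += math.log(lik)
    return total


# Hypothetical allele frequencies at two loci in two population samples.
pop_a = [{"A1": 0.5, "A2": 0.5}, {"B1": 0.9, "B2": 0.1}]
pop_b = [{"A1": 0.1, "A2": 0.9}, {"B1": 0.2, "B2": 0.8}]
individual = [("A1", "A2"), ("B1", "B1")]

scores = {name: log_likelihood(individual, freqs)
          for name, freqs in (("pop_a", pop_a), ("pop_b", pop_b))}
best = max(scores, key=scores.get)  # sample with the highest log-likelihood
```

Plotting the two scores of every individual against each other gives the kind of
log-log plot discussed above.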

8.2.7 Mantel test
The Mantel test consists in testing the significance of the correlation between two or
more matrices by a permutation procedure that provides the empirical null distribution of
the correlation coefficient while taking into account the auto-correlations of the
elements of the matrix. In more detail, the testing procedure proceeds as follows:
Let us first define two square matrices X = {x_ij} and Y = {y_ij} of dimension N. The N^2
elements of these matrices are not all independent, as there are only N-1 independent
contrasts in the data. This is why the permutation procedure does not permute the
elements of the matrices independently. The correlation of the two matrices is classically
defined as
$$ r_{XY} = \frac{SP(X,Y)}{\sqrt{SS(X) \cdot SS(Y)}}, $$

the ratio of the cross product of X and Y over the square root of the product of their
sums of squares. We note that the denominator of the above equation is insensitive to
permutation, such that only the numerator will change upon permutation of rows and
columns. Upon closer examination, it can be shown that the only quantity that will
actually change between permutations is the Hadamard product of the two matrices, noted as
$$ Z_{XY} = X * Y = \sum_{i=1}^{N} \sum_{j=1}^{i} x_{ij} y_{ij}, $$
which is the only variable term involved in the computation of the cross-product.
The Mantel testing procedure applied to two matrices then consists in computing the
quantity Z_XY from the original matrices, permuting the rows and columns of one matrix
while keeping the other constant, each time recomputing the quantity Z*_XY, and comparing
it to the original Z_XY value (Smouse et al. 1986).
In the case of three matrices, say Y, X_1 and X_2, the procedure is very similar. The
partial correlation coefficients are obtained from the pairwise correlations as
$$ r_{YX_1 . X_2} = \frac{r_{YX_1} - r_{X_1X_2} \, r_{YX_2}}{\sqrt{(1 - r_{X_1X_2}^2)(1 - r_{YX_2}^2)}}. $$
The other relevant partial correlations can be obtained similarly (see e.g. Sokal and Rohlf
1981). The significance of the partial correlations is tested by keeping one matrix
constant and permuting the rows and columns of the other two matrices, each time
recomputing the new partial correlations and comparing them to the observed values
(Smouse et al. 1986). Applications of the Mantel test in anthropology and genetics can be
found in Smouse and Long (1992).
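The two-matrix procedure can be sketched in Python as follows. The sketch makes a few
assumptions that are not stated in this manual: the matrices are symmetric with a zero
diagonal, so only the elements below the diagonal are used; the correlation is recomputed
in full at each permutation instead of updating Z_XY only; and the P-value counts the
observed value as one of the permutations. All function names are hypothetical.

```python
import math
import random


def lower_triangle(matrix, order):
    """Below-diagonal elements after relabelling rows and columns by `order`."""
    n = len(matrix)
    return [matrix[order[i]][order[j]] for i in range(1, n) for j in range(i)]


def pearson(a, b):
    """Product-moment correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    sp = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return sp / math.sqrt(ss_a * ss_b)


def mantel_test(x, y, n_perm=999, seed=1):
    """Return (r_XY, one-sided permutation P-value)."""
    rng = random.Random(seed)
    identity = list(range(len(x)))
    y_vals = lower_triangle(y, identity)
    r_obs = pearson(lower_triangle(x, identity), y_vals)
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # same permutation applied to rows and columns
        if pearson(lower_triangle(x, order), y_vals) >= r_obs - 1e-12:
            hits += 1
    # Counting the observed value among the permutations is a convention
    # chosen for this sketch.
    return r_obs, (hits + 1) / (n_perm + 1)


def partial_correlation(r_yx1, r_yx2, r_x1x2):
    """Correlation of Y and X1 with X2 held constant."""
    return (r_yx1 - r_x1x2 * r_yx2) / math.sqrt(
        (1 - r_x1x2 ** 2) * (1 - r_yx2 ** 2))


# Toy data: six points on a line; Y is X rescaled, so r_XY should be 1.
points = [0, 1, 2, 3, 4, 5]
X = [[abs(p - q) for q in points] for p in points]
Y = [[2 * v for v in row] for row in X]
r, p_value = mantel_test(X, Y)
```

Permuting rows and columns together, rather than individual elements, is what preserves
the dependence among the elements of a matrix.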
8.2.8 Detection of loci under selection from F-statistics
Several procedures have been proposed to detect loci under selection based on the observed
pattern of genetic diversity within populations, and several such tests are implemented in
Arlequin (see e.g. section 8.1.6).
However, selection can also affect genetic diversity between populations: a locus under
balancing selection should show allele frequencies that are too even across populations,
whereas loci under local directional selection should show large differences between
populations (Cavalli-Sforza 1966; Lewontin and Krakauer 1973). This observation has
recently led to the development of several methods comparing levels of genetic diversity
and differentiation within and between populations (see e.g. Beaumont and Nichols 1996;
Schlotterer 2002; Beaumont and Balding 2004; Foll and Gaggiotti 2008; Excoffier et al.
2009).
8.2.8.1 Island model (FDIST approach)
Beaumont and Nichols (1996) proposed to obtain the distribution of F_ST across loci as a
function of heterozygosity between populations by performing simulations under a finite
island model, and to specifically identify outlier loci as being those present in the
tails of the generated distribution. They have shown that this simple island model led to
F_ST distributions that were very similar to those expected under alternative models, like
scenarios of recent divergence and growth (colonisation), of isolation by distance (2-D
stepping stone) or of heterogeneous levels of gene flow between populations. Their
approach was implemented in the FDIST computer program, with some modifications.

The approach of Excoffier et al. (2009) implemented in Arlequin is similar to that in
FDIST: coalescent simulations are used to get a null distribution and confidence intervals
around the observed values, and to see if observed locus-specific F_ST values can be
considered as outliers, conditioned on the global observed F_ST value. The approach also
assumes a finite island model where d demes of size N receive on average Nm new immigrant
genes per generation, randomly chosen from all the other demes. Under this model, one
expects the following relationship between the parameters of the island model and F_ST
(Slatkin 1991):
$$ F_{ST} = \frac{1}{1 + 4Nm \frac{d}{d-1}}, $$
allowing one to estimate m from the above equation for a fixed number of simulated demes d
and a fixed deme size. Mutations are then added under a given mutation model on top of the
simulated coalescent tree to create genetic diversity, and to obtain the joint distribution
of F_ST and heterozygosity between populations. In Arlequin, mutation models other than the
finite site model are used: for instance, a specific SMM model is used for microsatellite
data, and a specific SNP model is used for DNA sequences, with the possibility in the
latter case to define a minimum frequency for the derived allele (DAF_min). It is also
possible in Arlequin to compute Rho-statistics for microsatellite data (see Michalakis and
Excoffier 1996) instead of conventional F-statistics, which have been shown to lead to an
unbiased distribution of F_ST. A final difference between the FDIST approach and that
implemented in Arlequin is that the heterozygosity between populations H_1 is inferred from
the average heterozygosity within populations h_0 as
$$ H_1 = \frac{h_0}{1 - F_{ST}} $$
(Excoffier et al. 2009).


Loci with variable heterozygosities are generated by modeling different mutation rates.
For each simulation, we obtain a different mutation rate by drawing a target
heterozygosity H at random from a uniform distribution and using classical relationships
between heterozygosity and the scaled mutation rate θ = 4kdNu:
$$ \theta = (1 - H)^{-1} - 1 $$
under the IAM (Wright 1931), and
$$ \theta = \frac{1}{2} \left[ (1 - H)^{-2} - 1 \right] $$
under the SMM (Ohta and Kimura 1973).
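The relationships used in this section are simple enough to be checked numerically. The
following Python sketch inverts the island-model relationship F_ST = 1/(1 + 4Nm d/(d-1))
to obtain m, computes the heterozygosity between populations as h_0/(1 - F_ST), and
converts a target heterozygosity into a scaled mutation rate using the classical
equilibrium results H = θ/(1 + θ) for the IAM and H = 1 - 1/sqrt(1 + 2θ) for the SMM.
Function names and numerical settings are hypothetical and are not part of Arlequin.

```python
def migration_rate(fst, n_demes, deme_size):
    """Solve F_ST = 1 / (1 + 4 N m d / (d - 1)) for m."""
    return (1.0 / fst - 1.0) * (n_demes - 1) / (4.0 * deme_size * n_demes)


def heterozygosity_between(h0, fst):
    """H_1 = h_0 / (1 - F_ST)."""
    return h0 / (1.0 - fst)


def theta_iam(h):
    """Scaled mutation rate giving heterozygosity h under the IAM."""
    return 1.0 / (1.0 - h) - 1.0


def theta_smm(h):
    """Scaled mutation rate giving heterozygosity h under the SMM."""
    return 0.5 * (1.0 / (1.0 - h) ** 2 - 1.0)


# Hypothetical settings: 100 demes of size 100 and an observed F_ST of 0.1.
d, N, fst = 100, 100, 0.1
m = migration_rate(fst, d, N)
# Plugging m back into the island-model formula should recover F_ST.
fst_back = 1.0 / (1.0 + 4.0 * N * m * d / (d - 1))
```

For a target heterozygosity of 0.5, these functions give θ = 1 under the IAM and θ = 1.5
under the SMM.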
8.2.8.2 Hierarchical island model
The finite island model has recently been shown to lead to a large fraction of false
positives if population samples belong to a hierarchically subdivided population or if
some population samples have a recent shared history, such as after some range expansion
over different continents (Excoffier et al. 2009). Intuitively, this can be understood by
realizing that the precision in estimating F_ST should increase with the number of sampled
populations, and therefore the confidence intervals around a given F_ST value should become
narrower as the number of sampled populations increases. So, if some sampled populations
are not independent units but share a very recent common ancestry with some others,
confidence intervals estimated by assuming that all populations are equally related would
be too narrow, and some loci will be false positives.


Excess of false positives (taken from Excoffier et al. 2009):
The excess of false positives occurring when samples drawn from a hierarchically
structured population are analyzed under a finite island model is illustrated in the above
figure. The diversity of 1,000 SNP loci (open circles) was simulated under a hierarchical
island model with 10 groups of 100 demes. The joint null distribution of F_ST and
heterozygosity (30,000 grey dots) was then obtained under a finite island model, leading to
a large number of outlier loci.

In order to overcome this problem and to reduce the number of false positive loci, a
hierarchical island model of population structure (as defined by Slatkin and Voelm 1991)
was used to model some heterogeneity in population affinities.



This model, shown above, where demes within groups exchange migrants at rate m_1/(d-1) and
demes between groups exchange migrants at rate m_2/(k-1) (where d is the number of demes
within each group, and k is the number of groups), has been studied by Slatkin and Voelm
(1991). They inferred relationships between the model parameters and expected
G-statistics. Similar relationships can be inferred for hierarchical F-statistics
(Excoffier et al. 2009), as shown below:
$$ F_{SC} = \frac{1}{1 + 4Nm_1 \frac{d}{d-1}} $$
$$ F_{CT} = \frac{1}{1 + 4Nd\,m_2 \frac{k}{k-1} + \frac{m_2}{m_1} \frac{k}{k-1} (d-1)} $$
$$ F_{ST} \approx \frac{1}{1 + 4Nd\,m_2 \frac{k}{k-1}}, $$
It follows that the parameters of a hierarchical island model can be specified such as to
have in expectation the observed F-statistics, and therefore that coalescent simulations
can be used to simulate the null distribution of these statistics under the hierarchical
island model.
For detecting outlier loci, we advocate the use of F_ST and not F_CT as a test statistic.
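As an illustration of how migration rates could be specified from observed F-statistics,
the Python sketch below solves two relationships for m_1 and m_2. It assumes the forms
F_SC = 1/(1 + 4Nm_1 d/(d-1)) and
F_CT = 1/(1 + 4Nd m_2 k/(k-1) + (m_2/m_1)(k/(k-1))(d-1)), and it uses the standard
identity (1 - F_ST) = (1 - F_SC)(1 - F_CT) to obtain F_ST. The function names and the
numerical settings are hypothetical, and this is not a description of Arlequin's code.

```python
def hierarchical_rates(fsc, fct, d, k, n):
    """Solve the assumed F_SC and F_CT relationships for (m1, m2)."""
    # F_SC = 1 / (1 + 4 N m1 d / (d - 1))
    m1 = (1.0 / fsc - 1.0) * (d - 1) / (4.0 * n * d)
    # F_CT = 1 / (1 + m2 * [4 N d k/(k-1) + k (d-1) / ((k-1) m1)])
    per_m2 = 4.0 * n * d * k / (k - 1) + k * (d - 1) / ((k - 1) * m1)
    m2 = (1.0 / fct - 1.0) / per_m2
    return m1, m2


def expected_f_statistics(m1, m2, d, k, n):
    """Return (F_SC, F_CT) expected for the given migration rates."""
    a = 4.0 * n * m1 * d / (d - 1)
    b = 4.0 * n * d * m2 * k / (k - 1)
    c = (m2 / m1) * (k / (k - 1)) * (d - 1)
    return 1.0 / (1.0 + a), 1.0 / (1.0 + b + c)


def fst_from(fsc, fct):
    """F_ST implied by F_SC and F_CT: (1 - F_ST) = (1 - F_SC)(1 - F_CT)."""
    return 1.0 - (1.0 - fsc) * (1.0 - fct)


# Hypothetical settings: 10 groups of 100 demes, demes of size 100.
d, k, n = 100, 10, 100
m1, m2 = hierarchical_rates(0.05, 0.2, d, k, n)
fsc_back, fct_back = expected_f_statistics(m1, m2, d, k, n)
fst = fst_from(0.05, 0.2)
```

With F_SC = 0.05 and F_CT = 0.2, the identity gives an F_ST of 0.24, and the migration
rate between groups comes out much smaller than the rate within groups.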



Example of F_ST distributions obtained for microsatellite loci (taken from Excoffier et al.
2009). The diversity of 1,000 STR loci (open circles) was simulated under a hierarchical
island model with 10 groups of 100 demes. The migration rates within and between groups
were adjusted such as to have F_SC = 0.05 and F_CT = 0.2, implying an F_ST of 0.240. The
joint null distribution of F_ST and heterozygosity (20,000 grey dots) was then obtained
under a hierarchical island model based on F-statistics computed assuming a stepwise
mutation model (Rho_ST). Note that the x-axis representing the heterozygosity between
populations can be larger than 1 since it is computed as the heterozygosity within
population divided by (1 - F_ST).

Example of F_ST distributions obtained for SNP data (taken from Excoffier et al. 2009). The
diversity of 1,000 SNP loci (open circles) was simulated under a hierarchical island model
with 10 groups of 100 demes. The joint null distribution of F_ST and heterozygosity (30,000
grey dots) was then obtained under a hierarchical island model.




9 REFERENCES
Abramowitz, M., and I. A. Stegun, 1970 Handbook of Mathematical Functions. Dover, New York.
Aris-Brosou, S., and L. Excoffier, 1996 The impact of population expansion and mutation
    rate heterogeneity on DNA sequence polymorphism. Mol. Biol. Evol. 13: 494-504.
Beaumont MA, Nichols RA (1996) Evaluating loci for use in the genetic analysis of
    population structure. Proceedings of the Royal Society London B 263, 1619-1626.
Beaumont MA, Balding DJ (2004) Identifying adaptive genetic divergence among populations
    from genome scans. Mol Ecol 13, 969-980.
Cavalli-Sforza LL (1966) Population structure and human evolution. Proc R Soc Lond B Biol
    Sci 164, 362-379.
Cavalli-Sforza, L. L., and W. F. Bodmer, 1971 The Genetics of Human Populations. W. H.
    Freeman and Co., San Francisco, CA.
Chakraborty, R. 1990 Mitochondrial DNA polymorphism reveals hidden heterogeneity within
    some Asian populations. Am. J. Hum. Genet. 47: 87-94.
Chakraborty, R., and K. M. Weiss, 1991 Genetic variation of the mitochondrial DNA genome in
    American Indians is at mutation-drift equilibrium. Am. J. Hum. Genet. 86: 497-506.
Cockerham, C. C., 1969 Variance of gene frequencies. Evolution 23: 72-83.
Cockerham, C. C., 1973 Analysis of gene frequencies. Genetics 74: 679-700.
Davies N, Villablanca FX and Roderick GK, 1999. Determining the source of individuals:
    multilocus genotyping in nonequilibrium population genetics. TREE 14: 17-21.
Dempster, A., N. Laird and D. Rubin, 1977 Maximum likelihood estimation from incomplete
    data via the EM algorithm. J Roy Statist Soc 39: 1-38.
Efron, B. 1982 The Jackknife, the Bootstrap and other Resampling Plans. Regional Conference
    Series in Applied Mathematics, Philadelphia.
Efron, B., and R. J. Tibshirani. 1993. An Introduction to the Bootstrap. Chapman and Hall,
    London.
Ewens, W.J. 1972 The sampling theory of selectively neutral alleles. Theor. Popul. Biol.
    3: 87-112.
Ewens, W.J. 1977. Population genetics theory in relation to the neutralist-selectionist
    controversy. In: Advances in human genetics, edited by Harris, H. and Hirschhorn, K.
    New York: Plenum Press, p. 67-134.


Excoffier L. 2003. Analysis of Population Subdivision. In: Balding D, Bishop M, Cannings C,
    editors. Handbook of Statistical Genetics, 2nd Edition. New York: John Wiley & Sons,
    Ltd. pp. 713-750.
Excoffier L. 2004. Patterns of DNA sequence diversity and genetic structure after a range
    expansion: lessons from the infinite-island model. Mol Ecol 13(4): 853-864.
Excoffier, L., Smouse, P., and Quattro, J. 1992 Analysis of molecular variance inferred
    from metric distances among DNA haplotypes: Application to human mitochondrial DNA
    restriction data. Genetics 131: 479-491.
Excoffier, L., and P. Smouse, 1994. Using allele frequencies and geographic subdivision to
    reconstruct gene genealogies within a species. Molecular variance parsimony. Genetics
    136, 343-59.
Excoffier, L. and M. Slatkin. 1995 Maximum-likelihood estimation of molecular haplotype
    frequencies in a diploid population. Mol. Biol. Evol. 12: 921-927.
Excoffier, L., and M. Slatkin, 1998 Incorporating genotypes of relatives into a test of
    linkage disequilibrium. Am. J. Hum. Genet. 171-180.
Excoffier L, Laval G, Balding D. 2003. Gametic phase estimation over large genomic regions
    using an adaptive window approach. Human Genomics 1: 7-19.
Excoffier L, Estoup A, Cornuet J-M (2005) Bayesian Analysis of an Admixture Model With
    Mutations and Arbitrarily Linked Markers. Genetics 169: 1727-1738.
Excoffier L, Hofer T, Foll M (2009) Detecting loci under selection in a hierarchically
    structured population. Heredity.
Foll M, Gaggiotti O (2008) A genome-scan method to identify selected loci appropriate for
    both dominant and codominant markers: a Bayesian perspective. Genetics 180, 977-993.
Fu, Y.-X. (1997) Statistical tests of neutrality of mutations against population growth,
    hitchhiking and background selection. Genetics 147: 915-925.
Gaggiotti, O., and L. Excoffier, 2000. A simple method of removing the effect of a
    bottleneck and unequal population sizes on pairwise genetic distances. Proceedings of
    the Royal Society London B 267: 81-87.
Garza JC, Williamson EG (2001) Detection of reduction in population size using data from
    microsatellite loci. Mol Ecol 10: 305-318.
Goldstein DB, Ruiz Linares A, Cavalli-Sforza LL, Feldman MW (1995) Genetic absolute dating
    based on microsatellites and the origin of modern humans. Proc Natl Acad Sci U S A 92,
    6723-6727.


Goudet, J., M. Raymond, T. de Meeüs and F. Rousset, 1996 Testing differentiation in diploid
    populations. Genetics 144: 1933-1940.
Guo, S. and Thompson, E. 1992 Performing the exact test of Hardy-Weinberg proportion for
    multiple alleles. Biometrics 48: 361-372.
Harpending, R. C., 1994 Signature of ancient population growth in a low-resolution
    mitochondrial DNA mismatch distribution. Hum. Biol. 66: 591-600.
Hudson, R. R., 1990 Gene genealogies and the coalescent process, pp. 1-44 in Oxford Surveys
    in Evolutionary Biology, edited by Futuyama, and J. D. Antonovics. Oxford University
    Press, New York.
Jin, L., and Nei M. (1990) Limitations of the evolutionary parsimony method of phylogenetic
    analysis. Mol. Biol. Evol. 7: 82-102.
Jukes, T. and Cantor, C. 1969 Evolution of protein molecules. In: Mammalian Protein
    Metabolism, edited by Munro HN, New York: Academic Press, p. 21-132.
Lewontin RC, Krakauer J (1973) Distribution of gene frequency as a test of the theory of
    the selective neutrality of polymorphisms. Genetics 74, 175-195.
Kimura, M. 1980 A simple method for estimating evolutionary rate of base substitution
    through comparative studies of nucleotide sequences. J. Mol. Evol. 16: 111-120.
Kruskal, J. B., 1956. On the shortest spanning subtree of a graph and the travelling
    salesman problem. Proc. Amer. Math. Soc. 7: 48-50.
Kumar, S., Tamura, K., and M. Nei. 1993 MEGA, Molecular Evolutionary Genetic Analysis ver
    1.0. The Pennsylvania State University, University Park, PA 16802.
Lange, K., 1997 Mathematical and Statistical Methods for Genetic Analysis. Springer, New
    York.
Levene H. (1949). On a matching problem arising in genetics. Annals of Mathematical
    Statistics 20, 91-94.
Lewontin, R. C. (1964) The interaction of selection and linkage. I. General considerations;
    heterotic models. Genetics 49: 49-67.
Lewontin, R. C., and K. Kojima. (1960) The evolutionary dynamics of complex polymorphisms.
    Evolution 14: 450-472.
Li, W.H. (1977) Distribution of nucleotide differences between two randomly chosen cistrons
    in a finite population. Genetics 85: 331-337.
Long, J. C., 1986 The allelic correlation structure of Gainj and Kalam speaking people. I.
    The estimation and interpretation of Wright's F-statistics. Genetics 112: 629-647.


Mantel, N. 1967. The detection of disease clustering and a generalized regression approach.
    Cancer Res 27: 209-220.
Michalakis, Y. and Excoffier, L., 1996 A generic estimation of population subdivision using
    distances between alleles with special reference to microsatellite loci. Genetics 142:
    1061-1064.
Nei, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York, NY,
    USA.
Nei, M., and W. H. Li. 1979. Mathematical model for studying genetic variation in terms of
    restriction endonucleases. Proc. Natl. Acad. Sci. USA 76: 5269-5273.
Paetkau D, Calvert W, Stirling I and Strobeck C, 1995. Microsatellite analysis of
    population structure in Canadian polar bears. Mol Ecol 4: 347-54.
Ohta T, Kimura M (1973) A model of mutation appropriate to estimate the number of
    electrophoretically detectable alleles in a finite population. Genet Res 22: 201-204.
Paetkau D, Waits LP, Clarkson PL, Craighead L and Strobeck C, 1997. An empirical evaluation
    of genetic distance statistics using microsatellite data from bear (Ursidae)
    populations. Genetics 147: 1943-1957.
Prim, R. C., 1957. Shortest connection networks and some generalizations. Bell Syst. Tech.
    J. 36: 1389-1401.
Press, W. H., S. A. Teukolsky, W. T. Vetterling and B. P. Flannery, 1992. Numerical Recipes
    in C: The Art of Scientific Computing. Cambridge: Cambridge University Press.
Rannala B, and Mountain JL, 1997. Detecting immigration by using multilocus genotypes.
    Proc. Natl. Acad. Sci. USA 94: 9197-9201.
Ray N, Currat M, Excoffier L. 2003. Intra-Deme Molecular Diversity in Spatially Expanding
    Populations. Mol Biol Evol 20(1): 76-86.
Raymond M. and F. Rousset. 1994 GenePop. ver 3.0. Institut des Sciences de l'Evolution.
    Université de Montpellier, France.
Raymond M. and F. Rousset. 1995 An exact test for population differentiation. Evolution
    49: 1280-1283.
Reynolds, J., Weir, B.S., and Cockerham, C.C. 1983 Estimation for the coancestry
    coefficient: basis for a short-term genetic distance. Genetics 105: 767-779.
Rice, J.A. 1995 Mathematical Statistics and Data Analysis. 2nd ed. Duxbury Press: Belmont,
    CA.
Rogers, A., 1995 Genetic evidence for a Pleistocene population explosion. Evolution 49:
    608-615.


Rogers, A. R., and H. Harpending, 1992 Population growth makes waves in the distribution of
    pairwise genetic differences. Mol. Biol. Evol. 9: 552-569.
Rohlf, F. J., 1973. Algorithm 76. Hierarchical clustering using the minimum spanning tree.
    The Computer Journal 16: 93-95.
Rousset, F., 1996 Equilibrium values of measures of population subdivision for stepwise
    mutation processes. Genetics 142: 1357-1362.
Rousset, F., 2000. Inferences from spatial population genetics, in Handbook of Statistical
    Genetics, D. Balding, M. Bishop and C. Cannings (eds.) Wiley & Sons, Ltd.
Schlotterer C (2002) A microsatellite-based multilocus screen for the identification of
    local selective sweeps. Genetics 160, 753-763.
Schneider, S., and L. Excoffier. 1999. Estimation of demographic parameters from the
    distribution of pairwise differences when the mutation rates vary among sites:
    Application to human mitochondrial DNA. Genetics 152: 1079-1089.
Slatkin, M., 1991 Inbreeding coefficients and coalescence times. Genet. Res. Camb. 58:
    167-175.
Slatkin M, Voelm L (1991) FST in a hierarchical island model. Genetics 127, 627-629.
Slatkin, M. 1994a Linkage disequilibrium in growing and stable populations. Genetics 137:
    331-336.
Slatkin, M. 1994b An exact test for neutrality based on the Ewens sampling distribution.
    Genet. Res. 64(1): 71-74.
Slatkin, M. 1995 A measure of population subdivision based on microsatellite allele
    frequencies. Genetics 139: 457-462.
Slatkin, M. 1996 A correction to the exact test based on the Ewens sampling distribution.
    Genet. Res. 68: 259-260.
Slatkin, M. and Excoffier, L. 1996 Testing for linkage disequilibrium in genotypic data
    using the EM algorithm. Heredity 76: 377-383.
Smouse, P. E., and J. C. Long. 1992. Matrix correlation analysis in Anthropology and
    Genetics. Y. Phys. Anthrop. 35: 187-213.
Smouse, P. E., J. C. Long and R. R. Sokal. 1986. Multiple regression and correlation
    extensions of the Mantel Test of matrix correspondence. Systematic Zoology 35: 627-632.
Sokal, R. R., and F. J. Rohlf. 1981. Biometry. 2nd edition. W. H. Freeman and Co., San
    Francisco, CA.


Stewart, F. M. 1977 Computer algorithm for obtaining a random set of allele frequencies for
    a locus in an equilibrium population. Genetics 86: 482-483.
Strobeck, K. 1987 Average number of nucleotide differences in a sample from a single
    subpopulation: A test for population subdivision. Genetics 117: 149-153.
Tajima, F. 1983 Evolutionary relationship of DNA sequences in finite populations. Genetics
    105: 437-460.
Tajima, F. 1989a. Statistical method for testing the neutral mutation hypothesis by DNA
    polymorphism. Genetics 123: 585-595.
Tajima, F. 1989b. The effect of change in population size on DNA polymorphism. Genetics
    123: 597-601.
Tajima, F. 1993. Measurement of DNA polymorphism. In: Mechanisms of Molecular Evolution.
    Introduction to Molecular Paleopopulation Biology, edited by Takahata, N. and Clark,
    A.G., Tokyo, Sunderland, MA: Japan Scientific Societies Press, Sinauer Associates,
    Inc., p. 37-59.
Tajima, F. and Nei, M. 1984. Estimation of evolutionary distance between nucleotide
    sequences. Mol. Biol. Evol. 1: 269-285.
Tajima, F., 1996 The amount of DNA polymorphism maintained in a finite population when the
    neutral mutation rate varies among sites. Genetics 143: 1457-1465.
Tamura, K., 1992 Estimation of the number of nucleotide substitutions when there are
    strong transition-transversion and G+C content biases. Mol. Biol. Evol. 9: 678-687.
Tamura, K., and M. Nei, 1993 Estimation of the number of nucleotide substitutions in the
    control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10:
    512-526.
Uzell, T., and K. W. Corbin, 1971 Fitting discrete probability distribution to evolutionary
    events. Science 172: 1089-1096.
Waser PM, and Strobeck C, 1998. Genetic signatures of interpopulation dispersal. TREE
    43-44.
Watterson, G., 1975 On the number of segregating sites in genetical models without
    recombination. Theor. Popul. Biol. 7: 256-276.
Watterson, G. 1978. The homozygosity test of neutrality. Genetics 88: 405-417.
Watterson, G. A., 1986 The homozygosity test after a change in population size. Genetics
    112: 899-907.


Weir, B. S., 1996 Genetic Data Analysis II: Methods for Discrete Population Genetic Data.
    Sinauer Assoc., Inc., Sunderland, MA, USA.
Weir, B.S. and Cockerham, C.C. 1984 Estimating F-statistics for the analysis of population
    structure. Evolution 38: 1358-1370.
Weir, B.S., and Hill, W.G. 2002. Estimating F-statistics. Annu Rev Genet 36, 721-750.
Wright S (1931) Evolution in Mendelian populations. Genetics 16, 97-159.
Wright, S., 1951 The genetical structure of populations. Ann. Eugen. 15: 323-354.
Wright, S., 1965 The interpretation of population structure by F-statistics with special
    regard to systems of mating. Evol 19: 395-420.
Zouros, E., 1979 Mutation rates, population sizes and amounts of electrophoretic variation
    of enzyme loci in natural populations. Genetics 92: 623-646.



10 APPENDIX
10.1 Overview of input file keywords
[Profile]

Title
    Description: A title describing the present analysis
    Possible values: A string of alphanumeric characters within double quotes

NbSamples
    Description: The number of different samples listed in the data file
    Possible values: A positive integer larger than zero

DataType
    Description: The type of data to be analyzed (only one type of data per project file is allowed)
    Possible values: STANDARD, DNA, RFLP, MICROSAT, FREQUENCY

GenotypicData
    Description: Specifies if genotypic or gametic data is available
    Possible values: 0 (haplotypic data), 1 (genotypic data)

LocusSeparator
    Description: The character used to separate adjacent loci
    Possible values: WHITESPACE, TAB, NONE, or any character other than "#" or the character specifying missing data
    Default: WHITESPACE

GameticPhase
    Description: Specifies if the gametic phase is known (for genotypic data only)
    Possible values: 0 (gametic phase not known), 1 (known gametic phase)
    Default: 1

RecessiveData
    Description: Specifies whether recessive alleles are present at all loci (for genotypic data)
    Possible values: 0 (co-dominant data), 1 (recessive data)
    Default: 0

RecessiveAllele
    Description: Specifies the code for the recessive allele
    Possible values: Any string within quotation marks. This string can be explicitly used in the input file to indicate the occurrence of a recessive homozygote at one or several loci.
    Default: "null"

MissingData
    Description: A character used to specify the code for missing data
    Possible values: "?" or any character within quotes, other than those previously used
    Default: "?"

Frequency
    Description: Specifies the format of haplotype frequencies
    Possible values: ABS (absolute values), REL (relative values: absolute values will be found by multiplying the relative frequencies by the sample sizes)
    Default: ABS
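
The constraints listed for the [Profile] keywords can be checked mechanically before a project file is written. The following Python sketch is only an illustration: `render_profile` is a hypothetical helper (not part of Arlequin), it covers a subset of the keywords, and it assumes the `Keyword=value` line layout used for project files elsewhere in this manual, with double quotes around string values.

```python
# Hypothetical helper, not part of Arlequin: validates a few [Profile]
# values against the keyword table and renders them as Keyword=value lines.
ALLOWED_DATA_TYPES = {"STANDARD", "DNA", "RFLP", "MICROSAT", "FREQUENCY"}


def render_profile(title, nb_samples, data_type, genotypic_data,
                   locus_separator="WHITESPACE", missing_data="?"):
    if nb_samples <= 0:
        raise ValueError("NbSamples must be a positive integer")
    if data_type not in ALLOWED_DATA_TYPES:
        raise ValueError(f"unknown DataType: {data_type}")
    if genotypic_data not in (0, 1):
        raise ValueError("GenotypicData must be 0 or 1")
    # The separator may not be '#' nor the missing-data character.
    if locus_separator in ("#", missing_data):
        raise ValueError("LocusSeparator clashes with '#' or MissingData")
    lines = [
        "[Profile]",
        f'Title="{title}"',
        f"NbSamples={nb_samples}",
        f"DataType={data_type}",
        f"GenotypicData={genotypic_data}",
        f"LocusSeparator={locus_separator}",
        f'MissingData="{missing_data}"',  # quoting style is an assumption
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_profile("Example project", 2, "DNA", 0,
                         locus_separator="NONE"))
```

Keywords that are left out (GameticPhase, RecessiveData, RecessiveAllele, Frequency) would simply fall back to the defaults given in the table.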



[Data]
[[HaplotypeDefinition]] (optional section)

HaplListName
    Description: The name of a haplotype definition list
    Possible values: A string within quotation marks

HaplList
    Description: The list of haplotypes listed within braces ({...})
    Possible values: A series of haplotype definitions given on separate lines for each haplotype. Each haplotype is defined by a haplotype label and a combination of alleles at different loci. The keyword EXTERN followed by a string within quotation marks may be used to specify that a given haplotype list is in a different file

[Data]
[[DistanceMatrix]] (optional section)

MatrixName
    Description: The name of the distance matrix
    Possible values: A string within quotation marks

MatrixSize
    Description: The size of the matrix
    Possible values: A positive integer larger than zero (corresponding to the number of haplotypes listed in the haplotype list)

LabelPosition
    Description: Specifies whether haplotype labels are entered by row or by column
    Possible values: ROW (the haplotype labels will be entered consecutively on one or several lines, within the MatrixData segment, before the distance matrix elements), COLUMN (the haplotype labels will be entered as the first column of each row of the distance matrix itself)

MatrixData
    Description: The matrix data itself listed within braces ({...})
    Possible values: The matrix data will be entered as a format-free lower-diagonal matrix. The haplotype labels can be either entered consecutively on one or several lines (if LabelPosition=ROW), or entered at the first column of each row (if LabelPosition=COLUMN). The special keyword EXTERN may be used followed by a file name within quotation marks, stating that the data must be read in another file
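
To make the lower-diagonal layout with LabelPosition=COLUMN concrete, here is a minimal Python sketch that writes such a block from a square distance table. `render_distance_matrix` is a hypothetical helper, the `Keyword=value` layout is assumed from the rest of this manual, and writing the diagonal zeros as part of the lower-diagonal matrix is an assumption of this sketch.

```python
# Hypothetical helper, not part of Arlequin: renders a [[DistanceMatrix]]
# block with the haplotype label as first column of each matrix row.
def render_distance_matrix(name, labels, dist):
    """dist[i][j] is read only for j <= i (lower triangle plus diagonal)."""
    n = len(labels)
    if len(dist) != n or any(len(row) < i + 1 for i, row in enumerate(dist)):
        raise ValueError("dist must provide at least the lower triangle")
    lines = [
        "[[DistanceMatrix]]",
        f'MatrixName="{name}"',
        f"MatrixSize={n}",
        "LabelPosition=COLUMN",
        "MatrixData={",
    ]
    for i, label in enumerate(labels):
        # Row i holds the distances to haplotypes 0..i (assumed layout).
        row = " ".join(f"{dist[i][j]:.2f}" for j in range(i + 1))
        lines.append(f"{label} {row}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    distances = [
        [0, 2, 1],
        [2, 0, 2],
        [1, 2, 0],
    ]
    print(render_distance_matrix("example", ["1", "2", "3"], distances))
```

Because the input is described as format-free, the fixed two-decimal formatting is only a readability choice.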

[Data]
[[Samples]]

SampleName
    Description: The name of the sample. This keyword is used to mark the beginning of a sample definition
    Possible values: A string within quotation marks

SampleSize
    Description: Specifies the sample size
    Possible values: An integer larger than zero. For haplotypic data, it must specify the number of gene copies in the sample. For genotypic data, it must specify the number of individuals in the sample.

SampleData
    Description: The sample data listed within braces ({...})
    Possible values: The keyword EXTERN may be used followed by a file name within quotation marks, stating that the data must be read in a separate file. The SampleData keyword ends a sample definition




[Data]
[[Structure]] (optional section)

StructureName
    Description: The name of a given genetic structure to test
    Possible values: A string of characters within quotation marks

NbGroups
    Description: The number of groups of populations
    Possible values: An integer larger than zero

Group
    Description: The definition of a group of samples, identified by their SampleName listed within braces ({...})
    Possible values: A series of strings within quotation marks all enclosed within braces, and, if desired, on separate lines
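
As an illustration of how NbGroups and the Group definitions relate to the sample names, the Python sketch below renders a [[Structure]] block from a list of groups. `render_structure` is a hypothetical helper, the `Keyword=value` layout is assumed from the rest of this manual, and the checks (every name must be a known SampleName, no sample in two groups) are assumptions of this sketch rather than documented Arlequin rules.

```python
# Hypothetical helper, not part of Arlequin: renders a [[Structure]] block
# in which each group lists quoted sample names within braces.
def render_structure(name, groups, sample_names):
    seen = set()
    for group in groups:
        for sample in group:
            if sample not in sample_names:
                raise ValueError(f"unknown SampleName: {sample}")
            if sample in seen:
                raise ValueError(f"sample listed twice: {sample}")
            seen.add(sample)
    lines = [
        "[[Structure]]",
        f'StructureName="{name}"',
        f"NbGroups={len(groups)}",  # must match the number of Group blocks
    ]
    for group in groups:
        lines.append("Group={")
        lines.extend(f'"{sample}"' for sample in group)
        lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_structure(
        "Two groups",
        [["Pop A", "Pop B"], ["Pop C"]],
        sample_names={"Pop A", "Pop B", "Pop C"},
    ))
```

Deriving NbGroups from the group list, rather than typing it by hand, keeps the count and the Group definitions consistent.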

[Data]
[[Mantel]] (optional section)
Allows computing the (partial) correlation between YMatrix and X1 (X2).

MatrixSize
    Description: The size of the matrix entered into the project
    Possible values: An integer larger than zero

YMatrix
    Description: Specifies which matrix is used as YMatrix.
    Possible values: "fst", "log_fst", "slatkinlinearfst", "log_slatkinlinearfst", "nm", "custom"

MatrixNumber
    Description: Number of matrices to be compared with the YMatrix.
    Possible values: 1 (we compute the correlation between YMatrix and X1), 2 (we compute the partial correlation between YMatrix, X1 and X2)

YMatrixLabels
    Description: Labels to identify the entries of the YMatrix. In case of YMatrix=fst, these labels should correspond to population names in the sample.
    Possible values: A series of strings within quotation marks all enclosed within braces, and, if desired, on separate lines

DistMatMantel
    Description: A keyword used to define a matrix, which can be either the YMatrix, or another matrix that will be compared with the YMatrix.
    Possible values: The matrix data will be entered as a format-free lower-diagonal matrix.

UsedYMatrixLabels
    Description: Labels defining the sub-matrix of the YMatrix on which the correlation is computed.
    Possible values: A series of strings within quotation marks all enclosed within braces, and, if desired, on separate lines
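
The two settings of MatrixNumber correspond to a simple and a partial matrix correlation. The Python sketch below shows the standard Pearson correlation over the off-diagonal elements of lower-diagonal matrices and the usual first-order partial correlation formula. It is not Arlequin's implementation: all function names are hypothetical, the matrices are assumed to be given as lower-diagonal rows including the diagonal, and the permutation procedure used to assess significance is not shown.

```python
import math


def lower_triangle(matrix):
    """Flatten the off-diagonal lower triangle of a matrix, row by row.

    Works for full square lists and for ragged lower-diagonal rows.
    """
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(y, x1, x2):
    """Correlation between y and x1 after removing the effect of x2."""
    r_y1 = pearson(y, x1)
    r_y2 = pearson(y, x2)
    r_12 = pearson(x1, x2)
    return (r_y1 - r_y2 * r_12) / math.sqrt((1 - r_y2 ** 2) * (1 - r_12 ** 2))


if __name__ == "__main__":
    # Made-up 4 x 4 lower-diagonal matrices, for illustration only.
    y_matrix = [[0], [1, 0], [2, 3, 0], [4, 5, 6, 0]]
    x1_matrix = [[0], [2, 0], [1, 4, 0], [3, 6, 5, 0]]
    x2_matrix = [[0], [1, 0], [0, 0, 0], [1, 0, 1, 0]]
    y, x1, x2 = (lower_triangle(m) for m in (y_matrix, x1_matrix, x2_matrix))
    print("MatrixNumber=1:", pearson(y, x1))
    print("MatrixNumber=2:", partial_correlation(y, x1, x2))
```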
