Carolyn R. Hill, Michael D. Coble,* and John M. Butler
Email: becky.hill@nist.gov Phone: 301-975-4275
National Institute of Standards and Technology, Biochemical Science Division, Gaithersburg, MD 20899-8311
*Current address: Armed Forces DNA Identification Laboratory, Research Section, Rockville, MD 20850
A copy of this poster is available at the Promega2006_Hill.pdf URL listed below the references.

A total of 26 novel mini-short tandem repeat (miniSTR) loci have been developed and characterized to aid in the analysis of degraded DNA samples. These new markers produce short PCR products in the target range of 50 to 150 base pairs (bp) by moving the primer sequences as close as possible to, if not directly next to, the identified repeat region [1]. More than 900 candidate loci were initially screened to determine optimal miniSTR markers based on the following criteria: small amplicon sizes (<125 bp), narrow allele spreads (<24 bp), observed heterozygosities (>0.70), and locations on chromosomes unoccupied by the 13 CODIS STR loci, or at least 50 Mb away from them on the same chromosome [2]. The miniSTR loci selected included D1GATA113E02, D1S1627, D1S1677, D2S441, D2S1776, D3S3053, D3S4529, D4S2364, D4S2408, D5S2500, D6S474, D6S1017, D8S1115, D9S1122, D9S2157, D10S1248, D10S1435, D11S4463, D12ATA63A05, D14S1434, D17S974, D17S1301, D18S853, D20S482, D20S1082, and D22S1045. All of these markers were sequenced and evaluated across more than 600 samples, and their population statistics were determined [3]. The heterozygosities of the new loci were compared to those of the 13 CODIS loci and all were found to be comparable. Only seven of the new loci had lower heterozygosity values than the CODIS loci; however, all of these were much smaller in size [3]. These data suggest that the additional 26 miniSTR loci will serve as useful complements to the CODIS loci to aid in the forensic analysis of degraded DNA. In addition, these new loci will be valuable in a variety of scenarios, particularly for paternity cases, missing persons work, or mass fatality DNA identification testing involving kinship samples [2].
In fact, three of these new markers (D10S1248, D2S441, and D22S1045) from the initial six miniSTR loci previously described [2] have recently been recommended for adoption by the European DNA community as new core loci for forensic testing [4,5].
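The screening criteria listed in the abstract can be sketched as a simple filter. This is a minimal illustration, not the authors' actual screening pipeline: the `Candidate` class, the `passes_screen` function, and the three candidate records are hypothetical. Only the four thresholds and the chromosome 5 CODIS positions (CSF1PO and D5S818, as listed in the CODIS table on this poster) come from the poster itself.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract.
MAX_AMPLICON_BP = 125       # amplicon size must be below this
MAX_ALLELE_SPREAD_BP = 24   # allele spread must be below this
MIN_HETEROZYGOSITY = 0.70   # observed heterozygosity must exceed this
MIN_DISTANCE_MB = 50.0      # minimum distance from a CODIS locus on the same chromosome


@dataclass
class Candidate:
    """Hypothetical record for one candidate locus."""
    name: str
    chromosome: str
    position_mb: float
    amplicon_bp: int
    allele_spread_bp: int
    heterozygosity: float


def passes_screen(candidate, codis_positions_mb):
    """Return True if the candidate meets all four screening criteria.

    codis_positions_mb maps a chromosome name to the positions (Mb) of the
    CODIS loci on that chromosome; chromosomes without CODIS loci are absent.
    """
    if candidate.amplicon_bp >= MAX_AMPLICON_BP:
        return False
    if candidate.allele_spread_bp >= MAX_ALLELE_SPREAD_BP:
        return False
    if candidate.heterozygosity <= MIN_HETEROZYGOSITY:
        return False
    neighbours = codis_positions_mb.get(candidate.chromosome, [])
    return all(abs(candidate.position_mb - p) >= MIN_DISTANCE_MB for p in neighbours)


# CODIS loci on chromosome 5 (CSF1PO and D5S818), positions in Mb.
codis = {"5": [149.436, 123.139]}

# Hypothetical candidates, for illustration only.
far_enough = Candidate("hypothetical_A", "5", 58.7, 100, 20, 0.75)
too_close = Candidate("hypothetical_B", "5", 130.0, 100, 20, 0.75)
low_het = Candidate("hypothetical_C", "9", 76.9, 100, 20, 0.65)

print(passes_screen(far_enough, codis))  # True
print(passes_screen(too_close, codis))   # False
print(passes_screen(low_het, codis))     # False
```

The chromosome check is written so that a candidate on a chromosome with no CODIS locus passes automatically, which matches the "unoccupied chromosome, or at least 50 Mb away" wording of the criterion.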
[1] Butler, J.M., Shen, Y., McCord, B.R. (2003) The development of reduced size STR amplicons as tools for analysis of degraded DNA. J. Forensic Sci. 48(5): 1054-1064.
[2] Coble, M.D., Butler, J.M. (2005) Characterization of new miniSTR loci to aid analysis of degraded DNA. J. Forensic Sci. 50(1): 43-53.
[3] Hill, C.R., Coble, M.D., Butler, J.M. (2006) Development of additional new miniSTR loci for improved analysis of degraded DNA samples. Submitted.
[4] Gill, P., Fereday, L., Morling, N., Schneider, P.M. (2006) The evolution of DNA databases--recommendations for new European loci. Forensic Sci. Int. 156: 242-244.
[5] Gill, P., Fereday, L., Morling, N., Schneider, P.M. (2006) Letter to the Editor: New multiplexes for Europe--Amendments and clarification of strategic development. Forensic Sci. Int., in press.
http://www.cstl.nist.gov/biotech/strbase/pub_pres/Promega2006_Hill.pdf
The repeat unit nomenclature for these STR loci is defined by using the top strand in the GenBank accession reference sequence, selecting the first 5' full tandem repeat, and allowing a single nucleotide change in the repeat structure for a compound repeat (e.g., D22S1045 ATT and ACT). We have altered the nomenclature for D10S1248 and D22S1045 from that previously published in Coble and Butler (2005).

Issue of Potential Disease Linkage with New and Currently Used STR Loci

PubMed searches have noted potential disease linkages with the following STR loci:
  TH01: nicotine dependence, schizophrenic and bipolar disorders, hypertension, malaria predisposition
  D5S818: malaria predisposition
  D8S1179: neural tube defects (Meckel-Gruber syndrome), urinary microalbuminuria, type-2 diabetes
  D21S11: trisomy-21 (Down syndrome)
  D18S51: trisomy-18 (Edwards syndrome)
  D10S1248: variation in sleep respiration rate, urogenital development
  D2S441 (also known as D2S1778): hereditary gingival fibromatosis, congenital glaucoma
  D22S1045: May-Hegglin anomaly
  D12ATA63: none known
  D9S2157: none known

Please keep in mind, though, as noted by Butler [JFS 2006;51(2):253-265], that many of the core STR loci in current use have a common origin with loci widely used for human disease gene linkage analysis studies. Even though medical genetic researchers claim to have shown linkage between a particular disease gene and a core STR marker, these types of findings are often tentative and should not prevent the continued use of the STR locus in question. In fact, Colin Kimpton and coworkers from the European DNA Profiling Group recognized early on in the application of STRs for human identity testing that it is likely that many or possibly most STRs will eventually be shown to be useful in following a genetic disease or other genetic trait within a family, and therefore this possibility must be recognized at the outset of the use of such systems [FSI 1995;71:137-152].

The Marshfield genome-scan ~400 STR marker set used in searches for disease-causing genes includes TPOX, D7S820, D8S1179, D13S317, D16S539, and D19S433. The majority of the 26 new miniSTR loci characterized here are part of the same Marshfield genome-scan STR marker set.
D10S1248
GenBank accession AL391869; positions 136,773..136,874
Allele  Frequencies
8       -0.004 -------
9       -0.004 -0.004 0.002 0.003 ---
10      -0.004 -0.004 0.002 0.003 ---
11      -0.037 --0.002 --0.008
12      0.036 0.121 0.061 0.123 0.061 0.078 0.052 0.020
13      0.330 0.241 0.264 0.296 0.265 0.360 0.310 0.191
14      0.289 0.278 0.318 0.243 0.318 0.224 0.236 0.253
15      0.192 0.204 0.236 0.218 0.201 0.232 0.247 0.272
16      0.125 0.080 0.104 0.092 0.123 0.084 0.110 0.197
17      0.025 0.025 0.014 0.018 0.023 0.016 0.041 0.056
18      0.002 0.002 -0.004 0.004 -0.003 0.003
19      0.002 -0.004 ------
D2S441
GenBank accession AC079112; positions 85,324..85,415
Population: U.S. Caucasian, African American, U.S. Hispanic, Japanese, Spanish, Singapore/Chinese, Singapore/Malay, Singapore/Indian
# Tested
Allele  Frequencies
8       ----0.002 --0.003
9       0.002 --------
10      0.202 0.084 0.332 0.229 0.191 0.227 0.231 0.343
11      0.350 0.370 0.318 0.363 0.301 0.376 0.272 0.382
11.3    0.061 0.053 0.036 0.018 0.102 0.062 0.190 0.062
12      0.053 0.171 0.025 0.222 0.032 0.203 0.110 0.065
12.3    0.002 0.006 0.004 -0.002 -0.006 --
13      0.029 0.041 0.014 0.032 0.019 0.014 0.011 0.017
13.3    -0.002 -------
14      0.245 0.253 0.214 0.127 0.305 0.114 0.168 0.110
14.3    -0.002 -------
15      0.057 0.019 0.050 0.011 0.046 0.005 0.014 0.014
16      --0.004 ----0.006
17      --0.004 ------
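As a rough cross-check on allele frequency data of this kind, the expected heterozygosity under Hardy-Weinberg assumptions can be computed as one minus the sum of squared allele frequencies. The sketch below is not the calculation used on the poster, which reports observed heterozygosities. It assumes that the first value on each D2S441 row is the U.S. Caucasian frequency and that rows beginning with a dash have no U.S. Caucasian observation. The function name is hypothetical.

```python
def expected_heterozygosity(allele_frequencies):
    """Expected heterozygosity under Hardy-Weinberg: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in allele_frequencies.values())


# D2S441, first column of the table (assumed to be U.S. Caucasian).
# The rounded frequencies sum to 1.001, so the result is approximate.
d2s441_us_caucasian = {
    "9": 0.002,
    "10": 0.202,
    "11": 0.350,
    "11.3": 0.061,
    "12": 0.053,
    "12.3": 0.002,
    "13": 0.029,
    "14": 0.245,
    "15": 0.057,
}

h_exp = expected_heterozygosity(d2s441_us_caucasian)
print(round(h_exp, 3))  # approximately 0.766
```

The expected value will generally differ somewhat from an observed heterozygosity, which is the counted fraction of heterozygous individuals in the sample.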
D22S1045
GenBank accession AL022314; positions 92,943..93,047
Population: U.S. Caucasian, African American, U.S. Hispanic, Japanese, Spanish, Singapore/Chinese, Singapore/Malay, Singapore/Indian
# Tested
Allele  Frequencies
8       -0.010 -------
9       ---------
10      -0.043 0.018 ----0.006
11      0.140 0.130 0.061 0.190 0.117 0.178 0.170 0.264
12      0.015 0.056 0.018 -0.004 --0.003
13      0.009 0.004 0.011 -0.008 -0.006 0.003
14      0.058 0.080 0.025 0.021 0.036 0.032 0.052 0.093
15      0.332 0.259 0.454 0.327 0.352 0.305 0.382 0.396
16      0.362 0.187 0.311 0.218 0.362 0.230 0.162 0.160
17      0.079 0.210 0.096 0.201 0.116 0.233 0.209 0.070
18      0.004 0.016 0.007 0.035 0.006 0.019 0.014 0.003
19      -0.006 -0.007 -0.003 0.006 0.003

Stutter values annotated on the figure panels (* = <50 RFUs):
Stutter % -4: 6.3%, 11.5%, 5.5%, 8.8%, 10.2%, 4.0%, 4.3%, 5.8%
Stutter % -3: 2.9%, 4.4%, 7.5%, 6.9%, 5.0%, 6.2%, 11.5%
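Stutter percentages like those above are conventionally the height of the stutter peak (one repeat shorter, i.e. -4 bp for tetranucleotide and -3 bp for trinucleotide repeats) divided by the height of the parent allele peak. The sketch below illustrates that ratio and a min/max/average summary. The peak heights are hypothetical, chosen only to reproduce three of the annotated values, and the function names are not from the poster.

```python
from statistics import mean


def stutter_percent(stutter_height_rfu, allele_height_rfu):
    """Stutter peak height as a percentage of the parent allele peak height."""
    if allele_height_rfu <= 0:
        raise ValueError("parent allele peak height must be positive")
    return 100.0 * stutter_height_rfu / allele_height_rfu


def summarize(percentages):
    """Minimum, maximum and average of a list of stutter percentages."""
    return {
        "n": len(percentages),
        "min": min(percentages),
        "max": max(percentages),
        "average": mean(percentages),
    }


# Hypothetical peak heights in RFU: (stutter peak, parent allele peak).
peaks = [(63, 1000), (115, 1000), (55, 1000)]
values = [stutter_percent(s, a) for s, a in peaks]

print([round(v, 1) for v in values])  # [6.3, 11.5, 5.5]
print(summarize(values))
```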
Repeat motifs of the loci shown: D10S1248 [GGAA], D2S441 [TCTA], D22S1045 [ATT], D12ATA63 [YAA], D9S2157 [ATA]
D9S2157 Stutter: N = 156 alleles/93 samples; Min: 3.5%; Max: 20.6%; Average: 11.32%

Computer Work
  107 potential loci
  Candidate STR marker selection
  Pull down sequence data from the web
  Identify Chromosome Location (e.g. Human BLAT Search)
  Screen for PCR Primers (e.g. Primer3)
  Test primers for Multiplex-ability (e.g. AutoDimer - NIST)

The D22S1045 nomenclature has changed by +3 repeats from Coble and Butler (2005), where only the 14 ATT repeats were considered without the additional 1 ACT and 2 ATT repeats (originally called a 13 TAA repeat due to using a different GenBank reference). Some confusion in nomenclature has arisen due to the GenBank accession numbers in Table 1 of Coble and Butler (2005) being different from those used in primer design and allele designation.

Nomenclature adjustments of +3 for D22S1045 and -1 for D10S1248 were made for the following:
  Japanese data from Asamura et al. (2006) Int J Legal Med 120:182-184
  Spanish data from Martín et al. (2006) Forensic Sci Int, in press
  Singapore data from Yong et al. (2006) Forensic Sci Int, in press
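The nomenclature adjustments described above (+3 repeats for D22S1045, -1 repeat for D10S1248) amount to a fixed per-locus offset applied to previously published allele calls. A minimal sketch follows. The function and dictionary names are hypothetical, and the handling of microvariant designations such as 11.3 is an assumption (the offset is applied to the whole-repeat count only).

```python
# Offsets quoted in the text; loci not listed are left unchanged.
NOMENCLATURE_OFFSETS = {"D22S1045": 3, "D10S1248": -1}


def adjust_allele(locus, allele):
    """Convert an allele call from the older nomenclature to the current one.

    Alleles are strings such as "14" or "11.3"; any partial-repeat suffix
    is preserved and only the whole-repeat count is shifted.
    """
    offset = NOMENCLATURE_OFFSETS.get(locus, 0)
    repeats, dot, partial = allele.partition(".")
    return f"{int(repeats) + offset}{dot}{partial}"


print(adjust_allele("D22S1045", "14"))  # 17
print(adjust_allele("D10S1248", "14"))  # 13
print(adjust_allele("D2S441", "11.3"))  # 11.3
```

The first call mirrors the example in the text: the 14 ATT repeats counted in the earlier nomenclature become allele 17 once the additional ACT and ATT repeats are included.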
Characteristics of the new 26 non-CODIS (NC) miniSTR loci are listed below along with the 13 CODIS and 4 additional kit STR loci (D2S1338, D19S433, Penta D, and Penta E).
Locus information

Locus      GenBank (repeat #)  Chromosomal Position  Location  Observed Size (bp)  Allele Range  Repeat Motif
D1GATA113  Z97987 (11)         Chr 1 7.377 Mb        1p36.23   81 - 105            7 - 13        GATA
D1S1627    AC093119 (13)       Chr 1 106.676 Mb      1p21.1    81 - 100            10 - 16       ATT
D1S1677    AL513307 (15)       Chr 1 160.747 Mb      1q23.3    81 - 117            9 - 18        TTCC
D2S441     AC079112 (12)       Chr 2 68.214 Mb       2p14      78 - 110            9 - 17        TCTA
D2S1776    AC009475 (11)       Chr 2 169.471 Mb      2q24.3    127 - 161           7 - 15        AGAT
D3S3053    AC069259 (9)        Chr 3 173.234 Mb      3q26.31   84 - 108            7 - 13        TATC
D3S4529    AC117452 (13)       Chr 3 85.935 Mb       3p12.1    111 - 139           13 - 20       ATYT
D4S2364    AC022317 (9)        Chr 4 93.976 Mb       4q22.3    67 - 83             8 - 12        GRAT
D4S2408    AC110763 (9)        Chr 4 30.981 Mb       4p15.1    85 - 109            7 - 13        ATCT
D5S2500    AC008791 (17)       Chr 5 58.735 Mb       5q11.2    85 - 126            14 - 24       GRYW
D6S474     AL357514 (17)       Chr 6 112.986 Mb      6q21      107 - 136           10 - 17       [AGAT][GATA]
D6S1017    AL035588 (10)       Chr 6 41.785 Mb       6p21.1    81 - 110            6 - 13        ATCC
D8S1115    AC090739 (9)        Chr 8 42.656 Mb       8p11.21   63 - 96             9 - 20        ATT
D9S1122    AL161789 (12)       Chr 9 76.918 Mb       9q21.2    93 - 125            9 - 17        TAGA
D9S2157    AL162417 (10)       Chr 9 133.065 Mb      9q34.2    71 - 107            7 - 19        ATA
D10S1248   AL391869 (13)       Chr 10 130.567 Mb     10q26.3   79 - 123            8 - 19        GGAA
D10S1435   AL354747 (11)       Chr 10 2.233 Mb       10p15.3   82 - 139            5 - 19        TATC
D11S4463   AP002806 (14)       Chr 11 130.338 Mb     11q25     88 - 116            10 - 17       TATC
D12ATA63   AC009771 (13)       Chr 12 106.825 Mb     12q23.3   76 - 106            9 - 19        YAA
D14S1434   AL121612 (13)       Chr 14 93.298 Mb      14q32.13  70 - 98             13 - 20       CTRT
D17S974    AC034303 (10)       Chr 17 10.459 Mb      17p13.1   95 - 124            5 - 12        CTAT
D17S1301   AC016888 (12)       Chr 17 70.193 Mb      17q25.1   114 - 139           9 - 15        AGAT
D18S853    AP005130 (11)       Chr 18 3.981 Mb       18p11.31  82 - 104            9 - 16        ATA
D20S482    AL121781 (14)       Chr 20 4.454 Mb       20p13     85 - 126            9 - 19        AGAT
D20S1082   AL158015 (14)       Chr 20 53.299 Mb      20q13.2   73 - 101            8 - 17        ATA
D22S1045   AL022314 (17)       Chr 22 35.779 Mb      22q12.3   82 - 115            8 - 19        ATT

Heterozygosity

Locus      Overall  Af. Am.  Cau.   Hisp.  N
D1GATA113  0.668    0.673    0.632  0.727  654
D1S1627    0.746    0.783    0.737  0.693  660
D1S1677    0.746    0.743    0.749  0.743  660
D2S441     0.774    0.798    0.780  0.721  660
D2S1776    0.763    0.740    0.801  0.734  654
D3S3053    0.739    0.713    0.724  0.814  648
D3S4529    0.761    0.752    0.723  0.829  660
D4S2364    0.511    0.385    0.551  0.664  660
D4S2408    0.722    0.752    0.709  0.691  654
D5S2500    0.747    0.757    0.747  0.729  664
D6S474     0.761    0.765    0.802  0.679  648
D6S1017    0.740    0.807    0.698  0.693  664
D8S1115    0.663    0.629    0.660  0.729  664
D9S1122    0.734    0.753    0.742  0.686  659
D9S2157    0.844    0.884    0.840  0.779  661
D10S1248   0.792    0.825    0.785  0.743  663
D10S1435   0.766    0.798    0.770  0.700  663
D11S4463   0.730    0.780    0.676  0.743  664
D12ATA63   0.829    0.788    0.842  0.879  659
D14S1434   0.696    0.685    0.721  0.650  663
D17S974    0.732    0.757    0.702  0.743  664
D17S1301   0.649    0.626    0.717  0.564  664
D18S853    0.711    0.772    0.645  0.721  664
D20S482    0.691    0.673    0.689  0.729  648
D20S1082   0.696    0.792    0.653  0.600  664
D22S1045   0.784    0.817    0.785  0.721  663

Primers

Locus      Forward Primer (5' dye labels shown)   Reverse Primer (extra G on 5' end --> +A)
D1GATA113  [VIC] - TCTTAGCCTAGATAGATACTTGCTTCC    GTCAACCTTTGAGGCTATAGGAA
D1S1627    [VIC] - CATGAGGTTTGCAAATACTATCTTAAC    GTTTTAATTTTCTCCAAATCTCCA
D1S1677    [NED] - TTCTGTTGGTATAGAGCAGTGTTT       GTGACAGGAAGGACGGAATG
D2S441     [VIC] - CTGTGGCTCATCTATGAAAACTT        GAAGTGGCTGTGGTGTTATGAT
D2S1776    [FAM] - TGAACACAGATGTTAAGTGTGTATATG    GTCTGAGGTGGACAGTTATGAAA
D3S3053    [VIC] - TCTTTGCTCTCATGAATAGATCAGT      GTTTGTGATAATGAACCCACTCAG
D3S4529    [VIC] - CCCAAAATTACTTGAGCCAAT          GAGACAAAATGAAGAAACAGACAG
D4S2364    [FAM] - CTAGGAGATCATGTGGGTATGATT       GCAGTGAATAAATGAACGAATGGA
D4S2408    [NED] - AAGGTACATAACAGTTCAATAGAAAGC    GTGAAATGACTGAAAAATAGTAACCA
D5S2500    [NED] - CTGTTGGTACATAATAGGTAGGTAGGT    GTCGTGGGCCCCATAAATC
D6S474     [NED] - GGTTTTCCAAGAGATAGACCAATTA      GTCCTCTCATAAATCCCTACTCATATC
D6S1017    [VIC] - CCACCCGTCCATTTAGGC             GTGAAAAAGTAGATATAATGGTTGGTG
D8S1115    [FAM] - TCCACATCCTCACCAACAC            GCCTAGGAAGGCTACTGTCAA
D9S1122    [VIC] - GGGTATTTCAAGATAACTGTAGATAGG    GCTTCTGAAAGCTTCTAGTTTACC
D9S2157    [FAM] - CAAAGCGAGACTCTGTCTCAA          GAAAATGCTATCCTCTTTGGTATAAAT
D10S1248   [FAM] - TTAATGAATTGAACAAATGAGTGAG      GCAACTCTGGTTGTATTGTCTTCAT
D10S1435   [FAM] - TGTTATAATGCATTGAGTTTTATTCTG    GCCTGTCTCAAAAATAAAGAGATAGACA
D11S4463   [FAM] - TCTGGATTGATCTGTCTGTCC          GAATTAAATACCATCTGAGCACTGAA
D12ATA63   [FAM] - GAGCGAGACCCTGTCTCAAG           GGAAAAGACATAGGATAGCAATTT
D14S1434   [VIC] - TGTAATAACTCTACGACTGTCTGTCTG    GAATAGGAGGTGGATGGATGG
D17S974    [VIC] - GCACCCAAAACTGAATGTCATA         GGTGAGAGTGAGACCCTGTC
D17S1301   [FAM] - AAGATGAAATTGCCATGTAAAAATA      GTGTGTATAACAAAATTCCTATGATGG
D18S853    [NED] - GCACATGTACCCTAAAACTTAAAAT      GTCAACCAAAACTCAACAAGTAGTAA
D20S482    [FAM] - CAGAGACACCGAACCAATAAGA         GCCACATGAATCAATTCCTATAATAAA
D20S1082   [VIC] - ACATGTATCCCAGAACTTAAAGTAAAC    GCAGAAGGGAAAATTGAAGCTG
D22S1045   [NED] - ATTTTCCCCGATGATAGTAGTCT        GCGAATGTATGATTGGCAATATTTTT
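The primer listing above notes that the reverse primers carry a G at the 5' end to promote adenylation. The sketch below shows one way such a list could be sanity-checked: it splits the dye label from the forward primer and confirms the leading G on the reverse primer. The helper names are hypothetical. The three primer pairs are taken from the listing, with loci assigned in listing order, which is assumed to follow the locus order given in the abstract.

```python
def split_labeled_primer(entry):
    """Split an entry of the form '[DYE] - SEQUENCE' into (dye, sequence)."""
    dye, _, sequence = entry.partition(" - ")
    return dye.strip("[]"), sequence.strip()


def gc_fraction(sequence):
    """Fraction of G and C bases in a primer sequence."""
    return sum(base in "GC" for base in sequence) / len(sequence)


# (forward primer with dye label, reverse primer) for three of the loci.
primers = {
    "D10S1248": ("[FAM] - TTAATGAATTGAACAAATGAGTGAG", "GCAACTCTGGTTGTATTGTCTTCAT"),
    "D14S1434": ("[VIC] - TGTAATAACTCTACGACTGTCTGTCTG", "GAATAGGAGGTGGATGGATGG"),
    "D22S1045": ("[NED] - ATTTTCCCCGATGATAGTAGTCT", "GCGAATGTATGATTGGCAATATTTTT"),
}

for locus, (forward, reverse) in primers.items():
    dye, forward_seq = split_labeled_primer(forward)
    print(
        locus,
        dye,
        len(forward_seq),
        len(reverse),
        reverse.startswith("G"),
        round(gc_fraction(reverse), 2),
    )
```

A leading G does not by itself show which primers had a base added, since some reverse primers may begin with G natively; the check only confirms the property the column header describes.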
Laboratory Work
  Sequence homozygotes to determine allele sizes
  Build Bins for Genotyping
  Construct Allelic Ladders
  12 additional markers
[Figure: miniplex assays. Miniplex01 (NC01): D10S1248, D14S1434, D22S1045. Miniplex02 (NC02): D1S1677, D2S441, D4S2364. Axis: PCR Product Size (bp). Dye channels: 6FAM (blue), VIC (green), NED (yellow).]
GeneMapperID bins and panels created following population analysis and sequencing

D12ATA63
GenBank accession AC009771; positions 55,349..55,437
*Lasergene map of D12ATA63
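[Figure: chromosomal positions of the new miniSTR loci (labeled with an "m" prefix, e.g. mD1GATA113, mD1S1677, mD1S1627, mD2S1776, mD2S441, mD3S4529, mD6S1017, mD6S474) shown alongside existing loci such as TPOX, D2S1338, and D3S1358. The new loci were chosen to avoid linkage with the CODIS 13 STRs to enable use of the product rule.]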
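The product rule referred to above multiplies per-locus genotype frequencies, which is only valid when the loci are unlinked. A minimal sketch follows, assuming Hardy-Weinberg proportions (p squared for a homozygote, 2pq for a heterozygote) and no population substructure correction. The function names are hypothetical. The allele frequencies are the leading (assumed U.S. Caucasian) values from the D2S441 and D22S1045 tables on this poster, and the genotypes themselves are made up for illustration.

```python
from math import prod


def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency: p^2 if homozygous, 2pq if heterozygous."""
    return p * p if q is None else 2.0 * p * q


def profile_frequency(genotypes):
    """Product rule: multiply genotype frequencies across unlinked loci.

    Each genotype is a tuple of allele frequencies: (p,) or (p, q).
    """
    return prod(genotype_frequency(*g) for g in genotypes)


profile = [
    (0.202, 0.245),  # D2S441 alleles 10 and 14
    (0.332, 0.362),  # D22S1045 alleles 15 and 16
]

print(round(profile_frequency(profile), 4))  # about 0.0238
```

Adding further unlinked loci simply extends the list, which is why markers on different chromosomes, or far apart on the same chromosome, are preferred.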
Chromosome map legend: CODIS; Identifiler; PowerPlex 16; Sex-Typing (AMEL_X, AMEL_Y)
From Butler, J.M. (2006) Genetics and genomics of core STR loci used in human identity testing. J. Forensic Sci. 51(2): 253-265.

CODIS and additional kit STR loci

Locus    GenBank (repeat #)  Chromosome Position  Location  Size (bp)  Repeat
CSF1PO   X14720 (12)         Chr 5 149.436 Mb     5q33.1    276 - 320  TAGA
FGA      M64982 (21)         Chr 4 155.866 Mb     4q31.3    196 - 348  CTTT
TH01     D00269 (9)          Chr 11 2.149 Mb      11p15.5   160 - 204  TCAT
TPOX     M68651 (11)         Chr 2 1.472 Mb       2p25.3    209 - 257  GAAT
VWA      M25858 (18)         Chr 12 19.83 Mb      12p13.31  152 - 212  TCTR
D3S1358  AC099539 (16)       Chr 3 45.557 Mb      3p21.31   97 - 149   TCTR
D5S818   AC008512 (11)       Chr 5 123.139 Mb     5q23.2    134 - 178  AGAT
D7S820   AC004848 (13)       Chr 7 83.433 Mb      7q21.11   253 - 297  GATA
D8S1179  AF216671 (13)       Chr 8 125.976 Mb     8q24.13   123 - 175  TCTR
D13S317  AL353628 (11)       Chr 13 81.620 Mb     13q31.1   193 - 237  TATC
D16S539  AC024591 (11)       Chr 16 84.944 Mb     16q24.1   233 - 277  GATA
D18S51   AP001534 (18)       Chr 18 59.100 Mb     18q21.33  264 - 394  AGAA
D21S11   AP000433 (29)       Chr 21 19.476 Mb     21q21.1   138 - 256  TCTR
D2S1338  AC010136 (20)       Chr 2 218.705 Mb     2q35      289 - 341  TKCC
D19S433  AC008507 (16)       Chr 19 35.109 Mb     19q12     106 - 140  AAGG
Penta D  AP001752 (13)       Chr 21 43.880 Mb     21q22.3   376 - 449  AAAGA
Penta E  AC027004 (5)        Chr 15 95.175 Mb     15q26.2   379 - 474  AAAGA

Heterozygosity

Locus    Overall  Af. Am.  Cau.   Hisp.  N
CSF1PO   0.745    0.759    0.733  0.743  659
FGA      0.886    0.883    0.889  0.886  659
TH01     0.745    0.759    0.721  0.764  659
TPOX     0.707    0.763    0.668  0.679  659
VWA      0.826    0.802    0.836  0.850  659
D3S1358  0.763    0.767    0.763  0.757  659
D5S818   0.721    0.735    0.702  0.729  659
D7S820   0.806    0.763    0.817  0.864  659
D8S1179  0.774    0.763    0.779  0.786  659
D13S317  0.747    0.693    0.748  0.843  659
D16S539  0.766    0.786    0.733  0.793  659
D18S51   0.876    0.860    0.870  0.914  659
D21S11   0.844    0.829    0.844  0.871  659
D2S1338  0.882    0.903    0.882  0.843  659
D19S433  0.803    0.876    0.752  0.764  659
Penta D  --       --       --     --     --
Penta E  --       --       --     --     --

*32 loci x 663 samples = 21,216 total data points in this study
Positions determined along May 2004 Human Genome Reference Sequence (NCBI Build 35)

Summary
New miniSTR markers are being characterized and information will be made available on STRBase (http://www.cstl.nist.gov/biotech/strbase/newSTRs.htm). Several of these miniSTR loci have been recommended for adoption by the European DNA community as new core loci (Gill et al. 2006). In addition to increasing the successful typing of degraded materials, these loci can also provide additional discrimination in complex paternity cases or missing persons cases.

Acknowledgments and Disclaimer
This project was funded by the National Institute of Justice through interagency agreement 2003-IJ-R-029 to the NIST Office of Law Enforcement Standards. Points of view are those of the authors and do not necessarily represent the official position or policies of the US Department of Justice. Certain commercial equipment, instruments and materials are identified in order to specify experimental procedures as completely as possible. In no case does such identification imply a recommendation or endorsement by the National Institute of Standards and Technology nor does it imply that any of the materials, instruments, or equipment identified are necessarily the best available for the purpose. We thank Margaret Kline, Jan Redman, Richard Schoske, Peter Vallone, and Amy Decker for initial preparation and quantitation of the NIST U.S. population samples.
http://www.cstl.nist.gov/biotech/strbase/newSTRs.htm
http://www.cstl.nist.gov/biotech/strbase/miniSTR.htm