Mapping Human Genetic Diversity in Asia
The HUGO Pan-Asian SNP Consortium
Science 326, 1541 (2009); DOI: 10.1126/science.1177074

The following resources related to this article are available online at www.sciencemag.org (this information is current as of March 23, 2014):

Updated information and services, including high-resolution figures, can be found in the online version of this article at: http://www.sciencemag.org/content/326/5959/1541.full.html
Supporting Online Material can be found at: http://www.sciencemag.org/content/suppl/2009/12/10/326.5959.1541.DC1.html
A list of selected additional articles on the Science Web sites related to this article can be found at: http://www.sciencemag.org/content/326/5959/1541.full.html#related
This article cites 24 articles, 5 of which can be accessed free: http://www.sciencemag.org/content/326/5959/1541.full.html#ref-list-1
This article has been cited by 26 articles hosted by HighWire Press; see: http://www.sciencemag.org/content/326/5959/1541.full.html#related-urls
This article appears in the following subject collections: Genetics http://www.sciencemag.org/cgi/collection/genetics

Science (print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. Copyright 2009 by the American Association for the Advancement of Science; all rights reserved. The title Science is a registered trademark of AAAS.
hybrid sterility involves both the unusual abundance and retention of OdsHmau protein in the D. simulans testis, as well as an unusual localization and possibly decondensation of the D. simulans Y chromosome. We conclude on the basis of these data that hybrid male sterility is caused by a gain-of-function interaction between OdsHmau and some component of the D. simulans Y chromosome heterochromatin, with this protein-DNA interaction representing the Dobzhansky-Muller incompatibility. OdsH shares similarities with the hybrid sterility genes Prdm9 (or Meisetz) in mouse (23) and Overdrive (Ovd) in Drosophila (24), all of which encode proteins with putative DNA-binding domains. Satellite DNAs have also been implicated in hybrid inviability, including a pericentric satellite locus (Zhr) (25, 26) and a gene encoding a heterochromatin-binding protein (Lhr) (27). Thus, rapidly evolving repetitive DNA elements driven by genetic conflict may represent a major evolutionary force driving sequence divergence of speciation genes that would ultimately result in hybrid incompatibilities (13, 14, 28).

References and Notes
1. E. Mayr, Systematics and the Origin of Species from the Viewpoint of a Zoologist (Columbia Univ. Press, New York, 1942).
2. J. A. Coyne, H. A. Orr, Speciation (Sinauer Associates, Sunderland, MA, 2004).
3. C. C. Laurie, Genetics 147, 937 (1997).
4. R. M. Kliman et al., Genetics 156, 1913 (2000).
5. C. T. Ting, S. C. Tsaur, M. L. Wu, C. I. Wu, Science 282, 1501 (1998).
6. S. Sun, C. T. Ting, C. I. Wu, Science 305, 81 (2004).
7. D. E. Perez, C. I. Wu, Genetics 140, 201 (1995).
8. D. E. Perez, C. I. Wu, N. A. Johnson, M. L. Wu, Genetics 134, 261 (1993).
9. S. D. Hueber, I. Lohmann, Bioessays 30, 965 (2008).
10. C. T. Ting et al., Proc. Natl. Acad. Sci. U.S.A. 101, 12232 (2004).
11. K. Tabuchi, S. Yoshikawa, Y. Yuasa, K. Sawamoto, H. Okano, Neurosci. Lett. 257, 49 (1998).
12. M. Nei, J. Zhang, Science 282, 1428 (1998).
13. S. Henikoff, K. Ahmad, H. S. Malik, Science 293, 1098 (2001).
14. S. Henikoff, H. S. Malik, Nature 417, 227 (2002).
15. L. Fishman, A. Saunders, Science 322, 1559 (2008).
16. A. Daniel, Am. J. Med. Genet. 111, 450 (2002).
17. N. Aulner et al., Mol. Cell. Biol. 22, 1218 (2002).
18. M. Ashburner, K. G. Golic, R. S. Hawley, Drosophila: A Laboratory Handbook (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, ed. 2, 2005).
19. G. Cenci, S. Bonaccorsi, C. Pisano, F. Verni, M. Gatti, J. Cell Sci. 107, 3521 (1994).
20. B. D. McKee, Curr. Top. Dev. Biol. 37, 77 (1998).
21. J. E. Tomkiel, Genetica 109, 95 (2000).
22. J. Forejt, Trends Genet. 12, 412 (1996).
23. O. Mihola, Z. Trachtulec, C. Vlcek, J. C. Schimenti, J. Forejt, Science 323, 373 (2009).
24. N. Phadnis, H. A. Orr, Science 323, 376 (2009).
25. K. Sawamura, M. T. Yamamoto, T. K. Watanabe, Genetics 133, 307 (1993).
26. P. M. Ferree, D. A. Barbash, PLoS Biol. 7, e1000234 (2009).
27. N. J. Brideau et al., Science 314, 1292 (2006).
28. H. S. Malik, S. Henikoff, Cell 138, 1067 (2009).
29. We thank C-I. Wu for the D. simulans fertile and sterile introgression lines; C. Ting for scientific discussions and sharing data; G. Findlay for initial observations on OdsH cytology; and K. Ahmad, S. Biggins, N. Elde, S. Henikoff, N. Phadnis, T. Tsukiyama, and D. Vermaak for comments on the manuscript. Supported by NIH training grant PHS NRSA T32 GM07270 (J.J.B.), and grants from the Mathers foundation and NIH R01-GM74108 (H.S.M.). H.S.M. is an Early-Career Scientist of the Howard Hughes Medical Institute.

Supporting Online Material
www.sciencemag.org/cgi/content/full/1181756/DC1
Materials and Methods
Figs. S1 to S8
References

10 September 2009; accepted 13 October 2009
Published online 22 October 2009; 10.1126/science.1181756
Include this information when citing this paper.
Mapping Human Genetic Diversity in Asia

The HUGO Pan-Asian SNP Consortium*†

Asia harbors substantial cultural and linguistic diversity, but the geographic structure of genetic variation across the continent remains enigmatic. Here we report a large-scale survey of autosomal variation from a broad geographic sample of Asian human populations. Our results show that genetic ancestry is strongly correlated with linguistic affiliations as well as geography. Most populations show relatedness within ethnic/linguistic groups, despite prevalent gene flow among populations. More than 90% of East Asian (EA) haplotypes could be found in either Southeast Asian (SEA) or Central-South Asian (CSA) populations and show clinal structure with haplotype diversity decreasing from south to north. Furthermore, 50% of EA haplotypes were found in SEA only and 5% were found in CSA only, indicating that SEA was a major geographic source of EA populations.

Several genome-wide studies of human genetic diversity focusing primarily on broad continental relationships, or fine-scale structure in Europe, have been published recently (1–8). We have extended this approach to Southeast Asian (SEA) and East Asian (EA) populations by using the Affymetrix GeneChip Human Mapping 50K Xba Array. Stringently quality-controlled genotypes were obtained at 54,794 autosomal single-nucleotide polymorphisms (SNPs) in 1928 individuals representing 73 Asian and two non-Asian HapMap populations (9). Apart from developing a general description of Asian population structure and its relation to geography, language, and demographic history, we concentrated on uncovering the geographic source(s) of EA and SEA populations.

We first performed a Bayesian clustering procedure using the STRUCTURE algorithm (10) to examine the ancestry of each individual. Each person is posited to derive from an arbitrary number of ancestral populations, denoted by K.
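As a rough illustration of what it means for a person to derive from K ancestral populations, the admixture model commonly used in this style of clustering can be written in the marginal form below. The symbols q, p, g, and π are notation introduced here for illustration and do not appear in the article, and the binomial form assumes Hardy-Weinberg proportions and biallelic SNPs; it is a sketch of the general idea, not a statement of the exact model settings used by the consortium.

```latex
% Illustrative notation (assumed here, not taken from the article):
%   q_{ik} : fraction of individual i's genome derived from ancestral population k
%   p_{kl} : frequency of a chosen reference allele at SNP l in ancestral population k
%   g_{il} : number of reference alleles (0, 1, or 2) carried by individual i at SNP l
\begin{align}
  \sum_{k=1}^{K} q_{ik} &= 1, \qquad q_{ik} \ge 0, \\
  \pi_{il} &= \sum_{k=1}^{K} q_{ik}\, p_{kl}, \\
  \Pr\left(g_{il} = g \mid q, p\right) &= \binom{2}{g}\, \pi_{il}^{\,g}\,\left(1 - \pi_{il}\right)^{2 - g},
  \qquad g \in \{0, 1, 2\}.
\end{align}
```

Under this reading, the bar plot reported for a given K can be thought of as displaying the estimated ancestry fractions for each individual, with one color per ancestral population.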
We ran STRUCTURE from K = 2 to K = 14 using both the complete data set and SNP subsets to exclude those in strong linkage disequilibrium (Fig. 1 and figs. S1 to S13). At K = 2 and K = 3, all SEA and EA samples are united by predominant membership in a common cluster, with the other cluster(s) corresponding largely to Indo-European (IE) and African (AF) ancestries. At K = 4, a component most frequently found in Negrito populations that is also shared by all SEA populations emerges, suggesting a common SEA ancestry. Each value of K beyond 4 introduces a new component that tends to be associated with a group of populations united by membership in a linguistic family, by geographic proximity, by a known history of admixture, or, especially at higher Ks, by membership in a small population isolate. The results obtained using frappe (11), a maximum-likelihood-based clustering analysis, showed a general concordance with those of STRUCTURE (figs. S14 to S26).

These analyses show that most individuals within a population share very similar ancestry estimates at all Ks, an observation that is consistent also with a phylogeny relating individuals (fig. S27) based on an allele-sharing distance (12). Therefore, we proceeded to evaluate the relationships among populations. A maximum-likelihood tree of populations, based on 42,793 SNPs whose ancestral states were known (Fig. 1), showed that all the SEA and EA populations make up a monophyletic clade that is supported by 100% of bootstrap replicates. This pattern remained even after data from 51 additional populations and 19,934 commonly typed SNPs from a recent study were integrated into the tree (fig. S28). These observations suggest that SEA and EA populations share a common origin.

STRUCTURE/frappe and principal components analyses (PCA) (13) (Figs. 1 and 2 and figs. S1 to S26) identify as many as 10 main population components.
*All authors with their affiliations appear at the end of this paper.
†To whom correspondence should be addressed. E-mail: ljin007@gmail.com (L.J.); liue@gis.a-star.edu.sg (E.T.L.); seielstadm@gis.a-star.edu.sg (M.S.); xushua@picb.ac.cn (S.X.)
www.sciencemag.org SCIENCE VOL 326 11 DECEMBER 2009 1541 REPORTS

Each component corresponds largely to one of the five major linguistic groups (Altaic, Sino-Tibetan/Tai-Kadai, Hmong-Mien, Austro-Asiatic, and Austronesian), three ethnic categories (Philippine Negritos, Malaysian Negritos, and East Indonesians/Melanesians), and two small population isolates (the Bidayuh of Borneo and the hunter-gatherer Mlabri population of central and northern Thailand). The STRUCTURE results
(Fig. 1 and figs. S1 to S13), population phylogenies (Fig. 1 and figs. S27 and S28), and PCA results (Fig. 2) all show that populations from the same linguistic group tend to cluster together. A Mantel test confirms the correlation between linguistic and genetic affinities (R² = 0.253; P < 0.0001 with 10,000 permutations), even after controlling for geography (partial correlation = 0.136; P < 0.005 with 10,000 permutations). Nevertheless, we identified eight population outliers whose linguistic and genetic affinities are inconsistent [Affymetrix-Melanesian (AX-ME), Malaysia-Jehai (MY-JH)

Fig. 1. Maximum-likelihood tree of 75 populations. A hypothetical most-recent common ancestor (MRCA) composed of ancestral alleles as inferred from the genotypes of one gorilla and 21 chimpanzees was used to root the tree. Branches with bootstrap values less than 50% were condensed. Population identification numbers (IDs), sample collection locations with latitudes and longitudes, ethnicities, language spoken, and size of population samples are shown in the table adjacent to each branch in the tree. Linguistic groups are indicated with colors as shown in the legend. All population IDs except the four HapMap samples are denoted by four characters. The first two letters indicate the country where the samples were collected or (in the case of Affymetrix) genotyped, according to the following convention: AX, Affymetrix; CN, China; ID, Indonesia; IN, India; JP, Japan; KR, Korea; MY, Malaysia; PI, the Philippines; SG, Singapore; TH, Thailand; and TW, Taiwan. The last two letters are unique IDs for the population. To the right of the table, an averaged graph of results from STRUCTURE is shown for K = 14.
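The Mantel test cited in the text compares two distance matrices by permuting the labels of one of them. The following is a minimal sketch of a simple (not partial) Mantel test, assuming NumPy is available. The function name `mantel_test`, its parameters, and the toy data are all illustrative assumptions; the consortium's own implementation is described in their supporting material and may differ, for example in how the linguistic matrix is coded or in the correlation statistic used. The sketch reports a Pearson r rather than the R² quoted in the text, and it does not implement the partial test that controls for geography.

```python
import numpy as np


def mantel_test(dist_a, dist_b, permutations=10_000, seed=0):
    """Permutation-based Mantel test between two square distance matrices.

    Returns (r, p): the Pearson correlation between the upper-triangle
    entries of the two matrices and a one-sided permutation p-value.
    """
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or a.shape != b.shape:
        raise ValueError("both inputs must be square matrices of equal size")

    n = a.shape[0]
    upper = np.triu_indices(n, k=1)  # count each unordered pair once
    b_vec = b[upper]
    r_obs = np.corrcoef(a[upper], b_vec)[0, 1]

    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        # Relabel the populations of one matrix: rows and columns move together.
        perm = rng.permutation(n)
        a_perm = a[np.ix_(perm, perm)]
        if np.corrcoef(a_perm[upper], b_vec)[0, 1] >= r_obs:
            hits += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value


if __name__ == "__main__":
    # Toy data, generated for illustration only (not the consortium's data):
    # a "genetic" matrix built as a noisy copy of a "geographic" one.
    rng = np.random.default_rng(1)
    coords = rng.uniform(size=(12, 2))
    geo = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    noise = rng.normal(scale=0.05, size=geo.shape)
    gen = geo + (noise + noise.T) / 2  # keep the matrix symmetric
    np.fill_diagonal(gen, 0.0)
    r, p = mantel_test(gen, geo, permutations=999)
    print(f"Mantel r = {r:.3f}, p = {p:.4f}")
```

Permuting rows and columns together, rather than shuffling individual entries, preserves the dependence among distances that share a population, which is the reason a permutation test is used here instead of an ordinary correlation test.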
(Negrito), Malaysia-Kensiu (MY-KS) (Negrito), Thailand-Mon (TH-MO), Thailand-Karen (TH-KA), China-Jinuo (CN-JN), India-Spiti (IN-TB), and China-Uyghur (CN-UG); see table S3]. These linguistic outliers tend to cluster with their geographic neighbors or [especially evident in the principal component (PC) plots of Fig. 2] occupy an intermediate position between their geographic neighbors and the more-distant members of their linguistic group. These patterns are consistent either with substantial recent admixture among the populations (14–16), a history of language replacement (17), or uncertainties in the linguistic classifications themselves (for example, the controversial Altaic family, which groups Korean and Japanese with Uyghur).

Considerable gene flow among Asian populations was observed among subpopulations in these clusters, including those groups believed to practice endogamy based on linguistic, cultural, and ethnic information. In fact, most populations studied, even at lower Ks, show evidence of admixture in the STRUCTURE analyses. For example, the Han Chinese have grown to become the largest ethnic group today in a demographic expansion that has occurred mostly within historical times. STRUCTURE reveals that the six Han Chinese population samples in our study show varying degrees of admixture (Fig. 1 and figs. S1 to S26) between a northern Altaic cluster and a Sino-Tibetan/Tai-Kadai cluster, which most frequently appears in the ethnic groups sampled from southern China and northern Thailand. Finally, most of the Indian populations showed evidence of shared ancestry with European populations, which is consistent with the recent observations (18) and our understanding of the expansion of Indo-European-speaking populations (Fig. 1 and figs. S1 to S26).

The geographic source(s) contributing to EA populations have long been debated.
Fig. 2. Analysis of the first two PCs. (A) 1928 individuals representing all 75 populations. (B) 1868 individuals representing 74 populations (excluding YRI). (C) 1471 individuals representing 58 populations (excluding all Indians, CN-UG, TH-MA, AX-ME, and Negritos from Malaysia). (D) 1235 individuals representing 44 populations (excluding Philippine Negritos, PI-MA, and East Indonesians).

One hypothesis suggests that all SEA and EA populations derive primarily from a single initial migration, which entered the continent along a southern, largely coastal route (19, 20). Another hypothesis argues for at least two independent migrations into East Asia, first along a southern route, followed later by a series of migrations along a more northern route that served to bridge European and EA populations, but with little contribution to populations in Southeast Asia (20). The topology of a maximum-likelihood tree (Fig. 1 and fig. S28) displays a largely south-to-north ordering of the populations, and a plot of the first two PCs (Fig. 2) similarly orients most populations according to their geographic coordinates. The average
value of the first PC is highly correlated with the latitude at which the populations were sampled (R² = 0.79, P < 0.0001). Such a pattern could result simply from isolation-by-distance (IBD), as suggested by Ding et al. (21), although a recent study failed to detect IBD in East Asia with data from the Human Genome Diversity Project (22).

In an effort to distinguish between long-term historical divergence and the effects of IBD, we applied partial and multiple Mantel tests to the data (23) [see supporting online material (SOM) text for details]. The primary approach was to ascertain the differential correlation between genetic distance, geographical distance, and a group indicator matrix as an indication of prehistoric population divergence. The partial correlation coefficient of genetic and geographic distances was 0.228 (P < 0.0006), after controlling for the group indicator matrix (inferred from STRUCTURE/frappe analyses), whereas the partial correlation of the genetic and group indicator matrices was 0.403 (P < 0.0001) after controlling for geography. The superior association between genetic distance and the group indicator matrix as measured by the correlation coefficients suggests that prehistorical population divergence is the favored model over IBD in explaining the data (24). This conclusion is supported by simulation studies that also suggest that the observed patterns cannot be explained by simple IBD effects alone (see SOM text for details).

To further refine the analysis, we looked to haplotype organization to limit the effect of fluctuations in single-nucleotide determinations and to increase the resolution around genetic diversity. The IBD model predicts a correlation of genetic distance with geographical distance but not genetic diversity and geographic distance (24). By contrast, we found (Fig. 3A) that haplotype diversity is strongly correlated with latitude (R² = 0.91, P < 0.0001), with diversity decreasing from south to north, which is consistent with a loss of diversity as populations moved to higher latitudes. In estimating the contribution of SEA and Central-South Asian (CSA) haplotypes to the EA gene pool by haplotype sharing analyses (16), we found that more than 90% of haplotypes in EA populations could be found in SEA and CSA populations, of which about 50% were found in SEA and EA only and 5% found in CSA only (Fig. 3B, see also SOM text). Phylogenetic analysis of private haplotypes indicates greater similarity between EA and SEA populations relative to EA and CSA populations (Fig. 3C). These observations suggest that the geographic source(s) contributing to EA populations were mainly from SEA populations, with rather minor contributions from CSA,

Fig. 3. Analysis of haplotype diversity, haplotype sharing, and population phylogeny. (A) Haplotype diversity versus latitudes. Haplotypes were estimated from combined data, and diversity was measured by heterozygosity of haplotypes. HSa, b, c, and d and the corresponding colors show the percentages of EA group haplotypes in each class: HSa, found in CSA only; HSb, found in neither CSA nor SEA; HSc, found in both CSA and SEA; HSd, found in SEA only. Latitudes (y axis) for groups were obtained from the center of sample collection locations. Circled numbers are as follows: 1, Indonesian; 2, Malay; 3, Philippine; 4, Thai; 5, Southern Chinese minorities; 6, Southern Han Chinese; 7, Japanese and Korean; 8, Northern Han Chinese; 9, Northern Chinese minorities; and 10, Yakut. Haplotype heterozygosity of each group was estimated from 100-kb bins and taking together all haplotypes within each group. R² for the regression line is 0.91 (P < 0.0001). (B) Haplotype sharing analysis for EA populations and groups. YKT, Yakut; N-CM, Northern Chinese minorities; N-HAN, Northern Han Chinese; JP-KR, Japanese and Korean; S-HAN, Southern Han Chinese; S-CM, Southern Chinese minorities; EA, East Asian. (C) Phylogeny of group private haplotypes. EA private haplotypes: haplotypes found only in EA samples; SEA private haplotypes: haplotypes found only in SEA samples; CSA private haplotypes: haplotypes found only in CSA samples; Shared haplotypes: haplotypes found in all EA, SEA, and CSA samples; African haplotypes were used as outgroup. (D) Maximum-likelihood tree of 29 populations. The tree is based on data from 19,934 SNPs. Bootstrap values were based on 100 replicates. Only values on splitting of African and non-African, European and Oceanian and Asian, and Oceanian and Asian are shown.
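The partial Mantel comparison described in the text (genetic distance against geographic distance while controlling for a group indicator matrix, and the reverse) can be illustrated with a minimal sketch. This is not the procedure used in the study, whose details are given in the SOM; it assumes one common formulation, in which the Pearson partial correlation is computed on the upper-triangle entries of three square distance matrices and significance is assessed by jointly permuting the rows and columns of the first matrix. The function name `partial_mantel` and the parameters `n_perm` and `seed` are hypothetical and chosen only for illustration.

```python
import numpy as np


def _upper(m):
    """Return the entries above the diagonal of a square matrix as a vector."""
    i, j = np.triu_indices_from(m, k=1)
    return m[i, j]


def _partial_r(x, y, z):
    """Pearson correlation of x and y after controlling for z."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1.0 - rxz**2) * (1.0 - ryz**2))


def partial_mantel(a, b, c, n_perm=999, seed=0):
    """Partial Mantel statistic r(a, b | c) with a one-sided permutation P value.

    a, b, c are symmetric (n x n) distance or indicator matrices.
    Assumption: rows and columns of `a` are permuted together, and the
    P value counts permutations whose statistic is >= the observed one.
    """
    a = np.asarray(a, dtype=float)
    y = _upper(np.asarray(b, dtype=float))
    z = _upper(np.asarray(c, dtype=float))
    observed = _partial_r(_upper(a), y, z)

    rng = np.random.default_rng(seed)
    n = a.shape[0]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        permuted = a[np.ix_(p, p)]  # relabel populations in `a` only
        if _partial_r(_upper(permuted), y, z) >= observed:
            hits += 1
    # Add one to numerator and denominator so the observed value is counted.
    return observed, (hits + 1) / (n_perm + 1)
```

Under these assumptions, calling `partial_mantel(genetic, geographic, group)` and `partial_mantel(genetic, group, geographic)` would give the two coefficients that the text compares; a larger coefficient for the group indicator matrix is the pattern interpreted above as favoring prehistoric divergence over IBD.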