Li et al. Horticulture Research (2020)7:64HorticultureResearchhttps://doi.org/10.1038/s41438-020-0289-1www.nature.com/hortresARTICLEOpen AccessFine mapping of the sex locus in Salix triandraconfirms a consistent sex determinationmechanism in genus SalixWei Lil,Huaitong Wu', Xiaoping Lil, Yingnan Chen' and Tongming Yin!AbstractSalix triandra belongs to section Amyqdalinae in genus Salix,which is in a different section from the willow species inwhich sex determination has been well studied. Studying sex determination in distantly related willow species willhelp to clarify whether the sexes of different willows arise through a common sex determination system. For thispurpose,we generated an intraspecific full-sib F, population for S.triandra and constructed high-density geneticlinkage maps for the crossing parents using restriction site-associated DNA sequencing and following a two-waypseudo-testcross strategy.With the established maps, the sex locus was positioned in linkage group xV only in thematernal map,and no sex linkagewas detected in thepaternal map.Consistent with previous findings in other willowspecies, our study showed that chromosome xV was the incipient sex chromosome and that females were theheterogametic sex in S.triandra. Therefore, sex in this willow species is also determined through a ZW sexdetermination system. We further performed fine mapping in the vicinity of the sex locus with SSR markers.Bycomparing the physical and genetic distances forthetarget interval encompassing the sex determination geneconfinedbySsRs,severerecombination repressionwas revealed inthesexdeterminationregioninthefemalemapTherecombinationrateintheconfined interval encompassingthesexlocuswasapproximatelyeight-fold lowerthanthe genome-wide average. This study provides critical information relevant to sex determination in S.triandra.Introductionare also present in plant sex chromosomes, with theFlowering plants have evolved a considerably morephysical lengths of PARs varying among different plants4complex sex determination system than animals,whichDifferent sex determination systems, including male het-have distinct germlines. Sexual diversity can first beerogamety (XY system)and femaleheterogamety(Zwobserved in the various floral forms, ranging from her-system), have been described in dioecious plants5. Themaphroditic tomonoecious and dioeciousor evenoverall interactions among elements such as plant hor-trioecious'.Approximately5-10%of angiosperms aremones, genetic factors, and epigenetic modificationsdetermine plant sex6.dioecious,with either heteromorphic or homomorphicsex chromosomes2. All sex chromosomes in dioeciousPopulus and Salix are sister genera in the Salicaceaeplants harbor a sex determination region (SDR), which isfamily.These two lineages diverged at least 45 millionyears ago7.8.The chromosomenumber of willows andcharacterized by suppressed recombination,leading tohaplotype divergence.Pseudo-autosomal regions (PARs)poplars is the same, and they do not exhibit a morpho-iogically differentiated sex chromosome.10. The Salica-ceae family,whose young sex chromosomes have evolvedfrom different autosomes, provides a valuable compara-Correspondence: Yingnan Chen (chenyingnan@njfu.edu.cn)ITheKeyLabofCultivarInnovationand Germplasm Improvementoftive system for studying sex differentiation in plantslSalicaceae, College of Forestry, Nanjing Forestry University, Nanjing 210037Multiple studies have reported that chromosome XIX isChinaThese authors contributed equally: Wei Li, Huaitong Wu@ The Author(s) 2020Open Access This aricleislicenedunderaCreativeCommons Atribution4.ntemationalLicene,whichpemits use,haringadaptation,distributionand reproduction@0in any medium or format, as long as you give appropriate credit to the original author(s) and the souurce, provide a link to the Creative Commons license, and indicate ifchangesweremadeTeimagesorotherthirdpartymaterialinthisarticleareincludedinthearticle'sCreativeCommonslicenseuness indicatedotherwiseinacreditlinetohemateriafmaterial is not included in the artide's Creative Cormmons license use is not permitted by statutory regulation or exceeds the permitted use, you wil need to obtainandyourintendedtpermissiondirectyfrom thecopyright holder.To view a copyof thislicense,visithttp:/creativecommons.org/licenses/by/4.0v
Li et al. Horticulture Research (2020) 7:64 Horticulture Research https://doi.org/10.1038/s41438-020-0289-1 www.nature.com/hortres ARTICLE Open Access Fine mapping of the sex locus in Salix triandra confirms a consistent sex determination mechanism in genus Salix Wei Li1 , Huaitong Wu1 , Xiaoping Li1 , Yingnan Chen1 and Tongming Yin1 Abstract Salix triandra belongs to section Amygdalinae in genus Salix, which is in a different section from the willow species in which sex determination has been well studied. Studying sex determination in distantly related willow species will help to clarify whether the sexes of different willows arise through a common sex determination system. For this purpose, we generated an intraspecific full-sib F1 population for S. triandra and constructed high-density genetic linkage maps for the crossing parents using restriction site-associated DNA sequencing and following a two-way pseudo-testcross strategy. With the established maps, the sex locus was positioned in linkage group XV only in the maternal map, and no sex linkage was detected in the paternal map. Consistent with previous findings in other willow species, our study showed that chromosome XV was the incipient sex chromosome and that females were the heterogametic sex in S. triandra. Therefore, sex in this willow species is also determined through a ZW sex determination system. We further performed fine mapping in the vicinity of the sex locus with SSR markers. By comparing the physical and genetic distances for the target interval encompassing the sex determination gene confined by SSRs, severe recombination repression was revealed in the sex determination region in the female map. The recombination rate in the confined interval encompassing the sex locus was approximately eight-fold lower than the genome-wide average. This study provides critical information relevant to sex determination in S. triandra. Introduction Flowering plants have evolved a considerably more complex sex determination system than animals, which have distinct germlines. Sexual diversity can first be observed in the various floral forms, ranging from hermaphroditic to monoecious and dioecious or even trioecious1 . Approximately 5–10% of angiosperms are dioecious, with either heteromorphic or homomorphic sex chromosomes2 . All sex chromosomes in dioecious plants harbor a sex determination region (SDR), which is characterized by suppressed recombination, leading to haplotype divergence3 . Pseudo-autosomal regions (PARs) are also present in plant sex chromosomes, with the physical lengths of PARs varying among different plants4 . Different sex determination systems, including male heterogamety (XY system) and female heterogamety (ZW system), have been described in dioecious plants5 . The overall interactions among elements such as plant hormones, genetic factors, and epigenetic modifications determine plant sex6 . Populus and Salix are sister genera in the Salicaceae family. These two lineages diverged at least 45 million years ago7,8 . The chromosome number of willows and poplars is the same, and they do not exhibit a morphologically differentiated sex chromosome9,10. The Salicaceae family, whose young sex chromosomes have evolved from different autosomes, provides a valuable comparative system for studying sex differentiation in plants11. Multiple studies have reported that chromosome XIX is © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. Correspondence: Yingnan Chen (chenyingnan@njfu.edu.cn) 1 The Key Lab of Cultivar Innovation and Germplasm Improvement of Salicaceae, College of Forestry, Nanjing Forestry University, Nanjing 210037, China These authors contributed equally: Wei Li, Huaitong Wu 1234567890():,; 1234567890():,; 1234567890():,; 1234567890():,;
Li et al. Horticulture Research (2020)7:64Page 2 of 12the incipient sex chromosome in poplars'2, and both xx/Materials and methodsxy13,14 and ZW/Zzz12,15 sex determination systems havePlant materials and DNA extractionbeen observed in different poplar species. In poplars, theThe mapping population, which consisted of 152 full-sex-determining locus has been mapped to either thesib F progenies, was established in 2013 by crossing theperitelomeric region'2 or the centromeric region ofS.triandra female clone"DB447"with the male clonechromosome XIx14,16. However, no sex-determining"DB134""DB447"and"DB134"weresampledfromthegenes have been identified among the Populus speciesMaoer Mountain in Heilongjiang Province of Chinaanalyzed so far, and only candidate genes have been(permissions were granted by the local administration).reported.Forexample,themale-specificTOZ19genewasThe parental clones and progeny were maintained at theidentified asa sexdetermination candidate inP.tremulaBaima Forest Farm in Lishui in Jiangsu Province, China.and P. tremuloides'7. The sex determination region inGenomic DNA was extracted from theyoung leaves ofpoplar was found to contain a large number of Reach individual by using an E.ZN.A. Plant DNA Kitgenesi2,18, and thus, it was hypothesized that the emer-(Omega Bio-tek, Norcross, GA, USA).DNA quality wasgence of the sex determination region might have beenassessed by1%agarosegel electrophoresis, and theDNAdue to selective pressure arising from sex-specific floralconcentration was measured with a Nanodrop 2000pathogens19.In P. balsamifera, the PbRR9 gene exhibits(Thermo Scientific, MA, USA).male-biased methylation, indicating a role of epigeneticregulation in poplar sex determination20Library construction and sequencingIn willows, the sex determination locus has been con-The whole-genome sequencing (WGS)was conductedsistently mapped to chromosome XV, and only theZWwith the two crossing parents, and restriction site-sex determination system has been observed21-24. Theassociated DNA (RAD) sequencing wasperformedforreconstruction of alternate haplotypes in the SDR152 progenies of the mapping population. For the cross-revealed sequence divergence between the Z and Wingparents,twopaired-end libraries with 300-500bpchromatids22,and nohomologous genes in the SDR haveinsertsizeswereconstructedseparatelyaccordingtothebeen found between the willow and poplar223. Pucholtstandard protocol of Illumina (llumina).For each pro-et al.23 localized the sex determination locus to a 2.5-Mbgeny, the RAD library was prepared following the methoddescribed by Baird et al.30 with minor modifications.genomic region in S.viminalis that harbors 48 protein-coding genes. Further study showed that the SDR inBriefly,300 ngof genomicDNAfrom eachprogenywasS.viminalis is of limited size (~804kb)and exhibits adigested separately by using 5 U of Tap I (Takara Bio,higher SNP densityin females25.Pseudogenization andJapan) at 37°C for 60 min, and then the P1 adapter, whichthe accumulation of repetitive elements in the SDR sug-contained a forward primer site,an Illumina sequencinggest that the fundamental process of sex chromosomeprimer site and a barcode (4-8bp), was ligated to theformation occurred very swiftly after recombinationfragments. Subsequently, the Pi-ligated fragments of allceased'l. In a recent study, the SDR of S. purpurea wassamples (1μL each) were pooled and then randomlyfound to contain large palindromic repeats, and thesheared (Bioruptor) to an average size of 500bp.The entire sample was separated using 1% agarose gelSpRR9 gene was considered a putative candidate forcontrolling sexdeterminationthroughthemodulationofelectrophoresis,and the DNA fraction corresponding tothe cytokinin signaling pathway26. Whether willow exhi-300-700 bp was isolated using an E.Z.N.A. Gel Extractionbits a relatively conserved sex determination system needsKit (Omega Bio-tek,USA).Thepurified fragments wereto be explored in more willow species.subjected to end repair and the 3'-end addition of dATPS. triandra is a shrub willow belonging to sectionoverhangs,followed by theligation of a P2 adapter con-Amygdalinae in genus Salix.It is distributed widelytaining a reverse primer site and an Illumina sequencingfrom Japan to western Europe27.More recently,S. tri-primer site. Finally, the RAD library was selectively enrichedandra has received attention because of its potentialby PCR amplification with the P1-forward primer and P2-implications in insect resistance28.29. Due to the repro-reverseprimer,andthe300-700bpampliconswerepurifiedagain with the Gel Extraction Kit (Omega Bio-tech, USA).ductiveefficiency,easycultivation.andsmallgenome,Striandra is suitable for obtaining additional informationBoth WGS and RAD sequencingwere performed on theto better understand sex determination in dioeciousIluminaHiSeg X Ten platform(Illumina,USA) atShanghai Major Biological Medicine Technology follow-plants. In this study, S. triandra is used to provide newevidence of the sex determination mechanism in willow.ing the manufacturer's protocol (Illumina).Our purpose is to clarify whether the previously repor-ted willow sex determination system alsofunctions in aSequence analysis and nucleotide variant identificationwillow species belonging to a different section of genusRaw reads were assigned to each individual based on theSalix.unique barcodes and then subjected to quality control
the incipient sex chromosome in poplars12, and both XX/ XY13,14 and ZW/ZZ12,15 sex determination systems have been observed in different poplar species. In poplars, the sex-determining locus has been mapped to either the peritelomeric region12 or the centromeric region of chromosome XIX14,16. However, no sex-determining genes have been identified among the Populus species analyzed so far, and only candidate genes have been reported. For example, the male-specific TOZ19 gene was identified as a sex determination candidate in P. tremula and P. tremuloides17. The sex determination region in poplar was found to contain a large number of R genes12,18, and thus, it was hypothesized that the emergence of the sex determination region might have been due to selective pressure arising from sex-specific floral pathogens19. In P. balsamifera, the PbRR9 gene exhibits male-biased methylation, indicating a role of epigenetic regulation in poplar sex determination20. In willows, the sex determination locus has been consistently mapped to chromosome XV, and only the ZW sex determination system has been observed21–24. The reconstruction of alternate haplotypes in the SDR revealed sequence divergence between the Z and W chromatids22, and no homologous genes in the SDR have been found between the willow and poplar22,23. Pucholt et al.23 localized the sex determination locus to a 2.5-Mb genomic region in S. viminalis that harbors 48 proteincoding genes. Further study showed that the SDR in S. viminalis is of limited size (~804 kb) and exhibits a higher SNP density in females25. Pseudogenization and the accumulation of repetitive elements in the SDR suggest that the fundamental process of sex chromosome formation occurred very swiftly after recombination ceased11. In a recent study, the SDR of S. purpurea was found to contain large palindromic repeats, and the SpRR9 gene was considered a putative candidate for controlling sex determination through the modulation of the cytokinin signaling pathway26. Whether willow exhibits a relatively conserved sex determination system needs to be explored in more willow species. S. triandra is a shrub willow belonging to section Amygdalinae in genus Salix. It is distributed widely from Japan to western Europe27. More recently, S. triandra has received attention because of its potential implications in insect resistance28,29. Due to the reproductive efficiency, easy cultivation, and small genome, S. triandra is suitable for obtaining additional information to better understand sex determination in dioecious plants. In this study, S. triandra is used to provide new evidence of the sex determination mechanism in willow. Our purpose is to clarify whether the previously reported willow sex determination system also functions in a willow species belonging to a different section of genus Salix. Materials and methods Plant materials and DNA extraction The mapping population, which consisted of 152 fullsib F1 progenies, was established in 2013 by crossing the S. triandra female clone “DB447” with the male clone “DB134”. “DB447” and “DB134” were sampled from the Maoer Mountain in Heilongjiang Province of China (permissions were granted by the local administration). The parental clones and progeny were maintained at the Baima Forest Farm in Lishui in Jiangsu Province, China. Genomic DNA was extracted from the young leaves of each individual by using an E.Z.N.A. Plant DNA Kit (Omega Bio-tek, Norcross, GA, USA). DNA quality was assessed by 1% agarose gel electrophoresis, and the DNA concentration was measured with a Nanodrop 2000 (Thermo Scientific, MA, USA). Library construction and sequencing The whole-genome sequencing (WGS) was conducted with the two crossing parents, and restriction siteassociated DNA (RAD) sequencing was performed for 152 progenies of the mapping population. For the crossing parents, two paired-end libraries with 300–500 bp insert sizes were constructed separately according to the standard protocol of Illumina (Illumina). For each progeny, the RAD library was prepared following the method described by Baird et al.30 with minor modifications. Briefly, 300 ng of genomic DNA from each progeny was digested separately by using 5 U of Tap I (Takara Bio, Japan) at 37 °C for 60 min, and then the P1 adapter, which contained a forward primer site, an Illumina sequencing primer site and a barcode (4–8 bp), was ligated to the fragments. Subsequently, the P1-ligated fragments of all samples (1 µL each) were pooled and then randomly sheared (Bioruptor) to an average size of 500 bp. The entire sample was separated using 1% agarose gel electrophoresis, and the DNA fraction corresponding to 300–700 bp was isolated using an E.Z.N.A. Gel Extraction Kit (Omega Bio-tek, USA). The purified fragments were subjected to end repair and the 3′-end addition of dATP overhangs, followed by the ligation of a P2 adapter containing a reverse primer site and an Illumina sequencing primer site. Finally, the RAD library was selectively enriched by PCR amplification with the P1-forward primer and P2- reverse primer, and the 300–700 bp amplicons were purified again with the Gel Extraction Kit (Omega Bio-tech, USA). Both WGS and RAD sequencing were performed on the Illumina HiSeq X Ten platform (Illumina, USA) at Shanghai Major Biological Medicine Technology following the manufacturer’s protocol (Illumina). Sequence analysis and nucleotide variant identification Raw reads were assigned to each individual based on the unique barcodes and then subjected to quality control, Li et al. Horticulture Research (2020) 7:64 Page 2 of 12
Li et al. Horticulture Research (2020)7:64Page3of 12adapter trimming and read filtering by using FASTPmatrix of each parent and scored as a testcross marker.(version0.6.0,https://github.com/OpenGene/fastp).ReadsBased on the established genetic maps, the sex locus wasthat contained >40% low-quality bases (base quality valuemapped as a segregating morphological marker with<15)or >10% Nbases were discarded.Sequences shorterMapMaker software (version3.0).To verify the accuracythan 30 bp after trimming were also removed.of thepositioning interval,wedesigned SSRmarkers witha physical distance of 4 Mb upstream and downstreamThe resulting high-quality reads were mapped to thefrom two SNP markers that were completely linkedreference genome of S. purpurea vl.o (DOE-JGI, http://phytozome.jgi.doe.gov/pz/portal.html#linfo?alias= Org--with sex.Spurpurea) by using BWA (version 0.7.16, http://bio-bwa. sourceforge.net/) software3i with the default parameters.ResultsGATK Haplotype Caller32 was used to call nucleotideDNA sequencing dataFor each parent, WGS yielded 9.65Gb of cleanvariants, including SNPs and InDels, which were furtherfiltered according to GATK Best Practices33sequencing data on average, and the sequencing depthwas ~20xgenome coverage (Fig.1).After quality control,Linkagemapconstructiona total of 136.96 M clean reads were obtained from theTo obtain high-quality linkage maps, all the filteredtwoparents,with anaverageQ30 ratioof90.03%and angenetic markers were further screened based on the fol-averageguanine-cytosine(GC)content of 37.25%(Table1).For the 152F offspring,a total of 504.21 Gb of cleanlowing criteria: (1) average sequence depth >5x in thedata were generated, including 3622.24 M of high-qualityparents and >4x in the progeny; (2) heterozygous as leastin one parent; (3)present in ≥70% progeny; and (4) fol-clean reads with a length of 150bp (Table 1).The numberlowing a Mendelian segregation ratio. Markers with sig-of the clean reads ranged from i1.03 to 53.06M amongnificant segregation distortion (test,P<0.05)weredifferentoffspring,with an average of 23.83M.Theexcludedfromlinkagemapconstructionaverage sequencing depth for each progeny was approxi-The integrated linkage analysis was performed by usingmately 7.53x, varying from 4.06x to 16.78x (Fig.1).TheJoinMap5.0(https://www.kyazma.nl/index.php/JoinMap/averageQ30ratiowas88.88%,andtheGC contentwas), and a logarithm of odds (LOD) score threshold of 4.041.58% (Table 1).was employed to establish linkagegroups (LGs).Thefemale and malemaps were constructed with a two-wayNucleotide variant discovery and genotypingpseudo-testcross strategy.The LGs were nominatedThe high-quality reads obtained from all samples wereaccording to thealignment of the mapped markers withseparately mapped to the reference genome of S.purpureathe S.purpurea vl.o.genome assembly.The genetic dis-v1.0, and the mapped ratio ranged from 46.89% to 88.04%,tance between markers was estimated using the Kosambiwithan averageof80.34%(SupplementaryTable S1).Allmapping function34.The marker distribution in each LGmappedreadswereusedforSNPcalling,andatotalofwas analyzed using the sliding window (10cM)1,150,885 putative nucleotide variant loci were detected inapproach35.Thequality of thegenetic map was assessedboth parents.Based on genotyping information and thestringent filtering criteria described in the 'Materials andusing a haplotypemap and a heatmap3methodssection,22,830 high-quality markers wereLinkage analysis of the sex locusretained from thewholeF,population,including20,695The sex of the plants was visually recorded for the 152SNPs and 2135 InDels (Table2).Among thesemarkersprogenies.Among these progenies,77werefemale and 759188(8301SNPsand887InDels)wereonlymaternallywere male. The phenotypic data were included in the datainformative (nnxnp), 9089 (8297 SNPs and 792 InDels)nl1Fig. 1 Sequencing depth of each sample. The sampled accessions are indicated on the x-axis
adapter trimming and read filtering by using FASTP (version 0.6.0, https://github.com/OpenGene/fastp). Reads that contained >40% low-quality bases (base quality value <15) or >10% N bases were discarded. Sequences shorter than 30 bp after trimming were also removed. The resulting high-quality reads were mapped to the reference genome of S. purpurea v1.0 (DOE-JGI, http:// phytozome.jgi.doe.gov/pz/portal.html#!info?alias= Org_- Spurpurea) by using BWA (version 0.7.16, http://bio-bwa. sourceforge.net/) software31 with the default parameters. GATK Haplotype Caller32 was used to call nucleotide variants, including SNPs and InDels, which were further filtered according to GATK Best Practices33. Linkage map construction To obtain high-quality linkage maps, all the filtered genetic markers were further screened based on the following criteria: (1) average sequence depth >5× in the parents and >4× in the progeny; (2) heterozygous as least in one parent; (3) present in ≥70% progeny; and (4) following a Mendelian segregation ratio. Markers with significant segregation distortion (χ2 test, P < 0.05) were excluded from linkage map construction. The integrated linkage analysis was performed by using JoinMap 5.0 (https://www.kyazma.nl/index.php/JoinMap/ ), and a logarithm of odds (LOD) score threshold of 4.0 was employed to establish linkage groups (LGs). The female and male maps were constructed with a two-way pseudo-testcross strategy. The LGs were nominated according to the alignment of the mapped markers with the S. purpurea v1.0. genome assembly. The genetic distance between markers was estimated using the Kosambi mapping function34. The marker distribution in each LG was analyzed using the sliding window (10 cM) approach35. The quality of the genetic map was assessed using a haplotype map and a heat map36. Linkage analysis of the sex locus The sex of the plants was visually recorded for the 152 progenies. Among these progenies, 77 were female and 75 were male. The phenotypic data were included in the data matrix of each parent and scored as a testcross marker. Based on the established genetic maps, the sex locus was mapped as a segregating morphological marker with MapMaker software (version 3.0). To verify the accuracy of the positioning interval, we designed SSR markers with a physical distance of 4 Mb upstream and downstream from two SNP markers that were completely linked with sex. Results DNA sequencing data For each parent, WGS yielded 9.65 Gb of clean sequencing data on average, and the sequencing depth was ~20× genome coverage (Fig. 1). After quality control, a total of 136.96 M clean reads were obtained from the two parents, with an average Q30 ratio of 90.03% and an average guanine-cytosine (GC) content of 37.25% (Table 1). For the 152 F1 offspring, a total of 504.21 Gb of clean data were generated, including 3622.24 M of high-quality clean reads with a length of 150 bp (Table 1). The number of the clean reads ranged from 11.03 to 53.06 M among different offspring, with an average of 23.83 M. The average sequencing depth for each progeny was approximately 7.53×, varying from 4.06× to 16.78× (Fig. 1). The average Q30 ratio was 88.88%, and the GC content was 41.58% (Table 1). Nucleotide variant discovery and genotyping The high-quality reads obtained from all samples were separately mapped to the reference genome of S. purpurea v1.0, and the mapped ratio ranged from 46.89% to 88.04%, with an average of 80.34% (Supplementary Table S1). All mapped reads were used for SNP calling, and a total of 1,150,885 putative nucleotide variant loci were detected in both parents. Based on genotyping information and the stringent filtering criteria described in the ‘Materials and methods' section, 22,830 high-quality markers were retained from the whole F1 population, including 20,695 SNPs and 2135 InDels (Table 2). Among these markers, 9188 (8301 SNPs and 887 InDels) were only maternally informative (nn×np), 9089 (8297 SNPs and 792 InDels) Fig. 1 Sequencing depth of each sample. The sampled accessions are indicated on the x-axis Li et al. Horticulture Research (2020) 7:64 Page 3 of 12
Li et al. Horticulture Research (2020)7:64Page 4 of 12Table1Statistics of the WGS and RAD sequencing results in S. triandraSampleRaw data (Gb)Clean data (Gb)Clean reads (M)Clean data percentage (%)Q30 (%)GC (%)10.949.9070.0790.49DB13490.4336.40DB44710.499.4166.8989.7038.1089.63504.2188.88Progeny603.693622.2483.5241.58Total625.12523.523759.20-Haplotype maps, which revealed the missing data andTable2Summary of themarkertypes and numbers ofrecombination events of each individual intuitively,weremarkers used forgenetic map constructiongenerated for each LG in the 152 offspring (Supplemen-tary Fig.S2).The percentages of missing data and doubleSNP numberInDel numberMarker typecrossovers were less than 0.30% and 0.23%, respectively.62efxeg (1:1:1:1)Based on the pairwise recombination values of the mar-kers grouped in each LG, heat maps were generated for454hkxhk (1:2:1)4091the 19 LGs (Supplementary Fig. S3). All the heat maps8297792Imxll (1:1)demonstrated a clear trend in which the pairwise linkage8878301nnxnp (1:1)generally decreased with an increase in genetic distanceTotal213520,695between the mapped markers, indicating that the markersin each LG were precisely mapped and ordered.Note: TheImxil and'nnxnp'segregation types represent markers that areheterozygous only in the paternal or matemal parent, respectively.Collinearitybetween the genetic map and reference,andthewereonlypaternallyinformative(lmxll),genomeremaining4553(4097SNPs and456InDels)were inter-All the mapped markers were aligned to the S. purpureacrossing markers in both parents (Table 2).vl.o genome to estimate the physical distances of themarkers and to assess the collinearity between the geneticConstruction and evaluation of the high-density linkagemap and reference genome.In general, high collinearitymapwas observed between the markers and the correspondingWe constructed the paternal, maternal and consensuschromosomes (Fig.4),with the Spearman rank correla-tion coeffcient ranging from 0.99-1.00. However, theremaps for S. triandra separately.The maternal map was2193.78cMinlength,withLGsizesranging fromwerealso LGs showing discrepancies in some narrow91.25cM (LG13) to 141.64cM (LG14) (Table 3).Theregions (e.g, LGs 14, 18 and 19), which might have beenpaternal mapwas2381.93cMinlength,withLG sizesdue to different recombination rates, missing data orvarying from 90.34 cM (LG17) to 148.70 cM (LG1) (Tablecompromised marker orders in the consensus map.3).The mapped markers in the integrated genetic mapAll 22,830 markers that segregated as efxeg, hkxhk,covered at least 96.95% of the physical length of theImxll or nnxnp were used to generate a consensus mapreference genome (Table 4).The genetic-to-physical dis-tance ratios ranged from 3.78 cM/Mb (LG6) to 11.65 cM/for S.triandra. At the LOD threshold of 4.0, all of thesemarkers were successfully grouped into 19 LGs (Fig. 2 andMb (LG14) (Table 4). The marker density along eachchromosome ranged from 28.81 to 70.64 markers/Mb,Fig.S1).The number of LGs was consistent with thehaploid chromosomenumber ofwillows (2n=38).Theaveraging 49.18 markers/Mb (Table 4).established consensus map covered a genetic distance of2239.71cM, with LG sizes varying from 96.87 cM (LG6)Mappingthesexlocusand genecontent intheconfinedto 145.29 cM (LG14) (Table 3).The marker distributiongenetic intervalalong each LG was evaluated by counting the number ofThe mapping results showed that the sex locus couldmarker bins and all mapped markers using a slidingonly be mapped in the maternal map 27.07cM from thewindow of 10 cM. The average number of marker binstelemetric end of chromosome XV (Fig.5a), andnoranged from 18.00 to 69.87,with theaverage number oflinkage with sex was detected in the paternal map. Usingmapped markers varying from 43.93 to 154.27.The win-the sequences of the SNP markers co-segregating withdowwith thehighest density (26.7markersper cM)wassex, we obtained genome sequences in the confinedfound in LG16 (Fig. 3).genetic interval and developed upstream and downstream
were only paternally informative (lm×ll), and the remaining 4553 (4097 SNPs and 456 InDels) were intercrossing markers in both parents (Table 2). Construction and evaluation of the high-density linkage map We constructed the paternal, maternal and consensus maps for S. triandra separately. The maternal map was 2193.78 cM in length, with LG sizes ranging from 91.25 cM (LG13) to 141.64 cM (LG14) (Table 3). The paternal map was 2381.93 cM in length, with LG sizes varying from 90.34 cM (LG17) to 148.70 cM (LG1) (Table 3). All 22,830 markers that segregated as ef×eg, hk×hk, lm×ll or nn×np were used to generate a consensus map for S. triandra. At the LOD threshold of 4.0, all of these markers were successfully grouped into 19 LGs (Fig. 2 and Fig. S1). The number of LGs was consistent with the haploid chromosome number of willows (2n = 38). The established consensus map covered a genetic distance of 2239.71 cM, with LG sizes varying from 96.87 cM (LG6) to 145.29 cM (LG14) (Table 3). The marker distribution along each LG was evaluated by counting the number of marker bins and all mapped markers using a sliding window of 10 cM. The average number of marker bins ranged from 18.00 to 69.87, with the average number of mapped markers varying from 43.93 to 154.27. The window with the highest density (26.7 markers per cM) was found in LG16 (Fig. 3). Haplotype maps, which revealed the missing data and recombination events of each individual intuitively, were generated for each LG in the 152 offspring (Supplementary Fig. S2). The percentages of missing data and double crossovers were less than 0.30% and 0.23%, respectively. Based on the pairwise recombination values of the markers grouped in each LG, heat maps were generated for the 19 LGs (Supplementary Fig. S3). All the heat maps demonstrated a clear trend in which the pairwise linkage generally decreased with an increase in genetic distance between the mapped markers, indicating that the markers in each LG were precisely mapped and ordered. Collinearity between the genetic map and reference genome All the mapped markers were aligned to the S. purpurea v1.0 genome to estimate the physical distances of the markers and to assess the collinearity between the genetic map and reference genome. In general, high collinearity was observed between the markers and the corresponding chromosomes (Fig. 4), with the Spearman rank correlation coefficient ranging from 0.99–1.00. However, there were also LGs showing discrepancies in some narrow regions (e.g., LGs 14, 18 and 19), which might have been due to different recombination rates, missing data or compromised marker orders in the consensus map. The mapped markers in the integrated genetic map covered at least 96.95% of the physical length of the reference genome (Table 4). The genetic-to-physical distance ratios ranged from 3.78 cM/Mb (LG6) to 11.65 cM/ Mb (LG14) (Table 4). The marker density along each chromosome ranged from 28.81 to 70.64 markers/Mb, averaging 49.18 markers/Mb (Table 4). Mapping the sex locus and gene content in the confined genetic interval The mapping results showed that the sex locus could only be mapped in the maternal map 27.07 cM from the telemetric end of chromosome XV (Fig. 5a), and no linkage with sex was detected in the paternal map. Using the sequences of the SNP markers co-segregating with sex, we obtained genome sequences in the confined genetic interval and developed upstream and downstream Table 1 Statistics of the WGS and RAD sequencing results in S. triandra Sample Raw data (Gb) Clean data (Gb) Clean reads (M) Clean data percentage (%) Q30 (%) GC (%) DB134 10.94 9.90 70.07 90.49 90.43 36.40 DB447 10.49 9.41 66.89 89.70 89.63 38.10 Progeny 603.69 504.21 3622.24 83.52 88.88 41.58 Total 625.12 523.52 3759.20 — —— Table 2 Summary of the marker types and numbers of markers used for genetic map construction Marker type SNP number InDel number ef×eg (1:1:1:1) 6 2 hk×hk (1:2:1) 4091 454 lm×ll (1:1) 8297 792 nn×np (1:1) 8301 887 Total 20,695 2135 Note: The ‘lm×ll’ and ‘nn×np’ segregation types represent markers that are heterozygous only in the paternal or maternal parent, respectively. Li et al. Horticulture Research (2020) 7:64 Page 4 of 12
Page 5 of 12Li et al. Horticulture Research (2020)7:64Table 3Summaryof thelinkagemapofS.triandraMaternal mapPaternal mapConsensus mapGroupTotal markerTotal distance (cM)GroupTotal markerTotal distance (cM)GroupTotal markerTotal distance (cM)837805LG11355FLG1118.00MLG1148.70143.26882FLG2788132.74MLG2124.02LG21402142.75713LG3FLG3725136.62MLG3137.281183103.54481MLG4549LG4869FLG4101.33136.98142.71793LGSFLG5782128.83MLG5104.401296110.57872794FLG6129.78MLG6125.72LG6138796.87FLG7602138.12MLG7618140.52LG7998102.74FLG8729113.96MLG8667136.27LG81164140.66912FLG952491.60MLG9576125.62LG998.541377FLG1077591.70MLG10827107.81LG10101.10FLG11800129.95MLG11827128.47LG111353116.95FLG12512127.17MLG12612103.98LG1296797.317707191231FLG1391.25MLG13117.28LG13129.91FLG14491141.64MLG14527123.03LG14874145.29FLG15810MLG15713LG151269105.68133.58125.36FLG1614011432123.74LG162314110.86MLG16144.95FLG17702105.81MLG1772590.34LG171170139.85FLG1866796.65MLG18580132.16LG181094120.19382FLG19374102.09MLG19142.02LG19615137.16Total13,6422193.78Total13,7412381.93Total22,8302239.71sex-linked simple sequence repeat (SSR) markers. In total,methylation. Salix_newGene_2 is a homologous gene ofseven SSR markers co-segregating with the sex locus wereLRR receptor-like serine/threonine-protein kinase ERL1generated in the SDR on chromosome XV of the femalein Arabidopsis thaliana, which is important for anther(Fig. 5c), and the confined interval encompassing the sexlobeformation.TheEVM0005130genecontainsa Myb-locus(IESL)wasboundedbySSRmarkerswssr304andlike DNA-binding domain that plays an important role inwssr470, with spacing of 5.9 cM (Fig. 5b). Based on theregulating anther and pollen development; EVM0045351contains a mitogen-activated protein kinase (MAP kinase)reference genome of S.purpurea, the confined IESL cor-responded to a 6.5-Mb genomic region on chromosomedomain that may function as a regulator of pollen devel-15. On average, a 1 cM genetic length contains a 140 kbopmentandgerminationsequence in the willow genome.Thus, the recombinationIn addition to the genes involved in flower organrate in the confined IESL of the female is approximatelydevelopment,flowerdevelopment-associatedmiR156waseight-fold lower than the genome-wide average.also found in the confined IESL.miR156 has emerged asThe target region harbored 249 genes. Gene Ontologythe most important regulator in the vegetative phase(GO) terms clarified that genes involved in metabolicchange and the vegetative-to-reproductive transition inboth Arabidopsis and maize. The candidate genes relatedprocesses,cellular processes, single organism processes,reproductive processes and biological organization wereto flower development are listed in Table 5.the mostrepresented groups (Fig. 6). Six genes wereDiscussionassociated with microtubule motor activity (GO:0003777),The Salicaceae family is a valuable model system forand six genes were associated with microtubule bindingactivity(GO:0008017).EVM0038350hasamethyl-revealing the origin and evolution of plant sex chromo-domain that may be related1toDNAtransferasesomes.These genera are widely distributed around the
sex-linked simple sequence repeat (SSR) markers. In total, seven SSR markers co-segregating with the sex locus were generated in the SDR on chromosome XV of the female (Fig. 5c), and the confined interval encompassing the sex locus (IESL) was bounded by SSR markers wssr304 and wssr470, with spacing of 5.9 cM (Fig. 5b). Based on the reference genome of S. purpurea, the confined IESL corresponded to a 6.5-Mb genomic region on chromosome 15. On average, a 1 cM genetic length contains a 140 kb sequence in the willow genome. Thus, the recombination rate in the confined IESL of the female is approximately eight-fold lower than the genome-wide average. The target region harbored 249 genes. Gene Ontology (GO) terms clarified that genes involved in metabolic processes, cellular processes, single organism processes, reproductive processes and biological organization were the most represented groups (Fig. 6). Six genes were associated with microtubule motor activity (GO:0003777), and six genes were associated with microtubule binding activity (GO:0008017). EVM0038350 has a methyltransferase domain that may be related to DNA methylation. Salix_newGene_2 is a homologous gene of LRR receptor-like serine/threonine-protein kinase ERL1 in Arabidopsis thaliana, which is important for anther lobe formation. The EVM0005130 gene contains a Myblike DNA-binding domain that plays an important role in regulating anther and pollen development; EVM0045351 contains a mitogen-activated protein kinase (MAP kinase) domain that may function as a regulator of pollen development and germination. In addition to the genes involved in flower organ development, flower development-associated miR156 was also found in the confined IESL. miR156 has emerged as the most important regulator in the vegetative phase change and the vegetative-to-reproductive transition in both Arabidopsis and maize. The candidate genes related to flower development are listed in Table 5. Discussion The Salicaceae family is a valuable model system for revealing the origin and evolution of plant sex chromosomes. These genera are widely distributed around the Table 3 Summary of the linkage map of S. triandra Maternal map Paternal map Consensus map Group Total marker Total distance (cM) Group Total marker Total distance (cM) Group Total marker Total distance (cM) FLG1 837 118.00 MLG1 805 148.70 LG1 1355 143.26 FLG2 788 132.74 MLG2 882 124.02 LG2 1402 142.75 FLG3 725 136.62 MLG3 713 137.28 LG3 1183 103.54 FLG4 481 101.33 MLG4 549 136.98 LG4 869 142.71 FLG5 782 128.83 MLG5 793 104.40 LG5 1296 110.57 FLG6 872 129.78 MLG6 794 125.72 LG6 1387 96.87 FLG7 602 138.12 MLG7 618 140.52 LG7 998 102.74 FLG8 729 113.96 MLG8 667 136.27 LG8 1164 140.66 FLG9 524 91.60 MLG9 576 125.62 LG9 912 98.54 FLG10 775 91.70 MLG10 827 107.81 LG10 1377 101.10 FLG11 800 129.95 MLG11 827 128.47 LG11 1353 116.95 FLG12 512 127.17 MLG12 612 103.98 LG12 967 97.31 FLG13 770 91.25 MLG13 719 117.28 LG13 1231 129.91 FLG14 491 141.64 MLG14 527 123.03 LG14 874 145.29 FLG15 810 105.68 MLG15 713 133.58 LG15 1269 125.36 FLG16 1401 110.86 MLG16 1432 123.74 LG16 2314 144.95 FLG17 702 105.81 MLG17 725 90.34 LG17 1170 139.85 FLG18 667 96.65 MLG18 580 132.16 LG18 1094 120.19 FLG19 374 102.09 MLG19 382 142.02 LG19 615 137.16 Total 13,642 2193.78 Total 13,741 2381.93 Total 22,830 2239.71 Li et al. Horticulture Research (2020) 7:64 Page 5 of 12