Zhang et al. Bot Stud (2019) 60:15
https://doi.org/10.1186/s40529-019-0263-0

ORIGINAL PAPER

Comparative transcriptome analysis reveals the genetic basis underlying the biosynthesis of polysaccharides in Hericium erinaceus

Nan Zhang1,2, Zongfu Tang1, Jun Zhang1, Xin Li1, Ziqian Yang1, Chun Yang1, Zhaofeng Zhang1 and Zuoxi Huang1*

Abstract

Background: Hericium erinaceus, also known as lion's mane mushroom, is a widely distributed edible and medicinal fungus in Asian countries. H. erinaceus harbors diverse bioactive metabolites with anticancer, immunomodulating, anti-inflammatory, antimicrobial, antihypertensive, antidiabetic and neuroprotective properties. Although the chemical synthesis processes of these bioactive metabolites are known, the biosynthetic processes remain unknown.

Results: In this study, we obtained the transcriptomes of six H. erinaceus strains using next-generation RNA sequencing and investigated the characteristics of the transcriptomes and the biosynthesis of bioactive compounds, especially polysaccharides. The transcriptomes ranged in size from 46.58 to 58.14 Mb, with the number of unigenes ranging from 20,902 to 37,259 across the six H. erinaceus strains. Approximately 60% of the unigenes were successfully annotated by comparing sequences against different databases, including the nonredundant (NR), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), clusters of orthologous groups for eukaryotic complete genomes (KOG) and Swiss-Prot databases. Most of the transcripts were putatively involved in signal transduction, carbohydrate metabolism, translation, transport and catabolism, and amino acid metabolism. Genes involved in polysaccharide biosynthesis were identified, and these genes encoded phosphoglucomutase (PGM), glucose phosphate isomerase (PGI), UDP-glucose pyrophosphorylase (UGP), glycoside hydrolase family proteins, glycosyltransferase family proteins and other proteins.
Moreover, the putative pathway for the intracellular polysaccharide biosynthesis of H. erinaceus was analyzed. Additionally, the open reading frames (ORFs) and simple sequence repeats (SSRs) were predicted from the transcriptome data of the six strains.

Conclusions: Overall, the present study may facilitate the discovery of polysaccharide biosynthesis processes in H. erinaceus and provide useful information for exploring the secondary metabolites in other members of the Basidiomycetes.

Keywords: Hericium erinaceus, RNA-Seq, Comparative transcriptome, Polysaccharide biosynthesis, Erinacines

© The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Open Access

*Correspondence: huangzx118@126.com
1 College of Life Sciences, Neijiang Normal University, Neijiang 641100, People's Republic of China
Full list of author information is available at the end of the article

Background

Hericium erinaceus, considered a delicacy in China since ancient times, is a valuable edible mushroom and one of the "top four treasures", together with cubilose, trepang and shark fins (Huang 2018). The growth of H. erinaceus is strongly influenced by environmental conditions, such as air circulation, light, temperature, humidity, and pH (Jiang et al. 2014), thus increasing the value of H. erinaceus. In 1959, the artificial cultivation of H. erinaceus was first reported in China (Huang 2018). H. erinaceus is generally a good source of nutrients and health-promoting compounds (Cohen et al. 2014; Feeney et al. 2014a; Feeney et al. 2014b). Therefore, H. erinaceus is popular in Asian countries for both culinary and medicinal purposes (Friedman 2015).

Due to its anticancer, immunomodulating, hypolipidemic, antioxidant, anti-inflammatory, antimicrobial, antihypertensive, antidiabetic and neuroprotective properties (Kim et al. 2011a, b, 2013; Khan et al. 2013), many studies on the chemical isolation and physiological functions of bioactive metabolites in H. erinaceus have been performed in recent years. Among the bioactive compounds in H. erinaceus, polysaccharides play a major role in the medicinal properties of H. erinaceus (Chen 2016). Guo et al. (2012) reported that H. erinaceus polysaccharides enhanced cellular immunity and enhanced T cell function inhibited by TGF-β1. Moreover, another study demonstrated that polysaccharides enhanced T cells and macrophages to accelerate antitumor effects (Wang et al. 2001). The crude water-soluble polysaccharides of H. erinaceus upregulated certain functional immunomodulating events mediated by activated macrophages, such as the production of nitric oxide (NO) and the expression of cytokines (IL-1β and TNF-β), which might be responsible for the anticancer properties of this mushroom (Lee et al. 2009). Additionally, a further study revealed that polysaccharides can significantly reduce the blood glucose concentration and affect the serum triglyceride and total cholesterol contents (Wang et al. 2005). In general, H. erinaceus polysaccharides can improve immunity, provide antitumor, antiaging and other effects and have broad applications.

Although the pharmacological molecular mechanisms of bioactive compounds of H. erinaceus have been researched, knowledge of the pathway involved in the biosynthesis of bioactive metabolites is limited by a lack of research. Chen et al. (2017) sequenced the genome in the monokaryotic mycelium, dikaryotic mycelium and fruiting body of H. erinaceus to investigate the biosynthesis of bioactive secondary metabolites from H. erinaceus. Zeng et al. (2018) identified numerous proteins involved in terpenoid, polyketide and sterol biosynthesis by proteome analysis of H. erinaceus. These two studies successfully provided a theoretical basis for elucidating the synthesis of active components. However, these two studies did not predict genes or proteins involved in the biosynthesis of polysaccharides, which are the most important substances in H. erinaceus.

In the present study, six H. erinaceus strains (HT-4903, GT-06, CC-02, PZH-05, TJH-03 and TD-04) from different regions of China were used for transcriptome sequencing to investigate the mechanism of H. erinaceus polysaccharide biosynthesis. The transcriptomes of the six strains were obtained by high-throughput sequencing on an Illumina platform. We identified a set of gene clusters associated with the biosynthesis of bioactive compounds, especially polysaccharide biosynthesis. A total of 13 genes involved in polysaccharide biosynthesis were identified, such as phosphoglucomutase (PGM), glucose phosphate isomerase (PGI) and UDP-glucose pyrophosphorylase (UGP), which are most important to polysaccharide production. Then, functional annotation, expression analysis, and open reading frame (ORF) and simple sequence repeat (SSR) predictions were performed to detect the characteristics of the transcriptome structure. Our study will provide insights into the biosynthetic pathways of bioactive compounds and will be very useful for improving compound production in H. erinaceus.

Methods

Origin of strains and culture conditions

The haploid monokaryotic strains of the H. erinaceus samples included H. erinaceus HT-4903, H. erinaceus CC-02 (purchased from the Jiang du tian da Institute of Edible Fungi, Jiangsu, China), H. erinaceus GT-06 (Fujian, China), H. erinaceus PZH-05 (Sichuan, China), H. erinaceus TJH-03 (Sichuan, China), and H. erinaceus TD-04 (Hubei, China).
Among these strains, H. erinaceus PZH-05 is a mutant strain and is mainly used for liquid fermentation processes. All of these strains were grown on potato dextrose agar (PDA) at room temperature for 3 weeks in darkness. The morphological characteristics of the six H. erinaceus strain samples are shown in Fig. 1 and Additional file 1: Table S1. In the third week of growth, mycelium samples were collected by scraping the top of the medium, and the samples were immediately frozen in liquid nitrogen and then stored at −80 °C for total RNA extraction. Three biological replicates were performed for each H. erinaceus strain.

Estimation of polysaccharide in H. erinaceus

The mycelium polysaccharides were extracted from the six H. erinaceus samples at the third week of growth. The mycelium obtained from each sample was dried after being scraped from the plates of the six strains. The dried mycelium (5 g) from each sample was further ground into a powder and resuspended in 20 volumes of water at 70 °C for 12 h. The supernatant was collected by centrifugation, concentrated by evaporation under reduced pressure, precipitated with 95% (v/v) ethanol to reach a final concentration of 80% (v/v) and then incubated at 4 °C for 12 h. The crude polysaccharides of each sample were obtained after centrifuging (4390×g, 20 min) and vacuum-drying (40 °C) the precipitate. The polysaccharide content was measured by the phenol-sulfuric acid method using glucose as a standard.

Library construction and RNA sequencing

Total RNA from each sample was isolated using the RNAprep Pure Plant Kit (BioTeke, China) following the
manufacturer's instructions. The purity and concentration of RNA were determined using a NanoDrop-2000 spectrophotometer (Thermo Scientific, USA). Equal amounts of RNA from each sample that belonged to the same strain were pooled for cDNA library construction. Stranded cDNA libraries were constructed using the NEBNext Ultra Directional RNA Library Prep Kit (cat#E7420, NEB, UK) according to the manufacturer's protocols. Briefly, mRNA was fragmented into 250–450 bp fragments, followed by first-strand cDNA synthesis. Then, dUTP was added as a marker during the synthesis of the second-strand cDNA. Finally, the double-strand cDNA was digested with uracil-DNA glycosylase (UDG) before the PCR. Thus, only the first strands of cDNA were retained and sequenced. Transcriptome sequencing was carried out on the HiSeq 4000 (Illumina) platform using a paired-end run (2 × 150 bp).

Transcriptome assembly and annotation

Raw reads were filtered by removing the adaptor sequences, reads with quality lower than Q20 and reads with poly-N. De novo transcriptome assembly for each strain was performed with Trinity software (Version 2.2.0) with default parameters (fixed k-mer value of 25) (Zhao et al. 2011). The expression levels of transcripts were normalized by calculating the fragments per kilobase of exon per million fragments mapped (FPKM) using RSEM software. The CDS and protein sequences of transcripts were predicted using TransDecoder (http://transdecoder.github.io/). The protein sequences were aligned to four public databases: NCBI nonredundant protein sequences (NR; https://www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/), Eukaryotic Orthologous Groups (KOG; https://genome.jgi.doe.gov/Tutorial/tutorial/kog.html), Swiss-Prot (a manually annotated and reviewed protein sequence database; http://www.ebi.ac.uk/uniprot), and Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology (KO; https://www.kegg.jp/kegg/ko.html) for functional annotation using the BLASTP program (cut-off E-value < 1 × 10^−5). The Gene Ontology (GO; http://geneontology.org/) annotation of the proteins was carried out using WEGO software based on the NR annotation (Ye et al. 2006).

The SSRs were detected using MIcroSAtellite identification tool (MISA) software (version 1.0). The minimum repeat numbers for motifs of mono-, di-, tri-, tetra-, penta-, and hexanucleotides were set as 10, 6, 5, 5, 5, and 5, respectively.

Prediction of genes involved in polysaccharide biosynthesis in H. erinaceus

A BLAST search was performed for the prediction of genes participating in polysaccharide biosynthesis. Then, the genes involved in sucrose, fructose, mannose, and galactose metabolism and shared across the six strains of H. erinaceus were detected by manual processing.

Fig. 1 The morphological characteristics of H. erinaceus dikaryotic mycelium on PDA medium. a H. erinaceus HT-4903, b H. erinaceus GT-06, c H. erinaceus CC-02, d H. erinaceus PZH-05, e H. erinaceus TJH-03, f H. erinaceus TD-04
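
The SSR thresholds given in the Methods (at least 10, 6, 5, 5, 5 and 5 repeats for mono- to hexanucleotide motifs) can be illustrated with a small regular-expression scan. This is only a minimal sketch of the idea, not the MISA program itself: the function names `find_ssrs` and `is_repeat_of_shorter_unit` are hypothetical, only perfect repeats are reported, and compound SSRs are ignored.

```python
import re

# Minimum repeat counts per motif length, taken from the Methods text
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_repeat_of_shorter_unit(motif):
    """Return True if the motif is itself built from a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (1-based start, motif, repeat count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_repeat_of_shorter_unit(motif):
                continue  # e.g. 'AA' x 6 is already reported as a mononucleotide run
            hits.append((match.start() + 1, motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence (invented for illustration): A x 12, AG x 6 and ACG x 5
toy = "GGC" + "A" * 12 + "CT" + "AG" * 6 + "TTC" + "ACG" * 5 + "T"
for start, motif, count in find_ssrs(toy):
    print(start, motif, count)
```

Skipping motifs that are repeats of a shorter unit keeps a long poly-A run from being counted again as a di- or trinucleotide SSR. A dedicated tool remains the better choice for real transcriptomes.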
According to the NR database and KEGG annotation, the genes involved in the polysaccharide metabolism pathway were obtained. Additionally, the important genes involved in polysaccharide biosynthesis reported in a previous study were analyzed through the NR annotation results. Statistical analyses were performed with Excel (2016). All of the data are expressed as the means and standard deviations of three replications.

Quantitative reverse transcriptase-PCR (qRT-PCR)

The HiScript® II Q RT SuperMix for qPCR (+gDNA wiper) Kit (Nanjing, China) was used according to the manufacturer's instructions to generate the first-strand cDNA after extracting total RNA from six samples subjected to RNA-Seq. Ten genes were selected to validate the reliability of the RNA-Seq data. The gene-specific primers were produced by Primer 5.0 (Additional file 2: Dataset 1) and synthesized by Sangon Biotech (Shanghai) Co., Ltd. (Shanghai, China). ChamQ™ SYBR® Color qPCR Master Mix (10 μL; Vazyme, Nanjing, China) was mixed with the gene-specific primers, sterilized water and the synthesized cDNA in a total reaction volume of 20 μL. Reactions were performed on a qTOWER 2.2 (Analytik Jena AG, Jena, Germany). The two-step quantitative RT-PCR program was performed at 95 °C for 30 s, followed by 40 cycles of 95 °C for 10 s and 60 °C for 30 s. The 2^(−ΔΔCt) method was used to calculate the relative expression level of each gene, and actin was selected as the reference gene for normalization (Livak and Schmittgen 2001). Each reaction was carried out with three biological replicates and three technical replicates.

Results

Illumina sequencing and de novo assembly

Raw data were generated by sequencing each H. erinaceus strain. After the reads were filtered and subjected to quality control, a total of 21 to 46 million clean reads were obtained for H. erinaceus HT-4903, GT-06, CC-02, PZH-05, TJH-03 and TD-04 (Table 1).
The quality of most bases along the reads was above Q30, and more than 96% of the reads had a quality score > Q30 (Table 1 and Additional file 1: Fig. S1). The contents of bases A and T were very similar, as were the C and G contents, suggesting a balance among bases across the reads (Additional file 1: Fig. S2). These results suggested that the high-quality clean reads could be used for subsequent analyses.

Then, all clean reads in each strain were de novo assembled into 36,945, 40,141, 36,065, 25,905, 47,294 and 40,590 transcripts for H. erinaceus HT-4903, GT-06, CC-02, PZH-05, TJH-03 and TD-04 using Trinity, respectively (Table 1). The total size of these transcripts ranged from 46.58 Mb in H. erinaceus PZH-05 to 58.14 Mb in H. erinaceus HT-4903. The N50 values of transcripts in the six strains were 2579 bp, 2220 bp, 2470 bp, 2946 bp, 1991 bp and 1990 bp, respectively. Then, the longest transcript of each gene was used as a unigene. After the redundant transcripts were removed, 22,618, 24,915, 22,284, 20,902, 37,259 and 28,640 unigenes with N50 values of 2195 bp, 1944 bp, 2111 bp, 2431 bp, 1668 bp and 1708 bp were obtained for H. erinaceus HT-4903, GT-06, CC-02, PZH-05, TJH-03 and TD-04, respectively (Table 1).

Functional annotation of the transcripts

Functional annotation of the predicted genes was performed using BLAST (Altschul et al. 1990) against the following five databases: GO (Ashburner et al. 2000), KEGG (Kanehisa et al. 2016), KOG (Tatusov et al. 2003), Swiss-Prot (Gasteiger et al. 2001) and NCBI NR protein databases. Depending on the database and strain, 19.01–65.98% of transcripts returned a BLAST hit that satisfied the E-value cut-off (E-value < 1 × 10^−5) in these five databases (Table 2). Only 8729–16,622 transcripts (33.70–40.95% of the total) were not matched with these databases (Table 2). Most of the transcripts in the six strains were successfully annotated in at least one database.

Then, we obtained the GO classification using these transcripts. Intriguingly, the distribution of annotated genes at the level-two GO terms in the six strains showed highly similar patterns (Fig. 2). In the GO classification, all of the genes annotated in the GO database were classified into three main categories and contained 46 level-two GO terms. The top five clustered classes in function were catalytic activity, metabolic process, binding, cellular process and membrane. The percentage of annotated genes in the top five clusters was more than 30% (Fig. 2). These highly enriched GO terms mainly referred to the maintenance of the basic regulation and metabolic functions of the six strains. Additionally, we performed KEGG pathway analysis to understand the biological functions and interactions among gene products. A total of 36 pathways in 5 categories were retrieved. In the KEGG pathway analysis, the top five clustered classes were signal transduction, carbohydrate metabolism, translation, transport and catabolism, and amino acid metabolism (Fig. 3).

Table 1 Summary of the sequencing and assembly of six H. erinaceus strain samples

Sequencing index                  HT-4903     GT-06       CC-02       PZH-05      TJH-03      TD-04
Total number of clean reads       28,172,192  44,081,115  24,313,274  21,587,820  22,681,715  46,117,401
Total number of clean bases (Gb)  8.32        13.06       7.15        6.36        6.67        13.52
GC content (%)                    56.77       56.62       56.51       57          56.13       53.36
Q30 content (%)                   97.21       97.07       97.24       96.77       97.13       97.28
Total number of transcripts       36,945      40,141      36,065      25,905      47,294      40,590
N50 value of transcripts (bp)     2579        2220        2470        2946        1991        1990
Total bases of transcripts (Mb)   58.14       55.24       55.51       46.58       53.11       47.69
Total number of unigenes          22,618      24,915      22,284      20,902      37,259      28,640
Median length of unigenes (bp)    549         498         574         1012        450         438
N50 value of unigenes (bp)        2195        1944        2111        2431        1668        1708
Total bases of unigenes (Mb)      26.13       25.66       25.47       31          33.54       25.82
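
The N50 values reported for transcripts and unigenes in Table 1 are a standard assembly statistic: the length L such that sequences of length L or longer together contain at least half of all assembled bases. The sketch below shows that definition in code. It assumes a plain list of sequence lengths as input, the function name `n50` is hypothetical, and it is not the script used by the authors.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of all bases are covered
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy lengths (invented for illustration), not data from the study
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 weights long sequences heavily, it can rise after short redundant transcripts are removed, or fall when only the longest isoform per gene is kept, as seen in the transcript versus unigene rows of Table 1.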
Table 2 Functional annotations of the de novo transcriptomes for HT-4903, GT-06, CC-02, PZH-05, TJH-03 and TD-04

Database     HT-4903           GT-06             CC-02             PZH-05            TJH-03            TD-04
NR           23,116 (62.57%)   24,517 (61.08%)   22,760 (63.11%)   17,093 (65.98%)   30,950 (65.44%)   23,399 (57.65%)
GO           15,162 (41.04%)   15,780 (39.31%)   14,917 (41.36%)   11,382 (43.94%)   20,775 (43.93%)   14,755 (36.35%)
KO           8662 (23.45%)     8893 (22.15%)     8489 (23.54%)     6004 (23.18%)     11,673 (24.68%)   8668 (21.36%)
KOG          9299 (25.17%)     9210 (22.94%)     9085 (25.19%)     7196 (27.78%)     11,383 (24.07%)   8489 (20.91%)
Swiss-Prot   7099 (19.22%)     7631 (19.01%)     7118 (19.74%)     7798 (30.10%)     12,695 (26.84%)   8353 (20.58%)
Unannotated  13,723 (37.14%)   15,422 (38.42%)   13,239 (36.71%)   8729 (33.70%)     16,256 (34.37%)   16,622 (40.95%)
Total        36,945 (100.00%)  40,141 (100.00%)  36,065 (100.00%)  25,905 (100.00%)  47,294 (100.00%)  40,590 (100.00%)

Fig. 2 GO classification of unigenes in the six H. erinaceus strains
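
The percentages in Table 2 are each count divided by the strain's total number of transcripts. As a rough check, the share of transcripts annotated in at least one database can be derived from the "Unannotated" and "Total" rows. The sketch below does this under the assumption that "Unannotated" means no hit in any of the five databases; the helper names `percent` and `annotated_share` are hypothetical.

```python
# Transcript totals and unannotated counts copied from Table 2
TOTAL = {"HT-4903": 36945, "GT-06": 40141, "CC-02": 36065,
         "PZH-05": 25905, "TJH-03": 47294, "TD-04": 40590}
UNANNOTATED = {"HT-4903": 13723, "GT-06": 15422, "CC-02": 13239,
               "PZH-05": 8729, "TJH-03": 16256, "TD-04": 16622}


def percent(count, total):
    """Express count as a percentage of total."""
    return 100.0 * count / total


def annotated_share(strain):
    """Percentage of transcripts with a hit in at least one database (assumed)."""
    return percent(TOTAL[strain] - UNANNOTATED[strain], TOTAL[strain])


for strain in TOTAL:
    print(f"{strain}: {annotated_share(strain):.2f}% annotated")
```

Under this assumption the annotated share lies between roughly 59% and 66% for all six strains, in line with the "approximately 60%" stated in the Abstract.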