Mol Genet Genomics (2015) 290:239–255
DOI 10.1007/s00438-014-0912-7

ORIGINAL PAPER

Genome-wide analysis of the MADS-box gene family in Brassica rapa (Chinese cabbage)

Weike Duan · Xiaoming Song · Tongkun Liu · Zhinan Huang · Jun Ren · Xilin Hou · Ying Li

Received: 29 June 2014 / Accepted: 28 August 2014 / Published online: 13 September 2014
© Springer-Verlag Berlin Heidelberg 2014

Communicated by S. Hohmann.

Electronic supplementary material The online version of this article (doi:10.1007/s00438-014-0912-7) contains supplementary material, which is available to authorized users.

W. Duan · X. Song · T. Liu · Z. Huang · J. Ren · X. Hou · Y. Li (*)
State Key Laboratory of Crop Genetics and Germplasm Enhancement, Ministry of Agriculture, Nanjing Agricultural University, Nanjing 210095, People’s Republic of China
e-mail: yingli@njau.edu.cn

W. Duan · X. Song · T. Liu · Z. Huang · J. Ren · X. Hou · Y. Li
Key Laboratory of Biology and Germplasm Enhancement of Crops in East China, Ministry of Agriculture, Nanjing Agricultural University, Nanjing 210095, People’s Republic of China

Abstract The MADS-box gene family is an ancient and well-studied transcription factor family that functions in almost every developmental process in plants. There are a number of reports about the MADS-box family in different plant species, but a systematic analysis of the MADS-box transcription factor family in Brassica rapa (Chinese cabbage) is still lacking. In this study, 160 MADS-box transcription factors were identified from the entire Chinese cabbage genome and compared with the MADS-box factors from 21 other representative plant species. A detailed list of MADS proteins from these 22 species was compiled. Phylogenetic analysis of the BrMADS genes, together with their Arabidopsis and rice counterparts, showed that the BrMADS genes were categorised into type I (Mα, Mβ, Mγ) and type II (MIKCC, MIKC*) groups, and the MIKCC proteins were further divided into 13 subfamilies. The Chinese cabbage type II group has 95 members, which is twice as many as the Arabidopsis type II group, indicating that the Chinese cabbage type II genes have been retained more frequently than the type I genes. Finally, RNA-seq transcriptome data and quantitative real-time PCR analysis revealed that BrMADS genes are expressed in a tissue-specific manner similar to Arabidopsis. Interestingly, a number of BrMIKC genes showed responses to different abiotic stress treatments, suggesting a function for some of the genes in these processes as well. Taken together, the characterization of the B. rapa MADS-box family presented here will certainly help in the selection of appropriate candidate genes and further facilitate functional studies in Chinese cabbage.

Keywords Abiotic stress · Chinese cabbage · Genome-wide analysis · MADS-box transcription factor · qRT-PCR

Introduction

MADS-box genes encode transcription factors that are involved in developmental control and signal transduction in eukaryotes (Riechmann and Meyerowitz 1997). These genes are found in fungi (Passmore et al. 1988), animals (Norman et al. 1988) and plants (Sommer et al. 1990; Yanofsky et al. 1990). They constitute a large gene family, which is named after a few of its earliest members: MCM1 (from yeast) (Passmore et al. 1988), AGAMOUS (from A. thaliana) (Yanofsky et al. 1990), DEFICIENS (from Antirrhinum majus) (Sommer et al. 1990) and SRF (from Homo sapiens) (Norman et al. 1988). Previous studies of the MADS-box genes have included thorough comparative analyses of their roles in plant growth and development. However, there are relatively few analyses of the response of these genes to stress conditions. Brassica rapa ssp. pekinensis (Chinese cabbage) is one of the subspecies of Brassica rapa. This subspecies, which originated in China, is one of the most economically significant vegetable crops
in Asia. Moreover, Chinese cabbage has become a vegetable that is grown worldwide due to its high yield and good quality. Thus, the growth, development and flowering time of this plant are significant for its yield. Recently, the Chinese cabbage (Chiifu-401-42) genome has been sequenced, and this sequence allows MADS-box genes to be analysed across the entire genome (Wang et al. 2011). This genome has undergone triplication events since its divergence from Arabidopsis (13–17 mya) (Wang et al. 2011); however, a high degree of sequence similarity and a conserved genome structure remain between these two species. These traits make B. rapa a good species in which to study the retention and ortholog groups of MADS-box genes during genome duplication events. Furthermore, plant growth and development are influenced greatly by numerous plant growth regulators and environmental factors.

MADS proteins are characterised by a conserved DNA-binding domain of 58–60 amino acids in the N-terminal region, known as the MADS domain, which binds to CArG boxes (Yanofsky et al. 1990). Based on phylogenetic analysis, the plant MADS gene family is divided into two large lineages, type I and type II, which were generated by an ancestral gene duplication event (Alvarez-Buylla et al. 2000; Becker and Theißen 2003). The type I genes encode SRF-like domain proteins, whereas type II genes encode MEF2-like proteins (De Bodt et al. 2003). The plant type II proteins are named MIKC after their four domains. In addition to the MADS (M) domain, MIKC-type proteins contain the I (intervening), K (keratin-like) and C (C-terminal) domains (Cho et al. 1999). The I domain contributes to dimer formation (Henschel et al. 2002). The K domain is characterised by a coiled-coil structure, which primarily mediates the dimerisation of MADS proteins (Díaz-Riquelme et al. 2009).
The C domain functions in transcriptional activation and in the formation of higher order protein complexes (Honma and Goto 2001). MIKC-type genes have been further divided into two subgroups, MIKCC and MIKC*, based on sequence divergence at the I domain (Henschel et al. 2002). The MIKC* genes encode proteins that tend to have longer I domains and a duplicated K domain. The type I lineage comprises genes with a relatively simple gene structure (only one or two exons) that lack the K domain and share common ancestors. The type I genes are subdivided into three groups, Mα, Mβ and Mγ, based on the sequence of the MADS domain and on the presence of additional motifs. The function of the type I genes appears to be restricted to female gametophyte development (AGL80 and AGL61) and seed development (PHE1, PHE2, AGL23, AGL28, AGL40, AGL62) (Köhler et al. 2003; Bemer et al. 2010; Colombo et al. 2008; Masiero et al. 2011).

Plant MIKC genes were first identified as floral organ identity genes in Antirrhinum majus and in Arabidopsis (Sommer et al. 1990; Yanofsky et al. 1990). Biologists have made great progress in elucidating the roles of these genes in plant development. Further genetic and molecular analyses of their biological functions have focused on flower organogenesis, for which these genes form the major components of the well-known ABCDE model: sepals (A + E), petals (A + B + E), stamens (B + C + E), carpels (C + E), and ovules (D + E) (Zahn et al. 2006). Briefly, a previous study of Arabidopsis MIKC genes classified these genes into five functional classes as follows: class A includes APETALA1 (AP1); class B includes PISTILLATA (PI) and AP3; class C includes AGAMOUS (AG); class D includes SEEDSTICK/AGAMOUS-LIKE11 (STK/AGL11); and class E includes SEPALLATA (SEP1, SEP2, SEP3, and SEP4) (Pinyopich et al. 2003).
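The combinatorial logic of the ABCDE model can be written down as a small lookup table. The following is only an illustrative sketch of the class combinations and representative genes listed above; the names `ABCDE_MODEL`, `CLASS_GENES` and `organ_for` are hypothetical and are not part of the study.

```python
# Organ identity as a function of the MADS-box classes active together,
# following the ABCDE combinations quoted in the text (Zahn et al. 2006).
ABCDE_MODEL = {
    frozenset("AE"): "sepal",
    frozenset("ABE"): "petal",
    frozenset("BCE"): "stamen",
    frozenset("CE"): "carpel",
    frozenset("DE"): "ovule",
}

# Representative Arabidopsis genes per class, as listed in the text.
CLASS_GENES = {
    "A": ["AP1"],
    "B": ["PI", "AP3"],
    "C": ["AG"],
    "D": ["STK/AGL11"],
    "E": ["SEP1", "SEP2", "SEP3", "SEP4"],
}


def organ_for(active_classes):
    """Return the organ specified by a set of active classes, or None
    if the combination is not one of the five listed in the model."""
    return ABCDE_MODEL.get(frozenset(active_classes))


if __name__ == "__main__":
    for combo in ("AE", "ABE", "BCE", "CE", "DE"):
        genes = [g for cls in combo for g in CLASS_GENES[cls]]
        print(combo, "->", organ_for(combo), "| genes:", ", ".join(genes))
```

The lookup is keyed on unordered sets, so `organ_for("EA")` and `organ_for("AE")` give the same answer, and a combination outside the model (for example A + B without E) returns `None`.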
Other MIKC genes were later identified as being involved in different regulatory steps, such as: (1) flowering-time genes, which include Suppressor of Overexpression of Constans1 (SOC1) (Samach et al. 2000; Moon et al. 2003a, b), AGAMOUS-LIKE GENE 24 (AGL24) (Liu et al. 2008), Short Vegetative Phase (SVP) (Lee et al. 2007), MADS Affecting Flowering (MAF1/FLM), Flowering Locus C (FLC) (Michaels and Amasino 1999; Ratcliffe et al. 2003) and AGL15 and AGL18 (Adamczyk et al. 2007); (2) fruit ripening genes, which include SHATTERPROOF 1–2 (SHP1, SHP2) and FUL (Liljegren et al. 2000); and (3) seed pigmentation and embryo development genes, which include TRANSPARENT TESTA16 (TT16) (Nesi et al. 2002). Apart from reproductive development, MIKC genes such as AGL12 and AGL17 also function in vegetative development and root development (Tapia-López et al. 2008).

Some MIKCC genes have already been shown to play key roles in controlling flowering time in Brassica, such as BrFLC1, 2, 3, BcFLC, BrAGL20 and BnAP3 (Pylatuik et al. 2003; Hong et al. 2012; Liu et al. 2013). For example, the overexpression of BrAGL20 can significantly affect the flowering time of B. napus, and BrFLC genes act similarly to AtFLC, with lower expression in early-flowering Chinese cabbage (Hong et al. 2012). Furthermore, plant growth and development are influenced greatly by numerous plant growth regulators and environmental factors. Gibberellin (GA) promotes flower formation and flowering in biennial plants. Its involvement in flower initiation in plants is well established, and there is growing insight into the mechanisms by which floral induction is achieved (Mutasa-Göttgens and Hedden 2009). Salicylic acid (SA) also regulates flowering time because SA-deficient plants are late flowering (Martínez et al. 2004). Abscisic acid (ABA) regulates many aspects of plant growth and development (Bezerra et al. 2004; Wilmowicz et al. 2008).
As important environmental stress factors, cold and heat also regulate plant growth and development. To learn more about the response of B. rapa MADS-box genes to abiotic stresses, we selected these five treatments (GA, SA, ABA, cold and heat) to explore in this study.

Flower development is controlled by a complex network of interactions between transcription factors, most of them belonging to the MADS-box family (Airoldi and Davies 2012). To get a better picture of the size and phylogeny of the MADS-box family in plants, we sorted and compared the MADS-box genes from 22 different plant species. To better understand these transcription factors in Chinese cabbage, we identified 160 MADS-box genes and analysed the phylogenetic relationships, conserved motifs, retention and ortholog groups between these Chinese cabbage MADS-box genes and Arabidopsis MADS-box genes. We further studied the chromosomal locations, gene duplication and tissue-specific expression of BrMADS genes. The expression of all of the BrMIKCC genes was also investigated under different treatments, which included GA, SA, ABA, heat and cold.

Materials and methods

Identification of MADS-box gene family in Chinese cabbage

All Brassica genome sequence files used for the identification and annotation of MADS proteins were downloaded from the Brassica database (BRAD; http://brassicadb.org/brad/) (Wang et al. 2011). Proteins with SRF-TF domains (PF00319) were retrieved from the Pfam 27.0 database (http://Pfam.sanger.ac.uk/) (Punta et al. 2012). The hidden Markov model (HMM) was used to identify the putative MADS proteins in Chinese cabbage (Finn et al. 2011). To obtain the proteins, we first used the tool hmmsearch with an expected value (e-value) cut-off of 1.0. Then, we verified these sequences using the tool SMART (http://smart.embl-heidelberg.de/) (Letunic et al. 2012), the Pfam database (http://Pfam.sanger.ac.uk/) and the NCBI database (http://www.ncbi.nlm.nih.gov/).

Sequence retrieval

The Arabidopsis thaliana MADS proteins were retrieved from the TAIR database (http://www.arabidopsis.org/) according to a previous report by Parenicova et al. (2003).
The dataset of predicted Oryza sativa MADS proteins was retrieved from previous analyses by Arora et al. (2007). A MADS-box domain was not found in LOC_Os02g01360 (OsMADS60), LOC_Os12g31010 (OsMADS67), and LOC_Os08g20460 (OsMADS69). The MADS proteins of Populus trichocarpa, Medicago truncatula, Glycine max, Cucumis sativus, Citrus sinensis, Citrus clementina, Vitis vinifera, Sorghum bicolor, Zea mays, Selaginella moellendorffii and Physcomitrella patens were retrieved from a previous report. The Pfam database (http://pfam.sanger.ac.uk/) was used to screen the genome assemblies of Prunus persica, Arabidopsis lyrata, Capsella rubella, Thellungiella halophila, Solanum tuberosum, Solanum lycopersicum, Aquilegia coerulea and Volvox carteri. The genome data were downloaded from the genome browser Phytozome (http://www.phytozome.net/), and the evolutionary relationships of these species were determined using Phytozome and the public database PGDD (http://chibba.agtec.uga.edu/duplication/) (Lee et al. 2013).

Phylogenetic analysis

In the phylogenetic tree, the Arabidopsis MADS proteins were used to classify the Chinese cabbage MADS proteins into different groups. Full-length sequences of MADS proteins of Chinese cabbage and Arabidopsis were aligned using the ClustalW2 program with default parameters (Thompson et al. 1997). A phylogenetic tree was then constructed by the neighbour-joining method, and bootstrap values were calculated with 1,000 replications using MEGA5.2 (Tamura et al. 2011). Additionally, a phylogenetic tree of the Arabidopsis MADS proteins alone was used to check the reliability of this method, and a phylogenetic tree of Chinese cabbage, Arabidopsis, rice and grapevine was built to test and verify the classification. To estimate the nucleotide divergence between sequences, all nucleotide sequences of Chinese cabbage MADS-box genes were also analysed by MEGA5.2 using the Jukes-Cantor model.
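For orientation, the Jukes-Cantor model converts the observed proportion p of differing sites between two aligned sequences into a distance d = −(3/4) ln(1 − 4p/3). The sketch below is a minimal stand-alone illustration of that formula and not the MEGA5.2 implementation; the function name `jukes_cantor_distance` is hypothetical, and skipping columns that contain a gap or ambiguous base (pairwise deletion) is an assumption made here for simplicity.

```python
import math


def jukes_cantor_distance(seq1, seq2):
    """Jukes-Cantor distance between two aligned nucleotide sequences.

    Assumption of this sketch: columns where either sequence has a gap or
    a non-ACGT symbol are ignored (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    # Keep only columns where both sequences carry an unambiguous base.
    pairs = [
        (a, b)
        for a, b in zip(seq1.upper(), seq2.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")

    # p is the observed proportion of sites that differ.
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p == 0:
        return 0.0
    if p >= 0.75:
        # The correction is undefined once p reaches saturation (3/4).
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only (1 difference in 10 sites).
    print(round(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA"), 4))
```

Because the correction grows faster than p itself, the corrected distance is always at least as large as the raw proportion of differences, which matters most for the more diverged gene pairs.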
Bootstrap (1,000 replicates) analyses were also performed for this estimation.

Identification of conserved motifs and gene structure

To identify the conserved motifs in full-length Chinese cabbage and Arabidopsis MADS proteins, the Multiple Expectation Maximisation for Motif Elicitation (MEME) program version 4.9.0 (Bailey et al. 2009) was used with default parameters, except for the following: (1) the optimum motif width was set to ≥10 and ≤100; and (2) the maximum number of motifs to identify was set to 15. The MEME motifs were annotated using the SMART program (http://smart.embl-heidelberg.de) and the Pfam database. The coding sequences (CDS) and DNA sequences of Chinese cabbage MADS-box genes were used to reveal the gene structure using the tool GSDS (http://gsds.cbi.pku.edu.cn/).

Ortholog groups of MADS-box genes in Brassica and Arabidopsis genome

The program OrthoMCL (http://www.orthomcl.org/cgi-bin/OrthoMclWeb.cgi) (Li et al. 2003) was used to identify
the homologous MADS-box genes between Chinese cabbage and Arabidopsis. Briefly, the tools BLASTP, with an e-value ≤1e−10, and orthomclPairs were applied to find orthologs, inparalogs and coorthologs in these two species. To link these genes to chromosomes, a tool called Circos (Krzywinski et al. 2009) was used. In addition, the Cytoscape software was applied to build the network of these relationships (Shannon et al. 2003).

Chromosome localisation and gene duplications

To determine the physical locations of MADS-box genes, the starting and ending positions of all MADS-box genes on each chromosome were obtained from the BRAD database. An in-house Perl program was used to draw the location images of the Chinese cabbage MADS-box genes. The positions of each Chinese cabbage MADS-box gene on the blocks were verified by searching for homologous genes between Arabidopsis and the three B. rapa subgenomes, namely the least fractionated (LF), medium fractionated (MF1) and most fractionated (MF2) genomes (http://brassicadb.org/brad/searchSynteny.php) (Wang et al. 2011; Cheng et al. 2013).

To determine the gene duplications, first, the CDS sequences of Chinese cabbage MADS-box genes were blasted against each other (e-value <1e−10, identity >85 %), and then Ks values were calculated for all pair-wise alignments of these genes, which were previously obtained by BLAST, using the method of Nei and Gojobori as implemented in KaKs_calculator (Zhang et al. 2006). Lastly, based on phylogenies, the nucleotide divergence (Dist <0.1) was used as the final standard (Lynch and Conery 2000). Purple lines were used to link the duplicate genes on different chromosomes.

Chinese cabbage RNA-seq data analysis

For the expression profiling of Chinese cabbage MADS-box genes, we utilised the Illumina RNA-seq data that were previously generated and analysed by Tong et al. (2013). Six tissues of B. rapa accession Chiifu-401-42, including callus, root, stem, leaf, flower, and silique, were analysed. Two samples of root and leaf tissues were generated from different batches of plants. The transcript abundance is expressed as fragments per kilobase of exon model per million mapped reads (FPKM) values. Heat maps were generated for the Chinese cabbage MADS-box genes that have positive FPKM values in at least one of the samples.

Plant material and treatments

The Chinese cabbage cultivar Chiifu-401-42 was used for the experiments. Whole genome sequencing has been completed for this cultivar; thus, it is a typical cultivar for Chinese cabbage research. Seeds were grown in pots containing a soil:vermiculite mixture (3:1) in the greenhouse of Nanjing Agricultural University, and the controlled-environment growth chamber was programmed for 16 h light at 25 °C and 8 h dark at 20 °C (Song et al. 2013). One month later, seedlings at the five-leaf stage were transferred to growth chambers that were set at 4 or 38 °C, under identical light intensity and day length, as the cold and heat treatments. Simultaneously, for acclimation, some plants were cultured in 1/2 Hoagland's solution in plastic containers, with the pH at 6.5 (Jensen and Bassham 1966). After 5 days of acclimatisation, plants were cultured in the following four treatments: (1) control; (2) 100 μM ABA; (3) 100 μM GA; (4) 100 μM SA. At 4 and 12 h after treatment, the young leaf samples were collected, frozen in liquid nitrogen and stored at −70 °C for further analysis.

RNA isolation and quantitative real-time PCR

Total RNA was isolated from 100 mg of frozen tissue using an RNA kit (RNAsimply Total RNA Kit, Tiangen, Beijing, China) according to the manufacturer's instructions. Five micrograms of each sample were reverse transcribed into cDNA using the PrimeScript RT reagent Kit (TaKaRa). The specific primers of Chinese cabbage MADS-box genes and the housekeeping actin gene (Bra028615) were designed using the Primer Premier 5.0 software (Supplementary Table 11). To verify the primer specificity, we used the program BLAST against the Chinese cabbage genome. The qRT-PCR assays were performed with three biological and three technical replicates. Each reaction was performed in a 20 μL reaction mixture containing a diluted cDNA sample as the template, 2× Power SYBR Green PCR Master Mix (Applied Biosystems), and 400 nM each of forward and reverse gene-specific primers. The reactions were performed using a MyiQ Single-Color Real-Time PCR Detection System (Bio-Rad, Hercules, CA) with the following cycling profile: 94 °C for 30 s, followed by 40 cycles at 94 °C for 10 s and 58 °C for 30 s. A melting curve (61 cycles at 65 °C for 10 s) was generated to verify the specificity of the amplification (Song et al. 2013). The relative expression ratio of each gene was calculated using the comparative Ct value method (Heid et al. 1996). The MADS-box gene expression cluster from each stress treatment was analysed using the Cluster program (http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm) (Eisen et al. 1998), and the results were shown using the TreeView software (http://jtreeview.sourceforge.net/).
the homologous genes of MADS-box between Chinese cabbage and Arabidopsis. Briefly, BLASTP (e-value ≤1e−10) and orthomclPairs were applied to find orthologs, inparalogs and co-orthologs in these two species. Circos (Krzywinski et al. 2009) was used to link these genes to chromosomes, and Cytoscape was used to build the network of these relationships (Shannon et al. 2003).

Chromosome localisation and gene duplications

To determine the physical locations of the MADS-box genes, the start and end positions of all MADS-box genes on each chromosome were obtained from the BRAD database. An in-house Perl program was used to draw the location images of the Chinese cabbage MADS-box genes. The position of each Chinese cabbage MADS-box gene on the blocks was verified by searching for homologous genes between Arabidopsis and the three B. rapa subgenomes: least fractionated (LF), medium fractionated (MF1) and most fractionated (MF2) (http://brassicadb.org/brad/searchSynteny.php) (Wang et al. 2011; Cheng et al. 2013).

To determine the gene duplications, the CDS sequences of the Chinese cabbage MADS-box genes were first blasted against each other (e-value <1e−10, identity >85 %). Ks values were then calculated for all pairwise alignments obtained from this BLAST search, using the method of Nei and Gojobori as implemented in KaKs_calculator (Zhang et al. 2006). Lastly, based on phylogenies, the nucleotide divergence (Dist <0.1) was used as the final standard (Lynch and Conery 2000). Purple lines were used to link the duplicate genes on different chromosomes.

Chinese cabbage RNA-seq data analysis

For the expression profiling of Chinese cabbage MADS-box genes, we utilised the Illumina RNA-seq data that were previously generated and analysed by Tong et al. (2013). Six tissues of B.
rapa accession Chiifu-401-42, including callus, root, stem, leaf, flower, and silique, were analysed. Two samples of root and leaf tissues were generated from different batches of plants. The transcript abundance is expressed as fragments per kilobase of exon model per million mapped reads (FPKM) values. Heat maps were generated for the Chinese cabbage MADS-box genes that have positive FPKM values in at least one of the samples.

Plant material and treatments

The Chinese cabbage cultivar Chiifu-401-42 was used for the experiments. Whole genome sequencing has been completed for this cultivar; thus, it is a typical cultivar for Chinese cabbage research. Seeds were grown in pots containing a soil:vermiculite mixture (3:1) in the greenhouse of Nanjing Agricultural University, with the controlled-environment growth chamber programmed for light 16 h/25 °C, dark 8 h/20 °C (Song et al. 2013). One month later, seedlings at the five-leaf stage were transferred to growth chambers set at 4 or 38 °C, under identical light intensity and day length, as the cold and heat treatments. Simultaneously, for acclimatisation, some plants were cultured in 1/2 Hoagland’s solution in plastic containers, with the pH at 6.5 (Jensen and Bassham 1966). After 5 days of acclimatisation, plants were cultured in the following four treatments: (1) Control; (2) 100 μM ABA; (3) 100 μM GA; (4) 100 μM SA. At 4 and 12 h after treatment, young leaf samples were collected, frozen in liquid nitrogen and stored at −70 °C for further analysis.

RNA isolation and quantitative real-time PCR

Total RNA was isolated from 100 mg of frozen tissue using an RNA kit (RNAsimply Total RNA Kit, Tiangen, Beijing, China) according to the manufacturer’s instructions. Five micrograms of each sample were reverse transcribed into cDNA using the PrimeScript RT reagent Kit (TaKaRa).
The specific primers for the Chinese cabbage MADS-box genes and the housekeeping actin gene (Bra028615) were designed using the Primer Premier 5.0 software (Supplementary Table 11). To verify primer specificity, the primers were searched against the Chinese cabbage genome using BLAST. The qRT-PCR assays were performed with three biological and three technical replicates. Each reaction was performed in a 20 μL reaction mixture containing a diluted cDNA sample as the template, 2× Power SYBR Green PCR Master Mix (Applied Biosystems), and 400 nM each of forward and reverse gene-specific primers. The reactions were performed using a MyiQ Single-Color Real-Time PCR Detection System (Bio-Rad, Hercules, CA) with the following cycling profile: 94 °C for 30 s, followed by 40 cycles at 94 °C for 10 s and 58 °C for 30 s. A melting curve (61 cycles at 65 °C for 10 s) was generated to verify the specificity of the amplification (Song et al. 2013). The relative expression ratio of each gene was calculated using the comparative Ct value method (Heid et al. 1996). The MADS-box gene expression clusters from each stress treatment were analysed using the Cluster program (http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm) (Eisen et al. 1998), and the results were shown using the TreeView software (http://jtreeview.sourceforge.net/).
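As an illustration of the comparative Ct calculation mentioned above, the following minimal sketch computes a relative expression ratio. The text does not spell out the formula, so this assumes the widely used 2^(−ΔΔCt) form, with the actin gene as the reference and the untreated control as the calibrator, and amplification efficiencies close to 100 %. The function name and all Ct values are hypothetical and chosen only to show the arithmetic.

```python
from statistics import mean


def relative_expression(target_treated, ref_treated, target_control, ref_control):
    """Return the fold change of a target gene by the 2^(-ddCt) form.

    Each argument is a list of Ct values from technical replicates.
    Assumption: about 100 % amplification efficiency for both genes.
    """
    # dCt: target gene normalised to the reference (e.g. actin) within each sample
    d_ct_treated = mean(target_treated) - mean(ref_treated)
    d_ct_control = mean(target_control) - mean(ref_control)
    # ddCt: treated sample relative to the control (calibrator) sample
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values for one gene under one treatment (not from the paper)
fold = relative_expression(
    target_treated=[24.0, 24.5, 23.5],
    ref_treated=[18.0, 18.25, 17.75],
    target_control=[26.0, 26.5, 25.5],
    ref_control=[18.0, 18.0, 18.0],
)
print(fold)
```

In this sketch a target gene whose normalised Ct drops by two cycles after treatment gives a fourfold increase in relative expression. With three biological replicates, the function would be applied per replicate and the resulting ratios averaged.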
Results

Identification and classification of MADS-box genes in Chinese cabbage and comparative analyses

To identify the putative MADS proteins in the Chinese cabbage genome, an HMM search was performed, which identified 164 proteins. Subsequently, all 164 protein sequences were subjected to Pfam and SMART analyses, which resulted in the identification of 162 MADS proteins, named BrMADS001 to BrMADS162 according to the hmmsearch e-value (Supplementary Table 1). Simultaneously, by performing a homology search against Arabidopsis and by analysing the gene structure, two genes were removed: BrMADS047 and BrMADS124 contained other functional domains, and their homologs were non-MADS genes (Supplementary Fig. 1). This left 160 MADS-box genes.

To pre-classify the Chinese cabbage MADS-box genes, a phylogenetic tree including the Arabidopsis MADS-box genes was constructed (Supplementary Fig. 2). In total, 95 genes were determined to be type II MADS-box genes (including MIKCc and MIKC*), twofold more than in Arabidopsis. By contrast, 65 genes were confirmed to be type I MADS-box genes (including the Mα, Mβ and Mγ groups), which is comparable to the number in Arabidopsis.

To perform comparative genomic analyses, we searched for MADS protein-coding sequences in the genomes of 21 other plant species. Some of these genes have been published previously, while others are described in this work for the first time (Supplementary Tables 2 and 3). The evolutionary relationships of the species and the number of MADS-box genes in their genomes are shown in Fig. 1. The data coloured green were analysed for the first time in this work. The pre-classified groups of these species were based on their phylogenetic relationships with Arabidopsis MADS-box genes. The data show that the number of MADS-box genes in algae, Bryophyta and Pteridophyta is lower than that in Angiospermae.
Since several whole genome duplication (WGD) events happened during angiosperm evolution, it is likely that this higher number is caused by an elevated duplication frequency, in combination with an increased retention of MADS-box genes that were subjected to neofunctionalization and gained important functions in angiosperm flower development (Doebley and Lukens 1998; Theissen et al. 2000; Nam et al. 2003).

Fig. 1 The evolutionary relationships of the species and the detailed numbers of the MADS-box family of each species. The left of this figure shows the evolutionary relationships of the species; the right of this figure shows the detailed numbers of the MADS-box family of each species. The data that are coloured blue were described in this work, and the data that are coloured green were published in previous works (colour figure online)