244 Mol Genet Genomics (2015) 290:239–255

Copy number variation and differential retention of MADS-box genes in Chinese cabbage

A comparison of the homologous MADS-box genes in Arabidopsis and the three B. rapa subgenomes (LF, MF1 and MF2) using the BRAD database revealed that most BrMADS genes on the conserved collinear blocks have been well conserved throughout the divergent evolution of Arabidopsis and B. rapa (Cheng et al. 2012) (Supplementary Table 4). The gene dosage hypothesis predicts that genes whose products are dose-sensitive, interacting either with other proteins or in networks, should be over-retained (Thomas et al. 2006; Birchler and Veitia 2007). The type II proteins have been shown to function in large complexes during flower development, whereas it is still unclear how the type I proteins perform their functions. Interestingly, type II genes have been retained after triplication and fractionation in B. rapa at a significantly higher rate than the type I genes (Supplementary Fig. 3a). Most (74 %) type II genes were retained in two or three copies, which is significantly greater than the corresponding retention of type I genes (15 %) (Supplementary Fig. 3a), whereas a majority (65 %) of the type I genes were completely lost. The proportion of homoeologs retained varied among the three subgenomes (Supplementary Fig. 3b). More MADS-box gene homoeologs were retained in the LF subgenome than in the other two subgenomes. The retention of type II gene homoeologs among the subgenomes was higher than that of the type I genes (Supplementary Fig. 3b).

Phylogenetic and classification analysis of BrMADS genes

To examine the phylogenetic relationships between BrMADS genes in detail, independent phylogenetic trees were constructed with Arabidopsis and rice type I and type II proteins (Supplementary Fig. 4 and Fig. 2). The type I proteins were divided into three subfamilies, Mα (27), Mβ (16) and Mγ (22), whereas the type II proteins were divided into 13 subgroups (Supplementary Table 5).
Subgroup TM3-like (SOC1) contained the largest number (16) of BrMADS type II proteins, whereas subgroups AGL12, AGL6 and Bs (TT16) had the fewest members, with only three each. Other subgroups contained from four to ten members (Supplementary Fig. 3c). In addition, eleven genes in the type II group were identified as MIKC*-type.

Finally, we visualized the phylogenetic relationship of the BrMADS proteins with the Arabidopsis, rice, soybean and grapevine MADS proteins by building an unrooted tree of the full-length MADS protein sequences. The phylogenetic tree divided these proteins into five distinct subfamilies (MIKCC, MIKC*, Mα, Mβ, Mγ) (Supplementary Fig. 2c).

Identification of conserved motifs and gene structure

To compare the protein structures, MEME was used to identify the conserved motifs among the Chinese cabbage and Arabidopsis MADS proteins. The type I and type II MADS proteins of these two species were compared in separate analyses, and for each comparison, fifteen conserved motifs, named motif 1 to motif 15, were identified (Supplementary Fig. 5 and Fig. 3). In general, MADS proteins clustered in the same subgroup shared similar motif compositions, which indicates functional similarities among members of the same subgroup (Parenicová et al. 2003). The Arabidopsis and Chinese cabbage MADS proteins were found to have similar structures in every type II subgroup, except for BrMADS031, 060, 112 and 103, which have incomplete domains. However, in type I, the protein structure was divergent (Supplementary Fig. 5). This finding indicates that the C-terminal part of the MADS domain in the Mα, Mβ and Mγ groups is more divergent than that in the MIKC group. In type I, apart from the MADS domain, each of the groups shows a different motif profile, and none of these motifs can be annotated using the tool SMART.
The protein motifs shared by the Arabidopsis and Brassica type I proteins within a clade show that there is also conservation beyond the MADS domain, although proteins of one clade sometimes show some variation in the motif profile, for example BrMADS118, 128, 144, 136 and 157 in Mγ and BrMADS106, 108 and 113 in Mβ (Supplementary Fig. 5). In parallel, the protein structure of the BrMADS proteins was analysed using the program MEME. As expected, commonly shared motifs tend to occur within the same group. The motifs were detected by the tool SMART (Supplementary Fig. 6). It will be interesting to characterise the functions of the common motifs within the newly designated groups in relation to the functions of these genes.

In addition to the protein structure, the gene structure was also analysed. We found that all type II BrMADS genes have at least three exons, whereas type I genes have at most two exons, consistent with the AtMADS genes (Parenicová et al. 2003). Furthermore, the first exon (approximately 180 bp) of type II genes consistently encodes the MADS domain. Supplementary Fig. 6 gives an overview of the structures of the Chinese cabbage MADS genes and proteins.

Ortholog groups, chromosomal localization and gene duplication of MADS-box genes

Most angiosperm lineages have experienced one or more rounds of ancient polyploidy (Lee et al. 2013). Chinese cabbage has undergone genome triplication since its divergence from Arabidopsis (Wang et al. 2011). Generally,
the gene number in the Chinese cabbage genome was notably less than three times the Arabidopsis gene number because some genes were lost during polyploid speciation. Additionally, both segmental and tandem gene duplications have significant impacts on the expansion and evolution of gene families in plant genomes. In this study, we analysed the ortholog groups between Chinese cabbage and Arabidopsis MADS-box genes using the OrthoMCL program. We identified 67 orthologous gene pairs and 120 co-orthologous gene pairs among the MADS proteins of these two species (Supplementary Table 6) and visualised them using the Circos software (Fig. 4). Among the orthologous gene pairs of Chinese cabbage and Arabidopsis, we found more Chinese cabbage MADS-box homologous genes on Arabidopsis chromosomes 5 and 1 than on the other chromosomes. To gain further insight into the correlation of the MADS-box genes in Chinese cabbage and Arabidopsis, networks of MADS-box genes were constructed using the orthologs of these two species (Supplementary Fig. 7). Sixteen Arabidopsis MADS-box genes had no ortholog among the Chinese cabbage MADS-box genes; these genes have been duplicated in Arabidopsis after the split. Fifty Arabidopsis MADS-box genes have only one ortholog in Chinese cabbage; these genes were present before the split, but two of the three copies have been lost after the B. rapa genome triplication (Supplementary Fig. 7a). A further 42 Arabidopsis genes have co-orthologs in Chinese cabbage; these genes were preferentially retained after the triplication (Supplementary Fig. 7b, c and d). Meanwhile, we found 71 and 60 paralogous gene pairs in Arabidopsis and Chinese cabbage,

Fig. 2 Phylogenetic tree of Chinese cabbage, Arabidopsis and rice type II MADS-box proteins. Phylogenetic analysis of 182 type II MADS proteins from Chinese cabbage (95), Arabidopsis (46) and rice (41), showing similar groups in all of the plant species. The 13 clades formed by the type II MADS proteins are marked in different colours (colour figure online)
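
The three-way grouping of Arabidopsis genes used in the ortholog analysis (no ortholog, a single ortholog, or co-orthologs in Chinese cabbage) can be illustrated with a short Python sketch. It is not the authors' pipeline, which relied on OrthoMCL. The function names and gene identifiers are hypothetical, and the input is assumed to be a plain list of (Arabidopsis gene, Chinese cabbage gene) pairs.

```python
from collections import Counter, defaultdict


def classify_by_ortholog_count(arabidopsis_genes, ortholog_pairs):
    """Label each Arabidopsis gene by its number of Chinese cabbage orthologs.

    ortholog_pairs is an iterable of (arabidopsis_gene, cabbage_gene) tuples;
    repeated pairs are counted once.
    """
    partners = defaultdict(set)
    for at_gene, br_gene in ortholog_pairs:
        partners[at_gene].add(br_gene)

    classes = {}
    for gene in arabidopsis_genes:
        n = len(partners.get(gene, ()))
        if n == 0:
            classes[gene] = "no_ortholog"
        elif n == 1:
            classes[gene] = "single_ortholog"
        else:
            classes[gene] = "co_orthologs"
    return classes


def summarize(classes):
    """Count how many genes fall into each class."""
    return Counter(classes.values())


# Made-up identifiers, for illustration only (not real gene names).
genes = ["AtG1", "AtG2", "AtG3", "AtG4"]
pairs = [
    ("AtG2", "BrG1"),
    ("AtG3", "BrG2"),
    ("AtG3", "BrG3"),
    ("AtG3", "BrG4"),
    ("AtG4", "BrG5"),
    ("AtG4", "BrG5"),  # duplicate pair, counted once
]
classes = classify_by_ortholog_count(genes, pairs)
summary = summarize(classes)
print(summary)
```

Applied to a real pair table, the three counts returned by `summarize` would correspond to the categories reported in the text (16, 50 and 42 Arabidopsis genes, respectively), provided the pairs are defined in the same way as in the original analysis.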