TIANJIN UNIVERSITY
Running time by dataset size:
Tool     10M (1x)    213M (20x)   532M (50x)   1.1G (100x)
Hadoop   35s         10m54s       26m14s       51m51s
Spark    7s          1m50s        5m11s        8m44s
MAFFT    1m59s       3h52m14s     21h54m18s    3d12h41m42s
KAlign   1h27m10s    ---          ---          ---
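The table mixes units from seconds to days, which makes the gaps between tools hard to compare directly. Below is a minimal sketch of a helper that converts each duration to seconds and computes speedup ratios. The function names `to_seconds` and `speedup` are hypothetical. The values are copied from the 1.1G (100x) column.

```python
import re

# Running times transcribed from the 1.1G (100x) column of the table.
TIMES_100X = {
    "Hadoop": "51m51s",
    "Spark": "8m44s",
    "MAFFT": "3d12h41m42s",
}

_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def to_seconds(duration: str) -> int:
    """Convert a duration such as '3d12h41m42s' into whole seconds."""
    parts = re.findall(r"(\d+)([dhms])", duration)
    # Reject strings that contain anything besides number+unit pairs.
    if "".join(n + u for n, u in parts) != duration:
        raise ValueError(f"unrecognised duration: {duration!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


def speedup(baseline: str, candidate: str) -> float:
    """How many times faster `candidate` is than `baseline`."""
    return to_seconds(baseline) / to_seconds(candidate)


if __name__ == "__main__":
    for tool, t in TIMES_100X.items():
        print(f"{tool:7s} {to_seconds(t):>7d} s")
    print(f"Spark vs Hadoop: {speedup(TIMES_100X['Hadoop'], TIMES_100X['Spark']):.1f}x")
    print(f"Spark vs MAFFT:  {speedup(TIMES_100X['MAFFT'], TIMES_100X['Spark']):.1f}x")
```

With these values, Hadoop takes 3111 s, Spark 524 s, and MAFFT 304902 s. On the 100x set, Spark is therefore about 5.9 times faster than Hadoop and about 582 times faster than MAFFT.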
Sequence Similarity Analysis
[Figure: chart comparing MASC and MAFFT; x-axis ticks 1 to 52, y-axis 0 to 140000]
[Diagram: sequences → clustering → alignment and construction of subtrees → final phylogenetic tree]
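The pipeline in the diagram follows a divide-and-conquer pattern. Sequences are clustered first, a small subtree is built inside each cluster, and the subtrees are then combined into one tree. The sketch below shows only that overall shape and rests on several assumptions that do not come from the slides. It uses an alignment-free k-mer Jaccard distance in place of a real alignment, builds each subtree with UPGMA-style average-linkage merging (topology only, no branch lengths), and joins the subtrees directly under a single root. All function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def kmers(seq: str, k: int = 3) -> set[str]:
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def distance(a: str, b: str, k: int = 3) -> float:
    """Alignment-free Jaccard distance on k-mer sets (assumed metric)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return 1.0 - len(ka & kb) / len(union) if union else 0.0


def upgma(names: list[str], seqs: dict[str, str]) -> str:
    """Average-linkage merging within one cluster; returns Newick without ';'."""
    # Active clusters: Newick string -> names of the sequences it contains.
    clusters = {name: [name] for name in names}

    def avg(x: str, y: str) -> float:
        # Average pairwise distance between the members of two clusters.
        pairs = [(p, q) for p in clusters[x] for q in clusters[y]]
        return sum(distance(seqs[p], seqs[q]) for p, q in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the closest pair of clusters into a new internal node.
        x, y = min(combinations(clusters, 2), key=lambda xy: avg(*xy))
        clusters[f"({x},{y})"] = clusters.pop(x) + clusters.pop(y)
    return next(iter(clusters))


def build_tree(seqs: dict[str, str], centers: list[str]) -> str:
    """Cluster around the given centers, build a subtree per cluster, join at a root."""
    groups: dict[str, list[str]] = {c: [] for c in centers}
    for name, s in seqs.items():
        nearest = min(centers, key=lambda c: distance(s, seqs[c]))
        groups[nearest].append(name)
    subtrees = [upgma(members, seqs) for members in groups.values() if members]
    return "(" + ",".join(subtrees) + ");"


if __name__ == "__main__":
    seqs = {"a": "ACGTACGTAC", "b": "ACGTACGTTC",
            "c": "TTGGCCAATT", "d": "TTGGCCAATG"}
    print(build_tree(seqs, centers=["a", "c"]))
```

Because each subtree involves only the sequences in one cluster, the expensive pairwise work stays small and the clusters can be processed independently. That independence is what makes this shape a natural fit for distributed execution.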
Clustering
[Diagram: 1. Sampling, Clustering; 2. Label all sequences. Components shown: all sequences, Map and Reduce tasks, local file system, HDFS]
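The two clustering steps can be simulated in plain Python to show the data flow. Step 1 samples sequences and picks cluster centers. Step 2 maps every sequence to its nearest center and then reduces by center. This is a sketch for illustration only and does not use the Hadoop API. The sampling rate, the greedy threshold rule for choosing centers, the k-mer distance, and the reading of the diagram (a small center table held on each node's local file system, with all sequences on HDFS) are all assumptions. The function names are hypothetical.

```python
from __future__ import annotations

import random
from collections import defaultdict


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Jaccard distance between k-mer sets (assumed similarity measure)."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = ka | kb
    return 1.0 - len(ka & kb) / len(union) if union else 0.0


def sample_and_cluster(seqs: dict[str, str], rate: float,
                       threshold: float, seed: int = 0) -> list[str]:
    """Step 1: sample sequences and greedily pick cluster centers from the sample."""
    rng = random.Random(seed)
    sample = [name for name in seqs if rng.random() < rate]
    centers: list[str] = []
    for name in sample:
        # Start a new cluster unless some existing center is within the threshold.
        if all(kmer_distance(seqs[name], seqs[c]) > threshold for c in centers):
            centers.append(name)
    if not centers and seqs:
        centers.append(next(iter(seqs)))  # fall back if the sample came out empty
    return centers


def label_map(name: str, seq: str, centers: dict[str, str]) -> tuple[str, str]:
    """Step 2, map: emit (nearest center, sequence name) for one sequence."""
    nearest = min(centers, key=lambda c: kmer_distance(seq, centers[c]))
    return nearest, name


def label_reduce(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Step 2, reduce: group sequence names by their assigned center."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for center, name in pairs:
        clusters[center].append(name)
    return dict(clusters)


def run(seqs: dict[str, str], rate: float = 1.0, threshold: float = 0.5,
        seed: int = 0) -> dict[str, list[str]]:
    centers = sample_and_cluster(seqs, rate, threshold, seed)
    # Assumption: this small center table is the file each mapper reads locally,
    # while the full sequence set would be read from HDFS.
    center_seqs = {c: seqs[c] for c in centers}
    mapped = [label_map(n, s, center_seqs) for n, s in seqs.items()]
    return label_reduce(mapped)


if __name__ == "__main__":
    seqs = {"a": "ACGTACGTAC", "b": "ACGTACGTTC",
            "c": "TTGGCCAATT", "d": "TTGGCCAATG"}
    print(run(seqs))
```

Only the sample needs all-against-centers comparisons in step 1. In step 2, each map call is independent and needs only the small center table, which is why the labeling step parallelizes well.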
[Figure: Running time of different software tools on mtDNA datasets; y-axis: running time (seconds), 0 to 30000; series: 1x, 20x, 50x, 100x]