TIANJIN UNIVERSITY

Experiments
• 100 human mitochondrial genome sequences
• 16k length (1555KB)
• Running time
  – Center star: 12933.2s
  – Suffix tree center star: 24.8s
  – Trie center star: 15.6s
  – K-band center star: 10.9s
  – Extreme Trie: 19.7s
  – Extreme suffix tree: 5.4s
• Our output 1558KB
• ClustalΩ 1627KB
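To put the running times in proportion, the short sketch below computes each variant's speedup over the plain center star run. The times are the ones reported on this slide; the dictionary and variable names are only illustrative.

```python
# Running times in seconds, copied from the Experiments slide.
running_time_s = {
    "Center star": 12933.2,
    "Suffix tree center star": 24.8,
    "Trie center star": 15.6,
    "K-band center star": 10.9,
    "Extreme Trie": 19.7,
    "Extreme suffix tree": 5.4,
}

# Speedup of each variant relative to the plain center star baseline.
baseline = running_time_s["Center star"]
speedup = {name: baseline / t for name, t in running_time_s.items()}

for name, factor in speedup.items():
    print(f"{name}: {factor:.1f}x")
```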
Time cost of each step
[Figure: share of the running time spent in each step. Steps shown: input data, build tree, pairwise alignment, sum up spaces, get aligned center, output.]
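The steps named on this slide (pairwise alignment, sum up spaces, get aligned center, output) can be sketched in a few lines. This is a minimal illustration and not the implementation that was timed: it assumes the first sequence is the center, and it replaces the tree-based matching ("build tree") with a plain dynamic-programming edit-distance alignment. All function names are hypothetical.

```python
def pairwise_align(a, b):
    """Global alignment of a and b with unit costs (plain DP, no tree index)."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
                          d[i - 1][j] + 1,
                          d[i][j - 1] + 1)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j, row_a, row_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b))


def gap_profile(aligned_center, center_len):
    """gaps[k] = spaces inserted in front of center residue k (last slot: trailing)."""
    gaps = [0] * (center_len + 1)
    k = 0
    for ch in aligned_center:
        if ch == "-":
            gaps[k] += 1
        else:
            k += 1
    return gaps


def expand(aligned_center, aligned_seq, merged):
    """Pad one pairwise alignment so it fits the merged gap profile."""
    out, k, used = [], 0, 0
    for c, s in zip(aligned_center, aligned_seq):
        if c == "-":
            out.append(s)
            used += 1
        else:
            out.append("-" * (merged[k] - used))  # fill slot k up to merged width
            out.append(s)
            k, used = k + 1, 0
    out.append("-" * (merged[k] - used))  # trailing slot
    return "".join(out)


def center_star(seqs, center_index=0):
    center = seqs[center_index]  # assumption: the center is chosen by the caller
    pairs = [pairwise_align(center, s) for s in seqs]               # pairwise alignment
    profiles = [gap_profile(ac, len(center)) for ac, _ in pairs]
    merged = [max(col) for col in zip(*profiles)]                   # sum up spaces
    return [expand(ac, row, merged) for ac, row in pairs]           # aligned center + rows


seqs = ["ACGTGAC", "AGTGACG", "AGGCGTG"]
aligned = center_star(seqs)
for row in aligned:
    print(row)
```

Every returned row has the same length, and removing the "-" characters from a row gives back its input sequence. The row at `center_index` is the aligned center.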
Outline
• Sequence alignment
  – Algorithm
  – Parallel
• Identification and mining
  – microRNA
  – machine learning related works
• Function prediction
  – miRNA-disease relationship
  – crop yield related genes
Multiple sequence alignment in Hadoop
[Figure: the input fasta file is moved from the local file system to HDFS, passes through two Map/Reduce rounds, and is written out as the aligned result. Example input: S1: ACGTGAC, S2: AGTGACG, S3: AGGCGTG, S4: GCGTGCG, S5: TGCGTAC, S6: GGCAGTG, ..., Sn-2: GCTGTC, Sn-1: GCAGTG, Sn: GCAGTGC. Each sequence is aligned pairwise with the center sequence, the inserted spaces are summed up, and the aligned result begins S1: AC-GTGA--C, S2: AGT--G-ACG, S3: AG-GCGT--G.]
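One way to read the Map and Reduce stages in this pipeline is as a key-value job over the gaps that each pairwise alignment inserts into the center. The sketch below imitates that in plain Python without Hadoop. It is an assumption about how the work could be split, not the job that was actually run. The records are made-up aligned copies of a toy center "ACGT", and `map_gap_counts`, `shuffle` and `reduce_max` are hypothetical names.

```python
from collections import defaultdict


def map_gap_counts(record):
    """Map: emit (slot, gap_count) pairs for one aligned copy of the center.

    Slot k means "in front of center residue k"; the last slot holds trailing gaps.
    """
    _seq_id, aligned_center = record
    k, run = 0, 0
    for ch in aligned_center:
        if ch == "-":
            run += 1
        else:
            if run:
                yield k, run
            k, run = k + 1, 0
    if run:
        yield k, run


def shuffle(pairs):
    """Group values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_max(slot, counts):
    """Reduce: the merged center needs the widest gap seen in each slot."""
    return slot, max(counts)


# Toy input: the center "ACGT" as it appears in three pairwise alignments.
records = [("S1", "A-CGT"), ("S2", "AC--GT"), ("S3", "ACGT-")]

mapped = [kv for rec in records for kv in map_gap_counts(rec)]
merged = dict(reduce_max(slot, counts) for slot, counts in shuffle(mapped).items())
print(merged)
```

Keying on the slot index keeps each reducer's work independent, so the gap counts can be merged on different nodes.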
Multiple sequence alignment in Spark
[Figure: sequences are read from the distributed file system (HDFS) into an RDD spread over data nodes 1 to N. Pairwise alignment phase: each data node aligns its sequences with the center star sequence (pairwise 1, pairwise 2, ...) and emits sequence and space information. Reduce phase: Reduce 1 to Reduce N combine the inserted spaces into the alignment.]