Three patterns of base contents
[Figure: base-content profiles for Rice, Arabidopsis, and Human; TSS labeled]
Neighboring bases are not independent

Example: Dinucleotide frequencies in some vertebrate sequences. Based on 166 vertebrate sequences, totaling 136,731 bases (Nussinov, 1984).

Pair    Observed/Expected
TG      1.29
CT      1.26
CC      1.18
AG      1.16
AA      1.15
CA      1.15
GG      1.14
TT      1.07
GA      1.04
TC      1.00
GC      0.99
AT      0.85
AC      0.84
GT      0.82
TA      0.65
CG      0.42

P_uv ≠ P_u P_v
Observed/expected frequency* of neighboring base pairs

Pair    Human   Rice
CC      1.27    1.05
GG      1.22    1.03
CA      1.20    1.11
TG      1.19    1.11
AG      1.18    0.99
CT      1.15    0.99
TT      1.13    1.13
AA      1.13    1.11
GC      1.02    1.11
GA      0.99    1.05
TC      0.96    1.00
AT      0.88    1.02
GT      0.84    0.84
AC      0.83    0.86
TA      0.75    0.77
CG      0.26    0.83

* Data are from the DNA sequences of all genes annotated so far in these two species, with total lengths of 168,717,208 and 1,506,657,427 bases, respectively (邱杰 [Qiu Jie], 2016).
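An observed/expected ratio of this kind can be computed directly from a sequence. The sketch below is only an illustration, not the procedure used by the cited studies: it assumes overlapping dinucleotides are counted along a single strand, that the expected frequency of pair uv is the product of the single-base frequencies P_u and P_v, and that the input contains only A, C, G, and T. The function name `dinucleotide_obs_exp` is made up for this example.

```python
from collections import Counter


def dinucleotide_obs_exp(seq):
    """Return {pair: observed/expected} for each dinucleotide found in seq.

    Assumption: expected frequency of pair uv under independence is P_u * P_v.
    """
    seq = seq.upper()
    n = len(seq)
    base_counts = Counter(seq)
    # Overlapping pairs: positions (0,1), (1,2), ..., (n-2, n-1)
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))

    ratios = {}
    for pair, count in pair_counts.items():
        u, v = pair
        p_uv = count / (n - 1)      # observed pair frequency
        p_u = base_counts[u] / n    # single-base frequencies
        p_v = base_counts[v] / n
        ratios[pair] = p_uv / (p_u * p_v)
    return ratios


if __name__ == "__main__":
    # Toy sequence, purely illustrative
    for pair, ratio in sorted(dinucleotide_obs_exp("ACGTTGCACGTA").items()):
        print(pair, round(ratio, 2))
```

A ratio near 1 means the pair occurs about as often as independence would predict; values well below 1, such as CG in the tables above, indicate depletion.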
3.2 Alignment of Pairs of Sequences
• The most basic sequence analysis task is to ask whether two sequences are related.
• This is usually done by first aligning the sequences (or parts of them) and then deciding whether that alignment is more likely to have occurred because the sequences are related or just by chance.
• Sequence alignment is the procedure of comparing two (pairwise alignment) or more (multiple sequence alignment) sequences by searching for a series of individual characters or character patterns that are in the same order in the sequences.
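To make the idea of a pairwise alignment concrete, here is a minimal sketch of one standard technique, Needleman-Wunsch global alignment by dynamic programming. The slides above do not name a specific algorithm, so this choice is an assumption, as are the scoring values (match = 1, mismatch = -1, linear gap = -2), which are illustrative rather than recommended.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b).

    Scoring parameters are illustrative assumptions, not values from the text.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,     # gap in b
                score[i][j - 1] + gap,     # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, x, y = needleman_wunsch("ACGT", "AGT")
    print(s)
    print(x)
    print(y)
```

The alignment score alone does not say whether two sequences are related; judging that requires comparing the score with what unrelated sequences would produce by chance, which this sketch does not attempt.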
[Screenshot: BLAST home page, showing Web BLAST (Nucleotide BLAST, Protein BLAST, blastx, tblastn), BLAST Genomes search, Standalone and API BLAST, and Specialized searches (SmartBLAST, Primer-BLAST, Global Align, CD-search, VecScreen, MOLE-BLAST, and others)]