Raw data processing: assigning reads to cells (UMI-tools)

Outline: Introduction • Raw data processing • Expression matrix processing and visualization • Cell type annotation • Case study

UMI-tools: Tools for dealing with Unique Molecular Identifiers

Step 1: Get data
Step 2: Identify correct cell barcodes
Step 3: Extract barcodes and UMIs and add to read names
Step 4: Map reads
Step 5: Assign reads to genes
Step 6: Count UMIs per gene per cell

R1 (raw reads; each 28-base read is a 16-base cell barcode followed by a 12-base UMI):

    @A00228:279:HFWFVDMXX:1:1101:8486:1000 1:N:0:NCATTACT
    NGTGATTAGCTGTACTCGTATGTAAGGT
    +
    #FFFFFFFFFFFFFFFFFFFFFFFFFFF
    @A00228:279:HFWFVDMXX:1:1101:10782:1000 1:N:0:NCATTACT
    NTCATGAAGTTTGGCTAGTTATGTTCAT
    +
    #FFFFFFFFFFFFFFFFFFFFFFFFFFF

whitelist.txt (columns: accepted cell barcode; error-containing barcodes assigned to it; read count of the accepted barcode; read counts of the error-containing barcodes):

    AAACCCAAGGAGAGTA    AAAACCAAGGAGAGTA,AAACACAAGGAGAGTA,...    50531    51,67,...
    AAACGCTTCAGCCCAG    AAAAGCTTCAGCCCAG,AAACACTTCAGCCCAG,...    46071    47,11,...
    AAAGAACAGACGACTG    AAACAACAGACGACTG,AAAGAAAAGACGACTG,...    33466    1,28,...
    AAAGAACCAATGGCAG    AAAAAACCAATGGCAG,AAAGAAACAATGGCAG,...    21680    1,28,...
    AAAGAACGTCTGCAAT    AAAAAACGTCTGCAAT,AAACAACGTCTGCAAT,...    46538    2,1,...
    AAAGGATAGTAGACAT    AAAAGATAGTAGACAT,AAACGATAGTAGACAT,...    56493    2,1,...
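The whitelist can be turned into a lookup table that maps every observed barcode to its accepted barcode. The following is a minimal sketch, assuming the file is tab-separated with the four columns shown on the slide; the function name `load_whitelist` is hypothetical, and the sample row keeps only the two neighbouring barcodes visible on the slide (the real lists are longer).

```python
import io


def load_whitelist(handle):
    """Read a UMI-tools style whitelist (assumed tab-separated, 4 columns).

    Returns (correction, counts):
      correction maps each accepted or error-containing barcode to its
      accepted barcode; counts maps each accepted barcode to its read count.
    """
    correction = {}
    counts = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        accepted = fields[0]
        correction[accepted] = accepted
        counts[accepted] = int(fields[2])
        # Column 2 can be empty when no neighbouring barcodes were observed.
        for neighbour in filter(None, fields[1].split(",")):
            correction[neighbour] = accepted
    return correction, counts


# Sample row from the slide, truncated to the two neighbours that are visible.
sample = (
    "AAACCCAAGGAGAGTA\tAAAACCAAGGAGAGTA,AAACACAAGGAGAGTA\t50531\t51,67\n"
    "AAACGCTTCAGCCCAG\tAAAAGCTTCAGCCCAG,AAACACTTCAGCCCAG\t46071\t47,11\n"
)
correction, counts = load_whitelist(io.StringIO(sample))
print(correction["AAAACCAAGGAGAGTA"])
```

A read whose barcode is missing from `correction` would be dropped when filtering against the whitelist.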
Raw data processing: assigning reads to cells (UMI-tools)

Step 3: Extract barcodes and UMIs and add to read names

R1_extracted (the cell barcode and UMI are moved into the read name; the 28-base read is fully consumed, so the sequence and quality lines are empty):

    @A00228:279:HFWFVDMXX:1:1101:8486:1000_NGTGATTAGCTGTACT_CGTATGTAAGGT 1:N:0:NCATTACT

    +

    @A00228:279:HFWFVDMXX:1:1101:10782:1000_NTCATGAAGTTTGGCT_AGTTATGTTCAT 1:N:0:NCATTACT

    +


R2_extracted (the cDNA read, carrying the same barcode and UMI in its name):

    @A00228:279:HFWFVDMXX:1:1101:8486:1000_NGTGATTAGCTGTACT_CGTATGTAAGGT 2:N:0:NCATTACT
    NACAAAGTCCCCCCCATAATACAGGGGGAGCCACTTGGGCAGGAGGCAGGGAGGGGTCCATTCCCCCTGGTGGGGCTGGTGGGGAGCTGTA
    +
    #FFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF
    @A00228:279:HFWFVDMXX:1:1101:10782:1000_NTCATGAAGTTTGGCT_AGTTATGTTCAT 2:N:0:NCATTACT
    NTTGCAGCTGAACTGGTAAACTTGTCCCTAAAGAGACATAAGAATGGTCAACTGGAATGTGGATTCATCTGTAACATTACTCAGTGGGCCT
    +
    #FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
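The renaming shown on this slide can be reproduced in a few lines. This is an illustrative sketch rather than the UMI-tools implementation: the function `extract_barcode_umi` is hypothetical, and it assumes a barcode pattern of 16 `C` positions (cell barcode) followed by 12 `N` positions (UMI), which matches the 28-base R1 reads shown earlier.

```python
def extract_barcode_umi(r1_name, r1_seq, pattern="C" * 16 + "N" * 12):
    """Move the cell barcode and UMI from the read sequence into the read name.

    pattern uses 'C' for cell-barcode positions and 'N' for UMI positions.
    Returns (new_name, cell_barcode, umi, remaining_sequence).
    """
    if len(r1_seq) < len(pattern):
        raise ValueError("read is shorter than the barcode pattern")
    cell = "".join(base for base, p in zip(r1_seq, pattern) if p == "C")
    umi = "".join(base for base, p in zip(r1_seq, pattern) if p == "N")
    # Keep the FASTQ comment (e.g. '1:N:0:NCATTACT') after the renamed identifier.
    read_id, _, comment = r1_name.partition(" ")
    new_name = f"{read_id}_{cell}_{umi}"
    if comment:
        new_name += " " + comment
    return new_name, cell, umi, r1_seq[len(pattern):]


new_name, cell, umi, rest = extract_barcode_umi(
    "@A00228:279:HFWFVDMXX:1:1101:8486:1000 1:N:0:NCATTACT",
    "NGTGATTAGCTGTACTCGTATGTAAGGT",
)
print(new_name)
```

The same suffix is appended to the name of the paired R2 read, so the barcode and UMI travel with the cDNA sequence through alignment.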
Raw data processing: alignment (STAR)

Step 4: Map reads

STAR: ultrafast universal RNA-seq aligner
Alexander Dobin, Carrie A. Davis, Felix Schlesinger, Jorg Drenkow, Chris Zaleski, Sonali Jha, Philippe Batut, Mark Chaisson and Thomas R. Gingeras
Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA and Pacific Biosciences, Menlo Park, CA, USA

• STAR alignment results (directory STAR_res/):

    Aligned.sortedByCoord.out.bam
    Log.final.out
    Log.out
    Log.progress.out
    SJ.out.tab
    _STARtmp/BAMsort
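A STAR call that would produce output files with these names can be assembled as follows. This is a sketch under assumptions: the slides do not show the actual command, so the index path, input file name, thread count, and the choice of `--outFilterMultimapNmax 1` and `--readFilesCommand zcat` (for gzipped input) are illustrative. The block only builds and prints the command; it does not run STAR.

```python
import shlex


def star_command(genome_dir, reads, out_prefix="STAR_res/", threads=4):
    """Build a STAR argument list that writes a coordinate-sorted BAM."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,            # hypothetical path to a STAR index
        "--readFilesIn", reads,               # the extracted cDNA reads (R2)
        "--readFilesCommand", "zcat",         # assumes gzip-compressed FASTQ
        "--outFilterMultimapNmax", "1",       # assumption: keep uniquely mapped reads only
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,    # yields STAR_res/Aligned.sortedByCoord.out.bam
    ]


cmd = star_command("path/to/star_index", "R2_extracted.fastq.gz")
print(shlex.join(cmd))
```

To execute it, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where STAR and a matching genome index are available.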
Raw data processing: alignment (STAR)

Log.final.out:

    Started job on | Oct 12 10:07:39
    Started mapping on | Oct 12 10:36:00
    Finished on | Oct 12 12:21:40
    Mapping speed, Million of reads per hour | 37.82
    Number of input reads | 66601887
    Average input read length | 91
    UNIQUE READS:
    Uniquely mapped reads number | 58334733
    Uniquely mapped reads % | 87.59%
    Average mapped length | 89.27
    Number of splices: Total | 9210314
    Number of splices: Annotated (sjdb) | 9023795
    Number of splices: GT/AG | 9116393
    Number of splices: GC/AG | 27095
    Number of splices: AT/AC | 1644
    Number of splices: Non-canonical | 65182
    Mismatch rate per base, % | 0.54%
    Deletion rate per base | 0.01%
    Deletion average length | 1.55
    Insertion rate per base | 0.02%
    Insertion average length | 1.38
    MULTI-MAPPING READS:
    Number of reads mapped to multiple loci | 0
    % of reads mapped to multiple loci | 0.00%
    Number of reads mapped to too many loci | 5401311
    % of reads mapped to too many loci | 8.11%
    UNMAPPED READS:
    Number of reads unmapped: too many mismatches | 0
    % of reads unmapped: too many mismatches | 0.00%
    Number of reads unmapped: too short | 2702824
    % of reads unmapped: too short | 4.06%
    Number of reads unmapped: other | 163019
    % of reads unmapped: other | 0.24%
    CHIMERIC READS:
    Number of chimeric reads | 0
    % of chimeric reads | 0.00%

• STAR alignment result file (bam/sam)

    Col   Field        Brief description
    1     QNAME        Query template NAME
    2     FLAG         bitwise FLAG
    3     RNAME        Reference sequence NAME
    4     POS          1-based leftmost mapping POSition
    5     MAPQ         MAPping Quality
    6     CIGAR        CIGAR string
    7     MRNM/RNEXT   Reference name of the mate/next read
    8     MPOS/PNEXT   Position of the mate/next read
    9     ISIZE/TLEN   observed Template LENgth
    10    SEQ          segment SEQuence
    11    QUAL         ASCII of Phred-scaled base QUALity+33
    12    TAGs         optional TAGs

Example record (a single tab-separated line in the file):

    A00228:279:HFWFVDMXX:1:1102:1108:22388_CTCAAGAGTCAAAGAT_TTTTGTCAATAG 0 1 14473 255 85M583N6M * 0 0 GGCTGGGTGGAGCCGTCCCCCCATGGAGCACAGGCAGACAAAAGTCCCCGCCCCAGCTGTGTGGCCTCAAGCCAGCCTGCGCCACTGTGTT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:79 nM:i:5
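To make the column table concrete, the sketch below splits the example record into named fields, recovers the cell barcode and UMI from QNAME, and interprets the CIGAR string. The helper names are hypothetical, and the quality string is simplified to all `F` for brevity; the other values are those of the example record.

```python
import re

SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_lengths(cigar):
    """Return (bases consumed on the read, bases spanned on the reference)."""
    query_len = ref_len = 0
    for count, op in CIGAR_OP.findall(cigar):
        count = int(count)
        if op in "MIS=X":   # operations that consume read bases
            query_len += count
        if op in "MDN=X":   # operations that consume reference bases
            ref_len += count
    return query_len, ref_len


def parse_sam_line(line):
    """Split one SAM alignment line into a dict of the 11 fixed fields plus tags."""
    cols = line.rstrip("\n").split("\t")
    rec = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        rec[key] = int(rec[key])
    rec["TAGS"] = {}
    for tag in cols[11:]:
        name, typ, value = tag.split(":", 2)
        rec["TAGS"][name] = int(value) if typ == "i" else value
    # The extract step appended '_<cell barcode>_<UMI>' to the read name.
    _, rec["CELL"], rec["UMI"] = rec["QNAME"].rsplit("_", 2)
    return rec


seq = ("GGCTGGGTGGAGCCGTCCCCCCATGGAGCACAGGCAGACAAAAGTCCCCGCCCCAGCTG"
       "TGTGGCCTCAAGCCAGCCTGCGCCACTGTGTT")
line = "\t".join([
    "A00228:279:HFWFVDMXX:1:1102:1108:22388_CTCAAGAGTCAAAGAT_TTTTGTCAATAG",
    "0", "1", "14473", "255", "85M583N6M", "*", "0", "0",
    seq, "F" * len(seq),          # quality simplified for this sketch
    "NH:i:1", "HI:i:1", "AS:i:79", "nM:i:5",
])
rec = parse_sam_line(line)
query_len, ref_len = cigar_lengths(rec["CIGAR"])
end_pos = rec["POS"] + ref_len - 1
print(rec["CELL"], rec["UMI"], query_len, ref_len, end_pos)
```

Here `85M583N6M` describes 85 aligned bases, a 583-base skipped region (an intron), and 6 more aligned bases, so the read contributes 91 bases while spanning 674 reference positions on chromosome 1.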
Raw data processing: quantification (featureCounts)

Step 5: Assign reads to genes

Subread package: high-performance read alignment, quantification and mutation discovery

featureCounts setting:

    Input files        : 1 BAM file
                         Aligned.sortedByCoord.out.bam
    Output file        : gene_assigned
    Summary            : gene_assigned.summary
    Annotation         : Homo_sapiens.GRCh38.i10.gtf (GTF)
    Dir for temp files : ./
    Assignment details : <input_file>.featureCounts.bam
                         (Note that files are saved to the output directory)

Output file format, example (one gene, shown field by field):

    Geneid                          ENSG00000269896
    Chr                             1;1
    Start                           2350414;2351644
    End                             2352820;2351857
    Strand                          -;-
    Length                          2407
    Aligned.sortedByCoord.out.bam   8

The Chr, Start, End and Strand fields hold one semicolon-separated entry per exon, and the last column is the number of reads assigned to the gene in that BAM file.
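The Length value in the example can be checked from the Start and End fields: it is the number of bases covered by the gene's exons once overlapping exons are merged. A minimal sketch, with the hypothetical helper `merged_exon_length`:

```python
def merged_exon_length(starts, ends):
    """Total number of bases covered by the exons, counting overlaps once.

    Coordinates are 1-based and inclusive, as in a GTF file.
    """
    intervals = sorted(zip(starts, ends))
    total = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end + 1:
            # Overlapping or directly adjacent exon: extend the current block.
            cur_end = max(cur_end, end)
        else:
            total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    total += cur_end - cur_start + 1
    return total


# Start and End fields of ENSG00000269896 from the example row.
starts = [int(x) for x in "2350414;2351644".split(";")]
ends = [int(x) for x in "2352820;2351857".split(";")]
length = merged_exon_length(starts, ends)
print(length)
```

The second exon (2351644 to 2351857) lies entirely inside the first, so the merged length is 2352820 - 2350414 + 1 = 2407, matching the Length column.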