archive:bioinformatic_tools:bwa

Differences

This shows you the differences between two versions of the page.

archive:bioinformatic_tools:bwa [2015/07/28 06:22]
ceisenhart ↷ Page moved from bioinformatic_tools:bwa to archive:bioinformatic_tools:bwa
archive:bioinformatic_tools:bwa [2015/09/04 09:06] (current)
68.180.228.52 ↷ Links adapted because of a move operation
Line 63: Line 63:
 BWA was used to estimate the distribution of insert sizes in the Illumina runs for banana slug. The 454 reads were used as the reference and the Illumina reads were mapped onto them. The distribution of the insert lengths can be inferred from the pairs that map onto the same 454 read. This is possible because our insert sizes are smaller than the size of the 454 reads.
  
-Here is the frequencies of each inferred insert length from the SAM file from the paired end alignments for Illumina run 2. The mean inferred insert size for the barcode 7 reads is 258 bases and 138 bases for the barcode 8 reads. Note that this differs considerably from the estimates of 411 bp for barcode 7 and 372bp for barcode 8 from the [[computer_resources:data|computer_resources:data]] page, which was based on bioanalyzer results for the DNA library. What is the discrepancy? The difference is that the Bioanalyzer results include the adapters, not just the DNA that is sequenced, so the difference is actually fairly small.
+Here is the frequencies of each inferred insert length from the SAM file from the paired end alignments for Illumina run 2. The mean inferred insert size for the barcode 7 reads is 258 bases and 138 bases for the barcode 8 reads. Note that this differs considerably from the estimates of 411 bp for barcode 7 and 372bp for barcode 8 from the [[archive:computer_resources:data|computer_resources:data]] page, which was based on bioanalyzer results for the DNA library. What is the discrepancy? The difference is that the Bioanalyzer results include the adapters, not just the DNA that is sequenced, so the difference is actually fairly small.
  
 Why does the barcode 8 graph cut off so abruptly? (overlapping reads?)
Line 72: Line 72:
  
 ===== After SeqPrep =====
-We ran [[bioinformatic_tools:seqprep|SeqPrep]] on run 2 to remove the Illumina adapter sequences and merge pairs that overlapped and mapped the remaining pairs to the 454 reference. SeqPrep removed most of the barcode 8 pairs that were mapped previously, but left most of the barcode 7 pairs that previously mapped unchanged.
+We ran [[archive:bioinformatic_tools:seqprep|SeqPrep]] on run 2 to remove the Illumina adapter sequences and merge pairs that overlapped and mapped the remaining pairs to the 454 reference. SeqPrep removed most of the barcode 8 pairs that were mapped previously, but left most of the barcode 7 pairs that previously mapped unchanged.
  
 {{:bioinformatic_tools:run2_seqprep_template_size_histogram.png|}}
Line 117: Line 117:
  
 ===== After Assembly =====
-This process was repeated using the [[bioinformatic_tools:soapdenovo|SOAPdenovo]] contigs from ''assemblies/slug/SOAPdenovo-assembly2/k47_w_454_contigs'' instead of the 454 reads as the reference. The shapes of these distributions follow the same patterns as the ones found using the 454 reads, although there are more mapped pairs because of the higher coverage of the contigs and we see some longer templates sizes than before. For the run1 and run2 barcode 7 pairs, the results involve some self-reference, since the previous estimates were used to build the contigs to which we mapped the pairs. However, the pairs for barcode 8 were not used in the assembly because of their negative mean
+This process was repeated using the [[archive:bioinformatic_tools:soapdenovo|SOAPdenovo]] contigs from ''assemblies/slug/SOAPdenovo-assembly2/k47_w_454_contigs'' instead of the 454 reads as the reference. The shapes of these distributions follow the same patterns as the ones found using the 454 reads, although there are more mapped pairs because of the higher coverage of the contigs and we see some longer templates sizes than before. For the run1 and run2 barcode 7 pairs, the results involve some self-reference, since the previous estimates were used to build the contigs to which we mapped the pairs. However, the pairs for barcode 8 were not used in the assembly because of their negative mean
 insert size (most pairs overlap but not in a way that SeqPrep could merge). When these were mapped to the contigs, we saw the same pattern as before but with more reads that mapped successfully.
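
The page describes inferring the insert-length distribution from pairs whose two reads map onto the same reference sequence, using the SAM file from the paired-end alignment. The script that produced the histograms is not shown here, so the following is only a minimal sketch of one way to do it. It assumes the aligner filled in the standard SAM TLEN column (field 9), counts each pair once by keeping only the record with a positive TLEN, and treats TLEN as the inferred length. The function names and the toy records are hypothetical.

```python
from collections import Counter


def template_length_histogram(sam_lines):
    """Count inferred template lengths (SAM TLEN) for pairs mapped to one reference.

    Assumption: TLEN (column 9) was set by the aligner. Only the record with a
    positive TLEN is counted, so each pair contributes once.
    """
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname, rnext, tlen = fields[2], fields[6], int(fields[8])
        if not flag & 0x1:
            continue  # read is not part of a pair
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue  # read or mate unmapped, or secondary/supplementary record
        if rnext not in ("=", rname):
            continue  # mate mapped onto a different reference sequence
        if tlen <= 0:
            continue  # keep only the positive-TLEN record of each pair
        counts[tlen] += 1
    return counts


def mean_template_length(counts):
    """Mean of the lengths in a histogram returned by template_length_histogram."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no usable pairs")
    return sum(length * n for length, n in counts.items()) / total


# Toy records (hypothetical names and coordinates), not data from the slug runs.
example_sam = [
    "@SQ\tSN:read454_1\tLN:400",
    "p1\t99\tread454_1\t10\t60\t50M\t=\t160\t200\t*\t*",
    "p1\t147\tread454_1\t160\t60\t50M\t=\t10\t-200\t*\t*",
    "p2\t99\tread454_1\t20\t60\t50M\t=\t120\t150\t*\t*",
    "p2\t147\tread454_1\t120\t60\t50M\t=\t20\t-150\t*\t*",
    "p3\t97\tread454_1\t5\t60\t50M\tread454_2\t30\t0\t*\t*",
    "p3\t145\tread454_2\t30\t60\t50M\tread454_1\t5\t0\t*\t*",
]
example_hist = template_length_histogram(example_sam)
```

On a real file, the same functions could be applied to an open file handle, for example ''template_length_histogram(open(path))'' with ''path'' pointing at the SAM output. Pairs whose reads land on different reference sequences (like the hypothetical ''p3'') are dropped, which matches the restriction to pairs that map onto the same 454 read.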
  
archive/bioinformatic_tools/bwa.1438064550.txt.gz · Last modified: 2015/07/28 06:22 by ceisenhart