Differences

This shows you the differences between two versions of the page.

--- archive:bioinformatic_tools:bwa [2011/05/20 18:53]
svohr
+++ archive:bioinformatic_tools:bwa [2011/06/08 18:22]
svohr
@@ Line 29: / Line 29: @@
 </code>
+Note that BWA does seem to accept gzipped files, so there is no need to ungzip the read files, though the documentation doesn't mention this.
 ===== Quirks =====
 The SAM formatted alignments include a column labeled "inferred insert length" by the BWA manual, but in the SAM specification it is described as the "template length" or distance between the leftmost mapped base to the rightmost mapped base. The second description seems to
@@ Line 73: / Line 74: @@
 {{:bioinformatic_tools:run2_seqprep_template_size_histogram.png|}}
-These histograms show the mapped lengths for the paired-end templates and the lengths of merged reads from SeqPrep along with the 454 read length distribution for comparison. In each of these, we can see the distinct range for the SeqPrep merged reads and the split between merged and unmerged pairs. Lengths less than 90 may be incorrect. The higher frequency of these in run 1 can be explained its higher coverage.
-In the merged lengths for both run 1 and run 2 barcode 8 there is a gap of 10 lengths (66-75 for run 1, 105-114 for run 2 bc08) where no reads were observed. This may be an artifact of SeqPrep and the read lengths.
+The next histogram shows the 454 read length distribution alongside the SeqPrep merged read lengths and the mapped lengths of the paired-end templates. From this we can see that the 454 reads are longer than the Illumina templates, so the short template lengths are not due to the lengths of the sequences in the 454 reference.
-{{:bioinformatic_tools:run1_seqprep_histogram.png|}}
+{{:bioinformatic_tools:run_all_illumina_v_454_histogram.png|}}
+These histograms show the mapped lengths for the paired-end templates and the lengths of merged reads from SeqPrep. In each of these, we can see the distinct range for the SeqPrep merged reads and the split between merged and unmerged pairs. The counts for the paired-end reads are lower than the counts for merged reads because the paired-end reads had to map to one of the 454 reads. We can see that if the paired-end counts were increased by a factor of 10 (for the 0.1x coverage in the 454 reads) the merged and paired-end reads would form a continuous distribution. Lengths less than 90 may be incorrect. The higher frequency of these in run 1 can be explained by its higher coverage.
+In the merged lengths for both run 1 and run 2 barcode 8 there is a gap of 10 lengths (66-75 for run 1, 105-114 for run 2 bc08) where no reads were observed. This is an artifact of SeqPrep's two methods for merging reads and the 10 base overlap requirement for merging.
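The factor-of-10 rescaling mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name ''rescale_counts'' and the toy counts are hypothetical and are not taken from the scripts or data behind these histograms.

```python
def rescale_counts(counts, coverage):
    """Scale observed per-length counts by 1/coverage.

    counts   -- dict mapping template length to observed count
    coverage -- fraction of the genome covered by the reference
                (0.1 for the 454 reads, per the text above)
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    factor = 1.0 / coverage
    return {length: count * factor for length, count in counts.items()}


# Hypothetical toy counts, NOT values from run 1 or run 2.
paired_counts = {120: 3, 130: 5, 140: 4}
scaled_counts = rescale_counts(paired_counts, coverage=0.1)
```

Plotting the rescaled paired-end counts next to the unscaled merged counts is one way to check whether the two form a continuous distribution.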
+{{:bioinformatic_tools:run1_seqprep_histogram_r2.png|}}
+{{:bioinformatic_tools:run2_bc07_seqprep_histogram_r2.png|}}
+{{:bioinformatic_tools:run2_bc08_seqprep_histogram_r2.png|}}
+===== After Quake =====
+This process was repeated on the data after Quake correction. In addition to the paired reads and merged reads, Quake separates the reads whose pair could not be successfully corrected.
+{{:bioinformatic_tools:run1_quake_histogram.png|}}
+{{:bioinformatic_tools:run2_bc07_quake_histogram.png|}}
+{{:bioinformatic_tools:run2_bc08_quake_histogram.png|}}
+==== Mean Lengths ====
+Run 1
+| Template | 151.7 |
+| Merged   | 113.3 |
+| Single   | 65.7  |
+Run 2 bc07
+| Template | 250.3 |
+| Merged   | 142.5 |
+| Single   |  88.8 |
+Run 2 bc08
+| Template |  94.5 |
+| Merged   | 104.2 |
+| Single   |  61.3 |
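For reference, a mean length like those tabulated above can be computed from a length histogram as a count-weighted average. This is a minimal sketch, assuming the means were taken over per-length counts; the function name ''mean_length'' and the toy histogram are hypothetical, and the page does not say how the tabulated values were actually produced.

```python
def mean_length(histogram):
    """Return the count-weighted mean of a {length: count} histogram."""
    total = sum(histogram.values())
    if total == 0:
        raise ValueError("histogram is empty")
    weighted_sum = sum(length * count for length, count in histogram.items())
    return weighted_sum / total


# Hypothetical toy histogram, NOT data from any of the runs.
toy_histogram = {100: 2, 110: 1, 120: 1}
toy_mean = mean_length(toy_histogram)  # (200 + 110 + 120) / 4
```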
+===== After Assembly =====
+This process was repeated using the [[bioinformatic_tools:soapdenovo|SOAPdenovo]] contigs from ''assemblies/slug/SOAPdenovo-assembly2/k47_w_454_contigs'' instead of the 454 reads as the reference. The shapes of these distributions follow the same patterns as the ones found using the 454 reads, although there are more mapped pairs because of the higher coverage of the contigs, and we see some longer template sizes than before. For the run1 and run2 barcode 7 pairs, the results involve some self-reference, since the previous estimates were used to build the contigs to which we mapped the pairs. However, the pairs for barcode 8 were not used in the assembly because of their negative mean insert size (most pairs overlap, but not in a way that SeqPrep could merge). When these were mapped to the contigs, we saw the same pattern as before but with more reads that mapped successfully.
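To illustrate how an insert size can be negative, the sketch below uses one common convention: the insert size is the template length minus the lengths of both reads, so it drops below zero when the two reads overlap. This convention is an assumption; the page does not state which definition was used, and the function names and numbers are hypothetical.

```python
def inner_insert_size(template_length, read1_length, read2_length):
    """Template length minus both read lengths (assumed convention).

    A negative result means the two reads overlap each other.
    """
    return template_length - read1_length - read2_length


def overlap_length(template_length, read1_length, read2_length):
    """Number of bases shared by the two reads, or 0 if they do not overlap."""
    return max(0, -inner_insert_size(template_length, read1_length, read2_length))


# Hypothetical values, NOT the read lengths used in these runs.
overlapping = inner_insert_size(95, 60, 60)   # reads overlap
separated = inner_insert_size(250, 100, 100)  # reads do not overlap
```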
+{{:bioinformatic_tools:run1_assembly_histogram.png|}}
+{{:bioinformatic_tools:run2_bc07_assembly_histogram.png|}}
+{{:bioinformatic_tools:run2_bc08_assembly_histogram.png|}}
-{{:bioinformatic_tools:run2_bc07_seqprep_histogram.png|}}
-{{:bioinformatic_tools:run2_bc08_seqprep_histogram.png|}}

Banana Slug Genomics
