archive:bioinformatic_tools:bwa

Differences

This shows you the differences between two versions of the page.

archive:bioinformatic_tools:bwa [2011/05/16 19:27]
svohr
archive:bioinformatic_tools:bwa [2011/05/20 18:53]
svohr
Line 28: Line 28:
 bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam
 </code>
 +
 +===== Quirks =====
 +The SAM-formatted alignments include a column that the BWA manual labels "inferred insert length", while the SAM specification describes it as the "template length": the distance from the leftmost mapped base to the rightmost mapped base. The second description seems to match the output of BWA. However, some template lengths do not appear to be calculated correctly.
 +
 +<code>
 +bc07_1.fastq:
 +@HWUSI-EAS1722:4:66:6286:18215#CAGATC/1
 +AGCAGTCGTCGTGGTATGCCTGGATGTTACAGCAGTCGTCGTGGTATGACTGGATGTTACAGCAGTCGTCGTGGTATGACTGGATGTTACAGCAGTCGTCGTGGTATGACTGGAT
 +
 +bc07_2.fastq:
 +@HWUSI-EAS1722:4:66:6286:18215#CAGATC/2
 +CACGACGACTGCTGTAACATCCAGGCATACCACGACGACTGCTGTAACATCCAGGCATACCACGACGACAGCTATAACATACACTCATACCACGA
 +</code>
 +
 +For example, the two reads above form a pair whose alignments overlap:
 +
 +<code>
 +...ACATCCAGTCATACCACGACGACTGCTGTAACATCCAGGCATACCACGACGACTGCT
 +                 CACGACGACTGCTGTAACATCCAGGCATACCACGACGACTGCTGTAACATCCAGGCATACCACGACGACAGCTATAACATACACTCATACCACGA
 +</code>
 +
 +Instead of the total template length, BWA reports the length of the overlap (43) in the template length column:
 +<code>
 +HWUSI-EAS1722:4:66:6286:18215#CAGATC 81 GAZ7HUX03HIJAL 272 23 115M = 344 -43
 +HWUSI-EAS1722:4:66:6286:18215#CAGATC 161 GAZ7HUX03HIJAL 344 25 95M = 272 43
 +</code>
 +
 +This explains the incorrectly short lengths found in our histograms. It does not appear to affect pairs that do not overlap, and most of these overlapping reads should be combined using SeqPrep.
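As a rough check on the example pair, the following Python sketch recomputes both quantities from the POS and CIGAR fields of the two SAM records shown in this section. The helper names (`ref_span`, `template_span`, `overlap_length`) are our own illustrations, not part of BWA or any SAM library, and the sketch assumes both mates map to the same reference sequence.

```python
import re

def ref_span(pos, cigar):
    """Return (leftmost, rightmost) 1-based reference coordinates of an alignment."""
    # Only M, D, N, = and X operations consume reference bases.
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    ref_len = sum(int(n) for n, op in ops if op in "MDN=X")
    return pos, pos + ref_len - 1

def template_span(pos1, cigar1, pos2, cigar2):
    """Leftmost mapped base to rightmost mapped base, as in the SAM specification."""
    s1, e1 = ref_span(pos1, cigar1)
    s2, e2 = ref_span(pos2, cigar2)
    return max(e1, e2) - min(s1, s2) + 1

def overlap_length(pos1, cigar1, pos2, cigar2):
    """Number of reference bases covered by both mates (0 if they do not overlap)."""
    s1, e1 = ref_span(pos1, cigar1)
    s2, e2 = ref_span(pos2, cigar2)
    return max(0, min(e1, e2) - max(s1, s2) + 1)

# POS and CIGAR values taken from the two SAM records in this section.
print(template_span(272, "115M", 344, "95M"))   # leftmost-to-rightmost span
print(overlap_length(272, "115M", 344, "95M"))  # bases shared by both mates
```

For this pair the overlap comes out as 43, matching the value BWA reported, while the leftmost-to-rightmost span is 167.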
 +
  
 ====== Determining Paired-End Insert Size ======
Line 43: Line 73:
 {{:bioinformatic_tools:run2_seqprep_template_size_histogram.png|}}
  
-Here is the same plot with the 454 length distribution for comparison.
+These histograms show the mapped lengths for the paired-end templates and the lengths of merged reads from SeqPrep, along with the 454 read length distribution for comparison. In each of these, we can see the distinct range for the SeqPrep merged reads and the split between merged and unmerged pairs. Lengths less than 90 may be incorrect. The higher frequency of these in run 1 can be explained by its higher coverage.
 +
 +In the merged lengths for both run 1 and run 2 barcode 8, there is a gap of 10 lengths (66-75 for run 1, 105-114 for run 2 bc08) where no reads were observed. This may be an artifact of SeqPrep and the read lengths.
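Gaps like these can be located automatically rather than by eye. The sketch below is one possible way to do it, assuming the merged read lengths are available as a plain list of integers. The function name `empty_runs` and the toy input are hypothetical; the toy lengths only imitate the shape of the run 1 gap and are not our data.

```python
from collections import Counter

def empty_runs(lengths, min_run=1):
    """Return (start, end) ranges of consecutive lengths with no observations.

    Only lengths between the smallest and largest observed value are checked.
    """
    counts = Counter(lengths)
    lo, hi = min(counts), max(counts)
    runs, start = [], None
    for length in range(lo, hi + 1):
        if counts[length] == 0:
            if start is None:
                start = length          # a new empty run begins here
        elif start is not None:
            if length - start >= min_run:
                runs.append((start, length - 1))
            start = None                # the run ended at the previous length
    return runs

# Toy input (made up for illustration): lengths 60-65 and 76-79 are observed.
toy_lengths = list(range(60, 66)) + list(range(76, 80))
print(empty_runs(toy_lengths, min_run=10))
```

Setting `min_run` filters out short gaps that arise by chance in sparsely populated parts of the histogram.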
  
-{{:bioinformatic_tools:run2_seqprep_template_size_histogram_with_454_length2.png|}}
+{{:bioinformatic_tools:run1_seqprep_histogram.png|}}
  
 +{{:​bioinformatic_tools:​run2_bc07_seqprep_histogram.png|}}
  
 +{{:​bioinformatic_tools:​run2_bc08_seqprep_histogram.png|}}
archive/bioinformatic_tools/bwa.txt · Last modified: 2015/09/04 09:06 by 68.180.228.52