archive:bioinformatic_tools:bwa

This is an old revision of the document!



====== Overview ======

BWA is a tool for aligning short reads to a reference genome using two different algorithms based on the Burrows-Wheeler Transform (BWT).

====== Install ======

The source for BWA can be found on the [[http://bio-bwa.sourceforge.net/| BWA SourceForge page]]. The archive (v0.5.9) was extracted to the ''/campusdata/BME235/programs'' directory and compiled, and the executable was copied to ''/campusdata/BME235/bin''.

====== Using BWA ======

The first step in using BWA is building an index of the reference database.

<code>
bwa index -a is database.fasta
</code>

There are two options for the index construction algorithm. The default option, ''is'', is relatively fast but only works on databases smaller than 2GB. The other option, ''bwtsw'', is slower but works with larger databases. The choice affects only how the index is built, not the accuracy of the alignments or the read lengths that can be aligned.

Next, the reads are aligned to the reference using the ''aln'' command.

<code>
bwa aln database.fasta short_read.fastq > aln_sa.sai
</code>

The reads in the FASTQ file are aligned against the reference database and the results are written to standard output in the ''.sai'' format, which is redirected to a file here.

The ''samse'' and ''sampe'' commands generate single-end and paired-end alignments, respectively, in SAM format from the FASTQ and SAI files.

<code>
bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam
bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam
</code>

====== Determining Paired-End Insert Size ======

BWA was used to estimate the distribution of insert sizes in the Illumina runs for banana slug. The 454 reads were used as the reference and the Illumina reads were mapped onto them. The distribution of insert lengths can be inferred from the pairs in which both reads map onto the same 454 read. This is possible because our insert sizes are smaller than the 454 reads.

Here are the frequencies of each inferred insert length, taken from the SAM file of the paired-end alignments for Illumina run 2. The mean inferred insert size is 258 bases for the barcode 7 reads and 138 bases for the barcode 8 reads.

{{:bioinformatic_tools:run2_insert_size_histogram.png|}}
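
The page does not record how the insert lengths were pulled out of the SAM file, so the following is only a sketch of one way it could be done, not necessarily the method used for the histogram above. It assumes that column 9 of each alignment line (TLEN in the SAM format) holds the inferred insert size, and it counts each pair once by keeping only the mate with a positive value. The function names ''insert_size_hist'' and ''insert_size_mean'' and the output file name are hypothetical.

```shell
# Hypothetical helper: tabulate inferred insert sizes from a paired-end SAM file.
# Assumption: column 9 (TLEN) is the inferred insert size. Header lines (starting
# with @) are skipped, and only positive values are kept so that each pair is
# counted once (the other mate carries the negated value).
insert_size_hist() {
    awk -F '\t' '!/^@/ && $9 > 0 { count[$9]++ }
        END { for (size in count) print size "\t" count[size] }' "$1" | sort -n
}

# Hypothetical helper: mean inferred insert size over the same set of pairs.
# Prints nothing if no pair has a positive insert size.
insert_size_mean() {
    awk -F '\t' '!/^@/ && $9 > 0 { sum += $9; n++ }
        END { if (n > 0) printf "%.1f\n", sum / n }' "$1"
}

# Example usage, with aln.sam being the output of bwa sampe:
# insert_size_hist aln.sam > insert_sizes.tsv
# insert_size_mean aln.sam
```

The first function prints one line per insert size (size, tab, number of pairs), which can be plotted directly as a histogram. Pairs whose reads map to different 454 reads have an insert size of 0 and are ignored, which matches the restriction to pairs that map onto the same 454 read.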

archive/bioinformatic_tools/bwa.1305258399.txt.gz · Last modified: 2011/05/13 03:46 by svohr