lecture_notes:05-19-2010
ALLPATHS
Presented by Thomas
ALLPATHS was created to improve reference genomes.
The version described here is optimized for 100-base Illumina reads.
Works with paired-end reads.
Requires high coverage: 40x or more raw read coverage for each library.
A minimum of 2 paired-end libraries: one short and one long
The short separation size must be less than twice the read size.
The distribution of sizes should be as narrow as possible, with a std dev of < 20%.
Long library insert size should be approximately 4000 bases and can have a wider size distribution.
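The library requirements above can be sketched as a quick sanity check before starting an assembly. This is only an illustration, not part of ALLPATHS: the function names and arguments are hypothetical, raw coverage is taken as total sequenced bases divided by genome size, and the 20% limit is assumed to be relative to the mean separation.

```python
def raw_coverage(num_reads, read_length, genome_size):
    """Raw read coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def check_library(kind, read_length, mean_sep, sd_sep, coverage):
    """Return a list of problems for one library ('short' or 'long').

    Thresholds follow the notes above; treating the 20% limit as a
    fraction of the mean separation is an assumption.
    """
    problems = []
    if coverage < 40:
        problems.append("coverage below 40x")
    if kind == "short":
        if mean_sep >= 2 * read_length:
            problems.append("separation not less than twice the read size")
        if sd_sep >= 0.20 * mean_sep:
            problems.append("std dev not below 20% of the mean")
    return problems


# Hypothetical numbers: 2.5 million 100-base reads on a 5 Mb genome.
cov = raw_coverage(2_500_000, 100, 5_000_000)
print(cov)
print(check_library("short", 100, 180, 20, cov))
print(check_library("short", 100, 250, 60, 30.0))
```

The long library is only checked for coverage here, since the notes give an approximate target size (about 4000 bases) rather than a hard limit.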
Installation
Requires the Boost libraries and an up-to-date C/C++ compiler.
Very long installation, over 2 hours of compilation time.
Download and extract the tarball
autoconf
./configure
make -j8 (parallel compilation)
make install scripts
Pipeline/Modules
All binaries are located in /bin
RunAllpaths3g controls the entire pipeline.
Directories are created for each new job so different assemblies can be compared.
Reference
Data: holds the reads fasta, qual, and pairs files. May contain many Run directories, each representing a particular attempt to assemble the original data using a different set of parameters.
Run
Assemblies
SubDir
OptionsFile
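A small sketch of how one directory per level keeps different assemblies side by side. It assumes the levels listed above are nested in the order given (Reference, then Data, then Run, then Assemblies, then SubDir); the root path and all directory names below are made up for illustration and are not created by ALLPATHS itself.

```python
from pathlib import PurePosixPath

# Assumed nesting order, taken from the order of the levels in the notes.
LEVELS = ("Reference", "Data", "Run", "Assemblies", "SubDir")


def job_path(root, names):
    """Join one directory name per level into a single job path."""
    if len(names) != len(LEVELS):
        raise ValueError("expected one name per level: " + ", ".join(LEVELS))
    return PurePosixPath(root).joinpath(*names)


# Two attempts on the same read data, differing only in the Run level.
run_a = job_path("/scratch/allpaths", ["ecoli", "data1", "run_a", "ASSEMBLIES", "test"])
run_b = job_path("/scratch/allpaths", ["ecoli", "data1", "run_b", "ASSEMBLIES", "test"])

print(run_a)
print(run_b)
# parents[2] strips SubDir, Assemblies and Run, leaving the shared Data directory.
print(run_a.parents[2] == run_b.parents[2])
```

Because both runs sit under the same Data directory, the reads are stored once and each parameter set gets its own Run directory for comparison.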
Preparing read data
ploidy file: 1 for haploid, 2 for diploid
Fragment library reads are expected to be oriented towards each other.
Jumping library reads are expected to be oriented away from each other.
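To make the two orientations concrete, here is a minimal sketch that classifies a read pair once both mates have been placed on some reference. It is not an ALLPATHS function: the name pair_orientation, the use of leftmost alignment start positions, and the "+"/"-" strand labels are all assumptions for illustration.

```python
def pair_orientation(pos1, strand1, pos2, strand2):
    """Classify a read pair from leftmost start positions and strands.

    'inward'  : forward read lies left of the reverse read  (-> <-),
                as expected for fragment libraries.
    'outward' : reverse read lies left of the forward read  (<- ->),
                as expected for jumping libraries.
    """
    if strand1 == strand2:
        return "same-strand"
    # Pick out which mate is on the forward strand.
    if strand1 == "+":
        fwd_pos, rev_pos = pos1, pos2
    else:
        fwd_pos, rev_pos = pos2, pos1
    return "inward" if fwd_pos <= rev_pos else "outward"


print(pair_orientation(100, "+", 250, "-"))   # fragment-style pair
print(pair_orientation(100, "-", 4000, "+"))  # jumping-style pair
```

A check like this can help spot a library whose reads were supplied in the wrong orientation before the assembly is started.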
Differences between ALLPATHS v1 and v2
Input:
Output
Some other things
Runtime
Error rate:
Mira
Presented by Michael Cusack
lecture_notes/05-19-2010.txt · Last modified: 2010/05/19 22:14 by hyjkim