This shows you the differences between two versions of the page.
archive:bioinformatic_tools:sga [2011/05/19 20:24] eyliaw |
archive:bioinformatic_tools:sga [2015/07/28 06:26] (current) ceisenhart ↷ Page moved from bioinformatic_tools:sga to archive:bioinformatic_tools:sga |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Sanger String Graph Assembler ====== | ====== Sanger String Graph Assembler ====== | ||
- | * Written by Jared Simpson((http://people.pwf.cam.ac.uk/js779/)). | + | * Written by [[http://people.pwf.cam.ac.uk/js779/|Jared Simpson]]. |
- | * Currently only has a GitHub((https://github.com/jts/sga)) repository. | + | * Currently only has a [[https://github.com/jts/sga|GitHub repository]]. |
- | * Efficient construction of an assembly string graph using the FM-index ((http://bioinformatics.oxfordjournals.org/content/26/12/i367.abstract)) | + | * Paper: Efficient construction of an assembly string graph using the FM-index [(cite:string_graph>Jared T. Simpson and Richard Durbin. Efficient construction of an assembly string graph using the FM-index. Bioinformatics 2010, 26(12): i367-i373. doi: [[http://dx.doi.org/10.1093/bioinformatics/btq217]])] |
+ | |||
+ | ===== Methods ===== | ||
+ | * Uses the Burrows-Wheeler Transform (BWT) and the Ferragina-Manzini (FM) index to build a string graph. | ||
+ | |||
+ | ===== String Graphs ===== | ||
+ | * Nodes are reads; a read that is a substring of another read is condensed into the read that contains it. Edges are overlaps between reads: the non-overlapping prefix is stored on the forward edge and the non-overlapping suffix is stored on the backward edge. | ||
+ | * Condenses repeats like a de Bruijn graph. | ||
+ | * More expensive to construct than a de Bruijn graph. | ||
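The node and edge definitions above can be sketched with a naive all-pairs overlap search. This is only an illustration of the data structure, not SGA's method (SGA finds overlaps through the FM-index). The function names, the ''min_len'' parameter, and the ''a_head''/''b_tail'' label names are assumptions made for this example, and reverse complements are ignored.

```python
def remove_contained(reads):
    """Drop duplicate reads and reads that are substrings of another read."""
    unique = sorted(set(reads))
    return [r for r in unique
            if not any(r != other and r in other for other in unique)]


def longest_overlap(a, b, min_len):
    """Length of the longest proper suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len symbols exists.
    """
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def build_overlap_graph(reads, min_len=3):
    """Return (nodes, edges) for a toy overlap graph.

    nodes: reads that are not contained in another read.
    edges: maps (a, b) to the overlap length plus the parts of a and b
           that lie outside the overlap (hypothetical label names).
    """
    nodes = remove_contained(reads)
    edges = {}
    for a in nodes:
        for b in nodes:
            if a == b:
                continue
            n = longest_overlap(a, b, min_len)
            if n > 0:
                edges[(a, b)] = {
                    "overlap": n,
                    "a_head": a[:len(a) - n],  # part of a before the overlap
                    "b_tail": b[n:],           # part of b after the overlap
                }
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = build_overlap_graph(["ACGTAC", "GTACGG", "TACGGA", "GTAC"])
    print(nodes)
    for (a, b), label in sorted(edges.items()):
        print(a, "->", b, label)
```

The quadratic pairwise comparison is what makes the naive construction expensive; it is shown here only because it is easy to read.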
+ | |||
+ | ===== BWT/FM-index ===== | ||
+ | * Like BWA, SGA uses the FM-index, a compressed data structure that supports suffix-array searches without storing the full suffix array. | ||
+ | * The Burrows-Wheeler transform B_X is the array of characters that precede each suffix of X when the suffixes are listed in lexicographically sorted (suffix array) order. | ||
+ | * The FM-index consists of two data structures: 1. C_X(a), the number of symbols in X that are lexicographically smaller than the symbol a; 2. Occ_X(a, i), the number of occurrences of the symbol a in B_X[1, i]. Together they allow substring searching and can be extended to construct the string graph. | ||
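To make the definitions of B_X, C_X and Occ_X concrete, here is a minimal, uncompressed sketch of substring counting by backward search. It assumes the text ends with a ''$'' sentinel that sorts before every other symbol, builds the suffix array naively, and stores Occ as full tables, so it shows the idea rather than how SGA or BWA implement it. All function names are made up for this example.

```python
from collections import Counter


def suffix_array(text):
    """Naive suffix array: start positions of the suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text):
    """B_X: the symbol preceding each suffix, in suffix-array order.

    text must end with a unique '$' sentinel; for the suffix starting at
    position 0, text[-1] wraps around to that sentinel.
    """
    return "".join(text[i - 1] for i in suffix_array(text))


def build_c(text):
    """C_X(a): number of symbols in text that are smaller than a."""
    counts = Counter(text)
    c, total = {}, 0
    for sym in sorted(counts):
        c[sym] = total
        total += counts[sym]
    return c


def build_occ(b):
    """Occ_X(a, i): occurrences of a in the first i symbols of b.

    Stored as occ[a][i] for i in 0..len(b); occ[a][0] is always 0.
    """
    occ = {sym: [0] * (len(b) + 1) for sym in set(b)}
    for i, ch in enumerate(b):
        for sym in occ:
            occ[sym][i + 1] = occ[sym][i] + (1 if sym == ch else 0)
    return occ


def count_occurrences(pattern, text):
    """Count occurrences of pattern in text using backward search."""
    b = bwt(text)
    c, occ = build_c(text), build_occ(b)
    lo, hi = 0, len(text)  # half-open range of suffix-array rows
    for ch in reversed(pattern):
        if ch not in c:
            return 0
        lo = c[ch] + occ[ch][lo]
        hi = c[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    text = "GATTACAGATTA$"
    print(bwt(text))
    print(count_occurrences("ATTA", text))
```

Each step of the loop narrows the range of sorted suffixes that start with the pattern suffix processed so far, which is why the search never needs the original text or the suffix array itself.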
+ | ===== String Graph Construction with the FM-index ===== | ||
+ | ===== Installation ===== | ||
+ | Installing SGA from GitHub is a major pain because it has so many dependencies. It needs: | ||
+ | * google-sparsehash (also needed for ABySS) | ||
+ | * hoard | ||
+ | * bamtools, which is a particular pain to install because it uses | ||
+ | * cmake (a newer version than the one installed on campusrocks). We could not get cmake to install anywhere other than the initial directory or root-access-only directories, so the usual "configure --prefix=/campusdata/BME235" does not apply. | ||
+ | |||
+ | Once cmake, bamtools, hoard, and google-sparsehash are all installed as well as they can be, the UCSC-install.bash script can be run to install SGA. Running SGA itself has not yet been tested. |