User Tools

Site Tools


archive:bioinformatic_tools:sga

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Last revision Both sides next revision
archive:bioinformatic_tools:sga [2011/05/19 20:26]
eyliaw
archive:bioinformatic_tools:sga [2011/06/08 23:50]
karplus [Installation]
Line 1: Line 1:
 ====== Sanger String Graph Assembler ====== ====== Sanger String Graph Assembler ======
-  * Written by Jared Simpson((http://​people.pwf.cam.ac.uk/​js779/​)) +  * Written by [[http://​people.pwf.cam.ac.uk/​js779/​|Jared Simpson]]. 
-  * Currently only has a GitHub((https://​github.com/​jts/​sga)) repository +  * Currently only has a [[https://​github.com/​jts/​sga|GitHub ​repository]]. 
-  * Efficient construction of an assembly string graph using the FM-index [(cite:​string_graph>​http://​bioinformatics.oxfordjournals.org/content/26/12/i367.abstract)]+  * Paper: ​Efficient construction of an assembly string graph using the FM-index [(cite:​string_graph>​Jared T. Simpson and Richard Durbin. Efficient construction of an assembly string graph using the FM-index. Bioinformatics 2010, 26(12): i367-i373. doi: [[http://dx.doi.org/10.1093/bioinformatics/btq217]])] 
 + 
 +===== Methods ====== 
 +  * Uses the Burrows-Wheeler Transform(BWT)/Ferragina—Manzini(FM)-index to build a string graph. 
 + 
 +===== String Graphs ====== 
 +  * Nodes are reads (reads that are substrings are condensed into superstrings).  Edges are overlaps between reads, and the non-overlapping prefix is stored in the forward edge and suffix is stored in the backwards edge. 
 +  * Condenses repeats like a de Bruijn graph. 
 +  * More expensive to construct than a de Bruijn graph. 
 + 
 +===== BWT/​FM-index ===== 
 +  * Like BWA, it uses the FM-index, which is a compressed method of inferring the suffix array. 
 +  * The Burrows Wheeler transform B_X is an array of the last characters in the alphabetically sorted suffix array. 
 +  * The FM-index (two data structures: 1. C_X(a) be the number of symbols in X that are lexographically lower than the symbol a, 2. Occ_X(a, i) be the number of occurrences of the symbol a in B_X[1, i], the ) allows substring searching and can be extended to construct the string graph. 
 +===== String Graph Construction with the FM-index ===== 
 +===== Installation ===== 
 +Installation of SGA from the GitHub is a major pain, because it has so many dependencies. ​ It needs 
 +  * google-sparsehash (also needed for Abyss) 
 +  * hoard 
 +  * bamtools, which is a particular pain to install because it uses 
 +    * cmake (newer than the version installed on campusrocks),​ and cmake does not support installing stuff anywhere other than in the initial directory or root-access-only directories,​ so we can't do the usual "​configure --prefix=/​campusdata/​BME235"​ 
 + 
 +Once cmake, bamtools, hoard, and google/​sparsehash are all installed as best they can be, then the UCSC-install.bash script can be run to install SGA.  Actually running SGA is still to be tested!
archive/bioinformatic_tools/sga.txt · Last modified: 2015/07/28 06:26 by ceisenhart