User Tools

Site Tools


archive:bioinformatic_tools:sga

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
archive:bioinformatic_tools:sga [2011/05/19 20:30]
eyliaw
archive:bioinformatic_tools:sga [2015/07/28 06:26] (current)
ceisenhart ↷ Page moved from bioinformatic_tools:sga to archive:bioinformatic_tools:sga
Line 2: Line 2:
   * Written by [[http://​people.pwf.cam.ac.uk/​js779/​|Jared Simpson]].   * Written by [[http://​people.pwf.cam.ac.uk/​js779/​|Jared Simpson]].
   * Currently only has a [[https://​github.com/​jts/​sga|GitHub repository]].   * Currently only has a [[https://​github.com/​jts/​sga|GitHub repository]].
-  * Efficient construction of an assembly string graph using the FM-index [(cite:​string_graph>​Jared T. Simpson and Richard Durbin. Efficient construction of an assembly string graph using the FM-index. Bioinformatics 2010, 26;12: i367-i373. doi: [[http://​dx.doi.org/​10.1093/​bioinformatics/​btq217]])]+  * Paper: ​Efficient construction of an assembly string graph using the FM-index [(cite:​string_graph>​Jared T. Simpson and Richard Durbin. Efficient construction of an assembly string graph using the FM-index. Bioinformatics 2010, 26(12): i367-i373. doi: [[http://​dx.doi.org/​10.1093/​bioinformatics/​btq217]])]
  
- http://bioinformatics.oxfordjournals.org/content/26/12/i367.abstract)]+===== Methods ====== 
 +  * Uses the Burrows-Wheeler Transform(BWT)/Ferragina—Manzini(FM)-index to build a string graph. 
 + 
 +===== String Graphs ====== 
 +  * Nodes are reads (reads that are substrings are condensed into superstrings). ​ Edges are overlaps between reads, and the non-overlapping prefix is stored in the forward edge and suffix is stored in the backwards edge. 
 +  * Condenses repeats like a de Bruijn graph. 
 +  * More expensive to construct than a de Bruijn graph. 
 + 
 +===== BWT/FM-index ===== 
 +  * Like BWA, it uses the FM-index, which is a compressed method of inferring the suffix array. 
 +  * The Burrows Wheeler transform B_X is an array of the last characters in the alphabetically sorted suffix array. 
 +  * The FM-index (two data structures: 1. C_X(a) be the number of symbols in X that are lexographically lower than the symbol a, 2. Occ_X(a, i) be the number of occurrences of the symbol a in B_X[1, i], the ) allows substring searching and can be extended to construct the string graph. 
 +===== String Graph Construction with the FM-index ===== 
 +===== Installation ===== 
 +Installation of SGA from the GitHub is a major pain, because it has so many dependencies. ​ It needs 
 +  * google-sparsehash (also needed for Abyss) 
 +  * hoard 
 +  * bamtools, which is a particular pain to install because it uses 
 +    * cmake (newer than the version installed on campusrocks),​ and cmake does not support installing stuff anywhere other than in the initial directory or root-access-only directories,​ so we can't do the usual "​configure --prefix=/campusdata/BME235"​ 
 + 
 +Once cmake, bamtools, hoard, and google/sparsehash are all installed as best they can be, then the UCSC-install.bash script can be run to install SGA.  Actually running SGA is still to be tested!
archive/bioinformatic_tools/sga.1305837026.txt.gz · Last modified: 2011/05/19 20:30 by eyliaw