Differences

This shows you the differences between two versions of the page.

--- archive:bioinformatic_tools:sga [2011/05/19 20:24]
eyliaw
+++ archive:bioinformatic_tools:sga [2015/07/28 06:26] (current)
ceisenhart ↷ Page moved from bioinformatic_tools:sga to archive:bioinformatic_tools:sga
@@ Line 1: / Line 1: @@
 ====== Sanger String Graph Assembler ======
-  * Written by Jared Simpson((http://people.pwf.cam.ac.uk/js779/)).
+  * Written by [[http://people.pwf.cam.ac.uk/js779/|Jared Simpson]].
-  * Currently only has a GitHub((https://github.com/jts/sga)) repository.
+  * Currently only has a [[https://github.com/jts/sga|GitHub repository]].
-  * Efficient construction of an assembly string graph using the FM-index ((http://bioinformatics.oxfordjournals.org/content/26/12/i367.abstract))
+  * Paper: Efficient construction of an assembly string graph using the FM-index [(cite:string_graph>Jared T. Simpson and Richard Durbin. Efficient construction of an assembly string graph using the FM-index. Bioinformatics 2010, 26(12): i367-i373. doi: [[http://dx.doi.org/10.1093/bioinformatics/btq217]])]
+===== Methods =====
+  * Uses the Burrows-Wheeler Transform (BWT) / Ferragina-Manzini (FM) index to build a string graph.
+===== String Graphs =====
+  * Nodes are reads (reads that are substrings of other reads are condensed into their superstrings).  Edges are overlaps between reads: the non-overlapping prefix is stored on the forward edge and the non-overlapping suffix on the backward edge.
+  * Condenses repeats like a de Bruijn graph.
+  * More expensive to construct than a de Bruijn graph.
+===== BWT/FM-index =====
+  * Like BWA, SGA uses the FM-index, a compressed structure from which the suffix array can be inferred.
+  * The Burrows-Wheeler transform B_X is the array of characters that precede each suffix in the lexicographically sorted suffix array (equivalently, the last column of the sorted rotations of X).
+  * The FM-index consists of two data structures: (1) C_X(a), the number of symbols in X that are lexicographically lower than the symbol a, and (2) Occ_X(a, i), the number of occurrences of the symbol a in B_X[1, i]. Together they allow substring searching and can be extended to construct the string graph.
+===== String Graph Construction with the FM-index =====
+===== Installation =====
+Installing SGA from GitHub is a major pain because it has so many dependencies.  It needs:
+  * google-sparsehash (also needed for ABySS)
+  * hoard
+  * bamtools, which is a particular pain to install because it uses
+    * cmake (newer than the version installed on campusrocks), and cmake does not support installing stuff anywhere other than in the initial directory or root-access-only directories, so we can't do the usual "configure --prefix=/campusdata/BME235"
+Once cmake, bamtools, hoard, and google-sparsehash are all installed as best they can be, the UCSC-install.bash script can be run to install SGA.  Actually running SGA is still to be tested!
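
The BWT/FM-index bullets above can be illustrated with a small sketch. This is not SGA's code: it assumes a naive suffix sort and uncompressed lookup tables, uses 0-based half-open intervals instead of the 1-based B_X[1, i] notation, and the names build_fm_index and count_occurrences are hypothetical. It only shows how C_X and Occ_X together support substring counting by backward search.

```python
def build_fm_index(text):
    """Build (bwt, C, occ) for text plus a '$' sentinel (illustrative only)."""
    x = text + "$"  # assumption: '$' sorts before every symbol in text
    # Naive suffix array: sort suffix start positions by the suffix itself.
    sa = sorted(range(len(x)), key=lambda i: x[i:])
    # B_X: the character preceding each suffix in sorted order.
    bwt = "".join(x[i - 1] for i in sa)
    alphabet = sorted(set(x))
    # C[a]: number of symbols in x that are lexicographically lower than a.
    C = {a: sum(1 for ch in x if ch < a) for a in alphabet}
    # occ[a][i]: number of occurrences of a in bwt[:i].
    occ = {a: [0] * (len(x) + 1) for a in alphabet}
    for i, ch in enumerate(bwt):
        for a in alphabet:
            occ[a][i + 1] = occ[a][i] + (1 if ch == a else 0)
    return bwt, C, occ


def count_occurrences(pattern, C, occ, n):
    """Count occurrences of pattern by backward search over the FM-index."""
    lo, hi = 0, n  # half-open interval [lo, hi) of suffix-array rows
    for a in reversed(pattern):
        if a not in C:
            return 0  # symbol never appears in the indexed text
        lo = C[a] + occ[a][lo]
        hi = C[a] + occ[a][hi]
        if lo >= hi:
            return 0  # interval became empty: no match
    return hi - lo


if __name__ == "__main__":
    bwt, C, occ = build_fm_index("banana")
    print(bwt, count_occurrences("ana", C, occ, len(bwt)))
```

Each step of the loop narrows the suffix-array interval to the suffixes that start with a progressively longer suffix of the pattern, so the search touches only the pattern's characters, never the text itself. A real index would replace the full occ table with sampled or compressed counts.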
