Sanger String Graph Assembler

Written by Jared Simpson.
Currently only has a GitHub repository.
Paper: Efficient construction of an assembly string graph using the FM-index [1]

Methods

Uses the Burrows-Wheeler Transform(BWT)/Ferragina—Manzini(FM)-index to build a string graph.

String Graphs

Nodes are reads (reads that are substrings are condensed into superstrings). Edges are overlaps between reads, and the non-overlapping prefix is stored in the forward edge and suffix is stored in the backwards edge.
Condenses repeats like a de Bruijn graph.
More expensive to construct than a de Bruijn graph.

BWT/FM-index

Like BWA, it uses the FM-index, which is a compressed method of inferring the suffix array.
The Burrows Wheeler transform B_X is an array of the last characters in the alphabetically sorted suffix array.
The FM-index (two data structures: 1. C_X(a) be the number of symbols in X that are lexographically lower than the symbol a, 2. Occ_X(a, i) be the number of occurrences of the symbol a in B_X[1, i], the ) allows substring searching and can be extended to construct the string graph.

String Graph Construction with the FM-index

Installation

Installation of SGA from the GitHub is a major pain, because it has so many dependencies. It needs

google-sparsehash (also needed for Abyss)
hoard
bamtools, which is a particular pain to install because it uses
- cmake (newer than the version installed on campusrocks), and cmake does not support installing stuff anywhere other than in the initial directory or root-access-only directories, so we can't do the usual “configure –prefix=/campusdata/BME235”

Once cmake, bamtools, hoard, and google/sparsehash are all installed as best they can be, then the UCSC-install.bash script can be run to install SGA. Actually running SGA is still to be tested!

1. ^a Jared T. Simpson and Richard Durbin. Efficient construction of an assembly string graph using the FM-index. Bioinformatics 2010, 26(12): i367-i373. doi: http://dx.doi.org/10.1093/bioinformatics/btq217

You could leave a comment if you were logged in.

Banana Slug Genomics

Table of Contents

Sanger String Graph Assembler

Methods

String Graphs

BWT/FM-index

String Graph Construction with the FM-index

Installation

Banana Slug Genomics

User Tools

Site Tools

Table of Contents

Sanger String Graph Assembler

Methods

String Graphs

BWT/FM-index

String Graph Construction with the FM-index

Installation

Page Tools