Differences

This shows you the differences between two versions of the page.

--- contributors:team_3:overlap [2015/04/15 00:06]
jolespin created
+++ contributors:team_3:overlap [2015/07/28 06:00] (current)
ceisenhart ↷ Page moved from overlap to contributors:team_3:overlap
@@ Line 1: / Line 1: @@
 <code>
-$ sga correct --help
-Usage: sga correct [OPTION] ... READSFILE
-Correct sequencing errors in all the reads in READSFILE
-        --help                           display this help and exit
-        -v, --verbose                    display verbose output
-        -p, --prefix=PREFIX              use PREFIX for the names of the index files (default: prefix of the input file)
-        -o, --outfile=FILE               write the corrected reads to FILE (default: READSFILE.ec.fa)
-        -t, --threads=NUM                use NUM threads for the computation (default: 1)
-             --discard                    detect and discard low-quality reads
-        -d, --sample-rate=N              use occurrence array sample rate of N in the FM-index. Higher values use significantly
-                                                    less memory at the cost of higher runtime. This value must be a power of 2 (default: 128)
-        -a, --algorithm=STR              specify the correction algorithm to use. STR must be one of kmer, hybrid, overlap. (default: kmer)
-             --metrics=FILE               collect error correction metrics (error rate by position in read, etc) and write them to FILE
-Kmer correction parameters:
-        -k, --kmer-size=N                The length of the kmer to use. (default: 31)
-        -x, --kmer-threshold=N           Attempt to correct kmers that are seen less than N times. (default: 3)
-        -i, --kmer-rounds=N              Perform N rounds of k-mer correction, correcting up to N bases (default: 10)
-             --learn                      Attempt to learn the k-mer correction threshold (experimental). Overrides -x parameter.
-Overlap correction parameters:
-        -e, --error-rate                 the maximum error rate allowed between two sequences to consider them overlapped (default: 0.04)
-        -m, --min-overlap=LEN            minimum overlap required between two reads (default: 45)
-        -c, --conflict=INT               use INT as the threshold to detect a conflicted base in the multi-overlap (default: 5)
-        -l, --seed-length=LEN            force the seed length to be LEN. By default, the seed length in the overlap step
-                                                    is calculated to guarantee all overlaps with --error-rate differences are found.
-                                                    This option removes the guarantee but will be (much) faster. As SGA can tolerate some
-                                                    missing edges, this option may be preferable for some data sets.
-        -s, --seed-stride=LEN            force the seed stride to be LEN. This parameter will be ignored unless --seed-length
-                                                    is specified (see above). This parameter defaults to the same value as --seed-length
-        -b, --branch-cutoff=N            stop the overlap search at N branches. This parameter is used to control the search time for
-                                                    highly-repetitive reads. If the number of branches exceeds N, the search stops and the read
-                                                    will not be corrected. This is not enabled by default.
-        -r, --rounds=NUM                 iteratively correct reads up to a maximum of NUM rounds (default: 1)
+$ sga overlap --help
+Usage: sga overlap [OPTION] ... READSFILE
+Compute pairwise overlap between all the sequences in READS
+      --help                           display this help and exit
+      -v, --verbose                    display verbose output
+      -t, --threads=NUM                use NUM worker threads to compute the overlaps (default: no threading)
+      -e, --error-rate                 the maximum error rate allowed to consider two sequences aligned (default: exact matches only)
+      -m, --min-overlap=LEN            minimum overlap required between two reads (default: 45)
+      -p, --prefix=PREFIX              use PREFIX for the names of the index files (default: prefix of the input file)
+      -f, --target-file=FILE           perform the overlap queries against the reads in FILE
+      -x, --exhaustive                 output all overlaps, including transitive edges
+          --exact                      force the use of the exact-mode irreducible block algorithm. This is faster
+                                       but requires that no substrings are present in the input set.
+      -l, --seed-length=LEN            force the seed length to be LEN. By default, the seed length in the overlap step
+                                       is calculated to guarantee all overlaps with --error-rate differences are found.
+                                       This option removes the guarantee but will be (much) faster. As SGA can tolerate some
+                                       missing edges, this option may be preferable for some data sets.
+      -s, --seed-stride=LEN            force the seed stride to be LEN. This parameter will be ignored unless --seed-length
+                                       is specified (see above). This parameter defaults to the same value as --seed-length
+      -d, --sample-rate=N              sample the symbol counts every N symbols in the FM-index. Higher values use significantly
+                                       less memory at the cost of higher runtime. This value must be a power of 2 (default: 128)
 Report bugs to js18@sanger.ac.uk
 </code>
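
As a rough illustration of how the options in the new help text fit together, the sketch below assembles an argument list for `sga overlap`. The helper name `build_overlap_cmd` and the file name `reads.fa` are hypothetical and are not part of SGA; only the flags `-t`, `-m`, `-d` and `-x` are taken from the help output above. The power-of-2 check mirrors the constraint stated for `--sample-rate`. The command is only built and printed, not executed.

```python
import shlex


def build_overlap_cmd(reads_file, threads=None, min_overlap=None,
                      sample_rate=None, exhaustive=False):
    """Assemble an argv list for `sga overlap` (hypothetical helper)."""
    cmd = ["sga", "overlap"]
    if threads is not None:
        cmd += ["-t", str(threads)]          # -t, --threads=NUM
    if min_overlap is not None:
        cmd += ["-m", str(min_overlap)]      # -m, --min-overlap=LEN
    if sample_rate is not None:
        # The help text requires the sample rate to be a power of 2.
        if sample_rate <= 0 or (sample_rate & (sample_rate - 1)) != 0:
            raise ValueError("sample rate must be a power of 2")
        cmd += ["-d", str(sample_rate)]      # -d, --sample-rate=N
    if exhaustive:
        cmd.append("-x")                     # -x, --exhaustive
    cmd.append(reads_file)                   # READSFILE positional argument
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line; nothing is run here.
    print(shlex.join(build_overlap_cmd("reads.fa", threads=4, min_overlap=45)))
```

Passing the resulting list to `subprocess.run` would launch the tool, assuming `sga` is installed and the reads have already been indexed.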
