contributors:team_3:overlap

Differences

This shows you the differences between two versions of the page.

contributors:team_3:overlap [2015/04/15 00:06]
jolespin created
contributors:team_3:overlap [2015/07/28 06:00] (current)
ceisenhart ↷ Page moved from overlap to contributors:team_3:overlap
Line 1:
 <code>
-$ sga correct --help
-Usage: sga correct [OPTION] ... READSFILE
-Correct sequencing errors in all the reads in READSFILE
+$ sga overlap --help
+Usage: sga overlap [OPTION] ... READSFILE
+Compute pairwise overlap between all the sequences in READS
 
-        --help                           display this help and exit
-        -v, --verbose                    display verbose output
-        -p, --prefix=PREFIX              use PREFIX for the names of the index files (default: prefix of the input file)
-        -o, --outfile=FILE               write the corrected reads to FILE (default: READSFILE.ec.fa)
-        -t, --threads=NUM                use NUM threads for the computation (default: 1)
-             --discard                    detect and discard low-quality reads
-        -d, --sample-rate=N              use occurrence array sample rate of N in the FM-index. Higher values use significantly
-                                                    less memory at the cost of higher runtime. This value must be a power of 2 (default: 128)
-        -a, --algorithm=STR              specify the correction algorithm to use. STR must be one of kmer, hybrid, overlap. (default: kmer)
-             --metrics=FILE               collect error correction metrics (error rate by position in read, etc) and write them to FILE
-
-Kmer correction parameters:
-        -k, --kmer-size=N                The length of the kmer to use. (default: 31)
-        -x, --kmer-threshold=N           Attempt to correct kmers that are seen less than N times. (default: 3)
-        -i, --kmer-rounds=N              Perform N rounds of k-mer correction, correcting up to N bases (default: 10)
-             --learn                      Attempt to learn the k-mer correction threshold (experimental). Overrides -x parameter.
-
-Overlap correction parameters:
-        -e, --error-rate                 the maximum error rate allowed between two sequences to consider them overlapped (default: 0.04)
-        -m, --min-overlap=LEN            minimum overlap required between two reads (default: 45)
-        -c, --conflict=INT               use INT as the threshold to detect a conflicted base in the multi-overlap (default: 5)
-        -l, --seed-length=LEN            force the seed length to be LEN. By default, the seed length in the overlap step
-                                                    is calculated to guarantee all overlaps with --error-rate differences are found.
-                                                    This option removes the guarantee but will be (much) faster. As SGA can tolerate some
-                                                    missing edges, this option may be preferable for some data sets.
-        -s, --seed-stride=LEN            force the seed stride to be LEN. This parameter will be ignored unless --seed-length
-                                                    is specified (see above). This parameter defaults to the same value as --seed-length
-        -b, --branch-cutoff=N            stop the overlap search at branches. This parameter is used to control the search time for
-                                                    highly-repetitive reads. If the number of branches exceeds N, the search stops and the read
-                                                    will not be corrected. This is not enabled by default.
-        -r, --rounds=NUM                 iteratively correct reads up to a maximum of NUM rounds (default: 1)
+      --help                           display this help and exit
+      -v, --verbose                    display verbose output
+      -t, --threads=NUM                use NUM worker threads to compute the overlaps (default: no threading)
+      -e, --error-rate                 the maximum error rate allowed to consider two sequences aligned (default: exact matches only)
+      -m, --min-overlap=LEN            minimum overlap required between two reads (default: 45)
+      -p, --prefix=PREFIX              use PREFIX for the names of the index files (default: prefix of the input file)
+      -f, --target-file=FILE           perform the overlap queries against the reads in FILE
+      -x, --exhaustive                 output all overlaps, including transitive edges
+          --exact                      force the use of the exact-mode irreducible block algorithm. This is faster
+                                       but requires that no substrings are present in the input set.
+      -l, --seed-length=LEN            force the seed length to be LEN. By default, the seed length in the overlap step
+                                       is calculated to guarantee all overlaps with --error-rate differences are found.
+                                       This option removes the guarantee but will be (much) faster. As SGA can tolerate some
+                                       missing edges, this option may be preferable for some data sets.
+      -s, --seed-stride=LEN            force the seed stride to be LEN. This parameter will be ignored unless --seed-length
+                                       is specified (see above). This parameter defaults to the same value as --seed-length
+      -d, --sample-rate=N              sample the symbol counts every N symbols in the FM-index. Higher values use significantly
+                                       less memory at the cost of higher runtime. This value must be a power of 2 (default: 128)
 
 Report bugs to js18@sanger.ac.uk
 </code>
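
As an illustration of how the options in the new help text fit together, here is a minimal Python sketch that assembles an ''sga overlap'' command line. The helper name ''build_overlap_cmd'', its keyword arguments, and the input file ''reads.fa'' are hypothetical and not part of SGA; only the flags themselves come from the help output. The sketch also checks the stated constraint that ''--sample-rate'' must be a power of 2. It only builds the argument list and does not run SGA.

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of 2 (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def build_overlap_cmd(reads_file, threads=None, error_rate=None,
                      min_overlap=None, sample_rate=None, exhaustive=False):
    """Assemble an argument list for `sga overlap`.

    Hypothetical helper: only the flags are taken from the help text.
    Options left as None are omitted so SGA falls back to its defaults.
    """
    cmd = ["sga", "overlap"]
    if threads is not None:
        cmd.append(f"--threads={threads}")
    if error_rate is not None:
        # The help text lists -e/--error-rate without a placeholder,
        # so the value is passed as a separate argument here (an assumption).
        cmd += ["-e", str(error_rate)]
    if min_overlap is not None:
        cmd.append(f"--min-overlap={min_overlap}")
    if sample_rate is not None:
        # The help text requires the sample rate to be a power of 2.
        if not is_power_of_two(sample_rate):
            raise ValueError("--sample-rate must be a power of 2")
        cmd.append(f"--sample-rate={sample_rate}")
    if exhaustive:
        cmd.append("--exhaustive")
    cmd.append(reads_file)
    return cmd


if __name__ == "__main__":
    # reads.fa is a placeholder file name.
    print(" ".join(build_overlap_cmd("reads.fa", threads=4,
                                     error_rate=0.02, min_overlap=45)))
```

The resulting list could be handed to ''subprocess.run'' once SGA is installed; omitting an option keeps the default shown in the help text (for example, exact matches only when ''-e'' is not given).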
contributors/team_3/overlap.1429056376.txt.gz · Last modified: 2015/04/15 00:06 by jolespin