Usage: sga correct [OPTION] ... READSFILE
Correct sequencing errors in all the reads in READSFILE
--help display this help and exit
-v, --verbose display verbose output
-p, --prefix=PREFIX use PREFIX for the names of the index files (default: prefix of the input file)
-o, --outfile=FILE write the corrected reads to FILE (default: READSFILE.ec.fa)
-t, --threads=NUM use NUM threads for the computation (default: 1)
--discard detect and discard low-quality reads
-d, --sample-rate=N use occurrence array sample rate of N in the FM-index. Higher values use significantly
    less memory at the cost of higher runtime. This value must be a power of 2 (default: 128)
-a, --algorithm=STR specify the correction algorithm to use. STR must be one of kmer, hybrid, overlap. (default: kmer)
--metrics=FILE collect error correction metrics (error rate by position in read, etc.) and write them to FILE
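Because -d must be a power of 2 and trades memory for runtime, the sketch below shows how a wrapper script might check the value before calling sga. The helper names (is_power_of_two, check_sample_rate, num_occurrence_samples) are hypothetical and not part of SGA. The sample count is only a rough illustration that assumes one checkpoint per N text positions. It is not SGA's actual FM-index memory layout.

```python
def is_power_of_two(n: int) -> bool:
    # A positive integer is a power of 2 exactly when it has a single set bit.
    return n > 0 and (n & (n - 1)) == 0


def check_sample_rate(d: int) -> int:
    # Hypothetical wrapper-side check mirroring the documented -d constraint.
    if not is_power_of_two(d):
        raise ValueError(f"sample rate must be a power of 2, got {d}")
    return d


def num_occurrence_samples(text_length: int, d: int) -> int:
    # Rough illustration (assumption): one checkpoint every d positions,
    # so doubling d roughly halves the number of stored checkpoints.
    return -(-text_length // check_sample_rate(d))  # ceiling division


print(num_occurrence_samples(1_000_000, 128))  # 7813
print(num_occurrence_samples(1_000_000, 256))  # 3907
```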
K-mer correction parameters:
-k, --kmer-size=N the length of the k-mer to use (default: 31)
-x, --kmer-threshold=N attempt to correct k-mers that are seen fewer than N times (default: 3)
-i, --kmer-rounds=N perform N rounds of k-mer correction, correcting up to N bases (default: 10)
--learn attempt to learn the k-mer correction threshold (experimental); overrides the -x parameter
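The -x threshold separates frequent ("solid") k-mers from rare ones that probably contain a sequencing error. The sketch below illustrates that general k-mer spectrum idea by counting k-mers in a small in-memory read set and reporting which k-mers of a read fall below the threshold. It is a simplification and an assumption, not SGA's implementation: SGA answers these queries with its FM-index, and this sketch ignores reverse complements. The function names are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    # Count every length-k substring across all reads (forward strand only).
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_positions(read, counts, k, threshold):
    # Start positions of k-mers seen fewer than `threshold` times,
    # i.e. the candidates a -x style threshold would mark for correction.
    return [i for i in range(len(read) - k + 1)
            if counts[read[i:i + k]] < threshold]


reads = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTTCGT"]
counts = kmer_counts(reads, k=4)
print(weak_kmer_positions("ACGTACGT", counts, k=4, threshold=3))  # []
print(weak_kmer_positions("ACGTTCGT", counts, k=4, threshold=3))  # [1, 2, 3, 4]
```

The single substitution in the last read (A to T at position 4) makes every k-mer that covers it rare, which is why those positions are flagged while the shared ACGT prefix is not.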
Overlap correction parameters:
-e, --error-rate the maximum error rate allowed between two sequences for them to be considered overlapping (default: 0.04)
-m, --min-overlap=LEN minimum overlap required between two reads (default: 45)
-c, --conflict=INT use INT as the threshold to detect a conflicted base in the multi-overlap (default: 5)
-l, --seed-length=LEN force the seed length to be LEN. By default, the seed length in the overlap step
    is calculated to guarantee that all overlaps with up to --error-rate differences are found.
    This option removes that guarantee but is (much) faster. Because SGA can tolerate some
    missing edges, this option may be preferable for some data sets.
-s, --seed-stride=LEN force the seed stride to be LEN. This parameter is ignored unless --seed-length
    is specified (see above). It defaults to the same value as --seed-length.
-b, --branch-cutoff=N stop the overlap search at N branches. This parameter controls the search time for
    highly repetitive reads. If the number of branches exceeds N, the search stops and the read
    is not corrected. This is not enabled by default.
-r, --rounds=NUM iteratively correct reads up to a maximum of NUM rounds (default: 1)
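The default seed length for overlap correction is described as guaranteeing that every overlap within --error-rate is found. One standard way to get such a guarantee is the pigeonhole argument. An overlap of length L with at most d = floor(error_rate * L) differences, cut into d + 1 non-overlapping pieces, must contain at least one piece with no differences. The sketch below computes a seed length that way and lists seed start positions for a given stride. This is an assumption for illustration: SGA's exact formula may differ. The function names are hypothetical.

```python
import math
from typing import List, Optional


def guaranteed_seed_length(min_overlap: int, error_rate: float) -> int:
    # Pigeonhole bound (assumed): with at most max_diffs differences,
    # splitting the overlap into max_diffs + 1 seeds leaves one exact seed.
    max_diffs = math.floor(error_rate * min_overlap)
    return min_overlap // (max_diffs + 1)


def seed_starts(read_length: int, seed_length: int,
                seed_stride: Optional[int] = None) -> List[int]:
    # As with -s, the stride defaults to the seed length (non-overlapping seeds).
    stride = seed_length if seed_stride is None else seed_stride
    return list(range(0, read_length - seed_length + 1, stride))


seed = guaranteed_seed_length(45, 0.04)  # defaults of -m and -e
print(seed)                              # 22
print(seed_starts(100, seed))            # [0, 22, 44, 66]
print(seed_starts(100, seed, 11))        # [0, 11, 22, 33, 44, 55, 66, 77]
```

With the defaults (-m 45, -e 0.04), at most one difference is allowed, so two 22-base seeds suffice. A larger --seed-length, as with -l, means fewer, longer seeds and a faster search, but an overlap whose differences hit every seed can then be missed.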
Report bugs to js18@sanger.ac.uk