This shows you the differences between two versions of the page.
contributors:team_3:overlap [2015/04/15 00:06] jolespin created |
contributors:team_3:overlap [2015/07/28 06:00] (current) ceisenhart ↷ Page moved from overlap to contributors:team_3:overlap |
||
---|---|---|---|
Line 1: | Line 1: | ||
<code> | <code> | ||
- | $ sga correct --help | + | $ sga overlap --help |
- | Usage: sga correct [OPTION] ... READSFILE | + | Usage: sga overlap [OPTION] ... READSFILE |
- | Correct sequencing errors in all the reads in READSFILE | + | Compute pairwise overlap between all the sequences in READS |
- | --help display this help and exit | + | --help display this help and exit |
- | -v, --verbose display verbose output | + | -v, --verbose display verbose output |
- | -p, --prefix=PREFIX use PREFIX for the names of the index files (default: prefix of the input file) | + | -t, --threads=NUM use NUM worker threads to compute the overlaps (default: no threading) |
- | -o, --outfile=FILE write the corrected reads to FILE (default: READSFILE.ec.fa) | + | -e, --error-rate the maximum error rate allowed to consider two sequences aligned (default: exact matches only) |
- | -t, --threads=NUM use NUM threads for the computation (default: 1) | + | -m, --min-overlap=LEN minimum overlap required between two reads (default: 45) |
- | --discard detect and discard low-quality reads | + | -p, --prefix=PREFIX use PREFIX for the names of the index files (default: prefix of the input file) |
- | -d, --sample-rate=N use occurrence array sample rate of N in the FM-index. Higher values use significantly | + | -f, --target-file=FILE perform the overlap queries against the reads in FILE |
- | less memory at the cost of higher runtime. This value must be a power of 2 (default: 128) | + | -x, --exhaustive output all overlaps, including transitive edges |
- | -a, --algorithm=STR specify the correction algorithm to use. STR must be one of kmer, hybrid, overlap. (default: kmer) | + | --exact force the use of the exact-mode irreducible block algorithm. This is faster |
- | --metrics=FILE collect error correction metrics (error rate by position in read, etc) and write them to FILE | + | but requires that no substrings are present in the input set. |
- | + | -l, --seed-length=LEN force the seed length to be LEN. By default, the seed length in the overlap step | |
- | Kmer correction parameters: | + | is calculated to guarantee all overlaps with --error-rate differences are found. |
- | -k, --kmer-size=N The length of the kmer to use. (default: 31) | + | This option removes the guarantee but will be (much) faster. As SGA can tolerate some |
- | -x, --kmer-threshold=N Attempt to correct kmers that are seen less than N times. (default: 3) | + | missing edges, this option may be preferable for some data sets. |
- | -i, --kmer-rounds=N Perform N rounds of k-mer correction, correcting up to N bases (default: 10) | + | -s, --seed-stride=LEN force the seed stride to be LEN. This parameter will be ignored unless --seed-length |
- | --learn Attempt to learn the k-mer correction threshold (experimental). Overrides -x parameter. | + | is specified (see above). This parameter defaults to the same value as --seed-length |
- | + | -d, --sample-rate=N sample the symbol counts every N symbols in the FM-index. Higher values use significantly | |
- | Overlap correction parameters: | + | less memory at the cost of higher runtime. This value must be a power of 2 (default: 128) |
- | -e, --error-rate the maximum error rate allowed between two sequences to consider them overlapped (default: 0.04) | + | |
- | -m, --min-overlap=LEN minimum overlap required between two reads (default: 45) | + | |
- | -c, --conflict=INT use INT as the threshold to detect a conflicted base in the multi-overlap (default: 5) | + | |
- | -l, --seed-length=LEN force the seed length to be LEN. By default, the seed length in the overlap step | + | |
- | is calculated to guarantee all overlaps with --error-rate differences are found. | + | |
- | This option removes the guarantee but will be (much) faster. As SGA can tolerate some | + | |
- | missing edges, this option may be preferable for some data sets. | + | |
- | -s, --seed-stride=LEN force the seed stride to be LEN. This parameter will be ignored unless --seed-length | + | |
- | is specified (see above). This parameter defaults to the same value as --seed-length | + | |
- | -b, --branch-cutoff=N stop the overlap search at N branches. This parameter is used to control the search time for | + | |
- | highly-repetitive reads. If the number of branches exceeds N, the search stops and the read | + | |
- | will not be corrected. This is not enabled by default. | + | |
- | -r, --rounds=NUM iteratively correct reads up to a maximum of NUM rounds (default: 1) | + | |
Report bugs to js18@sanger.ac.uk | Report bugs to js18@sanger.ac.uk | ||
</code> | </code> |
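
The options in the `sga overlap` help text above can be combined into a single invocation. The following is a minimal sketch, not part of SGA itself: `build_overlap_command` is a hypothetical helper that only assembles an argument list from flags shown in the help output (`-t`, `-e`, `-m`, `-d`, `-x`) and checks the one constraint the help text states explicitly, that the sample rate must be a power of 2. It assumes the short options accept a space-separated value and that the index files referred to by `--prefix` already exist for the reads file.

```python
from typing import List, Optional


def build_overlap_command(
    reads_file: str,
    threads: Optional[int] = None,
    error_rate: Optional[float] = None,
    min_overlap: Optional[int] = None,
    sample_rate: Optional[int] = None,
    exhaustive: bool = False,
) -> List[str]:
    """Assemble an argument list for `sga overlap` (hypothetical helper).

    Only flags listed in the help text are emitted. Options left as None
    are omitted, so the tool's own defaults apply.
    """
    if threads is not None and threads < 1:
        raise ValueError("threads must be at least 1")
    if error_rate is not None and not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    if min_overlap is not None and min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # The help text requires the sample rate to be a power of 2.
    if sample_rate is not None and (
        sample_rate < 1 or sample_rate & (sample_rate - 1) != 0
    ):
        raise ValueError("sample_rate must be a power of 2")

    cmd = ["sga", "overlap"]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if error_rate is not None:
        cmd += ["-e", str(error_rate)]
    if min_overlap is not None:
        cmd += ["-m", str(min_overlap)]
    if sample_rate is not None:
        cmd += ["-d", str(sample_rate)]
    if exhaustive:
        cmd.append("-x")
    cmd.append(reads_file)  # READSFILE is the final positional argument
    return cmd
```

The returned list can be passed to `subprocess.run` on a machine where `sga` is installed; the file name `reads.fa` used in any call is a placeholder, not a name taken from this page.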