contributors:team_3:correct
Usage: sga correct [OPTION] ... READSFILE
Correct sequencing errors in all the reads in READSFILE

      --help                           display this help and exit
      -v, --verbose                    display verbose output
      -p, --prefix=PREFIX              use PREFIX for the names of the index files (default: prefix of the input file)
      -o, --outfile=FILE               write the corrected reads to FILE (default: READSFILE.ec.fa)
      -t, --threads=NUM                use NUM threads for the computation (default: 1)
          --discard                    detect and discard low-quality reads
      -d, --sample-rate=N              use occurrence array sample rate of N in the FM-index. Higher values use significantly
                                       less memory at the cost of higher runtime. This value must be a power of 2 (default: 128)
      -a, --algorithm=STR              specify the correction algorithm to use. STR must be one of kmer, hybrid, overlap. (default: kmer)
          --metrics=FILE               collect error correction metrics (error rate by position in read, etc) and write them to FILE
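
The general options above can be combined as in the following dry-run sketch. It only prints a command line and does not run SGA. The helper names `is_power_of_two` and `build_correct_cmd` and the file names are hypothetical; the flags are the ones listed above. The check mirrors the stated requirement that the sample rate be a power of 2.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if $1 is a positive power of two,
# as the help text requires for -d/--sample-rate.
is_power_of_two() {
    n=$1
    case $n in ''|*[!0-9]*) return 1 ;; esac   # reject non-numeric input
    [ "$n" -gt 0 ] || return 1
    while [ $((n % 2)) -eq 0 ]; do n=$((n / 2)); done
    [ "$n" -eq 1 ]
}

# Hypothetical helper: print (not run) an sga correct command line.
# Arguments: READSFILE THREADS SAMPLE_RATE OUTFILE
build_correct_cmd() {
    reads=$1; threads=$2; rate=$3; out=$4
    if ! is_power_of_two "$rate"; then
        echo "sample rate must be a power of 2: $rate" >&2
        return 1
    fi
    echo "sga correct -t $threads -d $rate -o $out $reads"
}

build_correct_cmd reads.fastq 4 64 reads.ec.fa
# prints: sga correct -t 4 -d 64 -o reads.ec.fa reads.fastq
```

Per the help text, a larger `-d` value trades runtime for lower memory use, so the value chosen here (64) is only an illustration.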

Kmer correction parameters:
      -k, --kmer-size=N                The length of the kmer to use. (default: 31)
      -x, --kmer-threshold=N           Attempt to correct kmers that are seen less than N times. (default: 3)
      -i, --kmer-rounds=N              Perform N rounds of k-mer correction, correcting up to N bases (default: 10)
          --learn                      Attempt to learn the k-mer correction threshold (experimental). Overrides -x parameter.
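
A k-mer correction run using the defaults listed above might look like the sketch below. It is a dry run: `kmer_correct_plan` is a hypothetical helper that prints the commands rather than executing them, and the file names are placeholders. It assumes, since `-p` refers to existing index files, that an FM-index has been built beforehand with `sga index`.

```shell
#!/bin/sh
# Hypothetical helper: print the two-step plan for k-mer correction.
# Arguments: READSFILE OUTFILE
kmer_correct_plan() {
    reads=$1; out=$2
    # Assumption: the FM-index is built first, using the default prefix.
    echo "sga index $reads"
    # Explicitly spell out the documented defaults: k=31, threshold=3, rounds=10.
    echo "sga correct -a kmer -k 31 -x 3 -i 10 -o $out $reads"
}

kmer_correct_plan reads.fastq reads.ec.fa
```

To let SGA estimate the threshold instead, `-x 3` could be replaced with `--learn`, which the help text marks as experimental.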

Overlap correction parameters:
      -e, --error-rate                 the maximum error rate allowed between two sequences to consider them overlapped (default: 0.04)
      -m, --min-overlap=LEN            minimum overlap required between two reads (default: 45)
      -c, --conflict=INT               use INT as the threshold to detect a conflicted base in the multi-overlap (default: 5)
      -l, --seed-length=LEN            force the seed length to be LEN. By default, the seed length in the overlap step
                                       is calculated to guarantee all overlaps with --error-rate differences are found.
                                       This option removes the guarantee but will be (much) faster. As SGA can tolerate some
                                       missing edges, this option may be preferable for some data sets.
      -s, --seed-stride=LEN            force the seed stride to be LEN. This parameter will be ignored unless --seed-length
                                       is specified (see above). This parameter defaults to the same value as --seed-length
      -b, --branch-cutoff=N            stop the overlap search at N branches. This parameter is used to control the search time for
                                       highly-repetitive reads. If the number of branches exceeds N, the search stops and the read
                                       will not be corrected. This is not enabled by default.
      -r, --rounds=NUM                 iteratively correct reads up to a maximum of NUM rounds (default: 1)
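
The trade-off described for `--seed-length` can be illustrated with a pigeonhole argument: if an overlap of length L contains at most d differences, cutting it into d + 1 pieces leaves at least one piece error-free, so seeds of length floor(L / (d + 1)) cannot miss it. The sketch below computes that bound. It is an illustration under this assumption only; SGA's internal calculation may differ, and `pigeonhole_seed_length` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: largest seed length that still guarantees an exact
# seed match, by the pigeonhole argument (not necessarily SGA's formula).
# Arguments: ERROR_RATE MIN_OVERLAP
pigeonhole_seed_length() {
    awk -v e="$1" -v L="$2" 'BEGIN {
        d = e * L                 # maximum expected differences (may be fractional)
        c = int(d)
        if (c < d) c++            # round up to a whole number of differences
        print int(L / (c + 1))    # one of c+1 pieces must be error-free
    }'
}

# With the documented defaults (-e 0.04, -m 45): 2 differences, 3 pieces.
pigeonhole_seed_length 0.04 45
# prints: 15
```

Passing a longer seed with `-l` than this bound would, under the same assumption, speed up the search while giving up the guarantee, which matches the behaviour the help text describes.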

Report bugs to js18@sanger.ac.uk
contributors/team_3/correct.txt · Last modified: 2015/09/02 16:24 by ceisenhart