contributors:team_3:preprocess
$ sga preprocess --help
Usage: sga preprocess [OPTION] READS1 READS2 ...
Prepare READS1, READS2, ... data files for assembly
If pe-mode is turned on (pe-mode=1) then if a read is discarded its pair will be discarded as well.

        --help                           display this help and exit
        -v, --verbose                    display verbose output
             --seed                       set random seed

Input/Output options:
        -o, --out=FILE                   write the reads to FILE (default: stdout)
        -p, --pe-mode=INT                0 - do not treat reads as paired (default)
                                         1 - reads are paired with the first read in READS1 and the second
                                                    read in READS2. The paired reads will be interleaved in the output file
                                         2 - reads are paired and the records are interleaved within a single file.
             --pe-orphans=FILE            if one half of a read pair fails filtering, write the passed half to FILE

Conversions/Filtering:
             --phred64                    convert quality values from phred-64 to phred-33.
             --discard-quality            do not output quality scores
        -q, --quality-trim=INT           perform Heng Li's BWA quality trim algorithm. 
                                                    Reads are trimmed according to the formula:
                                                    argmax_x{\sum_{i=x+1}^l(INT-q_i)} if q_l<INT
                                                    where l is the original read length.
        -f, --quality-filter=INT         discard the read if it contains more than INT low-quality bases.
                                                    Bases with phred score <= 3 are considered low quality. Default: no filtering.
                                                    The filtering is applied after trimming so bases removed are not counted.
                                                    Do not use this option if you are planning to use the BCR algorithm for indexing.
        -m, --min-length=INT             discard sequences that are shorter than INT
                                                    this is most useful when used in conjunction with --quality-trim. Default: 40
        -h, --hard-clip=INT              clip all reads to be length INT. In most cases it is better to use
                                                    the soft clip (quality-trim) option.
        --permute-ambiguous              Randomly change ambiguous base calls to one of possible bases.
                                                    If this option is not specified, the entire read will be discarded.
        -s, --sample=FLOAT               Randomly sample reads or pairs with acceptance probability FLOAT.
        --dust                           Perform dust-style filtering of low complexity reads.
        --dust-threshold=FLOAT           filter out reads that have a dust score higher than FLOAT (default: 4.0).
        --suffix=SUFFIX                  append SUFFIX to each read ID
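
As a rough guide to how --quality-trim, --quality-filter and --min-length interact, the following Python sketch reads the trimming formula literally and then applies the filters to the trimmed read. It is not SGA's or BWA's code: the function names are made up for this example, the order of the length check relative to the low-quality check is an assumption, and real implementations may differ in details such as stopping the scan early or enforcing a minimum retained length.

```python
LOW_QUALITY_CUTOFF = 3  # from the help text: phred score <= 3 is low quality


def decode_phred(qual_string, offset=33):
    """Convert an ASCII quality string to integer phred scores.

    Use offset=64 for phred-64 input (what --phred64 converts from).
    """
    return [ord(ch) - offset for ch in qual_string]


def quality_trim_length(quals, threshold):
    """Return x, the number of leading bases kept after trimming.

    Literal reading of argmax_x sum_{i=x+1}^{l} (threshold - q_i),
    applied only if the last base has quality below the threshold.
    On ties this sketch keeps the longer read (an assumption).
    """
    length = len(quals)
    if length == 0 or quals[-1] >= threshold:
        return length
    best_x, best_sum, running = length, 0, 0
    for x in range(length - 1, -1, -1):
        running += threshold - quals[x]  # 0-based quals[x] is q_{x+1}
        if running > best_sum:
            best_sum, best_x = running, x
    return best_x


def keep_read(quals, trim_threshold, max_low_quality, min_length=40):
    """Trim first, then filter, so trimmed bases are not counted."""
    kept = quals[:quality_trim_length(quals, trim_threshold)]
    if len(kept) < min_length:
        return False
    low = sum(1 for q in kept if q <= LOW_QUALITY_CUTOFF)
    return low <= max_low_quality


quals = [30, 30, 30, 30, 10, 5, 2]
print(quality_trim_length(quals, 20))
print(keep_read(quals, 20, 0, min_length=4))
```

In the example, the three low-quality bases at the end are trimmed away before the low-quality count is taken, which is why the read still passes a filter that allows zero low-quality bases.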

Adapter/Primer checks:
             --no-primer-check            disable the default check for primer sequences
        -r, --remove-adapter-fwd=STRING
        -c, --remove-adapter-rev=STRING  Remove the adapter STRING from input reads.

Report bugs to js18@sanger.ac.uk
contributors/team_3/preprocess.txt · Last modified: 2015/07/28 06:00 by ceisenhart