$ sga filter --help
Usage: sga filter [OPTION] ... READSFILE
Remove reads from a data set.
The currently available filters are removing exact-match duplicates
and removing reads with low-frequency k-mers.
Automatically rebuilds the FM-index without the discarded reads.
--help display this help and exit
-v, --verbose display verbose output
-p, --prefix=PREFIX use PREFIX for the names of the index files (default: prefix of the input file)
-o, --outfile=FILE write the qc-passed reads to FILE (default: READSFILE.filter.pass.fa)
-t, --threads=NUM use NUM threads to compute the overlaps (default: 1)
-d, --sample-rate=N use occurrence array sample rate of N in the FM-index. Higher values use significantly
less memory at the cost of higher runtime. This value must be a power of 2 (default: 128)
--no-duplicate-check turn off duplicate removal
--substring-only when removing duplicates, only remove substring sequences, not full-length matches
--no-kmer-check turn off the kmer check
--kmer-both-strand minimum kmer coverage is required for both strands
--homopolymer-check check reads for homopolymer run length sequencing errors
--low-complexity-check filter out low complexity reads
K-mer filter options:
-k, --kmer-size=N The length of the kmer to use. (default: 27)
-x, --kmer-threshold=N Require at least N kmer coverage for each kmer in a read. (default: 3)
Report bugs to js18@sanger.ac.uk
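
As a rough illustration of how the options above combine, the following sketch assembles one `sga filter` invocation and prints it. The input name `reads.ec.fastq`, the thread count, and the explicit output name are hypothetical placeholders, and the sketch assumes the FM-index for the input has already been built. The command is only executed when `sga` is on the PATH and the input file exists.

```shell
#!/bin/sh
# Hypothetical input file; its FM-index is assumed to exist already.
READS="reads.ec.fastq"
THREADS=4

# Assemble the command from flags listed in the help text:
#   -t  worker threads
#   -k  k-mer length (27 is the documented default)
#   -x  minimum coverage required for each k-mer (3 is the documented default)
#   -o  file that receives the reads passing the filters
set -- sga filter \
    -t "$THREADS" \
    -k 27 -x 3 \
    --homopolymer-check \
    --low-complexity-check \
    -o "${READS%.*}.filter.pass.fa" \
    "$READS"

# Show the command; run it only if sga is installed and the input is present.
echo "$*"
if command -v sga >/dev/null 2>&1 && [ -f "$READS" ]; then
    "$@"
fi
```

Dropping `-k` and `-x` leaves the k-mer filter at its defaults, and adding `--no-kmer-check` or `--no-duplicate-check` disables the corresponding filter entirely.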
Discussion