SARUMAN1) is a tool for aligning short reads to a microbial reference genome that computes alignments using GPU programming.
Most of the paper describes the process for deciding which reads will be aligned on the GPU.
SARUMAN requires an error tolerance e, the maximum acceptable number of errors when mapping a read. An index of the reference genome is built using a k-mer size chosen so that each read can be split into e+1 k-mers plus a remainder. By the pigeonhole principle, if the read matches the reference within the error tolerance, at least one of those k-mers must occur in the reference without errors. A read is required to have two hits in the index table within the expected proximity before it is designated for alignment. The substring to be aligned against is then extracted from the reference and enqueued with the read for alignment.
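The filtering step can be illustrated with a minimal CPU-side sketch. It is not SARUMAN's implementation: the function names, the dictionary-based index, and the reading of "expected proximity" as "implied read starts differ by at most e" are all assumptions made for illustration. It also only uses the e+1 non-overlapping k-mers taken from the start of the read, so with `min_hits=2` it can reject a read that has exactly e errors. How the tool itself reconciles the two-hit requirement with the pigeonhole guarantee is not covered in this summary.

```python
from collections import defaultdict


def kmer_size(read_len, max_errors):
    """Largest k that lets a read be split into max_errors + 1 k-mers (plus a remainder)."""
    return read_len // (max_errors + 1)


def build_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_starts(read, index, k, max_errors, min_hits=2):
    """Reference positions where the read may start, supported by at least
    min_hits distinct k-mers of the read (hypothetical filter, see lead-in)."""
    hits = []  # (implied start of the read in the reference, k-mer offset in the read)
    for i in range(max_errors + 1):
        offset = i * k
        for pos in index.get(read[offset:offset + k], ()):
            hits.append((pos - offset, offset))
    hits.sort()

    candidates = set()
    for j, (start, _) in enumerate(hits):
        # Distinct read k-mers whose implied start lies within max_errors of this one.
        offsets = {off for s, off in hits[j:] if s - start <= max_errors}
        if len(offsets) >= min_hits:
            candidates.add(start)
    return sorted(candidates)


if __name__ == "__main__":
    reference = "TTGACCGATAGGCTTACGGATCCAGTACGTTAGC"
    read = reference[8:20]           # 12 bases, exact copy of the reference
    e = 1
    k = kmer_size(len(read), e)      # 12 // 2 == 6
    index = build_index(reference, k)
    print(candidate_starts(read, index, k, e))  # [8]
```

For each candidate start, the reference substring to align against would then be cut out around that position (padded by e on each side to allow for indels) and queued together with the read.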
To amortize the cost of transferring data to the GPU, read and reference-substring pairs are transferred and aligned in batches. Reads are aligned using the Needleman–Wunsch algorithm. If all the values in a column of the alignment matrix exceed the error tolerance, the alignment is stopped early. The alignments are then transferred back to the CPU for formatting and output.
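The early-termination rule can be sketched on the CPU as follows. This is an illustration under assumptions, not the GPU kernel from the paper: it uses unit costs (1 per mismatch, insertion, or deletion), which the summary does not specify, and the function name `bounded_alignment_errors` is hypothetical. Because costs never decrease along an alignment path, a column whose smallest value already exceeds e cannot lead to an acceptable alignment.

```python
def bounded_alignment_errors(read, target, max_errors):
    """Global (Needleman-Wunsch style) alignment cost of read vs. target with
    unit costs, or None as soon as the cost is known to exceed max_errors."""
    # prev[i] = cost of aligning read[:i] against the target prefix processed so far.
    prev = list(range(len(read) + 1))  # column 0: read prefixes against an empty target
    for j in range(1, len(target) + 1):
        curr = [j]  # empty read prefix against target[:j]
        for i in range(1, len(read) + 1):
            substitute = prev[i - 1] + (read[i - 1] != target[j - 1])
            gap_in_read = prev[i] + 1
            gap_in_target = curr[i - 1] + 1
            curr.append(min(substitute, gap_in_read, gap_in_target))
        if min(curr) > max_errors:
            return None  # every cell in this column exceeds the tolerance: stop early
        prev = curr
    return prev[-1] if prev[-1] <= max_errors else None


if __name__ == "__main__":
    print(bounded_alignment_errors("ACGT", "AGGT", 1))  # 1
    print(bounded_alignment_errors("ACGT", "TTTT", 1))  # None
```

Only two columns are kept in memory at a time, which is enough to report the number of errors. Recovering the alignment itself would require keeping the full matrix or traceback pointers.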
SARUMAN performs well on small genomes, with run times comparable to BWA, but the size of its index makes it impractical for use on the banana slug genome.