SeqPrep is a program that merges overlapping paired-end Illumina reads into a single longer read. It can also be used only for its adapter-trimming feature, without doing any paired-end overlap. When an adapter sequence is present, the two reads must overlap (in most cases), so they are forcibly merged. When reads do not contain adapter sequence, merging must be done with care, so several user-defined parameters must be met before two reads are merged. The default parameters were chosen with specificity in mind, so that SeqPrep can be run on libraries where very few reads are expected to overlap. It is still safest to reserve the overlapping procedure for libraries where you have prior knowledge that a significant portion of the reads will overlap. (Description from SeqPrep README file)
SeqPrep works by first searching for adapter sequences at the ends of reads. If adapter sequences are found, it trims them off. If read merging is requested, it then forces a merge between the two reads, because an adapter can only be present when the insert size is less than the total read length.
SeqPrep calculates gapless alignments, and treats low quality mismatches differently from high quality mismatches.
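The idea of a gapless overlap search that distinguishes low-quality from high-quality mismatches can be sketched as follows. This is an illustration only, not SeqPrep's actual implementation: the function name, the thresholds (`min_overlap`, `q_cutoff`, `min_match_frac`, `max_hq_mismatch_frac`) and the rule that a mismatch counts as high quality only when both bases meet the cutoff are all assumptions. It also only handles the case where the query starts inside the subject.

```python
def best_gapless_overlap(subj, subj_q, query, query_q,
                         min_overlap=10, q_cutoff=13,
                         min_match_frac=0.9, max_hq_mismatch_frac=0.02):
    """Return (offset, overlap_length) of the best gapless overlap, or None.

    offset is the position in subj at which query starts.
    subj_q and query_q are lists of integer quality scores.
    All thresholds are hypothetical, chosen only for illustration.
    """
    best = None
    for offset in range(len(subj) - min_overlap + 1):
        overlap = min(len(subj) - offset, len(query))
        if overlap < min_overlap:
            continue
        matches = 0
        hq_mismatches = 0
        for i in range(overlap):
            if subj[offset + i] == query[i]:
                matches += 1
            elif subj_q[offset + i] >= q_cutoff and query_q[i] >= q_cutoff:
                # Both bases are confident, so this disagreement is penalised.
                hq_mismatches += 1
            # Low-quality mismatches are tolerated and not counted here.
        if matches < min_match_frac * overlap:
            continue
        if hq_mismatches > max_hq_mismatch_frac * overlap:
            continue
        if best is None or matches > best[2]:
            best = (offset, overlap, matches)
    return None if best is None else best[:2]
```

In this sketch no insertions or deletions are ever considered: the query is simply slid along the subject one position at a time, which is what makes the alignment gapless.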
SeqPrep takes FASTQ reads in phred+33 or phred+64 format as input. If phred+64 input is supplied, the -6 argument must be given, and the output is converted to phred+33 format.
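The conversion between the two encodings is a fixed shift of each quality character. A minimal sketch, assuming standard phred+64 input with non-negative scores (the function name is hypothetical and this is not SeqPrep's own code):

```python
def phred64_to_phred33(quality: str) -> str:
    """Re-encode a phred+64 quality string as phred+33."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # phred+64: ASCII code minus 64 is the score
        if score < 0:
            # Assumption: negative (Solexa-style) scores are rejected.
            raise ValueError(f"{ch!r} is not a valid phred+64 character")
        converted.append(chr(score + 33))  # phred+33: score plus 33
    return "".join(converted)
```

For example, the phred+64 character `h` (score 40) becomes `I` in phred+33.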
ID: ILLUMINA-1898B0_0023_FC:5:1:10763:1500#0/1
SUBJ: CTTCTGAAAATAGTGTAACTCAGGGGGAAGAGGGAGGTGACCCAGTCCATCAGCCCTGGCGCTGACTTGGGATTGGGATAGGGTGAGCTGGAAAAATAGC
              |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
QUER:         AATAGTGTGACTCAGGGGGAAGAGGGAGGTGACCCAGTCCATCAGCCCTGGCGCTGACTTGGGATTGGGATAGGGTGAGCTGGAAAAATAGCCATCATTG
MERG: CTTCTGAAAATAGTGTAACTCAGGGGGAAGAGGGAGGTGACCCAGTCCATCAGCCCTGGCGCTGACTTGGGATTGGGATAGGGTGAGCTGGAAAAATAGCCATCATTG

ID: ILLUMINA-1898B0_0023_FC:5:1:13576:1501#0/1
SUBJ: CTAACACTAAGGTACATGCACTGAGAGTCTAGGTAATAAGTGATGACACCGATTAAACTCTCTGAGACCAG
      |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
QUER: CTAACACTAAGGTACATGCACTGAGAGTCTAGGTAATAAGTGATGACACCGATTAAACTCTCTGAGACCAG
MERG: CTAACACTAAGGTACATGCACTGAGAGTCTAGGTAATAAGTGATGACACCGATTAAACTCTCTGAGACCAG

ID: ILLUMINA-1898B0_0023_FC:5:1:16440:1505#0/1
SUBJ: CTAAGTGCTTTTATTTTCCACCTGCAGAAAAGTTAACAAAGGAAAAAGTAAAGTGAAACATCTTCAAAAAGAAGAGCAAGTAATCCACATAAAAATGCAT
                                                                                               ||||||||  |
QUER:                                                                                         TTAAAAATGATTGGCCATTAGTTCTACAAAAGAAGTTATTTATAAATTGTCTATTTATTGGATAGTGAGAATAATGGTTCTGCAACCTGTTCAACATCGA
MERG: CTAAGTGCTTTTATTTTCCACCTGCAGAAAAGTTAACAAAGGAAAAAGTAAAGTGAAACATCTTCAAAAAGAAGAGCAAGTAATCCACATAAAAATGCATGGCCATTAGTTCTACAAAAGAAGTTATTTATAAATTGTCTATTTATTGGATAGTGAGAATAATGGTTCTGCAACCTGTTCAACATCGA
where a vertical bar (|) marks a match between the two sequences, a blank means a low-quality mismatch, and a * means a high-quality mismatch. You can use this human-readable file to check that the chosen parameters make sense.
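The marking scheme described above can be reproduced for any pair of already-aligned, equal-offset sequences. The sketch below is a hypothetical helper, not part of SeqPrep; in particular, the cutoff value and the rule that a mismatch is "high quality" only when both bases meet the cutoff are assumptions.

```python
def match_line(subj, subj_q, query, query_q, q_cutoff=13):
    """Build the marker line for two aligned sequences.

    '|' marks a match, '*' a high-quality mismatch, ' ' a low-quality mismatch.
    subj_q and query_q are lists of integer quality scores.
    q_cutoff is a hypothetical threshold chosen for illustration.
    """
    chars = []
    for s, sq, q, qq in zip(subj, subj_q, query, query_q):
        if s == q:
            chars.append("|")
        elif sq >= q_cutoff and qq >= q_cutoff:
            chars.append("*")
        else:
            chars.append(" ")
    return "".join(chars)
```

A line made up almost entirely of bars, with only blanks in between, suggests the merge is well supported; asterisks point at confident disagreements that may deserve a closer look.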
All output will be gzipped.
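Because the output is gzip-compressed, downstream scripts need to decompress it while reading. A minimal sketch using Python's standard `gzip` module, assuming plain four-line FASTQ records (the function name is hypothetical):

```python
import gzip


def read_fastq_gz(path):
    """Yield (name, sequence, quality) tuples from a gzipped FASTQ file.

    Assumes each record occupies exactly four lines.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            qual = handle.readline().rstrip("\n")
            yield header[1:], seq, qual  # drop the leading '@'
```

For instance, iterating over `read_fastq_gz("Jan2011_Merged/s_1_S.fastq.gz")` would walk through a merged-read file one record at a time without unpacking it on disk.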
To install SeqPrep, download it and run make. A binary is created, which can then be copied to the location of your choice. By default, make install places the binary in your $HOME/bin folder.
SeqPrep -f Jan2011_33/s_1_1.fastq.gz -r Jan2011_33/s_1_2.fastq.gz -1 Jan2011_Merged/s_1_1.fastq.gz -2 Jan2011_Merged/s_1_2.fastq.gz -s Jan2011_Merged/s_1_S.fastq.gz -E Jan2011_Merged/s_1_S.align.gz