
SeqPrep

SeqPrep is a program that merges overlapping paired-end Illumina reads into a single longer read. It can also be used just for its adapter-trimming feature, without doing any paired-end overlap. When an adapter sequence is present, the two reads must (in most cases) overlap, so they are forcefully merged. When reads do not contain adapter sequence, they must be merged with care, so several user-defined parameters for read merging must be met. The default parameters were chosen with specificity in mind, so that they can be run on libraries where very few reads are expected to overlap. It is always safest, though, to reserve the overlapping procedure for libraries where you have some prior knowledge that a significant portion of the reads will overlap. (Description from the SeqPrep README file)

Overview

SeqPrep first searches for adapter sequences at the ends of reads and trims off any it finds. If read merging is requested, it then forces a merge between the two reads, because the presence of an adapter means the insert is shorter than the read length, so the two mates must overlap.
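To make the adapter logic concrete, here is a minimal Python sketch of 3' adapter detection and trimming. It is not SeqPrep's code: it uses exact matching only, the min_overlap parameter is hypothetical, and EXAMPLE_ADAPTER is simply a common Illumina adapter prefix chosen for illustration. The returned index doubles as the insert length, which shows why a detected adapter implies overlap: an insert shorter than the read is covered end to end by both mates.

```python
# Hypothetical adapter used only for illustration; real runs use the
# adapter sequences appropriate to the library.
EXAMPLE_ADAPTER = "AGATCGGAAGAGC"


def find_adapter(read: str, adapter: str, min_overlap: int = 6) -> int:
    """Return the index where `adapter` starts in `read`, or -1 if absent.

    A hit is either the whole adapter inside the read, or an adapter prefix
    of at least `min_overlap` bases running off the 3' end of the read.
    Exact matching only: this sketch tolerates no mismatches.
    """
    for start in range(len(read) - min_overlap + 1):
        tail = read[start:start + len(adapter)]
        if adapter.startswith(tail):
            return start
    return -1


def trim_adapter(read: str, qual: str, adapter: str, min_overlap: int = 6):
    """Cut the adapter and everything after it from a read and its qualities.

    Returns (read, qual, insert_len); insert_len is None when no adapter
    was found, otherwise it is the length of the remaining insert.
    """
    start = find_adapter(read, adapter, min_overlap)
    if start == -1:
        return read, qual, None
    return read[:start], qual[:start], start
```

For example, a 20 bp read whose last 10 bases are adapter comes back as a 10 bp insert, shorter than either mate, so the pair can be merged without searching for an overlap.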

SeqPrep computes gapless alignments and treats low-quality mismatches differently from high-quality mismatches.
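The Python sketch below illustrates what a gapless, quality-aware overlap search can look like. The second read is assumed to have been reverse-complemented already (revcomp is included for that step), and qualities are decoded integer Phred scores. The cutoffs min_q, min_overlap and max_high_mm_frac, and the rule that the offset with the most matches wins, are assumptions for illustration, not SeqPrep's actual parameters or scoring.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq: str) -> str:
    """Reverse complement; read 2 must be flipped before aligning to read 1."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def score_offset(s1, q1, s2, q2, offset, min_q=20):
    """Gapless comparison of s1[offset:] against the start of s2.

    Returns (matches, low_quality_mismatches, high_quality_mismatches).
    A mismatch is "high quality" only if both bases have quality >= min_q.
    """
    matches = low_mm = high_mm = 0
    for a, qa, b, qb in zip(s1[offset:], q1[offset:], s2, q2):
        if a == b:
            matches += 1
        elif qa >= min_q and qb >= min_q:
            high_mm += 1
        else:
            low_mm += 1
    return matches, low_mm, high_mm


def best_overlap(s1, q1, s2, q2, min_overlap=10, max_high_mm_frac=0.02, min_q=20):
    """Return the offset of the best gapless overlap of s2 onto s1, or None.

    Low-quality mismatches are tolerated; an offset is rejected when its
    high-quality mismatches exceed max_high_mm_frac of the overlap length.
    Among the surviving offsets, the one with the most matches wins.
    """
    best_offset, best_matches = None, -1
    for offset in range(len(s1) - min_overlap + 1):
        m, lo, hi = score_offset(s1, q1, s2, q2, offset, min_q)
        overlap = m + lo + hi
        if overlap < min_overlap or hi > max_high_mm_frac * overlap:
            continue
        if m > best_matches:
            best_offset, best_matches = offset, m
    return best_offset
```

Because low-quality mismatches are counted but never used to reject an offset, a single noisy base call cannot veto an otherwise convincing overlap, while one confident disagreement in a short overlap can.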

Required Input

FASTQ reads in Phred+33 or Phred+64 format. If Phred+64 input is supplied, the -6 argument must be given, and the output is converted to Phred+33.
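Converting between the two encodings amounts to shifting each quality character by 31, since the same Phred score Q is stored as chr(Q + 64) in Phred+64 and as chr(Q + 33) in Phred+33. A small Python sketch (the function names are hypothetical); it rejects characters below '@' (Q0) rather than guessing how they should be handled:

```python
def phred64_to_phred33(qual: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33.

    Each character encodes Q = ord(c) - 64; the same Q in Phred+33 is chr(Q + 33).
    """
    out = []
    for c in qual:
        q = ord(c) - 64
        if q < 0:
            raise ValueError(f"{c!r} is below the Phred+64 range")
        out.append(chr(q + 33))
    return "".join(out)


def decode_phred33(qual: str):
    """Turn a Phred+33 quality string into a list of integer scores."""
    return [ord(c) - 33 for c in qual]
```

For instance, the Phred+64 string "h@J" (Q40, Q0, Q10) becomes "I!+" in Phred+33.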

Output

  • -1 specifies the first trimmed and non-merged output read file name
  • -2 specifies the second trimmed and non-merged output read file name
  • -s specifies the merged output file name (omit this argument if no merging is desired)
  • -E specifies a human-readable merged output file showing the alignments, which looks like this:
ID: ILLUMINA-1898B0_0023_FC:5:1:10763:1500#0/1
SUBJ: CTTCTGAAAATAGTGTAACTCAGGGGGAAGAGGGAGGTGACCCAGTCCATCAGCCCTGGCGCTGACTTGGGATTGGGATAGGGTGAGCTGGAAAAATAGC
              |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||        
QUER:         AATAGTGTGACTCAGGGGGAAGAGGGAGGTGACCCAGTCCATCAGCCCTGGCGCTGACTTGGGATTGGGATAGGGTGAGCTGGAAAAATAGCCATCATTG
MERG: CTTCTGAAAATAGTGTAACTCAGGGGGAAGAGGGAGGTGACCCAGTCCATCAGCCCTGGCGCTGACTTGGGATTGGGATAGGGTGAGCTGGAAAAATAGCCATCATTG

ID: ILLUMINA-1898B0_0023_FC:5:1:13576:1501#0/1
SUBJ: CTAACACTAAGGTACATGCACTGAGAGTCTAGGTAATAAGTGATGACACCGATTAAACTCTCTGAGACCAG
      |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
QUER: CTAACACTAAGGTACATGCACTGAGAGTCTAGGTAATAAGTGATGACACCGATTAAACTCTCTGAGACCAG
MERG: CTAACACTAAGGTACATGCACTGAGAGTCTAGGTAATAAGTGATGACACCGATTAAACTCTCTGAGACCAG

ID: ILLUMINA-1898B0_0023_FC:5:1:16440:1505#0/1
SUBJ: CTAAGTGCTTTTATTTTCCACCTGCAGAAAAGTTAACAAAGGAAAAAGTAAAGTGAAACATCTTCAAAAAGAAGAGCAAGTAATCCACATAAAAATGCAT
                                                                                               ||||||||  |                                                                                        
QUER:                                                                                         TTAAAAATGATTGGCCATTAGTTCTACAAAAGAAGTTATTTATAAATTGTCTATTTATTGGATAGTGAGAATAATGGTTCTGCAACCTGTTCAACATCGA
MERG: CTAAGTGCTTTTATTTTCCACCTGCAGAAAAGTTAACAAAGGAAAAAGTAAAGTGAAACATCTTCAAAAAGAAGAGCAAGTAATCCACATAAAAATGCATGGCCATTAGTTCTACAAAAGAAGTTATTTATAAATTGTCTATTTATTGGATAGTGAGAATAATGGTTCTGCAACCTGTTCAACATCGA

In this output, vertical bars (|) mark matches between the two sequences, a position with no bar marks a low-quality mismatch, and an asterisk (*) marks a high-quality mismatch. You can use this human-readable file to check that the chosen parameters make sense.
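As a sketch of how such a record is put together, the Python function below renders one alignment in the layout shown above: the subject, a line of match symbols, the query shifted by the overlap offset, and the merged read. The quality cutoff min_q and the merge rule (keep the higher-quality base where the two reads disagree) are assumptions for illustration, not taken from SeqPrep's source.

```python
def render_alignment(read_id, subj, subj_q, quer, quer_q, offset, min_q=20):
    """Return an -E style text block for `quer` aligned at subj[offset].

    Symbols: '|' match, ' ' low-quality mismatch, '*' high-quality mismatch
    (both bases >= min_q). Qualities are integer Phred scores.
    """
    bars = []
    merged = list(subj[:offset])  # subject bases before the overlap
    overlap = min(len(subj) - offset, len(quer))
    for i in range(overlap):
        a, qa = subj[offset + i], subj_q[offset + i]
        b, qb = quer[i], quer_q[i]
        if a == b:
            bars.append("|")
        elif qa >= min_q and qb >= min_q:
            bars.append("*")
        else:
            bars.append(" ")
        # Assumed merge rule: on disagreement keep the higher-quality base.
        merged.append(a if a == b or qa >= qb else b)
    merged.extend(subj[offset + overlap:])  # subject runs past the query
    merged.extend(quer[overlap:])           # query runs past the subject
    pad = " " * offset
    return "\n".join([
        f"ID: {read_id}",
        f"SUBJ: {subj}",
        f"      {pad}{''.join(bars)}",
        f"QUER: {pad}{quer}",
        f"MERG: {''.join(merged)}",
    ])
```

With a 10 bp subject, an 8 bp query placed at offset 4 and one disagreeing base, the bar line reads "|| |||" if that base is low quality and "||*|||" if both calls are confident; in both cases the merged read is 12 bp long.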

All output will be gzipped.
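Because every output file is gzip-compressed, downstream scripts have to decompress on the fly. A minimal Python reader using the standard gzip module, assuming plain four-line FASTQ records with no wrapped sequences (the function names are hypothetical):

```python
import gzip
import itertools


def parse_fastq(lines):
    """Yield (name, sequence, quality) tuples from an iterable of FASTQ lines."""
    it = (line.rstrip("\r\n") for line in lines)
    while True:
        record = list(itertools.islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def read_fastq_gz(path):
    """Stream records from a gzipped FASTQ file, decompressing as it reads."""
    with gzip.open(path, "rt") as handle:
        yield from parse_fastq(handle)
```

For example, sum(1 for _ in read_fastq_gz("Jan2011_Merged/s_1_S.fastq.gz")) would count the merged reads written by the example run on this page.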

Installation

To install SeqPrep, download it and run:

make

This creates a binary that can be copied to the location of your choice. By default, running

make install

places the binary in your $HOME/bin folder.

Example Run

SeqPrep -f Jan2011_33/s_1_1.fastq.gz -r Jan2011_33/s_1_2.fastq.gz -1 Jan2011_Merged/s_1_1.fastq.gz -2 Jan2011_Merged/s_1_2.fastq.gz -s Jan2011_Merged/s_1_S.fastq.gz -E Jan2011_Merged/s_1_S.align.gz
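If SeqPrep is driven from a Python pipeline, a small hypothetical helper can assemble the same command. It only uses flags that appear on this page; -f and -r are not described here, but the example suggests they name the forward and reverse input files.

```python
import subprocess


def seqprep_command(fwd, rev, out1, out2, merged=None, align=None,
                    phred64=False, exe="SeqPrep"):
    """Build a SeqPrep argument list (hypothetical helper, not part of SeqPrep)."""
    cmd = [exe, "-f", fwd, "-r", rev, "-1", out1, "-2", out2]
    if phred64:
        cmd.append("-6")              # input qualities are Phred+64
    if merged is not None:
        cmd += ["-s", merged]         # request merging of overlapping pairs
    if align is not None:
        cmd += ["-E", align]          # human-readable alignment file
    return cmd


def run_seqprep(**kwargs):
    """Run SeqPrep, raising CalledProcessError on a non-zero exit status."""
    subprocess.run(seqprep_command(**kwargs), check=True)
```

Passing a list rather than a shell string avoids quoting problems with unusual file names.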
archive/bioinformatic_tools/seqprep.txt · Last modified: 2015/07/28 06:26 by ceisenhart