archive:bioinformatic_tools:seqprep

Differences

This shows you the differences between two versions of the page.

archive:bioinformatic_tools:seqprep [2011/04/08 20:44]
jstjohn
archive:bioinformatic_tools:seqprep [2015/07/28 06:26] (current)
ceisenhart ↷ Page moved from bioinformatic_tools:seqprep to archive:bioinformatic_tools:seqprep
Line 1: Line 1:
 ====== SeqPrep ======
-SeqPrep is a program to merge paired end Illumina reads that are overlapping into a single longer read. It may also just be used for its adapter trimming feature without doing any paired end overlap. When an adapter sequence is present, that means that the two reads must overlap (in most cases) so they are forcefully merged. When reads do not have adapter sequence they must be treated with care when doing the merging, so a much more sensitive approach is taken. The default parameters were chosen with sensitivity in mind, so that they could be ran on libraries where very few reads are expected to overlap. It is always safest though to save the overlapping procedure for libraries where you have some prior knowledge that a significant portion of the reads will have some overlap.
+SeqPrep is a program to merge paired end Illumina reads that are overlapping into a single longer read. It may also just be used for its adapter trimming feature without doing any paired end overlap. When an adapter sequence is present, that means that the two reads must overlap (in most cases) so they are forcefully merged. When reads do not have adapter sequence they must be treated with care when doing the merging, so several user defined parameters for read merging must be met. The default parameters were chosen with specificity in mind, so that they could be ran on libraries where very few reads are expected to overlap. It is always safest though to save the overlapping procedure for libraries where you have some prior knowledge that a significant portion of the reads will have some overlap. (Description from SeqPrep README file)
  
 ===== Overview =====
 SeqPrep works by first searching for adapter sequences at the ends of reads. If these adapter sequences are found it trims them off. If read merging is desired it will then force a merge between the two reads because if an adapter is present then the insert size must be less than the total read length.
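
The reasoning behind the forced merge can be illustrated with a little arithmetic. The following is only a sketch, not SeqPrep code: the function name and the example lengths are hypothetical, and it assumes that both reads have the same length and that adapter sequence starts immediately after the end of the insert.

```python
def read_through(insert_len: int, read_len: int) -> dict:
    """Describe how two paired-end reads of length read_len cover an insert.

    Assumptions (not taken from SeqPrep): both reads have the same length,
    and adapter sequence begins right after the last base of the insert.
    """
    # Adapter bases sequenced at the 3' end of each read.
    adapter_bases = max(0, read_len - insert_len)
    # Insert bases that are covered by both reads.
    overlap = max(0, min(insert_len, 2 * read_len - insert_len))
    return {"adapter_bases": adapter_bases, "overlap": overlap}


# Hypothetical lengths: an 80 bp insert sequenced with 100 bp reads.
short_insert = read_through(insert_len=80, read_len=100)
# A 150 bp insert: no adapter is read, but the reads still share 50 bases.
medium_insert = read_through(insert_len=150, read_len=100)
# A 250 bp insert: no adapter and no overlap, so nothing can be merged.
long_insert = read_through(insert_len=250, read_len=100)
```

Under these assumptions, whenever `adapter_bases` is greater than zero the overlap equals the whole insert, which is why finding an adapter is treated as sufficient evidence that the pair overlaps.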
  
-SeqPrep calculates gapless alignments, and doesn't treats low quality mismatches differently from high quality mismatches.
+SeqPrep calculates gapless alignments, and treats low quality mismatches differently from high quality mismatches.
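
To make the idea of a gapless, quality-aware comparison concrete, here is a minimal sketch. It is not SeqPrep's implementation: the function name, the `hq_cutoff` threshold of 20, and the rule that a mismatch counts as high quality only when both bases meet the cutoff are all assumptions made for illustration. The marker characters follow the page's description of the -E output (a bar for a match, nothing for a low quality mismatch, `*` for a high quality mismatch), with `|` as an assumed rendering of the bar.

```python
from typing import Sequence, Tuple


def score_gapless_overlap(
    seq1: str,
    qual1: Sequence[int],
    seq2: str,
    qual2: Sequence[int],
    offset: int,
    hq_cutoff: int = 20,  # hypothetical threshold, not a SeqPrep default
) -> Tuple[int, int, int, str]:
    """Compare seq2 against seq1 starting at seq1[offset], without gaps.

    seq2 is assumed to be already reverse-complemented so that both
    sequences read in the same direction.
    Returns (matches, low_quality_mismatches, high_quality_mismatches, markers).
    """
    matches = lq_mismatches = hq_mismatches = 0
    markers = []
    overlap_len = min(len(seq1) - offset, len(seq2))
    for i in range(overlap_len):
        base1, base2 = seq1[offset + i], seq2[i]
        if base1 == base2:
            matches += 1
            markers.append("|")
        elif min(qual1[offset + i], qual2[i]) >= hq_cutoff:
            # Both bases are confident yet disagree.
            hq_mismatches += 1
            markers.append("*")
        else:
            # At least one base is unreliable, so the mismatch is discounted.
            lq_mismatches += 1
            markers.append(" ")
    return matches, lq_mismatches, hq_mismatches, "".join(markers)


# Toy data: the last base of the overlap disagrees.
low = score_gapless_overlap("ACGTACGT", [30] * 8, "ACGA", [30, 30, 30, 10], offset=4)
high = score_gapless_overlap("ACGTACGT", [30] * 8, "ACGA", [30] * 4, offset=4)
```

A merging procedure could then accept or reject a candidate offset based on these counts, penalising high quality mismatches more heavily than low quality ones.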
  
 ==== Required Input ====
Line 11: Line 11:
  
 ==== Output ====
--1 specifies the first trimmed and non-merged output read file name
--2 specifies the second trimmed and non-merged output read file name
--s specifies the merged file name (if no merging is desired then do not supply this argument)
--E specifies a human readable merged output file showing alignments which will look like this:
+  * -1 specifies the first trimmed and non-merged output read file name
+  * -2 specifies the second trimmed and non-merged output read file name
+  * -s specifies the merged file name (if no merging is desired then do not supply this argument)
+  * -E specifies a human readable merged output file showing alignments which will look like this:
 <​code>​ <​code>​
 ID: ILLUMINA-1898B0_0023_FC:5:1:10763:1500#0/1
Line 35: Line 35:
 </​code>​ </​code>​
  
-where horizontal bars show matches between the two sequences, no bar means a low quality mismatch, and a * means a high quality mismatch.
+where horizontal bars show matches between the two sequences, no bar means a low quality mismatch, and a * means a high quality mismatch. You can use this human-readable file to check that the parameters chosen make sense.
  
+All output will be gzipped.
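
Because the output is gzipped, downstream scripts need to decompress it while reading. The sketch below shows one way to do that with Python's standard `gzip` module. The function name is hypothetical, and it assumes the read files are plain four-line FASTQ records (header, sequence, `+` line, quality), which this page does not state explicitly.

```python
import gzip
from typing import Iterator, Tuple


def read_gzipped_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a gzipped FASTQ file.

    Assumes exactly four lines per record, with no wrapped sequence lines.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                break  # end of file
            sequence = handle.readline().rstrip()
            handle.readline()  # skip the '+' separator line
            quality = handle.readline().rstrip()
            yield header[1:], sequence, quality  # drop the leading '@'


# Example use, with a hypothetical file name passed to -s:
#   for read_id, sequence, quality in read_gzipped_fastq("merged.fastq.gz"):
#       print(read_id, len(sequence))
```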
 ===== Installation ===== ===== Installation =====
 To install SeqPrep download it and run <code>make</code>. A binary is created which can then be copied to the location of your choice. By default <code>make install</code> places the binary in your <code>$HOME/bin</code> folder.
archive/bioinformatic_tools/seqprep.1302295460.txt.gz · Last modified: 2011/04/08 20:44 by jstjohn