archive:bioinformatic_tools:gs_de_novo_assembler

Differences

This shows you the differences between two versions of the page.

archive:bioinformatic_tools:gs_de_novo_assembler [2010/04/24 15:36]
karplus replaced -noace by -nobig and added explanations
archive:bioinformatic_tools:gs_de_novo_assembler [2015/07/28 06:23] (current)
ceisenhart ↷ Page moved from bioinformatic_tools:gs_de_novo_assembler to archive:bioinformatic_tools:gs_de_novo_assembler
Line 49: Line 49:
 addRun . /campusdata/BME235/data/Pog/454_run/sff/FUIPDCZ01.sff
 addRun . /campusdata/BME235/data/Pog/454_run/sff/FUIPDCZ02.sff
-runProject -e 50 -nobig .
+runProject -e 50 -nobig -rst 0 .
 </code>
 Of course, different sff files will be used on different runs.
Line 56: Line 56:
  
 The -nobig parameter suppresses the generation of big output files.
+
+The -rst 0 parameter (repeat score threshold) says that a read should be labeled as uniquely mapped if its best hit scores more than 0 above the next-best hit. The default value is 12, which means that many hits get labeled as repeats even though they can distinguish between similar repeat regions.
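The threshold rule described above can be sketched as a small shell function. This is only an illustration of the rule as this page states it, not Newbler's actual code: the function name label_read and the example scores are made up for the sketch.

```shell
# Hypothetical helper (not part of Newbler): applies the rule as described
# on this page. A read counts as uniquely mapped when its best hit score
# exceeds the next-best hit score by more than the threshold (-rst).
# Usage: label_read BEST_SCORE NEXT_BEST_SCORE RST
label_read() {
    best=$1
    next=$2
    rst=$3
    if [ $((best - next)) -gt "$rst" ]; then
        echo unique
    else
        echo repeat
    fi
}

# Made-up scores: the same 5-point margin under two thresholds.
label_read 40 35 12   # margin 5 is not above the default threshold of 12
label_read 40 35 0    # margin 5 is above a threshold of 0
```

With the default threshold, the 5-point margin is not enough and the read is labeled a repeat. With -rst 0, any strictly better best hit makes it unique.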
  
 A Makefile that illustrates the use of the SunGrid to avoid running on the head node is shown in /campusdata/BME235/assemblies/Pog/newbler-assembly2/Makefile
 
 Note: earlier versions of Newbler provided serially numbered contigs, but version 2.3 seems to skip numbers rather arbitrarily, so that the range of the numbers is larger than the size of the set of contigs. Look at the counts (in assembly/454NewblerMetrics.txt) or run a program to count the contigs, rather than relying on the largest contig number.
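One way to count the contigs directly is to count the FASTA header lines in the contig file. This is a minimal sketch: the function name count_contigs is made up here, and the path assembly/454AllContigs.fna is an assumption about where the assembly writes its contigs, so adjust it to match your project directory.

```shell
# Count FASTA records in a file: each record starts with a '>' header line.
# Usage: count_contigs FASTA_FILE
count_contigs() {
    grep -c '^>' "$1"
}

# Assumed location of the contig FASTA file; only runs if the file exists.
if [ -f assembly/454AllContigs.fna ]; then
    count_contigs assembly/454AllContigs.fna
fi
```

Because this counts records rather than reading the number in the last contig name, gaps in the numbering do not affect the result.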
+
+For large genomes you can pass the -large argument to runProject and it will take some time-saving shortcuts.
+<code>
+        ${BIN}/newAssembly .
+        ${BIN}/addRun . ${SFFS_IN}
+        ${BIN}/addRun . ${FA_IN}
+        ${BIN}/runProject -e ${EXPECTED_COVERAGE} -large -rst 0 -noace .
+</code>
  
 == Mapping to existing genome ==
Line 71: Line 81:
 addRun . /campusdata/BME235/data/Pog/454_run/sff/FUIPDCZ01.sff
 addRun . /campusdata/BME235/data/Pog/454_run/sff/FUIPDCZ02.sff
-runProject -e 50 -nobig .
+runProject -e 50 -nobig -rst 0 .
 </code>
  
archive/bioinformatic_tools/gs_de_novo_assembler.1272123400.txt.gz · Last modified: 2010/04/24 15:36 by karplus