archive:bioinformatic_tools:gs_de_novo_assembler

Differences

This shows you the differences between two versions of the page.

archive:bioinformatic_tools:gs_de_novo_assembler [2010/04/16 22:23]
karplus removed extraneous signature
archive:bioinformatic_tools:gs_de_novo_assembler [2015/07/28 06:23]
ceisenhart ↷ Page moved from bioinformatic_tools:gs_de_novo_assembler to archive:bioinformatic_tools:gs_de_novo_assembler
Line 44: Line 44:
 == De novo assembly ==
 
-The standard commands for de novo assembly are to create a new directory, and in that directory create a Makefile that includes a target to execute the following commands:
+The standard approach for de novo assembly is to create a new directory, and in that directory create a Makefile that includes a target to execute the following commands:
 <code>
 newAssembly .
 addRun . /campusdata/BME235/data/Pog/454_run/sff/FUIPDCZ01.sff
 addRun . /campusdata/BME235/data/Pog/454_run/sff/FUIPDCZ02.sff
-runProject -e 50 .
+runProject -e 50 -nobig -rst 0 .
 </code>
 Of course, different sff files will be used on different runs.
  
-A Makefile that illustrates the use of the SunGrid to avoid running on the head node is shown in /campusdata/BME235/assemblies/Pog/newbler-assembly1/Makefile
+The "-e" value is the expected coverage.  For the Pog 454 data, that should be about 60.  For the banana-slug data, it is very much smaller (0.05?).
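
One way to arrive at a value for "-e" is the usual definition of coverage: total sequenced bases divided by genome size. The sketch below assumes you already know both numbers; the function name and the example figures are illustrative and do not come from the Pog or banana-slug data.

```shell
# Estimate expected coverage as (total sequenced bases) / (genome size).
# Both arguments are values you supply; nothing here is read from Newbler.
expected_coverage() {
    total_bases=$1
    genome_size=$2
    awk -v b="$total_bases" -v g="$genome_size" 'BEGIN {
        if (g <= 0) { print "genome size must be positive" > "/dev/stderr"; exit 1 }
        printf "%.2f\n", b / g
    }'
}
```

For example, `expected_coverage 120000000 2000000` prints 60.00 for a hypothetical 2 Mb genome with 120 Mb of reads.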
 + 
 +The -nobig parameter suppresses the generation of big output files. 
 + 
 +The -rst 0 parameter (repeat score threshold) says that a read should be labeled as uniquely mapped if its best hit scores more than 0 above the next-best hit.  The default value is 12, which means that a lot of hits get labeled as repeats, even though they can distinguish between similar repeat regions.
 + 
 +A Makefile that illustrates the use of the SunGrid to avoid running on the head node is shown in /​campusdata/​BME235/​assemblies/​Pog/​newbler-assembly2/Makefile 
 + 
 +Note: earlier versions of Newbler provided serially numbered contigs, but version 2.3 seems to skip numbers rather arbitrarily,​ so that the range of the numbers is larger than the size of the set of contigs. ​ Look at the counts (in assembly/​454NewblerMetrics.txt) or run a program to count the contigs, rather than relying on the largest contig number. 
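
A minimal sketch of the "run a program to count the contigs" option, assuming the contigs are available as a FASTA file. The file name used in the usage note is an assumption about Newbler's output layout, not something stated on this page, and the function name is made up for illustration.

```shell
# Count contigs by counting FASTA header lines (lines starting with '>').
# This is independent of the contig numbering, so skipped numbers do not matter.
count_contigs() {
    grep -c '^>' "$1"
}
```

Usage, with an assumed file name: `count_contigs assembly/454AllContigs.fna`. Compare the result with the counts in assembly/454NewblerMetrics.txt.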
 + 
 +For large genomes you can pass the -large argument to runProject and it will take some time-saving shortcuts. 
 +<​code>​ 
 +        ${BIN}/​newAssembly . 
 +        ${BIN}/​addRun . ${SFFS_IN} 
 +        ${BIN}/​addRun . ${FA_IN} 
 +        ${BIN}/​runProject -e ${EXPECTED_COVERAGE} -large -rst 0 -noace . 
 +</​code>​
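
The Makefile fragment above leaves BIN, SFFS_IN, FA_IN, and EXPECTED_COVERAGE undefined. As a rough sketch, the helper below prints the same command sequence for whatever values you supply, so the expansion can be checked before anything is submitted to the cluster. The function name and its arguments are hypothetical; only the Newbler commands and flags are taken from the fragment.

```shell
# Print (not run) the Newbler commands for a large-genome assembly.
# $1 = directory holding the Newbler binaries (placeholder)
# $2 = expected coverage (placeholder)
# remaining arguments = input files passed to addRun, one command each
print_newbler_commands() {
    bin=$1
    coverage=$2
    shift 2
    echo "$bin/newAssembly ."
    for input in "$@"; do
        echo "$bin/addRun . $input"
    done
    echo "$bin/runProject -e $coverage -large -rst 0 -noace ."
}
```

Calling it with a binary directory, a coverage value, and two input files prints four lines: newAssembly, one addRun per input, and the runProject line.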
  
 == Mapping to existing genome ==
Line 65: Line 81:
 addRun . /campusdata/BME235/data/Pog/454_run/sff/FUIPDCZ01.sff
 addRun . /campusdata/BME235/data/Pog/454_run/sff/FUIPDCZ02.sff
-runProject -e 50 .
+runProject -e 50 -nobig -rst 0 .
 </code>
  
archive/bioinformatic_tools/gs_de_novo_assembler.txt · Last modified: 2015/07/28 06:23 by ceisenhart