archive:bioinformatic_tools:gs_de_novo_assembler
Current revision: 2015/07/28 06:23, ceisenhart (page moved from bioinformatic_tools:gs_de_novo_assembler to archive:bioinformatic_tools:gs_de_novo_assembler)
Earlier revision: 2010/04/24 15:36, karplus (replaced -noace by -nobig and added explanations)
addRun . /campusdata/BME235/data/Pog/454_run/sff/FUIPDCZ01.sff
addRun . /campusdata/BME235/data/Pog/454_run/sff/FUIPDCZ02.sff
runProject -e 50 -nobig -rst 0 .
</code>
Of course, different sff files will be used on different runs.
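
Because the set of sff files changes from run to run, a small loop can add whatever is in a run's sff directory instead of listing each file by hand. This is a minimal sketch, assuming the files for one run sit together in one directory and end in .sff; the function name add_all_sffs and the DRY_RUN switch are hypothetical helpers, not part of Newbler.

```shell
# Hypothetical helper: call addRun once for every .sff file in a directory.
# Call it from inside the Newbler project directory (hence the "." argument).
# If DRY_RUN is set to a non-empty value, the commands are printed, not run.
add_all_sffs() {
  local sff_dir=$1 sff
  for sff in "$sff_dir"/*.sff; do
    # With no matching files the glob stays literal; skip that case.
    [ -e "$sff" ] || continue
    ${DRY_RUN:+echo} addRun . "$sff"
  done
}
```

For example, `DRY_RUN=1 add_all_sffs /campusdata/BME235/data/Pog/454_run/sff` lists the addRun commands, and dropping `DRY_RUN=1` runs them. Since every .sff file in the directory is added, point it at a directory that holds only the runs you want in this assembly.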
The -nobig parameter suppresses the generation of big output files.

The -rst 0 parameter (repeat score threshold) says that a read should be labeled uniquely mapped if its best hit scores more than 0 above its next-best hit. The default value is 12, which causes many reads to be labeled as repeats even though their hits can distinguish between similar repeat regions.
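
To make the threshold concrete, here is the rule as described above written as a tiny shell function. It is only a reading of that description (a read counts as unique when its best score minus its next-best score exceeds the threshold), not Newbler's actual code; the function name and the scores are made up for illustration.

```shell
# Hypothetical illustration of the -rst (repeat score threshold) rule as
# described on this page: a read counts as uniquely mapped only if its best
# hit outscores its next-best hit by MORE than the threshold.
# Arguments: best score, next-best score, threshold (all integers).
classify_read() {
  local best=$1 next=$2 rst=$3
  if [ $((best - next)) -gt "$rst" ]; then
    echo unique
  else
    echo repeat
  fi
}
```

With hits scoring 40 and 35, `classify_read 40 35 12` prints repeat (a margin of 5 is not above the default 12), while `classify_read 40 35 0` prints unique.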
A Makefile that illustrates the use of the SunGrid to avoid running on the head node is shown in /campusdata/BME235/assemblies/Pog/newbler-assembly2/Makefile
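
The Makefile itself is not reproduced here. As a rough sketch of the same idea, assuming a Sun Grid Engine installation with qsub on the PATH, the runProject step can be submitted as a batch job instead of being run on the head node. The helper submit_newbler and the job name newbler_asm are hypothetical; -cwd, -b y and -N are standard qsub options, but check your cluster's queue and memory policies before relying on this.

```shell
# Hypothetical helper: submit this page's runProject step to Sun Grid Engine
# instead of running it on the head node. Assumes qsub is on the PATH and
# that it is called from inside the Newbler project directory.
#   -cwd    run the job in the current working directory
#   -b y    treat runProject as a binary, not as a job script
#   -N      job name (newbler_asm is an arbitrary, made-up name)
# If DRY_RUN is non-empty, the qsub command is printed instead of executed.
submit_newbler() {
  ${DRY_RUN:+echo} qsub -cwd -b y -N newbler_asm \
    runProject -e 50 -nobig -rst 0 .
}
```

runProject must also be found on the compute nodes; qsub's -V option, which exports your current environment to the job, is one way to arrange that.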
Note: earlier versions of Newbler numbered contigs serially, but version 2.3 seems to skip numbers rather arbitrarily, so the highest contig number can be larger than the number of contigs. Look at the counts (in assembly/454NewblerMetrics.txt) or run a program to count the contigs, rather than relying on the largest contig number.
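
For example, counting FASTA header lines gives the number of contigs directly, and comparing it with the highest contig number shows whether numbers were skipped. This is a minimal sketch, assuming the contigs are in a FASTA file whose header names end in the contig number (for instance >contig00012); the function name is hypothetical, and the file to pass in (such as assembly/454AllContigs.fna, if your run wrote it) should be checked against your own output.

```shell
# Hypothetical helper: count the contigs in a FASTA file and report the
# highest contig number, assuming headers such as
#   >contig00012  length=1234
# A gap between the two numbers means some contig numbers were skipped.
contig_stats() {
  local fasta=$1
  awk 'BEGIN { n = 0; max = 0 }
       /^>/ {
         n++                               # one header line per contig
         if (match($1, /[0-9]+$/)) {       # trailing digits of the name
           v = substr($1, RSTART) + 0      # "00012" becomes 12
           if (v > max) max = v
         }
       }
       END { printf "contigs=%d highest_number=%d\n", n, max }' "$fasta"
}
```

Converting with `+ 0` inside awk avoids the shell treating zero-padded numbers such as 00012 as octal.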

For large genomes you can pass the -large argument to runProject, and it will take some time-saving shortcuts.
<code>
${BIN}/newAssembly .
${BIN}/addRun . ${SFFS_IN}
${BIN}/addRun . ${FA_IN}
${BIN}/runProject -e ${EXPECTED_COVERAGE} -large -rst 0 -noace .
</code>
== Mapping to existing genome ==
addRun . /campusdata/BME235/data/Pog/454_run/sff/FUIPDCZ01.sff
addRun . /campusdata/BME235/data/Pog/454_run/sff/FUIPDCZ02.sff
runProject -e 50 -nobig -rst 0 .
</code>