contributors:team_5:discovar_de_novo_manual

Differences

This shows you the differences between two versions of the page.

contributors:team_5:discovar_de_novo_manual [2015/04/26 23:07]
gepoliano
contributors:team_5:discovar_de_novo_manual [2015/09/02 16:22] (current)
ceisenhart ↷ Page moved from team_5_page:discovar_de_novo_manual to contributors:team_5:discovar_de_novo_manual
Line 1: Line 1:
 ======Discovar de novo manual======
  
 +===Team page=== 
 +[[contributors:team_5_page|Team 5: Discovar de novo]]
 =====Introduction=====
 +
 +The information below is a summary of the Discovar //de novo// manual, which can be found at:
 +http://www.broadinstitute.org/software/discovar/blog/?page_id=19
  
-DISCOVAR de novo is a new fully de novo genome assembler. Its inputs are designed to optimize quality while keeping costs low. Currently it takes as input Illumina reads of length 250 or longer <E2><80><94> produced on MiSeq or HiSeq 2500 <E2><80><94> and from a single PCR-free library. These data enable a level of completeness and continuity that was not previously possible.
+DISCOVAR de novo is a new fully de novo genome assembler. Its inputs are designed to optimize quality while keeping costs low. Currently it takes as input Illumina reads of length 250 or longer produced on MiSeq or HiSeq 2500 and from a single PCR-free library. These data enable a level of completeness and continuity that was not previously possible.
  
  
-The best source of current news and information on DISCOVAR is our blog:
+The best source of current news and information on DISCOVAR de novo is the Broad Institute blog:
 http://www.broadinstitute.org/software/discovar/blog/
  
Line 23: Line 27:
 The help section of our blog should be your starting point if you encounter problems:
 http://www.broadinstitute.org/software/discovar/blog/?page_id=19
- 
  
  
Line 62: Line 65:
 http://picard.sourceforge.net/
  
-Building
-========
+=====Building=====
  
 See instructions in the file: INSTALL
Line 69: Line 72:
  
  
-Performance
-===========
+=====Performance=====
  
 On the systems we have tested, allowing per-thread memory management improves computational performance. If it is not already enabled by default, you can enable it using:
Line 78: Line 81:
  
  
-Testing
-=======
+=====Testing=====
  
 Example data and instructions are available via our FTP site. Before attempting to run DISCOVAR with your own data, please first try these examples:
Line 87: Line 90:
  
  
-Generating sequencing data
-==========================
+=====Generating sequencing data=====
 DISCOVAR has specific requirements for input data, and will likely fail if you do not meet them.
  
Line 113: Line 116:
  
  
-Input files
-===========
+=====Input files=====
  
 DISCOVAR requires a BAM file containing the raw reads from the sequencer.
Line 137: Line 140:
  
  
-Running DISCOVAR  de novo
-=========================
+=====Running DISCOVAR  de novo=====
  
 DISCOVAR can currently de novo assemble genomes up to ~3 Gb in size. All that is required are paired-end reads, contained within one or more BAM files. See the previous section for details on generating the appropriate sequence data and the BAM file requirements.
Line 152: Line 155:
  
 This will take as input all the reads in the BAM file reads.bam, generate an assembly, then write the output to the directory my_assembly. The location of the final assembly files is: my_assembly/a.final/
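As a minimal sketch (not part of DISCOVAR itself), the output location described above can be checked from Python. The helper names `final_assembly_files` and `missing_outputs` are hypothetical. The file names are the ones listed in the output guide of this manual, and `my_assembly` is the example output directory used above.

```python
from pathlib import Path

# Sub-directory that holds the final assembly, as stated in the manual.
FINAL_SUBDIR = "a.final"

# Output files named in the "Brief guide to the assembly output" section.
EXPECTED_FILES = [
    "a.fasta",
    "a.lines",
    "a.lines.efasta",
    "a.lines.fasta",
    "a.lines.src",
]


def final_assembly_files(out_dir):
    """Map each expected file name to its path under <out_dir>/a.final/."""
    final_dir = Path(out_dir) / FINAL_SUBDIR
    return {name: final_dir / name for name in EXPECTED_FILES}


def missing_outputs(out_dir):
    """Return the expected file names that are not present on disk."""
    return [
        name
        for name, path in final_assembly_files(out_dir).items()
        if not path.is_file()
    ]


if __name__ == "__main__":
    # Hypothetical usage: report which outputs are absent from my_assembly.
    for name in missing_outputs("my_assembly"):
        print("missing:", name)
```

A check like this only confirms that the files exist; it says nothing about whether the assembly itself is complete or of good quality.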
-Viewing a DISCOVAR de novo assembly
-===================================
+
+=====Viewing a DISCOVAR de novo assembly=====
  
 The assembly graph produced by DISCOVAR de novo can be explored using the tool NhoodInfo, which is part of the DISCOVAR package. Please see the NhoodInfo manual for more details.
  
  
-Brief guide to the assembly output
-==================================
+=====Brief guide to the assembly output=====
  
-A DISCOVAR de novo  assembly is a graph whose edges represent DNA sequences.  Within any assembly one can find regions that are essentially linear. We call these lines.
 
-                 /---\            /---\
------------------     ------------     ------------------
-                 \---/            \---/
+A DISCOVAR de novo  assembly is a graph whose edges represent DNA sequences.  Within any assembly one can find regions that are essentially linear. We call these lines.
+
+
+{{discovar_de_novo_manual:cells.gif?120}}
  
 This line has two cells.  In this case, for each cell, there are two paths across the cell.
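The line described above can be modelled as nested lists, as a rough sketch. The edge ids are made up, and the layout (linear stretches stored as single-path entries between the cells) is an assumption made for illustration, not a statement about how DISCOVAR stores lines internally.

```python
from math import prod

# One line = list of segments; one segment = list of paths; one path = list of edge ids.
# Edge ids 0..6 are hypothetical placeholders.
line = [
    [[0]],        # linear stretch: a single path consisting of edge 0
    [[1], [2]],   # first cell: two alternative paths
    [[3]],        # linear stretch
    [[4], [5]],   # second cell: two alternative paths
    [[6]],        # linear stretch
]


def cells(line):
    """Return only the segments that offer more than one path."""
    return [segment for segment in line if len(segment) > 1]


def count_routes(line):
    """Number of distinct end-to-end routes through the line."""
    return prod(len(segment) for segment in line)


print(len(cells(line)))    # 2
print(count_routes(line))  # 4
```

With two cells of two paths each, there are 2 x 2 = 4 ways to traverse the line from end to end.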
Line 179: Line 183:
 DISCOVAR de novo provides several output forms from which you can select:
  
-    a.fasta = fasta file of edges
-
-    a.lines = binary file of lines, mathematically a vec<vec<vec<vec<int>>>>, in which the ints are edge ids.
-
-    a.lines.efasta = standard scaffold efasta file, which shows {s1,...,sn} for the ALTERNATIVES associated to a given cell. *
-
-    a.lines.fasta = standard scaffold fasta file, obtained by taking the highest coverage path through each cell; LOSES INFORMATION! *
- a.lines.src = human-readable form of a.lines, represented using nested brackets {...}
-
-    * 'Duplicate' reverse complement lines have been removed from these files. Also for circular chromosomes or episomes, the header line is labeled 'circular' and the ends of the sequence overlap by exactly K-1 bases (K = 200).
+-a.fasta = fasta file of edges
 
+-a.lines = binary file of lines, mathematically a vec<vec<vec<vec<int>>>>, in which the ints are edge ids.
 
--------------
+-a.lines.efasta = standard scaffold efasta file, which shows {s1,...,sn} for the ALTERNATIVES associated to a given cell. *
 
+-a.lines.fasta = standard scaffold fasta file, obtained by taking the highest coverage path through each cell; LOSES INFORMATION! *
+-a.lines.src = human-readable form of a.lines, represented using nested brackets {...}
 
+-'Duplicate' reverse complement lines have been removed from these files. Also for circular chromosomes or episomes, the header line is labeled 'circular' and the ends of the sequence overlap by exactly K-1 bases (K = 200).
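To illustrate the `{s1,...,sn}` notation in a.lines.efasta, here is a minimal Python sketch that reads the alternatives out of a sequence string. It assumes brace groups are not nested and that alternatives are separated by commas. The function names and the example sequence are hypothetical. Note that `flatten_first` simply picks the first alternative in each group: coverage is not recorded in the efasta text, so this does not reproduce the highest-coverage choice used for a.lines.fasta.

```python
import re
from math import prod

# Matches one non-nested {s1,...,sn} group and captures its contents.
_ALT_GROUP = re.compile(r"\{([^{}]*)\}")


def alternatives(efasta_seq):
    """List the alternatives of each brace group, in order of appearance."""
    return [m.group(1).split(",") for m in _ALT_GROUP.finditer(efasta_seq)]


def flatten_first(efasta_seq):
    """Replace every brace group with its first alternative (loses information)."""
    return _ALT_GROUP.sub(lambda m: m.group(1).split(",")[0], efasta_seq)


def count_expansions(efasta_seq):
    """Number of plain sequences the efasta string can expand to."""
    return prod(len(group) for group in alternatives(efasta_seq))


seq = "ACGT{A,C}TTG{GG,T}CA"   # made-up example with two cells
print(alternatives(seq))       # [['A', 'C'], ['GG', 'T']]
print(flatten_first(seq))      # ACGTATTGGGCA
print(count_expansions(seq))   # 4
```

Flattening discards all but one alternative per cell, which is the same kind of information loss the manual warns about for a.lines.fasta.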
contributors/team_5/discovar_de_novo_manual.1430089662.txt.gz · Last modified: 2015/04/26 23:07 by gepoliano