This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
lecture_notes:06-02-2010 [2010/06/02 22:15] hyjkim created |
lecture_notes:06-02-2010 [2010/06/06 22:32] galt more rationale |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== Sea Hare and Panda ====== | ||
+ | |||
+ | Looking at recent successful de-novo assemblies can help inform future sequencing and assembly plans for the Banana Slug. | ||
+ | |||
+ | ===== Sea Hare ===== | ||
+ | |||
+ | Sea Hare is interesting because it is a recent de-novo mollusk assembly using 454 mate-pairs. | ||
+ | |||
* Analyzing data from previously sequenced mollusk genomes | * Analyzing data from previously sequenced mollusk genomes | ||
* Broad institute /ftp/pub/assemblies/invertebrates/aplysia (seahare) | * Broad institute /ftp/pub/assemblies/invertebrates/aplysia (seahare) | ||
Line 26: | Line 34: | ||
* Estimate error after mapping. | * Estimate error after mapping. | ||
* These two measures should be correlated to each other. | * These two measures should be correlated to each other. | ||
+ | |||
+ | ===== Panda ===== | ||
+ | |||
+ | Panda is interesting because it is a recent de-novo assembly of a large | ||
+ | genome of approximately the same size as the banana slug genome (3 Gb). It was also | ||
+ | assembled with SOAPdenovo, which we were able to use to assemble our slug data. | ||
+ | Panda is also the only known large genome yet assembled de-novo using | ||
+ | only Illumina/Solexa reads. | ||
Panda Genome statistics | Panda Genome statistics | ||
Line 61: | Line 77: | ||
* Next slug to be sequenced should be photographed during dissection in order to identify the species. | * Next slug to be sequenced should be photographed during dissection in order to identify the species. | ||
+ | |||
+ | Good computational challenge: | ||
+ | * Subdivide the short reads by the region they group into, then do local de-novo assemblies on each subset of reads. This is done biologically with things like BACs. | ||
+ | * Example: SHORTY maps reads to a contig, then maps out to reads in other contigs, and then maps back. This collects a set of reads that might belong together, which can then be assembled. | ||
+ | * Can use SOAPdenovo to get initial contigs, then map reads onto the contigs and gather the reads together. Store the contigs in memory and stream the data out to sub-assemblers. PhD-level question: can we make an efficient parallel assembler out of this? How do we stream through the data and partition it efficiently? How can we deal with all of this efficiently? | ||
+ |
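The partitioning idea in the list above can be sketched roughly as follows. This is a minimal illustration, not how SOAPdenovo or SHORTY actually work: exact k-mer matching stands in for a real read mapper, reverse complements and sequencing errors are ignored, and the names `build_contig_index`, `assign_read`, and `partition_reads` are hypothetical. The contig index is held in memory while reads are streamed through once and binned by best-matching contig; each bin could then be handed to a separate local assembler.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer of seq in order (nothing if seq is shorter than k)."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_contig_index(contigs, k):
    """Map each k-mer to the set of contig ids that contain it.

    contigs: dict of contig id -> sequence (e.g. initial contigs from a
    first-pass assembler). The whole index is kept in memory.
    """
    index = defaultdict(set)
    for contig_id, seq in contigs.items():
        for kmer in kmers(seq, k):
            index[kmer].add(contig_id)
    return index


def assign_read(read, index, k):
    """Return the contig id sharing the most k-mers with read, or None."""
    votes = defaultdict(int)
    for kmer in kmers(read, k):
        # .get avoids inserting empty entries into the defaultdict index
        for contig_id in index.get(kmer, ()):
            votes[contig_id] += 1
    if not votes:
        return None
    # sorted() makes tie-breaking deterministic (smallest contig id wins)
    return max(sorted(votes), key=votes.get)


def partition_reads(reads, contigs, k=8):
    """Stream reads once and bin them by best-matching contig.

    reads may be any iterable (e.g. a generator over a FASTQ file), so the
    read set itself never has to fit in memory at once. Reads that match
    no contig are collected under the key None.
    """
    index = build_contig_index(contigs, k)
    bins = defaultdict(list)
    for read in reads:
        bins[assign_read(read, index, k)].append(read)
    return dict(bins)
```

In a parallel setting, each bin would be written to its own file or queue rather than kept in a list, and the unassigned reads (key `None`) could be re-mapped after the local assemblies extend the contigs, which is roughly the "map out and map back" step described above.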