Differences

This shows you the differences between two versions of the page.

--- lecture_notes:04-28-2010 [2010/05/02 05:34]
galt
+++ lecture_notes:04-28-2010 [2010/05/02 16:15]
karplus fixed number of contigs for Newbler
@@ Line 8: / Line 8: @@
 use makefiles, not shell scripts!
-SOLiD data formats:\\
+**Sanger quality info**\\
+Kevin found the location of the Sanger qual info.\\
+.as or something like that.\\
+different files from 3 different runs.\\
+**SOLiD data formats**:\\
 .csfasta = colorspace with numbers\\
 .de = changes #s to letters (0123 -> ACGT), but the letters are still colors, not bases! Very confusing.\\
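The difference between the two SOLiD encodings can be sketched in a few lines of Python. This is only an illustrative sketch, not a tool from the notes: the function names `double_encode` and `decode_colorspace` are hypothetical, and the decoder assumes the standard SOLiD two-base scheme (A=0, C=1, G=2, T=3, where each color is the XOR of adjacent base codes) and a read that starts with a known primer base. Missing colors (`.`) are not handled.

```python
# Relabel color digits as letters (0123 -> ACGT), as the ".de" format does.
# The result looks like DNA but is still a sequence of colors.
DOUBLE_ENCODE = str.maketrans("0123", "ACGT")

BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def double_encode(colors):
    """Rewrite a string of color digits as letters; these are NOT bases."""
    return colors.translate(DOUBLE_ENCODE)


def decode_colorspace(read):
    """Decode a colorspace read such as 'T0120' into bases.

    The first character is the known primer base; each following digit is
    the color for the transition from the previous base to the next one.
    The primer base itself is not included in the output.
    """
    code = BASE_TO_CODE[read[0]]
    bases = []
    for color in read[1:]:
        code ^= int(color)  # next base code = previous code XOR color
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)
```

For the read `T0120`, relabeling the colors gives `ACGA`, while actually decoding them gives the bases `TGAA`, which is why treating the letters in a double-encoded file as bases is misleading.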
@@ Line 117: / Line 122: @@
 Took about 50 minutes for all.\\
-For comparison, Newbler took 18 minutes and about 40 contigs.
+For comparison, Newbler took 18 minutes and produced 31 non-overlapping contigs.
-Just qsub them with no arguments, and it runs everything.\\
+Just qsub them with no arguments, and it runs everything. ("Them"? "it"? What does this sentence mean? FIXME  --- //[[karplus@soe.ucsc.edu|Kevin Karplus]] 2010/05/02 09:14//)
 === MIRA ===
+Mostly used the default settings.
+mira-assembly1/
+Running is easy.
+Parameters: fasta denovo, tell it which instruments the data came from (e.g. 454).
 Needs datafile named pog_in.[format].fa \\
-sff_extract script to create .qual files
+uses sff_extract script to create .fasta and .fasta.qual files \\
+and also the traceinfo_in.454.xml file.
-created 30 contigs >=500 (largest contig 640k) \\
+Time: 1 hour plus.
-but... upon mapping to the reference genome,  \\
+Created 621 contigs, 30 larger than 500. (largest contig 640k) \\
+The 500 cutoff is probably too large.\\
+might be more reasonable.\\
+Total consensus size is good.\\
+But... upon mapping to the reference genome,  \\
 it turns out that while it is making big contigs, it's producing a chimeric assembly, in which the contigs join genomic regions that are not truly adjacent.
-it’s getting bigger contigs because it’s joining them incorrectly! \\
+It’s getting bigger contigs because it’s joining them incorrectly! \\
-this is very bad; worse even than a lot of small contigs \\
+This is very bad; worse even than a lot of small contigs \\
+Not a de Bruijn graph (DBG) assembler.  Should find out more about how it actually works.\\
+Good to know how it works so you know what to do with the parameters.
+Newbler may be able to take fasta+qual file.
+MIRA might be worth fussing with on the parameters a bit more if it looks like
+it is doing a good job.
+MIRA probably can't handle large genomes due to memory.
+MIRA has a tool to estimate memory required.
+For a 3.2 Gbase genome it will need 1.1 TB of RAM.

Banana Slug Genomics
