Differences

This shows you the differences between two versions of the page.

--- lecture_notes:04-28-2010 [2010/05/02 05:22]
galt
+++ lecture_notes:04-28-2010 [2010/05/02 05:42]
galt
@@ Line 8: / Line 8: @@
 use makefiles, not shell scripts!
-SOLiD data formats:\\
+**Sanger quality info**\\
+Kevin found the location of the Sanger qual info.\\
+.as or something like that.\\
+different files from 3 different runs.\\
+**SOLiD data formats**:\\
 .csfasta = colorspace with numbers\\
 .de = changes #s to letters (0123 -> ACGT) but it’s colors not numbers! very confusing.\\
@@ Line 47: / Line 52: @@
 ${EUSRC}/assembly/Assemble.pl pogreads.fasta 25\\
-result: \\
+**Result**: \\
 ~2k contigs which create a 2x long genome… suspicious \\
 are contigs overlapping? \\
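
The "2x long genome" check above amounts to comparing the summed contig length with the expected genome size. A minimal sketch of that comparison follows, assuming the contigs are available as plain FASTA; the function names `read_fasta_lengths` and `summarize_contigs` and the `expected_genome_size` parameter are hypothetical and are not part of Assemble.pl or any tool named in these notes.

```python
def read_fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def summarize_contigs(lines, expected_genome_size, min_len=0):
    """Summarize contigs of at least min_len bases.

    expected_genome_size is an assumption supplied by the caller; a ratio
    near 2 would match the "2x long genome" symptom described above.
    """
    lengths = [n for _, n in read_fasta_lengths(lines) if n >= min_len]
    total = sum(lengths)
    return {
        "contigs": len(lengths),
        "total_bases": total,
        "largest": max(lengths, default=0),
        "ratio_to_expected": total / expected_genome_size,
    }
```

With a real file, something like `summarize_contigs(open("contigs.fasta"), expected_genome_size=...)` would do, where both the file name and the size are placeholders. A ratio well above 1 suggests redundant or overlapping contigs, but it does not show which contigs overlap.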
@@ Line 99: / Line 104: @@
 === Celera Assembler: ===
-needs qual info (need this from Sanger reads, too) \\
+**Result**: \\
-... so can't run unless you have the .qual files
+Celera on the Pog 454 data produced a 2.4M genome: 386 contigs, max size 34k.\\
+Needs quality information also, even for the Sanger reads \\
+So can't run unless you have the .qual files
-seemed to have a script to convert Illumina -> their format… but not released yet
+They have a script for converting Illumina (Solexa) reads into their format, but it is not released yet.\\
+Their next release is supposedly soon (May 1st).
-result: \\
+They have settings for running on Sun Grid Engine, but it did not work,\\
-with 454 data alone: 386 contigs \\
+so he turned it off.
-(newbler: ~40 contigs) \\
-took about 50min
+How noisy is the solid data? (Kevin)\\
+On the stuff that maps completely, about 1.5% err rate.\\
+The ones that didn't map cleanly had error-rate 2.5%.\\
+Error rate goes up at the end.\\
+Had some fluidics reads problems at some base positions.
+Took about 50 minutes for all.\\
+For comparison, Newbler took 18 minutes and produced about 40 contigs.
-=== Mira ===
+Just qsub them with no arguments, and it runs everything.\\
-needs datafile named pog_in.[format].fa \\
-sff_extract script to create .qual files
-created 30 contigs >=500 (largest contig 640k) \\
+=== MIRA ===
-but... upon mapping to the reference genome,  \\
+Mostly used the default settings.
+mira-assembly1/
+Running is easy.
+Parameters: fasta, denovo, and which instruments the reads came from (e.g. 454).
+Needs datafile named pog_in.[format].fa \\
+uses sff_extract script to create .fasta and .fasta.qual files \\
+and also the traceinfo_in.454.xml file.
+Time: 1 hour plus.
+Created 621 contigs, 30 larger than 500. (largest contig 640k) \\
+The 500 cutoff is probably too large.\\
+might be more reasonable.\\
+Total consensus size is good.\\
+But... upon mapping to the reference genome,  \\
 it turns out that while it is making big contigs, it's producing a chimeric assembly, in which the contigs join genomic regions that are not truly adjacent.
-it’s getting bigger contigs because it’s joining them incorrectly! \\
+It’s getting bigger contigs because it’s joining them incorrectly! \\
-this is very bad; worse even than a lot of small contigs \\
+This is very bad; worse even than a lot of small contigs \\
+Not a de Bruijn graph (DBG) assembler.  Should find out more about how it actually works.\\
+Good to know how it works so you know what to do with the parameters.
+Newbler may be able to take fasta+qual file.
+Mira might be worth fussing with on the parameters a bit more if it looks like
+it is doing a good job.
+Mira probably can't handle large genomes due to memory.
+Mira has a tool to estimate memory required.
+For a 3.2G genome it will need 1.1TB of RAM.
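
The SOLiD note in the Celera hunk above, that the error rate goes up toward the end of the read, can be checked by tabulating mismatches per read position. The following is only a sketch under strong assumptions: each read is already paired with the reference segment it maps to, the alignment is ungapped, and both strings are in the same encoding (both colorspace or both base space). The function name `per_position_error_rate` is hypothetical and does not come from any tool mentioned in these notes.

```python
def per_position_error_rate(pairs):
    """Return the mismatch fraction at each read position.

    pairs: iterable of (read, reference_segment) strings, assumed to be
    ungapped alignments in the same encoding. Positions past the end of
    the shorter string in a pair are ignored.
    """
    mismatches = []
    totals = []
    for read, ref in pairs:
        for i, (a, b) in enumerate(zip(read, ref)):
            if i == len(totals):
                # First time this position is seen: grow both counters.
                totals.append(0)
                mismatches.append(0)
            totals[i] += 1
            if a != b:
                mismatches[i] += 1
    return [m / t for m, t in zip(mismatches, totals)]
```

A rising tail in the returned list would correspond to the "error rate goes up at the end" observation, and isolated spikes at particular positions would be consistent with the fluidics problems mentioned there.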

Banana Slug Genomics
