Differences

This shows you the differences between two versions of the page.

--- lecture_notes:04-28-2010 [2010/05/02 05:01]
galt
+++ lecture_notes:04-28-2010 [2010/05/02 05:34]
galt
@@ Line 1: / Line 1: @@
+John St. John's lecture on EULER-SR and Celera; Michael Cusack's lecture on MIRA
 === Misc Notes: ===
@@ Line 12: / Line 14: @@
-Kevin
+Kevin mapped newbler to join the contigs
-I mapped newbler to join the contigs
 found a bug in the python script to map the solid reads.
 Detected because there were no joining reads
@@ Line 33: / Line 34: @@
 that can use long reads and mate-pairs.
-ran well first time (it ran, at least)  \\
+**euler-sr-assembly1/**\\
-have to run it where you installed it \\
+Ran on 454 data with the Sanger data concatenated into one file.
-no makefiles \\
-result: \\
+Have to set up env vars.\\
+No make install options.\\
+Things are mixed up.\\
+You have to run it where you installed it \\
+${EUSRC}\\
+It ran well the first time (it ran, at least)  \\
+${EUSRC}/assembly/Assemble.pl pogreads.fasta 25\\
+**Result**: \\
 ~2k contigs which create a 2x long genome… suspicious \\
 are contigs overlapping? \\
 //find out:// \\
-check blat_strict_match  (blat alignment to reference genome) \\
+check contig-blat_strict_match  (blat alignment to reference genome) \\
 look for "Q name" (contigs) which match to the same "T start" positions on the reference genome \\
 //answer://yes, appear to overlap a lot – double coverage because they totally overlap
+There is one 91k contig.\\
 Things to try to improve the run: \\
-- longer k-mers \\
+- longer k-mers, increasing to 31 should be easy \\
 - increase frequency threshold (help make up for read errors, maybe?) \\
+- throw out the tiny contigs, reduce your cutoff.
+Does have an option to do some simple quality filtering on the reads\\
+if quality data such as fastq is used?\\
+-minmult look at how many things map to this area,\\
+if less than this many things, throw it out.\\
+Error-correct reads, construct repeat graph,\\
+simplify repeat graph with mate-reads\\
+Error correction by threading.\\
+Tries to make minimal corrections to beginnings of reads,\\
+uses those to make the kmers.   Later threads the full readlength through.
 "Error Correction via threading" \\
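
The overlap check in the notes above (look for "Q name" contigs that land on the same "T start" region of the reference) could be sketched roughly as follows. This is only an illustration, not the script used in class: it assumes the blat output is in standard tab-separated PSL format, and the function names and the `min_overlap` parameter are made up for this example.

```python
from collections import defaultdict


def parse_psl(lines):
    """Yield (q_name, t_name, t_start, t_end) for each PSL alignment row."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # PSL data rows have 21 columns and start with an integer match count;
        # this also skips the header lines that blat writes by default.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        yield fields[9], fields[13], int(fields[15]), int(fields[16])


def overlapping_contigs(hits, min_overlap=1):
    """Return (contig_a, contig_b, target, overlap_bases) for contig pairs
    whose alignments cover the same stretch of the reference.

    min_overlap is a hypothetical knob for ignoring tiny overlaps.
    """
    by_target = defaultdict(list)
    for q_name, t_name, t_start, t_end in hits:
        by_target[t_name].append((t_start, t_end, q_name))

    pairs = []
    for t_name, intervals in by_target.items():
        intervals.sort()  # sort by start position on the reference
        for i, (start1, end1, q1) in enumerate(intervals):
            for start2, end2, q2 in intervals[i + 1:]:
                if start2 >= end1:
                    break  # later intervals start even further right
                overlap = min(end1, end2) - start2
                if q1 != q2 and overlap >= min_overlap:
                    pairs.append((q1, q2, t_name, overlap))
    return pairs
```

Usage would be along the lines of `overlapping_contigs(parse_psl(open(path)))`, where `path` points at the PSL file. Many pairs with overlaps close to the full contig length would match the "double coverage" seen in the notes.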
@@ Line 55: / Line 79: @@
 - perhaps this is where it went wrong? \\
+Mate reads.\\
+Multiple paths of similar length are hard to disambiguate.\\
+You can use multiple matepairs and bootstrap analysis.\\
+Use the paths with the highest probability.\\
+Pog repeats aside:\\
+There are several large homologous regions on opposite strands\\
+in Pog data that are kinds of repeats.  \\
+They are at both ends of the area that inverts.\\
+Inversion happens by homologous matching, then swapping by two strands.\\
+Like a sloppy integrase.
+**Solid data.**\\
+Used the regular base-space data in colorspace_input.fa (not double-encoded).\\
 Tried to run on just the SOLiD data… started on Sunday, but still running (Wed) \\
@@ Line 60: / Line 99: @@
 === Celera Assembler: ===
-needs qual info (need this from Sanger reads, too) \\
+**Result**: \\
-... so can't run unless you have the .qual files
+Celera on Pog 454 got 2.4M genome.  386 contigs.  Max size 34k.\\
+Needs quality information also, even for the Sanger reads \\
+So can't run unless you have the .qual files
+There is a script for converting Illumina (Solexa) reads into their format, but it is not released yet.\\
+Their next release is supposedly soon (May 1st).
+They have settings for running on Sun Grid Engine, but it did not work,\\
+so he turned it off.
-seemed to have a script to convert Illumina -> their format… but not released yet
+How noisy is the solid data? (Kevin)\\
+On the reads that map completely, the error rate is about 1.5%.\\
+The ones that didn't map cleanly had error-rate 2.5%.\\
+Error rate goes up at the end.\\
+Had some fluidics reads problems at some base positions.
-result: \\
+Took about 50 minutes for all.\\
-with 454 data alone: 386 contigs \\
+For comparison, Newbler took 18 minutes and produced about 40 contigs.
-(newbler: ~40 contigs) \\
-took about 50min
+Just qsub them with no arguments, and it runs everything.\\
-=== Mira ===
+=== MIRA ===
-needs datafile named pog_in.[format].fa \\
+Needs datafile named pog_in.[format].fa \\
 sff_extract script to create .qual files
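
On the SOLiD noise question in the Celera notes above (error rate goes up toward the end of the read), a per-position mismatch rate is one way to see that trend. The sketch below is an assumption-laden illustration, not the analysis Kevin ran: it assumes each read is already paired with the reference segment it aligned to, in the same encoding and without gaps, and the function name is hypothetical.

```python
def per_position_error_rate(pairs):
    """Return the mismatch fraction at each read position.

    pairs: iterable of (read, reference_segment) strings aligned
    without gaps, so position i in the read matches position i in
    the reference segment.
    """
    mismatches = []
    totals = []
    for read, ref in pairs:
        for i, (read_base, ref_base) in enumerate(zip(read, ref)):
            if i == len(totals):
                # First time any read reaches this position.
                totals.append(0)
                mismatches.append(0)
            totals[i] += 1
            if read_base != ref_base:
                mismatches[i] += 1
    return [m / n for m, n in zip(mismatches, totals)]
```

Plotting the returned list against position would show whether errors cluster at the read ends or at specific positions, such as those affected by the fluidics problems.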

Banana Slug Genomics
