lecture_notes:04-28-2010

Misc Notes:

campusrocks is broken!

Pog has 2 repeats: ~1k & 1.1k
use makefiles, not shell scripts!

SOLiD data formats:
.csfasta = colorspace, written as digits (0123)
.de = changes the digits to letters (0123 → ACGT), but the letters are still colors, not bases! very confusing.
.fa = the real basespace
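
To make the difference between the three formats concrete, here is a minimal Python sketch. It assumes the usual SOLiD convention that a color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3) and that a .csfasta read starts with a known primer base. The function names are made up for illustration (the letter relabeling is commonly called "double encoding"), and missing colors ('.') are not handled.

```python
# Assumed 2-bit base codes; a color is the XOR of two adjacent base codes.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def double_encode(colors: str) -> str:
    """Relabel color digits 0123 as ACGT. The result is still colorspace."""
    return colors.translate(str.maketrans("0123", "ACGT"))


def decode_to_basespace(csfasta_read: str) -> str:
    """Decode a read such as 'T0123' (primer base + colors) into real bases."""
    previous = BASE_TO_CODE[csfasta_read[0]]  # primer base, not part of the read
    bases = []
    for color in csfasta_read[1:]:
        previous ^= int(color)  # next base = previous base XOR color
        bases.append(CODE_TO_BASE[previous])
    return "".join(bases)


print(double_encode("0123"))         # letters, but still colors
print(decode_to_basespace("T0123"))  # actual bases
```

Note that the two outputs differ for the same colors, which is exactly why the .de letters must not be read as sequence. A single wrong color also changes every base decoded after it.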

Euler

ran well first time (it ran, at least)
have to run it where you installed it
no makefiles

result:
~2k contigs whose total length is about 2x the genome length… suspicious
are contigs overlapping?
find out:
check blat_strict_match (blat alignment to reference genome)
look for “Q name” (contigs) which match to the same “T start” positions on the reference genome
answer: yes, the contigs appear to overlap a lot; double coverage because they totally overlap
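
The check above could be scripted roughly as follows. This is a sketch that assumes blat_strict_match is standard 21-column BLAT PSL output, where Q name is column 10 and T name / T start are columns 14 and 16. The function names are hypothetical, and requiring an identical T start is a deliberately strict reading of the notes.

```python
from collections import defaultdict

# Zero-based column indexes in the standard 21-column PSL layout.
Q_NAME, T_NAME, T_START = 9, 13, 15


def contigs_by_target_start(psl_lines):
    """Group contig names (Q name) by (T name, T start)."""
    groups = defaultdict(set)
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the PSL header and anything that is not an alignment row.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        groups[(fields[T_NAME], int(fields[T_START]))].add(fields[Q_NAME])
    return groups


def shared_starts(psl_lines):
    """Return only the reference positions hit by more than one contig."""
    return {
        position: sorted(names)
        for position, names in contigs_by_target_start(psl_lines).items()
        if len(names) > 1
    }
```

Usage would be something like `shared_starts(open("blat_strict_match"))` (path assumed); every entry returned is a reference position where two or more contigs start.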

Things to try to improve the run:
- longer k-mers
- increase frequency threshold (help make up for read errors, maybe?)
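
For intuition on the frequency threshold: the general idea is that k-mers seen only a few times across all reads are more likely to come from read errors than from the genome. The sketch below is a generic illustration of that idea, not Euler's actual implementation; the function and parameter names are made up.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def frequent_kmers(reads, k, min_count):
    """Keep k-mers seen at least min_count times; rarer ones are treated as likely errors."""
    return {kmer for kmer, n in count_kmers(reads, k).items() if n >= min_count}


reads = ["ACGTA", "ACGTA", "ACGTT"]  # toy data; the last read ends in a possible error
print(sorted(frequent_kmers(reads, k=4, min_count=2)))
```

Raising min_count discards more error k-mers but also drops real k-mers from low-coverage regions, so it trades errors against gaps.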

“Error Correction via threading”
- took reads that “they couldn’t make error free”
- made contigs out of these
- tried to map them back to the “error-free” contigs
- perhaps this is where it went wrong?

Tried to run on just the SOLiD data… started on Sunday, but still running (Wed)

Celera Assembler:

needs qual info (need this from Sanger reads, too)
… so can't run unless you have the .qual files

there seems to be a script to convert Illumina → their format… but it's not released yet

result:
with 454 data alone: 386 contigs
(newbler: ~40 contigs)

took about 50min

Mira

needs datafile named pog_in.[format].fa
sff_extract script to create .qual files

created 30 contigs >=500 (largest contig 640k)
but… upon mapping to the reference genome,
it turns out that although it makes big contigs, the assembly is chimeric: the contigs join genomic regions that are not truly adjacent. it’s getting bigger contigs because it’s joining them incorrectly!
this is very bad; worse even than a lot of small contigs
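
One way to spot such chimeric joins from the reference mapping is to compare distances along the contig with distances along the reference. The sketch below assumes each contig's alignment has already been reduced to forward-strand blocks of (q_start, q_end, t_start, t_end) on a single reference sequence. The function name and the tolerance value are hypothetical, and a contig that spans the origin of a circular reference would be flagged incorrectly.

```python
def chimeric_joins(blocks, tolerance=1000):
    """Return consecutive block pairs whose reference gap disagrees with the contig gap.

    blocks: (q_start, q_end, t_start, t_end) tuples for ONE contig,
    forward strand only (an assumption of this sketch).
    """
    ordered = sorted(blocks)  # order blocks along the contig
    joins = []
    for left, right in zip(ordered, ordered[1:]):
        query_gap = right[0] - left[1]   # distance along the contig
        target_gap = right[2] - left[3]  # distance along the reference
        if abs(target_gap - query_gap) > tolerance:
            joins.append((left, right))
    return joins


blocks = [
    (0, 1000, 5000, 6000),
    (1000, 2000, 6000, 7000),
    (2000, 3000, 250000, 251000),  # adjacent in the contig, far away in the reference
]
print(chimeric_joins(blocks))
```

A contig with no flagged joins is not proven correct, but any flagged pair is a place to look at by hand.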

lecture_notes/04-28-2010.1272576713.txt.gz · Last modified: 2010/04/29 14:31 by learithe