
Misc Notes:

campusrocks is broken!

Pog has 2 repeats: ~1k & 1.1k
use makefiles, not shell scripts!

SOLiD data formats:
.csfasta = colorspace with numbers
.de = changes the numbers to letters (0123 → ACGT), but the letters still stand for colors, not bases! very confusing.
.fa is the real basespace
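A minimal sketch of how the three representations relate, assuming the standard SOLiD two-base encoding (A, C, G, T as the 2-bit codes 0 to 3, with each color being the XOR of two adjacent bases). The function names are made up for illustration, and missing color calls (written as "." in real files) are not handled:

```python
# Each base gets a 2-bit code; a color is the XOR of two adjacent base codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def double_encode(colors):
    """Rewrite color digits as letters (0123 -> ACGT). Still color space!"""
    return colors.translate(str.maketrans("0123", "ACGT"))


def decode_colorspace(read):
    """Convert a csfasta-style read (primer base + color digits) to bases."""
    prev = BASE_TO_BITS[read[0]]  # the leading primer base anchors the decoding
    bases = []
    for color in read[1:]:
        prev ^= int(color)  # next base = previous base XOR color
        bases.append(BITS_TO_BASE[prev])
    return "".join(bases)


read = "T0123"
print(double_encode(read[1:]))   # letters, but they still mean colors
print(decode_colorspace(read))   # actual base space
```

Because every base depends on the one before it, a single wrong color changes all bases decoded after it, which is one reason assemblers tend to stay in color space as long as possible.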


ran well first time (it ran, at least)
have to run it where you installed it
no makefiles

~2k contigs that add up to ~2x the expected genome length… suspicious
are contigs overlapping?
find out:
check blat_strict_match (blat alignment to reference genome)
look for “Q name” (contigs) which match to the same “T start” positions on the reference genome
answer: yes, they appear to overlap a lot; the genome gets double coverage because many contigs overlap completely
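The overlap check above can be sketched roughly as follows, assuming the blat output is in the standard tab-separated PSL layout (Q name in column 10, T name in column 14, T start and T end in columns 16 and 17). The function names are hypothetical, and this is only one way to do the comparison:

```python
from collections import defaultdict


def read_psl_hits(lines):
    """Yield (t_name, t_start, t_end, q_name) from PSL alignment lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Header lines are skipped: alignment rows have 21 columns
        # and start with the match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        yield fields[13], int(fields[15]), int(fields[16]), fields[9]


def overlapping_contigs(hits):
    """Return (contig_a, contig_b, overlap_length) for contigs sharing reference bases."""
    by_target = defaultdict(list)
    for t_name, t_start, t_end, q_name in hits:
        by_target[t_name].append((t_start, t_end, q_name))
    pairs = []
    for intervals in by_target.values():
        intervals.sort()
        for i, (start_a, end_a, name_a) in enumerate(intervals):
            for start_b, end_b, name_b in intervals[i + 1:]:
                if start_b >= end_a:
                    break  # sorted by start, so later hits cannot overlap either
                if name_a != name_b:
                    pairs.append((name_a, name_b, min(end_a, end_b) - start_b))
    return pairs
```

Usage would be something like `overlapping_contigs(read_psl_hits(open(path)))`, where `path` points at the blat output; summing the overlap lengths gives a rough idea of how much of the reference is covered twice.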

Things to try to improve the run:
- longer k-mers
- increase frequency threshold (help make up for read errors, maybe?)
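To make the two knobs concrete, here is a toy sketch of k-mer counting with a frequency threshold. It is not the assembler's actual implementation; `count_kmers`, `solid_kmers` and the example reads are made up for illustration:

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, min_count):
    """Keep only k-mers seen at least min_count times."""
    return {kmer for kmer, n in counts.items() if n >= min_count}


# Third read carries a single error in its last base.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]
counts = count_kmers(reads, k=4)
print(sorted(solid_kmers(counts, min_count=2)))
```

K-mers that contain a read error tend to be rare, so raising the threshold drops them, at the cost of also dropping real k-mers from low-coverage regions. Longer k-mers are less likely to be shared between repeat copies, but each read yields fewer of them, so they need more coverage.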

“Error Correction via threading”
- took reads that “they couldn’t make error free”
- made contigs out of these
- tried to map them back to the “error-free” contigs
- perhaps this is where it went wrong?

Tried to run on just the SOLiD data… started on Sunday, but still running (Wed)

Celera Assembler:

needs qual info (need this from Sanger reads, too)
… so can't run unless you have the .qual files

seemed to have a script to convert Illumina → their format… but not released yet

with 454 data alone: 386 contigs
(newbler: ~40 contigs)

took about 50min


needs datafile named pog_in.[format].fa
sff_extract script to create .qual files

created 30 contigs >=500 (largest contig 640k)
but… upon mapping to the reference genome,
it turns out that while it is making big contigs, it's producing a chimeric assembly, in which the contigs join genomic regions that are not truly adjacent. it’s getting bigger contigs because it’s joining them incorrectly!
this is very bad; worse even than a lot of small contigs
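One rough way to flag such contigs from the alignments to the reference, sketched under several assumptions: each contig's alignment pieces are available as (contig, contig start, contig end, reference name, reference start, reference end) tuples, and `max_jump` is an arbitrary cutoff chosen for illustration. The function name is hypothetical, and strand and the wrap-around of a circular genome are ignored:

```python
from collections import defaultdict


def chimeric_candidates(hits, max_jump=10_000):
    """Return names of contigs whose neighbouring pieces map far apart on the reference."""
    by_contig = defaultdict(list)
    for q_name, q_start, q_end, t_name, t_start, t_end in hits:
        by_contig[q_name].append((q_start, q_end, t_name, t_start, t_end))
    flagged = []
    for q_name, pieces in by_contig.items():
        pieces.sort()  # order the pieces along the contig
        for prev, cur in zip(pieces, pieces[1:]):
            # Gap between the two reference intervals (negative if they overlap).
            gap = max(prev[3], cur[3]) - min(prev[4], cur[4])
            if prev[2] != cur[2] or gap > max_jump:
                flagged.append(q_name)
                break
    return sorted(flagged)


hits = [
    ("c1", 0, 500, "ref", 0, 500),
    ("c1", 500, 1000, "ref", 300_000, 300_500),  # adjacent in contig, far apart in ref
    ("c2", 0, 500, "ref", 1_000, 1_500),
    ("c2", 500, 900, "ref", 1_500, 1_900),
]
print(chimeric_candidates(hits))
```

A flagged contig is only a candidate: a contig that spans the origin of a circular reference would also show a large jump and would need to be checked by hand.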

lecture_notes/04-28-2010.1272576713.txt.gz · Last modified: 2010/04/29 14:31 by learithe