Lecture Notes for May 28, 2010

Pcap Pog Assembly

Location: assemblies/Pog/pcap-assembly/


Looking at the T start and T end (where it is in the chromosome)
Notice there is a problem with contig 8; there is a large hole


Parameters point to overlap consensus
Coverage looks right
A small number of reads have deep coverage; they may in fact be bogus reads or pieces of the virus


Contig 8 looks like it is problematic
After contig 17, we are getting down to 1k pieces, with no clear distinction between the ones that are needed and the ones that are not
Doesn't look too bad as an assembly, but needs some cleaning up
Not many chimeric reads
The virus is probably somewhere in there as well (search for ece, in contigs 11, 14, etc.)
Curious why it produces overlapping contigs (didn't build any supercontigs, maybe supporting multiple paths)
~2 hr run time, not that different from Newbler


Talk about the internals of colorspace mapping (not necessarily the correct way to do it)

Have dicts:
ref: is the list of sequences that are input from FASTA
ref_color: have the same thing, but translated to color space

  • Only two bits of information lost
  • Each color corresponds to the transition from the previous base to the current base
  • Also have the length of the reference sequence
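
A minimal sketch of the translation, assuming the usual di-base encoding in which a color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3). The function names and the sample sequence are made up for illustration and are not from the class code. Decoding needs the first base, which is the two bits the color string alone does not carry.

```python
# Assumed 2-bit base codes; with these, a color is the XOR of adjacent base codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def to_color_space(seq):
    """Translate a base-space sequence into a list of colors (0-3)."""
    codes = [BASE_CODE[b] for b in seq.upper()]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def to_base_space(first_base, colors):
    """Decode colors back to bases, starting from a known first base."""
    code = BASE_CODE[first_base]
    bases = [first_base]
    for c in colors:
        code ^= c  # each color is the transition to the next base
        bases.append(CODE_BASE[code])
    return "".join(bases)


# Hypothetical stand-ins for the ref and ref_color dicts.
ref = {"chr_demo": "ATGGCA"}
ref_color = {name: to_color_space(seq) for name, seq in ref.items()}
```

A color string is one shorter than its base string, so the reference length is worth storing separately.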

ref_start: stores locations, packing a triplet into a single integer to save space (each chromosome is given a number; done in megabases)
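
The notes do not say what the triplet is or how it is packed. A minimal sketch of one way to do it, assuming the triplet is (chromosome number, position, strand) and each chromosome gets a fixed slot a whole number of megabases wide. pack_location, unpack_location and CHROM_STRIDE are hypothetical names, not the ones used in class.

```python
MEGABASE = 1_000_000
# Assumption: no chromosome is longer than 1000 Mb, so each gets a 1000 Mb slot.
CHROM_STRIDE = 1000 * MEGABASE


def pack_location(chrom, pos, reverse):
    """Pack (chromosome number, 0-based position, strand flag) into one int."""
    if not 0 <= pos < CHROM_STRIDE:
        raise ValueError("position does not fit in a chromosome slot")
    return (chrom * CHROM_STRIDE + pos) * 2 + int(reverse)


def unpack_location(packed):
    """Recover the (chromosome number, position, strand flag) triplet."""
    packed, reverse = divmod(packed, 2)
    chrom, pos = divmod(packed, CHROM_STRIDE)
    return chrom, pos, bool(reverse)
```

One int per location is much smaller than a tuple of three objects, which matters when the hash tables hold a location for every k-mer in the reference.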

Then have a bunch of RDB files, contains the names of files, the files that are open
cross_rdb: The ones that didn't map well

Used to check if there is a bias of freads and rreads mapping to the forward or backward strand

A big chunk of the code is to check for inversions
Where are there places that there is homologous recombination in reverse places?

check: builds the possible inversion lists
Uses a hash table of k-mers

windows: a hash table that, given a color space k-mer (taken as an int rather than a string, to save space), returns a list of locations where that k-mer exists
short_windows: does the same thing but with k-mers of a shorter length
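
A minimal sketch of such a table, assuming colors are packed 2 bits each into the int key and locations are plain start offsets. kmer_to_int, build_windows and lookup are hypothetical helper names, and the widths 3 and 2 are only for the demo.

```python
from collections import defaultdict


def kmer_to_int(colors):
    """Pack a sequence of colors (0-3) into one int, 2 bits per color."""
    value = 0
    for c in colors:
        value = (value << 2) | c
    return value


def build_windows(color_seq, width):
    """Map each packed color k-mer of length `width` to its start positions."""
    table = defaultdict(list)
    for start in range(len(color_seq) - width + 1):
        table[kmer_to_int(color_seq[start:start + width])].append(start)
    return table


def lookup(table, colors):
    """Return the list of locations for a k-mer, or an empty list if absent."""
    return table.get(kmer_to_int(colors), [])


demo_colors = [3, 1, 0, 3, 1, 0, 3]
windows = build_windows(demo_colors, 3)        # longer k-mers
short_windows = build_windows(demo_colors, 2)  # shorter k-mers
```

Packed keys are only unique among k-mers of the same length (leading 0 colors vanish), which is one reason to keep the two widths in separate tables.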

The simplest test returns a list of where the read maps
If it is in the windows hash table, return the list
The fixup here for the trimming is different than before
The hack is to look up width + trim, and then look up trim first + width
This one is the aggressive mapping
Taking the union of all the places that it could map of at least 2

Take a list of locations and narrow it down to exact matches
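
A minimal sketch of that narrowing step, assuming the reference and the read are both lists of colors and each candidate location is a start offset into the reference. exact_matches is a hypothetical name, not the function shown in class.

```python
def exact_matches(ref_colors, read_colors, candidates):
    """Keep only candidate starts where the whole read matches the reference."""
    n = len(read_colors)
    return [
        pos
        for pos in candidates
        # Skip candidates where the read would run off either end.
        if 0 <= pos and pos + n <= len(ref_colors)
        and ref_colors[pos:pos + n] == read_colors
    ]


demo_ref = [3, 1, 0, 3, 1, 2, 3]
demo_read = [3, 1, 0, 3]
hits = exact_matches(demo_ref, demo_read, [0, 3, 5])
```

Here only offset 0 survives: offset 3 shares the first two colors but then differs, and offset 5 would run past the end of the reference.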

Next Week

Clean up the files on campusrocks, finish README files, add anything to wiki
A week from today freeze files on campusrocks and do backup

lecture_notes/06-04-2010.txt · Last modified: 2010/06/04 22:25 by cbrumbau