Add to these lecture notes with any notes you have!
Take fix mode script from /projects/compbio/bin/scripts and replace protein user group with BME 235 user group.
Next week we will have a reference genome (POG) to test the tools on. For the most part POG is done; however, there is still some uncertainty about the 8 SNPs that remain. It is definitely past the MIAMI standard at this point.
Note about sequencing platform quality scores: most platforms try to report Phred quality scores, so quality scores are comparable across platforms and runs.
It can be informative, once reads are mapped, to look at the quality scores for reads with observed errors.
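As a rough illustration of the two notes above, here is a minimal Python sketch of the Phred relationship Q = -10 log10(P) and of pulling out the quality scores at positions where a mapped read disagrees with the reference. The function names, the ASCII offset of 33, and the assumption of an ungapped alignment are illustrative choices, not something specified in lecture.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability P to a Phred score Q = -10 * log10(P)."""
    return -10 * math.log10(p)

def decode_quality_string(qual, offset=33):
    """Decode an ASCII-encoded quality string (offset=33 is an assumption)."""
    return [ord(c) - offset for c in qual]

def mean_quality_at_mismatches(read, qual, ref_segment, offset=33):
    """Mean quality at positions where the read differs from the reference.

    Assumes an ungapped alignment: read[i] is aligned to ref_segment[i].
    Returns None when there are no mismatches.
    """
    scores = decode_quality_string(qual, offset)
    at_errors = [q for r, g, q in zip(read, ref_segment, scores) if r != g]
    return sum(at_errors) / len(at_errors) if at_errors else None
```

If the quality scores are well calibrated, the mean quality at mismatched positions should come out noticeably lower than the mean over all positions.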
The POG assembly is down to only 8 SNPs and one potentially variable insert.
Types of assembler graphs:
The graph types differ in what the nodes are.
* read → * read (a directed graph)
A _______________
         | | | | | |
         __________________ B
The problem is the direction of the reads when aligning:
Need to have some tolerance for error because the reads are noisy.
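A minimal Python sketch of the two issues above, assuming a simple mismatch-only comparison (no indels): the suffix of read A is compared with the prefix of read B in both orientations, and an overlap is accepted if the mismatch rate stays under a tolerance. The names `best_overlap` and `oriented_overlap` and the default thresholds are hypothetical; real overlappers use indexed or banded alignment rather than this brute-force scan.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))

def best_overlap(a, b, min_len=3, max_mismatch_rate=0.1):
    """Length of the longest suffix of a that matches a prefix of b.

    A candidate overlap is accepted when its mismatch count is at most
    max_mismatch_rate * length. Returns 0 if no overlap qualifies.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-length:], b[:length]))
        if mismatches <= max_mismatch_rate * length:
            return length
    return 0

def oriented_overlap(a, b, **kwargs):
    """Try b as given ('+') and reverse-complemented ('-'); keep the longer overlap."""
    forward = best_overlap(a, b, **kwargs)
    reverse = best_overlap(a, reverse_complement(b), **kwargs)
    return (forward, "+") if forward >= reverse else (reverse, "-")
```

Each accepted overlap would become a directed edge read → read, with the orientation recorded on the edge.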
* kmer → * kmer → * kmer → * kmer …
|----------| |-----------| |----------| |----------| …
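Each edge joins two k-mers that overlap by k-1 bases, so the edges are (k+1)-mers and the graph is no different than a count of (k+1)-mers.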
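To make the k-mer graph concrete, here is a minimal Python sketch that builds it directly from (k+1)-mer counts: every observed (k+1)-mer contributes one edge from its first k bases to its last k bases. The function name `build_de_bruijn` is hypothetical, and the sketch ignores reverse complements and sequencing errors for simplicity.

```python
from collections import Counter, defaultdict

def build_de_bruijn(reads, k):
    """Build a de Bruijn graph whose nodes are k-mers.

    Returns (graph, edge_counts), where graph[u][v] is the number of times
    the (k+1)-mer joining k-mer u to k-mer v was seen, and edge_counts is
    the plain (k+1)-mer count table the graph was derived from.
    """
    edge_counts = Counter()
    for read in reads:
        # A read of length n contains n - k overlapping (k+1)-mers.
        for i in range(len(read) - k):
            edge_counts[read[i:i + k + 1]] += 1

    graph = defaultdict(dict)
    for kplus1_mer, count in edge_counts.items():
        graph[kplus1_mer[:-1]][kplus1_mer[1:]] = count
    return dict(graph), edge_counts

graph, edge_counts = build_de_bruijn(["ACGTA", "CGTAC"], k=3)
```

Since the graph is derived entirely from `edge_counts`, storing the count table alone is enough to reconstruct it.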
Ways to handle representing the graph:
May run into problems with RAM on the computational nodes.
With overlap graph:
A → B
A → C
A → D
|----------| A |----------| B |----------| C |----------| D
We don't know where to go next, or which copy of the repeat we are currently in.
In the ideal situation for a de Bruijn graph:
kmer -> kmer -> kmer -> kmer -> kmer (done!)
Realistically, there are issues:
kmer -> kmer -> kmer -> kmer -> kmer
            \-> kmer -> kmer -> kmer (off to nowhere)

        /-> kmer -> kmer -> kmer -\
kmer -> kmer -> kmer -> kmer -> kmer

kmer -> kmer -> kmer -> kmer -> kmer -> kmer -> kmer
                \- kmers <-/
Take the loop?
A                       B
 \                     /
  ------------------->
 /                     \
B                       A'
Which path to take?
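The problem shapes sketched above (dead-end branches, alternative paths that rejoin, loops, and shared repeat segments) all show up as nodes with more than one way in or out. A minimal Python sketch for locating them, assuming the graph is stored as a dict from node to its successors (the name `find_branch_points` is hypothetical):

```python
def find_branch_points(graph):
    """Locate the nodes where a traversal becomes ambiguous.

    graph maps each node to an iterable of successor nodes.
    Returns (forks, joins, dead_ends) as sorted lists:
      forks     - nodes with more than one outgoing edge
      joins     - nodes with more than one incoming edge
      dead_ends - nodes with no outgoing edge
    """
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)

    in_degree = {node: 0 for node in nodes}
    for successors in graph.values():
        for successor in successors:
            in_degree[successor] += 1

    forks = sorted(n for n in nodes if len(graph.get(n, ())) > 1)
    joins = sorted(n for n in nodes if in_degree[n] > 1)
    dead_ends = sorted(n for n in nodes if len(graph.get(n, ())) == 0)
    return forks, joins, dead_ends
```

A short path from a fork to a dead end is a candidate "off to nowhere" branch; a fork followed by a join is a candidate pair of alternative paths. Note that the true end of the sequence is also reported as a dead end.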
If you have clean data, you can disambiguate some of these issues. The largest bias usually comes from PCR amplification.
Need to collapse the graph (both overlap and de Bruijn) to assemble the reads.
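One common way to collapse a de Bruijn graph is to merge every maximal unbranched path into a single sequence. The sketch below is a minimal Python version under stated assumptions: nodes are k-mer strings, consecutive nodes overlap by k-1 bases, and the graph is a dict from node to successors. The name `collapse_unbranched_paths` is hypothetical, and isolated cycles with no branch point are skipped for brevity.

```python
def collapse_unbranched_paths(graph):
    """Merge maximal unbranched paths of a k-mer graph into sequences.

    graph maps each k-mer to an iterable of successor k-mers.
    Returns a list of sequences, one per maximal unbranched path.
    """
    succ = {node: list(targets) for node, targets in graph.items()}
    nodes = set(succ)
    for targets in succ.values():
        nodes.update(targets)

    in_degree = {node: 0 for node in nodes}
    for targets in succ.values():
        for target in targets:
            in_degree[target] += 1
    out_degree = {node: len(succ.get(node, [])) for node in nodes}

    def is_simple(node):
        # Exactly one way in and one way out: safe to merge through.
        return in_degree[node] == 1 and out_degree[node] == 1

    contigs = []
    for node in sorted(nodes):
        if is_simple(node):
            continue  # paths start only at branch points or ends
        for nxt in succ.get(node, []):
            path = node
            while True:
                path += nxt[-1]  # each step adds one new base
                if not is_simple(nxt):
                    break
                nxt = succ[nxt][0]
            contigs.append(path)
    return contigs
```

For a clean linear graph this returns a single sequence; at each branch point the output is split, which is where the "which path to take?" decisions still have to be made.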