Table of Contents

Genome assembly—theory and practice

Administrative

N50

How to find N50? Add up the lengths of all your contigs and divide the total by 2. Order your contigs from longest to shortest. Go down the list and keep a running total of the bases you've seen so far. Whichever contig brings the running total up to or past half the total length is your N50 contig, and you can report its length.
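The recipe above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `n50` and the example contig lengths are made up for this note, and the sketch assumes the contig reaching exactly half the total counts as the N50 contig.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper)."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    half_total = sum(contig_lengths) / 2
    running_total = 0
    # Walk the contigs from longest to shortest, keeping a running total.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        # Assumption: reaching exactly half the total also counts.
        if running_total >= half_total:
            return length


# Made-up example: total is 290, so half is 145.
lengths = [80, 70, 50, 40, 30, 20]
print(n50(lengths))  # prints 70 (80 + 70 = 150, which passes 145)
```

Here the second-longest contig is the one that pushes the running total past half, so the N50 is 70 even though most contigs are shorter than that.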

Limitations

de Bruijn graphs

Reminder: the de Bruijn graph algorithm takes kmers from reads and builds an adjacency graph (usually the nodes are the kmers, but the kmers could be the edges if you want). In theory we don't need extremely long kmers for most of them to land at a unique place in the genome.
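As a rough sketch of the "nodes are the kmers" version, the Python below links each kmer to the kmer that follows it in a read (the two overlap by k-1 bases). The function names `kmers` and `de_bruijn` and the two toy reads are invented for illustration. Real assemblers also handle reverse complements, sequencing errors, and kmer counts, all of which this sketch ignores.

```python
from collections import defaultdict


def kmers(read, k):
    """Return every length-k substring of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn(reads, k):
    """Build an adjacency graph: kmer -> set of kmers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        # Consecutive kmers in a read overlap by k-1 bases, so connect them.
        for left, right in zip(ks, ks[1:]):
            graph[left].add(right)
    return graph


# Two made-up overlapping reads from the toy sequence ATGGCGTGC.
reads = ["ATGGCGT", "GGCGTGC"]
graph = de_bruijn(reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

With these toy reads every node has a single successor, so following the edges from ATG spells out the original sequence. Repeats longer than k would give a node more than one successor.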

Picking k

What could possibly go wrong?

(See slides for diagrams referred to in this section.)