Table of Contents

De novo assembly II

Guest lecturer: Stefan Prost, stefan.prost@berkeley.edu

Illumina paired-end sequencing libraries

	=====> end_1
	________________________
	           end_2 <=====

Illumina mate-pair sequencing libraries

	<===== end_1
	________________________
	           end_2 =====>

BAC (Bacterial Artificial Chromosome) and fosmid libraries

Read quality assessment

Tools

Estimating genome size from read data

G = p n (l - k + 1) / λ_k
G = genome size
p n = number of correct reads (p = proportion of correct reads, n = number of reads)
l = read length
k = k-mer length
λ_k = mode of the k-mer count histogram
Simpson 2013, arXiv
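
The estimate can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the cited paper. It assumes the numerator counts the k-mers contributed by correct reads: p * n reads of length l, each yielding l - k + 1 k-mers. The read length l and the split of pn into a proportion p and a read count n are assumptions of this sketch. The function names, parameter names, and the min_count cutoff are hypothetical.

```python
from collections import Counter


def kmer_histogram_mode(kmer_counts, min_count=2):
    """Return the most common k-mer multiplicity (the histogram mode).

    kmer_counts: iterable with one occurrence count per distinct k-mer.
    min_count:   hypothetical cutoff; lower multiplicities are skipped
                 because they are usually dominated by error k-mers.
    """
    histogram = Counter(c for c in kmer_counts if c >= min_count)
    if not histogram:
        raise ValueError("no k-mer counts at or above min_count")
    # Pick the multiplicity seen for the most distinct k-mers;
    # ties are broken toward the smaller multiplicity.
    return min(histogram, key=lambda m: (-histogram[m], m))


def estimate_genome_size(n_reads, read_length, k, p_correct, lambda_k):
    """Estimate G = p * n * (l - k + 1) / lambda_k."""
    if k > read_length:
        raise ValueError("k must not exceed the read length")
    if lambda_k <= 0:
        raise ValueError("lambda_k must be positive")
    # Number of k-mers contributed by the correct reads.
    correct_kmers = p_correct * n_reads * (read_length - k + 1)
    return correct_kmers / lambda_k


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    counts = [1, 1, 1, 1, 20, 21, 21, 22, 21, 20]
    lam = kmer_histogram_mode(counts)
    size = estimate_genome_size(
        n_reads=1_000_000, read_length=100, k=31, p_correct=0.9, lambda_k=lam
    )
    print(f"mode = {lam}, estimated genome size = {size:.0f} bp")
```

In practice the k-mer counts would come from a dedicated k-mer counter run on the read set; the short list above only stands in for that output.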

Error correction

Simulated contig lengths in the k-de Bruijn graph can be used to estimate the best k-mer size for assembly. The choice is based on the contig N50 length.
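
As a rough sketch of how N50 could drive that choice: N50 is the length of the contig at which the running sum of contig lengths, sorted from longest to shortest, first reaches half of the total. The code below assumes the "best" k is simply the one with the largest N50, which is one plausible reading of the note rather than a confirmed rule. The function names and the contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # First contig at which the cumulative length covers half the total.
        if running >= half_total:
            return length


def best_k_by_n50(contigs_by_k):
    """Return the k whose simulated contigs have the largest N50.

    contigs_by_k: dict mapping a k-mer size to a list of contig lengths.
    """
    return max(contigs_by_k, key=lambda k: n50(contigs_by_k[k]))


if __name__ == "__main__":
    # Made-up contig lengths per k, for illustration only.
    simulated = {
        21: [100, 50, 25, 25],
        31: [150, 40, 10],
        41: [60, 60, 60, 20],
    }
    for k, lengths in sorted(simulated.items()):
        print(f"k = {k}: N50 = {n50(lengths)}")
    print("selected k:", best_k_by_n50(simulated))
```

N50 alone rewards long contigs without checking whether they are correct, so it is usually read alongside other assembly quality measures.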