lecture_notes:04-08-2015


De novo Assembly | Wed 8 April 2015 | Stefan Prost | stefan.prost@berkley.edu | jolespin notes

#Illumina Paired-end Sequencing Libraries

. MiSeq has 300 bp reads
. Paired-end reads are sequenced from both ends of each fragment, pointing toward each other
	=====> read_1
	________________________
	           read_2 <=====
. Paired-end reads alone can’t resolve repeat regions longer than the insert size
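A minimal sketch of the read orientation drawn above, assuming an error-free fragment and equal read lengths. The fragment sequence and the function names are made up for illustration; read_2 comes off the opposite strand, so it is the reverse complement of the fragment's far end.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def paired_end_reads(fragment, read_length):
    """Return (read_1, read_2) for one fragment.

    read_1 is the first read_length bases of the forward strand.
    read_2 is read from the other end on the reverse strand, i.e. the
    reverse complement of the fragment's last read_length bases.
    """
    read_1 = fragment[:read_length]
    read_2 = reverse_complement(fragment[-read_length:])
    return read_1, read_2


fragment = "ACGTTGCAAGGCTTAACCGGTATC"  # hypothetical 24 bp fragment
read_1, read_2 = paired_end_reads(fragment, read_length=8)
print(read_1)  # ACGTTGCA
print(read_2)  # GATACCGG
```

The middle of the fragment (between the two reads) is never sequenced; only its approximate length is known from the library prep.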

#Illumina Mate-Pair Sequencing Libraries

	<===== read_1
	________________________
	           read_2 =====>
. Dependent on inferring insert size
. Genomic DNA > Fragment (2-5 kb) > Biotinylate ends >
  Circularize > Fragment (400-600 bp) > Enrich biotinylated fragments >
  Ligate adaptors > …
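One common way to infer the insert size is to map read pairs to a reference (or to assembled contigs) and take the median outer distance between mates. The sketch below assumes that approach; the coordinate tuples are invented and the function name is hypothetical, not part of any particular tool.

```python
from statistics import median


def estimate_insert_size(pairs):
    """Estimate insert size as the median outer span of mapped read pairs.

    pairs: iterable of (start_1, end_1, start_2, end_2) coordinates,
    all on the same reference sequence.
    """
    spans = [max(e1, e2) - min(s1, s2) for s1, e1, s2, e2 in pairs]
    return median(spans)


# Hypothetical mapped mate pairs from a ~3 kb library
mapped_pairs = [
    (100, 200, 3000, 3100),
    (500, 600, 3600, 3700),
    (50, 150, 2850, 2950),
]
insert_size = estimate_insert_size(mapped_pairs)
print(insert_size)  # 3000
```

The median is used rather than the mean so that a few chimeric or mis-mapped pairs do not skew the estimate.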

#BAC (Bacterial Artificial Chromosome) and Fosmid Libraries

. Fosmid (based on the bacterial F-plasmid) takes < 40 kb insert size
. Fragment query DNA > Insert DNA into BAC > Transform E. coli with BAC > ~300 kb inserts
. http://www.scq.ubc.ca/wp-content/plasmidtext.gif

#Read Quality Assessment Tools

. FastQC (most popular tool for summarizing the quality of a read library)
. Preqc
. Base quality decreases toward the end of the read
. Pacific Biosciences (PacBio) sequencing doesn’t have GC|AT bias
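The drop in quality along the read can be checked directly from the FASTQ quality strings. This is a small sketch, not how FastQC itself is implemented; it assumes Phred+33 encoding, and the three quality strings are invented.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def mean_quality_per_position(quality_strings, offset=33):
    """Mean Phred score at each read position; reads may differ in length."""
    totals, counts = [], []
    for quality_string in quality_strings:
        for i, score in enumerate(phred_scores(quality_string, offset)):
            if i == len(totals):  # first read to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += score
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


# Hypothetical quality lines (4th line of each FASTQ record)
qualities = ["IIIH5", "IIG?#", "IHF5"]
means = mean_quality_per_position(qualities)
for position, mean_q in enumerate(means, start=1):
    print(position, round(mean_q, 1))
```

In this toy input the mean falls from Q40 at position 1 to Q11 at position 5, which is the pattern a per-base quality plot would show.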

#Estimating Genome Size from Read Data

. G = (p · n · (l - k + 1)) / λ_k
G = genome size
p = proportion of correct (error-free) k-mers
n = total number of reads
l = read length
k = k-mer length
λ_k = mode of the k-mer count histogram
Simpson 2013, arXiv
. To estimate genome size you need to know: i) total number of reads; ii) length of reads; and iii) k-mer distribution
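A rough sketch of the estimate, reading the formula as G = p * n * (l - k + 1) / λ_k, where n is the number of reads and l is the read length (the inputs listed above). The function names, the toy reads, and the example numbers are all assumptions for illustration; real tools fit a model to the histogram rather than taking a raw mode.

```python
from collections import Counter


def kmer_count_histogram(reads, k):
    """Map k-mer count -> number of distinct k-mers seen that many times."""
    kmer_counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    return Counter(kmer_counts.values())


def histogram_mode(histogram, min_count=1):
    """Count with the most distinct k-mers, ignoring counts below min_count.

    Raising min_count skips the low-count peak caused by sequencing errors.
    Ties are broken toward the larger count (an arbitrary choice).
    """
    candidates = [c for c in histogram if c >= min_count]
    return max(candidates, key=lambda c: (histogram[c], c))


def estimate_genome_size(n_reads, read_length, k, kmer_mode, p_correct=1.0):
    """G = p * n * (l - k + 1) / lambda_k."""
    return p_correct * n_reads * (read_length - k + 1) / kmer_mode


# Toy check of the histogram helpers
toy_hist = kmer_count_histogram(["ACGTAC", "CGTACG", "ACGTAC"], k=4)
toy_mode = histogram_mode(toy_hist)

# Hypothetical library: 1M reads of 100 bp, k = 31, histogram mode at 14
genome_size = estimate_genome_size(1_000_000, 100, 31, 14)
print(genome_size)  # 5000000.0
```

Each read contributes l - k + 1 k-mers, so n * (l - k + 1) is the total k-mer count; dividing by the typical number of times each genomic k-mer is seen gives the number of genomic positions.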

#Error Correction

. k-mers with low counts are usually sequencing errors
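The idea can be sketched by counting k-mers and flagging those seen fewer than some threshold number of times. The threshold (min_count), the function names, and the reads below are assumptions for illustration; actual error correctors choose the cutoff from the k-mer histogram.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def split_by_count(counts, min_count):
    """Split k-mers into (trusted, suspect) by a count threshold."""
    trusted = {kmer for kmer, count in counts.items() if count >= min_count}
    suspect = set(counts) - trusted
    return trusted, suspect


# Five identical reads plus one read with a single-base error (A -> T)
reads = ["ACGTACGA"] * 5 + ["ACGTTCGA"]
counts = count_kmers(reads, k=4)
trusted, suspect = split_by_count(counts, min_count=2)
print(sorted(suspect))  # ['CGTT', 'GTTC', 'TCGA', 'TTCG']
```

A single miscalled base creates up to k new k-mers, each seen only once, which is why errors pile up at the low-count end of the histogram.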

***Simulated contig length in the k-de Bruijn graph can be used to estimate the best k-mer size for assembly, based on contig N50
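For reference, N50 is the contig length at which the contigs of that length or longer contain at least half of the total assembled bases. The sketch below computes it and picks the k with the largest N50; the contig lengths per k are invented numbers, not results from any assembly.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths is empty")


print(n50([80, 70, 50, 40, 30, 20, 10]))  # 70

# Hypothetical simulated contig lengths for three k-mer sizes
contigs_by_k = {
    21: [30, 30, 20, 20],
    31: [60, 25, 15],
    41: [40, 40, 10, 10],
}
n50_by_k = {k: n50(lengths) for k, lengths in contigs_by_k.items()}
best_k = max(n50_by_k, key=n50_by_k.get)
print(n50_by_k, best_k)
```

In this made-up example k = 31 gives the longest N50, so it would be the k chosen for the assembly.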

lecture_notes/04-08-2015.1428606673.txt.gz · Last modified: 2015/04/09 12:11 by jolespin