Banana Slug Genomics



=====De novo Assembly II=====

**Guest lecturer: Stefan Prost, stefan.prost@berkley.edu**

====Illumina Paired-end Sequencing Libraries====

  * MiSeq has 300 bp reads
  * Paired ends are read from both directions, so you get one read for each end (the two reads may or may not overlap, depending on molecule and read size)

  =====> end_1 ________________________ end_2 <=====

  * Paired-end reads can't span long repeat regions
    * The ends can't be very far apart because Illumina can't handle big molecules
    * Not enough information for scaffolding

====Illumina Mate-Pair Sequencing Libraries====

  * Idea: get paired reads that are much farther apart (for more scaffolding information)
  * Basically the same idea as paired ends, except the middle section is missing and the ends are oriented the opposite way

  <===== end_1 ________________________ end_2 =====>

  * Genomic DNA > Fragment (2-5 kb) > Biotinylate ends > Circularize > Fragment (400-600 bp) > Enrich biotinylated fragments > Ligate adaptors > …
  * Dependent on inferring insert size

====BAC (Bacterial Artificial Chromosome) and Fosmid Libraries====

  * Fosmids (based on the bacterial F-plasmid) take inserts of < 40 kb
  * Fragment query DNA > Insert DNA into BAC > Transform E. coli with BAC > ~300 kb long inserts
  * http://www.scq.ubc.ca/wp-content/plasmidtext.gif

====Read Quality Assessment Tools====

  * FastQC (the most popular tool for summarizing a read library)
  * Preqc
  * Base quality decreases toward the end of a read
  * Pacific Biosciences (PacBio) reads don't have GC/AT bias

====Estimating Genome Size from Read Data====

  G = p * n * (l - k + 1) / λ_k

  * G = genome size
  * p = proportion of correct reads
  * n = total number of reads
  * l = read length
  * k = k-mer length
  * λ_k = mode of the k-mer count histogram
  * Source: Simpson 2013, arXiv
  * To estimate genome size you need to know: i) the total number of reads; ii) the length of the reads; and iii) the k-mer count distribution

====Error Correction====

  * A large number of low-count k-mers usually indicates sequencing errors
  * Simulated contig lengths in the k-de Bruijn graph can be used to estimate the best k-mer size for assembly, based on contig N50
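
As a rough illustration of the genome-size estimate and the N50 statistic mentioned in these notes, here is a minimal Python sketch. It assumes the formula reads G = p * n * (l - k + 1) / λ_k, with n the total number of reads and l the read length. The function names (''count_kmers'', ''histogram_mode'', ''estimate_genome_size'', ''n50'') and the ''min_depth'' cutoff used to skip the low-count error peak are illustrative assumptions; they are not taken from the lecture, Preqc, or any other tool.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        # A read of length l contains l - k + 1 k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def histogram_mode(kmer_counts, min_depth=2):
    """Return the most frequent k-mer depth (lambda_k).

    Depths below min_depth are ignored, on the assumption that
    low-count k-mers are mostly sequencing errors.
    """
    histogram = Counter(kmer_counts.values())
    candidates = {d: c for d, c in histogram.items() if d >= min_depth}
    if not candidates:
        raise ValueError("no k-mers at or above min_depth")
    # Ties are broken in favour of the larger depth.
    return max(candidates, key=lambda d: (candidates[d], d))


def estimate_genome_size(n_reads, read_len, k, lambda_k, p_correct=1.0):
    """G = p * n * (l - k + 1) / lambda_k."""
    return p_correct * n_reads * (read_len - k + 1) / lambda_k


def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half
    of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")
```

For example, 1000 reads of 100 bp with k = 31 and a histogram mode of 7 give ''estimate_genome_size(1000, 100, 31, 7)'' = 10000.0, and ''n50([2, 3, 4, 5, 6, 7, 8, 9, 10])'' returns 8. On real data the error cutoff and the choice of k matter a lot, so this is only a toy version of the calculation.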
