lecture_notes:04-08-2015 revisions: 2015/04/09 19:11 (jolespin, created); 2015/04/09 22:42 (sihussai)
=====De novo Assembly II=====
**Guest lecturer: Stefan Prost, stefan.prost@berkley.edu**

Lecture: Wed 8 April 2015; notes by jolespin
====Illumina Paired-end Sequencing Libraries====
  * MiSeq produces reads of up to 300 bp
  * Paired ends are read from both directions, so you get one read from each end of the fragment (the two reads may or may not overlap, depending on fragment size and read length)

  =====> end_1
  ________________________
              end_2 <=====

  * Paired-end reads can't resolve repeat regions longer than the fragment
  * The ends can't be very far apart, because Illumina can't handle big molecules
  * Not enough information for scaffolding

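Whether the two reads of a pair overlap comes down to simple arithmetic on fragment length and read length. A minimal sketch, assuming fixed-length reads taken inward from both ends of a fragment of exactly known length (real libraries have a spread of fragment sizes); `overlap_or_gap` is a hypothetical helper, not part of any tool:

```python
def overlap_or_gap(fragment_len: int, read_len: int) -> tuple[str, int]:
    """Describe how the two reads of a pair sit on one fragment.

    Returns ("overlap", n) if the reads share n bases, or ("gap", n) if
    n bases in the middle of the fragment are never read.
    """
    if fragment_len <= 0 or read_len <= 0:
        raise ValueError("lengths must be positive")
    # A read cannot run past the far end of its fragment.
    read_len = min(read_len, fragment_len)
    # Bases between the 3' ends of the two inward-facing reads; negative means overlap.
    inner = fragment_len - 2 * read_len
    if inner < 0:
        return ("overlap", -inner)
    return ("gap", inner)


# 2 x 300 bp reads on fragments of different sizes
print(overlap_or_gap(400, 300))  # ('overlap', 200)
print(overlap_or_gap(800, 300))  # ('gap', 200)
```

With 300 bp MiSeq reads, fragments shorter than 600 bp give overlapping pairs.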
====Illumina Mate-Pair Sequencing Libraries====
  * Idea: get paired reads that are much farther apart (for more scaffolding information)
  * Basically the same idea as paired ends, except that the middle section is missing and the ends are oriented the opposite way

  <===== end_1
  ________________________
              end_2 =====>

  * Genomic DNA > Fragment (2-5 kb) > Biotinylate ends > Circularize > Fragment (400-600 bp) > Enrich biotinylated fragments > Ligate adaptors > …
  * Depends on inferring the insert size

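One practical consequence of the flipped orientation is that mate pairs and paired ends can be told apart after mapping. A minimal sketch, assuming each read is already aligned and summarized by a hypothetical `Alignment` record (leftmost coordinate plus strand); the FR/RF labels follow the common convention of "forward-reverse" for inward-facing pairs and "reverse-forward" for outward-facing ones:

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    start: int      # leftmost reference coordinate of the aligned read (0-based)
    reverse: bool   # True if the read aligned to the reverse strand


def pair_orientation(a: Alignment, b: Alignment) -> str:
    """Classify a mapped read pair as 'FR' (inward), 'RF' (outward) or 'same-strand'."""
    if a.reverse == b.reverse:
        return "same-strand"
    left, right = sorted((a, b), key=lambda r: r.start)
    if not left.reverse and right.reverse:
        return "FR"  # reads point toward each other: paired-end layout
    return "RF"      # reads point away from each other: mate-pair layout


print(pair_orientation(Alignment(100, False), Alignment(400, True)))   # FR
print(pair_orientation(Alignment(3000, False), Alignment(100, True)))  # RF
```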
====BAC (Bacterial Artificial Chromosome) and Fosmid Libraries====
  * Fosmids: bacterial F-plasmid vectors that take inserts of < 40 kb
  * BACs: Fragment query DNA > Ligate DNA into BAC > Transform E. coli with BAC > inserts of ~300 kb
  * http://www.scq.ubc.ca/wp-content/plasmidtext.gif
====Read Quality Assessment Tools====
  * FastQC (the most popular tool for summarizing a read library)
  * Preqc (part of the SGA assembler)
  * Read quality decreases toward the end of the read
  * PacBio (Pacific Biosciences) reads don't have a GC/AT bias
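
The drop in quality along a read is what FastQC's per-base quality plot shows. A minimal sketch of that summary (not FastQC's actual implementation), assuming simple four-line FASTQ records and Phred+33 quality encoding; the function names are hypothetical:

```python
PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


def fastq_qualities(text: str) -> list[str]:
    """Pull the quality line out of each 4-line FASTQ record."""
    lines = text.strip().splitlines()
    if len(lines) % 4:
        raise ValueError("expected 4 lines per FASTQ record")
    return [lines[i + 3] for i in range(0, len(lines), 4)]


def mean_quality_by_position(qualities: list[str], offset: int = PHRED_OFFSET) -> list[float]:
    """Mean Phred score at each read position, over the reads long enough to reach it."""
    sums: list[int] = []
    counts: list[int] = []
    for qual in qualities:
        for i, ch in enumerate(qual):
            if i == len(sums):  # first read to reach this position
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


def error_probability(q: float) -> float:
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


example = """@read1
ACGT
+
IIII
@read2
ACGT
+
II55
"""
print(mean_quality_by_position(fastq_qualities(example)))  # [40.0, 40.0, 30.0, 30.0]
```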
====Estimating Genome Size from Read Data====
  * $G = \dfrac{p \, n \, (l - k + 1)}{\lambda_k}$
G = Genome size
Simpson 2013, arXiv
  * To estimate genome size you need to know: i) the total number of reads; ii) the read length; and iii) the k-mer distribution
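
The genome size estimate in these notes can be sketched end to end: count k-mers, build the k-mer count histogram, take its main coverage peak as λ_k, then apply the formula. This is a minimal sketch under several assumptions: the "1" in the notes' formula is read as the read length l; p is not defined in the visible notes, so it is treated as a plain multiplicative factor defaulting to 1 (giving G = n(l - k + 1)/λ_k); k-mers are counted on one strand only, whereas real k-mer counters usually merge each k-mer with its reverse complement; and the peak is found with a simple "walk past the error spike" heuristic. All function names are hypothetical:

```python
from collections import Counter


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count every k-mer (one strand only) across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def count_histogram(counts: Counter) -> Counter:
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


def peak_kmer_coverage(hist: dict[int, int]) -> int:
    """Estimate lambda_k as the main peak of the k-mer count histogram.

    Heuristic: walk right from multiplicity 1 while the histogram is still
    falling (the low-count error spike), then take the most populated
    multiplicity from that point on.
    """
    max_c = max(hist)
    freq = [hist.get(c, 0) for c in range(max_c + 1)]
    c = 1
    while c < max_c and freq[c + 1] <= freq[c]:
        c += 1
    return max(range(c, max_c + 1), key=lambda m: freq[m])


def estimate_genome_size(n_reads: int, read_len: int, k: int,
                         lambda_k: float, p: float = 1.0) -> float:
    """G = p * n * (l - k + 1) / lambda_k; the meaning of p is assumed (default 1)."""
    return p * n_reads * (read_len - k + 1) / lambda_k


# Made-up histogram: error spike at multiplicities 1-3, coverage peak at 6
hist = {1: 1000, 2: 200, 3: 50, 4: 80, 5: 150, 6: 300, 7: 150, 8: 40}
lam = peak_kmer_coverage(hist)  # 6
print(estimate_genome_size(n_reads=1000, read_len=100, k=21, lambda_k=lam))
```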
====Error Correction====
  * The many k-mers with small counts (seen only once or a few times) are usually errors
  * Simulated contig lengths in the de Bruijn graph for each k can be used to estimate the best k-mer size for assembly, based on contig N50
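
The observation that low-count k-mers are usually errors can be turned into a simple error detector. A minimal sketch, assuming a single fixed count threshold separates "weak" (likely erroneous) from "solid" k-mers and that a read contains at most one error; real correctors pick thresholds from the histogram and handle multiple errors, so treat this as an illustration only. Function names are hypothetical:

```python
from collections import Counter
from typing import Optional


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count every k-mer (one strand only) across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_starts(read: str, counts: Counter, k: int, threshold: int) -> list[int]:
    """Start positions of k-mers in `read` seen fewer than `threshold` times."""
    return [i for i in range(len(read) - k + 1) if counts[read[i:i + k]] < threshold]


def suspect_interval(weak_starts: list[int], k: int) -> Optional[tuple[int, int]]:
    """Bases covered by every weak k-mer; under the single-error assumption, the error lies here."""
    if not weak_starts:
        return None
    lo = max(weak_starts)          # start of the last weak k-mer
    hi = min(weak_starts) + k - 1  # end of the first weak k-mer
    return (lo, hi) if lo <= hi else None


reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]  # toy data: last read has an error at position 4
counts = kmer_counts(reads, 4)
weak = weak_kmer_starts("ACGTTCGT", counts, k=4, threshold=2)
print(weak)                       # [1, 2, 3, 4]
print(suspect_interval(weak, 4))  # (4, 4)
```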
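
Contig N50, the yardstick used here for picking k, is the length L such that contigs of length L or longer make up at least half of the total assembled length. A minimal sketch; it only computes N50 from a list of contig lengths and does not simulate contigs the way preqc does, and the per-k contig lengths in the usage lines are made-up toy numbers:

```python
def n50(lengths: list[int]) -> int:
    """Length L such that contigs >= L hold at least half of all assembled bases."""
    if not lengths:
        raise ValueError("need at least one contig")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # crossed the halfway point
            return length
    raise AssertionError("unreachable for non-negative lengths")


# Toy example: pick the k whose (hypothetical) contigs have the largest N50
contigs_by_k = {21: [500, 300, 200, 100], 31: [900, 150, 100]}
best_k = max(contigs_by_k, key=lambda k: n50(contigs_by_k[k]))
print(best_k)  # 31
```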