This shows you the differences between two versions of the page.
| Both sides previous revision Previous revision | |||
|
lecture_notes:04-08-2015 [2015/04/17 21:53] sihussai fixing formatting |
lecture_notes:04-08-2015 [2015/04/17 22:34] (current) sihussai fixing capitalization |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ======De novo Assembly II====== | + | ======De novo assembly II====== |
| **Guest lecturer: Stefan Prost, stefan.prost@berkley.edu** | **Guest lecturer: Stefan Prost, stefan.prost@berkley.edu** | ||
| - | =====Illumina Paired-end Sequencing Libraries===== | + | =====Illumina paired-end sequencing libraries===== |
| * MiSeq has 300 bp reads | * MiSeq has 300 bp reads | ||
| * Paired ends read from both directions, so you get one read for each end (may or may not overlap depending on molecule and read size) | * Paired ends read from both directions, so you get one read for each end (may or may not overlap depending on molecule and read size) | ||
| Line 14: | Line 14: | ||
| * not enough info for scaffolding | * not enough info for scaffolding | ||
| - | =====Illumina Mate-Pair Sequencing Libraries==== | + | =====Illumina mate-pair sequencing libraries==== |
| * Idea: get paired reads that are much farther away (for more scaffolding info) | * Idea: get paired reads that are much farther away (for more scaffolding info) | ||
| * Basically, the same idea as paired-ends, except the middle section is missing and the ends are oriented the opposite way. | * Basically, the same idea as paired-ends, except the middle section is missing and the ends are oriented the opposite way. | ||
| Line 34: | Line 34: | ||
| - | =====BAC (Bacterial Artificial Chromosome) and Fosmid Libraries===== | + | =====BAC (Bacterial Artificial Chromosome) and fosmid libraries===== |
| * Uncommon and expensive, but the gold standard | * Uncommon and expensive, but the gold standard | ||
| * Bacterial F-plasmid takes inserts of < 40 kb | * Bacterial F-plasmid takes inserts of < 40 kb | ||
| Line 40: | Line 40: | ||
| * http://www.scq.ubc.ca/wp-content/plasmidtext.gif | * http://www.scq.ubc.ca/wp-content/plasmidtext.gif | ||
| - | =====Read Quality Assessment===== | + | =====Read quality assessment===== |
| * Base quality: Phred scores reported by sequencer. | * Base quality: Phred scores reported by sequencer. | ||
| * FASTQ files: FASTA sequences plus encoded Phred scores | * FASTQ files: FASTA sequences plus encoded Phred scores | ||
| Line 56: | Line 56: | ||
| * Estimates how difficult the assembly will be | * Estimates how difficult the assembly will be | ||
| - | =====Estimating Genome Size from Read Data===== | + | =====Estimating genome size from read data===== |
| G = (p·n·(l-k+1))/λ_k | G = (p·n·(l-k+1))/λ_k | ||
| G = Genome size | G = Genome size | ||
| Line 66: | Line 66: | ||
| * To estimate genome size, you need to know: i) the total number of reads; ii) the read length; and iii) the k-mer distribution | * To estimate genome size, you need to know: i) the total number of reads; ii) the read length; and iii) the k-mer distribution | ||
| - | =====Error Correction===== | + | =====Error correction===== |
| * A large number of low-frequency k-mers usually indicates sequencing errors | * A large number of low-frequency k-mers usually indicates sequencing errors | ||
| **Simulated contig lengths in the k-de Bruijn graph can be used to estimate the best k-mer size for assembly, based on contig N50.** | **Simulated contig lengths in the k-de Bruijn graph can be used to estimate the best k-mer size for assembly, based on contig N50.** | ||
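
The N50 criterion mentioned in the last note can be sketched in a few lines of Python. This is only an illustration, not the lecturer's method: the function name `n50`, the `assemblies` dictionary, and all contig lengths are made up, and a real comparison would use contig lengths taken from actual or simulated assemblies at each k.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (bp) for assemblies built with different k.
assemblies = {
    21: [100, 80, 50, 30, 20],
    31: [150, 90, 40],
    41: [60, 60, 60, 50, 50],
}

# Pick the k whose assembly has the largest N50.
best_k = max(assemblies, key=lambda k: n50(assemblies[k]))

for k, lengths in sorted(assemblies.items()):
    print(f"k={k}: N50={n50(lengths)}")
print(f"best k by N50: {best_k}")
```

N50 alone rewards long contigs, not correct ones, so a choice of k made this way is best treated as a starting point rather than a final answer.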