This shows you the differences between two versions of the page.
lecture_notes:04-08-2015 [2015/04/09 22:42] sihussai |
lecture_notes:04-08-2015 [2015/04/09 23:05] sihussai |
||
---|---|---|---|
Line 10: | Line 10: | ||
end_2 <===== | end_2 <===== | ||
- | * Can’t sequence repeat regions with paired-end reads | + | * Problem: not really sufficient for repetitive regions |
* ends can't be very far apart because Illumina can't handle big molecules | * ends can't be very far apart because Illumina can't handle big molecules | ||
* not enough info for scaffolding | * not enough info for scaffolding | ||
Line 23: | Line 23: | ||
* Genomic DNA > Fragment (2-5 kb) > biotinylate ends > Circularize > Fragment (400-600 bp) > Enrich biotinylated fragments > Ligate adaptors > … | * Genomic DNA > Fragment (2-5 kb) > biotinylate ends > Circularize > Fragment (400-600 bp) > Enrich biotinylated fragments > Ligate adaptors > … | ||
- | * Dependent on inferring insert size | + | * Cut DNA, attach a biotin tag to both ends of the target molecule |
+ | * Circularize target molecule | ||
+ | * This step is hard | ||
+ | * The circularized molecule is fragmented, and the fragment carrying the biotin tag (i.e. the two ends of the original target joined to each other backwards, with the tags in between) is enriched | ||
+ | * Then end repair, A-tailing, adapters added, amplification, sequencing | ||
+ | * Dependent on inferring insert size (can be tricky) | ||
+ | * Most companies can get you 8 kb inserts, with skill you can get up to 20 kb | ||
+ | * Complicated process, weird stuff can happen in between | ||
+ | * Important difference between paired ends and mate-pairs: ends are oriented the opposite way. | ||
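The orientation difference noted above can be illustrated with a minimal sketch. It assumes the mate-pair reads face outward and that a downstream tool expects inward-facing (paired-end style) reads, in which case both mates are reverse-complemented and their quality strings reversed. The function names are hypothetical, and whether this conversion is needed at all depends on the assembler or scaffolder.

```python
# Sketch only: flip a mate-pair read so it has paired-end style orientation.
# Function names are illustrative and not taken from any particular tool.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_read(seq, qual):
    """Reverse-complement the bases and reverse the per-base qualities."""
    return reverse_complement(seq), qual[::-1]


def mate_pair_to_paired_end(read1, read2):
    """Each read is a (sequence, quality) tuple; both mates are flipped."""
    return flip_read(*read1), flip_read(*read2)
```

Applied to a pair such as `("AACG", "IIII")` and `("TTGA", "IIII")`, both sequences come back reverse-complemented while the pairing itself is unchanged.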
====BAC (Bacterial Artificial Chromosome) and Fosmid Libraries==== | ====BAC (Bacterial Artificial Chromosome) and Fosmid Libraries==== | ||
+ | * Uncommon and expensive, but the gold standard | ||
* Fosmids (bacterial F-plasmid vectors) take inserts of roughly 40 kb | * Fosmids (bacterial F-plasmid vectors) take inserts of roughly 40 kb | ||
* Fragment query DNA > DNA into BAC > Transform E. coli with BAC > inserts up to ~300 kb | * Fragment query DNA > DNA into BAC > Transform E. coli with BAC > inserts up to ~300 kb | ||
* http://www.scq.ubc.ca/wp-content/plasmidtext.gif | * http://www.scq.ubc.ca/wp-content/plasmidtext.gif | ||
- | ====Read Quality Assessment Tools==== | + | ====Read Quality Assessment==== |
- | * FastQC (Most popular tool to tell you about the read library) | + | * Base quality: Phred scores reported by sequencer. |
- | * Preqc | + | * FASTQ files: FASTA sequences plus encoded Phred scores | ||
+ | * Need to know if your file has phred33 or phred64 encoding | ||
+ | * Quality for each individual base is not the whole story; the surrounding sequence context also affects the signal processing | ||
* Reads decrease in quality further down the read | * Reads decrease in quality further down the read | ||
+ | * **Depending on the assembler, you might need to trim low-quality bases off the ends of reads. Other assemblers want untrimmed data** | ||
* PacBio (Pacific Biosciences) doesn’t have GC/AT bias | * PacBio (Pacific Biosciences) doesn’t have GC/AT bias | ||
+ | |||
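As a rough illustration of the points above, the sketch below decodes a FASTQ quality string, guesses whether it uses a +33 or +64 offset, and trims low-quality bases from the 3' end of a read. The function names and the default threshold of Q20 are assumptions chosen for illustration. The offset guess is a common heuristic, not a guarantee: it returns `None` when the characters seen fit both encodings.

```python
# Sketch only: Phred decoding, offset guessing and 3' quality trimming.
# All names and the default threshold are illustrative assumptions.


def decode_phred(quality, offset=33):
    """Convert a FASTQ quality string to a list of integer Phred scores."""
    return [ord(char) - offset for char in quality]


def error_probability(phred):
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-phred / 10)


def guess_offset(quality_lines):
    """Heuristic: characters below ';' (ASCII 59) only occur with +33,
    characters above 'J' (ASCII 74) are taken to mean +64."""
    codes = [ord(char) for line in quality_lines for char in line]
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None  # ambiguous: more reads are needed to decide


def trim_tail(seq, quality, min_q=20, offset=33):
    """Drop bases from the 3' end while their quality is below min_q."""
    end = len(seq)
    while end > 0 and ord(quality[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], quality[:end]
```

Running `guess_offset` over the quality lines of the first few thousand reads is usually enough to tell the two encodings apart; if it still returns `None`, the sequencer or pipeline documentation is the safer source.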
+ | ===Tools=== | ||
+ | * FastQC (Most popular tool to tell you about the read library) | ||
+ | * FastQC flagged a k-mer content issue in our data (related to adapter content) | ||
+ | * **This needs to be checked out and diagnosed!** | ||
+ | * Preqc | ||
+ | * Estimates how difficult the assembly will be | ||
====Estimating Genome Size from Read Data==== | ====Estimating Genome Size from Read Data==== | ||
- | . G = (pn(1-k+1))/(λ_k) | + | G = (pn(l-k+1))/(λ_k) | ||
G = Genome size | G = Genome size | ||
pn = proportion of correct reads | pn = proportion of correct reads | ||
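A worked example may make the estimate concrete. The sketch below rests on several assumptions that are not stated in these notes: the numerator is read as (l - k + 1) with l the read length, `pn` is treated as the product of the proportion of correct k-mers `p` and the number of reads `n`, and λ_k is taken to be the mean k-mer depth (the main peak of the k-mer count histogram). The function and parameter names are hypothetical.

```python
# Sketch only: genome size from k-mer depth, under the assumptions stated
# in the text above (l = read length, lambda_k = mean k-mer depth).


def estimate_genome_size(n_reads, read_length, k, lambda_k, p_correct=1.0):
    """G = p * n * (l - k + 1) / lambda_k"""
    kmers_per_read = read_length - k + 1
    if kmers_per_read <= 0:
        raise ValueError("k must not exceed the read length")
    if lambda_k <= 0:
        raise ValueError("lambda_k must be positive")
    return p_correct * n_reads * kmers_per_read / lambda_k


# 1,000,000 reads of 100 bp, k = 31, 90% correct k-mers, mean k-mer depth 21:
# 0.9 * 1,000,000 * 70 / 21 = 3,000,000 bp
size = estimate_genome_size(1_000_000, 100, 31, 21.0, p_correct=0.9)
```

With these made-up numbers the estimate comes out at about 3 Mb.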
Line 50: | Line 69: | ||
* A large number of low-count kmers usually indicates errors | * A large number of low-count kmers usually indicates errors | ||
- | ***Simulated contif length in the k-de Brujin graph can estimate the best kmer to use for assembly. Based on contif length N50 | + | **Simulated contig lengths in the k-de Bruijn graph can be used to estimate the best k-mer size for assembly, based on contig N50** | ||
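For reference, N50 is the contig length at which the contigs of that length or longer contain at least half of the total assembled bases. The sketch below computes it and picks the k whose contig set has the highest N50. It assumes the simulated contig lengths per k are already available as a plain dictionary; producing them is the job of a tool such as Preqc and is not shown. The function names are hypothetical.

```python
# Sketch only: N50 and a simple "best k by N50" selection.
# The input format (dict of k -> list of contig lengths) is an assumption.


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


def best_k_by_n50(contigs_by_k):
    """Return the k whose contig lengths give the largest N50."""
    return max(contigs_by_k, key=lambda k: n50(contigs_by_k[k]))
```

For contig lengths 2, 3, 4, 5 and 6 (20 bases in total), the two longest contigs already cover 11 bases, so the N50 is 5.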