lecture_notes:05-27-2015

lecture_notes:05-27-2015 [2015/05/28 11:56]
emfeal created
lecture_notes:05-27-2015 [2015/05/28 12:48]
emfeal
Line 1:
 ====== Genome Annotation ======
-===== Repeat Annotation =====
-
 +===== Repeats =====
 +Masking:
 +  * Done to facilitate conventional gene annotation efforts. 
 +  * Helps avoid false SNP calls and mapping ambiguities. 
 +  * Hard Masking: replacing repeats with Ns {ACGTNNNNNNNNNATGG} 
 +  * Soft Masking: replacing repeats with lowercase {ACGTtagtagtagATGG} 
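The two masking styles above can be sketched in a few lines of Python. This is only an illustration: the `mask` helper and the 0-based, half-open `(start, end)` interval convention are assumptions made here, not the interface of any masking tool.

```python
def mask(seq, repeats, hard=False):
    """Return seq with each repeat interval masked.

    repeats: iterable of (start, end) pairs, 0-based and half-open
    (a convention assumed for this sketch).
    hard=True replaces repeat bases with N; otherwise they are lowercased.
    """
    out = list(seq.upper())
    for start, end in repeats:
        for i in range(start, end):
            out[i] = "N" if hard else out[i].lower()
    return "".join(out)


seq = "ACGTTAGTAGTAGATGG"
repeats = [(4, 13)]  # the TAGTAGTAG run, as in the example above

hard_masked = mask(seq, repeats, hard=True)
soft_masked = mask(seq, repeats)
print(hard_masked)
print(soft_masked)
```

Soft masking keeps the underlying bases, so the repeat can still be recovered later; hard masking discards them.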
 +Repeat Annotation
 +  * Different types of repeats can be studied along with their levels of activity (evolutionary analyses) 
 +Types of Repeats: 
 +  * Low-complexity sequence: microsatellites, homopolymers, etc.
 +  * Transposable elements:
 +    * Class 1: retrotransposons; "copy & paste"; LTRs, LINEs, SINEs
 +    * Class 2: DNA transposons; "cut & paste"; subclass 1 and subclass 2
 +Repeat Content 
 +  * Does not necessarily correlate with genome size 
 +  * Some correlation is seen among genomes within the same group
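To compare repeat content across genomes, one rough approach is to measure it directly from a soft-masked sequence. The sketch below assumes that lowercase bases mark repeats and that Ns are gaps to be excluded; `repeat_fraction` is a hypothetical helper name.

```python
def repeat_fraction(soft_masked):
    """Fraction of non-N bases that are lowercase (i.e. soft-masked)."""
    called = [b for b in soft_masked if b.upper() != "N"]
    if not called:
        return 0.0
    return sum(b.islower() for b in called) / len(called)


frac = repeat_fraction("ACGTtagtagtagATGG")
print(f"repeat content: {frac:.1%}")
```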
 +Tools 
 +  * Homology: RepeatMasker 
 +  * De novo: RepeatModeler, WindowMasker, RepeatScout, Piler
 +  * De novo from reads: REPdenovo, TEDNA
 +  * NOTE: de novo tools run the risk of false positives from highly conserved protein-coding genes.
 +===== Gene Annotation ===== 
 +Evidence-driven Annotation 
 +  * Protein information, ESTs, **RNA-Seq**
 +Ab initio Gene Prediction 
 +  * Does not require evidence data
 +  * Requires training for the organism of interest
 +  * Most predictors find only the single most likely CDS
 +  * Most do not report UTRs (incomplete gene model)
 +  * Most do not accommodate alternative splice forms
 +  * Requires a high-quality assembly (scaffold N50 ≈ avg gene size)
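The scaffold N50 rule of thumb above can be checked with a short calculation. This sketch uses the common definition of N50 (the length of the scaffold at which the running total, taken from longest to shortest, first reaches half the assembly size); the scaffold lengths are made-up values for illustration.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L hold at least half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


scaffolds = [100, 80, 50, 30, 20, 10]  # hypothetical scaffold lengths
print(n50(scaffolds))
```

The result can then be compared with the average gene length expected for the organism to judge whether ab initio prediction is likely to see whole genes on single scaffolds.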
 +Combined Approach 
 +  * challenge of collating different models and sources of evidence. 
 +Annotation Metrics 
 +  * Sensitivity, specificity, accuracy, AED (annotation edit distance)
 +  * AED = 1 - ACC, where ACC = 0.5 * (sensitivity + specificity)
 +  * AED is useful for identifying low-quality, inconsistent annotations (which can be manually curated later)
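The AED formula above can be worked through at the nucleotide level. The sketch below is a minimal illustration, not the implementation used by any pipeline. It assumes 0-based, half-open exon intervals, and it follows the gene-prediction convention in which "specificity" is the fraction of annotated bases supported by evidence, while sensitivity is the fraction of evidence bases covered by the annotation. The `aed` and `to_bases` names are hypothetical.

```python
def to_bases(intervals):
    """Expand (start, end) half-open intervals into a set of base positions."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end))
    return bases


def aed(annotation, evidence):
    """AED = 1 - 0.5 * (sensitivity + specificity), computed per nucleotide."""
    a = to_bases(annotation)
    e = to_bases(evidence)
    if not a or not e:
        return 1.0  # no support at all (assumed convention for empty input)
    overlap = len(a & e)
    sensitivity = overlap / len(e)   # evidence bases recovered by the annotation
    specificity = overlap / len(a)   # annotated bases backed by evidence
    return 1 - 0.5 * (sensitivity + specificity)


print(aed([(0, 100)], [(0, 100)]))               # identical models
print(aed([(0, 100)], [(50, 150)]))              # half overlap on both sides
print(aed([(0, 100), (200, 300)], [(0, 100)]))   # one extra unsupported exon
```

An AED of 0 means the annotation matches its evidence exactly, and values near 1 flag models with little or no support.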
 +Tools 
 +  * Pipelines: MAKER2, PASA, Ensembl, NCBI
 +  * Evidence mapping: BLAST/BLAT, Exonerate (computationally expensive)
 +  * Ab initio gene predictors: Augustus, SNAP, GeneMark
 +  * Choosers and combiners: JIGSAW, GLEAN
 +  * Visualization & curation: Artemis, Apollo, JBrowse, IGV
lecture_notes/05-27-2015.txt · Last modified: 2015/05/28 12:48 by emfeal