lecture_notes:05-27-2015

Differences

This shows you the differences between two versions of the page.

lecture_notes:05-27-2015 [2015/05/28 11:56]
emfeal created
lecture_notes:05-27-2015 [2015/05/28 12:27]
emfeal
Line 1: Line 1:
 ====== Genome Annotation ====== ====== Genome Annotation ======
 ===== Repeat Annotation ===== ===== Repeat Annotation =====
 +==== Masking ====
 +  * Done to facilitate conventional gene annotation efforts.
 +  * Helps avoid false SNP calls and mapping ambiguities.
 +  * Hard Masking: replacing repeats with Ns {ACGTNNNNNNNNNATGG}
 +  * Soft Masking: replacing repeats with lowercase {ACGTtagtagtagATGG}
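The two masking styles above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular masking tool works internally; the function names and the 0-based, end-exclusive interval convention are assumptions made for the example.

```python
def hard_mask(seq, repeats):
    """Replace every repeat interval with Ns.

    `repeats` is a list of (start, end) pairs, 0-based and end-exclusive
    (an assumed convention for this sketch).
    """
    chars = list(seq)
    for start, end in repeats:
        chars[start:end] = "N" * (end - start)
    return "".join(chars)


def soft_mask(seq, repeats):
    """Lowercase every repeat interval, keeping the underlying bases."""
    chars = list(seq)
    for start, end in repeats:
        chars[start:end] = seq[start:end].lower()
    return "".join(chars)


seq = "ACGTTAGTAGTAGATGG"
repeats = [(4, 13)]  # the TAGTAGTAG run

print(hard_mask(seq, repeats))  # ACGTNNNNNNNNNATGG
print(soft_mask(seq, repeats))  # ACGTtagtagtagATGG
```

Soft masking keeps the original bases recoverable (uppercase the string again), whereas hard masking discards them.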
 +==== Repeat Annotation ====
 +  * Different types of repeats can be studied along with their levels of activity (evolutionary analyses)
 + === Types of Repeats ===
 +  * low-complexity sequence: microsatellites, homopolymers, etc.
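As a rough illustration of what low-complexity sequence looks like computationally, the sketch below finds homopolymers and microsatellites with regular expressions. The function names, thresholds (`min_len`, `min_repeats`) and unit sizes are assumptions chosen for the example, not values from the lecture, and dedicated repeat finders use more sensitive methods.

```python
import re


def find_homopolymers(seq, min_len=5):
    """Return (start, end, base) for runs of one base at least min_len long."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(seq.upper())]


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=3):
    """Return (start, end, unit) for short units repeated in tandem.

    Note: a long homopolymer also matches here (e.g. unit "AA"), so
    callers may want to filter those out.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(seq.upper())]


print(find_microsatellites("ACGTTAGTAGTAGATGG"))  # [(4, 13, 'TAG')]
print(find_homopolymers("ACGTAAAAAAGT"))          # [(4, 10, 'A')]
```

Coordinates are 0-based and end-exclusive, so they can be passed directly to Python slicing.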
 +  == Transposable Elements ==
 +  * class 1: retrotransposons; "copy & paste"; LTR elements, LINEs, SINEs
 +  * class 2: DNA transposons; "cut & paste"; subclass 1 and subclass 2
 + === Repeat Content ===
 +  * Does not necessarily correlate with genome size
 +  * some correlation within the same group
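Repeat content is usually reported as the fraction of the genome annotated as repetitive. A minimal sketch, assuming the input is a soft-masked sequence in which lowercase bases mark repeats; the function name `repeat_fraction` is hypothetical, and Ns are excluded from the denominator as a simplifying choice.

```python
def repeat_fraction(soft_masked_seq):
    """Fraction of A/C/G/T bases that are lowercase (soft-masked)."""
    total = sum(1 for base in soft_masked_seq if base.upper() in "ACGT")
    masked = sum(1 for base in soft_masked_seq if base in "acgt")
    return masked / total if total else 0.0


print(round(repeat_fraction("ACGTtagtagtagATGG"), 3))  # 0.529
```

Comparing this fraction with total sequence length across several genomes is one simple way to look at how repeat content relates to genome size.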
 + === Tools ===
 +  * Homology: RepeatMasker
 +  * de novo: RepeatModeler, WindowMasker, RepeatScout, Piler
 +  * de novo from reads: REPdenovo, TEDNA
 +  * NOTE: de novo tools run the risk of false positives from highly conserved protein-coding genes.
 +===== Gene Annotation =====
 +==== Evidence-driven Annotation ====
 +  * protein information, EST, **RNA-Seq**
 +==== Ab initio Gene Prediction ====
 +  * doesn't require evidence data
 +  * requires training for the organism of interest
 +  * most tools find only the single most likely CDS
 +  * most do not report UTRs (incomplete gene model)
 +  * most do not accommodate alternative splice forms
 +  * requires high-quality assembly (scaffold N50 ≈ avg gene size)
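The N50 rule of thumb above can be checked with a short calculation. This is a minimal sketch: the scaffold lengths are made-up numbers, and `n50` is a hypothetical helper rather than part of any assembly tool.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


scaffold_lengths = [100, 80, 50, 30, 20, 10, 10]  # illustrative values
print(n50(scaffold_lengths))  # 80
```

If the resulting N50 is far below the expected average gene size, many genes will be split across scaffolds and ab initio predictors will tend to produce fragmented models.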
 +==== Combined Approach ====
 +  * challenge of collating different models and sources of evidence.
 +==== Annotation Metrics ====
 +  * Sensitivity, specificity, accuracy, AED
 +  * AED = 1 - ACC, where ACC = 0.5 × (sensitivity + specificity)
 +  * AED is useful for identifying low-quality annotations that are inconsistent with the evidence (these can be manually curated later)
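The AED formula can be worked through at the nucleotide level. The sketch below assumes the usual gene-prediction definitions: sensitivity is the fraction of evidence bases covered by the annotation, and specificity is the fraction of annotated bases supported by evidence. The function names and the interval convention (0-based, end-exclusive) are assumptions for illustration.

```python
def covered_positions(intervals):
    """Expand (start, end) intervals into a set of base positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def aed(annotation_exons, evidence_exons):
    """Annotation edit distance: 0 = perfect agreement, 1 = no support."""
    annotated = covered_positions(annotation_exons)
    evidence = covered_positions(evidence_exons)
    if not annotated or not evidence:
        return 1.0  # nothing to compare, treat as unsupported
    overlap = len(annotated & evidence)
    sensitivity = overlap / len(evidence)   # evidence bases recovered
    specificity = overlap / len(annotated)  # annotated bases supported
    return 1.0 - 0.5 * (sensitivity + specificity)


annotation = [(100, 200), (300, 400)]  # 200 annotated bases
evidence = [(100, 200), (300, 350)]    # 150 evidence bases, all overlapping
print(aed(annotation, evidence))  # 0.125
```

Here sensitivity is 1.0 and specificity is 0.75, so ACC = 0.875 and AED = 0.125. Sorting gene models by AED puts the least-supported ones at the top of a manual curation queue.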
 +
 +
  
  
lecture_notes/05-27-2015.txt · Last modified: 2015/05/28 12:48 by emfeal