====== Genome Annotation ======
===== Repeats =====
Masking:
  * Done before conventional gene annotation so that repeat-derived sequence (e.g. transposon ORFs) is not annotated as host genes.
  * Helps avoid false SNP calls and mapping ambiguities.
  * Hard masking: replacing repeats with Ns, e.g. ''ACGTNNNNNNNNNATGG''
  * Soft masking: replacing repeats with lowercase letters, e.g. ''ACGTtagtagtagATGG''
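A minimal sketch of the two masking styles, assuming the repeat coordinates are already known and given as 0-based, half-open intervals. The function name ''mask'' and the interval convention are illustrative, not taken from any masking tool.

```python
def mask(seq: str, intervals: list[tuple[int, int]], mode: str = "soft") -> str:
    """Mask repeat intervals (0-based, half-open [start, end)) in seq.

    mode="soft" lowercases repeat bases; mode="hard" replaces them with N.
    """
    if mode not in ("soft", "hard"):
        raise ValueError("mode must be 'soft' or 'hard'")
    chars = list(seq)
    for start, end in intervals:
        for i in range(start, end):
            chars[i] = chars[i].lower() if mode == "soft" else "N"
    return "".join(chars)


seq = "ACGTTAGTAGTAGATGG"            # repeat TAGTAGTAG at positions 4-12
print(mask(seq, [(4, 13)], "soft"))  # ACGTtagtagtagATGG
print(mask(seq, [(4, 13)], "hard"))  # ACGTNNNNNNNNNATGG
```

Soft masking keeps the original bases, so downstream tools can still align across repeats while treating them as low-priority; hard masking discards that information.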
Repeat Annotation:
  * Identifies and classifies the different repeat types and estimates their levels of activity over time (e.g. for evolutionary analyses)
Types of Repeats:
  * low-complexity sequence: microsatellites, homopolymers, etc.
  * Transposable Elements:
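    * class 1: retrotransposons; "copy & paste" via an RNA intermediate; LTR retrotransposons, LINEs, SINEs
    * class 2: DNA transposons; no RNA intermediate; subclass 1 ("cut & paste", e.g. TIR elements) and subclass 2 (e.g. Helitrons, which copy without double-strand cleavage)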
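Low-complexity repeats such as homopolymers and microsatellites can be found with a simple pattern scan. The sketch below is a toy regex approach with hypothetical thresholds (''max_unit'', ''min_copies''); it is not how dedicated tandem-repeat finders work, and it ignores imperfect or partial copies.

```python
import re


def find_tandem_repeats(seq: str, max_unit: int = 6, min_copies: int = 4):
    """Return (start, end, unit) for runs of a 1..max_unit bp motif
    repeated at least min_copies times (exact copies only)."""
    # The lazy {1,max_unit}? tries the shortest unit first, so "AAAA"
    # is reported as unit "A" rather than "AA".
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


toy = "GCT" + "A" * 8 + "G" + "CA" * 5 + "G"
print(find_tandem_repeats(toy))  # [(3, 11, 'A'), (12, 22, 'CA')]
```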
Repeat Content
  * Does not necessarily correlate with genome size across distantly related lineages
  * Some correlation within the same taxonomic group
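Repeat content is usually reported as the fraction of the assembly that is masked. A minimal sketch, assuming the input is soft-masked (repeats in lowercase, as produced for example by RepeatMasker's ''-xsmall'' option) and that assembly gaps are N/n; the function name is hypothetical.

```python
def repeat_fraction(softmasked: str) -> float:
    """Fraction of non-gap bases that are soft-masked (lowercase).

    Assumes repeats are lowercase and assembly gaps are N/n, as in
    soft-masked FASTA; gap bases are excluded from the denominator.
    """
    called = [c for c in softmasked if c not in "Nn"]
    if not called:
        return 0.0
    masked = sum(1 for c in called if c.islower())
    return masked / len(called)


print(round(repeat_fraction("ACGTtagtagtagATGG"), 3))  # 0.529
print(repeat_fraction("ACGTNNNNtagt"))                 # 0.5
```

Excluding gaps matters when comparing genomes: a fragmented assembly with many N runs would otherwise look less repetitive than it is.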
Tools
  * Homology-based: RepeatMasker (searches against a library of known repeats)
  * De novo: RepeatModeler, WindowMasker, RepeatScout, PILER
  * De novo from reads: REPdenovo, TEDNA
  * NOTE: de novo tools risk false positives from highly conserved, multi-copy protein-coding gene families, which also look repetitive.
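The core idea behind frequency-based de novo detection is that repeated sequence produces over-represented k-mers. The toy sketch below soft-masks every base covered by a k-mer seen at least ''min_count'' times; ''k'' and ''min_count'' are hypothetical values chosen for a tiny example. Real tools (WindowMasker, RepeatScout, etc.) use much larger k, genome-wide counts and more careful thresholds, so treat this only as an illustration of the principle.

```python
from collections import Counter


def kmer_mask(seq: str, k: int = 4, min_count: int = 3) -> str:
    """Soft-mask bases covered by any k-mer occurring >= min_count times."""
    seq = seq.upper()
    starts = range(len(seq) - k + 1)
    counts = Counter(seq[i:i + k] for i in starts)
    masked = [False] * len(seq)
    for i in starts:
        if counts[seq[i:i + k]] >= min_count:
            for j in range(i, i + k):
                masked[j] = True
    return "".join(c.lower() if m else c for c, m in zip(seq, masked))


print(kmer_mask("ACGG" + "TAG" * 4 + "CCA"))  # ACGgtagtagtagtagCCA
```

The example also shows the weakness in the NOTE: the method only sees copy number, so any multi-copy sequence, including a duplicated gene family, would be masked the same way.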
===== Gene Annotation =====
Evidence-driven Annotation
  * evidence: protein alignments, ESTs, **RNA-Seq**
Ab initio Gene Prediction
  * do not require evidence data
  * require training for the organism of interest
  * most report only the single most likely CDS per locus
  * do not report UTRs (incomplete gene models)
  * do not model alternative splice isoforms
  * require a high-quality assembly (scaffold N50 at least roughly the average gene length, so most genes lie on a single scaffold)
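The assembly-quality rule of thumb can be checked by computing N50 and comparing it with the average gene length. A minimal sketch; ''assembly_ready'' is a hypothetical helper and the scaffold lengths are toy numbers, not real data.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that scaffolds of length >= L hold at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def assembly_ready(scaffold_lengths: list[int], mean_gene_length: float) -> bool:
    """Hypothetical check: is scaffold N50 at least the mean gene length?"""
    return n50(scaffold_lengths) >= mean_gene_length


scaffolds = [100, 200, 300, 400]       # toy scaffold lengths
print(n50(scaffolds))                  # 300
print(assembly_ready(scaffolds, 250))  # True
```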
Combined Approach
  * main challenge: reconciling gene models from different predictors and sources of evidence into one consensus set
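One naive way to combine sources is a majority vote over exons. The sketch below keeps exons reported with identical coordinates by at least ''min_support'' sources; ''consensus_exons'', the coordinate tuples and the threshold are hypothetical. Real combiners such as JIGSAW and GLEAN weight sources statistically rather than counting votes, and stitching the surviving exons into a valid transcript (frame, splice sites) is the hard part this sketch skips.

```python
from collections import Counter

Exon = tuple[int, int]  # (start, end) on one strand; illustrative convention


def consensus_exons(models: list[list[Exon]], min_support: int = 2) -> list[Exon]:
    """Keep exons that at least min_support models report identically."""
    # set(model) so a source cannot vote twice for the same exon
    votes = Counter(exon for model in models for exon in set(model))
    return sorted(exon for exon, n in votes.items() if n >= min_support)


models = [
    [(100, 200), (300, 400)],              # e.g. ab initio prediction
    [(100, 200), (310, 400)],              # e.g. protein alignment
    [(100, 200), (300, 400), (500, 600)],  # e.g. RNA-Seq assembly
]
print(consensus_exons(models))  # [(100, 200), (300, 400)]
```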
Annotation Metrics
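  * Sensitivity, specificity, accuracy, AED (Annotation Edit Distance)
  * AED = 1 - ACC = 1 - 0.5 × (SN + SP), where SN = overlap / evidence length and SP = overlap / prediction length; "specificity" here is the fraction of predicted bases supported by evidence
  * AED ranges from 0 (perfect agreement with the evidence) to 1 (no evidence support); high-AED models flag low-quality, inconsistent annotations that can be manually curated later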
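The AED formula can be computed at base level from interval overlap. A minimal sketch, assuming both the predicted model and the evidence are given as hypothetical 0-based, half-open exon intervals on the same strand; position sets are used for clarity, not efficiency, and this is not MAKER's own implementation.

```python
def covered(intervals: list[tuple[int, int]]) -> set[int]:
    """Set of positions covered by half-open [start, end) intervals."""
    return {pos for start, end in intervals for pos in range(start, end)}


def aed(prediction: list[tuple[int, int]], evidence: list[tuple[int, int]]) -> float:
    """AED = 1 - (SN + SP) / 2 from base-level overlap."""
    pred, ref = covered(prediction), covered(evidence)
    if not pred or not ref:
        return 1.0  # nothing to compare: treat as unsupported
    overlap = len(pred & ref)
    sn = overlap / len(ref)   # fraction of evidence bases predicted
    sp = overlap / len(pred)  # fraction of predicted bases supported
    return 1 - (sn + sp) / 2


print(aed([(0, 100)], [(0, 100)]))    # 0.0
print(aed([(0, 100)], [(0, 50)]))     # 0.25
print(aed([(0, 100)], [(200, 300)]))  # 1.0
```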
Tools
  * Pipelines: MAKER2, PASA, Ensembl, NCBI Eukaryotic Genome Annotation Pipeline
  * Evidence Mapping: BLAST/BLAT, Exonerate (computationally expensive)
  * ab initio gene predictors: Augustus, SNAP, GeneMark
  * Choosers and Combiners: JIGSAW, GLEAN
  * Visualization & Curation: Artemis, Apollo, JBrowse, IGV