Genome Annotation

Repeat Annotation


  • Repeats are annotated and masked first to facilitate conventional gene annotation.
  • Masking helps avoid false SNP calls and mapping ambiguities.
  • Hard masking: replacing repeat bases with Ns, e.g. ACGTNNNNNNNNNATGG
  • Soft masking: converting repeat bases to lowercase, e.g. ACGTtagtagtagATGG
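The difference between the two masking styles can be sketched in a few lines of Python. This is only an illustration: the function name `mask_repeats` and the repeat coordinates are made up for the example, and real masking is done by tools such as RepeatMasker rather than by hand.

```python
def mask_repeats(seq, intervals, soft=True):
    """Mask 0-based, half-open (start, end) repeat intervals in seq.

    soft=True  -> repeat bases are lowercased (sequence is preserved)
    soft=False -> repeat bases are replaced by N (sequence is lost)
    """
    chars = list(seq.upper())
    for start, end in intervals:
        for i in range(start, end):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


seq = "ACGTTAGTAGTAGATGG"
repeat = [(4, 13)]  # hypothetical repeat annotation covering TAGTAGTAG

hard = mask_repeats(seq, repeat, soft=False)
soft = mask_repeats(seq, repeat, soft=True)
print(hard)  # repeat bases replaced by N
print(soft)  # ACGTtagtagtagATGG
```

Soft masking is reversible (uppercase the sequence again), while hard masking throws the repeat sequence away.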

Repeat Annotation

  • Different types of repeats can be studied along with their levels of activity (evolutionary analyses)

Types of Repeats

  • low-complexity sequence: microsatellites, homopolymers, etc.
  • transposable elements:
      • class 1: retrotransposons; “copy & paste”; LTRs, LINEs, SINEs
      • class 2: DNA transposons; “cut & paste”; subclass 1 and subclass 2
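As a rough illustration of the low-complexity category, the sketch below finds simple tandem repeats (homopolymers and microsatellites) with a regular expression. The function name and the thresholds (unit length 1 to 6, at least 4 copies) are arbitrary choices for this example, not values from the lecture; transposable elements need dedicated tools and are not covered by this approach.

```python
import re


def find_microsatellites(seq, min_unit=1, max_unit=6, min_copies=4):
    """Return (start, end, unit) for perfect tandem repeats in seq.

    Coordinates are 0-based, half-open. The unit is matched lazily,
    so the shortest repeating unit is reported.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    return [(m.start(), m.end(), m.group(1))
            for m in pattern.finditer(seq.upper())]


print(find_microsatellites("ACGTTAGTAGTAGTAGATGG"))  # [(4, 16, 'TAG')]
print(find_microsatellites("GGAAAAACC"))             # [(2, 7, 'A')]
```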

Repeat Content

  • Repeat content does not necessarily correlate with genome size
  • Some correlation is seen within the same group of organisms
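One simple way to put a number on repeat content is the fraction of called bases that are soft-masked. The sketch below assumes a soft-masked sequence (lowercase = repeat) and ignores N bases; the function name `repeat_fraction` is made up for this example.

```python
def repeat_fraction(masked_seq):
    """Fraction of called (A/C/G/T) bases that are lowercase, i.e. soft-masked."""
    called = [c for c in masked_seq if c.upper() in "ACGT"]
    if not called:
        return 0.0  # nothing but gaps/Ns: report zero rather than divide by zero
    return sum(c.islower() for c in called) / len(called)


# 9 of the 17 bases are soft-masked
print(round(repeat_fraction("ACGTtagtagtagATGG"), 3))  # 0.529
```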


  • Homology-based (search against a library of known repeats): RepeatMasker
  • De novo, from the assembly: RepeatModeler, WindowMasker, RepeatScout, PILER
  • De novo, from reads: REPdenovo, TEDNA
  • NOTE: de novo tools run the risk of false positives from highly conserved protein-coding genes.

Gene Annotation

Evidence-driven Annotation

  • uses external evidence: protein alignments, ESTs, RNA-Seq

Ab initio Gene Prediction

  • does not require evidence data
  • requires training for the organism of interest
  • most tools report only the single most likely CDS
  • most do not report UTRs (incomplete gene models)
  • most do not accommodate alternative splice forms
  • requires a high-quality assembly (scaffold N50 ≈ average gene size)
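The N50 rule of thumb in the last bullet can be checked with a short calculation. The sketch below computes scaffold N50 from a list of scaffold lengths; the lengths are toy numbers chosen for the example, not real data.

```python
def n50(lengths):
    """Largest length L such that scaffolds of length >= L
    together contain at least half of the total assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no scaffold lengths given")


scaffold_lengths = [80, 70, 50, 40, 30, 20, 10]  # toy values, total = 300
print(n50(scaffold_lengths))  # 70
```

If the N50 is much smaller than a typical gene, many genes will be split across scaffolds and ab initio predictors will see only fragments of them.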

Combined Approach

  • combines ab initio predictions with evidence; the challenge is collating the different gene models and sources of evidence.

Annotation Metrics

  • Sensitivity, specificity, accuracy (ACC), Annotation Edit Distance (AED)
  • AED = 1 - ACC = 1 - 0.5 × (sensitivity + specificity)
  • AED is useful for identifying low-quality annotations that are inconsistent with the evidence (these can be manually curated later)
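The AED formula above can be worked through at the nucleotide level. This sketch assumes the definitions commonly used in gene prediction: sensitivity is the fraction of evidence bases covered by the annotation, and specificity is the fraction of annotated bases supported by the evidence. The function names and exon coordinates are made up for the example.

```python
def exon_bases(exons):
    """Expand 0-based, half-open (start, end) exon intervals into a set of positions."""
    return {pos for start, end in exons for pos in range(start, end)}


def aed(annotation_exons, evidence_exons):
    """AED = 1 - 0.5 * (sensitivity + specificity), computed per base."""
    annotated = exon_bases(annotation_exons)
    evidence = exon_bases(evidence_exons)
    if not annotated or not evidence:
        return 1.0  # no overlap is possible: treat as completely unsupported
    shared = len(annotated & evidence)
    sensitivity = shared / len(evidence)    # evidence bases that were annotated
    specificity = shared / len(annotated)   # annotated bases backed by evidence
    return 1 - 0.5 * (sensitivity + specificity)


annotation = [(100, 200), (300, 400)]  # hypothetical gene model, 200 bases
evidence = [(100, 200), (300, 350)]    # hypothetical aligned evidence, 150 bases
print(aed(annotation, evidence))  # 0.125
```

An AED of 0 means the annotation matches the evidence exactly; an AED of 1 means there is no overlap at all.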
lecture_notes/05-27-2015.1432816024.txt.gz · Last modified: 2015/05/28 05:27 by emfeal