| Next revision | Previous revision | ||
|
lecture_notes:05-27-2015 [2015/05/28 11:56] emfeal created |
lecture_notes:05-27-2015 [2015/05/28 12:48] (current) emfeal |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== Genome Annotation ====== | ====== Genome Annotation ====== | ||
| - | ===== Repeat Annotation ===== | + | ===== Repeats ===== |
| - | + | Masking: | |
| + | * Done to facilitate conventional gene annotation efforts. | ||
| + | * Helps avoid false SNP calls and mapping ambiguities. | ||
| + | * Hard Masking: replacing repeats with Ns {ACGTNNNNNNNNNATGG} | ||
| + | * Soft Masking: replacing repeats with lowercase {ACGTtagtagtagATGG} | ||
| + | Repeat Annotation: | ||
| + | * Different types of repeats can be studied along with their levels of activity (evolutionary analyses) | ||
| + | Types of Repeats: | ||
| + | * low-complexity sequence: microsatellites, homopolymers, etc. | ||
| + | * Transposable Elements: | ||
| + | * * class 1: retrotransposons; "copy & paste"; LTRs, LINEs, SINEs | ||
| + | * * class 2: DNA transposons; "cut & paste"; subclass 1 and subclass 2 | ||
| + | Repeat Content | ||
| + | * Does not necessarily correlate with genome size | ||
| + | * some correlation within the same group | ||
| + | Tools | ||
| + | * Homology: RepeatMasker | ||
| + | * de novo: RepeatModeler, WindowMasker, RepeatScout, PILER | ||
| + | * de novo from reads: REPdenovo, TEDNA | ||
| + | * NOTE: de novo tools run the risk of false positives from highly conserved protein-coding genes. | ||
| + | ===== Gene Annotation ===== | ||
| + | Evidence-driven Annotation | ||
| + | * protein information, EST, **RNA-Seq** | ||
| + | Ab initio Gene Prediction | ||
| + | * doesn't require evidence data | ||
| + | * requires training for organism of interest | ||
| + | * most tools report only the single most likely CDS | ||
| + | * does not report UTRs (incomplete gene model) | ||
| + | * does not accommodate alternative splice forms (isoforms) | ||
| + | * requires high-quality assembly (scaffold N50 ≈ avg gene size) | ||
| + | Combined Approach | ||
| + | * challenge of collating different models and sources of evidence. | ||
| + | Annotation Metrics | ||
| + | * Sensitivity, specificity, accuracy, AED | ||
| + | * AED (annotation edit distance) = 1 - ACC = 1 - 0.5 × (sensitivity + specificity) | ||
| + | * AED is useful for identifying low-quality, inconsistent annotations (which can be manually curated later) | ||
| + | Tools | ||
| + | * Pipelines: MAKER2, PASA, Ensembl, NCBI | ||
| + | * Evidence Mapping: BLAST/BLAT, Exonerate (computationally expensive) | ||
| + | * ab initio gene predictors: Augustus, SNAP, GeneMark | ||
| + | * Choosers and Combiners: JIGSAW, GLEAN | ||
| + | * Visualization & Curation: Artemis, Apollo, JBrowse, IGV | ||
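
The hard and soft masking examples in the Repeats notes can be reproduced with a short Python sketch. This is only an illustration, not how RepeatMasker or any other tool listed here works internally. The function name `mask_repeats` and the convention that repeats are given as 0-based, half-open `(start, end)` intervals are assumptions made for this example.

```python
def mask_repeats(sequence, repeats, hard=False):
    """Mask repeat intervals in a DNA sequence.

    sequence: DNA string (converted to uppercase before masking).
    repeats:  iterable of 0-based, half-open (start, end) intervals
              (a hypothetical input format chosen for this sketch).
    hard:     True  -> replace repeat bases with 'N' (hard masking)
              False -> lowercase repeat bases (soft masking)
    """
    chars = list(sequence.upper())
    for start, end in repeats:
        # Clamp the interval so out-of-range coordinates are ignored.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)


if __name__ == "__main__":
    seq = "ACGTTAGTAGTAGATGG"
    repeat_intervals = [(4, 13)]  # the TAGTAGTAG run
    print(mask_repeats(seq, repeat_intervals, hard=False))  # ACGTtagtagtagATGG
    print(mask_repeats(seq, repeat_intervals, hard=True))   # ACGT, nine Ns, ATGG
```

Soft masking keeps the original bases recoverable (uppercase the string again), whereas hard masking discards them, which is one reason downstream tools that understand lowercase masking are often given soft-masked input.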
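
The AED formula under Annotation Metrics can be made concrete with a small worked example. The sketch below assumes a nucleotide-level comparison in which sensitivity is the fraction of evidence bases covered by the predicted gene model and specificity is the fraction of predicted bases supported by evidence (the usual gene-prediction usage of the term). These definitions, the half-open `(start, end)` exon format, and the function names are assumptions for illustration, not taken from the notes or from any pipeline's source code.

```python
def covered_positions(intervals):
    """Return the set of base positions covered by half-open (start, end) intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def annotation_edit_distance(predicted_exons, evidence_exons):
    """Compute AED = 1 - 0.5 * (sensitivity + specificity) at the nucleotide level.

    Assumed definitions (see lead-in):
      sensitivity = shared bases / evidence bases
      specificity = shared bases / predicted bases
    Returns 0.0 for perfect agreement and 1.0 for no agreement.
    """
    predicted = covered_positions(predicted_exons)
    evidence = covered_positions(evidence_exons)
    if not predicted or not evidence:
        # Nothing to compare against: treat as maximally inconsistent.
        return 1.0
    shared = len(predicted & evidence)
    sensitivity = shared / len(evidence)
    specificity = shared / len(predicted)
    return 1.0 - 0.5 * (sensitivity + specificity)


if __name__ == "__main__":
    predicted = [(100, 200), (300, 400)]  # 200 predicted bases
    evidence = [(100, 200), (300, 350)]   # 150 evidence bases, all inside the prediction
    # sensitivity = 150/150 = 1.0, specificity = 150/200 = 0.75
    print(annotation_edit_distance(predicted, evidence))  # 0.125
```

Values near 0 indicate that a gene model agrees with its evidence; values near 1 flag the kind of poorly supported annotation that might be set aside for manual curation.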