lecture_notes:05-27-2015 [2015/05/28 12:48] (current) emfeal
====== Genome Annotation ======
===== Repeats =====
==== Masking ====
  * Done to facilitate conventional gene annotation efforts.
  * Helps avoid false SNP calls and mapping ambiguities.
  * Hard masking: repeat bases are replaced with Ns, so their sequence is lost {ACGTNNNNNNNNNATGG}
  * Soft masking: repeat bases are converted to lowercase, so their sequence is retained {ACGTtagtagtagATGG}
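A minimal sketch of the two masking styles, assuming repeat coordinates are already known as 0-based, half-open (start, end) intervals. The function name `mask` and the interval format are illustrative assumptions, not the interface of any particular masking tool. The example reproduces the two sequences above.

```python
def mask(seq: str, repeats: list[tuple[int, int]], mode: str = "soft") -> str:
    """Mask repeat intervals in `seq`.

    `repeats` holds 0-based, half-open (start, end) intervals, e.g. taken
    from a repeat annotation. mode="hard" replaces each base with N;
    mode="soft" lowercases it so the underlying sequence is kept.
    """
    if mode not in ("hard", "soft"):
        raise ValueError("mode must be 'hard' or 'soft'")
    chars = list(seq)
    for start, end in repeats:
        for i in range(start, end):
            chars[i] = "N" if mode == "hard" else chars[i].lower()
    return "".join(chars)


seq = "ACGTTAGTAGTAGATGG"              # repeat TAGTAGTAG at positions 4..12
print(mask(seq, [(4, 13)], "hard"))    # ACGTNNNNNNNNNATGG
print(mask(seq, [(4, 13)], "soft"))    # ACGTtagtagtagATGG
```

Soft masking is usually the friendlier choice for downstream gene prediction because the bases are still available; hard masking throws them away.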
==== Repeat Annotation ====
  * Different types of repeats can be studied, along with their levels of activity (evolutionary analyses).
=== Types of Repeats ===
  * Low-complexity sequence: microsatellites, homopolymers, etc.
  * Transposable elements (TEs):
    * Class 1: retrotransposons; "copy & paste"; LTR retrotransposons, LINEs, SINEs
    * Class 2: DNA transposons; "cut & paste"; subclass 1 and subclass 2
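Low-complexity sequence is the easiest repeat type to find mechanically. The sketch below is a naive illustration, assuming only perfect (uninterrupted) tandem copies matter: a regular expression with a backreference finds a 1-6 bp unit repeated at least `min_copies` times. Unit length 1 gives homopolymers; 2-6 gives microsatellites. The function name and defaults are hypothetical, and real tools also handle imperfect copies.

```python
import re


def find_tandem_repeats(seq: str, max_unit: int = 6, min_copies: int = 3):
    """Return (start, end, unit) for perfect tandem repeats in `seq`.

    A unit of 1..max_unit bases must occur at least `min_copies` times in a
    row. The lazy quantifier tries the shortest unit first, and the
    backreference to group 1 requires exact copies of that unit.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


print(find_tandem_repeats("GGCACACACACATTTTTTTG"))
# [(2, 12, 'CA'), (12, 19, 'T')]
```

Transposable elements cannot be found this way: their copies are scattered across the genome rather than adjacent, which is why homology- or frequency-based tools are needed.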
=== Repeat Content ===
  * Does not necessarily correlate with genome size.
  * Some correlation is seen within the same taxonomic group.
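Repeat content is typically reported as the fraction of the assembly covered by repeats. A minimal sketch, assuming a soft-masked sequence as input: count lowercase bases among called bases. Treating Ns as "not called" and excluding them is an assumption; with hard-masked input you would need the original repeat coordinates instead.

```python
def repeat_fraction(softmasked: str) -> float:
    """Fraction of called bases (A/C/G/T) that are soft-masked (lowercase).

    Ns (gaps or hard-masked bases) are ignored by assumption.
    """
    called = [c for c in softmasked if c in "ACGTacgt"]
    if not called:
        return 0.0
    return sum(c.islower() for c in called) / len(called)


print(repeat_fraction("ACGTtagtagtagATGG"))  # 9 of 17 bases are masked
```

Comparing this fraction across genomes, rather than raw repeat length, is what allows the comparison with genome size.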
=== Tools ===
  * Homology-based: RepeatMasker
  * De novo: RepeatModeler, WindowMasker, RepeatScout, PILER
  * NOTE: de novo tools run the risk of false positives from highly conserved protein-coding genes, which can be mistaken for repeats.
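A toy illustration of the general k-mer-frequency idea that some de novo approaches build on: sequence that occurs many times produces over-represented k-mers. This is a simplified sketch under hypothetical parameters (`k`, `min_count`), not the algorithm of any tool listed above. It also shows why the NOTE matters: frequency alone cannot tell a transposon copy from a copy of a conserved gene.

```python
from collections import Counter


def mask_high_frequency_kmers(seq: str, k: int = 3, min_count: int = 3) -> str:
    """Soft-mask every k-mer occurring at least `min_count` times in `seq`.

    Any sufficiently repeated sequence is masked, including copies of a
    conserved gene, because only counts are considered.
    """
    upper = seq.upper()
    counts = Counter(upper[i:i + k] for i in range(len(upper) - k + 1))
    out = list(seq)
    for i in range(len(upper) - k + 1):
        kmer = upper[i:i + k]
        if counts[kmer] >= min_count and "N" not in kmer:
            for j in range(i, i + k):
                out[j] = out[j].lower()
    return "".join(out)


print(mask_high_frequency_kmers("GCAAAAAACT"))  # GCaaaaaaCT
```

On real genomes `k` would be far larger and counts would be gathered genome-wide; the tiny values here only keep the example readable.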
===== Gene Annotation =====
==== Evidence-driven Annotation ====
  * Gene models are built from aligned evidence: protein sequences, ESTs, **RNA-Seq**
==== Ab initio Gene Prediction ====
  * Doesn't require evidence data
  * Requires training for the organism of interest
  * Does not accommodate splice isoforms
  * Requires a high-quality assembly (scaffold N50 ≈ average gene length)
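The assembly-quality rule of thumb above is stated in terms of scaffold N50: the length L such that scaffolds of length L or longer together cover at least half of the assembly. A minimal sketch with toy scaffold lengths (the values are made up for illustration):

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L cover at least half
    of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


scaffolds = [100, 80, 50, 30, 20, 10]  # toy scaffold lengths (bp)
print(n50(scaffolds))  # 80
```

Comparing this value with the average gene length of the organism gives a rough sense of whether most genes are likely to sit on a single scaffold.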
==== Combined Approach ====
  * Challenge: collating gene models and evidence from different sources into one consensus annotation.
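The collation problem can be illustrated with a deliberately naive combiner: keep an exon only if at least `min_support` sources report exactly the same coordinates. This is a hypothetical majority-vote sketch, not how dedicated combiners work (they weight sources statistically and assemble full gene structures). The source names are made-up labels.

```python
from collections import Counter

Exon = tuple[int, int]  # (start, end) coordinates, single strand assumed


def consensus_exons(sources: dict[str, list[Exon]], min_support: int = 2) -> list[Exon]:
    """Keep exons whose exact coordinates are reported by >= min_support sources."""
    votes = Counter(exon for exons in sources.values() for exon in set(exons))
    return sorted(exon for exon, n in votes.items() if n >= min_support)


sources = {
    "predictor_a": [(100, 200), (300, 400)],
    "predictor_b": [(100, 200), (310, 400)],
    "rnaseq": [(100, 200), (300, 400), (500, 600)],
}
print(consensus_exons(sources))  # [(100, 200), (300, 400)]
```

The exon at (310, 400) shows the difficulty: it nearly agrees with (300, 400) but is discarded by exact matching, which is one reason real combiners need more than simple voting.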
==== Annotation Metrics ====
  * Sensitivity, specificity, accuracy, AED (Annotation Edit Distance)
  * AED = 1 - ACC = 1 - (sensitivity + specificity) / 2
  * AED ranges from 0 (perfect agreement with the evidence) to 1 (no agreement)
  * AED is useful for identifying low-quality or inconsistent annotations, which can be manually curated later
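A minimal sketch of the AED formula at the nucleotide level, assuming sensitivity = overlap / evidence length and specificity = overlap / prediction length, both counted over covered bases. These definitions, the single evidence set, and ignoring strand are simplifying assumptions; pipeline implementations may differ in detail.

```python
def covered(intervals: list[tuple[int, int]]) -> set[int]:
    """Positions covered by 0-based, half-open (start, end) intervals."""
    return {p for start, end in intervals for p in range(start, end)}


def aed(prediction: list[tuple[int, int]], evidence: list[tuple[int, int]]) -> float:
    """AED = 1 - (sensitivity + specificity) / 2, at the nucleotide level."""
    pred, evid = covered(prediction), covered(evidence)
    if not pred or not evid:
        return 1.0  # nothing to compare: treat as unsupported
    overlap = len(pred & evid)
    sensitivity = overlap / len(evid)  # share of the evidence recovered
    specificity = overlap / len(pred)  # share of the prediction supported
    return 1.0 - 0.5 * (sensitivity + specificity)


print(aed([(0, 100)], [(0, 100)]))    # 0.0
print(aed([(0, 100)], [(50, 150)]))   # 0.5
print(aed([(0, 100)], [(200, 300)]))  # 1.0
```

Sorting gene models by this score and reviewing those near 1 is one way to prioritize manual curation; any specific cutoff would be a project-level choice.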
==== Tools ====
  * Pipelines: MAKER2, PASA, Ensembl, NCBI
  * Evidence mapping: BLAST/BLAT, Exonerate (computationally expensive)
  * Ab initio gene predictors: AUGUSTUS, SNAP, GeneMark
  * Choosers and combiners: JIGSAW, GLEAN
  * Visualization & curation: Artemis, Apollo, JBrowse, IGV