This shows you the differences between two versions of the page.
lecture_notes:05-27-2015 [2015/05/28 12:27] emfeal |
lecture_notes:05-27-2015 [2015/05/28 12:48] (current) emfeal |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Genome Annotation ====== | ====== Genome Annotation ====== | ||
- | ===== Repeat Annotation ===== | + | ===== Repeats ===== |
- | ==== Masking ==== | + | Masking: |
* Done to facilitate conventional gene annotation efforts. | * Done to facilitate conventional gene annotation efforts. | ||
* Helps avoid false SNP calls and mapping ambiguities. | * Helps avoid false SNP calls and mapping ambiguities. | ||
* Hard Masking: replacing repeats with Ns {ACGTNNNNNNNNNATGG} | * Hard Masking: replacing repeats with Ns {ACGTNNNNNNNNNATGG} | ||
* Soft Masking: replacing repeats with lowercase {ACGTtagtagtagATGG} | * Soft Masking: replacing repeats with lowercase {ACGTtagtagtagATGG} | ||
- | ==== Repeat Annotation ==== | + | Repeat Annotation: |
* Different types of repeats can be studied along with their levels of activity (evolutionary analyses) | * Different types of repeats can be studied along with their levels of activity (evolutionary analyses) | ||
- | === Types of Repeats === | + | Types of Repeats: |
* low-complexity sequence: microsatellites, homopolymers, etc. | * low-complexity sequence: microsatellites, homopolymers, etc. | ||
- | == Transposable Elements == | + | * Transposable Elements: |
- | * class 1: retrotransposons; "copy & paste"; LTRs, LINEs, SINEs | + | * * class 1: retrotransposons; "copy & paste"; LTRs, LINEs, SINEs |
- | * class 2: DNA transposons; "cut & paste"; subclass 1 and subclass 2 | + | * * class 2: DNA transposons; "cut & paste"; subclass 1 and subclass 2 |
- | === Repeat Content === | + | Repeat Content |
* Does not necessarily correlate with genome size | * Does not necessarily correlate with genome size | ||
* some correlation within the same group | * some correlation within the same group | ||
- | === Tools === | + | Tools |
* Homology: RepeatMasker | * Homology: RepeatMasker | ||
* de novo: RepeatModeler, WindowMasker, RepeatScout, Piler | * de novo: RepeatModeler, WindowMasker, RepeatScout, Piler | ||
Line 22: | Line 22: | ||
* NOTE: de novo tools run the risk of false positives from highly conserved protein-coding genes. | * NOTE: de novo tools run the risk of false positives from highly conserved protein-coding genes. | ||
===== Gene Annotation ===== | ===== Gene Annotation ===== | ||
- | ==== Evidence-driven Annotation ==== | + | Evidence-driven Annotation |
* protein information, EST, **RNA-Seq** | * protein information, EST, **RNA-Seq** | ||
- | ==== Ab initio Gene Prediction ==== | + | Ab initio Gene Prediction |
* doesn't require evidence data | * doesn't require evidence data | ||
* requires training for organism of interest | * requires training for organism of interest | ||
Line 31: | Line 31: | ||
* does not accommodate splice isoforms | * does not accommodate splice isoforms | ||
* requires high-quality assembly (scaffold N50 ≈ avg gene size) | * requires high-quality assembly (scaffold N50 ≈ avg gene size) | ||
- | ==== Combined Approach ==== | + | Combined Approach |
* challenge of collating different models and sources of evidence. | * challenge of collating different models and sources of evidence. | ||
- | ==== Annotation Metrics ==== | + | Annotation Metrics |
* Sensitivity, specificity, accuracy, AED | * Sensitivity, specificity, accuracy, AED | ||
* AED = 1 - ACC = 1 - 0.5 × (sensitivity + specificity) | * AED = 1 - ACC = 1 - 0.5 × (sensitivity + specificity) | ||
* AED is useful for identifying low-quality, inconsistent annotations (these can be manually curated later) | * AED is useful for identifying low-quality, inconsistent annotations (these can be manually curated later) | ||
- | + | Tools | |
- | + | * Pipelines: MAKER2, PASA, Ensembl, NCBI |
- | + | * Evidence Mapping: BLAST/BLAT, Exonerate (computationally expensive) | |
+ | * ab initio gene predictors: Augustus, SNAP, GeneMark | ||
+ | * Choosers and Combiners: JIGSAW, GLEAN | ||
+ | * Visualization & Curation: Artemis, Apollo, JBrowse, IGV |
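
The AED formula under Annotation Metrics can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any pipeline. It assumes per-nucleotide definitions that the notes do not spell out: sensitivity is the fraction of evidence bases covered by the annotation, and specificity is the fraction of annotated bases covered by evidence. The function names and the half-open `(start, end)` interval format are hypothetical.

```python
def covered_positions(intervals):
    """Return the set of base positions covered by half-open (start, end) intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def annotation_edit_distance(annotation, evidence):
    """AED = 1 - 0.5 * (sensitivity + specificity), computed per nucleotide.

    Assumed definitions (not stated in the notes):
      sensitivity = shared bases / evidence bases
      specificity = shared bases / annotated bases
    """
    annotated = covered_positions(annotation)
    supported = covered_positions(evidence)
    if not annotated or not supported:
        # Nothing to compare against: treat the annotation as unsupported.
        return 1.0
    shared = len(annotated & supported)
    sensitivity = shared / len(supported)
    specificity = shared / len(annotated)
    return 1.0 - 0.5 * (sensitivity + specificity)


# Exons of one gene model vs. aligned evidence (toy coordinates).
model = [(0, 100)]
evidence = [(50, 150)]
print(annotation_edit_distance(model, evidence))
```

An AED of 0 means the annotation and the evidence agree base for base, and an AED of 1 means they share no bases. Models with a high AED are the ones to flag for manual curation.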