computer_resources:assemblies:mitochondrion

Differences

This shows you the differences between two versions of the page.

computer_resources:assemblies:mitochondrion [2011/07/09 14:13]
karplus [Reads] updated estimates of coverage and genome length, adding coverage estimate from 5000 bases in pileup
computer_resources:assemblies:mitochondrion [2015/07/16 18:33]
ceisenhart Deleting this page
Line 1: Line 1:
-====== Mitochondrion ====== 
  
 +
 +===== Mitochondrial sequence =====
 +
 +The mitochondrion was reassembled in 2015; for the new assembly, see [[::mitochondrion_2015 | 2015 Mitochondrion assembly ]].  This page is for the 2012 mitochondrial assembly.
 +
 +The first draft sequence is available as {{mitochondrion-draft1.fasta.gz|draft1 gzipped fasta file}}.  This corresponds to /campusdata/BME235/assemblies/slug/barcode-of-life/map-Illumina-raw-45/consensus-6 on campusrocks.  It has 23,642 bases.
 +
 +
 +====== Mitochondrion ======
 The mitochondrion was assembled by Kevin Karplus in the assemblies/slug/barcode-of-life/ directory. The directory has this strange name because the first attempt was only to recover the COX1 gene, which the [[http://www.boldsystems.org/|BOLD (barcode of life database)]] project uses to characterize eukaryotes by their mitochondrial sequences.  When it became clear that the whole mitochondrial genome was well covered in the Illumina data, the project switched to reconstructing the full mitochondrial genome.
  
Line 237: Line 245:
 R13: .        .   ​. ​             .       ​. ​            ​. ​ t  .   ​. ​         .                      g   ​t ​   .      g     ​. ​                     ...              .a               ​. ​             R13: .        .   ​. ​             .       ​. ​            ​. ​ t  .   ​. ​         .                      g   ​t ​   .      g     ​. ​                     ...              .a               ​. ​            
 R13: GTTATTATTGAAGTTTATTAACGTAAAGCTGTAACTTTAAAAATATCTCTTTATTATAATAATGTTTAATACATTTATCTTATTAATCTTTATTGTACTATTTGATAATAGTATATATCTAGTAGCTTACCTTTTTGCTGTATAGTATTATTACTATATATAATATATTATTTCCATTATATATA  ​ R13: GTTATTATTGAAGTTTATTAACGTAAAGCTGTAACTTTAAAAATATCTCTTTATTATAATAATGTTTAATACATTTATCTTATTAATCTTTATTGTACTATTTGATAATAGTATATATCTAGTAGCTTACCTTTTTGCTGTATAGTATTATTACTATATATAATATATTATTTCCATTATATATA  ​
-R14: .        .                ​. ​      ​. ​            ​. ​ .  .   ​-a ​       t                      .   ​. ​   .      .     ​. ​                     ...              .a               ​. ​             ​+R14: .        .                ​. ​      ​. ​            ​. ​ .  .   ​-a ​       t                      .   ​. ​   .      .     ​. ​                     ...              .a               ​. ​             ​
 R14: GTTATTATTGAAGGTTATTAACGTAAAGCTGTAACTTTAAAAATATCTCTTTACTATAATATGTTTAATATATTTATCTTATTAGTCTTTATTATACCATTTGATAATAATATATATCTAGTAGCTTACCTTTTTGCTGTATAGTATTATTACTATATATAATATATTATTTCCATTATATATA ​   R14: GTTATTATTGAAGGTTATTAACGTAAAGCTGTAACTTTAAAAATATCTCTTTACTATAATATGTTTAATATATTTATCTTATTAGTCTTTATTATACCATTTGATAATAATATATATCTAGTAGCTTACCTTTTTGCTGTATAGTATTATTACTATATATAATATATTATTTCCATTATATATA ​  
 R15: .        .   ​. ​             .       ​. ​            ​. ​ .  .   ​. ​         t                      .   ​. ​   .      .     ​. ​                     ...              .a               ​. ​             R15: .        .   ​. ​             .       ​. ​            ​. ​ .  .   ​. ​         t                      .   ​. ​   .      .     ​. ​                     ...              .a               ​. ​            
Line 254: Line 262:
 R99: GTTATTATTGAAGTTTATTAACGTAAAGCTGTAACTTTAAAAATATCTCTTTACTATAATAATGTTTAATACATTTATCTTATTAGTCTTTATTATACCATTTGATAATAATATATATCTAGTAGCTTACCTTTTTGCTGTATAGTATTATTACTATCTATAATATATTATTTCCATTATATATA R99: GTTATTATTGAAGTTTATTAACGTAAAGCTGTAACTTTAAAAATATCTCTTTACTATAATAATGTTTAATACATTTATCTTATTAGTCTTTATTATACCATTTGATAATAATATATATCTAGTAGCTTACCTTTTTGCTGTATAGTATTATTACTATCTATAATATATTATTTCCATTATATATA
 </​code>​ </​code>​
 +
 +Note: I fixed the R14 annotation—it is actually identical to R04.
  
 When I put these blocks into the genome, look-for-exit found no variants needed other than a C->T SNP, supported by 7 reads, a little earlier than the short repeats.  As this was less than 1.5% of the reads for that location, it is quite likely a sequencing error rather than a true variant.
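The arithmetic behind that judgment can be sketched as follows. This is only an illustration: the function names are made up here, and the actual read depth at the SNP is not recorded on this page, so the sketch only derives the minimum depth implied by "7 reads" and "less than 1.5%".

```python
import math


def variant_fraction(variant_reads: int, depth: int) -> float:
    """Fraction of reads at a position that support the variant."""
    return variant_reads / depth


def min_depth_for_fraction(variant_reads: int, max_fraction: float) -> int:
    """Smallest depth at which variant_reads / depth is strictly below max_fraction.

    Hypothetical helper for illustration; it uses floating-point division,
    so it is not meant for inputs that divide exactly.
    """
    return math.floor(variant_reads / max_fraction) + 1


# 7 supporting reads at under 1.5% means at least this many reads at the site.
implied_depth = min_depth_for_fraction(7, 0.015)
print(implied_depth, variant_fraction(7, implied_depth))
```

So the statement above implies several hundred reads covering that position, which is why 7 discordant reads look more like sequencing error than a real variant.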
Line 356: Line 366:
 </​code>​ </​code>​
  
-===== Mitochondrial sequence ===== 
  
-The first draft sequence is available as {{mitochondrion-draft1.fasta.gz|gzipped fasta file}}. ​ This corresponds to /​campusdata/​BME235/​assemblies/​slug/​barcode-of-life/​map-Illumina-raw-45/​consensus-6 on campusrocks. 
  
 +The second draft sequence (with the short repeats expanded and the repeats ordered as best I can from the short-insert reads) is available as {{mitochondrion-draft2.fasta.gz|draft2 gzipped fasta file}}.  This corresponds to /campusdata/BME235/assemblies/slug/barcode-of-life/map-Illumina-raw-47/draft on campusrocks.  It has 36,363 bases.
 +
 +
 +**I no longer believe that this sequence is correct. ​ I'm now fairly sure that the region that I assembled as repeats should actually have been assembled as variants from a heterogeneous population of mitochondria. ​ We'll need long reads or PCR products to tell for sure. **
 ===== Annotation ===== ===== Annotation =====
  
Line 381: Line 393:
  
 It would be useful to find some spots that are highly conserved between Biomphalaria and Ariolimax, where primers could be placed for extracting mitochondrial DNA by long-range PCR.  That would make a population study of various Ariolimax species by sequencing their mitochondria quite tractable.
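One way to look for such spots, sketched under assumptions: given a pairwise alignment of the two mitochondrial genomes (two equal-length rows with "-" for gaps, produced by whatever aligner is at hand), scan for runs of identical, gap-free columns long enough to hold a primer. The function name and the 20-column default are illustrative choices, not something that was actually run for this project.

```python
def conserved_windows(aligned_a: str, aligned_b: str, window: int = 20):
    """Return (start, end) alignment-column ranges, end exclusive, where the
    two rows are identical and gap-free for at least `window` columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("alignment rows must have the same length")

    runs = []
    run_start = None
    for i, (x, y) in enumerate(zip(aligned_a.upper(), aligned_b.upper())):
        # A column counts as conserved only if both rows carry the same base.
        if x == y and x in "ACGT":
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= window:
                runs.append((run_start, i))
            run_start = None

    # Close a run that extends to the end of the alignment.
    if run_start is not None and len(aligned_a) - run_start >= window:
        runs.append((run_start, len(aligned_a)))
    return runs


# Toy alignment: a 10-column conserved run, a gap, then a short 2-column run.
row_a = "ACGTACGTAC-GTTT"
row_b = "ACGTACGTACAGTAT"
print(conserved_windows(row_a, row_b, window=5))
```

Exact identity is a deliberately strict criterion; real primer design would also need to check melting temperature and allow for degenerate positions.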
 +
 +==== Protein genes ====
 +
 +I can find 12 of the 13 standard mitochondrial protein genes using BLAST against annotated mitochondria, and [[http://www.ncbi.nlm.nih.gov/projects/gorf/orfig.cgi|ORFfinder]] finds ORFs of about the right length for each.  The missing gene (ATP8) is often missing in mollusks.  The ORFs found by ORFfinder may not be correct, however, as some mollusks have RNA processing in their mitochondrial genes to insert the stop codon.
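For readers who want to see what an ORF scan does under the invertebrate mitochondrial code (NCBI translation table 5), here is a minimal sketch. It is not ORFfinder's algorithm: it scans only the forward strand, the function name and the `min_codons` default are made up for illustration, and the start-codon set is my reading of table 5. It also assumes complete stop codons in the DNA, so it would miss or overextend genes whose stop is only created by RNA processing.

```python
# Table 5 has only two stop codons; TGA codes for Trp rather than stop.
STOPS = {"TAA", "TAG"}
# Initiation codons listed for table 5 (assumption: all are allowed here).
STARTS = {"ATG", "ATA", "ATT", "ATC", "GTG", "TTG"}


def find_orfs(seq: str, min_codons: int = 100):
    """Return (start, end, frame) for forward-strand ORFs, end exclusive.

    An ORF runs from the first start codon seen in a frame to the next
    in-frame stop; `min_codons` counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Toy sequence: the in-frame TGA does not end the ORF under table 5.
toy = "CC" + "ATG" + "GCA" * 3 + "TGA" + "GCA" + "TAA" + "G"
print(find_orfs(toy, min_codons=5))
```

Under the standard code the toy ORF would stop at the TGA after only 4 codons, which is the kind of difference that makes the choice of genetic code matter for ORF lengths.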
 +
 +I'll annotate the ORFs as best I can (assuming no post-transcriptional mRNA processing) once the repeats have been resolved.
 +
 +==== Ribosomal RNA genes ====
 +
 +I found both the small subunit rRNA gene and the large subunit rRNA gene using BLAST against a few annotated mollusk mitochondria. I'm not sure how one gets precise ends for these RNA genes without wet-lab work to get cDNA.
 +
 +==== tRNA genes ====
 +
 +I looked for tRNA genes with [[http://lowelab.ucsc.edu/tRNAscan-SE/|tRNAscan-SE]] run locally, and only found 7, but tRNAscan-SE is known to have problems with mitochondrial tRNAs.  I'll work with Todd Lowe later this summer to try to improve the tRNAscan-SE covariance models to handle them better.
 +
 +The 7 easy-to-find tRNA genes are Asp-GTC, SeC-TCA, Val-TAC, Pro-TGG, Ala-TGC, Thr-TGT, and Met-CAT.  These are not all correctly labeled for the mitochondrial genetic code:  the anticodon TCA reads the codon TGA, which is Trp in code 5, so SeC-TCA should be Trp-TCA.  The others are ok.
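The relabeling can be checked mechanically. The sketch below is an illustration only: it takes the reverse complement of each anticodon to get the exactly matching codon and overrides the label for the four codons where code 5 differs from the standard code (TGA, ATA, AGA, AGG). The helper names are made up, and wobble pairing is ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Codons whose meaning differs between the standard code and code 5.
CODE5_OVERRIDES = {"TGA": "Trp", "ATA": "Met", "AGA": "Ser", "AGG": "Ser"}


def anticodon_to_codon(anticodon: str) -> str:
    """Reverse-complement an anticodon to get its exactly matching codon."""
    return anticodon.upper().translate(COMPLEMENT)[::-1]


def relabel_for_code5(label: str) -> str:
    """Turn a label like 'SeC-TCA' into its code 5 equivalent.

    Keeps the original amino acid unless the matching codon is one of the
    four that code 5 reassigns.
    """
    amino, anticodon = label.split("-")
    codon = anticodon_to_codon(anticodon)
    return f"{CODE5_OVERRIDES.get(codon, amino)}-{anticodon}"


labels = ["Asp-GTC", "SeC-TCA", "Val-TAC", "Pro-TGG",
          "Ala-TGC", "Thr-TGT", "Met-CAT"]
relabeled = [relabel_for_code5(label) for label in labels]
print(relabeled)
```

Only the TCA anticodon changes label, which matches the hand check above.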
 +
 +I expect to find many copies of tRNA genes, as the repeat regions are in a tRNA-rich area of homologous mitochondria.