assemblies:2015:mitochondrion_assembly

Differences

This shows you the differences between two versions of the page.

assemblies:2015:mitochondrion_assembly [2015/07/17 19:23]
ndudek
assemblies:2015:mitochondrion_assembly [2015/09/16 01:30] (current)
ndudek [PCR results]
Line 20: Line 20:
  
 ====Assembly====
 +
 +May 2015
  
 The first step taken was to map the HiSeq SW018 and SW019 reads against the 2012 draft genome. Reads were then assembled using Discovar de novo, first using only the SW018 reads (assembly mitochondrion_SW018_discovar) and then using reads from both SW018 and SW019 (assembly mitochondrion_SW018-9_discovar). This second assembly produced two long contigs that appear to make up the majority of the mitochondrion genome, since their combined length is close to the expected genome size.
Line 31: Line 33:
 |mitochondrion_iteration2_SW018-9_discovar| 48K | 45,494 | 114 | 9,030 | 9,030 | 15,548 | 0 | 436X |
  
-For comparison, the mitochondrion size of the closest related molluscs that have mitochondrion assemblies are 14,100bp (grove snail - //Cepaea nemoralis//) and 14,130bp (land snail - //Albinaria coerulea//)
+For comparison, the mitochondrion size of the closest related molluscs that have mitochondrion assemblies are 14,100bp (grove snail - //Cepaea nemoralis//) and 14,130bp (land snail - //Albinaria coerulea//). But there are mollusc mitochondrial genomes that are much larger: sea scallop //Placopecten magellanicus// is reported to have 30.6-30.7kbp ([[http://link.springer.com/article/10.1007/s00239-007-9016-x#page-1|David R. Smith and Marlene Snyder. Complete Mitochondrial DNA Sequence of the Scallop Placopecten magellanicus: Evidence of Transposition Leading to an Uncharacteristically Large Mitochondrial Genome. J Mol Evol (2007) 65:380–391 doi:10.1007/s00239-007-9016-x]]).
  
 It looks as though the majority of the mitochondrion is in two contigs in the assembly using both the SW018 and SW019 reads. One of these contigs is 12,884bp and the other (which is the second largest contig) is 3,425bp.
  
-When these two contigs are blasted against the consensus sequence from the 2012 assembly you can see that some of the repeat regions present in the 2012 assembly were merged in the 2015 assembly. In general, there is a pretty good agreement between the two assemblies. In the dot plot below, the 2015 assembly is on the x-axis and the 2012 assembly is on the y-axis.
+When the largest of these contigs is blasted against the consensus sequence from the 2012 assembly, you can see that some of the repeat regions present in the 2012 assembly were merged in the 2015 assembly. In general, there is pretty good agreement between the two assemblies. In the dot plot below, the 2015 assembly is on the x-axis and the 2012 assembly is on the y-axis.
  
 {{:2012_vs_2015_assemblies.png?200|}}
Line 42: Line 45:
  
 ====The COX1 gene sequence====
 +
 +May 2015
  
 The COX1 gene sequence (used for barcoding) was extracted from the contigs by blasting contigs against the nr/nt database and looking for a contig with hits to other COX1 genes (there are only two long contigs in the first iteration assembly). This contig was annotated using [[http://dogma.ccbb.utexas.edu/|DOGMA]] and the exact COX1 gene sequence was extracted and can be seen below:
Line 53: Line 58:
  
 ====Scaffolding with mate pairs to try and close gaps====
 +
 +June 2015
  
 In an attempt to close gaps between scaffolds, the reads from the SW041, SW042, and lucigen mate pair libraries were mapped against the assembly. This was done using bwa samse for each of the forward and reverse reads for each library, after which the resulting sam file was visualized using Tablet. Tablet allows the user to see which reads mapped to which scaffolds (and where). For each sam file/mate pair library, the following characteristics were recorded for each read that mapped to the mitochondrion assembly: 1) the name of the read, 2) what scaffold it mapped to, 3) its orientation on the scaffold. This was done for both the forward and reverse reads, and then any pairs where both reads mapped were noted. The hope was that there would be a pair where each read mapped to a different scaffold, but that was not seen.
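The bookkeeping described above (read name, scaffold, orientation, then pairing forward and reverse reads) could also be scripted rather than read off in Tablet. The following is only a minimal sketch, not the procedure actually used: it assumes one single-end SAM file per read direction (as bwa samse produces), that mates share a read name apart from an optional /1 or /2 suffix, and the function names ''mapped_reads'' and ''bridging_pairs'' are hypothetical.

```python
def mapped_reads(sam_lines):
    """Return {read_name: (scaffold, strand)} for mapped reads in SAM text lines."""
    hits = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a valid alignment record
        name, flag, scaffold = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or scaffold == "*":
            continue  # 0x4 = read unmapped
        if flag & 0x900:
            continue  # ignore secondary (0x100) and supplementary (0x800) records
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # assumption: mates differ only by this suffix
        strand = "-" if flag & 0x10 else "+"  # 0x10 = mapped to reverse strand
        hits[name] = (scaffold, strand)
    return hits


def bridging_pairs(forward_sam_lines, reverse_sam_lines):
    """List pairs whose two reads map to different scaffolds."""
    fwd = mapped_reads(forward_sam_lines)
    rev = mapped_reads(reverse_sam_lines)
    pairs = []
    for name in sorted(fwd.keys() & rev.keys()):  # pairs where both reads mapped
        if fwd[name][0] != rev[name][0]:
            pairs.append((name, fwd[name], rev[name]))
    return pairs
```

Each SAM file can be passed as an open file handle, since iterating over a file yields its lines. An empty result would correspond to what was observed here: no pair with its reads on different scaffolds.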
Line 95: Line 102:
  
 ====Primer design for PCR to close gaps====
 +
 +June 2015
  
 Primers were designed to amplify the regions between the two largest scaffolds. These two scaffolds likely contain the majority of the mitochondrion genome (explanation above).
Line 136: Line 145:
 </​code>​ </​code>​
  
-The sequence of the two largest sequences can be seen {{:primer_design_locations.pdf|here}}, with the primer sequences highlighted in red.
+The sequence of the two largest sequences can be seen {{:assemblies:2015:pcr_primers.pdf|here}}, with the primer sequences highlighted in red.
  
 ====PCR results====
 +
 +July 20
  
 Originally we thought that the mitochondrion was in two contigs - one at 12,884bp and one at 3,425bp. Blasting these two contigs against one another results in no significant similarity being found. The expected mitochondrion size is ~14,000bp, based on the size of other mollusc mitochondria (all are within a fairly small range around 14,000bp).
Line 144: Line 155:
 Running the PCR was challenging because there is extremely little DNA left from the banana slug specimen. Steven found some very small remnants in a leftover tube and ran the PCRs. Here are the results:
  
-{{:assemblies:2015:mitochondrino_pcr.jpg?200|}}
+{{:assemblies:2015:mitochondrion_pcr.jpg?200|}}
  
 Unfortunately we were expecting a smaller product, and so the ladder used does not extend to large enough fragments. However, using a 10kb 2 log ladder we were able to estimate the size of the PCR products.
Line 186: Line 197:
 {{:assemblies:2015:2nd_longest_contig_pcr.png?200|}}
  
-These results suggest that both contigs would be "circularized" by the PCR products that were amplified. This is somewhat surprising for the second, shorter contig. However, I no longer think this second contig is a part of the mitochondrion genome (I think it must be a nump). Here is why:
+These results suggest that both contigs would be "circularized" by the PCR products that were amplified. This is somewhat surprising for the second, shorter contig.
 
-- If both contigs are a part of the mitochonrial genome, the genome size is over 16,309bp, which is far longer than other assembled mollusc mitochondrial genomes.
+July 22
 
-Blasting the longest contig against the complete //Albinaria caerulea// genome yields the following:
+There are multiple explanations for why both fragments were circularized during the PCR.
+
+  - Some kind of technical error or PCR artifact.
+  - The smaller contig may be a numt (a nuclear copy of mitochondrial DNA) that, for some reason, circularized. Possibly supporting this theory: if you blast the larger contig against the //Albinaria caerulea// genome, it looks like we have nearly the entire mitochondrial genome, except for an ~800bp region (see dot plot below). This is only slightly shorter than the fragment amplified with the PCR above. Given that molluscs seem to have fairly conserved mitochondrial genome sizes of ~14,000bp, if both fragments are truly mitochondrial the total length must be over 16,309bp, which is unexpectedly large and may be incorrect. On the other hand, if the second largest contig is a numt, you would expect it to have some sequence similarity to the largest contig, but blasting them against each other finds no significant sequence similarity. Additionally, I used [[http://dogma.ccbb.utexas.edu/|DOGMA]] to do a preliminary annotation of both contigs, and it predicted that the largest carries the cox1 and cox3 genes, whereas the second largest contig carries the cox2 and cob genes, so they appear to be complementary.
+  - The banana slug could possibly have a mitochondrial genome that is present as separate mini circular chromosomes, as seen in the human body louse, //Pediculus humanus// (see [[http://genome.cshlp.org/content/19/5/904.short|"The single mitochondrial chromosome typical of animals has evolved into 18 mini chromosomes in the human body louse, Pediculus humanus", by Shao et al., doi:10.1101/gr.083188.108 Genome Research 2009. 19: 904-912]]). This could explain why both contigs circularized with the PCR that was run. However, it is probably too early to say whether this circularization is really "valid" or not. This is something that will be examined in greater detail in the future. Note: [[http://genome.cshlp.org/content/19/5/700.full|‘Why genomes in pieces?’ revisited: Sucking lice do their own thing in mtDNA circle game. David M. Rand, Genome Research. 2009. 19: 700-702. doi:10.1101/gr.091132.109]] lists other animals with multi-chromosome mitochondria, including a mollusc (scallop).
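The size argument can be checked with a few lines of arithmetic. This is just an illustrative sketch using the two contig lengths reported on this page and the approximate ~14,000bp expectation. The combined length is a lower bound, since any sequence in unclosed gaps is not counted.

```python
# Contig lengths reported on this page (bp)
largest_contig = 12_884
second_contig = 3_425

# Approximate expectation taken from related mollusc mitochondria (an assumption, not a measurement)
expected_size = 14_000

combined = largest_contig + second_contig  # lower bound if both contigs are mitochondrial
excess = combined - expected_size          # how far the lower bound exceeds the expectation

print(f"combined length: {combined:,} bp")
print(f"excess over ~{expected_size:,} bp: {excess:,} bp ({excess / expected_size:.0%})")
```

The sum reproduces the 16,309bp figure, roughly 2,300bp more than the expected size.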
  
 {{:assemblies:2015:longest_vs_a_caerulea.png?200|}}
  
-It looks like we have nearly the entire mitochondrial genome, expect for an ~800bp region, which is only slightly shorter than the fragment amplified with the PCR above.
+August 2015
+
+PCR products were sent for Sanger sequencing. Something went wrong: the result was largely "N"s. I checked whether I could improve the calls by looking at the chromatogram, but did not have significant success. Below is an example of what the chromatogram looked like. I will be re-sending the PCR products for sequencing.
+
+{{:assemblies:2015:chromatogram_mito_august2015.png?200|}}
+
+September 2015
 
-Given that the PCR product in lane 4 is approximately the expected size that should complete the mitochondrion genome and that Sanger sequencing is only ~$4/tube, I propose that we send it for sequencing.
+PCR products were sent for sequencing a second time, this time after a size selection on the strongest band. Results were essentially the same.
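One way to put a number on "largely N's" when comparing sequencing runs is the fraction of uncalled bases per read. This is a minimal sketch that assumes the base calls are available as plain strings; the function name ''n_fraction'' is hypothetical and no actual read data from this project is shown.

```python
def n_fraction(seq):
    """Return the fraction of bases in a read that were called as N."""
    seq = seq.strip().upper()
    if not seq:
        return 0.0  # avoid dividing by zero on an empty read
    return seq.count("N") / len(seq)
```

Computing this for each read from both sequencing attempts would show whether the size selection changed the proportion of uncalled bases.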
  
 The mitochondrion assembly is being worked on by Natasha Dudek (natasha@dudek.org) from Team 5: Discovar //de novo//.
assemblies/2015/mitochondrion_assembly.1437161029.txt.gz · Last modified: 2015/07/17 19:23 by ndudek