Differences

This shows you the differences between two versions of the page.

--- assemblies:2011:mitochondrion_assembly [2015/07/16 19:01]
ceisenhart created
+++ assemblies:2011:mitochondrion_assembly [2015/11/01 03:15] (current)
karplus [Reads] updated mitochondrion-draft2 link
@@ Line 1: / Line 1: @@
 ====== Mitochondrial sequence ======
-The mitochondrion was re assembled in 2015, follow the new assembly here, [[assemblies::2015::mitochondrion_assembly | 2015 Mitochondrion assembly ]].  This page is for the 2012 mitochondrial assembly.
+The mitochondrion was re assembled in 2015, follow the new assembly here, [[assemblies::2015::mitochondrion_assembly | 2015 Mitochondrion assembly ]].  This page is for the 2011 mitochondrial assembly.
 The first draft sequence is available as {{:computer_resources:assemblies:mitochondrion-draft1.fasta.gz|draft1 gzipped fasta file}}.  This corresponds to /campusdata/BME235/assemblies/slug/barcode-of-life/map-Illumina-raw-45/consensus-6 on campusrocks.  It has 23,642 bases.
@@ Line 27: / Line 27: @@
    * Iterating search and abyss assembly does not lengthen the large contig.  Cleaning up and calling the consensus with bwa+samtools+bcftools doesn't change things much either.  There seems to be a large variation in coverage (from 20x to 2300x, with a median of 225x), so I suspect that there is a repeat region at the beginning of the current contig that may have 10 repeats in it.
-Alternating finding new reads and assembling them made very slow progress, because the new reads only extended the assembled region by 50–100 bases.  Eventually, I wrote a new program ([[bioinformatic_tools:pluck-scripts:look-for-exit|look-for-exit]]) to manually extend the contigs and find exits from repeat regions, being more aggressive in extending the contig than the automatic assemblers.  I was eventually able to close the circle this way, and get a complete genome, though there is one repeat region with long repeats (about a dozen copies of a 615±1 long repeat) that I could not order, because the differences between repeats were far enough apart that I couldn't disambiguate the order with the [[bioinformatic_tools:bwa#determining_paired-end_insert_size|short fragment lengths]] of the data available.  I think I have all the variants of repeat, but in some cases I can't even tell which first half of the repeat goes with which second half.
+Alternating finding new reads and assembling them made very slow progress, because the new reads only extended the assembled region by 50–100 bases.  Eventually, I wrote a new program ([[archive:bioinformatic_tools:pluck-scripts:look-for-exit|look-for-exit]]) to manually extend the contigs and find exits from repeat regions, being more aggressive in extending the contig than the automatic assemblers.  I was eventually able to close the circle this way, and get a complete genome, though there is one repeat region with long repeats (about a dozen copies of a 615±1 long repeat) that I could not order, because the differences between repeats were far enough apart that I couldn't disambiguate the order with the [[archive:bioinformatic_tools:bwa#determining_paired-end_insert_size|short fragment lengths]] of the data available.  I think I have all the variants of repeat, but in some cases I can't even tell which first half of the repeat goes with which second half.
 At some point in the process, I rotated the genome to correspond to the closest previous mitochondrial genome: //Biomphalaria glabrata// strain M, a gastropod.
@@ Line 36: / Line 36: @@
 We plan to use PCR to amplify parts of the repeat region and do Sanger sequencing to confirm the sequence on those blocks.
-To find distinguishing features in the repeat region to design primers, the [[bioinformatic_tools:pluck-scripts:look-for-exit|look-for-exit]] program was used to walk forward and backward through the repeat, looking for alternative paths that had significant read support.  All the variants were recorded in README files (in assemblies/slug/barcode-of-life/map-Illumina-raw-42/  and assemblies/slug/barcode-of-life/map-Illumina-raw-45/) and look-for-exit was used to build putative single copies of repeats from each of the observed variants.
+To find distinguishing features in the repeat region to design primers, the [[archive:bioinformatic_tools:pluck-scripts:look-for-exit|look-for-exit]] program was used to walk forward and backward through the repeat, looking for alternative paths that had significant read support.  All the variants were recorded in README files (in assemblies/slug/barcode-of-life/map-Illumina-raw-42/  and assemblies/slug/barcode-of-life/map-Illumina-raw-45/) and look-for-exit was used to build putative single copies of repeats from each of the observed variants.
 The repeat region starts at position 7037 in draft1, with CTGTAAGAGAATTATTTTAGTAATAAAATTTAATTTTAAGAAAAGAATTTTTCT
@@ Line 315: / Line 315: @@
 The most frequent 19-mer in the subset occurs 6035 times in the full set (so I may be missing 272 copies in the subset), but there are almost 209,000 more common 19-mers, so selecting by frequency would have gotten me mostly low-complexity junk, not mitochondrial sequence.
-After cleaning the mitochondrial reads with [[bioinformatic_tools:jellyfish|jellyfish]] and [[bioinformatic_tools:quake|quake]], in map-Illumina-raw-45/ we have
+After cleaning the mitochondrial reads with [[archive:bioinformatic_tools:jellyfish|jellyfish]] and [[archive:bioinformatic_tools:quake|quake]], in map-Illumina-raw-45/ we have
     clean_19_dir/merged.fastq has 2,860,095 bases in 26,498 reads.
     clean_19_dir/merged_1.fastq has 1,253,271 bases in 16,885 reads.
@@ Line 366: / Line 366: @@
-The second draft sequence (with the short repeats expanded and the repeats ordered as best I can from the short-insert reads) is available as {{mitochondrion-draft2.fasta.gz|draft2 gzipped fasta file}}.  This corresponds to /campusdata/BME235/assemblies/slug/barcode-of-life/map-Illumina-raw-47/draft on campusrocks.  It has 36363 bases.
+The second draft sequence (with the short repeats expanded and the repeats ordered as best I can from the short-insert reads) is available as {{computer_resources:assemblies:mitochondrion-draft2.fasta.gz|draft2 gzipped fasta file}}.  This corresponds to /campusdata/BME235/assemblies/slug/barcode-of-life/map-Illumina-raw-47/draft on campusrocks.  It has 36363 bases.
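
The context line in the hunk at Line 27 infers roughly 10 repeat copies from the coverage spread (20x to 2300x, median 225x). A minimal sketch of that arithmetic follows. It assumes the median coverage reflects single-copy sequence and the peak coverage falls inside a collapsed repeat; the function name `estimate_copy_number` is hypothetical and is not part of the page or of any tool mentioned in it.

```python
def estimate_copy_number(repeat_coverage: float, single_copy_coverage: float) -> float:
    """Estimate how many repeat copies were collapsed into one contig region.

    Assumption: reads from every copy pile up on the single assembled copy,
    so coverage scales linearly with the number of copies.
    """
    if single_copy_coverage <= 0:
        raise ValueError("single-copy coverage must be positive")
    return repeat_coverage / single_copy_coverage


# Coverage figures quoted on the page.
peak_coverage = 2300.0    # highest coverage observed
median_coverage = 225.0   # taken here as the single-copy baseline (assumption)

copies = estimate_copy_number(peak_coverage, median_coverage)
print(f"estimated repeat copies: {copies:.1f}")
```

The result is only a rough estimate: coverage also varies with base composition and mapping ambiguity, so it supports the "may have 10 repeats" reading without confirming it.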

Banana Slug Genomics
