Banana Slug Genomics

Tool to assemble repetitive regions of genome from whole-genome reads

Creates kmer complexity graph to determine cutoff for repeat sequences

A kmer is assumed to be part of a repeat sequence if it is seen too many times in the raw reads (more often than the cutoff)
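A minimal sketch of the counting and cutoff idea, not the tool's actual code. The function names, the `min_fraction` parameter, and the choice to ignore reverse complements are all assumptions made for illustration. The multiplicity histogram stands in for the kmer complexity graph: the cutoff would be picked by eye from it.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every kmer of length k across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def multiplicity_histogram(counts):
    """Number of distinct kmers seen once, twice, ...; data for the complexity graph."""
    return Counter(counts.values())

def repeat_kmers(counts, cutoff):
    """Kmers seen more often than the cutoff are assumed to come from repeats."""
    return {kmer for kmer, n in counts.items() if n > cutoff}

def repeat_reads(reads, repeats, k, min_fraction=0.5):
    """Keep reads in which at least min_fraction of the kmers are repeat kmers.

    min_fraction is a hypothetical parameter, not something from the notes.
    """
    selected = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if kmers and sum(km in repeats for km in kmers) / len(kmers) >= min_fraction:
            selected.append(read)
    return selected

if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTTTTT", "GGGCCCAA"]
    counts = count_kmers(reads, k=4)
    print(multiplicity_histogram(counts))
    print(repeat_kmers(counts, cutoff=2))
```

On real data a dedicated kmer counter would be needed, since a Python dictionary will not hold the kmers of a whole genome read set in memory.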

Create de Bruijn graph assembly using just the repeat reads to put together a consensus sequence of the repeat elements

As soon as the assembly reaches a region where there is no consensus, it stops, so you will get multiple separate consensus sequences
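The stopping behaviour can be sketched with a toy de Bruijn graph, assuming that "no consensus" means a branch in the graph. This is an illustration only; the function names are hypothetical and real assemblers also handle sequencing errors, reverse complements, and coverage.

```python
from collections import defaultdict

def build_graph(kmers):
    """Link each kmer's (k-1)-mer prefix to its (k-1)-mer suffix."""
    succ = defaultdict(set)  # node -> nodes that follow it
    pred = defaultdict(set)  # node -> nodes that precede it
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        succ[prefix].add(suffix)
        pred[suffix].add(prefix)
    return succ, pred

def extend_right(start, succ, pred):
    """Extend a contig from start while the path is unambiguous.

    Extension stops at the first branch (more than one successor, or a
    successor with more than one predecessor), at a dead end, or on a cycle.
    """
    contig = start
    node = start
    seen = {start}
    while len(succ.get(node, ())) == 1:
        nxt = next(iter(succ[node]))
        if len(pred.get(nxt, ())) != 1 or nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig

if __name__ == "__main__":
    # The extra kmer GTGA creates a branch after GTG, so extension stops there.
    succ, pred = build_graph(["ACGT", "CGTG", "GTGC", "TGCA", "GTGA"])
    print(extend_right("ACG", succ, pred))
```

Each branch ends one contig and starts others, which is why the output is several separate consensus sequences rather than one.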

Not looking for sequence similarity to a known sequence (Alu, microsatellites, etc)

Can get mitochondria and ribosomal genes in this way

Use CLCbio or Velvet to do the kmer assembly, which gives an estimate of the kmer coverage of the assembled sequence

or use PRICE to see if two contigs are part of same repeat class (consensus seq that exists in multiple places in the genome)

Gaps will be short

Merge the consensus sequence of the repeats with it

Once you have a full-length consensus sequence, index it, map the clean reads back to it (with at least 10X coverage), and ask what coverage you are getting on each consensus sequence. Depending on its coverage, you can determine how many times the consensus sequence is present in the genome (e.g. 20X against a 10X baseline means it is present twice)
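The copy-number arithmetic can be written out as a short sketch. It assumes the baseline is the coverage of single-copy sequence (10X in the example above) and that per-base depths for the consensus are already available from the mapping step; the function names are hypothetical.

```python
def mean_coverage(per_base_depth):
    """Average read depth over a consensus sequence."""
    if not per_base_depth:
        raise ValueError("no depth values given")
    return sum(per_base_depth) / len(per_base_depth)

def estimate_copy_number(repeat_coverage, baseline_coverage):
    """Copies of the consensus in the genome = its coverage / single-copy coverage."""
    if baseline_coverage <= 0:
        raise ValueError("baseline coverage must be positive")
    return repeat_coverage / baseline_coverage

if __name__ == "__main__":
    depths = [19, 21, 20, 20]  # made-up depths for illustration only
    print(estimate_copy_number(mean_coverage(depths), baseline_coverage=10))
```

The estimate is only as good as the baseline, so the single-copy coverage should come from the same clean read set that was mapped to the consensus.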

Repeat assembly task list

mtDNA will probably be scaffolded too
