Additional tasks to attempt May 22, 2015

RepARK

Tool to assemble repetitive regions of genome from whole-genome reads

Creates kmer complexity graph to determine cutoff for repeat sequences

A kmer is assumed to be part of a repeat sequence if it is seen more often in the raw reads than the cutoff
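A minimal sketch of the cutoff idea, for illustration only. It counts forward-strand kmers in plain Python rather than with a dedicated kmer counter, and the toy reads, `k=4`, and `cutoff=2` are made-up values, not ones taken from RepARK. The function names `count_kmers` and `repeat_kmers` are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (forward strand only) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def repeat_kmers(counts, cutoff):
    """Return the k-mers seen strictly more often than the cutoff."""
    return {kmer for kmer, n in counts.items() if n > cutoff}


# Toy reads: the first two share a tandem ACGT-like repeat, the third is unique.
reads = ["ACGTACGTAC", "CGTACGTACG", "TTGACCA"]
counts = count_kmers(reads, k=4)
repeats = repeat_kmers(counts, cutoff=2)  # assumed cutoff, chosen by hand here
```

In practice the cutoff would be read off the kmer frequency distribution of the real data rather than set by hand.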

Create de Bruijn graph assembly using just the repeat reads to put together a consensus sequence of the repeat elements

As soon as the assembly reaches a region with no consensus, it stops, so we will get multiple separate consensus sequences
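A rough sketch of why the assembly breaks into separate pieces: in a de Bruijn graph a contig can only be extended while the path is unambiguous. This is a simplified illustration (forward strand only, no error correction), not how CLCbio or Velvet are actually implemented. `build_graph` and `extend_right` are hypothetical names.

```python
from collections import defaultdict


def build_graph(kmers):
    """Nodes are (k-1)-mers; each k-mer is an edge from its prefix to its suffix."""
    succ, pred = defaultdict(set), defaultdict(set)
    for kmer in kmers:
        succ[kmer[:-1]].add(kmer[1:])
        pred[kmer[1:]].add(kmer[:-1])
    return succ, pred


def extend_right(start, succ, pred):
    """Extend a contig from start until the path branches, merges, or loops."""
    contig, node, seen = start, start, {start}
    while len(succ[node]) == 1:
        nxt = next(iter(succ[node]))
        if len(pred[nxt]) != 1 or nxt in seen:
            break  # another path joins here, or we are going in a circle
        contig += nxt[-1]
        node = nxt
        seen.add(nxt)
    return contig


# Toy repeat k-mers: the path is linear up to TTGC, then diverges (TGCA vs TGCG).
kmers = ["ACGT", "CGTT", "GTTG", "TTGC", "TGCA", "TGCG"]
succ, pred = build_graph(kmers)
contig = extend_right("ACG", succ, pred)
```

Here the walk stops at the branch after `TTGC`, so the two diverging copies would end up as separate consensus sequences.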

Not looking for sequence similarity to a known sequence (Alu, microsatellites, etc)

Can get mitochondria and ribosomal genes in this way

Use CLCbio or Velvet to do the kmer assembly, which gives an estimate of the kmer coverage of the assembled contigs

or use PRICE to see if two contigs are part of same repeat class (consensus seq that exists in multiple places in the genome)

Repeat assembly task list

  1. use jellyfish to find repeat kmers
  2. assemble repeat kmers with copy number partitioning
  3. aggregate/scaffold contigs
  4. copy # in genome
  5. metagenome

mtDNA will probably be scaffolded too
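One possible way to approach step 4 (copy # in genome), sketched under assumptions: divide the typical kmer count along a repeat contig by the kmer coverage expected for single-copy sequence. The counts dictionary, the contig, and `single_copy_coverage=20` are invented for illustration. The function names are hypothetical, and the median is just one reasonable choice of summary.

```python
from statistics import median


def contig_kmer_coverage(contig, counts, k):
    """Median read count of the k-mers along a contig (0 for unseen k-mers)."""
    cov = [counts.get(contig[i:i + k], 0) for i in range(len(contig) - k + 1)]
    return median(cov) if cov else 0


def estimate_copy_number(contig, counts, k, single_copy_coverage):
    """Contig k-mer coverage relative to the assumed single-copy coverage."""
    return contig_kmer_coverage(contig, counts, k) / single_copy_coverage


# Hypothetical k-mer counts, e.g. as produced by a k-mer counter on the raw reads.
counts = {"ACGT": 90, "CGTA": 100, "GTAC": 110}
copy_number = estimate_copy_number("ACGTAC", counts, k=4, single_copy_coverage=20)
```

With these toy numbers the contig comes out at about 5 copies. A high-copy element such as mtDNA would show up the same way, with coverage far above the single-copy level.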