
Lecture Notes 4/6/2015

Note Taker: Christopher Kan

A Road Map to the De Novo Assembly of the Banana Slug Genome

  1. Stefan Prost

De Novo vs. Reference Genome

  1. A reference can be biased by the assembly itself, e.g. some areas may not be annotated, or reads may not be available for them.
  2. De novo assembly costs more

Scaffolds and Contigs

  1. Contigs have little to no gaps
  2. Scaffolds can have missing regions but the linear order of the contigs within each scaffold is known
  3. N50s for Scaffold and Contigs are used as quality measures.

○ Sort the scaffolds or contigs from largest to smallest and sum their sizes until you reach 1/2 the linear length of the genome. The size of the last scaffold or contig added is the N50. It is a way to obtain a median-like measure of assembly quality

  1. Ideally # scaffolds = # chromosomes
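
The N50 procedure described in this section can be sketched in Python. This is a minimal illustration, not a tool used in the lecture: the function name `n50`, the optional `total_length` parameter, and the example lengths are all assumptions made for the sketch. By default the target is half of the summed assembly length; passing an expected genome size instead follows the "1/2 the length of a genome" wording in the notes.

```python
def n50(lengths, total_length=None):
    """Return the N50 of a list of contig or scaffold lengths.

    lengths      -- sizes of the contigs or scaffolds (hypothetical input)
    total_length -- optional expected genome size; if omitted, the sum of
                    all lengths (the assembly size) is used instead
    """
    if not lengths:
        raise ValueError("need at least one contig or scaffold length")
    reference = total_length if total_length is not None else sum(lengths)
    target = reference / 2
    running = 0
    # Walk from the largest piece to the smallest, summing as we go.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            # The piece that carries the sum past the halfway mark is the N50.
            return length
    # The assembly covers less than half of total_length.
    return None


# Made-up lengths: the total is 300, so the halfway mark is 150.
example = [80, 70, 50, 40, 30, 20, 10]
print(n50(example))       # 80 + 70 = 150, so the N50 is 70
print(n50(example, 400))  # halfway mark is 200: 80 + 70 + 50, so 50
```

A larger N50 means more of the sequence sits in long pieces, which is why it is read as a quality measure.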

Definition: k-mer - a short element (subsequence) of DNA of a fixed length k

  1. The elements can overlap
  2. Used to summarize data by assemblers
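
As a rough illustration of overlapping k-mers, the sketch below slides a window of length k along a sequence one base at a time and counts each element it sees. The function name `count_kmers` and the toy sequence are assumptions for the example; real assemblers use far more memory-efficient structures.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer in a DNA string."""
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    # Consecutive windows start one base apart, so they overlap by k - 1 bases.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ATGATG", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

A sequence of length L yields L - k + 1 k-mers, so the toy sequence of length 6 gives four 3-mers, with ATG seen twice.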

A priori knowledge of a genome

  1. Expected Genome Size
    • C-values from www.genomesize.com
    • C-value is the haploid genome size in picograms
    • A C-value of 1 pg ≈ 980 Mb (megabases; more precisely about 978 Mb)
    • Depending on the clade, information from related genomes can be used to provide a priori knowledge

§ Some clades have low variation and high synteny, e.g. birds

  • Genomes of 6-7 Gb become difficult
  1. Databases
  1. Expected repeat content
  • Correlated with genome size
  • Small repeats and pseudogenes, genome duplications
  1. Expected heterozygosity
  2. Haploid, diploid, or polyploid?
  • No assembler can currently assemble polyploid genomes
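
The C-value conversion from the genome size notes can be written out as a small calculation. This is only a sketch: the factor of 978 Mb per picogram is the commonly quoted value (the notes round it to 980), and the function name and the C-value of 3.5 pg are made up for illustration, not a measurement for any species.

```python
MB_PER_PG = 978  # megabases per picogram of DNA; the notes round this to 980


def c_value_to_mb(c_value_pg, mb_per_pg=MB_PER_PG):
    """Convert a C-value in picograms to an expected genome size in Mb."""
    if c_value_pg < 0:
        raise ValueError("C-value cannot be negative")
    return c_value_pg * mb_per_pg


# Hypothetical C-value, as it might be looked up on www.genomesize.com.
hypothetical_c_value = 3.5
size_mb = c_value_to_mb(hypothetical_c_value)
print(f"{hypothetical_c_value} pg is roughly {size_mb / 1000:.2f} Gb")
```

An estimate like this is enough to tell whether a genome falls into the multi-gigabase range that the notes flag as difficult.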

Sequencing Technology

  1. 1st Gen

○ Sanger

  1. 2nd Gen (PCR needed)
    • Illumina

§ It took me a long time to understand how this works; these two videos helped me: Link

  • Roche: 454
  • Ion Torrent
  • ABI: SOLiD
  1. 3rd Gen (Single Molecule Sequencing)
  • HeliScope
  • PacBio RS II

§ Problems

  • Polymerase needs to be fast with a low error rate
  • Poor yield from each cell

~ Need to wash with a low concentration to ensure most cells only have one molecule

  • Insertions and deletions: a nucleotide is missed, or one hangs around

~ These errors are random. This property is used for error correction

  • Light emission at the time of nucleotide incorporation
    • Real time; allows discernment of the 3D structure of the molecule based on the time between incorporations
  • Can circularize small DNA fragments and get multiple reads of ~3 kb, possibly up to 8 kb
  • MinION and GridION
  • Sequences by taking the molecule apart
  • Nanopore lets the molecule through based on a salt gradient
  • Sequence as the molecule goes through

~ The molecule is held by an enzyme that clips off one nucleotide at a time (an exonuclease)

~ Measure the charge at the nanopore

  • Or sequence the molecule as it goes through while it is held by a helicase
  • Some systematic errors, which are harder to correct
  • Can use a hairpin to run both strands of the DNA so it is effectively paired
  • No restriction on size, hypothetically

Issues with 3rd Gen

  1. High error
  2. High cost
  3. Error correction very computationally expensive
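
The PacBio notes above say that errors are random and that a circularized fragment can be read several times. The sketch below shows why that helps: random errors rarely hit the same position in most passes, so a per-position majority vote recovers the original base. It is a deliberately simplified illustration, not how any real pipeline works: it assumes the passes are already aligned and of equal length and that errors are substitutions only, whereas the indel errors mentioned in the notes require alignment first. The function name and the reads are made up.

```python
from collections import Counter


def majority_consensus(passes):
    """Column-wise majority vote over equal-length reads of one molecule.

    Assumption: the reads are pre-aligned and contain substitutions only.
    """
    if not passes:
        raise ValueError("need at least one read")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("this sketch only handles equal-length reads")
    consensus = []
    for column in zip(*passes):
        # Pick the base seen most often at this position.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Three hypothetical passes over the same fragment, each with one random error.
reads = ["ACGTACGT", "ACGAACGT", "ACGTACCT"]
print(majority_consensus(reads))
```

Systematic errors, by contrast, recur at the same position in every pass, so voting cannot remove them, which is consistent with the note that they are harder to correct.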


lecture_notes/04-06-2015.1428343985.txt.gz · Last modified: 2015/04/06 11:13 by chkan