lecture_notes:04-27-2011

454 Presentation by Teri Mueller

This talk, hosted by Roche, gave a general overview of some of the capabilities of the 454 platform and its bundled software. The 454 website gives access to FLX updates and documentation; register with a UCSC account only.

  • Pre-talk comment: The GUI is accessible by X11 or VNC.

454 Characteristics

  • 1 bead/1 DNA fragment: a filter tries to remove beads carrying more than one fragment (~20%).
  • 200 cycles per FLX Titanium run. New machine does 400.
  • 1 flow: a single nucleotide. The flow is the unit of measure for pyrosequencing, since bases are added flow by flow.
  • The amount of light is proportional to the number of bases incorporated in a flow.
  • The first 4 bases of each sequence are the key sequence. The first 3 are used to normalize the amount of light for 1 base.
  • Library key sequences: TCAG (standard) / GATC (rapid). Do not mix them! There are also control sequences (not captured in these notes).
  • 1 million reads per 2-well plate. Lane masks will decrease the number of reads.
  • Quality statistics can be viewed in the BaseQualityMetrics and QualityFilterMetrics files.
  • Runs take 10 hours. Processing can take up to 80 hours on the default computer; processing on a cluster mitigates this.
  • 37 GB per run (~28 GB of raw images).
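
The relationship between flows, light, and the key sequence can be sketched in a few lines. This is only an illustration, not Roche's base caller: the flow order TACG, the plain rounding rule, the signal values, and the function names are assumptions made for the example. Only the TCAG key and the use of its first three bases for normalization come from the talk.

```python
# Minimal sketch: normalize flow signals with the key, then call bases.
# Assumptions (not from the talk): flow order "TACG", simple rounding,
# and invented signal values.

FLOW_ORDER = "TACG"  # assumed nucleotide order, repeated every cycle
KEY = "TCAG"         # standard library key from the notes


def key_flow_indices(key, flow_order):
    """Return the flow index at which each key base is incorporated."""
    indices = []
    flow = 0
    for base in key:
        # Advance until the flowed nucleotide matches the next key base.
        while flow_order[flow % len(flow_order)] != base:
            flow += 1
        indices.append(flow)
        flow += 1
    return indices


def call_bases(signals, key=KEY, flow_order=FLOW_ORDER):
    """Scale signals so one base is ~1.0, then round each flow to a base count."""
    first_three = key_flow_indices(key, flow_order)[:3]
    one_base = sum(signals[i] for i in first_three) / 3.0
    bases = []
    for i, signal in enumerate(signals):
        count = int(signal / one_base + 0.5)  # light ~ number of bases
        bases.append(flow_order[i % len(flow_order)] * count)
    return "".join(bases)


# Invented raw signals for 12 flows (three cycles of T, A, C, G).
signals = [210, 8, 190, 12, 5, 200, 10, 405, 15, 195, 600, 3]
print(call_bases(signals))
```

With these invented signals the three key flows average 200 units, so a flow of about 400 is read as two bases and a flow of about 600 as three.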

Software

  • Amplicon Variant Analyzer: for specific region analysis
  • GS Assembler: de Novo or Reference Mapper
  • GS Reporter
  • GS RunProcessor - Image/signal processing
    • Formats:
      • .cwf: raw image
      • .fna: fasta sequence file, header has a specific format.
      • .qual: quality scores (like Phred, but offset)
      • .sff: standard 454 format. <sfffile -s> will split plate into runs.
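
As a rough illustration of how the .fna and .qual files fit together, the sketch below pairs each sequence with its scores. It assumes that both files are FASTA-like and that the .qual file holds whitespace-separated integer scores under the same ">" headers; the talk did not spell out either layout, the function names are hypothetical, and the header fields beyond the read name are deliberately left unparsed.

```python
# Minimal sketch: pair records from a .fna file with scores from a .qual file.
# Assumption: both files are FASTA-like, and .qual holds whitespace-separated
# integer scores under the same ">" headers as the .fna file.


def read_fasta_like(lines):
    """Yield (read_id, list_of_body_lines) for each '>' record."""
    read_id, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, body
            # Keep only the first token; the rest of the header is not parsed.
            read_id, body = line[1:].split()[0], []
        else:
            body.append(line)
    if read_id is not None:
        yield read_id, body


def pair_reads(fna_lines, qual_lines):
    """Return {read_id: (sequence, [scores])}, checking that lengths agree."""
    quals = {
        rid: [int(q) for part in body for q in part.split()]
        for rid, body in read_fasta_like(qual_lines)
    }
    reads = {}
    for rid, body in read_fasta_like(fna_lines):
        seq = "".join(body)
        scores = quals[rid]
        if len(seq) != len(scores):
            raise ValueError(f"length mismatch for {rid}")
        reads[rid] = (seq, scores)
    return reads
```

Both functions take any iterable of lines, so open file handles can be passed directly, for example `pair_reads(open("reads.fna"), open("reads.qual"))` with hypothetical file names.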

Quality Filtering

  • Shotgun vs. Amplicon pipeline defaults
  • Keypass (read rejecting):
    • Checks for key sequence
    • ~20% rejection expected
  • Dot (read rejecting):
    • Checks for too many negative flows.
    • 3 successive negative flows or N>5% of last positive flow.
  • Mix (read rejecting):
    • Checks for too many positive flows.
    • Indication of more than one sequence on the bead.
    • Rejects reads with >70% positive flows.
  • Signal Intensity (read trimming):
    • Trims the read until <3% of its flows are borderline.
  • Primer (read rejecting):
    • Discard overamplified short sequences.
  • Valley (read rejecting):
    • Discard scaled sum scores that are too close to the valleys between base count decision points.
    • Amplicon default: 4/700 (0.57%)
    • Shotgun default: 4/320 (1.25%)
  • Trim back (read trimming):
    • Like valley, except trims instead of discarding until ratio is acceptable.
    • Amplicon default: off
    • Shotgun default: on
  • Quality score trim (read trimming):
    • 40-base window: if the error rate in the window is >1%, trim a base.
    • If fewer than 40 bases remain, the sequence is thrown away.
    • (Even unfiltered, quality scores will reflect low quality areas)
  • SFFtools can also perform screen-trimming with a screening db for known contaminants.
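
Three of the filters above are simple enough to sketch. The thresholds (3 successive negative flows, 70% positive flows, 40-base window, 1% error) come from the talk. Everything else is an assumption for illustration: the 0.5 cutoff for a "positive" flow, the Phred-style conversion from score to error probability, and trimming from the end of the read. The second dot criterion (N>5%) is left out because these notes do not define it clearly. This is not the GS RunProcessor implementation.

```python
# Minimal sketch of the dot, mix, and quality-score-trim filters.
# Assumed, not from the talk: POSITIVE_CUTOFF, Phred-style error
# probabilities, and trimming from the end of the read.

POSITIVE_CUTOFF = 0.5  # assumed: normalized signal >= 0.5 counts as positive


def fails_dot(flows, max_negative_run=3):
    """Dot filter: reject a read with 3 successive negative flows."""
    run = 0
    for signal in flows:
        run = run + 1 if signal < POSITIVE_CUTOFF else 0
        if run >= max_negative_run:
            return True
    return False


def fails_mix(flows, max_positive_fraction=0.70):
    """Mix filter: reject a read in which more than 70% of flows are positive."""
    if not flows:
        return False
    positive = sum(1 for signal in flows if signal >= POSITIVE_CUTOFF)
    return positive / len(flows) > max_positive_fraction


def quality_trim(scores, window=40, max_error=0.01):
    """Return how many bases to keep, or None if the read is discarded.

    Bases are dropped from the end until the last `window` bases have a
    mean expected error of at most `max_error`.
    """
    errors = [10 ** (-q / 10) for q in scores]  # assumed Phred-style scores
    end = len(errors)
    while end >= window:
        if sum(errors[end - window:end]) / window <= max_error:
            return end
        end -= 1
    return None  # fewer than `window` bases left: throw the read away
```

For example, a read of fifty Q30 bases followed by ten Q5 bases is cut back until only one Q5 base is left in the final window, while a read that is Q5 throughout is discarded.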

A good run: expect a read-length mode of ~500 bases and a mean >300 bases. About 50% of reads should pass the filters.
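
A quick way to compare a run against this rule of thumb is to compute the same three numbers from the lengths of the reads that passed filtering. The sketch below is hypothetical: the function name, the tolerance of 50 bases around the mode, and the 45% lower bound on the pass rate are assumptions, since the talk only gave approximate targets.

```python
from statistics import mean, mode


def summarize_run(passed_lengths, total_reads):
    """Summarize the lengths of reads that passed filtering."""
    summary = {
        "mode_length": mode(passed_lengths),
        "mean_length": mean(passed_lengths),
        "pass_fraction": len(passed_lengths) / total_reads,
    }
    # Rule of thumb from the talk; the tolerances below are assumptions.
    summary["looks_good"] = (
        abs(summary["mode_length"] - 500) <= 50
        and summary["mean_length"] > 300
        and summary["pass_fraction"] >= 0.45
    )
    return summary
```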

de Novo Assembly

  • All assemblers use .sff files.
  • 3, 8, and 20 kb paired-end libraries.
  • Can also accept fasta/qual reads.
  • 15-25X coverage is best.
    • Low coverage: poor contig building
    • High coverage: may cause contig breaks
  • RAM usage: 4 bytes per read base.
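
The coverage target and the RAM figure lend themselves to a back-of-the-envelope calculation. In the sketch below, the 4 bytes per base and the 15-25X range come from the talk; the function names, the 400-base mean read length, and the 20 Mb genome are hypothetical values chosen for the example.

```python
import math


def coverage(num_reads, mean_read_length, genome_size):
    """Average coverage: total read bases divided by genome size."""
    return num_reads * mean_read_length / genome_size


def assembly_ram_bytes(num_reads, mean_read_length, bytes_per_base=4):
    """Rough RAM estimate using the 4 bytes per read base quoted in the talk."""
    return num_reads * mean_read_length * bytes_per_base


def reads_for_coverage(target, mean_read_length, genome_size):
    """Number of reads needed to reach a target coverage."""
    return math.ceil(target * genome_size / mean_read_length)


# Hypothetical example: 1 million reads of 400 bases on a 20 Mb genome.
print(coverage(1_000_000, 400, 20_000_000))          # 20.0 (within 15-25X)
print(assembly_ram_bytes(1_000_000, 400) / 1e9)      # 1.6 (GB)
print(reads_for_coverage(15, 400, 20_000_000))       # 750000
```
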
lecture_notes/04-27-2011.txt · Last modified: 2011/05/02 04:00 by eyliaw