454 Presentation by Teri Mueller

This talk was hosted by Roche and gave a general overview of the capabilities of the 454 platform and its bundled software. The 454 website gives access to FLX updates and documentation; register with a UCSC account only.

Pre-talk comment: The GUI is accessible by X11 or VNC.

454 Characteristics

1 bead/1 DNA fragment: filter will try to remove beads with >1 fragment (~20%)
200 cycles per FLX Titanium run. New machine does 400.
1 flow: a single base. The flow is the unit of measure for pyrosequencing, since bases get added flow by flow.
Amount of light is proportional to the number of bases added in that flow.
The first 4 bases of each sequence are the key sequence. The first 3 are used to normalize the amount of light for 1 base.
Library key sequences: TCAG (standard) / GATC (rapid). Do not mix them! There are also control sequences (did not catch them).
1 million reads per 2-well plate. Lane masks will decrease the number of reads.
Quality statistics can be viewed in the BaseQualityMetrics and QualityFilterMetrics files.
Runs take 10 hours. Processing can take up to 80 hours on the default computer; processing on a cluster shortens this.
37 GB per run (~28 GB of raw images)
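
To make the key-sequence normalization concrete, here is a minimal sketch of how flow signals might be scaled and turned into bases. It is not the GS RunProcessor algorithm. The flow order (TACG, repeated cyclically), the function name and the example signal values are assumptions for illustration only. One plausible reason for using only the first 3 key bases, also an assumption here, is that the 4th key base can merge with an identical first base of the insert and give a signal of 2 or more.

```python
KEY = "TCAG"         # standard library key, from the notes
FLOW_ORDER = "TACG"  # assumed nucleotide flow order, repeated cyclically


def normalize_and_call(signals, key=KEY, flow_order=FLOW_ORDER):
    """Scale raw flow signals so one base is about 1.0, then call bases."""
    # Locate the flows in which each key base is incorporated.
    key_flows = []
    k = 0
    for i in range(len(signals)):
        if k == len(key):
            break
        if flow_order[i % len(flow_order)] == key[k]:
            key_flows.append(i)
            k += 1
    if len(key_flows) < 3:
        raise ValueError("not enough flows to cover the key sequence")

    # Per the notes, the first 3 key bases define the light for one base.
    unit = sum(signals[i] for i in key_flows[:3]) / 3
    normalized = [s / unit for s in signals]

    # Amount of light ~ number of bases: round to a homopolymer length.
    bases = []
    for i, s in enumerate(normalized):
        count = max(round(s), 0)
        bases.append(flow_order[i % len(flow_order)] * count)
    return normalized, "".join(bases)


# Made-up raw signals for ten flows (T, A, C, G, T, A, C, G, T, A).
example_signals = [200, 10, 210, 5, 0, 190, 8, 400, 610, 0]
example_normalized, example_sequence = normalize_and_call(example_signals)
print(example_sequence)
```

Rounding to the nearest integer is the simplest possible base caller. Signals that land near x.5 are the ambiguous cases that the valley and trim-back filters under Quality Filtering are meant to deal with.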

Software

Amplicon Variant Analyzer: for specific region analysis
GS Assembler: de Novo or Reference Mapper
GS Reporter
GS RunProcessor - Image/signal processing
- Formats:
- .cwf: raw image
- .fna: fasta sequence file, header has a specific format.
- .qual: quality scores (like Phred, but offset)
- .sff: standard 454 format. <sfffile -s> will split plate into runs.
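
As a rough illustration of how the .fna and .qual files relate, the sketch below pairs each sequence with its quality scores by read ID. It assumes the .qual file uses FASTA-style headers followed by whitespace-separated integer scores, one per base. The read ID and values are invented, the 454-specific header fields are not parsed, and no score offset is applied because the notes do not say what the offset is.

```python
def read_records(lines):
    """Yield (header, body) pairs from FASTA-style lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, " ".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, " ".join(chunks)


def pair_fna_qual(fna_lines, qual_lines):
    """Return {read_id: (sequence, [scores])}, checking that lengths agree."""
    seqs = {}
    for header, body in read_records(fna_lines):
        seqs[header.split()[0]] = body.replace(" ", "")
    reads = {}
    for header, body in read_records(qual_lines):
        read_id = header.split()[0]
        scores = [int(x) for x in body.split()]
        seq = seqs[read_id]
        if len(seq) != len(scores):
            raise ValueError(f"length mismatch for {read_id}")
        reads[read_id] = (seq, scores)
    return reads


# Hypothetical file contents; a real run would read these from disk.
example_fna = [">read1 length=8", "TCAG", "GTTT"]
example_qual = [">read1 length=8", "40 40 38 37", "30 28 25 20"]
example_reads = pair_fna_qual(example_fna, example_qual)
print(example_reads["read1"])
```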

Quality Filtering

Shotgun vs. Amplicon pipeline defaults
Keypass (read rejecting):
- Checks for key sequence
- ~20% rejection expected
Dot (read rejecting):
- Checks for too many negative flows.
- 3 successive negative flows or N>5% of last positive flow.
Mix (read rejecting):
- Checks for too many positive flows.
- Indication of more than one sequence in bead.
- Rejects reads with >70% positive flows.
Signal Intensity (read trimming):
- Reduces the size of the read until <3% of its flows are borderline.
Primer (read rejecting):
- Discard overamplified short sequences.
Valley (read rejecting):
- Discard scaled sum scores that are too close to the valleys between base count decision points.
- Amplicon default: 4/700 (0.57%)
- Shotgun default: 4/320 (1.25%)
Trim back (read trimming):
- Like valley, except trims instead of discarding until ratio is acceptable.
- Amplicon default: off
- Shotgun default: on
Quality score trim (read trimming):
- 40-base window: if the error rate in the window is >1%, trim a base.
- If fewer than 40 bases remain, the sequence is thrown away.
- (Even unfiltered, quality scores will reflect low quality areas)
SFFtools can also perform screen-trimming with a screening db for known contaminants.
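
The quality score trim described above can be sketched roughly as follows. This is one interpretation of the notes, not the actual 454 implementation: it assumes scores are Phred-like (error probability 10^(-Q/10)), that the window sits at the 3' end of the read, and that bases are trimmed one at a time from that end. The function names are made up for illustration.

```python
def error_prob(q):
    """Convert a Phred-like score to an error probability."""
    return 10 ** (-q / 10)


def quality_trim(scores, window=40, max_error=0.01):
    """Return how many leading bases to keep, or 0 to discard the read.

    Trims one base at a time from the 3' end until the mean error
    probability of the last `window` bases is at most `max_error`.
    Reads that end up shorter than the window are discarded.
    """
    end = len(scores)
    while end >= window:
        tail = scores[end - window:end]
        mean_error = sum(error_prob(q) for q in tail) / window
        if mean_error <= max_error:
            return end
        end -= 1  # trim a base and try again
    return 0


# Made-up read: 50 good bases followed by 10 poor ones.
example_scores = [30] * 50 + [5] * 10
example_keep = quality_trim(example_scores)
print(example_keep, "of", len(example_scores), "bases kept")
```

Averaging error probabilities rather than raw scores matters here: a handful of very low scores can dominate a window even when most of its bases are good.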

A good run: expect a read length mode ~500 and mean >300. ~50% should pass filters.
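
A quick way to compare a run against these rules of thumb is to summarize the read lengths that passed filtering. The sketch below is only an illustration; the function names, the tolerance around the mode, the minimum pass fraction and the example numbers are all assumptions, not values from the talk.

```python
from statistics import mean, mode


def summarize_run(passed_lengths, total_reads):
    """Summarize read lengths of reads that passed the filters."""
    return {
        "mode": mode(passed_lengths),
        "mean": mean(passed_lengths),
        "pass_fraction": len(passed_lengths) / total_reads,
    }


def looks_like_good_run(summary, mode_target=500, mode_tolerance=50,
                        min_mean=300, min_pass=0.45):
    """Apply the rules of thumb; the tolerances are hypothetical."""
    return (
        abs(summary["mode"] - mode_target) <= mode_tolerance
        and summary["mean"] > min_mean
        and summary["pass_fraction"] >= min_pass
    )


# Made-up example: 6 of 12 reads passed filtering.
example_summary = summarize_run([500, 500, 480, 120, 510, 500], total_reads=12)
print(example_summary, looks_like_good_run(example_summary))
```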

de Novo Assembly

All assemblers use .sff files.
Supports 3, 8 and 20 kb paired-end libraries.
Can accept fasta/qual reads.
15-25X coverage best.
- Low coverage: poor contig building
- High coverage: may cause contig breaks
4 bytes per read base in RAM.
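
The coverage and memory figures above lend themselves to a back-of-the-envelope calculation. The sketch below uses the usual definition of coverage (total read bases divided by genome size) and the 4 bytes per read base from the notes. The read count, mean read length and genome size in the example are hypothetical.

```python
def coverage(n_reads, mean_read_length, genome_size):
    """Expected fold coverage: total read bases / genome size."""
    return n_reads * mean_read_length / genome_size


def assembly_ram_bytes(n_reads, mean_read_length, bytes_per_base=4):
    """Rough RAM estimate from the 4 bytes per read base rule."""
    return n_reads * mean_read_length * bytes_per_base


# Hypothetical project: 1 million reads of ~400 bases, 20 Mb genome.
example_cov = coverage(1_000_000, 400, 20_000_000)
example_ram = assembly_ram_bytes(1_000_000, 400)
print(f"{example_cov:.0f}X coverage, about {example_ram / 1e9:.1f} GB of RAM")
```

With these made-up numbers a single plate would fall inside the 15-25X range; a much larger genome would need proportionally more runs, and the RAM estimate grows with the total number of read bases, not with genome size.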
