lecture_notes:05-13-2015

SOAPdenovo2 Update

Second pass at adapter trimming

  • Trimmed adapters with Skewer on the initial datasets SW019_S1/S2 and SW018_S1, i.e. all the read sets that were sequenced at UCSC.
  • Used FastQC to confirm adapter presence in the reads, primarily in the SW018 reads, where an overrepresented sequence matched the adapter sequence.
  • Confirmed that the trimming was effective with a second FastQC run.
  • Compared the Skewer output to SeqPrep output (without merging) and found the two nearly identical.

Second pass Musket error correction

  • Used the second-pass adapter-trimmed data and a k-mer size of 31.
  • Observed no large differences in the k-mer spectra compared to the previous Musket run.
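
For readers unfamiliar with the term, a k-mer spectrum is a histogram of how many distinct k-mers occur once, twice, three times, and so on in the reads. The sketch below is only an illustration of that idea, not the tool used for these runs. The function name kmer_spectrum and the toy reads are hypothetical, and the sketch counts k-mers as written, without merging a k-mer with its reverse complement.

```python
from collections import Counter

def kmer_spectrum(reads, k=31):
    """Return {multiplicity: number of distinct k-mers seen that many times}.

    Assumption: k-mers are counted as written (no reverse-complement merging).
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    # Histogram of the per-k-mer counts
    return Counter(counts.values())

# Toy example with k=3 so the result is easy to check by hand:
# both reads contain ACG, CGT, GTA and TAC, so four k-mers occur twice each.
toy_reads = ["ACGTAC", "CGTACG"]
spectrum = kmer_spectrum(toy_reads, k=3)
print(dict(spectrum))
```

Comparing two error-correction runs then amounts to comparing two such histograms. With real data the low-multiplicity end of the spectrum is the part that error correction is expected to change.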

Assembly runs

Runtime stats

  • Did two runs, both with the adapter-trimmed, error-corrected reads described above. The first run mis-specified the orientation of one library (reads were specified as <-- --> instead of --> <--); the second run corrected this error in the config file.
  • Each run was split into four steps: sparse pregraph construction, contig generation, mapping, and scaffold generation.
  • Sparse pregraph: 37.8 CPU hours and 60 GB max virtual memory for 1st run || 35.4 CPU hours and 60 GB max virtual memory for 2nd run
  • Contig generation: 0.23 CPU hours and 5.8 GB max virtual memory for 1st run || 0.24 CPU hours and 5.8 GB max virtual memory for 2nd run
  • Mapping: 21.9 CPU hours and 66 GB max virtual memory for 1st run || 14.8 CPU hours and 78 GB max virtual memory for 2nd run
  • Scaffold generation: 92.7 CPU hours and 20 GB max virtual memory for 1st run || still in progress for 2nd run
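
To make the per-step figures easier to compare, the CPU hours can be summed per run. This is a small sketch using only the numbers listed above; the names cpu_hours and total_cpu_hours are made up for the illustration, and None marks the step that had not finished.

```python
# CPU hours per step as (1st run, 2nd run), copied from the list above.
# None marks a step that was still in progress when these notes were written.
cpu_hours = {
    "sparse pregraph": (37.8, 35.4),
    "contig generation": (0.23, 0.24),
    "mapping": (21.9, 14.8),
    "scaffold generation": (92.7, None),
}

def total_cpu_hours(steps, run_index):
    """Sum the CPU hours of all finished steps for one run (0 = 1st, 1 = 2nd)."""
    finished = [hours[run_index] for hours in steps.values()
                if hours[run_index] is not None]
    return round(sum(finished), 2)

run1_total = total_cpu_hours(cpu_hours, 0)    # all four steps
run2_partial = total_cpu_hours(cpu_hours, 1)  # first three steps only
print(run1_total, run2_partial)
```

Scaffold generation accounts for well over half of the CPU time of the first run, so the partial total for the second run is not yet comparable to the full total for the first.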

Results from run 1

Contigs:
  • Total contig sequence size : 2,051,251,797
  • Contig count : 3,854,379
  • Mean length : 532
  • Longest sequence : 22,512
  • N50 : 1,425 ; count : 389,550
  • length > 1K : 583,671 (15.14%)
  • length > 10K : 891 (0.02%)
Scaffolds:
  • Total assembly size (including ‘N’s) : 2,064,665,199
  • Total assembly size (without ‘N’s) : 1,974,393,478
  • Scaffold count : 2,030,303
  • Mean length : 1,016
  • Longest sequence : 60,333
  • N50 : 5,554 ; count : 105,217
  • length > 1K : 381,668 (18.80%)
  • length > 10K : 35,884 (1.77%)
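
As a reminder of what the figures above mean, the sketch below computes the same kind of summary from a list of sequence lengths. It assumes the usual definition of N50 (the length L such that sequences of length L or longer contain at least half of the total bases) and assumes that the reported "count" is the number of sequences needed to reach that point; the script that produced the numbers above may differ in details. The function names and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return (N50, count) for a non-empty list of sequence lengths.

    Assumption: count is the number of longest sequences needed to cover
    at least half of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")

def summarize(lengths, thresholds=(1_000, 10_000)):
    """Summary statistics in the same shape as the lists above."""
    n50_length, n50_count = n50(lengths)
    summary = {
        "total": sum(lengths),
        "count": len(lengths),
        "mean": sum(lengths) // len(lengths),  # integer mean, rounded down
        "longest": max(lengths),
        "n50": n50_length,
        "n50_count": n50_count,
    }
    for t in thresholds:
        above = sum(1 for x in lengths if x > t)
        # Percentage is relative to the number of sequences, not to total bases
        summary[f"longer_than_{t}"] = (above, round(100 * above / len(lengths), 2))
    return summary

# Toy data, not taken from the assembly
toy_lengths = [500, 1500, 3000, 12000]
stats = summarize(toy_lengths)
print(stats)
```

The percentages in the lists above are consistent with this convention: 583,671 contigs out of 3,854,379 is 15.14% of the contig count.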
lecture_notes/05-13-2015.txt · Last modified: 2015/05/13 12:13 by calef