lecture_notes:05-13-2015
SOAPdenovo2 Update
Second pass at adapter trimming
Trimming was performed with Skewer on the initial datasets SW019_S1/S2 and SW018_S1, i.e. all the read sets that were sequenced at UCSC.
Used FastQC to confirm adapter presence in the reads, primarily the SW018 reads, where an overrepresented sequence matched the adapter sequence.
Confirmed that adapter trimming was effective with another FastQC run.
Compared Skewer results to results from SeqPrep (without merging) and found them nearly identical.
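The notes do not say how the Skewer and SeqPrep outputs were compared. One simple way, sketched here as an assumption rather than the method actually used, is to compare the read-length histograms of the two trimmed FASTQ files. The function names and the tiny in-memory records are hypothetical.

```python
from collections import Counter
from io import StringIO

def read_lengths(fastq_handle):
    """Return a Counter mapping read length -> number of reads in a FASTQ stream."""
    lengths = Counter()
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # second line of each 4-line FASTQ record is the sequence
            lengths[len(line.strip())] += 1
    return lengths

def fraction_differing(a, b):
    """Half the L1 distance between two length histograms, normalised by read count."""
    total = max(sum(a.values()), sum(b.values()))
    diff = sum(abs(a[k] - b[k]) for k in set(a) | set(b))
    return diff / (2 * total) if total else 0.0

# Toy stand-ins for the two trimmers' outputs (hypothetical data).
skewer_fq = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n"
seqprep_fq = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTA\n+\nIIIII\n"

skewer_hist = read_lengths(StringIO(skewer_fq))
seqprep_hist = read_lengths(StringIO(seqprep_fq))
print(fraction_differing(skewer_hist, seqprep_hist))
```

A value near 0 means the two trimmers cut reads to almost the same lengths. For real data the handles would come from open() or gzip.open() on the trimmed files.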
Second pass Musket error correction
Used the second-pass adapter-trimmed data and a k-mer size of 31.
Observed no substantial differences in the k-mer spectra compared to the previous Musket run.
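For reference, a k-mer spectrum is the histogram of how many distinct k-mers occur once, twice, and so on. The sketch below is a minimal illustration, not what Musket does internally. It does not collapse reverse-complement k-mers, and the toy reads and small k are hypothetical.

```python
from collections import Counter

def kmer_spectrum(reads, k=31):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    counts = Counter()
    for read in reads:
        # Slide a window of width k across the read; short reads yield nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())

# Toy example with k=3 so the result is easy to check by hand.
toy_reads = ["ACGTA", "ACGTT"]
toy_spectrum = kmer_spectrum(toy_reads, k=3)
print(sorted(toy_spectrum.items()))  # [(1, 2), (2, 2)]
```

Here ACG and CGT each occur twice, while GTA and GTT each occur once. Two runs can be compared by computing this histogram for each and checking that the bins line up.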
Assembly runs
Runtime stats
Did two runs, both with the adapter-trimmed, error-corrected reads described above. The first run mis-specified the orientation of one library (reads were specified as <-- --> instead of --> <--); the second run corrected this error in the config file.
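In a SOAPdenovo2 config file, library orientation is set per [LIB] section with the reverse_seq key: 0 for forward-reverse (--> <--) and 1 for reverse-forward (<-- -->). The sketch below generates such a section. The notes do not record the actual config, so the file names, insert size, and helper name are hypothetical.

```python
def lib_section(q1, q2, avg_ins, inward_facing=True, rank=1):
    """Build one [LIB] section of a SOAPdenovo2 config file as text."""
    # reverse_seq=0: forward-reverse pairs (--> <--); 1: reverse-forward (<-- -->).
    reverse_seq = 0 if inward_facing else 1
    return "\n".join([
        "[LIB]",
        f"avg_ins={avg_ins}",         # average insert size (hypothetical value below)
        f"reverse_seq={reverse_seq}",
        "asm_flags=3",                # use reads for both contigs and scaffolds
        f"rank={rank}",               # order in which libraries are used for scaffolding
        f"q1={q1}",
        f"q2={q2}",
    ])

# Hypothetical paths and insert size, for illustration only.
corrected = lib_section("lib_R1.fastq", "lib_R2.fastq", avg_ins=400)
misspecified = lib_section("lib_R1.fastq", "lib_R2.fastq", avg_ins=400,
                           inward_facing=False)
print(corrected)
```

Under these assumptions, the difference between the two runs would come down to the single reverse_seq line for the affected library.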
Each run was split into four steps: sparse pregraph construction, contig generation, mapping, and scaffold generation.
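The four steps correspond to the SOAPdenovo2 subcommands sparse_pregraph, contig, map, and scaff. The sketch below only builds the command lines, following the pattern in the SOAPdenovo2 README. The binary name, config path, output prefix, assembly k-mer size, and genome size estimate are all hypothetical, since the notes do not record them.

```python
def soapdenovo_steps(binary, config, prefix, kmer, genome_size):
    """Return the four step-by-step SOAPdenovo2 command lines as argument lists."""
    return [
        # 1. Sparse pregraph construction (-z is the estimated genome size).
        [binary, "sparse_pregraph", "-s", config, "-K", str(kmer),
         "-z", str(genome_size), "-o", prefix],
        # 2. Contig generation from the graph.
        [binary, "contig", "-g", prefix],
        # 3. Mapping reads back onto the contigs.
        [binary, "map", "-s", config, "-g", prefix],
        # 4. Scaffold generation.
        [binary, "scaff", "-g", prefix],
    ]

# All argument values here are placeholders, not the values used in the runs.
steps = soapdenovo_steps("SOAPdenovo-63mer", "assembly.config", "run2",
                         kmer=31, genome_size=2_000_000_000)
for cmd in steps:
    print(" ".join(cmd))
```

Each list could be passed to subprocess.run(cmd, check=True). Running the steps separately is what makes per-step CPU and memory accounting possible.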
Sparse pregraph: 37.8 CPU hours and 60 GB max virtual memory for 1st run || 35.4 CPU hours and 60 GB max virtual memory for 2nd run
Contig generation: 0.23 CPU hours and 5.8 GB max virtual memory for 1st run || 0.24 CPU hours and 5.8 GB max virtual memory for 2nd run
Mapping: 21.9 CPU hours and 66 GB max virtual memory for 1st run || 14.8 CPU hours and 78 GB max virtual memory for 2nd run
Scaffold generation: 92.7 CPU hours and 20 GB max virtual memory for 1st run || Still in progress for 2nd run
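The per-step figures above can be totalled as a quick sanity check. This is just arithmetic on the reported numbers; the dictionary keys are labels chosen for this sketch.

```python
# CPU hours per step, copied from the runtime stats above.
run1 = {"sparse_pregraph": 37.8, "contig": 0.23, "map": 21.9, "scaffold": 92.7}
run2 = {"sparse_pregraph": 35.4, "contig": 0.24, "map": 14.8}  # scaffold still running

def total_hours(run, steps):
    """Sum CPU hours over the given steps, rounded to two decimals."""
    return round(sum(run[step] for step in steps), 2)

# Only the steps finished in both runs can be compared directly.
finished_in_both = [step for step in run1 if step in run2]

run1_total = total_hours(run1, run1)                  # all four steps
run1_shared = total_hours(run1, finished_in_both)     # first three steps
run2_shared = total_hours(run2, finished_in_both)
scaffold_share = round(100 * run1["scaffold"] / run1_total, 1)

print(run1_total, run1_shared, run2_shared, scaffold_share)
```

Run 1 comes to about 152.6 CPU hours in total, of which scaffold generation is roughly 61%. Over the three finished steps, run 2 used about 50.4 CPU hours against 59.9 for run 1, with most of the difference in mapping.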
Results from run 1
Contigs:
Total contig sequence size : 2,051,251,797
Contig count : 3,854,379
Mean length : 532
Longest sequence : 22,512
N50 : 1,425 ; count : 389,550
length > 1K : 583,671 (15.14%)
length > 10K : 891 (0.02%)
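As a reminder of what the N50 line means: sort the sequences from longest to shortest and accumulate lengths until half the total is reached; the length at that point is the N50. The sketch below assumes the reported "count" is the number of sequences needed to reach that point. The lengths in the example are toy values, not assembly data.

```python
def n50(lengths):
    """Return (N50, count): the length at which the running sum of
    lengths, taken longest first, reaches half the total, and how many
    sequences were needed to get there."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    return 0, 0  # empty input

# Toy lengths: total is 54, so half is 27, reached after 10 + 9 + 8.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))  # (8, 3)
```

On the real assembly the input would be the list of contig (or scaffold) lengths read from the FASTA output.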
Scaffolds:
Total assembly size (including ‘N’s) : 2,064,665,199
Total assembly size (without ‘N’s) : 1,974,393,478
Scaffold count : 2,030,303
Mean length : 1,016
Longest sequence : 60,333
N50 : 5,554 ; count : 105,217
length > 1K : 381,668 (18.80%)
length > 10K : 35,884 (1.77%)
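A few derived figures follow directly from the scaffold numbers above, such as how much of the assembly is gap ('N') sequence. The sketch only redoes arithmetic on the reported values; the variable names are chosen for this example.

```python
# Scaffold statistics copied from the run 1 results above.
total_with_n = 2_064_665_199
total_without_n = 1_974_393_478
scaffold_count = 2_030_303
over_1k = 381_668
over_10k = 35_884

# Bases that are 'N' placeholders introduced during scaffolding.
gap_bases = total_with_n - total_without_n
gap_pct = round(100 * gap_bases / total_with_n, 2)

# Cross-check the reported mean length and percentages.
mean_length = total_with_n // scaffold_count
pct_over_1k = round(100 * over_1k / scaffold_count, 2)
pct_over_10k = round(100 * over_10k / scaffold_count, 2)

print(gap_bases, gap_pct, mean_length, pct_over_1k, pct_over_10k)
```

This gives 90,271,721 gap bases, about 4.37% of the scaffolded assembly. The recomputed mean length and percentages agree with the reported values.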
lecture_notes/05-13-2015.txt · Last modified: 2015/05/13 19:13 by calef