data_overview:data_overview

Differences

This shows you the differences between two versions of the page.

data_overview:data_overview [2015/07/15 19:57]
ceisenhart Major changes under way, please let me finish before doing more
data_overview:data_overview [2015/07/28 06:28]
ceisenhart ↷ Links adapted because of a move operation
Line 1: Line 1:
 ====== Data ====== ​ ====== Data ====== ​
  
-==== 2015 Data ====+===== 2015 Data ====
 + 
 +The raw data locations are listed below. However, many of these files have been processed through Skewer/FastUniq and a variety of other programs. To see and download these 'processed' files, please view the data set page. 
 | Data set | Description | Location |  | Data set | Description | Location | 
-|[[MiSeq data| MiSeq data SW019_S1_L001]] | 2x300bp reads from a single MiSeq lane | +| [[data_overview::​2015::​MiSeq data| MiSeq data SW019_S1_L001]] | 2x300bp reads from a single MiSeq lane | http://​genome-test.cse.ucsc.edu/​~charles/​BME235_backup/​Data/​2015/​MiSeq-data-SW019_S1_L001 ​
-| [[HiSeq data 2| HiSeq data SW018_S1_L007]] | 2x100bp reads from a single HiSeq lane with 597bp insert size | +| [[data_overview:​2015:​hiseq_sw018_s1| HiSeq data SW018_S1_L007]] | 2x100bp reads from a single HiSeq lane with 597bp insert size | http://​genome-test.cse.ucsc.edu/​~charles/​BME235_backup/​Data/​2015/​HiSeq-data-SW018_S1_L007 ​
-| [[HiSeq data 1| HiSeq data SW019_S2_L008]] | 2x100bp reads from a single HiSeq lane with 374bp insert size | +| [[data_overview:​2015:​hiseq_sw019_s2| HiSeq data SW019_S2_L008]] | 2x100bp reads from a single HiSeq lane with 374bp insert size |http://​genome-test.cse.ucsc.edu/​~charles/​BME235_backup/​Data/​2015/​HiSeq-data-SW019_S2_L008 ​
-| [[UCSF_BS-MK | UCSF BS-MK data]] | 2x250bp reads with 450-650bp insert size | +| [[data_overview::​2015::​UCSF_BS-MK | UCSF BS-MK data]] | 2x250bp reads with 450-650bp insert size |http://​genome-test.cse.ucsc.edu/​~charles/​BME235_backup/​Data/​2015/​UCSF_BS-MK ​
-| [[UCSF_BS-tag | UCSF BS-tag data]] | 2x250bp reads with 375-575bp insert size | +| [[data_overview::​2015::​UCSF_BS-tag | UCSF BS-tag data]] | 2x250bp reads with 375-575bp insert size |http://​genome-test.cse.ucsc.edu/​~charles/​BME235_backup/​Data/​2015/​UCSF_BS-tag ​
-| [[UCSF_SW018 | UCSF SW018 Data]] | 2x250bp reads from SW018 library | +| [[data_overview::​2015::​UCSF_SW018 | UCSF SW018 Data]] | 2x250bp reads from SW018 library ​|http://​genome-test.cse.ucsc.edu/​~charles/​BME235_backup/​Data/​2015/​UCSF_SW018 ​
-|[[UCSF_SW019 | UCSF SW019 Data]] | 2x250bp reads from SW019 library | +| [[data_overview::​2015::​UCSF_SW019 | UCSF SW019 Data]] | 2x250bp reads from SW019 library ​|http://​genome-test.cse.ucsc.edu/​~charles/​BME235_backup/​Data/​2015/​UCSF_SW019 ​
-| [[Lucigen mate-pair data]] | 2x300bp reads, expected insert size is greater than 1kb | +| [[data_overview::​2015::​Lucigen mate-pair data]] | 2x300bp reads, expected insert size is greater than 1kb |http://​genome-test.cse.ucsc.edu/​~charles/​BME235_backup/​Data/​2015/​Lucigen-mate-pair ​
-|[[ SW041 | SW041 mate-pair data ]] | 2x76bp reads, expected insert size is 3-4kb | +| [[data_overview::​2015::​SW041 | SW041 mate-pair data ]] | 2x76bp reads, expected insert size is 3-4kb |http://​genome-test.cse.ucsc.edu/​~charles/​BME235_backup/​Data/​2015/​SW041-mate-pair ​
-|[[ SW042 | SW042 mate-pair data ]] | 2x76bp reads, expected insert size is 5-6kb | +| [[data_overview::​2015::​SW042 | SW042 mate-pair data ]] | 2x76bp reads, expected insert size is 5-6kb |http://​genome-test.cse.ucsc.edu/​~charles/​BME235_backup/​Data/​2015/​SW042-mate-pair ​
-|[[RNA-Seq | RNA-Seq data]] | Preliminary data generated as of 06/12/15 |  +| [[data_overview::​2015::​RNA-Seq | RNA-Seq data]] | Preliminary data generated as of 06/12/15 | http://​genome-test.cse.ucsc.edu/​~charles/​BME235_backup/​Data/​2015/​RNA_Seq ​|
- +
-==== 2012 Data ==== +
- +
-[[computer_resources:assemblies:mitochondrion| Mitochondrion assembly]] - Generated in 2012 +
  
 ==== Wet-lab procedures ==== ==== Wet-lab procedures ====
  
 The shotgun library preparation protocol used was provided by Steven Weber, ​ The shotgun library preparation protocol used was provided by Steven Weber, ​
-[[Steven Weber'​s Notes on Lab Prep]]+[[data_overview:​steven_weber_s_notes_on_lab_prep]]
  
 {{:​meyer_kircher.pdf|Library Prep Protocol }} {{:​meyer_kircher.pdf|Library Prep Protocol }}
  
  
-=====Analysis of processed data====+==== Analysis of processed data==== ​
-[[data_overview::​kmergenie | kmergenie Output Showing Kmer Distribution ]] +
- +
-This is absolutely disgusting and not even really usable anymore. I am likely going to delete everything below this line.  Really disappointed with the people who put this stuff up, I moaned every step of the way in the comments and in person, and now I am going to be deleting it for the exact reasons I did not like it in the first place.  +
- +
- +
- +
-====PreqC of adapter-trimmed and PCR duplicate-removed data==== +
-The initial datasets were run through Skewer and FastUniq to create PCR duplicate-free and adapter-free files. The libraries were condensed, so the MiSeq and HiSeq 19 libraries are now condensed. The information and locations for these data are embedded in the library data pages. PreqC was run on all these data combined. +
- +
- +
-{{::​pooleddatapreqcresults.pdf|}} +
- +
-{{ :alldataprocessedpreqcestdupper.png |}}Note that the PCR duplication cannot be directly compared to the unprocessed data, since the unprocessed data was run separately for each library. By weighting the libraries by file size, a weighted PCR duplication percentage for all the unprocessed files was calculated to be roughly 2.7%. This number can be directly compared to the percent duplication in the results above. It seems that the preprocessing removed just under half of the total duplicates.+
  
-{{ :alldataprocessedpreqcestkmersize.png |}}Additionally the ideal K-mer size for these data is longer than for the unprocessed data. This graph shows the ideal K-mer size to be around 75 bases. This is 15 bases longer than previous estimates. +[[data_overview::2015::analysis::kmergenie | kmergenie Output Showing Kmer Distribution ]]
  
-====FastQC ​of adapter-trimmed and PCR duplicate-removed data==== +[[data_overview::​2015::​analysis::​preqc | PreqC of adapter-trimmed and PCR duplicate-removed data ]]
-After removing adapters and PCR duplicates, we ran FastQC on two of the libraries. In general, read quality decreases at the last base positions. Also, read 2 of the SW019 library shows problems in the per-tile sequence quality. Below are the PDF files with the FastQC results for the PCR duplicate-removed and adapter-removed libraries. The protocol we used to run FastQC is described at this link: [[fastqc:fastqc]].
  
-{{:sw018_adaptertrimmed_dup..._r1.pdf| SW018_R1}}+[[data_overview::2015::analysis::fastQC | FastQC of adapter-trimmed and PCR duplicate-removed data ]] 
 +===== 2011 Data =====
  
-{{:​sw018_adaptertrimmed_dup..._r2.pdf| SW018_R2}}+===== 2010 Data =====
  
-{{:​sw019_adaptertrimmed_dup..._r1.pdf| SW019_R1}} 
  
-{{:​sw019_adaptertrimmed_dup..._r2.pdf| SW019_R2}} 
  
  
data_overview/data_overview.txt · Last modified: 2015/07/28 06:29 by ceisenhart