Why a mate-pair library? Long-distance information to span contig gaps

Problems:
	* low-complexity data
	* Lucigen mate-pair kit is not very user friendly
	* only 5-10% of your DNA gets circularized like it's supposed to
	* only 10% of circular DNA contains junctions
	* less than 1 ng of every µg is actually usable data
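
As a rough check on these numbers, the two quoted efficiencies can be multiplied out. This is only a sketch: it assumes the two steps are independent and ignores every other loss in the protocol. Those two steps alone leave about 5-10 ng per µg, so the sub-1 ng figure presumably reflects further losses that these notes don't itemize.

```python
# Rough yield estimate from the two efficiencies quoted in the notes.
# Assumption: the steps are independent and nothing else is lost.
input_ng = 1000.0  # 1 µg of starting DNA, expressed in ng

circularized_fraction = (0.05, 0.10)  # 5-10% circularizes as intended
junction_fraction = 0.10              # 10% of circles contain a junction

low_ng = input_ng * circularized_fraction[0] * junction_fraction
high_ng = input_ng * circularized_fraction[1] * junction_fraction

print(f"{low_ng:.0f}-{high_ng:.0f} ng of junction-containing DNA per µg")
```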

How to compensate:
	* start with lots of DNA
	* more efficient molecular biology
	* use Tn5 (from Chris Vollmers)
		* recognizes and loads specific sequence (an adapter)
		* cuts the DNA and ligates an adapter in the same step
		* very efficient - all sheared DNA has adapters
		* add a biotinylated linker to the end of the adapters
		* 2x75 data

Other Issues:
	* Lots of linker near the beginning of the reads
	* those reads need to be filtered out
	* We want AT LEAST 30bp of non-linker at the beginning

	* For Tn5 data, linker sequence is more likely to be farther into the read
	* That's a good thing! Almost always have at least 30bp before linker
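
The 30bp rule can be sketched as a small filter. This is a minimal illustration, not the pipeline used in the talk: `LINKER` is a made-up placeholder sequence, `trim_at_linker` is a hypothetical helper name, and it only finds exact matches (a real trimmer would tolerate sequencing errors).

```python
LINKER = "GATTACAGATTACA"  # hypothetical placeholder, NOT a real kit linker
MIN_NON_LINKER = 30        # from the notes: at least 30bp before the linker


def trim_at_linker(read, linker=LINKER, min_len=MIN_NON_LINKER):
    """Return the bases before the first exact linker match.

    Returns None if the linker starts before `min_len` bases (read is dropped).
    Returns the read unchanged if no linker is found.
    """
    pos = read.find(linker)
    if pos == -1:
        return read          # no linker seen; handled as a separate question
    if pos < min_len:
        return None          # too little non-linker sequence to be useful
    return read[:pos]


reads = [
    "C" * 35 + LINKER + "T" * 5,   # linker at position 35: keep first 35bp
    "C" * 10 + LINKER + "T" * 30,  # linker at position 10: filtered out
    "C" * 50,                      # no linker: passed through untouched
]
kept = [t for t in (trim_at_linker(r) for r in reads) if t is not None]
```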

What to do about reads where you don't see any linker?
	* might want to throw them out because we're not confident that they're actually mate pairs
	* but that throws out tons of data if your reads are shorter than 2x300
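
One way to make this tradeoff explicit is to label each pair and leave the keep/discard decision as a switch. A minimal sketch, assuming exact linker matching; `LINKER`, `classify_pair`, `filter_pairs`, and the `keep_unconfirmed` flag are all hypothetical names for illustration.

```python
LINKER = "GATTACAGATTACA"  # hypothetical placeholder, NOT a real kit linker


def classify_pair(read1, read2, linker=LINKER):
    """Label a pair 'confirmed' if the linker is seen in either read."""
    if linker in read1 or linker in read2:
        return "confirmed"
    return "unconfirmed"  # may or may not be a true mate pair


def filter_pairs(pairs, linker=LINKER, keep_unconfirmed=False):
    """Keep confirmed pairs; keep unconfirmed ones only if asked to."""
    kept = []
    for read1, read2 in pairs:
        label = classify_pair(read1, read2, linker)
        if label == "confirmed" or keep_unconfirmed:
            kept.append((read1, read2))
    return kept


pairs = [
    ("C" * 20 + LINKER, "G" * 34),  # linker seen in read 1
    ("C" * 34, "G" * 34),           # no linker in either read
]
strict = filter_pairs(pairs)                          # confident, but less data
lenient = filter_pairs(pairs, keep_unconfirmed=True)  # more data, less certain
```

Counting how many pairs fall into each label on a real run would show how much data the strict setting costs at a given read length.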

How to avoid chimeric circular DNA?
	* can't run it on a gel (circular DNA smears)
	* adjust insert size to ~4kb - chimeras are large and unlikely to circularize properly

2x75 data with a long linker (60bp): we probably won't read all the way through the linker, but we'll see bits of it.
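
The arithmetic behind this: a 75bp read with at least 30bp of non-linker sequence can show at most 45bp of a 60bp linker, so a search for the full linker never matches. A partial linker at the read end can still be detected by looking for the longest read suffix that equals a linker prefix. The sketch below assumes exact matching; `LINKER` is a made-up 60bp placeholder, and the `min_overlap` of 8 is an arbitrary illustrative threshold.

```python
LINKER = "ACGTTGCA" * 7 + "ACGT"  # hypothetical 60bp placeholder linker


def linker_overlap_at_end(read, linker=LINKER, min_overlap=8):
    """Length of the longest read suffix equal to a linker prefix.

    Returns 0 if no overlap of at least `min_overlap` bases is found.
    """
    max_len = min(len(read), len(linker))
    for k in range(max_len, min_overlap - 1, -1):  # try longest overlap first
        if read.endswith(linker[:k]):
            return k
    return 0


# A 75bp read: 55bp of non-linker sequence, then the first 20bp of the linker.
read = "C" * 55 + LINKER[:20]
overlap = linker_overlap_at_end(read)
non_linker = read[: len(read) - overlap]  # sequence before the partial linker
```

Very short overlaps match by chance, which is why a minimum overlap is needed; the right threshold depends on the actual linker sequence.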