Why a mate-pair library? Long-distance information to span contig gaps
Problems:
low-complexity data
Lucigen mate-pair kit is not very user friendly
only 5-10% of your DNA gets circularized like it's supposed to
only 10% of circular DNA contains junctions
less than 1 ng of every µg of DNA is actually usable data
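A quick back-of-the-envelope check of the yield figures above, as a minimal sketch. The function name and the 1000 ng input are illustrative assumptions, not part of any kit protocol. Multiplying only the two quoted efficiencies gives a few ng per µg, so the "less than 1 ng" figure presumably includes further losses that these notes do not itemize (that reading is an assumption).

```python
def usable_ng(circularized_fraction, junction_fraction, input_ng=1000.0):
    """Estimate usable DNA (ng) from the two efficiencies quoted in the notes.

    Hypothetical helper: it only multiplies the fractions together and
    ignores every other loss (size selection, cleanup, PCR, filtering).
    """
    return input_ng * circularized_fraction * junction_fraction


# 5-10% of the DNA circularizes; about 10% of circles contain a junction.
low = usable_ng(0.05, 0.10)
high = usable_ng(0.10, 0.10)
print(f"{low:.1f} to {high:.1f} ng usable per microgram, before other losses")
```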
How to compensate:
Other Issues:
Lots of linker near the beginning of the reads
those reads need to be filtered out
We want AT LEAST 30bp of non-linker at the beginning
For Tn5 data, linker sequence is more likely to be farther into the read
That's a good thing! Almost always have at least 30bp before linker
What to do about reads where you don't see any linker?
How to avoid chimeric circular DNA?
2×75 data with long linker (60bp): we'll probably not read all the way through the linker, but we'll see bits of it.
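The linker filter described above (at least 30bp of non-linker at the start of a read, and only bits of the linker visible at the end of a 75bp read) could be sketched roughly as follows. This is a minimal sketch under several assumptions: exact string matching only (real reads need mismatch tolerance), a made-up linker sequence in the usage note, and a hypothetical `min_overlap` of 8bp for calling a partial linker at the read's 3' end. Reads with no linker are only flagged, since what to do with them is left open in these notes.

```python
MIN_PREFIX = 30  # minimum non-linker bases required before the linker


def find_linker_start(read, linker, min_overlap=8):
    """Return the 0-based position where the linker starts, or None.

    Looks for the full linker first, then for a linker prefix that runs
    off the 3' end of the read (the read stops partway into the linker).
    Exact matching only; min_overlap is an assumed threshold.
    """
    idx = read.find(linker)
    if idx != -1:
        return idx
    longest = min(len(read), len(linker) - 1)
    # Try the longest possible partial overlap first.
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(linker[:k]):
            return len(read) - k
    return None


def classify_read(read, linker, min_prefix=MIN_PREFIX):
    """Return (label, sequence) where label is keep, discard, or no_linker."""
    start = find_linker_start(read, linker)
    if start is None:
        return ("no_linker", read)   # open question: handle separately
    if start < min_prefix:
        return ("discard", "")       # linker too close to the read start
    return ("keep", read[:start])    # keep only the bases before the linker
```

For example, with a toy 20bp linker `"GATTACAGGCTTCAGTCCAA"` (a placeholder, not a real kit sequence), a read with 40 bases before the linker is kept and trimmed to those 40 bases, while a read with only 10 bases before the linker is discarded.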