archive:jolespin_virus

I created the contig from these paired-end reads:

/campusdata/BME235/Spring2015Data/SW019_S2_L008_R1_001.fastq 
/campusdata/BME235/Spring2015Data/SW019_S2_L008_R2_001.fastq

Original contig sequence:

>variant-5304/0 4716
TCCACCGCCTTCGTCCAGTGGAGAATTCGTCGATTCAGAACAAGCCAAGAGAAGAAAGACAGATCCTCCTCCTACAACAT
CAACACCAGAGCCGGGTACAGGAACCAGACACGGACTACGTAGTGGAACACCTTTGGTTACTCCGGTTAAACCAAACACA
CCAGCAGCTCCAACAGCTGGGCCTAGTACCAGAACACCACAAAATACACCAGCCGGCTCACCAATGGCAGCATCAGTACC
GGCGGCAAACATGGACACCAGTGGTGCCCCAGGAGGTGATGTAATGCCAGCAGGAGAGGCAGATGCGGCTGGATATCCAG
TACCTGCAGGCATGGCAGGTTCAGGCGGTAATAGGTTTTTCACAGGATTTGGATCCCATACTCAAAAAGAACCAGATGGT
TACAGCTCAGTAACCAGGTCCTACAGCAAGACCTTTCTTGTGCACACCAACTTTGACGGCACTCTAAAAGCACTGATCAA
TATGGAACCTGGGGTCGCAGTACCAACAGGGATTACCAGTTCAGCAAAAGCAGAATGCATTGTGAACCATGGAGGGGTTA
TGATTCCATACATGTATCAAAATTGCTCTACAGACCCTTGGGACTGGAATGTGCCTGATCACTTCATGGGCTGGCAATGC
CAGGAGTATGGCTTTAAGGTGGCAGAAGCCCGCATAGAGACCCTAAACAATGACAAGCCCACACCAGAATCGGTGCCCGG
GCCACCGCCCCCTAGAGCCAGAATGTGGGCTTTTGTGGATGTTGACAACGATTATGGGCTGGATGACCACAGTGGGATCC
TCCAGCACAGTGACTTTTTCAGGGATCAAAATGCCCACAGCCCCAATGCCAATACGCAAGCTAAACTGCCTAACCAGCCA
GACAGAAAATTTTTACTGGACAAGCAAGCTGCACAGCAATCATTGTGCCAAGCCTTTACTAGAGCTTCTGGAGCAGGCAC
TGAGTCATTTGCTACTTATGAACCGAACTACGTGTGGGACATGTTCAAGTCAGATGGTTATGAAGAGTTTCAACTATGCG
ACGGTGACATGCAGCTGGTCTACAAATACAACGGTCCGGTACAACGTTTTCAACACAACCATGACTCCCTTGCACTAGAC
TGCTCAGGATTTACACCAGCCATGAACCTTCAAGATGATTTCTCCAACCTAGATGCTTTTGAGACTCAGCTTTGGCCAGG
AGCCAGAGAGCTACAAAGGCCATCAGGAACAGAGACCAGCGGTCAGCTAGGCATGATCAACTATGTCAACCACAATTTTG
TAAAGGACAGATTTAAAGTGGTGGATACTGACCCACTGTCCACAGTAGAACAGGACTACATAAACCAGACTGGGCAGATG
CCTCTAATAGCCAACATGGGCACCAATCACTTGGGCACTAGAACTGCTACTAATCCCAACATGTCAGATGATAACAAAAC
ATGGAAACAAACCTTCTCAAAAAGGCCCCCCATCTACATGTTTGGGGTTCACAAGGAGTGGGAATTTCAAACCACCGGTA
CACACCCTTACAGGTACTATTTTTGTGCACGGGTGACCTACCACAGCAAAGTCAAGTTTCTAATTAACCAAAGAGGGTGG
AAGCCAGTCATTCACGCAGGCTTTGGACTATATTCAGATTACAGCCCAATAAACAATCTTAACTGTGTATCCGTTCCAAG
GGCACAAGGAGAAGTTGCCCGCGCAAATAGACGTACCAAGGGCCACAAAATGATTGGCACAGGCAGATTTACCGCCAGTT
CTGGTCTATAAAAGCTCAGGGTAAGTAGAGTGCTAGTTATGTTCACTGCCGTTTTCAAGATGTTGAGGTCTTATTTAGCT
CGTGATGAAGTTAGGTACCACATGGTTACTGTGGGTCACTGTAGTAGACCAATTATTCACCCCTCATACTTTGTACCGAG
GGCTTTGGCCAAGGACAAGTCTGGAAATAAAGTTTCAGCATCAGCAGGTTATGACAAGAGCTACGAACTGGACCACTACT
CTGTCTATGAGGAACTGGTAGCAGTTTGGAACACTCCCAATGTTCACTTGGCGGCTATGGATGCCAAACACAAACAAGTG
ATCAATAACATTAAAGGCTTTACCAGAGATGAGAGATTCAAGGTGTTAATGGTGTTTCATGACAAGAGTTGTGCCAAACA
GTACAAGTCAAATGGCAATCACCTTCATTTGGTGATAAAGACATTAGTACCGGTAATGAGTTCTGACAACAAATATAGAG
CCATGATGAGAAGCATGAGCGGCATAGGTGGATACTGTAACACGGCTTTGCTAAAAGGTGACCGTTCATCTTTCCTGAGT
TACCTGGCTTCTGACCCTGAGAAAATGTTTCTAGGGTGTCAAGATGCTGACCTTCTACAAGAGTTTAAAGATGCTGAAAA
CTTTTCTGGTACGATTAAAGACTGGTTACTAGAAGATCAAAGTGACAAAACCAGTGCCATTAGAAGCTGGTCAGATGCCC
TGCCTGTGCCTTCAGATGTGCTCGTTCCTTGTGATTTAAATGTAGCCAGCACCAGCGATACACAGATCCCTAAACACATG
ACCAGTGAAAAGGCATCGGACACTGTGAAATTTCTATATGATGAACTGAAAAAGTTCCCTAATGCCCGGTCACTGACAGA
CCTCATGGGCATGTATGGGGGCTGGACCCCAGTCTGGAGTGCCTTATGTAATGTTGGGGCCACTCAGGCTGGTAAGAATG
CCTTTAACATGGCTTTACAAACAATACTGTTGGAGGCTAGCAAGATGACCCCACTAGCTACATGTGCAGAACTACAGGAT
TCTATTGTCGGCTACATGACCCCCAGGCATTCAGTAGCCATGTTGAACGCTTGGTGCATTGAACAGGGCATATCTCCTCG
CAAATGGTTTGCCTCTATGCACATGCTACTGTCCGGTAAAGGCAAAAAGAGAGTGGGAATTTATATGCAAGGTGAAGCCA
ATTCAGGAAAGACTATGATCACCAATTCCTGCTTTGATTGTTTGAATGACATTGTTGGAAAAATGACCAAAGATGGCTTC
CCCTTTCAACAACTTGGAAATAAGAGGATAGTTATTGGGGAAGAAGTGGCCATTACTACATCTAATTTGGAAAAGTTCAA
AGACCTCATGTCTGGTGGCAATGTGACCTGCGAGCGAAAGTGTACCACCCCACAGTATTGTAAGCCCAATTTGGTTCTGT
TGAACTCTAATGTAACCATCAAGGCTAACCTGTCTCAACATGAAGTGGTGATACTGAAAACCAGACTGTACCTGTTCGAA
AACCTGAAGCGATCAGCTGTTATCAACAGCTGTTACGGACTCATTCACCCCAAGGCATATGCCCTGTGTGAGGGCATTAC
CGATGATGATTACGCTGCCCTGATTTCCAATGAAACAGACCACTGGACAATGGACCCAGTTGAGATTCAAGGGTCCACTG
ATGTGTTTGAAGATGTTTGGGATACGATCCCAAAGGATTATGAGGGTCCACCCTTGACCCCCATTTGTAATCAAATGGAC
GTAGGGGAGATCCCCTGCTCACAGAAAAGATTTCGGCGTCGCCCATCTGATTTTGTTCATGAACCAGACTGGTTGCCTCA
TGATGAATCCTGGCATCCAGATATGGAGACCCCTGTTACCAAGGTTCGCAAGTTCTTGGACTACCCTGCTGAAGACCTTC
AGCACGATGAGCTTGTCCAGTCTGGTGATAAAGAGGTACAGTTTACAGATGAAGAAGTTTGTGATTTAATTGACTCTGAA
ATTGAAGATGCCCTGCATTCACACTGTGCCATTTTTGTTTCAGAACTAGAGACCTCTGGGAATCACATTTACCATGAACC
CATCGTCAACTTCAGAAGTGACGAGGACAGAGTCCCTTTCATTTCTAGATACGCCACGCTTTGGACAGCCGACCAGACTG
ATTTTATTCTTGAAGAAGTGGACGAATTCACTTCTGGTTTTGATAACGCTGACGTCCCTTGCCTTCACTACAGGAGTAGA
GCTCTCCCCCTTAACCAAAGGGGAACATTGCTTCAAGTTAACACCGTCAATGGCTCCATCACAAGACTGCTCGTCCCCCA
ATTGCCAGACTTTCAAGGACGCCAACCAAAGTGCTTCATTTTTCAAGAACGGAGAAAGACTGCTTGTCCCTTTTCCATTT
TGCCTCTGTACCCAGATGAATTCTACAGTGACAATACTTTTCTAATGATGTGCTACGCATATGTTATGCTGTGCACATAT
GAACTTACACACGTGTACCCAGATTCACCTACAGAATACCCAGAAAGTCAAGAAATGACCGACATTGTACTGCCCAAAGA
AGAAGATCCCATCAACACAAAGAACTGTTACTGGCAGGTCAGACAGAAGCTGAGACGCATCTTGGACGAAAAATATGTTG
ATGAATATGAACTTTCATTCAAAAAAGTTTGGTCTTTTACTAGATTTGCTTGCCATCTTTGGATCAGTAATGATTTTTAG
TGACATGACTTTTATATTTTCAGGATCCCTGACAGACAAAGACCATTAATATGCATTGCTTTTGTATTATTGTATTCTCA
GAATTTCATTCAATAAAGTCCTTACAAAGGACACACAAAACCAATGTCATGAATGGCTTGTCCTTTTCCTCTCTGAGCCT
TACAAGGCACCCTTTCTATACCTTTTGTGTGTGGGGGTAGGTCCTTTAAAGGGAAGGTACCACTTTTCCACATAAT

In this contig, I found this ORF:

>variant-5304/0 4716|r0.611|2682|r0.611|894
MFTAVFKMLRSYLARDEVRYHMVTVGHCSRPIIHPSYFVPRALAKDKSGNKVSASAGYDKSYELDHYSVYEELVAVWNTPNVHLAAMDAKHKQVINNIKGFTRDERFKVLMVFHDKSCAKQYKSNGNHLHLVIKTLVPVMSSDNKYRAMMRSMSGIGGYCNTALLKGDRSSFLSYLASDPEKMFLGCQDADLLQEFKDAENFSGTIKDWLLEDQSDKTSAIRSWSDALPVPSDVLVPCDLNVASTSDTQIPKHMTSEKASDTVKFLYDELKKFPNARSLTDLMGMYGGWTPVWSALCNVGATQAGKNAFNMALQTILLEASKMTPLATCAELQDSIVGYMTPRHSVAMLNAWCIEQGISPRKWFASMHMLLSGKGKKRVGIYMQGEANSGKTMITNSCFDCLNDIVGKMTKDGFPFQQLGNKRIVIGEEVAITTSNLEKFKDLMSGGNVTCERKCTTPQYCKPNLVLLNSNVTIKANLSQHEVVILKTRLYLFENLKRSAVINSCYGLIHPKAYALCEGITDDDYAALISNETDHWTMDPVEIQGSTDVFEDVWDTIPKDYEGPPLTPICNQMDVGEIPCSQKRFRRRPSDFVHEPDWLPHDESWHPDMETPVTKVRKFLDYPAEDLQHDELVQSGDKEVQFTDEEVCDLIDSEIEDALHSHCAIFVSELETSGNHIYHEPIVNFRSDEDRVPFISRYATLWTADQTDFILEEVDEFTSGFDNADVPCLHYRSRALPLNQRGTLLQVNTVNGSITRLLVPQLPDFQGRQPKCFIFQERRKTACPFSILPLYPDEFYSDNTFLMMCYAYVMLCTYELTHVYPDSPTEYPESQEMTDIVLPKEEDPINTKNCYWQVRQKLRRILDEKYVDEYELSFKKVWSFTRFACHLWISNDF*
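
The page does not say which tool produced the ORF above, so the following is only a minimal sketch of one common approach: translate all six reading frames with the standard genetic code and keep the longest stretch that runs from an ATG to a stop codon. The function names (`translate`, `reverse_complement`, `longest_orf`) and the toy sequence are illustrative assumptions, not part of the original analysis.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in T, C, A, G order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 0; unknown codons become 'X', a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf(dna):
    """Return (strand, start, end, protein) for the longest ATG-to-stop ORF.

    start/end are 0-based offsets on the searched strand; end includes the stop codon.
    Returns None if no complete ORF exists.
    """
    best = None
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = 0  # amino-acid index where the current segment starts
            # Every segment except the last one is terminated by a stop codon.
            for segment in protein.split("*")[:-1]:
                m = segment.find("M")
                if m != -1:
                    start = frame + 3 * (pos + m)
                    end = frame + 3 * (pos + len(segment) + 1)
                    if best is None or len(segment) - m > len(best[3]):
                        best = (strand, start, end, segment[m:])
                pos += len(segment) + 1
    return best


# Hypothetical toy input; in practice this would be the full contig sequence.
toy = "CCATGAAATGGTAACC"
orf = longest_orf(toy)
```

On the contig itself, the first ten codons of the reported ORF (`ATGTTCACTGCCGTTTTCAAGATGTTGAGG`) translate to `MFTAVFKMLR` under this table, which matches the start of the protein sequence above.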

I ran this ORF through BLASTp, and it matched several parvovirus NS1 proteins:

Alignment                                  Max Score  Total Score  Query Coverage  e-value  Identity  Accession
NS1 [Raccoon dog amdo ]                    60.1       60.1         17%             7e-06    28%       AID57418.1
nonstructural protein [Turkey parvo 1078]  57.4       57.4         17%             6e-05    28%       ACA28962.1
NS1 [Chicken parvo ]                       57.0       57.0         17%             7e-05    28%       AJB28744.1

I chose to look at ACA28962.1. Region 360:519 of my input sequence mapped with high probability to this protein (and to the others as well). The sequence of this region is:

PRKWFASMHMLLSGKGKKRVGIYMQGEANSGKTMITNSCFDCLNDIVGKMTKDGFPFQQLGNKRIVIGEEVAITTSNLEKFKDLMSGGNVTCERKCTTPQYCKPNLVLLNSNVTIKANLSQHEVVILKTRLYLFENLKRSAVINSCYGLIHPKAYALCEG

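I extracted the corresponding DNA from the contig. Note that the extract as pasted starts at the CGC codon (R), one codon after the leading P (CCT) of the peptide above: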

>variant-5304/0 ACA28962.1|360:519aa|mult3original|extract
CGCAAATGGTTTGCCTCTATGCACATGCTACTGTCCGGTAAAGGCAAAAAGAGAGTGGGAATTTATATGCAAGGTGAAGCCAATTCAGGAAAGACTATGATCACCAATTCCTGCTTTGATTGTTTGAATGACATTGTTGGAAAAATGACCAAAGATGGCTTCCCCTTTCAACAACTTGGAAATAAGAGGATAGTTATTGGGGAAGAAGTGGCCATTACTACATCTAATTTGGAAAAGTTCAAAGACCTCATGTCTGGTGGCAATGTGACCTGCGAGCGAAAGTGTACCACCCCACAGTATTGTAAGCCCAATTTGGTTCTGTTGAACTCTAATGTAACCATCAAGGCTAACCTGTCTCAACATGAAGTGGTGATACTGAAAACCAGACTGTACCTGTTCGAAAACCTGAAGCGATCAGCTGTTATCAACAGCTGTTACGGACTCATTCACCCCAAGGCATATGCCCTGTGTGAGGGC
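
Going from a peptide region back to contig DNA is just coordinate arithmetic, sketched below under two stated assumptions: the amino-acid range is 1-based and inclusive (as in `360:519`), and `orf_start` is the 0-based offset of the ORF's first codon in the contig. The helper name `peptide_region_to_dna` and the toy input are hypothetical; the page does not record how the extraction was actually done.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in T, C, A, G order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 0; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def peptide_region_to_dna(dna, orf_start, aa_first, aa_last):
    """Return the DNA encoding residues aa_first..aa_last (1-based, inclusive).

    orf_start is the 0-based position of the ORF's first codon in `dna`.
    """
    nt_start = orf_start + 3 * (aa_first - 1)  # first base of the first codon
    nt_end = orf_start + 3 * aa_last           # one past the last base of the last codon
    return dna[nt_start:nt_end]


# Toy check: the first 30 nt of the ORF on this page, with two arbitrary padding bases in front.
orf_head = "ATGTTCACTGCCGTTTTCAAGATGTTGAGG"   # translates to MFTAVFKMLR
padded = "TT" + orf_head
region = peptide_region_to_dna(padded, orf_start=2, aa_first=3, aa_last=5)

# Translating an extract back is a quick way to confirm it is in frame
# and starts at the intended residue.
round_trip = translate(region)
extract_head = translate("CGCAAATGGTTTGCC")   # first 15 nt of the extract above
```

Here `region` covers residues 3 to 5 of the toy ORF, and `extract_head` gives the first five residues encoded by the pasted extract, which can be compared against the start of the peptide.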

I just ran PriceTI with the following command:

/afs/cats.ucsc.edu/users/b/jolespin/PriceTI/PriceTI -fpp /campusdata/BME235/Spring2015Data/SW019_S2_L008_R1_001.fastq /campusdata/BME235/Spring2015Data/SW019_S2_L008_R2_001.fastq 100 95 -icf /campusdata/BME235/virus/virus_seed.fa 1 1 5 -nc 30 -dbmax 72 -mol 30 -tol 20 -mpi 80 -target 90 2 1 1 -o virus_assembly.fa -a 32

I'll update accordingly.

-jolespin

archive/jolespin_virus.txt · Last modified: 2015/07/18 13:33 by ceisenhart