I assembled a contig from these paired-end reads:
/campusdata/BME235/Spring2015Data/SW019_S2_L008_R1_001.fastq
/campusdata/BME235/Spring2015Data/SW019_S2_L008_R2_001.fastq
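Before assembly, it is worth confirming that R1 and R2 list the same reads in the same order. The sketch below streams both files and compares read IDs pair by pair. It assumes plain (uncompressed) 4-line FASTQ records and Illumina Casava 1.8+ style headers, where both mates share the ID up to the first space. Older "/1" and "/2" suffixes are also stripped. The function names are hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from a plain 4-line-per-record FASTQ (no wrapped sequences)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines, e.g. a trailing newline at EOF
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            raise ValueError(f"truncated FASTQ record: {header!r}")
        seq, plus, qual = (line.rstrip("\r\n") for line in rest)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield header, seq, qual


def read_id(header: str) -> str:
    """Pair-matching key: first token after '@', minus any old-style /1 or /2 suffix."""
    rid = header[1:].split(" ", 1)[0]
    return rid[:-2] if rid.endswith(("/1", "/2")) else rid


def check_pairs(r1_lines: Iterable[str], r2_lines: Iterable[str]) -> int:
    """Return the number of read pairs; raise ValueError if R1 and R2 fall out of sync."""
    n = 0
    for rec1, rec2 in zip_longest(fastq_records(r1_lines), fastq_records(r2_lines)):
        if rec1 is None or rec2 is None:
            raise ValueError(f"R1 and R2 have different record counts (diverge after {n} pairs)")
        if read_id(rec1[0]) != read_id(rec2[0]):
            raise ValueError(f"pair {n + 1} out of sync: {rec1[0]} vs {rec2[0]}")
        n += 1
    return n
```

Passing the two files listed above, opened as text with R1 first, returns the number of pairs or raises at the first mismatch.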
Original contig sequence:
>variant-5304/0 4716
TCCACCGCCTTCGTCCAGTGGAGAATTCGTCGATTCAGAACAAGCCAAGAGAAGAAAGACAGATCCTCCTCCTACAACAT
CAACACCAGAGCCGGGTACAGGAACCAGACACGGACTACGTAGTGGAACACCTTTGGTTACTCCGGTTAAACCAAACACA
CCAGCAGCTCCAACAGCTGGGCCTAGTACCAGAACACCACAAAATACACCAGCCGGCTCACCAATGGCAGCATCAGTACC
GGCGGCAAACATGGACACCAGTGGTGCCCCAGGAGGTGATGTAATGCCAGCAGGAGAGGCAGATGCGGCTGGATATCCAG
TACCTGCAGGCATGGCAGGTTCAGGCGGTAATAGGTTTTTCACAGGATTTGGATCCCATACTCAAAAAGAACCAGATGGT
TACAGCTCAGTAACCAGGTCCTACAGCAAGACCTTTCTTGTGCACACCAACTTTGACGGCACTCTAAAAGCACTGATCAA
TATGGAACCTGGGGTCGCAGTACCAACAGGGATTACCAGTTCAGCAAAAGCAGAATGCATTGTGAACCATGGAGGGGTTA
TGATTCCATACATGTATCAAAATTGCTCTACAGACCCTTGGGACTGGAATGTGCCTGATCACTTCATGGGCTGGCAATGC
CAGGAGTATGGCTTTAAGGTGGCAGAAGCCCGCATAGAGACCCTAAACAATGACAAGCCCACACCAGAATCGGTGCCCGG
GCCACCGCCCCCTAGAGCCAGAATGTGGGCTTTTGTGGATGTTGACAACGATTATGGGCTGGATGACCACAGTGGGATCC
TCCAGCACAGTGACTTTTTCAGGGATCAAAATGCCCACAGCCCCAATGCCAATACGCAAGCTAAACTGCCTAACCAGCCA
GACAGAAAATTTTTACTGGACAAGCAAGCTGCACAGCAATCATTGTGCCAAGCCTTTACTAGAGCTTCTGGAGCAGGCAC
TGAGTCATTTGCTACTTATGAACCGAACTACGTGTGGGACATGTTCAAGTCAGATGGTTATGAAGAGTTTCAACTATGCG
ACGGTGACATGCAGCTGGTCTACAAATACAACGGTCCGGTACAACGTTTTCAACACAACCATGACTCCCTTGCACTAGAC
TGCTCAGGATTTACACCAGCCATGAACCTTCAAGATGATTTCTCCAACCTAGATGCTTTTGAGACTCAGCTTTGGCCAGG
AGCCAGAGAGCTACAAAGGCCATCAGGAACAGAGACCAGCGGTCAGCTAGGCATGATCAACTATGTCAACCACAATTTTG
TAAAGGACAGATTTAAAGTGGTGGATACTGACCCACTGTCCACAGTAGAACAGGACTACATAAACCAGACTGGGCAGATG
CCTCTAATAGCCAACATGGGCACCAATCACTTGGGCACTAGAACTGCTACTAATCCCAACATGTCAGATGATAACAAAAC
ATGGAAACAAACCTTCTCAAAAAGGCCCCCCATCTACATGTTTGGGGTTCACAAGGAGTGGGAATTTCAAACCACCGGTA
CACACCCTTACAGGTACTATTTTTGTGCACGGGTGACCTACCACAGCAAAGTCAAGTTTCTAATTAACCAAAGAGGGTGG
AAGCCAGTCATTCACGCAGGCTTTGGACTATATTCAGATTACAGCCCAATAAACAATCTTAACTGTGTATCCGTTCCAAG
GGCACAAGGAGAAGTTGCCCGCGCAAATAGACGTACCAAGGGCCACAAAATGATTGGCACAGGCAGATTTACCGCCAGTT
CTGGTCTATAAAAGCTCAGGGTAAGTAGAGTGCTAGTTATGTTCACTGCCGTTTTCAAGATGTTGAGGTCTTATTTAGCT
CGTGATGAAGTTAGGTACCACATGGTTACTGTGGGTCACTGTAGTAGACCAATTATTCACCCCTCATACTTTGTACCGAG
GGCTTTGGCCAAGGACAAGTCTGGAAATAAAGTTTCAGCATCAGCAGGTTATGACAAGAGCTACGAACTGGACCACTACT
CTGTCTATGAGGAACTGGTAGCAGTTTGGAACACTCCCAATGTTCACTTGGCGGCTATGGATGCCAAACACAAACAAGTG
ATCAATAACATTAAAGGCTTTACCAGAGATGAGAGATTCAAGGTGTTAATGGTGTTTCATGACAAGAGTTGTGCCAAACA
GTACAAGTCAAATGGCAATCACCTTCATTTGGTGATAAAGACATTAGTACCGGTAATGAGTTCTGACAACAAATATAGAG
CCATGATGAGAAGCATGAGCGGCATAGGTGGATACTGTAACACGGCTTTGCTAAAAGGTGACCGTTCATCTTTCCTGAGT
TACCTGGCTTCTGACCCTGAGAAAATGTTTCTAGGGTGTCAAGATGCTGACCTTCTACAAGAGTTTAAAGATGCTGAAAA
CTTTTCTGGTACGATTAAAGACTGGTTACTAGAAGATCAAAGTGACAAAACCAGTGCCATTAGAAGCTGGTCAGATGCCC
TGCCTGTGCCTTCAGATGTGCTCGTTCCTTGTGATTTAAATGTAGCCAGCACCAGCGATACACAGATCCCTAAACACATG
ACCAGTGAAAAGGCATCGGACACTGTGAAATTTCTATATGATGAACTGAAAAAGTTCCCTAATGCCCGGTCACTGACAGA
CCTCATGGGCATGTATGGGGGCTGGACCCCAGTCTGGAGTGCCTTATGTAATGTTGGGGCCACTCAGGCTGGTAAGAATG
CCTTTAACATGGCTTTACAAACAATACTGTTGGAGGCTAGCAAGATGACCCCACTAGCTACATGTGCAGAACTACAGGAT
TCTATTGTCGGCTACATGACCCCCAGGCATTCAGTAGCCATGTTGAACGCTTGGTGCATTGAACAGGGCATATCTCCTCG
CAAATGGTTTGCCTCTATGCACATGCTACTGTCCGGTAAAGGCAAAAAGAGAGTGGGAATTTATATGCAAGGTGAAGCCA
ATTCAGGAAAGACTATGATCACCAATTCCTGCTTTGATTGTTTGAATGACATTGTTGGAAAAATGACCAAAGATGGCTTC
CCCTTTCAACAACTTGGAAATAAGAGGATAGTTATTGGGGAAGAAGTGGCCATTACTACATCTAATTTGGAAAAGTTCAA
AGACCTCATGTCTGGTGGCAATGTGACCTGCGAGCGAAAGTGTACCACCCCACAGTATTGTAAGCCCAATTTGGTTCTGT
TGAACTCTAATGTAACCATCAAGGCTAACCTGTCTCAACATGAAGTGGTGATACTGAAAACCAGACTGTACCTGTTCGAA
AACCTGAAGCGATCAGCTGTTATCAACAGCTGTTACGGACTCATTCACCCCAAGGCATATGCCCTGTGTGAGGGCATTAC
CGATGATGATTACGCTGCCCTGATTTCCAATGAAACAGACCACTGGACAATGGACCCAGTTGAGATTCAAGGGTCCACTG
ATGTGTTTGAAGATGTTTGGGATACGATCCCAAAGGATTATGAGGGTCCACCCTTGACCCCCATTTGTAATCAAATGGAC
GTAGGGGAGATCCCCTGCTCACAGAAAAGATTTCGGCGTCGCCCATCTGATTTTGTTCATGAACCAGACTGGTTGCCTCA
TGATGAATCCTGGCATCCAGATATGGAGACCCCTGTTACCAAGGTTCGCAAGTTCTTGGACTACCCTGCTGAAGACCTTC
AGCACGATGAGCTTGTCCAGTCTGGTGATAAAGAGGTACAGTTTACAGATGAAGAAGTTTGTGATTTAATTGACTCTGAA
ATTGAAGATGCCCTGCATTCACACTGTGCCATTTTTGTTTCAGAACTAGAGACCTCTGGGAATCACATTTACCATGAACC
CATCGTCAACTTCAGAAGTGACGAGGACAGAGTCCCTTTCATTTCTAGATACGCCACGCTTTGGACAGCCGACCAGACTG
ATTTTATTCTTGAAGAAGTGGACGAATTCACTTCTGGTTTTGATAACGCTGACGTCCCTTGCCTTCACTACAGGAGTAGA
GCTCTCCCCCTTAACCAAAGGGGAACATTGCTTCAAGTTAACACCGTCAATGGCTCCATCACAAGACTGCTCGTCCCCCA
ATTGCCAGACTTTCAAGGACGCCAACCAAAGTGCTTCATTTTTCAAGAACGGAGAAAGACTGCTTGTCCCTTTTCCATTT
TGCCTCTGTACCCAGATGAATTCTACAGTGACAATACTTTTCTAATGATGTGCTACGCATATGTTATGCTGTGCACATAT
GAACTTACACACGTGTACCCAGATTCACCTACAGAATACCCAGAAAGTCAAGAAATGACCGACATTGTACTGCCCAAAGA
AGAAGATCCCATCAACACAAAGAACTGTTACTGGCAGGTCAGACAGAAGCTGAGACGCATCTTGGACGAAAAATATGTTG
ATGAATATGAACTTTCATTCAAAAAAGTTTGGTCTTTTACTAGATTTGCTTGCCATCTTTGGATCAGTAATGATTTTTAG
TGACATGACTTTTATATTTTCAGGATCCCTGACAGACAAAGACCATTAATATGCATTGCTTTTGTATTATTGTATTCTCA
GAATTTCATTCAATAAAGTCCTTACAAAGGACACACAAAACCAATGTCATGAATGGCTTGTCCTTTTCCTCTCTGAGCCT
TACAAGGCACCCTTTCTATACCTTTTGTGTGTGGGGGTAGGTCCTTTAAAGGGAAGGTACCACTTTTCCACATAAT
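The number after the contig ID in the header (4716) appears to be the contig length, because the wrapped sequence lines above add up to 4716 bases. A minimal, stdlib-only FASTA reader makes that check, and later slicing, straightforward. The helper declared_length() is hypothetical and assumes this particular header layout.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header line without '>': sequence}.

    Wrapped sequence lines are joined and bases are upper-cased.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def declared_length(header: str) -> int:
    """Assumption: the second whitespace-separated header token is the contig length."""
    return int(header.split()[1])
```

For the contig above, the length of the parsed sequence should equal declared_length() of its header if that reading of the header is right.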
In this contig, I found the following ORF:
>variant-5304/0 4716|r0.611|2682|r0.611|894
MFTAVFKMLRSYLARDEVRYHMVTVGHCSRPIIHPSYFVPRALAKDKSGNKVSASAGYDKSYELDHYSVYEELVAVWNTPNVHLAAMDAKHKQVINNIKGFTRDERFKVLMVFHDKSCAKQYKSNGNHLHLVIKTLVPVMSSDNKYRAMMRSMSGIGGYCNTALLKGDRSSFLSYLASDPEKMFLGCQDADLLQEFKDAENFSGTIKDWLLEDQSDKTSAIRSWSDALPVPSDVLVPCDLNVASTSDTQIPKHMTSEKASDTVKFLYDELKKFPNARSLTDLMGMYGGWTPVWSALCNVGATQAGKNAFNMALQTILLEASKMTPLATCAELQDSIVGYMTPRHSVAMLNAWCIEQGISPRKWFASMHMLLSGKGKKRVGIYMQGEANSGKTMITNSCFDCLNDIVGKMTKDGFPFQQLGNKRIVIGEEVAITTSNLEKFKDLMSGGNVTCERKCTTPQYCKPNLVLLNSNVTIKANLSQHEVVILKTRLYLFENLKRSAVINSCYGLIHPKAYALCEGITDDDYAALISNETDHWTMDPVEIQGSTDVFEDVWDTIPKDYEGPPLTPICNQMDVGEIPCSQKRFRRRPSDFVHEPDWLPHDESWHPDMETPVTKVRKFLDYPAEDLQHDELVQSGDKEVQFTDEEVCDLIDSEIEDALHSHCAIFVSELETSGNHIYHEPIVNFRSDEDRVPFISRYATLWTADQTDFILEEVDEFTSGFDNADVPCLHYRSRALPLNQRGTLLQVNTVNGSITRLLVPQLPDFQGRQPKCFIFQERRKTACPFSILPLYPDEFYSDNTFLMMCYAYVMLCTYELTHVYPDSPTEYPESQEMTDIVLPKEEDPINTKNCYWQVRQKLRRILDEKYVDEYELSFKKVWSFTRFACHLWISNDF*
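The ORF header carries extra fields. Two of them, 2682 and 894, look like the ORF length in nucleotides and in codons (2682 nt = 894 codons, counting the stop). As a rough cross-check, here is a minimal longest-ORF scan over all six frames using the standard genetic code (NCBI translation table 1). It is a sketch, not the tool that produced the header. It only considers ATG starts and ignores ORFs that run off the end of the contig. The min_aa cutoff is an arbitrary assumption.

```python
from typing import Dict, Tuple

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(nt: str) -> str:
    """Reverse complement of a DNA string."""
    return nt.upper().translate(_COMPLEMENT)[::-1]


def translate(nt: str) -> str:
    """Translate from the first base; codons with non-ACGT bases become 'X'."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[p:p + 3], "X") for p in range(0, len(nt) - 2, 3))


def longest_orf(nt: str, min_aa: int = 100) -> Tuple[str, int, str]:
    """Longest ATG..stop ORF over six frames.

    Returns (strand, 0-based nt start on that strand's sequence, protein ending in '*'),
    or ("", -1, "") if no ORF has at least min_aa residues before its stop.
    """
    best: Tuple[str, int, str] = ("", -1, "")
    for strand, seq in (("+", nt.upper()), ("-", revcomp(nt))):
        for frame in range(3):
            prot = translate(seq[frame:])
            start = -1
            for idx, aa in enumerate(prot):
                if aa == "M" and start < 0:
                    start = idx  # first ATG after the previous stop
                elif aa == "*" and start >= 0:
                    orf = prot[start:idx + 1]
                    if len(orf) - 1 >= min_aa and len(orf) > len(best[2]):
                        best = (strand, frame + 3 * start, orf)
                    start = -1
    return best
```

Running longest_orf() on the contig sequence and comparing the protein with the ORF above is a quick consistency check on strand, frame, and length.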
I ran this ORF through BLASTp, and it hit the NS1 (nonstructural protein 1) of several different parvoviruses:
| Description | Max Score | Total Score | Query Coverage | E-value | Identity | Accession |
|---|---|---|---|---|---|---|
| NS1 [Raccoon dog amdo ] | 60.1 | 60.1 | 17% | 7e-06 | 28% | AID57418.1 |
| nonstructural protein [Turkey parvo 1078] | 57.4 | 57.4 | 17% | 6e-05 | 28% | ACA28962.1 |
| NS1 [Chicken parvo ] | 57.0 | 57.0 | 17% | 7e-05 | 28% | AJB28744.1 |
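If the search is rerun with command-line BLAST+ using -outfmt 6, each hit comes back as a tab-separated row with 12 default columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The sketch below parses those rows and computes a per-HSP query coverage. The web "Query Coverage" column combines all HSPs for a subject, so the two numbers need not agree exactly. The e-value cutoff passed to significant() is left to the caller on purpose.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    length: int
    qstart: int
    qend: int
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse BLAST+ -outfmt 6 rows (default 12 columns), skipping blanks and '#' comments."""
    hits: List[Hit] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\r\n").split("\t")
        hits.append(Hit(qseqid=f[0], sseqid=f[1], pident=float(f[2]), length=int(f[3]),
                        qstart=int(f[6]), qend=int(f[7]),
                        evalue=float(f[10]), bitscore=float(f[11])))
    return hits


def query_coverage(hit: Hit, query_len: int) -> float:
    """Percent of the query spanned by this single HSP (qstart..qend are 1-based, inclusive)."""
    return 100.0 * (abs(hit.qend - hit.qstart) + 1) / query_len


def significant(hits: List[Hit], max_evalue: float) -> List[Hit]:
    """Hits at or below an e-value cutoff, lowest e-value first."""
    return sorted((h for h in hits if h.evalue <= max_evalue), key=lambda h: h.evalue)
```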
I chose to look at ACA28962.1. Residues 360 to 519 (160 aa) of my ORF aligned to this protein (e-value 6e-05), and the same region also aligned to the other two hits. The sequence of this region is:
PRKWFASMHMLLSGKGKKRVGIYMQGEANSGKTMITNSCFDCLNDIVGKMTKDGFPFQQLGNKRIVIGEEVAITTSNLEKFKDLMSGGNVTCERKCTTPQYCKPNLVLLNSNVTIKANLSQHEVVILKTRLYLFENLKRSAVINSCYGLIHPKAYALCEG
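I extracted the corresponding DNA from the contig. Note that this extract begins at the CGC (Arg) codon: it is 477 nt long and encodes the 159 residues RKW...CEG, so it leaves out the CCT codon for the leading Pro of the region above: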
>variant-5304/0 ACA28962.1|360:519aa|mult3original|extract
CGCAAATGGTTTGCCTCTATGCACATGCTACTGTCCGGTAAAGGCAAAAAGAGAGTGGGAATTTATATGCAAGGTGAAGCCAATTCAGGAAAGACTATGATCACCAATTCCTGCTTTGATTGTTTGAATGACATTGTTGGAAAAATGACCAAAGATGGCTTCCCCTTTCAACAACTTGGAAATAAGAGGATAGTTATTGGGGAAGAAGTGGCCATTACTACATCTAATTTGGAAAAGTTCAAAGACCTCATGTCTGGTGGCAATGTGACCTGCGAGCGAAAGTGTACCACCCCACAGTATTGTAAGCCCAATTTGGTTCTGTTGAACTCTAATGTAACCATCAAGGCTAACCTGTCTCAACATGAAGTGGTGATACTGAAAACCAGACTGTACCTGTTCGAAAACCTGAAGCGATCAGCTGTTATCAACAGCTGTTACGGACTCATTCACCCCAAGGCATATGCCCTGTGTGAGGGC
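Coordinate conventions are an easy place for off-by-one errors here. Reading 360:519 as 1-based and inclusive gives 160 residues (480 nt), while a 0-based, end-exclusive slice [360:519] gives residues 361 to 519 (159 residues, 477 nt). A minimal sketch that derives both the residue slice and the codon slice from the same numbers keeps the protein and DNA extracts in sync. The names aa_to_nt, extract_region, and orf_nt are hypothetical.

```python
from typing import Tuple


def aa_to_nt(aa_start: int, aa_end: int) -> Tuple[int, int]:
    """Map 1-based, inclusive residue numbers to a 0-based, end-exclusive
    nucleotide slice of the ORF, where residue 1 is orf_nt[0:3]."""
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("expected 1 <= aa_start <= aa_end")
    return 3 * (aa_start - 1), 3 * aa_end


def extract_region(protein: str, orf_nt: str, aa_start: int, aa_end: int) -> Tuple[str, str]:
    """Return (residues, codons) for the same 1-based, inclusive residue range."""
    nt_start, nt_end = aa_to_nt(aa_start, aa_end)
    return protein[aa_start - 1:aa_end], orf_nt[nt_start:nt_end]
```

With the ORF's protein and its nucleotide sequence (starting at the ATG), extract_region(protein, orf_nt, 360, 519) should return the 160-residue region together with its 480-nt coding sequence. Adding the ORF's start offset within the contig converts the nucleotide slice to contig coordinates.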
I just ran PriceTI with the following command:
/afs/cats.ucsc.edu/users/b/jolespin/PriceTI/PriceTI \
  -fpp /campusdata/BME235/Spring2015Data/SW019_S2_L008_R1_001.fastq /campusdata/BME235/Spring2015Data/SW019_S2_L008_R2_001.fastq 100 95 \
  -icf /campusdata/BME235/virus/virus_seed.fa 1 1 5 \
  -nc 30 -dbmax 72 -mol 30 -tol 20 -mpi 80 \
  -target 90 2 1 1 \
  -o virus_assembly.fa -a 32
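Once virus_assembly.fa is written, a quick summary (number of contigs, total bases, longest contig, N50) is one way to compare the new assembly with the earlier contig. N50 here uses the usual definition: the largest length L such that contigs of length L or longer hold at least half of all assembled bases. This stdlib-only sketch assumes nothing about PriceTI beyond its output being FASTA.

```python
from typing import Dict, Iterable, List


def contig_lengths(lines: Iterable[str]) -> List[int]:
    """Sequence length of each FASTA record, in file order."""
    lengths: List[int] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            lengths.append(0)
        elif line and lengths:
            lengths[-1] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest L such that contigs of length >= L hold at least half of all bases (0 if empty)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Basic assembly statistics from a list of contig lengths."""
    return {"contigs": len(lengths), "total_bp": sum(lengths),
            "longest": max(lengths, default=0), "n50": n50(lengths)}
```

Opening virus_assembly.fa as text and passing it to contig_lengths() gives the list for summarize().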
I'll update this page with the results.
-jolespin