Created contig using these paired-end reads:
/campusdata/BME235/Spring2015Data/SW019_S2_L008_R1_001.fastq /campusdata/BME235/Spring2015Data/SW019_S2_L008_R2_001.fastq
Original contig sequence:
>variant-5304/0 4716 TCCACCGCCTTCGTCCAGTGGAGAATTCGTCGATTCAGAACAAGCCAAGAGAAGAAAGACAGATCCTCCTCCTACAACAT CAACACCAGAGCCGGGTACAGGAACCAGACACGGACTACGTAGTGGAACACCTTTGGTTACTCCGGTTAAACCAAACACA CCAGCAGCTCCAACAGCTGGGCCTAGTACCAGAACACCACAAAATACACCAGCCGGCTCACCAATGGCAGCATCAGTACC GGCGGCAAACATGGACACCAGTGGTGCCCCAGGAGGTGATGTAATGCCAGCAGGAGAGGCAGATGCGGCTGGATATCCAG TACCTGCAGGCATGGCAGGTTCAGGCGGTAATAGGTTTTTCACAGGATTTGGATCCCATACTCAAAAAGAACCAGATGGT TACAGCTCAGTAACCAGGTCCTACAGCAAGACCTTTCTTGTGCACACCAACTTTGACGGCACTCTAAAAGCACTGATCAA TATGGAACCTGGGGTCGCAGTACCAACAGGGATTACCAGTTCAGCAAAAGCAGAATGCATTGTGAACCATGGAGGGGTTA TGATTCCATACATGTATCAAAATTGCTCTACAGACCCTTGGGACTGGAATGTGCCTGATCACTTCATGGGCTGGCAATGC CAGGAGTATGGCTTTAAGGTGGCAGAAGCCCGCATAGAGACCCTAAACAATGACAAGCCCACACCAGAATCGGTGCCCGG GCCACCGCCCCCTAGAGCCAGAATGTGGGCTTTTGTGGATGTTGACAACGATTATGGGCTGGATGACCACAGTGGGATCC TCCAGCACAGTGACTTTTTCAGGGATCAAAATGCCCACAGCCCCAATGCCAATACGCAAGCTAAACTGCCTAACCAGCCA GACAGAAAATTTTTACTGGACAAGCAAGCTGCACAGCAATCATTGTGCCAAGCCTTTACTAGAGCTTCTGGAGCAGGCAC TGAGTCATTTGCTACTTATGAACCGAACTACGTGTGGGACATGTTCAAGTCAGATGGTTATGAAGAGTTTCAACTATGCG ACGGTGACATGCAGCTGGTCTACAAATACAACGGTCCGGTACAACGTTTTCAACACAACCATGACTCCCTTGCACTAGAC TGCTCAGGATTTACACCAGCCATGAACCTTCAAGATGATTTCTCCAACCTAGATGCTTTTGAGACTCAGCTTTGGCCAGG AGCCAGAGAGCTACAAAGGCCATCAGGAACAGAGACCAGCGGTCAGCTAGGCATGATCAACTATGTCAACCACAATTTTG TAAAGGACAGATTTAAAGTGGTGGATACTGACCCACTGTCCACAGTAGAACAGGACTACATAAACCAGACTGGGCAGATG CCTCTAATAGCCAACATGGGCACCAATCACTTGGGCACTAGAACTGCTACTAATCCCAACATGTCAGATGATAACAAAAC ATGGAAACAAACCTTCTCAAAAAGGCCCCCCATCTACATGTTTGGGGTTCACAAGGAGTGGGAATTTCAAACCACCGGTA CACACCCTTACAGGTACTATTTTTGTGCACGGGTGACCTACCACAGCAAAGTCAAGTTTCTAATTAACCAAAGAGGGTGG AAGCCAGTCATTCACGCAGGCTTTGGACTATATTCAGATTACAGCCCAATAAACAATCTTAACTGTGTATCCGTTCCAAG GGCACAAGGAGAAGTTGCCCGCGCAAATAGACGTACCAAGGGCCACAAAATGATTGGCACAGGCAGATTTACCGCCAGTT CTGGTCTATAAAAGCTCAGGGTAAGTAGAGTGCTAGTTATGTTCACTGCCGTTTTCAAGATGTTGAGGTCTTATTTAGCT CGTGATGAAGTTAGGTACCACATGGTTACTGTGGGTCACTGTAGTAGACCAATTATTCACCCCTCATACTTTGTACCGAG 
GGCTTTGGCCAAGGACAAGTCTGGAAATAAAGTTTCAGCATCAGCAGGTTATGACAAGAGCTACGAACTGGACCACTACT CTGTCTATGAGGAACTGGTAGCAGTTTGGAACACTCCCAATGTTCACTTGGCGGCTATGGATGCCAAACACAAACAAGTG ATCAATAACATTAAAGGCTTTACCAGAGATGAGAGATTCAAGGTGTTAATGGTGTTTCATGACAAGAGTTGTGCCAAACA GTACAAGTCAAATGGCAATCACCTTCATTTGGTGATAAAGACATTAGTACCGGTAATGAGTTCTGACAACAAATATAGAG CCATGATGAGAAGCATGAGCGGCATAGGTGGATACTGTAACACGGCTTTGCTAAAAGGTGACCGTTCATCTTTCCTGAGT TACCTGGCTTCTGACCCTGAGAAAATGTTTCTAGGGTGTCAAGATGCTGACCTTCTACAAGAGTTTAAAGATGCTGAAAA CTTTTCTGGTACGATTAAAGACTGGTTACTAGAAGATCAAAGTGACAAAACCAGTGCCATTAGAAGCTGGTCAGATGCCC TGCCTGTGCCTTCAGATGTGCTCGTTCCTTGTGATTTAAATGTAGCCAGCACCAGCGATACACAGATCCCTAAACACATG ACCAGTGAAAAGGCATCGGACACTGTGAAATTTCTATATGATGAACTGAAAAAGTTCCCTAATGCCCGGTCACTGACAGA CCTCATGGGCATGTATGGGGGCTGGACCCCAGTCTGGAGTGCCTTATGTAATGTTGGGGCCACTCAGGCTGGTAAGAATG CCTTTAACATGGCTTTACAAACAATACTGTTGGAGGCTAGCAAGATGACCCCACTAGCTACATGTGCAGAACTACAGGAT TCTATTGTCGGCTACATGACCCCCAGGCATTCAGTAGCCATGTTGAACGCTTGGTGCATTGAACAGGGCATATCTCCTCG CAAATGGTTTGCCTCTATGCACATGCTACTGTCCGGTAAAGGCAAAAAGAGAGTGGGAATTTATATGCAAGGTGAAGCCA ATTCAGGAAAGACTATGATCACCAATTCCTGCTTTGATTGTTTGAATGACATTGTTGGAAAAATGACCAAAGATGGCTTC CCCTTTCAACAACTTGGAAATAAGAGGATAGTTATTGGGGAAGAAGTGGCCATTACTACATCTAATTTGGAAAAGTTCAA AGACCTCATGTCTGGTGGCAATGTGACCTGCGAGCGAAAGTGTACCACCCCACAGTATTGTAAGCCCAATTTGGTTCTGT TGAACTCTAATGTAACCATCAAGGCTAACCTGTCTCAACATGAAGTGGTGATACTGAAAACCAGACTGTACCTGTTCGAA AACCTGAAGCGATCAGCTGTTATCAACAGCTGTTACGGACTCATTCACCCCAAGGCATATGCCCTGTGTGAGGGCATTAC CGATGATGATTACGCTGCCCTGATTTCCAATGAAACAGACCACTGGACAATGGACCCAGTTGAGATTCAAGGGTCCACTG ATGTGTTTGAAGATGTTTGGGATACGATCCCAAAGGATTATGAGGGTCCACCCTTGACCCCCATTTGTAATCAAATGGAC GTAGGGGAGATCCCCTGCTCACAGAAAAGATTTCGGCGTCGCCCATCTGATTTTGTTCATGAACCAGACTGGTTGCCTCA TGATGAATCCTGGCATCCAGATATGGAGACCCCTGTTACCAAGGTTCGCAAGTTCTTGGACTACCCTGCTGAAGACCTTC AGCACGATGAGCTTGTCCAGTCTGGTGATAAAGAGGTACAGTTTACAGATGAAGAAGTTTGTGATTTAATTGACTCTGAA ATTGAAGATGCCCTGCATTCACACTGTGCCATTTTTGTTTCAGAACTAGAGACCTCTGGGAATCACATTTACCATGAACC 
CATCGTCAACTTCAGAAGTGACGAGGACAGAGTCCCTTTCATTTCTAGATACGCCACGCTTTGGACAGCCGACCAGACTG ATTTTATTCTTGAAGAAGTGGACGAATTCACTTCTGGTTTTGATAACGCTGACGTCCCTTGCCTTCACTACAGGAGTAGA GCTCTCCCCCTTAACCAAAGGGGAACATTGCTTCAAGTTAACACCGTCAATGGCTCCATCACAAGACTGCTCGTCCCCCA ATTGCCAGACTTTCAAGGACGCCAACCAAAGTGCTTCATTTTTCAAGAACGGAGAAAGACTGCTTGTCCCTTTTCCATTT TGCCTCTGTACCCAGATGAATTCTACAGTGACAATACTTTTCTAATGATGTGCTACGCATATGTTATGCTGTGCACATAT GAACTTACACACGTGTACCCAGATTCACCTACAGAATACCCAGAAAGTCAAGAAATGACCGACATTGTACTGCCCAAAGA AGAAGATCCCATCAACACAAAGAACTGTTACTGGCAGGTCAGACAGAAGCTGAGACGCATCTTGGACGAAAAATATGTTG ATGAATATGAACTTTCATTCAAAAAAGTTTGGTCTTTTACTAGATTTGCTTGCCATCTTTGGATCAGTAATGATTTTTAG TGACATGACTTTTATATTTTCAGGATCCCTGACAGACAAAGACCATTAATATGCATTGCTTTTGTATTATTGTATTCTCA GAATTTCATTCAATAAAGTCCTTACAAAGGACACACAAAACCAATGTCATGAATGGCTTGTCCTTTTCCTCTCTGAGCCT TACAAGGCACCCTTTCTATACCTTTTGTGTGTGGGGGTAGGTCCTTTAAAGGGAAGGTACCACTTTTCCACATAAT
In this contig, I found this ORF:
>variant-5304/0 4716|r0.611|2682|r0.611|894 MFTAVFKMLRSYLARDEVRYHMVTVGHCSRPIIHPSYFVPRALAKDKSGNKVSASAGYDKSYELDHYSVYEELVAVWNTPNVHLAAMDAKHKQVINNIKGFTRDERFKVLMVFHDKSCAKQYKSNGNHLHLVIKTLVPVMSSDNKYRAMMRSMSGIGGYCNTALLKGDRSSFLSYLASDPEKMFLGCQDADLLQEFKDAENFSGTIKDWLLEDQSDKTSAIRSWSDALPVPSDVLVPCDLNVASTSDTQIPKHMTSEKASDTVKFLYDELKKFPNARSLTDLMGMYGGWTPVWSALCNVGATQAGKNAFNMALQTILLEASKMTPLATCAELQDSIVGYMTPRHSVAMLNAWCIEQGISPRKWFASMHMLLSGKGKKRVGIYMQGEANSGKTMITNSCFDCLNDIVGKMTKDGFPFQQLGNKRIVIGEEVAITTSNLEKFKDLMSGGNVTCERKCTTPQYCKPNLVLLNSNVTIKANLSQHEVVILKTRLYLFENLKRSAVINSCYGLIHPKAYALCEGITDDDYAALISNETDHWTMDPVEIQGSTDVFEDVWDTIPKDYEGPPLTPICNQMDVGEIPCSQKRFRRRPSDFVHEPDWLPHDESWHPDMETPVTKVRKFLDYPAEDLQHDELVQSGDKEVQFTDEEVCDLIDSEIEDALHSHCAIFVSELETSGNHIYHEPIVNFRSDEDRVPFISRYATLWTADQTDFILEEVDEFTSGFDNADVPCLHYRSRALPLNQRGTLLQVNTVNGSITRLLVPQLPDFQGRQPKCFIFQERRKTACPFSILPLYPDEFYSDNTFLMMCYAYVMLCTYELTHVYPDSPTEYPESQEMTDIVLPKEEDPINTKNCYWQVRQKLRRILDEKYVDEYELSFKKVWSFTRFACHLWISNDF*
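For reference, here is a minimal sketch of how a longest-ORF search over all six reading frames could be done. It is not necessarily the tool that produced the ORF above; the function names (`reverse_complement`, `translate`, `longest_orf`) are hypothetical, and it assumes the standard genetic code and an ORF defined as ATG through the first in-frame stop. The 2682 and 894 fields in the header are at least consistent with an ORF of 894 codons (894 x 3 = 2682 nt).

```python
from itertools import product

# Standard genetic code, built from the compact TCAG-ordered table.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(seq):
    """Return (protein, strand, frame) for the longest M...stop ORF.

    Assumption: an ORF must start with M and end at an in-frame stop codon;
    a trailing segment that runs off the end without a stop is ignored.
    """
    seq = seq.upper()
    best = ("", "+", 0)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(s[frame:])
            # Every segment except the last one is terminated by a stop codon.
            for segment in protein.split("*")[:-1]:
                m = segment.find("M")
                if m != -1 and len(segment) - m > len(best[0]):
                    best = (segment[m:], strand, frame)
    return best


# Toy example (not the real contig): ATG AAA TTT GGG TAA in frame 2.
toy = "CCATGAAATTTGGGTAACC"
print(longest_orf(toy))
```

On the real contig the input would be the 4716 nt sequence with whitespace removed.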
I ran this ORF through BLASTp, and it hit several parvovirus NS1 proteins:
Description | Max Score | Total Score | Query Coverage | E-value | Identity | Accession |
NS1 [Raccoon dog amdo ] | 60.1 | 60.1 | 17% | 7e-06 | 28% | AID57418.1 |
nonstructural protein [Turkey parvo 1078] | 57.4 | 57.4 | 17% | 6e-05 | 28% | ACA28962.1 |
NS1 [Chicken parvo ] | 57.0 | 57.0 | 17% | 7e-05 | 28% | AJB28744.1 |
I chose to look at ACA28962.1. Residues 360:519 of my query sequence aligned to this protein (E-value 6e-05, 28% identity), and to the other two hits as well. The query sequence for this region is:
PRKWFASMHMLLSGKGKKRVGIYMQGEANSGKTMITNSCFDCLNDIVGKMTKDGFPFQQLGNKRIVIGEEVAITTSNLEKFKDLMSGGNVTCERKCTTPQYCKPNLVLLNSNVTIKANLSQHEVVILKTRLYLFENLKRSAVINSCYGLIHPKAYALCEG
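As a quick sanity check, the size of the 360:519 region can be compared against the query coverage reported by BLASTp. This is only a sketch: the helper names are hypothetical, and it assumes the coordinates are 1-based and inclusive and that the 894 in the ORF header is the query length in residues.

```python
def region_length(aa_start, aa_end):
    """Number of residues in a 1-based, inclusive range."""
    return aa_end - aa_start + 1


def query_coverage(aa_start, aa_end, query_length):
    """Percent of the query covered by the aligned region."""
    return 100.0 * region_length(aa_start, aa_end) / query_length


QUERY_LENGTH = 894  # assumption: ORF length in residues, taken from the header

print(region_length(360, 519))                            # 160
print(round(query_coverage(360, 519, QUERY_LENGTH), 1))   # 17.9
```

About 17.9% is roughly in line with the 17% query coverage shown for all three hits.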
I extracted the corresponding DNA from the contig:
>variant-5304/0 ACA28962.1|360:519aa|mult3original|extract CGCAAATGGTTTGCCTCTATGCACATGCTACTGTCCGGTAAAGGCAAAAAGAGAGTGGGAATTTATATGCAAGGTGAAGCCAATTCAGGAAAGACTATGATCACCAATTCCTGCTTTGATTGTTTGAATGACATTGTTGGAAAAATGACCAAAGATGGCTTCCCCTTTCAACAACTTGGAAATAAGAGGATAGTTATTGGGGAAGAAGTGGCCATTACTACATCTAATTTGGAAAAGTTCAAAGACCTCATGTCTGGTGGCAATGTGACCTGCGAGCGAAAGTGTACCACCCCACAGTATTGTAAGCCCAATTTGGTTCTGTTGAACTCTAATGTAACCATCAAGGCTAACCTGTCTCAACATGAAGTGGTGATACTGAAAACCAGACTGTACCTGTTCGAAAACCTGAAGCGATCAGCTGTTATCAACAGCTGTTACGGACTCATTCACCCCAAGGCATATGCCCTGTGTGAGGGC
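The protein-to-DNA coordinate mapping behind an extraction like this can be sketched as follows. The function names and the toy contig are hypothetical; the sketch assumes a forward-strand ORF, a 0-based `orf_start` (the index of the A in the start ATG), and 1-based inclusive residue coordinates.

```python
def aa_range_to_nt_slice(orf_start, aa_start, aa_end):
    """Map 1-based inclusive residue coordinates to a 0-based, end-exclusive
    nucleotide slice on the contig (forward-strand ORF assumed)."""
    nt_start = orf_start + 3 * (aa_start - 1)
    nt_end = orf_start + 3 * aa_end
    return nt_start, nt_end


def extract_region(contig, orf_start, aa_start, aa_end):
    """Return the DNA that encodes residues aa_start..aa_end of the ORF."""
    nt_start, nt_end = aa_range_to_nt_slice(orf_start, aa_start, aa_end)
    return contig[nt_start:nt_end]


# Toy contig (not the real one): 2 nt of flank, then ATG CCT CGC AAA TGG TAA.
toy_contig = "GG" + "ATGCCTCGCAAATGGTAA" + "CC"
toy_orf_start = 2  # 0-based index of the start codon

# Residues 2..4 of the toy protein M-P-R-K-W are P-R-K.
print(extract_region(toy_contig, toy_orf_start, 2, 4))  # CCTCGCAAA

# For residues 360:519 the slice should be 3 * 160 = 480 nt long.
start, end = aa_range_to_nt_slice(0, 360, 519)
print(end - start)  # 480
```

Translating the extracted slice and comparing it with the peptide, first residue to last, is a cheap way to catch an off-by-one-codon slice.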
I just ran PriceTI with the following command:
/afs/cats.ucsc.edu/users/b/jolespin/PriceTI/PriceTI -fpp /campusdata/BME235/Spring2015Data/SW019_S2_L008_R1_001.fastq /campusdata/BME235/Spring2015Data/SW019_S2_L008_R2_001.fastq 100 95 -icf /campusdata/BME235/virus/virus_seed.fa 1 1 5 -nc 30 -dbmax 72 -mol 30 -tol 20 -mpi 80 -target 90 2 1 1 -o virus_assembly.fa -a 32
I'll update accordingly.
-jolespin