UMUC BIOT630
Week 8 Assignment
Question 1.
Using the FASTA formatted Human genomic sequence provided at the end of this Exercise, perform Gene Prediction using the “Pattern-based” program “ORFfinder”. What are the Frame and length of the longest ORF found when running the program under default settings?
ORFfinder url = https://www.ncbi.nlm.nih.gov/orffinder/
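As a pattern-based method, ORF finding amounts to scanning all six reading frames for stretches that begin with ATG and run to the next in-frame stop codon. The sketch below shows that idea; it is not ORFfinder itself. It assumes ATG-only starts, the standard genetic code and a 75-nt minimum length, which we believe match the web form's defaults but should be checked there. Lengths include the stop codon, minus-strand coordinates refer to the reverse complement, and ORFs with no stop before the sequence ends are ignored, so frame labels and counts may differ slightly from ORFfinder's report. Pass only the bare nucleotide sequence (no `>Human` header line).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_orfs(seq, min_len=75):
    """List ATG-to-stop ORFs in all six frames, longest first.

    Each hit is (frame, start, end, length_nt); start/end are 0-based and
    end-exclusive on the strand being read, and length includes the stop codon.
    min_len=75 is an assumed default, not taken from ORFfinder's code.
    """
    seq = "".join(seq.split()).upper()  # drop line breaks from pasted FASTA lines
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            start = None
            for i in range(offset, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start
                    if length >= min_len:
                        orfs.append((f"{strand}{offset + 1}", start, i + 3, length))
                    start = None  # wait for the next ATG in this frame
    return sorted(orfs, key=lambda orf: orf[3], reverse=True)
```

For example, `find_orfs(human_sequence)[0]` would give the frame label and nucleotide length of the longest ORF under these assumptions, which can then be compared with what ORFfinder reports.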
Answer = ?
Question 2.
In the results page returned for Question 1, what is the longest ORF that has a significant human BLAST hit, judged by E value? To answer, examine the ORFs one by one, from longest to shortest, until “SmartBLAST” returns a significant human result.
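The stopping rule in Question 2 can be written as a small loop. This is only a sketch: the ORF labels, lengths and E-values are values you would record by hand from the ORFfinder and SmartBLAST pages, and the 1e-5 significance cutoff is an assumed convention, not something the assignment specifies.

```python
def first_significant(orfs, evalue_cutoff=1e-5):
    """Return the label of the longest ORF whose best human hit passes the cutoff.

    orfs: iterable of (label, length_nt, evalue) tuples, where evalue is the
    best human SmartBLAST E-value recorded by hand, or None if there was no
    human hit. The default cutoff is an assumption, not an assignment requirement.
    """
    for label, _length, evalue in sorted(orfs, key=lambda orf: orf[1], reverse=True):
        if evalue is not None and evalue <= evalue_cutoff:
            return label  # longest ORF with a significant human hit
    return None  # no ORF passed the cutoff
```

With hypothetical values `[("ORF1", 300, 0.5), ("ORF2", 900, None), ("ORF3", 450, 1e-40)]`, the loop skips ORF2 (no human hit) and stops at ORF3.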
Answer = ?
Question 3.
Using the FASTA formatted Human genomic sequence provided at the end of this Exercise, perform Gene Prediction using the “Content-based” program “geneid”. First, click the “Reset Form” button to ensure the tool's default parameters are loaded. Second, paste the sequence into the top input window. Third, select “geneid including CDS sequence” from the “Output options” drop-down box. Lastly, click the “Submit” button. Paste a copy of the predicted results below as your answer.
geneid url = https://genome.crg.cat/geneid.html
Answer = ?
Question 4.
Using the FASTA formatted nucleotide sequence returned by the geneid prediction tool in Question 3, use the TESTCODE tool to verify that the ORF sequence is indeed coding. Copy and paste the results returned by TESTCODE as your answer.
TESTCODE url = http://www.bioinformatics.org/SMS/testcode.html
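TESTCODE is based on Fickett's statistic, which looks at how unevenly each base is spread over the three codon positions and at overall base composition. The sketch below computes only those raw per-base inputs. The full TESTCODE score also converts them to probabilities using Fickett's published lookup tables and weights, which are not reproduced here, so this function cannot by itself call a sequence coding or non-coding.

```python
def fickett_parameters(seq):
    """Return {base: (position_asymmetry, content)} for A, C, G and T.

    position_asymmetry = max(n1, n2, n3) / (min(n1, n2, n3) + 1), where nk counts
    the base at codon position k; content is the base's fraction of the sequence.
    The lookup-table step of the full TESTCODE score is deliberately omitted.
    """
    seq = "".join(seq.split()).upper()  # tolerate pasted multi-line sequence
    if not seq:
        raise ValueError("empty sequence")
    params = {}
    for base in "ACGT":
        counts = [0, 0, 0]
        for i, nucleotide in enumerate(seq):
            if nucleotide == base:
                counts[i % 3] += 1  # codon position 1, 2 or 3
        asymmetry = max(counts) / (min(counts) + 1)
        params[base] = (asymmetry, sum(counts) / len(seq))
    return params
```

A strongly periodic sequence such as `ATGATGATG` gives A and G asymmetry values of 3.0, while a base spread evenly over the three positions scores close to 1; this positional bias is what the TESTCODE tables turn into evidence for coding.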
Answer = ?
Question 5.
Using the FASTA formatted protein sequence returned by the geneid prediction tool in Question 3, perform a protein BLAST (i.e., blastp) search to determine which known “human” gene geneid predicted.
Answer = ?
Question 6.
Using the FASTA formatted Human genomic sequence and the FASTA formatted “Related” sequence provided at the end of this Exercise, perform Gene Prediction using the “Comparative-based” program “AUGUSTUS”. First, paste the “Human” sequence into the top input window. Second, paste the “Related” sequence into the “expert options” input window under the title “Upload cDNA (ESTs, mRNAs) sequences”. Lastly, click the “Run AUGUSTUS” button. How many genes are predicted to be present in the sequence, and how many exons does each gene have?
AUGUSTUS url = http://bioinf.uni-greifswald.de/augustus/submission.php
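Counting genes and exons in the AUGUSTUS report by eye is error-prone, so the sketch below tallies `CDS` lines per transcript. It assumes GTF-style output in which each coding exon is a tab-separated line with `CDS` in the third column and `transcript_id "g1.t1"` in the ninth; check this against your own report. With one transcript per gene, the number of keys is the number of genes. CDS lines count coding exons only, so exons made up entirely of UTR (if UTR prediction was enabled) would not be included.

```python
import re
from collections import Counter

# Assumed attribute layout, e.g.: transcript_id "g1.t1"; gene_id "g1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def coding_exons_per_transcript(report):
    """Map each transcript_id to its number of CDS lines in an AUGUSTUS GTF report."""
    counts = Counter()
    for line in report.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and AUGUSTUS comment lines
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue  # keep only CDS feature lines
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return dict(counts)
```

Pasting the AUGUSTUS text output into a string and calling `coding_exons_per_transcript(report)` returns something like `{"g1.t1": n}`, where `n` is the number of coding exons in that predicted transcript.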
Answer = ?
Question 7.
Convert the protein sequence returned by the AUGUSTUS prediction tool in Question 6 to FASTA format, then perform a protein BLAST (i.e., blastp) search to determine which known “human” gene AUGUSTUS predicted.
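AUGUSTUS prints the predicted protein inside comment lines rather than as FASTA, so it has to be reformatted before blastp. The sketch below assumes the protein appears as `# protein sequence = [MMFS...`, continued on further `#` lines until a closing `]`, and it extracts only the first such protein. The header text and the 60-residue line width are arbitrary choices.

```python
def extract_augustus_protein(report):
    """Return the first 'protein sequence = [...]' block from an AUGUSTUS report."""
    marker = "protein sequence = ["
    collecting = False
    parts = []
    for line in report.splitlines():
        text = line.lstrip("#").strip()  # AUGUSTUS comment lines start with '#'
        if not collecting and text.startswith(marker):
            collecting = True
            text = text[len(marker):]
        if collecting:
            if "]" in text:
                parts.append(text.split("]", 1)[0])  # closing bracket ends the protein
                return "".join(parts)
            parts.append(text)
    raise ValueError("no complete 'protein sequence = [...]' block found")


def to_fasta(header, sequence, width=60):
    """Format a sequence as FASTA text with fixed-width lines."""
    seq = "".join(sequence.split()).upper()
    lines = [f">{header}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

The output of `to_fasta("augustus_g1", extract_augustus_protein(report))` can then be pasted directly into the blastp query box.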
Answer = ?
Question 8.
Based on the results of this Exercise, which Gene Prediction method performed best, or did they all seem to perform equally well?
Answer = ?
Sequences to use for this Exercise:
>Human
AACCGCATCTGCAGCGAGCATCTGAGAAGCCAAGACTGAGCCGGCGGCCGCGGCGCAGCGAACGAGCAGT
GACCGTGCTCCTACCCAGCTCTGCTCCACAGCGCCCACCTGTCTCCGCCCCTCGGCCCCTCGCCCGGCTT
TGCCTAACCGCCACGATGATGTTCTCGGGCTTCAACGCAGACTACGAGGCGTCATCCTCCCGCTGCAGCA
GCGCGTCCCCGGCCGGGGATAGCCTCTCTTACTACCACTCACCCGCAGACTCCTTCTCCAGCATGGGCTC
GCCTGTCAACGCGCAGGTAAGGCTGGCTTCCCGTCGCCGCGGGGCCGGGGGCTTGGGGTCGCGGAGGAGG
AGACACCGGGCGGGACGCTCCAGTAGATGAGTAGGGGGCTCCCTTGTGCCTGGAGGGAGGCTGCCGTGGC
CGGAGCGGTGCCGGCTCGGGGGCTCGGGACTTGCTCTGAGCGCACGCACGCTTGCCATAGTAAGAATTGG
TTCCCCCTTCGGGAGGCAGGTTCGTTCTGAGCAACCTCTGGTCTGCACTCCAGGACGGATCTCTGACATT
AGCTGGAGCAGACGTGTCCCAAGCACAAACTCGCTAACTAGAGCCTGGCTTCTCCGGGGAGGTGGCAGAA
AGCGGCAATCCCCCCTCCCCCGGCAGCCTGGAGCACGGAGGAGGGATGAGGGAGGAGGGTGCAGCGGGCG
GGTGTGTAAGGCAGTTTCATTGATAAAAAGCGAGTTCATTCTGGAGACTCCGGAGCGGCGCCTGCGTCAG
CGCAGACGTCAGGGATATTTATAACAAACCCCCTTTCAAGCAAGTGATGCTGAAGGGATAACGGGAACGC
AGCGGCAGGATGGAAGAGACAGGCACTGCGCTGCGGAATGCCTGGGAGGAAAAGGGGGAGACCTTTCATC
CAGGATGAGGGACATTTAAGATGAAATGTCCGTGGCAGGATCGTTTCTCTTCACTGCTGCATGCGGCACT
GGGAACTCGCCCCACCTGTGTCCGGAACCTGCTCGCTCACGTCGGCTTTCCCCTTCTGTTTTGTTCTAGG
ACTTCTGCACGGACCTGGCCGTCTCCAGTGCCAACTTCATTCCCACGGTCACTGCCATCTCGACCAGTCC
GGACCTGCAGTGGCTGGTGCAGCCCGCCCTCGTCTCCTCCGTGGCCCCATCGCAGACCAGAGCCCCTCAC
CCTTTCGGAGTCCCCGCCCCCTCCGCTGGGGCTTACTCCAGGGCTGGCGTTGTGAAGACCATGACAGGAG
GCCGAGCGCAGAGCATTGGCAGGAGGGGCAAGGTGGAACAGGTGAGGAACTCTAGCGTACTCTTCCTGGG
AATGTGGGGGCTGGGTGGGAAGCAGCCCCGGAGATGCAGGAGCCCAGTACAGAGGATGAAGCCACTGATG
GGGCTGGCTGCACATCCGTAACTGGGAGCCCTGGCTCCAAGCCCATTCCATCCCAACTCAGACTCTGAGT
CTCACCCTAAGAAGTACTCTCATAGTTTCTTCCCTAAGTTTCTTACCGCATGCTTTCAGACTGGGCTCTT
CTTTGTTCTCTTGCTGAGGATCTTATTTTAAATGCAAGTCACACCTAGTCTGCAACTGCAGGTCAGAAAT
GGTTTCACAGTGGGGTGCCAGGAAGCAGGGAAGCTGCAGGAGCCAGTTCTACTGGGGTGGGTGAATGGAG
GTGATGGCAGACACTTTTACTGAATGTCGGTCTTTTTTTGTGATTATTCTAGTTATCTCCAGAAGAAGAA
GAGAAAAGGAGAATCCGAAGGGAAAGGAATAAGATGGCTGCAGCCAAATGCCGCAACCGGAGGAGGGAGC
TGACTGATACACTCCAAGCGGTAGGTACTCTGTGGGTTGCTCCTTTTTAAAACTTAAGGGGAAAGTTGGA
GATTGAGCATAAGGGCCCTTGAGTAAGACTGTGTCTTATGCTTTCCTTTATCCCTCTGTATACAGGAGAC
AGACCAACTAGAAGATGAGAAGTCTGCTTTGCAGACCGAGATTGCCAACCTGCTGAAGGAGAAGGAAAAA
CTAGAGTTCATCCTGGCAGCTCACCGACCTGCCTGCAAGATCCCTGATGACCTGGGCTTCCCAGAAGAGA
TGTCTGTGGCTTCCCTTGATCTGACTGGGGGCCTGCCAGAGGTTGCCACCCCGGAGTCTGAGGAGGCCTT
CACCCTGCCTCTCCTCAATGACCCTGAGCCCAAGCCCTCAGTGGAACCTGTCAAGAGCATCAGCAGCATG
GAGCTGAAGACCGAGCCCTTTGATGACTTCCTGTTCCCAGCATCATCCAGGCCCAGTGGCTCTGAGACAG
CCCGCTCCGTGCCAGACATGGACCTATCTGGGTCCTTCTATGCAGCAGACTGGGAGCCTCTGCACAGTGG
CTCCCTGGGGATGGGGCCCATGGCCACAGAGCTGGAGCCCCTGTGCACTCCGGTGGTCACCTGTACTCCC
AGCTGCACTGCTTACACGTCTTCCTTCGTCTTCACCTACCCCGAGGCTGACTCCTTCCCCAGCTGTGCAG
CTGCCCACCGCAAGGGCAGCAGCAGCAATGAGCCTTCCTCTGACTCGCTCAGCTCACCCACGCTGCTGGC
CCTGTGAGGGGGCAGGGAAGGGGAGGCAGCCGGCACCCACAAGTGCCACTGCCCGAGCTGGTGCATTACA
GAGAGGAGAAACACATCTTCCCTAGAGGGTTCCTGTAGACCTAGGGAGGACCTTATCTGTGCGTGAAACA
CACCAGGCTGTGGGCCTCAAGGACTTGAAAGCATCCATGTGTGGACTCAAGTCCTTACCTCTTCCGGAGA
TGTAGCAAAACGCATGGAGTGTGTATTGTTCCCAGTGACACTTCAGAGAGCTGGTAGTTAGTAGCATGTT
GAGCCAGGCCTGGGTCTGTGTCTCTTTTCTCTTTCTCCTTAGTCTTCTCATAGCATTAACTAATCTATTG
GGTTCATTATTGGAATTAACCTGGTGCTGGATATTTTCAAATTGTATCTAGTGCAGCTGATTTTAACAAT
AACTACTGTGTTCCTGGCAATAGTGTGTTCTGATTAGAAATGACCAATATTATACTAAGAAAAGATACGA
CTTTATTTTCTGGTAGATAGAAATAAATAGCTATATCCATGTACTGTAGTTTTTCTTCAACATCAATGTT
CATTGTAATGTTACTGATCATGCATTGTTGAGGTGGTCTGAATGTTCTGACATTAACAGTTTTCCATGAA
AACGTTTTATTGTGTTTTTAATTTATTTATTAAGATGGATTCTCAGATATTTATATTTTTATTTTATTTT
TTTCTACCTTGAGGTCTTTTGACATGTGGAAAGTGAATTTGAATGAAAAATTTAAGCATTGTTTGCTTAT
TGTTCCAAGACATTGTCAATAAA
>Related
ATGATGTTCTCGGGCTTCAACGCAGACTACGAGGCGTCATCCTCCCGCTGCAGCAGCGCGTCCCCGGCCG
GGGATAGCCTCTCTTACTACCACTCACCCGCAGACTCCTTCTCCAGCATGGGTTCGCCTGTCAACGCGCA
GGACTTCTGCACGGACCTGGCCGTCTCCAGTGCCAACTTCATTCCCACGGTCACTGCCATCTCGACCAGT
CCGGACCTGCAGTGGCTGGTGCAGCCCGCCCTCGTCTCCTCCGTGGCCCCATCGCAGACCAGAGCCCCTC
ACCCTTTCGGAGTCCCCACCCCCTCCGCTGGGGCTTACTCCAGGGCTGGCGTTGTGAAGACCATGACAGG
AGGCCGAGCGCAGAGCATTGGCAGGAGGGGCAAGGTGGAACAGTTATCTCCAGAAGAAGAAGAGAAAAGG
AGAATCCGAAGGGAAAGGAATAAGATGGCTGCAGCCAAATGCCGCAACCGGAGGAGGGAGCTGACTGATA
CACTCCAAGCGGAGACAGACCAACTAGAAGATGAGAAGTCTGCTTTGCAGACCGAGATTGCCAACCTGCT
GAAGGAGAAGGAAAAACTAGAGTTCATCCTGGCAGCTCACCGACCTGCCTGCAAGATCCCTGATGACCTG
GGCTTCCCAGAAGAGATGTCTGTGGCTTCCCTTGATCTGACTGGGGGCCTGCCAGAGGTTGCCACCCCGG
AGTCTGAAGAGGCCTTCACCCTGCCTCTCCTCAATGACCCTGAGCCCAAGCCCTCAGTGGAACCTGTCAA
GAGCATTAGCAGCATGGAGCTGAAGACCGAGCCCTTTGATGACTTCCTGTTCCCAGCATCATCCAGGCCC
AGTGGCTCTGAGACAGCCCGCTCCGTGCCAGACATGGACCTATCTGGGTCCTTCTATGCAGCAGACTGGG
AGCCTCTGCACAGTGGCTCCCTGGGGATGGGGCCCATGGCCACAGAGCTGGAGCCCCTGTGCACTCCGGT
GGTCACCTGTACTCCCAGCTGCACTGCTTACACGTCTTCCTTCGTCTTCACCTACCCCGAGGCTGACTCC
TTCCCCAGCTGTGCAGCTGCCCACCGCAAGGGCAGCAGCAGCAATGAGCCTTCCTCTGACTCGCTCAGCT
CACCCACGCTGCTGGCCCTGTGA