Introduction to bioinformatics, Autumn 2008
Exercise sessions:
Remember to send your exercise notes to Lauri Eronen before your exercise session begins!
Perform global alignment of the sequences
s = AGCTGCGTACT and t = ATGAGCGTTA with mismatch penalty -1, indel penalty -1 and uniform match score 1 by constructing the dynamic programming matrix. What is the optimal alignment (or alignments, if there are several) and the corresponding score?
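A hand-built matrix can be checked against a short program. The following is a minimal Python sketch of the matrix fill only (no traceback), assuming the linear indel penalty used in this exercise. The function name global_score_matrix and the toy sequences are illustrative and not part of the course material.

```python
def global_score_matrix(s, t, match=1, mismatch=-1, indel=-1):
    """Fill the global alignment matrix F, where F[i][j] is the best
    score for aligning the prefixes s[:i] and t[:j]."""
    rows, cols = len(s) + 1, len(t) + 1
    F = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned against nothing costs only indels.
    for i in range(1, rows):
        F[i][0] = i * indel
    for j in range(1, cols):
        F[0][j] = j * indel
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            up = F[i - 1][j] + indel      # s[i-1] aligned against a gap
            left = F[i][j - 1] + indel    # t[j-1] aligned against a gap
            F[i][j] = max(diag, up, left)
    return F


# Toy input, deliberately not the exercise sequences.
F = global_score_matrix("AGT", "AT")
for row in F:
    print(row)
print("optimal global score:", F[-1][-1])
```

The bottom-right cell holds the optimal score. Recovering the alignments themselves requires tracing back from that cell through every predecessor that attains the maximum.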
Perform local alignment of the sequences
s = CTCTATGACTCGCAGTGA and t = GGCGTAGTATACAGAGC with mismatch penalty -1, indel penalty -2 and uniform match score 1 by constructing the dynamic programming matrix. What is the optimal alignment (or alignments, if there are several) and the corresponding score?
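The local version differs from the global one in two ways: no cell may drop below zero, and the best score is the maximum over the whole matrix rather than the bottom-right cell. A minimal Python sketch of that recurrence follows, again assuming a linear indel penalty. The name local_best_score and the toy sequences are illustrative.

```python
def local_best_score(s, t, match=1, mismatch=-1, indel=-2):
    """Return (best score, (i, j)) for local alignment, where (i, j) is the
    first matrix cell, in row-major order, that attains the best score."""
    rows, cols = len(s) + 1, len(t) + 1
    H = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best, best_cell = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # The extra 0 lets an alignment start afresh at any cell.
            H[i][j] = max(0, diag, H[i - 1][j] + indel, H[i][j - 1] + indel)
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)
    return best, best_cell


# Toy input, deliberately not the exercise sequences.
score, cell = local_best_score("TTACGTT", "GGACGGG")
print(score, cell)
```

Only the first maximal cell is reported here. If several cells share the maximum, each one is the end of a separate optimal local alignment, and the traceback from a cell stops when it reaches a zero.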
For the sequences I = GCATCGGC and J = CCATCGCCATCG, find matching 4-words shared by I and J. Do this by making tables LW(I) and LW(J).
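The word tables can be sketched in Python as follows. This assumes that the table for a sequence lists, for each k-word, the positions at which the word starts, and that positions are counted from 1. Both are assumptions about the course notation. The function names word_positions and shared_words are illustrative.

```python
from collections import defaultdict


def word_positions(seq, k):
    """Map each k-word in seq to the list of its 1-based start positions."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i + 1)
    return dict(table)


def shared_words(a, b, k):
    """Return {word: (positions in a, positions in b)} for words in both."""
    table_a, table_b = word_positions(a, k), word_positions(b, k)
    return {w: (table_a[w], table_b[w]) for w in table_a if w in table_b}


# Toy input with 3-words, deliberately not the exercise sequences.
print(word_positions("ACGTAC", 3))
print(shared_words("ACGTAC", "TACG", 3))
```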
For I = GCTGCTATGCTTGGC and J = CGCGGCTATG, make a 2-word list for J. Compute the diagonal common word sums for I and J using the algorithm presented in the lectures and in the course book.
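One common formulation of the diagonal sums is: for every position i in I, look up the word starting at i in the word list of J, and for every position j where it occurs in J, add one to the counter of diagonal i - j. The Python sketch below assumes this formulation, so check it against the version given in the lectures. The name diagonal_sums is illustrative.

```python
from collections import Counter, defaultdict


def diagonal_sums(I, J, k=2):
    """Count common k-words of I and J on each diagonal d = i - j."""
    # Word list for J: k-word -> start positions in J.
    positions_in_J = defaultdict(list)
    for j in range(len(J) - k + 1):
        positions_in_J[J[j:j + k]].append(j)
    sums = Counter()
    for i in range(len(I) - k + 1):
        for j in positions_in_J.get(I[i:i + k], []):
            sums[i - j] += 1
    return sums


# Toy input, deliberately not the exercise sequences.
for d, count in sorted(diagonal_sums("ACGT", "CGT").items()):
    print("diagonal", d, "sum", count)
```

The difference i - j is the same whether positions are counted from 0 or from 1, so the diagonal labels do not depend on the indexing convention.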
Run the FASTA-Nucleotide tool at EBI (Tools -> Similarity & Homology -> FASTA) against the EMBL Coding Sequence database using this sequence as the query sequence. Choose "interactive" for the Results parameter. Otherwise, use the default parameters.
Explain the contents of the result page in your own words. How many matches did you get? How similar were the best matches to the query sequence? How long did the query take?
Note: this assignment gives you two marks.
Write a program implementing the Needleman-Wunsch global alignment algorithm capable of reporting the optimal global alignment score and corresponding alignment.
Test your program with two sequences (first, second) varying parameter values for mismatch and indel penalty while keeping match score constant. For example, use values -20,-10,-5,0 for both penalties, and 10 for match score.
Report the number of matches, mismatches and indels in the optimal alignment for each parameter combination. What conclusions can you draw about the effects of different parameter values on the alignment result?
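Counting the column types is easiest if the program outputs the alignment as two gapped strings of equal length. The helper below is a small Python sketch under that assumption, with "-" as the gap symbol. The name count_columns and the output format are illustrative choices.

```python
def count_columns(aligned_s, aligned_t, gap="-"):
    """Count matches, mismatches and indels in a pairwise alignment given
    as two gapped strings of equal length."""
    if len(aligned_s) != len(aligned_t):
        raise ValueError("aligned strings must have the same length")
    counts = {"matches": 0, "mismatches": 0, "indels": 0}
    for a, b in zip(aligned_s, aligned_t):
        if a == gap or b == gap:
            counts["indels"] += 1
        elif a == b:
            counts["matches"] += 1
        else:
            counts["mismatches"] += 1
    return counts


# Toy alignment, not derived from the test sequences.
print(count_columns("AGT-C", "A-TGG"))
```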