The reputation of the group is based on the work done in the 1980's on developing new algorithms for the following problems: edit distance, approximate string matching, DNA sequence assembly, and shortest common superstring. In the 1990's, various sublinear approximate string-matching algorithms and new measures for string similarity have been developed. For the important problem of how a text should be preprocessed to speed-up subsequent approximate string searches, several solutions have been discovered based on the suffix-tree of the text or the so-called q-grams. For constructing suffix-trees, a natural on-line algorithm has been developed. The two-dimensional string matching problem has also been studied. We were able to obtain the first optimal expected time solutions using simple, practical algorithms.
The current research topics include feature-based algorithms for approximate string matching, two and higher dimensional string matching with applications in computational biology (modeling of viruses, protein folding), sequence databases with applications in bioinformatics, the indexing and clustering of strings and documents, and the search by content in image and music databases.
Current members of the group are Prof. Esko Ukkonen (group leader), Doc. Jorma Tarhio, M.Sc. Kimmo Fredriksson, M.Sc. Raul Hakli, M.Sc. Juha Kärkkäinen, Teemu Kivioja, M.Sc. Kai Korpimies, M.Sc. Kjell Lemström, Dr. Matti Nykänen, M.Sc. Janne Ravantti, Ph.Lic. Erkki Sutinen, and M.Sc. Hellis Tamm. The group gets support from the Academy of Finland.
Publications: [84, 88, 90, 92-95, 114-116, 136, 137, 140, 141]. Home Page: http://www.cs.helsinki.fi/research/pmdm/cpm/