University of Helsinki Department of Computer Science
 

Annual report 2007

From Data to Knowledge - FDK

The From Data to Knowledge (FDK) Centre of Excellence develops data-processing methods for extracting useful information from large bodies of data. The unit is multi-disciplinary: its research groups combine expertise in algorithms, statistical methods, and application fields such as bioinformatics and natural-language processing. The unit was selected as an Academy of Finland Centre of Excellence for the six-year period 2002-2007, and its successor, Algodan (Algorithmic Data Analysis), was selected for the following period, 2008-2013.

FDK is a joint venture between the University of Helsinki and Helsinki University of Technology (TKK). Most of its operations are carried out at the Department of Computer Science at the University of Helsinki and at the Helsinki Institute for Information Technology (HIIT). Professor Esko Ukkonen leads the unit; its other professorial members are Helena Ahonen-Myka, Jaakko Hollmén (TKK), Heikki Mannila (Academy Professor at HIIT-BRU) and Hannu Toivonen. In 2007, some 60 researchers and postgraduate students worked in the unit.

The unit studies algorithmic problems in data analysis. Its internationally recognised areas of expertise are combinatorial pattern recognition and string algorithms, as well as machine learning and data mining. The FDK unit is geared towards the interaction between theory development and practical applications. Its goal is to find computing problems whose conceptual basis and solution algorithms have wider application potential than the research case at hand.

The work of the unit is divided into several inter-connected main themes, and the same researchers work on several projects. The first main theme is data mining and algorithmic machine learning. The goal is to develop concepts and original methods for FDK's core field of expertise. The results contribute to theoretical basic research and are useful in various applications. Examples of the real data the unit uses include text databases, document collections, and molecular biological sequences. Filtering information from the Internet and other forms of language technology fall under this theme, as does the use of machine-learning methods in image analysis.

The second main theme applies the methods of the first theme to bioinformatics. The objective is to develop methods for analysing data from medical genetics, genomics, proteomics and metabolomics. The collaborating partners include the European Bioinformatics Institute as well as several national Centres of Excellence. This theme develops computational methods for constructing gene-regulatory and metabolic networks from measured data. The most recent research topics include haplotypes, the management of gene-expression data, building metabolic models, and palaeo-ecology. We continued to collaborate with cancer researchers in analysing the mutual effects of gene regulation and mutations. The US National Institutes of Health (NIH) became a funder of one of the unit's projects.

Combinatorial pattern matching and information retrieval are core fields of the unit. The main theoretical and algorithmic questions in this theme concern approximate pattern matching, efficient indexes, and learning repeated patterns from data. We obtained several theoretical results on the synthesis of repeated patterns and continued to develop efficient indexing structures for string data.

In addition to its basic research and post-graduate education, the FDK unit strives to act as an 'algorithm atelier' that develops computational solutions to new problems in various application fields. The unit is constantly searching for new partners with computational problems at the cutting edge of their own application fields.

In 2007, four PhD theses were completed under the auspices of the unit.

Contact persons: Professor Esko Ukkonen, Academy Professor Heikki Mannila

Homepage: http://www.cs.helsinki.fi/research/fdk/