From Data to Knowledge - FDK
The From Data to Knowledge research unit (FDK) develops computational methods for discovering useful knowledge from large masses of data. The unit is multidisciplinary: its research groups combine expertise in algorithmics, statistical methods and application fields such as bioinformatics and human language processing. The unit was selected as one of the Academy of Finland's Centres of Excellence for the six-year period starting on 1 January 2002, and re-selected in its renewed form for the next six-year period starting on 1 January 2008.
The FDK unit is shared by the University of Helsinki and Helsinki University of Technology. Most of its operations are located at the Department of Computer Science at the University of Helsinki. Professor Esko Ukkonen is the director of the unit, and Professors Helena Ahonen-Myka, Jaakko Hollmen (HUT), Heikki Mannila (Basic Research Unit/HIIT, Academy Professor) and Hannu Toivonen are its members. In 2006, the unit's personnel consisted of about sixty researchers and postgraduate students.
The core competence area of the unit is algorithmics for data analysis. Its researchers include leading experts on combinatorial pattern matching and string algorithms as well as machine learning and data mining. In its activities, the unit emphasizes the interaction between theory development and practical applications. The goal is to find research problems whose conceptual basis and solution algorithms have wider application potential.
The unit's operations are divided into several interconnected main topics, and the same people work on different projects.
The first main topic is data mining and machine learning. The project develops original methods and concepts to strengthen a core area of the unit. Its goal is to produce theoretical basic-research results that can be used in different applications. Text databases and document collections as well as molecular biology sequences are examples of the real data we use. The area of this project also covers information filtering from the Internet and other human language technology, as well as the use of machine learning in image analysis.
The second main theme applies the methods of the first theme to bioinformatics. It studies methods for medical genetics and for analysing genomics, proteomics and metabolism data. Partners include the European Bioinformatics Institute and several top national research groups. The project develops computational methods for modelling various gene-regulation and metabolic networks on the basis of measurement data. The latest research focuses on such areas as haplotypes, mapping the overall architecture of genomes, managing gene expression data and constructing metabolic models. A new method for computing metabolic fluxes was developed. In collaboration with cancer researchers, the project continued analysing the synergy of gene regulation and mutations.
Combinatorial pattern-matching and information retrieval are among the focal areas of the unit. The main research questions include approximate pattern matching, efficient index structures, and the synthesis of patterns from data. The group continues to build a program library of string algorithms and to pursue applied research on music-information retrieval. Several new efficient search algorithms were developed for patterns given as scoring matrices, and the computational complexity of synthesising string motifs was resolved. A search method for XML documents was developed.
In addition to basic research and doctoral education, the FDK unit also wants to serve as an algorithm 'atelier' that develops computational solutions to new problems in different fields. The unit is always in search of new partners who could pose computational problems at the cutting edge of research.
During 2006, four PhD theses were completed in the unit.
Contact person: Professor Esko Ukkonen
Website: http://www.cs.helsinki.fi/research/fdk/
Projects
Data mining and algorithmic machine learning:
Information extraction
Paleoecological data analysis
APRIL II (EU)
PASCAL (EU NoE)
Computational biology and bioinformatics:
Computational methods for analysing genome structure and function in mammals
Finding predisposition genes in case-control material
A global molecular approach in the study of microbial stress
Yeast systems biology - Integrated analysis of metabolism-related data
BIOSAPIENS (EU NoE)
REGULATORY GENOMICS (EU)
Combinatorial pattern-matching and information retrieval:
C-BRAHMS - music information retrieval
GLAS - Generic software library of algorithms on strings
Computational structural biology
Structure, assembly and dynamics of biological macromolecular complexes
Information about research projects: http://www.cs.helsinki.fi/year2006/research/
Publications
Gionis, A. & Mannila, H. & Mielikäinen, T. & Tsaparas, P.: Assessing data mining results via swap randomization. International Conference on Knowledge Discovery and Data Mining: KDD-2006. - New York, NY : ACM Press 2006. p. 167-176.
Hallikas, O. & Palin, K. & Sinjushina, N. & Rautiainen, R. & Partanen, J. & Ukkonen, E. & Taipale, J.: Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity. Cell. - Cambridge, MA : Cell Press. 124 (2006) : 1, p. 47-59.
Koivisto, M.: An O*(2^n) algorithm for graph coloring and other partitioning problems via inclusion-exclusion. Symposium on Foundations of Computer Science: 47th Annual IEEE Symposium on Foundations of Computer Science. - Los Alamitos, CA : IEEE Computer Society cop. 2006. p. 583-590.
Rousu, J. & Saunders, C. & Szedmak, S. & Shawe-Taylor, J.: Kernel-based learning of hierarchical multilabel classification models. Journal of machine learning research. - Cambridge (MA) : MIT Press. 7 (2006), p. 1601-1626.
Sevon, P. & Toivonen, H. & Ollikainen, V.: TreeDT : tree pattern mining for gene mapping. IEEE/ACM transactions on computational biology and bioinformatics. - New York (NY) : IEEE. 3 (2006) : 2, p. 174-185.