University of Helsinki Department of Computer Science
 

Annual report 2006

Research projects

Bioinformatics and computational biology

Infections and the origins of the human genome. Infectious origins of the human genome. Human endogenous retroviruses in health and disease (MICMAN)

Period: 1/2003 - 12/2006
Researchers: Merja Oja, Jaakko Peltonen, Samuel Kaski
Funding: The Academy of Finland

The project studies the interaction between viral parasites, symbionts and their human hosts. Transposons have inserted themselves into animal and plant genomes, and most of them are retrovirus sequences. The results of the project will be important both in the long run, for understanding the future of humankind, and in the short term, for understanding disease mechanisms and possibly for developing gene therapies.

The project surveys and develops bioinformatics methods for finding retrovirus sequences in the human genome and characterising them. The results will be linked to sequence expression data. The project aims to find virus sequences with machine-learning methods and to characterise sequence features with unsupervised data-analysis and data-mining methods. A database of the retrotransposons of the human genome will be created and merged with data on transposons in different tissues.

Human endogenous retroviruses, a type of human transposon, have been discovered in both healthy and diseased tissue, but it is not yet known which retroviruses are activated. To address this problem, we have developed a statistical mixture density model based on hidden Markov models; it can mine the activity levels of individual retroviruses (retrovirus sequences) from databases listing active sequences. We have also shown empirically that a simpler, faster heuristic method estimates the activity levels of retroviruses well enough. We used this fast method to investigate the activity levels of 2,450 human endogenous retroviruses; most of these activity levels were previously unknown. The results show that some 7% of human endogenous retroviruses are active, and that the active ones are endogenous retroviruses of mixed types. Tests on simulated data also show that our methods estimate the activities very accurately.
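The report does not describe the fast heuristic in detail. As a rough illustration only, the sketch below assumes one plausible form of such a heuristic: each expressed sequence is assigned to the single retrovirus it matches best, and the activity of a retrovirus is its share of the assigned sequences. The function names, the k-mer overlap score used as a stand-in for a real sequence aligner, and the toy sequences are all hypothetical and are not taken from the project.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def estimate_activities(herv_seqs, expressed_seqs, k=4):
    """Best-match counting (an assumed heuristic, not the project's code).

    herv_seqs: dict mapping a retrovirus name to its sequence.
    expressed_seqs: list of expressed sequence fragments.
    Returns a dict mapping each name to its share of the assigned fragments.
    """
    herv_kmers = {name: kmers(seq, k) for name, seq in herv_seqs.items()}
    counts = Counter()
    for read in expressed_seqs:
        read_kmers = kmers(read, k)
        best_name, best_score = None, 0
        # Sorted iteration makes tie-breaking deterministic.
        for name in sorted(herv_kmers):
            score = len(read_kmers & herv_kmers[name])
            if score > best_score:
                best_name, best_score = name, score
        # Fragments that match no retrovirus at all are left unassigned.
        if best_name is not None:
            counts[best_name] += 1
    total = sum(counts.values())
    return {name: (counts[name] / total if total else 0.0)
            for name in herv_seqs}


# Toy data, invented for illustration.
hervs = {"HERV_A": "ACGTACGTTTGACC", "HERV_B": "GGCCTTAAGGCCAA"}
reads = ["ACGTACG", "GTTTGAC", "TTAAGGC", "CCCCCCC"]
activities = estimate_activities(hervs, reads)
```

Unlike a mixture model, this kind of hard assignment ignores fragments that match several retroviruses almost equally well, which is presumably why its adequacy had to be checked empirically.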

 

Systemic models for metabolic dynamics and regulation of gene expression

Period: 1/2004 - 4/2006
Researchers: Janne Nikkilä, Antti Ajanki, Janne Sinkkonen, Tapio Rinnet, Samuel Kaski
Funding: Tekes

The project develops new computational methods for modelling gene-regulatory networks, applies them to the systems biology of yeast, and integrates them into analysis tools. In particular, the project models the role of the stress response in gene-deletion mutation experiments and the effect of yeast regulatory proteins on the stress response.

The project has developed a new method for estimating how gene regulation changes under different conditions. The method is based on Bayesian networks and on a variant of them, developed during the project, that focuses on dependencies that change between conditions. The method has been applied to estimating gene regulation in yeast during stress reactions. It also makes it possible to combine data on regulatory factors with yeast gene expression data, and it has given rise to new hypotheses on gene-regulation interactions during stress reactions. The software implementation is due to be published in 2007.

In addition, the project has used a new, computationally very simple method, developed by the group, for finding the features shared by several data collections. The method has been applied to separating the stress response of yeast from other yeast activity, which is crucial when trying to understand, for example, the effect of gene-deletion experiments on yeast metabolism. The method is based on canonical correlation analysis and is general-purpose, so it is expected to find other application areas as well. The project has developed a software implementation that can be integrated into a cooperation partner's software platform.
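To make the idea of canonical correlation analysis concrete, here is a minimal sketch of textbook CCA, not the group's method or software. It computes the first canonical correlation between two data collections measured on the same samples, that is, the highest correlation achievable between a linear combination of the features of one collection and a linear combination of the features of the other. To stay within the Python standard library, the sketch assumes exactly two features per collection; all names and the toy data are hypothetical.

```python
import math


def _cov(a_cols, b_cols):
    """Sample cross-covariance matrix between two lists of feature columns."""
    n = len(a_cols[0])

    def centered(col):
        mean = sum(col) / n
        return [v - mean for v in col]

    a = [centered(c) for c in a_cols]
    b = [centered(c) for c in b_cols]
    return [[sum(p * q for p, q in zip(ac, bc)) / (n - 1) for bc in b]
            for ac in a]


def _inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular covariance matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def _mul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def first_canonical_correlation(x_cols, y_cols):
    """First canonical correlation between two views with two features each."""
    cxx = _cov(x_cols, x_cols)
    cyy = _cov(y_cols, y_cols)
    cxy = _cov(x_cols, y_cols)
    cyx = [[cxy[j][i] for j in range(2)] for i in range(2)]  # transpose
    # Squared canonical correlations are the eigenvalues of
    # Cxx^-1 Cxy Cyy^-1 Cyx; for a 2x2 matrix they have a closed form.
    m = _mul2(_mul2(_inv2(cxx), cxy), _mul2(_inv2(cyy), cyx))
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = max(trace * trace - 4.0 * det, 0.0)
    largest = (trace + math.sqrt(disc)) / 2.0
    # Clamp to [0, 1] to absorb floating-point rounding.
    return math.sqrt(min(max(largest, 0.0), 1.0))


# Toy data: both views contain the same hidden signal z plus an unrelated feature.
z = [0, 1, 2, 3, 4, 5]
view_x = [z, [1, 0, 1, 0, 1, 0]]
view_y = [[v * v for v in z], [2 * v + 1 for v in z]]
rho = first_canonical_correlation(view_x, view_y)
```

In the toy data the signal `z` appears in both views, once directly and once as `2z + 1`, so the first canonical correlation is 1 up to rounding. In an application like the one described above, the shared signal would play the role of the response common to the data collections, and view-specific variation would be left out of the leading components.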

Learning methods for bioinformatics

Period: 1/2005 - 12/2007
Researchers: Abhishek Tripathi, Jaakko Peltonen, Antti Ajanki, Samuel Kaski
Funding: University of Helsinki research funding

The project develops methods for combining different biological measurement data and for using background information to analyse new measurement data. In many areas of biological research, measurements can be made on only a small number of samples, but with microarray techniques, hundreds of thousands of values can be extracted from a single sample. Analysing this kind of data is challenging and requires in-depth knowledge of biology.

The project develops new computational methods for using pre-existing measurements in the analysis, or for creating new background data efficiently and automatically from large collections of measurements. So far, the project has developed solutions for three parts of this problem area. A fast linear pre-processing method has been developed for data fusion; it retains the information shared by different data sets and discards the information specific to a single data set. We have also developed a fast method for finding components that describe a classification (discriminative components). Furthermore, we have developed a method based on Bayesian networks for analysing gene-regulatory networks; it is used to discover regulatory interactions that change between conditions.

 

Modelling functional shifts in enzyme evolution (UR-ENZYMES)

Period: 1/2006-12/2008
Researchers: Juho Rousu, Katja Astikainen, Esa Pitkänen, Liisa Holm (Institute of Biotechnology)

Funding: The Academy of Finland

UR-ENZYMES is a multi-disciplinary project that combines machine learning with genomics in order to explain molecular evolution. The project creates new algorithms for mining genomic data, for comparative genomics and for the reconstruction of metabolic fluxes. At the core of the project is the representation of enzymatic reactions in a form that allows shifts in enzyme-gene function during biological evolution to be traced. In 2006, the project focused on developing descriptions of enzyme sequences and chemical reactions as well as machine-learning methods that utilise them.

Experimental and computational analysis of physiological regulation at transcriptome, proteome and metabolome level (SYSFYS)

Period: 1/2004-12/2007
Researchers: Juho Rousu, Esko Ukkonen, Ari Rantanen, Paula Jouhten, Esa Pitkänen
Funding: The Academy of Finland (the SYSBIO programme)

The Department of Computer Science at the University of Helsinki, the Institute of Biotechnology and VTT cooperate to form the SYSFYS research consortium with the aim of developing and implementing advanced experimental and computational methods for the analysis of metabolic fluxes in cells.

The landmark event in 2006 was the doctoral thesis of Ari Rantanen, which summarised research results on metabolic flux estimation. In addition, the project studied new, efficient reconstruction algorithms for metabolic networks that ensure the integrity of the networks, and further developed the analysis of mass spectrometry data from metabolomics research, especially the prediction of molecular fragmentation from tandem mass spectrometry.

Yeast systems biology (YEASTSYS)

Period: 1/2006-7/2006

Researchers: Esa Pitkänen, Pekko Parikka, Markus Heinonen, Arto Åkerlund, Ari Rantanen, Esko Ukkonen

Funding: Tekes

YEASTSYS is a cooperation between the Department of Computer Science at the University of Helsinki, VTT and several business partners. Tekes funds the project as a part of the NeoBio research programme. The YEASTSYS project aims to turn the computational methods for modelling cell metabolism, previously developed by the research group, into web applications.

In 2006, the project developed a program for visualising metabolic network models. The program was integrated into a web portal that had been developed in the previous project. The portal offers a unified user interface to the web applications.

Advanced genomics instruments, technology and methods for determination of transcription factor binding specificities; applications for identification of genes predisposing to colorectal cancer (REGULATORY GENOMICS)

Period: 9/2004-9/2008

Researchers: Kimmo Palin, Cinzia Pizzi, Esko Ukkonen (with 6 other groups from 4 European countries)
Funding: EU

The sequencing of the human genome and determination of the genetic code have allowed rapid progress in the identification of mammalian genes. However, less is known about gene expression and the molecular mechanisms that regulate its variation. This is largely due to a lack of information about the 'second genetic code', the binding specificities of transcription factors. The project aims at developing new genomic tools and methods for determining binding specificities. These tools will be used for identifying regulatory SNPs that predispose to colorectal cancer, and for characterising downstream target genes that are common to multiple oncogenic transcription factors.

The Enhanced Element Locator (EEL) software developed by the project has attracted much international interest. Interesting SNPs have been found in the regulatory predictions given by the program. Lab experiments on how the SNPs affect gene expression are being carried out.

A European Network of Genome Annotation (BIOSAPIENS)

Period: 1/2004-12/2008

Researchers: Juha Kärkkäinen, Kimmo Palin, Esa Pitkänen, Pasi Rastas, Esko Ukkonen (total 21 institutes in Europe)

Funding: EU

The aim of the European network of genome annotation BIOSAPIENS is to annotate active areas in the human genome. The areas to be annotated will be located both through experiments and by computational methods. A total of 21 independent institutions around Europe are partners in this network of excellence, and one of its most important goals is to coordinate the research carried out at different laboratories in order to make more efficient use of research resources in Europe. The network arranges small workshops and courses under the name 'European School of Bioinformatics'. The genome annotations created by the network are public and available through a distributed annotation system (DAS).

The project developed an efficient filtering method for string similarity searches, which makes searches in large sequence databases much faster. The project also maintained the data in the annotation server installed at the department. The international partners of the department's group met for a mini-conference in England in November.
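The report does not say which filter the project developed. As a general illustration of what filtering means in similarity search, the sketch below uses a standard textbook technique, q-gram counting, as a stand-in. It relies on the q-gram lemma: two strings within edit distance k of each other share at least max(len(a), len(b)) - q + 1 - k*q substrings of length q. Candidates that share fewer q-grams are discarded cheaply, and only the survivors are checked with the expensive edit-distance computation. The function names and toy sequences are hypothetical.

```python
from collections import Counter


def qgrams(s, q):
    """Return the multiset of all length-q substrings of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(a, b):
    """Levenshtein distance by row-wise dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def filtered_search(query, database, k, q=2):
    """Find database strings within edit distance k of query.

    Returns (hits, verified), where verified is the number of candidates
    that passed the q-gram filter and needed a full edit-distance check.
    """
    query_grams = qgrams(query, q)
    hits, verified = [], 0
    for candidate in database:
        # q-gram lemma: lower bound on shared q-grams for any true match.
        threshold = max(len(query), len(candidate)) - q + 1 - k * q
        shared = sum((query_grams & qgrams(candidate, q)).values())
        if shared < threshold:
            continue  # cannot be within distance k, skip verification
        verified += 1
        if edit_distance(query, candidate) <= k:
            hits.append(candidate)
    return hits, verified


# Toy database, invented for illustration.
db = ["GATTACA", "GATTTACA", "CCCCCCC", "GATTAGA", "TTTTGGG"]
hits, verified = filtered_search("GATTACA", db, k=1, q=2)
```

The filter never discards a true match, so the result is the same as checking every string; the gain comes from skipping the edit-distance computation for most of the database. In practice the q-grams of the database would be indexed in advance rather than recomputed for every query.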