Annual Report 2005

Research projects

Bioinformatics and computational biology

Advanced genomics instruments, technology and methods for determination of transcription factor binding specificities; applications for identification of genes predisposing to colorectal cancer. (REGULATORY GENOMICS)

Period: 9/2004-9/2008

Researchers: Kimmo Palin, Cinzia Pizzi, Esko Ukkonen (with 6 other groups in 4 European countries)

Funding: EU

Determination of the sequence of the human genome and of the genetic code has allowed rapid progress in the identification of mammalian genes. However, less is known about gene expression and the molecular mechanisms that regulate its variation. This is largely due to a lack of information about the 'second genetic code': the binding specificities of transcription factors. The project aims to develop new genomic tools and methods for determining these binding specificities. The tools will be used to identify regulatory SNPs that predispose to colorectal cancer, and to characterise downstream target genes that are common to multiple oncogenic transcription factors.

The project has developed the Enhancer Element Locator (EEL) software, which has been used to identify the target genes of transcription factors across the whole genome. The predictions of the programme have been clinically tested and published in a prestigious biology journal (CELL 13.1.06).

A European Network of Genome Annotation (BIOSAPIENS)

Period: 1/2004-12/2008

Researchers: Kimmo Palin, Juha Kärkkäinen, Esa Pitkänen, Pasi Rastas, Esko Ukkonen (a total of 21 institutions across Europe)

Funding: EU

The aim of the European network of genome annotation BIOSAPIENS is to annotate active areas in the human genome. The areas to be annotated will be located both through experiments and by computational methods. A total of 21 independent institutions around Europe are partners in this network of excellence, and one of its most important goals is to coordinate the research carried out at different laboratories in order to make more efficient use of research resources in Europe. The network arranges small workshops and courses under the name “European School of Bioinformatics.” The genome annotations created by the network are publicly available through a distributed annotation system (DAS).

The project published gene-expression enhancer elements for the whole human genome. The elements are distributed via an annotation server at the department. The partners in the department’s workshop met for a mini-conference in Berlin in March.

Yeast systems biology (YEASTSYS)

Period: 1/2004-12/2005

Researchers: Esa Pitkänen, Pekko Parikka, Markus Heinonen, Arto Åkerlund, Ari Rantanen, Esko Ukkonen

Funding: Tekes

YEASTSYS is a cooperation between the Department of Computer Science at the University of Helsinki, VTT and several business partners. Tekes funds the project as a part of the NeoBio research programme. The YEASTSYS project aims to develop the computational methods for modelling cell metabolism, previously developed by the research group, into a commercial web application.

In 2005, the project developed a program for detecting modifications to cell metabolism that increase the production of certain metabolic products. In addition, the project developed a biochemical glossary to be used when creating a metabolism model, in order to resolve inconsistencies. Both programs were implemented as parts of a web portal that offers them a common user interface.

Experimental and computational analysis of physiological regulation (SYSFYS)

Period: 1/2004-12/2007

Researchers: Juho Rousu, Esko Ukkonen, Ari Rantanen, Esa Pitkänen, Markus Heinonen

Funding: The Academy of Finland, SYSBIO Research Programme

The Department of Computer Science at the University of Helsinki, the Institute of Biotechnology and VTT cooperate to form the SYSFYS research consortium with the aim of developing and implementing advanced experimental and computational methods.

During 2005, the partners at the Department of Computer Science continued their work on modelling cell metabolism. A computational method for estimating the rates of metabolic fluxes was developed; it makes more effective use than before of isotope-labelling data, which is expensive and hard to produce [1]. In addition, an automatic experiment-design method was developed for targeting measurements at the metabolic fragments that are most informative from a computational point of view [2]. To make isotope-labelling data easier to produce, the project continued to improve the automation of identification and analysis methods for the metabolic fragments produced by tandem mass spectrometers [3]. Work also continued on the analysis of metabolic network structures using distance measures developed during the project, with improved recognition of special metabolic features [4]. A new line of research in the project was the development of core-function-based similarity measures for enzymes. In future, the project will try to use these measures in the reconstruction of metabolic networks and the prediction of enzyme function.

Infectious origins of the human genome. Human endogenous retroviruses in health and disease

Period: 1/2003-12/2006

Researchers: Merja Oja, Samuel Kaski

Funding: Academy of Finland

The project studies the interaction between viral parasites, symbionts and human hosts. Transposons have inserted themselves into animal and plant genomes, and most of them are retrovirus sequences. The results of the project will be important both in the long run, for understanding the future of man, and in the short run, for understanding disease mechanisms and possibly for developing gene therapies. The project surveys and develops bioinformatics methods for finding retrovirus sequences in the human genome and characterising them. The results will be connected to sequence expression data. The project aims to find virus sequences with the help of machine-learning methods and to characterise sequence features with undirected data-analysis and data-mining methods. A database of the retrotransposons of the human genome will be created, and it will be merged with data on transposons in different tissues.

One of the first steps in understanding HERV function is to classify HERVs into families. We have studied the relationships of existing HERV families and tried to detect potentially new HERV families. A Median Self-Organizing Map (SOM), a SOM for non-vectorial data, was used to group and visualize a collection of 3661 HERV protein sequences.

The SOM-based analysis was complemented with estimates of the reliability of the results. A novel trustworthiness visualization method was used to estimate which parts of the SOM visualization are reliable and which are not. The reliability of the interesting HERV groups that were extracted was verified by a bootstrap procedure suitable for SOM visualization-based analysis. The SOM detected a completely new group of epsilonretroviral sequences and was able to shed light on the relationships of three pre-existing HERV families. The SOM also detected a group of ERV9, HERVW, and HUERSP3 sequences, which suggests that ERV9 and HERVW sequences may have a common origin.
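
The key difference between a Median SOM and an ordinary SOM is that each map unit is represented by one of the data items rather than by an averaged vector, so only pairwise distances between sequences are needed. The following is a minimal sketch of that single step, choosing the set median of a group of sequences. It is not the project's implementation: the plain edit distance is a stand-in for a proper protein-sequence similarity measure, the toy sequences are invented, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (unit costs).

    Assumption: used here only as a simple stand-in for an
    alignment-based distance between protein sequences.
    """
    previous_row = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current_row = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous_row[j - 1] + (char_a != char_b)
            insertion = current_row[j - 1] + 1
            deletion = previous_row[j] + 1
            current_row.append(min(substitution, insertion, deletion))
        previous_row = current_row
    return previous_row[-1]


def set_median(sequences, distance=edit_distance):
    """Return the member whose summed distance to all members is smallest."""
    if not sequences:
        raise ValueError("sequences must not be empty")
    return min(
        sequences,
        key=lambda candidate: sum(distance(candidate, other) for other in sequences),
    )


# Invented toy sequences, for illustration only.
toy_sequences = ["MKVLA", "MKVLG", "MRVLA", "QQQQQ"]
prototype = set_median(toy_sequences)
print(prototype)
```

In a full Median SOM this selection would be repeated for every map unit over the sequences assigned to that unit and its neighbours; the sketch shows only why no vector representation of the sequences is required.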

Gene mapping and diagnostics: computational tools for new high-throughput laboratory technologies (Altti)

Period: 3/2003-2/2005

Researchers: Hannu Toivonen, Petteri Sevon, Petteri Hintsanen, Lauri Eronen, Kimmo Kulovesi

Funding: Tekes, GeneOS, Jurilab, Cyberell

Laboratory methods in biotechnology are developing quickly. With the help of the new techniques, large bodies of genetic data can be produced, e.g. from case-control studies for epidemiological purposes. The project develops new computational methods for analysing such large data collections. The methods developed will make genetic analysis in laboratories easier and more effective.

The goal is to be able to haplotype case-control data and locate disease associations (predisposing genes) in this data. The project also developed methods and tools for population simulation, for making extensive comparisons between different gene-mapping studies and methods.

Mining of biological databases (Biomine)

Period: 3/2005-12/2007

Researchers: Hannu Toivonen, Petteri Sevon, Petteri Hintsanen, Lauri Eronen, Kimmo Kulovesi

Funding: Tekes, Jurilab, Biocomputing Platforms, GeneOS (other partners are the Institute of Biomedicine at the University of Helsinki, Karolinska Institutet in Stockholm, VTT Biotechnology and CSC)

The project develops methods and tools for the analysis of public biodatabases (sequences, proteins, interactions, articles, etc.). With their help, bioscientists can enrich their own data, discover previously unknown connections and analogies to public databases, and direct resources to the most promising objects of further study. The main application focus is on further analysis of candidate genes found in gene mapping.

The project has studied the presentation of biological information as a graph, where the nodes represent different concepts (e.g. genes, proteins, tissue, phenotypes, parts of the cell) and the edges represent the relations between them (e.g. the connection between gene and biological process reported in a gene database). The project has developed methods for the analysis of such graphs and the automatic searching and visualisation of such relations between the concepts.
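
As a rough illustration of the graph representation described above, the sketch below stores concepts as nodes and labelled relations as edges, and searches for the shortest chain of relations between two concepts. It is only a sketch under stated assumptions: the node names, relation labels and function names are hypothetical, and breadth-first search is used here as the simplest possible search, not as the method developed in the project.

```python
from collections import deque

# Hypothetical toy data: (source concept, relation, target concept).
EDGES = [
    ("gene:G1", "codes_for", "protein:P1"),
    ("protein:P1", "interacts_with", "protein:P2"),
    ("protein:P2", "participates_in", "process:apoptosis"),
    ("gene:G2", "codes_for", "protein:P2"),
    ("process:apoptosis", "associated_with", "phenotype:tumour"),
]


def build_graph(edges):
    """Build an undirected adjacency list with relation labels on the edges."""
    graph = {}
    for source, relation, target in edges:
        graph.setdefault(source, []).append((relation, target))
        graph.setdefault(target, []).append((relation, source))
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns [node, relation, node, ...] or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # node -> (parent node, relation used) or None
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for relation, neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = (node, relation)
                queue.append(neighbour)
    if goal not in previous:
        return None
    # Walk back from the goal to the start, then reverse.
    path = [goal]
    node = goal
    while previous[node] is not None:
        parent, relation = previous[node]
        path.extend([relation, parent])
        node = parent
    path.reverse()
    return path


graph = build_graph(EDGES)
print(" -> ".join(shortest_path(graph, "gene:G1", "phenotype:tumour")))
```

A query of this kind corresponds to the application mentioned earlier: starting from a candidate gene found in gene mapping and looking for reported connections to a phenotype of interest.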
