Yliopiston etusivulle Suomeksi På svenska In English
Helsingin yliopisto Tietojenkäsittelytieteen laitos
 

Tietojenkäsittelytieteen laitos

Guest Lecture Series: Calendar of Purely Scientific Events at the Department

2006

18 December: Evimaria Terzi's dissertation

University Main Building, Unioninkatu 34, Auditorium XII

Evimaria Terzi will defend her PhD thesis "Problems and Algorithms for Sequence Segmentations" on 18 December 2006 at 12.15.

The opponent is Professor Dimitris Achlioptas, UC Santa Cruz, and kustos is Professor Esko Ukkonen.

Abstract of the thesis

24 November: Guest lecture: Challenges in Peer-to-Peer Content Distribution

by PhD Jussi Kangasharju, Technische Universität Darmstadt

On Friday 24 November at 2pm, room C222.

Abstract: Peer-to-peer content distribution is becoming the de-facto content distribution mechanism for distributing large files to a large number of interested users. Good examples are distribution of Linux or software patches for games over BitTorrent. In this talk, we will outline the future challenges in the area of peer-to-peer content distribution and discuss two important aspects of peer-to-peer content distribution.

In the first part, we will look at the network impact of peer-to-peercontent distribution. Although the practical performance of systems such as BitTorrent is very good from the users point of view, a naïve implementation can cause significant amount of unnecessary wide-area network traffic. We will present an evaluation which compares several content distribution mechanisms and highlights the importance of good peer selection.

In the second part, we consider the efficiency of a peer-to-peer content distribution scheme in distributing the file to the clients. We propose two metrics: distribution time which is commonly used in literature, and delay stretch, which we claim to be more relevant for practical scenarios. We derive a lower bound on the efficiency of any peer-to-peer content distribution scheme and present a near-optimal algorithm which generalizes readily to any situation.

Welcome!

22.11. Ari Rantasen väitös: Algorithms for 13C metabolic flux analysis

FM Ari Rantanen väittelee 22.11.2006 kello 12 Helsingin yliopiston matemaattis-luonnontieteellisessä tiedekunnassa aiheesta "Algorithms for 13C metabolic flux analysis".

Väitöstilaisuus järjestetään osoitteessa Exactum, auditorio CK112, Gustaf Hällströmin katu 2b. Vastaväittäjänä on assistant professor Masanori Arita, University of Tokyo, ja kustoksena on professori Esko Ukkonen.

Väitöskirja julkaistaan sarjassa Department of Computer Science, Series of publications A.

Väitöskirja on myös elektroninen julkaisu ja luettavissa E-thesis -palvelussa

Lue tiivistelmä

Read the abstract

20.11. Sampsa Hautaniemen dosenttikoeluento

maanantai 20.11. klo 16.15, Exactum C221

Aihe: Geenisirut lääketieteellisessä tutkimuksessa: Bioinformaatikon näkökulma

Tiivistelmä: Geenisirujen avulla voidaan tutkia samanaikaisesti kymmenien tuhansien geenien ilmentymistasot solu- tai kudosnäytteissä. Geenisiruja hyväksikäyttävä lääketieteellinen tutkimus poikkeaa merkittävästi perinteisestä lääketieteellisestä tutkimuksesta, jossa on tutkittu kerrallaan yhden geenin toimintaa ja merkitystä. Siksi geenisiruista onkin tullut olennainen osa nykyaikaista lääketieteellistä tutkimusta.

Geenisiruja hyväksikäyttävässä tutkimuksessa tuotetaan miljoonia havaintopisteitä, joiden käsittelyssä tarvitaan bioinformatiikkaa. Aluksi havainnot tulee esikäsitellä luotettavien tulosten varmistamiseksi. Esikäsiteltyjen havaintojen analysoinnissa käytetään usein tilastollista testausta, ryhmittelyä ja luokittelua. Analysoinin tuloksena oleva geenilista koostuu yleensä sadoista geeneistä joiden merkityksen ja tehtävien selvittäminen soluissa edellyttää useiden biologisten tietokantojen käyttöä.

Tässä aihepiiriin johdattelevassa esityksessä käsitellään geenisiruilla saatavien havaintojen tutkimusta esikäsittelystä tulosten biologisen merkityksen selvittämiseen.

Tervetuloa!

17.11. Taneli Mielikäisen dosenttikoeluento

Perjantaina 17.11. klo 12.15, Exactum C220

Aihe: Parametrisoitu vaativuus

Tervetuloa!

16 November: Guest lecture: Music Understanding by Computer

by Roger Dannenberg on Thursday, November 16th, 15:15 Exactum, B119

Abstract: Computers are used in many ways for music analysis and music composition. However, most computers have no real knowledge of music, making it difficult to perform many tasks. Researchers continue to make progress in the area of music understanding systems. This talk will demonstrate some music understanding tasks that have been automated, including the accompaniment of human performers, identifying jazz performance styles in real-time, and finding structure in music audio.

Bio: Roger B. Dannenberg is Associate Research Professor of Computer Science and Art at Carnegie Mellon University, where he received a Ph.D. in 1982. He is well known for his work in interactive computer music systems, especially computers that accompany live musicians. Thousands of music students use this technology every day. Dr. Dannenberg is also a composer, and he plays trumpet professionally, mainly in various jazz groups in the Pittsburgh area.

Welcome!

15 November: Guest lecture: Identifying cis-regulatory modules by combining comparative and compositional analysis of DNA

by Nora Pierstorff (Institute of Genetics of the University of Cologne, Germany) on Wednesday 15 November at 10:15 Exactum, room C221

Abstract: Predicting cis-regulatory modules (CRMs) in higher eukaryotes is a challenging computational task. Commonly used methods to predict CRMs based on the signal of transcription factor binding sites (TFBS) are limited by prior information about transcription factor specificity. More general methods that bypass the reliance on TFBS models are needed for comprehensive CRM prediction.

We have developed a method to predict CRMs called CisPlusFinder that identifies high density regions of perfect local ungapped sequences (PLUSs) based on multiple species conservation. By assuming that PLUSs contain core TFBS motifs that are locally overrepresented, the method attempts to capture the expected features of CRM structure and evolution. Applied to a benchmark dataset of CRMs involved in early Drosophila development, CisPlusFinder predicts more annotated CRMs than all other methods tested. Using the REDfly database, we find that some ¡Èfalse positive¡É predictions in the benchmark dataset correspond to recently annotated CRMs.

Our work demonstrates that CRM prediction methods that combine comparative genomic data with statistical properties of DNA may achieve reasonable performance when applied genome-wide in the absence of an a priori set of known TFBS motifs.

Some words about Nora Pierstorff: I studied in Tübingen Bioinformatics and wrote my diploma thesis with Daniel Huson about comparative gene prediction. At the moment I am finishing my PhD-thesis in Thomas Wiehes group in Cologne at the institute of Genetics. My topic is the conservation of cis-regulatory modules. I have been in Casey Bergmans group in Manchester at the beginning of this year from Feb. until June.

WELCOME!

14.11. Miro Lehtosen väitös: Indexing Heterogeneous XML for Full-Text Search

Miro Lehtonen väittelee 14.11.2006 kello 12 Helsingin yliopiston matemaattis-luonnontieteellisessä tiedekunnassa aiheesta "Indexing Heterogeneous XML for Full-Text Search". Väitöstilaisuus järjestetään yliopiston päärakennuksessa (Fabianinkatu 33 ), Auditoriosssa XII klo 12.

Vastaväittäjänä on prof. dr.-ing. Norbert Fuhr, Universität Duisburg-Essen, ja kustoksena on professori Helena Ahonen-Myka.

Lue tiivistelmä

Read the abstract

13.11. Matti Nykäsen dosenttikoeluento

maanantai 13.11. klo 11.15, Exactumin sali C220

Matti Nykänen pitää julkisen dosenttikoelunnon aiheesta Punamustan puun perusoperaatio.

Tervetuloa!

  12 October: Guest lecture:Mining mobility data - opportunities and privacy threats: the GeoPKDD approach

at 14:15 Exactum, room C221

Francesco Bonchi (Knowledge Discovery and Delivery Laboratory (KDD), Pisa, Italy)

Abstract: A flood of data pertinent to moving objects is available today, and will be more in the near future, particularly due to the automated collection of privacy-sensitive telecom data from mobile phones and other location-aware devices. Such wealth of data, referenced both in space and time, may enable novel classes of applications of high societal and economic impact, provided that the discovery of consumable and concise knowledge out of these raw data is made possible. The goal of the GeoPKDD project (Geographic Privacy-aware Knowledge Discovery and Delivery, project number 01495 within the Future Emerging Technologies program of FP6-IST) is to develop theory, techniques and systems for geographic knowledge discovery and delivery, based on new privacy preserving methods for extracting knowledge from large amounts of raw data referenced in space and time.

In this talk we will first present the GeoPKDD project: its motivations, objectives and on-going activities. We will then focus on the privacy issues in Data Mining, by providing a brief and structured overview of the state of the art, presenting in more details some recent work, and discussing some open research issues.

WELCOME!

9 October: Guest lecture: Experience with Performance Engineering in an Agile Development Process

Dr Andre Bondi, Siemens Corporate Research, Inc.

Monday, October 9, at 14:15-16:00, Exactum CK112

Abstract: In a classical waterfall development environment, performance problems are often only encountered once functional testing is in progress or close to completion. Many of them have their origins in design and architectural decisions and cannot be resolved by performance tuning alone.

An agile process for developing a system based on a service-oriented architecture provides early opportunities for performance testing of system building blocks and the applications on which they are based with simple and complex use cases. This facilitates the early detection, diagnosis, and correction of performance issues. We describe our experience with such a process. We illustrate how carefully structured performance tests and system measurements can yield plots whose analysis can aid developers in identifying the causes of some performance issues, including issues arising from software bottlenecks and/or faulty concurrent programming. These tests must be structured in the context of performance and scalability requirements specified in terms of sound metrics related to the application domain and in terms of anticipated deployment scenarios. The results of some of these tests have led us to identify thread and process scheduling limitations and pitfalls that can occur in Java and other environments, as well as techniques to overcome them.

Welcome!

20 September: Guest lecture: A New Look at Convexity, Duality, and Optimization

Professor Dimitri Bertsekas, MIT

Wednesday, September 20th, at 14:15-15:15, Exactum, B222

Abstract: This talk will review a recent book treatment of convex analysis and optimization. While the subject of the book is classical, the treatment of several of its important topics is new and in some cases relies on new research. The new lines of analysis include:
(a) A unified framework for minimax theory and constrained optimization duality as special cases of duality between two simple geometrical problems. Within this framework, the fundamental constraint qualifications needed for strong duality and existence of saddle points are quite apparent, and admit straightforward proofs.
(b) A unification of  conditions for existence of solutions of convex optimization problems, conditions for the minimax equality to hold, and conditions for the absence of a duality gap in constrained optimization. This unification is based on conditions guaranteeing that a nested family of closed convex sets has a nonempty intersection.
(c) A unification of the major constraint qualifications that guarantee the existence of Lagrange multipliers for nonconvex constrained optimization. This unification is achieved through the notion of constraint pseudonormality, which is motivated by an enhanced form of the Fritz John necessary optimality conditions.
(d) The development of incremental subgradient methods for dual optimization, and the analysis of their advantages over classical subgradient methods.

See http://www.athenasc.com/New_Look.pdf

12 September: Guest lecture: Dynamic and Neuro-Dynamic Programming: An Overview and Recent Work

Professor Dimitri Bertsekas, MIT

Tuesday, September 12th, at 16:15-17:15, Exactum, C222

Abstract: Dynamic and neuro-dynamic programming are broadly applicable methodologies for sequential decision making under uncertainty. The key idea is to use a scoring function to select decisions in complex dynamic systems, arising in a broad variety of applications from engineering design, operations research, resource allocation, finance, etc.  This is much like what is done in computer chess, where positions are evaluated by means of a scoring function and the move that leads to the position with the best score is chosen. Neuro-dynamic programming provides a class of systematic methods for computing appropriate scoring functions using neural network-like approximation schemes and simulation/evaluation of the system's performance.  The talk will overview this methodology and discuss recent work.

See http://web.mit.edu/dimitrib/www/NDP_Review.pdf


------------------------------------------------------------------
A short biography of Prof. Bertsekas:


"Dimitri P. Bertsekas received a combined B.S.E.E. and B.S.M.E. from the National Technical University of Athens, Greece, an M.S.E.E. from George Washington University, and a Ph.D. in system science from the Massachusetts Institute of Technology in 1971.

Dr. Bertsekas has held faculty positions with the Engineering-Economic Systems Dept., Stanford University (1971-1974) and the Electrical Engineering Dept. of the University of Illinois, Urbana (1974-1979). Since 1979 he has been teaching at the Electrical Engineering and Computer Science Department of the Massachusetts Institute of Technology (M.I.T.), where he is currently McAfee Professor of Engineering. He consults regularly with private industry and has held editorial positions in several journals. His research at M.I.T. spans several fields, including optimization, control, large-scale computation, and data communication netwoks, and is closely tied to his teaching and book authoring activities. He has written numerous research papers, and thirteen books, several of which are used as textbooks in MIT classes.

Professor Bertsekas was awarded the INFORMS 1997 Prize for Research Excellence in the Interface Between Operations Research and Computer Science for his book "Neuro-Dynamic Programming" (co-authored with John Tsitsiklis), the 2000 Greek National Award for Operations Research, and the 2001 ACC John R. Ragazzini Education Award. In 2001, he was elected to the United States National Academy of Engineering.

Besides his professional activities, Professor Bertsekas is interested in travel, portrait, and landscape photography. His pictures have been exhibited on several occasions at M.I.T., and can also be accessed from his www site ( http://web.mit.edu/dimitrib/www/home.html )."

30 August: Presentation of current Projects in Bioinformatics & Biomedical Informatics within the Foundation for Biomedical Research of the Academy of Athens

Wednesday 30.8 at 14.00 in seminar room C222, Exactum

Dr. Sophia Kossida (Academy of Athens, Foundation for Biomedical Research, Biotechnology Laboratory, Athens, Greece)

Abstract. There is a wealth of information deposited in biomedical databases derived from experimental work such as the genome sequencing projects, the yeast 2hybrid system assays, the RNAi assays, the microarrays experiments, the proteomics experiments. Using proteomics experiments as case study, it will be shown how closely inter-related and dependent on each other wet science and bioinformatics is. Bioinformatics tools for proteomics will be briefly discussed and sebsequntly using Kallikrein 6 (KLK6) and Integrin gene family as case studies, attempts towards an integrated bioinformatics system will be presented. It will be shown how the procsessed in silico information feeds further experimental work and how this procedure could be implemented to lead us towards a Systems Biology approach.

Welcome!

24.8.Veli Mäkisen dosenttikoeluento: Itseindeksit - kun tiivistetty teksti ja sen indeksi ovatkin sama asia

torstaina 24.8 klo 13.15, Exactum C222

Tiivistelmä: Tekstimuotoisen tiedon talletuksessa käytetään yleisesti kahta tapaa: (1) talletetaan teksti sellaisenaan tai (2) talletetaan teksti pakattuna. Ensimmäisessä vaihtoehdossa etuna on nopea pääsy mielivaltaiseen kohtaan tekstissä, haittana tilan tuhlaus. Toisessa vaihtoehdossa menetetään nopea pääsy tekstin sisälle mutta saavutetaan tilan säästöä. Herää kysymys: olisiko mahdollista saavuttaa molemmat
edut yhtä aikaa?

Vastaus on kyllä. Itseindeksit yhdistävät tiivistämisen ja nopean pääsyn tekstin osiin. Lisäksi ne tuovat mukanaan kokonaan uuden edun: niiden avulla voidaan tehokkaasti selvittää, esiintyykö annettu hakusana tekstissä. Tällaisiin hakuihin osataan vastata oleellisesti lineaarisessa ajassa hakusanan pituuden suhteen. Tämä on huomattava parannus vaihtoehtoon (1), missä vastaavaan kyselyyn vastaaminen vie lineaarisen ajan tekstin pituuden suhteen.

Luento johdattaa itseindeksien perusideoihin, johtamalla samalla hyvin yksinkertaistetun ja siksi helposti toteutettavan esimerkin itseindeksistä

23 August: Guest lecture: Hardness of Optimal Spaced Seed Design

Wednesday 23th August at 14:00 Exactum, room C222

François Nicolas (LIRMM, Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, France)

Abstract: Speeding up approximate pattern matching is a line of research in stringology since the 80's. Practically fast approaches belong to the class of filtration algorithms, in which text regions dissimilar to the pattern are excluded (filtered out) in a first step, and remaining regions are compared to the pattern by dynamic programming in a second step. Among the necessary conditions used to test similarity between the regions and the pattern, many require a minimum number of common substrings between'them. When only substitutions are taken into account for measuring dissimilarity, it was shown recently that counting spaced subwords instead of substrings improve the filtration efficiency. However, a preprocessing step is required to design one or more patterns, called gapped seeds, for the subwords, depending on the search parameters. The seed design problems proposed up to now differ by the way the similarities to detect are given: either a set of similarities is given in extenso (this is a "region specific" problem), or one wishes to detect all similar regions having at most k substitutions (general detection problem). Several articles exhibit exponential algorithms for these problems. In my talk, I will present hardness and inapproximability results for both the region specific and general seed design problems, thereby justifying the exponential complexity of known algorithms.

Note: A shorter version of this talk was presented at CPM'05.

WELCOME!

22 August: Guest lecture: A Framework for Specifying Sourcing Collaboration

Tuesday 22.8. 2006 at 2pm in C222

Alex Norta from Eindhoven University of Technology

ABSTRACT: Dynamic inter-organizational business process management (DIBPM) combines service-oriented business integration (SOBI) and workflow management as a promising approach for supporting commercial business-to-business (B2B) activities over web-based infrastructures. SOBI applies concepts from the field of service-oriented computing in the domain of dynamic business collaboration. Currently, SOBI technologies insufficiently support B2B collaboration where dynamic matching of structures of service consuming and service providing processes is performed. Collaborating parties want to control how much process detail they expose and which parts of them are monitorable. SOBI technology should offer rigor that permits verification of desirable features before enactment, e.g., correct termination. Furthermore, current SOBI technologies lack concepts which are useful for specifying and implementing B2B collaborations. Hence, several related critical issues are explored in this paper. Firstly, how to manage the inherent conceptual, business, and technological complexity of such business collaboration. Secondly, the issue is addressed of laying a foundation for rigor that is instrumental for verifying control-flow adherence and correct termination of coupled business processes. These requirements need to be guiding for the development of specification languages of inter-organizational business processes and related middleware that enact them in a web-based way. Exploring these critical issues leads to the proposal of eSourcing that employs a three-level framework for tackling the complexity of dynamically matching a service consuming and a service providing process. Furthermore, eSourcing offers rigor by utilizing well explored process theory that results in improved control over inter-organizational business process structure. Finally, the issue of suitability is tackled by discovering inherent eSourcing features that permit the positioning of eSourcing configurations in differing perspectives. Those values are instrumental for subsequently discovering patterns that are translatable into a language for full-fledged DIBPM support.

17th August: Guest lecture: Spectral Norm in Learning Theory: Some Selected Topics

Hans Ulrich Simon (Ruhr-University Bochum)

on Thursday 17th August at 11:00 Exactum, room C222

Abstract In this talk, we review some known results that relate the statistical query complexity of a concept class to the spectral norm of its correlation matrix. Since spectral norms are widely used in various other areas, we are then able to put statistical query complexity in a broader context. We briefly describe some non-trivial connections to (seemingly) different topics in learning theory, complexity theory, and cryptography. A connection to the so-called Hidden Number Problem, which plays an important role for proving bit-security of cryptographic functions, will be discussed in somewhat more detail.

WELCOME!

4 July: Guest lecture:A Framework for Human Evaluation of Search Engine Relevance

Kamal Ali, Chi Chao Chang, Yun-fang Juan

room D122, on Tuesday, July 4th at 11:15-12:00

We present a six-dimensional framework for classifying types of human evaluations of search engine relevance. The first dimension is the class(es) of query being used for the test. The second is the modality of acquisition of the human judgment: with implicit judgments such as clickthrough rates and explicit judgments such as scores. The third dimension is the class of judge being used - to date, three main types have been used: editors, panelists and live users. Fourth is the granularity of the judgment: is the given judgement on a single result or on the entire search results page? The fifth dimension is the depth of the judgment: is the judgment based on the information presented at the results page or on the underlying document? And finally, evaluations can be characterized as relative (between two or more search engines) or absolute (for a single engine).

Our results explore three dimensions: for granularity, we find set-level judgments capture factors missed by item-level evaluations; for depth we find (for the news domain) that judgments on abstracts are highly correlated with judgments on the landing page, and for judge class we find that editors are more discerning about results and offer more constructive reasons for improving search engine than do panelists. Both populations are biased and panelist tests need to pay attention to reward structure to ensure motivated panelists.

14 June: Guest lecture: Music Notation Retrieval with Transportation Distances - Distance Measure, Indexing Method, and Human-generated Ground Truth for Melodic Similarity

Wednesday, June 14, 12:15-13:45 Exactum, C222

Rainer Typke (Utrecht University, The Netherlands)

This talk presents a distance measure for melodies that is based on the Earth Mover's Distance (EMD). Notes are represented as dots in the two-dimensional space of onset time and pitch. The dots can be weighted with the importance of notes. The EMD then is used for calculating the similarity of different snippets of music. If a transportation distance is chosen that obeys the triangle inequality (or if it does not disobey it too badly), retrieval can be sped up by using vantage objects. At the ISMIR 2005 conference, various methods for measuring melodic similarity were compared against a ground truth that was created in Utrecht. This ground truth for a collection of melodies is also described in this talk. The talk is concluded with a demonstration of how the EMD-based melody distance measure can be used for identifying melodies, retrieving polyphonic MIDI files containing a given melody, and finding (possibly illegitimate) copies of MIDI files on the Web with Musipedia, a web-based melody search engine.

Welcome!

30 May: Guest lecture: Machine Learning Techniques for Analysing Chemical Data for Hazardous Materials and Illegal Drugs

Tuesday 30th May at 14:15, Exactum, room C222

Dr Michael Madden (Lecturer in Information Technology, National University of Ireland Galway, Ireland)

Abstract: This talk describes an ongoing interdisciplinary research effort at NUI Galway to research and develop machine learning techniques for analysing data from the chemical technique of Raman spectroscopy. This is a rapid, non-destructive technique, and portable Raman systems are being developed for use in the field by non-specialists, but standard analysis techniques based on spectrum-matching do not work well with complex mixtures. Our overall goal is to improve the ability to identify the constituents of mixtures with high confidence, and estimate their concentrations. We specifically focus on analysing hazardous materials and illegal drugs, which has many applications in healthcare, law enforcement, safety, and forensics. We have assessed the applicability of a range of existing ML techniques and continue to develop innovative techniques that are well suited to this domain. Key lessons learned from this work are: the benefit of close interdisciplinary interaction; the value of tailoring algorithms to the application domain; and the importance of providing end-users with scientific insight rather than just black-box results.

Biography: Dr Michael Madden joined NUI Galway as a lecturer in 2000. After graduating with a B.E. in 1991, he worked for four years as a Ph.D. research assistant in NUI Galway. He then worked in for 5 years in an internationally-trading Irish software company, where he was the R&D Manager. He leads the Machine Learning Group in Galway and is currently Principal Investigator on three research projects funded by Enterprise Ireland. Two of these involve developing novel and commericalisable algorithms for interpreting Raman spectra, and the other is focused on new ML techniques for one-class classification. He has recently received funding from the EU?s Marie Curie Programme to develop interdisciplinary research activity in biomedical data mining in NUI Galway. WELCOME!

10 May: Guest lecture: On Identifying Discriminative Combinations of Features for Labeled Data

Wednesday, May 10, 12:15-13:00 Exactum, C222

Gemma Garriga (Universitat Politecnica de Catalunya, Barcelona)

Abstract: Closed sets are being successfully applied in the context of compacted data representation for association rule learning. However, their use is mainly descriptive. This paper shows that, when considering labeled data, closed sets can be adapted for prediction and discrimination purposes by conveniently contrasting covering properties on positive and negative examples. We formally justify that these sets characterize the space of relevant combinations of features for discriminating the target class. In practice, identifying relevant/irrelevant combinations of features through closed sets is useful in many applications. Here we apply it to emerging patterns, subgroup discovery, and learning of predictive rules on datasets characterized by highly unbalanced distribution.

Welcome!

9 May: Guest lecture: Mining top-k covering rule group from microarray data and classifying microarray data

At 13:00-14:00 Exactum, C222

Gao Cong (University of Edinburgh)

Abstract : The associations between genes could help to discover the knowledge of how a particular gene is affected by other genes. The associations between genes and categories can describe what genes are expressed as a result of certain cellular environments (for instance cancer cell). However, due to the large number of genes in microarray data, association rule mining algorithms usually take long time to run and return a huge number of rules, which greatly limit the analysis of such datasets using association rules.

To address these problems, we propose the concept of top-k covering rule group and a new algorithm to discover the top-k covering rule groups for each row of gene expression profiles. Several experiments on real bioinformatics datasets show that the new top-k covering rule mining algorithm is orders of magnitude faster than previous association rule mining algorithms. Furthermore, we propose a new classification method RCBT. RCBT classifier is constructed from the top-k covering rule groups. The rule groups generated for building RCBT are bounded in number. This is in contrast to existing rule-based classification methods like CBA which despite generating excessive number of redundant rules, is still unable to cover some training data with the discovered rules. Experiments show that the RCBT classifier can match or outperform other state-of-the-art classifiers on several benchmark gene expression datasets.

WELCOME!

29 April: Sasu Tarkoma's dissertion

Sasu Tarkoma will defend his PhD thesis "Efficient Content-based Routing, Mobility-aware Topologies, and Temporal Subspace Matching" on 29 April, at 10.00 in the University Main Building, Unioninkatu 34, auditorium XIV.

The opponent is Professor Luís Rodrigues, Universidade de Lisboa, and the custos is Professor Kimmo Raatikainen.

 

28.2.2006 at 13.00, Exactum, D123

Guest lecture: Metabolome Research in Japan

Professor Masanori Arita (University of Tokyo)

Abstract: This talk introduces the ongoing research of three metabolome groups in Japan. First is the metabolite identification from exact mass at Keio University. Even with high-resolution mass spectrometry data, identification process includes many difficulties. Problems and software solutions to overcome these difficulties are shown, by focusing on metabolite databases and search tools developed in cooperation with Riken Plant Science Center. Obtained metabolome data and its dynamics should be analyzed in conjunction with metabolic maps. "Atomic Reconstruction of Metabolism" software system is expected to do this task, and its recent update will be introduced.

Welcome!


21st February at 2pm-4pm, Exactum C221

Guest lecture: Validation and Formal Verification of a Modern Microprocessor

Roope Kaivola, Intel Corporation

The microprocessor presents one of the most challenging design problems known to modern engineering. The number of transistors in each new process generation continues to follow the growth curve outlined by Gordon Moore 40 years ago. Micro-architecture complexity has increased immeasurably since the introduction of out-of-order speculative execution designs in the mid-90s; and there are no signs of a slowdown any time soon. Various schemes for managing and reducing power while retaining peak performance have also added entirely new dimensions of complexity.

Validating the next generation of microprocessors is a significant challenge. Microprocessor validation starts early in the design cycle - these days, even before the start of RTL coding, although most validation activities start with the first release of RTL code. For a new microprocessor, code is typically created and released in a number of carefully planned phases, over the course of approximately one year. The validation relies heavily on Cluster Test Environments (CTEs) to allow microarchitecture validation on logically related subsets of the design, decoupled from their environment.

Within Intel, formal verification has been deployed as an integral part of the validation process for roughly a decade now, and we are currently looking to increase the contribution of formal verification to the overall validation effort. We are developing combined formal and dynamic verification plans whose implementation will be coordinated so that we apply the best approach to the problem at hand.

To illustrate real-world formal verification challenges, we discuss our experiences in the formal verification of key parts of the Intel IA-32 Pentium 4 microprocessor designs, in particular the Pentium 4 register renaming mechanism, and describe a practical methodology for large-scale formal verification of control-intensive industrial circuits. Our approach combines symbolic simulation with human-generated inductive invariants in the context of a verification workbench built over a customized verification-oriented functional programming language.

Welcome!

17th February at 14.15 C222, Exactum

Guest lecture: How to teach Support Vector Machine to learn vector outputs

Dr Sandor Szedmak (Electronics and Computer Science, University of Southampton)

Abstract: The Support Vector Machine(SVM) has proved it is a very useful device of machine learning, but it was restricted to directly solve binary classification problems only. There is a strong demand to extend the underlying ideas towards multiclass classification and learning when the outputs have complex structure. The known approaches are tackling with the exploding computational complexity and the range of potential applications is generally very limited. We show there is a straightforward algebraic generalization of the SVM which can handle arbitrary vector outputs and preserves the same computational complexity of its binary ancestor. The structural learning problems can then be solved via an embedding into a properly chosen vector space.

The address of the mmr papers and codes is http://www.ecs.soton.ac.uk/~ss03v/mmr.html

Welcome!

February 6th, at 10.15am, Exatum C222

Wray Buntine's docent lecture

The title of the lecture is "Independence in probabilities and undirected graphs" and it will be given in English. The length of the lecture is 25 minutes.

Welcome!

February 2nd  at 12:15, Exactum C221

Guest lecture by PhD Rene Mayrhofer (University of Lancaster):

Context authentication: making secure communication more user-friendly

ABSTRACT: Mobile users depend heavily on ad-hoc connections for many, if not most, of their communication. The need for ad-hoc communication without any prior knowledge about the potential communication partners is often seen as a security risk and is certainly difficult to tackle, as is evident by the typical struggles that mobile users face when trying to connect via established communication technologies like WLAN or Bluetooth. For widespread use of secure communication in settings where no administrator can be expected to configure devices, but where users are on their own, more intuitive means of establishing secure channels are necessary. The security of such a channel usually rests on one seemingly simple property: authentication. Without proper authentication, one can not be sure who it is that the connection is made to. By solving the authentication problem, secure channels can be constructed easily by using one of the well-known standard protocols. Authentication based on context common to two or more devices or based on properties of the current context promises to make authentication more intuitive and simpler to use.

Bio data: Rene Mayrhofer is currently a post doc researcher at the Lancaster University  (UK) and explores the possibilities of context-based authentication in ubiquitous computing. Previously, he was with the Johannes Kepler University  of Linz (AT), where he submitted his PhD on context prediction.

Homepage: http://www.mayrhofer.eu.org/

WELCOME!

1st February at 12:15, room B222, Exactum

Guest lecture by PhD Tommi Jaakkola (MIT CSAIL)

Game theory, biology, and the binding game

ABSTRACT: Biological processes span across vastly different scales and necessarily have to be understood at multiple levels of abstraction. Towards clarifying the role that computation plays in such understanding, we have recently developed a class of game theoretic models for capturing coordinate operation of DNA binding regulators. Our work builds in part on the argument that the roles of various molecular interactions cannot be understood in isolation but that it is necessary to also capture the context provided by other mutually constraining processes. Our game theoretic model allocates proteins to neighborhoods of sites, and to sites themselves, in a resource constrained manner, while explicitly capturing coordinate and competitive relations among proteins with affinity to the site or region. We provide examples of known biological subsystems that are naturally translated into our framework, and illustrate predictions that can be derived from the model. The focus of the talk will be on mathematical foundations of the modeling approach and requires little or no biological background. This is joint work with Luis Perez-Breva, Luis Ortiz, and Chen-Hsiang Yeang.

Tommi S. Jaakkola received the M.Sc. degree in theoretical physics from Helsinki University of Technology, Finland, and Ph.D. from MIT in computational neuroscience. Following a postdoctoral position in computational molecular biology (DOE/Sloan fellow, UCSC) he joined the MIT EECS faculty 1998. He received the Sloan Research Fellowship 2002. His research interests include many aspects of machine learning, statistical inference and estimation in the context of graphical models, and analysis and development of algorithms for various modern estimation problems such as those involving multiple predominantly incomplete data sources. His applied research focuses on problems in computational biology such as transcriptional regulation.

Website: http://people.csail.mit.edu/tommi/

January 31th at 14:15, C221, Exactum

Guest lecture by PhD Rene Mayrhofer (University of Lancaster):

Context prediction based on learning user habits: A next step towards  "smart" systems

ABSTRACT: Pervasive Computing is a new research area at the intersection between human/computer interaction, embedded and distributed systems and networking technology. Its aim - the disappearance of computer technology into the periphery of daily life - necessitates an adaption of systems to the respective context in which they are used. Context-based interaction is therefore one of the building blocks of Pervasive Computing. Within the last five years, a number of seminal publications on the recognition of current context from a combination of different sensors have been written. The next logical step is the prediction of future contexts, with the general concept of predicting abstract contexts to allow computer systems to proactively prepare for future situations. This high-level context prediction allows - in contrast to the autonomous prediction of individual aspects like the geographical position - to consider patterns and interrelations in the user behavior which are not apparent at the lower levels of raw sensor data. In this talk, user-centered prediction of context is analyzed and an architecture for autonomous, background context recognition and prediction is presented, building upon established methods for data based prediction. Especial attention is turned to implicit user interaction to prevent disruptions of users during their normal tasks, to continuous adaption of systems to changed conditions, and to economical use of resources.

Bio data: Rene Mayrhofer is currently a post doc researcher at the Lancaster University  (UK) and explores the possibilities of context-based authentication in ubiquitous computing. Previously, he was with the Johannes Kepler University  of Linz (AT), where he submitted his PhD on context prediction.

Homepage: http://www.mayrhofer.eu.org/

WELCOME!

January 12, at 15:00, Exactum, C222

Guest lecture by Jiaheng Lu

Abstract: Finding all the occurrences of a twig pattern in an XML database is a core operation for efficient evaluation of XML queries. A number of algorithms have been proposed to process a twig query based on region encoding labeling scheme. While region encoding supports efficient determination of structural relationship between two elements, we observe that the information within a single label is very limited. In this talk, I will propose a new labeling scheme, called extended Dewey. This is a powerful labeling scheme, since from the label of an element alone, we can derive all the elements names along the path from the root to the element. Based on extended Dewey, we design a novel holistic twig join algorithm, called TJFast. Unlike all previous algorithms based on region encoding, to answer a twig query, TJFast only needs to access the labels of the leaf query nodes. Through this, not only do we reduce disk access, but we also support the efficient evaluation of queries with wildcards in branching nodes, which is very difficult to be answered by algorithms based on region encoding. Finally, we report our experimental results to show that our algorithms are superior to previous approaches in terms of the number of elements scanned, the size of intermediate results and query performance.

Bio data: Mr Lu Jiaheng is a fourth-year PhD candidate in National University of Singapore, supervised by Prof. Ling Tok Wang. His research is about XML query processing, XML full text search and data mining for semi-structured data. This talk is based on their paper in VLDB 2005.

2005

19th November, 10am, University main building, Small hall, Fabianinkatu 33

Antoine Doucet will defend his doctoral thesis "Advanced Document Description, a Sequential Approach" on Saturday 19th November at 10am. The dissertation takes place in the University main building, Small hall, Fabianinkatu 33.

The thesis is published in the Series of Publications A of the Department of Computer Science. Also, there will be an electronic publication in http://ethesis.helsinki.fi

The opponent is Assistant Professor Gael Dias, from the University of Beira Interior, Portugal, and custos Professor Helena Ahonen-Myka, from the University of Helsinki.

You can read abstract of the thesis here.

November 18th, at 14.15, room B222, Exactum

Guest lecture: Multiword Unit Extraction and Efficiency

Assistant professor Gaël Dias (University of Beira Interior, Portugal)

Abstract: Multiword units (MWUs) include a large range of linguistic phenomena, such as compound nouns (e.g. interior designer), phrasal verbs (e.g. run through), adverbial locutions (e.g. on purpose), compound determinants (e.g. an amount of), prepositional locutions (e.g. in front of) and institutionalized phrases (e.g. con carne). MWUs are frequently used in everyday language, usually to precisely express ideas and concepts that cannot be compressed into a single word. As a consequence, their identification is a crucial issue for applications that require some degree of semantic processing (e.g. machine translation, summarization, information retrieval).

In recent years, there has been a growing awareness in the Natural Language Processing (NLP) community of the problems that MWUs pose and the need for their robust handling. For that purpose, syntactical (Didier Bourigault, 1993), statistical (Frank Smadja, 1993; Ted Dunning, 1993; Gaël Dias, 2002) and hybrid syntaxico-statistical methodologies (Béatrice Daille, 1996; Jean-Philippe Goldman et al. 2001) have been proposed.

In this presentation, we will review the state-of-the-art in Multiword Unit extraction and present new methodologies developed at the Centre of Human Language Technology and Bioinformatics of the University of Beira Interior. In particular, we will present two softwares: SENTA and its evolution HELAS. Finally, we will present a new implementation based on suffix-arrays to extract Multiword Units in real-world applications.

November 18th, at 15.00, room B222, Exactum

Guest lecture: Stochastic models for XML document classification and semi-structured document mapping

Professor Patrick Gallinari

We present a family of generative techniques for modelling semi-structured (e.g. XML) documents where content elements are organized according to some structure that reflects logical, syntactic or semantic relations between these elements. The models allow handling both structure and content information.

We show how they can be used for generic machine learning problems like classification and clustering on this type of data. We then address the more complex problem of document (re)structuration. This consists in mapping documents coming from different sources, with different formats onto predefined semi-structured formats. This generic problem appears in different applications settings like heterogeneous semi-structured databases querying, peer to peer systems, legacy document conversion, XML information retrieval. We consider different instances of the restructuration problem like structuring flat documents or learning the correspondence between structured formats, and describe experiments performed on different XML document collections.

November 11th, at 14.15, room D122, Exactum

Guest lecture: Java-Native Integration: Solutions and Open Issues

Dr. Cristiano di Flora (Nokia Research Center, Tampere)

Abstract: Though optimization techniques in current JVMs allow them to support several classes of applications efficiently, the performance and effectiveness of several system components (e.g., 3D rendering engines, device drivers, file systems) when implemented on the native platform (i.e., the runtime of the operating system hosting the VM) are much more better than those achievable with a pure Java implementation. Hence, it is very likely that the java runtime will go on coexisting with the native (usually C or C++) runtime on the same device. This talk will go over state-of-the-art solutions and ongoing research works for integrating Java with the native platform. The lecture will cover issues related to both mobile embedded devices and stationary general-purpose systems based on commercial operating systems. The talk will also shed some light on the solutions currently adopted on Nokia Series 60 and Series 80 devices. The presentation will specifically outline the main logic behind the solution adopted to integrate the J2ME MIDP runtime with Nokia Series 60 platform and with Symbian OS. In addition, the support for Java Native Interface calls from J2ME CDC VM in Series 80 devices will be described.

Dr Cristiano di Flora is a Research Engineer in the End-user Runtime Platforms team of the Software and Application Technologies Laboratory, Nokia Research Center, Tampere, Finland. He has visited the Computer Science Department and HIIT BRU several times during the last 2 years, and his interaction with Professor Kimmo Raatikainen and Oriana Riva is still going on.

October 18th, at 14:00, room C221, Exactum

Guest lecture: Convolutive Blind Source Separation for Speech Enhancement

Prof. Scott C. Douglas Department of Electrical Engineering Southern Methodist University, Dallas

Abstract: In this talk, we outline and present procedures for the separation of multiple speech signals from multi-microphone recorded mixtures of speech. The techniques can be considered as time-domain extensions of well-known procedures for independent component analysis of instantaneous mixtures to the case of time-dispersive mixing channels. An emphasis is placed on adaptive procedures for lossless paraunitary systems and their role in contrast-based blind separation and signal reconstruction. Examples involving acoustic signal mixtures are given.

October 14th at 14:15, room D122, Exactum

Guest lecture: Recent steps in mobile ad hoc networking

Nokia Fellow Charles E. Perkins

Abstract: Within the IETF, the working group for Mobile Ad Hoc Networking [manet] has recently made new steps towards standardizing new routing protocols.  In particular, there is now a document specifying a new protocol called DYMO (for Dynamic Mobile Networks).

I will discuss these recent steps forward, and to give concrete examples I will describe more specifically some recent improvements to "Ad Hoc On-Demand Distance Vector" protocol (AODV) [RFC 3561].  Four protocols have been published as experimental specifications within the [manet] working group of the IETF.  I will describe in brief certain aspects of these protocols, and then go into a more detailed description of DYMO.  After describing these protocols, I will then give some opinions about how a convergence may be effected.  Convergence between protocols does provide a significant force for acceptance, and so technology that finds commonality between otherwise divergent protocols is highly desirable. The good and recent news is that commonality has been identified, so that link-state and distance-vector protocols can be merged, as well as (possibly!) proactive and reactive protocols.  Along the way, much has been learned about the basic nature of routing protocols, and yet much remains to be learned. Welcome!

October 12th at 14:15, room B119, Exactum

Guest seminar: Estimating haplotype frequencies efficiently

Dr. Eran Halperin (International Computer Science Institute, Berkeley; http://www.icsi.berkeley.edu/~heran )

Abstract: In this talk I will introduce a new method, HAPLOFREQ, to estimate haplotype frequencies over a short genomic region given the genotypes or haplotypes with missing data and/or sequencing errors. Our approach is based on rigorous analysis of the likelihood function, and in particular the method is guaranteed to efficiently converge to the global optimum of the likelihood function. Finally, I will discuss the relations between haplotype frequency estimation and tag snp selection.

Welcome!

October 4th, at 15:15, in room C222, Exactum

Guest lecture: " nFOIL: Integrating Naive Bayes and FOIL "

Niels Landwehr (University of Freiburg)

Abstract: Inductive Logic Programming (ILP) is concerned with inferring first-order logical rules from data. Approaches for combining probabilistic modeling and ILP, or probabilistic modeling and relational learning in general, have received a lot of attention recently. This is interesting because in many real-world learning problems the data is both multi-relational and noisy/imperfect, which makes probabilistic modeling approaches attractive.

FOIL is a simple but popular ILP algorithm, which greedily searches for a set of first-order rules. In the talk, the system nFOIL is presented. It tightly integrates FOIL and the Naive Bayes learning scheme, and thus represents a very simple instance of a probabilistic relational learner. Experimental evidence shows that nFOIL outperforms its baseline FOIL on typical ILP benchmark datasets, and is competitive with more sophisticated algorithms.

September 23th, 2pm, Exactum D122

Professor Tatuo Nakajima's (Waseda University, Tokyo)

Professor Nakajima's six months visit at our department is coming to end. He will give a talk entitled Future Directions in Systems Research.

September 20th, at 15:15, Exactum, C220

Guest lecture: Probabilistic models of dynamic interactions

Ata Kaban, University of Birmingham, UK

Abstract: The ability to analyze and predict apparently complex processes of multi-user activity from on-line interaction recordings is an extremely timely challenge from both practical and scientific perspectives, given the widespread use and impact of on line technologies. We present a summary of our recent and ongoing work on developing probabilistic distributed representation models of multiple dynamics, with applications in this area. These are capable of identifying communities, creating compact profiles, and capture both the heterogeneity of individual behaviors and the global traits of activity. Uncovering low complexity structures that underlie the apparent complexity of multi-user activity may be used to efficiently predict individual actions from population level activity histories.

Dr Ata Kaban is a Lecturer in the School of Computer Science, University of Birmingham, United Kingdom. She is visiting HIIT BRU for 2 weeks until September 24. http://www.cs.bham.ac.uk/~axk/
September 15th, at 11:15, Exactum, C222

Guest lecture: ModelGen: Model Independent Schema Translation

Paolo Atzeni

Abstract: A customizable and extensible tool is proposed to implement ModelGen, the model manage­ment operator that translates a schema from one model to another. A wide family of models is handled, by using a metamodel in which models can be succinctly and precisely described. The approach is novel because the tool exposes the dictionary that stores models, schemas, and the rules used to implement translations. In this way, the transformations can be customized and the tool can be easily extended.

(joint work with P. Cappellari and P.A. Bernstein)

August 29th at 10am, Exactum, C222

Guest lecture: Outdoor Distributed Computing
Professor Liviu Iftode (Rutgers University, USA)

Abstract: For several decades, distributed computing has been performed over networks of computers sitting indoor. From the computation point of view, the computer systems were identical, the configuration stable and networking robust. Computer and link failures were handled as accidents and here was no doubt that a correct program will return deterministically the expected result. With computers moving outdoors embedded in cars, clothes and buildings, a new and totally different computing landscape has emerged. When nodes and networking links are volatile and mobile, location dictates the role of a node in computation, and access time in the network cannot be bound, programming using conventional methods becomes hard if not impossible.

In this talk, I will introduce spatial programming, a new distributed computing model targeting highly-dynamic outdoor distributed systems. Central to spatial programming is the concept of spatial reference, which is used to virtualize the naming of a network resource with certain properties in a certain space. Using spatial references, reference consistency, space casting and access timeout, a programmer can express meaningful applications over a dynamic, physically distributed network of embedded systems. I will also describe an implementation of a spatial programming platform using Smart Messages, a lightweight mobile agent-like software architecture, which we developed for networks of embedded systems.

***

Liviu Iftode is an Associate Professor in the Department of Computer Science at Rutgers University, New Jersey. He received his Ph.D. in Computer Science from Princeton University in 1998. His research interests include distributed systems, operating systems, mobile networking and pervasive computing. Most of his work has been conducted with his students in the Distributed Computing (DISCO) Laboratory at Rutgers ( http://discolab.rutgers.edu ).

Liviu Iftode is the vice-chair of IEEE Technical Committee on Operating Systems and a member of the editorial boards of IEEE Pervasive Computing and IEEE Distributed Systems Online. He served in numerous program committees of technical conferences. More information can be found at http://www.cs.rutgers.edu/~iftode .

August 25th, at 15:15-16, Exactum room B222

Guest lecture: Fully Incremental LCS Computation

Dr. Shunsuke Inenaga, Department of Informatics, Kyushu University, Japan

Abstract: Sequence comparison is a fundamental task in pattern matching. Its applications include file comparison, spelling correction, information retrieval, and computing (dis)similarities between biological sequences. A common scheme for sequence comparison is the longest common subsequence (LCS) metric. This paper considers the fully incremental LCS computation problem as follows: For any strings $A,B$ and characters $a,b$, compute $\LCS(aA,B)$, $\LCS(A,bB)$, $\LCS(Aa,B)$, and $\LCS(A,Bb)$, provided that $L = \LCS(A,B)$ is already computed. We present an efficient algorithm that computes the four LCS values above, in $O(L)$ or $O(n)$ time depending on where a new character is added, where $n$ is the length of $A$. Our algorithm is superior in both time and space complexities to the previous known methods.

Welcome!

August 24th at 13:00-14:00 "Consequences of Minimum Description Length for Neural Nets and Gaussian Mixtures"

Professor Andrew Barron (Department of Statistics, Yale University)
Front meeting area of HIIT at Tammasaarenkatu 3 (High Tech Center Helsinki, "Pinta" Building), 6th floor.

More information about the speaker:
http://www.stat.yale.edu/people/andrewbarron.html

More about MDL:
http://www.mdl-research.org/

More about the location:
http://cosco.hiit.fi/location.html


August 23rd at 13:00-14:00
"Statistical Foundations and Properties of the Minimum Description Length Principle"

Professor Andrew Barron (Department of Statistics, Yale University)

Front meeting area of HIIT at Tammasaarenkatu 3 (High Tech Center Helsinki, "Pinta" Building), 6th floor.

More information about the speaker:
http://www.stat.yale.edu/people/andrewbarron.html

More about MDL:
http://www.mdl-research.org/

More about the location:
http://cosco.hiit.fi/location.html

August 18th at 14.15, Exactum C220

Guest lecture: Visual Tools for Collaborative Decision Making in E-learning Madhumita Bhattacharya

Abstract: Visual tools are used for representing ones thought or mental model in a pictorial format. This presentation will describe and demonstrate the use of visual tools for collaborative decision-making in an e-learning environment. Madhumita has used visual tools in various online courses for many different purposes for example: to create concept map to represent conceptual understanding, to organize and plan content, to brainstorm ideas and to make presentations. She has also designed a collaborative concept-mapping tool for critical thinking, knowledge construction, decision making, feedback and assessment in an online learning environment. Analysis and evaluation of students learning outcomes shows that students learn and understand better about their own and others thought processes and engage in meaningful communication with support of a visual representational tool.

About the speaker: Madhumita Bhattacharya was educated in India. She did her postdoctoral research in Japan. She has more than 18 years of research and teaching experience in universities in India, UK, Japan, Singapore, Australia, Estonia and New Zealand. Madhumita is a Senior Lecturer at Massey University. She is also a Visiting Professor at the University of Tartu, Estonia. Madhumita?s specialization and research interest is in science education, multicultural issues and learning technologies. She is a recipient of several research grants. She is the author and co-author of more than 80 publications in journals, conference proceedings and book chapter. She supervises research students both at the Massey University and at the University of Tartu. Madhumita serves in many academic committees and journals as executive committee member and as a member of review board respectively. Welcome!

June 30th, 2pm, Exactum room B119.

Guest lecture: Conservative Approaches to Pattern Discovery in Biosequences

Cinzia Pizzi, University of Padova

The problem of discovery of significant patterns within a sequence or among a set of sequences is a key issue common to several application domanins in Computational Biology and Bioinformatics. The searching for shared or over-represented patterns is motivated by a simple biological principle: if two or more sequences perform the same functions or have the same structure, then the common elements among the sequences might be responsible for the observed similarity. Moreover, the genetic code must be fault-tolerant to deal with errors that may occur during transcription of the genetic code or to random mutations, so that an intrinsic variability characterizes biological elements.This variability is responsible for the possible explosion of the size of the data under study, due to the rich underlying combinatorics. For this reason, approaches that exhaustively enumerate the search space are hardly applicable on a large scale, due to the exponential time and space required. On the other hand, heuristics approaches are not guaranteed to find the optimal solution. A compromise is to rely on Conservative Approaches to partition the search space in classes in such a way that only one representative element per class need to be evaluated according to some counting or scoring measure. We specifically considered the problem of applying the conservative approach to patterns with mismatches which location is not known "a priori". We designed two algorithms for the efficient computation of the probability and expected number of occurrences for substrings of the input text with (either exactly or up to) k mismatches. Further, monotonicity conditions are introduced and studied for probabilities and expected occurrences of a substring under increases in its length, allowed number of errors, or both. Over intervals of constant frequency count, these monotonicities translate to some of the scores in use, thereby reducing the size of tables at the outset and enhancing the process of discovery.

June 7th, 14:00 Exactum, B222

Guest lecture: Reliable identification of significant sets of episodes in event sequences

Robert Gwadera

In this talk we present a solution to the problem of identification of significant sets of episodes in event sequences. In order to determine significance of an episode in a monitored event sequence we compare its observed frequency to its frequency in a reference model of normal behavior. The reference model, in our work, is represented by a variable-length Markov model of generating symbols in the event stream. An episode is significant if the probability that it would have a given frequency by chance, in the reference model, is very small. In order to identify significant episodes we first show how to select the sliding window size to ensure that a discovered set of episodes is meaningful and then we show how to compute a lower threshold for under-represented and the upper threshold for overrepresented significant episodes. Note that the frequency of occurrence is not enough to determine significance, i.e, an infrequent episode can be more significant than a frequent one and it depends on probabilistic characteristics of the event stream.

As an extension, we propose a novel method for providing approximate answers with probabilistic guarantees to a class of ad hoc sliding window queries referencing past data in data streams. The queries in that class compute the frequency of past windows that satisfy given join conditions among tuples in a window comprising multiple streams. To represent the join conditions we introduce a concept of a 2D-episode to represent intra-stream and inter-stream constraints between tuples in the window.

May 27th, 12.00, Exactum (Gustaf Hällströmin katu 2b), B123

Public defence of a doctoral thesis: "Summarization Techniques for Pattern Collections in Data Mining"

Taneli Mielikäinen

Discovering patterns from data is an important task in data mining. There exist techniques to find large collections of many kinds of patterns from data very efficiently. A collection of patterns can be regarded as a summary of the data. A major difficulty with patterns is that pattern collections summarizing the data well are often very large.

In this dissertation we describe methods for summarizing pattern collections in order to make them also more understandable. More specifically, we focus on the following themes:

Quality value simplifications.
We study simplifications of pattern collections based on simplifying the quality values of the patterns. Especially, we study simplification by discretization.
Pattern orderings.
It is difficult to find a suitable trade-off between the accuracy of the representation and its size. As a solution to this problem, we suggest that patterns could be ordered in such a way that each prefix of the pattern ordering gives a good summary of the whole collection.
Pattern chains and antichains.
Virtually all pattern collections have natural underlying partial orders. We exploit the partial orders over pattern collections by clustering the patterns into chains and antichains.
Change profiles.
We describe how patterns can be related to each other by comparing how their quality values change with respect to their common neighborhoods, i.e., by comparing their change profiles.
Inverse pattern discovery.
As the patterns are often used to summarize data, it is natural to ask whether the original data set can be deduced from the pattern collection. We study the computational complexity of such problems.

May 9th, 4:15 pm, room D122, Exactum

Guest lecture: Lessons learnt and research intentions for addressing
interoperability and dependability issues in nomadic computing systems

PhD Cristiano di Flora, University of Naples

Abstract:

A nomadic computing environment consists of heterogeneous sub-systems which enable the discovery and the delivery of services to mobile users. This talk will describe the main results achieved and the lessons learnt during my PhD on interoperable nomadic computing systems. In particular the talk will describe the novel strategy that I designed and implemented to effectively deal with the heterogeneity of both discovery and delivery infrastructures, and of specific applications as well.
My PhD has been carried out within the framework of several research
projects in which the Mobilab Research Group at the University of Naples has been involved during the last years, including research projects on strong heterogeneity, context awareness, dynamic service discovery and composition, dependability, and new models and infrastructures for nomadic multimedia services.  I also take a look for interesting and
valuable research themes to lay the groundwork for post-doctoral research activities and projects with the NODES research group. In my talk, I will outline themes related either to ongoing activities or interesting future research directions.

May 2nd, 4:15 pm, Exactum D122

Guest lecture: Software Infrastructures for Ubiquitous Computing

Professor Tatsuo Nakajima, Waseda university

Abstract:

In the near future, our daily objects will contain computers and sensors  to make our daily lives more intelligent. Our current software  infrastructures are not enough to realize the vision. Our research group is working on a wide range of research activities for making the vision realistic. Our activities are conducted by three groups. The first group is developing intelligent daily objects called "Sentient Artefact". The second group is investigating a future mobile device, and in the last project, we are working on developing operating systems for future appliances. In my talk, I'll describe an overview of the activities.

April 6th, 14.15, Exactum, C222
Guest lecture: Implicit modelling in information retrieval

Dr. Ian Ruthven, Department of Computer and Information Sciences, University of Strathclyde (Glasgow, UK)

The talk discusses a study of implicit search behaviour on the web looking at how the quality of new query terms suggested by an implicit feedback system can vary according to the user task, experience and interaction.

March 1st, 14.00 (sharp), Exactum, B222
Guest lecture: Clustering and Indexing Methods for High Dimensional Data and Moving Objects

Dimitris Papadopoulos

In this talk we present methods for clustering and indexing high dimensional data and moving objects.

The clustering problem concerns the discovery of homogeneous groups of data according to a certain similarity measure. It is not meaningful to look for clusters in high dimensional spaces as the average density of points anywhere in input space is likely to be low. We introduce the Locally Adaptive Clustering (LAC) algorithm, which discovers clusters in subspaces spanned by different combinations of dimensions via local weightings of features. Our method associates to each cluster a weight vector, whose values capture the relevance of features within the corresponding cluster. We formally prove that the algorithm converges, and experimentally demonstrate the gain in performance our method achieves, using both synthetic and real data sets. Furthermore, we present an SQL implementation of LAC.

We propose methods to index moving objects in order to efficiently answer range queries about their current and future positions. We address the problem in external memory and present dynamic solutions, both for the one-dimensional, as well as the two-dimensional cases. Our approach transforms the problem into a dual-space that is easier to index. An experimental evaluation with the TPR-tree shows that the dual transformation approach provides comparable query performance but has much faster update processing. Moreover, unlike the TPR-tree, the dual method does not require establishing a predefined query horizon.

Past events