Discovery group: Data Mining for Pattern and Link Discovery
This is our (very) old web page. The new page is at http://www.helsinki.fi/discovery.
We develop novel methods and tools for pattern and link discovery. Our focus is on structured and heterogeneous data, such as graphs, as well as on sequences. The importance of data mining in heterogeneous and structured data will only grow in the future, with an increasing number of challenging and important problems, especially in scientific applications. Our current main applications are in bioinformatics, in collaboration with applied scientists and companies.
Our research topics are motivated by novel problems in applications. Our current emphasis is on analysis and link discovery in weighted (biological) graphs (Biomine project). We identify computational problems in such graphs, develop new algorithms, and apply them. While we value fielded applications with impact, we also emphasize solid, application-independent methods and results. We recently introduced (variable-length) Markov models to the problem of reconstructing haplotype strings [BMC Bioinformatics, software]. We have developed novel concepts and methods for gene mapping, for instance based on the discovery of genetically motivated tree-structured patterns [IEEE/ACM Transactions on Computational Biology and Bioinformatics, American Journal of Human Genetics, software]. These methods have turned out to be very useful in the practice of medical genetics. In context-sensitive computation, the group developed the ContextPhone software, which is in use at several research institutions around the world [IEEE Pervasive Computing, software].
The group works jointly under the Department of Computer Science at the University of Helsinki and the Helsinki Institute for Information Technology (HIIT). We are part of Algodan (Algorithmic Data Analysis), a national Centre of Excellence in research.
Projects
This is our old web page. The new page containing this information is at http://www.cs.helsinki.fi/discovery/projects.
Biomine: A biological search engine
We view biological databases of sequences, proteins, genes, etc. as weighted graphs and develop methods for link discovery and analysis in such graphs. Try out the prototype search engine at biomine.cs.helsinki.fi!
(Funding: National Technology Agency (Tekes) and companies.)
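As a rough illustration of link discovery in a weighted graph, the sketch below assumes that each edge weight is an independent probability in (0, 1] and that a path's strength is the product of its edge probabilities. Under that assumption, the strongest connection between two nodes can be found with Dijkstra's algorithm on -log(p) edge lengths. The node names and weights are hypothetical, edges are treated as directed (add both directions for an undirected graph), and this is not the Biomine implementation.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy graph: node -> list of (neighbour, edge probability).
Graph = Dict[str, List[Tuple[str, float]]]


def best_path(graph: Graph, source: str, target: str) -> Tuple[float, Optional[List[str]]]:
    """Return (probability, path) of the most probable source-target path.

    Edge weights are assumed to be independent probabilities in (0, 1];
    a path's probability is the product of its edge probabilities. Taking
    -log turns the product into a sum, so Dijkstra's algorithm applies.
    Returns (0.0, None) if the target is unreachable.
    """
    dist = {source: 0.0}  # accumulated -log probability from the source
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nbr, p in graph.get(node, []):
            nd = d - math.log(p)
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in visited:
        return 0.0, None
    # Walk the predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return math.exp(-dist[target]), path


# Hypothetical example: two routes from a gene to a disease node.
graph: Graph = {
    "geneA": [("proteinA", 0.9), ("articleX", 0.3)],
    "proteinA": [("pathwayP", 0.8)],
    "articleX": [("diseaseD", 0.9)],
    "pathwayP": [("diseaseD", 0.7)],
}
prob, path = best_path(graph, "geneA", "diseaseD")
print(prob, path)
```

Here the longer route through the protein and pathway wins (0.9 x 0.8 x 0.7) over the shorter but weaker route through the article (0.3 x 0.9), which is the kind of trade-off a single "shortest path" in hop count would miss.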
Bison: Bisociation Networks for Creative Information Discovery
The aim is to develop and validate a novel computational methodology that facilitates bisociative information discovery in large-scale heterogeneous information environments.
(Funding: European Commission under the Seventh Framework Programme (FP7); ongoing.)
Context: Context Recognition by User Situation Data Analysis
The Context project studies the characterization and analysis of information about a user's context and its use in proactive adaptivity. We have developed data analysis algorithms as well as ContextPhone, a mobile context-aware prototyping platform, available as free software.
(Funding: Academy of Finland, PROACT Programme; formally finished, work continues with internal funding.)
Data mining in genetics
We develop models, methods, and tools for analyzing genetic data, in particular for gene mapping and haplotype analysis.
(Funding: National Technology Agency (Tekes), companies, and HIIT; formally finished, work continues with internal funding.)
Software
This is our old web page. The new page containing this information is at http://www.cs.helsinki.fi/discovery/software.
Biomine search engine
A prototype of an associative search engine for biological information, integrated from multiple public sources and implemented using techniques developed in the project.
HaploRec - Haplotype reconstruction
Scalable software for population-based haplotype phasing, especially for sparse marker maps.
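The general idea of Markov-model-based phasing can be sketched as follows: among all haplotype pairs consistent with an unphased genotype, pick the pair whose two haplotypes are jointly most probable under a Markov chain over consecutive markers. HaploRec itself uses variable-length Markov models estimated from the data; the sketch below instead assumes a fixed first-order chain with hand-set, hypothetical probabilities, and it enumerates phasings by brute force, so it is exponential in the number of heterozygous sites and serves only as an illustration.

```python
from itertools import product
from typing import Dict, List, Tuple

Haplotype = Tuple[int, ...]
Start = Dict[int, float]                      # P(allele at marker 0)
Trans = List[Dict[Tuple[int, int], float]]    # P(allele at i+1 | allele at i)


def haplotype_prob(h: Haplotype, start: Start, trans: Trans) -> float:
    """Probability of haplotype h under a first-order Markov chain."""
    p = start[h[0]]
    for i in range(len(h) - 1):
        p *= trans[i][(h[i], h[i + 1])]
    return p


def phase(genotype: List[int], start: Start, trans: Trans) -> Tuple[Haplotype, Haplotype]:
    """Most probable haplotype pair consistent with an unphased genotype.

    Assumed genotype coding: 0 / 2 = homozygous for allele 0 / 1,
    1 = heterozygous. The first heterozygous site is fixed to allele 0 on
    the first haplotype, removing the mirror-image duplicate of each pair.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    best, best_p = None, -1.0
    for bits in product((0, 1), repeat=max(len(het) - 1, 0)):
        h1 = [g // 2 for g in genotype]  # homozygous sites: 0 -> 0, 2 -> 1
        h2 = list(h1)
        alleles = (0,) + bits if het else ()
        for site, a in zip(het, alleles):
            h1[site], h2[site] = a, 1 - a  # complementary alleles at het sites
        p = haplotype_prob(tuple(h1), start, trans) * haplotype_prob(tuple(h2), start, trans)
        if p > best_p:
            best, best_p = (tuple(h1), tuple(h2)), p
    return best


# Hypothetical chain over 3 markers that strongly favours repeating the
# previous allele (a crude stand-in for linkage disequilibrium).
start: Start = {0: 0.5, 1: 0.5}
ld = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.9}
trans: Trans = [ld, ld]
print(phase([1, 1, 1], start, trans))
```

With this chain, a fully heterozygous genotype is phased into the two "smooth" haplotypes rather than alternating ones, because the pair probability is the product of the two haplotype probabilities.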
ContextPhone - Context-aware platform for mobile phones
ContextPhone is an open software platform for context-aware applications. It can be used to collect, analyze, and transmit information about the phone's context, as well as to tag and publish contextual media.
HPM and TreeDT - Gene mapping methods
Software for association analysis, i.e., gene mapping based on linkage disequilibrium.
AsVis - Visualization of association rules in SNP neighborhoods
A web application, available as source code, for visualizing association rules obtained from short, sequential data, such as SNP neighborhoods (on-line demo).
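To make "association rules from short, sequential data" concrete, the sketch below mines the simplest kind of rule over aligned sequences: "base x at position i implies base y at position j", scored by the standard support and confidence measures. The input sequences and thresholds are hypothetical, and the restriction to single-item rules is our simplification; the rule miner actually used behind AsVis may differ.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

Item = Tuple[int, str]                  # (position, base)
Rule = Tuple[Item, Item, float, float]  # (antecedent, consequent, support, confidence)


def pairwise_rules(seqs: List[str], min_support: float, min_conf: float) -> List[Rule]:
    """Mine rules (i, x) => (j, y) from equal-length aligned sequences.

    support    = fraction of sequences containing both items
    confidence = count(both items) / count(antecedent item)
    Only single-item antecedents and consequents are considered.
    """
    n = len(seqs)
    single: Counter = Counter()
    pair: Counter = Counter()
    for s in seqs:
        items = list(enumerate(s))           # each position gives one (pos, base) item
        single.update(items)
        pair.update(combinations(items, 2))  # co-occurring item pairs, ordered by position
    rules: List[Rule] = []
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        # Try the rule in both directions; confidence differs between them.
        for ante, cons in ((a, b), (b, a)):
            conf = count / single[ante]
            if conf >= min_conf:
                rules.append((ante, cons, support, conf))
    return rules


# Hypothetical short aligned neighbourhoods around a site of interest.
seqs = ["ACGT", "ACGA", "TCGT", "ACTT"]
for rule in pairwise_rules(seqs, min_support=0.75, min_conf=1.0):
    print(rule)
```

Note that support is symmetric while confidence is not: in this toy data, A at position 0 always co-occurs with C at position 1, but C at position 1 does not imply A at position 0.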
Bassist - MCMC simulation for Bayesian statistical models
Bassist is a tool that automates the use of hierarchical Bayesian models in complex analysis tasks by generating a model-specific MCMC sampler.
Bassist is no longer supported.
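For readers unfamiliar with MCMC, the sketch below shows what a sampler for a Bayesian model does, using a hand-written random-walk Metropolis sampler for a hypothetical one-parameter model (normal data with known variance and a normal prior on the mean). The samplers that Bassist generated were model-specific and not necessarily of this form; the model, step size, and iteration counts here are illustrative assumptions only.

```python
import math
import random
from typing import Callable, List


def metropolis(log_post: Callable[[float], float], init: float, step: float,
               n_samples: int, burn_in: int, seed: int = 0) -> List[float]:
    """Random-walk Metropolis sampler for a one-dimensional posterior.

    log_post returns the unnormalised log posterior density. The first
    burn_in iterations are discarded.
    """
    rng = random.Random(seed)
    x, lp = init, log_post(init)
    samples: List[float] = []
    for it in range(burn_in + n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        lp_new = log_post(proposal)
        # Accept with probability min(1, posterior(proposal) / posterior(x)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        if it >= burn_in:
            samples.append(x)
    return samples


# Hypothetical model: x_i ~ Normal(mu, 1), prior mu ~ Normal(0, 10^2).
data = [1.2, 0.8, 1.0, 1.1, 0.9]


def log_post(mu: float) -> float:
    log_prior = -mu * mu / (2 * 10.0 ** 2)
    log_lik = -sum((x - mu) ** 2 for x in data) / 2
    return log_prior + log_lik


samples = metropolis(log_post, init=0.0, step=0.5, n_samples=20000, burn_in=2000)
posterior_mean = sum(samples) / len(samples)
print(posterior_mean)
```

This toy model is conjugate, so the exact posterior mean, sum(data) / (n + 1/100) = 5.0 / 5.01, is available as a check on the sampler; the point of a tool like Bassist was to handle hierarchical models where no such closed form exists.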
People
This is our old web page. The new page containing this information is at http://www.cs.helsinki.fi/discovery/people.
Group leader
Postdocs
- Dr. Alessandro Valitutti, postdoc (8/2011-)
PhD students
- Laura Langohr, PhD student (7/2008-) and MSc student (10/2007-6/2008)
- Fang Zhou, PhD student (10/2008-9/2012)
- Esther Galbrun, PhD student (9/2010-; co-supervised with Mikko Koivisto)
- Oskar Gross, PhD student (1/2012-) and exchange MSc student (5/2010-12/2011)
- Jukka Toivanen, PhD student (1/2012-) and trainee (5-12/2011)
- Joonas Paalasmaa, PhD student (9/2010-; employed by Finsor Ltd.)
- Maria-Eleni Skarkala, visiting PhD student (9/2011-3/2012)
Past visitors and postdocs
- Prof. Ehud Gudes, Ben-Gurion University, Israel (visit 16. - 29.5.2010)
- Assoc. Prof. Jiuyong Li, University of South Australia (visit 16.8. - 15.10.2010)
- Dr. Floris Geerts, postdoc (9/2002-4/2004; left for the University of Edinburgh)
- Dr. Bart Goethals, postdoc (1/2003-9/2004; now a professor in Antwerp)
- Dr. Päivi Onkamo, postdoc (11/2002-12/2004; left for the Department of Biological and Environmental Sciences)
- Dr. Sebastien Mahler, postdoc (2/2009-7/2010)
Alumni
- Dr. Kari Vasko, PhD 2004; left for CSC - IT Center for Science
- Dr. Petteri Sevon, PhD 2004; left for Biocomputing Platforms Finland
- Dr. Mika Raento, PhD 2007; left for Jaiku, then Google
- Dr. Kimmo Hätönen, PhD 2009; employed by Nokia Siemens Networks
- Dr. Kari Laasonen, PhD 2009; left for Google Zurich
- Dr. Petteri Hintsanen, PhD 2011; left for GE Healthcare
- Lauri Eronen, PhD expected in 2012; left for Biocomputing Platforms Finland
Publications
This is our old web page. The new page containing this information is at http://www.cs.helsinki.fi/discovery/publications.
- Compression of weighted graphs, Hannu Toivonen, Fang Zhou, Aleksi Hartikainen, Atte Hinkka. The 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), San Diego, CA, USA, August 2011.
- Predicting and preventing student failure - using the k-nearest neighbour method to predict student performance in an online course environment, Tuomas Tanner and Hannu Toivonen. International Journal of Learning Technology 5 (4): 356-377, 2010.
- Probabilistic Inductive Querying Using ProbLog, Luc De Raedt, Angelika Kimmig, Bernd Gutmann, Kristian Kersting, Vitor Santos Costa, and Hannu Toivonen. In Inductive Databases and Constraint-Based Data Mining by S. Dzeroski, B. Goethals, and P. Panov (Eds.), 229-262, Springer, 2010.
- Frequent Pattern, Hannu Toivonen. In Encyclopedia of Machine Learning by Claude Sammut and Geoffrey I. Webb (Eds.), Springer, 2011. (H. Toivonen also contributed short entries for Apriori Algorithm, Association Rule, Basket Analysis, and Frequent Itemset.)
- Network Simplification with Minimal Loss of Connectivity, Fang Zhou, Sebastien Mahler, and Hannu Toivonen. The 10th IEEE International Conference on Data Mining (ICDM), Sydney, Australia, December 2010.
- Fast Discovery of Reliable Subnetworks, Petteri Hintsanen, Hannu Toivonen, and Petteri Sevon. The 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Odense, Denmark, August 2010.
- A Framework for Path-Oriented Network Simplification, Hannu Toivonen, Sebastien Mahler, and Fang Zhou. The Ninth International Symposium on Intelligent Data Analysis (IDA), Tucson, Arizona, US, May 2010.
- Fast Discovery of Reliable k-terminal Subgraphs, Melissa Kasari, Hannu Toivonen, and Petteri Hintsanen. The 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Hyderabad, India, June 2010.
- Bisociative Knowledge Discovery for Microarray Data Analysis, Igor Mozetic, Nada Lavrac, Vid Podpecan, Petra Kralj Novak, Helena Motaln, Marko Petek, Kristina Gruden, Hannu Toivonen, and Kimmo Kulovesi. The 1st International Conference on Computational Creativity (ICCC-X), 190-199, Lisbon, Portugal, January 2010.
- Constructing Information Networks from Textual Documents, Matjaz Jursic, Nada Lavrac, Igor Mozetic, Vid Podpecan, and Hannu Toivonen. Workshop on Explorative Analytics of Information Networks at ECML PKDD, 23-38, Bled, Slovenia, September 2009.
- Finding representative nodes in probabilistic graphs, Laura Langohr, Hannu Toivonen. Workshop on Explorative Analytics of Information Networks at ECML PKDD, 65-76, Bled, Slovenia, September 2009.
- Review of Network Abstraction Techniques, Fang Zhou, Sebastien Mahler, Hannu Toivonen. Workshop on Explorative Analytics of Information Networks at ECML PKDD, 50-63, Bled, Slovenia, September 2009.
- Finding Reliable Subgraphs from Large Probabilistic Graphs, Petteri Hintsanen, Hannu Toivonen. Data Mining and Knowledge Discovery 17 (1): 3-23. 2008.
- Compressing Probabilistic Prolog Programs, Luc De Raedt, Kristian Kersting, Angelika Kimmig, Kate Revoredo, Hannu Toivonen. Machine Learning 70 (2-3): 151-168. 2008.
- Probabilistic Explanation Based Learning, Luc De Raedt, Angelika Kimmig, Hannu Toivonen. 18th European Conference on Machine Learning (ECML), 176-187, Warsaw, Poland, September 2007. Winner of the ECML-07 Best Paper Award.
- ProbLog: A Probabilistic Prolog and its Application in Link Discovery, Luc De Raedt, Angelika Kimmig, Hannu Toivonen. Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07), 2468-2473, Hyderabad, India, January 2007.
- The Most Reliable Subgraph Problem, Petteri Hintsanen. 11th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), 471-478, Warsaw, Poland, September 2007.
- Constrained Hidden Markov Models for Population-based Haplotyping, Niels Landwehr, Taneli Mielikainen, Lauri Eronen, Hannu Toivonen, Heikki Mannila. BMC Bioinformatics 2007, 8(Suppl 2):59.
- Interpreting and Acting on Mobile Awareness Cues, Antti Oulasvirta, Renaud Petit, Mika Raento, and Sauli Tiitta. Human-Computer Interaction 22:97-135. 2007.
- Algorithms for unimodal segmentation with applications to unimodality detection, Niina Haiminen, Aristides Gionis, and Kari Laasonen. Knowledge and Information Systems 14:1, 39-57. 2007.
- HaploRec: Efficient and accurate large-scale reconstruction of haplotypes, Lauri Eronen, Floris Geerts, and Hannu Toivonen. BMC Bioinformatics 7:542. 2006.
- An empirical comparison of case-control and trio-based study designs in high-throughput association mapping, Petteri Hintsanen, Petteri Sevon, Päivi Onkamo, Lauri Eronen, and Hannu Toivonen. Journal of Medical Genetics 43: 617-624, 2006.
- Closed Non-Derivable Itemsets, Juho Muhonen and Hannu Toivonen. The 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 06), 601-608, Berlin, Germany, September 2006.
- Revising Probabilistic Prolog Programs, L. De Raedt, K. Kersting, A. Kimmig, K. Revoredo, and H. Toivonen. The 16th International Conference on Inductive Logic Programming (ILP 2006), Santiago de Compostela, Spain, August 2006.
- Visualisation of Associations Between Nucleotides in SNP Neighbourhoods, Kimmo Kulovesi, Juho Muhonen, Ilkka Lappalainen, Pentti T. Riikonen, Mauno Vihinen, Hannu Toivonen and Tomi A. Pasanen. Workshop on Intelligent Data Analysis in bioMedicine and Pharmacology (IDAMAP 06), 61-62, Verona, Italy, August 2006.
- Link discovery in graphs derived from biological databases, Petteri Sevon, Lauri Eronen, Petteri Hintsanen, Kimmo Kulovesi, Hannu Toivonen. 3rd International Workshop on Data Integration in the Life Sciences 2006 (DILS'06), LNBI 4705, 35-49, Hinxton, UK, July 2006. Springer.
- Constrained Hidden Markov Models for Population-based Haplotyping, Niels Landwehr, Taneli Mielikainen, Lauri Eronen, Hannu Toivonen, Heikki Mannila. Workshop on Probabilistic Modeling and Machine Learning in Structural and Systems Biology, Tuusula, Finland, June 2006.
- TreeDT: Tree pattern mining for gene mapping, Petteri Sevon, Hannu Toivonen, Vesa Ollikainen. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 3 (2): 174-185, April-June 2006.
- A survey of data mining methods for linkage disequilibrium mapping, Päivi Onkamo and Hannu Toivonen. Human Genomics, 2 (5): 336-340, 2006.
- The Data Subject's Right of Access and to be Informed in Finland: An Experimental Study, Mika Raento. International Journal of Law and Information Technology. In Press, Advance Access published online on October 27, 2006.
- LOCA: "Set to discoverable", an interactive installation, was presented at ZeroOne San Jose / ISEA 2006.
- Loca: Location Oriented Critical Arts, John Evans, Drew Hemment, Theo Humphries, and Mika Raento. LEONARDO electronic almanac, 14(3), MIT Press, 2006.
- Evaluating classifiers for mobile-masquerader detection, Oleksiy Mazhelis, Seppo Puuronen, and Mika Raento. In Proceedings of the Security and Privacy in Dynamic Environments (SEC2006), 21st IFIP TC-11 International Information Security Conference. Springer, pp. 271-283, 2006.
- Estimating Accuracy of Mobile-Masquerader Detection Using Worst-Case and Best-Case Scenario, Oleksiy Mazhelis, Seppo Puuronen and Mika Raento. In Information and Communications Security, LNCS 4307. Springer, pp. 302-321, 2006.
- Loca: Location Oriented Critical Arts, John Evans, Drew Hemment, Theo Humphries, and Mika Raento. In Hothaus Papers: perspectives and paradigms in media arts, Eds Joan Gibbons and Kaye Winwood. Article Press 2006.
- ContextPhone - A prototyping platform for context-aware mobile applications, Mika Raento, Antti Oulasvirta, Renaud Petit, Hannu Toivonen. IEEE Pervasive Computing, 4 (2): 51-59, 2005.
- Combining phenotypic and genotypic data to discover multiple disease genes, Hannu Toivonen, Saara Hyvönen, Petteri Sevon. Symposium on Knowledge Representation in Bioinformatics (KRBIO'05), 7-14, Espoo, Finland, June 2005.
- Mining non-derivable association rules, Bart Goethals, Juho Muhonen, and Hannu Toivonen. SIAM International Conference on Data Mining, 239-249, Newport Beach, CA, April 2005. SIAM.
- Data Mining in Bioinformatics, Jason Wang, Mohammed Zaki, Hannu Toivonen, and Dennis Shasha (Eds.), Springer, 2005. ISBN 1-85233-671-4.
- Data Mining for Gene Mapping, Hannu Toivonen, Päivi Onkamo, Petteri Hintsanen, Evimaria Terzi, and Petteri Sevon. In Next Generation of Data Mining Applications by Mehmed Kantardzic and Jozef Zurada (Eds.), 263-293. Wiley-IEEE Press, 2005. (manuscript)
- Mobile HCI 2004 Location Systems Privacy and Control Workshop, Giovanni Iachello, Mika Raento, and Ian Smith. IEEE Pervasive Computing, 4 (1): 90, 2005.
- ContextContacts: Re-Designing SmartPhone's Contact Book to Support Mobile Awareness and Collaboration, Antti Oulasvirta, Mika Raento, Sauli Tiitta. Proc. of MobileHCI 05, pp. 167-174. ACM, 2005.
- Clustering and Prediction of Mobile User Routes from Cellular Data, Kari Laasonen. PKDD 2005, LNAI 3721, Springer Verlag (2005), 569-576. (C) Springer-Verlag.
- Sosiaalista tilatietoa kontekstipuhelimella [Social status information with a context phone], Mika Raento, Antti Oulasvirta, Hannu Toivonen, and Martti Mäntylä. Prosessori 1/2005, pp. 54-56. (Manuscript; in Finnish.)
- Gene Mapping by Pattern Discovery, Petteri Sevon, Hannu T.T. Toivonen, and Päivi Onkamo. In J. Wang et al (Eds.), Data Mining in Bioinformatics, 105-126. Springer, 2005. (manuscript)
- Route Prediction from Cellular Data, Kari Laasonen. In Proceedings of the workshop on Context Awareness for Proactive Systems, CAPS 2005, 147-158.
- Privacy management for social awareness applications, Mika Raento, Antti Oulasvirta. In Proceedings of the workshop on Context Awareness for Proactive Systems, CAPS 2005, 105-114.
- Mining relaxed graph properties in internet, Wilhelmiina Hämäläinen, Hannu Toivonen, and Vladimir Poroshin. IADIS International Conference WWW/Internet 2004, 152-159, Madrid, Spain, October 2004.
- Paleoekologia - kadonneen aarteen metsästys [Paleoecology: the hunt for a lost treasure], Atte Korhola, Hannu Toivonen, Kari Vasko. Tietoyhteys 3/2004 (October), pp. 26-28. (In Finnish.)
- Segmentation of paleoecological spatio-temporal count data, Kari Vasko, Hannu Toivonen, Atte Korhola. Proceedings of the Fourth International Workshop on Environmental Applications of Machine Learning (EAML 2004), 61-62, Bled, Slovenia, Sep-Oct 2004.
- A Markov chain approach to reconstruction of long haplotypes, Lauri Eronen, Floris Geerts, and Hannu Toivonen. Pacific Symposium on Biocomputing (PSB 2004), 104-115, Hawaii, USA, January 2004. World Scientific.
- Adaptive On-device Location Recognition, K. Laasonen, M. Raento, H. Toivonen. Pervasive Computing: Second International Conference, PERVASIVE 2004, LNCS 3001, Springer Verlag (2004), 287-304. (C) Springer-Verlag.
- Mobile Communication and Context Dataset, Mika Raento. In Proceedings of the Workshop Towards Benchmarks and a Database for Context Recognition, In International Conference on Pervasive Computing. ETH, 2004.
- Kill your Personal Data Dead, Mika Raento. In Proceedings of the Workshop On Location Systems Privacy and Control, in 6th International Conference on Human Computer Interaction with Mobile Devices and Services MobileHCI'04. University of Strathclyde, 2004.
- Ethic of choice for Location-based systems, Mika Raento. In Proceedings of the Workshop on Usability, Utility and Ethics of Location Based Services, in NordiCHI 2004. University of Tampere, 2004.
- Context software - A prototype platform for contextual mobile applications, Mika Raento. In Proceedings of the International Proactive Computing Workshop. University of Helsinki, 2004.
- Proceedings of the Workshop On Location Systems Privacy and Control, in 6th International Conference on Human Computer Interaction with Mobile Devices and Services MobileHCI'04, Ian Smith, Giovanni Iachello, and Mika Raento (Eds.). University of Strathclyde, 2004.
- Context - Prototyping platform for contextual media. Mika Raento. Poster at the 12th International Symposium on Electronic Art ISEA2004. Helsinki, Finland. 2004.
- Automated Detection of Epidemics from the Usage Logs of a Physicians' Reference Database, Jaana Heino and Hannu Toivonen. 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-2003), 180-191, Cavtat-Dubrovnik, Croatia, September 2003. Springer.
- Proceedings of BIOKDD'03, 3rd ACM SIGKDD Workshop on Data Mining in Bioinformatics, Mohammed Zaki, Jason T.L. Wang, and Hannu Toivonen (Eds.). Washington DC, August 2003. Report No. 03-11, Rensselaer Polytechnic Institute, Troy, NY. 2003. (electronic proceedings)
- Statistical evaluation of the predictive toxicology challenge 2000-2001, Hannu Toivonen, Ashwin Srinivasan, Ross D. King, Stefan Kramer, and Christoph Helma. Bioinformatics 19 (10): 1183 - 1193, 2003.
- Discovering all most specific sentences, Dimitrios Gunopulos, Roni Khardon, Heikki Mannila, Sanjeev Saluja, Hannu Toivonen, and Ram Sewak Sharma. ACM Transactions on Database Systems 28 (2): 140 - 174, June 2003. (DOI: http://doi.acm.org/10.1145/777943.777945)
- BIOKDD 2002: Recent Advances in Data Mining for Bioinformatics, M.J. Zaki, J.T.L. Wang, and H.T.T. Toivonen. SIGKDD Explorations 4 (2): 112 - 114, January 2003.
- Estimating the number of segments in time series data using permutation tests, Kari Vasko and Hannu Toivonen. The 2002 IEEE International Conference on Data Mining (ICDM'02), 466 - 473, Maebashi City, Japan, December 2002. IEEE.
- Association analysis for quantitative traits by data mining: QHPM, P. Onkamo, V. Ollikainen, P. Sevon, H.T.T. Toivonen, H. Mannila, and J. Kere. The Annals of Human Genetics 66: 419 - 429, 2002.
- Bayesian analysis of metapopulation data, R.B. O'Hara, E. Arjas, H. Toivonen, and I. Hanski. Ecology 83 (9): 2408 - 2415, 2002.
- Holocene temperature changes in northern Fennoscandia reconstructed from chironomids using Bayesian modeling, A. Korhola, K. Vasko, H. Toivonen, and H. Olander. Quaternary Science Reviews 21(16-17), 1841 - 1860, 2002.
- Machine Learning: ECML 2002 - 12th European Conference on Machine Learning, LNCS 2430, T. Elomaa, H. Mannila, H. Toivonen (Eds.). Springer 2002.
- Principles of Data Mining and Knowledge Discovery - 6th European Conference, PKDD 2002, LNCS 2431, T. Elomaa, H. Mannila, H. Toivonen (Eds.). Springer 2002.
- BIOKDD01: Workshop on Data Mining in Bioinformatics, M.J. Zaki, J.T.L. Wang, and H.T.T. Toivonen. SIGKDD Explorations 3 (2): 71 - 73, January 2002.
Contact
Contact: Prof. Hannu Toivonen, email firstname.lastname@cs.helsinki.fi.