Seminar: Constraint Solving Meets Machine Learning and Data Mining (Spring 2013)
Introduction
Constraint programming, maching learning, and data mining are today well-established and thriving research fields within computer science. Each of the fields
have contributed fundamental techniques and algorithmic solutions
that are today routinely applied for addressing hard computational problems
in various real-world contexts, including both industrial and scientific applications.
However, these fields have developed somewhat independently of each other.
Especially, the possibilities of exploiting the highly optimized constraint solving technology available today in providing generic and efficient solutions to various
machine learning and data mining problems, such as different kinds of classification, structure-learning, probabilistic reasoning, and pattern mining tasks,
have only recently been realized.
In addition, machine learning techniques are starting to be employed in
further improving constraint solvers e.g. by building intelligent
constraint solver portfolios based on model-based
selection.
The aim of this seminar is to provide a snapshot into the current state of forefront research in the intersection of constraint solving, machine learning and data mining, by studying recently published research articles.
Each student will select one or more articles based on which (s)he will
write a seminar report and give a seminar presentation during two workshop days to be held during May 6-14.
This is a very topical area of research, and possibilities of extending the seminar work into an MSc thesis may be available (consult the teacher) within the COIN Centre of Excellence in Computational Inference Research.
For more background on the seminar topic, see e.g.
Dagstuhl seminar on Constraint Programming meets Machine Learning and Data Mining, and
CoCoMile 2012: First workshop on COmbining COnstraint solving with MIning and LEarning.
Course Information
- Dates and Places:
- First meeting, Introduction and choosing topics: March 13, 12-14 in Exactum B119 (sorry, the place was wrong!!)
- Seminar presentations: 1-2 workshop days during May 6-14 in Exactum B119.
- Course code: 58313103
- Credit units: 3
- Teacher-in-charge: Dr. Matti Järvisalo,
matti.jarvisalocs.helsinki.fi
- Language: The seminar is held by default in English (given that there are participants who do not speak Finnish).
It is also possible to give a presentation / write the report in Finnish (however, English is recommended).
- Material: Selected scientific articles.
See the list of possible topics for details.
- Prerequisites: You must have passed the course Scientific Writing or have equivalent skills. Otherwise, no formal prerequisites.
Basic knowledge on at least one of the areas constraint programming, data mining, and machine learning, is a great asset.
Courses from the CS curriculum which provide useful background include
Introduction to Artificial Intelligence,
Discrete Optimization,
Introduction to Machine Learning,
and
Data Mining.
Most important is a desire to actively and independently learn new things!
Course Requirements
See also the introductory slides from the first meeting on March 13.
- Requirement for passing the course: (i) A seminar presentation (30 minutes), (ii) a report paper (10-15 pages plus references) on a selected topic related to the seminar, (iii) acting as an opponent for other student(s).
You must individually pass all three parts in order to pass the course.
- Deadlines: All deadlines are strict, and you will fail the course if you miss a deadline. Deadline extensions are given only in case of an illness or for similar reasons; a doctor's certificate will be required.
- Preliminary draft of the report, at least 5 pages: April 13
- Feedback on the preliminary draft of your opponent: April 22
-
Full report and preliminary presentation slides:
- May 2 for those presenting on May 7
- May 7 for those presenting on May 14
-
Final report: May 21
- Grading: The presentation and report paper are both graded on the scale 1-5. Active participation, including acting as an opponent and actively attending the presentations, also has an influence on the grade. Most emphasis is put on the report.
For passing the course, you have to complete all compulsory parts in time, your report and presentation must be of an acceptable level, and you have acceptably acted as an opponent.
The highest grade should be achievable if you work hard, do your best, and listen to the feedback (both from the instructor and from other students).
This course is worth 3 credits, which entails approximately 80 hours of work. The grading is calibrated so that roughly this amount of work suffices for a good grade.
Guidelines
Keep in mind that the written report and the oral presentation have partially different goals.
-
The oral presentation should explain the main ideas of the content, simplifying concepts when necessary. Depending on the topic, a good presentation should include many examples to illustrate the subject matter and only some choice technical details that are important and can be discussed thoroughly enough during the presentation. The oral presentation should last 30 minutes (plus discussions).
-
For the report, put more emphasis on exactness and scientific representation. The report will often be a summary of the source material, so you must pick and choose what to include. The things you have picked to discuss in your report must then be described in enough detail; for the things you leave out, refer to the source material. A suitable length for the report is 10-15 pages + references.
Your report should look like a typical scientific article. However, it will not contain any new scientific results, just a survey of previously published work.
Everything that you write in your seminar report must be written in your own words. Use appropriate citations to make it clear which result comes from which publication.
All illustrations and examples should be your own original work. There is usually no need to use the same example as what was used in the original publications; by constructing your own examples you are also forcing yourself to actually understand the details of the result.
(If it is absolutely necessary to reuse illustrations or examples from other sources, make it very explicit in your report, with appropriate citations, and make sure you are not violating anyone else's copyrights. Remember that simply re-drawing a figure does not make it your own work.)
Important:
Make sure that your seminar report (and presentation) shows that you have actually understood everything, it is completely clear to you, and now you are doing your best to explain the key ideas to others. Do not follow the structure of the original articles; come up with your own presentation that is much better than the original version! Ideally, pick ideas from several papers and synthesise.
Schedule
The seminar meetings start on the hour, not 15 minutes past. You must arrive at least 15 minutes before, in order to setup your presentation!
Date | Presenter |
Topic | Opponent |
Tuesday May 7 14-18 in C220 |
Jeremias Berg | G1 | Timo Koivisto |
| Mikko Sysikaski | G2 | Jeremias Berg |
| Timo Koivisto | G3 | Mikko Sysikaski |
| Quan Nguyen | D1 | Aleksi Hartikainen |
| Aleksi Hartikainen | D2 | Jussi Kokkala |
| Jussi Kokkala | D4 | Quan Nguyen |
Tuesday May 14 12-16 in B119 |
Alejandro Sanchez Guinea | A1 | |
| Lari Rasku | A3 | Simo Linkola |
| Simo Linkola | A2 | Lari Rasku |
| Liye He | D3 | Matthew Pierce |
| Matthew Pierce | B1 | Kustaa Kangas |
| Kustaa Kangas | B2 | Liye He |
Topics
Below are listed possible topics for giving a presentation and writing a report. Each student chooses one topic, and one topic can be chosen only by one student. The provided articles for each topic serve as the main references,
but it is a plus if you independently consult additional suitable references.
The topics will be assigned during the first seminar meeting (introduction to the topic presented by the teacher) on a first-come-first-served basis. You can also reserve a topic before the first meeting by contacting the teacher.
It is also possible suggest a topic outside this list; if you wish to suggest one, please discuss details with the teacher.
A. Data Mining using constraint solvers / knowledge compilation
- A1: (Topic reserved / Sanchez Guinea)
Tias Guns, Siegfried Nijssen, Luc De Raedt: Itemset mining: A constraint programming perspective. Artif. Intell. 175(12-13): 1951-1983 (2011)
http://dx.doi.org/10.1016/j.artint.2011.05.002
- A2: (Topic reserved / Linkola)
Tias Guns, Siegfried Nijssen, Luc De Raedt: k-Pattern Set Mining under Constraints. IEEE Trans. Knowl. Data Eng. 25(2): 402-418 (2013)
http://doi.ieeecomputersociety.org/10.1109/TKDE.2011.204
- A3: (Topic reserved / Rasku)
Jean-Philippe Métivier, Patrice Boizumault, Bruno Crémilleux, Mehdi Khiari, Samir Loudni:
A Constraint-Based Language for Declarative Pattern Discovery.
http://doi.ieeecomputersociety.org/10.1109/ICDMW.2011.11
- A4:
Hadrien Cambazard, Tarik Hadzic, Barry O'Sullivan: Knowledge Compilation for Itemset Mining. ECAI 2010: 1109-1110
http://dx.doi.org/10.3233/978-1-60750-606-5-1109
- A5:
Emmanuel Coquery, Said Jabbour, Lakhdar Sais, Yakoub Salhi: A SAT-Based Approach for Discovering Frequent, Closed and Maximal Patterns in a Sequence. ECAI 2012: 258-263
http://dx.doi.org/10.3233/978-1-61499-098-7-258
- A6:
Thanh Le Van, Ana Carolina Fierroy, Tias Guns, Matthijs van Leeuwen, Siegfried Nijssen, Luc De Raedt, Kathleen Marchal: Mining Local Staircase Patterns in Noisy Data. ICDM Workshops 2012: 139-146
http://doi.ieeecomputersociety.org/10.1109/ICDMW.2012.83
- A7:
Said Jabbour, Lakhdar Sais, Yakoub Salhi, Karim Tabia: Symmetries in Itemset Mining. ECAI 2012: 432-437
http://dx.doi.org/10.3233/978-1-61499-098-7-432
B. Using machine learning to configure / speed-up constraint solving
- B1: (Topic reserved / Pierce)
Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown, Thomas Stützle: ParamILS: An Automatic Algorithm Configuration Framework. J. Artif. Intell. Res. (JAIR) 36: 267-306 (2009)
http://dx.doi.org/10.1613/jair.2861
- B2: (Topic reserved / Kangas)
Lin Xu, Frank Hutter, Holger H. Hoos, Kevin Leyton-Brown: SATzilla: Portfolio-based Algorithm Selection for SAT. J. Artif. Intell. Res. (JAIR) 32: 565-606 (2008)
http://dx.doi.org/10.1613/jair.2490
- B3:
Lin Xu, Holger Hoos, Kevin Leyton-Brown: Hydra: Automatically Configuring Algorithms for Portfolio-Based Selection. AAAI 2010
http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/1929
- B4:
Serdar Kadioglu, Yuri Malitsky, Ashish Sabharwal, Horst Samulowitz, Meinolf Sellmann: Algorithm Selection and Scheduling. CP 2011: 454-469
http://dx.doi.org/10.1007/978-3-642-23786-7_35
- B5:
Matthew J. Streeter, Stephen F. Smith: New Techniques for Algorithm Portfolio Design. UAI 2008: 519-527
http://uai.sis.pitt.edu/displayArticleDetails.jsp?mmnu=1&smnu=2&article_id=1970&proceeding_id=24
- B6:
Matthew J. Streeter, Daniel Golovin, Stephen F. Smith: Combining Multiple Heuristics Online. AAAI 2007: 1197-1203
http://www.aaai.org/Library/AAAI/2007/aaai07-190.php
C. Clustering with constraints
D. Learning Bayesian networks using constraint solvers / heuristic search
E. Learning causal models using constraint solving
F. Model counting and probabilistic inference
- F1:
Mark Chavira, Adnan Darwiche: On probabilistic inference by weighted model counting. Artif. Intell. 172(6-7): 772-799 (2008)
http://dx.doi.org/10.1016/j.artint.2007.11.002
- F2:
Tian Sang, Paul Beame, Henry A. Kautz: Performing Bayesian Inference by Weighted Model Counting. AAAI 2005: 475-482
http://www.aaai.org/Papers/AAAI/2005/AAAI05-075.pdf
- F3:
Wei Li, Pascal Poupart, Peter van Beek: Exploiting Structure in Weighted Model Counting Approaches to Probabilistic Inference. J. Artif. Intell. Res. (JAIR) 40: 729-765 (2011)
http://jair.org/papers/paper3232.html
- F4:
Wei Li, Peter van Beek, Pascal Poupart: Performing Incremental Bayesian Inference by Dynamic Model Counting. AAAI 2006: 1173-1179
http://www.aaai.org/Library/AAAI/2006/aaai06-184.php
- F4:
Carla P. Gomes, Ashish Sabharwal, Bart Selman: Model Counting: A New Strategy for Obtaining Good Bounds. AAAI 2006: 54-61
http://www.aaai.org/Library/AAAI/2006/aaai06-009.php
- F5:
Jinbo Huang, Mark Chavira, Adnan Darwiche: Solving MAP Exactly by Searching on Compiled Arithmetic Circuits. AAAI 2006: 1143-1148
http://www.aaai.org/Library/AAAI/2006/aaai06-179.php
- F6:
Guy Van den Broeck, Nima Taghipour, Wannes Meert, Jesse Davis, Luc De Raedt: Lifted Probabilistic Inference by First-Order Knowledge Compilation. IJCAI 2011: 2178-2185
http://ijcai.org/papers11/Papers/IJCAI11-363.pdf
G. Further topics
- G1: (Topic Reserved / Berg)
Marijn Heule, Sicco Verwer: Exact DFA Identification Using SAT Solvers. ICGI 2010: 66-79
http://dx.doi.org/10.1007/978-3-642-15488-1_7
- G2: (Topic Reserved / Sysikaski)
Christian Bessiere, Emmanuel Hebrard, Barry O'Sullivan: Minimising Decision Tree Size as Combinatorial Optimisation. CP 2009: 173-187
http://dx.doi.org/10.1007/978-3-642-04244-7_16
- G3: (Topic reserved / Koivisto)
Christian Bessiere, Remi Coletta, Frederic Koriche, Barry O'Sullivan: A SAT-Based Version
Space Algorithm for Acquiring Constraint Satisfaction Problems. ECML 2005: 23-34
http://dx.doi.org/10.1007/11564096_8
- G4:
Xuan-Ha Vu, Barry O'Sullivan: A Unifying Framework for Generalized Constraint Acquisition.
International Journal on Artificial Intelligence Tools 17(5): 803-833 (2008)
http://dx.doi.org/10.1142/S0218213008004175
- G5:
Alfredo Braunstein, Marc Mézard, Riccardo Zecchina: Survey propagation: An algorithm for satisfiability. Random Struct. Algorithms 27(2): 201-226 (2005)
http://dx.doi.org/10.1002/rsa.20057