A challenge funded by the EU Pascal network. Participation is open
to all.
The Challenge is over. The data-sets will be maintained and updated in
order to create a benchmark for existing and new methods for
stemmatology.
Stemmatology (a.k.a. stemmatics) studies relations among
different variants of a document that have been gradually built from
an original by copying and modifying earlier versions. The aim of such
study is to reconstruct the family tree of the variants.
We invite applications of established and, in particular, novel
approaches, including but not restricted to hierarchical clustering,
graphical modeling, link analysis, phylogenetics, and string matching.
The objective of the challenge is to evaluate the performance of
various approaches. Several sets of variants for different texts are
provided, and the participants should attempt to reconstruct the
relationships of the variants in each data-set. This enables the
comparison of methods usually applied in unsupervised scenarios.
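As a rough illustration of the task, the sketch below groups a handful of text variants by character-level edit distance using average-linkage hierarchical clustering, one of the approaches named above. It is a minimal example under stated assumptions, not the challenge's evaluation procedure or any participant's method: the four toy variants, their labels, and the function names are invented for illustration, and a binary cluster tree is only a crude stand-in for a real stemma, in which surviving manuscripts may also be interior nodes.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def average_linkage_tree(variants):
    """Cluster variants bottom-up; return a nested tuple of labels."""
    # Each cluster is a pair: (tree so far, labels of its members).
    clusters = [(name, [name]) for name in variants]
    dist = {frozenset((x, y)): edit_distance(variants[x], variants[y])
            for x, y in combinations(variants, 2)}

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical toy variants (not taken from the challenge data-sets).
variants = {
    "A": "the quick brown fox jumps over the lazy dog",
    "B": "the quick brown fox jumped over the lazy dog",
    "C": "a quick brown fox jumped over a lazy dog",
    "D": "a quick red fox leaped over a lazy hound",
}
tree = average_linkage_tree(variants)
print(tree)  # ('D', ('C', ('A', 'B')))
```

The nested tuple groups A and B first because they differ in a single word, then attaches C and finally D. Methods such as phylogenetic inference aim to recover more than this grouping, for example the direction of copying.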
Notifications (most recent first)
March 14, 2009: A paper on the challenge has appeared in Literary and Linguistic Computing.
November 11, 2008: Correct graph, evaluation script, and numeric version of Heinrichi data available at the Causality workbench.
October 15, 2008: More results added, including SplitsTree4 (see Results).
August 30, 2007: All data-sets provided in Nexus format.
August 13, 2007: Primary data provided as an aligned table. More results for the primary data-set.
June 14, 2007: Some more results included for comparison, including PAUP*.
May 4, 2007: Secondary ranking results announced. Winner (secondary ranking): Rudi Cilibrasi.
May 2, 2007: Results announced. Winner (primary ranking): Team Demokritos.
April 11, 2007: The score of the hierarchical clustering method is corrected (see Example).
March 28, 2007: The submission deadline has been extended from March 30 to April 14.
March 27, 2007: Some preliminary results available.
March 22, 2007: Submission is open.
February 20, 2007: Solution to validation data-set available.
December 1, 2006: Primary data-set available.
October 14, 2006: A discussion group for the Challenge is created.
October 6, 2006: First-phase data available.
Important Dates
First-phase data available | October 6, 2006 |
Primary data-set available | December 1, 2006 |
Solution to validation data-set available | February 20, 2007 |
Submission deadline | April 14, 2007 |
Results | May 2, 2007 |
Organizers
- Teemu Roos (teemu.roos at cs.helsinki.fi), Helsinki Institute for Information Technology (contact person)
- Tuomas Heikkilä, Department of History, University of Helsinki
- Petri Myllymäki, Department of Computer Science, University of Helsinki