Frequent Pattern Discovery and Decision Tree Induction
in First-Order Logic
Wed 20 Jan at 4:45 pm, A414
I will discuss a general formulation of two central data mining tasks,
where both the database and the patterns are represented in some subset of
first-order logic. These tasks are frequent pattern discovery and decision
tree induction.
In recent years, the usage and size of databases have grown dramatically,
due to a constant decrease in the cost of both the collection and the
storage of huge amounts of data. The need for tools to exploit
data warehouses has grown accordingly and has given rise to a
rapidly evolving research field at the intersection of statistics,
databases and machine learning: Data Mining and Knowledge Discovery in
Databases (KDD).
Within KDD, the discovery of frequent patterns has been studied in a
variety of settings. In its simplest form, known from association rule
mining, the task is to discover all frequent item sets, i.e., all
combinations of items that are found in a sufficient number of examples.
The fundamental task of association rule and frequent set discovery has
been extended in various directions, allowing more useful patterns to be
discovered with special-purpose algorithms. A unified representation in
first-order logic brings clarity to the resulting fragmented picture of
frequent pattern discovery: within the first-order formulation, a number
of dimensions appear that reconnect the settings that have diverged. I
will present the Warmr algorithm for frequent pattern discovery in
first-order logic, which is well suited to exploratory data mining: it
offers the flexibility required to experiment with standard and, in
particular, novel settings not supported by special-purpose algorithms.
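As a rough illustration of the simplest setting described above (frequent
item sets over plain sets of items, not the first-order setting and not
Warmr itself), the following Python sketch performs a level-wise search:
it counts single items first and then only extends item sets whose
subsets were already found frequent. The function name, the example
baskets and the threshold are hypothetical and chosen only for this
illustration.

```python
from itertools import combinations


def frequent_item_sets(examples, min_support):
    """Map each item set found in at least `min_support` examples to its count.

    Illustrative level-wise search over propositional item sets only.
    """
    examples = [frozenset(e) for e in examples]

    def support(candidate):
        # Number of examples that contain every item of the candidate.
        return sum(1 for e in examples if candidate <= e)

    # Level 1: frequent single items.
    level = {}
    for item in {i for e in examples for i in e}:
        candidate = frozenset([item])
        count = support(candidate)
        if count >= min_support:
            level[candidate] = count

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        # Count step: one pass over the examples per surviving candidate.
        level = {}
        for c in candidates:
            count = support(c)
            if count >= min_support:
                level[c] = count
    return frequent


# Hypothetical toy database: each example is a set of items.
examples = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
result = frequent_item_sets(examples, min_support=2)
for item_set, count in sorted(result.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(item_set), count)
```

The pruning step relies on the fact that an item set cannot be frequent
unless all of its subsets are frequent, which keeps the number of
candidates that must be counted small.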
Warmr upgrades the well-known Apriori algorithm to first-order logic. At
K.U.Leuven, a number of similar upgrades have been realized. As a second
example of this "Leuven strategy", I will present the Tilde system, which
adapts C4.5 to induce first-order logical decision trees.
To conclude, I will demonstrate the scientific and commercial potential of
our approach via an application in chemical toxicology, where the task is
to identify cancer-causing chemical substances.
Hands-on exercises with Warmr and Tilde
Thu 21 Jan at 4 pm, D324
This session is meant to be a first introduction to the practice of data
mining in first-order logic. Participants will be guided through the
different steps involved in setting up an application with Warmr and
Tilde, two tools developed at the K.U.Leuven for frequent pattern
discovery and decision tree induction in first-order logic. No prior
knowledge of logic (Prolog) is assumed.