Tane: Efficient Discovery of Functional and Approximate
Dependencies Using Partitions
Ykä Huhtala,
Juha Kärkkäinen,
Pasi Porkka, and
Hannu Toivonen
University of Helsinki, Department of Computer Science and
Rolf Nevanlinna Institute
Abstract
The discovery of functional dependencies from relations is an important
database analysis technique. We present TANE, an efficient algorithm for
finding functional dependencies from large databases. TANE is based on
partitioning the set of rows with respect to their attribute values,
which makes testing the validity of functional dependencies fast even
for a large number of tuples. The use of partitions also makes the
discovery of approximate functional dependencies easy and efficient and
the erroneous or exceptional rows can be identified easily. Experiments
show that TANE is fast in practice. For benchmark databases the running
times are improved by several orders of magnitude over previously
published results. The algorithm is also applicable to much larger
datasets than the previous methods.
Full journal article
TANE:
An Efficient Algorithm for Discovering Functional and
Approximate Dependencies.
The Computer Journal 42 (2), 1999, pp. 100-111.
An early conference paper
Efficient Discovery of Functional and Approximate
Dependencies Using Partitions.
In 14th International Conference on Data Engineering (ICDE'98),
pp. 392-401, Orlando, Florida, February 1998. IEEE Computer Society Press.
Small print for the conference paper:
Copyright 1998 IEEE. Published in the Proceedings of ICDE'98,
February 1998 in Orlando, Florida. Personal use of this material
is permitted. However, permission to reprint/republish this
material for advertising or promotional purposes or for creating
new collective works for resale or redistribution to servers or
lists, or to reuse any copyrighted component of this work in
other works, must be obtained from the IEEE. Contact: Manager,
Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane
/ P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: +
Intl. 908-562-3966.
TANE program
An implementation of the algorithms presented in the above paper
is available: tane-1.0.tar.gz.
(Installation directions are also available)
(...as are some additional instructions
for using it.)
Go up to the
Data Mining Group or to the
Department of Computer Science.