AbstractWe consider the problem of partitioning the n nodes of a complete edge weighted graph into k clusters so as to minimize the sum of the diameters of the clusters. Since the problem is NP-complete, our focus is on the development of good approximation algorithms. When edge weights satisfy the triangle inequality, we present the first approximation algorithm for the problem. The approximation algorithm yields a solution which has no more than O(k) clusters such that the sum of cluster diameters is within a factor O(ln(n/k)) of the optimal value using exactly k clusters. Our approach also permits a tradeoff among the constant terms hidden by the two big-O terms and the running time. For any fixed k, we present an approximation algorithm that produces k clusters whose total diameter is at most twice the optimal value. When the distances are not required to satisfy the triangle inequality, we show that, unless P = NP, for any Rho >= 1, there is no polynomial time approximation algorithm that can provide a performance guarantee of Rho even when the number of clusters is fixed at 3. We also present some results for the problem of minimizing the sum of cluster radii.
Categories and Subject Descriptors: G.2.2 [Discrete Mathematics]: Graph Theory
Additional Key Words and Phrases: clustering, data mining, approximation algorithms, bicriteria problems, combinatorial algorithms
Selected references
- Pankaj K. Agarwal and Cecilia M. Procopiuc. Exact and approximation algorithms for clustering (extended abstract). In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 658-667, San Francisco, California, 25-27 January 1998.
- Vladimir Batagelj, Simona Korenjak-Cerne, and Sandi Klavzar. Dynamic programming and convex clustering. Algorithmica, 11(2):93-103, February 1994.
- Vasilis Capoyleas, Günter Rote, and Gerhard Woeginger. Geometric clusterings. Journal of Algorithms, 12(2):341-356, June 1991.
- Moses Charikar, Chandra Chekuri, Tomás Feder, and Rajeev Motwani. Incremental clustering and dynamic information retrieval. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, pages 626-635, El Paso, Texas, 4-6 May 1997.
- Tomás Feder and Daniel H. Greene. Optimal algorithms for approximate clustering. In Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, pages 434-444, Chicago, Illinois, 2-4 May 1988.
- Robert J. Fowler, Michael S. Paterson, and Steven L. Tanimoto. Optimal packing and covering in the plane are NP-complete. Information Processing Letters, 12(3):133-137, 13 June 1981.
- Dorit S. Hochbaum and David B. Shmoys. A unified approach to approximate algorithms for bottleneck problems. Journal of the ACM, 33(3):533-550, July 1986.
- Madhav V. Marathe, R. Ravi, Ravi Sundaram, S. S. Ravi, Daniel J. Rosenkrantz, and Harry B. Hunt III. Bicriteria network design problems. Journal of Algorithms, 28(1):142-171, July 1998.
- Nimrod Megiddo and Kenneth J. Supowit. On the complexity of some common geometric location problems. SIAM Journal on Computing, 13(1):182-196, February 1984.
- U. Pferschy, R. Rudolf, and G. J. Woeginger. Some geometric clustering problems. Nordic Journal of Computing, 1(2):246-263, Summer 1994.
- Prabhakar Raghavan. Information retrieval algorithms: A survey. In Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 11-18, New Orleans, Louisiana, 5-7 January 1997.
- Tian Zhang, Raghu Ramakrishnan, and Miron Livny. BIRCH: An efficient data clustering method for very large databases. In H. V. Jagadish and Inderpal Singh Mumick, editors, Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 103-114, Montreal, Quebec, Canada, 4-6 June 1996. SIGMOD Record 25(2), June 1996.