AbstractThe approximate average storage utilization of bucket methods with fanout of n, assuming a uniform distribution of data, is shown to depend only on the fanout and not on any other factors such as the directory structure. It obeys the formula (ln n)/(n-1). The analysis makes use of a generalization of previously published methods for n=2 and n=4 and its predictions match these results. The formula is applicable to methods that are based on digital searching (e.g., tries) or balancing rather than comparison-based methods. The formula is also used to detect an erroneous statement about the average storage utilization of a ternary system in Nievergelt, J., Hinterberger, H., and Sevcik, K. C. 1984. The grid file: an adaptable, symmetric multikey file structure. ACM Transactions on Database Systems 9, 1 (March), 38-71.
Categories and Subject Descriptors: D.4.3 [Operating Systems]: File Systems Management; E.1 [Data Structures]; E.2 [Data Storage Representations]; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems; H.2.2 [Database Management]: Physical Design
Additional Key Words and Phrases: data structures, analysis of algorithms, bucket methods, average storage utilization, B-tree, Grid file, EXHASH, EXCELL, quadtries, ternary system
Selected references
- Rudolf Bayer and Edward M. McCreight. Organization and maintenance of large ordered indices. Acta Informatica, 1:173-189, 1972.
- Ronald Fagin, Jürg Nievergelt, Nicholas Pippenger, and H. Raymond Strong. Extendible hashing -- a fast access method for dynamic files. ACM Transactions on Database Systems, 4(3):315-344, September 1979.
- Raphael A. Finkel and Jon Louis Bentley. Quad trees: A data structure for retrieval on composite keys. Acta Informatica, 4:1-9, 1974.
- Randal C. Nelson and Hanan Samet. A population analysis for hierarchical data structures. In Umeshwar Dayal and Irving L. Traiger, editors, Proceedings of the Association for Computing Machinery Special Interest Group on Management of Data 1987 Annual Conference, pages 270-277, San Francisco, California, 27-29 May 1987. SIGMOD Record 16(3), December 1987.
- Jürg Nievergelt, Hans Hinterberger, and Kenneth C. Sevcik. The grid file: An adaptable, symmetric multikey file structure. ACM Transactions on Database Systems, 9(1):38-71, March 1984.
- Andrew Chi-Chih Yao. On random 2-3 trees. Acta Informatica, 9:159-170, 1978.