A New Indexing Method for Uncertain Databases

  • Guang-Ho Cha Seoul National University of Science and Technology
Keywords: Uncertain database, Multidimensional index, Range query, Index clustering, Sequential disk access.

Abstract

This paper presents an indexing method called the uncertain data index (UD-index) for uncertain databases. The design objectives of the UD-index are improving the range query performance of the multidimensional indexing methods and providing a compromise between optimal index node clustering. Although more than ten years of database research has resulted in a great variety of multidimensional indexing methods, most efforts have focused on the data-level clustering and there has been no attempt to cluster index nodes themselves in dynamic environments. As a result, most related index nodes are widely scattered on the disk due to dynamic page allocation, and it requires many random disk accesses during the range search. The UD-index avoids that by storing the related nodes contiguously in a segment that contains a sequence of contiguous disk pages. The UD-index improves the range query performance by offering high-performance sequential disk access within a segment. A new cost model is introduced to estimate the range query performance. It takes into consideration the physical adjacency of pages read as well as the number of pages accessed. The analytic performance analysis indicates that the UD-index shows better performance than the traditional indexing methods in most cases.

Author Biography

Guang-Ho Cha, Seoul National University of Science and Technology

Professor, Department of Computer Engineering

References

(1) N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, “The R*-tree: An efficient and robust access method for points and rectangles,” Proc. of ACM SIGMOD International Conference on Management of Data, pp. 322-331, 1990.

(2) S. Berchtold, C. Boehm, D.A. Keim, and H.-P. Kriegel, “A Cost Model for Nearest Neighbor Search in High-Dimensional Data Space,” Proc. of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database System, pp. 78-86, 1997.

(3) S. Berchtold, D.A. Keim, H.-P. Kriegel, “The X-tree: An Index Structure for High-Dimensional Data,” Proc. of the 22nd International Conference on Very Large Data Bases, pp. 28-39, 1996.

(4) J.v.d. Bercken, B. Seeger, and P. Widmayer, “A Generic Approach to Bulk Loading Multidimensional Index Structures,” Proc. of the International Conference on Very Large Data Bases, pp. 406-415, 1997.

(5) P. Ciaccia, M. Patella, and P. Zezula, “M-tree: An Efficient Access Method for Similarity Search in Metric Space,” Proc. of the International Conference on Very Large Data Bases, pp. 426-435, 1997.

(6) D. Comer, “The Ubiquitous B-tree,” ACM Computing Surveys, Vol. 11, No. 2, pp. 121-137, 1979.

(7) C. Faloutsos and I. Kamel, “Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension,” Proc. of SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 4-13, 1994.

(8) M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, “Query by image and video content: the QBIC system,” IEEE Computer, Vol. 28, pp. 23-32, Sep. 1995.

(9) V. Gaede and O. Gunther, “Multidimensional Access Methods,” ACM Computing Surveys, Vol. 30, No. 2, pp. 170-231, June 1998.

(10) J. Gray and G. Graefe, “The Five-Minute Rule Ten Years Later, and Other Computer Storage Rules of Thumb,” ACM SIGMOD Record, Vol 26, No. 4, pp. 63-68, Dec. 1997.

(11) A. Guttman, “R-Trees: A Dynamic Index Structure for Spatial Searching,” Proc. of the ACM SIGMOD International Conference on Management of Data, pp. 47-57, 1984.

(12) A. Henrich, "The LSDh-tree: An Access Structure for Feature Vectors," Proc. of the 14th International Conference on Data Engineering, pp. 362-369, 1998.

(13) D. Knuth, The Art of Computer Programming, vol. 3: Sorting and Searching, Addison Wesley, Reading, MA, 1973.

(14) J.-H. Lee, D.-H. Kim, and C.-W. Chung, “Multidimensional Selectivity Estimation Using Compressed Histogram Information,” Proc. of the ACM SIGMOD International Conference on Management of Data, pp. 205-214, 1999.

(15) K.-I. Lin, H.V. Jagadish, and C. Faloutsos, “The TV-tree: An Index Structure for High-Dimensional Data,” VLDB Journal, Vol. 3, No. 4, pp. 517-542, 1994.

(16) W. Litwin and D.B. Lomet, “The Bounded Disorder Access Method,” Proc. of the IEEE International Conference on Data Engineering, pp. 38-48, 1986.

(17) D.B. Lomet, “A Simple Bounded Disorder File Organization with Good Performance,” ACM Transactions on Database Systems, Vol. 13, No. 4, pp. 525-551, Dec. 1988.

(18) J. Nievergelt, H. Hinterberger, and K.C. Sevcik, “The grid file: an adaptable, symmetric multikey file structure,” ACM Transactions on Database Systems, Vol. 9, No.1, pp. 38-71, 1984.

(19) P.E. O’Neil, “The SB-tree: An Index-Sequential Structure for High-Performance Sequential Access,” Acta Informatica, Vol. 29, pp. 241-265, 1992.

(20) J.T. Robinson, “The K-D-B-Tree: A Search Structure for Large Multidimensional Dynamic Indexes,” Proc. of the ACM SIGMOD International Conference on Management of Data, pp. 10-18, 1981.

(21) B. Seeger and H.-P. Kriegel, “The Buddy-tree: An Efficient and Robust Access Method for Spatial Data Base Systems,” Proc. of the 16th International Conference on Very Large Data Bases, pp. 590-601, 1990.

Published
2019-05-01